Model Selection
- class datatoolkit.model_selection.BayesianSearchCV(estimator: sklearn.base.BaseEstimator, parameter_space: dict[str, hyperopt.hp], n_iter: int = 10, scoring=typing.Union[collections.abc.Iterable[str], collections.abc.Callable, NoneType], cv=sklearn.model_selection.StratifiedShuffleSplit, refit: str = 'loss', verbose=0, random_state=None, error_score='raise', return_train_score=False)
Bayesian hyperparameter search with cross-validation.
- Parameters
(BaseEstimator) – scikit-learn base estimator.
(ClassifierMixin) – scikit-learn classifier mixin.
- Raises
TypeError – When scoring argument is of wrong type.
NotFittedError – When estimator is not fitted.
- cross_validate(parameter_space: dict, X: collections.abc.Iterable[float], y: collections.abc.Iterable[float]) dict
Fits the estimator on each training set and evaluates it on the corresponding validation set, as defined by the cross-validation generator.
- Parameters
parameter_space (dict) – Dict containing parameter space.
X (Iterable[float]) – Array-like of shape (n_samples, n_features) containing predictors.
y (Iterable[float]) – Array-like of shape (n_samples,) containing target label.
- Returns
Dict containing cross validation results.
- Return type
dict
- fit(X: collections.abc.Iterable[float], y: collections.abc.Iterable[float])
Fits estimator.
- Parameters
X (Iterable[float]) – Matrix of shape (n_samples, n_features).
y (Iterable[float]) – Array-like of shape (n_samples,).
- get_dataset_type_score_name_index(split_iterator: Optional[collections.abc.Iterable[int]] = None) collections.abc.Generator[tuple[str, str, int]]
Generates tuples of dataset type, score name, and index.
- Parameters
split_iterator (Union[Iterable[int], None], optional) – Array-like of shape (n_splits,), with one entry per CV split. Defaults to None.
- Yields
Generator[tuple[str, str, int]] – Tuple composed of dataset type, score name, and index.
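As a rough illustration of the kind of tuples such a generator yields, the sketch below combines dataset types, score names, and split indices with itertools.product. The helper name iter_dataset_score_index, the dataset labels "train" and "val", the fallback to a single split, and the iteration order are all assumptions made for illustration, not the library's actual behavior.

```python
from itertools import product
from typing import Iterable, Iterator, Optional, Tuple


def iter_dataset_score_index(
    dataset_types: Iterable[str],
    score_names: Iterable[str],
    split_iterator: Optional[Iterable[int]] = None,
) -> Iterator[Tuple[str, str, int]]:
    """Yield (dataset type, score name, split index) tuples (hypothetical helper)."""
    # Assumption: with no split iterator, fall back to a single split with index 0.
    splits = [0] if split_iterator is None else list(split_iterator)
    # product varies the last argument fastest, so split indices cycle innermost.
    yield from product(dataset_types, score_names, splits)


combos = list(iter_dataset_score_index(["train", "val"], ["accuracy"], range(2)))
print(combos)
```

Each tuple can then serve as a key for storing one score per dataset, metric, and split.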
- objective(y_true: collections.abc.Iterable[float], y_pred: collections.abc.Iterable[float], score_name: str) float
Objective function to be minimized.
- Parameters
y_true (Iterable[float]) – Array-like of shape (n_samples,) containing true values of target label.
y_pred (Iterable[float]) – Array-like of shape (n_samples,) containing predicted values of target label.
score_name (str) – Name of the performance metric.
- Returns
Absolute difference between the score and its optimal value.
- Return type
float
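The idea of minimizing the absolute difference between a score and its optimal value can be sketched in plain Python. The accuracy function and the SCORERS registry below are hypothetical stand-ins chosen for illustration; the library's own metric lookup may differ.

```python
def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Hypothetical registry: score name -> (metric function, optimal value).
SCORERS = {"accuracy": (accuracy, 1.0)}


def objective(y_true, y_pred, score_name: str) -> float:
    """Distance between the achieved score and the best attainable score."""
    metric, optimal = SCORERS[score_name]
    return abs(metric(y_true, y_pred) - optimal)


# Three of four predictions are correct: accuracy 0.75, loss |0.75 - 1.0|.
loss = objective([1, 0, 1, 1], [1, 0, 0, 1], "accuracy")
print(loss)  # 0.25
```

Framing the loss this way lets a minimizer handle both metrics where higher is better and metrics where lower is better.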
- optimize(X: collections.abc.Iterable[float], y: collections.abc.Iterable[float]) dict
Runs hyperparameter optimization.
- Parameters
X (Iterable[float]) – Array-like of shape (n_samples, n_features) containing predictors.
y (Iterable[float]) – Array-like of shape (n_samples,) containing target label.
- Returns
Optimal hyperparameter values.
- Return type
dict
- post_process_cv_results()
Process cross validation results by calculating average and standard deviation of scores.
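Averaging per-split scores and computing their spread can be sketched with the standard library. The result keys, the summarize helper, and the use of the population standard deviation are assumptions for illustration; the library may store and aggregate its results differently.

```python
from statistics import mean, pstdev

# Hypothetical raw results: one score per CV split, keyed by "<dataset>_<score>".
cv_results = {
    "train_accuracy": [0.9, 0.9, 0.9],
    "val_accuracy": [0.8, 0.9, 1.0],
}


def summarize(results):
    """Return the mean and standard deviation of each list of per-split scores."""
    summary = {}
    for key, scores in results.items():
        summary[f"mean_{key}"] = mean(scores)
        # Assumption: population standard deviation (divide by n, not n - 1).
        summary[f"std_{key}"] = pstdev(scores)
    return summary


summary = summarize(cv_results)
print(summary)
```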
- predict(X: collections.abc.Iterable[float]) collections.abc.Iterable[float]
Predicts the class of each observation.
- Parameters
X (Iterable[float]) – Array-like of shape (n_samples, n_features) containing predictors.
- Returns
Predicted classes.
- Return type
Iterable[float]
- predict_proba(X: collections.abc.Iterable[float]) collections.abc.Iterable[float]
Predicts the probability of each observation belonging to a class.
- Parameters
X (Iterable[float]) – Array-like of shape (n_samples, n_features) containing predictors.
- Returns
Class probabilities.
- Return type
Iterable[float]
- static scorer_class_map(y_pred: np.ndarray[float], score_name: str, threshold: float = 0.5) np.ndarray[float]
Maps score name to class.
- Parameters
y_pred (np.ndarray[float]) – Array-like of shape (n_samples,).
score_name (str) – Name of the performance metric.
threshold (float, optional) – Threshold used to transform probability into class. Defaults to 0.5.
- Returns
Array-like of shape (n_samples,).
- Return type
np.ndarray[float]
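One plausible reading of this method is that class-based metrics receive thresholded labels while probability-based metrics receive the raw probabilities. The sketch below follows that reading using plain lists instead of NumPy arrays; the set of class-based score names, the helper name to_metric_input, and the "at or above the threshold" rule are assumptions, not confirmed library behavior.

```python
from typing import List, Sequence

# Assumption: these metrics need hard class labels; others take probabilities as-is.
CLASS_BASED_SCORES = {"accuracy", "precision", "recall", "f1"}


def to_metric_input(
    y_pred: Sequence[float], score_name: str, threshold: float = 0.5
) -> List[float]:
    """Convert predicted probabilities to the input a given metric expects."""
    if score_name in CLASS_BASED_SCORES:
        # Assumption: probabilities at or above the threshold map to class 1.
        return [1.0 if p >= threshold else 0.0 for p in y_pred]
    return list(y_pred)


probs = [0.1, 0.5, 0.75]
print(to_metric_input(probs, "accuracy"))  # [0.0, 1.0, 1.0]
print(to_metric_input(probs, "roc_auc"))   # [0.1, 0.5, 0.75]
```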
- static scorer_optimal_value(score_name: str) float
Maps score name to optimal value.
- Parameters
score_name (str) – Name of the performance metric.
- Returns
Optimal value.
- Return type
float
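A mapping from score name to optimal value can be as simple as a dictionary lookup. The table below is an illustrative assumption that lists the best attainable value for a few common metrics; the names the library accepts and how it handles unknown names are not documented here.

```python
# Illustrative table (an assumption): best attainable value for a few common metrics.
OPTIMAL_VALUES = {
    "accuracy": 1.0,
    "f1": 1.0,
    "roc_auc": 1.0,
    "log_loss": 0.0,
}


def optimal_value(score_name: str) -> float:
    """Look up the best attainable value of a metric (hypothetical helper)."""
    try:
        return OPTIMAL_VALUES[score_name]
    except KeyError:
        # Assumption: unknown names are rejected rather than given a default.
        raise ValueError(f"Unknown score name: {score_name!r}") from None


print(optimal_value("log_loss"))  # 0.0
```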
- class datatoolkit.model_selection.ClassificationCostFunction(metrics: collections.abc.Iterable[str], M: np.ndarray[float] = None, metric_class_opt_val_map: dict[str, tuple[str, float]] = None, proba_threshold: float = 0.5)
- objective(y_true: np.ndarray[float], y_pred: np.ndarray[float]) float
Objective function.
- Parameters
y_true (np.ndarray[float]) – Array-like of true labels of length N.
y_pred (np.ndarray[float]) – Array-like of predicted labels of length N.
- class datatoolkit.model_selection.CostFunction(metrics: collections.abc.Iterable[str], M: np.ndarray[float])
Abstract class for cost functions.
- abstract objective(y_true: np.ndarray[float], y_pred: np.ndarray[float]) float
Objective function.
- Parameters
y_true (np.ndarray[float]) – Array-like of true labels of length N.
y_pred (np.ndarray[float]) – Array-like of predicted labels of length N.
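The abstract-class pattern described here can be sketched with Python's abc module. CostFunctionSketch and ErrorRateCost are hypothetical names, and the sketch omits the M argument because its meaning is not documented above; this is a minimal illustration of the pattern, not the library's implementation.

```python
from abc import ABC, abstractmethod
from typing import Sequence


class CostFunctionSketch(ABC):
    """Hypothetical stand-in for an abstract cost function."""

    def __init__(self, metrics: Sequence[str]):
        self.metrics = list(metrics)

    @abstractmethod
    def objective(self, y_true, y_pred) -> float:
        """Return the cost to minimize."""


class ErrorRateCost(CostFunctionSketch):
    """Concrete example: cost is the fraction of misclassified samples."""

    def objective(self, y_true, y_pred) -> float:
        errors = sum(t != p for t, p in zip(y_true, y_pred))
        return errors / len(y_true)


# One of four predictions is wrong, so the cost is 1 / 4.
cost = ErrorRateCost(["accuracy"]).objective([1, 0, 1, 1], [1, 1, 1, 1])
print(cost)  # 0.25
```

Because objective is marked abstract, instantiating CostFunctionSketch directly raises TypeError; only subclasses that implement objective can be created.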