Well, this is really the reason I asked my question. It is a pain to move in and ...

westurner · on Sept 15, 2021

Ctrl-F automl https://westurner.github.io/hnlog/

> /? hierarchical automl "sklearn" site:github.com : https://www.google.com/search?q=hierarchical+automl+%22sklea...

https://westurner.github.io/hnlog/#comment-18798244

> Dask-ML works with {scikit-learn, xgboost, tensorflow, TPOT,}. ETL is your responsibility. Loading things into parquet format affords a lot of flexibility in terms of (non-SQL) datastores or just efficiently packed files on disk that need to be paged into/over in RAM. (Edit)

scale-scikit-learn https://examples.dask.org/machine-learning/scale-scikit-lear... -> dask.distributed parallel prediction: https://examples.dask.org/machine-learning/parallel-predicti...

"Hyperparameter optimization with Dask" https://examples.dask.org/machine-learning/hyperparam-opt.ht...
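For a sense of what that kind of search looks like, here is a minimal sketch using plain scikit-learn's GridSearchCV on a synthetic dataset. The dataset, the SVC estimator, and the parameter grid are illustrative assumptions, not taken from the linked example. As far as I know, dask_ml.model_selection.GridSearchCV is meant to be a drop-in replacement with the same fit / best_params_ interface, but it is not used here, so treat that as something to verify against the Dask-ML docs.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic data; a stand-in for whatever your ETL step produces.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical grid: 3 values of C x 2 kernels = 6 candidate settings.
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}

# cv=3 fits each candidate on 3 folds; raise n_jobs to parallelize locally.
search = GridSearchCV(SVC(), param_grid, cv=3, n_jobs=1)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

The grid is embarrassingly parallel (every candidate/fold pair is an independent fit), which is why handing it to a distributed scheduler is attractive once a single machine runs out of cores.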

> Sklearn.pipeline.Pipeline API: {fit(), transform(), predict(), score(),} https://scikit-learn.org/stable/modules/generated/sklearn.pi... :

```
decision_function(X) # Apply transforms, and decision_function of the final estimator
fit(X[, y]) # Fit the model
fit_predict(X[, y]) # Applies fit_predict of last step in pipeline after transforms.
fit_transform(X[, y]) # Fit the model and transform with the final estimator
get_params([deep]) # Get parameters for this estimator.
predict(X, *predict_params) # Apply transforms to the data, and predict with the final estimator
predict_log_proba(X) # Apply transforms, and predict_log_proba of the final estimator
predict_proba(X) # Apply transforms, and predict_proba of the final estimator
score(X[, y, sample_weight]) # Apply transforms, and score with the final estimator
score_samples(X) # Apply transforms, and score_samples of the final estimator.
set_params(**kwargs) # Set the parameters of this estimator
```
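A short sketch of a few of those Pipeline methods in use. The two steps (StandardScaler, LogisticRegression), the step names "scale" and "clf", and the synthetic data are assumptions chosen for illustration; any transformer/estimator pair with the same interface should slot in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 300 rows, default 75/25 train/test split.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step names ("scale", "clf") are arbitrary labels used to address parameters.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

pipe.fit(X_train, y_train)            # fit the scaler, then the classifier
pred = pipe.predict(X_test)           # apply transforms, then predict
proba = pipe.predict_proba(X_test)    # apply transforms, then predict_proba
acc = pipe.score(X_test, y_test)      # mean accuracy of the final estimator

# Nested parameters use the <step>__<param> convention.
pipe.set_params(clf__C=0.5)
print(acc, pipe.get_params()["clf__C"])
```

Because the whole pipeline exposes a single fit/predict/get_params surface, it can be handed to a hyperparameter search as one estimator, with grid keys like clf__C addressing the inner steps.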

> https://docs.featuretools.com can also minimize ad-hoc boilerplate ETL / feature engineering :

>> Featuretools is a framework to perform automated feature engineering. It excels at transforming temporal and relational datasets into feature matrices for machine learning

From https://featuretools.alteryx.com/en/stable/guides/using_dask... :

> Creating a feature matrix from a very large dataset can be problematic if the underlying pandas dataframes that make up the entities cannot easily fit in memory. To help get around this issue, Featuretools supports creating Entity and EntitySet objects from Dask dataframes. A Dask EntitySet can then be passed to featuretools.dfs or featuretools.calculate_feature_matrix to create a feature matrix, which will be returned as a Dask dataframe. In addition to working on larger than memory datasets, this approach also allows users to take advantage of the parallel and distributed processing capabilities offered by Dask