Added AutoML #51

Open: wants to merge 2 commits into master

Conversation

@earino (Contributor) commented May 29, 2017

As we've discussed in Slack, H2O has recently released some very interesting AutoML functionality. In this case, the leader is the StackedEnsemble generated from a GBM grid, a DL grid, a DRF, and an XRT model. On 100k records it trained for a while on some small cloud hardware and produced a respectable AUC of 0.7284624.
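
A minimal sketch of the kind of call that produces the output below (frame and column names are assumed from the code later in this thread, not taken from my exact script):

library(h2o)
h2o.init()

# assumed names, matching the script discussed later in this thread
dx_train <- h2o.importFile(path = "train-0.1m.csv")
Xnames <- names(dx_train)[which(names(dx_train) != "dep_delayed_15min")]

# with no leaderboard frame supplied, AutoML internally splits
# off part of dx_train to score the leaderboard
md <- h2o.automl(x = Xnames, y = "dep_delayed_15min",
                 training_frame = dx_train)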

> md
An object of class "H2OAutoML"
Slot "project_name":
[1] "<default>"

Slot "leader":
Model Details:
==============

H2OBinomialModel: stackedensemble
Model ID:  StackedEnsemble_model_1496028880431_2818 
NULL


H2OBinomialMetrics: stackedensemble
** Reported on training data. **

MSE:  0.06495612
RMSE:  0.2548649
LogLoss:  0.2435769
Mean Per-Class Error:  0.07056041
AUC:  0.9872952
Gini:  0.9745905

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
           N     Y    Error         Rate
N      54777  1849 0.032653  =1849/56626
Y       1450 11918 0.108468  =1450/13368
Totals 56227 13767 0.047133  =3299/69994

Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold    value idx
1                       max f1  0.299564 0.878423 218
2                       max f2  0.243801 0.912848 242
3                 max f0point5  0.362489 0.896238 193
4                 max accuracy  0.313673 0.953653 213
5                max precision  0.974294 1.000000   0
6                   max recall  0.132957 1.000000 309
7              max specificity  0.974294 1.000000   0
8             max absolute_mcc  0.299564 0.849339 218
9   max min_per_class_accuracy  0.253667 0.943118 237
10 max mean_per_class_accuracy  0.247323 0.944984 240

Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`
H2OBinomialMetrics: stackedensemble
** Reported on validation data. **

MSE:  0.1327237
RMSE:  0.3643127
LogLoss:  0.4226191
Mean Per-Class Error:  0.3271404
AUC:  0.7433911
Gini:  0.4867822

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
           N    Y    Error         Rate
N       9287 2974 0.242558  =2974/12261
Y       1166 1666 0.411723   =1166/2832
Totals 10453 4640 0.274299  =4140/15093

Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold    value idx
1                       max f1  0.196506 0.445931 257
2                       max f2  0.114152 0.591573 329
3                 max f0point5  0.307013 0.439652 188
4                 max accuracy  0.579457 0.822434  82
5                max precision  0.950060 1.000000   0
6                   max recall  0.048541 1.000000 396
7              max specificity  0.950060 1.000000   0
8             max absolute_mcc  0.272812 0.299325 207
9   max min_per_class_accuracy  0.165504 0.672539 281
10 max mean_per_class_accuracy  0.156244 0.677032 289

Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`


Slot "leaderboard":
                                             model_id      auc  logloss
1            StackedEnsemble_model_1496028880431_2818 0.742023 0.424990
2  GBM_grid__a70036165806366cd146a852765f4af0_model_3 0.724540 0.472045
3  GBM_grid__a70036165806366cd146a852765f4af0_model_1 0.722181 0.438297
4  GBM_grid__a70036165806366cd146a852765f4af0_model_0 0.720750 0.475918
5                           DRF_model_1496028880431_4 0.718733 0.471836
6                         XRT_model_1496028880431_366 0.718564 0.439938
7   DL_grid__a70036165806366cd146a852765f4af0_model_0 0.715729 0.453427
8   DL_grid__a70036165806366cd146a852765f4af0_model_1 0.715312 0.453516
9  GBM_grid__a70036165806366cd146a852765f4af0_model_8 0.712989 0.443795
10 GBM_grid__a70036165806366cd146a852765f4af0_model_4 0.711725 0.457926
11  DL_grid__a70036165806366cd146a852765f4af0_model_2 0.711247 0.472706
12 GLM_grid__a70036165806366cd146a852765f4af0_model_0 0.709769 0.443991
13 GLM_grid__a70036165806366cd146a852765f4af0_model_1 0.709769 0.443991
14 GBM_grid__a70036165806366cd146a852765f4af0_model_6 0.705461 0.468157
15 GBM_grid__a70036165806366cd146a852765f4af0_model_2 0.703969 0.444650
16 GBM_grid__a70036165806366cd146a852765f4af0_model_5 0.697802 0.483724
17  DL_grid__a70036165806366cd146a852765f4af0_model_4 0.691404 0.497545
18 GBM_grid__a70036165806366cd146a852765f4af0_model_7 0.668311 0.897990
19  DL_grid__a70036165806366cd146a852765f4af0_model_3 0.658246 0.647369

Commits:
- AUC of 0.7284624 for train-0.1m.csv
- Create h2o.R for newly released h2o AutoML
@ledell left a comment

The leaderboard_frame is used to generate performance metrics on a test set. If you don't provide a leaderboard_frame, AutoML will chop off some of the training data to use for this purpose.

The way your code is currently written, some valuable training data (15%) goes to waste scoring the leaderboard. You can fix this by adding leaderboard_frame = dx_test to the h2o.automl() call.

Modified:

library(h2o)

# start a local H2O cluster: 60 GB of memory, all available cores
h2o.init(max_mem_size = "60g", nthreads = -1)

dx_train <- h2o.importFile(path = "train-0.1m.csv")
dx_test <- h2o.importFile(path = "test.csv")

# all columns except the target are predictors
Xnames <- names(dx_train)[which(names(dx_train) != "dep_delayed_15min")]

system.time({
  # leaderboard_frame = dx_test keeps AutoML from carving 15%
  # off the training data to score the leaderboard
  md <- h2o.automl(x = Xnames, y = "dep_delayed_15min",
                   training_frame = dx_train,
                   leaderboard_frame = dx_test)
})

system.time({
  # re-scores the test set and reports the leader model's AUC
  print(h2o.auc(h2o.performance(md@leader, dx_test)))
})

# alternative way to get the leader model's AUC
system.time({
  print(md@leaderboard$auc[1])
})

@szilard (Owner) commented May 30, 2017

Ensembles (the new Java implementation) + AutoML have been on my list of things to look at (I've already done some of that).

However, I think I should keep this repo to the basic algos only and create new repos for looking at things built on top of those (also, 99% of the training time in ensembles/AutoML is spent in the building blocks, so there is not much to benchmark on speed, while the increase in AUC will be very much dataset dependent).

I already included ensembles in the course I'm teaching at UCLA, see here.

I might create a repo for AutoML, though that's also trivial: the code above changes 2 lines vs. the original. I would probably run it on 1M records though.

I actually already factored GBMs out of this benchmark in order to keep up with the newest and best tools (added LightGBM) and forget about mediocre tools such as Spark. This new repo will have a more targeted focus (only 1M/10M records and only the best GBM tools), but I might be able to update it with new versions more regularly (+ add GPUs).

@szilard (Owner) commented May 30, 2017

PS: I also started a deep learning repo a few months ago, but did not get too far (yet).

@earino (Contributor, Author) commented May 30, 2017

Following @ledell's advice, the code gives an AUC of 0.7286668, so some improvement, but not drastic, on the 100k-row dataset. I'm running it on the 1M overnight.

@ledell commented May 30, 2017

@earino How long did you run it for? If it was the default, then it probably ran for 10 minutes. We changed the default to 1 hour very recently, so if you re-run on a newer version, you should make a note of the change. In your results above, it looks like StackedEnsemble_model_1496028880431_2818 had a test AUC of ~0.74, not ~0.72...?
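
To make runs comparable across versions, the time budget can be pinned explicitly; a minimal sketch using h2o.automl()'s max_runtime_secs argument, with the frame names from the code above:

# pin the AutoML time budget so results don't shift when the
# library default changes (10 minutes vs. 1 hour)
md <- h2o.automl(x = Xnames, y = "dep_delayed_15min",
                 training_frame = dx_train,
                 leaderboard_frame = dx_test,
                 max_runtime_secs = 3600)  # exactly 1 hour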

@earino (Contributor, Author) commented May 30, 2017 via email

@earino (Contributor, Author) commented May 30, 2017

@ledell Very explicitly, this is the exact line I'm using to get the performance number. Is it the wrong thing? print(h2o.auc(h2o.performance(md@leader, dx_test)))

@ledell commented May 31, 2017

@earino That line will also work, but it requires re-computing all the performance metrics on the test set. They are already computed as part of the h2o.automl() function and stored in the Leaderboard.
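
The two approaches side by side (a minimal sketch, assuming the md object from the code above; row 1 of the leaderboard is the leader model):

# slower: re-scores the whole test set from scratch
print(h2o.auc(h2o.performance(md@leader, dx_test)))

# cheaper: reads the AUC already computed by h2o.automl()
print(md@leaderboard$auc[1])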
