Please see my code below.
The r2 value of 0.88 is quite strong, which is not surprising since it is computed against the same data the model was trained on.
Since I set up the TabularPredictor with an eval_metric of 'r2', I expected the model_performance values to be r2 scores, but the best of them is below 0.60.
Please help me understand why these two scales differ, and how model performance is determined.
thanks!
Bill
from autogluon.tabular import TabularPredictor

# Predictor setup: predict 'count' with r2 as the eval metric,
# ignoring the 'casual' and 'registered' columns.
tp_params = {
    'label': 'count',
    'eval_metric': 'r2',
    'learner_kwargs': {'ignored_columns': ['casual', 'registered']},
}

# Fit for up to 10 minutes using the best_quality preset.
fit_params = {
    'time_limit': 10 * 60,
    'presets': 'best_quality',
}

predictor = TabularPredictor(**tp_params).fit(train, **fit_params)
predictor.fit_summary()
… some stuff deleted …
'model_performance': {
    'KNeighborsUnif_BAG_L1': 0.21572321954622853,
    'KNeighborsDist_BAG_L1': 0.12381434021469773,
    'LightGBMXT_BAG_L1': 0.4522208535196661,
    'LightGBM_BAG_L1': 0.4666187135763119,
    'RandomForestMSE_BAG_L1': 0.5723300056850734,
    'CatBoost_BAG_L1': 0.46612979906807883,
    'ExtraTreesMSE_BAG_L1': 0.494905549777618,
    'NeuralNetFastAI_BAG_L1': 0.41765027977415214,
    'XGBoost_BAG_L1': 0.4664618870563133,
    'WeightedEnsemble_L2': 0.5723564470427751,
    'LightGBMXT_BAG_L2': 0.5881542849555463,
    'LightGBM_BAG_L2': 0.5834581176615345,
    'RandomForestMSE_BAG_L2': 0.565145793306042,
    'CatBoost_BAG_L2': 0.5841079160637894,
    'ExtraTreesMSE_BAG_L2': 0.5770066798309672,
    'NeuralNetFastAI_BAG_L2': 0.5928609303113248,
    'XGBoost_BAG_L2': 0.5784569030669493,
    'WeightedEnsemble_L3': 0.5975306638432376},
'model_best': 'WeightedEnsemble_L3',
predictor.evaluate(train):
{'r2': 0.8753146512628698,
 'root_mean_squared_error': -63.960640700352585,
 'mean_squared_error': -4090.9635587995995,
 'mean_absolute_error': -36.114926625038365,
 'pearsonr': 0.9460815801595843,
 'median_absolute_error': -18.6116943359375}
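For comparison, here is a rough sketch of what I was planning to try next: hold out part of the training data before fitting and score on that, so the r2 is out-of-sample. The 80/20 split and the 'holdout' name are just placeholders I made up; tp_params and fit_params are reused from above.

# Sketch only: split off a holdout set before fitting, then score on it.
# The 80/20 split and 'holdout' are my own placeholders, not part of the run above.
from sklearn.model_selection import train_test_split

train_part, holdout = train_test_split(train, test_size=0.2, random_state=0)

predictor = TabularPredictor(**tp_params).fit(train_part, **fit_params)

# Scores on rows the models never saw; I would expect these to sit near the
# validation numbers in model_performance rather than the in-sample 0.88.
print(predictor.evaluate(holdout))

# Per-model scores on the same holdout, shown alongside the validation scores.
print(predictor.leaderboard(holdout))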