{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "edec260f-4232-4817-a035-fe3533030be1", "metadata": {}, "source": [ "# __Create a Regression Model Using AutoML__" ] }, { "attachments": {}, "cell_type": "markdown", "id": "9d44de2f", "metadata": {}, "source": [ "- Tutorial Difficulty: ★☆☆☆☆\n", "- 5 min read\n", "- Languages: [SQL](https://en.wikipedia.org/wiki/SQL) (100%)\n", "- File location: tutorial_en/thanosql_ml/regression/automl_regression.ipynb\n", "- References: [(Kaggle) Bike Sharing Demand](https://www.kaggle.com/competitions/bike-sharing-demand/overview)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "05f3d517", "metadata": {}, "source": [ "## Tutorial Introduction\n", "\n", "
Regression is a type of machine learning (ML) used to predict continuous numeric target values. For example, a regression model can predict tomorrow's temperature or housing prices in a particular area.
\n", "👉 Create a bike demand regression model using the Bike Sharing Demand dataset from Kaggle, a machine learning competition platform. The goal of this competition is to predict hourly bike rental demand from information such as date and time, temperature, humidity, and wind speed, recorded from 2011 to 2012.
\n", "\n", "
|   | datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 0:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 16 |
| 1 | 2011-01-01 1:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 40 |
| 2 | 2011-01-01 2:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 32 |
| 3 | 2011-01-01 3:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 13 |
| 4 | 2011-01-01 4:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 1 |
The bike_sharing_train table contains the number of bicycles rented per hour, along with information such as date and time, temperature, humidity, and wind speed, from January 2011 to December 2012.
\n", "\n", "
|   | metric | score |
|---|---|---|
| 0 | MAE | 78.6563 |
| 1 | MSE | 10986.4542 |
| 2 | R2 | 0.2292 |
| 3 | RMSLE | 1.3861 |
| 4 | MAPE | 0.5028 |
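The metric names in the table above have standard textbook definitions, and a small sketch can make them concrete. The Python below is an illustration only, not ThanoSQL's implementation: the function name `regression_metrics` is hypothetical, the predicted values in the usage lines are made up for demonstration, and MAPE is reported as a fraction (not a percentage) on the assumption that this matches the scale of the score shown above.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, R2, RMSLE, and MAPE from two equal-length sequences.

    Hypothetical helper for illustration; it follows the textbook
    definitions and is not taken from ThanoSQL.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: average absolute error, in the same unit as the target (rentals).
    mae = sum(abs(e) for e in errors) / n

    # MSE: average squared error; penalizes large misses more heavily.
    mse = sum(e * e for e in errors) / n

    # R2: 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")

    # RMSLE: RMSE on log(1 + value); requires values greater than -1.
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / n
    )

    # MAPE: average of |error| / |actual|, skipping rows where actual is 0
    # because the ratio is undefined there.
    ratios = [abs(t - p) / abs(t) for t, p in zip(y_true, y_pred) if t != 0]
    mape = sum(ratios) / len(ratios) if ratios else float("nan")

    return {"MAE": mae, "MSE": mse, "R2": r2, "RMSLE": rmsle, "MAPE": mape}


# Actual counts from the first five rows of bike_sharing_train,
# paired with made-up predictions (hypothetical values).
actual = [16, 40, 32, 13, 1]
predicted = [20.0, 35.0, 30.0, 15.0, 3.0]
for name, value in regression_metrics(actual, predicted).items():
    print(f"{name}: {value:.4f}")
```

MAE and MSE are expressed in rental counts (and squared counts), so they depend on the scale of the target, while R2, RMSLE, and MAPE are relative measures that are easier to compare across datasets.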
Normally, the training dataset should not be used for evaluation, because scores measured on data the model has already seen tend to be overly optimistic. However, this tutorial evaluates on the training dataset for convenience.
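Outside a tutorial setting, one common alternative is to hold out part of the data for evaluation. The sketch below is a minimal Python illustration under stated assumptions, not part of ThanoSQL: the function name `chronological_split`, the `holdout_fraction` parameter, and the representation of rows as dictionaries are all hypothetical. It keeps the most recent rows for evaluation, which suits hourly data like this because it avoids scoring the model on hours that precede its training data.

```python
from datetime import datetime


def chronological_split(rows, holdout_fraction=0.2):
    """Split rows into (train, evaluation) by time; latest rows are held out.

    Hypothetical helper: each row is assumed to be a dict with a "datetime"
    string formatted like "2011-01-01 0:00".
    """
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    if len(rows) < 2:
        raise ValueError("need at least two rows to split")

    # Parse timestamps instead of sorting strings: "2:00" would otherwise
    # sort after "10:00".
    ordered = sorted(
        rows, key=lambda r: datetime.strptime(r["datetime"], "%Y-%m-%d %H:%M")
    )

    # Hold out at least one row, and leave at least one row for training.
    n_holdout = min(len(ordered) - 1, max(1, round(len(ordered) * holdout_fraction)))
    cut = len(ordered) - n_holdout
    return ordered[:cut], ordered[cut:]


# Usage with the first five rows of bike_sharing_train (datetime and count only).
sample = [
    {"datetime": "2011-01-01 3:00", "count": 13},
    {"datetime": "2011-01-01 0:00", "count": 16},
    {"datetime": "2011-01-01 4:00", "count": 1},
    {"datetime": "2011-01-01 1:00", "count": 40},
    {"datetime": "2011-01-01 2:00", "count": 32},
]
train_rows, eval_rows = chronological_split(sample, holdout_fraction=0.2)
print(len(train_rows), "training rows;", len(eval_rows), "evaluation rows")
```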
\n", "\n", "
|   | datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | predict_result |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-20 0:00 | 1 | 0 | 1 | 1 | 10.66 | 11.365 | 56 | 26.0027 | 102.836334 |
| 1 | 2011-01-20 1:00 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 | 92.060480 |
| 2 | 2011-01-20 2:00 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 | 92.060480 |
| 3 | 2011-01-20 3:00 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 | 95.181085 |
| 4 | 2011-01-20 4:00 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 | 95.181085 |
| 5 | 2011-01-20 5:00 | 1 | 0 | 1 | 1 | 9.84 | 11.365 | 60 | 15.0013 | 91.816701 |
| 6 | 2011-01-20 6:00 | 1 | 0 | 1 | 1 | 9.02 | 10.605 | 60 | 15.0013 | 87.213365 |
| 7 | 2011-01-20 7:00 | 1 | 0 | 1 | 1 | 9.02 | 10.605 | 55 | 15.0013 | 87.054590 |
| 8 | 2011-01-20 8:00 | 1 | 0 | 1 | 1 | 9.02 | 10.605 | 55 | 19.0012 | 88.568595 |
| 9 | 2011-01-20 9:00 | 1 | 0 | 1 | 2 | 9.84 | 11.365 | 52 | 15.0013 | 103.445460 |
If you have any difficulty creating your own model with ThanoSQL or applying it to your service, please feel free to contact us at the address below 😊
\n", "For inquiries regarding building a regression model: contact@smartmind.team
\n", "