Predicting the number of shared-bike users daily and hourly
Introduction
Bike sharing is a recent trend that has spread quickly around the world. A control system gives a user access to a bike for a set time slot: the user picks up the bike and returns it to a station within that slot. Cycling has many advantages, not only reducing traffic and pollution but also making the population healthier. For the companies, it is indispensable to understand user behaviour and manage their fleet so that bikes are always available.
The problem:
Predict the number of bike rentals per day (and per hour) based on environmental and seasonal data.
Methods:
1. Data exploration and exploratory analysis
The dataset can be downloaded here.
The data has strongly correlated features, for example temp and atemp, where the first is the actual temperature and the second is the "feels-like" temperature.
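A quick way to spot pairs like temp and atemp is to scan the correlation matrix. The sketch below is an illustration, not the code used for this analysis: it assumes the data is already loaded into a pandas DataFrame, and the helper name `correlated_pairs` and its 0.9 threshold are hypothetical choices.

```python
import pandas as pd


def correlated_pairs(df: pd.DataFrame, threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (col_a, col_b, r) for numeric column pairs with |Pearson r| >= threshold.

    Hypothetical helper; the 0.9 default is an illustrative cut-off, not a rule.
    """
    corr = df.select_dtypes("number").corr()  # Pearson correlation by default
    cols = list(corr.columns)
    pairs = []
    # Walk the upper triangle only, so each pair is reported once
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = float(corr.loc[a, b])
            if abs(r) >= threshold:  # NaN (e.g. constant columns) never passes
                pairs.append((a, b, r))
    # Strongest correlations first
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)
```

Given how closely temp and atemp track each other, one would expect that pair near the top of the list; keeping only one of the two is a common way to avoid redundant inputs in a linear model.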
Looking closely:
The target, CNT (the rental count), is not normally distributed, but it is quite close:
In general, the data provided was very clean, with no null values.
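Both observations can be checked numerically. This is a minimal sketch assuming a pandas DataFrame; the helper `quick_checks` is hypothetical, and the lowercase column name "cnt" for CNT is an assumption to adjust to the actual header.

```python
import pandas as pd


def quick_checks(df: pd.DataFrame, target: str = "cnt") -> dict:
    """Summarise missing values and how far the target is from a normal shape.

    Assumption: the target column is called "cnt"; change it to match the file.
    """
    nulls = df.isna().sum()
    y = df[target]
    return {
        # Total number of missing cells across the whole frame
        "total_nulls": int(nulls.sum()),
        # Only the columns that actually contain missing values
        "null_columns": nulls[nulls > 0].to_dict(),
        # Sample skewness: 0 for a symmetric distribution
        "skew": float(y.skew()),
        # Excess kurtosis (Fisher definition): 0 for a normal distribution
        "excess_kurtosis": float(y.kurt()),
    }
```

Skewness and excess kurtosis close to zero are consistent with "not normal, but quite close"; a formal test such as `scipy.stats.shapiro` could complement the visual check.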
2. Finding which factors impact bike rentals
To find these factors, an XGBoost regressor was used in combination with the exploratory analysis:
Contrary to expectations, the same data sampled daily and hourly has very different characteristics. The number of bikes rented per day is influenced most by the temperature. In contrast, the number of bikes rented per hour depends mainly on whether the day is a working day and on the day of the week.
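The kind of importance ranking behind this finding can be sketched with any tree ensemble. The example below swaps in scikit-learn's `GradientBoostingRegressor` so it runs without extra dependencies; `xgboost.XGBRegressor` exposes the same `fit` method and `feature_importances_` attribute and could be dropped in. The helper name `rank_features` and the hyperparameters are illustrative assumptions, and the exact feature lists used for the daily and hourly data are not given here.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor


def rank_features(X: pd.DataFrame, y: pd.Series, random_state: int = 0) -> pd.Series:
    """Fit a gradient-boosted tree model and return feature importances, highest first.

    Stand-in for the XGBoost regressor used in the analysis; settings are illustrative.
    """
    model = GradientBoostingRegressor(n_estimators=200, max_depth=3, random_state=random_state)
    model.fit(X, y)
    # Impurity-based importances, normalised to sum to 1
    importances = pd.Series(model.feature_importances_, index=X.columns)
    return importances.sort_values(ascending=False)
```

Calling it once on the daily frame and once on the hourly frame, then comparing the top entries, is how a difference like temperature versus working day could surface. Impurity-based importances can favour features with many distinct values, so `sklearn.inspection.permutation_importance` is a useful cross-check.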
3. Predicting the number of bikes rented daily and hourly
For each dataset, two models were fitted: a simple sklearn regressor and an XGBoost regressor:
Daily sampled:
Hourly sampled:
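A minimal comparison of the two model types on held-out data could look like the sketch below. Assumptions: the "simple sklearn regressor" is taken to be `LinearRegression`, scikit-learn gradient boosting again stands in for XGBoost, and the 80/20 split, the MAE and R² metrics, and the helper name `compare_models` are illustrative choices rather than the original setup.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split


def compare_models(X: pd.DataFrame, y: pd.Series, test_size: float = 0.2) -> pd.DataFrame:
    """Fit a linear baseline and a boosted-tree model; report hold-out MAE and R^2.

    shuffle=False keeps the original row order, so if rows are sorted by date the
    test set is the final period and the models never train on the future.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, shuffle=False
    )
    models = {
        "linear": LinearRegression(),
        # Stand-in for XGBoost; xgboost.XGBRegressor could be swapped in here
        "boosted_trees": GradientBoostingRegressor(random_state=0),
    }
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        rows.append({
            "model": name,
            "mae": mean_absolute_error(y_test, pred),
            "r2": r2_score(y_test, pred),
        })
    return pd.DataFrame(rows).set_index("model")
```

Running it once on the daily data and once on the hourly data gives one small results table per sampling rate. The chronological split is a deliberate choice: a random split on time-ordered data tends to make scores look better than they would be on genuinely future days.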
Predicting the number of rented bikes is an important task for the companies that provide the service. Luckily, with modern machine learning the task is quite achievable, even with linear models. It is a fun task for anyone learning ML.
If you would like to get the code and improve the model, both the code and the data are available HERE.