Predicting the number of shared-bike users daily and hourly

Matheus Cafalchio
Feb 10, 2021

Introduction

Bike sharing is a recent trend that has spread quickly around the world. A control system gives a user access to a bike for a set time slot: the user picks up the bike and returns it to the same spot within that slot. Getting around by bike has many advantages, not only reducing traffic and pollution but also making the population healthier. For the companies, it is indispensable to understand user behaviour and manage their fleets so that bikes are always available.

image from: https://www.smartlockssupplier.com/news/How-Bike-Share-Programs-Work.html

The problem:

Predict the number of bike rentals, per day and per hour, based on environmental and seasonal data.
Methods:
1. Data exploration and exploratory analysis

This dataset can be downloaded here

The data has strongly correlated features, for example temp and atemp, where the first is the measured temperature and the second is the "feels-like" temperature.

Looking closely:

temp and atemp are highly correlated with each other and with cnt, our target: the number of bikes rented per day.
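A correlation matrix is enough to run this kind of check. The sketch below is a minimal example, assuming pandas and NumPy; because the original data is not bundled here, it builds a synthetic stand-in where atemp closely tracks temp and cnt grows with temperature. The values and the 0 to 1 temperature scale are made up and only illustrate the mechanics.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the daily data (NOT the real dataset)
rng = np.random.default_rng(0)
n = 365
temp = rng.uniform(0.05, 0.85, n)            # assumed normalized temperature scale
atemp = temp + rng.normal(0, 0.02, n)        # "feels-like" temperature tracks temp closely
cnt = (5000 * temp + rng.normal(0, 400, n)).round()  # rentals rise with temperature

df = pd.DataFrame({"temp": temp, "atemp": atemp, "cnt": cnt})

# Pairwise Pearson correlations between the three columns
corr = df.corr()
print(corr.round(2))
```

On the real data, near-duplicate features such as temp and atemp are worth handling (for example by keeping only one) before reading much into per-feature importances.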

The target cnt is not normally distributed, but it is quite close:

Normality test Target (cnt)
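The post does not say which normality test was used. As one option, SciPy's D'Agostino-Pearson test (`scipy.stats.normaltest`) checks the null hypothesis that a sample comes from a normal distribution. The sketch below runs it on synthetic counts standing in for cnt; the mean, spread and sample size are assumptions, not values from the dataset.

```python
import numpy as np
from scipy import stats

# Synthetic daily counts as a placeholder for the real cnt column
rng = np.random.default_rng(1)
cnt = np.clip(rng.normal(4500, 1900, 731), 0, None).round()  # counts cannot be negative

# D'Agostino-Pearson test: H0 = the sample is drawn from a normal distribution
stat, p_value = stats.normaltest(cnt)
print(f"statistic={stat:.2f}, p-value={p_value:.4f}")
print("reject normality at 5%" if p_value < 0.05 else "cannot reject normality at 5%")
```

With hundreds of rows, such tests can reject normality even for mild deviations, so a histogram or Q-Q plot is a useful companion when judging "not normal, but quite close".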

In general, the data provided was very clean, with no null values.
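Checking for missing values takes one line in pandas. In this minimal sketch, the tiny frame is hypothetical and deliberately contains one NaN so the check has something to find. With the real data, `df` would come from `pd.read_csv` on the downloaded file.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one deliberate gap (the real data had none)
df = pd.DataFrame({
    "temp": [0.30, 0.35, np.nan, 0.25],
    "atemp": [0.31, 0.34, 0.20, 0.26],
    "cnt": [1000, 1200, 1100, 1300],
})

# Missing values per column, plus a single yes/no answer for the whole frame
missing_per_column = df.isnull().sum()
print(missing_per_column)
print("any nulls:", df.isnull().values.any())
```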

2. Finding which factors impact bike rentals

To find these factors, an XGBoost regressor was used in combination with the exploratory analysis:

Most important factor for daily bike rentals: the temperature
Most important factor for the same data sampled hourly: whether it is a working day
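Tree ensembles expose an impurity-based importance score per feature after fitting. The sketch below uses scikit-learn's `GradientBoostingRegressor` as a stand-in for XGBoost (xgboost's `XGBRegressor` offers a similar `feature_importances_` attribute). The data is synthetic and the `workingday` and `humidity` columns are hypothetical; the target is built so that temperature dominates, which makes the ranking easy to verify.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
n = 500
X = pd.DataFrame({
    "temp": rng.uniform(0, 1, n),
    "workingday": rng.integers(0, 2, n),  # hypothetical binary flag
    "humidity": rng.uniform(0, 1, n),     # hypothetical feature
})
# Synthetic target: driven mostly by temp, weakly by the other two features
y = 5000 * X["temp"] + 300 * X["workingday"] + 100 * X["humidity"] + rng.normal(0, 200, n)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Importances are normalized to sum to 1; sort to get a ranking
importances = pd.Series(model.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.round(3))
```

Impurity-based importances can be spread across correlated features such as temp and atemp, so permutation importance on held-out data is a useful cross-check.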

Contrary to expectations, the same dataset sampled daily and hourly has very different characteristics. The number of bikes rented per day is influenced most by the temperature. In contrast, the number of bikes rented per hour depends mostly on whether the day is a working day or on the day of the week.
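One mechanical reason the two samplings can look so different is that summing hours into days erases the shape of the day. The sketch below builds two synthetic weeks of hourly counts (made-up numbers, and it assumes Monday to Friday are working days with no holidays): working days get commuter peaks at 8h and 17h, other days a midday bump. It illustrates the effect; it does not reproduce the post's result.

```python
import numpy as np
import pandas as pd

# Two synthetic weeks of hourly rentals starting on a Monday (placeholder data)
idx = pd.date_range("2021-01-04", periods=14 * 24, freq=pd.offsets.Hour(1))
hourly = pd.DataFrame(index=idx)
hourly["hr"] = idx.hour
hourly["workingday"] = (idx.dayofweek < 5).astype(int)  # assumption: Mon-Fri, no holidays

# Working days: commuter peaks at 8h and 17h; other days: a bump from 11h to 16h
commute = np.where(hourly["hr"].isin([8, 17]), 400, 50)
leisure = np.where(hourly["hr"].between(11, 16), 250, 30)
hourly["cnt"] = np.where(hourly["workingday"] == 1, commute, leisure)

# Hourly view: the daily profile differs sharply by working day (columns 0 and 1)
profile = hourly.groupby(["workingday", "hr"])["cnt"].mean().unstack(0)

# Daily view: each day collapses to one total, and the totals end up similar
daily = hourly["cnt"].resample("D").sum()
print(profile.loc[[8, 13, 17]])
print(daily.head(7))
```

Here a working day totals 1900 rentals and a weekend day 2040, so the daily series barely reflects the working-day pattern that dominates the hourly one.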

3. Predicting the number of bikes rented daily and hourly

For each dataset, two models were fitted: a simple scikit-learn regressor and an XGBoost regressor:

Daily sampled:

scikit-learn regressor
XGBoost regressor

Hourly sampled:

scikit-learn regressor, hourly sampled
XGBoost regressor, hourly sampled
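A fair comparison of two regressors needs the same held-out split and the same metrics. The post does not name the "simple sklearn regressor", so this sketch assumes `LinearRegression`, and it uses scikit-learn's `GradientBoostingRegressor` in place of XGBoost (if xgboost is installed, `xgboost.XGBRegressor` follows the same fit/predict interface and could be swapped in). The data is synthetic, with a mildly curved temperature effect.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 731  # roughly two years of daily rows (synthetic)
X = pd.DataFrame({
    "temp": rng.uniform(0, 1, n),
    "workingday": rng.integers(0, 2, n),  # hypothetical binary flag
})
# Synthetic target: curved temperature effect plus a small working-day shift
y = 6000 * X["temp"] - 2500 * X["temp"] ** 2 + 400 * X["workingday"] + rng.normal(0, 300, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "linear (sklearn)": LinearRegression(),
    "gradient boosting (stand-in for XGBoost)": GradientBoostingRegressor(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    # Store (R^2, mean absolute error) on the held-out 20%
    scores[name] = (r2_score(y_test, pred), mean_absolute_error(y_test, pred))
    print(f"{name}: R2={scores[name][0]:.3f}, MAE={scores[name][1]:.0f}")
```

For the real data, a time-based split (training on earlier dates, testing on later ones) would be a stricter test than a random split, since neighbouring days and hours are strongly related.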

Predicting the number of bikes needed is an important task for the companies that provide the service. Luckily, with modern machine learning the task is quite achievable, even with linear models. It is also a fun task for anyone learning ML.

If you would like to get the code and improve the model, both the code and the data are available HERE.


Matheus Cafalchio

Neuroscientist and data science enthusiast