TimeSeries Decomposition in Python with statsmodels and Pandas

A lot of data is recorded in the time domain, which means each data point has the form

timestamp: value

A useful way to gain insight into such data is to decompose the time series. That usually means separating the data into three components:

seasonal
trend
residual

The well-known `decompose` function from R is available in Python via statsmodels since version 0.6. Yeah! Let’s take a look at it using the parking lot data of the city of Dresden.

The Data

The Open Data folks of Dresden (@offenesdresden) collected the parking lot occupancy of a shopping mall called ‘Centrum-Galerie’ in the city of Dresden for over a year. After my talk at PyData 2015, a guy from New York came up to me (thank you!) and suggested that I decompose the data first and then try to predict the occupancy of the parking lots from the decomposed time series. I tried, but the results were not as good as with my own approach (see talk video). Give it a try:

Centrum-Galerie-Belegung.csv

Nevertheless, at least this blog post came out of it.

Pandas Time Series Decomposition with Python

After loading the .csv file with Pandas via

import pandas as pd

# Columns: 'Datum' (timestamp, used as index) and 'Belegung' (occupancy)
centrumGalerie = pd.read_csv('Centrum-Galerie-Belegung.csv',
 names=['Datum', 'Belegung'],
 index_col=['Datum'],
 parse_dates=True)
centrumGalerie.Belegung.plot()

Over a year of open data from the ‘Centrum-Galerie’ car park in Dresden: 100% means you will not find a free space for your car there

we can simply decompose the data with statsmodels:

import statsmodels.api as sm

The `seasonal_decompose()` function needs a parameter called `freq` (newer statsmodels releases call it `period`). It could in principle be inferred from the Pandas time series index, but that is not fully functional right now, so we have to specify it ourselves. The value is the number of samples in one interval that ‘may’ repeat: an hour, a day, a week or whatever one is interested in. Our data is stored with 15 min resolution and I want to see a weekly seasonality, so our `freq` is

\(\text{decompfreq} = \frac{24\,\text{h} \cdot 60\,\text{min/h}}{15\,\text{min}} \cdot 7\,\text{days} = 672\)

The Python implementation is this:

decompfreq = 24*60//15*7  # 672 samples per week; integer division keeps it an int in Python 3
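As a side note, the same number can be derived from the timestamps instead of being typed in by hand. The following is only a minimal sketch under assumptions: the `index` below is a made-up stand-in for the real `centrumGalerie.index`, and it assumes the readings are evenly spaced.

```python
import pandas as pd

# Hypothetical stand-in for the real index: two weeks of 15-minute timestamps.
index = pd.date_range("2014-10-01", periods=2 * 672, freq="15min")

# Sampling interval taken from the index itself (assumes regular spacing).
step = index[1] - index[0]

# Number of samples in one week, i.e. the length of one seasonal cycle.
decompfreq = pd.Timedelta(days=7) // step

print(decompfreq)
```

Dividing one `Timedelta` by another with `//` yields a plain integer, which is what the decomposition expects.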

Now we can decompose the Pandas TimeSeries with statsmodels:

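# .interpolate() fills missing values, which the decomposition cannot handle
res = sm.tsa.seasonal_decompose(centrumGalerie.Belegung.interpolate(),
 period=int(decompfreq),  # called `freq` in older statsmodels releases
 model='additive')
resplot = res.plot()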
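The `.interpolate()` call above only fills values that are already present as NaN. If whole timestamps are missing from the file, a fixed number of samples per week no longer lines up with the calendar. A minimal sketch of one way to handle this, using a tiny made-up series rather than the real data: put the series on a strict 15-minute grid first, then interpolate.

```python
import pandas as pd

# Hypothetical raw readings: the 00:30 and 00:45 timestamps are missing.
raw = pd.Series(
    [10.0, 20.0, 50.0, 60.0],
    index=pd.to_datetime([
        "2015-01-01 00:00", "2015-01-01 00:15",
        "2015-01-01 01:00", "2015-01-01 01:15",
    ]),
)

# Re-index onto a regular 15-minute grid; the missing slots become NaN.
regular = raw.asfreq("15min")

# Fill the NaN slots by linear interpolation along the time axis.
filled = regular.interpolate(method="time")

print(filled)
```

Whether linear interpolation is appropriate depends on how long the gaps are; for long outages the filled values are a guess and will leak into the trend.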

The decomposition returned in `res` looks like this:

Seasonal decomposition of the data: ‘Observed’ is the original data, ‘Seasonal’ is the pattern that repeats every `freq` samples, ‘Trend’ is the long-term trend, and ‘Residual’ is everything that is not described by seasonal + trend

We chose the `additive` model, so Trend + Seasonal + Residual adds up to `Observed`.
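To make the additive idea concrete, here is a hand-rolled sketch of a classical additive decomposition using only Pandas and NumPy. It is an illustration under assumptions, not the statsmodels implementation: the data is synthetic, and the period is a made-up odd value of 7 so that a plain centered moving average can serve as the trend.

```python
import numpy as np
import pandas as pd

period = 7  # hypothetical: daily data with a weekly cycle
n = 10 * period
t = np.arange(n)

# Synthetic series: linear trend + repeating weekly pattern + small noise.
rng = np.random.default_rng(0)
pattern = np.array([0.0, 1.0, 3.0, 2.0, 0.0, -2.0, -4.0])
observed = pd.Series(
    0.5 * t + np.tile(pattern, n // period) + rng.normal(0, 0.1, n),
    index=pd.date_range("2015-01-01", periods=n, freq="D"),
)

# Trend: centered moving average over one full period (NaN at both edges).
trend = observed.rolling(period, center=True).mean()

# Seasonal: average detrended value per position in the cycle,
# shifted so that the seasonal component sums to zero over one period.
detrended = observed - trend
cycle_means = detrended.groupby(t % period).mean()
cycle_means -= cycle_means.mean()
seasonal = pd.Series(cycle_means.to_numpy()[t % period], index=observed.index)

# Residual: whatever trend and seasonal do not explain.
resid = observed - trend - seasonal

# Additive model: the three parts add up to the observed series again
# wherever the trend is defined.
valid = trend.notna()
reconstructed = (trend + seasonal + resid)[valid]
print(np.allclose(reconstructed, observed[valid]))
```

The NaN values at the start and end of `trend` are the reason the trend and residual panels of such a plot are shorter than the observed series.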

Evaluation of the TimeSeries Decomposition

The most interesting component is the ‘Trend’, which clearly shows the impact of school holidays and Christmas in Germany. Evidently, a lot of people drove back into the city after 24 December to return or exchange their Christmas presents. One may ask what caused the huge increase in the trend at the end of April 2015. Well, let’s take a look at what happened next to the ‘Centrum-Galerie’, where a lot of parking spots were also located: the start of a huge construction site (sorry, the article is in German).

Here is the full IPython Notebook

Comments

Yongduek says:

19 January 2017 at 11:11

Danke schoen for the nice article. May I ask some questions?
1. Why is there a large gap in the graphs at around Dec 2014? Maybe Xmas holidays?
2. Does the gap not cause any problem in the process of decomposition?
I am considering a problem with missing data in a data file. That is, there is no data during holidays, and the gaps are not regular; some weeks have 5 days but others have 4 or 3. If the freq is simply set to 5, I expect the result will not be correct, so some kind of data manipulation is required, but I don’t know how. If you know anything about this, any comment is very welcome. Thanks.

X.T.Xiao says:

20 January 2020 at 03:59

Once we have the decomposition components, how do we predict future steps?


Motorblog
