A New Data Approach in Time Series Analysis

May 29, 2022

Empower the data structure, Enhance the data process

Time series data is a sequence of data points indexed in time order. The most common example is the daily closing price of the stock market. Besides the stock market, we encounter many other kinds of time series data, such as climate measurements over time or a company's sales revenue. Time series analysis helps organizations understand the underlying causes of trends or systemic patterns over time.

Today, we want to dive into a fancy and powerful time series data structure engine.

Time-Series-Transformer

The package not only provides functionalities to process the time series data, but it shines when we need to process multi-dimensional time series data. Moreover, the submodule, Stock_Transformer, is able to extract the stock data and calculate the technical indicators in just a few lines of code.

Installation

pip install time-series-transform

Note: Make sure tensorflow and plotly are installed

Data

In this example, we will use the daily climate time series data for Delhi from Kaggle:

https://www.kaggle.com/datasets/sumanthvrao/daily-climate-time-series-data?resource=download

Let’s get started

When processing time series data, we always need to pay extra attention to the time order, and that is not easy to do with numpy or even pandas alone. Time_Series_Transformer makes the task much easier.

There are two main modules in the package.

Time_Series_Transformer: the core module, compatible with user-defined time series data
Stock_Transformer: extracts stock data and provides operations that embed the technical indicator calculation library (pandas-ta)

Time Series Transformer Core

Time_Series_Transformer helps us format time series data so that we can manipulate it easily.

The dataset has 4 different features, and we can load the data into the package from a pandas DataFrame. The timeSeriesCol parameter names the column that holds the time index, and the mainCategoryCol parameter allows us to handle multi-dimensional data, which we will talk about later.

Time_Series_Transformer data
data column
-----------
date
meantemp
humidity
wind_speed
meanpressure
time length: 1462
category: None


We can slice the data from the time_series_data as shown below:
{'date': array(['2013-01-01', '2013-01-02', '2013-01-03'], dtype=object),
 'meantemp': array([10.        ,  7.4       ,  7.16666667]),
 'humidity': array([84.5, 92. , 87. ]),
 'wind_speed': array([0.        , 2.98      , 4.63333333]),
 'meanpressure': array([1015.66666667, 1017.8, 1018.66666667])}
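The printed slice above suggests a column-oriented layout: each column name maps to an array, and all arrays share the same time order. The following is a minimal plain-Python sketch of that idea, not the package's implementation. The helper names to_columns and slice_rows are hypothetical, and the three rows are copied from the output above.

```python
# Conceptual sketch only: NOT the package's internal code.
# `to_columns` and `slice_rows` are hypothetical helper names.

def to_columns(rows, time_col):
    """Turn a list of row dicts into a dict of lists ordered by `time_col`."""
    ordered = sorted(rows, key=lambda row: row[time_col])
    return {name: [row[name] for row in ordered] for name in ordered[0]}


def slice_rows(columns, start, stop):
    """Take rows [start, stop) from every column at once."""
    return {name: values[start:stop] for name, values in columns.items()}


# Rows deliberately given out of order; ISO dates sort correctly as strings.
rows = [
    {"date": "2013-01-03", "meantemp": 7.16666667, "humidity": 87.0},
    {"date": "2013-01-01", "meantemp": 10.0, "humidity": 84.5},
    {"date": "2013-01-02", "meantemp": 7.4, "humidity": 92.0},
]

columns = to_columns(rows, time_col="date")
first_two = slice_rows(columns, 0, 2)
print(first_two)
```

Slicing every column with the same bounds is what keeps the features aligned with the time index.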

Data Manipulation

Most of the time, we need to preprocess the data before feeding it to a model. In time series analysis, creating lag or lead data (shifting the data) is one of the most common methods. The package provides functions to shift the data:

make_lead
make_lag
make_lead_sequence
make_lag_sequence

make_lead and make_lag shift the data by a given number of time steps.

make_lead_sequence and make_lag_sequence shift the data and return, for each time step, a window of shifted values as a feature. This is useful for producing features for deep learning models.

{'date': array(['2013-01-01', '2013-01-02', '2013-01-03'], dtype=object),
 'meantemp': array([10.        ,  7.4       ,  7.16666667]),
 'humidity': array([84.5, 92. , 87. ]),
 'wind_speed': array([0.        , 2.98      , 4.63333333]),
 'meanpressure': array([1015.66666667, 1017.8, 1018.66666667]),
 'meantemp_lead_3': array([8.66666667, 6.        , 7.        ]),
 'wind_speed_lead_3': array([1.23333333, 3.7       , 1.48      ]),
 'meantemp_lead_seq_3': array([[7.4       , 7.16666667, 8.66666667],
        [7.16666667, 8.66666667, 6.        ],
        [8.66666667, 6.        , 7.        ]]),
 'meanpressure_lead_seq_3': array([[1017.8, 1018.6667, 1017.1667],
        [1018.66666667, 1017.16666667, 1016.5       ],
        [1017.16666667, 1016.5       , 1018.        ]])}
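The shifted columns in the output above can be reproduced by hand, which makes the semantics of the four functions easier to see. The sketch below is plain Python and is not the package's implementation. The function bodies are written for illustration, and padding the edges with NaN is an assumption. The six meantemp values are recovered from the output above.

```python
import math

# Conceptual sketch only: NOT the package's implementation.
# Padding with NaN at the edges is an assumption.

def make_lead(values, n):
    """Value at position t becomes the value observed at t + n."""
    return [values[t + n] if t + n < len(values) else math.nan
            for t in range(len(values))]


def make_lag(values, n):
    """Value at position t becomes the value observed at t - n."""
    return [values[t - n] if t - n >= 0 else math.nan
            for t in range(len(values))]


def make_lead_sequence(values, window):
    """Row t holds the next `window` values: t + 1, ..., t + window."""
    return [[values[t + k] if t + k < len(values) else math.nan
             for k in range(1, window + 1)]
            for t in range(len(values))]


# First six meantemp values, recovered from the output above.
meantemp = [10.0, 7.4, 7.16666667, 8.66666667, 6.0, 7.0]

lead_3 = make_lead(meantemp, 3)
lag_1 = make_lag(meantemp, 1)
lead_seq_3 = make_lead_sequence(meantemp, 3)

print(lead_3[:3])      # compare with meantemp_lead_3 above
print(lead_seq_3[:3])  # compare with meantemp_lead_seq_3 above
```

A lag sequence would mirror make_lead_sequence with t - k in place of t + k.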

User defined function

It is also possible to apply a user-defined function. The transform function lets us customize the data processing.

Restrictions on the custom function:

the function must take an array as input
the output must be an array with the same length as the original data

Note: time_series_transform.transform_core_api.util provides some general-purpose functions, which we imported at the beginning.

data column
-----------
date
meantemp
humidity
wind_speed
meanpressure
meantemp_lead_3
wind_speed_lead_3
meantemp_lead_seq_3
meanpressure_lead_seq_3
wind_cust_10
time length: 1462
category: None
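The two restrictions on custom functions listed earlier (array in, array of the same length out) can be illustrated with a small example. This is a sketch under assumptions: rolling_mean is a hypothetical function written for illustration, not one of the package's utilities, and it is shown standalone rather than passed to transform. The three wind_speed values come from the dataset output earlier in the article.

```python
import math

# Hypothetical custom function; the name and window size are illustrative.
def rolling_mean(values, window=3):
    """Trailing moving average that keeps the input length.

    Positions without a full window are filled with NaN, so the
    output always has exactly len(values) entries.
    """
    out = []
    for t in range(len(values)):
        if t + 1 < window:
            out.append(math.nan)
        else:
            chunk = values[t + 1 - window:t + 1]
            out.append(sum(chunk) / window)
    return out


wind_speed = [0.0, 2.98, 4.63333333]  # first three values from the dataset
smoothed = rolling_mean(wind_speed, window=3)

# The restriction that matters: the output length equals the input length.
assert len(smoothed) == len(wind_speed)
print(smoothed)
```

Padding instead of truncating is the design choice that keeps the new column aligned with the time index.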

Plot

The package also provides a plot function, along with several methods to customize the plot:

add_line
remove_line
update_layout
add_marker

Multi-Dimensional Time Series Data

Sometimes when we are doing Exploratory Data Analysis or preprocessing the time series data before entering the model, we might want to separate the data by their features. Handling multi-dimensional time series data is the strong point of the package.

Time_Series_Transformer takes a mainCategoryCol parameter that names the main category column. According to the documentation, the class accepts only one column as the main category because multiple dimensions can be aggregated into a new column that serves as the main category.

With mainCategoryCol, we can separate the data by category and perform the same data operations as before.

Next, we can apply this functionality to perform more advanced preprocessing. To demonstrate the capability of Time_Series_Transformer, we can separate the data by month and investigate how each month behaves.

Moreover, we can use the plot function on the multi-dimensional data. It generates one line per category, and we can select which months to display.

Figure 3. Multi-dimension plot (option to choose category)
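The month-based separation described above can be pictured as grouping row positions by category and then applying an operation inside each group, so that a shift never crosses a category boundary. The sketch below is plain Python and is not the package's implementation. The helper names are hypothetical, the per-category behavior is an assumption about what "the same data operations" means, and the four sample rows are toy values that are not taken from the dataset.

```python
import math
from collections import defaultdict

# Conceptual sketch only: NOT the package's implementation.
# `month_of` and `lag_by_category` are hypothetical helper names.

def month_of(date_string):
    """Extract the month from an ISO date such as '2013-01-31'."""
    return int(date_string[5:7])


def lag_by_category(dates, values, categories, n):
    """Lag `values` by n steps inside each category separately."""
    positions = defaultdict(list)
    for i, category in enumerate(categories):
        positions[category].append(i)

    lagged = [math.nan] * len(values)
    for indices in positions.values():
        indices.sort(key=lambda i: dates[i])  # keep time order inside the group
        for rank, i in enumerate(indices):
            if rank >= n:
                lagged[i] = values[indices[rank - n]]
    return lagged


# Toy values for illustration only (not from the Delhi dataset).
dates = ["2013-01-30", "2013-01-31", "2013-02-01", "2013-02-02"]
values = [1.0, 2.0, 3.0, 4.0]

months = [month_of(d) for d in dates]
lagged = lag_by_category(dates, values, months, 1)
print(months)
print(lagged)
```

The first row of each month has no earlier value within its own group, so it stays NaN instead of borrowing the last value of the previous month.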

Stock Transformer

When it comes to time series analysis, the stock market is what interests most people. Stock_Transformer makes this simple: stock data can be extracted in just one line of code, and we can apply the same data manipulation functions as before.

Stock extraction methods:

from_stock_engine_date
from_stock_engine_period
from_stock_engine_intraday

To calculate standard technical indicators, we can call get_technial_indicator, which relies on pandas-ta.

          Date       Open       High        Low      Close     Volume  \
25  2019-02-07  41.725962  42.098690  41.227381  41.372601  126966800   
26  2019-02-08  41.076051  41.481974  40.937501  41.421207   95280000   
27  2019-02-11  41.576775  41.615667  41.139252  41.183002   83973600   
28  2019-02-12  41.345854  41.564614  41.248625  41.537876   89134000   
29  2019-02-13  41.659413  41.924356  41.302103  41.365299   89960800   

    Dividends  Stock Splits  MACD_12_26_9  MACDh_12_26_9  MACDs_12_26_9  \
25     0.0000           0.0      1.952193            NaN            NaN   
26     0.1825           0.0      1.915505            NaN            NaN   
27     0.0000           0.0      1.845929            NaN            NaN   
28     0.0000           0.0      1.798691            NaN            NaN   
29     0.0000           0.0      1.727417            NaN            NaN   

       EMA_10        BBL        BBM        BBU  Bandwidth   Percent  
25  40.383502  34.842427  38.655327  42.468227  19.727684  0.856326  
26  40.572176  34.930402  38.865177  42.799952  20.248332  0.824800  
27  40.683235  35.136170  39.081391  43.026611  20.189764  0.766349  
28  40.838624  35.475825  39.343060  43.210295  19.659045  0.783771  
29  40.934383  35.746722  39.558949  43.371175  19.273650  0.736915
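As a sanity check on the table above, the Bandwidth and Percent columns can be recomputed from Close, BBL, BBM and BBU. The sketch below uses the standard Bollinger Band definitions. That these are the formulas behind the two columns is an assumption, not something taken from the pandas-ta source. The numbers are copied from row 25 of the table.

```python
# Row 25 of the table above (2019-02-07).
close, bbl, bbm, bbu = 41.372601, 34.842427, 38.655327, 42.468227

# Standard Bollinger Band derived quantities (assumed to be what the
# Bandwidth and Percent columns hold; not taken from pandas-ta's source).
midpoint = (bbl + bbu) / 2              # middle band sits halfway between the bands
bandwidth = 100 * (bbu - bbl) / bbm     # band width as a percentage of the middle band
percent = (close - bbl) / (bbu - bbl)   # position of the close inside the band (0 to 1)

print(round(midpoint, 6), round(bandwidth, 4), round(percent, 4))
```

For this row the recomputed values agree with the BBM, Bandwidth and Percent columns to within rounding, which suggests the columns follow these definitions.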