Contents
Sites
- https://pyecon.org/lecture/
- https://github.com/weijie-chen/Econometrics-With-Python
- https://www.cambridge.org/highereducation/books/introductory-econometrics-for-finance/75E9C608EA95A3AD87FB3BC683B9EBBF/resources/general-resources/10387D9567DA978F7E5DC6565F57F04F/python-code/8BB3DBAE944CBED5FB8F1933281B2FDE
- https://web.pdx.edu/~crkl/ceR/
- https://www.kevinsheppard.com/teaching/python/
- http://www.upfie.net/
OLS
```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Add an intercept column and fit OLS
X = sm.add_constant(data['X'])
results = sm.OLS(data['Y'], X).fit()
print(results.summary())

results.params    # estimated coefficients
results.tvalues   # t-statistics
results.t_test    # t-tests on linear hypotheses, e.g. results.t_test('X = 0')

# Equivalent estimate via the normal equations: beta = (X'X)^(-1) X'y
beta = np.linalg.inv(X.T.dot(X)).dot(X.T.dot(data['Y']))
pd.Series(beta, index=X.columns)
```
https://colab.research.google.com/notebooks/charts.ipynb

```python
# Render Bokeh plots inline in the notebook
from bokeh.io import output_notebook
output_notebook()
```
```python
# Create dependent and independent variables (intercept and dummies handled by the formula)
import patsy as pts
y, x = pts.dmatrices('aapl ~ index', data=df, return_type='dataframe')

# Equivalent formula-based fit with statsmodels
import statsmodels.formula.api as smf
result = smf.ols(formula='aapl ~ index', data=df).fit()
result.summary()
```
SymPy
```python
from sympy import *
from sympy.plotting import plot, plot3d_parametric_line, plot3d
from sympy.solvers.inequalities import reduce_rational_inequalities
from sympy.stats import Poisson, Exponential, Binomial, density, moment, E, cdf
import numpy as np
import matplotlib.pyplot as plt

# Enable the MathJax printer
init_printing(use_latex='mathjax')
```
```python
x, y, z = symbols('x y z')

expr = (x + y) ** 2
expand_expr = expand(expr)   # x**2 + 2*x*y + y**2
factor(expand_expr)          # back to (x + y)**2
solve(expr)                  # solve (x + y)**2 = 0
```
Panel Model
Time Series Forecasting
Stationarity
In time series forecasting, white noise refers to a sequence of random variables that are uncorrelated, have a constant mean and variance, and have no predictable pattern. White noise is a fundamental concept in the analysis of time series data and is often used as a benchmark to determine whether a time series contains meaningful information or just random fluctuations.
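As a quick illustration (a minimal sketch; the seed and sample size are arbitrary), Gaussian white noise can be simulated and inspected with NumPy:

```python
import numpy as np

# Simulate Gaussian white noise: uncorrelated draws with constant mean and variance
rng = np.random.default_rng(42)
noise = rng.normal(loc=0.0, scale=1.0, size=1000)

print(f"Mean: {noise.mean():.4f}")        # close to 0
print(f"Variance: {noise.var():.4f}")     # close to 1
# Lag-1 autocorrelation should be close to 0 for white noise
print(f"Lag-1 autocorrelation: {np.corrcoef(noise[:-1], noise[1:])[0, 1]:.4f}")
```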
The daily closing price of GOOGL can be modeled with the random walk model. To do so, we first determine whether the process is stationary. If it is non-stationary, we apply transformations, such as differencing, to make it stationary. Then we can use the autocorrelation function plot to conclude that the daily closing price of GOOGL is well approximated by a random walk.
A random walk is a process in which each step is equally likely to move up or down by a random amount: the current value equals the previous value plus a random (white noise) increment.
A random walk is a series whose first difference is stationary and uncorrelated.
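To make this concrete, here is a minimal sketch (simulated data, arbitrary seed) showing that a random walk is the cumulative sum of white noise, and that its first difference recovers the stationary noise:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
steps = rng.normal(size=500)               # white noise increments
random_walk = pd.Series(steps).cumsum()    # random walk: cumulative sum of the steps

first_diff = random_walk.diff().dropna()   # first difference recovers the increments
print(first_diff.mean(), first_diff.std()) # roughly 0 and 1, like the original noise
```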
A stationary time series is one whose statistical properties do not change over time. In other words, it has a constant mean, variance, and autocorrelation, and these properties are independent of time.
A transformation is a mathematical operation applied to a time series in order to make it stationary.
Differencing is a transformation that calculates the change from one timestep to another. This transformation is useful for stabilizing the mean.
Applying a log function to the series can stabilize its variance.
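In pandas, both transformations are one-liners. A minimal sketch, assuming a strictly positive series (the synthetic prices below are a hypothetical stand-in for daily closes):

```python
import numpy as np
import pandas as pd

# Hypothetical positive series standing in for daily closing prices
rng = np.random.default_rng(1)
series = pd.Series(100 * np.exp(rng.normal(0, 0.01, size=250).cumsum()))

log_series = np.log(series)               # log transform stabilizes the variance
log_returns = log_series.diff().dropna()  # differencing stabilizes the mean
```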
Once a transformation is applied to a time series, we test for stationarity to determine whether another transformation is needed; the same test, applied first, tells us whether any transformation is needed at all. A common test is the augmented Dickey-Fuller (ADF) test.
The ADF test verifies the following null hypothesis: there is a unit root present in the time series. The alternative hypothesis is that there is no unit root, and the series is therefore stationary. The test produces the ADF statistic, typically a negative number: the more negative it is, the stronger the rejection of the null hypothesis. The Python implementation also returns a p-value; if it is less than 0.05, we can reject the null hypothesis and say the series is stationary.
The Augmented Dickey-Fuller (ADF) test is a statistical test used to determine whether a time series is stationary. A stationary time series has properties such as mean, variance, and autocorrelation that do not change over time. Stationarity is a crucial assumption for many time series models, including ARIMA.
- **Null Hypothesis (H0)**: The time series has a unit root (i.e., it is non-stationary); equivalently, the autoregressive coefficient equals 1.
- **Alternative Hypothesis (H1)**: The time series does not have a unit root (i.e., it is stationary); equivalently, the autoregressive coefficient lies strictly between -1 and +1.
- **Reject H0**: If the test statistic is less than the critical value, or the p-value is less than a threshold such as 0.05, the null hypothesis of a unit root is rejected. This indicates that the series is stationary.
- **Fail to Reject H0**: If the test statistic is greater than the critical value (or the p-value is above the threshold), the null hypothesis of a unit root cannot be rejected. This indicates that the series is non-stationary.
```python
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Example time series data: a random walk (non-stationary series)
data = np.random.randn(100).cumsum()
series = pd.Series(data)

# Perform the ADF test
result = adfuller(series)

# Extract test results
adf_statistic = result[0]
p_value = result[1]
critical_values = result[4]

print(f'ADF Statistic: {adf_statistic}')
print(f'p-value: {p_value}')
print('Critical Values:')
for key, value in critical_values.items():
    print(f'  {key}: {value}')

if p_value < 0.05:
    print("Reject the null hypothesis - the series is stationary.")
else:
    print("Fail to reject the null hypothesis - the series is non-stationary.")
```
Autocorrelation
Autocorrelation, also known as serial correlation, measures the correlation of a time series with its own past and future values. It indicates how the values of a time series are related to its past values. Autocorrelation is a key concept in time series analysis, helping to identify patterns such as trends, seasonality, and cyclic behaviors.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Example time series data: a random walk (non-stationary series)
np.random.seed(42)
data = np.random.randn(100).cumsum()
series = pd.Series(data)

# Plot the time series
plt.figure(figsize=(10, 4))
plt.plot(series)
plt.title('Time Series Data')
plt.show()

# Plot the autocorrelation function (ACF)
plot_acf(series, lags=20)
plt.show()
```
Moving Average Process
A moving average process, or the moving average (MA) model, states that the current value is linearly dependent on the current and past error terms. The error terms are assumed to be mutually independent and normally distributed, just like white noise.
An MA process models the current value of a time series as a linear combination of past white noise (random shock) terms. There are two types of MA processes: MA(q) for finite order q and the MA(∞) for infinite order.
Properties of MA Processes
- Stationarity: A finite-order MA process is always stationary, because it is a weighted sum of a finite number of past white noise terms.
- Autocorrelation: The autocorrelation function (ACF) of an MA(q) process is non-zero up to lag q and zero for lags greater than q. This is a key feature used to identify the order of the MA process in practice, as the sketch below illustrates.
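A quick way to see this cutoff (a minimal sketch with arbitrary MA(2) coefficients, using statsmodels' `ArmaProcess`):

```python
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.graphics.tsaplots import plot_acf

# Simulate an MA(2) process: x_t = eps_t + 0.9*eps_{t-1} + 0.3*eps_{t-2}
np.random.seed(42)
ma2 = ArmaProcess(ar=np.array([1]), ma=np.array([1, 0.9, 0.3]))
simulated = ma2.generate_sample(nsample=1000)

# The ACF should be significant at lags 1 and 2, then drop to ~0
plot_acf(simulated, lags=20)
plt.show()
```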
Forecasting with LSTM
Deep learning has the advantage that it tends to perform better as more data becomes available, making it a great choice for forecasting high-dimensional time series. As a rough rule, a dataset is considered large when it has more than 10,000 data points. This is an approximation rather than a hard limit, so with, say, 8,000 data points deep learning could still be a viable option. When the dataset is large, any variant of the SARIMAX model takes a long time to fit, which is not ideal for model selection, since we usually fit many models during that step. Ultimately, deep learning is used either when statistical models take too long to fit or when they produce correlated residuals that do not approximate white noise.
Source: Time Series Forecasting in Python by Peixeiro, 2022
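As a rough illustration of the workflow (a minimal Keras sketch, not code from the book; the synthetic series, window length, layer sizes, and training settings are all arbitrary placeholders), a univariate series is turned into fixed-length input windows and a small LSTM is fit to predict the next value:

```python
import numpy as np
import tensorflow as tf

# Hypothetical univariate series; replace with your own data
rng = np.random.default_rng(42)
series = np.sin(np.linspace(0, 100, 2000)) + rng.normal(0, 0.1, 2000)

# Build (window -> next value) training pairs
window = 24
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]  # LSTM expects (samples, timesteps, features)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(window, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# One-step-ahead forecast from the last observed window
next_value = model.predict(series[-window:].reshape(1, window, 1), verbose=0)
print(next_value)
```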
GARCH
https://arch.readthedocs.io/en/latest/univariate/univariate_volatility_modeling.html
https://www.kevinsheppard.com/teaching/python/notes/notebooks/example-gjr-garch/
https://medium.com/@Teckk/volatility-modelling-and-coding-garch-1-1-in-python-a89c75f3e010
- Black, F., & Scholes, M. (1973). The Pricing of Options and Corporate Liabilities. Journal of Political Economy, 81(3), 637–654.
- Bollerslev, T. (1986). Generalized Autoregressive Conditional Heteroskedasticity. Journal of Econometrics, 31(3), 307–327.
- Engle, R. F. (1982). Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of United Kingdom Inflation. Econometrica, 50(4), 987–1007.
- Hull, J. (2017). Options, Futures, and Other Derivatives (10th ed.). Upper Saddle River, NJ: Pearson/Prentice Hall.
- Markowitz, H. (1952). Portfolio Selection. The Journal of Finance, 7(1), 77–91.
Volatility clustering poses a challenge for standard econometric methods, since it is a concrete example of heteroskedasticity. To address it, Engle (1982) proposed the ARCH model (Autoregressive Conditional Heteroskedasticity). Later, in 1986, Bollerslev extended Engle's model in his paper on Generalized Autoregressive Conditional Heteroskedasticity.
Generalized Autoregressive Conditional Heteroskedasticity, or GARCH, is an extension of the ARCH model that incorporates a moving average component together with the autoregressive component.
Specifically, the model includes lagged variance terms (e.g., the observations themselves, if modeling the white noise residual errors of another process) together with lagged residual errors from a mean process.
The moving average component allows the model to capture both the conditional change in variance over time and changes in the time-dependent variance, such as conditional increases and decreases in variance.
The model introduces a new parameter p that describes the number of lagged variance terms:
- p: The number of lag variances to include in the GARCH model.
- q: The number of lag residual errors to include in the GARCH model.
The Generalized Autoregressive Conditional Heteroskedasticity (GARCH) model is used to estimate the volatility of financial time series data. Developed by Robert Engle in 1982 and generalized by Tim Bollerslev in 1986, the GARCH model is widely used in finance to model time series data with time-varying volatility, such as stock returns or exchange rates.
### Key Concepts of the GARCH Model
1. **Autoregressive Conditional Heteroskedasticity (ARCH)**: The ARCH model, developed by Engle, assumes that the variance of the current error term (volatility) is a function of the past squared error terms. This model is useful for modeling time-varying volatility.
2. **Generalized ARCH (GARCH)**: Bollerslev extended the ARCH model to GARCH by including lagged values of both the squared error terms and the past conditional variances. This addition allows the GARCH model to capture more complex volatility structures.
### GARCH Model Equations
The GARCH(p, q) model can be expressed as:
\[ \epsilon_t = \sigma_t z_t \]

\[ \sigma_t^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 + \sum_{j=1}^{p} \beta_j \sigma_{t-j}^2 \]

where:
- \( \epsilon_t \): the error term or residual at time t.
- \( \sigma_t^2 \): the conditional variance at time t.
- \( z_t \): a white noise error term with zero mean and unit variance.
- \( \alpha_0 \): a constant term.
- \( \alpha_i \) and \( \beta_j \): parameters of the model.
- p: the number of lagged conditional variances included in the model.
- q: the number of lagged squared error terms included in the model.
### Assumptions
1. **White Noise Errors**: The error terms \( z_t \) are assumed to be normally distributed white noise.
2. **Stationarity**: The model requires the time series to be weakly stationary, meaning the mean and variance are constant over time; for a GARCH(1, 1) this corresponds to \( \alpha_1 + \beta_1 < 1 \).
### Estimation and Forecasting
1. **Maximum Likelihood Estimation (MLE)**: Parameters of the GARCH model are typically estimated using MLE.
2. **Forecasting Volatility**: Once estimated, the GARCH model can be used to forecast future volatility by iteratively applying the conditional variance equation, as spelled out below.
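For a GARCH(1, 1), the recursion takes a simple closed form (a standard result, stated in the notation defined above):

\[ \hat{\sigma}_{t+1|t}^2 = \alpha_0 + \alpha_1 \epsilon_t^2 + \beta_1 \sigma_t^2 \]

\[ \hat{\sigma}_{t+h|t}^2 = \alpha_0 + (\alpha_1 + \beta_1)\, \hat{\sigma}_{t+h-1|t}^2, \quad h \geq 2 \]

Multi-step forecasts therefore decay geometrically toward the unconditional variance \( \alpha_0 / (1 - \alpha_1 - \beta_1) \) whenever \( \alpha_1 + \beta_1 < 1 \).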
### Example in Python
Here’s an example of how to implement a GARCH(1, 1) model using the `arch` library in Python:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from arch import arch_model

# Generate synthetic data for demonstration
np.random.seed(42)
n = 1000
omega = 0.1
alpha = 0.1
beta = 0.8

# Simulate a GARCH(1, 1) process
returns = np.zeros(n)
volatility = np.zeros(n)
volatility[0] = np.sqrt(omega / (1 - alpha - beta))  # unconditional volatility
for t in range(1, n):
    volatility[t] = np.sqrt(omega + alpha * returns[t-1]**2 + beta * volatility[t-1]**2)
    returns[t] = volatility[t] * np.random.normal()

# Convert to pandas DataFrame
data = pd.DataFrame({'Returns': returns, 'Volatility': volatility})

# Plot the simulated data
plt.figure(figsize=(10, 6))
plt.plot(data['Returns'], label='Returns')
plt.plot(data['Volatility'], label='Volatility', color='red')
plt.title('Simulated GARCH(1, 1) Process')
plt.legend()
plt.show()

# Fit the GARCH(1, 1) model
model = arch_model(data['Returns'], vol='Garch', p=1, q=1)
model_fit = model.fit()

# Print the model summary
print(model_fit.summary())

# Forecast volatility over the next 10 steps
forecasts = model_fit.forecast(horizon=10)
print(forecasts.variance[-1:])
```
### Explanation of the Code
1. **Data Generation**: We simulate a GARCH(1, 1) process with specified parameters.
2. **Visualization**: The simulated returns and volatility are plotted to visualize the time series.
3. **Model Fitting**: The `arch` library is used to fit a GARCH(1, 1) model to the simulated returns data.
4. **Forecasting**: The fitted model is used to forecast future volatility.
This example demonstrates the basic steps involved in implementing and using a GARCH model in Python for financial time series analysis.