# Polynomial Regression in Python

The aim of this script is to build in Python the following bivariate polynomial regression model, where the observations are represented with blue dots and the predictions with a multicolored 3D surface:

*3D visualization of the observations and the polynomial model in Python*

We start by importing the necessary packages:
```
import pandas as pd
import numpy as np
import statsmodels.api as sm  # sm.OLS lives in statsmodels.api (formula.api only exposes the lowercase formula interface)
from matplotlib import cm
from matplotlib import pyplot
from mpl_toolkits.mplot3d import Axes3D
from scipy.interpolate import griddata  # matplotlib.mlab.griddata was removed from recent Matplotlib versions
```

```
# loading the dataset
dataset = pd.read_csv('oil_dataset-2.csv')
```
Its first 30 rows (out of 1000) are as follows:

*Oil Search Dataset: price of a refined oil barrel according to the depth and the concentration of Mendacium*

Then we build another dataset, S_poly, whose columns correspond to the monomials of the targeted polynomial formula.

The Python package `sklearn` provides a class to generate this dataset; it can be used as follows:
```
S = dataset.iloc[:, 0:2].values

from sklearn.preprocessing import PolynomialFeatures
poly_feats = PolynomialFeatures(degree = 4)
S_poly = poly_feats.fit_transform(S)
```

But in order to keep a clear view of which variable (column) corresponds to which monomial of the polynomial formula, we build the S_poly dataset manually with the following script. In what follows, M denotes the Mendacium column, D the Depth, P the Price, and MiDj stands for (M^i.D^j).

```
M = dataset.iloc[:, 0].values
D = dataset.iloc[:, 1].values
P = dataset.iloc[:, 2].values
```


```
# preparing the data according to a degree 4 polynomial scheme
# the possible model formula is of the form:
# P ~ b00.M^0.D^0 + b10.M^1.D^0 + b20.M^2.D^0 + b30.M^3.D^0 + b40.M^4.D^0
#   + b01.M^0.D^1 + b11.M^1.D^1 + b21.M^2.D^1 + b31.M^3.D^1
#   + b02.M^0.D^2 + b12.M^1.D^2 + b22.M^2.D^2
#   + b03.M^0.D^3 + b13.M^1.D^3
#   + b04.M^0.D^4
# in the following, MiDj means (M^i.D^j)
S_poly = np.zeros((1000, 15))
S_poly[:, 0] = np.ones(1000)   # M0D0
S_poly[:, 1] = M               # M1D0
S_poly[:, 2] = M*M             # M2D0
S_poly[:, 3] = M*M*M           # M3D0
S_poly[:, 4] = M*M*M*M         # M4D0
S_poly[:, 5] = D               # M0D1
S_poly[:, 6] = M*D             # M1D1
S_poly[:, 7] = M*M*D           # M2D1
S_poly[:, 8] = M*M*M*D         # M3D1
S_poly[:, 9] = D*D             # M0D2
S_poly[:, 10] = M*D*D          # M1D2
S_poly[:, 11] = M*M*D*D        # M2D2
S_poly[:, 12] = D*D*D          # M0D3
S_poly[:, 13] = M*D*D*D        # M1D3
S_poly[:, 14] = D*D*D*D        # M0D4
```
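As a sanity check, the manually built columns can be compared against the `PolynomialFeatures` output: the two arrays contain the same 15 columns, only in a different order. A sketch on a small synthetic sample (the values are made up for illustration, not taken from the oil dataset):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# tiny synthetic stand-in for the M and D columns
M = np.array([1.0, 2.0, 3.0])
D = np.array([4.0, 5.0, 6.0])

# manual layout, same ordering as in the script above
manual = np.column_stack([
    np.ones_like(M), M, M**2, M**3, M**4,  # MiD0
    D, M*D, M**2*D, M**3*D,                # MiD1
    D**2, M*D**2, M**2*D**2,               # MiD2
    D**3, M*D**3,                          # MiD3
    D**4,                                  # M0D4
])

auto = PolynomialFeatures(degree=4).fit_transform(np.column_stack([M, D]))

# same set of columns once the ordering is ignored
assert sorted(map(tuple, manual.T)) == sorted(map(tuple, auto.T))
```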

S_poly now contains the following array:

*S_poly array ready to use for polynomial regression*

We start the backward elimination method by taking all 15 variables as the initial optimal set and fitting the regressor:
```
# starting the backward elimination process
S_opt = S_poly[:, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]]
regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()
regressor_OLS.summary()
```

This first step produces a regressor with the following summary information. We remove the variable with the greatest p-value, provided it is above the significance level: here this is x2, the M2D0 term.

*Backward Elimination: STEP 1*

Then we fit the regressor again on the new dataset of 14 variables with the following script:
```
# - x2 (3rd = index 2) -> M2D0
S_opt = S_poly[:, [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]]
regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()
regressor_OLS.summary()
```
This produces the following regressor summary. Again, we remove the variable with the greatest p-value above the significance level, which is x12 (the M1D3 term).

*Backward Elimination: STEP 2*

And so on: we apply the backward elimination steps until we reach an irreducible set of variables.
```
# - x12 (13th = index 13) -> M1D3
S_opt = S_poly[:, [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14]]
regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()
regressor_OLS.summary()
```

```
# - x4 (5th = index 5) -> M0D1
S_opt = S_poly[:, [0, 1, 3, 4, 6, 7, 8, 9, 10, 11, 12, 14]]
regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()
regressor_OLS.summary()

# - x10 (11th = index 14) -> M0D4 (note: index 8, M3D1, is dropped at this step as well)
S_opt = S_poly[:, [0, 1, 3, 4, 6, 7, 9, 10, 11, 12]]
regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()
regressor_OLS.summary()

# - x9 (10th = index 12) -> M0D3
S_opt = S_poly[:, [0, 1, 3, 4, 6, 7, 9, 10, 11]]
regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()
regressor_OLS.summary()

# - x8 (9th = index 11) -> M2D2
S_opt = S_poly[:, [0, 1, 3, 4, 6, 7, 9, 10]]
regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()
regressor_OLS.summary()

# - x7 (8th = index 10) -> M1D2
S_opt = S_poly[:, [0, 1, 3, 4, 6, 7, 9]]
regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()
regressor_OLS.summary()

# - x2 (3rd = index 3) -> M3D0
S_opt = S_poly[:, [0, 1, 4, 6, 7, 9]]
regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()
regressor_OLS.summary()

# - x1 (2nd = index 1) -> M1D0
S_opt = S_poly[:, [0, 4, 6, 7, 9]]
regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()
regressor_OLS.summary()
```

```
# - x3 (4th = index 7) -> M2D1
S_opt = S_poly[:, [0, 4, 6, 9]]
regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()
regressor_OLS.summary()
```

Finally, we remove the last superfluous variable with the following script:

```
# - x2 (3rd = index 6) -> M1D1
S_opt = S_poly[:, [0, 4, 9]]
regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()
regressor_OLS.summary()
```

This leads to a regressor whose summary fits our expectations.

*Backward Elimination: final step*

Throughout the backward elimination process, it is important to watch how the adjusted R-squared value evolves for each model built. In our case, this value remains constant at 0.968, and the final (plain) R-squared is also 0.968. This suggests that the regression model fits the data well (the closer to 1, the better).

This is the end of the backward elimination. The indices 0, 4 and 9 correspond to M0D0, M4D0 and M0D2, so the optimal model formula is P ~ b0 + b1.M^4 + b2.D^2 (the M0D0 column being the intercept).

We finish this script by displaying, in a 3D space, the observed and predicted Price along the z axis, with the x and y axes corresponding to Mendacium and Depth.

```
# displaying the 3D graph
x = M
y = D
z = P
zp = regressor_OLS.predict(S_opt)  # the predictions
```

```
xi = np.linspace(min(x), max(x))
yi = np.linspace(min(y), max(y))
X, Y = np.meshgrid(xi, yi)
# scipy's griddata interpolates the scattered predictions onto the regular grid
from scipy.interpolate import griddata
ZP = griddata((x, y), zp, (X, Y))
```

```
fig = pyplot.figure()
ax = fig.add_subplot(projection = '3d')  # Axes3D(fig) no longer attaches itself in recent Matplotlib
surf = ax.plot_surface(X, Y, ZP, rstride = 3, cstride = 3,
                       facecolors = cm.jet(ZP/200), linewidth = 0, antialiased = True)
ax.scatter(x, y, z)
ax.set_zlim3d(np.min(z), np.max(z))
colorscale = cm.ScalarMappable(cmap = cm.jet)
colorscale.set_array(z)
fig.colorbar(colorscale, ax = ax)
pyplot.show()
```

The above code produces the following 3D graph, where the predictions take the shape of a color-faded 3D surface and the observations are represented with blue dots. The model fits the observations rather well, since the blue dots wrap the 3D surface closely.

*3D visualization of the observations and the polynomial model in Python*