The aim of this script is to build in Python the following bivariate polynomial regression model (the observations are represented with blue dots and the predictions with the multicolored 3D surface):

We start by importing the necessary packages:

import pandas as pd

import numpy as np

import statsmodels.api as sm #the base API exposes sm.OLS used below; statsmodels.formula.api does not

from matplotlib import cm

from matplotlib import pyplot

from scipy.interpolate import griddata #matplotlib.mlab.griddata was removed from matplotlib; SciPy's griddata replaces it

from mpl_toolkits.mplot3d import Axes3D

Then we load the dataset

#loading the dataset

dataset = pd.read_csv('oil_dataset-2.csv')

whose first 30 rows (out of 1000) are as follows:

Then we build another dataset, S_poly, whose columns correspond to the monomials of the targeted polynomial formula.

The Python package "scikit-learn" provides a transformer to generate this dataset; it can be used as follows:

S = dataset.iloc[:,0:2].values

from sklearn.linear_model import LinearRegression

from sklearn.preprocessing import PolynomialFeatures

poly_feats = PolynomialFeatures(degree = 4)

S_poly = poly_feats.fit_transform(S)

But in order to keep a clear view of which variable (column) corresponds to each monomial of the polynomial formula, we build the S_poly dataset manually with the following script. In the following, M denotes the Mendacium column, D the Depth, P the Price, and MiDj stands for (M^i.D^j).

M = dataset.iloc[:,0].values

D = dataset.iloc[:,1].values

P = dataset.iloc[:,2].values


#preparing the data according to a degree 4 polynomial scheme:

#the possible model formula is of the form : P ~ b00.M^0.D^0 + b10.M^1.D^0 + b20.M^2.D^0 + b30.M^3.D^0 + b40.M^4.D^0 + b01.M^0.D^1 + b11.M^1.D^1 + b21.M^2.D^1 + b31.M^3.D^1 + b02.M^0.D^2 + b12.M^1.D^2 + b22.M^2.D^2 + b03.M^0.D^3 + b13.M^1.D^3 + b04.M^0.D^4

#in the following : MiDj means (M^i.D^j)

S_poly = np.zeros((1000, 15))

S_poly[:, 0] = np.ones(1000) #M0D0

S_poly[:, 1] = M*1 #M1D0

S_poly[:, 2] = M*M*1 #M2D0

S_poly[:, 3] = M*M*M*1 #M3D0

S_poly[:, 4] = M*M*M*M*1 #M4D0

S_poly[:, 5] = 1*D #M0D1

S_poly[:, 6] = M*D #M1D1

S_poly[:, 7] = M*M*D #M2D1

S_poly[:, 8] = M*M*M*D #M3D1

S_poly[:, 9] = 1*D*D #M0D2

S_poly[:, 10] = M*D*D #M1D2

S_poly[:, 11] = M*M*D*D #M2D2

S_poly[:, 12] = 1*D*D*D #M0D3

S_poly[:, 13] = M*D*D*D #M1D3

S_poly[:, 14] = 1*D*D*D*D #M0D4
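The fifteen assignments above can also be generated in a short loop over the exponent pairs (i, j) with i + j ≤ 4, keeping the same column order. A minimal sketch, using small stand-in arrays in place of the real M and D columns:

```python
import numpy as np

# Small stand-in arrays for the Mendacium and Depth columns
M = np.array([1.0, 2.0, 3.0])
D = np.array([0.5, 1.5, 2.5])

# Same column order as the manual construction: for each power j of D,
# all powers i of M with i + j <= 4, giving the column M^i * D^j
exponents = [(i, j) for j in range(5) for i in range(5 - j)]
S_poly = np.column_stack([M**i * D**j for (i, j) in exponents])

print(S_poly.shape)  # (3, 15)
```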

S_poly contains the following array:

We start the backward elimination method by setting the optimal dataset to all 15 variables and fitting the regressor:

#Starting the backward elimination process

S_opt = S_poly[:, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]]

regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()

regressor_OLS.summary()

This first step produces a regressor with the following summary information. We remove the variable with the largest p-value above the significance threshold, which is x2 (S_poly index 2, i.e. M2D0).

Then we fit the regressor again on the new dataset with 14 variables, using the following script:

# - x2 (3rd = index 2) -> M2D0

S_opt = S_poly[:, [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]]

regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()

regressor_OLS.summary()

That produces the following regressor summary. We again remove the variable with the largest p-value above the significance threshold, which is x12 (S_poly index 13, i.e. M1D3).

And so on: we apply the backward elimination steps until reaching an irreducible set of variables.

# - x12 (13th = index 13) -> M1D3

S_opt = S_poly[:, [0, 1, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14]]

regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()

regressor_OLS.summary()

# - x4 (5th = index 5) -> M0D1

S_opt = S_poly[:, [0, 1, 3, 4, 6, 7, 8, 9, 10, 11, 12, 14]]

regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()

regressor_OLS.summary()

# - this step removes two columns at once: index 8 -> M3D1 and index 14 -> M0D4

S_opt = S_poly[:, [0, 1, 3, 4, 6, 7, 9, 10, 11, 12]]

regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()

regressor_OLS.summary()

# - x9 (10th = index 12) -> M0D3

S_opt = S_poly[:, [0, 1, 3, 4, 6, 7, 9, 10, 11]]

regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()

regressor_OLS.summary()

# - x8 (9th = index 11) -> M2D2

S_opt = S_poly[:, [0, 1, 3, 4, 6, 7, 9, 10]]

regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()

regressor_OLS.summary()

# - x7 (8th = index 10) -> M1D2

S_opt = S_poly[:, [0, 1, 3, 4, 6, 7, 9]]

regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()

regressor_OLS.summary()

# - x2 (3rd = index 3) -> M3D0

S_opt = S_poly[:, [0, 1, 4, 6, 7, 9]]

regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()

regressor_OLS.summary()

# - x1 (2nd = index 1) -> M1D0

S_opt = S_poly[:, [0, 4, 6, 7, 9]]

regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()

regressor_OLS.summary()

# - x3 (4th = index 7) -> M2D1

S_opt = S_poly[:, [0, 4, 6, 9]]

regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()

regressor_OLS.summary()

Finally, we remove the last insignificant variable with the following script:

# - x2 (3rd = index 6) -> M1D1

S_opt = S_poly[:, [0, 4, 9]]

regressor_OLS = sm.OLS(endog = P, exog = S_opt).fit()

regressor_OLS.summary()

This leads to a regressor whose summary meets our expectations.

Throughout the backward elimination process, it is important to track the adjusted R-squared of each fitted model. In our case, this value remains constant at 0.968, and the final (ordinary) R-squared is also 0.968. This suggests that the regression model fits the data well (the closer to 1, the better).

This is the end of the backward elimination. The surviving indices 0, 4 and 9 correspond to M0D0 (the intercept), M4D0 and M0D2. Therefore, the optimal model formula is P ~ M^4 + D^2.

We finish this script by displaying, in a 3D space, the observed and predicted Price along the z-axis, with the x and y axes corresponding to Mendacium and Depth.

#displaying the 3D graph

x = M

y = D

z = P

zp = regressor_OLS.predict(S_opt) #the predictions

xi = np.linspace(min(x), max(x))

yi = np.linspace(min(y), max(y))

X, Y = np.meshgrid(xi, yi)

ZP = griddata((x, y), zp, (X, Y), method='linear') #interpolating the predictions on the grid (SciPy signature)

fig = pyplot.figure()

ax = fig.add_subplot(projection='3d') #constructing Axes3D(fig) directly is deprecated in recent matplotlib

surf = ax.plot_surface(X, Y, ZP, rstride=3, cstride=3, facecolors=cm.jet(ZP/200), linewidth=0, antialiased=True)

ax.scatter(x, y, z)

ax.set_zlim3d(np.min(z), np.max(z))

colorscale = cm.ScalarMappable(cmap=cm.jet)

colorscale.set_array(z)

fig.colorbar(colorscale)

pyplot.show()

The above code produces the following 3D graph, where the predictions take the shape of a color-faded 3D surface and the observations are represented with blue dots. The model fits the observations relatively well, since the blue dots wrap the 3D surface very closely.