Multiple Linear Regression: Part II

This lab covers the verification of Multiple Linear Regression assumptions. As in Lab 05 (Model 1), all independent variables are used to predict 'mpi_urban'. In this lab, we are using another popular library, 'statsmodels'.

Download Files

To start, download 'combined.csv' (the same dataset used in Lab 05) and 'Lab_06.ipynb'. This file contains the code that is used to create the model and its outputs presented in this task sheet.

Create a DataFrame:

    combined = pd.read_csv("combined.csv")

The 'combined' dataset features 79 rows and 10 columns (Figure 1: Features presented in the 'combined' dataset). The first 9 columns are used as independent variables (X) and the last column, 'mpi_urban', is the dependent variable (Y):

    nrow, ncol = combined.shape
    X = combined.iloc[:, :ncol-1]
    Y = combined.iloc[:, ncol-1]

To use the linear regression model of the 'statsmodels' library, you need to add a column of ones to serve as an intercept. Fit the OLS model:

    X_constant = sm.add_constant(X)
    lin_reg = sm.OLS(Y, X_constant).fit()
    lin_reg.summary()

1.1. Linearity of the Model

The dependent variable (Y) is assumed to be a linear function of the independent variables (X). Using independent variables with non-linear patterns causes significant prediction errors.

Task 1: To detect nonlinearity, inspect scatter plots of 'observed vs. predicted values' or 'residuals vs. predicted values'. Use the provided sample code. Ideally, we are looking for points that are symmetrically distributed around a horizontal line in the residuals vs. predicted plot, or around a diagonal line in the observed vs. predicted values plot, in both cases with a nearly constant variance. Nonlinearity can also be revealed by systematic patterns in plots of the residuals vs. individual features in a multidimensional dataset. What is your finding?

1.2. Zero Mean of Residuals

Task 2: Obtain the mean of your model's residuals and explain your finding. Use the provided sample code.

1.3. Multicollinearity Inspection

Not having multicollinearity means the features should be linearly independent of each other.
We used a Pearson correlation heatmap in Lab 05 to identify the independent variables that show multicollinearity. Another way to identify multicollinearity is to check the Variance Inflation Factor (VIF). All VIF values should be 1 if the features are not correlated.

Task 3: Obtain the VIF value of the independent variables and explain your finding. Use the provided sample code. A rule of thumb for interpreting the variance inflation factor:
- 1 = not correlated
- between 1 and 5 = moderately correlated
- greater than 5 = highly correlated

1.4. Homoscedasticity (Equal Variance) of Residuals

Heteroscedasticity occurs when residuals do not have constant variance. Determining the true standard deviation of the forecast errors becomes challenging when the variance of the residuals is non-constant (heteroscedasticity).

Task 4: To investigate whether the residuals are homoscedastic, examine the plots of residuals vs. fitted values and standardized residuals vs. fitted values. Use the provided sample code. What is your finding? If residuals grow either as a function of the predicted value or of time (in the case of time series), then heteroscedasticity is detected.

Task 5: Statistical tests such as Breusch-Pagan can also be used to test the assumption of homoscedasticity. The null hypothesis assumes homoscedasticity. If the p-value of the test is less than the chosen significance level, then reject the null hypothesis and conclude that heteroscedasticity is present in the regression model. Use the provided sample code to perform the Breusch-Pagan test. What is your finding?

1.5. Normality of the Residuals

Task 6: To investigate this assumption, generate the plots of the residuals and perform the Anderson-Darling (AD) test. Use the provided sample code. If the returned AD statistic is larger than the critical value, then the null hypothesis that the data come from a normal distribution can be rejected. What is your finding?

1.6.
Autocorrelation of Residuals

In time-series models, serial correlation in the residuals implies that there is room for improvement in the model. In non-time-series models, autocorrelation in the residuals can be a sign of systematic underprediction/overprediction.

Task 7: To investigate whether autocorrelation is present, use the result of the Durbin-Watson (DW) test to evaluate the assumption. See the output in Figure 2.
- the test statistic always has a value between 0 and 4
- a value of 2 means that there is no autocorrelation in the sample
- values < 2 indicate positive autocorrelation, values > 2 indicate negative autocorrelation
What is your finding?

Notebook code referenced above:

    combined = pd.read_csv("combined.csv")

    X_constant = sm.add_constant(X)
    lin_reg = sm.OLS(Y, X_constant).fit()
    lin_reg.summary()

Task 6: Function for drawing the normal Q-Q plot of the residuals and running statistical tests to investigate the normality of residuals:

    # model - fitted OLS model from statsmodels
    def normality_of_residuals_test(model):
        sm.ProbPlot(model.resid).qqplot(line='s')
        plt.title("Q-Q plot")
        ad = stats.anderson(model.resid, dist='norm')
        ks = stats.kstest(model.resid, 'norm')
        print(f'Anderson-Darling test ---- statistic: {ad.statistic:.4f}, '
              f'5% critical value: {ad.critical_values[2]:.4f}')

    normality_of_residuals_test(lin_reg)

Task 7: Perform the Durbin-Watson test (this can also be checked from the OLS output):

    durbin_watson(lin_reg.resid)



Expert Answer



Ans 1: Here is an example of how to use the provided code to test the linearity assumption using a sample dataset and a linear regression model.

