
11 Linear Regression

Linear regression is a supervised machine learning algorithm that predicts a continuous value; it aims to minimize the distance between the data points and the line of best fit.
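
As a minimal, hypothetical sketch of this idea, the snippet below fits a straight line to a few made-up points with NumPy's polyfit, which finds the slope and intercept that minimize the sum of squared vertical distances (least squares); the data values are arbitrary and just for illustration.

import numpy as np

# Hypothetical data points, chosen only for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Degree-1 polyfit returns the slope m and intercept b that minimize
# the sum of squared residuals between y and m*x + b
m, b = np.polyfit(x, y, deg=1)
print(f"slope: {m:.3f}, intercept: {b:.3f}")

# Predicted values on the line of best fit
y_hat = m * x + b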

Representing Linear Regression

The equation of a line is y = mx + b, where m is the slope and b is the y-intercept; with multiple features this generalizes to y = b + m₁x₁ + m₂x₂ + … + mₙxₙ.
Good plots for representing linear regression are scatter plots, joint plots, linear model plots and residual plots, as sketched below.
It's always good practice to also use a PairGrid to see the relationships between all the features.
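
A rough sketch of these plots with seaborn, assuming a hypothetical DataFrame df with columns 'x' and 'y' (substitute your own data and column names):

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data, purely for demonstration
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df = pd.DataFrame({'x': x, 'y': 2 * x + rng.normal(scale=0.5, size=200)})

sns.scatterplot(x='x', y='y', data=df)   # scatter plot
sns.jointplot(x='x', y='y', data=df)     # joint plot
sns.lmplot(x='x', y='y', data=df)        # linear model plot (scatter + fitted line)
sns.residplot(x='x', y='y', data=df)     # residual plot
sns.PairGrid(df).map(plt.scatter)        # pair grid over all features
plt.show()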

Measures for Linear Regression

Common metrics for linear regression (a quick sketch of computing them follows this list) are:

  • Mean Absolute Error (MAE)
  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)
  • Explained Variance Score (EVS)
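
As a rough illustration of what these metrics measure, they can be computed by hand with NumPy, assuming arrays y_true and y_pred of actual and predicted values (scikit-learn's metrics module, used later, gives the same results):

import numpy as np

# Hypothetical actual and predicted values, purely for illustration
y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.4, 7.0, 10.5])

errors = y_true - y_pred
mae  = np.mean(np.abs(errors))               # Mean Absolute Error
mse  = np.mean(errors ** 2)                  # Mean Squared Error
rmse = np.sqrt(mse)                          # Root Mean Squared Error
evs  = 1 - np.var(errors) / np.var(y_true)   # Explained Variance Score
print(mae, mse, rmse, evs)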

Residuals

Residuals are the difference between the actual value and the predicted value.
Residuals are a good way to evaluate the performance of a model, and they should be normally distributed.
It's possible to represent residuals with a displot like this:

sns.displot((y_test-predictions),bins=50);
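
To complement the histogram, a Q-Q plot against a normal distribution is one way to check whether the residuals are roughly normally distributed; a sketch with SciPy, assuming y_test and predictions exist as in the example below:

import scipy.stats as stats
import matplotlib.pyplot as plt

# Points lying close to the reference line suggest roughly normal residuals
stats.probplot(y_test - predictions, dist="norm", plot=plt)
plt.show()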

Linear Regression with Python

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Load the dataset and take a first look at it
USAhousing = pd.read_csv('USA_Housing.csv')
USAhousing.head()
USAhousing.info()
USAhousing.describe()
USAhousing.columns

# Explore pairwise relationships and the distribution of the target variable
sns.pairplot(USAhousing)
sns.displot(USAhousing['Price'])

# Correlation heatmap (restrict to numeric columns on newer pandas versions)
sns.heatmap(USAhousing.corr(numeric_only=True))

# Features (X) and target (y)
X = USAhousing[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
       'Avg. Area Number of Bedrooms', 'Area Population']]
y = USAhousing['Price']

from sklearn.model_selection import train_test_split

# Hold out 40% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=101)

from sklearn.linear_model import LinearRegression

lm = LinearRegression()

lm.fit(X_train,y_train)

# Intercept and per-feature coefficients of the fitted model
print(lm.intercept_)
print(lm.coef_)
# Each coefficient is the expected change in Price per unit increase in that feature
cdf = pd.DataFrame(lm.coef_,X.columns,columns=['Coeff'])

# Predict on the test set, compare predictions to actual prices,
# and check that the residuals look roughly normally distributed
predictions = lm.predict(X_test)

plt.scatter(y_test,predictions)

sns.displot((y_test-predictions),bins=50);

from sklearn import metrics

print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
print('EVS:', metrics.explained_variance_score(y_test, predictions))
