#MachineLearning #SupervisedLearning #Classification

By Billy Gustave

Run/Walk Classifier

Goal:

Classify activities as walk or run
Data: run_or_walk.csv
Labels: {0:'Walk', 1:'Run'}
Model: Gaussian Naive Bayes classifier
Compare accuracy using acceleration features only, gyro features only, and all six features

Data Exploration

import numpy as np, pandas as pd, matplotlib.pyplot as plt, seaborn as sns

df = pd.read_csv('run_or_walk.csv')
df.shape

(88588, 11)

df.head()

df.describe()

Data Cleaning

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88588 entries, 0 to 88587
Data columns (total 11 columns):
date              88588 non-null object
time              88588 non-null object
username          88588 non-null object
wrist             88588 non-null int64
activity          88588 non-null int64
acceleration_x    88588 non-null float64
acceleration_y    88588 non-null float64
acceleration_z    88588 non-null float64
gyro_x            88588 non-null float64
gyro_y            88588 non-null float64
gyro_z            88588 non-null float64
dtypes: float64(6), int64(2), object(3)
memory usage: 7.4+ MB

No Missing Values
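The non-null counts from df.info() can also be checked directly. A minimal sketch, using a small hypothetical frame (toy values, not the real run_or_walk.csv) with one deliberately missing entry so the check has something to find:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data; the NaN is planted on purpose
toy = pd.DataFrame({
    "acceleration_x": [0.27, 0.67, np.nan],
    "gyro_x": [-0.06, -0.18, -0.91],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# True only when no cell in the whole frame is missing
has_no_missing = not toy.isnull().values.any()
print(has_no_missing)
```

On the real df, the same two calls should report zero missing values per column, which is what the non-null counts above indicate.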

# Features and Target
X = df.drop(['date','time','username','wrist','activity'], axis=1)
y = df.activity
X.shape

(88588, 6)

from sklearn.model_selection import train_test_split
# train/test split
x_train,x_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=1)
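The size of each split follows from test_size=0.2. A small arithmetic sketch, assuming the test share is rounded up to a whole row and the training set takes the remainder (this agrees with the support of 17718 shown in the classification reports):

```python
import math

n_samples = 88588   # rows reported by df.shape
test_size = 0.2     # fraction passed to train_test_split

# Assumption: test rows are rounded up, training rows are whatever is left
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_test, n_train)
```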

fig, ax = plt.subplots(figsize=(16,14))
sns.heatmap(x_train.corr(), cmap='Reds', annot=True, linewidths=.5, ax=ax)

[Figure: correlation heatmap of the six training features]

No highly correlated features
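Reading the heatmap by eye can be backed by a programmatic check. A minimal sketch on hypothetical toy columns (a, b, c are made up for illustration, and the 0.8 cutoff is an assumed threshold, not one taken from the analysis above):

```python
from itertools import combinations

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=500)
toy = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.05, size=500),  # nearly a copy of "a"
    "c": rng.normal(size=500),                  # independent of the others
})

threshold = 0.8  # assumed cutoff for "highly correlated"
corr = toy.corr().abs()

# Visit each unordered pair of columns once and keep those above the cutoff
high_pairs = [
    (left, right)
    for left, right in combinations(corr.columns, 2)
    if corr.loc[left, right] > threshold
]
print(high_pairs)
```

Applied to x_train.corr(), an empty list would confirm that no feature pair exceeds the chosen cutoff.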

Modeling

Using K-fold cross-validation:

from sklearn.model_selection import cross_val_score, KFold
kfold = KFold(n_splits=10, random_state=1, shuffle=True)

from sklearn.naive_bayes import GaussianNB
nbc = GaussianNB()
acc_features = ['acceleration_x', 'acceleration_y', 'acceleration_z']
gyro_features = ['gyro_x', 'gyro_y', 'gyro_z']
# Calculate accuracy scores 
all_score = cross_val_score(nbc, x_train, y_train, cv=kfold, scoring='accuracy').mean()
print("Accuracy : {} ".format(all_score))
acc_score = cross_val_score(nbc, x_train[acc_features], y_train, cv=kfold, scoring='accuracy').mean()
print("Acceleration accuracy : {} ".format(acc_score))
gyro_score = cross_val_score(nbc, x_train[gyro_features], y_train, cv=kfold, scoring='accuracy').mean()
print("Gyro accuracy : {} ".format(gyro_score))

Accuracy : 0.9565260335826162 
Acceleration accuracy : 0.9574290955270213 
Gyro accuracy : 0.6497389586566953
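The scores above are means over 10 folds. Looking at the spread across folds as well helps judge whether a small gap, such as all features versus acceleration only, is meaningful. A minimal sketch of the same comparison written as a loop, run on synthetic data (the columns "informative" and "noise" are hypothetical stand-ins, not the run_or_walk features):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for x_train / y_train (NOT the real data)
rng = np.random.default_rng(1)
n = 1000
y_toy = rng.integers(0, 2, size=n)
X_toy = pd.DataFrame({
    "informative": rng.normal(loc=2.0 * y_toy, scale=1.0),  # mean shifts with the class
    "noise": rng.normal(size=n),                            # unrelated to the class
})

feature_sets = {
    "all": ["informative", "noise"],
    "informative only": ["informative"],
    "noise only": ["noise"],
}

kfold = KFold(n_splits=10, shuffle=True, random_state=1)
results = {}
for name, cols in feature_sets.items():
    # A fresh estimator per subset keeps the runs independent
    scores = cross_val_score(GaussianNB(), X_toy[cols], y_toy,
                             cv=kfold, scoring="accuracy")
    results[name] = (scores.mean(), scores.std())
    print(f"{name:>16}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

With the real data, feature_sets would hold the six training columns, acc_features and gyro_features.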

Testing

nbc = GaussianNB()
all_pred_y = nbc.fit(x_train,y_train).predict(x_test)
acc_pred_y = nbc.fit(x_train[acc_features],y_train).predict(x_test[acc_features])
gyro_pred_y = nbc.fit(x_train[gyro_features],y_train).predict(x_test[gyro_features])

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

All

accuracy_score(y_test, all_pred_y)

0.9554690145614629

confusion_matrix(y_test, all_pred_y)

array([[8583,   90],
       [ 699, 8346]], dtype=int64)

print(classification_report(y_test,all_pred_y))

              precision    recall  f1-score   support

           0       0.92      0.99      0.96      8673
           1       0.99      0.92      0.95      9045

    accuracy                           0.96     17718
   macro avg       0.96      0.96      0.96     17718
weighted avg       0.96      0.96      0.96     17718
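The figures in this report can be reproduced by hand from the confusion matrix. A short sketch, assuming scikit-learn's layout (rows are true classes, columns are predicted classes) and treating class 1 (Run) as the positive class:

```python
# Confusion matrix reported for the all-features model
cm = [[8583, 90],
      [699, 8346]]

tn, fp = cm[0]  # true Walk: predicted Walk, predicted Run
fn, tp = cm[1]  # true Run:  predicted Walk, predicted Run

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Class 1 (Run)
precision_run = tp / (tp + fp)
recall_run = tp / (tp + fn)

# Class 0 (Walk)
precision_walk = tn / (tn + fn)
recall_walk = tn / (tn + fp)

print(f"accuracy: {accuracy:.4f}")
print(f"Walk precision/recall: {precision_walk:.2f} / {recall_walk:.2f}")
print(f"Run  precision/recall: {precision_run:.2f} / {recall_run:.2f}")
```

Most of the errors are runs predicted as walks (699 versus 90), which is why recall for Run is lower than recall for Walk.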

Acceleration

accuracy_score(y_test, acc_pred_y)

0.9565978101365843

confusion_matrix(y_test, acc_pred_y)

array([[8610,   63],
       [ 706, 8339]], dtype=int64)

print(classification_report(y_test,acc_pred_y))

              precision    recall  f1-score   support

           0       0.92      0.99      0.96      8673
           1       0.99      0.92      0.96      9045

    accuracy                           0.96     17718
   macro avg       0.96      0.96      0.96     17718
weighted avg       0.96      0.96      0.96     17718

Gyro

accuracy_score(y_test, gyro_pred_y)

0.6475335816683598

confusion_matrix(y_test, gyro_pred_y)

array([[6528, 2145],
       [4100, 4945]], dtype=int64)

print(classification_report(y_test,gyro_pred_y))

              precision    recall  f1-score   support

           0       0.61      0.75      0.68      8673
           1       0.70      0.55      0.61      9045

    accuracy                           0.65     17718
   macro avg       0.66      0.65      0.64     17718
weighted avg       0.66      0.65      0.64     17718

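Acceleration is the determining factor for classifying running or walking: the acceleration-only model reaches about 95.7% test accuracy, on par with the all-features model (about 95.5%), while the gyro-only model reaches only about 64.8%.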

Output of df.head():

        date                time  username  acceleration_x  acceleration_y  acceleration_z   gyro_x   gyro_y   gyro_z
0  2017-6-30  13:51:15:847724020    viktor          0.2650         -0.7814         -0.0076  -0.0590   0.0325  -2.9296
1  2017-6-30  13:51:16:246945023    viktor          0.6722         -1.1233         -0.2344  -0.1757   0.0208   0.1269
2  2017-6-30  13:51:16:446233987    viktor          0.4399         -1.4817          0.0722  -0.9105   0.1063  -2.4367
3  2017-6-30  13:51:16:646117985    viktor          0.3031         -0.8125          0.0888   0.1199  -0.4099  -2.9336
4  2017-6-30  13:51:16:846738994    viktor          0.4814         -0.9312          0.0359   0.0527   0.4379   2.4922

Output of df.describe():

                wrist        activity  acceleration_x  acceleration_y  acceleration_z          gyro_x          gyro_y          gyro_z
count    88588.000000    88588.000000    88588.000000    88588.000000    88588.000000    88588.000000    88588.000000    88588.000000
mean         0.522170        0.500801       -0.074811       -0.562585       -0.313956        0.004160        0.037203        0.022327
std          0.499511        0.500002        1.009299        0.658458        0.486815        1.253423        1.198725        1.914423
min          0.000000        0.000000       -5.350500       -3.299000       -3.753800       -4.430600       -7.464700       -9.480000
25%          0.000000        0.000000       -0.381800       -1.033500       -0.376000       -0.920700       -0.644825       -1.345125
50%          1.000000        1.000000       -0.059500       -0.759100       -0.221000        0.018700        0.039300        0.006900
75%          1.000000        1.000000        0.355500       -0.241775       -0.085900        0.888800        0.733700        1.398200
max          1.000000        1.000000        5.603300        2.668000        1.640300        4.874200        8.498000       11.266200


Contact Me

www.linkedin.com/in/billygustave

billygustave.com
