#MachineLearning #SupervisedLearning #Classification

By Billy Gustave

Run/Walk Classifier

Goal:

  • Classify activities as walk or run
  • Data: run_or_walk.csv
  • {0:'Walk', 1:'Run'}
  • Naive Bayes Classifier
  • Compare accuracy using acceleration features only, gyro features only, and all six features

Data Exploration

In [1]:
import numpy as np, pandas as pd, matplotlib.pyplot as plt, seaborn as sns
In [2]:
df = pd.read_csv('run_or_walk.csv')
df.shape
Out[2]:
(88588, 11)
In [3]:
df.head()
Out[3]:
        date                time  username  wrist  activity  acceleration_x  acceleration_y  acceleration_z   gyro_x   gyro_y   gyro_z
0  2017-6-30  13:51:15:847724020    viktor      0         0          0.2650         -0.7814         -0.0076  -0.0590   0.0325  -2.9296
1  2017-6-30  13:51:16:246945023    viktor      0         0          0.6722         -1.1233         -0.2344  -0.1757   0.0208   0.1269
2  2017-6-30  13:51:16:446233987    viktor      0         0          0.4399         -1.4817          0.0722  -0.9105   0.1063  -2.4367
3  2017-6-30  13:51:16:646117985    viktor      0         0          0.3031         -0.8125          0.0888   0.1199  -0.4099  -2.9336
4  2017-6-30  13:51:16:846738994    viktor      0         0          0.4814         -0.9312          0.0359   0.0527   0.4379   2.4922
In [4]:
df.describe()
Out[4]:
              wrist      activity  acceleration_x  acceleration_y  acceleration_z        gyro_x        gyro_y        gyro_z
count  88588.000000  88588.000000    88588.000000    88588.000000    88588.000000  88588.000000  88588.000000  88588.000000
mean       0.522170      0.500801       -0.074811       -0.562585       -0.313956      0.004160      0.037203      0.022327
std        0.499511      0.500002        1.009299        0.658458        0.486815      1.253423      1.198725      1.914423
min        0.000000      0.000000       -5.350500       -3.299000       -3.753800     -4.430600     -7.464700     -9.480000
25%        0.000000      0.000000       -0.381800       -1.033500       -0.376000     -0.920700     -0.644825     -1.345125
50%        1.000000      1.000000       -0.059500       -0.759100       -0.221000      0.018700      0.039300      0.006900
75%        1.000000      1.000000        0.355500       -0.241775       -0.085900      0.888800      0.733700      1.398200
max        1.000000      1.000000        5.603300        2.668000        1.640300      4.874200      8.498000     11.266200

Data Cleaning

In [5]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88588 entries, 0 to 88587
Data columns (total 11 columns):
date              88588 non-null object
time              88588 non-null object
username          88588 non-null object
wrist             88588 non-null int64
activity          88588 non-null int64
acceleration_x    88588 non-null float64
acceleration_y    88588 non-null float64
acceleration_z    88588 non-null float64
gyro_x            88588 non-null float64
gyro_y            88588 non-null float64
gyro_z            88588 non-null float64
dtypes: float64(6), int64(2), object(3)
memory usage: 7.4+ MB

No missing values: every column has 88588 non-null entries.
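The non-null counts can also be checked directly instead of being read off `df.info()`. Below is a minimal sketch, assuming a small made-up frame named `toy` in place of run_or_walk.csv; the column names are borrowed from the dataset, but the values (including the deliberately missing one) are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; one gyro_x value is deliberately missing
toy = pd.DataFrame({
    "acceleration_x": [0.2650, 0.6722, 0.4399],
    "gyro_x": [-0.0590, np.nan, -0.9105],
})

# Missing values per column; all zeros would mean nothing to clean
missing_per_column = toy.isna().sum()
print(missing_per_column)

# One boolean answer for the whole frame
has_missing = bool(toy.isna().any().any())
print(has_missing)
```

On the real data, `df.isna().sum()` should come back as 0 for all 11 columns, consistent with the `df.info()` output.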

In [6]:
# Features and Target
X = df.drop(['date','time','username','wrist','activity'], axis=1)
y = df.activity
X.shape
Out[6]:
(88588, 6)
In [7]:
from sklearn.model_selection import train_test_split
# train/test split
x_train,x_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=1)
In [8]:
fig, ax = plt.subplots(figsize=(16,14))
sns.heatmap(x_train.corr(), cmap='Reds', annot=True, linewidths=.5, ax=ax)
Out[8]:
[Figure: correlation heatmap of the six training features (acceleration_x/y/z, gyro_x/y/z)]

The heatmap shows no highly correlated feature pairs.
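Reading a heatmap by eye can be backed up with an explicit list of strongly correlated pairs. The sketch below is one way to do that; the helper name `high_corr_pairs`, the 0.8 cutoff, and the toy columns `a`, `b`, `c` are all assumptions made for illustration and are not part of the original notebook.

```python
from itertools import combinations

import pandas as pd


def high_corr_pairs(frame, threshold=0.8):
    """Return (col_a, col_b, r) for each column pair with |r| above threshold."""
    corr = frame.corr()
    return [
        (a, b, float(corr.loc[a, b]))
        for a, b in combinations(frame.columns, 2)
        if abs(corr.loc[a, b]) > threshold
    ]


# Toy frame: "b" is an exact multiple of "a", "c" is only weakly related
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})
pairs = high_corr_pairs(toy)
print(pairs)  # only the (a, b) pair exceeds the cutoff
```

Calling `high_corr_pairs(x_train)` on the training features would return an empty list if no pair exceeds the chosen cutoff.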

Modeling

Using 10-fold cross-validation on the training set:

In [9]:
from sklearn.model_selection import cross_val_score, KFold
kfold = KFold(n_splits=10, random_state=1, shuffle=True)
In [10]:
from sklearn.naive_bayes import GaussianNB
nbc = GaussianNB()
acc_features = ['acceleration_x', 'acceleration_y', 'acceleration_z']
gyro_features = ['gyro_x', 'gyro_y', 'gyro_z']
# Calculate accuracy scores 
all_score = cross_val_score(nbc, x_train, y_train, cv=kfold, scoring='accuracy').mean()
print("Accuracy : {} ".format(all_score))
acc_score = cross_val_score(nbc, x_train[acc_features], y_train, cv=kfold, scoring='accuracy').mean()
print("Acceleration accuracy : {} ".format(acc_score))
gyro_score = cross_val_score(nbc, x_train[gyro_features], y_train, cv=kfold, scoring='accuracy').mean()
print("Gyro accuracy : {} ".format(gyro_score))
Accuracy : 0.9565260335826162 
Acceleration accuracy : 0.9574290955270213 
Gyro accuracy : 0.6497389586566953 
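The cell above reports only the mean over the 10 folds. Since the all-features and acceleration-only means differ by less than 0.001, the spread across folds is worth looking at before reading anything into that gap. A minimal sketch, assuming synthetic two-class data (`X_toy`, `y_toy`) as a stand-in for the real training set:

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)

# Synthetic stand-in: two well-separated Gaussian classes with 3 features
X_toy = np.vstack([
    rng.normal(0.0, 1.0, size=(200, 3)),
    rng.normal(4.0, 1.0, size=(200, 3)),
])
y_toy = np.array([0] * 200 + [1] * 200)

kfold = KFold(n_splits=10, shuffle=True, random_state=1)
scores = cross_val_score(GaussianNB(), X_toy, y_toy, cv=kfold, scoring="accuracy")

# Report the per-fold spread alongside the mean
print("mean = {:.4f}, std = {:.4f}".format(scores.mean(), scores.std()))
```

If the standard deviation across folds is larger than the difference between two means, the two feature sets are effectively tied.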

Testing

In [11]:
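# A single GaussianNB instance is refit for each feature set;
# predictions are taken right after each fit, before the next refit.
nbc = GaussianNB()
all_pred_y = nbc.fit(x_train,y_train).predict(x_test)
acc_pred_y = nbc.fit(x_train[acc_features],y_train).predict(x_test[acc_features])
gyro_pred_y = nbc.fit(x_train[gyro_features],y_train).predict(x_test[gyro_features])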
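Because one `nbc` object is refit three times, it ends the cell holding only the gyro model. If the fitted models are needed later, one option is to keep a separate estimator per feature set. The sketch below shows that pattern on synthetic data; the frame `toy`, its two columns, and the dictionary names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
n = 300  # rows per class in the synthetic stand-in

# One informative column and one pure-noise column
toy = pd.DataFrame({
    "acceleration_x": np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(4.0, 1.0, n)]),
    "gyro_x": rng.normal(0.0, 1.0, 2 * n),
})
labels = np.array([0] * n + [1] * n)

toy_train, toy_test, lab_train, lab_test = train_test_split(
    toy, labels, test_size=0.2, random_state=1
)

feature_sets = {
    "all": ["acceleration_x", "gyro_x"],
    "acceleration": ["acceleration_x"],
    "gyro": ["gyro_x"],
}

# Fit one independent model per feature set and keep each of them
models = {}
preds = {}
for name, cols in feature_sets.items():
    models[name] = GaussianNB().fit(toy_train[cols], lab_train)
    preds[name] = models[name].predict(toy_test[cols])
```

Each entry in `models` stays usable afterwards, for example for `predict_proba` on new rows.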
In [12]:
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

All

In [13]:
accuracy_score(y_test, all_pred_y)
Out[13]:
0.9554690145614629
In [14]:
confusion_matrix(y_test, all_pred_y)
Out[14]:
array([[8583,   90],
       [ 699, 8346]], dtype=int64)
In [15]:
print(classification_report(y_test,all_pred_y))
              precision    recall  f1-score   support

           0       0.92      0.99      0.96      8673
           1       0.99      0.92      0.95      9045

    accuracy                           0.96     17718
   macro avg       0.96      0.96      0.96     17718
weighted avg       0.96      0.96      0.96     17718
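The numbers in the report follow directly from the confusion matrix in Out[14], where rows are true labels and columns are predicted labels (scikit-learn's convention). As a worked check, the arithmetic can be reproduced in plain Python; the variable names are chosen here for illustration.

```python
# Confusion matrix from Out[14]: rows = true label, columns = predicted label
cm = [[8583, 90],
      [699, 8346]]
tn, fp = cm[0]  # true Walk (0): predicted Walk, predicted Run
fn, tp = cm[1]  # true Run (1): predicted Walk, predicted Run

total = tn + fp + fn + tp
accuracy = (tn + tp) / total

# Per-class precision and recall
precision_walk = tn / (tn + fn)
recall_walk = tn / (tn + fp)
precision_run = tp / (tp + fp)
recall_run = tp / (tp + fn)

print(total)                                             # 17718
print(round(accuracy, 4))                                # 0.9555
print(round(precision_walk, 2), round(recall_walk, 2))   # 0.92 0.99
print(round(precision_run, 2), round(recall_run, 2))     # 0.99 0.92
```

The 699 runs predicted as walks account for most of the errors, which is why recall for Run (0.92) is lower than recall for Walk (0.99).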

Acceleration

In [16]:
accuracy_score(y_test, acc_pred_y)
Out[16]:
0.9565978101365843
In [17]:
confusion_matrix(y_test, acc_pred_y)
Out[17]:
array([[8610,   63],
       [ 706, 8339]], dtype=int64)
In [18]:
print(classification_report(y_test,acc_pred_y))
              precision    recall  f1-score   support

           0       0.92      0.99      0.96      8673
           1       0.99      0.92      0.96      9045

    accuracy                           0.96     17718
   macro avg       0.96      0.96      0.96     17718
weighted avg       0.96      0.96      0.96     17718

Gyro

In [19]:
accuracy_score(y_test, gyro_pred_y)
Out[19]:
0.6475335816683598
In [20]:
confusion_matrix(y_test, gyro_pred_y)
Out[20]:
array([[6528, 2145],
       [4100, 4945]], dtype=int64)
In [21]:
print(classification_report(y_test,gyro_pred_y))
              precision    recall  f1-score   support

           0       0.61      0.75      0.68      8673
           1       0.70      0.55      0.61      9045

    accuracy                           0.65     17718
   macro avg       0.66      0.65      0.64     17718
weighted avg       0.66      0.65      0.64     17718

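Acceleration is the determining factor for classifying running versus walking: on the test set, acceleration alone reaches about 95.7% accuracy and all six features about 95.5%, while gyro alone reaches only about 64.8%.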

Contact Me

www.linkedin.com/in/billygustave

billygustave.com