What this post is about
This post provides a template for building data science apps using the Django framework. The focus is not on modelling but on learning Django. Topics covered:
- Data science modelling and packaging.
- Creating a Django app around the model.
- Deploying the model on Heroku.
I have been working on a PoC using the newer FastAPI framework to understand the differences. More on that soon; for now, I am focusing on Django.
Source code is here. Along with Django documentation, here’s the tutorial I followed.
1.1 Building prediction model
For the model, I used the same use case as the tutorial: predicting bank loan approval status from applicant information. For production, however, I chose an SVM instead of the neural network model.
#import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import MinMaxScaler
import warnings
from collections import Counter
warnings.filterwarnings('ignore')
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
df = pd.read_csv('../data/bankloan.csv')
df = df.dropna()
df = df.drop('Loan_ID', axis = 1)
df['LoanAmount'] = (df['LoanAmount'] * 1000).astype('int')
Counter(df['Loan_Status'])
pre_y = df['Loan_Status']
pre_x = df.drop('Loan_Status', axis = 1)
dm_x = pd.get_dummies(pre_x)
dm_y = pre_y.map({'Y':1, 'N':0})
smote = SMOTE(sampling_strategy = 'minority')
x1, y = smote.fit_resample(dm_x, dm_y)
sc = MinMaxScaler()
x = sc.fit_transform(x1)
X_train, X_test, y_train, y_test = train_test_split(x , y, test_size = 0.2, random_state =42, shuffle = True)
classifier = Sequential()
classifier.add(Dense(400, activation = 'relu', kernel_initializer='random_normal', input_dim=X_test.shape[1]))
classifier.add(Dense(800, activation = 'relu', kernel_initializer='random_normal'))
classifier.add(Dense(20, activation = 'relu', kernel_initializer='random_normal'))
classifier.add(Dense(1, activation = 'sigmoid', kernel_initializer='random_normal'))
classifier.compile(optimizer = 'adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train, batch_size = 20, epochs = 50, verbose=0)
eval_model = classifier.evaluate(X_train, y_train)
classifier.save('./rev00/')
eval_model
#output:
'''
17/17 - 0s 1ms/step - loss: 0.2335 - accuracy: 0.8983
[0.23350711166858673, 0.8983050584793091]
'''
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.4)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
ax = plt.subplot()
sns.heatmap(cm, annot = True, ax=ax)
ax.set_xlabel('predicted')
ax.set_ylabel('actual')
ax.set_title('confusion matrix')
ax.xaxis.set_ticklabels(['No', 'Yes'])
ax.yaxis.set_ticklabels(['No', 'Yes'])
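The 0.4 cut-off above turns the network's sigmoid outputs into class labels before the confusion matrix is built. As a rough illustration of those two steps, here is a small pure-Python sketch. The function name `confusion_counts` and the sample numbers are made up for this example and do not come from the notebook.

```python
def confusion_counts(y_true, probabilities, threshold=0.4):
    """Return [[TN, FP], [FN, TP]]: rows are actual labels, columns are predictions."""
    matrix = [[0, 0], [0, 0]]
    for actual, prob in zip(y_true, probabilities):
        predicted = 1 if prob > threshold else 0
        matrix[actual][predicted] += 1
    return matrix


# Hypothetical labels and sigmoid outputs, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
probs = [0.91, 0.45, 0.38, 0.62, 0.10, 0.05]

cm_default = confusion_counts(y_true, probs)       # threshold 0.4
cm_strict = confusion_counts(y_true, probs, 0.5)   # threshold 0.5
print(cm_default)
print(cm_strict)
```

Lowering the threshold approves more borderline applicants, so it trades false negatives for false positives; which side matters more depends on the use case.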
I have the tensorflow-gpu version installed, and I could see the GPU being used to train and test the model, which was pretty fast.
[Figures: GPU vs. CPU utilization during training]
To figure out how to install CUDA and tensorflow-gpu, I followed:
- python-cuda-set-up-on-windows-10-for-gpu-support
- cuda-installation-guide-microsoft-windows
- cuda-download
- cudnn-archive
- cudnn-archive-tree
- pytorch-getting-started
#K fold cross validation
from sklearn.model_selection import StratifiedKFold
kfold = StratifiedKFold(n_splits=4, shuffle = True, random_state = 42)
cvscores = []
for train, test in kfold.split(x, y):
    # build a fresh network for every fold
    model = Sequential()
    model.add(Dense(400, activation = 'relu', kernel_initializer='random_normal', input_dim=X_test.shape[1]))
    model.add(Dense(800, activation = 'relu', kernel_initializer='random_normal'))
    model.add(Dense(20, activation = 'relu', kernel_initializer='random_normal'))
    model.add(Dense(1, activation = 'sigmoid', kernel_initializer='random_normal'))
    model.compile(optimizer = 'adam', loss='binary_crossentropy', metrics=['accuracy'])
    model.fit(x[train], y[train], epochs = 100, verbose=0)
    scores = model.evaluate(x[test], y[test], verbose =0)
    print(f"{model.metrics_names[1]} : {round(scores[1]*100, 2)}" )
    cvscores.append(round(scores[1]*100, 2))
# mean and standard deviation of the fold accuracies
print(f"{round(np.mean(cvscores), 2)} : {round(np.std(cvscores), 2)}" )
#output
'''
accuracy : 77.71
accuracy : 80.72
accuracy : 82.53
accuracy : 79.52
80.12 : 1.76
'''
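The last line of that output is the mean and standard deviation of the four fold accuracies. As a quick sanity check, the same summary can be reproduced with the standard library alone; the numbers below are copied from the output above, and `pstdev` is used because `np.std` computes the population standard deviation by default.

```python
from statistics import mean, pstdev

# Fold accuracies copied from the printed output above.
fold_accuracies = [77.71, 80.72, 82.53, 79.52]

cv_mean = round(mean(fold_accuracies), 2)
# Population standard deviation, matching np.std's default (ddof=0).
cv_std = round(pstdev(fold_accuracies), 2)
print(f"{cv_mean} : {cv_std}")
```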
Building SVM classifier
#SVM classifier
from sklearn import svm
clf = svm.SVC()
clf.fit(X_train, y_train) #X_test, y_test
y_pred = clf.predict(X_test)
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
ax = plt.subplot()
sns.heatmap(cm, annot = True, ax=ax)
ax.set_xlabel('predicted')
ax.set_ylabel('actual')
ax.set_title('confusion matrix')
ax.xaxis.set_ticklabels(['No', 'Yes'])
ax.yaxis.set_ticklabels(['No', 'Yes'])
#accuracy of SVM
result = clf.score(X_test, y_test) #clf is the SVM fitted above
print(result)
#output
'''
0.8345864661654135
'''
1.2 Packaging
To deploy this model, we need to serialize it using pickle, along with the scaler and the one hot encoder object (we switch to a fitted one hot encoder instead of pandas get_dummies()).
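One reason a fitted encoder is a better fit for serving than get_dummies(): get_dummies() derives its columns from whatever categories are present in the frame it is given, so a single application would not produce the full set of columns the model was trained on. A fitted encoder remembers the training categories. The sketch below illustrates the idea in plain Python; `TinyOneHotEncoder` is a hypothetical stand-in, not the scikit-learn class, and the rows are made up.

```python
class TinyOneHotEncoder:
    """Illustrative stand-in for a fitted one hot encoder (not scikit-learn)."""

    def fit(self, rows):
        # Remember the sorted categories seen in each column during training.
        columns = list(zip(*rows))
        self.categories_ = [sorted(set(col)) for col in columns]
        return self

    def transform(self, rows):
        encoded = []
        for row in rows:
            vector = []
            for value, cats in zip(row, self.categories_):
                # An unseen value yields all zeros, similar to handle_unknown='ignore'.
                vector.extend(1 if value == cat else 0 for cat in cats)
            encoded.append(vector)
        return encoded


# Hypothetical training rows: (Gender, Property_Area).
training_rows = [("Male", "Urban"), ("Female", "Rural"), ("Male", "Semiurban")]
encoder = TinyOneHotEncoder().fit(training_rows)

single = encoder.transform([("Female", "Urban")])
unknown = encoder.transform([("Male", "Suburb")])
print(encoder.categories_)
print(single, unknown)
```

Every transformed row has the same width and column order, no matter which categories the incoming row happens to contain.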
The code below pickles the required components.
#changes around one hot encoding
import pickle
import joblib
from sklearn.preprocessing import OneHotEncoder
test = pd.read_csv('../data/bankloan.csv')
test = test.dropna()
test = test.drop('Loan_ID', axis = 1)
test['LoanAmount'] = (test['LoanAmount'] * 1000).astype('int')
test_y = test['Loan_Status']
test_y = test_y.map({'Y':1, 'N':0})
test_x = test.drop('Loan_Status', axis = 1)
#on scikit-learn >= 1.2 this argument is named sparse_output instead of sparse
ohe = OneHotEncoder(handle_unknown='ignore', sparse=False)
ohe.fit(test_x[['Gender','Married','Education','Self_Employed','Property_Area']]) #for categorical variables
#check ohe.categories_ to confirm the order of the output columns
'''
to reuse the saved ohe object
encoder = pickle.load(open("./rev01/ohe_rev00.sav", 'rb'))
encoded_data = encoder.transform(test_x[['Gender','Married','Education','Self_Employed','Property_Area']])
test_encoded = pd.DataFrame(encoded_data,
                            columns = ['Gender_Female', 'Gender_Male',
                                       'Married_No', 'Married_Yes',
                                       'Education_Graduate', 'Education_Not Graduate',
                                       'Self_Employed_No', 'Self_Employed_Yes',
                                       'Property_Area_Rural', 'Property_Area_Semiurban', 'Property_Area_Urban'])
test_x = test_x.reset_index(drop=True)
test_x = pd.concat([test_x[['Dependents','ApplicantIncome','CoapplicantIncome','LoanAmount','Loan_Amount_Term','Credit_History']],
                    test_encoded], axis = 1)
'''
#saving encoder and scaler
with open("./rev01/ohe_rev00.sav", "wb") as f:
    pickle.dump(ohe, f)
scaler_filename = "./rev01/scaler_rev00.sav"
joblib.dump(sc, scaler_filename)
'''
#to use scaler object
scaler_filename = "./rev01/scaler_rev00.sav"
scaler = joblib.load(scaler_filename)
test_x = scaler.transform(test_x) #transform only, the loaded scaler is already fitted
'''
#save the model
import pickle
filename = './rev01/svm_rev00.sav'
pickle.dump(clf, open(filename, 'wb'))
'''
# to use the model
# load the model from disk
filename = './rev01/svm_rev00.sav'
loaded_model = pickle.load(open(filename, 'rb'))
y_pred = loaded_model.predict(test_x)
'''
All the components are serialized.
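The save and load steps above all follow the same round trip: dump an object to a binary file, then load it back before predicting. Here is a minimal, self-contained sketch of that round trip. The dictionaries are hypothetical stand-ins for the fitted encoder, scaler and classifier, and the file name `bundle_rev00.sav` is made up; bundling everything in one file is just one possible layout, not what the post does.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for the fitted encoder, scaler and classifier.
artifacts = {
    "ohe": {"categories": ["Female", "Male"]},
    "scaler": {"min": 0.0, "max": 1.0},
    "model": {"kind": "svm"},
}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "bundle_rev00.sav")  # hypothetical file name

    # Context managers close the file handles even if dumping fails.
    with open(path, "wb") as f:
        pickle.dump(artifacts, f)

    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == artifacts)
```

Pickle files should only be loaded from sources you trust, and real scikit-learn objects are best loaded with the same library version that saved them.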
2.1 Getting started with Django
There are lots of tutorials available on getting started with Django. Here I cover the important changes I had to make in my code, which are not covered in the tutorial I went through, to get this working.
A Python virtual environment is a must. It helps freeze dependencies into requirements.txt for use at deployment time on the live server (here Heroku; I tried PythonAnywhere, but the virtual environment alone exceeded the free quota of 516 MB allowed for the entire app, hence the switch to Heroku).
Run the initial setup commands:
pip install django
pip install djangorestframework
django-admin startproject djangoapi
cd djangoapi
python manage.py migrate # create the auth tables before adding a superuser
python manage.py createsuperuser
python manage.py runserver # for a quick check, then open 127.0.0.1:8000/admin
django-admin startapp myapi
Modify settings.py and edit the apps section:
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'rest_framework',
'myapi',
]
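Create a copy of urls.py in the myapi folder as well:
cp ./djangoapi/urls.py ./myapi/
In urls.py under djangoapi, add a route for myapi:
from django.contrib import admin
from django.urls import path, include
urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include('myapi.urls')), # myapi/urls.py defines api/, status/ and form/
]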
2.2 Creating Django data model
In myapi > models.py, create the fields.
Here’s the documentation.
from django.db import models
# Create your models here.
class approvals(models.Model):
    GENDER_CHOICES = (('Male', 'Male'),('Female', 'Female'))
    MARRIED_CHOICES = (('Yes', 'Yes'),('No', 'No'))
    GRADUATED_CHOICES = (('Graduate', 'Graduated'),('Not_Graduate', 'Not_Graduate'))
    SELFEMPLOYED_CHOICES = (('Yes', 'Yes'),('No', 'No'))
    PROPERTY_CHOICES = (('Rural', 'Rural'),('Semiurban', 'Semiurban'),('Urban', 'Urban'))
    firstname=models.CharField(max_length=15)
    lastname=models.CharField(max_length=15)
    dependants=models.IntegerField(default=0)
    applicantincome=models.IntegerField(default=0)
    coapplicatincome=models.IntegerField(default=0)
    loanamt=models.IntegerField(default=0)
    loanterm=models.IntegerField(default=0)
    credithistory=models.IntegerField(default=0)
    gender=models.CharField(max_length=15, choices=GENDER_CHOICES)
    married=models.CharField(max_length=15, choices=MARRIED_CHOICES)
    graduatededucation=models.CharField(max_length=15, choices=GRADUATED_CHOICES)
    selfemployed=models.CharField(max_length=15, choices=SELFEMPLOYED_CHOICES)
    area=models.CharField(max_length=15, choices=PROPERTY_CHOICES)
    def __str__(self):
        return '{}, {}'.format(self.lastname, self.firstname)
We need to register the model in myapi > admin.py:
from django.contrib import admin
from .models import approvals
# Register your models here.
admin.site.register(approvals)
Add serializer.py in myapi:
from rest_framework import serializers
from .models import approvals
class approvalsSerializers(serializers.ModelSerializer):
    class Meta:
        model=approvals
        fields='__all__'
2.3 Creating views
Views are where our internal functionality and endpoints are defined.
import pickle
import joblib
import pandas as pd
from django.contrib import messages
from django.shortcuts import render
from .forms import ApprovalForm # assumes the form class lives in myapi/forms.py

def approvereject(test):
    try:
        test['LoanAmount'] = test['LoanAmount'].astype('int')
        test = test.reset_index(drop=True)
        # one hot encode the categorical columns with the encoder fitted at training time
        with open("./myapi/pklfiles/ohe_rev00.sav", 'rb') as f:
            encoder = pickle.load(f)
        encoded_data = encoder.transform(test[['Gender','Married','Education','Self_Employed','Property_Area']])
        encoded_df = pd.DataFrame(encoded_data, columns = ['Gender_Female', 'Gender_Male',
                                                           'Married_No', 'Married_Yes',
                                                           'Education_Graduate', 'Education_Not Graduate',
                                                           'Self_Employed_No', 'Self_Employed_Yes',
                                                           'Property_Area_Rural', 'Property_Area_Semiurban', 'Property_Area_Urban'])
        test_x = pd.concat([test[['Dependents','ApplicantIncome','CoapplicantIncome','LoanAmount','Loan_Amount_Term','Credit_History']],
                            encoded_df], axis = 1)
        # apply the scaler fitted during training; refitting on a single row would map every feature to 0
        scaler_filename = "./myapi/pklfiles/scaler_rev00.sav"
        scaler = joblib.load(scaler_filename)
        test_x = scaler.transform(test_x)
        # load the pickled SVM and predict
        filename = './myapi/pklfiles/svm_rev00.sav'
        with open(filename, 'rb') as f:
            loaded_model = pickle.load(f)
        y_pred = loaded_model.predict(test_x)
        if y_pred[0] == 1:
            print('Approved')
            return 'Approved'
        elif y_pred[0] == 0:
            print('Rejected')
            return 'Rejected'
    except ValueError as e:
        return (e.args[0])
def cxcontact(request):
    if request.method=='POST':
        form=ApprovalForm(request.POST)
        if form.is_valid():
            Firstname = form.cleaned_data['firstname']
            Lastname = form.cleaned_data['lastname']
            Dependents = form.cleaned_data['Dependents']
            ApplicantIncome = form.cleaned_data['ApplicantIncome']
            CoapplicantIncome = form.cleaned_data['CoapplicantIncome']
            LoanAmount = form.cleaned_data['LoanAmount']
            Loan_Amount_Term = form.cleaned_data['Loan_Amount_Term']
            Credit_History = form.cleaned_data['Credit_History']
            Gender = form.cleaned_data['Gender']
            Married = form.cleaned_data['Married']
            Education = form.cleaned_data['Education']
            Self_Employed = form.cleaned_data['Self_Employed']
            Property_Area = form.cleaned_data['Property_Area']
            # build a one-row DataFrame from the submitted form and score it
            myDict = (request.POST).dict()
            df=pd.DataFrame(myDict, index=[0])
            print('######' , df)
            answer=approvereject(df)
            if int(df['LoanAmount'].iloc[0])<25000:
                messages.success(request,'Application Status: {}'.format(answer))
            else:
                messages.success(request,'Invalid: Your Loan Request Exceeds the $25,000 Limit')
    # always render a fresh, empty form
    form=ApprovalForm()
    return render(request, 'myform/cxform.html', {'form':form})
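A point worth spelling out about the scaling step in approvereject: at prediction time the scaler saved during training should only be applied to the incoming row, not fitted again. With a single applicant, fitting a min-max scaler on that one row gives a minimum equal to the maximum, so every feature collapses to 0. The sketch below shows the effect in plain Python; `TinyMinMaxScaler` is a hypothetical one-column stand-in for MinMaxScaler, and the income figures are made up.

```python
class TinyMinMaxScaler:
    """Illustrative stand-in for MinMaxScaler on a single numeric column."""

    def fit(self, values):
        self.min_ = min(values)
        self.max_ = max(values)
        return self

    def transform(self, values):
        span = self.max_ - self.min_
        # A zero span (for example after fitting on one row) maps everything to 0.0.
        return [(v - self.min_) / span if span else 0.0 for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)


training_incomes = [1000, 3000, 5000]  # hypothetical training data
applicant = [4000]                     # one row arriving through the form

saved_scaler = TinyMinMaxScaler().fit(training_incomes)
reused = saved_scaler.transform(applicant)               # uses training min and max
refitted = TinyMinMaxScaler().fit_transform(applicant)   # forgets the training range
print(reused, refitted)
```

The reused scaler places the applicant at 0.75 of the training range, while the refitted one reports 0.0 regardless of the income entered.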
Update the urls in myapi > urls.py to assign the views we just created to endpoints:
urlpatterns = [
    path('api/', include(router.urls)),
    # note: approvereject as defined above expects a DataFrame, not a request
    path('status/', views.approvereject),
    path('form/', views.cxcontact, name='cxform'),
]
2.4 Creating forms (Front End of the app) using crispy forms
As much fun as HTML coding can be, I am using the tutorial's HTML script as is. The frontend is built with crispy-forms (I have shamelessly changed the images used from the static files). This also warrants a change in settings.py > INSTALLED_APPS:
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'rest_framework',
'myapi',
'crispy_forms',
]
The folder structure is important: create myapi > templates > myform > cxform.html.
Test the app on a local server by running python manage.py runserver in cmd; the app will be served at 127.0.0.1:8000/.
We will see all the endpoints we have configured in urls.py.
Our model view is on /form/, so we open 127.0.0.1:8000/form/.
After we submit the loan application, the status in this case is Rejected, as inferred by our model.
3.1 Deploying on pythonanywhere
The basic structure is ready. To deploy it on PythonAnywhere, I started following a tutorial. However, I ran into a quota issue, as my virtual environment alone took more than the allowed 516 MB. With miniconda or a trimmed-down set of dependencies, PythonAnywhere may still be an option. For now, I am using Heroku.
3.2 Deploying on Heroku
I followed a tutorial. Even though its app and setup are different, at a high level I could use the changes it recommends to make the app production ready.
3.2.1 Download heroku and heroku cli.
Follow the instructions at https://devcenter.heroku.com/articles/heroku-cli and create a Heroku account.
3.2.2 Login and repo setup
In cmd, run:
heroku login #use browser popup to login
cd /path/where/manage.py/is/
git add .
git commit -m "first push"
heroku create #create app #note the name, in this case it's glacial-temple-61070
heroku git:remote -a glacial-temple-61070
pip install gunicorn #required to run heroku local server
gunicorn folder_name.wsgi #allows to run app #in this case folder_name is djangoapi
Create a Procfile with the deploy command:
web: gunicorn djangoapi.wsgi
Again in cmd:
pip freeze > requirements.txt #to capture gunicorn installation in our requirements.txt
git add .
git commit -m "updated requirements.txt and added Procfile"
git push heroku master #the first push will fail, but the source code is uploaded; we still need to link the database
3.2.3 PostgreSQL
On the Heroku portal, add a PostgreSQL add-on on the app page.
Get the config vars for this database.
In settings.py, add 3 things:
#1 database config , notice the config_var string,
#postgres://kykfhxgomsizc:ac0aa275e38e7ab6f41348b90c446a6d1c65096ee0c362e07b16757e118806@ec2-54-211-176-156.compute-1.amazonaws.com:5432/dd0qa1g99ok65
#break it into NAME (of database), HOST, PORT, USER, and PASSWORD
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql',
'NAME': 'dd0qa1g99ok65',
'HOST': 'ec2-54-211-176-156.compute-1.amazonaws.com',
'PORT':5432,
'USER':'kykfhxgomsizc',
'PASSWORD':'ac0aa275e38e7ab6f41348b90c446a6d1c65096ee0c362e07b16757e118806'
}
}
#2. add allowed host
ALLOWED_HOSTS = ['glacial-temple-61070.herokuapp.com','127.0.0.1']
#3. for static files
import os
STATIC_URL = '/static/'
STATIC_ROOT = os.path.join(BASE_DIR, 'staticfiles')
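The manual step of breaking the connection string into NAME, HOST, PORT, USER and PASSWORD can also be done in code. Below is a minimal sketch using only the standard library; the helper name `database_settings` and the placeholder URL are made up for illustration. It assumes the string is available in the DATABASE_URL config var, which is where Heroku Postgres exposes it, so the password does not have to be committed to settings.py.

```python
import os
from urllib.parse import urlsplit


def database_settings(url):
    """Split a postgres:// URL into the keys used in a Django DATABASES entry."""
    parts = urlsplit(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),
        "HOST": parts.hostname,
        "PORT": parts.port,
        "USER": parts.username,
        "PASSWORD": parts.password,
    }


# Made-up placeholder URL, used only when DATABASE_URL is not set.
example_url = "postgres://demo_user:demo_pass@db.example.com:5432/demo_db"
settings = database_settings(os.environ.get("DATABASE_URL", example_url))
print(settings["NAME"], settings["HOST"], settings["PORT"])
```

In settings.py the result could be assigned as `DATABASES = {'default': database_settings(...)}`, which keeps the same structure as the hand-written dictionary above.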
One final push
git add .
git commit -m "updated settings.py"
git push heroku master
Finally