Deploy data science algorithm as Django app

What this post is about

This post provides a template for building data science apps with the Django framework. The focus is not on modelling but on learning Django. Topics covered:

Data science modelling and packaging.
Creating a Django app around the model.
Deploying the model on Heroku.

I have also been working on a proof of concept with the newer FastAPI framework to understand the differences. More on that soon; for now, I am focusing on Django.

Source code is here. Along with Django documentation, here’s the tutorial I followed.

1.1 Building prediction model

For the model, I have used the same use case as the tutorial: predicting bank loan approval status from applicant information. However, for production I chose an SVM instead of the neural network model.


#import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import MinMaxScaler
import warnings
from collections import Counter
warnings.filterwarnings('ignore')
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow
from tensorflow import keras
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

df = pd.read_csv('../data/bankloan.csv')
df = df.dropna()
df = df.drop('Loan_ID', axis = 1)
df['LoanAmount'] = (df['LoanAmount'] * 1000).astype('int')
Counter(df['Loan_Status'])

pre_y = df['Loan_Status']
pre_x = df.drop('Loan_Status', axis = 1)
dm_x = pd.get_dummies(pre_x)
dm_y = pre_y.map({'Y':1, 'N':0})

smote = SMOTE(sampling_strategy = 'minority')
x1, y = smote.fit_resample(dm_x, dm_y)
sc = MinMaxScaler()
x = sc.fit_transform(x1)

X_train, X_test, y_train, y_test = train_test_split(x , y, test_size = 0.2, random_state =42, shuffle = True)

classifier = Sequential()
classifier.add(Dense(400, activation = 'relu', kernel_initializer='random_normal', input_dim=X_test.shape[1]))
classifier.add(Dense(800, activation = 'relu', kernel_initializer='random_normal'))
classifier.add(Dense(20, activation = 'relu', kernel_initializer='random_normal'))
classifier.add(Dense(1, activation = 'sigmoid', kernel_initializer='random_normal'))
classifier.compile(optimizer = 'adam', loss='binary_crossentropy', metrics=['accuracy'])
classifier.fit(X_train, y_train, batch_size = 20, epochs = 50, verbose=0)
# note: this evaluates on the training split, so the accuracy printed below is training accuracy
eval_model = classifier.evaluate(X_train, y_train)
classifier.save('./rev00/')

eval_model

#output:
'''
17/17 - 0s 1ms/step - loss: 0.2335 - accuracy: 0.8983
[0.23350711166858673, 0.8983050584793091]
'''

y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.4)

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
ax = plt.subplot()
sns.heatmap(cm, annot = True, ax=ax)

ax.set_xlabel('predicted')
ax.set_ylabel('actual')
ax.set_title('confusion matrix')
ax.xaxis.set_ticklabels(['No', 'Yes'])
ax.yaxis.set_ticklabels(['No', 'Yes'])
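
The 0.4 cut-off above is what turns the network's sigmoid outputs into class labels before they reach the confusion matrix. Here is a minimal NumPy sketch of that step. The probabilities and labels are made up for illustration (they are not the bank loan data), and `confusion_counts` is a hypothetical helper, not part of the original notebook.

```python
import numpy as np

def confusion_counts(y_true, y_prob, threshold=0.4):
    """Return [[TN, FP], [FN, TP]]: rows are actual classes, columns are predicted."""
    y_true = np.asarray(y_true).astype(int).ravel()
    # Same rule as the post: a probability above the threshold counts as "Yes"
    y_pred = (np.asarray(y_prob).ravel() > threshold).astype(int)
    cm = np.zeros((2, 2), dtype=int)
    for actual, predicted in zip(y_true, y_pred):
        cm[actual, predicted] += 1
    return cm

# Illustrative values only
y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.10, 0.45, 0.35, 0.80, 0.41, 0.39]

cm = confusion_counts(y_true, y_prob)             # threshold 0.4, as in the post
cm_half = confusion_counts(y_true, y_prob, 0.5)   # the more common 0.5 default
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(cm_half)
```

Comparing the two matrices shows the trade-off: a lower threshold approves more borderline applications, so false negatives go down while false positives can go up.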

I have the tensorflow-gpu version installed, and I could see the GPU being used to train and test the model, which was pretty fast. (Figure: GPU vs. CPU utilization comparison.)

To figure out how to install cuda and tensorflow-gpu, I followed,


#K fold cross validation

from sklearn.model_selection import StratifiedKFold
kfold = StratifiedKFold(n_splits=4, shuffle = True, random_state = 42)
cvscores = []
for train, test in kfold.split(x, y):
    
    model = Sequential()
    model.add(Dense(400, activation = 'relu', kernel_initializer='random_normal', input_dim=X_test.shape[1]))
    model.add(Dense(800, activation = 'relu', kernel_initializer='random_normal'))
    model.add(Dense(20, activation = 'relu', kernel_initializer='random_normal'))
    model.add(Dense(1, activation = 'sigmoid', kernel_initializer='random_normal'))
    
    model.compile(optimizer = 'adam', loss='binary_crossentropy', metrics=['accuracy'])
    model.fit(x[train], y[train], epochs = 100, verbose=0)
    scores = model.evaluate(x[test], y[test], verbose =0)
    
    print(f"{model.metrics_names[1]} : {round(scores[1]*100, 2)}" )
    
    cvscores.append(round(scores[1]*100, 2))
    
print(f"{round(np.mean(cvscores), 2)} : {round(np.std(cvscores), 2)}" )

#output
'''
accuracy : 77.71
accuracy : 80.72
accuracy : 82.53
accuracy : 79.52
80.12 : 1.76
'''

Building SVM classifier


#SVM classifier
from sklearn import svm

clf = svm.SVC()
clf.fit(X_train, y_train) #X_test, y_test

y_pred = clf.predict(X_test)

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
ax = plt.subplot()
sns.heatmap(cm, annot = True, ax=ax)

ax.set_xlabel('predicted')
ax.set_ylabel('actual')
ax.set_title('confusion matrix')
ax.xaxis.set_ticklabels(['No', 'Yes'])
ax.yaxis.set_ticklabels(['No', 'Yes'])


#accuracy of SVM

result = clf.score(X_test, y_test)  # mean accuracy of the fitted SVM on the held-out split
print(result)
#output

'''
0.8345864661654135
'''

1.2 Packaging

To deploy this model, we need to serialize it with pickle, along with the scaler and the one-hot encoder object (we will switch to scikit-learn's OneHotEncoder instead of pandas get_dummies()).
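
One likely reason for the switch is that a web form submits a single row, and `get_dummies()` only creates columns for the categories it sees in that row. A fitted encoder keeps the full category list from training. A small sketch with a toy `Property_Area` column (the frames below are illustrative, not the real dataset):

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFrame({"Property_Area": ["Rural", "Semiurban", "Urban", "Rural"]})
single = pd.DataFrame({"Property_Area": ["Urban"]})

# get_dummies only knows about the categories in the frame it is given
dummies_single = pd.get_dummies(single)
print(list(dummies_single.columns))

# A fitted encoder remembers every category seen during fit
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit(train[["Property_Area"]])
encoded_single = ohe.transform(single[["Property_Area"]]).toarray()
print(list(ohe.categories_[0]))
print(encoded_single)

# With handle_unknown="ignore", an unseen value becomes an all-zero row
unseen = pd.DataFrame({"Property_Area": ["Suburb"]})
encoded_unseen = ohe.transform(unseen).toarray()
print(encoded_unseen)
```

The encoder output always has three columns here, in the order given by `ohe.categories_`, which is what lets the downstream scaler and model receive the same feature layout they were trained on.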

The code below pickles the required components.


#changes around one hot encoding
import pickle
import joblib
from sklearn.preprocessing import OneHotEncoder

test = pd.read_csv('../data/bankloan.csv')
test = test.dropna()
test = test.drop('Loan_ID', axis = 1)
test['LoanAmount'] = (test['LoanAmount'] * 1000).astype('int')
test_y = test['Loan_Status']
test_y = test_y.map({'Y':1, 'N':0})
test_x = test.drop('Loan_Status', axis = 1) 

# note: scikit-learn 1.2+ renames this argument to sparse_output (sparse was removed in 1.4)
ohe= OneHotEncoder(handle_unknown='ignore', sparse=False)
ohe.fit(test_x[['Gender','Married','Education','Self_Employed','Property_Area']]) #for categorical variables

#use ohe.categories_ to check the order of the output columns

'''

to reuse the saved ohe object
encoder = pickle.load(open("./rev01/ohe_rev00.sav", 'rb'))
encoded_data = encoder.transform(test_x[['Gender','Married','Education','Self_Employed','Property_Area']])

test_encoded = pd.DataFrame(ohe.transform(test_x[['Gender','Married','Education','Self_Employed','Property_Area']]), 
                            columns = ['Gender_Female', 'Gender_Male', \
        'Married_No', 'Married_Yes', \
        'Education_Graduate', 'Education_Not Graduate', \
        'Self_Employed_No', 'Self_Employed_Yes', \
        'Property_Area_Rural', 'Property_Area_Semiurban', 'Property_Area_Urban'])
test_x = test_x.reset_index(drop=True)
test_x = pd.concat([test_x[['Dependents','ApplicantIncome','CoapplicantIncome','LoanAmount','Loan_Amount_Term','Credit_History']], 
                    test_encoded], axis = 1)
                    
'''

#saving the encoder and the scaler
with open("./rev01/ohe_rev00.sav", "wb") as f: 
    pickle.dump(ohe, f)
    
scaler_filename = "./rev01/scaler_rev00.sav"
joblib.dump(sc, scaler_filename)

'''

#to use scaler object

scaler_filename = "./rev01/scaler_rev00.sav"
scaler = joblib.load(scaler_filename) 
test_x = scaler.transform(test_x)  # transform only: refitting would discard the training min/max

'''

#save the model

import pickle
filename = './rev01/svm_rev00.sav'
with open(filename, 'wb') as f:
    pickle.dump(clf, f)

'''

# to use the model
# load the model from disk
filename = './rev01/svm_rev00.sav'
loaded_model = pickle.load(open(filename, 'rb'))
y_pred = loaded_model.predict(test_x)

'''

All the components are serialized.
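
One detail worth checking when reusing a pickled scaler is whether it is applied with `transform()` or `fit_transform()`. The sketch below uses toy numbers and an in-memory `pickle.dumps`/`pickle.loads` round trip as a stand-in for the `.sav` files, to show what each call does to a single incoming row.

```python
import pickle

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy training data: column minimums are 0 and 10, maximums are 10 and 30
train = np.array([[0.0, 10.0], [10.0, 30.0]])
scaler = MinMaxScaler().fit(train)

# Round trip through pickle, standing in for dump/load to a file
restored = pickle.loads(pickle.dumps(scaler))

new_row = np.array([[5.0, 20.0]])

# transform() reuses the training min/max, so the row lands mid-range
scaled_ok = restored.transform(new_row)

# fit_transform() on one row relearns min/max from that row alone,
# so every feature collapses to 0 regardless of its value
scaled_refit = MinMaxScaler().fit_transform(new_row)

print(scaled_ok)
print(scaled_refit)
```

With `fit_transform()`, every single-row request would look identical to the model, so `transform()` is the call that preserves the information learned during training.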

2.1 Getting started with Django

There are lots of tutorials available on getting started with Django. I went through

I will cover the important changes I had to make in my code, beyond what the tutorial covers, to get this working.

A Python virtual environment is a must. It helps freeze dependencies so that requirements.txt can be used at deployment time on live servers (here Heroku; I tried pythonanywhere first, but the virtual environment alone exceeded the free quota of 516 MB allowed for the entire app, hence the switch to Heroku).

Start with the initial setup commands:


pip install django
pip install djangorestframework
django-admin startproject djangoapi
cd djangoapi
python manage.py migrate # create the auth tables before adding a superuser
python manage.py createsuperuser
python manage.py runserver # for a quick check, then open 127.0.0.1:8000/admin

django-admin startapp myapi

Modify settings.py and edit the INSTALLED_APPS section:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'myapi',
]

Create a copy of urls.py in the myapi folder as well:

cp ./djangoapi/urls.py ./myapi/

In urls.py under djangoapi, add a route for myapi:


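from django.contrib import admin
from django.urls import path, include

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', include('myapi.urls')),  # myapi/urls.py defines api/, status/ and form/
]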

2.2 Creating Django data model

In myapi > models.py, create the fields.

Here’s the documentation.


# Create your models here.
from django.db import models

class approvals(models.Model):
    GENDER_CHOICES = (('Male', 'Male'),('Female', 'Female'))
    MARRIED_CHOICES = (('Yes', 'Yes'),('No', 'No'))
    GRADUATED_CHOICES = (('Graduate', 'Graduated'),('Not_Graduate', 'Not_Graduate'))
    SELFEMPLOYED_CHOICES = (('Yes', 'Yes'),('No', 'No'))
    PROPERTY_CHOICES = (('Rural', 'Rural'),('Semiurban', 'Semiurban'),('Urban', 'Urban'))
    
    firstname=models.CharField(max_length=15)
    lastname=models.CharField(max_length=15)
    dependants=models.IntegerField(default=0)
    applicantincome=models.IntegerField(default=0)
    coapplicatincome=models.IntegerField(default=0)
    loanamt=models.IntegerField(default=0)
    loanterm=models.IntegerField(default=0)
    credithistory=models.IntegerField(default=0)
    gender=models.CharField(max_length=15, choices=GENDER_CHOICES)
    married=models.CharField(max_length=15, choices=MARRIED_CHOICES)
    graduatededucation=models.CharField(max_length=15, choices=GRADUATED_CHOICES)
    selfemployed=models.CharField(max_length=15, choices=SELFEMPLOYED_CHOICES)
    area=models.CharField(max_length=15, choices=PROPERTY_CHOICES)
    
    def __str__(self):
        return '{}, {}'.format(self.lastname, self.firstname)

We need to register the model in myapi > admin.py:


from django.contrib import admin
from . models import approvals

# Register your models here.
admin.site.register(approvals)

Add serializer.py in myapi:

from rest_framework import serializers
from . models import approvals

class approvalsSerializers(serializers.ModelSerializer):
	class Meta:
		model=approvals
		fields='__all__'

2.3 Creating views

Views are where our internal functionality / endpoints are defined.


import pickle

import joblib
import pandas as pd
from django.contrib import messages
from django.shortcuts import render

from .forms import ApprovalForm  # assumes ApprovalForm lives in myapi/forms.py (not shown in this post)


def approvereject(test):
    try:
        test['LoanAmount'] = test['LoanAmount'].astype('int')
        test = test.reset_index(drop=True)
        
        # one-hot encode the categorical fields with the encoder fitted at training time
        with open("./myapi/pklfiles/ohe_rev00.sav", 'rb') as f:
            encoder = pickle.load(f)
        encoded_data = encoder.transform(test[['Gender','Married','Education','Self_Employed','Property_Area']])
        encoded_df = pd.DataFrame(encoded_data, columns = ['Gender_Female', 'Gender_Male', \
        'Married_No', 'Married_Yes', \
        'Education_Graduate', 'Education_Not Graduate', \
        'Self_Employed_No', 'Self_Employed_Yes', \
        'Property_Area_Rural', 'Property_Area_Semiurban', 'Property_Area_Urban'])
        
        test_x = pd.concat([test[['Dependents','ApplicantIncome','CoapplicantIncome','LoanAmount','Loan_Amount_Term','Credit_History']], 
                    encoded_df], axis = 1)
        
        # scale with the training-time min/max; transform only, never refit on a single request
        scaler_filename = "./myapi/pklfiles/scaler_rev00.sav"
        scaler = joblib.load(scaler_filename)
        test_x = scaler.transform(test_x)
        
        filename = './myapi/pklfiles/svm_rev00.sav'
        with open(filename, 'rb') as f:
            loaded_model = pickle.load(f)
        y_pred = loaded_model.predict(test_x)
        
        if y_pred[0] == 1:
            print('Approved')
            return 'Approved'
        elif y_pred[0] == 0:
            print('Rejected')
            return 'Rejected'
        
    except ValueError as e:
        return (e.args[0])
        

def cxcontact(request):
    if request.method=='POST':
        form=ApprovalForm(request.POST)
        if form.is_valid():
            Firstname = form.cleaned_data['firstname']
            Lastname = form.cleaned_data['lastname']
            Dependents = form.cleaned_data['Dependents']
            ApplicantIncome = form.cleaned_data['ApplicantIncome']
            CoapplicantIncome = form.cleaned_data['CoapplicantIncome']
            LoanAmount = form.cleaned_data['LoanAmount']
            Loan_Amount_Term = form.cleaned_data['Loan_Amount_Term']
            Credit_History = form.cleaned_data['Credit_History']
            Gender = form.cleaned_data['Gender']
            Married = form.cleaned_data['Married']
            Education = form.cleaned_data['Education']
            Self_Employed = form.cleaned_data['Self_Employed']
            Property_Area = form.cleaned_data['Property_Area']
            # build a one-row DataFrame from the submitted fields
            myDict = (request.POST).dict()
            df=pd.DataFrame(myDict, index=[0])
            
            print('######' , df)
            
            answer=approvereject(df)
            
            # read the single submitted value explicitly instead of calling int() on a Series
            if int(df['LoanAmount'].iloc[0])<25000:
                messages.success(request,'Application Status: {}'.format(answer))
            else:
                messages.success(request,'Invalid: Your Loan Request Exceeds the $25,000 Limit')

    form=ApprovalForm()

    return render(request, 'myform/cxform.html', {'form':form})
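
The step that turns the submitted form into model input is the `request.POST` to DataFrame conversion. A minimal sketch of that step outside Django follows, assuming a plain dict of strings in place of `request.POST`. `post_to_frame`, `within_limit`, `NUMERIC_FIELDS` and the sample values are hypothetical names and data for illustration, not part of the original views.py.

```python
import pandas as pd

NUMERIC_FIELDS = ["Dependents", "ApplicantIncome", "CoapplicantIncome",
                  "LoanAmount", "Loan_Amount_Term", "Credit_History"]
LOAN_LIMIT = 25000  # same limit as the check in cxcontact

def post_to_frame(post_data):
    """Build a one-row DataFrame from a flat dict of submitted strings."""
    # Django forms rendered with {% csrf_token %} also submit this hidden field
    row = {k: v for k, v in post_data.items() if k != "csrfmiddlewaretoken"}
    frame = pd.DataFrame(row, index=[0])
    # Form values arrive as strings; cast the numeric ones explicitly
    return frame.astype({name: int for name in NUMERIC_FIELDS})

def within_limit(frame):
    """True when the requested amount is below the loan limit."""
    return int(frame["LoanAmount"].iloc[0]) < LOAN_LIMIT

# Illustrative submission (values are made up)
sample_post = {
    "csrfmiddlewaretoken": "dummy-token",
    "firstname": "Jane", "lastname": "Doe",
    "Dependents": "1", "ApplicantIncome": "5000", "CoapplicantIncome": "0",
    "LoanAmount": "12000", "Loan_Amount_Term": "360", "Credit_History": "1",
    "Gender": "Female", "Married": "No", "Education": "Graduate",
    "Self_Employed": "No", "Property_Area": "Urban",
}

frame = post_to_frame(sample_post)
print(frame.dtypes)
print(within_limit(frame))
```

Casting up front keeps the numeric columns as integers by the time they reach the scaler, and dropping the CSRF token keeps the frame limited to the fields the model knows about.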

Update the urls in myapi > urls.py (assign the view we just created to a url in the endpoint).


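from django.urls import path, include
from . import views

# router is the rest_framework router set up as in the tutorial (its definition is not shown here)
# note: approvereject as shown above expects a DataFrame, so the status/ route
# only responds once it is wrapped in a view that accepts a request
urlpatterns = [
    path('api/', include(router.urls)),
    path('status/', views.approvereject),
    path('form/', views.cxcontact, name='cxform'),
]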

2.4 Creating forms (Front End of the app) using crispy forms

As much fun as HTML coding can be, I am using the tutorial's HTML script as is. Here, a nice front end is created using crispy-forms (installed with pip install django-crispy-forms); I have shamelessly changed the images used from static files. This also warrants a change in settings.py > INSTALLED_APPS:


INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'rest_framework',
    'myapi',
    'crispy_forms',
]

Folder structure is important: create myapi > templates > myform > cxform.html.

Source code here and here.

Test the app on the local server by running python manage.py runserver in cmd; the app will be served at 127.0.0.1:8000/.

We will see all the endpoints we have configured in urls.py.

Our model view is on /form/, so we open 127.0.0.1:8000/form/.

After we submit the loan application, the status in this case is rejected, as inferred by our model.

3.1 Deploying on pythonanywhere

The basic structure is ready. To deploy it on pythonanywhere, I started following the tutorial. However, I ran into the quota limit, as my virtual environment alone took more than the 516 MB of allowed space. With miniconda or a trimmed-down set of dependencies, pythonanywhere may still be an option. For now, I am using Heroku.

3.2 Deploying on Heroku

I followed the tutorial. Even though the app and setup are different, at a high level I could use the changes it recommends to make the app production ready.

3.2.1 Download heroku and heroku cli.

Follow the instructions at https://devcenter.heroku.com/articles/heroku-cli and create a Heroku account.

3.2.2 Login and repo setup

In cmd, run


heroku login #use browser popup to login
cd /path/where/manage.py/is/
git add .
git commit -m "first push"
heroku create #create app #note the name, in this case it's glacial-temple-61070
heroku git:remote -a glacial-temple-61070
pip install gunicorn #required to run heroku local server
gunicorn folder_name.wsgi #allows to run app #in this case folder_name is djangoapi

Create a Procfile containing the deploy command:

web: gunicorn djangoapi.wsgi

Again in cmd:


pip freeze > requirements.txt #to capture gunicorn installation in our requirements.txt
git add .
git commit -m "updated requirements.txt and added Procfile"
git push heroku master #first push; it will fail, but the source code will be uploaded. We still need to link the database.

3.2.3 PostgreSQL

On the Heroku portal, add a PostgreSQL add-on on the app page.

Get the config vars for this database.

In settings.py, add 3 things:


#1 database config , notice the config_var string, 
#postgres://kykfhxgomsizc:ac0aa275e38e7ab6f41348b90c446a6d1c65096ee0c362e07b16757e118806@ec2-54-211-176-156.compute-1.amazonaws.com:5432/dd0qa1g99ok65
#break it into NAME (of database), HOST, PORT, USER, and PASSWORD
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'dd0qa1g99ok65',
        'HOST': 'ec2-54-211-176-156.compute-1.amazonaws.com',
        'PORT':5432,
        'USER':'kykfhxgomsizc',
        'PASSWORD':'ac0aa275e38e7ab6f41348b90c446a6d1c65096ee0c362e07b16757e118806'
    }
}

#2. add allowed host
ALLOWED_HOSTS = ['glacial-temple-61070.herokuapp.com','127.0.0.1']

#3. for static files
import os
STATIC_URL = '/static/'
STATIC_ROOT = os.path.join(BASE_DIR, 'staticfiles')
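
Breaking the connection string into NAME, HOST, PORT, USER and PASSWORD by hand can also be done in code. This is a minimal sketch using only the standard library. `database_settings` is a hypothetical helper, and the URL is a made-up example rather than a real credential.

```python
from urllib.parse import urlparse

def database_settings(url):
    """Split a postgres://USER:PASSWORD@HOST:PORT/NAME string into Django settings."""
    parts = urlparse(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),   # the path is "/<database name>"
        "HOST": parts.hostname,
        "PORT": parts.port,               # parsed as an int
        "USER": parts.username,
        "PASSWORD": parts.password,
    }

# Made-up connection string for illustration
example_url = "postgres://demo_user:demo_pass@db.example.com:5432/demo_db"
config = database_settings(example_url)
print(config)
```

Heroku exposes the same string to the running app as the `DATABASE_URL` config var, so in settings.py this could be used as `DATABASES = {'default': database_settings(os.environ['DATABASE_URL'])}`, which keeps the password out of the committed source.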

One final push


git add .
git commit -m "updated settings.py"
git push heroku master

Finally