A full-stack React/JavaScript and Python/Django web application that recognizes handwriting and converts it into text by combining multiple machine learning models that were pre-trained on the EMNIST Dataset from Kaggle. These neural network models recognize all digits, all uppercase letters, and all lowercase letters that are visibly different from their uppercase counterparts.
The models were trained on the following characters: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghnqrt
To account for the "left out" lowercase letters that look like their uppercase complement, the final prediction for such a character is converted into lowercase if the character is drawn at less than half the height of the canvas. The "tall" lowercase characters, `klpy`, are converted into lowercase if their height is less than 70% of the canvas height.
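The height rule above can be sketched as a small function. This is a minimal illustration rather than the application's actual code: the names `apply_casing`, `MAPPING`, and `TALL_LOWERCASE` are hypothetical, and it assumes the character height and the canvas height are measured in the same pixel units.

```python
# Characters the models were trained on (taken from the list above).
MAPPING = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghnqrt"
# "Tall" lowercase letters that get the looser 70% threshold.
TALL_LOWERCASE = "klpy"


def apply_casing(char, char_height, canvas_height):
    """Lowercase an uppercase prediction when it was drawn small (hypothetical helper)."""
    if not (char.isalpha() and char.isupper()):
        return char  # digits and lowercase predictions are left alone
    lower = char.lower()
    if lower in MAPPING:
        return char  # the models can already predict this lowercase letter
    threshold = 0.7 if lower in TALL_LOWERCASE else 0.5
    if char_height < threshold * canvas_height:
        return lower
    return char
```

For example, on a 100-pixel canvas an `O` drawn 40 pixels high would come back as `o`, while a `Y` would only stay uppercase once it reaches 70 pixels.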
The best single model used in this application is more accurate than the other models created by Kaggle users who use TensorFlow/Keras. Beyond that, when this model, a similar model, and 3 other sub-optimal models (sub-optimal because of Heroku limitations) are combined, accuracy increases by another 0.5%.
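One plausible way to picture the combination of models is a majority vote over the class index that each model returns. The sketch below assumes the trained character string doubles as the index-to-character mapping; `vote` is a hypothetical helper, and the tie-breaking behaviour (the earliest model wins) is an assumption, since this README does not specify one.

```python
from collections import Counter

# Index-to-character mapping: position i is the character for class index i.
MAPPING = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabdefghnqrt"


def vote(class_indices):
    """Return the most popular character among the models' predictions.

    class_indices holds one integer in the range 0..46 per model.
    On a tie, the character seen first in the list wins (an assumption).
    """
    chars = [MAPPING[i] for i in class_indices]
    return Counter(chars).most_common(1)[0][0]
```

With five models returning `[17, 17, 17, 1, 18]`, three of them agree on index 17, so the vote yields `H`.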
The Jupyter Notebook inside this repo describes how the neural network models were created for this web application. It goes step by step: from acquiring the outside dataset for learning to Heroku deployment.
- The following characters can be predicted from handwriting: `0-9, a-z, A-Z` (62 characters)
- Characters can be placed anywhere on the canvas, provided that each character has some horizontal space between it and other characters
- Whole sentences can be created
- "Broken" and "messy" letters can be detected with pretty good accuracy
- React as the frontend
Website: Live Heroku App
- Example: A user writes and submits the handwriting, "Hey you", on the client.
- The frontend takes the image data found in the canvas element and converts it into a binary blob.
- The blob is sent as a `POST` request to Django.
- The image is saved in Django and the filepath is loaded into `cv2`.
- The entire "Hey you" image is trimmed of excess pixels.
- "Hey you" is cut up at each character, giving us the 6 images "H", "e", "y", "y", "o", "u".
  - Images are cut where the drawing lines are not continuous in the x-direction and the gap is of a decent size. Small gaps are left alone.
  - The algorithm will notice a very large gap in the x-direction between the two "y" letters, which is taken to be a space between words. We store this knowledge in the variable `space_location`.
- Each image is trimmed of excess pixels. The height of each "raw" image is recorded in the variable `char_img_heights`.
- Each image is padded with extra pixels so that it becomes a square. This keeps the image from being warped when it is resized down during data normalization.
- Each image is normalized. Each image is converted to a numpy array and reshaped, and the pixel values are scaled to range from 0 to 1 instead of 0 to 255.
- We loop through all of these images, and each model makes a prediction for each image. The most popular prediction among the models is added to the final result, `final_prediction`.
  - Each model's prediction for an image is a number from `0` through `46`, which corresponds to the index of the 47 characters that each model was trained on. (Ex: an output of `17` corresponds to `H` in the mapping.)
  - The prediction of each model is mapped to its character and compared with the rest of the model group.
  - The most popular prediction among the models in the group becomes the final prediction.
  - If the final prediction is alphabetical, we check whether its lowercase complement is found in the mapping. If it is not, we have a letter whose lowercase and uppercase forms look alike and differ only in size, so we decide the output casing from the size of the image, which we get from `char_img_heights`. This decision is made for the images "y", "y", "o" and "u". The letter "y" gets a special constraint because its height is larger than that of the average lowercase letter.
  - While iterating, if the number of loop iterations equals a number inside `space_location`, a `" "` is appended to the final result. In this example, `space_location` will be `[2]`, signaling that there is a space after the first "y", which gives us `"Hey "` at the end of the first "y" iteration.
- Django responds to React with `final_prediction`, which is `"Hey you"`, and React displays the result on the client.
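The image-preparation steps in this walkthrough can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code used by the app: the function names and the gap thresholds `min_gap` and `space_gap` are hypothetical, the input is assumed to be a 2-D `uint8` array where ink pixels are non-zero, and loading with `cv2`, resizing, and reshaping for the models are left out.

```python
import numpy as np


def split_columns(ink, min_gap=2, space_gap=8):
    """Find character column ranges and word spaces in a boolean ink mask.

    Returns (spans, space_location): spans is a list of (start, end) column
    ranges, and space_location lists the indices of characters that are
    followed by a space. Both thresholds are assumed values.
    """
    cols = np.flatnonzero(ink.any(axis=0))  # columns that contain ink
    if cols.size == 0:
        return [], []
    spans, space_location = [], []
    start = prev = int(cols[0])
    for col in cols[1:]:
        col = int(col)
        gap = col - prev - 1  # blank columns between two ink columns
        if gap >= min_gap:  # wide enough to separate two characters
            spans.append((start, prev + 1))
            if gap >= space_gap:  # very wide gap: treat it as a word space
                space_location.append(len(spans) - 1)
            start = col
        prev = col
    spans.append((start, prev + 1))
    return spans, space_location


def pad_to_square(img):
    """Zero-pad the shorter side so that later resizing does not warp the image."""
    h, w = img.shape
    size = max(h, w)
    top, left = (size - h) // 2, (size - w) // 2
    return np.pad(img, ((top, size - h - top), (left, size - w - left)))


def prepare_characters(img):
    """Cut, trim, pad, and scale each character of a grayscale canvas image."""
    spans, space_location = split_columns(img > 0)
    chars, char_img_heights = [], []
    for start, end in spans:
        piece = img[:, start:end]
        rows = np.flatnonzero((piece > 0).any(axis=1))
        piece = piece[rows[0]:rows[-1] + 1]  # trim blank rows above and below
        char_img_heights.append(int(piece.shape[0]))
        chars.append(pad_to_square(piece).astype("float32") / 255.0)
    return chars, char_img_heights, space_location


def assemble_text(predicted_chars, space_location):
    """Join per-character predictions, adding a space after the listed indices."""
    out = []
    for i, ch in enumerate(predicted_chars):
        out.append(ch)
        if i in space_location:
            out.append(" ")
    return "".join(out)
```

Under these assumptions, `assemble_text("Heyyou", [2])` returns `"Hey you"`, matching the example in the walkthrough.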
After a prediction has been decided by the neural network, I personally try to be as hands-off as possible when it comes to manipulating these results.
The current prediction manipulations I use are:
- Convert a character to lowercase if the letter is drawn small and the lowercase complement of the prediction is not found in the EMNIST dataset.
  - The reason for this manipulation is to have access to all lowercase letters as predictions.
- If a prediction is `0` and the character is drawn quite small, the prediction is changed to a lowercase `o`.
  - The reason for this manipulation is so that a small `0` will be read as an `o`, much like the manipulation of uppercase `O`.
  - At this time, I am still hesitant on keeping this manipulation.
I left in commented-out code where, if either `0` or `O` is predicted, the final prediction depends on the height/width ratio of the character image. If a user writes a fat circle, the result will be a capital or lowercase `O`; if a user writes a narrow circle, the result will be the number `0`.
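As a rough sketch of that ratio idea (not the commented-out code itself), the check could look like the following. The function name `resolve_zero_or_o` and the cutoff `max_o_ratio` are hypothetical; a sensible cutoff would have to be tuned on real drawings.

```python
def resolve_zero_or_o(char, height, width, max_o_ratio=1.3):
    """Choose between "0" and "O" from the shape of the drawn character.

    A roughly round ("fat") shape is read as the letter O, and a narrow one
    as the digit 0. Other predictions pass through unchanged. The cutoff
    max_o_ratio is an assumed value, not one taken from the project.
    """
    if char not in ("0", "O") or width <= 0:
        return char
    return "O" if height / width <= max_o_ratio else "0"
```

The result would still go through the usual height-based casing step afterwards, so a small fat circle could end up as a lowercase `o`.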
For determining "i" vs "I" (another issue with the EMNIST dataset), one could write some code during the `cv` portion that determines whether a character has a hovering dot. One could then make a better height estimate for casing by taking the total character height and subtracting the space between the dot and the base of the "i".
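A hovering dot could be detected by looking for a single blank band of rows inside a character image. The sketch below is one possible reading of that idea, not something the project implements: `find_hovering_dot` is a hypothetical helper, it takes a boolean ink mask for one character, and it treats "total height minus the blank band" as the adjusted height.

```python
import numpy as np


def find_hovering_dot(ink):
    """Return (has_dot, adjusted_height) for a boolean ink mask of one character.

    A dot is assumed when exactly one blank band of rows splits the ink into
    a shorter upper part (the dot) and a taller lower part (the stem).
    """
    rows = np.flatnonzero(ink.any(axis=1))  # rows that contain ink
    if rows.size < 2:
        return False, int(rows.size)
    total_height = int(rows[-1] - rows[0] + 1)
    breaks = np.flatnonzero(np.diff(rows) > 1)  # positions just before a blank band
    if breaks.size != 1:
        return False, total_height
    b = int(breaks[0])
    dot_height = int(rows[b] - rows[0] + 1)
    gap = int(rows[b + 1] - rows[b] - 1)
    stem_height = int(rows[-1] - rows[b + 1] + 1)
    if dot_height >= stem_height:
        return False, total_height
    return True, total_height - gap
```

The adjusted height could then be used in place of the raw height when deciding between "i" and "I".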
- Clone the repo: `git clone https://github.com/MikeM711/Deep-Learning-Handwriting-Recognition.git`
- Go into the root directory: `cd Deep-Learning-Handwriting-Recognition`
- Install npm packages for React: `npm install`
- Make sure you have pipenv installed via pip: `sudo -H pip install pipenv`
- Create a shell inside a virtual environment, at the address of your root: `pipenv shell`
- Install packages for Django while inside your virtual environment: `pip install -r requirements.txt`
- Run the frontend server: `npm start`
- Run the backend server within your virtual environment: `python manage.py runserver`
Troubleshooting
- Q: "How do I know that I am in my virtual environment?"
- A: In your terminal tab, you will notice that the name of the folder is in parentheses. It should look like `(Deep-Learning-Handwriting-Recognition)...`