Full-stack Data Science: Building & deploying an ML app tutorial — Part 1

Data Scientists NEED to learn to package and deploy their own models.

I’m not gatekeeping here; I’m giving you facts. I interview, hire, and lead data professionals, whether they’re Data Scientists, Data Analysts, Machine Learning Engineers, or Data Engineers. Packaging and deploying models is a consistent gap for people without a software engineering background.

That’s why in this article and video I’ll show you a rapid deployment of a Natural Language Processing (NLP) app, from start to finish. I’m not worrying about developing the model because modeling isn’t the gap I see in the market.

Alright, let’s get started.

Setting up the project environment

I used PyCharm as my IDE, as you can see in the video above, but you should be fine with any Python IDE.

To get started, we’re going to open our terminal and run the following command to create the application directory:

Now ‘cd’ into the directory.

Create the Poetry project by running:

Note: Alternatively, you could run ‘poetry new ner-service’, which starts a structured Poetry project for you.

After you run the ‘poetry init’ command, you’ll go through a project setup that looks something like this:

Now that the Poetry project is set up, let’s launch the Poetry shell.

If you’ve done everything properly to this point, you should see something like this in your terminal:

Setting up the project file structure

Alright, now that we have our environment up and running, let’s set up our project structure.

The first thing we’ll want to do is create our ‘src’ directory. To do that we’ll run:

Then we’ll want to create the __init__.py and main.py files.
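If you’d rather script this step (handy on Windows, where ‘touch’ isn’t available by default), here’s a small pathlib sketch that creates the same files. It assumes both files go inside src, and the function name create_src_layout is just for illustration.

```python
from pathlib import Path
from typing import List


def create_src_layout(project_root: Path) -> List[Path]:
    """Create src/, src/__init__.py and src/main.py under project_root.

    Safe to re-run: existing directories are kept, and existing files keep their contents.
    """
    src_dir = project_root / "src"
    src_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for name in ("__init__.py", "main.py"):
        path = src_dir / name
        # Creates an empty file if missing; an existing file only gets its timestamp updated.
        path.touch(exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    # Run from the project root (the ner-service directory).
    for path in create_src_layout(Path.cwd()):
        print(path)
```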

At this point, your project structure should look like this:

spaCy’s Named Entity Recognition (NER)

When I first put this article and video guide together, I didn’t spend much time on NER or spaCy, but based on responses from early viewers of the video, that was a mistake. So let’s talk quickly about spaCy.

Background Information: spaCy & NER

spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython.

Wikipedia

spaCy is a fantastic library that simplifies building and developing NLP solutions. In this project, for the sake of simplicity, we’re using spaCy’s built-in named-entity recognition (NER) feature.

In short, NER finds named entities in unstructured text and classifies them into predefined categories such as person names, organizations, and locations. If you’d like to learn more about NER, you can check out the screenshot below or see the link in the image description.

Screenshot from Wikipedia definition of named-entity recognition (link)

Implementing spaCy’s NER system

To get the project moving and keep things iterative, we’re going to use the example code from spaCy’s Named Entity Recognition 101 as our boilerplate.

Now, to use spaCy, we’ll need to add it to our environment. You can do that by running the following command:

Your terminal should look something like this afterward:

Then you’ll want to add the language model.

Then, just to make sure everything installed properly, run:
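If you’d also like to check the install from Python, a couple of lines will do it. This assumes the model you added is en_core_web_sm; swap in whichever model name you used.

```python
import spacy
from spacy.util import is_package

# Confirm spaCy imports and report its version.
print("spaCy version:", spacy.__version__)

# is_package() reports whether a pipeline package is installed in this environment.
# en_core_web_sm is an assumption about which model was added in the previous step.
model_installed = is_package("en_core_web_sm")
print("en_core_web_sm installed:", model_installed)
```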

At this point, your pyproject.toml file should look something like this:

Now, let’s quickly test that the code works in our environment. Run the following command:

If everything is running as expected in your environment, you should get output like this:

Set up the API with FastAPI

Now that we have Poetry set up and spaCy working in our environment, let’s set up our API.

Before we jump in, I’m going to introduce FastAPI. Feel free to skip to the Implementing FastAPI section.

What is FastAPI?

According to the FastAPI website:

FastAPI is a modern, fast (high-performance), web framework for building APIs with Python 3.6+ based on standard Python type hints.

https://fastapi.tiangolo.com/

I don’t want to be too lazy about this, but that quote pretty much sums it up. FastAPI is fast, easy, clean, and extensible.

Why use FastAPI?

There are a ton of reasons to use FastAPI, but here are a few reasons my team and I at Aptive Resources switched from Flask to FastAPI for most of our Python services.

  • FastAPI comes with Swagger docs built-in. This is awesome for rapid prototyping and testing your API.
  • Clear, concise documentation and examples.
  • Extensibility.
  • Speed, speed, speed to production.

Implementing FastAPI

Ok, so now that we generally know what FastAPI is and why to use it, let’s add it to our project with the following command:

That command will install FastAPI and Uvicorn. According to the documentation, “Uvicorn is a lightning-fast ASGI server implementation, using uvloop and httptools.” For our use case, Uvicorn is the server that runs our FastAPI app and handles incoming HTTP requests, which is how we serve our app to the world.

Now that FastAPI and Uvicorn are installed, let’s go back to main.py and implement FastAPI.

That’s a lot of new code added to the file, so let’s go through it piece by piece.

This part simply instantiates the FastAPI application.

This sets our route or path. For example, www.mktr.ai/ner-service would have the above path if this were a service we ran from the MKTR.AI website.

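Declaring path operation functions with async def is super helpful, and I suggest you take a look at FastAPI’s documentation on concurrency and async/await to learn more. In short, async def handlers run on the event loop, so they shine when they await I/O; for CPU-heavy work like spaCy inference, the docs explain when a plain def handler (which FastAPI runs in a threadpool) is the better fit.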

Notice the ‘payload: Payload’ parameter. It tells the application what type of data to expect in the request body, and FastAPI validates incoming requests against it. We’ll get to that in a minute when we make a models.py file. For now, think of it as a way to define the shape of the data we’ll accept in a request to our API.

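Here we’re using a list comprehension to run each piece of text passed to our API through the spaCy pipeline, which tokenizes it and applies components such as the NER model, producing one Doc object per text. The List[spacy.tokens.doc.Doc] portion is a type hint declaring the type of the data we’re assigning to the tokenize_content variable; Python doesn’t enforce it at runtime, but it documents intent for readers and tools. This may be a little redundant, but it becomes more important as you account for edge cases and potential issues in production.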

Here we’re creating a list, document_entities, using a list comprehension that builds a dictionary holding the text and entity type of each entity found in the text passed to the API. The result, document_entities, is a list of dictionaries.

Finally, we format our response object. Based on the previous chunks, you can probably tell what’s going on here: the Entities() call hydrates (populates) an Entities object for each text string passed in.

Ok, now that the main.py file is good to go, we’re going to create another Python file named models.py like this:

Then let’s add the following code to models.py:

Content is a single object with a post_url string and a content string, and Payload is a list of Content objects.

The same pattern goes for the response side: SingleEntity describes one entity (its text and entity type), and Entities is a list of SingleEntity objects.

Test your FastAPI app locally

Now that you have the baseline code written, let’s test it out by running the following command:

After you run the above code, your terminal should look something like this:

If everything is going as planned, you should be able to visit http://127.0.0.1:8000/docs and check out your FastAPI app and its Swagger docs.

In closing…

Alright, alright, alright! You have a FastAPI app running locally. Congrats!

In the next post, we’ll containerize our app using Docker, push the Docker image to Docker Hub, set up a GCP virtual machine, and run our app so the world can use it.

If you’re impatient, you can always cut to the chase and watch the original YouTube video for this project or check out the GitHub repo.

I hope this was helpful! If you run into issues or have any questions, drop a comment, shoot me an email, or connect with me on LinkedIn and I’ll get you squared away.

This article was originally posted on MKTR.AI.
