Building an Image Captioning API with FastAPI and Hugging Face Transformers packaged with Docker

17 Apr

In this blog post, we’ll embark on an exciting journey of building an Image Captioning API using FastAPI and Hugging Face Transformers. Image captioning is a fascinating task that involves generating textual descriptions for given images. By leveraging the power of deep learning and natural language processing, we can create a system that automatically understands the content of an image and generates human-like captions. In the example below, I input an image of a rider on a bike in a garage, and the generated caption describes the scene accurately.

Project Overview

πŸ‘¨β€πŸ’» GitHub: https://github.com/askaresh/blip-image-captioning-api

The goal of this project is to develop a RESTful API that accepts an image as input and returns a generated caption describing the image. We’ll be using FastAPI, a modern and fast web framework for building APIs, along with Hugging Face Transformers, a popular library for natural language processing tasks.

The key components of our project include:

  1. FastAPI: A web framework for building efficient and scalable APIs in Python.
  2. Hugging Face Transformers: A library that provides state-of-the-art pre-trained models for NLP and vision-language tasks, including image captioning.
  3. Docker: A containerization platform that allows us to package our application and its dependencies into a portable and reproducible environment.

Implementation Details

To build our Image Captioning API, we started by setting up a FastAPI project and defining the necessary endpoints. The main endpoint accepts an image file and an optional text input for conditional image captioning.

We utilized the pre-trained BLIP (Bootstrapping Language-Image Pre-training) model from Hugging Face Transformers for image captioning. BLIP is a powerful model that has been trained on a large dataset of image-caption pairs and achieves impressive results in generating accurate and coherent captions.
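As a rough illustration of how BLIP captioning is typically driven through Transformers, here is a minimal sketch. The checkpoint name is an assumption (the project reads its model name from the blip_model_name setting, and the actual value is not shown in this post), and the function signatures are illustrative rather than a copy of the repository's app/model.py.

```python
from typing import Any, Optional, Tuple

# Assumed checkpoint; the project takes the real name from its blip_model_name setting.
DEFAULT_MODEL_NAME = "Salesforce/blip-image-captioning-base"


def load_model(model_name: str = DEFAULT_MODEL_NAME) -> Tuple[Any, Any]:
    """Load the BLIP processor and model once, for example at application startup."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(model_name)
    model = BlipForConditionalGeneration.from_pretrained(model_name)
    return processor, model


def generate_caption(image: Any, processor: Any, model: Any,
                     text: Optional[str] = None) -> str:
    """Caption a PIL image; an optional text prompt conditions the caption."""
    if text:
        # Conditional captioning: the caption continues the given prompt.
        inputs = processor(image, text, return_tensors="pt")
    else:
        # Unconditional captioning: the model describes the image freely.
        inputs = processor(image, return_tensors="pt")
    output_ids = model.generate(**inputs)
    return processor.decode(output_ids[0], skip_special_tokens=True)
```

Passing the processor and model in as arguments keeps the expensive load out of the request path, so each API call only pays for preprocessing and generation.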

To ensure a smooth development experience and the ability to run on any cloud, I containerized the application using Docker. This allowed me to encapsulate all the dependencies, including Python libraries and the pre-trained model, into a portable and reproducible environment.

HF-IT-DOCKER/
│
├── app/
│   ├── config.py
│   ├── main.py
│   ├── model.py
│   └── utils.py
│
├── .dockerignore
├── .gitignore
├── compose.yaml
├── Dockerfile
├── logging.conf
├── README.Docker.md
└── requirements.txt

Detailed description of each file:

  • app/config.py:
    • This file contains the configuration settings for the application.
    • It defines a Settings class using the pydantic_settings library to store and manage application-specific settings.
    • The blip_model_name setting specifies the name of the BLIP model to be used for image captioning.
  • app/main.py:
    • This is the main entry point of the FastAPI application.
    • It sets up the FastAPI app, loads the BLIP model, and configures logging.
    • It defines the API endpoints, including the root path (“/”) and the image captioning endpoint (“/caption”).
    • The “/caption” endpoint accepts an image file and an optional text input, processes the image, generates a caption using the BLIP model, and returns the generated caption.
  • app/model.py:
    • This file contains the functions related to loading and using the BLIP model for image captioning.
    • The load_model function loads the pre-trained BLIP model and processor based on the specified model name.
    • The generate_caption function takes an image and optional text input, preprocesses the inputs, and generates a caption using the loaded BLIP model.
  • app/utils.py:
    • This file contains utility functions used in the project.
    • The load_image_from_file function reads an image file and converts it to the appropriate format (RGB) using the PIL library.
  • .dockerignore:
    • This file specifies the files and directories that should be excluded when building the Docker image.
    • It helps to reduce the size of the Docker image by excluding unnecessary files and directories.
  • .gitignore:
    • This file specifies the files and directories that should be ignored by Git version control.
    • It helps to keep the repository clean by excluding files that are not necessary to track, such as generated files, cache files, and environment-specific files.
  • compose.yaml:
    • This file contains the configuration for Docker Compose, which is used to define and run multi-container Docker applications.
    • It defines the services, including the FastAPI server, and specifies the build context, ports, and any necessary dependencies.
  • Dockerfile:
    • This file contains the instructions for building the Docker image for the FastAPI application.
    • It specifies the base image, sets up the working directory, installs dependencies, copies the application code, and defines the entry point for running the application.
  • logging.conf:
    • This file contains the configuration for the Python logging system.
    • It defines the loggers, handlers, formatters, and their respective settings.
    • It specifies the log levels, log file paths, and log message formats.
  • README.Docker.md:
    • This file provides documentation and instructions specific to running the application using Docker.
    • It may include information on how to build the Docker image, run the container, and any other Docker-related details.
  • requirements.txt:
    • This file lists the Python dependencies required by the application.
    • It includes the necessary libraries and their versions, such as FastAPI, Hugging Face Transformers, PIL, and others.
    • It is used by pip to install the required packages when building the Docker image or setting up the development environment.

Lessons Learned and Debugging

Throughout the development process, I encountered several challenges and learned valuable lessons:

  1. Dependency Management: Managing dependencies can be tricky, especially when working with large pre-trained models. We learned the importance of properly specifying dependencies in our requirements file and using Docker to ensure consistent environments across different systems.
  2. Debugging Permission Issues: We encountered permission-related issues when running our application inside a Docker container. Through debugging, we learned the significance of properly setting file and directory permissions and running the container as a non-root user to enhance security.
  3. Logging Configuration: Proper logging is crucial for understanding the behavior of our application and troubleshooting issues. I learned how to configure logging using a configuration file and ensure that log files are written to directories with appropriate permissions.
  4. Testing and Error Handling: Comprehensive testing and error handling are essential for building a robust API. We implemented thorough error handling to provide meaningful error messages to API users and conducted extensive testing to ensure the reliability of our image captioning functionality.
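To make the logging lesson concrete, here is a minimal sketch of loading a file-style logging configuration and checking that the log directory is writable first. The configuration text, the app.log file name, and the configure_logging helper are hypothetical; the repository's actual logging.conf may look different.

```python
import io
import logging
import logging.config
import os

# Hypothetical logging.conf contents; the real file in the repository may differ.
LOGGING_CONF = """
[loggers]
keys=root

[handlers]
keys=console,file

[formatters]
keys=simple

[logger_root]
level=INFO
handlers=console,file

[handler_console]
class=StreamHandler
level=INFO
formatter=simple
args=(sys.stdout,)

[handler_file]
class=FileHandler
level=INFO
formatter=simple
args=(r'%(logfile)s', 'a')

[formatter_simple]
format=%(asctime)s %(levelname)s %(name)s: %(message)s
"""


def configure_logging(log_dir: str) -> str:
    """Create log_dir if needed, load the config, and return the log file path."""
    os.makedirs(log_dir, exist_ok=True)
    # Fail early with a clear message, e.g. when running as a non-root user.
    if not os.access(log_dir, os.W_OK):
        raise PermissionError(f"Log directory is not writable: {log_dir}")
    log_path = os.path.join(log_dir, "app.log")
    logging.config.fileConfig(
        io.StringIO(LOGGING_CONF),
        defaults={"logfile": log_path},  # fills the %(logfile)s placeholder
        disable_existing_loggers=False,
    )
    return log_path
```

Checking write access before configuring the handlers turns an obscure startup failure inside the container into an explicit error that names the directory.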

Validation of the API

After the container is up and running, go to http://localhost:8004/docs, expand the POST method, and click "Try it out". Upload any image of your choice, optionally enter the text, and click Execute. The generated caption appears in the response output below.
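The same check can be scripted. The sketch below posts an image to the "/caption" endpoint using only the standard library. The form field names "file" and "text" and the JSON response are assumptions; confirm them on the /docs page before relying on this. The third-party requests library would make the call shorter.

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from typing import Optional, Tuple

# Port 8004 and the "/caption" path come from this post.
API_URL = "http://localhost:8004/caption"


def build_multipart(filename: str, data: bytes,
                    text: Optional[str] = None) -> Tuple[bytes, str]:
    """Encode an image and an optional prompt as multipart/form-data."""
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    if text is not None:
        # "text" is an assumed field name for the optional prompt.
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="text"'
            f"\r\n\r\n{text}\r\n".encode("utf-8")
        )
    # "file" is an assumed field name for the uploaded image.
    header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: {content_type}\r\n\r\n'
    )
    parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def request_caption(image_path: str, text: Optional[str] = None,
                    url: str = API_URL) -> dict:
    """POST the image to the API and return the parsed JSON response."""
    with open(image_path, "rb") as f:
        body, content_type = build_multipart(
            os.path.basename(image_path), f.read(), text
        )
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, a call such as request_caption("bike.jpg", text="a photo of") should return the same payload that the Swagger page displays.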

Conclusion

Building an Image Captioning API with FastAPI and Hugging Face Transformers has been an incredible learning experience. By leveraging the power of pre-trained models and containerization, I created a scalable and efficient solution for generating image captions automatically.

Through this project, I gained valuable insights into dependency management, debugging permission issues, logging configuration, and the importance of testing and error handling. These lessons will undoubtedly be applicable to future projects and contribute to our growth as developers.

I hope that this blog post has provided you with a comprehensive overview of our Image Captioning API project and inspired you to explore the fascinating world of image captioning and natural language processing. Feel free to reach out with any questions or suggestions, and happy captioning!

Thanks,
Aresh Sarkari

Extract Highlighted Data from PDF using Python – Example CIS Windows Server 2022 Benchmark pdf

10 Jul

In this blog post, we will explore how to extract highlighted data from a PDF using Python. Before we go ahead, let's understand the use case. You have the CIS benchmark (CIS_Microsoft_Windows_Server_2022_Benchmark_v2.0.0.pdf), which is 1065 pages long, and you are reviewing its policies against your environment and highlighting the PDF with specific color codes. For example, I use four colors for the following purposes:

  • Red Color – Missing Policies
  • Yellow Color – All existing policies
  • Pink – Policies not applicable
  • Green – Upgraded policies

Example of the highlighted text:

You don't have to use the same color codes as I have, but you get the idea. Once you have done the heavy lifting of reviewing the document and are happy with the analysis, the next step is to extract the highlighted data into CSV format so that the teams can review and action it.

Prerequisites

We will use the PyMuPDF and Pandas libraries to parse the PDF file and extract the highlighted text. Additionally, we will apply this technique to the CIS Windows Server 2022 Benchmark PDF as an example.

Before we begin, make sure you have installed the necessary dependencies. You can install PyMuPDF (imported in code as fitz) and Pandas using pip:

pip install PyMuPDF
pip install pandas

First, I created a small script to go through the PDF and detect the highlight colors. I had to do this because, although the colors look like red, yellow, etc. to my eyes, the actual RGB color codes are slightly different.

import fitz  # PyMuPDF

# Open the PDF
doc = fitz.open('CIS_Microsoft_Windows_Server_2022_Benchmark_v2.0.0.pdf')

# Set to store unique colors
unique_colors = set()

# Loop through every page
for page in doc:
    # Highlights are one type of annotation
    for annotation in page.annots():
        if annotation.type[1] == 'Highlight':
            # Get the stroke color of the highlight (RGB floats in the range 0-1)
            stroke = annotation.colors.get('stroke')
            if stroke:
                # Store as a tuple so the color is hashable
                unique_colors.add(tuple(stroke))

# Print all unique colors
for color in unique_colors:
    print(color)

After executing the script, you will get the following output. Make sure you use the exact name of the PDF file and, within the IDE of your choice, cd to the directory where the above script (CheckColor.py) resides.

Now that we have the color codes, it’s time to go ahead and extract the highlighted text. We iterate through each page of the PDF and check for any highlight annotations. If one is found, we extract the text under it and accumulate it in the data_by_color dictionary, then export each color's entries to its own CSV file.

Main Code

Replace "CIS_Microsoft_Windows_Server_2022_Benchmark_v2.0.0.pdf" with the actual path to your PDF file.

import fitz  # PyMuPDF
import pandas as pd

# Open the PDF
doc = fitz.open('CIS_Microsoft_Windows_Server_2022_Benchmark_v2.0.0.pdf')

# Define the RGB values for your colors
PINK = (0.9686269760131836, 0.6000000238418579, 0.8196079730987549)
YELLOW = (1.0, 0.9411770105361938, 0.4000000059604645)
GREEN = (0.49019598960876465, 0.9411770105361938, 0.4000000059604645)
RED = (0.9215689897537231, 0.2862749993801117, 0.2862749993801117)

color_definitions = {"Pink": PINK, "Yellow": YELLOW, "Green": GREEN, "Red": RED}

# Create separate lists for each color
data_by_color = {"Pink": [], "Yellow": [], "Green": [], "Red": []}

# Invert the mapping so a highlight color can be looked up by its RGB tuple
color_names = {rgb: name for name, rgb in color_definitions.items()}

# Loop through every page
for page in doc:
    # Fetch the page's text structure once per page, not once per annotation
    structure = page.get_text("dict")
    for annotation in page.annots():
        if annotation.type[1] != 'Highlight':
            continue
        stroke = annotation.colors.get('stroke')
        # Normalize to a tuple so it compares equal to the definitions above
        color_name = color_names.get(tuple(stroke)) if stroke else None
        if color_name is None:
            continue

        # Collect every text span whose bounding box overlaps the highlight
        content = []
        for block in structure["blocks"]:
            # Image blocks have no "lines" key, so default to an empty list
            for line in block.get("lines", []):
                for span in line["spans"]:
                    if fitz.Rect(span["bbox"]).intersects(annotation.rect):
                        content.append(span["text"])

        # Append the content to the appropriate color list
        data_by_color[color_name].append(" ".join(content))

# Convert each list to a DataFrame and write to a separate .csv file
for color_name, data in data_by_color.items():
    if data:
        df = pd.DataFrame(data, columns=["Text"])
        df.to_csv(f'highlighted_text_{color_name.lower()}.csv', index=False)

After running the script, the extracted highlighted text will be saved in multiple CSV files, one per color, as in the screenshot below:

You can now extract the highlighted text from the PDF using the above technique. Feel free to modify and adapt this code to suit your specific requirements. Extracting highlighted data from PDFs can be a powerful data analysis and research technique.
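One adaptation worth considering: the script above matches colors by exact float equality, which works when every highlight was made with the same tool but can miss highlights whose color values differ slightly. The sketch below matches within a tolerance instead. The classify_color helper and the 0.02 tolerance are illustrative assumptions, not part of the original script.

```python
from typing import Dict, Optional, Sequence, Tuple

RGB = Tuple[float, float, float]

# Same values as in the script above
COLOR_DEFINITIONS: Dict[str, RGB] = {
    "Pink": (0.9686269760131836, 0.6000000238418579, 0.8196079730987549),
    "Yellow": (1.0, 0.9411770105361938, 0.4000000059604645),
    "Green": (0.49019598960876465, 0.9411770105361938, 0.4000000059604645),
    "Red": (0.9215689897537231, 0.2862749993801117, 0.2862749993801117),
}


def classify_color(color: Optional[Sequence[float]],
                   definitions: Dict[str, RGB] = COLOR_DEFINITIONS,
                   tolerance: float = 0.02) -> Optional[str]:
    """Return the name of the closest defined color, or None if none is close enough."""
    if color is None or len(color) != 3:
        return None
    best_name, best_dist = None, tolerance
    for name, rgb in definitions.items():
        # Largest per-channel difference between the two colors
        dist = max(abs(a - b) for a, b in zip(color, rgb))
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name
```

In the main loop, the exact lookup could then be swapped for classify_color(annotation.colors.get('stroke')), with highlights that return None skipped as before.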

I hope you find this information helpful for extracting data from PDF files. Please let me know if I have missed any steps or details, and I will be happy to update the post.

Thanks,
Aresh Sarkari