OCR Roundup: Exploring Four Cutting-Edge APIs for OCR

Paul Ibeabuchi

August 19, 2024


Comparing OCR APIs

Optical character recognition (OCR) software extracts text from image files or scanned documents so that the information can be stored or exported in formats such as PDF, TXT, CSV, or XLSX.

OCR software first preprocesses the document, using techniques such as deskewing to correct misaligned scans. It then matches text patterns to extract the desired data, which can be exported or used in subsequent operations.

OCR software enables developers to effectively automate and streamline data extraction workflows. This can be useful for processing documents, such as passports, receipts, and invoices. Additionally, OCR software can be used for license plate number recognition in cases like traffic and parking management, further expanding its range of applications.

In this article, you'll learn about four popular OCR APIs: Mindee, Nanonets, Microsoft Computer Vision, and Google Cloud Vision. They are compared on how well they handle image quality, languages and fonts, and accuracy, as well as on integration, pricing, and support. By the end of the article, you'll have a better idea of which OCR API is right for you.

OCR APIs roundup

OCR APIs enable you to integrate OCR capabilities into your applications. In this roundup, you'll compare the four tools mentioned previously and see how they handle images of varying quality and characteristics. The following five images are used to test the capabilities of each OCR API:

1. Document A: a document with good image quality

2. Document B: a document with poor image quality

3. Document C: a document with a different font style

4. Document D: a document containing a different language

5. Document E: a distorted image of a document containing many items

To ensure consistency, the same document was deliberately modified to fit each of these categories, with the exception of Document E. Because every API produced accurate results on Document A, it is not discussed further and serves only as the reference version of the original document.

Mindee

Mindee is an API-first software that offers developers the features and functionalities they need to process their documents. It leverages machine learning and computer vision to extract information from different types of documents, simplifying the process of capturing data and enhancing operational efficiency.

Image quality

Image quality refers to the overall visual characteristics and clarity of an image. The higher the image quality, the more closely the image resembles the original subject or intended representation.

To see how Mindee OCR API copes with poor image quality, it was run on a blurry and noisy image (i.e., Document B):

Image quality test for Mindee OCR API on Document B

With a word error rate (WER) of 0.92, Mindee OCR API was less accurate than the other OCR APIs on this image. (WER is the number of word-level errors relative to the number of words in the reference text, so lower is better.) The API correctly extracted certain details, including the business name and phone number, but it had difficulty capturing the address accurately. It also struggled to identify the date, items list, and total amount listed in the document. If your images are low in quality, Mindee OCR API may not be the best choice for you.
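The article does not say how its WER figures were calculated. As a rough illustration only, the sketch below uses the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between the OCR output and a reference transcript, divided by the number of words in the reference. The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(
                min(
                    previous_row[j] + 1,                      # deletion
                    current_row[j - 1] + 1,                   # insertion
                    previous_row[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous_row = current_row

    return previous_row[-1] / len(ref_words)
```

With this definition, a perfect transcription scores 0.0, and a score near 1.0 means that roughly as many word errors occurred as there are words in the reference. Real evaluations often normalize case and punctuation first, which this sketch does not do.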

Languages and fonts

Depending on your use case, you may encounter documents in different languages and fonts. To evaluate how each OCR API handles such variations, Documents C and D were tested.

When Document C (a receipt with an uncommon font style) was uploaded to Mindee, it accurately detected fields such as the merchant address, descriptions, phone number, and total balance, with a WER close to 0.08. However, it couldn't accurately detect the quantity value for the second item and was unable to detect the date of the transaction.

When Document D (text in German) was tested with Mindee, it accurately detected the language and currency of the document, although its WER of 0.35 was higher than the one recorded for Document C.

Currently, Mindee OCR API supports multiple languages, including English, Dutch, French, German, and Spanish. If you're working with documents from diverse regions, Mindee OCR API may be a good option, but keep in mind that it can have trouble identifying some fields in documents with uncommon fonts.

Accuracy

To further test the accuracy of Mindee, Document E was tested.

Mindee OCR API had trouble accurately detecting some of the fields in this distorted image. The WER for this run was close to 0.88.

Integration

Thanks to the comprehensive documentation of the Mindee OCR API, you can easily integrate their wide range of APIs. These APIs facilitate the processing of standard documents via prebuilt models, which include Receipt OCR, Passport OCR, and Invoice OCR.

Additionally, with the Mindee OCR API Prediction endpoint, you can extract your desired data based on your chosen model. Mindee's software development kits (SDKs), available for Node.js, Python, and .NET among others, also help make integration fast.

Using Mindee's API Builder tool, you can build custom models. These models let you define the specific document fields you want to extract, and you can continue to update a model with additional data at any time.

It's important to note that the maximum file size you can upload is 10 MB. In addition, the maximum page limit for PDF uploads is 5 pages, with an exception for invoices and receipts, which have a limit of 10 pages.
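These limits can be checked before a file is uploaded. The following is a minimal sketch based only on the figures quoted above; the function name `upload_problems`, the document type labels, and the reading of 10 MB as 10 × 1024 × 1024 bytes are assumptions for illustration, not part of Mindee's API, so confirm the current limits in Mindee's documentation.

```python
# Limits as stated in this article; they may have changed since publication.
MAX_FILE_SIZE_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB
DEFAULT_MAX_PDF_PAGES = 5
EXTENDED_MAX_PDF_PAGES = 10
# Hypothetical labels for the document types with the higher page limit.
EXTENDED_DOCUMENT_TYPES = {"invoice", "receipt"}


def upload_problems(size_bytes: int, page_count: int, document_type: str) -> list[str]:
    """Return the reasons a PDF would exceed the stated limits (empty if none)."""
    problems = []

    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(
            f"file is {size_bytes} bytes; the limit is {MAX_FILE_SIZE_BYTES} bytes"
        )

    # Invoices and receipts get the higher page limit; everything else gets 5.
    if document_type in EXTENDED_DOCUMENT_TYPES:
        page_limit = EXTENDED_MAX_PDF_PAGES
    else:
        page_limit = DEFAULT_MAX_PDF_PAGES

    if page_count > page_limit:
        problems.append(
            f"PDF has {page_count} pages; the limit for '{document_type}' is {page_limit}"
        )

    return problems
```

For example, `upload_problems(1_000_000, 8, "invoice")` returns an empty list, while the same eight-page file submitted as a passport scan would be flagged for exceeding the five-page limit.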

Pricing and support

Mindee has three major pricing plans:

- The free plan processes 250 pages per month and provides chat-based tech support via the web app.

- The pay-as-you-go plan also processes 250 pages per month at no cost, after which you pay $0.10 (USD) per page. With this plan, you have access to their chat-based tech support as well as their Slack community.

- For information about the enterprise plan, you need to contact the Mindee team. This plan has exclusive support, including a private Slack channel.
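To get a feel for what the pay-as-you-go plan costs at a given volume, the arithmetic can be sketched as follows. It uses only the figures quoted above (250 free pages per month, then $0.10 per page) and assumes there are no other fees or volume discounts; the function name `estimate_monthly_cost` is made up for this example.

```python
from decimal import Decimal

# Figures as quoted in this article; check Mindee's pricing page for current rates.
FREE_PAGES_PER_MONTH = 250
PRICE_PER_EXTRA_PAGE_USD = Decimal("0.10")


def estimate_monthly_cost(pages_processed: int) -> Decimal:
    """Estimate the pay-as-you-go cost in USD for one month of usage."""
    if pages_processed < 0:
        raise ValueError("pages_processed cannot be negative")
    # Only pages beyond the free monthly allowance are billed.
    billable_pages = max(0, pages_processed - FREE_PAGES_PER_MONTH)
    return billable_pages * PRICE_PER_EXTRA_PAGE_USD
```

Under these assumptions, processing 1,000 pages in a month means 750 billable pages, or about $75. `Decimal` is used instead of `float` so that the currency arithmetic stays exact.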

Nanonets

Nanonets is a tool that provides machine learning APIs to enterprises and individual users, helping them automate their manual workflows. The Nanonets OCR API can accept and process over 300 different document types, including checks, passports, invoices, ID cards, and receipts.

Unlike Mindee, Nanonets doesn't provide any SDKs. Like Mindee, however, it allows you to train your own model.