Comparing OCR APIs
Optical character recognition (OCR) software is a valuable tool for extracting textual information from image files or scanned documents, facilitating the ability to store or export information in diverse formats, including PDF, TXT, CSV, or XLSX.
OCR software prepares the document for processing, employing techniques like deskewing to rectify alignment irregularities. Once processed, OCR matches text patterns to extract the desired data, which can then be exported or utilized for subsequent operations.
OCR software enables developers to effectively automate and streamline data extraction workflows. This can be useful for processing documents, such as passports, receipts, and invoices. Additionally, OCR software can be used for license plate number recognition in cases like traffic and parking management, further expanding its range of applications.
In this article, you'll learn about four popular OCR APIs: Mindee, Nanonets, Microsoft Computer Vision, and Google Cloud Vision. These OCR APIs will be compared based on how they handle image quality and different languages and fonts, as well as on their accuracy, integration, pricing, and support. By the end of the article, you'll have a better idea of which OCR API is right for you.
OCR APIs roundup
OCR APIs enable you to seamlessly integrate OCR capabilities into your applications. In this roundup, you'll compare the four tools mentioned previously and see how they handle images with various qualities and specifications. Following are the five images that will be used to test the capabilities of each OCR:
1. Document A: a document with good image quality
2. Document B: a document with poor image quality
3. Document C: a document with a different font style
4. Document D: a document containing a different language
5. Document E: a document containing lots of items with a distorted image
To ensure consistency, Documents B, C, and D are deliberately modified versions of the same original document (Document A); only Document E is a different document. Because every API extracted Document A accurately, it isn't discussed further and serves only as the reference original.
Mindee
Mindee is an API-first software that offers developers the features and functionalities they need to process their documents. It leverages machine learning and computer vision to extract information from different types of documents, simplifying the process of capturing data and enhancing operational efficiency.
Image quality
Image quality refers to the overall visual characteristics and clarity of an image. The higher the image quality, the more closely the image resembles the original subject or intended representation.
To see how the Mindee OCR API handles poor image quality, a blurry and noisy image (Document B) was uploaded:
With a word error rate (WER) of 0.92, the Mindee OCR API was less accurate than the other OCR APIs on this image. It correctly extracted certain details, including the business name and phone number, but it had difficulty capturing the address and struggled to identify the date, items list, and total amount. If your images are low in quality, the Mindee OCR API may not be the best choice for you.
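If you want to run a similar comparison on your own documents, WER is commonly computed as the word-level edit distance (substitutions, deletions, and insertions) between the reference text and the OCR output, divided by the number of words in the reference. Following is a minimal Python sketch. It assumes simple whitespace tokenization with no case or punctuation normalization, which may differ from how the figures in this article were calculated, and the function name `word_error_rate` is only illustrative:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word in the output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("total amount due 12.50", "total amont due")` gives 0.5: one substituted word plus one missing word, over four reference words. A WER of 0 means the output matches the reference word for word.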
Languages and fonts
Considering the diversity of use cases when it comes to images, it's possible that you may encounter documents with different languages and fonts. To evaluate the performance of each OCR API in handling such variations, Documents C and D were tested.
When Document C (a receipt with an uncommon font style) was uploaded to Mindee, it accurately detected fields such as the merchant address, descriptions, phone number, and total balance (with a WER close to 0.08). However, it couldn't accurately detect the quantity value for the second item and was unable to detect the date of the transaction:
When Document D (text in German) was tested with Mindee, it accurately detected the language and currency of the document (with a WER of 0.35):
Currently, Mindee OCR API supports multiple languages, including English, Dutch, French, German, and Spanish. If you're working with documents from diverse regions, Mindee OCR API may be a great option, but remember, if your documents have uncommon fonts, it can have trouble identifying the contents.
Accuracy
To further test the accuracy of Mindee, Document E was tested:
The Mindee OCR API had trouble with the distorted image and failed to accurately detect several of the fields in this document. The WER for this run was close to 0.88.
Integration
Thanks to the comprehensive documentation of the Mindee OCR API, you can easily integrate their wide range of APIs. These APIs facilitate the processing of standard documents via prebuilt models, which include Receipt OCR, Passport OCR, and Invoice OCR.
Additionally, with the Mindee OCR API Prediction endpoint, you can extract your desired data based on your chosen model. Moreover, their software development kits (SDKs), including those for Node.js, Python, and .NET, help make integration fast and seamless.
Using its API Builder tool, you can build custom models. These models enable you to define the specific document fields you want to extract. This means you have the flexibility to continuously update your model with additional data at any time.
It's important to note that the maximum file size you can upload is 10 MB. In addition, the maximum page limit for PDF uploads is 5 pages, with an exception for invoices and receipts, which have a limit of 10 pages.
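Checking these limits before you upload can save a failed request. Following is a minimal Python sketch of such a check. The function name, the document-type strings, and the choice of 1 MB = 1024 × 1024 bytes are assumptions made for illustration and are not part of Mindee's API:

```python
# Limits as described in this article; confirm them against Mindee's current docs.
MAX_FILE_BYTES = 10 * 1024 * 1024          # 10 MB (assumes 1 MB = 1024 * 1024 bytes)
DEFAULT_PDF_PAGE_LIMIT = 5
EXTENDED_PDF_PAGE_LIMIT = 10
EXTENDED_LIMIT_TYPES = {"invoice", "receipt"}  # hypothetical type labels


def upload_problems(size_bytes: int, page_count: int, document_type: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means the file fits."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; the limit is {MAX_FILE_BYTES}")

    if document_type in EXTENDED_LIMIT_TYPES:
        page_limit = EXTENDED_PDF_PAGE_LIMIT
    else:
        page_limit = DEFAULT_PDF_PAGE_LIMIT
    if page_count > page_limit:
        problems.append(f"PDF has {page_count} pages; the limit is {page_limit}")
    return problems
```

For example, `upload_problems(11 * 1024 * 1024, 8, "passport")` reports both a size problem and a page-count problem, while the same eight pages pass the page check for an invoice.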
Pricing and support
Mindee has three major pricing plans:
- The free plan processes 250 pages per month and provides chat-based tech support via the web app.
- The pay-as-you-go plan also processes 250 pages per month at no cost, and then you can pay $0.10 USD per page after that. With this plan, you have access to their chat-based tech support as well as their Slack community.
- For information about the enterprise plan, you need to contact the Mindee team. This plan has exclusive support, including a private Slack channel.
Nanonets
Nanonets is a tool that provides machine learning APIs to enterprises and individual users, helping them seamlessly automate their manual workflows. The Nanonets OCR API is capable of accepting and processing over 300 different document types, including checks, passports, invoices, ID cards, and receipts.
Unlike Mindee, Nanonets doesn't provide any SDKs, but it does allow you to train your own model.
Image quality
To test the ability of Nanonets to process low-quality images, Document B was tested:
Although Nanonets did a better job than Mindee (with a WER close to 0.33), it was unable to capture fields such as table number, second item description, quantity values, and item amounts.
Languages and fonts
To determine if Nanonets OCR API could handle various fonts, Document C was tested:
Nanonets accurately extracted values for all the desired fields (with a WER close to 0), outperforming Mindee in this category.
Now, let's see how Nanonets does with various languages:
As you can see, Nanonets successfully processed the German-language document, again with a WER close to 0. Nanonets supports over 40 languages.
Accuracy
To test the accuracy of Nanonets even further, let's see how it does with Document E:
Again, Nanonets OCR API outperformed Mindee and successfully captured several details, including the date, address, merchant name, phone number, total amount, as well as some list items. Its WER was close to 0.15. However, it struggled with accurately identifying prices.
Similar to Mindee, Nanonets couldn't identify the Table No or Customer Name because the default model doesn't recognize these fields. However, one of the benefits of Nanonets is that you can train the model to recognize these fields if you want to.
Integration
Unfortunately, the Nanonets OCR API documentation is not as detailed as the Mindee OCR API documentation. For instance, the endpoints can only be found in the sample code snippets provided. Additionally, even though the documentation indicates that the authentication in your request headers should be `Authorization: API_KEY`, it actually requires you to use Basic Auth, where your API key is Base64-encoded before it's sent.
However, the documentation does include code snippets for common programming languages, including Node.js, Python, and JavaScript, making it easy to get started. There's also a sample Postman collection you can use.
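To make the Basic Auth requirement concrete, following is a minimal Python sketch that builds the header by hand. It assumes the API key is used as the Basic Auth username with an empty password, which is a common convention but should be checked against the Nanonets sample snippets. The function name is hypothetical:

```python
import base64


def basic_auth_header(api_key: str) -> dict[str, str]:
    """Build an HTTP Basic Auth header from an API key.

    Assumption: the key is the username and the password is empty, so the
    credentials string is "<api_key>:" before Base64 encoding.
    """
    credentials = f"{api_key}:".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

If you use an HTTP client that supports Basic Auth directly (for example, passing `auth=(api_key, "")` to the `requests` library), it builds the same header for you.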
Nanonets offers four main OCR APIs:
- Model API contains endpoints that allow you to create an OCR model for your document type. The structure of the model determines the structure of the extracted data (i.e., the API response). A `model_id` property that is returned in the API response can be used as a reference for that model.
- Upload API provides you with endpoints that let you upload various documents that you can use to train your models. Usually, these documents are different copies of similar structures.
- Train API contains endpoints you can use to train your model to help improve accuracy. By supplying the OCR's model ID to this endpoint, Nanonets can identify the model you want to train.
- Predict API includes endpoints that you can use to retrieve your desired data. You simply upload a file, supplying the model ID you want to use, and based on that model, the API returns a response that includes the extracted data.
Pricing and support
Nanonets has three pricing plans:
- At the Starter level, Nanonets lets you upload 500 pages for free. Once you've met that limit, you can pay $0.03 USD for every additional document. The only support offered at this level is the chat feature available on their website (where you get a response from the support team within 24 hours) or via their publicly available email address.
- The Pro plan is targeted at teams that want to automate time-consuming workflows and processes. For this plan, teams are charged $499 USD per month per model. At this level, you're provided with email support that prioritizes the subscribed user with a faster response time.
- The Enterprise plan is aimed at businesses, as it offers the most benefits, such as white-labeled solutions, customized client onboarding, and dedicated support. For further pricing information, you can contact Nanonets.
Microsoft Computer Vision
Microsoft Computer Vision is a cloud-based technology that can process and extract text from images using advanced algorithms. Its OCR solution is widely adopted by businesses around the world, such as Siemens and Uber.
Unlike Mindee and Nanonets, Microsoft Computer Vision extracts raw text data rather than returning structured data in key-value pairs. However, Microsoft also provides the Form Recognizer API, which you can use to extract data in a structured format similar to Mindee and Nanonets.
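When an API gives you only raw text, turning that text into fields is left to your application. Following is a minimal Python sketch of that kind of post-processing using regular expressions. The sample text, field names, and patterns are made up for illustration. They are not output from any of the APIs tested here, and real receipts usually need far more forgiving rules:

```python
import re
from typing import Optional

# Made-up raw OCR text, one recognized line per row.
RAW_TEXT = """\
Corner Cafe
Tel: 555-0100
Date: 2023-05-14
Total: 23.40
"""

# Hypothetical field patterns; each captures the value after a known label.
FIELD_PATTERNS = {
    "phone": re.compile(r"^Tel:\s*(.+)$", re.MULTILINE),
    "date": re.compile(r"^Date:\s*(\d{4}-\d{2}-\d{2})$", re.MULTILINE),
    "total": re.compile(r"^Total:\s*(\d+\.\d{2})$", re.MULTILINE),
}


def extract_fields(raw_text: str) -> dict[str, Optional[str]]:
    """Map each field name to its captured value, or None if it is not found."""
    fields: dict[str, Optional[str]] = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(raw_text)
        fields[name] = match.group(1) if match else None
    return fields
```

Structured-output services save you from writing and maintaining rules like these, which is the main trade-off between the two styles of API.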
The Microsoft Computer Vision OCR APIs provide developers with the necessary tools to create applications with powerful image analysis capabilities. They support various programming languages and platforms, such as JavaScript, Python, and .NET (C#), and recognize text in many natural languages, including French, German, and Spanish.
Image quality
As with the other tools, Document B was used to test how the Microsoft Computer Vision OCR API handles poor image quality:
Compared to Nanonets and Mindee, Microsoft Computer Vision is more reliable when dealing with documents with poor image quality, with a WER close to 0.08. It extracted almost every detail on the document accurately.
Languages and fonts
To see how Microsoft Computer Vision handles different fonts, Document C was tested:
And to test a different language, Document D was used:
In both cases, the WER was close to 0. The results show that Microsoft Computer Vision OCR API is extremely accurate when it comes to analyzing both different fonts and different languages.
Accuracy
To further test Microsoft Computer Vision's accuracy, Document E was tested:
Despite the image distortion and the lengthy list of items, Microsoft Computer Vision was remarkably accurate (with a WER close to 0.04), recognizing nearly all the characters on the document.
In comparison to other OCR APIs, Microsoft Computer Vision emerges as the superior choice in terms of accuracy.
Integration
Microsoft Computer Vision provides you with exhaustive integration documentation for its OCR APIs. Unfortunately, because it's so exhaustive, beginners may find it complicated and intimidating. However, with its API reference, you can quickly access endpoints that let you train your own model or use preexisting models to extract texts from your image documents.
Similar to Mindee, Microsoft Computer Vision OCR also provides you with SDKs and REST APIs that you can use to integrate OCR functionalities into your application.
Pricing and support
Unlike the other OCR APIs mentioned here, Microsoft Computer Vision does not offer tiered plans. Instead, pricing depends on your usage, and a pricing calculator helps you anticipate costs.
Following are some basic pricing estimates Microsoft Computer Vision provides:
- 0–1 million transactions: $1.00 USD per 1,000 transactions
- 1–10 million transactions: $0.65 USD per 1,000 transactions
- 10–100 million transactions: $0.60 USD per 1,000 transactions
- 100 million-plus transactions: $0.40 USD per 1,000 transactions
Additionally, you get $200 USD in credit to be used in the first 30 days. Once the 30 days are up, you move to pay-as-you-go pricing, which requires you to pay based on your usage.
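The volume bands listed earlier lend themselves to a quick estimate. Following is a minimal Python sketch. It assumes the bands are graduated (each rate applies only to the transactions that fall within that band) and it ignores the $200 USD credit. Both points should be confirmed with Microsoft's pricing calculator. `TIERS` and `estimate_monthly_cost` are illustrative names:

```python
from decimal import Decimal

# (upper bound of the band in transactions, USD per 1,000 transactions),
# taken from the list in this article. None marks the open-ended last band.
TIERS = [
    (1_000_000, Decimal("1.00")),
    (10_000_000, Decimal("0.65")),
    (100_000_000, Decimal("0.60")),
    (None, Decimal("0.40")),
]


def estimate_monthly_cost(transactions: int) -> Decimal:
    """Estimate the monthly cost in USD, assuming graduated (marginal) bands."""
    if transactions < 0:
        raise ValueError("transactions must not be negative")
    cost = Decimal("0")
    lower = 0
    for upper, rate in TIERS:
        band_end = transactions if upper is None else min(transactions, upper)
        if band_end > lower:
            # Charge only the transactions that fall inside this band.
            cost += Decimal(band_end - lower) / 1000 * rate
        if upper is not None:
            lower = upper
    return cost
```

Under these assumptions, 2.5 million transactions in a month come to $1,975 USD: 1,000 × $1.00 for the first million plus 1,500 × $0.65 for the remaining 1.5 million.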
As for the support available, there are a few levels:
- Azure Developer Support is for developers just trying out the solution in a nonproduction environment. This plan costs $29 USD per month, and support is only available via email.
- Azure Standard Support is for users running a production workload. This plan costs $100 USD per month and offers 24/7 support by phone or email.
- Azure ProDirect Support is created for business-critical functions that usually require faster response time. This service costs $1,000 USD per month and also offers 24/7 support as well as other benefits, such as creating and managing Azure support tickets programmatically.
- Microsoft Enterprise Support is a comprehensive Microsoft technology support that is targeted at enterprises and offers fast support, which is needed in critical situations.
Google Cloud Vision
The last OCR API to review comes from Google Cloud Vision, which offers a range of solutions, including its Vision API that processes images.
Vision API is Google's OCR solution and is used by top organizations, including PayPal, *The New York Times*, and Twitter. It supports multiple languages, such as French, Spanish, English, and Italian.
Similar to Microsoft Computer Vision, Google Cloud Vision also returns raw unstructured data. However, it also provides you with its Document AI, which takes unstructured data and returns it in a structured format.
Image quality
Let's see how Vision API performs when it comes to low-quality images:
As you can see, Vision API successfully extracted some details from Document B, but it struggled to detect specific fields, such as item amounts, total amount, and an accurate address. Its WER was close to 0.2.
Languages and fonts
Now, let's test how Vision API handles different fonts with Document C:
And let's use Document D to see if Vision API can handle a different language:
As you can see, Vision API successfully handled both scenarios (with WERs close to 0), which means you can rely on it if you have documents that fall into these categories.
Accuracy
As with all the other OCR APIs, let's see if Vision API can accurately interpret Document E, which contains a list of items and a distorted image:
Just like Microsoft Computer Vision, Vision API was able to extract all the details of Document E.
According to our test results, Vision API has excellent accuracy except when it comes to documents with low-quality images.
Integration
To properly integrate Vision API, Google provides simple documentation that supports both REST and RPC APIs. Google also has several tutorials and blog posts discussing Vision API and how to use it.
Additionally, Google has several SDKs that can help enable easier integrations.
Pricing and support
Google offers three pricing tiers:
- Free for the first 1,000 units/month
- $1.50 USD per 1,000 units for 1,001–5,000,000 units/month
- $0.60 USD per 1,000 units for above 5,000,000 units/month
Additionally, there are three available support plans:
- The Standard Support plan is recommended for workloads in the development and trial stages. It costs $29 USD per month, but this plan only offers support during business hours and in English.
- The Enhanced Support plan is designed for workloads in the production environment. It costs $500 USD per month and supports more languages, such as Japanese, Korean, and Mandarin Chinese. You also get 24/7 responses for critical issues.
- The Premium Support plan includes all the benefits in the enhanced support plan plus additional benefits, such as a dedicated technical account manager and Customer Aware Support. However, it costs $12,500 USD per month.
Conclusion
In this comparison piece, you reviewed four different OCR APIs: Mindee, Nanonets, Microsoft Computer Vision, and Google Cloud Vision. Overall, Microsoft Computer Vision performed well in all document categories listed, and Google Cloud Vision did too, apart from the low-quality image. And while Mindee and Nanonets fell short on some of the documents, such as Document B and Document E, they return data in key-value pairs and provide lots of prebuilt document models for common documents.
If you're looking to quickly extract text from common documents, such as receipts, invoices, and passports, Mindee and Nanonets are great choices, as they provide preexisting models for these kinds of documents. However, if you're going to be working with lower-quality images, Microsoft Computer Vision was the most accurate in these tests, followed by Google Cloud Vision.
OCR APIs can be combined with eSignature APIs, such as the Dropbox Sign API, to automate document-signing processes. Dropbox Sign is an eSignature API that enables seamless integration of eSignature functionality into your applications. It accelerates workflows that involve signatories by eliminating bottlenecks caused by traditional processes. Additionally, the streamlined integration process, supported by comprehensive documentation and out-of-the-box SDKs, enables you to move your solutions into production quickly, typically within days.