How to extract signatures from paper documents

by Gourav Bais
April 7, 2023
9-minute read

In the machine learning (ML) era, everything from language generation to image processing is becoming automated. One emerging field is online document processing, which is used by banking, insurance, healthcare, and other industries to save the time and effort of manual data verification.

ML technologies like intelligent character recognition (ICR) and natural language processing (NLP) help organizations capture data from documents and process it with far less risk of human error.

Document processing isn’t limited to extracting text, though; it also involves images and signatures. In this tutorial, you will learn how to create a system that can extract document signatures.

What is signature extraction?

Signature extraction is the technique of automatically identifying the signatures in a scanned document and cropping them out for use in different verification processes. First, each signature must be detected and cropped out of the document; ML or computer vision models can extract every occurrence of a signature, however many times it appears. The extracted signature can then be used to validate the person's identity, for Know Your Customer (KYC) processing, or in contract and agreement processing.

Banking and finance services especially rely on signatures to verify a person’s identity. As more businesses transition to online platforms, they’re also switching from manually handled verification tasks to signature extraction, which is becoming increasingly accurate. This helps save the time and effort of printing, scanning, emailing, and making changes to documents.

The following are some use cases for signature extraction:

  • Banks: Banks rely on signature verification, where an extracted signature is validated against a ground truth to confirm that it’s from the same person.
  • Real estate: Buying and selling property requires a lot of paperwork. Once the contracts and legal documents are signed and shared across multiple cities or countries, they are normally converted to digital images. The signatures in those images can be extracted for verification.
  • Sales and procurement: Many consumers have already shifted from in-store to e-commerce purchases, but businesses that make purchases must still complete signed sales contracts and other paperwork, and the signatures on those documents can be extracted for verification.
  • Company onboarding: New hires, especially at larger organizations, will need to sign a letter of acceptance, offer letter, and/or nondisclosure agreement, among other documents.
  • Legal agreements: Signatures are required for documents in all types of legal proceedings, such as court cases or estate hearings.

Implementing signature extraction

A signature extraction system can be developed in two ways: traditional computer vision using OpenCV and object detection with deep learning. In this tutorial, you’ll be implementing the first solution using Python 3.9 and Anaconda.

If you install a recent version of Anaconda, it comes with Python and pip, Python's package manager (the bundled Python version depends on the Anaconda release; this tutorial uses Python 3.9). Anaconda also includes tools to run your code, like Jupyter Notebook and Spyder. You can write the code in any of them; Spyder is preferred here for its interactive development environment.

Once your environment is set up, you can clone or download this project repository from Ahmet Özlü to follow along.

You should find the following files/folders inside the project repository:

  • The `inputs` folder stores the input images that are passed to the model to extract signatures.
  • The `outputs` folder stores the extracted signatures, or the output images produced by the code.
  • The `signature_extractor.py` file contains the implementation of computer-vision-based connected component analysis. You’ll need to run this file to produce the output.

Next, install OpenCV, scikit-image, and any other dependencies for the task. You can do so with pip from the Anaconda Prompt or from any terminal where your Anaconda environment is active:

```shell
pip install opencv-python
pip install scikit-image
```

Other libraries like Matplotlib and NumPy already come with Anaconda. If you run into issues, though, you can install them the same way:

```shell
pip install matplotlib
pip install numpy
```
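Before writing any code, you can confirm that the packages are visible to the Python interpreter you'll use. The sketch below relies only on the standard library's `importlib.util.find_spec`, which looks a module up without importing it; the `REQUIRED` mapping and the `check_dependencies` helper are hypothetical names for illustration, not part of the project repository. Note that the import names differ from the pip package names: `opencv-python` is imported as `cv2`, and `scikit-image` as `skimage`.

```python
import importlib.util

# pip package name -> import name (they differ for OpenCV and scikit-image)
REQUIRED = {
    "opencv-python": "cv2",
    "scikit-image": "skimage",
    "matplotlib": "matplotlib",
    "numpy": "numpy",
}


def check_dependencies(required=REQUIRED):
    """Return {pip_name: True/False} depending on whether the module can be found."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in required.items()
    }


if __name__ == "__main__":
    for pip_name, found in check_dependencies().items():
        status = "ok" if found else "missing, try: pip install " + pip_name
        print(f"{pip_name:15} {status}")
```

If a package shows as found in a terminal but missing in Spyder, the two are probably running different Python environments.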

When you open `signature_extractor.py`, you’ll see a lot of code. To better understand the process of signature extraction using connected component analysis, and the meaning behind each code block, follow along with this article and create a new Python file.

First, import the dependencies:

```python
import cv2
import matplotlib.pyplot as plt
from skimage import measure, morphology
from skimage.color import label2rgb
from skimage.measure import regionprops
import numpy as np
```

Here, the `cv2` (OpenCV) and `scikit-image` (imported as `skimage`) libraries handle the image processing, NumPy speeds up the array operations applied to the image data, and Matplotlib is used to plot the images.

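Read the input image file from the local path and apply preprocessing that will help identify the signature area. Note that `cv2.imread` returns `None` instead of raising an error when it can't read the file, so check the path first if later steps fail:

```python
# flag 0 (cv2.IMREAD_GRAYSCALE) loads the scan as a single-channel image
img = cv2.imread('./inputs/in1.jpg', 0)
# pixels above 127 become 255 (white); all others become 0 (black)
img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)[1]
```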

Input image

In the above code, the input image is first read from the local path with the read flag set to `0` (equivalent to `cv2.IMREAD_GRAYSCALE`). This tells OpenCV to load the image with a single color channel, converting a color scan to grayscale if necessary. Then, binary thresholding is applied to the image. Binary thresholding converts each pixel to black or white based on a threshold, in this case `127`: with `cv2.THRESH_BINARY`, pixel values greater than the threshold become 255 (white), and all other values, including 127 itself, become 0 (black). `cv2.threshold` returns a tuple of the threshold used and the thresholded image, which is why the code takes element `[1]`. The result is a binary image with only two pixel values.
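To see exactly which pixels flip, here is a NumPy-only sketch of the `cv2.THRESH_BINARY` rule described above. It illustrates the comparison rather than reproducing OpenCV's implementation, and `binary_threshold` is a hypothetical helper name:

```python
import numpy as np


def binary_threshold(gray: np.ndarray, thresh: int = 127, maxval: int = 255) -> np.ndarray:
    """Mimic cv2.THRESH_BINARY: maxval where gray > thresh, 0 everywhere else."""
    return np.where(gray > thresh, maxval, 0).astype(np.uint8)


pixels = np.array([0, 126, 127, 128, 255], dtype=np.uint8)
print(binary_threshold(pixels).tolist())  # [0, 0, 0, 255, 255]
```

Because the comparison is strict, a pixel of exactly 127 ends up black.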

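Now that the image is binarized, apply connected component analysis to detect the connected regions in it. This helps identify the signature area, because the strokes of a handwritten signature are usually joined together into large connected regions. `skimage` provides a function to do this:

```python
# connected component analysis with the scikit-image framework
blobs = img > img.mean()                           # True for bright (background) pixels
blobs_labels = measure.label(blobs, background=1)  # label each dark connected region 1..n
image_label_overlay = label2rgb(blobs_labels, image=img)  # color the labels over the scan
```

A blob is a group of connected pixels that stands out from its background. In this case, the text and signature are dark blobs on a background of white pixels. The first line of code creates a Boolean mask that is `True` for every pixel brighter than the image's mean intensity, which in a binarized document means the white background. The next line labels the connected regions: `measure.label` assigns a unique integer to each connected group of pixels, and `background=1` tells it to treat the `True` (white) pixels as background with label `0`, so only the dark text and signature regions receive labels. Finally, `label2rgb` converts the labels to colors and overlays them on the original image for better visualization.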
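If you want to see what `measure.label` does under the hood, the following sketch labels dark blobs with a breadth-first flood fill in plain Python and NumPy. It assumes the same setup as above (bright pixels are background) and uses 8-neighbour connectivity, which is what `measure.label` uses by default for 2D images. `label_dark_blobs` is a hypothetical helper: it is far slower than scikit-image and may number the labels differently, but it groups the pixels the same way.

```python
from collections import deque

import numpy as np


def label_dark_blobs(gray: np.ndarray) -> tuple[np.ndarray, int]:
    """Label 8-connected groups of dark pixels; bright pixels get label 0.

    Mirrors `blobs = gray > gray.mean()` plus `measure.label(blobs, background=1)`.
    """
    ink = gray <= gray.mean()  # the complement of `blobs`: dark (ink) pixels
    labels = np.zeros(gray.shape, dtype=np.int32)
    rows, cols = gray.shape
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not ink[r, c] or labels[r, c]:
                continue  # background, or already part of a labeled blob
            count += 1
            labels[r, c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one blob
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and ink[ny, nx] and not labels[ny, nx]):
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count


page = np.full((5, 7), 255, dtype=np.uint8)
page[0:2, 0:2] = 0  # a 2x2 blob
page[3, 3] = 0      # two diagonal pixels...
page[4, 4] = 0      # ...joined by 8-connectivity
page[0, 6] = 0      # an isolated pixel
labels, count = label_dark_blobs(page)
print(count)  # 3
```

The diagonal pixels at (3, 3) and (4, 4) form one blob only because of 8-connectivity; passing `connectivity=1` to `measure.label` would split them into two.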

You might want to see the RGB image after connected component analysis. You can do that with Matplotlib:

```python
# draw image
# fix the figure size to (10, 6)
fig, ax = plt.subplots(figsize=(10, 6))

# plot the connected components (for debugging)
ax.imshow(image_label_overlay)
ax.set_axis_off()
plt.tight_layout()
plt.show()
```

Connected component analysis

So far you've read the image, labeled its connected components, and visualized them. Generally, a signature forms bigger connected components than the other text in a document, so the next step is to measure component areas. Using `regionprops`, compute the average blob area and find the biggest component among the blobs:

```python
# initialize the variables used to find the biggest component
the_biggest_component = 0
total_area = 0
counter = 0
average = 0.0

# iterate over each blob (regionprops skips the background label 0)
for region in regionprops(blobs_labels):
    # blobs larger than 10 pixels count toward the average area
    if region.area > 10:
        total_area += region.area
        counter += 1

    # among blobs of at least 250 pixels, keep the largest area seen so far
    if region.area >= 250 and region.area > the_biggest_component:
        the_biggest_component = region.area

# calculate the average blob area (guarding against a page with no blobs)
average = total_area / counter if counter else 0.0
print("the_biggest_component: " + str(the_biggest_component))
print("average: " + str(average))
```

The code above iterates over each labeled blob once. If a blob is larger than `10` pixels, its area is added to `total_area` and `counter` is incremented, so that `total_area / counter` gives the average blob area. If a blob is at least `250` pixels, its area is compared with `the_biggest_component`, and the stored value is replaced whenever the current blob is larger. After the loop, `the_biggest_component` holds the area of the largest blob.

Both sizes were chosen by testing on scanned images. The `10`-pixel cutoff works well for scans because the smallest objects in them are almost always around that size, so anything smaller is left out of the average. The `250`-pixel cutoff reflects that the largest object in a document image is generally the signature, whose tested area is greater than `250` pixels.
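The same statistics can also be computed without an explicit loop. In a label image, `np.bincount` counts the pixels carrying each label, which is the number `regionprops` reports as `region.area`. The sketch below assumes label `0` is the background (as produced by `measure.label` above); `component_stats` is a hypothetical helper that reproduces the `10`- and `250`-pixel rules:

```python
import numpy as np


def component_stats(labels: np.ndarray, min_area: int = 10, signature_min_area: int = 250):
    """Return (average area of blobs > min_area, largest area of blobs >= signature_min_area).

    Assumes label 0 is the background, as produced by measure.label above.
    """
    areas = np.bincount(labels.ravel())[1:]  # pixel count per label, background dropped
    areas = areas[areas > 0]                 # skip label ids that are not used
    counted = areas[areas > min_area]
    average = float(counted.mean()) if counted.size else 0.0
    large = areas[areas >= signature_min_area]
    biggest = int(large.max()) if large.size else 0
    return average, biggest


labels = np.concatenate([
    np.zeros(100, dtype=np.int64),  # background pixels
    np.full(5, 1), np.full(20, 2), np.full(300, 3), np.full(400, 4),
])
print(component_stats(labels))  # (240.0, 400)
```

The 5-pixel blob is ignored, the average is taken over the 20-, 300-, and 400-pixel blobs, and only the last two qualify as signature-sized.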

Next, you need to filter out some outliers that might get confused with the signature blob:

```python
# the parameters are used to remove outliers of small size connected pixels
constant_parameter_1 = 84
constant_parameter_2 = 250
constant_parameter_3 = 100

# the parameter is used to remove outliers of large size connected pixels
constant_parameter_4 = 18
```

These values were finalized after testing different sets of blob-size thresholds for signature extraction.

Outlier removal needs two size thresholds, derived from the four parameters initialized above: the first three parameters set the threshold for small outliers, and the fourth scales that threshold up for big outliers. First, compute the small-size outlier threshold:

```python
# experiment-based ratio calculation; modify it for your cases
a4_small_size_outlier_constant = ((average / constant_parameter_1) * constant_parameter_2) + constant_parameter_3
print("a4_small_size_outlier_constant: " + str(a4_small_size_outlier_constant))
```

Above, `a4_small_size_outlier_constant` is the threshold for small outliers in A4-size scanned documents: connected components with fewer pixels than this value are removed.

Similarly, compute the threshold for big outliers:

```python
# experiment-based ratio calculation; modify it for your cases
a4_big_size_outlier_constant = a4_small_size_outlier_constant * constant_parameter_4
print("a4_big_size_outlier_constant: " + str(a4_big_size_outlier_constant))
```

Here, `a4_big_size_outlier_constant` is the threshold for big outliers in A4-size scanned documents: connected components with more pixels than this value are removed.
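To get a feel for these formulas, here is the arithmetic for a hypothetical page whose average blob area is 42 pixels. `outlier_thresholds` is an illustrative helper using the tutorial's default parameters:

```python
def outlier_thresholds(average, p1=84, p2=250, p3=100, p4=18):
    """Return (small, big) component-size thresholds in pixels.

    Same formulas as above: small = (average / p1) * p2 + p3, big = small * p4.
    """
    small = (average / p1) * p2 + p3
    big = small * p4
    return small, big


# hypothetical average blob area of 42 pixels
small, big = outlier_thresholds(42.0)
print(small, big)  # 225.0 4050.0
```

The small threshold grows linearly with the average blob area, so pages with larger text (or higher scan resolutions) get proportionally larger thresholds, while `constant_parameter_3` keeps the small threshold from dropping below 100 pixels.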

Once you have these thresholds, use the `morphology` module to remove the small outliers from your blob collection and a NumPy mask to remove the big ones. You can then store the image locally; it should be close to the final result:

```python
# remove the connected components that are smaller than a4_small_size_outlier_constant
pre_version = morphology.remove_small_objects(blobs_labels, a4_small_size_outlier_constant)
# remove the connected components that are bigger than a4_big_size_outlier_constant
component_sizes = np.bincount(pre_version.ravel())  # pixel count for each label
too_big = component_sizes > a4_big_size_outlier_constant
too_big_mask = too_big[pre_version]                 # look up each pixel's label
pre_version[too_big_mask] = 0                       # 0 is the background label
# save the pre-version, which is the label image rendered with a color map
plt.imsave('pre_version.png', pre_version)
```
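The bincount-and-mask pattern above works in both directions. The following NumPy sketch applies the small and big thresholds in one step on a tiny label image. It mirrors the comparisons used in the tutorial (components smaller than the small threshold are dropped, as are components bigger than the big threshold), but `filter_components_by_area` is a hypothetical helper, not a scikit-image function:

```python
import numpy as np


def filter_components_by_area(labels: np.ndarray, min_area: float, max_area: float) -> np.ndarray:
    """Return a copy of `labels` with components outside [min_area, max_area] set to 0."""
    sizes = np.bincount(labels.ravel())          # pixel count for each label
    drop = (sizes < min_area) | (sizes > max_area)
    drop[0] = False                              # label 0 is background; leave it alone
    out = labels.copy()
    out[drop[out]] = 0                           # per-pixel lookup, as in the tutorial
    return out


labels = np.array([
    [1, 0, 2, 2],
    [0, 0, 2, 0],
    [3, 3, 3, 0],
    [3, 3, 3, 0],
])
kept = filter_components_by_area(labels, min_area=2, max_area=5)
print(np.unique(kept).tolist())  # [0, 2]
```

Label 1 (1 pixel) is too small and label 3 (6 pixels) is too big, so only label 2 (3 pixels) survives. The key step is `drop[out]`: indexing the per-label array with the label image produces a per-pixel mask in a single vectorized operation.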

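`pre_version.png` is the image obtained after all the preprocessing. As a final step, read this image again as grayscale and apply Otsu's thresholding, which chooses the threshold automatically from the image histogram (the `0` passed as the threshold is ignored when `cv2.THRESH_OTSU` is set). `cv2.THRESH_BINARY_INV` inverts the result, so the components that survived filtering should end up black on a white background:

```python
# read the pre-version as a grayscale image
img = cv2.imread('pre_version.png', 0)
# ensure a binary image with Otsu's method
img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
```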
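Otsu's method picks the threshold that best splits the histogram into two classes by maximizing the between-class variance. The sketch below implements the textbook formula in NumPy to show what `cv2.THRESH_OTSU` computes; it is not OpenCV's source, and on histograms with ties it may pick a different but equally good level. `otsu_threshold` and `binarize_inv` are hypothetical helper names:

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the level t in 0..255 that maximizes the between-class variance.

    Class 0 is pixels <= t and class 1 is pixels > t, matching how
    cv2.threshold applies the level. Expects a uint8 image.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of class 0 for each t
    mu = np.cumsum(prob * np.arange(256))    # first moment of class 0 for each t
    mu_total = mu[-1]                        # mean intensity of the whole image
    denom = omega * (1.0 - omega)
    valid = denom > 1e-12                    # skip levels where one class is empty
    sigma_b = np.zeros(256)
    # sigma_b(t) = (mu_total * omega(t) - mu(t))^2 / (omega(t) * (1 - omega(t)))
    sigma_b[valid] = (mu_total * omega[valid] - mu[valid]) ** 2 / denom[valid]
    return int(np.argmax(sigma_b))           # first level with the highest score


def binarize_inv(gray: np.ndarray, t: int) -> np.ndarray:
    """Like cv2.THRESH_BINARY_INV: 0 where gray > t, 255 everywhere else."""
    return np.where(gray > t, 0, 255).astype(np.uint8)


gray = np.array([[20] * 8 + [200] * 8], dtype=np.uint8)
t = otsu_threshold(gray)
print(t)  # 20
print(binarize_inv(gray, t)[0, [0, -1]].tolist())  # [255, 0]
```

With only two gray levels, every threshold from 20 to 199 separates them perfectly and scores the same; `np.argmax` returns the first of these, 20.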

To store the final image, which contains only the signature, use OpenCV's `imwrite` function:

```python
# returns False if the file can't be written, e.g. when ./outputs doesn't exist
cv2.imwrite("./outputs/output.png", img)
```

Output image

Now your document signature extractor is ready.
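Optionally, you can trim the empty page around the result. The sketch below crops a black-on-white binary image, like the one saved above, to the bounding box of its dark pixels. It's a rough post-processing step under that assumption: any leftover noise widens the box, and several signatures on one page end up in a single crop. `crop_to_ink` and its `ink_max` parameter are hypothetical:

```python
import numpy as np


def crop_to_ink(binary: np.ndarray, ink_max: int = 127):
    """Crop a black-on-white image to the bounding box of its dark pixels.

    Pixels <= ink_max count as ink. Returns None when there is no ink at all.
    """
    ink = binary <= ink_max
    rows = np.flatnonzero(ink.any(axis=1))  # indices of rows containing ink
    cols = np.flatnonzero(ink.any(axis=0))  # indices of columns containing ink
    if rows.size == 0:
        return None
    return binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


page = np.full((6, 8), 255, dtype=np.uint8)
page[2, 3] = 0
page[3, 5] = 0
crop = crop_to_ink(page)
print(crop.shape)  # (2, 3)
```

In the tutorial's pipeline, you would call it on the final `img` before `cv2.imwrite` and skip the write if it returns `None`.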

This signature extractor removes all other entities from the document and leaves only the signature area, but it doesn't tell you where on the page the signature is. If you need the exact location of each signature, you'll have to use the object detection technique.

Note that training an object detection model requires a decent amount of annotated data, normally around 200 images. You can follow the steps in this article to prepare your data and train the object detection model for signature extraction.

The best way to extract signatures with Dropbox Sign

You should now have a better understanding of signature extraction and its use cases, as well as how to create a signature extractor. This process offers benefits to a number of industries, because it increases automation and speeds up document processing while reducing human error and freeing up team members to focus on other tasks.

Instead of creating your own signature extractor, though, you can use a ready-made solution. One such application is Dropbox Sign. Its API allows you to sign and track eSignatures while still keeping those documents secure. Dropbox Sign easily integrates into your site or application for a seamless experience. To learn more, check out Dropbox Sign’s documentation.
