How to extract signatures from paper documents

by Gourav Bais
April 7, 2023

In the machine learning (ML) era, everything from language generation to image processing is becoming automated. One emerging field is online document processing, which is used by banking, insurance, healthcare, and other industries to save the time and effort of manual data verification.

‍

ML technologies like intelligent character recognition (ICR) and natural language processing (NLP) are helping organizations to capture data from documents and process them without the risk of human error.

‍

Document processing isn’t limited to extracting text, though; it also involves images and signatures. In this tutorial, you will learn how to create a system that can extract document signatures.

‍

What is signature extraction?

Signature extraction is the technique of automatically identifying the signatures in a scanned document and cropping them to use for different verification purposes. First, a signature must be detected and cropped out of the document. ML or computer vision models can extract that signature no matter how many times it is present in a document. Then, the signature can be used for validating the person’s identity, Know Your Customer (KYC) processing services, or contract and agreement processing.

‍

Banking and finance services especially rely on signatures to verify a person’s identity. As more businesses transition to online platforms, they’re also switching from manually handled verification tasks to signature extraction, which is becoming increasingly accurate. This helps save the time and effort of printing, scanning, emailing, and making changes to documents.

‍

The following are some use cases for signature extraction:

  • Banks: Banks rely on signature verification, where an extracted signature is validated against a ground truth to confirm that it’s from the same person.
  • Real estate: Buying and selling property requires a lot of paperwork. Once the contracts and legal documents are signed and shared across multiple cities or countries, they are normally converted to digital images. The signatures in those images can be extracted for verification.
  • Sales and procurement: Many consumers have already shifted from in-store to e-commerce purchases. Businesses that make purchases, however, must complete sales contracts and other paperwork.
  • Company onboarding: New hires, especially at larger organizations, will need to sign a letter of acceptance, offer letter, and/or nondisclosure agreement, among other documents.
  • Legal agreements: Signatures are required for documents in all types of legal proceedings, such as court cases or estate hearings.

Implementing signature extraction

A signature extraction system can be developed in two ways: traditional computer vision using OpenCV and object detection with deep learning. In this tutorial, you’ll be implementing the first solution using Python 3.9 and Anaconda.

‍

Anaconda ships with Python (version 3.9 at the time of writing) and pip, Python’s package manager. It also includes environments for running your code, like Jupyter Notebook and Spyder. You can use any of them to write the code; Spyder is preferred here because it is more interactive.

‍

Once you have the dependencies set up, you can clone or download this project repository from Ahmet Özlü to follow along.

‍

You should find the following files/folders inside the project repository:

  • The `inputs` folder stores the input images that are passed to the model to extract signatures.
  • The `outputs` folder stores the extracted signatures, or the output images produced by the code.
  • The `signature_extractor.py` file contains the implementation of computer-vision-based connected component analysis. You’ll need to run this file to produce the output.

Install the OpenCV library and other dependencies for the task. You can do so using pip either on the Anaconda prompt or on any terminal provided by Anaconda:

‍

pip install opencv-python
pip install scikit-image

‍

Other libraries like Matplotlib and NumPy already come with Anaconda. If you run into issues, though, you can install them in the same fashion:

‍

pip install matplotlib
pip install numpy

‍

When you open `signature_extractor.py`, you’ll see a lot of code. To better understand the process of signature extraction using connected component analysis, and the meaning behind each code block, follow along with this article and create a new Python file.

‍

First, import the dependencies:

‍

import cv2
import matplotlib.pyplot as plt
from skimage import measure, morphology
from skimage.color import label2rgb
from skimage.measure import regionprops
import numpy as np

‍

Here, the `cv2` (OpenCV) and `skimage` (scikit-image) libraries are used for overall image processing, `numpy` speeds up the mathematical operations applied to the data, and `matplotlib` is used to plot the images.

‍

Read the input image file from the local path and apply preprocessing that will help in the identification of the signature area:

‍

img = cv2.imread('./inputs/in1.jpg', 0)
img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)[1]

‍

Input image

‍

In the above code, the input image is first read from the local path with the flag `0`, which tells OpenCV to load it as a single-channel grayscale image. Then, binary thresholding is applied to the image. Binary thresholding converts every pixel to black or white according to a threshold, in this case `127`. Pixel values less than or equal to the threshold are converted to 0 (black), and values greater than the threshold are converted to 255 (white). The result is a binary image with only two pixel values.
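
The thresholding rule can be checked on a tiny array. The following is a minimal sketch that mimics the `cv2.THRESH_BINARY` rule with NumPy instead of calling OpenCV; the 2x3 `patch` and its pixel values are made up for illustration.

```python
import numpy as np

# Hypothetical 2x3 grayscale patch; the values are chosen for illustration only.
patch = np.array([[0, 126, 127],
                  [128, 200, 255]], dtype=np.uint8)

threshold = 127

# Same rule as cv2.THRESH_BINARY: strictly greater than the threshold -> 255, else 0.
binary = np.where(patch > threshold, 255, 0).astype(np.uint8)

print(binary)
```

Note that a pixel exactly equal to the threshold (`127` here) ends up black, because only values strictly greater than the threshold become white.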

‍

Now that the image is ready, connected component analysis must be applied to detect the connected regions in the image. This helps in identifying the signature area, because the strokes of a signature are joined together. `skimage` provides a function to do this:

‍

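# connected component analysis with the scikit-image framework
# True where a pixel is brighter than the mean (the white background)
blobs = img > img.mean()
# label each connected region of ink; True (white) pixels are treated as background
blobs_labels = measure.label(blobs, background=1)
# color each labeled region and overlay it on the image for visualization
image_label_overlay = label2rgb(blobs_labels, image=img)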

‍

A blob is a set of connected pixels that distinguishes an object from its background. In this case, the text and signature are blobs on a background of white pixels. The first line of code builds a Boolean mask that is true wherever a pixel is brighter than the image’s mean intensity, so the white background is true and the dark ink is false. The next line assigns an integer label to each connected region of ink; `background=1` tells `measure.label` to treat the true (white) pixels as background. Finally, the blob labels are converted to RGB and overlaid on the original image for better visualization.
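
To see what connected component labeling does, here is a minimal pure-Python sketch of the idea. It is not scikit-image’s implementation: the function `label_components` and the sample `ink` grid are hypothetical, the mask marks ink as `True` for readability (the opposite polarity from `blobs` above), and it assumes 8-connectivity, meaning diagonal neighbors count as connected.

```python
from collections import deque


def label_components(mask):
    """Label 8-connected groups of True cells; 0 marks the background."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background cells and cells that already carry a label.
            if not mask[r][c] or labels[r][c]:
                continue
            next_label += 1
            labels[r][c] = next_label
            queue = deque([(r, c)])
            # Breadth-first flood fill over the eight neighbors of each cell.
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


# Hypothetical ink mask: True marks dark (ink) pixels.
ink = [
    [True,  True,  False, False],
    [False, True,  False, True],
    [False, False, False, True],
]
labels = label_components(ink)
for row in labels:
    print(row)
```

The two separate strokes receive the labels `1` and `2`, and every background cell stays `0`, which is the same kind of integer label image that the later steps measure and filter.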

‍

You might want to see the RGB image after connected component analysis. You can do that with Matplotlib:

‍

# draw image
# fix the figure size to (10, 6)
fig, ax = plt.subplots(figsize=(10, 6))

# plot the connected components (for debugging)
ax.imshow(image_label_overlay)
ax.set_axis_off()
plt.tight_layout()
plt.show()

‍

Connected component analysis

‍

So far you’ve read the image, analyzed its components, and visualized it. Generally, a signature will be bigger than other text areas in a document, so you need to do some measurements. Using component analysis, find the biggest component among the blobs:

‍

# initialize the variables to get the biggest component
the_biggest_component = 0
total_area = 0
counter = 0
average = 0.0

# iterate over each blob and get the highest size component
for region in regionprops(blobs_labels):
    # if blob size is greater than 10 then add it to the total area
    if region.area > 10:
        total_area = total_area + region.area
        counter = counter + 1

    # take regions with large enough areas and keep the largest one seen so far
    if region.area >= 250:
        if region.area > the_biggest_component:
            the_biggest_component = region.area

# calculate the average of the blob regions (guard against an image with no blobs)
if counter > 0:
    average = total_area / counter
print("the_biggest_component: " + str(the_biggest_component))
print("average: " + str(average))

‍

The above code finds the largest component by iterating over each blob. If a blob’s area is greater than `10`, it is added to `total_area` and counted so that the average area can be computed. If the blob’s area is at least `250`, it is compared with the largest area found so far; if it is larger, it replaces that value; otherwise, the value stays the same. After the loop, `the_biggest_component` holds the largest area.

‍

As for the sizes this code uses, `10` works well for scanned images because the smallest objects in them are almost always around that size. Meanwhile, the largest object in an image is generally the signature, whose area was greater than `250` in testing.
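
The same bookkeeping can be tried without an image. The sketch below is an illustration only: the helper `summarize_areas` and the list of areas are hypothetical, while the cutoffs `10` and `250` are the ones used in the loop above.

```python
def summarize_areas(areas, min_area=10, candidate_area=250):
    """Return (largest area at or above candidate_area, mean of areas above min_area)."""
    # Only blobs larger than min_area contribute to the average.
    counted = [a for a in areas if a > min_area]
    average = sum(counted) / len(counted) if counted else 0.0
    # Only blobs of at least candidate_area compete for "biggest"; 0 if none qualify.
    biggest = max((a for a in areas if a >= candidate_area), default=0)
    return biggest, average


# Hypothetical blob areas in pixels: a speck, some characters, and two large strokes.
areas = [4, 12, 30, 48, 300, 900]
biggest, average = summarize_areas(areas)
print("the_biggest_component:", biggest)
print("average:", average)
```

The speck of area `4` is ignored, so the average is taken over the remaining five blobs, and the largest qualifying blob is the one with area `900`.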

‍

Next, you need to filter out some outliers that might get confused with the signature blob:

‍

# the parameters are used to remove outliers of small size connected pixels
constant_parameter_1 = 84
constant_parameter_2 = 250
constant_parameter_3 = 100

# the parameter is used to remove outliers of large size connected pixels
constant_parameter_4 = 18

‍

The values above were chosen empirically, by testing different blob-size cutoffs for signature extraction.

‍

For outlier removal, you need to define some thresholds. There are four parameters initialized above: three for small size outlier removal, and one for big size outlier removal. First, check the small size outliers to remove:

‍

# experimental-based ratio calculation, modify it for your cases
a4_small_size_outlier_constant = ((average/constant_parameter_1)*constant_parameter_2)+constant_parameter_3
print("a4_small_size_outlier_constant: " + str(a4_small_size_outlier_constant))

‍

Above, `a4_small_size_outlier_constant` is used as a threshold value to remove outlier connected pixels that are smaller than it in A4 size scanned documents.

‍

Similarly, check the big size outliers:

‍

# experimental-based ratio calculation, modify it for your cases
a4_big_size_outlier_constant = a4_small_size_outlier_constant*constant_parameter_4
print("a4_big_size_outlier_constant: " + str(a4_big_size_outlier_constant))

‍

Here, `a4_big_size_outlier_constant` is used as a threshold value to remove outlier connected pixels that are bigger than it in A4 size scanned documents.
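
To make the two formulas concrete, here is a small worked sketch. The helper name `outlier_thresholds` and the sample average of `168.0` are assumptions for illustration; the four default constants are the ones defined earlier in this tutorial.

```python
def outlier_thresholds(average, p1=84, p2=250, p3=100, p4=18):
    """Return the (small, big) size thresholds derived from the average blob area."""
    # Small-size threshold: scale the average, then add a fixed offset.
    small = (average / p1) * p2 + p3
    # Big-size threshold: a fixed multiple of the small-size threshold.
    big = small * p4
    return small, big


# Hypothetical average blob area of 168 pixels.
small, big = outlier_thresholds(168.0)
print("a4_small_size_outlier_constant:", small)
print("a4_big_size_outlier_constant:", big)
```

With an average of 168, the small threshold is (168 / 84) * 250 + 100 = 600 and the big threshold is 600 * 18 = 10800, so only blobs with areas between those two values would survive the filtering.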

‍

Once you have these thresholds, you can use the `morphology` module to remove the small outliers from your blob collection and a size lookup to remove the large ones. You can then store the image locally, and it should be close to the final result:

‍

# remove the connected pixels that are smaller than threshold a4_small_size_outlier_constant
pre_version = morphology.remove_small_objects(blobs_labels, a4_small_size_outlier_constant)
# remove the connected pixels that are bigger than threshold a4_big_size_outlier_constant
component_sizes = np.bincount(pre_version.ravel())
too_big = component_sizes > a4_big_size_outlier_constant
too_big_mask = too_big[pre_version]
pre_version[too_big_mask] = 0
# save the pre-version, which is the image with color labels after connected component analysis
plt.imsave('pre_version.png', pre_version)
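
The `np.bincount` lookup used for the large outliers is easier to follow on a toy label image. This is a sketch under assumptions: the 3x4 `labels` array and the `max_size` limit are made up, and the variable is called `too_big` here for clarity.

```python
import numpy as np

# Hypothetical label image: 0 is background, labels 1-3 are components.
labels = np.array([
    [1, 1, 0, 2],
    [1, 1, 0, 0],
    [3, 3, 3, 3],
])

max_size = 3  # hypothetical upper size limit, in pixels

# component_sizes[k] is the number of pixels that carry label k.
component_sizes = np.bincount(labels.ravel())
# Lookup table: True for every label whose pixel count exceeds the limit.
too_big = component_sizes > max_size
# Indexing the table with the label image gives a per-pixel removal mask.
too_big_mask = too_big[labels]

cleaned = labels.copy()
cleaned[too_big_mask] = 0

print(component_sizes)
print(cleaned)
```

Components 1 and 3 each cover four pixels, so they are erased, while the single-pixel component 2 is kept.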

‍

`pre_version.png` is the image obtained after all the preprocessing. As a final step, read this image again and apply Otsu’s thresholding, which picks the threshold automatically from the image’s histogram instead of using a fixed value:

‍

# read the pre-version
img = cv2.imread('pre_version.png', 0)
# ensure a binary image with Otsu’s method
img = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
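
For intuition about what Otsu’s method computes, the following is a rough NumPy sketch that searches for the threshold maximizing the between-class variance. It is an illustration rather than OpenCV’s implementation; the function `otsu_threshold` and the tiny `gray` image are hypothetical.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the threshold t that best separates pixels <= t from pixels > t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total = hist.sum()
    levels = np.arange(256)
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 = hist[: t + 1].sum()   # number of pixels in the dark class
        w1 = total - w0            # number of pixels in the bright class
        if w0 == 0 or w1 == 0:
            continue               # one class is empty, nothing to separate
        mu0 = (hist[: t + 1] * levels[: t + 1]).sum() / w0
        mu1 = (hist[t + 1:] * levels[t + 1:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Hypothetical image with two clear intensity groups.
gray = np.array([[10, 10, 200],
                 [10, 200, 200]], dtype=np.uint8)
t = otsu_threshold(gray)

# Inverted binary rule: pixels above the threshold become 0, the rest 255.
inverted = np.where(gray > t, 0, 255).astype(np.uint8)
print(inverted)
```

Any threshold between the two intensity groups separates them equally well, so the exact value returned matters less than the fact that it falls between the dark and bright pixels.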

‍

To store the final image with signature only, use OpenCV’s write function to save the result:

‍

cv2.imwrite("./outputs/output.png", img)

‍

Output image

‍

Now your document signature extractor is ready.

‍

This signature extractor removes all other entities from the document and leaves only the signature area. If you want to extract the exact location of the signature, you’ll have to use the object detection technique.

‍

Note that you need a decent amount of data to train the object detection model, typically around 200 images. You can follow the steps in this article to prepare your data and train the object detection model for signature extraction.

‍

The best way to extract signatures with Dropbox Sign

You should now have a better understanding of signature extraction and its use cases, as well as how to create a signature extractor. This process offers benefits to a number of industries, because it increases automation and speeds up document processing while reducing human error and freeing up team members to focus on other tasks.

‍

Instead of creating your own signature extractor, though, you can use a ready-made solution. One such application is Dropbox Sign. Its API allows you to sign and track eSignatures while still keeping those documents secure. Dropbox Sign easily integrates into your site or application for a seamless experience. To learn more, check out Dropbox Sign’s documentation.
