Text Recognition Tool

Google Cloud Vision API

class layoutparser.ocr.GCVFeatureType[source]

Bases: layoutparser.ocr.BaseOCRElementType

The element types from Google Cloud Vision API

PAGE = 0
BLOCK = 1
PARA = 2
WORD = 3
SYMBOL = 4
property child_level
class layoutparser.ocr.GCVAgent(languages=None, ocr_image_decode_type='.png')[source]

Bases: layoutparser.ocr.BaseOCRAgent

A wrapper for Google Cloud Vision (GCV) Text Detection APIs.

Note

Google Cloud Vision API returns the output text in two types:

  • text_annotations:

    In this format, GCV automatically finds the best aggregation level for the text and returns the results in a list. We use gather_text_annotations to retrieve this type of information.

  • full_text_annotation:

    To support better user control, GCV also provides the full_text_annotation output, where it returns the hierarchical structure of the output text. To process this output, we provide the gather_full_text_annotation function to aggregate the texts of the given aggregation level.
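
    The idea of aggregating a hierarchical result at a chosen level can be sketched in plain Python. This is only an illustration under stated assumptions: the nested dicts below are a simplified, hypothetical stand-in for the page/block/paragraph/word/symbol hierarchy (real responses are protobuf messages), and gather_level is a made-up helper, not the library's gather_full_text_annotation.

```python
# Simplified, hypothetical stand-in for the full_text_annotation hierarchy.
# LEVELS[d] is the key under which a node at depth d stores its children.
LEVELS = ["pages", "blocks", "paragraphs", "words", "symbols"]


def node_text(node, depth):
    """Concatenate all symbol texts found underneath ``node``."""
    if depth == len(LEVELS):  # the node is a symbol
        return node["text"]
    key = LEVELS[depth]
    sep = "" if key == "symbols" else " "  # symbols form a word; the rest is space-joined
    return sep.join(node_text(child, depth + 1) for child in node[key])


def gather_level(annotation, level):
    """Return one text string per element at ``level`` (e.g. "words")."""
    target_depth = LEVELS.index(level) + 1
    results = []

    def walk(node, depth):
        if depth == target_depth:
            results.append(node_text(node, depth))
            return
        for child in node[LEVELS[depth]]:
            walk(child, depth + 1)

    walk(annotation, 0)
    return results


def make_word(text):
    """Build a word node from a string, one symbol per character."""
    return {"symbols": [{"text": ch} for ch in text]}


annotation = {"pages": [{"blocks": [
    {"paragraphs": [{"words": [make_word("Hello"), make_word("world")]}]},
    {"paragraphs": [{"words": [make_word("Second")]},
                    {"words": [make_word("block")]}]},
]}]}

words = gather_level(annotation, "words")        # ["Hello", "world", "Second", "block"]
paragraphs = gather_level(annotation, "paragraphs")  # ["Hello world", "Second", "block"]
blocks = gather_level(annotation, "blocks")      # ["Hello world", "Second block"]
```

    The same walk yields coarser or finer results depending only on the requested level, which is the kind of control the hierarchical output is meant to give.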

Create a Google Cloud Vision OCR Agent.

Parameters
  • languages (list, optional) – The language code(s) of the documents to detect; specifying them can improve accuracy. The supported languages and their codes are listed in the Google Cloud Vision documentation. Defaults to None.

  • ocr_image_decode_type (str, optional) –

    The format to convert the input image to before sending for GCV OCR. Defaults to “.png”.

    • “.png” is recommended because it is a lossless format, so no image detail is lost.

    • “.jpg” can be a better choice when the input image is very large.

DEPENDENCIES = ['google-cloud-vision']
MODULES = [{'import_name': '_vision', 'module_path': 'google.cloud.vision'}, {'import_name': '_json_format', 'module_path': 'google.protobuf.json_format'}]
classmethod with_credential(credential_path, **kwargs)[source]

Specify the credential to use for the GCV OCR API.

Parameters

credential_path (str) – The path to the credential file

detect(image, return_response=False, return_only_text=False, agg_output_level=None)[source]

Send the input image for OCR.

Parameters
  • image (np.ndarray or str) – The input image array or the name of the image file

  • return_response (bool, optional) – Whether to directly return the Google Cloud response. Defaults to False.

  • return_only_text (bool, optional) – Whether to return only the text in the OCR results. Defaults to False.

  • agg_output_level (GCVFeatureType, optional) – When set, aggregate the GCV output with respect to the specified aggregation level. Defaults to None.
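
A typical call sequence might look like the sketch below. It uses only the signatures documented in this section; run_gcv_ocr is a hypothetical wrapper name, the language code "en" and the credential path are placeholders, and the import is done lazily so the snippet loads even when the optional google-cloud-vision dependency is missing.

```python
def run_gcv_ocr(image, credential_path):
    """Run GCV OCR on ``image`` and return (raw response, word-level Layout).

    ``image`` is an np.ndarray or an image file name; ``credential_path``
    points to a Google Cloud credential file supplied by the caller.
    """
    # Lazy import: requires layoutparser plus the 'google-cloud-vision' package.
    from layoutparser.ocr import GCVAgent, GCVFeatureType

    agent = GCVAgent.with_credential(credential_path, languages=["en"])

    # Keep the raw response so it can be aggregated at any level afterwards.
    response = agent.detect(image, return_response=True)

    # Aggregate the hierarchical output at the word level.
    words = GCVAgent.gather_full_text_annotation(
        response, agg_level=GCVFeatureType.WORD
    )
    return response, words
```

Requesting the raw response once and aggregating it afterwards avoids a second API call when results at several levels are needed. Passing agg_output_level to detect is the documented alternative when only one level is required.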

static gather_text_annotations(response)[source]

Convert the text_annotations from the GCV output to a Layout object.

Parameters

response (AnnotateImageResponse) – The returned Google Cloud Vision AnnotateImageResponse object.

Returns

The layout retrieved from the response.

Return type

Layout

static gather_full_text_annotation(response, agg_level)[source]

Convert the full_text_annotation from the GCV output to a Layout object.

Parameters
  • response (AnnotateImageResponse) – The returned Google Cloud Vision AnnotateImageResponse object.

  • agg_level (GCVFeatureType) – The layout level to aggregate the text in full_text_annotation.

Returns

The layout retrieved from the response.

Return type

Layout

load_response(filename)[source]
save_response(res, file_name)[source]

Tesseract OCR API

class layoutparser.ocr.TesseractFeatureType[source]

Bases: layoutparser.ocr.BaseOCRElementType

The element types for Tesseract Detection API

PAGE = 0
BLOCK = 1
PARA = 2
LINE = 3
WORD = 4
property group_levels
class layoutparser.ocr.TesseractAgent(languages='eng', **kwargs)[source]

Bases: layoutparser.ocr.BaseOCRAgent

A wrapper for Tesseract Text Detection APIs based on PyTesseract.

Create a Tesseract OCR Agent.

Parameters

languages (list or str, optional) – The language code(s) of the documents to detect; specifying them can improve accuracy. The supported languages and their codes can be found in the Tesseract GitHub repo. Two formats are supported: 1) a single string such as “eng+fra”, or 2) a list of strings such as [“eng”, “fra”]. Defaults to ‘eng’.
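
The two accepted formats can be reconciled as in the sketch below. normalize_languages is a hypothetical helper written for illustration; it assumes a list is simply joined with "+", the form Tesseract uses for multiple languages, and is not taken from the library's source.

```python
def normalize_languages(languages="eng"):
    """Return a Tesseract-style language string such as "eng+fra".

    Hypothetical helper: accepts either a "+"-joined string or a list of codes.
    """
    if isinstance(languages, (list, tuple)):
        return "+".join(languages)
    return languages


from_list = normalize_languages(["eng", "fra"])  # "eng+fra"
from_string = normalize_languages("eng+fra")     # "eng+fra"
default = normalize_languages()                  # "eng"
```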

DEPENDENCIES = ['pytesseract']
MODULES = [{'import_name': '_pytesseract', 'module_path': 'pytesseract'}]
classmethod with_tesseract_executable(tesseract_cmd_path, **kwargs)[source]
detect(image, return_response=False, return_only_text=True, agg_output_level=None)[source]

Send the input image for OCR.

Parameters
  • image (np.ndarray or str) – The input image array or the name of the image file

  • return_response (bool, optional) – Whether to directly return all output (text and bounding-box information) from Tesseract. Defaults to False.

  • return_only_text (bool, optional) – Whether to return only the text in the OCR results. Defaults to True.

  • agg_output_level (TesseractFeatureType, optional) – When set, aggregate the Tesseract output with respect to the specified aggregation level. Defaults to None.
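
A possible usage pattern, based only on the signatures documented in this section, is sketched below. run_tesseract_ocr is a hypothetical wrapper name, and the import is lazy so the snippet loads without the optional pytesseract dependency.

```python
def run_tesseract_ocr(image, languages="eng"):
    """Run Tesseract on ``image`` and return (plain text, word-level result).

    ``image`` is an np.ndarray or an image file name.
    """
    # Lazy import: requires layoutparser, pytesseract, and a Tesseract install.
    from layoutparser.ocr import TesseractAgent, TesseractFeatureType

    agent = TesseractAgent(languages=languages)

    # With the signature's default (return_only_text=True) only text is returned.
    text = agent.detect(image)

    # Ask for the full response to get bounding boxes and confidences,
    # then aggregate it at the word level. Note this runs OCR a second time.
    response = agent.detect(image, return_response=True)
    words = TesseractAgent.gather_data(
        response, agg_level=TesseractFeatureType.WORD
    )
    return text, words
```

If the Tesseract binary is not on the PATH, the with_tesseract_executable classmethod listed above can be used to construct the agent instead.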

static gather_data(response, agg_level)[source]

Gather the OCR’ed text, bounding boxes, and confidence scores at a given aggregation level.
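
What such an aggregation involves can be sketched in plain Python. This is an illustration, not the library's implementation: aggregate_rows is a hypothetical helper, the rows are hand-written sample data, and the column names are assumed to follow pytesseract's image_to_data output (block_num, par_num, line_num, left, top, width, height, conf, text).

```python
from collections import defaultdict


def aggregate_rows(rows, keys):
    """Group word-level rows by ``keys`` and merge text, boxes, and confidence."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    merged = []
    for group_key in sorted(groups):
        members = groups[group_key]
        merged.append({
            # Union of the member boxes as (x1, y1, x2, y2).
            "box": (
                min(r["left"] for r in members),
                min(r["top"] for r in members),
                max(r["left"] + r["width"] for r in members),
                max(r["top"] + r["height"] for r in members),
            ),
            "text": " ".join(r["text"] for r in members),
            # Mean confidence of the member words.
            "conf": sum(r["conf"] for r in members) / len(members),
        })
    return merged


# Hand-written sample rows (not real OCR output).
rows = [
    {"block_num": 1, "par_num": 1, "line_num": 1,
     "left": 10, "top": 10, "width": 40, "height": 12, "conf": 90, "text": "Hello"},
    {"block_num": 1, "par_num": 1, "line_num": 1,
     "left": 55, "top": 11, "width": 42, "height": 12, "conf": 80, "text": "world"},
    {"block_num": 1, "par_num": 1, "line_num": 2,
     "left": 10, "top": 30, "width": 30, "height": 12, "conf": 70, "text": "Bye"},
]

lines = aggregate_rows(rows, keys=("block_num", "par_num", "line_num"))
# lines[0] == {"box": (10, 10, 97, 23), "text": "Hello world", "conf": 85.0}
# lines[1] == {"box": (10, 30, 40, 42), "text": "Bye", "conf": 70.0}
```

Using fewer grouping keys, for example only block_num, produces a coarser aggregation with the same code.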

static load_response(filename)[source]
static save_response(res, file_name)[source]