Text Recognition Tool
Google Cloud Vision API
class layoutparser.ocr.GCVFeatureType
Bases: layoutparser.ocr.base.BaseOCRElementType
The element types from Google Cloud Vision API.
- PAGE = 0
- BLOCK = 1
- PARA = 2
- WORD = 3
- SYMBOL = 4
- property child_level
class layoutparser.ocr.GCVAgent(languages=None, ocr_image_decode_type='.png')
Bases: layoutparser.ocr.base.BaseOCRAgent
A wrapper for Google Cloud Vision (GCV) Text Detection APIs.
Note
Google Cloud Vision API returns the output text in two types:
- text_annotations: In this format, GCV automatically finds the best aggregation level for the text and returns the results in a list. We use gather_text_annotations to retrieve this type of information.
- full_text_annotation: To support better user control, GCV also provides the full_text_annotation output, which returns the hierarchical structure of the output text. To process this output, we provide the gather_full_text_annotation function, which aggregates the text at the given aggregation level.
Create a Google Cloud Vision OCR Agent.
- Parameters
  - languages (list, optional) – The language codes of the documents to detect; specifying them can improve accuracy. The supported languages and their codes can be found on this page. Defaults to None.
  - ocr_image_decode_type (str, optional) – The format to convert the input image to before sending it for GCV OCR. Defaults to '.png'. '.png' is suggested because it is lossless, but '.jpg' could also be a good choice if the input image is very large.
- DEPENDENCIES = ['google-cloud-vision']
- classmethod with_credential(credential_path, **kwargs)
  Specify the credential to use for the GCV OCR API.
  - Parameters
    - credential_path (str) – The path to the credential file.
- detect(image, return_response=False, return_only_text=False, agg_output_level=None)
  Send the input image for OCR.
  - Parameters
    - image (np.ndarray or str) – The input image array or the name of the image file.
    - return_response (bool, optional) – Whether to directly return the Google Cloud response. Defaults to False.
    - return_only_text (bool, optional) – Whether to return only the text in the OCR results. Defaults to False.
    - agg_output_level (GCVFeatureType, optional) – When set, aggregate the GCV output with respect to the specified aggregation level. Defaults to None.
- static gather_text_annotations(response)
  Convert the text_annotations from the GCV output to a Layout object.
  - Parameters
    - response (AnnotateImageResponse) – The returned Google Cloud Vision AnnotateImageResponse object.
  - Returns
    The layout retrieved from the response.
  - Return type
    Layout
- static gather_full_text_annotation(response, agg_level)
  Convert the full_text_annotation from the GCV output to a Layout object.
  - Parameters
    - response (AnnotateImageResponse) – The returned Google Cloud Vision AnnotateImageResponse object.
    - agg_level (GCVFeatureType) – The layout level at which to aggregate the text in full_text_annotation.
  - Returns
    The layout retrieved from the response.
  - Return type
    Layout
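The credential, detection, and aggregation steps documented above can be combined roughly as follows. This is a minimal sketch rather than a definitive recipe: it assumes that layoutparser and google-cloud-vision are installed, that the classes are importable from layoutparser.ocr as the class paths above indicate, and that a valid credential file is available. The helper name ocr_words_with_gcv and the language code "en" are hypothetical choices made for illustration.

```python
def ocr_words_with_gcv(image_path, credential_path):
    """Run GCV OCR on one image and return a word-level Layout.

    Hypothetical helper for illustration; calling it contacts the GCV service.
    """
    # Imported lazily so that defining this helper does not require the
    # optional dependencies to be installed.
    from layoutparser.ocr import GCVAgent, GCVFeatureType

    # Assumption: the extra keyword arguments accepted by with_credential
    # are passed on to the GCVAgent constructor.
    agent = GCVAgent.with_credential(credential_path, languages=["en"])

    # Keep the raw AnnotateImageResponse so it can be aggregated at
    # different levels without sending the image again.
    response = agent.detect(image_path, return_response=True)

    # Aggregate the hierarchical full_text_annotation at the word level.
    return agent.gather_full_text_annotation(response, agg_level=GCVFeatureType.WORD)
```

Keeping the response and aggregating it afterwards means the same OCR result can be regrouped at another level, for example GCVFeatureType.PARA, without a second request.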
Tesseract OCR API
class layoutparser.ocr.TesseractFeatureType
Bases: layoutparser.ocr.base.BaseOCRElementType
The element types for the Tesseract Detection API.
- PAGE = 0
- BLOCK = 1
- PARA = 2
- LINE = 3
- WORD = 4
- property group_levels
class layoutparser.ocr.TesseractAgent(languages='eng', **kwargs)
Bases: layoutparser.ocr.base.BaseOCRAgent
A wrapper for Tesseract Text Detection APIs based on PyTesseract.
Create a Tesseract OCR Agent.
- Parameters
  - languages (list or str, optional) – The language code(s) of the documents to detect; specifying them can improve accuracy. The supported languages and their codes can be found in the Tesseract GitHub repository. Two formats are supported: 1) a single string such as 'eng+fra', or 2) a list of strings such as ['eng', 'fra']. Defaults to 'eng'.
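The two accepted formats for languages can be reduced to a single Tesseract-style string, as sketched below. The helper normalize_languages is hypothetical and is not taken from the layoutparser source; it only illustrates how a list of codes maps onto the "+"-joined form.

```python
def normalize_languages(languages):
    """Return a Tesseract-style language string such as 'eng+fra'.

    Hypothetical helper: it mirrors the two input formats described
    above, but it is not the library's own implementation.
    """
    if isinstance(languages, str):
        # Already in the 'eng+fra' form; pass it through unchanged.
        return languages
    # A list (or tuple) of codes is joined with '+'.
    return "+".join(languages)


print(normalize_languages("eng+fra"))        # eng+fra
print(normalize_languages(["eng", "fra"]))   # eng+fra
```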
- DEPENDENCIES = ['pytesseract']
- detect(image, return_response=False, return_only_text=True, agg_output_level=None)
  Send the input image for OCR.
  - Parameters
    - image (np.ndarray or str) – The input image array or the name of the image file.
    - return_response (bool, optional) – Whether to directly return all output (string and box information) from Tesseract. Defaults to False.
    - return_only_text (bool, optional) – Whether to return only the text in the OCR results. Defaults to True.
    - agg_output_level (TesseractFeatureType, optional) – When set, aggregate the Tesseract output with respect to the specified aggregation level. Defaults to None.
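A call to detect might look like the sketch below. It assumes that layoutparser, pytesseract, and the Tesseract binary with the requested language data are installed, and that the classes are importable from layoutparser.ocr as the class paths above indicate. The helper name ocr_lines_with_tesseract is hypothetical, and the behavior of the second call (line-level aggregation once return_only_text is disabled) is an assumption based on the parameter descriptions, not a confirmed contract.

```python
def ocr_lines_with_tesseract(image_path):
    """Return (plain_text, line_level_result) for one image.

    Hypothetical helper for illustration; it runs OCR twice for clarity.
    """
    # Imported lazily so that defining this helper does not require the
    # optional dependencies to be installed.
    from layoutparser.ocr import TesseractAgent, TesseractFeatureType

    # Languages given in the list form; 'eng+fra' would be equivalent.
    agent = TesseractAgent(languages=["eng", "fra"])

    # Per the signature, return_only_text defaults to True, so this call
    # is expected to return just the recognized text.
    text = agent.detect(image_path)

    # Assumption: with return_only_text disabled, agg_output_level groups
    # the OCR output at the requested level (here, lines).
    lines = agent.detect(
        image_path,
        return_only_text=False,
        agg_output_level=TesseractFeatureType.LINE,
    )
    return text, lines
```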