Welcome to Layout Parser’s documentation!¶
Installation¶
Install Python¶
LayoutParser is a Python package that requires Python >= 3.6. If you do not have Python installed on your computer, please refer to the official instructions to download and install the appropriate version of Python.
Install the LayoutParser library¶
After several major updates, LayoutParser provides various functionalities and deep learning models from different backends. However, you might only need a fraction of the functions, and it would be redundant to install all the dependencies when they are not required. Therefore, we have designed highly customizable ways of installing the LayoutParser library:
Command | Description |
---|---|
pip install layoutparser | Install the base LayoutParser library. It will support all key functions in LayoutParser, including: 1. Layout data structure and operations, 2. Layout visualization, 3. Load/export the layout data |
pip install "layoutparser[effdet]" | Install LayoutParser with layout detection model support. It will install the LayoutParser base library as well as supporting dependencies for the EfficientDet-based layout detection models. |
pip install layoutparser torchvision && pip install "git+https://github.com/facebookresearch/detectron2.git@v0.5#egg=detectron2" | Install LayoutParser with layout detection model support. It will install the LayoutParser base library as well as supporting dependencies for the Detectron2-based layout detection models. See details in Additional Instruction: Install Detectron2 Layout Model Backend. |
pip install "layoutparser[paddledetection]" | Install LayoutParser with layout detection model support. It will install the LayoutParser base library as well as supporting dependencies for the PaddleDetection-based layout detection models. |
pip install "layoutparser[ocr]" | Install LayoutParser with OCR support. It will install the LayoutParser base library as well as supporting dependencies for performing OCR. See details in Additional Instruction: Install OCR utils. |
Additional Instruction: Install Detectron2 Layout Model Backend¶
For Mac OS and Linux Users¶
If you would like to use the Detectron2 models for layout detection, you might need to run the following command:
pip install layoutparser torchvision && pip install "detectron2@git+https://github.com/facebookresearch/detectron2.git@v0.5#egg=detectron2"
This might take some time as the command will compile the library. If you also want to install a Detectron2 version with GPU support or encounter some issues during the installation process, please refer to the official Detectron2 installation instructions for detailed information.
For Windows users¶
As reported by many users, the installation of Detectron2 can be rather tricky on Windows platforms. In our extensive tests, we find that it is nearly impossible to provide a one-line installation command for Windows users. As a workaround solution, for now we list the possible challenges for installing Detectron2 on Windows, and attach helpful resources for solving them. We are also investigating other possibilities to avoid installing Detectron2 to use pre-trained models. If you have any suggestions or ideas, please feel free to submit an issue in our repo.
- Challenges for installing pycocotools
  - You can find detailed instructions in this post from Chang Hsin Lee.
  - Another solution is to try installing pycocotools-windows; see https://github.com/cocodataset/cocoapi/issues/415.
- Challenges for installing Detectron2
  - @ivanpp curates a detailed description for installing Detectron2 on Windows: Detectron2 walkthrough (Windows)
  - Detectron2 maintainers claim that they won't provide official support for Windows (see 1 and 2), but Detectron2 is continuously built on Windows with CircleCI (see 3). Hopefully this situation will be improved in the future.
Additional Instructions: Install OCR utils¶
Layout Parser also comes with support for OCR functions. In order to use them, you need to install the OCR utils via:
pip install "layoutparser[ocr]"
Additionally, if you want to use the Tesseract-OCR engine, you also need to install it on your computer. Please check the official documentation for detailed installation instructions.
Known issues¶
Error: instantiating `lp.GCVAgent.with_credential` returns module 'google.cloud.vision' has no attribute 'types'.
In this case, you have a newer version of google-cloud-vision installed. Please consider downgrading the API using:
pip install -U layoutparser[ocr]
Model Zoo¶
We provide a spectrum of pre-trained models on different datasets.
Example Usage:¶
import layoutparser as lp
model = lp.Detectron2LayoutModel(
config_path ='lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config', # In model catalog
label_map ={0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"}, # In model label_map
extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8] # Optional
)
model.detect(image)
Model Catalog¶
Dataset | Model | Config Path | Eval Result (mAP) |
---|---|---|---|
HJDataset | faster_rcnn_R_50_FPN_3x | lp://HJDataset/faster_rcnn_R_50_FPN_3x/config | |
HJDataset | mask_rcnn_R_50_FPN_3x | lp://HJDataset/mask_rcnn_R_50_FPN_3x/config | |
HJDataset | retinanet_R_50_FPN_3x | lp://HJDataset/retinanet_R_50_FPN_3x/config | |
PubLayNet | faster_rcnn_R_50_FPN_3x | lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config | |
PubLayNet | mask_rcnn_R_50_FPN_3x | lp://PubLayNet/mask_rcnn_R_50_FPN_3x/config | |
PubLayNet | mask_rcnn_X_101_32x8d_FPN_3x | lp://PubLayNet/mask_rcnn_X_101_32x8d_FPN_3x/config | 88.98 eval.csv |
PrimaLayout | mask_rcnn_R_50_FPN_3x | lp://PrimaLayout/mask_rcnn_R_50_FPN_3x/config | 69.35 eval.csv |
NewspaperNavigator | faster_rcnn_R_50_FPN_3x | lp://NewspaperNavigator/faster_rcnn_R_50_FPN_3x/config | |
TableBank | faster_rcnn_R_50_FPN_3x | lp://TableBank/faster_rcnn_R_50_FPN_3x/config | 89.78 eval.csv |
TableBank | faster_rcnn_R_101_FPN_3x | lp://TableBank/faster_rcnn_R_101_FPN_3x/config | 91.26 eval.csv |
Math Formula Detection (MFD) | faster_rcnn_R_50_FPN_3x | lp://MFD/faster_rcnn_R_50_FPN_3x/config | 79.68 eval.csv |
For PubLayNet models, we suggest using the mask_rcnn_X_101_32x8d_FPN_3x model, as it is trained on the whole training set, while the others are only trained on the validation set (whose size is only around 1/50 of the training set). You could expect a 15% AP improvement using the mask_rcnn_X_101_32x8d_FPN_3x model.
Model label_map¶
Dataset | Label Map |
---|---|
HJDataset | {1:"Page Frame", 2:"Row", 3:"Title Region", 4:"Text Region", 5:"Title", 6:"Subtitle", 7:"Other"} |
PubLayNet | {0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"} |
PrimaLayout | {1:"TextRegion", 2:"ImageRegion", 3:"TableRegion", 4:"MathsRegion", 5:"SeparatorRegion", 6:"OtherRegion"} |
NewspaperNavigator | {0: "Photograph", 1: "Illustration", 2: "Map", 3: "Comics/Cartoon", 4: "Editorial Cartoon", 5: "Headline", 6: "Advertisement"} |
TableBank | {0: "Table"} |
MFD | {1: "Equation"} |
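For example, here is a minimal sketch of loading the PrimaLayout model from the catalog above together with its label map (assuming the Detectron2 backend is installed and an image array has already been loaded):

import layoutparser as lp

model = lp.Detectron2LayoutModel(
    'lp://PrimaLayout/mask_rcnn_R_50_FPN_3x/config',
    label_map={1:"TextRegion", 2:"ImageRegion", 3:"TableRegion",
               4:"MathsRegion", 5:"SeparatorRegion", 6:"OtherRegion"},
    extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8]  # optional confidence threshold
)
layout = model.detect(image)  # image is the pre-loaded image array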
OCR tables and parse the output¶
In this tutorial, we will illustrate how easily the layoutparser APIs can be used for:

- Recognizing texts in images and storing the results with the specified OCR engine
- Postprocessing the textual results to create structured data
import layoutparser as lp
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
import numpy as np
import cv2
Initiate GCV OCR engine and check the image¶
Currently, layoutparser supports two types of OCR engines: the Google Cloud Vision and Tesseract OCR engines, and we are going to provide more support in the future. In this tutorial, we will use the Google Cloud Vision engine as an example.
ocr_agent = lp.GCVAgent.with_credential("<path/to/your/credential>",
languages = ['en'])
The language_hints parameter tells GCV which languages should be used for OCR. For a detailed explanation, please check here.
The example-table
is a scan with complicated table structures from
https://stacks.cdc.gov/view/cdc/42482/.
image = cv2.imread('data/example-table.jpeg')
plt.imshow(image);

Load images and send for OCR¶
The ocr_agent.detect method can take the image array, or simply the path of the image, for OCR. By default it will return the text in the image, i.e., text = ocr_agent.detect(image). However, as the layout is complex, the text information alone is not enough: we would like to analyze the response from the GCV engine directly. To do so, we can set return_response to True. This feature is also supported for other OCR engines like TesseractAgent.
res = ocr_agent.detect(image, return_response=True)
# Alternative
# res = ocr_agent.detect('data/example-table.jpeg', return_response=True)
Parse the OCR output and visualize the layout¶
As defined by GCV, there are two different types of output in the response:

- text_annotations: In this format, GCV automatically finds the best aggregation level for the text and returns the results in a list. We can use the ocr_agent.gather_text_annotations function to retrieve this type of information.
- full_text_annotation: To support better user control, GCV also provides the full_text_annotation output, where it returns the hierarchical structure of the output text. To process this output, we provide the ocr_agent.gather_full_text_annotation function to aggregate the texts at the given aggregation level. There are 5 levels specified in GCVFeatureType, namely: PAGE, BLOCK, PARA, WORD, and SYMBOL.
texts = ocr_agent.gather_text_annotations(res)
# collect all the texts without coordinates
layout = ocr_agent.gather_full_text_annotation(res, agg_level=lp.GCVFeatureType.WORD)
# collect all the layout elements of the `WORD` level
And we can use the draw_box
or draw_text
functions to quickly
visualize the detected layout and text information.
These functions are highly customizable. You can change styles of the drawn boxes and texts easily. Please check the documentation for the detailed explanation of the configurable parameters.
As shown below, the draw_text function generates a visualization that:

- draws the detected layout with text on the left side and shows the original image on the right canvas for comparison, and
- also draws a red bounding box for each text region on the text canvas (left).
lp.draw_text(image, layout, font_size=12, with_box_on_text=True,
text_box_width=1)

Filter the returned text blocks¶
We find that the coordinates of the residence column are in the range of \(y\in(300,833)\) and \(x\in(132, 264)\). The layout.filter_by function can be used to fetch the texts in this region.
Note: As the OCR engine usually does not provide advanced functions like table detection, the coordinates are found manually by using some image inspecting tools like GIMP.
filtered_residence = layout.filter_by(
lp.Rectangle(x_1=132, y_1=300, x_2=264, y_2=840)
)
lp.draw_text(image, filtered_residence, font_size=16)

Similarly, we can do that for the lot_number column. As sometimes there could be irregularities in the layout as well as the OCR outputs, the layout.filter_by function also supports a soft_margin argument to handle this issue and generate more robust outputs.
filter_lotno = layout.filter_by(
lp.Rectangle(x_1=810, y_1=300, x_2=910, y_2=840),
soft_margin = {"left":10, "right":20} # Without it, the last 4 rows could not be included
)
lp.draw_text(image, filter_lotno, font_size=16)

Group rows based on hard-coded parameters¶
As there are 13 rows, we can iterate over the rows and fetch the row-based information:
y_0 = 307
n_rows = 13
height = 41
y_1 = y_0+n_rows*height
row = []
for y in range(y_0, y_1, height):
interval = lp.Interval(y,y+height, axis='y')
residence_row = filtered_residence.\
filter_by(interval).\
get_texts()
lotno_row = filter_lotno.\
filter_by(interval).\
get_texts()
row.append([''.join(residence_row), ''.join(lotno_row)])
row
[['LosAngeles', 'E6037'],
['LosAngeles', 'E6037'],
['LosAngeles', 'E6037'],
['Oakland', '?'],
['Riverside', 'E5928'],
['LosAngeles', 'E6037'],
['LongBeach', '?E6038'],
['LongBeach', '11'],
['Maricopa', '?E5928'],
['FallsChurch', '8122-649334'],
['ChaseCity', '8122-64933?'],
['Houston', '7078-649343'],
['Scott', '7078-649342']]
An Alternative Method - Adaptively Grouping Lines Based on Distances¶
blocks = filter_lotno
blocks = sorted(blocks, key = lambda x: x.coordinates[1])
# Sort the blocks vertically from top to bottom
distances = np.array([b2.coordinates[1] - b1.coordinates[3] for (b1, b2) in zip(blocks, blocks[1:])])
# Calculate the distances:
# y coord for the upper edge of the bottom block -
# y coord for the bottom edge of the upper block
# And convert to np array for easier post processing
plt.hist(distances, bins=50);
plt.axvline(x=3, color='r');
# Let's have some visualization

According to the distance distribution plot, as well as the OCR results visualization, we can conclude:

- Negative distances occur because there are multiple texts in the same line, e.g., "Los Angeles"
- Small distances (indicated by the red line in the figure) come from texts in the same table row as the previous one
- Larger distances are generated by text pairs from different rows
distance_th = 0
distances = np.append([0], distances) # Append a placeholder for the first word
block_group = (distances>distance_th).cumsum() # Create a block_group based on the distance threshold
block_group
array([ 0, 1, 2, 3, 4, 5, 6, 6, 7, 7, 8, 9, 9, 10, 11, 11, 12,
13])
# Group the blocks by the block_group mask
grouped_blocks = [[] for i in range(max(block_group)+1)]
for i, block in zip(block_group, blocks):
grouped_blocks[i].append(block)
Finally, let's wrap these steps in a function:
def group_blocks_by_distance(blocks, distance_th):
blocks = sorted(blocks, key = lambda x: x.coordinates[1])
distances = np.array([b2.coordinates[1] - b1.coordinates[3] for (b1, b2) in zip(blocks, blocks[1:])])
distances = np.append([0], distances)
block_group = (distances>distance_th).cumsum()
grouped_blocks = [lp.Layout([]) for i in range(max(block_group)+1)]
for i, block in zip(block_group, blocks):
grouped_blocks[i].append(block)
return grouped_blocks
A = group_blocks_by_distance(filtered_residence, 5)
B = group_blocks_by_distance(filter_lotno, 10)
# And finally we combine the outputs
height_th = 30
idxA, idxB = 0, 0
result = []
while idxA < len(A) and idxB < len(B):
ay = A[idxA][0].coordinates[1]
by = B[idxB][0].coordinates[1]
ares, bres = ''.join(A[idxA].get_texts()), ''.join(B[idxB].get_texts())
if abs(ay - by) < height_th:
idxA += 1; idxB += 1
elif ay < by:
idxA += 1; bres = ''
else:
idxB += 1; ares = ''
result.append([ares, bres])
result
[['LosAngeles', 'E6037'],
['AngelesLos', 'E6037'],
['LosAngeles', 'E6037'],
['Oakland', '?'],
['RiversideCoLosAngeles', 'E5928'],
['', 'E6037'],
['BeachLong', '?E6038?E597211'],
['BeachLong', ''],
['Maricopa', '?E5928'],
['FallsChurch', '8122-649334'],
['ChaseCity', '8122-64933?'],
['Houston', '7078-649343'],
['Scott', '7078-649342']]
As we can see, there are mistakes in the 5th and 6th rows - Riverside Co and LosAngeles are wrongly combined. This is because the extra Co row disrupted the row segmentation algorithm.
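To locate such problems, one simple option is to print the text of each adaptively grouped block (using the A groups computed above) and inspect where rows were merged:

for i, group in enumerate(A):
    # each group is a lp.Layout of blocks assigned to one row
    print(i, ''.join(group.get_texts()))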
Save the results as a table¶
df = pd.DataFrame(row, columns=['residence', 'lot no'])
df
 | residence | lot no |
---|---|---|
0 | LosAngeles | E6037 |
1 | LosAngeles | E6037 |
2 | LosAngeles | E6037 |
3 | Oakland | ? |
4 | Riverside | E5928 |
5 | LosAngeles | E6037 |
6 | LongBeach | ?E6038 |
7 | LongBeach | 11 |
8 | Maricopa | ?E5928 |
9 | FallsChurch | 8122-649334 |
10 | ChaseCity | 8122-64933? |
11 | Houston | 7078-649343 |
12 | Scott | 7078-649342 |
df.to_csv('./data/ocred-example-table.csv', index=None)
Deep Layout Parsing¶
In this tutorial, we will show how to use the layoutparser API to:

- Load deep learning layout detection models and predict the layout of the paper image
- Use the coordinate system to parse the output
The paper-image
is from https://arxiv.org/abs/2004.08686.
import layoutparser as lp
import cv2
Use Layout Models to detect complex layout¶
layoutparser
can identify the layout of the given document with only
4 lines of code.
image = cv2.imread("data/paper-image.jpg")
image = image[..., ::-1]
# Convert the image from BGR (cv2 default loading style)
# to RGB
model = lp.Detectron2LayoutModel('lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
label_map={0: "Text", 1: "Title", 2: "List", 3:"Table", 4:"Figure"})
# Load the deep layout model from the layoutparser API
# For all the supported model, please check the Model
# Zoo Page: https://layout-parser.readthedocs.io/en/latest/notes/modelzoo.html
layout = model.detect(image)
# Detect the layout of the input image
lp.draw_box(image, layout, box_width=3)
# Show the detected layout of the input image

Check the results from the model¶
type(layout)
layoutparser.elements.Layout
The layout variable is a Layout instance, which inherits from list and supports handy methods for layout processing.
layout[0]
TextBlock(block=Rectangle(x_1=646.4182739257812, y_1=1420.1715087890625, x_2=1132.8687744140625, y_2=1479.7222900390625), text=, id=None, type=Text, parent=None, next=None, score=0.9996440410614014)
layout contains a series of TextBlocks. They store the coordinates in the .block variable and other block information such as the block type in .type and the text in .text. More information can be found in the documentation.
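For example, a few commonly used attributes of the first detected block:

block = layout[0]               # the first detected TextBlock
print(block.type)               # the block category, e.g., "Text"
print(block.score)              # the model's prediction confidence
print(block.block.coordinates)  # (x_1, y_1, x_2, y_2) of the underlying Rectangle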
Use the coordinate system to process the detected layout¶
First, we filter the text regions of a specific type:
text_blocks = lp.Layout([b for b in layout if b.type=='Text'])
figure_blocks = lp.Layout([b for b in layout if b.type=='Figure'])
As there could be text regions detected inside the figure regions, we just drop them:
text_blocks = lp.Layout([b for b in text_blocks \
if not any(b.is_in(b_fig) for b_fig in figure_blocks)])
Finally, sort the text regions and assign ids:
h, w = image.shape[:2]
left_interval = lp.Interval(0, w/2*1.05, axis='x').put_on_canvas(image)
left_blocks = text_blocks.filter_by(left_interval, center=True)
left_blocks.sort(key = lambda b:b.coordinates[1], inplace=True)
right_blocks = [b for b in text_blocks if b not in left_blocks]
right_blocks.sort(key = lambda b:b.coordinates[1], inplace=True)
# And finally combine the two lists and add the index
# according to the order
text_blocks = lp.Layout([b.set(id = idx) for idx, b in enumerate(left_blocks + right_blocks)])
Visualize the cleaned text blocks:
lp.draw_box(image, text_blocks,
box_width=3,
show_element_id=True)

Fetch the text inside each text region¶
We can also combine this with the OCR functionality in layoutparser to fetch the text in the document.
ocr_agent = lp.TesseractAgent(languages='eng')
# Initialize the tesseract ocr engine. You might need
# to install the OCR components in layoutparser:
# pip install layoutparser[ocr]
for block in text_blocks:
segment_image = (block
.pad(left=5, right=5, top=5, bottom=5)
.crop_image(image))
# adding padding to each image segment can help
# improve OCR robustness
text = ocr_agent.detect(segment_image)
block.set(text=text, inplace=True)
for txt in text_blocks.get_texts():
print(txt, end='\n---\n')
Figure 7: Annotation Examples in HJDataset. (a) and (b) show two examples for the labeling of main pages. The boxes
are colored differently to reflect the layout element categories. Illustrated in (c), the items in each index page row are
categorized as title blocks, and the annotations are denser.
---
tion over union (IOU) level [0.50:0.95]’, on the test data. In
general, the high mAP values indicate accurate detection of
the layout elements. The Faster R-CNN and Mask R-CNN
achieve comparable results, better than RetinaNet. Notice-
ably, the detections for small blocks like title are less pre-
cise, and the accuracy drops sharply for the title category. In
Figure 8, (a) and (b) illustrate the accurate prediction results
of the Faster R-CNN model.
---
We also examine how our dataset can help with
world document digitization application. When digitizing
new publications, researchers usually do not generate large
scale ground truth data to train their layout analysis models.
If they are able to adapt our dataset, or models trained on
our dataset, to develop models on their data, they can build
their pipelines more efficiently and develop more accurate
models. To this end, we conduct two experiments. First we
examine how layout analysis models trained on the main
pages can be used for understanding index pages. More-
over, we study how the pre-trained models perform on other
historical Japanese documents.
---
Table 4 compares the performance of five Faster R-CNN
models that are trained differently on index pages. If the
model loads pre-trained weights from HJDataset, it includes
information learned from main pages. Models trained over
---
?This is a core metric developed for the COCO competition [| 2] for
evaluating the object detection quality.
---
all the training data can be viewed as the benchmarks, while
training with few samples (five in this case) are consid-
ered to mimic real-world scenarios. Given different train-
ing data, models pre-trained on HJDataset perform signifi-
cantly better than those initialized with COCO weights. In-
tuitively, models trained on more data perform better than
those with fewer samples. We also directly use the model
trained on main to predict index pages without fine-
tuning. The low zero-shot prediction accuracy indicates the
dissimilarity between index and main pages. The large
increase in mAP from 0.344 to 0.471 after the model is
---
Table 3: Detection mAP @ IOU [0.50:0.95] of different
models for each category on the test set. All values are given
as percentages.
---
* For training Mask R-CNN, the segmentation masks are the quadri-
lateral regions for each block. Compared to the rectangular bounding
boxes, they delineate the text region more accurately.
---
Load COCO Layout Annotations¶
Preparation¶
In this notebook, I will illustrate how to use LayoutParser to load and visualize the layout annotation in the COCO format.
Before starting, please remember to download the PubLayNet annotations and images from their website (let's just use the validation set for now, as the training set is very large), and put all extracted files in the data/publaynet/annotations and data/publaynet/val folders.
And we need to install an additional library for conveniently handling the COCO data format:
pip install pycocotools
OK - let's get to the code:
Loading and visualizing layouts using Layout-Parser¶
from pycocotools.coco import COCO
import layoutparser as lp
import random
import cv2
def load_coco_annotations(annotations, coco=None):
"""
Args:
annotations (List):
a list of coco annotations for the current image
coco (`optional`, defaults to `None`):
COCO annotation object instance. If set, this function will
convert the loaded annotation category ids to category names
set in COCO.categories
"""
layout = lp.Layout()
for ele in annotations:
x, y, w, h = ele['bbox']
layout.append(
lp.TextBlock(
block = lp.Rectangle(x, y, w+x, h+y),
type = ele['category_id'] if coco is None else coco.cats[ele['category_id']]['name'],
id = ele['id']
)
)
return layout
The load_coco_annotations
function will help convert COCO
annotations into the layoutparser objects.
COCO_ANNO_PATH = 'data/publaynet/annotations/val.json'
COCO_IMG_PATH = 'data/publaynet/val'
coco = COCO(COCO_ANNO_PATH)
loading annotations into memory...
Done (t=1.17s)
creating index...
index created!
color_map = {
'text': 'red',
'title': 'blue',
'list': 'green',
'table': 'purple',
'figure': 'pink',
}
for image_id in random.sample(coco.imgs.keys(), 1):
image_info = coco.imgs[image_id]
annotations = coco.loadAnns(coco.getAnnIds([image_id]))
image = cv2.imread(f'{COCO_IMG_PATH}/{image_info["file_name"]}')
layout = load_coco_annotations(annotations, coco)
viz = lp.draw_box(image, layout, color_map=color_map)
display(viz) # show the results

You could add more information to the visualization.
lp.draw_box(image,
[b.set(id=f'{b.id}/{b.type}') for b in layout],
color_map=color_map,
show_element_id=True, id_font_size=10,
id_text_background_color='grey',
id_text_color='white')

Model Predictions on loaded data¶
We could also check how the trained layout model performs on the input image. Following this instruction, we could conveniently load a layout prediction model and run predictions on the existing image.
model = lp.Detectron2LayoutModel('lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config',
extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
label_map={0: "text", 1: "title", 2: "list", 3:"table", 4:"figure"})
layout_predicted = model.detect(image)
lp.draw_box(image,
[b.set(id=f'{b.type}/{b.score:.2f}') for b in layout_predicted],
color_map=color_map,
show_element_id=True, id_font_size=10,
id_text_background_color='grey',
id_text_color='white')

Layout Elements¶
Coordinate System¶
-
class
layoutparser.elements.
Interval
(start, end, axis, canvas_height=None, canvas_width=None)[source]¶ Bases:
layoutparser.elements.base.BaseCoordElement
This class describes the coordinate system of an interval, a block defined by a pair of start and end points on the designated axis and with the same length as the base canvas on the other axis.
- Parameters
start (
numeric
) – The coordinate of the start point on the designated axis.end (
numeric
) – The end coordinate on the same axis as start.axis (
str
) – The designated axis that the end points belong to.canvas_height (
numeric
, optional, defaults to 0) – The height of the canvas that the interval is on.canvas_width (
numeric
, optional, defaults to 0) – The width of the canvas that the interval is on.
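A minimal usage sketch (the interval and canvas sizes below are arbitrary examples):

import numpy as np
import layoutparser as lp

interval = lp.Interval(start=300, end=840, axis='y')
canvas = np.zeros((1000, 800, 3), dtype='uint8')  # a dummy 1000x800 image
interval = interval.put_on_canvas(canvas)         # record the canvas height and width
print(interval.coordinates)                       # rectangle-style coordinates (0, 300, 800, 840)
rect = interval.to_rectangle()                    # convert to a Rectangle element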
-
property
height
¶ Calculate the height of the interval. If the interval is along the x-axis, the height will be the height of the canvas, otherwise, it will be the difference between the start and end point.
- Returns
Output the numeric value of the height.
- Return type
numeric
-
property
width
¶ Calculate the width of the interval. If the interval is along the y-axis, the width will be the width of the canvas, otherwise, it will be the difference between the start and end point.
- Returns
Output the numeric value of the width.
- Return type
numeric
-
property
coordinates
¶ This method considers an interval as a rectangle and calculates the coordinates of the upper left and lower right corners to define the interval.
- Returns
Output the numeric values of the coordinates in a Tuple of size four.
- Return type
Tuple(numeric)
-
property
points
¶ Return the coordinates of all four corners of the interval in a clockwise fashion starting from the upper left.
- Returns
A Numpy array of shape 4x2 containing the coordinates.
- Return type
Numpy array
-
property
center
¶ Calculate the mid-point between the start and end point.
- Returns
Returns the coordinates of the center.
- Return type
Tuple(numeric)
-
property
area
¶ Return the area of the covered region of the interval. The area is bounded to the canvas. If the interval is put on a canvas, the area equals interval width * canvas height (axis='x') or interval height * canvas width (axis='y'). Otherwise, the area is zero.
-
put_on_canvas
(canvas)[source]¶ Set the height and the width of the canvas that the interval is on.
- Parameters
canvas (
Numpy array
orBaseCoordElement
orPIL.Image.Image
) – The base element that the interval is on. The numpy array should be the format of [height, width].- Returns
A copy of the current Interval with its canvas height and width set to those of the input canvas.
- Return type
-
condition_on
(other)[source]¶ Given the current element in relative coordinates to another element which is in absolute coordinates, generate a new element of the current element in absolute coordinates.
- Parameters
other (
BaseCoordElement
) – The other layout element involved in the geometric operations.- Raises
Exception – Raise error when the input type of the other element is invalid.
- Returns
The BaseCoordElement object of the original element in the absolute coordinate system.
- Return type
BaseCoordElement
-
relative_to
(other)[source]¶ Given the current element and another element both in absolute coordinates, generate a new element of the current element in relative coordinates to the other element.
- Parameters
other (
BaseCoordElement
) – The other layout element involved in the geometric operations.- Raises
Exception – Raise error when the input type of the other element is invalid.
- Returns
The BaseCoordElement object of the original element in the relative coordinate system.
- Return type
BaseCoordElement
-
is_in
(other, soft_margin={}, center=False)[source]¶ Identify whether the current element is within another element.
- Parameters
other (
BaseCoordElement
) – The other layout element involved in the geometric operations.soft_margin (
dict
, optional, defaults to {}) – Enlarge the other element with wider margins to relax the restrictions.center (
bool
, optional, defaults to False) – The toggle to determine whether the center (instead of the four corners) of the current element is in the other element.
- Returns
Returns True if the current element is in the other element and False if not.
- Return type
-
intersect
(other: layoutparser.elements.base.BaseCoordElement, strict: bool = True)[source]¶ Intersect the current shape with the other object, with operations defined in Shape Operations.
-
union
(other: layoutparser.elements.base.BaseCoordElement, strict: bool = True)[source]¶ Union the current shape with the other object, with operations defined in Shape Operations.
-
pad
(left=0, right=0, top=0, bottom=0, safe_mode=True)[source]¶ Pad the layout element on the four sides of the polygon with the user-defined pixels. If safe_mode is set to True, the function will cut off the excess padding that falls on the negative side of the coordinates.
- Parameters
left (
int
, optional, defaults to 0) – The number of pixels to pad on the left side of the polygon.right (
int
, optional, defaults to 0) – The number of pixels to pad on the right side of the polygon.top (
int
, optional, defaults to 0) – The number of pixels to pad on the upper side of the polygon.bottom (
int
, optional, defaults to 0) – The number of pixels to pad on the lower side of the polygon.safe_mode (
bool
, optional, defaults to True) – A bool value to toggle the safe_mode.
- Returns
The padded BaseCoordElement object.
- Return type
BaseCoordElement
-
shift
(shift_distance)[source]¶ Shift the interval by a user specified amount along the same axis that the interval is defined on.
- Parameters
shift_distance (
numeric
) – The number of pixels used to shift the interval.- Returns
The shifted Interval object.
- Return type
BaseCoordElement
-
scale
(scale_factor)[source]¶ Scale the layout element by a user specified amount along the same axis that the interval is defined on.
- Parameters
scale_factor (
numeric
) – The amount for downscaling or upscaling the element.- Returns
The scaled Interval object.
- Return type
BaseCoordElement
-
crop_image
(image)[source]¶ Crop the input image according to the coordinates of the element.
- Parameters
image (
Numpy array
) – The array of the input image.- Returns
The array of the cropped image.
- Return type
Numpy array
-
to_rectangle
()[source]¶ Convert the Interval to a Rectangle element.
- Returns
The converted Rectangle object.
- Return type
-
class
layoutparser.elements.
Rectangle
(x_1, y_1, x_2, y_2)[source]¶ Bases:
layoutparser.elements.base.BaseCoordElement
This class describes the coordinate system of an axial rectangle box using two points as indicated below:
(x_1, y_1) ---- | | | | | | ---- (x_2, y_2)
- Parameters
x_1 (
numeric
) – x coordinate on the horizontal axis of the upper left corner of the rectangle.y_1 (
numeric
) – y coordinate on the vertical axis of the upper left corner of the rectangle.x_2 (
numeric
) – x coordinate on the horizontal axis of the lower right corner of the rectangle.y_2 (
numeric
) – y coordinate on the vertical axis of the lower right corner of the rectangle.
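A minimal usage sketch (the coordinate values below are arbitrary examples):

import layoutparser as lp

rect = lp.Rectangle(x_1=100, y_1=200, x_2=300, y_2=400)
print(rect.width, rect.height)                    # -> 200 200
padded = rect.pad(left=5, right=5, top=5, bottom=5)
print(rect.is_in(lp.Rectangle(0, 0, 500, 500)))   # -> True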
-
property
height
¶ Calculate the height of the rectangle.
- Returns
Output the numeric value of the height.
- Return type
numeric
-
property
width
¶ Calculate the width of the rectangle.
- Returns
Output the numeric value of the width.
- Return type
numeric
-
property
coordinates
¶ Return the coordinates of the two points that define the rectangle.
- Returns
Output the numeric values of the coordinates in a Tuple of size four.
- Return type
Tuple(numeric)
-
property
points
¶ Return the coordinates of all four corners of the rectangle in a clockwise fashion starting from the upper left.
- Returns
A Numpy array of shape 4x2 containing the coordinates.
- Return type
Numpy array
-
property
center
¶ Calculate the center of the rectangle.
- Returns
Returns the coordinates of the center.
- Return type
Tuple(numeric)
-
property
area
¶ Return the area of the rectangle.
-
condition_on
(other)[source]¶ Given the current element in relative coordinates to another element which is in absolute coordinates, generate a new element of the current element in absolute coordinates.
- Parameters
other (
BaseCoordElement
) – The other layout element involved in the geometric operations.- Raises
Exception – Raise error when the input type of the other element is invalid.
- Returns
The BaseCoordElement object of the original element in the absolute coordinate system.
- Return type
BaseCoordElement
-
relative_to
(other)[source]¶ Given the current element and another element both in absolute coordinates, generate a new element of the current element in relative coordinates to the other element.
- Parameters
other (
BaseCoordElement
) – The other layout element involved in the geometric operations.- Raises
Exception – Raise error when the input type of the other element is invalid.
- Returns
The BaseCoordElement object of the original element in the relative coordinate system.
- Return type
BaseCoordElement
-
is_in
(other, soft_margin={}, center=False)[source]¶ Identify whether the current element is within another element.
- Parameters
other (
BaseCoordElement
) – The other layout element involved in the geometric operations.soft_margin (
dict
, optional, defaults to {}) – Enlarge the other element with wider margins to relax the restrictions.center (
bool
, optional, defaults to False) – The toggle to determine whether the center (instead of the four corners) of the current element is in the other element.
- Returns
Returns True if the current element is in the other element and False if not.
- Return type
-
intersect
(other: layoutparser.elements.base.BaseCoordElement, strict: bool = True)[source]¶ Intersect the current shape with the other object, with operations defined in Shape Operations.
-
union
(other: layoutparser.elements.base.BaseCoordElement, strict: bool = True)[source]¶ Union the current shape with the other object, with operations defined in Shape Operations.
-
pad
(left=0, right=0, top=0, bottom=0, safe_mode=True)[source]¶ Pad the layout element on the four sides of the polygon with the user-defined pixels. If safe_mode is set to True, the function will cut off the excess padding that falls on the negative side of the coordinates.
- Parameters
left (
int
, optional, defaults to 0) – The number of pixels to pad on the left side of the polygon.right (
int
, optional, defaults to 0) – The number of pixels to pad on the right side of the polygon.top (
int
, optional, defaults to 0) – The number of pixels to pad on the upper side of the polygon.bottom (
int
, optional, defaults to 0) – The number of pixels to pad on the lower side of the polygon.safe_mode (
bool
, optional, defaults to True) – A bool value to toggle the safe_mode.
- Returns
The padded BaseCoordElement object.
- Return type
BaseCoordElement
-
shift
(shift_distance=0)[source]¶ Shift the layout element by user specified amounts on the x and y axis respectively. If shift_distance is one numeric value, the element will be shifted by the same specified amount on both the x and y axis.
- Parameters
shift_distance (
numeric
orTuple(numeric)
orList[numeric]
) – The number of pixels used to shift the element.- Returns
The shifted BaseCoordElement of the same shape-specific class.
- Return type
BaseCoordElement
-
scale
(scale_factor=1)[source]¶ Scale the layout element by a user specified amount on the x and y axis respectively. If scale_factor is one numeric value, the element will be scaled by the same specified amount on both the x and y axis.
- Parameters
scale_factor (
numeric
orTuple(numeric)
orList[numeric]
) – The amount for downscaling or upscaling the element.- Returns
The scaled BaseCoordElement of the same shape-specific class.
- Return type
BaseCoordElement
-
class
layoutparser.elements.
Quadrilateral
(points: Union[numpy.ndarray, List, List[List]], height=None, width=None)[source]¶ Bases:
layoutparser.elements.base.BaseCoordElement
This class describes the coordinate system of a four-sided polygon. A quadrilateral is defined by the coordinates of its 4 corners in a clockwise order starting with the upper left corner (as shown below):
points[0] -...- points[1] | | . . . . . . | | points[3] -...- points[2]
- Parameters
points (
Numpy array
or list) – A np.ndarray of shape 4x2 for four corner coordinates, or a list of length 8 in the format of [p0_x, p0_y, p1_x, p1_y, p2_x, p2_y, p3_x, p3_y], or a list of length 4 in the format of [[p0_x, p0_y], [p1_x, p1_y], [p2_x, p2_y], [p3_x, p3_y]].height (
numeric
, optional, defaults to None) – The height of the quadrilateral. This is to better support the perspective transformation from the OpenCV library.width (
numeric
, optional, defaults to None) – The width of the quadrilateral. Similarly as height, this is to better support the perspective transformation from the OpenCV library.
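A minimal usage sketch (the corner coordinates below are arbitrary examples):

import numpy as np
import layoutparser as lp

points = np.array([[50, 50], [250, 60], [260, 160], [40, 150]])
quad = lp.Quadrilateral(points)
print(quad.points)       # the four corners in clockwise order
print(quad.coordinates)  # the circumscribed rectangle as (x_1, y_1, x_2, y_2)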
-
property
height
¶ Return the user defined height, otherwise the height of its circumscribed rectangle.
- Returns
Output the numeric value of the height.
- Return type
numeric
-
property
width
¶ Return the user defined width, otherwise the width of its circumscribed rectangle.
- Returns
Output the numeric value of the width.
- Return type
numeric
-
property
coordinates
¶ Return the coordinates of the upper left and lower right corners points that define the circumscribed rectangle.
- Returns
Tuple(numeric)
: Output the numeric values of the coordinates in a Tuple of size four.
-
property
points
¶ Return the coordinates of all four corners of the quadrilateral in a clockwise fashion starting from the upper left.
- Returns
A Numpy array of shape 4x2 containing the coordinates.
- Return type
Numpy array
-
property
center
¶ Calculate the center of the quadrilateral.
- Returns
Returns the coordinates of the center.
- Return type
Tuple(numeric)
-
property
area
¶ Return the area of the quadrilateral.
-
property
mapped_rectangle_points
¶
-
property
perspective_matrix
¶
-
condition_on
(other)[source]¶ Given the current element in relative coordinates to another element which is in absolute coordinates, generate a new element of the current element in absolute coordinates.
- Parameters
other (
BaseCoordElement
) – The other layout element involved in the geometric operations.- Raises
Exception – Raise error when the input type of the other element is invalid.
- Returns
The BaseCoordElement object of the original element in the absolute coordinate system.
- Return type
BaseCoordElement
-
relative_to
(other)[source]¶ Given the current element and another element both in absolute coordinates, generate a new element of the current element in relative coordinates to the other element.
- Parameters
other (
BaseCoordElement
) – The other layout element involved in the geometric operations.- Raises
Exception – Raise error when the input type of the other element is invalid.
- Returns
The BaseCoordElement object of the original element in the relative coordinate system.
- Return type
BaseCoordElement
-
is_in
(other, soft_margin={}, center=False)[source]¶ Identify whether the current element is within another element.
- Parameters
other (
BaseCoordElement
) – The other layout element involved in the geometric operations.soft_margin (
dict
, optional, defaults to {}) – Enlarge the other element with wider margins to relax the restrictions.center (
bool
, optional, defaults to False) – The toggle to determine whether the center (instead of the four corners) of the current element is in the other element.
- Returns
Returns True if the current element is in the other element and False if not.
- Return type
-
intersect
(other: layoutparser.elements.base.BaseCoordElement, strict: bool = True)[source]¶ Intersect the current shape with the other object, with operations defined in Shape Operations.
-
union
(other: layoutparser.elements.base.BaseCoordElement, strict: bool = True)[source]¶ Union the current shape with the other object, with operations defined in Shape Operations.
-
pad
(left=0, right=0, top=0, bottom=0, safe_mode=True)[source]¶ Pad the layout element on the four sides of the polygon with the user-defined pixels. If safe_mode is set to True, the function will cut off the excess padding that falls on the negative side of the coordinates.
- Parameters
left (
int
, optional, defaults to 0) – The number of pixels to pad on the left side of the polygon.right (
int
, optional, defaults to 0) – The number of pixels to pad on the right side of the polygon.top (
int
, optional, defaults to 0) – The number of pixels to pad on the upper side of the polygon.bottom (
int
, optional, defaults to 0) – The number of pixels to pad on the lower side of the polygon.safe_mode (
bool
, optional, defaults to True) – A bool value to toggle the safe_mode.
- Returns
The padded BaseCoordElement object.
- Return type
BaseCoordElement
-
shift
(shift_distance=0)[source]¶ Shift the layout element by user specified amounts on the x and y axis respectively. If shift_distance is one numeric value, the element will be shifted by the same specified amount on both the x and y axis.
- Parameters
shift_distance (
numeric
orTuple(numeric)
orList[numeric]
) – The number of pixels used to shift the element.- Returns
The shifted BaseCoordElement of the same shape-specific class.
- Return type
BaseCoordElement
-
scale
(scale_factor=1)[source]¶ Scale the layout element by a user specified amount on the x and y axis respectively. If scale_factor is one numeric value, the element will be scaled by the same specified amount on both the x and y axis.
- Parameters
scale_factor (
numeric
orTuple(numeric)
orList[numeric]
) – The amount for downscaling or upscaling the element.- Returns
The scaled BaseCoordElement of the same shape-specific class.
- Return type
BaseCoordElement
TextBlock¶
-
class
layoutparser.elements.
TextBlock
(block, text=None, id=None, type=None, parent=None, next=None, score=None)[source]¶ Bases:
layoutparser.elements.base.BaseLayoutElement
This class constructs content-related information of a layout element in addition to its coordinate definitions (i.e. Interval, Rectangle or Quadrilateral).
- Parameters
block (
BaseCoordElement
) – The shape-specific coordinate systems that the text block belongs to.text (
str
, optional, defaults to None) – The ocr’ed text results within the boundaries of the text block.id (
int
, optional, defaults to None) – The id of the text block.type (
int
, optional, defaults to None) – The type of the text block.parent (
int
, optional, defaults to None) – The id of the parent object.next (
int
, optional, defaults to None) – The id of the next block.score (
numeric
, defaults to None) – The prediction confidence of the block
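A minimal usage sketch (the values below are arbitrary examples; image stands for a pre-loaded image array):

import layoutparser as lp

block = lp.TextBlock(
    block=lp.Rectangle(x_1=100, y_1=200, x_2=300, y_2=400),
    text="Hello LayoutParser",
    id=0,
    type="Text",
    score=0.98,
)
print(block.coordinates)  # delegated to the underlying Rectangle
segment = block.pad(left=5, right=5, top=5, bottom=5).crop_image(image)  # crop the padded region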
-
property
height
¶ Return the height of the shape-specific block.
- Returns
Output the numeric value of the height.
- Return type
numeric
-
property
width
¶ Return the width of the shape-specific block.
- Returns
Output the numeric value of the width.
- Return type
numeric
-
property
coordinates
¶ Return the coordinates of the two corner points that define the shape-specific block.
- Returns
Output the numeric values of the coordinates in a Tuple of size four.
- Return type
Tuple(numeric)
-
property
points
¶ Return the coordinates of all four corners of the shape-specific block in a clockwise fashion starting from the upper left.
- Returns
A Numpy array of shape 4x2 containing the coordinates.
- Return type
Numpy array
-
property
area
¶ Return the area of associated block.
-
condition_on
(other)[source]¶ Given the current element in relative coordinates to another element which is in absolute coordinates, generate a new element of the current element in absolute coordinates.
- Parameters
other (
BaseCoordElement
) – The other layout element involved in the geometric operations.- Raises
Exception – Raise error when the input type of the other element is invalid.
- Returns
The BaseCoordElement object of the original element in the absolute coordinate system.
- Return type
BaseCoordElement
-
relative_to
(other)[source]¶ Given the current element and another element both in absolute coordinates, generate a new element of the current element in relative coordinates to the other element.
- Parameters
other (
BaseCoordElement
) – The other layout element involved in the geometric operations.- Raises
Exception – Raise error when the input type of the other element is invalid.
- Returns
The BaseCoordElement object of the original element in the relative coordinate system.
- Return type
BaseCoordElement
-
is_in
(other, soft_margin={}, center=False)[source]¶ Identify whether the current element is within another element.
- Parameters
other (
BaseCoordElement
) – The other layout element involved in the geometric operations.soft_margin (
dict
, optional, defaults to {}) – Enlarge the other element with wider margins to relax the restrictions.center (
bool
, optional, defaults to False) – The toggle to determine whether the center (instead of the four corners) of the current element is in the other element.
- Returns
Returns True if the current element is in the other element and False if not.
- Return type
-
union
(other: layoutparser.elements.base.BaseCoordElement, strict: bool = True)[source]¶ Union the current shape with the other object, with operations defined in Shape Operations.
-
intersect
(other: layoutparser.elements.base.BaseCoordElement, strict: bool = True)[source]¶ Intersect the current shape with the other object, with operations defined in Shape Operations.
-
shift
(shift_distance)[source]¶ Shift the layout element by user specified amounts on the x and y axis respectively. If shift_distance is one numeric value, the element will be shifted by the same specified amount on both the x and y axis.
- Parameters
shift_distance (
numeric
orTuple(numeric)
orList[numeric]
) – The number of pixels used to shift the element.- Returns
The shifted BaseCoordElement of the same shape-specific class.
- Return type
BaseCoordElement
-
pad
(left=0, right=0, top=0, bottom=0, safe_mode=True)[source]¶ Pad the layout element on the four sides of the polygon with the user-defined pixels. If safe_mode is set to True, the function will cut off the excess padding that falls on the negative side of the coordinates.
- Parameters
left (
int
, optional, defaults to 0) – The number of pixels to pad on the left side of the polygon.right (
int
, optional, defaults to 0) – The number of pixels to pad on the right side of the polygon.top (
int
, optional, defaults to 0) – The number of pixels to pad on the upper side of the polygon.bottom (
int
, optional, defaults to 0) – The number of pixels to pad on the lower side of the polygon.safe_mode (
bool
, optional, defaults to True) – A bool value to toggle the safe_mode.
- Returns
The padded BaseCoordElement object.
- Return type
BaseCoordElement
-
scale
(scale_factor)[source]¶ Scale the layout element by a user specified amount on the x and y axis respectively. If scale_factor is one numeric value, the element will be scaled by the same specified amount on both the x and y axis.
- Parameters
scale_factor (
numeric
orTuple(numeric)
orList[numeric]
) – The amount for downscaling or upscaling the element.- Returns
The scaled BaseCoordElement of the same shape-specific class.
- Return type
BaseCoordElement
-
crop_image
(image)[source]¶ Crop the input image according to the coordinates of the element.
- Parameters
image (
Numpy array
) – The array of the input image.- Returns
The array of the cropped image.
- Return type
Numpy array
-
to_dict
() → Dict[str, Any][source]¶ Generate a dictionary representation of the current textblock of the format:
{ "block_type": <name of self.block>, <attributes of self.block combined with non-empty self._features> }
-
classmethod
from_dict
(data: Dict[str, Any]) → layoutparser.elements.layout_elements.TextBlock[source]¶ Initialize the textblock based on the dictionary representation. It generates the block based on the block_type and block_attr, and loads the textblock-specific features from the dict.
- Parameters
data (
dict
) – The dictionary representation of the object
Layout¶
-
class
layoutparser.elements.
Layout
(blocks: Optional[List] = None, *, page_data: Dict = None)[source]¶ Bases:
collections.abc.MutableSequence
The
Layout
class is designed for processing a list of layout elements on a page. It stores the layout elements in a list and the related page_data, and provides handy APIs for processing all the layout elements in batch.
- Parameters
blocks (
list
) – A list of layout element blockspage_data (Dict, optional) – A dictionary storing the page (canvas) related information like height, width, etc. It should be passed in as a keyword argument to avoid any confusion. Defaults to None.
-
sort
(key=None, reverse=False, inplace=False) → Optional[layoutparser.elements.layout.Layout][source]¶ Sort the list of blocks based on the given key.
- Parameters
key ([type], optional) – key specifies a function of one argument that is used to extract a comparison key from each list element. Defaults to None.
reverse (bool, optional) – reverse is a boolean value. If set to True, then the list elements are sorted as if each comparison were reversed. Defaults to False.
inplace (bool, optional) – whether to perform the sort inplace. If set to False, it will return another object instance with _blocks sorted in the order. Defaults to False.
- Examples::
>>> import layoutparser as lp
>>> i = lp.Interval(4, 5, axis="y")
>>> l = lp.Layout([i, i.shift(2)])
>>> l.sort(key=lambda x: x.coordinates[1], reverse=True)
-
filter_by
(other, soft_margin={}, center=False)[source]¶ Return a Layout object containing the elements that are in the other object.
- Parameters
other (
BaseCoordElement
) – The block to filter the current elements.- Returns
A new layout object after filtering.
- Return type
-
shift
(shift_distance)[source]¶ Shift all layout elements by user specified amounts on the x and y axis respectively. If shift_distance is one numeric value, the elements will be shifted by the same specified amount on both the x and y axis.
- Parameters
shift_distance (
numeric
orTuple(numeric)
orList[numeric]
) – The number of pixels used to shift the element.- Returns
A new layout object with all the elements shifted in the specified values.
- Return type
-
pad
(left=0, right=0, top=0, bottom=0, safe_mode=True)[source]¶ Pad all layout elements on the four sides of the polygon with the user-defined pixels. If safe_mode is set to True, the function will cut off the excess padding that falls on the negative side of the coordinates.
- Parameters
left (
int
, optional, defaults to 0) – The number of pixels to pad on the left side of the polygon.right (
int
, optional, defaults to 0) – The number of pixels to pad on the right side of the polygon.top (
int
, optional, defaults to 0) – The number of pixels to pad on the upper side of the polygon.bottom (
int
, optional, defaults to 0) – The number of pixels to pad on the lower side of the polygon.safe_mode (
bool
, optional, defaults to True) – A bool value to toggle the safe_mode.
- Returns
A new layout object with all the elements padded in the specified values.
- Return type
-
scale
(scale_factor)[source]¶ Scale all layout elements by a user specified amount on the x and y axis respectively. If scale_factor is one numeric value, the elements will be scaled by the same specified amount on both the x and y axis.
- Parameters
scale_factor (
numeric
orTuple(numeric)
orList[numeric]
) – The amount for downscaling or upscaling the element.- Returns
A new layout object with all the elements scaled in the specified values.
- Return type
-
get_texts
()[source]¶ Iterate through all the text blocks in the list and append their ocr’ed text results.
- Returns
A list of text strings of the text blocks in the list of layout elements.
- Return type
List[str]
-
get_info
(attr_name)[source]¶ Given user-provided attribute name, check all the elements in the list and return the corresponding attribute values.
- Parameters
attr_name (
str
) – The text string of certain attribute name.- Returns
The list of the corresponding attribute value (if exist) of each element in the list.
- Return type
List
-
to_dict
() → Dict[str, Any][source]¶ Generate a dict representation of the layout object with the page_data and all the blocks in its dict representation.
- Returns
The dictionary representation of the layout object.
- Return type
Dict
-
get_homogeneous_blocks
() → List[layoutparser.elements.base.BaseLayoutElement][source]¶ Convert all elements into blocks of the same type based on the type casting rule:
Interval < Rectangle < Quadrilateral < TextBlock
- Returns
A list of base layout elements of the maximal compatible type
- Return type
List[BaseLayoutElement]
-
to_dataframe
(enforce_same_type=False) → pandas.core.frame.DataFrame[source]¶ Convert the layout object into the dataframe. Warning: the page data won’t be exported.
- Parameters
enforce_same_type (
bool
, optional) – If true, it will convert all the contained blocks to the maximal compatible data type. Defaults to False.- Returns
The dataframe representation of layout object
- Return type
pd.DataFrame
Shape Operations¶
[BETA: the API and behavior will be changed in the future.]
Starting from v0.2, Layout Parser provides support for two types of shape operations, union
and intersection
, across all BaseCoordElement
s and TextBlock
. We’ve made some design choices to construct a set of generalized APIs across different shape classes, detailed as follows:
The union
Operation¶
▲ The Illustration of Union Operations. The resulting matrix is symmetric, so only the lower triangular region is left empty. Each cell shows the visualization of the shape objects, their coordinates, and their object class. For the output visualization, the gray and dashed lines delineate the original obj1 and obj2, respectively, for reference.
Notes:

- The x-interval and y-interval are both from the Interval class but with different axes. It's ill-defined to union two intervals from different axes, so in this case Layout Parser will raise an InvalidShapeError.
- The union of two rectangles is still a rectangle, which is the minimum covering rectangle of the two input rectangles.
- For the outputs associated with Quadrilateral inputs, please see details in the Problems related to the Quadrilateral Class section.
The intersect
Operation¶
▲ The Illustration of Intersect Operations. Similar to the previous visualization, the resulting matrix is symmetric, so only the lower triangular region is left empty. Each cell shows the visualization of the shape objects, their coordinates, and their object class. For the output visualization, the gray and dashed lines delineate the original obj1 and obj2, respectively, for reference.
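A minimal sketch of both operations on Rectangle inputs (the coordinate values below are arbitrary examples):

import layoutparser as lp

r1 = lp.Rectangle(10, 10, 110, 110)
r2 = lp.Rectangle(60, 60, 160, 160)
print(r1.union(r2))      # the minimum covering rectangle of the two inputs
print(r1.intersect(r2))  # the overlapping rectangle of the two inputs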
Text Recognition Tool¶
Google Cloud Vision API¶
-
class
layoutparser.ocr.
GCVFeatureType
[source]¶ Bases:
layoutparser.ocr.base.BaseOCRElementType
The element types from Google Cloud Vision API
-
PAGE
= 0¶
-
BLOCK
= 1¶
-
PARA
= 2¶
-
WORD
= 3¶
-
SYMBOL
= 4¶
-
property
child_level
¶
-
-
class
layoutparser.ocr.
GCVAgent
(languages=None, ocr_image_decode_type='.png')[source]¶ Bases:
layoutparser.ocr.base.BaseOCRAgent
A wrapper for Google Cloud Vision (GCV) Text Detection APIs.
Note
Google Cloud Vision API returns the output text in two types:
- text_annotations: In this format, GCV automatically finds the best aggregation level for the text and returns the results in a list. We use gather_text_annotations to retrieve this type of information.
- full_text_annotation: To support better user control, GCV also provides the full_text_annotation output, where it returns the hierarchical structure of the output text. To process this output, we provide the gather_full_text_annotation function to aggregate the texts at the given aggregation level.
Create a Google Cloud Vision OCR Agent.
- Parameters
languages (
list
, optional) – You can specify the language code of the documents to detect to improve accuracy. The supported language and their code can be found on this page. Defaults to None.ocr_image_decode_type (
str
, optional) –The format to convert the input image to before sending for GCV OCR. Defaults to “.png”.
”.png” is suggested as it does not compress the image.
But “.jpg” could also be a good choice if the input image is very large.
-
DEPENDENCIES
= ['google-cloud-vision']¶
-
classmethod
with_credential
(credential_path, **kwargs)[source]¶ Specify the credential to use for the GCV OCR API.
- Parameters
credential_path (
str
) – The path to the credential file
-
detect
(image, return_response=False, return_only_text=False, agg_output_level=None)[source]¶ Send the input image for OCR.
- Parameters
image (
np.ndarray
or str
) – The input image array or the name of the image file.
return_response (
bool
, optional) – Whether to directly return the Google Cloud response. Defaults to False.
return_only_text (
bool
, optional) – Whether to return only the texts in the OCR results. Defaults to False.
agg_output_level (
GCVFeatureType
, optional) – When set, aggregate the GCV output with respect to the specified aggregation level. Defaults to None.
-
static
gather_text_annotations
(response)[source]¶ Convert the text_annotations from GCV output to a
Layout
object.
- Parameters
response (
AnnotateImageResponse
) – The returned Google Cloud Vision AnnotateImageResponse object.
- Returns
The retrieved layout from the response.
- Return type
Layout
-
static
gather_full_text_annotation
(response, agg_level)[source]¶ Convert the full_text_annotation from GCV output to a
Layout
object.
- Parameters
response (
AnnotateImageResponse
) – The returned Google Cloud Vision AnnotateImageResponse object.
agg_level (
GCVFeatureType
) – The layout level to aggregate the text in full_text_annotation.
- Returns
The retrieved layout from the response.
- Return type
Layout
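A minimal usage sketch; the credential and image paths below are placeholders:
>>> import layoutparser as lp
>>> ocr_agent = lp.GCVAgent.with_credential("path/to/credential.json", languages=["en"])
>>> res = ocr_agent.detect("path/to/image.png", return_response=True)
>>> layout = ocr_agent.gather_full_text_annotation(res, agg_level=lp.GCVFeatureType.WORD)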
Tesseract OCR API¶
-
class
layoutparser.ocr.
TesseractFeatureType
[source]¶ Bases:
layoutparser.ocr.base.BaseOCRElementType
The element types for Tesseract Detection API
-
PAGE
= 0¶
-
BLOCK
= 1¶
-
PARA
= 2¶
-
LINE
= 3¶
-
WORD
= 4¶
-
property
group_levels
¶
-
-
class
layoutparser.ocr.
TesseractAgent
(languages='eng', **kwargs)[source]¶ Bases:
layoutparser.ocr.base.BaseOCRAgent
A wrapper for Tesseract Text Detection APIs based on PyTesseract.
Create a Tesseract OCR Agent.
- Parameters
languages (
list
or str
, optional) – You can specify the language code(s) of the documents to detect to improve accuracy. The supported languages and their codes can be found on its GitHub repo. It supports two formats: 1) you can pass in the language codes as a string like “eng+fra”, or 2) you can pack them as a list of strings [“eng”, “fra”]. Defaults to ‘eng’.
-
DEPENDENCIES
= ['pytesseract']¶
-
detect
(image, return_response=False, return_only_text=True, agg_output_level=None)[source]¶ Send the input image for OCR.
- Parameters
image (
np.ndarray
or str
) – The input image array or the name of the image file.
return_response (
bool
, optional) – Whether to directly return all output (string and boxes info) from Tesseract. Defaults to False.
return_only_text (
bool
, optional) – Whether to return only the texts in the OCR results. Defaults to True.
agg_output_level (
TesseractFeatureType
, optional) – When set, aggregate the Tesseract output with respect to the specified aggregation level. Defaults to None.
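A minimal usage sketch; the image path is a placeholder, and the Tesseract engine itself must be installed on the system in addition to the pytesseract dependency:
>>> import layoutparser as lp
>>> ocr_agent = lp.TesseractAgent(languages="eng")
>>> text = ocr_agent.detect("path/to/image.png")  # returns the recognized text by default
>>> layout = ocr_agent.detect("path/to/image.png",
...                           return_only_text=False,
...                           agg_output_level=lp.TesseractFeatureType.WORD)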
Layout Detection Models¶
-
class
layoutparser.models.
Detectron2LayoutModel
(config_path, model_path=None, label_map=None, extra_config=None, enforce_cpu=None, device=None)[source]¶ Bases:
layoutparser.models.base_layoutmodel.BaseLayoutModel
Create a Detectron2-based Layout Detection Model
- Parameters
config_path (
str
) – The path to the configuration file.
model_path (
str
, None) – The path to the saved weights of the model. If set, it will overwrite the weights in the configuration file. Defaults to None.
label_map (
dict
, optional) – The map from the model predictions (ids) to real-world labels (strings). If the config is from one of the supported datasets, Layout Parser will automatically initialize the label_map. Defaults to None.
device (
str
, optional) – Whether to use cuda or cpu devices. If not set, LayoutParser will automatically determine the device to initialize the models on.
extra_config (
list
, optional) – Extra configuration passed to the Detectron2 model configuration. The argument will be used in the merge_from_list function. Defaults to [].
- Examples::
>>> import layoutparser as lp
>>> model = lp.Detectron2LayoutModel('lp://HJDataset/faster_rcnn_R_50_FPN_3x/config')
>>> model.detect(image)
-
DEPENDENCIES
= ['detectron2']¶
-
DETECTOR_NAME
= 'detectron2'¶
-
MODEL_CATALOG
= {'HJDataset': {'faster_rcnn_R_50_FPN_3x': 'https://www.dropbox.com/s/6icw6at8m28a2ho/model_final.pth?dl=1', 'mask_rcnn_R_50_FPN_3x': 'https://www.dropbox.com/s/893paxpy5suvlx9/model_final.pth?dl=1', 'retinanet_R_50_FPN_3x': 'https://www.dropbox.com/s/yxsloxu3djt456i/model_final.pth?dl=1'}, 'MFD': {'faster_rcnn_R_50_FPN_3x': 'https://www.dropbox.com/s/7xel0i3iqpm2p8y/model_final.pth?dl=1'}, 'NewspaperNavigator': {'faster_rcnn_R_50_FPN_3x': 'https://www.dropbox.com/s/6ewh6g8rqt2ev3a/model_final.pth?dl=1'}, 'PrimaLayout': {'mask_rcnn_R_50_FPN_3x': 'https://www.dropbox.com/s/h7th27jfv19rxiy/model_final.pth?dl=1'}, 'PubLayNet': {'faster_rcnn_R_50_FPN_3x': 'https://www.dropbox.com/s/dgy9c10wykk4lq4/model_final.pth?dl=1', 'mask_rcnn_R_50_FPN_3x': 'https://www.dropbox.com/s/d9fc9tahfzyl6df/model_final.pth?dl=1', 'mask_rcnn_X_101_32x8d_FPN_3x': 'https://www.dropbox.com/s/57zjbwv6gh3srry/model_final.pth?dl=1'}, 'TableBank': {'faster_rcnn_R_101_FPN_3x': 'https://www.dropbox.com/s/6vzfk8lk9xvyitg/model_final.pth?dl=1', 'faster_rcnn_R_50_FPN_3x': 'https://www.dropbox.com/s/8v4uqmz1at9v72a/model_final.pth?dl=1'}}¶
Layout and Text Visualization¶
-
layoutparser.visualization.
draw_box
(canvas: PIL.Image.Image, layout: layoutparser.elements.layout.Layout, box_width: Union[List[int], int, None] = None, box_alpha: Union[List[float], float, None] = None, box_color: Union[List[str], str, None] = None, color_map: Optional[Dict] = None, show_element_id: bool = False, show_element_type: bool = False, id_font_size: Optional[int] = None, id_font_path: Optional[str] = None, id_text_color: Optional[str] = None, id_text_background_color: Optional[str] = None, id_text_background_alpha: Optional[float] = 1)[source]¶ Draw the layout region on the input canvas(image).
- Parameters
canvas (
ndarray
or Image
) – The canvas to draw the layout boxes.
layout (
Layout
or list
) – The layout of the canvas to show.
box_width (
int
or List[int]
, optional) – Set to change the width of the drawn layout box boundary. Defaults to None, when the boundary is automatically calculated as the DEFAULT_BOX_WIDTH_RATIO
* the maximum of (height, width) of the canvas. If box_width is a list, it will assign different widths to the corresponding layout objects, and should have the same length as the number of blocks in layout.
box_alpha (
float
or List[float]
, optional) – A float or a list of floats ranging from 0 to 1. Set to change the alpha of the drawn layout box. Defaults to 0 - the layout box will be fully transparent. If box_alpha is a list of floats, it will assign different alphas to the corresponding layout objects, and should have the same length as the number of blocks in layout.
box_color (
str
or List[str]
, optional) – A string or a list of strings for box colors, e.g., [‘red’, ‘green’, ‘blue’] or ‘red’. If box_color is a list of strings, it will assign different colors to the corresponding layout objects, and should have the same length as the number of blocks in layout. Defaults to None. When box_color is set, it will override the color_map.
color_map (dict, optional) – A map from block.type to the colors, e.g., {1: ‘red’}. You can set it to {} to use only the
DEFAULT_OUTLINE_COLOR
for the outlines. Defaults to None, when a color palette is automatically created based on the input layout.
show_element_id (bool, optional) – Whether to display block.id on the top-left corner of the block. Defaults to False.
show_element_type (bool, optional) – Whether to display block.type on the top-left corner of the block. Defaults to False.
id_font_size (int, optional) – Set to change the font size used for drawing block.id. Defaults to None, when the size is set to
DEFAULT_FONT_SIZE
.
id_font_path (
str
, optional) – Set to change the font used for drawing block.id. Defaults to None, when the DEFAULT_FONT_OBJECT
is used.
id_text_color (
str
, optional) – Set to change the text color used for drawing block.id. Defaults to None, when the color is set to DEFAULT_TEXT_COLOR
.
id_text_background_color (
str
, optional) – Set to change the text region background used for drawing block.id. Defaults to None, when the color is set to DEFAULT_TEXT_BACKGROUND
.
id_text_background_alpha (
float
, optional) – A float ranging from 0 to 1. Set to change the alpha of the drawn text. Defaults to 1 - the text box will be solid.
- Returns
An Image object containing the layout drawn upon the input canvas.
- Return type
PIL.Image.Image
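A minimal sketch of draw_box; image and layout are assumed to come from an earlier detection step, and the colors are arbitrary:
>>> import layoutparser as lp
>>> viz = lp.draw_box(image, layout,
...                   box_width=3,
...                   show_element_type=True,
...                   color_map={"Text": "blue", "Title": "red", "Table": "green"})
>>> viz  # a PIL.Image.Image with the layout boxes drawn on top of the input canvas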
-
layoutparser.visualization.
draw_text
(canvas, layout, arrangement: str = 'lr', font_size: Optional[int] = None, font_path: Optional[str] = None, text_color: Optional[str] = None, text_background_color: Optional[str] = None, text_background_alpha: Optional[float] = None, vertical_text: bool = False, with_box_on_text: bool = False, text_box_width: Optional[int] = None, text_box_color: Optional[str] = None, text_box_alpha: Optional[float] = None, with_layout: bool = False, **kwargs)[source]¶ Draw the (detected) text in the layout according to their coordinates next to the input canvas (image) for better comparison.
- Parameters
canvas (
ndarray
or Image
) – The canvas to draw the layout boxes.
layout (
Layout
or list
) – The layout of the canvas to show.
arrangement ({‘lr’, ‘ud’}, optional) – The arrangement of the drawn text canvas and the original image canvas: lr - left and right; ud - up and down. Defaults to ‘lr’.
font_size (
int
, optional) – Set to change the size of the font used for drawing block.text. Defaults to None, when the size is set to DEFAULT_FONT_SIZE
.
font_path (
str
, optional) – Set to change the font used for drawing block.text. Defaults to None, when the DEFAULT_FONT_OBJECT
is used.
text_color (str, optional) – Set to change the text color used for drawing block.text. Defaults to None, when the color is set to
DEFAULT_TEXT_COLOR
.
text_background_color (str, optional) – Set to change the text region background used for drawing block.text. Defaults to None, when the color is set to
DEFAULT_TEXT_BACKGROUND
.
text_background_alpha (
float
, optional) – A float ranging from 0 to 1. Set to change the alpha of the background of the canvas. Defaults to 1 - the text box will be solid.
vertical_text (bool, optional) – Whether the text in a block should be drawn vertically. Defaults to False.
with_box_on_text (bool, optional) – Whether to draw the layout box boundary of a text region on the text canvas. Defaults to False.
text_box_width (
int
, optional) – Set to change the width of the drawn layout box boundary. Defaults to None, when the boundary is automatically calculated as the DEFAULT_BOX_WIDTH_RATIO
* the maximum of (height, width) of the canvas.
text_box_alpha (
float
, optional) – A float ranging from 0 to 1. Set to change the alpha of the drawn text box. Defaults to 0 - the text box will be fully transparent.
text_box_color (
str
, optional) – Set to change the color of the drawn layout box boundary. Defaults to None, when the color is set to DEFAULT_OUTLINE_COLOR
.
with_layout (bool, optional) – Whether to draw the layout boxes on the input (image) canvas. Defaults to False. When set to true, you can pass in the arguments in
draw_box
to change the style of the drawn layout boxes.
- Returns
An Image object containing the drawn text from the layout.
- Return type
PIL.Image.Image
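A minimal sketch of draw_text; image and layout (with block.text filled in, e.g., by an OCR agent) are assumed to exist already:
>>> import layoutparser as lp
>>> preview = lp.draw_text(image, layout,
...                        arrangement='lr',      # text canvas placed to the right of the image
...                        font_size=12,
...                        with_box_on_text=True,
...                        with_layout=True)
>>> preview.save("ocr_preview.png")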
Load and Export Layout Data¶
Dataframe and CSV¶
-
layoutparser.io.
load_dataframe
(df: pandas.core.frame.DataFrame, block_type: str = None) → layoutparser.elements.layout.Layout[source]¶ Load the Layout object from the given dataframe.
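A minimal round-trip sketch, assuming layout is an existing Layout object:
>>> import layoutparser as lp
>>> df = layout.to_dataframe()        # export to a pandas DataFrame
>>> restored = lp.load_dataframe(df)  # load it back into a Layout object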
Dict and JSON¶
-
layoutparser.io.
load_dict
(data: Union[Dict, List[Dict]]) → Union[layoutparser.elements.base.BaseLayoutElement, layoutparser.elements.layout.Layout][source]¶ Load a dict or a list of dict representations of some layout data, automatically parse its type, and save it as either a BaseLayoutElement or a Layout datatype.
- Parameters
data (Union[Dict, List]) – A dict or a list of dict representations of the layout data
- Raises
ValueError – If the data format is incompatible with the layout-data-JSON format, raise a ValueError.
ValueError – If any block_type name is not in the available list of layout element names defined in BASECOORD_ELEMENT_NAMEMAP, raise a ValueError.
- Returns
Based on the dict format, it will automatically parse the type of the data and load it accordingly.
- Return type
Union[BaseLayoutElement, Layout]
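A minimal sketch that uses to_dict to produce a dict in the expected layout-data-JSON format:
>>> import layoutparser as lp
>>> rect = lp.Rectangle(10, 10, 60, 40)
>>> data = rect.to_dict()   # a dict with the coordinates and a block_type field
>>> lp.load_dict(data)      # parsed back into a Rectangle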
PDF¶
-
layoutparser.io.
load_pdf
(filename: str, load_images: bool = False, x_tolerance: int = 1.5, y_tolerance: int = 2, keep_blank_chars: bool = False, use_text_flow: bool = True, horizontal_ltr: bool = True, vertical_ttb: bool = True, extra_attrs: Optional[List[str]] = None, dpi: int = 72) → Union[List[layoutparser.elements.layout.Layout], Tuple[List[layoutparser.elements.layout.Layout], List[Image.Image]]][source]¶ Load all tokens for each page from a PDF file, and save them in a list of Layout objects with the original page order.
- Parameters
filename (str) – The path to the PDF file.
load_images (bool, optional) – Whether to load a screenshot for each page of the PDF file. When set to true, the function will return both the layout and the screenshot image for each page. Defaults to False.
x_tolerance (int, optional) – The threshold used for extracting “word tokens” from the pdf file. It will merge the pdf characters into a word token if the difference between the x_2 of one character and the x_1 of the next is less than or equal to x_tolerance. See details in pdfplumber’s documentation. Defaults to 1.5.
y_tolerance (int, optional) –
The threshold used for extracting “word tokens” from the pdf file. It will merge the pdf characters into a word token if the difference between the y_2 of one character and the y_1 of the next is less than or equal to y_tolerance. See details in pdfplumber’s documentation. Defaults to 2.
keep_blank_chars (bool, optional) –
When keep_blank_chars is set to True, blank characters will be treated as part of a word, not as a space between words. See details in pdfplumber’s documentation. Defaults to False.
use_text_flow (bool, optional) –
When use_text_flow is set to True, it will use the PDF’s underlying flow of characters as a guide for ordering and segmenting the words, rather than presorting the characters by x/y position. (This mimics how dragging a cursor highlights text in a PDF; as with that, the order does not always appear to be logical.) See details in pdfplumber’s documentation. Defaults to True.
horizontal_ltr (bool, optional) – When horizontal_ltr is set to True, the doc is read from left to right; otherwise, from right to left. Defaults to True.
vertical_ttb (bool, optional) – When vertical_ttb is set to True, the doc is read from top to bottom; otherwise, from bottom to top. Defaults to True.
extra_attrs (Optional[List[str]], optional) –
Passing a list of extra_attrs (e.g., [“fontname”, “size”]) will restrict each word to characters that share exactly the same value for each of those attributes extracted by pdfplumber, and the resulting word dicts will indicate those attributes. See details in pdfplumber’s documentation. Defaults to [“fontname”, “size”].
dpi (int, optional) – When loading images of the pdf, you can also specify the resolution (or DPI, dots per inch) for rendering the images. Higher DPI values mean clearer images (and also larger file sizes). Setting dpi will also automatically resize the extracted pdf_layout to match the sizes of the images, so the pdf_layouts can be rendered appropriately when visualized. Defaults to DEFAULT_PDF_DPI=72, which is also the default rendering dpi from the pdfplumber PDF parser.
- Returns
- When load_images=False, it will only load the pdf_tokens from
the PDF file. Each element of the list denotes all the tokens appearing on a single page, and the list is ordered the same as the original PDF page order.
- Tuple[List[Layout], List[“Image.Image”]]:
When load_images=True, besides the all_page_layout, it will also return a list of page images.
- Return type
List[Layout]
- Examples::
>>> import layoutparser as lp
>>> pdf_layout = lp.load_pdf("path/to/pdf")
>>> pdf_layout[0] # the layout for page 0
>>> pdf_layout, pdf_images = lp.load_pdf("path/to/pdf", load_images=True)
>>> lp.draw_box(pdf_images[0], pdf_layout[0])
Other Formats¶
Stay tuned! We are working on supporting more formats.