class GridTableParser(TableParser):
Parse a grid table using parse().
Here's an example of a grid table:
    +------------------------+------------+----------+----------+
    | Header row, column 1   | Header 2   | Header 3 | Header 4 |
    +========================+============+==========+==========+
    | body row 1, column 1   | column 2   | column 3 | column 4 |
    +------------------------+------------+----------+----------+
    | body row 2             | Cells may span columns.          |
    +------------------------+------------+---------------------+
    | body row 3             | Cells may  | - Table cells       |
    +------------------------+ span rows. | - contain           |
    | body row 4             |            | - body elements.    |
    +------------------------+------------+---------------------+
Intersections use '+', row separators use '-' (except for one optional head/body row separator, which uses '='), and column separators use '|'.
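As a rough illustration of these character roles, the sketch below labels individual lines of a grid table. The names `classify_line`, `FULL_ROW_SEP`, and `HEAD_BODY_SEP` are hypothetical and are not part of docutils; the patterns are simplified for illustration and are not the ones the parser itself uses.

```python
import re

# Hypothetical patterns for illustration only (not taken from docutils).
FULL_ROW_SEP = re.compile(r'\+(-+\+)+')    # e.g. +-----+-----+
HEAD_BODY_SEP = re.compile(r'\+(=+\+)+')   # e.g. +=====+=====+


def classify_line(line):
    """Return a rough label for one line of a grid table."""
    line = line.rstrip()
    if HEAD_BODY_SEP.fullmatch(line):
        return 'head/body separator'
    if FULL_ROW_SEP.fullmatch(line):
        return 'row separator'
    if line.startswith('+'):
        # Starts like a separator but is interrupted by cell text:
        # a row-spanning cell crosses this line.
        return 'partial row separator'
    if line.startswith('|'):
        return 'cell content'
    return 'not part of a grid table'


for sample in ('+---+---+', '+===+===+', '| a | b |', '+---+ b |'):
    print(repr(sample), '->', classify_line(sample))
```

This only inspects the first character and the overall shape of each line, so it misses partial separators that begin with '|' (a spanning cell in the first column). The parser itself traces cell borders instead of classifying whole lines.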
Passing the example grid table above to the parse() method will result in the following data structure:
    ([24, 12, 10, 10],
     [[(0, 0, 1, ['Header row, column 1']),
       (0, 0, 1, ['Header 2']),
       (0, 0, 1, ['Header 3']),
       (0, 0, 1, ['Header 4'])]],
     [[(0, 0, 3, ['body row 1, column 1']),
       (0, 0, 3, ['column 2']),
       (0, 0, 3, ['column 3']),
       (0, 0, 3, ['column 4'])],
      [(0, 0, 5, ['body row 2']),
       (0, 2, 5, ['Cells may span columns.']),
       None,
       None],
      [(0, 0, 7, ['body row 3']),
       (1, 0, 7, ['Cells may', 'span rows.', '']),
       (1, 1, 7, ['- Table cells', '- contain', '- body elements.']),
       None],
      [(0, 0, 9, ['body row 4']), None, None, None]])
The first item is a list containing column widths (colspecs). The second item is a list of head rows, and the third is a list of body rows. Each row contains a list of cells. Each cell is either None (for a cell unused because of another cell's span), or a tuple. A cell tuple contains four items: the number of extra rows used by the cell in a vertical span (morerows); the number of extra columns used by the cell in a horizontal span (morecols); the line offset of the first line of the cell contents; and the cell contents, a list of lines of text.
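To make the cell tuple layout concrete, here is a minimal sketch that walks the structure documented on this page and converts morerows/morecols into row and column span counts. The helper `iter_cells` is a hypothetical name introduced for illustration, not part of docutils; the `table` literal is the documented result copied as plain Python data, so the sketch does not call the parser.

```python
# The data structure documented on this page, as plain Python data.
table = (
    [24, 12, 10, 10],
    [[(0, 0, 1, ['Header row, column 1']), (0, 0, 1, ['Header 2']),
      (0, 0, 1, ['Header 3']), (0, 0, 1, ['Header 4'])]],
    [[(0, 0, 3, ['body row 1, column 1']), (0, 0, 3, ['column 2']),
      (0, 0, 3, ['column 3']), (0, 0, 3, ['column 4'])],
     [(0, 0, 5, ['body row 2']), (0, 2, 5, ['Cells may span columns.']),
      None, None],
     [(0, 0, 7, ['body row 3']), (1, 0, 7, ['Cells may', 'span rows.', '']),
      (1, 1, 7, ['- Table cells', '- contain', '- body elements.']), None],
     [(0, 0, 9, ['body row 4']), None, None, None]],
)


def iter_cells(rows):
    """Yield (row, col, rowspan, colspan, offset, lines) for each real cell.

    None entries (slots covered by another cell's span) are skipped.
    """
    for row_index, row in enumerate(rows):
        for col_index, cell in enumerate(row):
            if cell is None:
                continue
            morerows, morecols, offset, lines = cell
            yield (row_index, col_index,
                   morerows + 1, morecols + 1, offset, lines)


colspecs, head_rows, body_rows = table

# Collect only the body cells that span more than one row or column.
spanning = [(r, c, rowspan, colspan)
            for r, c, rowspan, colspan, _, _ in iter_cells(body_rows)
            if rowspan > 1 or colspan > 1]
print(spanning)
```

Adding 1 to morerows and morecols gives the total number of rows and columns a cell occupies, which is the form most table renderers expect.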
Method | check_parse_complete | Each text column should have been completely seen.
Method | mark_done | For keeping track of how much of each text column has been seen.
Method | parse_table | No summary
Method | scan_cell | Starting at the top-left corner, start tracing out a cell.
Method | scan_down | Look for the bottom-right corner of the cell, making note of all row boundaries.
Method | scan_left | Noting column boundaries, look for the bottom-left corner of the cell. It must line up with the starting point.
Method | scan_right | Look for the top-right corner of the cell, and make note of all column boundaries ('+').
Method | scan_up | Noting row boundaries, see if we can return to the starting point.
Method | setup | Undocumented
Method | structure_from_cells | From the data collected by scan_cell(), convert to the final data structure.
Class Variable | head_body_separator_pat | Matches the row separator between head rows and body rows.
Instance Variable | block | Undocumented
Instance Variable | bottom | Undocumented
Instance Variable | cells | Undocumented
Instance Variable | colseps | Undocumented
Instance Variable | done | Undocumented
Instance Variable | head_body_sep | Undocumented
Instance Variable | right | Undocumented
Instance Variable | rowseps | Undocumented
Inherited from TableParser:
Method | find_head_body_sep | Look for a head/body row separator line; store the line index.
Method | parse | Analyze the text block and return a table data structure.
Class Variable | double_width_pad_char | Padding character for East Asian double-width text.
Start with a queue of upper-left corners, containing the upper-left corner of the table itself. Trace out one rectangular cell, remember it, and add its upper-right and lower-left corners to the queue of potential upper-left corners of further cells. Process the queue in top-to-bottom order, keeping track of how much of each text column has been seen.
We'll end up knowing all the row and column boundaries, cell positions and their dimensions.
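The corner-queue procedure described above can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the docutils implementation: `find_cells`, `trace_cell`, and `closes` are hypothetical names, the table is a plain list of equal-length strings, there is no '=' head/body separator, and row and column boundaries are not recorded separately.

```python
def closes(lines, top, left, bottom, right):
    """Check the bottom and left edges of a candidate cell rectangle."""
    bottom_edge = lines[bottom][left + 1:right]
    left_edge = [lines[row][left] for row in range(top + 1, bottom)]
    return (lines[bottom][left] == '+'
            and all(ch in '+-' for ch in bottom_edge)
            and all(ch in '+|' for ch in left_edge))


def trace_cell(lines, top, left):
    """From a top-left corner, return (bottom, right) of the cell or None."""
    if lines[top][left] != '+':
        return None
    for right in range(left + 1, len(lines[top])):
        ch = lines[top][right]
        if ch == '+':
            # Candidate top-right corner: scan down its column.
            for bottom in range(top + 1, len(lines)):
                edge = lines[bottom][right]
                if edge == '+':
                    if closes(lines, top, left, bottom, right):
                        return bottom, right
                elif edge != '|':
                    break  # not a border; try the next '+' to the right
        elif ch != '-':
            return None
    return None


def find_cells(lines):
    """Return (top, left, bottom, right) for every cell, via a corner queue."""
    bottom, right = len(lines) - 1, len(lines[0]) - 1
    done = [-1] * len(lines[0])  # last line seen so far in each text column
    cells = []
    corners = [(0, 0)]
    while corners:
        top, left = corners.pop(0)
        if top == bottom or left == right or top <= done[left]:
            continue
        found = trace_cell(lines, top, left)
        if found is None:
            continue
        cell_bottom, cell_right = found
        for col in range(left, cell_right):
            done[col] = cell_bottom - 1
        cells.append((top, left, cell_bottom, cell_right))
        corners.extend([(top, cell_right), (cell_bottom, left)])
        corners.sort()  # process corners in top-to-bottom order
    if any(seen != bottom - 1 for seen in done[:right]):
        raise ValueError('malformed table; parse incomplete')
    return cells


grid = [
    '+-----+-----+',
    '| a   | b   |',
    '+-----+     |',
    '| c   |     |',
    '+-----+-----+',
]
print(find_cells(grid))
```

In this small table the cell containing "b" spans two rows, so the corner at the end of the first row separator is skipped once that text column has already been marked as seen.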
From the data collected by scan_cell(), convert to the final data structure.