How to extract table rows when some cells contain multiple lines (merging rows)?
AlgoDocs was designed to handle data extraction from tables of any complexity. One such complex case is a table whose cells contain multiple lines. Although the text spans several lines, the lines in a cell belong to a single table row, so AlgoDocs offers the ‘Merge Rows’ filter for such cases. Consider the example table below, which illustrates this scenario. The ‘Description’ column contains multiple lines in every row of the table, so we need to merge these lines into a single row when extracting the table rows.
First, we place column splitters as shown below. Then, we click the ‘Extract’ button.
After we click the ‘Extract’ button, AlgoDocs extracts all data from the document and turns it into a 5-column table. Next, we define where the table rows we want to extract begin and end. To do this, we apply the ‘Keep Section’ filter by clicking ‘Add Filter’ → ‘Alter Rows’ → ‘Keep Section’ and select the ‘With Condition’ option to specify the start and end of the table section, as shown below.
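For readers who want to reason about what a section filter of this kind does, here is a minimal Python sketch of the idea, written outside AlgoDocs. It is not AlgoDocs code: the function `keep_section`, the sample rows, and the choice to include both boundary rows are assumptions made purely for illustration.

```python
from typing import Callable, List

Row = List[str]

def keep_section(rows: List[Row],
                 is_start: Callable[[Row], bool],
                 is_end: Callable[[Row], bool]) -> List[Row]:
    """Keep rows from the first row matching is_start through the
    next row matching is_end (both inclusive, by assumption)."""
    section: List[Row] = []
    inside = False
    for row in rows:
        if not inside and is_start(row):
            inside = True
        if inside:
            section.append(row)
            if is_end(row):
                break
    return section

# Hypothetical 5-column extraction result with noise above and below the table.
rows = [
    ["Invoice 1001", "", "", "", ""],
    ["Date", "Description", "Qty", "Price", "Total"],
    ["01/02/2023", "Widget, blue", "2", "5.00", "10.00"],
    ["", "with mounting kit", "", "", ""],
    ["02/02/2023", "Gadget", "1", "7.50", "7.50"],
    ["", "", "", "Total", "17.50"],
    ["Thank you", "", "", "", ""],
]

section = keep_section(
    rows,
    is_start=lambda r: r[0] == "Date",   # start condition (assumed)
    is_end=lambda r: r[3] == "Total",    # end condition (assumed)
)
```

In this sketch the title line and the closing line fall outside the section and are dropped, while everything between the two matching rows is kept.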
The next step is to merge the rows so that every row begins with the date in the first column. To do this, we apply the ‘Merge Rows’ filter by clicking ‘Add Filter’ → ‘Alter Rows’ → ‘Merge Rows’. Inside this filter we can use the ‘Where column 1 has a value’ condition, which meets our requirements because only the first line of each row has a date in column 1. Our final table will look as shown below.
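To make the merge logic concrete, independently of the AlgoDocs interface, the following Python sketch shows one way the ‘column 1 has a value’ rule could be expressed. The function `merge_rows`, the space separator, and the sample data are hypothetical and are not taken from AlgoDocs.

```python
from typing import Callable, List

Row = List[str]

def merge_rows(rows: List[Row],
               starts_new_row: Callable[[Row], bool],
               sep: str = " ") -> List[Row]:
    """Fold continuation rows into the most recent row that starts a record.
    Assumes every row has the same number of columns."""
    merged: List[Row] = []
    for row in rows:
        if starts_new_row(row) or not merged:
            merged.append(list(row))  # copy so the input is not modified
        else:
            last = merged[-1]
            for i, cell in enumerate(row):
                text = cell.strip()
                if text:
                    last[i] = f"{last[i]}{sep}{text}" if last[i] else text
    return merged

# Hypothetical rows: the description of the first item spans three lines.
rows = [
    ["01/02/2023", "Widget, blue", "2", "5.00", "10.00"],
    ["", "with mounting kit", "", "", ""],
    ["", "and spare screws", "", "", ""],
    ["02/02/2023", "Gadget", "1", "7.50", "7.50"],
]

# "Column 1 has a value": a non-empty first cell starts a new row.
merged = merge_rows(rows, starts_new_row=lambda r: bool(r[0].strip()))
```

Here the two continuation lines are appended to the ‘Description’ cell of the row above them, leaving two rows that each begin with a date.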
Note that there are many other ways to merge the rows. For example, the following condition with a RegEx pattern could also be used inside the ‘Merge Rows’ filter.
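As a rough illustration of why a pattern-based condition can be useful, the Python sketch below starts a new row only when column 1 looks like a date. The pattern `\d{2}/\d{2}/\d{4}` is a hypothetical example and is not the pattern used in this article; the function names and sample rows are likewise assumptions.

```python
import re
from typing import List

Row = List[str]

# Hypothetical date format (dd/mm/yyyy); adjust to the document at hand.
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")

def starts_with_date(row: Row) -> bool:
    """True when the whole first cell matches the date pattern."""
    return DATE_PATTERN.fullmatch(row[0].strip()) is not None

def merge_on_pattern(rows: List[Row]) -> List[Row]:
    """Merge every row that does not start with a date into the row above it."""
    merged: List[Row] = []
    for row in rows:
        if starts_with_date(row) or not merged:
            merged.append(list(row))
        else:
            last = merged[-1]
            for i, cell in enumerate(row):
                text = cell.strip()
                if text:
                    last[i] = f"{last[i]} {text}" if last[i] else text
    return merged

# The continuation line has stray text in column 1, so a plain
# "column 1 has a value" test would wrongly treat it as a new row.
rows = [
    ["01/02/2023", "Widget, blue", "2", "5.00", "10.00"],
    ["(cont.)", "with mounting kit", "", "", ""],
    ["02/02/2023", "Gadget", "1", "7.50", "7.50"],
]

merged = merge_on_pattern(rows)
```

A pattern is stricter than a simple ‘has a value’ test: in this sketch the ‘(cont.)’ line is still folded into the row above it, with its text appended to the matching cells, because its first cell is not a date.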
As a final, optional step, we can set the column headers of the extracted table by clicking ‘Add Filter’ → ‘Alter Columns’ → ‘Set Column Headers’.