How to extract text from a document?
Text extraction is one of the most commonly used extracting rules in AlgoDocs. We even recommend using the text data type instead of the number data type for fields such as Invoice Number or Account Number, because invoice and account numbers sometimes include letters, in which case a number data type will fail. Text-related extracting rules are widely used, and they are also very easy to create in AlgoDocs.
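To illustrate why a number data type can fail on such fields, here is a minimal Python sketch. It is not AlgoDocs code: the function `parse_as_number` and the sample values are hypothetical, and the sketch only mimics the general idea of trying to read a value as a number.

```python
def parse_as_number(value: str):
    """Return the value as an integer, or None if it is not purely numeric.

    Hypothetical helper for illustration only; it is not part of AlgoDocs.
    """
    try:
        return int(value)
    except ValueError:
        return None


# A purely numeric invoice number parses without problems.
numeric_invoice = parse_as_number("100245")

# An invoice number that contains letters cannot be read as a number,
# so it has to be kept as text.
mixed_invoice = parse_as_number("INV-100245")

print(numeric_invoice)
print(mixed_invoice)
```

Keeping the field as text avoids this failure, since the value is stored exactly as it appears in the document.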
To create an extracting rule for the text data type, first click ‘Add Rule’ and then select ‘Text’ from the list of data types. The next step is to decide whether the data you want to extract always appears at a fixed position. If it does, simply select the area by drawing a rectangle around the data. If the data might appear at a different position in later documents, skip selecting an area and move on by clicking the ‘Extract’ button.
Once the data has been extracted from the document, you can refine it further until you get the desired output in the required format. To do this, apply filters to the extracted text using ‘Add Filter’. For example, if the extracted text spans several lines and you want it on a single line, click ‘Add Filter’, then ‘Format Text’, then ‘Remove line breaks’. Another very common issue with text extraction is multiple spaces between the extracted words. To remove them, click ‘Add Filter’, then ‘Format Text’, then ‘Remove blank spaces’, and select the ‘Multiple Blank Spaces’ option from the dropdown list.
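The effect of these two filters can be approximated outside AlgoDocs with a short Python sketch. This is an assumption about the behavior, not the actual implementation: the function names are hypothetical, and the sketch assumes that line breaks are replaced by a space and that runs of spaces are collapsed into one.

```python
import re


def remove_line_breaks(text: str) -> str:
    # Assumed behavior: replace each run of line breaks with a single space
    # so that the text ends up on one line.
    return re.sub(r"[\r\n]+", " ", text)


def remove_multiple_spaces(text: str) -> str:
    # Assumed behavior: collapse two or more consecutive spaces into one
    # and trim spaces at both ends.
    return re.sub(r" {2,}", " ", text).strip()


# Hypothetical extracted text with a line break and repeated spaces.
raw = "ACME  Trading   Ltd.\n12  High   Street"

# Apply the filters in the same order as in the steps above.
cleaned = remove_multiple_spaces(remove_line_breaks(raw))
print(cleaned)
```

The order matters in this sketch: removing line breaks first means any extra space introduced at the join is cleaned up by the second step.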
Watch the following video tutorial that covers most of the scenarios related to text data extraction.