We are keen on the introduction of optical character recognition with a view to scanning bank statements. There are a number of OCR devices on the market, such as pens, mouse scanners, wand scanners and conversion software for flatbed scanners. If you buy and try any of these devices, you will probably notice quite quickly that the device works quite well some of the time, but it has a tendency to go haywire and produce utter rubbish. Why on earth would an accountant be interested in anything like that?
Our starting point is Able2Extract, a decent piece of OCR software that can scan a bank statement and recognise, as a human being would, that it is looking at tables of information, just the thing for transfer to a spreadsheet. All the other OCR software we have looked at fails at times to process the overall form of the table correctly.
Able2Extract will output a pile of bank statements to a multi-page spreadsheet with one statement per page. We then have a post-processor written in Excel Visual Basic for Applications which looks at this spreadsheet. As you select each page, our software will tidy it up before your very eyes to make it readable. This happens in a fraction of a second.
Some bank statements have the paid in and paid out columns swapped, and some may even be upside down. Our software can detect this by a variety of methods. For example, it first looks through the statement like a human reader to see if there are three recognisable column headings. If it finds them, then it knows at once the ordering of the columns. If this fails, then it looks for three columns of numbers, and then goes back to look for any two out of three column headings. If this fails, it will do numerical analysis on the columns of numbers themselves. If this fails, then it will simply count numerical entries and assume that the column with the most entries is the paid out column. The operator will notice if this is incorrect, either from a display on a button or from the many highlighted corrections that will be made.
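The chain of fallbacks described above can be sketched in a few lines. Our real post-processor is written in Excel Visual Basic for Applications, so the Python below is only an illustration of the idea, not our code. The heading labels, the number pattern and every function name are assumptions made up for the example, and the numerical analysis step is left out because it is not described in detail here.

```python
import re
from typing import Dict, List, Optional, Tuple

Table = List[List[str]]   # rows of cell text, as they might come out of OCR
Layout = Dict[str, int]   # column role -> column index

# Assumed heading labels; real statements vary from bank to bank.
HEADINGS = {"paid in": "paid_in", "paid out": "paid_out", "balance": "balance"}
NUMBER = re.compile(r"^-?\d[\d,]*\.\d\d$")   # e.g. 1,200.00


def find_headings(table: Table) -> Layout:
    """Return the headings found in the row that contains the most of them."""
    best: Layout = {}
    for row in table:
        found = {}
        for i, cell in enumerate(row):
            role = HEADINGS.get(cell.strip().lower())
            if role:
                found[role] = i
        if len(found) > len(best):
            best = found
    return best


def numeric_counts(table: Table) -> Dict[int, int]:
    """Count the cells in each column that look like money amounts."""
    counts: Dict[int, int] = {}
    for row in table:
        for i, cell in enumerate(row):
            if NUMBER.match(cell.strip()):
                counts[i] = counts.get(i, 0) + 1
    return counts


def by_three_headings(table: Table) -> Optional[Layout]:
    found = find_headings(table)
    return found if len(found) == 3 else None


def by_two_headings(table: Table) -> Optional[Layout]:
    """Three numeric columns plus two headings pin down the third column."""
    numeric_cols = sorted(numeric_counts(table))
    found = find_headings(table)
    if len(numeric_cols) != 3 or len(found) != 2:
        return None
    spare = [c for c in numeric_cols if c not in found.values()]
    if len(spare) != 1:
        return None
    missing_role = (set(HEADINGS.values()) - set(found)).pop()
    return {**found, missing_role: spare[0]}


def by_entry_count(table: Table) -> Optional[Layout]:
    """Last resort: assume the busiest numeric column is paid out."""
    counts = numeric_counts(table)
    if not counts:
        return None
    return {"paid_out": max(counts, key=counts.get)}


DETECTORS = [
    ("three headings", by_three_headings),
    ("two headings", by_two_headings),
    ("entry count", by_entry_count),
]


def detect_layout(table: Table) -> Tuple[str, Layout]:
    """Try each detector in turn and report which one succeeded."""
    for name, detector in DETECTORS:
        layout = detector(table)
        if layout is not None:
            return name, layout
    return "failed", {}
```

Returning the name of the detector that succeeded is a deliberate choice in this sketch: it gives the operator something to look at, so that a guess made by the last resort is not mistaken for a certainty.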
This is massive defence in depth, and the combination of Able2Extract and our post-processor is the solution to the haywire problem. In fact, if a saboteur attacks the bank statement with Tipp-Ex, our software should go on working as best it can, though it will eventually fail if the entire statement is blanked out! If the saboteur changes a 3 to an 8 with a black biro, our software should detect that the number is wrong and correct it, highlighting the correction for a human operator. It will not be able to cope with sabotage on a massive scale, but it will go down fighting.
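How a wrong digit can be caught is not spelled out above. One plausible approach, shown here purely as an assumption, is to use the arithmetic of the statement itself: each balance should equal the previous balance plus money paid in minus money paid out. The Python sketch below works in whole pence, assumes each row carries one amount in the correct column, and uses hypothetical names throughout. When a row fails the check, it looks ahead one row to decide whether the amount or the balance is the misread figure.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Row:
    paid_out: int   # pence
    paid_in: int    # pence
    balance: int    # pence, balance after this transaction


def reconcile(opening: int, rows: List[Row]) -> List[Tuple[int, str, int, int]]:
    """Correct rows in place and return (row index, field, old value, new value)
    for each correction, so that it can be highlighted for the operator."""
    corrections = []
    prev = opening
    for i, row in enumerate(rows):
        expected = prev + row.paid_in - row.paid_out
        if expected != row.balance:
            nxt = rows[i + 1] if i + 1 < len(rows) else None
            # If the next row follows on from this printed balance, the
            # balance is probably right and the amount was misread.
            balance_trusted = (
                nxt is not None
                and row.balance + nxt.paid_in - nxt.paid_out == nxt.balance
            )
            if balance_trusted:
                movement = row.balance - prev
                field = "paid_in" if movement > 0 else "paid_out"
                old, new = getattr(row, field), abs(movement)
                setattr(row, field, new)
            else:
                # Otherwise assume the balance itself was misread.
                field, old, new = "balance", row.balance, expected
                row.balance = expected
            corrections.append((i, field, old, new))
        prev = row.balance
    return corrections
```

For example, with an opening balance of 1,000.00, a payment of 300.00 misread as 800.00 is put back to 300.00 because the balances on either side of it still agree with each other. A sketch like this cannot help when several neighbouring figures are damaged at once, which matches the point that sabotage on a massive scale will eventually defeat the software.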
David Porthouse & Co is a firm of accountants based in Carlisle in North West England. We have a keen interest in new technology with the aim of speeding up accounts production and making accountancy more affordable for our clients. We are pioneers in the introduction of automated processing and optical character recognition.