We would like to run optical character recognition (OCR) on all our bank statements, but doing so often causes problems. Statements may carry ticks or scribbles that can wreck a standard OCR system. Usually there are just enough problems to make OCR awkward, but not enough to make us abandon it. Long runs of small transactions make OCR practically compulsory, yet the output is never guaranteed to be perfect, so there is always a risk that some of what we capture is wrong.
After scanning, we display each bank statement as clearly as possible on a spreadsheet and invite the clerk to compare the display with the paper statement. Humans are good at pattern recognition, and we might as well make use of this ability. We call this the blink comparator, after the device that the astronomer Clyde Tombaugh used to discover Pluto in 1930.
We can now get the computer to “look” over the statement as well and to highlight any running totals that are obviously wrong. Remember that we do not know the order of the Paid out and Paid in columns, and we do not know the date order of the transactions, which could run from latest to earliest. Two possible column orders and two possible date orders give four combinations, and we test each running total against all four. We can only say that a running total is obviously wrong if it fails all four tests. This leaves the theoretical possibility that a running total is wrong but still passes an inappropriate test, which we will call a false-positive. These are likely to be rare unless there is some pattern to the transactions, such as every item of income being matched by an item of expenditure.
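The four tests can be sketched as follows. This is a minimal illustration rather than the CABC's actual code: the `Row` type, the field names `col_a` and `col_b`, the function names, and the use of integer pence are all assumptions made for the example. Each pair of adjacent rows is checked against both column orders and both date orders, and the pair is flagged only if none of the four combinations fits.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One printed statement row (hypothetical layout), amounts in pence."""
    col_a: int    # first money column; may be Paid out or Paid in
    col_b: int    # second money column; the other of the two
    balance: int  # running total printed on this row


def passing_tests(prev: Row, curr: Row) -> list:
    """Return the names of the layouts under which this pair of rows is consistent."""
    delta = curr.balance - prev.balance
    # Forward date order: curr's transaction moves the balance from prev to curr.
    # Reverse date order: prev's transaction moves the balance from curr to prev.
    expected = {
        "forward, a=out b=in": curr.col_b - curr.col_a,
        "forward, a=in b=out": curr.col_a - curr.col_b,
        "reverse, a=out b=in": prev.col_a - prev.col_b,
        "reverse, a=in b=out": prev.col_b - prev.col_a,
    }
    return [name for name, change in expected.items() if delta == change]


def obviously_wrong(prev: Row, curr: Row) -> bool:
    """A running total is flagged only if it fails all four tests."""
    return not passing_tests(prev, curr)


prev = Row(col_a=0, col_b=5000, balance=15000)
good = Row(col_a=1250, col_b=0, balance=13750)     # 150.00 less 12.50
misread = Row(col_a=1250, col_b=0, balance=18750)  # a 3 read as an 8

print(passing_tests(prev, good))
print(obviously_wrong(prev, misread))
```

Here the good row passes only the "forward, a=out b=in" test, while the misread row passes none and is flagged.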
So we get the Computer-Assisted Blink Comparator (CABC) to flag up obvious errors, and the clerk to fix them by visual inspection. After that, the data will be reliable enough to leave any remaining false-positives to the computer. To give some overlap, the CABC runs a second stage in which it works out the apparent order of the transactions, and then highlights any false-positives it detects, along with any other errors in the running total that come to light.
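One way the second stage might work is sketched below. This is an assumption about the method, not a description of the real CABC: the layout that explains the most adjacent row pairs is taken as the apparent layout, and every pair that fails that one layout is highlighted. The row format `(col_a, col_b, balance)` in integer pence and all function names are hypothetical.

```python
from collections import Counter

# Each row is a tuple (col_a, col_b, balance) in pence (hypothetical format).
# Each entry gives the balance change a layout predicts between adjacent rows.
LAYOUTS = {
    "forward, a=out b=in": lambda prev, curr: curr[1] - curr[0],
    "forward, a=in b=out": lambda prev, curr: curr[0] - curr[1],
    "reverse, a=out b=in": lambda prev, curr: prev[0] - prev[1],
    "reverse, a=in b=out": lambda prev, curr: prev[1] - prev[0],
}


def apparent_layout(rows):
    """Pick the layout consistent with the largest number of adjacent pairs."""
    votes = Counter()
    for prev, curr in zip(rows, rows[1:]):
        delta = curr[2] - prev[2]
        for name, predict in LAYOUTS.items():
            if delta == predict(prev, curr):
                votes[name] += 1
    if not votes:
        raise ValueError("no layout fits any pair of rows")
    return votes.most_common(1)[0][0]


def second_stage(rows):
    """Return the apparent layout and the indices of rows that break it."""
    layout = apparent_layout(rows)
    predict = LAYOUTS[layout]
    flagged = [
        i
        for i, (prev, curr) in enumerate(zip(rows, rows[1:]), start=1)
        if curr[2] - prev[2] != predict(prev, curr)
    ]
    return layout, flagged


rows = [
    (0, 0, 10000),
    (2000, 0, 8000),  # 20.00 paid out
    (0, 500, 8500),   # 5.00 paid in
    (300, 0, 8200),   # 3.00 paid out
    (700, 0, 8900),   # should be 7500; passes only the swapped-column test
]
layout, flagged = second_stage(rows)
print(layout, flagged)
```

The last row is a false-positive in the sense used above: it passes one of the four first-stage tests, so it is not obviously wrong, but it fails once the layout has been settled by the other rows.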
To debug or demonstrate this system, the first stage of the CABC checks running totals only to the nearest three pence. Generally, if a total is wrong it will be spectacularly wrong, not out by just a penny or two. We can therefore introduce deliberate errors of a penny or two, which slip past the first stage, and show the second stage of the CABC in action.
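The effect of the tolerance can be illustrated with a small sketch. It assumes that "to the nearest three pence" means a difference of up to three pence is accepted, and that the second stage compares exactly; both are interpretations, and the row format `(col_a, col_b, balance)` in pence and the function names are hypothetical.

```python
FIRST_STAGE_TOLERANCE = 3  # pence; assumed meaning of "nearest three pence"


def predicted_changes(prev, curr):
    """Balance change predicted by each of the four column/date layouts."""
    return [
        curr[1] - curr[0],  # forward dates, first column is Paid out
        curr[0] - curr[1],  # forward dates, first column is Paid in
        prev[0] - prev[1],  # reverse dates, first column is Paid out
        prev[1] - prev[0],  # reverse dates, first column is Paid in
    ]


def passes_any(prev, curr, tolerance=0):
    """True if at least one layout explains the pair within the tolerance."""
    delta = curr[2] - prev[2]
    return any(abs(delta - change) <= tolerance
               for change in predicted_changes(prev, curr))


prev = (0, 5000, 15000)
nudged = (1250, 0, 13752)  # deliberate 2p error; true balance is 13750
gross = (1250, 0, 18750)   # a typical large OCR misread

print(passes_any(prev, nudged, FIRST_STAGE_TOLERANCE))  # tolerant check
print(passes_any(prev, nudged))                         # exact check
print(passes_any(prev, gross, FIRST_STAGE_TOLERANCE))
```

The two-pence error passes the tolerant check but fails the exact one, so it is left for the second stage, while the gross misread fails even with the tolerance.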
There are other OCR systems on the market. In the ideal case, they are likely to outperform ours. However, in most real cases, once there are a few infelicities in the bank statements, we expect our system to come out ahead. At one extreme, we could bypass OCR altogether and simply type the entire bank statement directly into the CABC before proceeding. This makes it clear that OCR is now demoted to an auxiliary system, and that we have some choice over how much to rely on it.