Here is a mock bank statement that is typical of what we deal with in the real world. It is easy enough for the computer to pick out the business area because of the descriptions “Balance brought forward” and “Balance carried forward”, but someone has written on the statement and ticked off a few transactions, without being too heavy-handed. This really is typical, being neither the best nor the worst that we will see.

Here is the statement just after processing by Able2Extract and transfer to Excel. We used a five-column template.

Now we run our own cleanup routine from our computer-assisted blink comparator. Non-business areas are greyed out and obvious errors are highlighted. Note that the computer does not know the order of the transactions or which of the columns is “in” and which is “out” at this stage, but it flags anything that is consistently wrong in all four combinations of transaction order and column assignment.

The cleanup routine ensures that dates and numbers are properly formatted. Only dates and the word “Date” can appear in the leftmost column, and misreadings like “Data”, “Dote” and “Dale” are automatically corrected. Only numbers and a limited range of column headings and trailers can appear in the three rightmost columns. The routine is able to read narrative such as “Balance brought forward” and to colour it greyish blue to show that it has recognised it. If narrative like this were missing, the routine would use other cues to work out the business area of the bank statement, but sometimes it would give up and let the accounts clerk decide by highlighting it manually.

If the cleanup routine knows that it is looking at a number in the three rightmost columns, then any letters O, I, Z, S or B will be converted to the digits 0, 1, 2, 5 or 8 before further processing. Actually Able2Extract is so good that this is rarely necessary.
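The letter-for-digit repair and the “Date” heading repair described above can be sketched as follows. This is a minimal sketch in Python, not the actual cleanup routine: the function names are hypothetical, and the rule used to decide that a cell is a number (it already contains at least one real digit and parses as an amount after substitution) is an assumption.

```python
import re

# Letters that OCR commonly confuses with digits, as listed in the text.
LETTER_TO_DIGIT = str.maketrans("OIZSB", "01258")

# Misreadings of the "Date" heading named in the text.
DATE_MISREADINGS = {"Data", "Dote", "Dale"}

# Assumed shape of an amount: optional sign, digits with optional
# thousands commas, optional two decimal places.
AMOUNT_PATTERN = re.compile(r"-?[\d,]*\d(\.\d{2})?")


def repair_date_heading(cell: str) -> str:
    """Return "Date" for the known misreadings; leave anything else alone."""
    text = cell.strip()
    return "Date" if text in DATE_MISREADINGS else text


def repair_amount(cell: str) -> str:
    """Swap look-alike letters for digits, but only in cells that are numbers.

    Column headings such as "Balance" contain B and S too, so the
    substitution is kept only when the cell held a real digit to begin
    with and the whole cell parses as an amount afterwards.
    """
    text = cell.strip()
    if not any(ch.isdigit() for ch in text):
        return text  # no genuine digit: assume heading or narrative
    candidate = text.upper().translate(LETTER_TO_DIGIT)
    if AMOUNT_PATTERN.fullmatch(candidate):
        return candidate
    return text
```

For example, `repair_amount("1O5.SB")` gives `"105.58"`, while `repair_amount("Balance")` is returned unchanged because it contains no digit.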
A leading or trailing V, slash / or backslash \ next to a number will be assumed to be a tick and will be ignored, so the routine has a limited ability to deal with extraneous rubbish.

After we run the cleanup routine a second time, the grey area is deleted.

It is then up to the accounts clerk to overtype anything on the spreadsheet that is wrong, with the error flagging providing a clue. In this case there are two items to fix and one to delete. Then the clerk runs the cleanup routine again. This time the computer is able to see that everything is in the usual order and it colours the “Paid out” and “Paid in” columns accordingly.

We now have a spreadsheet-based facsimile of the bank statement which we can happily read into our main software. The OCR job is done.

If bank statements arrive covered in ticks and handwriting, then it will take a bit longer to process them, but never catastrophically longer. This is called “graceful degradation” and we feel that it is an essential feature of any OCR system to be found in an accountant’s office. If the statements are just bashed through without a human reviewer in charge, then the computer will be just too stupid to deal with everything that can go wrong and the result will be farcical, something like the tale of the sorcerer’s apprentice. On the other hand, it is still a lot quicker and less tiring to use OCR and then overtype the indicated errors than to type in the whole lot, or to use a traditional analysis pad and pencil, or to use a spreadsheet-based analysis pad.

Sometimes we get bank statements where the transaction order is upside-down and the column order is different. This is how the statement would end up on the comparator.

This uses brighter colours and fancy column headings to keep the clerk awake. As well as using colour, we try to vary the typeface to provide an additional cue.
We preserve the transaction order because it is after all a “comparator”, which allows us to check back directly to the original paper statement. Software further down the line can rectify the transaction order and transpose the columns. This is done by looking at the date order and the column headings in the first place, but if these were all missing then the internal logic of the numbers would be used.

If we were to deliberately change one of the numbers at the right before further processing, the software would spot it and fix it by reference to the running total. This gives a further line of defence, but there must not be too many such errors or the computer will get confused. Obviously there won’t be many errors after the blink comparator stage, if any at all.

There were no ticks or scribbles on this second statement, so its quality is good enough that we can just run the cleanup routine three times in succession without bothering to look at it. We can do this by double-clicking on the routine’s button, so we get three for the price of two. This is as much autonomy as we care to give the computer at this stage, to avoid that sorcerer’s apprentice problem, and we will still force the clerk to look at each individual bank statement. The comparator is permitted to repair an item by inspection of the item itself, but not by reference to neighbouring items, which is a human prerogative. Further down the line the software makes repairs by reference to neighbouring items, but by that time the error rate is very low and the computer can be allowed more autonomy.

Once data is captured on a spreadsheet, life is easy. We can program the computer to list narratives such as Credit card, Paid in, Service Charges and To Deposit Ac in alphabetical order on a mapping table on a second spreadsheet and then to invite the clerk to code them up. This mapping table is reusable next year, and has a direct link to the Internet to look up unusual items.
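The mapping table step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the `last_year` dictionary and the code value "CC" are hypothetical, and the real table lives on a spreadsheet, not in a Python list.

```python
def build_mapping_table(narratives, last_year=None):
    """Return (narrative, code) rows in alphabetical order.

    Codes are carried over from last year's table where the narrative is
    already known; new narratives get None so the clerk can code them up.
    """
    last_year = last_year or {}
    # De-duplicate, then sort case-insensitively (exact text breaks ties).
    unique = sorted({n.strip() for n in narratives},
                    key=lambda n: (n.casefold(), n))
    return [(n, last_year.get(n)) for n in unique]


rows = build_mapping_table(
    ["To Deposit Ac", "Credit card", "Paid in", "Service Charges", "Credit card"],
    last_year={"Credit card": "CC"},  # hypothetical code from a previous year
)
for narrative, code in rows:
    print(narrative, "->", code if code is not None else "(clerk to code)")
```

Only the rows whose code is still `None` need the clerk's attention, which is why a reused table saves work in later years.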
In the first year we use a small generic mapping table for a typical business in Carlisle. Any new narratives found in a new year will need to be coded by the clerk, but often there won’t be too many.

For bank statements where the use of OCR really is impossible, we have a fast data entry system where we type in all the numbers in one go, with income items being treated as negative numbers. This single-sweep action generates “datepoints” which cue us to enter the dates of significant or material items, and we then interpolate the rest. We then enter a few narratives, guess the rest using a Narrative Prediction routine, and overtype the guesses that are wrong. We also use this slick futuristic system for credit cards and handwritten records which resemble bank statements. Experience indicates that while a single-item OCR pen can just about outperform typing it in, it cannot match typing plus datepointing. There are other technologies that can compete with OCR.

Credit card statements do not have a running total. We could scan them using OCR just the same, but we have decided not to for two reasons. Firstly, few clients provide them, and the clerk might have to re-learn a different OCR system every time they appear, in which case it would probably be quicker just to type them in by the column. Secondly, having a pool of data which must always be typed in provides a stimulus to the development of non-OCR systems, so we are not over-dependent on OCR.

We do not expect to be able to use OCR for handwritten records, but we have an alternative system to give us a competitive edge. One other point to make is that credit card and cash transactions tend to be more repetitive than bank transactions, so they can be dealt with effectively by Narrative Prediction (NP) and the advantage of OCR is less evident. It is conceivable that all the credit card payments are for Diesel fuel, in which case NP would be better than OCR.
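How the Narrative Prediction routine works internally is not described here. As a minimal sketch, assuming the simplest possible rule (guess the narrative that the clerk has entered most often), the Diesel fuel case could look like the Python below. The function name and the `(narrative, is_guess)` pairs are hypothetical.

```python
from collections import Counter


def predict_narratives(entered):
    """Fill each None in `entered` with the most common narrative entered.

    `entered` has one slot per transaction: a string where the clerk has
    typed a narrative, None where a guess is wanted. Returns a list of
    (narrative, is_guess) pairs so that guesses can be reviewed and
    overtyped if they are wrong.
    """
    known = Counter(n for n in entered if n is not None)
    if not known:
        # Nothing typed yet, so there is nothing to base a guess on.
        return [(None, False) for _ in entered]
    guess = known.most_common(1)[0][0]
    return [(n, False) if n is not None else (guess, True) for n in entered]


result = predict_narratives(["Diesel", None, None, "Diesel", "Tyres", None])
```

If every payment really is for Diesel, the clerk types one narrative and every guess is right, which is the situation where this kind of prediction beats reading each line.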
Generally OCR is still the best technology. OCR has a secondary benefit of stimulating us to use our imagination to look for other systems which can almost compete with it, and which may occasionally outperform it, and which are useful to have as backup systems in a real accountant’s office.