Every accounts clerk must have had the experience of finding an indecipherable scribble on a cheque book stub. After a while, you realise that the payee is your employer!
An optical character recognition system is never going to be able to deal with this situation. There are OCR systems which claim to be able to recognise handwriting. We think that they will not be worth the bother. Come back and ask us again in a hundred years’ time. For the moment, OCR is for printed matter.
Some clients keep all-handwritten records which are pretty good, almost like a handwritten bank statement. Other clients annotate the printed bank statements so heavily that OCR will be useless, even in its blink comparator form. A5-sized statements won’t go through the sheet feeder, so they might as well be handwritten. Gringotts Bank issue handwritten statements anyway.
The key to processing handwritten information is to do it by the column, which should at least mean some economies of scale. We do all the numbers, followed by the dates, and then the narratives, in that order. We type in the numbers manually, but we have a single-sweep system which normally works in debit-positive convention. If we type in a negative number, it is interpreted as income and moved to the “paid in” column automatically. The system can work in credit-positive convention as well.
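The sign convention described above can be sketched in a few lines of Python. This is an illustration only, not the software referred to in the text; the function name `place_amount` and the two-column return value are assumptions made for the example.

```python
from decimal import Decimal


def place_amount(typed, debit_positive=True):
    """Return (paid_out, paid_in) for one typed figure.

    Under the debit-positive convention a positive figure is a payment
    and a negative figure is income. Under credit-positive the signs
    are read the other way round. (Hypothetical helper.)
    """
    amount = Decimal(typed)
    is_payment = (amount >= 0) if debit_positive else (amount < 0)
    magnitude = abs(amount)
    if is_payment:
        return (magnitude, Decimal("0"))
    return (Decimal("0"), magnitude)


# One sweep down the amounts column, debit-positive.
typed_column = ["45.00", "-1200.00", "12.50"]
rows = [place_amount(figure) for figure in typed_column]
```

Here the second figure, typed as -1200.00, ends up in the "paid in" column as 1200.00, while the other two stay in "paid out".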
The time at which we type in each number is also recorded, and if we take a long time, then a “datepoint” is generated. After we complete a page, we check, where this is possible, that the running total agrees, and reach for the new page. By the time we have done this, the datepoint time will have been exceeded, so we always get a datepoint set at the top of each new page. Other datepoints will be set for any material items on that page. Datepoints which get set accidentally or after interruptions do no harm.
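A minimal sketch of timing-based datepoints follows, assuming that each entry carries a timestamp in seconds and that a pause longer than some threshold triggers a datepoint. The function name `mark_datepoints` and the 20-second threshold are assumptions for illustration, and the sketch does not model datepoints for material items.

```python
def mark_datepoints(timestamps, max_gap_seconds=20.0):
    """Flag each entry that followed a long pause.

    timestamps: entry times in seconds, in the order typed.
    The very first entry is always flagged, since it has no
    predecessor. (Hypothetical helper; threshold is assumed.)
    """
    flags = []
    previous = None
    for moment in timestamps:
        flags.append(previous is None or (moment - previous) > max_gap_seconds)
        previous = moment
    return flags


# Three quick entries, a pause while the page is turned, then two more.
flags = mark_datepoints([0, 3, 6, 40, 43])
# flags == [True, False, False, True, False]
```

An accidental pause simply produces one extra flagged row, which costs nothing more than one extra date to type in.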
After entering all the numbers, we can jump through all the datepoints and enter the dates manually, and our software helps us to do this. All the missing dates can then be entered by interpolation, which is good enough for items which are not material.
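Filling in the missing dates can be sketched as straight-line interpolation by row position between the dates entered at datepoints. This is one plausible reading of "interpolation", not a description of the actual software; the function name `interpolate_dates` and the rule that rows outside the first and last entered dates copy the nearest one are assumptions.

```python
from datetime import date, timedelta


def interpolate_dates(dates):
    """Fill None entries by linear interpolation between known dates.

    dates: list of datetime.date or None, one per row, in page order.
    Rows before the first known date or after the last one copy the
    nearest known date. (Hypothetical helper.)
    """
    known = [i for i, d in enumerate(dates) if d is not None]
    if not known:
        raise ValueError("at least one date must be entered")
    filled = list(dates)
    for i in range(known[0]):
        filled[i] = dates[known[0]]
    for i in range(known[-1] + 1, len(dates)):
        filled[i] = dates[known[-1]]
    for left, right in zip(known, known[1:]):
        span_days = (dates[right] - dates[left]).days
        for i in range(left + 1, right):
            fraction = (i - left) / (right - left)
            filled[i] = dates[left] + timedelta(days=round(span_days * fraction))
    return filled


column = [date(2017, 3, 1), None, None, None, date(2017, 3, 9)]
filled = interpolate_dates(column)
```

With datepoints on 1 March and 9 March and three rows between them, the intermediate rows are dated 3, 5 and 7 March.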
When we come to enter narratives, we enter the first page or two, and then run a Narrative Prediction routine which guesses the rest. Sometimes this gets it all right first time, and sometimes the result is disappointing, but usually this is a useful thing to do. We then overtype the narratives that were guessed wrong. This overtyping still has the assistance of our super-autocomplete system, or alternatively of our teachable onscreen toolpad (we first teach the system to generate required narratives, possibly based on notes made from last year, and then switch off teachability so we can overtype narratives).
Narrative Prediction is a principal alternative to OCR. It spots the standing orders first of all by looking at their regular amounts, and enters the relevant narratives. It then guesses the other narratives using statistics, working on a logarithmic scale. If a variable payment is made at the same time every month, then at least in theory Narrative Prediction will spot this as well, but we are not claiming too much for this. Generally Narrative Prediction works well when populations of transactions are easy to tell apart. On the income side, if all large receipts are sales income, but all small receipts are bank interest, then Narrative Prediction will operate correctly. However, a few receipts of 5,000 or 10,000 which turn out to be capital introduced may not be spotted.
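The two stages described above can be illustrated with a toy model: exact repeated amounts are treated as standing orders, and everything else is assigned to the narrative whose typical amount is nearest on a logarithmic scale. This nearest-average rule is a stand-in chosen for the sketch, not the actual Narrative Prediction routine; the names `train` and `predict`, the use of whole pence, and the sample figures are all assumptions.

```python
import math
from collections import defaultdict


def train(labelled):
    """Learn from (amount_in_pence, narrative) pairs already typed in.

    Returns (standing, centres): amounts seen at least twice with a
    single narrative, and the mean log10 amount for each narrative.
    Amounts must be positive. (Hypothetical helper.)
    """
    by_amount = defaultdict(list)
    logs = defaultdict(list)
    for amount, narrative in labelled:
        by_amount[amount].append(narrative)
        logs[narrative].append(math.log10(amount))
    standing = {
        amount: names[0]
        for amount, names in by_amount.items()
        if len(names) >= 2 and len(set(names)) == 1
    }
    centres = {name: sum(vals) / len(vals) for name, vals in logs.items()}
    return standing, centres


def predict(amount, standing, centres):
    """Guess a narrative: standing orders first, then nearest log centre."""
    if amount in standing:
        return standing[amount]
    x = math.log10(amount)
    return min(centres, key=lambda name: abs(centres[name] - x))


# Payments side: a regular amount is picked out as a standing order.
payments = [(25000, "Rent"), (25000, "Rent"),
            (1250, "Bank charges"), (980, "Bank charges")]
pay_model = train(payments)
rent_guess = predict(25000, *pay_model)

# Receipts side: large receipts are sales, small ones are interest.
receipts = [(420000, "Sales income"), (515000, "Sales income"),
            (380000, "Sales income"), (112, "Bank interest"),
            (95, "Bank interest"), (140, "Bank interest")]
rec_model = train(receipts)
capital_guess = predict(1000000, *rec_model)
```

In this toy data a receipt of 10,000 is guessed as "Sales income", which is exactly the kind of capital-introduced item that would need overtyping by hand.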
Another reason why OCR may be useless is simply the state of the bank statements. If they arrive heavily stapled or crinkled, then we may just use manual data entry as above. We could think of photocopying everything and scanning the photocopies, but by the time this was done, the statements could already have been entered manually, simply because our manual systems are so fast. One alternative is to scan the statements with an OCR mouse scanner, which we are still working on.
We have our own looseleaf bookkeeping system which of course we designed on a spreadsheet. We can give this to our clients in order to encourage them to keep handwritten records in a standard format, so we can easily come along and process everything by the column. Any bank statements can be attached to the back of this system using the treasury tags provided. Clients only need to keep records for cash transactions.
Sometimes we get a mixture of regular bank statements and faint computer print-outs, the latter usually in inverted format and overlapping. We can use a mixture of OCR for the regular statements, and columnwise typing for the awkward bits. We want a robust OCR system which does not fail catastrophically when conditions are less than ideal, and now we have one. We use Narrative Prediction for everything, even when we could scan the narratives after all, simply because this is less vulnerable to poor scanning conditions. This is the right technology to be working on as of 2017 in our judgement, and we welcome competition from anyone who disagrees with us.
Having OCR also stimulates the imagination. We often ask ourselves how we could have something which is almost as good as OCR when OCR is impossible. We do find useful answers to this question. Working with OCR, we pick up some programming knowledge of Visual Basic for Excel. We then apply that knowledge to non-OCR systems.