We aim to process the bank statements of half our clients using full Optical Character Recognition (OCR), and the other half using our hybrid system of Optical Number Recognition (ONR) and Narrative Prediction (NP). We will use ONR/NP on bank statements whose narratives are fairly repetitive, which we call low-entropy narratives, and full OCR on those whose narratives are more variable, which we call high-entropy narratives. In addition, there are a few rare bank statements where much of the narrative is practically meaningless; for these we will also use NP.
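To make the routing rule concrete, here is a minimal Python sketch. It assumes, for illustration only, that a statement's narrative entropy is the Shannon entropy of the distribution of its exact narrative strings, and that statements below a threshold go to ONR/NP. The function names, the 2.0-bit threshold and the `narrative_is_meaningless` flag (standing in for the rare statements whose narrative is practically meaningless) are hypothetical, not a description of our production system.

```python
import math
from collections import Counter


def narrative_entropy(narratives: list[str]) -> float:
    """Shannon entropy, in bits, of the distribution of exact narrative strings."""
    total = len(narratives)
    if total == 0:
        return 0.0
    counts = Counter(narratives)
    # Sum of p * log2(1/p) over each distinct narrative string.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def choose_pipeline(
    narratives: list[str],
    threshold_bits: float = 2.0,  # hypothetical cut-off between low and high entropy
    narrative_is_meaningless: bool = False,  # hypothetical flag for the rare "meaningless" statements
) -> str:
    """Route a statement to "ONR/NP" (low entropy or meaningless) or "OCR" (high entropy)."""
    if narrative_is_meaningless:
        return "ONR/NP"
    if narrative_entropy(narratives) < threshold_bits:
        return "ONR/NP"
    return "OCR"


if __name__ == "__main__":
    repetitive = ["DD COUNCIL TAX"] * 6 + ["SO RENT"] * 6
    varied = [f"CARD PAYMENT SHOP {i}" for i in range(8)]
    print(narrative_entropy(repetitive), choose_pipeline(repetitive))
    print(narrative_entropy(varied), choose_pipeline(varied))
```

A council-tax direct debit and a rent standing order, each repeated six times, give an entropy of 1 bit and are routed to ONR/NP, whereas eight distinct card-payment narratives give 3 bits and go to full OCR. In practice, narratives would probably need normalising (dates, reference numbers) before counting, or nearly every line would look unique.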
There is still scope to develop a better Narrative Prediction system for payments in multiples of £10 or £100, and we will work on this. We will also customise the OCR system to deal better with statements from specific banks.
Remember that NP is useful for some handwritten records, and that it is also the backup system for OCR. One might object that, if we use OCR only on high-entropy narratives, NP is less useful as a backup. The answer is that a high-entropy population can contain sub-populations of distinctly lower entropy, where NP remains useful.
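The sub-population argument can be illustrated with a similar sketch. Assuming, hypothetically, that narratives are grouped by their first token (a transaction-type code such as `DD` or `CARD`), it computes the entropy of each group and reports the groups that fall below a threshold. The grouping key, the 1.0-bit threshold and the minimum group size are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Callable


def entropy_bits(items: list[str]) -> float:
    """Shannon entropy, in bits, of the distribution of exact strings in items."""
    total = len(items)
    if total == 0:
        return 0.0
    counts = Counter(items)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def low_entropy_subpopulations(
    narratives: list[str],
    key: Callable[[str], str],  # hypothetical grouping function
    threshold_bits: float = 1.0,  # hypothetical cut-off
    min_size: int = 3,  # ignore groups too small to say much about
) -> dict[str, float]:
    """Group narratives by key and return {group: entropy} for the low-entropy groups."""
    groups: dict[str, list[str]] = defaultdict(list)
    for narrative in narratives:
        groups[key(narrative)].append(narrative)
    result = {}
    for name, members in groups.items():
        h = entropy_bits(members)
        if len(members) >= min_size and h < threshold_bits:
            result[name] = h
    return result


if __name__ == "__main__":
    narratives = ["DD COUNCIL TAX"] * 4 + [f"CARD SHOP {i}" for i in range(8)]
    print(round(entropy_bits(narratives), 2))
    print(low_entropy_subpopulations(narratives, key=lambda n: n.split()[0]))
```

Here the pooled narratives are fairly high-entropy (about 2.9 bits), yet the direct-debit group on its own has zero entropy, which makes it a natural place for NP to act as the backup to OCR.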
For the foreseeable future, we expect an irreducible portion of our bank statement processing to use ONR/NP. The rest will use full OCR, but a significant share of it will rely on NP as the backup system. As a rough guess, these three portions (ONR/NP, OCR relying on NP as backup, and OCR alone) will each turn out to be about one third of the total.