This is written for accountants and anyone interested in new technology. We aim to use the very latest technology to prepare accounts and tax returns. In summary, we process bank statements using a combination of optical number recognition and our own system of narrative prediction and correction, which we find works best when the bank statements we receive are less than perfect.
Wouldn’t it be nice to have a magic wand that we wave at a pile of bank statements, and they appear at once copied onto our spreadsheet? After that, we can easily write a computer program to turn the information we get into a set of accounts. We do have such a program, and it can code up transactions by re-using last year’s coding table plus updates typed in manually by the accounts clerk, which never takes long. In the first year we use a generic coding table for a small business in Carlisle. Basically, once the data has been captured on a spreadsheet, we’re laughing.
This gives us plenty of incentive to look at optical character recognition for scanning bank statements. The trouble here is that real bank statements tend to be faint photocopies done on a cheap printer. Some statements have been replaced by even fainter computer print-outs which are upside-down. The client has helpfully written on some statements, and has “highlighted” some transactions with a dark marker pen. There are a huge number of small transactions, so typing them in is out of the question if we have to meet a budget. Some narrative is meaningless and there are a few cheque payments, though not so many these days. There are coffee stains and other marks to contend with. Welcome to the real world.
To scan the statements we use Able2Extract OCR software with a preset pcvt file, but only on numbers and dates so it’s a bit easier.
After the scan the statements are displayed on a blink comparator (as used by Clyde Tombaugh to discover Pluto) so they can be manually corrected, making use of the human ability to compare patterns quickly. The blink comparator does not know at this stage if the transaction order is normal or inverted, or if the paid out and paid in columns have been transposed, so it does four tests on any running total it finds, and highlights the running total if it fails all four tests. The human user can then make corrections. This is not perfect in theory, but it will lead to a low error rate in practice.
Our own software can make subsequent corrections by reference to the running total as long as the error rate is low, which it will be by now. It can detect automatically if the transaction order is inverted and if the columns have been transposed, either by looking at dates and column headings, or as a backup by looking at the internal logic of the numbers. Errors in income and expenditure items show up as single errors, which allows their correction. Errors in the running total itself show up as double or reversing errors so they can be corrected as well. Our software corrects for unambiguous errors first of all, and then makes a best effort at ambiguous errors which can be checked by a human supervisor.
Dates are a bit of an afterthought because if some dates are missing, we can always fill the gap by interpolation. Our system just rejects unscannable dates, or dates still in the future after deducting a year, or dates more than four years (1,461 days) old. We really only need dates to determine if a bank statement is in inverted format, and even then we can just analyse the internal logic of the numbers as a backup.
We will need to start with narratives by typing them in, and Excel’s autocomplete system can help out here.
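The following is only a sketch of how the four running-total tests described above might look in Python, not the actual software. The Row type, the function names and the use of whole pence are assumptions made for the example. A row at the very top or bottom of a page is left unflagged here, because not all four tests can be run on it.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    """One scanned statement line, with amounts in whole pence (an assumption)."""
    paid_out: int
    paid_in: int
    balance: int


def fails_all_four(rows: List[Row], i: int) -> bool:
    """True if the running total on row i fails every one of the four tests.

    The tests cover normal or inverted transaction order, combined with
    normal or transposed paid out / paid in columns.
    """
    if i == 0 or i == len(rows) - 1:
        return False  # a neighbour is missing, so the four tests cannot all be run
    row = rows[i]
    net = row.paid_in - row.paid_out
    above = rows[i - 1].balance  # earlier transaction if the order is normal
    below = rows[i + 1].balance  # earlier transaction if the order is inverted
    acceptable = {
        above + net,  # normal order, normal columns
        above - net,  # normal order, columns transposed
        below + net,  # inverted order, normal columns
        below - net,  # inverted order, columns transposed
    }
    return row.balance not in acceptable


def rows_to_highlight(rows: List[Row]) -> List[int]:
    """Indexes of the rows whose running total should be highlighted."""
    return [i for i in range(len(rows)) if fails_all_four(rows, i)]
```

If a single running total is misread, this sketch flags both that row and the one after it, which is the "double error" pattern mentioned above. A statement supplied in inverted order, with no misreads, is not flagged at all.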
We also have a super-autocomplete system so if we have a mixture of awkward narratives like “Accommodation in Leeds” and “Accommodation in Manchester”, we can enter this at speed using automatically reprogrammed f1 and f2 keys. Many narratives are quite repetitive and it is not unusual to find that all income items have the same narrative, so we can often go quickly.
After one month or so of typing in, we can run a Narrative Prediction (NP) routine which predicts the narratives for the remaining eleven months using statistics. We then only need to overtype the predictions that are wrong, and we still have super-autocomplete to help out. After two months we can run the NP routine again, and this time it will pick up any standing orders and re-enter them for the remaining ten months. Our NP routine also has a limited ability to spot variable payments made at the same time every month, which we won’t make too much of. The NP routine does not look back more than six months when doing its analysis, and it can be run again whenever the clerk takes a break.
As an alternative to super-autocomplete, we can permanently change the function keys f1…f10 on the keyboard to generate narrative. For example, f9 is always some sort of motor fuel expense. It is “Petrol” by default, but we can easily change it to generate “Diesel Fuel” or “EV Charge” or some custom narrative. This change will be remembered next year, so to some extent, every one of our clients has their own private keyboard.
Many of the bank statements we get seem to fall into the awkward category, but our systems will go on working when simple OCR fails. This is called graceful degradation and it’s what we need in a real accountant’s office. We are happy to trade off some ideal-case performance in favour of the ability to do a standard job in a standard time in all real cases. Some bank statements arrive torn or stapled to death.
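The text above says only that Narrative Prediction works from statistics, so the following Python sketch is a stand-in rather than the real routine. It assumes that the prediction for an untyped transaction is simply the narrative most often seen with the same amount among the entries already typed in, which is roughly how a standing order would be picked up. The names train and predict, and the use of whole pence, are invented for the illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def train(typed: List[Tuple[int, str]]) -> Dict[int, str]:
    """Map each amount (in pence) to the narrative most often typed for it."""
    seen: Dict[int, Counter] = defaultdict(Counter)
    for amount, narrative in typed:
        seen[amount][narrative] += 1
    return {amount: counts.most_common(1)[0][0] for amount, counts in seen.items()}


def predict(model: Dict[int, str], amounts: List[int]) -> List[Optional[str]]:
    """Suggest a narrative for each amount, or None where nothing matches."""
    return [model.get(amount) for amount in amounts]


# One month typed in by the clerk: (amount in pence, narrative)
first_month = [
    (45000, "Rent"),
    (1299, "Vodafone"),
    (6000, "Diesel Fuel"),
]
model = train(first_month)
suggestions = predict(model, [45000, 777, 1299])
```

Entries that come back as None are the ones the clerk would still type in or enter with a function key. Wrong suggestions would be overtyped, and the routine run again on the larger sample.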
Supposing that OCR is impossible to use, we can enter numbers using a handheld numeric keypad working in debit-positive convention, and there is a timer which adds “datepoints” if we are slow, as we would be at the start of each new page. We can then enter dates by jumping through the datepoints, and we only need to click on a button to enter the day, with month and year being copied down automatically.
We still have Narrative Prediction which is invulnerable to unscannability. NP can be used also for other types of data entry such as with handwritten records, and we regard NP as an essential part of the toolkit which gives us OCR-like performance when OCR itself cannot be used. As an example, if the client pays the rates by cheque, and the cheque book stubs have some scribble like C@r1i$1e C17y C0unc11, then NP will guess what this says by reference to the repetitive amount of the payments, while OCR will be a complete failure. It’s nice to have more than one high-technology way of doing things.
Sometimes all we get is a pile of invoices and we will just have to type them in. Our normal method of entering dates assumes that each new date is later than the previous entry, as it would be if the invoices were sorted by date. This means that if we type in the dates “15/3/17” followed by “1”, what we will see is “15/3/2017” followed by “1/4/2017”. There is another mode we can use if the client has already batched the invoices by month (in a concertina folder for example). In this mode the month is locked, so what we would see is “15/3/2017” followed by “1/3/2017”. If we enter “1+” then the following month is indicated. If we enter “1-” in any mode then the current month is indicated. We enter “1*” to switch between modes. This by itself may seem a bit trivial, but with lots of little improvements like this, data entry proceeds quite a bit more rapidly when we are working on an industrial scale.
We also have another method of entering dates, “playing at long stop”.
We can run a routine which cues for the entry of the first date, any dates of material transactions and the last date. After this it estimates the dates of non-material items by interpolation. If most of the dates have been scanned in already by OCR or entered by datepointing, then this routine will be needed only for a few missing dates.
In our experience, about two thirds of our work can use the scanner, with the other third requiring manual data entry. Of the scanned data, about one half allows dates to be scanned, while the other half does not, so we use datepointing on two thirds of jobs altogether. We use NP for everything so there is not much of a cliff edge to fall over if OCR cannot be used. Our non-OCR systems are fast enough that it isn’t that bad if we cannot use OCR, so we have everything marked up to warn next year’s clerk on what system to use to “hit the ground running”.
With our emphasis on graceful degradation and the many backup systems which we have, we feel that our hybrid OCR/NP system will normally outperform a pure OCR system in everything but a few ideal cases. Well, we are bound to say that since we invented it, but we are happy to test what we say in the marketplace in competition against pure OCR systems adopted by other accountants. Let’s have such a competition, which will be to everybody’s benefit. Lots of other people have thought of OCR and there are software systems to scan bank statements on the market. What is special to us, we believe, is the blink comparator and NP and the private keyboard.
Once we have captured everything on a spreadsheet, coding up can re-use last year’s work plus incremental changes, and working papers can be generated as a by-product. Because our systems are so quick, we now have a new policy of getting client data transferred to a spreadsheet as fast as it comes through the door, which we call “EDGE” for Early Data Grabbing with Enthusiasm. We can then contact the client quickly if anything is missing.
After that we triage our work, but we can react quickly if anything suddenly acquires additional urgency. EDGE and triage help to deal with that vague feeling of worry which plagues public accountants. EDGE can of course be implemented by junior staff who will often just be “faxing” bank statements onto a spreadsheet, and it is a sensible replacement for the booking in of client records. Senior staff on higher chargeout rates then see an all-electronic job.
Let’s just consider the principal case against OCR, apart from its novelty. There is a saying that “time spent in reconnaissance is seldom wasted”, and sometimes we need to resort to colour-coded ticking off to make sense of client records. More often, the bank analysis is the reconnaissance phase, and if we do it too quickly using OCR, then we have not necessarily achieved anything since maybe things are just a little too streamlined. For this reason, given the speed of our non-OCR systems, there is now a lack of pressure to continue development work on OCR. The use of OCR will be essential for some jobs, but often nothing is lost if we are just too lazy to walk over to the scanner.
We can then rationalise everything quite neatly. Use OCR for tedious stuff like numbers and dates, and get junior staff to deal with it. Use NP for narratives, but get senior staff to do it. We then have a balanced usage of new technology. We can modify this a little in February and March by making junior clerks do narratives as well since they have nothing much else to do, while in December and January they only do numbers and not dates or narratives to check at an early stage that the run of bank statements is complete. This helps to spread work and stress more fairly around the year.
Suppose that all we get from the client is a pile of invoices. We first sort them by the month (they may be sorted or part-sorted already), and put invoices outside the period on one side. Note that we do not sort the invoices by the day.
For the first invoice to type in, we type in the full date, narrative and amount, and then switch to “instant dates” mode where all dates are assumed to fall in the same month. For the next invoice, we just type in the day, and the month and year will be copied down automatically. For a date in the next month, we type in something like “1+” to change the month. As a rule, then, we only need to type in the day.
In a pile of invoices we can expect that many will be for motor fuel, say Diesel fuel, some for a mobile phone, say Vodafone, and there will be a couple of regular suppliers like Toolstation Carlisle and Wolseley Centres. We can easily reprogram the function keys on the keyboard so f5=Toolstation Carlisle, f6=Wolseley Centres, f7=Vodafone and f9=Diesel Fuel, and these reprogrammings may already be there inherited from last year for the particular client. We use these function keys to generate the narratives for the commonest items found, and just type in the narrative for rarer items, with basic autocomplete still working. The amount just has to be typed in. There’s nothing new there. Generally with a pile of invoices, there just won’t be that many of them, and with a bit of technological assistance we can proceed quickly. One thing we are aiming for is that the accounts clerk, faced with a pile of invoices, will go for the f9 key as a reflex action whenever a garage receipt is found, whether it be petrol or Diesel. They have a characteristic appearance which should mean a quick response.
If the client gives us the handwritten equivalent of bank statements, then we would type them in by the column, with all the numbers first, and then the dates (we have a 31 button toolpad and usually only need to enter the day after the first entry). After typing in a few narratives we can use Narrative Prediction, which is often right first time with this type of record-keeping. We can use overtyping, assisted by the private keyboard, to make corrections.
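The date shorthand described earlier (the normal mode in which dates run forward, the month-locked “instant dates” mode, and the “+” and “-” suffixes) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the routine we actually use: the function name expand_day is invented, a day equal to the previous day is treated as falling in the same month, the “*” mode switch is left to the caller through the locked flag, and an impossible date such as the 31st of April is simply rejected by Python with a ValueError.

```python
from datetime import date
from typing import Tuple


def following_month(year: int, month: int) -> Tuple[int, int]:
    """Return the (year, month) after the given one, rolling over in December."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


def expand_day(entry: str, previous: date, locked: bool) -> date:
    """Turn a typed day such as "1", "1+" or "1-" into a full date.

    previous is the date on the line above; locked=True means the month is
    locked, as when the invoices are already batched by month.
    """
    suffix = entry[-1] if entry[-1] in "+-" else ""
    day = int(entry[:-1] if suffix else entry)
    year, month = previous.year, previous.month
    if suffix == "+":
        # "1+" always means the following month
        year, month = following_month(year, month)
    elif suffix == "-":
        # "1-" always means the current month, whatever the mode
        pass
    elif not locked and day < previous.day:
        # normal mode: dates run forward, so an earlier day means next month
        year, month = following_month(year, month)
    return date(year, month, day)
```

With the previous date at 15/3/2017, typing “1” gives 1/4/2017 in normal mode and 1/3/2017 in locked mode, matching the examples given earlier.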
Again, with handwritten records there are not likely to be that many entries.
If we do the VAT returns, then when we come to do the accounts all the data has already been captured on our system, a benefit which we recognise by giving a discount to our client. When we do a VAT return in the quarter between February and April, we can at the same time do some of the accounts preparation work as such, which our system allows us to do without duplication of effort. This helps to spread the workload around the year and is a benefit that we had not foreseen beforehand. Sometimes a VAT return is based upon a pile of invoices which need to be typed in. We have a system where, if the invoices are batched by supplier, then for each batch we only need to type the supplier’s name once, and it is automatically copied down. We only need to type the day, and the month and year are copied down. We only need to type the consideration, and the value and VAT are calculated for us. Sometimes the VAT will be wrong by a penny or so, but we have a nudge feature to amend it. We have tried scanning invoices with an OCR pen, but the results are disappointing.
Finally, if there’s one thing we really want to shout about, it is GRACEFUL DEGRADATION. We believe that we now have a full intermediate layer of technology between OCR and older ways of doing things, which nobody else has. We have an Intranet-style guide to assist the clerk in deciding quickly what sort of client records we have, and how to process them efficiently. The clerk would have a link to this guide on his or her computer desktop. Sometimes we do indeed just have to type it all in, but there’s no need to dither. The research we have done on OCR has often felt like chasing a will o’ the wisp or the other end of the rainbow. We feel that we have now actually found our pot of gold, and are very pleased with the outcome of our efforts.
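To make the VAT step mentioned above concrete, here is a small Python sketch of splitting a typed-in consideration into value and VAT, with a nudge for the odd penny. It is an illustration only: the function name, the default rate of 20% and the round-half-up rule are assumptions made for the example, and amounts are held in whole pence to avoid floating-point surprises.

```python
from typing import Tuple


def split_consideration(gross_pence: int, rate_percent: int = 20,
                        nudge_pence: int = 0) -> Tuple[int, int]:
    """Split a VAT-inclusive amount into (value, VAT), both in pence.

    VAT is gross * rate / (100 + rate), rounded half up using integer
    arithmetic. nudge_pence lets the clerk move the VAT by a penny or so
    to agree with the figure printed on the invoice.
    """
    denominator = 100 + rate_percent
    vat = (2 * gross_pence * rate_percent + denominator) // (2 * denominator)
    vat += nudge_pence
    return gross_pence - vat, vat


value, vat = split_consideration(12000)  # an invoice for 120.00 including VAT
```

Whatever the nudge, the value and VAT always add back to the consideration typed in, so a nudge only moves a penny from one column to the other.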
In human terms, processing a big pile of bank statements by traditional methods can leave the clerk feeling quite worn out, but with this system it is fast and straightforward. Our use of new technology enables us to quote fixed fees with confidence. The speed of our system enables us to respond quickly if anything needs to be done in a hurry.
We can also talk about free bookkeeping when all the transactions are through the bank account. In this case we would prefer clients not to do any bookkeeping. With our system we can rapidly assemble a director’s current account and charge interest at the official rate (currently 3%) on a daily basis if it is overdrawn, subject to holding meetings of the shareholders and directors to agree to do this. This type of interest, however, may not be chargeable retrospectively, so it would be best to see us as soon as possible. All our hire purchase calculations use the actuarial method so any cash flow statements we do are accurate and we can deal with early terminations of the loan.
Other websites which we support deal with Your New Company and Advice on Deadlines. We also have a mobile-friendly website and we are experimenting with AMP. With several websites, we can experiment with new ideas, which has the double benefit of self-promotion and of putting us in a better position to be able to advise our clients. We don’t mind admitting that with some of the things we do, we are not too sure why we are doing them, but we just want to explore the system. The mobile-friendly website is pitched at mobile phones and interaction with search engines. The company website is pitched at tablet computers as well as mobile phones, and at interaction with QR codes. Some of this activity could turn out to be a waste of time, but other activity will surely have benefits.