
OCR with MS Document Imaging

Scanning normally means converting a piece of paper into an image on the computer. To convert a page (or an image of one) into an editable text document, you need to perform OCR on it. OCR stands for Optical Character Recognition, and is analogous to what we do when we read a document.

Most OCR software is commercial and expensive (upwards of £70). Some scanners come with OCR software of varying quality, and there are some free OCR packages available. However, Microsoft Office 2003 includes a good, if basic, OCR package which is very easy to use, as explained below.

1. Start up the Microsoft Office Document Imaging software. This usually resides in the Start menu under All Programs => Microsoft Office => Microsoft Office Tools => Microsoft Office Document Imaging, as shown in the screenshots below.



2. Click on the scanner button, circled and marked as "1" in the screenshot below.
3. Place the first page on your scanner and click the [Scan] button ("2" in the same screenshot). The program will show you a preview of the scanned page and indicate the progress with a "thermometer".

4. When the first page has been scanned, you will be shown the following screen. If the document has more than one page, place the next one on the scanner and click [Continue].

5. Repeat steps 3 and 4 above until you have scanned all the pages, then click [Done].

6. After a short delay, you will be presented with a preview of the document. On the left, you will see a small image of each of the pages you have scanned, and when you click on the image, you will see a larger picture of it on the right. The text will have already been recognised (OCR'ed). 

Now simply click on the "Send Text to Word" button, circled in the screenshot below, and a new document will be opened in Word with the recognised text.

NB: only basic formatting features will be preserved. Bold and italics may be recognised, but tables, columns and other features will probably not be converted properly. There may also be some mistakes, but Word's spell checker will make it easier to identify and correct those.
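
Word's spell checker catches most misreads, but a small script can also pre-screen the text for one common kind of OCR mistake: digits that have crept into words (for example "0" read in place of "o"). The following is only a minimal sketch in Python, assuming the recognised text has been saved as plain text. The function name flag_suspect_tokens is hypothetical and is not part of Microsoft Office or Document Imaging.

```python
import re

# Hypothetical helper, not part of Microsoft Office: it flags tokens that mix
# letters and digits, a common symptom of OCR misreads (e.g. "0" for "o").
TOKEN = re.compile(r"[A-Za-z0-9]+")


def flag_suspect_tokens(text):
    """Return tokens containing both letters and digits, in order of appearance."""
    suspects = []
    for token in TOKEN.findall(text):
        has_letter = any(ch.isalpha() for ch in token)
        has_digit = any(ch.isdigit() for ch in token)
        if has_letter and has_digit:
            suspects.append(token)
    return suspects


if __name__ == "__main__":
    sample = "The c0mputer scanned 12 pages in 2003 with g00d results."
    print(flag_suspect_tokens(sample))
```

Pure numbers such as "12" and "2003" are left alone, while "c0mputer" and "g00d" are reported. Legitimate mixed tokens such as "A4" or "3rd" will be flagged too, so treat the result as a list of places to check by eye, not as a list of definite errors.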


