
Data conversion

The products listed below convert data files from one format to another. Some of them are free while others cost money. Some are client based (i.e. software to be installed on your computer) while others work online.

While these may be of use to you, I cannot vouch for their suitability to your needs, nor for the quality of the conversion.

Online Services (general)

There are several websites which convert between a wide range of formats. The sites below all offer the basic service for free, though there is often a charge for a high volume of conversions. Zamzar is the best known of these sites and I have used it to great effect before, but this should not put you off trying the other sites too.

Graphics

You can use either IrfanView or XnView for this purpose. Although both products are primarily viewers/editors for bitmapped graphic files, they support a wide range of formats and have batch conversion facilities built in. Both products are free.

PDF (to Word etc)

PDF is primarily a display format rather than a data format. By this, I mean that it is designed to show (or print) the contents very accurately, but the internal structure of the document is often lost. In fact, 'security features' built into the PDF standard allow the author to prevent copying of the contents and even to limit printing. With that in mind, the process of converting a PDF file into an editable Word document (or indeed any other editable format) is far from straightforward. There are many programs which attempt to do this, but none do it perfectly, and only a few do it well enough - frequently enough - to make it a worthwhile enterprise. The most viable options are:

  • Obtain the original file if you can.
  • Use Adobe Acrobat (Standard or Professional). Quite good, but very expensive.
  • Use Abbyy PDF Transformer (around £56) or one of the Nuance PDF Converter products (from £50 up). These are both very good at producing an editable document which retains the general layout, but documents with a complex layout (e.g. forms) often require further tweaking in Word to make them truly usable.
  • Use a good quality OCR program, such as Abbyy FineReader or Nuance OmniPage, to 'read' the document. Some OCR packages will even support PDFs directly (that is, they have a command to convert PDFs to editable text).
  • Use an online service such as http://www.pdftoword.com/, http://www.cometdocs.com/index.htm or http://www.ocrterminal.com/. Upload the PDF and download the converted file.


eMail

Transend

Commercial email migration tool. Can convert between a wide range of email formats, including PST (Outlook 'Personal folder' format) and MBOX. Basic tool costs $49.95, trial version available.

Download from: http://www.transend.com/products_transend_migrator.asp

Recovery Toolbox

Tool for converting between different versions of PST files (97-2002, 2007 etc.). It also recovers corrupt PST files and can export messages as .txt and .eml files. $49.90 for the 'personal' version.

Download from: http://www.convertpstfiles.com/

Aid4Mail

Commercial piece of software which claims to do a better conversion job than most (more accurate results, easier to use etc.). Prices start at $19.95.

Download from: http://www.aid4mail.com
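
For the simplest case touched on in this section, turning an MBOX file into individual .eml messages, a short script may be enough. The following is only a minimal sketch using the mailbox module from Python's standard library. It does not read PST files, for which you still need one of the tools above. The function name mbox_to_eml and the numbered output file names are my own choices for illustration, not part of any of the products listed.

```python
import mailbox
from pathlib import Path


def mbox_to_eml(mbox_path: str, out_dir: str) -> int:
    """Write each message in an MBOX file to its own .eml file.

    Returns the number of messages written. File names are simply
    numbered (00001.eml, 00002.eml, ...), which is an arbitrary choice.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # create=False makes a missing file an error instead of an empty mailbox.
    box = mailbox.mbox(mbox_path, create=False)
    try:
        count = 0
        for count, message in enumerate(box, start=1):
            # as_bytes() serialises headers and body without the
            # mbox-specific "From " separator line.
            (out / f"{count:05d}.eml").write_bytes(message.as_bytes())
        return count
    finally:
        box.close()
```

Calling mbox_to_eml("inbox.mbox", "converted") (both names hypothetical) would leave one .eml file per message in the "converted" folder. I cannot vouch for how well this copes with damaged or unusual mailboxes, which is where the commercial tools claim to earn their keep.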

OCR - Optical Character Recognition

When you scan a document, whether with a scanner or a digital camera, all you are doing is taking a picture of it. The computer does not know or care that there is text on the page, and the scanned image is a picture just like a portrait or a still life photo. In order to make the text editable (say, in a word processor, spreadsheet or email program), you need to convert the picture by means of OCR software. This software recognises the text and 'reads' it into a document. This is a lot more difficult for a computer to do than you might imagine (then again, we spend years learning how to read, and we still struggle at times!), and things that seem trivial to us (the page being a little askew, uneven lighting, or specks of dirt on the page) can confuse all but the best software. Furthermore, the range of fonts, text sizes and layout options (columns, tables, white text on a coloured background etc.) also presents a range of challenges. Most OCR software also lacks the 'common sense' to tell it that the vertical stroke at the end of the word 'pal' is a lower case L, but the visually identical character in a date is probably the digit 1.

As a result, the best OCR products are indeed very good, but they usually cost money and are often quite expensive. There are many other OCR products out there, but they are often so limited, or poor in quality, as to be of little use. 
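
To make the 'pal' versus date problem a little more concrete, here is a rough sketch in Python of the kind of context rule that could be applied to OCR output after the event. It is purely illustrative: the rule (count the unambiguous letters and digits in each word and go with the majority) and the names fix_token and fix_text are my own assumptions, and I am not suggesting that any of the products below work this way.

```python
import re

# Look-alike characters, limited to the confusions discussed above.
TO_DIGIT = str.maketrans({"l": "1", "I": "1", "O": "0"})
TO_LETTER = str.maketrans({"1": "l"})
AMBIGUOUS = set("lIO10")


def fix_token(token: str) -> str:
    """Pick a reading for look-alike characters from the rest of the token."""
    clear = [ch for ch in token if ch not in AMBIGUOUS]
    digits = sum(ch.isdigit() for ch in clear)
    letters = sum(ch.isalpha() for ch in clear)
    if digits > letters:
        return token.translate(TO_DIGIT)   # looks numeric, e.g. a date
    if letters > digits:
        return token.translate(TO_LETTER)  # looks like a word
    return token                           # no evidence either way


def fix_text(text: str) -> str:
    """Apply fix_token to every whitespace-separated token."""
    return re.sub(r"\S+", lambda m: fix_token(m.group()), text)


print(fix_text("my pa1 was born on l2/0l/l985"))
# my pal was born on 12/01/1985
```

A rule this crude will get things wrong (a part number such as 'A1' would come out as 'Al'), which gives some idea of why good OCR software is hard to write.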

Abbyy FineReader

In my view, Abbyy make the best OCR software in the world, bar none. Not only is it very accurate, but it is quite easy to use, and it supports a wide range of languages, including those written in the Cyrillic alphabet. Prices start from £65. Abbyy make related OCR products, including a 'light' version for the Mac (£35) and a 'PhotoReader' product (£36) which OCRs photos taken with a camera, but does not support desktop scanners.

Nuance OmniPage

The OmniPage products (there are a few versions available) are currently the leaders in the OCR market. There are products for Mac & PC, and they are all without a doubt fine products. Prices start at around £130.

I.R.I.S ReadIris

I.R.I.S is a company that sells a range of products for digitising paper documents, including scanning pens, business card scanners and a variety of software products. Their main OCR product, ReadIris Pro (from £110), is geared primarily towards fast scanning and other 'industrial' features. While it will certainly be useful for some people or companies, I would not recommend getting this product without ensuring this is exactly what you are looking for.

Online OCR services

There are a number of websites to which you can upload your scanned images, and which will OCR them and give you the converted result. By definition, these services are a lot less flexible than desktop products such as those listed above, but they can be useful if you only scan documents very occasionally.

http://finereader.abbyyonline.com - From the makers of Abbyy FineReader, this is the best online OCR product. To use it, you need to buy 'credit points' (the first 20 pages cost £3, larger purchases are cheaper).

http://www.ocrterminal.com/ - similar in many respects to the Abbyy product above, this service charges $0.09 per page for the first 50 pages, and a lower rate thereafter.

http://www.ocronline.com/ - with OCRonline, you get a free allocation of 10 pages per week. The program supports a large number of languages, in Latin or Cyrillic alphabets.

http://www.onlineocr.net/ - similar to the other services, but includes a relatively generous free allocation of 15 images per hour. If you require more, you need to buy credits.