Accessible Curriculum Materials for Students with ASN

Scan and OCR

To create an accessible digital copy from a paper book, you scan or take a digital photograph of it, and then convert it into editable text. When you scan or take a photograph of the page you get an image of the page - you can't edit the text, or read it out with text-to-speech software, because it's just a bunch of black dots and lines on the page. The process of converting the scanned or photographed image into editable, readable text is called 'Optical Character Recognition' or OCR.

Scan the paper copy (preferably with a multisheet scanner that produces an 'image' PDF)

OCR to convert the PDF image file into editable text; check and correct any errors

Save as PDF if you want a digital copy that looks exactly like the paper book

Save to Word (DOC) if you want to:

  • edit the font or size to make Adapted or Large Print;
  • make a Braille copy;
  • make a digital LIT or Daisy version;
  • create a synthetic audio version;
  • make a Clicker version - copy and paste from Word to Clicker;
  • make a symbolised version - copy and paste from Word to Communicate: In Print or Boardmaker.
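The choice between saving as PDF and saving to Word can be summarised as a simple lookup. The sketch below is only an illustration: the function name and the goal labels are hypothetical and are not part of any OCR program.

```python
# Hypothetical helper: the goal labels below are illustrative shorthand
# for the options listed above, not settings in any OCR product.
WORD_GOALS = {
    "large print",
    "braille",
    "lit",
    "daisy",
    "synthetic audio",
    "clicker",
    "symbolised",
}


def choose_save_format(goal: str) -> str:
    """Return 'PDF' for an exact visual copy, 'DOC' for goals needing editable text."""
    normalised = goal.strip().lower()
    if normalised == "exact copy":
        return "PDF"
    if normalised in WORD_GOALS:
        return "DOC"
    raise ValueError(f"Unknown goal: {goal!r}")
```

For example, `choose_save_format("Braille")` returns `"DOC"`, while `choose_save_format("exact copy")` returns `"PDF"`.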
 

Scanners

A flatbed scanner is fine for small quantities, but scanning a two-hundred-page novel one page at a time is very time-consuming, so if you plan to scan a lot of material consider buying a multi-sheet scanner. Alternatively, if you have access to a modern networked printer/photocopier, you may find it has a scanning facility: you break the spine of the book, separate the pages and stack them on the printer/photocopier/scanner, and instead of photocopying them it scans the pages and produces an electronic file (usually PDF, sometimes DOC). If you don't have access to a multi-sheet scanner, you can get the book scanned by a company specialising in document scanning. We have had good results from DDSR in Wishaw: you send them the books and within a few days they send you back a CD with the scanned PDF file, at an approximate cost of 6p per page.
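To get a rough idea of what bureau scanning might cost, multiply the page count by the per-page rate. The sketch below assumes the approximate 6p per page quoted above; the function name is hypothetical, and it ignores postage or any other charges a bureau might add.

```python
from decimal import Decimal

# Approximate rate quoted in the text above; actual prices may differ.
PENCE_PER_PAGE = Decimal("6")


def bureau_cost_pounds(pages: int, pence_per_page: Decimal = PENCE_PER_PAGE) -> Decimal:
    """Estimate the scanning cost in pounds for a given number of pages."""
    if pages < 0:
        raise ValueError("pages must not be negative")
    # Convert pence to pounds and round to two decimal places.
    return (pages * pence_per_page / 100).quantize(Decimal("0.01"))


print(bureau_cost_pounds(200))  # prints 12.00
```

On this estimate a two-hundred-page novel would cost about £12 to have scanned.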

Digital cameras

An alternative to scanning is to take digital photographs of each page: most of the OCR programs can recognise text from camera images. Colour images from photos are higher resolution than those from scanners and so this works particularly well for books with lots of illustrations, like story books for young readers. The main issue with using a camera is positioning it and the book so that the images are consistent. See your OCR software manual or the TopOCR tutorial for advice on OCR from camera images.

OCR ('Optical Character Recognition')

Scanners are often supplied with OCR software, but it is usually pretty basic, and if you intend to scan a lot of books you are generally better off buying software that is designed for the task, like FineReader, OmniPage or ReadIris. We like FineReader, but OmniPage and ReadIris are also good. Which OCR program you get depends in part on what type of accessible copy you want to make (see the Using Books section for more on different formats) and how much money you have. A few observations:

I want to.. CALL recommendations
..save myself time and hassle
  • Get a digital original from the publisher, author, teacher or lecturer. Scanning takes a long time - don't bother if you don't need to
  • Use reasonable quality paper copies. OCR works best with good quality text: crumpled 10-year-old worksheets that have been photocopied 25 times won't come out too well.
  • Use a multi-sheet scanner or send the book to a document scanning bureau such as DDSR.
  • Learn how to use Headings and Styles in your word processor or text editor so that you can re-purpose files for different formats.
  • Get a specialist OCR program like FineReader, OmniPage or ReadIris.
  • Consider Dolphin's EasyConverter.
..scan a book or document that's mainly text, with a simple page layout
  • Most OCR programs (see below) do a pretty good job of OCRing simple documents and books into Word or another text editor. Check and edit the text and then save it in your preferred format.
..scan a book or document with a fairly complex layout, with lots of images and text boxes
  • If you want a digital version that looks exactly like the paper version (e.g. for a reader with physical impairment who wants to turn the page on screen), make a PDF. (If you scan complex pages with images and text boxes into Word or HTML you will usually find the formatting is a mess.)
  • If you need to re-design the pages so that they don't look like the paper version (for example to make Adapted or Large Print, Braille, synthetic audio or an eBook), then scan into Word or another text editor. Don't scan direct to a PDF. Edit the text and layout in Word. Some of the text may not be recognised by the OCR program, so you may have to re-type it manually.
  • In either case, we recommend using FineReader, OmniPage or ReadIris because these programs let you check and correct the text before you convert it to PDF or another format. They are also generally more accurate than the OCR engines built into programs like Acrobat Pro.
..have good control over what gets OCRed
  • Get one of the professional OCR programs like FineReader, OmniPage or ReadIris. These let you manually 'zone' the areas of the page you want to OCR, and you can correct any recognition errors. They can also save the scanned document in lots of different formats (e.g. PDF, DOC, RTF, HTML etc). If you intend to scan a lot of materials we strongly recommend using FineReader.
..make a simple 'talking book' for a young reader
  • Use your digital camera to take photos of each page (don't use the flash), import them into PowerPoint, PhotoStory, SwitchIt! Maker or another 'slide show' style program, and record your narration of the story.
..spend as little money as possible
  • Use the OCR software that came with your scanner, or buy FineReader, OmniPage or ReadIris, then:
  1. scan to PDF and DOC;
  2. open with Microsoft Word (if you have it) or OpenOffice (if you don't);
  3. edit, make it accessible, and save in your required accessible format.

Scanning & OCR software

Some of the many OCR programs are listed below. EasyConverter, FineReader, OmniPage, ReadIris and TopOCR are programs specifically for scanning and OCR: they do the best job of converting the image into different digital formats and they let you check and edit the text once it is OCRed. If you want to create accessible books in several different formats we recommend using these programs to create PDF books that look exactly like the original, and also editable Word files from which you can make Large Print in various sizes, Braille, synthetic audio or a variety of digital formats.

The main difference between EasyConverter and the other OCR programs is that EasyConverter has tools for converting your scanned and edited Word file easily into Large Print, Braille, MP3 and Daisy. EasyConverter can't save as PDF though.

Programs like ClaroRead and Read and Write Gold are primarily text-to-speech packages with OCR built in: the reader OCRs the page and then uses text-to-speech to access it.

| Scanning / OCR program | Comment | Approximate cost |
| --- | --- | --- |
| Adobe Acrobat X Pro | With Acrobat Pro you can scan and create a PDF direct from a scanner. Accuracy is quite good and with the latest version, Acrobat X Pro, you can correct misrecognised words. It's not as quick or as flexible as the specialist OCR programs, but it's pretty good if you want to create PDFs. | £80 for Scottish schools, from LTS website |
| EasyConverter | EasyConverter OCRs from a scanner or from digital files (e.g. PDF) into Word and converts into DOC, TXT, RTF, Large Print, Braille and audio (MP3 & Daisy). | from £890 |
| FineReader 10 Pro | Professional OCR program for OCRing from scanner, camera and files. Accurate; recognition errors can be corrected. Saves as PDF, RTF, DOC, HTML etc. Free demo copy from ABBYY web site. | £60 single user from LTS |
| Microsoft Office Document Scanning | Basic OCR supplied as part of MS Office; scans into Word. | supplied with MS Office |
| TopOCR | Free OCR software designed to OCR images from cameras. | Free |
| OmniPage 17 | Professional OCR program for OCRing from scanner, camera and files. Accurate; recognition errors can be corrected. Saves as PDF, RTF, DOC, HTML etc. | £45 for Standard version, about £180 for Pro version |
| ReadIris 12 | Professional OCR program for OCRing from scanner, camera and files. Accurate; recognition errors can be corrected. Saves as PDF, RTF, DOC, HTML etc. Free demo copy from Iris web site. | about £100, single user |
| ClaroRead Plus | Incorporates OmniPage OCR to scan books and OCR files into Word. ClaroRead has text to speech and other tools to support reading and writing. | £159 single user |
| Kurzweil 3000 | Scan and OCR from books and files. Kurzweil saves as text or in its own KES format, which looks like the original page (i.e. like PDF), but you need the Kurzweil Reader (£185) to open the KES files. | £725 single user |
| Read and Write Gold 9 | Incorporates FineReader OCR to scan books and OCR files into Word/PDF/HTML. Read and Write Gold has text to speech and other tools to support reading and writing. | £320 single user |
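If budget is the deciding factor, the approximate prices in the table above can be filtered quickly. The sketch below is only an illustration: the prices are copied from the table and will go out of date, the function name is hypothetical, and where two editions are listed the cheaper one is used.

```python
# Approximate single-user prices in pounds, copied from the table above.
# Microsoft Office Document Scanning is omitted because it has no separate price.
APPROX_COST_GBP = {
    "Adobe Acrobat X Pro": 80,    # price for Scottish schools
    "EasyConverter": 890,         # "from" price
    "FineReader 10 Pro": 60,
    "TopOCR": 0,                  # free
    "OmniPage 17": 45,            # Standard version
    "ReadIris 12": 100,
    "ClaroRead Plus": 159,
    "Kurzweil 3000": 725,
    "Read and Write Gold 9": 320,
}


def programs_within_budget(budget_gbp: int) -> list[str]:
    """Return program names costing at most budget_gbp, cheapest first."""
    affordable = [
        (cost, name)
        for name, cost in APPROX_COST_GBP.items()
        if cost <= budget_gbp
    ]
    return [name for cost, name in sorted(affordable)]


print(programs_within_budget(100))
# prints ['TopOCR', 'OmniPage 17', 'FineReader 10 Pro', 'Adobe Acrobat X Pro', 'ReadIris 12']
```

Price is only one consideration: the comments in the table on accuracy, error correction and output formats matter just as much when choosing.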

Quick Guides on scanning

Books for All How-To Video Guides