[Blindmath] Summary - Extracting bitmap images from pdf files

Sat Jan 28 20:42:00 UTC 2012

The purpose of this post is to summarize what I have learned about
extracting images from pdf files during the conversations on this topic in
a similar thread over the past few days.

The primary objective was to find a way for blind students to extract
pictures as individual bitmap images from pdf files that are provided to
them as electronic copies of their textbooks.

Unfortunately, it seems unlikely that blind students can successfully
accomplish this task without sighted assistance. (Perhaps organizations of
blind students should put pressure on Adobe to rectify the situation.) I
found only two ways in which a blind student might be able to
accomplish the task, and both are fraught with problems.

The most promising way is a procedure suggested by John Gardner of
ViewPlus. In brief, it is a multi-step process that requires an IVEO
system, including a touchpad and the Creator Pro software, which I believe
is an extra-cost item over and above the basic IVEO system. I'm not certain
whether a Tiger embosser is also required, or whether some sort of
less-expensive, printer-based embossing system, such as swell paper, would
suffice.

Over and above the cost, this approach has its own set of problems. In
particular, it requires the student to first read the pdf document and to
identify the pages on which the images appear. In the physics textbook that
I am working with, the location of a Figure may be on an entirely different
page from the reference to the Figure in the text. In addition, the book
contains numerous images without Figure numbers and/or captions. The
student is supposed to be able to associate an image with the related text
simply by the physical proximity of the two. A sighted student can probably
succeed in doing this in most cases. A blind student may not be able to
succeed in many cases.

Another approach is to purchase Acrobat Pro from Adobe for $445, which the
Adobe tech support person claimed will extract images intact from a pdf
file. (All of the free approaches that I tried extracted a few images
intact but extracted most images in tiny bitmap files that must be
reassembled to create the image.) Even if Acrobat Pro will extract images
intact, I'm not certain that this will make it possible for blind students
to extract and emboss those images. The textbook pdf files that I am
working with contain thousands of bitmap images that are placed in and
around the text solely for cosmetic purposes. These are things like arrows,
exclamation marks, etc. If Acrobat Pro will extract all of the images in a
chapter intact, the student could expect to end up with hundreds of image
files for every chapter. Even a dedicated sighted person would have
difficulty sorting through all of those files trying to separate the wheat
from the chaff.
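
One possible way to thin out the chaff, offered only as an untested idea, is
to discard extracted images whose pixel dimensions are too small to be a real
figure. The Python sketch below assumes the extraction step has already
produced intact images and that their file names and sizes are known. The
ExtractedImage record, the likely_figures function, the 100-pixel threshold,
and the sample file names are all hypothetical. If a tool delivers figures as
tiny tiles instead, this filter would throw away the pieces of real figures
too.

```python
from typing import List, NamedTuple


class ExtractedImage(NamedTuple):
    """Hypothetical record for one image file pulled out of a pdf."""
    name: str
    width: int   # pixels
    height: int  # pixels


def likely_figures(images: List[ExtractedImage],
                   min_side: int = 100) -> List[ExtractedImage]:
    """Keep images whose width and height are both at least min_side pixels.

    The threshold is an assumption; small cosmetic bitmaps such as arrows
    are expected to fall below it.
    """
    return [img for img in images
            if img.width >= min_side and img.height >= min_side]


# Made-up sample data standing in for one chapter's extracted files.
extracted = [
    ExtractedImage("img_0001.png", 16, 16),     # small icon
    ExtractedImage("img_0002.png", 640, 480),   # plausible figure
    ExtractedImage("img_0003.png", 24, 300),    # thin decorative rule
    ExtractedImage("img_0004.png", 400, 350),   # plausible figure
]

candidates = likely_figures(extracted)
print([img.name for img in candidates])
```

A size filter cannot tell a large decorative banner from a real figure, so
the surviving files would still need to be checked against the text.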

Therefore, as a practical matter, barring the discovery of some technical
capability that I have been unable to identify so far, blind students
probably cannot successfully extract bitmap pictures from the pdf versions
of many textbooks without sighted assistance.

SIGHTED ASSISTANCE
There are several approaches available by which a sighted person could
extract the pictures from the pdf file, but for the most part, they all
involve a labor-intensive procedure using an image editor to crop small
bitmaps out of large bitmaps and to save the small bitmaps in individual
files.

The website at http://www.zamzar.com/ will accept an uploaded pdf file and
send back a set of jpg image files, one for each page in the pdf document.
Jamal has also indicated that he may be able to provide a stand-alone
program that will convert a pdf file to a similar set of jpg files, one for
each page in the pdf document.

The sighted assistant can use either of these two approaches to obtain the
set of jpg files.

Then, using the original pdf document along with a pdf reader such as
Acrobat (free) as a guide, the sighted assistant can open the page-image
files in any of many free image editor programs, such as LView. (At least
it was free when I got my copy.)

Having opened the page in the image editor, the assistant can then draw a
rectangle around an image of interest, crop it out of the larger image, and
save the cropped image under a descriptive file name. The assistant can
repeat this process for every image of interest on every page in the entire
document.
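
For readers who want to see what the cropping step amounts to, here is a
minimal Python sketch. It treats a page as a plain grid of pixel values
rather than reading a real jpg file, so it only illustrates the rectangle
arithmetic. The crop function, its parameter names, and the sample page are
hypothetical. A real tool would use an image library to load the page file
and save the result.

```python
from typing import List

Pixels = List[List[int]]  # row-major grid: pixels[row][column]


def crop(pixels: Pixels, left: int, top: int,
         right: int, bottom: int) -> Pixels:
    """Return the rectangle covering columns left..right-1, rows top..bottom-1.

    Coordinates are measured from the top-left corner of the page, and the
    right and bottom edges are exclusive.
    """
    if not 0 <= top <= bottom <= len(pixels):
        raise ValueError("row bounds fall outside the page")
    if pixels and not 0 <= left <= right <= len(pixels[0]):
        raise ValueError("column bounds fall outside the page")
    # Slice the wanted rows first, then the wanted columns of each row.
    return [row[left:right] for row in pixels[top:bottom]]


# A made-up 4-row by 6-column "page"; nonzero values mark the figure.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 2, 3, 0, 0],
    [0, 4, 5, 6, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

figure = crop(page, left=1, top=1, right=4, bottom=3)
print(figure)
```

The assistant still has to choose the rectangle for each image by eye, which
is the labor-intensive part of the procedure.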

I wish we could have identified a better way to get the job done, but for
now, it looks like this is what we are stuck with.

Dick Baldwin

-- 
Richard G. Baldwin (Dick Baldwin)
Home of Baldwin's on-line Java Tutorials
http://www.DickBaldwin.com

Professor of Computer Information Technology
Austin Community College
(512) 223-4758
mailto:Baldwin at DickBaldwin.com
http://www.austincc.edu/baldwin/