[Blindmath] Extracting bitmap images from pdf files

Richard Baldwin baldwin at dickbaldwin.com
Thu Jan 26 04:57:15 UTC 2012


Hi Jamal,

Thanks for the input. I downloaded Xpdf and got the pdfimages.exe program
running. I ran it against a fairly large pdf file and it produced 1225
image files of type ppm. This step seems to be completely accessible by
blind students.
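For anyone who wants to script this step, here is a minimal Python sketch that runs pdfimages and then reports how many files it wrote, so a screen reader can read the result. It assumes pdfimages is on the PATH and uses the documented command form `pdfimages [options] <PDF-file> <image-root>`. The optional -j flag asks pdfimages to keep JPEG-compressed images as .jpg files instead of converting them to .ppm. The image root "img", the folder names, and the helper functions are illustrative choices, not part of Xpdf.

```python
import re
import shutil
import subprocess
from pathlib import Path

def build_pdfimages_command(pdf_path, out_dir, root="img", jpeg=False, exe="pdfimages"):
    """Argument list for: pdfimages [options] <PDF-file> <image-root>."""
    cmd = [exe]
    if jpeg:
        cmd.append("-j")  # save JPEG-encoded images as .jpg files
    cmd += [str(pdf_path), str(Path(out_dir) / root)]
    return cmd

def numeric_order(path):
    """Sort key so img-999 comes before img-1000 (plain text sorting would not)."""
    m = re.search(r"-(\d+)\.\w+$", Path(path).name)
    return (int(m.group(1)) if m else -1, Path(path).name)

def extract_images(pdf_path, out_dir, jpeg=False):
    """Run pdfimages, then print and return the files it wrote, in extraction order."""
    exe = shutil.which("pdfimages")
    if exe is None:
        raise FileNotFoundError("pdfimages was not found on the PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_pdfimages_command(pdf_path, out_dir, jpeg=jpeg, exe=exe), check=True)
    files = sorted(Path(out_dir).glob("img-*"), key=numeric_order)
    print(f"{len(files)} image files written to {out_dir}")
    return files
```

Usage, with hypothetical names: `extract_images("textbook.pdf", "images")`. The numeric sort matters here because with more than 999 images, plain alphabetical order would put img-1000 ahead of img-999.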

I was able to open and view those files in a program that I use named Lview.

However, of the 1225 image files, only 14 had a size greater than 2KB. I
recognized the images in those files as being in the pdf file.

There were about 30 files that Windows reported as being 2KB in size and
the remainder were all reported as being 1KB in size.
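Windows shows sizes rounded to whole kilobytes, so sorting by exact byte count is a more reliable way to separate the real pictures from the fragments, and it needs no sight. A binary PPM stores 3 bytes per pixel uncompressed, so a file of 2048 bytes holds at most about 680 pixels, roughly 26 by 26. That fits the small icons described below. A minimal sketch, assuming all the .ppm files are in one folder; the 2048-byte cutoff simply mirrors the 2KB figure above and is an arbitrary choice.

```python
from pathlib import Path

def split_by_size(folder, threshold_bytes=2048, pattern="*.ppm"):
    """Return (large, small): files above / at-or-below the threshold, largest first."""
    files = sorted(Path(folder).glob(pattern), key=lambda p: p.stat().st_size, reverse=True)
    large = [p for p in files if p.stat().st_size > threshold_bytes]
    small = [p for p in files if p.stat().st_size <= threshold_bytes]
    return large, small

def report(folder, threshold_bytes=2048):
    """Print the large files with exact byte counts, then a one-line summary."""
    large, small = split_by_size(folder, threshold_bytes)
    for p in large:
        print(f"{p.stat().st_size:>10} bytes  {p.name}")
    print(f"{len(large)} files over {threshold_bytes} bytes, {len(small)} at or under")
```

Usage, with a hypothetical folder name: `report("images")`.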

I could recognize a few of the 2KB and 1KB files as being small icons
(arrows, exclamation marks, etc.). I couldn't recognize any of the remaining
small files. They appear to be small sections of larger images, but I can't
say for sure. I experienced a similar situation using a program named Nitro
Reader to extract the images from a pdf file. In that case also, a very
large number of image files were extracted and converted to type jpg, but
most of them appear to be small sections of larger images.

There were a large number of images in the pdf file that didn't seem to be
included in the set of 1225 image files, at least not as intact images. I
suspect that they are there, but they have been cut into many small pieces,
and it is virtually impossible to manually put Humpty Dumpty back together
again.
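If the pieces really are slices of larger pictures, one possibility (an assumption, not something confirmed for this file; they could also be masks, tiles, or unrelated fragments) is that the PDF stores each large image as a stack of horizontal strips of equal width, written out top to bottom. Under that assumption the pieces can be grouped by pixel width and stacked back together without sighted help. The sketch below handles only binary (P6) PPM files, the kind pdfimages writes for color images; the function names are hypothetical. If the strips were stored in some other order or as tiles, the result will come out scrambled.

```python
import re
from collections import defaultdict
from pathlib import Path

# P6 header: "P6", width, height, maxval, separated by whitespace and optional
# '#' comments, then exactly one whitespace byte before the pixel data.
_SEP = rb"(?:\s|#[^\n]*\n)+"
_HEADER = re.compile(rb"P6" + _SEP + rb"(\d+)" + _SEP + rb"(\d+)" + _SEP + rb"(\d+)\s")

def read_ppm(path):
    """Return (width, height, maxval, pixel_bytes) for a binary P6 PPM file."""
    data = Path(path).read_bytes()
    m = _HEADER.match(data)
    if not m:
        raise ValueError(f"{path}: not a binary (P6) PPM file")
    width, height, maxval = (int(g) for g in m.groups())
    bytes_per_sample = 1 if maxval < 256 else 2
    size = width * height * 3 * bytes_per_sample
    return width, height, maxval, data[m.end():m.end() + size]

def _index(path):
    """Sort by the number in names like img-1000.ppm, not alphabetically."""
    m = re.search(r"(\d+)\.\w+$", Path(path).name)
    return (int(m.group(1)) if m else -1, Path(path).name)

def group_by_width(paths):
    """Group files by pixel width, keeping extraction order within each group."""
    groups = defaultdict(list)
    for p in sorted(paths, key=_index):
        groups[read_ppm(p)[0]].append(Path(p))
    return dict(groups)

def stitch_vertical(paths, out_path):
    """Stack equal-width strips top to bottom in the order given; return (w, h)."""
    strips = [read_ppm(p) for p in paths]
    width, maxval = strips[0][0], strips[0][2]
    if any(w != width or m != maxval for w, _, m, _ in strips):
        raise ValueError("strips must share the same width and maxval")
    height = sum(h for _, h, _, _ in strips)
    header = f"P6\n{width} {height}\n{maxval}\n".encode("ascii")
    Path(out_path).write_bytes(header + b"".join(px for *_, px in strips))
    return width, height
```

Usage, with hypothetical names: `groups = group_by_width(Path("images").glob("img-*.ppm"))`, then pick a width that appears many times and call `stitch_vertical(groups[that_width], "rebuilt.ppm")`. A run of consecutive files sharing one width is a hint, not proof, that they belong together.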

I'm beginning to wonder if it is even possible to extract intact images
from a pdf file. Perhaps Adobe has designed things to prohibit us from
extracting most of the images from PDF files.

So far, the only two approaches that I have found to work are to:

1. Convert the pdf file to html format and harvest the images. This
involves a lot of cropping to separate each image from the blank space that
contains it.

2. Convert the pdf file to an image file and use a cropping editor to
harvest the images. This is even more difficult than #1.
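The cropping in both approaches could be partly automated. A minimal sketch, assuming each file holds one picture on a white background and is saved as a binary (P6) PPM with 8-bit samples: it finds the smallest rectangle containing every pixel with any color channel darker than a threshold, and writes that region to a new file. The threshold of 240 and the 2-pixel margin are arbitrary. It trims only the blank border; a page with several figures, or with text beside the figure, still has to be separated. Pure Python is slow on a full-page image, so expect a large file to take a while.

```python
import re
from pathlib import Path

_SEP = rb"(?:\s|#[^\n]*\n)+"
_HEADER = re.compile(rb"P6" + _SEP + rb"(\d+)" + _SEP + rb"(\d+)" + _SEP + rb"(\d+)\s")

def read_ppm(path):
    """Return (width, height, maxval, pixel_bytes); 8-bit P6 files only."""
    data = Path(path).read_bytes()
    m = _HEADER.match(data)
    if not m:
        raise ValueError(f"{path}: not a binary (P6) PPM file")
    w, h, maxval = (int(g) for g in m.groups())
    if maxval > 255:
        raise ValueError("16-bit PPM files are not handled in this sketch")
    return w, h, maxval, data[m.end():m.end() + w * h * 3]

def content_bbox(w, h, px, threshold=240):
    """(left, top, right, bottom) of non-blank pixels, right/bottom exclusive; None if blank."""
    left, top, right, bottom = w, h, -1, -1
    for y in range(h):
        row = px[y * w * 3:(y + 1) * w * 3]
        for x in range(w):
            # a pixel counts as content if any channel is darker than the threshold
            if min(row[3 * x:3 * x + 3]) < threshold:
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
    if right < 0:
        return None
    return left, top, right + 1, bottom + 1

def autocrop(in_path, out_path, threshold=240, margin=2):
    """Crop the blank border from a P6 PPM; return the box used, or None if blank."""
    w, h, maxval, px = read_ppm(in_path)
    box = content_bbox(w, h, px, threshold)  # threshold assumes maxval 255
    if box is None:
        return None  # nothing darker than the threshold, so no output is written
    left, top, right, bottom = box
    left, top = max(0, left - margin), max(0, top - margin)
    right, bottom = min(w, right + margin), min(h, bottom + margin)
    rows = [px[(y * w + left) * 3:(y * w + right) * 3] for y in range(top, bottom)]
    header = f"P6\n{right - left} {bottom - top}\n{maxval}\n".encode("ascii")
    Path(out_path).write_bytes(header + b"".join(rows))
    return left, top, right, bottom
```

Usage, with hypothetical file names: `autocrop("page-012.ppm", "figure-012.ppm")`.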

Both approaches require assistance from a sighted person for a blind
student to get access to the images as intact bitmap files.

Since my plan is to enhance the images for embossing prior to losing all of
the color information, I need to get the raw full-color images in
individual files and process them before turning them over to another
program such as IE9 or IVEO.

Thanks again for the input.
Dick Baldwin





On Wed, Jan 25, 2012 at 4:41 PM, Jamal Mazrui <empower at smart.net> wrote:

> Xpdf is a free package of utilities that includes pdfimages.exe in the
> Windows distribution.  Based only on the documentation (I have not tried
> it), it may do what you want.  I would be curious to know.
>
> The home page of the project is at
>
> http://www.foolabs.com/xpdf/
>
> A direct download URL is
>
> ftp://ftp.foolabs.com/pub/xpdf/xpdfbin-win-3.03.zip
>
> Jamal
>
> On Wed, 25 Jan 2012, Richard Baldwin wrote:
>
>  Date: Wed, 25 Jan 2012 12:07:56 -0600
>> From: Richard Baldwin <baldwin at dickbaldwin.com>
>> Reply-To: Blind Math list for those interested in mathematics
>>    <blindmath at nfbnet.org>
>> To: BlindMath Mailing List <blindmath at nfbnet.org>,
>>    accessibleimage at freelists.org
>> Subject: [Blindmath] Extracting bitmap images from pdf files
>>
>> Many blind students receive electronic textbooks in pdf format.
>>
>> Many textbooks contain lots of images.
>>
>> Many images are poorly described in textbooks.
>>
>> Various ways to convert bitmap images into tactile images are available --
>> some fairly good, some not so good, some very poor. However, regardless of
>> the quality of the conversion to tactile format, you must have the
>> original image file in order to get anything.
>>
>> I have tried four or five different online file conversion sites, without
>> success, in an attempt to find a clean way for a blind student to extract
>> the images from a pdf textbook file. Different sites have different
>> problems, but they all seem to have some kind of problem that makes it
>> very difficult to extract the images from pdf files.
>>
>> Has anyone identified an online site or downloadable program, available
>> either free or at a reasonable price, that can cleanly extract the images
>> from pdf files, which often range up to 10 megabytes or more in size?
>>
>> Thanks,
>> Dick Baldwin
>>
>> --
>> Richard G. Baldwin (Dick Baldwin)
>> Home of Baldwin's on-line Java Tutorials
>> http://www.DickBaldwin.com
>>
>> Professor of Computer Information Technology
>> Austin Community College
>> (512) 223-4758
>> mailto:Baldwin at DickBaldwin.com
>> http://www.austincc.edu/baldwin/
>> _______________________________________________
>> Blindmath mailing list
>> Blindmath at nfbnet.org
>> http://nfbnet.org/mailman/listinfo/blindmath_nfbnet.org
>> To unsubscribe, change your list options or get your account info for
>> Blindmath:
>> http://nfbnet.org/mailman/options/blindmath_nfbnet.org/empower%40smart.net
>>
>>





