[Blindmath] Extracting bitmap images from pdf files

Richard Baldwin baldwin at dickbaldwin.com
Fri Jan 27 17:48:18 UTC 2012


Amanda and others,

I have contacted Adobe technical support. Their solution to the problem is
to purchase Acrobat Pro for $445.00. The tech support rep told me that
their program will extract the pictures intact as separate bitmap files.

Dick Baldwin

On Fri, Jan 27, 2012 at 10:44 AM, Michael Whapples <mwhapples at aim.com> wrote:

> Hello,
> From what you are describing, my feeling is that the diagrams/images in
> the PDF in question are created from a number of drawing elements rather
> than a single image object. I'm not an expert on PDF, but I think you could
> think of it like the difference of a bitmap being a single element (I think
> PDF has a way to specify the start of a stream object like a bitmap) and an
> SVG being formed from lots of elements like lines and shapes (I think in
> PDF the lines and such like can be created with basic PDF drawing
> facilities so are not in a separate object). When the image is formed from
> lots of elements then it may be hard for the software to know what makes up
> a given diagram in the book/document, it just lays it out as specified and
> you work out what's related. I think one way to tell whether you have this
> sort of image is to see if NVDA will read some of the text labels of the
> image; if it does, then it's not a pure bitmap (you could probably use the
> Read Out Loud function of Adobe Reader as well). Therefore I imagine that
> without clever recognition algorithms you are unlikely to get something
> which will extract it as you want.
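
A rough way to test that guess is to count the image objects in the file
itself. The sketch below is only a heuristic and is not taken from any of
the tools mentioned in this thread. The function name count_image_objects is
made up for illustration. It uses only the Python standard library and
searches the raw bytes for the /Subtype /Image entry that the PDF format uses
to mark an image object. It will undercount in files that pack their object
dictionaries into compressed object streams, and it does not see inline
images or vector drawing operators.

```python
import re
import sys

# Matches the dictionary entry that marks an object as an image XObject.
# PDF allows optional whitespace between the key and the name value.
# The lookahead stops a longer name that merely starts with "Image" matching.
IMAGE_SUBTYPE = re.compile(rb"/Subtype\s*/Image(?![A-Za-z0-9])")


def count_image_objects(pdf_bytes: bytes) -> int:
    """Count '/Subtype /Image' entries visible in the raw file bytes."""
    return len(IMAGE_SUBTYPE.findall(pdf_bytes))


if __name__ == "__main__":
    # Usage (assumed): python count_images.py chapter.pdf
    if len(sys.argv) == 2:
        with open(sys.argv[1], "rb") as f:
            print(count_image_objects(f.read()))
```

If a chapter that shows only a handful of pictures reports dozens or
hundreds of image objects, the pictures are probably stored as many small
pieces. A count of zero would point either to vector drawing or to
compressed dictionaries that this simple scan cannot see.
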
>
> There is one option I am aware of for a blind person to do this
> independently, IVEO like John suggested. However, IVEO isn't a cheap option,
> and how much is to be done would determine whether it's worth
> the money if providing accessible diagrams from PDF was its only use. IVEO
> does not require a Tiger printer; swell paper would work, and other embossers
> may (the outputting from IVEO is the question, as I think it may only output
> to devices appearing as standard printers). Interestingly, the IVEO route
> again requires a human to make the decision on what forms the diagram.
>
> Michael Whapples
>
> -----Original Message----- From: Richard Baldwin
> Sent: Friday, January 27, 2012 3:28 PM
> To: Jamal Mazrui
> Cc: Blind Math list for those interested in mathematics
> Subject: Re: [Blindmath] Extracting bitmap images from pdf files
>
>
> Hi Jamal,
>
> It is a great program, easy to use, and probably totally accessible. I
> particularly like the fact that the program doesn't require a windows
> installation. The output data is well organized and including the page
> numbers in the bmp file names is a great help in analyzing them.
>
> Unfortunately, the output produced by the program suffers from the same
> issues that I have encountered with all of the other image extractor
> programs that I have tried. A few of the images come out intact. Most of
> the images don't come out intact.
>
> For example, page three of one of the pdf files that I tested has a single
> image of a battery. It is the same image that I enhanced and posted in an
> earlier post. Your program produced 54 bmp files for that page. A few of
> them were icons such as arrows, exclamation marks, etc. The remaining bmp
> files appear to be very small pieces of the image of the battery. By the
> way, I got the earlier image of the battery by taking a screen shot of the
> page and using an image editing program to crop out the battery image. None
> of the image extraction programs that I have tested extract the image
> intact.
>
> I don't know anything at all about the internal structure of pdf files, and
> this behavior of breaking an image into many small pieces may depend on how
> the file is constructed in the first place. In any event, my immediate
> problem has to do with a specific set of pdf files that are the chapters
> from a specific physics book, so this program doesn't solve my problem.
>
> Thanks for offering the program.
> Dick Baldwin
>
> On Fri, Jan 27, 2012 at 5:18 AM, Jamal Mazrui <empower at smart.net> wrote:
>
>> In an attempt to facilitate a free, non-web dependent solution, I have
>> written a Windows console-mode utility called PDF2Images, built with
>> PowerBASIC and a PDF library.  The distribution archive, including
>> documentation and source code, is available at
>>
>> http://empowermentzone.com/pdf2images.zip
>>
>>
>> I am interested in any feedback on how well it works compared to other
>> approaches.
>>
>> Jamal
>>
>>
>>
>>
>
> --
> Richard G. Baldwin (Dick Baldwin)
> Home of Baldwin's on-line Java Tutorials
> http://www.DickBaldwin.com
>
> Professor of Computer Information Technology
> Austin Community College
> (512) 223-4758
> mailto:Baldwin at DickBaldwin.com
> http://www.austincc.edu/baldwin/
> _______________________________________________
> Blindmath mailing list
> Blindmath at nfbnet.org
> http://nfbnet.org/mailman/listinfo/blindmath_nfbnet.org
> To unsubscribe, change your list options or get your account info for
> Blindmath:
> http://nfbnet.org/mailman/options/blindmath_nfbnet.org/mwhapples%40aim.com
>



-- 
Richard G. Baldwin (Dick Baldwin)
Home of Baldwin's on-line Java Tutorials
http://www.DickBaldwin.com

Professor of Computer Information Technology
Austin Community College
(512) 223-4758
mailto:Baldwin at DickBaldwin.com
http://www.austincc.edu/baldwin/


