[Blindmath] Extracting bitmap images from pdf files

Michael Whapples mwhapples at aim.com
Fri Jan 27 16:44:15 UTC 2012


Hello,
From what you are describing, my feeling is that the diagrams/images in the 
PDF in question are built from a number of drawing elements rather than a 
single image object. I'm not an expert on PDF, but I think you can compare 
it to the difference between a bitmap, which is a single element (I think 
PDF has a way to mark the start of a stream object such as a bitmap), and an 
SVG, which is formed from lots of elements like lines and shapes (I think in 
PDF the lines and suchlike can be drawn with the basic PDF drawing 
facilities, so they are not in a separate object). When the image is formed 
from lots of elements, it may be hard for the software to know what makes up 
a given diagram in the book/document; it just lays everything out as 
specified and leaves you to work out what's related. I think one way to tell 
whether you have this sort of image is to see if NVDA will read some of the 
text labels of the image; if it does, then it's not a pure bitmap (you could 
probably use the Read Out Loud function of Adobe Reader as well). Therefore 
I imagine that without clever recognition algorithms you are unlikely to get 
something which will extract it as you want.
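
To make the bitmap-versus-drawing-elements distinction concrete, here is a 
rough Python sketch that counts operator types in an already decompressed 
PDF page content stream. The operator names (m, l, re, Tj, Do and so on) are 
standard PDF content-stream operators, but the function name, the sample 
stream and the simplified tokenizing are assumptions made for illustration 
only; a real tool would need a proper PDF parser. Note that Do paints any 
external object, which is often, but not always, a bitmap image.

```python
import re

# Standard PDF content-stream operators, grouped by what they contribute.
PATH_OPS = {"m", "l", "c", "v", "y", "h", "re"}  # path construction
TEXT_OPS = {"Tj", "TJ", "'", '"'}                # text showing
XOBJECT_OPS = {"Do"}                             # paint an external object (often an image)

# Literal strings look like (text); this simplified pattern ignores nested parentheses.
_LITERAL_STRING = re.compile(r"\((?:\\.|[^\\()])*\)")


def classify_content_stream(stream: str) -> dict:
    """Count path, text and XObject operators in a decompressed content stream.

    Simplifying assumption: operators are separated by whitespace.
    """
    # Blank out literal strings so words inside them are not read as operators.
    stripped = _LITERAL_STRING.sub(" ", stream)
    # Separate array brackets from neighbouring tokens, e.g. "]TJ".
    stripped = stripped.replace("[", " ").replace("]", " ")
    counts = {"path": 0, "text": 0, "xobject": 0}
    for token in stripped.split():
        if token in PATH_OPS:
            counts["path"] += 1
        elif token in TEXT_OPS:
            counts["text"] += 1
        elif token in XOBJECT_OPS:
            counts["xobject"] += 1
    return counts


# Hypothetical page fragment: some line work, one text label, one XObject.
sample = """
q
0 0 m 100 0 l 100 50 l h S
10 10 80 30 re f
BT /F1 12 Tf 20 20 Td (Battery +) Tj ET
q 50 0 0 50 200 200 cm /Im1 Do Q
Q
"""
print(classify_content_stream(sample))
```

A page whose diagram shows up mostly as path and text operators, with few or 
no Do operators, would be the "lots of elements" case described above, and 
the text operators are presumably what a screen reader picks up as labels.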

There is one option I am aware of for a blind person to do this 
independently: IVEO, as John suggested. However, IVEO isn't a cheap option, 
and how much there is to be done would determine whether it's worth the 
money if providing accessible diagrams from PDF were its only use. IVEO does 
not require a Tiger printer; swell paper would work, and other embossers may 
too (the question is how IVEO outputs, as I think it may only output to 
devices appearing as standard printers). Interestingly, the IVEO route again 
requires a human to decide what forms the diagram.

Michael Whapples

-----Original Message----- 
From: Richard Baldwin
Sent: Friday, January 27, 2012 3:28 PM
To: Jamal Mazrui
Cc: Blind Math list for those interested in mathematics
Subject: Re: [Blindmath] Extracting bitmap images from pdf files

Hi Jamal,

It is a great program, easy to use, and probably totally accessible. I
particularly like the fact that the program doesn't require a Windows
installation. The output data is well organized, and including the page
numbers in the bmp file names is a great help in analyzing them.

Unfortunately, the output produced by the program suffers from the same
issues that I have encountered with all of the other image extractor
programs that I have tried. A few of the images come out intact. Most of
the images don't come out intact.

For example, page three of one of the pdf files that I tested has a single
image of a battery. It is the same image that I enhanced and posted in an
earlier post. Your program produced 54 bmp files for that page. A few of
them were icons such as arrows, exclamation marks, etc. The remaining bmp
files appear to be very small pieces of the image of the battery. By the
way, I got the earlier image of the battery by taking a screen shot of the
page and using an image editing program to crop out the battery image. None
of the image extraction programs that I have tested extract the image
intact.
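
The screenshot-and-crop workaround amounts to rendering the whole page to
pixels and keeping one rectangle of it. A minimal Python sketch of the crop
step, assuming the page has already been rendered into a list of pixel rows;
the `crop` helper, the tiny sample page and the coordinates are all
hypothetical and stand in for whatever an image editing program does:

```python
def crop(pixels, left, top, right, bottom):
    """Return the sub-grid covering columns left..right-1 and rows top..bottom-1."""
    return [row[left:right] for row in pixels[top:bottom]]


# Hypothetical 6x4 "page": 0 is background, 1 marks the diagram.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# Keep only the rectangle that contains the diagram.
diagram = crop(page, left=2, top=1, right=4, bottom=3)
print(diagram)
```

The rectangle still has to be chosen by someone who can tell where the
diagram is on the page.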

I don't know anything at all about the internal structure of pdf files, and
this behavior of breaking an image into many small pieces may depend on how
the file is constructed in the first place. In any event, my immediate
problem has to do with a specific set of pdf files that are the chapters
from a specific physics book, so this program doesn't solve my problem.

Thanks for offering the program.
Dick Baldwin

On Fri, Jan 27, 2012 at 5:18 AM, Jamal Mazrui <empower at smart.net> wrote:

> In an attempt to facilitate a free, non-web dependent solution, I have
> written a Windows console-mode utility called PDF2Images, built with
> PowerBASIC and a PDF library.  The distribution archive, including
> documentation and source code, is available at
>
> http://empowermentzone.com/pdf2images.zip
>
> I am interested in any feedback on how well it works compared to other
> approaches.
>
> Jamal
>
>
>


-- 
Richard G. Baldwin (Dick Baldwin)
Home of Baldwin's on-line Java Tutorials
http://www.DickBaldwin.com

Professor of Computer Information Technology
Austin Community College
(512) 223-4758
mailto:Baldwin at DickBaldwin.com
http://www.austincc.edu/baldwin/