[Blindmath] Extracting bitmap images from pdf files

Lewicki, Maureen mlewicki at bcsd.neric.org
Sun Jan 29 00:29:56 UTC 2012


Mike, I'd like to hear more about the techniques and materials you use for your graphics. Feel free to contact me off list.

Maureen Murphy Lewicki
Teacher of Visually Impaired
Bethlehem Central Schools
(518)439-7681
"When we do the best that we can, we never know what miracle is wrought in our life, or in the life of another." Helen Keller 

-----Original Message-----
From: blindmath-bounces at nfbnet.org [mailto:blindmath-bounces at nfbnet.org] On Behalf Of Bente Casile
Sent: Saturday, January 28, 2012 2:01 PM
To: 'Blind Math list for those interested in mathematics'
Subject: Re: [Blindmath] Extracting bitmap images from pdf files

Ben,

My greatest wish for all the blind students out there is that we in the college system could have a repository of tactile graphics for science and math classes.  If we could stick with a text for more than two years, it would be so helpful.  As someone who makes math tactile graphics for our students, I would love to see that happen.  It would let us get ahead, let students benefit directly from the hard work of others, and spare us from having to "re-invent" the wheel every time a new text is adopted.

Oh, and PS .. Austin is very nice..smiles

Bente
Bente J. Casile
Math Learning Specialist
Wake Tech Community College
Raleigh NC

-----Original Message-----
From: blindmath-bounces at nfbnet.org [mailto:blindmath-bounces at nfbnet.org] On Behalf Of Ben Humphreys
Sent: Saturday, January 28, 2012 11:17 AM
To: Blind Math list for those interested in mathematics
Subject: Re: [Blindmath] Extracting bitmap images from pdf files

Hi Richard,

As best I recall, it was a Microsoft Word file which we typically "saved as" HTML in order to get the graphics extracted out in an automated way.  Some came out as GIF, others JPEG, leading me to believe that Word preserves the original file intact.  These were .DOC, not .DOCX, so I don't believe they were really ZIP files in DOCX clothing.

As my instructor routinely "pasted" in JPGs, GIFs, etc. from all around the world into her Microsoft Word files, it's anyone's guess why a few got all broken up like that.  Most remained intact.

Part way through the class, I ended up having my assistant extract the images by hand, as the automated way made it too difficult to distinguish the garbage (i.e. little arrows and such) from the meaningful calculus graphs.

I have a question -- are you using the most popular university Physics textbook, whatever that may be?  If so, and we get to the bottom of this, we could conceivably have a repository of labeled graphics files so others wouldn't have to repeat this step, and joy of joys, I could take physics without moving to Austin. :)  This of course is not to say Austin isn't a great place; it's just that I might have to move again when I want to take biology or chemistry.

As always, thanks for your continued enthusiasm.

And as I said, you're welcome to send me a file or two and we'll throw our Acrobat Pro strategy at it, maybe even consider how it might be automated.

Ben

At 08:59 AM 1/28/2012, you wrote:

But, no, I do not believe we were dealing with PDFs in this case.

Typically, when we have a PDF with a graphic, my assistant draws a box around it, I think, and saves it out separately.  I'm not clear on the process, but he did say it required Acrobat Pro, and once it's extracted, it's easy to blow it up to fill the page for easier tactile understanding.


>Hi Ben,
>
>I appreciate your frustration.
>
>Were the  "30 itty bitty graphics files" that apparently were small 
>parts of two actual graphs produced using Acrobat Pro, or were you 
>using some different image extraction software during that lost weekend?
>
>Thanks,
>Dick Baldwin
>
>On Sat, Jan 28, 2012 at 5:55 AM, Ben Humphreys
><brh at opticinspiration.org> wrote:
>
> > I suppose this procedure could work.
> >
> > But when it's this much effort to get to the starting gate, while
> > other students are already moving forward and you're falling behind,
> > it's no fun, and the added time and complexity and brainpower just
> > take all the motivation out of you.
> >
> > I remember losing a whole weekend to the issue of 30 itty bitty
> > graphics files in a Calculus PDF.  When I embossed them, they were
> > all set to "fit to page" and so came out huge.  I assumed they were
> > all graphs and problems to be interpreted, worked on, and
> > understood.  Only on Monday morning, with the benefit of a sighted
> > person, was I finally told that there were only two graphs and that
> > the files were bits and pieces of those two relatively simple
> > graphs.
> >
> > It's enough to make you want to be a Steve Jobs and exit school 
> > prematurely.
> >
> > Prof Baldwin, this is certainly not to say I don't appreciate all
> > your efforts.  In fact, if and when I ever need to take physics, I
> > am seriously considering relocating to Austin for a semester.
> >
> > P.S. I do have Acrobat Pro, so if you can send me the single page PDF
> > in question, we can attempt to extract it as a single image.
> >
> > Ben
> >
> >
> > At 02:56 PM 1/27/2012, you wrote:
> >
> >> In a previous post I wrote:
> >>
> >> "By the way, I don't know how a blind person would carry out the
> >> second of the following two steps in John's procedure:
> >>
> >> * import the PDF into IVEO Creator Pro.
> >> * Check the PDF to find which pages have images of interest and 
> >> emboss those pages.
> >>
> >> It seems that checking the pdf to find which pages have images
> >> would be similar to checking a screen shot of a page to find and
> >> crop the image. It seems that you would need to be able to see the
> >> pdf on the IVEO screen to know if it contains an image. I am
> >> working with pdf files containing anywhere between 30 and 80 pages.
> >> Embossing every page in order to identify the pages that contain
> >> images would not be practical."
> >>
> >> I have learned how a blind person could find the pages containing 
> >> the images in a pdf file without having to see the screen. Here is 
> >> one procedure for doing that.
> >>
> >> When you import a pdf file into Creator Pro, a set of SVG files is
> >> automatically created in the folder that contains the pdf file.
> >> There is one SVG file for each page in the pdf file. The file names
> >> indicate the pdf page number, except that pages in a pdf file are
> >> typically numbered beginning with 1 while the file numbers produced
> >> by Creator Pro begin with 0. Thus, file number 0 will probably
> >> correspond to page 1 in the pdf document.
> >>
> >> Read the pdf file in your preferred pdf file reader. If, from the
> >> pdf text, you can determine which pages in the pdf file contain
> >> images of interest, you can record those page numbers using
> >> whatever method you use to record information of that sort.
> >>
> >> Then you can import the pdf file into Creator Pro, producing the 
> >> set of SVG files described above. Then you can open the SVG files 
> >> that contain interesting images in your IVEO viewer software, 
> >> emboss the pages, and proceed as John explained in an earlier post.
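
The page-to-file bookkeeping in the procedure above can be sketched in a few lines of Python. This is only an illustration under the stated assumption that Creator Pro file number 0 corresponds to pdf page 1; the thread itself only says this is "probably" the case. The function names are hypothetical, and the actual SVG file name pattern is not given in the thread, so the sketch stops at the file number.

```python
def svg_index_for_page(pdf_page):
    """Map a 1-based pdf page number to the 0-based Creator Pro file number.

    Assumption: file number 0 corresponds to pdf page 1.
    """
    if pdf_page < 1:
        raise ValueError("pdf pages are numbered from 1")
    return pdf_page - 1


def svg_indexes_for_pages(pages_of_interest):
    """Return the sorted, de-duplicated file numbers for the recorded pages."""
    return [svg_index_for_page(page) for page in sorted(set(pages_of_interest))]


if __name__ == "__main__":
    # Pages noted while reading the pdf text (example values only).
    print(svg_indexes_for_pages([3, 17, 42]))
```

If a particular pdf turns out to be numbered differently, the offset in svg_index_for_page is the one place that would need to change.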
> >>
> >> Dick Baldwin
> >>
> >> On Fri, Jan 27, 2012 at 12:47 PM, Richard Baldwin
> >> <baldwin at dickbaldwin.com> wrote:
> >>
> >> > Michael wrote "There is one option I am aware of for a blind
> >> > person to do this independently, IVEO like John suggested,"
> >> >
> >> > I may be wrong, but I didn't get the idea that John's solution 
> >> > will produce an output bitmap file - only an embossed image.
> >> >
> >> > I may be wrong again, but as near as I can tell, IVEO doesn't do
> >> > any image enhancement prior to embossing the image. If I am wrong
> >> > on these points, John will probably come online and set the
> >> > record straight.
> >> >
> >> > IVEO seems to simply convert the bitmap image to gray scale and
> >> > emboss the gray scale. While gray scale embossing is okay for
> >> > some images (especially black and white images), it is definitely
> >> > not the best option for many images. After all, if you convert 16
> >> > million colors to four levels of gray scale, each level of gray
> >> > scale represents 4 million different colors. Pixels belonging to
> >> > each set of 4 million colors will not be distinguishable in the
> >> > gray scale representation.
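
The arithmetic above can be checked with a short Python sketch. It assumes 24-bit RGB color (the "16 million colors") and, purely for illustration, a plain average of the three channels as the gray value. How IVEO actually computes its gray scale is not stated in the thread, and the function names are made up.

```python
TOTAL_COLORS = 256 ** 3   # 24-bit RGB: 16,777,216 distinct colors
GRAY_LEVELS = 4           # the four embossing levels discussed above


def colors_per_level(total=TOTAL_COLORS, levels=GRAY_LEVELS):
    """Average number of distinct colors that share one gray level."""
    return total // levels


def gray_level(r, g, b, levels=GRAY_LEVELS):
    """Quantize an RGB pixel to a gray level in the range 0..levels-1.

    Assumption: gray is the plain average of the channels.
    """
    gray = (r + g + b) // 3        # 0..255
    return gray * levels // 256    # 0..levels-1


if __name__ == "__main__":
    print(colors_per_level())
    # Pure red and pure blue look very different on screen,
    # yet they land on the same gray level here.
    print(gray_level(255, 0, 0), gray_level(0, 0, 255))
```

Under this simple averaging, a red line drawn on a blue background would emboss as a uniform patch, which is the kind of loss that enhancing the image at the full-color stage is meant to avoid.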
> >> >
> >> > My objective is to gain access to full-color bitmap images so
> >> > that I can enhance the image for embossing prior to throwing away
> >> > all of the color information.
> >> >
> >> > Embossed versions of bitmap images are often very difficult to
> >> > understand, even with a decent description. I believe we need to
> >> > do everything reasonable to improve the understandability of
> >> > embossed bitmap images. In some cases, image enhancement
> >> > techniques at the full-color stage can be used to provide those
> >> > improvements.
> >> >
> >> > So, my quest continues, hopefully without having to pay $445.00 
> >> > for Acrobat Pro, just to get access to the images.
> >> >
> >> > The fallback position, of course, is to use screen shots and an
> >> > image editor program to crop out the individual images, but that
> >> > approach is not possible for a blind person to use. You can't
> >> > crop an image out of a screen shot unless you can see the image.
> >> >
> >> > By the way, I don't know how a blind person would carry out the
> >> > second of the following two steps in John's procedure:
> >> >
> >> > * import the PDF into IVEO Creator Pro.
> >> > * Check the PDF to find which pages have images of interest and
> >> >   emboss those pages.
> >> >
> >> > It seems that checking the pdf to find which pages have images
> >> > would be similar to checking a screen shot of a page to find and
> >> > crop the image. It seems that you would need to be able to see
> >> > the pdf on the IVEO screen to know if it contains an image. I am
> >> > working with pdf files containing anywhere between 30 and 80
> >> > pages. Embossing every page in order to identify the pages that
> >> > contain images would not be practical.
> >> >
> >> > Dick Baldwin
> >> >
> >> >
> >> > On Fri, Jan 27, 2012 at 11:48 AM, Richard Baldwin
> >> > <baldwin at dickbaldwin.com> wrote:
> >> >
> >> >> Amanda and others,
> >> >>
> >> >> I have contacted Adobe technical support. Their solution to the
> >> >> problem is to purchase Acrobat Pro for $445.00. The tech support
> >> >> rep told me that their program will extract the pictures intact
> >> >> as separate bitmap files.
> >> >> Dick Baldwin
> >> >>
> >> >>
> >> >> On Fri, Jan 27, 2012 at 10:44 AM, Michael Whapples
> >> >> <mwhapples at aim.com> wrote:
> >> >>
> >> >>> Hello,
> >> >>> From what you are describing, my feeling is that the
> >> >>> diagrams/images in the PDF in question are created from a
> >> >>> number of drawing elements rather than a single image object.
> >> >>> I'm not an expert on PDF, but I think you could think of it
> >> >>> like the difference between a bitmap, which is a single element
> >> >>> (I think PDF has a way to specify the start of a stream object
> >> >>> like a bitmap), and an SVG, which is formed from lots of
> >> >>> elements like lines and shapes (I think in PDF the lines and
> >> >>> suchlike can be created with basic PDF drawing facilities, so
> >> >>> they are not in a separate object). When the image is formed
> >> >>> from lots of elements, it may be hard for the software to know
> >> >>> what makes up a given diagram in the book/document; it just
> >> >>> lays it out as specified and you work out what's related. I
> >> >>> think one way to tell whether you have this sort of image is to
> >> >>> see if NVDA will read some of the text labels of the image. If
> >> >>> it does, then it's not a pure bitmap (you probably could use
> >> >>> the read out loud function of Adobe Reader as well). Therefore
> >> >>> I imagine that without clever recognition algorithms you are
> >> >>> unlikely to get something which will extract it as you want.
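
One rough way to probe the "single image object versus many pieces" question without seeing the page is to count image objects in the raw file. The Python sketch below is a crude heuristic, not a PDF parser: it assumes that image objects are marked by a /Subtype /Image entry that is readable as plain bytes in the file, it counts across the whole file rather than per page, and it ignores inline images and vector drawing commands entirely. The function name and sample bytes are made up for illustration.

```python
import re

# Image objects in a PDF carry "/Subtype /Image" in their dictionary;
# the whitespace between the two names is optional.
IMAGE_XOBJECT = re.compile(rb"/Subtype\s*/Image\b")


def count_image_objects(pdf_bytes):
    """Count dictionary entries that mark an object as an image."""
    return len(IMAGE_XOBJECT.findall(pdf_bytes))


if __name__ == "__main__":
    # Hypothetical fragment standing in for a real file's contents.
    sample = (
        b"1 0 obj << /Type /XObject /Subtype /Image /Width 8 >> endobj\n"
        b"2 0 obj << /Type /XObject /Subtype/Image /Width 4 >> endobj\n"
        b"3 0 obj << /Type /XObject /Subtype /Form >> endobj\n"
    )
    print(count_image_objects(sample))
    # For a real file: count_image_objects(open("chapter.pdf", "rb").read())
```

A count far larger than the number of figures a chapter is known to contain would suggest that the figures are stored as many small pieces, while a count near zero would suggest that they are built from drawing elements.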
> >> >>>
> >> >>> There is one option I am aware of for a blind person to do this
> >> >>> independently: IVEO, like John suggested. However, IVEO isn't a
> >> >>> cheap option, and how much is to be done would determine
> >> >>> whether it's worth the money if providing accessible diagrams
> >> >>> from PDF was its only use. IVEO does not require a Tiger
> >> >>> printer; swell paper would work, and other embossers may (the
> >> >>> outputting from IVEO is the question, as I think it may only
> >> >>> output to devices appearing as standard printers).
> >> >>> Interestingly, the IVEO route again requires a human to make
> >> >>> the decision on what forms the diagram.
> >> >>>
> >> >>> Michael Whapples
> >> >>>
> >> >>> -----Original Message----- From: Richard Baldwin
> >> >>> Sent: Friday, January 27, 2012 3:28 PM
> >> >>> To: Jamal Mazrui
> >> >>> Cc: Blind Math list for those interested in mathematics
> >> >>> Subject: Re: [Blindmath] Extracting bitmap images from pdf 
> >> >>> files
> >> >>>
> >> >>>
> >> >>> Hi Jamal,
> >> >>>
> >> >>> It is a great program, easy to use, and probably totally
> >> >>> accessible. I particularly like the fact that the program
> >> >>> doesn't require a windows installation. The output data is well
> >> >>> organized and including the page numbers in the bmp file names
> >> >>> is a great help in analyzing them.
> >> >>>
> >> >>> Unfortunately, the output produced by the program suffers from
> >> >>> the same issues that I have encountered with all of the other
> >> >>> image extractor programs that I have tried. A few of the images
> >> >>> come out intact. Most of the images don't come out intact.
> >> >>>
> >> >>> For example, page three of one of the pdf files that I tested
> >> >>> has a single image of a battery. It is the same image that I
> >> >>> enhanced and posted in an earlier post. Your program produced
> >> >>> 54 bmp files for that page. A few of them were icons such as
> >> >>> arrows, exclamation marks, etc. The remaining bmp files appear
> >> >>> to be very small pieces of the image of the battery. By the
> >> >>> way, I got the earlier image of the battery by taking a screen
> >> >>> shot of the page and using an image editing program to crop out
> >> >>> the battery image. None of the image extraction programs that I
> >> >>> have tested extract the image intact.
> >> >>>
> >> >>> I don't know anything at all about the internal structure of
> >> >>> pdf files, and this behavior of breaking an image into many
> >> >>> small pieces may depend on how the file is constructed in the
> >> >>> first place. In any event, my immediate problem has to do with
> >> >>> a specific set of pdf files that are the chapters from a
> >> >>> specific physics book, so this program doesn't solve my
> >> >>> problem.
> >> >>>
> >> >>> Thanks for offering the program.
> >> >>> Dick Baldwin
> >> >>>
> >> >>> On Fri, Jan 27, 2012 at 5:18 AM, Jamal Mazrui
> >> >>> <empower at smart.net> wrote:
> >> >>>
> >> >>>> In an attempt to facilitate a free, non-web dependent
> >> >>>> solution, I have written a Windows console-mode utility called
> >> >>>> PDF2Images, built with PowerBASIC and a PDF library.  The
> >> >>>> distribution archive, including documentation and source code,
> >> >>>> is available at
> >> >>>>
> >> >>>> http://empowermentzone.com/pdf2images.zip
> >> >>>>
> >> >>>>
> >> >>>> I am interested in any feedback on how well it works compared
> >> >>>> to other approaches.
> >> >>>>
> >> >>>> Jamal
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>
> >> >>> --
> >> >>> Richard G. Baldwin (Dick Baldwin)
> >> >>> Home of Baldwin's on-line Java Tutorials http://www.DickBaldwin.com
> >> >>>
> >> >>> Professor of Computer Information Technology Austin Community College
> >> >>> (512) 223-4758
> >> >>> mailto:Baldwin at DickBaldwin.com
> >> >>> http://www.austincc.edu/baldwin/
> >> >>> _______________________________________________
> >> >>> Blindmath mailing list
> >> >>> Blindmath at nfbnet.org
> >> >>> http://nfbnet.org/mailman/listinfo/blindmath_nfbnet.org
> >> >>> To unsubscribe, change your list options or get your account info for
> >> >>> Blindmath:
> >> >>> http://nfbnet.org/mailman/options/blindmath_nfbnet.org/mwhapples%40aim.com


_______________________________________________
Blindmath mailing list
Blindmath at nfbnet.org
http://nfbnet.org/mailman/listinfo/blindmath_nfbnet.org
To unsubscribe, change your list options or get your account info for Blindmath:
http://nfbnet.org/mailman/options/blindmath_nfbnet.org/mlewicki%40bcsd.neric.org



