[Blindmath] Extracting bitmap images from pdf files

Richard Baldwin baldwin at dickbaldwin.com
Mon Jan 30 03:03:23 UTC 2012


Jamal,

I just realized that the latest version does suppress the output of bmp,
jpg, and txt files.

Thanks,
Dick Baldwin

On Sun, Jan 29, 2012 at 8:55 PM, Richard Baldwin <baldwin at dickbaldwin.com> wrote:

> Great work Jamal,
>
> The program works great. I ran it on the largest pdf file in the set for
> Amanda's physics book with no problems.
>
> Here is what would work well for me.
>
> Two exe files in the same package -- one for 150 dpi and the other for 300
> dpi. The large (300 dpi) pages are hard to deal with on a small monitor,
> but they make it possible to go in and crop out high quality versions of
> small images that were created with Adobe vector graphics.
>
> On the other hand, the small (150 dpi) pages are entirely adequate for
> cropping out images that were originally bitmap images or large vector
> images. And, the small pages are easier to work with.
>
> Therefore, both versions are useful.
>
> If practical, suppress the output of the bmp, jpg, and txt files. I don't
> need them. If not practical, don't worry about it. It is easy enough to
> delete them.
>
> Thanks for taking the initiative and doing this.
>
> Dick Baldwin
>
>
> On Sun, Jan 29, 2012 at 6:02 PM, Jamal Mazrui <empower at smart.net> wrote:
>
>> Hi Dick,
>> Sorry my prior message was not clear about this.  After copying the new
>> pdf2images.exe into the directory you used for PDF2Parts, you would then
>> run pdf2images.exe, passing it the file name of the PDF to analyze.  I
>> suspect that you instead ran pdf2parts.exe again, which would, indeed,
>> produce the same result as before.
>>
>> I just tried this pdf2images.exe with a book that is 873 pages in size.
>>  It appeared to create a .TIF for each page.
>>
>> For just converting PDFs to text, let me suggest my older, PDF2TXT
>> program, based on the same PDF library.  It can convert batches of PDF with
>> a simple GUI dialog.  It can also do OCR on image-only PDFs using the free,
>> open source Tesseract utility from Google.  That OCR is not high quality by
>> today's standards.
>>
>> PDF2TXT is available as a Windows installer at
>>
>> http://EmpowermentZone.com/p2tsetup.exe
>>
>> Its full documentation may be browsed at
>>
>> http://empowermentzone.com/pdf2txt.htm
>>
>> Jamal
>>
>>
>>
>>
>> On 1/29/2012 6:42 PM, Richard Baldwin wrote:
>> > Hi Jamal,
>> > The output from this version is not much different from the previous
>> version. The program still crashed on page 17 of the small pdf file. I also
>> noticed that it skipped page 13.
>> > I tried a larger pdf file and it crashed on page 6 of that file.
>> > I don't believe the tiff files were actually created at 300 dpi. The
>> width of those files is 1275 pixels, which matches 8.5 inches at 150 dpi.
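The width check described above can be reproduced with a little arithmetic. This is a minimal sketch, assuming a letter-size page 8.5 inches wide as stated in the message; the function names are illustrative and are not part of PDF2Images or its PDF library.

```python
def pixels_for(inches: float, dpi: int) -> int:
    """Pixel count along one edge of a page rendered at the given resolution."""
    return round(inches * dpi)


def implied_dpi(pixels: int, inches: float) -> float:
    """Resolution implied by a measured pixel count and a known page edge."""
    return pixels / inches


PAGE_WIDTH_IN = 8.5  # letter-size width, as assumed in the message above

print(pixels_for(PAGE_WIDTH_IN, 150))    # 1275
print(pixels_for(PAGE_WIDTH_IN, 300))    # 2550
print(implied_dpi(1275, PAGE_WIDTH_IN))  # 150.0
```

A page that was really rendered at 300 dpi would be about 2550 pixels wide, so a measured width of 1275 pixels points to 150 dpi.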
>> > I did discover one thing that may be different. Although I was unable
>> to successfully open the jpg files in Lview Pro, which is the image editor
>> program that I have used for years, I was able to successfully open them in
>> Windows Paint and also in a program named Paint.net that I occasionally
>> use. That was probably also true for the earlier version. I simply didn't
>> try it. Curiously, the jpg files seemed to be in reverse video when opened
>> in those paint programs.
>> > Don't spend time worrying about the jpg files. They add very little
>> benefit to the overall result. As far as I am concerned, you could suppress
>> the output of images from the individual pages, because they are of little
>> value.
>> > Amanda might be happy with the .txt files that appear to contain the
>> text from the pdf file in a plain text format on a page by page basis.
>> > Dick Baldwin
>> >
>> > On Sun, Jan 29, 2012 at 4:54 PM, Jamal Mazrui <empower at smart.net>
>> wrote:
>> >
>> >     Hi Dick,
>> >     With the PDF library I have, I do not see a way of adjusting the
>> format of JPG output, other than the DPI setting, unfortunately.  Perhaps
>> the free Image Magick software could transform those files into something
>> more useful -- not sure.
>> >
>> >     I think I may have found a way, however, to improve the reliability
>> of simply producing a TIF file for each whole page of the PDF.  The library
>> has a function call for this that processes all pages at once.  Memory
>> seems to be managed better than when iterating through each page of the PDF
>> separately, which I suspect is causing the crashes with PDFs that are not
>> relatively small in size.
>> >
>> >     I just posted a utility that only does that task at 300 DPI.  It
>> has the original PDF2Images name and is available at
>> >
>> >     http://EmpowermentZone.com/pdf2images.zip
>> >
>> >     Just unzip it to the same directory as PDF2Parts (it uses the same
>> PDF2Parts.dll).
>> >
>> >     A minor annoyance is that this technique does not right justify
>> page numbers (the single function call mostly handles the names of
>> individual .tif files).  So, the output files do not sort correctly in an
>> alphabetical directory listing.  If files are sorted by time, however, the
>> right order is attained.
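The sorting annoyance above comes from comparing page numbers as text, so "10" sorts before "2". One possible workaround is sketched below. The file names such as page2.tif are hypothetical, since the thread does not give the actual naming pattern, and the helper names are illustrative only.

```python
import re


def natural_key(name: str):
    """Split a name into text and number parts so numbers compare numerically."""
    parts = re.split(r"(\d+)", name)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


def padded_name(name: str, width: int = 4) -> str:
    """Zero-pad the first number in a name so plain alphabetical sorting works."""
    return re.sub(r"\d+", lambda m: m.group().zfill(width), name, count=1)


# Hypothetical output names; the real pattern may differ.
names = ["page10.tif", "page2.tif", "page1.tif"]

print(sorted(names))                   # alphabetical: page1, page10, page2
print(sorted(names, key=natural_key))  # numeric: page1, page2, page10
print(padded_name("page2.tif"))        # page0002.tif
```

Sorting with natural_key leaves the files untouched, while renaming with padded_name makes an ordinary directory listing come out in page order.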
>> >
>> >     Can you let me know how well this utility works?  If I get it
>> working adequately, I will incorporate it into a single, coherent package.
>> >
>> >     Jamal
>> >
>> >
>> >
>> >     On 1/29/2012 4:47 PM, Richard Baldwin wrote:
>> >>     Hi Jamal,
>> >>     I ran the new version of the program for a relatively small pdf
>> file, which was one of the smallest chapters in the physics textbook. The
>> program stopped with an error on page 17 of about 24 pages. However, it did
>> produce a lot of output before stopping.
>> >>     The tiff files that represent individual pages look good. If
>> possible, I would like to see if conversion to 300 dpi as opposed to 150
>> dpi would provide improved image quality.
>> >>     The bmp and jpg files for the individual images on each page
>> suffer from the same problem discussed in previous posts. Mostly small
>> pieces of larger images. In addition, the jpg files appear to be corrupt.
>> They appear to suffer from some sort of synchronization problem that causes
>> them to consist mainly of vertical bars. However, it was possible for me to
>> correlate one of them to an actual image in the book. I suspect that these
>> are the images from the pdf file that are stored as raster images in the
>> pdf file.
>> >>     Once you get the program to handle complete pdf files, I will
>> consider it superior to online conversion of pdf files to bitmap pages. If
>> you can fix the problem with the jpg files, that would be useful because
>> they contain images that a sighted assistant won't need to crop out of the
>> larger page images.
>> >>     Thanks,
>> >>     Dick Baldwin
>> >>
>> >>     On Sun, Jan 29, 2012 at 11:21 AM, Jamal Mazrui <empower at smart.net>
>> wrote:
>> >>
>> >>         Dick,
>> >>         I just posted a revised and renamed version of my program,
>> which extracts both text and images.  PDF2Parts is available at
>> >>         http://EmpowermentZone.com/pdf2parts.zip
>> >>
>> >>         Currently, it creates a .tif version of each PDF page at 150
>> DPI.  Alternatively, I could make it save as .bmp or .jpg, and vary the
>> resolution.  Would another image format or DPI work better for what you are
>> trying to do?
>> >>
>> >>         Jamal
>> >>
>> >>         P.S.  The program seems to hang on large PDFs sometimes.  I
>> have not figured out the pattern and debugged that yet.
>> >>
>> >>
>> >>         On 1/28/2012 2:29 PM, Richard Baldwin wrote:
>> >>
>> >>             I will be responding to questions and comments from
>> several different
>> >>             individuals in this post, so I will refer to each person
>> by name.
>> >>
>> >>             Maureen: I will be happy to send some files off list for
>> you to emboss and
>> >>             evaluate if you would be interested in doing that. I would
>> be interested in
>> >>             your feed back.
>> >>
>> >>             Jamal: You wrote "In reviewing the documentation for the
>> PDF library I'm
>> >>             using, I notice there is also the ability to save each
>> page as an image.
>> >>              Would that be helpful?"
>> >>
>> >>             That would be very helpful. I have generally concluded
>> (more on this in a
>> >>             separate post) that the most practical way for a sighted
>> person to extract
>> >>             images from a pdf file for a blind student is to deal with
>> each page as an
>> >>             image file, crop, cut, copy, and paste. I have identified
>> a free website
>> >>             that will convert a pdf file to a set of image files, but
>> the less often I
>> >>             am required to download files from strange websites, the
>> happier I am. I
>> >>             never know what may be riding those files into my
>> computer. Your
>> >>             stand-alone command-line based program would make it
>> possible to make the
>> >>             conversion locally. Please provide more information.
>> >>
>> >>             Ben: You wrote "I have a question -- are you using the
>> most popular
>> >>             university Physics textbook, whatever that may be?"
>> >>
>> >>             Actually, I teach Computer Science and not physics. Amanda
>> is a Computer
>> >>             Science student, and I am helping her in a required
>> physics course. Her
>> >>             physics book is the only one that I know anything about.
>> However, I believe
>> >>             this pdf-image issue applies to many college-level
>> textbooks, because many
>> >>             blind college students probably receive their electronic
>> textbooks in pdf
>> >>             format. Once again, however, the only one that I have any
>> personal
>> >>             knowledge about is Amanda's physics book.
>> >>
>> >>             I will send you a pdf file of one of the chapters from the
>> textbook off
>> >>             list later today.
>> >>
>> >>             Bente: You wrote "If we could stick with a text for more
>> than two years it
>> >>             would be so helpful."
>> >>
>> >>             I will simply say a loud AMEN to that. In my 18 years of
>> teaching, I have
>> >>             never understood why community college instructors insist
>> on changing
>> >>             textbooks so frequently, causing much more work for
>> themselves in the
>> >>             process. I have gotten to the point that I tell my
>> students that the
>> >>             textbook is for reference purposes only and the material
>> for the course is
>> >>             published at http://www.dickbaldwin.com.
>> >>
>> >>             Dick Baldwin
>> >>
>> >>             On Sat, Jan 28, 2012 at 1:01 PM, Bente Casile<
>> bente at casilenc.com>  wrote:
>> >>
>> >>                 Ben,
>> >>
>> >>                 My greatest wish for all the blind students out there
>> is that we in the
>> >>                 college system could have a repository of tactile
>> graphics for science and
>> >>                 math classes.  If we could stick with a text for more
>> than two years it
>> >>                 would be so helpful.  As someone who makes math
>> tactile graphics for our
>> >>                 students, I would love to see that happen.  It would
>> allow us to get ahead
>> >>                 for students to benefit directly from the hard work of
>> others and not to
>> >>                 have to  "re-invent" the wheel every time a new text
>> is adopted.
>> >>
>> >>                 Oh, and PS .. Austin is very nice..smiles
>> >>
>> >>                 Bente
>> >>                 Bente J. Casile
>> >>                 Math Learning Specialist
>> >>                 Wake Tech Community College
>> >>                 Raleigh NC
>> >>
>> >>                 -----Original Message-----
>> >>                 From: blindmath-bounces at nfbnet.org [mailto:blindmath-bounces at nfbnet.org]
>> >>                 On Behalf Of Ben Humphreys
>> >>                 Sent: Saturday, January 28, 2012 11:17 AM
>> >>                 To: Blind Math list for those interested in mathematics
>> >>                 Subject: Re: [Blindmath] Extracting bitmap images from
>> pdf files
>> >>
>> >>                 Hi Richard,
>> >>
>> >>                 As best I recall, it was a Microsoft Word file which
>> we typically
>> >>                 "saved as" HTML in order to get the graphics extracted
>> out in an
>> >>                 automated way.  Some came out as GIF, others JPEG,
>> leading me to
>> >>                 believe that Word preserves the original file intact.
>>  These were
>> >>                 .DOC, not .DOCX, so I don't believe they were really
>> ZIP files in
>> >>                 DOCX clothing.
>> >>
>> >>                 As my instructor routinely "pasted" in JPGs, GIFs, etc
>> from all around
>> >>                 the world into her Microsoft Word files, it's anyone's
>> guess why a
>> >>                 few got all broken up like that.  Most remained intact.
>> >>
>> >>                 Part way through the class, I ended up having my
>> assistant extract by
>> >>                 hand the images as the automated way was too difficult
>> to distinguish
>> >>                 the garbage (i.e. little arrows and such) from the
>> meaningful calculus
>> >>                 graphs.
>> >>
>> >>                 I have a question -- are you using the most popular
>> university
>> >>                 Physics textbook, whatever that may be?  If so, and we
>> get to the
>> >>                 bottom of this, we could conceivably have a repository
>> of labeled
>> >>                 graphics files so others wouldn't have to repeat this
>> step, and joy
>> >>                 of joys, I could take physics without moving to
>> Austin, :)  This of
>> >>                 course is not to say Austin isn't a great place, it's
>> just that I
>> >>                 might have to move again when I want to take biology
>> or chemistry.
>> >>
>> >>                 As always, thanks for your continued enthusiasm.
>> >>
>> >>                 And as I said, you're welcome to send me a file or two
>> and we'll
>> >>                 throw our Acrobat Pro strategy at it, maybe even
>> consider how it
>> >>                 might be automated.
>> >>
>> >>                 Ben
>> >>
>> >>                 At 08:59 AM 1/28/2012, you wrote:
>> >>
>> >>                 But, no, I do not believe we were dealing with PDFs in
>> this case.
>> >>
>> >>                 Typically, when we have a PDF with a graphic, my
>> assistant draws a
>> >>                 box around it I think and saves it out separately.
>>  I'm not clear on
>> >>                 the process but he did say it required Acrobat Pro and
>> once it's
>> >>                 extracted, it's easy to blow it up to fill the page
>> for easier
>> >>                 tactile understanding.
>> >>
>> >>
>> >>                     Hi Ben,
>> >>
>> >>                     I appreciate your frustration.
>> >>
>> >>                     Were the  "30 itty bitty graphics files" that
>> apparently were small parts
>> >>                     of two actual graphs produced using Acrobat Pro,
>> or were you using some
>> >>                     different image extraction software during that
>> lost weekend?
>> >>
>> >>                     Thanks,
>> >>                     Dick Baldwin
>> >>
>> >>                     On Sat, Jan 28, 2012 at 5:55 AM, Ben Humphreys
>> >> <brh at opticinspiration.org> wrote:
>> >>
>> >>                         I suppose this procedure could work.
>> >>
>> >>                         But when it's this much effort to get to the
>> starting gate, while other
>> >>                         students are already moving forward and you're
>> falling behind, it's no
>> >>
>> >>                 fun,
>> >>
>> >>                         and the added time and complexity and
>> brainpower just takes all the
>> >>                         motivation out of you.
>> >>
>> >>                         I remember losing a whole weekend to the issue
>> of 30 itty bitty
>> >>
>> >>                 graphics
>> >>
>> >>                         files in a Calculus PDF.  Having embossed
>> them, they were all told to
>> >>
>> >>                 "fit
>> >>
>> >>                         to page" and were thusly huge.  I was thinking
>> they were all graphs and
>> >>                         problems to be interpreted and worked on and
>> understood, only to be
>> >>
>> >>                 told
>> >>
>> >>                         later that there were only two graphs and
>> having the benefit of a
>> >>
>> >>                 sighted
>> >>
>> >>                         person on Monday morning to finally tell me
>> that they were bits and
>> >>
>> >>                 pieces
>> >>
>> >>                         of the two relatively simple graphs.
>> >>
>> >>                         It's enough to make you want to be a Steve
>> Jobs and exit school
>> >>                         prematurely.
>> >>
>> >>                         Prof Baldwin, this is certainly not to say I
>> don't appreciate all your
>> >>                         efforts.  In fact, if and when I ever need to
>> take physics, I am
>> >>
>> >>                 seriously
>> >>
>> >>                         considering relocating to Austin for a
>> semester.
>> >>
>> >>                         P.S. I do have Acrobat pro so if you can send
>> me the single page PDF in
>> >>                         question, we can attempt to extract as a
>> single image.
>> >>
>> >>                         Ben
>> >>
>> >>
>> >>                         At 02:56 PM 1/27/2012, you wrote:
>> >>
>> >>                             In a previous post I wrote:
>> >>
>> >>                             "By the way, I don't know how a blind
>> person would carry out the
>> >>
>> >>                 second
>> >>                 of
>> >>
>> >>                             the following two steps in John's
>> procedure:
>> >>
>> >>                             * import the PDF into IVEO Creator Pro.
>> >>                             * Check the PDF to find which pages have
>> images of interest and emboss
>> >>                             those
>> >>                             pages.
>> >>
>> >>                             It seems that checking the pdf to find
>> which pages have images would
>> >>
>> >>                 be
>> >>
>> >>                             similar to checking a screen shot of a
>> page to find and crop the
>> >>
>> >>                 image.
>> >>                 It
>> >>
>> >>                             seems that you would need to be able to
>> see the pdf on the IVEO screen
>> >>
>> >>                 to
>> >>
>> >>                             know if it contains an image. I am working
>> with pdf files containing
>> >>                             anywhere between 30 and 80 pages.
>> Embossing every page in order to
>> >>                             identify
>> >>                             the pages that contain images would not be
>> practical."
>> >>
>> >>                             I have learned how a blind person could
>> find the pages containing the
>> >>                             images in a pdf file without having to see
>> the screen. Here is one
>> >>                             procedure for doing that.
>> >>
>> >>                             When you import a pdf file into Creator
>> Pro, a set of SVG files is
>> >>                             automatically created in the folder that
>> contains the pdf file. There
>> >>
>> >>                 is
>> >>
>> >>                             one SVG file for each page in the pdf
>> file. The file names indicate
>> >>
>> >>                 the
>> >>
>> >>                             pdf
>> >>                             page number except that pages in a pdf
>> file are typically numbered
>> >>                             beginning with 1 while the file numbers
>> produced by Creator Pro begin
>> >>
>> >>                 with
>> >>
>> >>                             0. Thus, file number 0 will probably
>> correspond to page 1 in the pdf
>> >>                             document.
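The off-by-one numbering described above can be written down as a pair of conversions. This is only a sketch: it assumes, as the message does, that Creator Pro numbers its SVG files from 0 while PDF pages count from 1. The function names are illustrative, and the actual SVG file naming pattern is not given in the thread.

```python
def svg_number_for_page(pdf_page: int) -> int:
    """SVG file number expected for a 1-based PDF page number."""
    if pdf_page < 1:
        raise ValueError("PDF pages are numbered from 1")
    return pdf_page - 1


def page_for_svg_number(svg_number: int) -> int:
    """1-based PDF page number expected for a 0-based SVG file number."""
    if svg_number < 0:
        raise ValueError("SVG file numbers start at 0")
    return svg_number + 1


# Pages noted while reading the PDF text (example values only).
pages_with_images = [1, 13, 17]
print([svg_number_for_page(p) for p in pages_with_images])  # [0, 12, 16]
```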
>> >>
>> >>                             Read the pdf file in your preferred pdf
>> file reader. If from the pdf
>> >>
>> >>                 text,
>> >>
>> >>                             you can determine which pages in the pdf
>> file contain images of
>> >>
>> >>                 interest,
>> >>
>> >>                             you can record those page numbers using
>> whatever method you use to
>> >>
>> >>                 record
>> >>
>> >>                             information of that sort.
>> >>
>> >>                             Then you can import the pdf file into
>> Creator Pro, producing the set
>> >>
>> >>                 of
>> >>
>> >>                             SVG
>> >>                             files described above. Then you can open
>> the SVG files that contain
>> >>                             interesting images in your IVEO viewer
>> software, emboss the pages, and
>> >>                             proceed as John explained in an earlier
>> post.
>> >>
>> >>                             Dick Baldwin
>> >>
>> >>                             On Fri, Jan 27, 2012 at 12:47 PM, Richard
>> Baldwin
>> >> <baldwin at dickbaldwin.com> wrote:
>> >>
>> >>                                 Michael wrote " There is one option I
>> am aware of for a blind person
>> >>
>> >>                 to
>> >>
>> >>                                 do this independently, IVEO like John
>> suggested,"
>> >>
>> >>                                 I may be wrong, but I didn't get the
>> idea that John's solution will
>> >>                                 produce an output bitmap file - only
>> an embossed image.
>> >>
>> >>                                 I may be wrong again, but as near as I
>> can tell, IVEO doesn't do any
>> >>
>> >>                             image
>> >>
>> >>                                 enhancement prior to embossing the
>> image. If I am wrong on these
>> >>
>> >>                 points,
>> >>
>> >>                                 John will probably come online and set
>> the record straight.
>> >>
>> >>                                 IVEO seems to simply convert the
>> bitmap image to gray scale and
>> >>
>> >>                 emboss
>> >>
>> >>                             the
>> >>
>> >>                                 gray scale. While gray scale embossing
>> is okay for some images
>> >>
>> >>                             (especially
>> >>
>> >>                                 black and white images), it is
>> definitely not the best option for
>> >>
>> >>                 many
>> >>
>> >>                                 images. After all, if you convert 16
>> million colors to four levels
>> >>
>> >>                 of
>> >>
>> >>                             gray
>> >>
>> >>                                 scale, each level of gray scale
>> represents 4 million different
>> >>
>> >>                 colors.
>> >>
>> >>                                 Pixels belonging to each set of 4
>> million colors will not be
>> >>                                 distinguishable in the gray scale
>> representation.
>> >>
>> >>                                 My objective is to gain access to
>> full-color bitmap images so that I
>> >>
>> >>                 can
>> >>
>> >>                                 enhance the image for embossing prior
>> to throwing away all of the
>> >>
>> >>                 color
>> >>
>> >>                                 information.
>> >>
>> >>                                 Embossed versions of bitmap images are
>> often very difficult to
>> >>
>> >>                             understand,
>> >>
>> >>                                 even with a decent description. I
>> believe we need to do everything
>> >>                                 reasonable to improve the
>> understandability of embossed bitmap
>> >>
>> >>                 images.
>> >>
>> >>                             In
>> >>
>> >>                                 some cases, image enhancement
>> techniques at the full-color stage can
>> >>
>> >>                 be
>> >>
>> >>                                 used to provide those improvements.
>> >>
>> >>                                 So, my quest continues, hopefully
>> without having to pay $445.00 for
>> >>                                 Acrobat Pro, just to get access to the
>> images.
>> >>
>> >>                                 The fallback position, of course, is
>> to use screen shots and an
>> >>
>> >>                 image
>> >>
>> >>                                 editor program to crop out the
>> individual images, but that approach
>> >>
>> >>                 is
>> >>
>> >>                             not
>> >>
>> >>                                 possible for a blind person to use.
>> You can't crop an image out of a
>> >>
>> >>                             screen
>> >>
>> >>                                 shot unless you can see the image.
>> >>
>> >>                                 By the way, I don't know how a blind
>> person would carry out the
>> >>
>> >>                 second
>> >>
>> >>                             of
>> >>
>> >>                                 the following two steps in John's
>> procedure:
>> >>
>> >>                                 * import the PDF into IVEO Creator Pro.
>> >>                                 * Check the PDF to find which pages
>> have images of interest and
>> >>
>> >>                 emboss
>> >>
>> >>                                 those
>> >>                                 pages.
>> >>
>> >>                                 It seems that checking the pdf to find
>> which pages have images would
>> >>
>> >>                 be
>> >>
>> >>                                 similar to checking a screen shot of a
>> page to find and crop the
>> >>
>> >>                 image.
>> >>
>> >>                             It
>> >>
>> >>                                 seems that you would need to be able
>> to see the pdf on the IVEO
>> >>
>> >>                 screen
>> >>
>> >>                             to
>> >>
>> >>                                 know if it contains an image. I am
>> working with pdf files containing
>> >>                                 anywhere between 30 and 80 pages.
>> Embossing every page in order to
>> >>
>> >>                             identify
>> >>
>> >>                                 the pages that contain images would
>> not be practical.
>> >>
>> >>                                 Dick Baldwin
>> >>
>> >>
>> >>                                 On Fri, Jan 27, 2012 at 11:48 AM,
>> Richard Baldwin<
>> >>
>> >>                             baldwin at dickbaldwin.com
>> >>
>> >>                                     wrote:
>> >>                                     Amanda and others,
>> >>
>> >>                                     I have contacted Adobe technical
>> support. Their solution to the
>> >>
>> >>                 problem
>> >>
>> >>                                     is to purchase Acrobat Pro for
>> $445.00. The tech support rep told
>> >>
>> >>                 me
>> >>
>> >>                             that
>> >>
>> >>                                     their program will extract the
>> pictures intact as separate bitmap
>> >>
>> >>                             files.
>> >>
>> >>                                     Dick Baldwin
>> >>
>> >>
>> >>                                     On Fri, Jan 27, 2012 at 10:44 AM,
>> Michael Whapples
>> >>
>> >> <mwhapples at aim.com
>> >>
>> >>                                 wrote:
>> >>
>> >>                                         Hello,
>> >>                                          From what you are describing,
>> my feeling is that the
>> >>
>> >>                 diagrams/images
>> >>
>> >>                             in
>> >>
>> >>                                         the PDF in question are
>> created from a number of drawing elements
>> >>
>> >>                             rather
>> >>
>> >>                                         than a single image object.
>> I'm not an expert on PDF, but I think
>> >>
>> >>                 you
>> >>
>> >>                             could
>> >>
>> >>                                         think of it like the
>> difference of a bitmap being a single element
>> >>
>> >>                 (I
>> >>
>> >>                             think
>> >>
>> >>                                         PDF has a way to specify the
>> start of a stream object like a
>> >>
>> >>                 bitmap)
>> >>
>> >>                             and an
>> >>
>> >>                                         SVG being formed from lots of
>> elements like lines and shapes (I
>> >>
>> >>                 think
>> >>
>> >>                             in
>> >>
>> >>                                         PDF the lines and such like
>> can be created with basic PDF drawing
>> >>                                         facilities so are not in a
>> separate object). When the image is
>> >>
>> >>                 formed
>> >>
>> >>                             from
>> >>
>> >>                                         lots of elements then it may
>> be hard for the software to know what
>> >>
>> >>                             makes up
>> >>
>> >>                                         a given diagram in the
>> book/document, it just lays it out as
>> >>
>> >>                             specified and
>> >>
>> >>                                         you work out what's related. I
>> think one way to tell whether you
>> >>
>> >>                 have
>> >>
>> >>                             this
>> >>
>> >>                                         sort of image is to see if
>> NVDA will read some of the text labels
>> >>
>> >>                 of
>> >>
>> >>                             the
>> >>
>> >>                                         image, if it does then it's not
>> a pure bitmap (you probably could
>> >>
>> >>                 use
>> >>
>> >>                             the
>> >>
>> >>                                         read out loud function of
>> adobe reader as well). Therefore I
>> >>
>> >>                 imagine
>> >>
>> >>                             that
>> >>
>> >>                                         without clever recognition
>> algorithms you are unlikely to get
>> >>
>> >>                             something
>> >>
>> >>                                         which will extract it as you
>> want.
>> >>
>> >> There is one option I am aware of for a blind person to do this
>> >> independently: IVEO, as John suggested. However, IVEO isn't a cheap
>> >> option, and how much there is to be done would determine whether it's
>> >> worth the money if providing accessible diagrams from PDF was its only
>> >> use. IVEO does not require a Tiger printer; swell paper would work, and
>> >> other embossers may as well (the output from IVEO is the question, as I
>> >> think it may only output to devices appearing as standard printers).
>> >> Interestingly, the IVEO route again requires a human to decide what
>> >> forms the diagram.
>> >>
>> >> Michael Whapples
>> >>
>> >> -----Original Message-----
>> >> From: Richard Baldwin
>> >> Sent: Friday, January 27, 2012 3:28 PM
>> >> To: Jamal Mazrui
>> >> Cc: Blind Math list for those interested in mathematics
>> >> Subject: Re: [Blindmath] Extracting bitmap images from pdf files
>> >>
>> >> Hi Jamal,
>> >>
>> >> It is a great program, easy to use, and probably totally accessible. I
>> >> particularly like the fact that the program doesn't require a Windows
>> >> installation. The output data is well organized, and including the page
>> >> numbers in the bmp file names is a great help in analyzing them.
>> >>
>> >> Unfortunately, the output produced by the program suffers from the same
>> >> issues that I have encountered with all of the other image extractor
>> >> programs that I have tried. A few of the images come out intact. Most
>> >> of the images don't come out intact.
>> >>
>> >> For example, page three of one of the pdf files that I tested has a
>> >> single image of a battery. It is the same image that I enhanced and
>> >> posted in an earlier post. Your program produced 54 bmp files for that
>> >> page. A few of them were icons such as arrows, exclamation marks, etc.
>> >> The remaining bmp files appear to be very small pieces of the image of
>> >> the battery. By the way, I got the earlier image of the battery by
>> >> taking a screen shot of the page and using an image editing program to
>> >> crop out the battery image. None of the image extraction programs that
>> >> I have tested extract the image intact.
>> >>
>> >> I don't know anything at all about the internal structure of pdf files,
>> >> and this behavior of breaking an image into many small pieces may
>> >> depend on how the file is constructed in the first place. In any event,
>> >> my immediate problem has to do with a specific set of pdf files that
>> >> are the chapters from a specific physics book, so this program doesn't
>> >> solve my problem.
>> >>
>> >> Thanks for offering the program.
>> >> Dick Baldwin
>> >>
>> >> On Fri, Jan 27, 2012 at 5:18 AM, Jamal Mazrui <empower at smart.net> wrote:
>> >>
>> >> In an attempt to facilitate a free, non-web dependent solution, I have
>> >> written a Windows console-mode utility called PDF2Images, built with
>> >> PowerBASIC and a PDF library. The distribution archive, including
>> >> documentation and source code, is available at
>> >>
>> >> http://empowermentzone.com/pdf2images.zip
>> >>
>> >> I am interested in any feedback on how well it works compared to other
>> >> approaches.
>> >>
>> >> Jamal
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >> Richard G. Baldwin (Dick Baldwin)
>> >> Home of Baldwin's on-line Java Tutorials
>> >> http://www.DickBaldwin.com
>> >>
>> >> Professor of Computer Information Technology
>> >> Austin Community College
>> >> (512) 223-4758
>> >> mailto:Baldwin at DickBaldwin.com
>> >> http://www.austincc.edu/baldwin/
>> >>
>> >> _______________________________________________
>> >> Blindmath mailing list
>> >> Blindmath at nfbnet.org
>> >> http://nfbnet.org/mailman/listinfo/blindmath_nfbnet.org
>> >> To unsubscribe, change your list options or get your account info for
>> >> Blindmath:
>> >> http://nfbnet.org/mailman/options/blindmath_nfbnet.org/mwhapples%40aim.com
>> >>



-- 
Richard G. Baldwin (Dick Baldwin)
Home of Baldwin's on-line Java Tutorials
http://www.DickBaldwin.com

Professor of Computer Information Technology
Austin Community College
(512) 223-4758
mailto:Baldwin at DickBaldwin.com
http://www.austincc.edu/baldwin/


