[Blindmath] Extracting bitmap images from pdf files

Richard Baldwin baldwin at dickbaldwin.com
Mon Jan 30 02:55:43 UTC 2012


Great work Jamal,

The program works great. I ran it on the largest pdf file in the set for
Amanda's physics book with no problems.

Here is what would work well for me.

Two exe files in the same package -- one for 150 dpi and the other for 300
dpi. The large (300 dpi) pages are hard to deal with on a small monitor,
but they make it possible to go in and crop out high quality versions of
small images that were created with Adobe vector graphics.

On the other hand, the small (150 dpi) pages are entirely adequate for
cropping out images that were originally bitmap images or large vector
images. And, the small pages are easier to work with.

Therefore, both versions are useful.
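
To put numbers on the two sizes, here is a minimal sketch of the arithmetic.
It assumes US Letter pages (8.5 by 11 inches); the 11-inch height and the
function name page_pixels are illustrative assumptions, not anything taken
from the program itself.

```python
def page_pixels(dpi, width_in=8.5, height_in=11.0):
    """Return (width, height) in pixels for a page rendered at the given dpi.

    The default page size is an assumption (US Letter).
    """
    return round(width_in * dpi), round(height_in * dpi)


for dpi in (150, 300):
    width, height = page_pixels(dpi)
    print(f"{dpi} dpi: {width} x {height} pixels")
```

Doubling the resolution doubles each dimension, so a 300 dpi page has four
times as many pixels as a 150 dpi page.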

If practical, suppress the output of the bmp, jpg, and txt files. I don't
need them. If not practical, don't worry about it. It is easy enough to
delete them.

Thanks for taking the initiative and doing this.

Dick Baldwin

On Sun, Jan 29, 2012 at 6:02 PM, Jamal Mazrui <empower at smart.net> wrote:

> Hi Dick,
> Sorry my prior message was not clear about this.  After copying the new
> pdf2images.exe into the directory you used for PDF2Parts, you would then
> run pdf2images.exe, passing it the file name of the PDF to analyze.  I
> suspect that you instead ran pdf2parts.exe again, which would, indeed,
> produce the same result as before.
>
> I just tried this pdf2images.exe with a book that is 873 pages in size.
>  It appeared to create a .TIF for each page.
>
> For just converting PDFs to text, let me suggest my older, PDF2TXT
> program, based on the same PDF library.  It can convert batches of PDF with
> a simple GUI dialog.  It can also do OCR on image-only PDFs using the free,
> open source Tesseract utility from Google.  That OCR is not high quality by
> today's standards.
>
> PDF2TXT is available as a Windows installer at
>
> http://EmpowermentZone.com/p2tsetup.exe
>
> Its full documentation may be browsed at
>
> http://empowermentzone.com/pdf2txt.htm
>
> Jamal
>
>
>
>
> On 1/29/2012 6:42 PM, Richard Baldwin wrote:
> > Hi Jamal,
> > The output from this version is not much different from the previous
> version. The program still crashed on page 17 of the small pdf file. I also
> noticed that it skipped page 13.
> > I tried a larger pdf file and it crashed on page 6 of that file.
> > I don't believe the tiff files were actually created at 300 dpi. The
> width of those files is 1275 pixels, which matches 8.5 inches at 150 dpi.
> > I did discover one thing that may be different. Although I was unable to
> successfully open the jpg files in Lview Pro, which is the image editor
> program that I have used for years, I was able to successfully open them in
> Windows Paint and also in a program named Paint.net that I occasionally
> use. That was probably also true for the earlier version. I simply didn't
> try it. Curiously, the jpg files seemed to be in reverse video when opened
> in those paint programs.
> > Don't spend time worrying about the jpg files. They add very little
> benefit to the overall result. As far as I am concerned, you could suppress
> the output of images from the individual pages, because they are of little
> value.
> > Amanda might be happy with the .txt files that appear to contain the
> text from the pdf file in a plain text format on a page by page basis.
> > Dick Baldwin
> >
> > On Sun, Jan 29, 2012 at 4:54 PM, Jamal Mazrui <empower at smart.net> wrote:
> >
> >     Hi Dick,
> >     With the PDF library I have, I do not see a way of adjusting the
> format of JPG output, other than the DPI setting, unfortunately.  Perhaps
> the free Image Magick software could transform those files into something
> more useful -- not sure.
> >
> >     I think I may have found a way, however, to improve the reliability
> of simply producing a TIF file for each whole page of the PDF.  The library
> has a function call for this that processes all pages at once.  Memory
> seems to be managed better than when iterating through each page of the PDF
> separately, which I suspect is causing the crashes with PDFs that are not
> relatively small in size.
> >
> >     I just posted a utility that only does that task at 300 DPI.  It has
> the original PDF2Images name and is available at
> >
> >     http://EmpowermentZone.com/pdf2images.zip
> >
> >     Just unzip it to the same directory as PDF2Parts (it uses the same
> PDF2Parts.dll).
> >
> >     A minor annoyance is that this technique does not right justify page
> numbers (the single function call mostly handles the names of individual
> .tif files).  So, the output files do not sort correctly in an alphabetical
> directory listing.  If files are sorted by time, however, the right order
> is attained.
> >
> >     Can you let me know how well this utility works?  If I get it
> working adequately, I will incorporate it into a single, coherent package.
> >
> >     Jamal
> >
> >
> >
> >     On 1/29/2012 4:47 PM, Richard Baldwin wrote:
> >>     Hi Jamal,
> >>     I ran the new version of the program for a relatively small pdf
> file, which was one of the smallest chapters in the physics textbook. The
> program stopped with an error on page 17 of about 24 pages. However, it did
> produce a lot of output before stopping.
> >>     The tiff files that represent individual pages look good. If
> possible, I would like to see if conversion to 300 dpi as opposed to 150
> dpi would provide improved image quality.
> >>     The bmp and jpg files for the individual images on each page suffer
> from the same problem discussed in previous posts. Mostly small pieces of
> larger images. In addition, the jpg files appear to be corrupt. They appear
> to suffer from some sort of synchronization problem that causes them to
> consist mainly of vertical bars. However, it was possible for me to
> correlate one of them to an actual image in the book. I suspect that these
> are the images from the pdf file that are stored as raster images in the
> pdf file.
> >>     Once you get the program to handle complete pdf files, I will
> consider it superior to online conversion of pdf files to bitmap pages. If
> you can fix the problem with the jpg files, that would be useful because
> they contain images that a sighted assistant won't need to crop out of the
> larger page images.
> >>     Thanks,
> >>     Dick Baldwin
> >>
> >>     On Sun, Jan 29, 2012 at 11:21 AM, Jamal Mazrui <empower at smart.net>
> wrote:
> >>
> >>         Dick,
> >>         I just posted a revised and renamed version of my program,
> which extracts both text and images.  PDF2Parts is available at
> >>         http://EmpowermentZone.com/pdf2parts.zip
> >>
> >>         Currently, it creates a .tif version of each PDF page at 150
> DPI.  Alternatively, I could make it save as .bmp or .jpg, and vary the
> resolution.  Would another image format or DPI work better for what you are
> trying to do?
> >>
> >>         Jamal
> >>
> >>         P.S.  The program seems to hang on large PDFs sometimes.  I
> have not figured out the pattern and debugged that yet.
> >>
> >>
> >>         On 1/28/2012 2:29 PM, Richard Baldwin wrote:
> >>
> >>             I will be responding to questions and comments from several
> different
> >>             individuals in this post, so I will refer to each person by
> name.
> >>
> >>             Maureen: I will be happy to send some files off list for
> you to emboss and
> >>             evaluate if you would be interested in doing that. I would
> be interested in
> >>             your feedback.
> >>
> >>             Jamal: You wrote "In reviewing the documentation for the
> PDF library I'm
> >>             using, I notice there is also the ability to save each page
> as an image.
> >>              Would that be helpful?"
> >>
> >>             That would be very helpful. I have generally concluded
> (more on this in a
> >>             separate post) that the most practical way for a sighted
> person to extract
> >>             images from a pdf file for a blind student is to deal with
> each page as an
> >>             image file, crop, cut, copy, and paste. I have identified a
> free website
> >>             that will convert a pdf file to a set of image files, but
> the less often I
> >>             am required to download files from strange websites, the
> happier I am. I
> >>             never know what may be riding those files into my computer.
> Your
> >>             stand-alone command-line based program would make it
> possible to make the
> >>             conversion locally. Please provide more information.
> >>
> >>             Ben: You wrote "I have a question -- are you using the most
> popular
> >>             university Physics textbook, whatever that may be?"
> >>
> >>             Actually, I teach Computer Science and not physics. Amanda
> is a Computer
> >>             Science student, and I am helping her in a required physics
> course. Her
> >>             physics book is the only one that I know anything about.
> However, I believe
> >>             this pdf-image issue applies to many college-level
> textbooks, because many
> >>             blind college students probably receive their electronic
> textbooks in pdf
> >>             format. Once again, however, the only one that I have any
> personal
> >>             knowledge about is Amanda's physics book.
> >>
> >>             I will send you a pdf file of one of the chapters from the
> textbook off
> >>             list later today.
> >>
> >>             Bente: You wrote "If we could stick with a text for more
> than two years it
> >>             would be so helpful."
> >>
> >>             I will simply say a loud AMEN to that. In my 18 years of
> teaching, I have
> >>             never understood why community college instructors insist on
> changing
> >>             textbooks so frequently, causing much more work for
> themselves in the
> >>             process. I have gotten to the point that I tell my students
> that the
> >>             textbook is for reference purposes only and the material
> for the course is
> >>             published at http://www.dickbaldwin.com.
> >>
> >>             Dick Baldwin
> >>
> >>             On Sat, Jan 28, 2012 at 1:01 PM, Bente Casile<
> bente at casilenc.com>  wrote:
> >>
> >>                 Ben,
> >>
> >>                 My greatest wish for all the blind students out there
> is that we in the
> >>                 college system could have a repository of tactile
> graphics for science and
> >>                 math classes.  If we could stick with a text for more
> than two years it
> >>                 would be so helpful.  As someone who makes math tactile
> graphics for our
> >>                 students, I would love to see that happen.  It would
> allow us to get ahead
> >>                 for students to benefit directly from the hard work of
> others and not to
> >>                 have to  "re-invent" the wheel every time a new text is
> adopted.
> >>
> >>                 Oh, and PS .. Austin is very nice..smiles
> >>
> >>                 Bente
> >>                 Bente J. Casile
> >>                 Math Learning Specialist
> >>                 Wake Tech Community College
> >>                 Raleigh NC
> >>
> >>                 -----Original Message-----
> >>                 From: blindmath-bounces at nfbnet.org [mailto:
> blindmath-bounces at nfbnet.org]
> >>                 On
> >>                 Behalf Of Ben Humphreys
> >>                 Sent: Saturday, January 28, 2012 11:17 AM
> >>                 To: Blind Math list for those interested in mathematics
> >>                 Subject: Re: [Blindmath] Extracting bitmap images from
> pdf files
> >>
> >>                 Hi Richard,
> >>
> >>                 As best I recall, it was a Microsoft Word file which we
> typically
> >>                 "saved as" HTML in order to get the graphics extracted
> out in an
> >>                 automated way.  Some came out as GIF, others JPEG,
> leading me to
> >>                 believe that Word preserves the original file intact.
>  These were
> >>                 .DOC, not .DOCX, so I don't believe they were really
> ZIP files in
> >>                 DOCX clothing.
> >>
> >>                 As my instructor routinely "pasted" in JPGs, GIFs, etc.
> from all around
> >>                 the world into her Microsoft Word files, it's anyone's
> guess why a
> >>                 few got all broken up like that.  Most remained intact.
> >>
> >>                 Part way through the class, I ended up having my
> assistant extract by
> >>                 hand the images as the automated way was too difficult
> to distinguish
> >>                 the garbage (i.e. little arrows and such) from the
> meaningful calculus
> >>                 graphs.
> >>
> >>                 I have a question -- are you using the most popular
> university
> >>                 Physics textbook, whatever that may be?  If so, and we
> get to the
> >>                 bottom of this, we could conceivably have a repository
> of labeled
> >>                 graphics files so others wouldn't have to repeat this
> step, and joy
> >>                 of joys, I could take physics without moving to Austin,
> :)  This of
> >>                 course is not to say Austin isn't a great place, it's
> just that I
> >>                 might have to move again when I want to take biology or
> chemistry.
> >>
> >>                 As always, thanks for your continued enthusiasm.
> >>
> >>                 And as I said, you're welcome to send me a file or two
> and we'll
> >>                 throw our Acrobat Pro strategy at it, maybe even
> consider how it
> >>                 might be automated.
> >>
> >>                 Ben
> >>
> >>                 At 08:59 AM 1/28/2012, you wrote:
> >>
> >>                 But, no, I do not believe we were dealing with PDFs in
> this case.
> >>
> >>                 Typically, when we have a PDF with a graphic, my
> assistant draws a
> >>                 box around it I think and saves it out separately.  I'm
> not clear on
> >>                 the process but he did say it required Acrobat Pro and
> once it's
> >>                 extracted, it's easy to blow it up to fill the page for
> easier
> >>                 tactile understanding.
> >>
> >>
> >>                     Hi Ben,
> >>
> >>                     I appreciate your frustration.
> >>
> >>                     Were the  "30 itty bitty graphics files" that
> apparently were small parts
> >>                     of two actual graphs produced using Acrobat Pro, or
> were you using some
> >>                     different image extraction software during that
> lost weekend?
> >>
> >>                     Thanks,
> >>                     Dick Baldwin
> >>
> >>                     On Sat, Jan 28, 2012 at 5:55 AM, Ben Humphreys
> >> <brh at opticinspiration.org> wrote:
> >>
> >>                         I suppose this procedure could work.
> >>
> >>                         But when it's this much effort to get to the
> starting gate, while other
> >>                         students are already moving forward and you're
> falling behind, it's no
> >>
> >>                 fun,
> >>
> >>                         and the added time and complexity and
> brainpower just takes all the
> >>                         motivation out of you.
> >>
> >>                         I remember losing a whole weekend to the issue
> of 30 itty bitty
> >>
> >>                 graphics
> >>
> >>                         files in a Calculus PDF.  Having embossed them,
> they were all told to
> >>
> >>                 "fit
> >>
> >>                         to page" and were thusly huge.  I was thinking
> they were all graphs and
> >>                         problems to be interpreted and worked on and
> understood, only to be
> >>
> >>                 told
> >>
> >>                         later that there were only two graphs and
> having the benefit of a
> >>
> >>                 sighted
> >>
> >>                         person on Monday morning to finally tell me
> that they were bits and
> >>
> >>                 pieces
> >>
> >>                         of the two relatively simple graphs.
> >>
> >>                         It's enough to make you want to be a Steve Jobs
> and exit school
> >>                         prematurely.
> >>
> >>                         Prof Baldwin, this is certainly not to say I
> don't appreciate all your
> >>                         efforts.  In fact, if and when I ever need to
> take physics, I am
> >>
> >>                 seriously
> >>
> >>                         considering relocating to Austin for a semester.
> >>
> >>                         P.S. I do have Acrobat pro so if you can send
> me the single page PDF in
> >>                         question, we can attempt to extract as a single
> image.
> >>
> >>                         Ben
> >>
> >>
> >>                         At 02:56 PM 1/27/2012, you wrote:
> >>
> >>                             In a previous post I wrote:
> >>
> >>                             "By the way, I don't know how a blind
> person would carry out the
> >>
> >>                 second
> >>                 of
> >>
> >>                             the following two steps in John's procedure:
> >>
> >>                             * import the PDF into IVEO Creator Pro.
> >>                             * Check the PDF to find which pages have
> images of interest and emboss
> >>                             those
> >>                             pages.
> >>
> >>                             It seems that checking the pdf to find
> which pages have images would
> >>
> >>                 be
> >>
> >>                             similar to checking a screen shot of a page
> to find and crop the
> >>
> >>                 image.
> >>                 It
> >>
> >>                             seems that you would need to be able to see
> the pdf on the IVEO screen
> >>
> >>                 to
> >>
> >>                             know if it contains an image. I am working
> with pdf files containing
> >>                             anywhere between 30 and 80 pages. Embossing
> every page in order to
> >>                             identify
> >>                             the pages that contain images would not be
> practical."
> >>
> >>                             I have learned how a blind person could
> find the pages containing the
> >>                             images in a pdf file without having to see
> the screen. Here is one
> >>                             procedure for doing that.
> >>
> >>                             When you import a pdf file into Creator
> Pro, a set of SVG files is
> automatically created in the folder that
> contains the pdf file. There
> >>
> >>                 is
> >>
> >>                             one SVG file for each page in the pdf file.
> The file names indicate
> >>
> >>                 the
> >>
> >>                             pdf
> >>                             page number except that pages in a pdf file
> are typically numbered
> >>                             beginning with 1 while the file numbers
> produced by Creator Pro begin
> >>
> >>                 with
> >>
> >>                             0. Thus, file number 0 will probably
> correspond to page 1 in the pdf
> >>                             document.
> >>
> >>                             Read the pdf file in your preferred pdf
> file reader. If from the pdf
> >>
> >>                 text,
> >>
> >>                             you can determine which pages in the pdf
> file contain images of
> >>
> >>                 interest,
> >>
> >>                             you can record those page numbers using
> whatever method you use to
> >>
> >>                 record
> >>
> >>                             information of that sort.
> >>
> >>                             Then you can import the pdf file into
> Creator Pro, producing the set
> >>
> >>                 of
> >>
> >>                             SVG
> >>                             files described above. Then you can open
> the SVG files that contain
> >>                             interesting images in your IVEO viewer
> software, emboss the pages, and
> >>                             proceed as John explained in an earlier
> post.
> >>
> >>                             Dick Baldwin
> >>
> >>                             On Fri, Jan 27, 2012 at 12:47 PM, Richard
> Baldwin
> >> <baldwin at dickbaldwin.com> wrote:
> >>
> >>                                 Michael wrote " There is one option I
> am aware of for a blind person
> >>
> >>                 to
> >>
> >>                                 do this independently, IVEO like John
> suggested,"
> >>
> >>                                 I may be wrong, but I didn't get the
> idea that John's solution will
> >>                                 produce an output bitmap file - only an
> embossed image.
> >>
> >>                                 I may be wrong again, but as near as I
> can tell, IVEO doesn't do any
> >>
> >>                             image
> >>
> >>                                 enhancement prior to embossing the
> image. If I am wrong on these
> >>
> >>                 points,
> >>
> >>                                 John will probably come online and set
> the record straight.
> >>
> >>                                 IVEO seems to simply convert the bitmap
> image to gray scale and
> >>
> >>                 emboss
> >>
> >>                             the
> >>
> >>                                 gray scale. While gray scale embossing
> is okay for some images
> >>
> >>                             (especially
> >>
> >>                                 black and white images), it is
> definitely not the best option for
> >>
> >>                 many
> >>
> >>                                 images. After all, if you convert 16
> million colors to four levels
> >>
> >>                 of
> >>
> >>                             gray
> >>
> >>                                 scale, each level of gray scale
> represents 4 million different
> >>
> >>                 colors.
> >>
> >>                                 Pixels belonging to each set of 4
> million colors will not be
> >>                                 distinguishable in the gray scale
> representation.
> >>
> >>                                 My objective is to gain access to
> full-color bitmap images so that I
> >>
> >>                 can
> >>
> >>                                 enhance the image for embossing prior
> to throwing away all of the
> >>
> >>                 color
> >>
> >>                                 information.
> >>
> >>                                 Embossed versions of bitmap images are
> often very difficult to
> >>
> >>                             understand,
> >>
> >>                                 even with a decent description. I
> believe we need to do everything
> >>                                 reasonable to improve the
> understandability of embossed bitmap
> >>
> >>                 images.
> >>
> >>                             In
> >>
> >>                                 some cases, image enhancement
> techniques at the full-color stage can
> >>
> >>                 be
> >>
> >>                                 used to provide those improvements.
> >>
> >>                                 So, my quest continues, hopefully
> without having to pay $445.00 for
> >>                                 Acrobat Pro, just to get access to the
> images.
> >>
> >>                                 The fallback position, of course, is to
> use screen shots and an
> >>
> >>                 image
> >>
> >>                                 editor program to crop out the
> individual images, but that approach
> >>
> >>                 is
> >>
> >>                             not
> >>
> >>                                 possible for a blind person to use. You
> can't crop an image out of a
> >>
> >>                             screen
> >>
> >>                                 shot unless you can see the image.
> >>
> >>                                 By the way, I don't know how a blind
> person would carry out the
> >>
> >>                 second
> >>
> >>                             of
> >>
> >>                                 the following two steps in John's
> procedure:
> >>
> >>                                 * import the PDF into IVEO Creator Pro.
> >>                                 * Check the PDF to find which pages
> have images of interest and
> >>
> >>                 emboss
> >>
> >>                                 those
> >>                                 pages.
> >>
> >>                                 It seems that checking the pdf to find
> which pages have images would
> >>
> >>                 be
> >>
> >>                                 similar to checking a screen shot of a
> page to find and crop the
> >>
> >>                 image.
> >>
> >>                             It
> >>
> >>                                 seems that you would need to be able to
> see the pdf on the IVEO
> >>
> >>                 screen
> >>
> >>                             to
> >>
> >>                                 know if it contains an image. I am
> working with pdf files containing
> >>                                 anywhere between 30 and 80 pages.
> Embossing every page in order to
> >>
> >>                             identify
> >>
> >>                                 the pages that contain images would not
> be practical.
> >>
> >>                                 Dick Baldwin
> >>
> >>
> >>                                 On Fri, Jan 27, 2012 at 11:48 AM,
> Richard Baldwin<
> >>
> >>                             baldwin at dickbaldwin.com
> >>
> >>                                     wrote:
> >>                                     Amanda and others,
> >>
> >>                                     I have contacted Adobe technical
> support. Their solution to the
> >>
> >>                 problem
> >>
> >>                                     is to purchase Acrobat Pro for
> $445.00. The tech support rep told
> >>
> >>                 me
> >>
> >>                             that
> >>
> >>                                     their program will extract the
> pictures intact as separate bitmap
> >>
> >>                             files.
> >>
> >>                                     Dick Baldwin
> >>
> >>
> >>                                     On Fri, Jan 27, 2012 at 10:44 AM,
> Michael Whapples
> >>
> >> <mwhapples at aim.com
> >>
> >>                                 wrote:
> >>
> >>                                         Hello,
> >>                                          From what you are describing,
> my feeling is that the
> >>
> >>                 diagrams/images
> >>
> >>                             in
> >>
> >>                                         the PDF in question are created
> from a number of drawing elements
> >>
> >>                             rather
> >>
> >>                                         than a single image object. I'm
> not an expert on PDF, but I think
> >>
> >>                 you
> >>
> >>                             could
> >>
> >>                                         think of it like the difference
> of a bitmap being a single element
> >>
> >>                 (I
> >>
> >>                             think
> >>
> >>                                         PDF has a way to specify the
> start of a stream object like a
> >>
> >>                 bitmap)
> >>
> >>                             and an
> >>
> >>                                         SVG being formed from lots of
> elements like lines and shapes (I
> >>
> >>                 think
> >>
> >>                             in
> >>
> >>                                         PDF the lines and such like can
> be created with basic PDF drawing
> >>                                         facilities so are not in a
> separate object). When the image is
> >>
> >>                 formed
> >>
> >>                             from
> >>
> >>                                         lots of elements then it may be
> hard for the software to know what
> >>
> >>                             makes up
> >>
> >>                                         a given diagram in the
> book/document, it just lays it out as
> >>
> >>                             specified and
> >>
> >>                                         you work out what's related. I
> think one way to tell whether you
> >>
> >>                 have
> >>
> >>                             this
> >>
> >>                                         sort of image is to see if NVDA
> will read some of the text labels
> >>
> >>                 of
> >>
> >>                             the
> >>
> >>                                         image, if it does then it's not
> a pure bitmap (you probably could
> >>
> >>                 use
> >>
> >>                             the
> >>
> >>                                         read out loud function of Adobe
> reader as well). Therefore I
> >>
> >>                 imagine
> >>
> >>                             that
> >>
> >>                                         without clever recognition
> algorithms you are unlikely to get
> >>
> >>                             something
> >>
> >>                                         which will extract it as you
> want.
> >>
> >>                                         There is one option I am aware
> of for a blind person to do this
> >>                                         independently, IVEO like John
> suggested, however IVEO isn't a
> >>
> >>                 cheap
> >>
> >>                             option
> >>
> >>                                         and depending on how much is to
> be done would determine whether
> >>
> >>                 it's
> >>
> >>                             worth
> >>
> >>                                         the money if providing
> accessible diagrams from PDF was its only
> >>
> >>                 use.
> >>
> >>                             IVEO
> >>
> >>                                         does not require a tiger
> printer, swell paper would work, other
> >>
> >>                             embossers
> >>
> >>                                         may (the outputting from IVEO
> is the question as I think it may
> >>
> >>                 only
> >>
> >>                             output
> >>
> >>                                         to devices appearing as
> standard printers). Interesting, the IVEO
> >>
> >>                             route
> >>
> >>                                         again is requiring a human to
> make the decision on what forms the
> >>
> >>                             diagram.
> >>
> >>                                         Michael Whapples
> >>
> >>                                         -----Original Message-----
> From: Richard Baldwin
> >>                                         Sent: Friday, January 27, 2012
> 3:28 PM
> >>                                         To: Jamal Mazrui
> >>                                         Cc: Blind Math list for those
> interested in mathematics
> >>                                         Subject: Re: [Blindmath]
> Extracting bitmap images from pdf files
> >>
> >>
> >> Hi Jamal,
> >>
> >> It is a great program, easy to use, and probably totally accessible. I
> >> particularly like the fact that the program doesn't require a Windows
> >> installation. The output data is well organized, and including the page
> >> numbers in the bmp file names is a great help in analyzing them.
> >>
> >> Unfortunately, the output produced by the program suffers from the same
> >> issues that I have encountered with all of the other image extractor
> >> programs that I have tried. A few of the images come out intact. Most of
> >> the images don't come out intact.
> >>
> >> For example, page three of one of the pdf files that I tested has a
> >> single image of a battery. It is the same image that I enhanced and
> >> posted in an earlier post. Your program produced 54 bmp files for that
> >> page. A few of them were icons such as arrows, exclamation marks, etc.
> >> The remaining bmp files appear to be very small pieces of the image of
> >> the battery. By the way, I got the earlier image of the battery by taking
> >> a screen shot of the page and using an image editing program to crop out
> >> the battery image. None of the image extraction programs that I have
> >> tested extracts the image intact.
> >>
> >> I don't know anything at all about the internal structure of pdf files,
> >> and this behavior of breaking an image into many small pieces may depend
> >> on how the file is constructed in the first place. In any event, my
> >> immediate problem has to do with a specific set of pdf files that are the
> >> chapters from a specific physics book, so this program doesn't solve my
> >> problem.
> >>
> >> Thanks for offering the program.
> >> Dick Baldwin
> >>
> >> On Fri, Jan 27, 2012 at 5:18 AM, Jamal Mazrui <empower at smart.net>
> >> wrote:
> >>
> >>     In an attempt to facilitate a free, non-web dependent solution, I
> >>     have written a Windows console-mode utility called PDF2Images, built
> >>     with PowerBASIC and a PDF library.  The distribution archive,
> >>     including documentation and source code, is available at
> >>
> >>     http://empowermentzone.com/pdf2images.zip
> >>
> >>     I am interested in any feedback on how well it works compared to
> >>     other approaches.
> >>
> >>     Jamal
> >>
> >>
> >>
> >>
> >> --
> >> Richard G. Baldwin (Dick Baldwin)
> >> Home of Baldwin's on-line Java Tutorials
> >> http://www.DickBaldwin.com
> >>
> >> Professor of Computer Information Technology
> >> Austin Community College
> >> (512) 223-4758
> >> mailto:Baldwin at DickBaldwin.com
> >> http://www.austincc.edu/baldwin/
> >>
> >> _______________________________________________
> >> Blindmath mailing list
> >> Blindmath at nfbnet.org
> >> http://nfbnet.org/mailman/listinfo/blindmath_nfbnet.org
> >> To unsubscribe, change your list options or get your account info for
> >> Blindmath:
> >> http://nfbnet.org/mailman/options/blindmath_nfbnet.org/mwhapples%40aim.com


-- 
Richard G. Baldwin (Dick Baldwin)
Home of Baldwin's on-line Java Tutorials
http://www.DickBaldwin.com

Professor of Computer Information Technology
Austin Community College
(512) 223-4758
mailto:Baldwin at DickBaldwin.com
http://www.austincc.edu/baldwin/


