[Blindmath] Extracting bitmap images from pdf files

Richard Baldwin baldwin at dickbaldwin.com
Sun Jan 29 23:42:04 UTC 2012


Hi Jamal,

The output from this version is not much different from the previous
version. The program still crashed on page 17 of the small pdf file. I also
noticed that it skipped page 13.

I tried a larger pdf file and it crashed on page 6 of that file.

I don't believe the tiff files were actually created at 300 dpi. The width
of those files is 1275 pixels, which matches 8.5 inches at 150 dpi.
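
The arithmetic behind that conclusion is easy to check. Here is a small Python sketch; the function name is only illustrative, and it assumes the pages are 8.5 inches wide (US Letter), which is my assumption rather than something reported by the program.

```python
def effective_dpi(width_pixels: int, page_width_inches: float) -> float:
    """Return the resolution implied by a pixel width and a physical page width."""
    return width_pixels / page_width_inches

# Assumed page width: 8.5 inches (US Letter).
observed = effective_dpi(1275, 8.5)        # width measured in the tiff files
expected_width_at_300 = round(300 * 8.5)   # width a true 300 dpi page would have

print(observed)               # 150.0
print(expected_width_at_300)  # 2550
```

A page rendered at 300 dpi would therefore be about 2550 pixels wide, twice what I am seeing.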

I did discover one thing that may be different. Although I was unable to
successfully open the jpg files in Lview Pro, which is the image editor
program that I have used for years, I was able to successfully open them in
Windows Paint and also in a program named Paint.net that I occasionally
use. That was probably also true for the earlier version. I simply didn't
try it. Curiously, the jpg files seemed to be in reverse video when opened
in those paint programs.

Don't spend time worrying about the jpg files. They add very little benefit
to the overall result. As far as I am concerned, you could suppress the
output of images from the individual pages, because they are of little
value.

Amanda might be happy with the .txt files that appear to contain the text
from the pdf file in a plain text format on a page by page basis.

Dick Baldwin

On Sun, Jan 29, 2012 at 4:54 PM, Jamal Mazrui <empower at smart.net> wrote:

>  Hi Dick,
> With the PDF library I have, I do not see a way of adjusting the format of
> JPG output, other than the DPI setting, unfortunately.  Perhaps the free
> ImageMagick software could transform those files into something more
> useful -- not sure.
>
> I think I may have found a way, however, to improve the reliability of
> simply producing a TIF file for each whole page of the PDF.  The library
> has a function call for this that processes all pages at once.  Memory
> seems to be managed better than when iterating through each page of the PDF
> separately, which I suspect is causing the crashes with PDFs that are not
> relatively small in size.
>
> I just posted a utility that only does that task at 300 DPI.  It has the
> original PDF2Images name and is available at
>
> http://EmpowermentZone.com/pdf2images.zip
>
> Just unzip it to the same directory as PDF2Parts (it uses the same
> PDF2Parts.dll).
>
> A minor annoyance is that this technique does not right justify page
> numbers (the single function call mostly handles the names of individual
> .tif files).  So, the output files do not sort correctly in an alphabetical
> directory listing.  If files are sorted by time, however, the right order
> is attained.
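
Until the page numbers are padded, a listing can still be put in page order with a "natural" sort that compares the digits as numbers. A minimal Python sketch follows; the file names are hypothetical, since the utility's actual naming pattern is not shown in this thread.

```python
import re

def natural_key(name: str):
    """Split a name into text and number chunks so 'page10' sorts after 'page2'."""
    key = []
    for chunk in re.split(r"(\d+)", name):
        if chunk.isdecimal():
            key.append((0, int(chunk), ""))      # numeric chunk, compared as a number
        else:
            key.append((1, 0, chunk.lower()))    # text chunk, compared as text
    return key

# Hypothetical output names without right-justified (zero-padded) page numbers.
names = ["page10.tif", "page2.tif", "page1.tif", "page13.tif"]

print(sorted(names))                   # plain alphabetical order
print(sorted(names, key=natural_key))  # page order
```

The plain sort puts page10.tif ahead of page2.tif, which is the problem described above; the keyed sort restores page order without renaming any files.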
>
> Can you let me know how well this utility works?  If I get it working
> adequately, I will incorporate it into a single, coherent package.
>
> Jamal
>
>
>
> On 1/29/2012 4:47 PM, Richard Baldwin wrote:
>
> Hi Jamal,
>
>  I ran the new version of the program for a relatively small pdf file,
> which was one of the smallest chapters in the physics textbook. The program
> stopped with an error on page 17 of about 24 pages. However, it did produce
> a lot of output before stopping.
>
>  The tiff files that represent individual pages look good. If possible, I
> would like to see if conversion to 300 dpi as opposed to 150 dpi would
> provide improved image quality.
>
>  The bmp and jpg files for the individual images on each page suffer from
> the same problem discussed in previous posts. Mostly small pieces of larger
> images. In addition, the jpg files appear to be corrupt. They appear to
> suffer from some sort of synchronization problem that causes them to
> consist mainly of vertical bars. However, it was possible for me to
> correlate one of them to an actual image in the book. I suspect that these
> are the images from the pdf file that are stored as raster images in the
> pdf file.
>
>  Once you get the program to handle complete pdf files, I will consider
> it superior to online conversion of pdf files to bitmap pages. If you can
> fix the problem with the jpg files, that would be useful because they
> contain images that a sighted assistant won't need to crop out of the
> larger page images.
>
>  Thanks,
> Dick Baldwin
>
> On Sun, Jan 29, 2012 at 11:21 AM, Jamal Mazrui <empower at smart.net> wrote:
>
>> Dick,
>> I just posted a revised and renamed version of my program, which extracts
>> both text and images.  PDF2Parts is available at
>> http://EmpowermentZone.com/pdf2parts.zip
>>
>> Currently, it creates a .tif version of each PDF page at 150 DPI.
>>  Alternatively, I could make it save as .bmp or .jpg, and vary the
>> resolution.  Would another image format or DPI work better for what you are
>> trying to do?
>>
>> Jamal
>>
>> P.S.  The program seems to hang on large PDFs sometimes.  I have not
>> figured out the pattern and debugged that yet.
>>
>>
>> On 1/28/2012 2:29 PM, Richard Baldwin wrote:
>>
>>> I will be responding to questions and comments from several different
>>> individuals in this post, so I will refer to each person by name.
>>>
>>> Maureen: I will be happy to send some files off list for you to emboss
>>> and evaluate if you would be interested in doing that. I would be
>>> interested in your feedback.
>>>
>>> Jamal: You wrote "In reviewing the documentation for the PDF library I'm
>>> using, I notice there is also the ability to save each page as an image.
>>> Would that be helpful?"
>>>
>>> That would be very helpful. I have generally concluded (more on this in a
>>> separate post) that the most practical way for a sighted person to
>>> extract images from a pdf file for a blind student is to deal with each
>>> page as an image file, crop, cut, copy, and paste. I have identified a
>>> free website that will convert a pdf file to a set of image files, but
>>> the less often I am required to download files from strange websites, the
>>> happier I am. I never know what may be riding those files into my
>>> computer. Your stand-alone command-line based program would make it
>>> possible to make the conversion locally. Please provide more information.
>>>
>>> Ben: You wrote "I have a question -- are you using the most popular
>>> university Physics textbook, whatever that may be?"
>>>
>>> Actually, I teach Computer Science and not physics. Amanda is a Computer
>>> Science student, and I am helping her in a required physics course. Her
>>> physics book is the only one that I know anything about. However, I
>>> believe this pdf-image issue applies to many college-level textbooks,
>>> because many blind college students probably receive their electronic
>>> textbooks in pdf format. Once again, however, the only one that I have
>>> any personal knowledge about is Amanda's physics book.
>>>
>>> I will send you a pdf file of one of the chapters from the textbook off
>>> list later today.
>>>
>>> Bente: You wrote "If we could stick with a text for more than two years
>>> it would be so helpful."
>>>
>>> I will simply say a loud AMEN to that. In my 18 years of teaching, I have
>>> never understood why community college instructors insist on changing
>>> textbooks so frequently, causing much more work for themselves in the
>>> process. I have gotten to the point that I tell my students that the
>>> textbook is for reference purposes only and the material for the course
>>> is published at http://www.dickbaldwin.com.
>>>
>>> Dick Baldwin
>>>
>>> On Sat, Jan 28, 2012 at 1:01 PM, Bente Casile<bente at casilenc.com>
>>>  wrote:
>>>
>>>> Ben,
>>>>
>>>> My greatest wish for all the blind students out there is that we in the
>>>> college system could have a repository of tactile graphics for science
>>>> and math classes.  If we could stick with a text for more than two years
>>>> it would be so helpful.  As someone who makes math tactile graphics for
>>>> our students, I would love to see that happen.  It would allow us to get
>>>> ahead for students to benefit directly from the hard work of others and
>>>> not to have to "re-invent" the wheel every time a new text is adopted.
>>>>
>>>> Oh, and PS .. Austin is very nice..smiles
>>>>
>>>> Bente
>>>> Bente J. Casile
>>>> Math Learning Specialist
>>>> Wake Tech Community College
>>>> Raleigh NC
>>>>
>>>> -----Original Message-----
>>>> From: blindmath-bounces at nfbnet.org [mailto:blindmath-bounces at nfbnet.org]
>>>> On Behalf Of Ben Humphreys
>>>> Sent: Saturday, January 28, 2012 11:17 AM
>>>> To: Blind Math list for those interested in mathematics
>>>> Subject: Re: [Blindmath] Extracting bitmap images from pdf files
>>>>
>>>> Hi Richard,
>>>>
>>>> As best I recall, it was a Microsoft Word file which we typically
>>>> "saved as" HTML in order to get the graphics extracted out in an
>>>> automated way.  Some came out as GIF, others JPEG, leading me to
>>>> believe that Word preserves the original file intact.  These were
>>>> .DOC, not .DOCX, so I don't believe they were really ZIP files in
>>>> DOCX clothing.
>>>>
>>>> As my instructor routinely "pasted" in JPGs, GIFs, etc from all around
>>>> the world into her Microsoft Word files, it's anyone's guess why a
>>>> few got all broken up like that.  Most remained intact.
>>>>
>>>> Part way through the class, I ended up having my assistant extract by
>>>> hand the images as the automated way was too difficult to distinguish
>>>> the garbage (i.e. little arrows and such) from the meaningful calculus
>>>> graphs.
>>>>
>>>> I have a question -- are you using the most popular university
>>>> Physics textbook, whatever that may be?  If so, and we get to the
>>>> bottom of this, we could conceivably have a repository of labeled
>>>> graphics files so others wouldn't have to repeat this step, and joy
>>>> of joys, I could take physics without moving to Austin, :)  This of
>>>> course is not to say Austin isn't a great place, it's just that I
>>>> might have to move again when I want to take biology or chemistry.
>>>>
>>>> As always, thanks for your continued enthusiasm.
>>>>
>>>> And as I said, you're welcome to send me a file or two and we'll
>>>> throw our Acrobat Pro strategy at it, maybe even consider how it
>>>> might be automated.
>>>>
>>>> Ben
>>>>
>>>> At 08:59 AM 1/28/2012, you wrote:
>>>>
>>>> But, no, I do not believe we were dealing with PDFs in this case.
>>>>
>>>> Typically, when we have a PDF with a graphic, my assistant draws a
>>>> box around it I think and saves it out separately.  I'm not clear on
>>>> the process but he did say it required Acrobat Pro and once it's
>>>> extracted, it's easy to blow it up to fill the page for easier
>>>> tactile understanding.
>>>>
>>>>
>>>>> Hi Ben,
>>>>>
>>>>> I appreciate your frustration.
>>>>>
>>>>> Were the "30 itty bitty graphics files" that apparently were small parts
>>>>> of two actual graphs produced using Acrobat Pro, or were you using some
>>>>> different image extraction software during that lost weekend?
>>>>>
>>>>> Thanks,
>>>>> Dick Baldwin
>>>>>
>>>>> On Sat, Jan 28, 2012 at 5:55 AM, Ben Humphreys
>>>>> <brh at opticinspiration.org> wrote:
>>>>>
>>>>>> I suppose this procedure could work.
>>>>>>
>>>>>> But when it's this much effort to get to the starting gate, while other
>>>>>> students are already moving forward and you're falling behind, it's no
>>>>>> fun, and the added time and complexity and brainpower just takes all
>>>>>> the motivation out of you.
>>>>>>
>>>>>> I remember losing a whole weekend to the issue of 30 itty bitty
>>>>>> graphics files in a Calculus PDF.  Having embossed them, they were all
>>>>>> told to "fit to page" and were thusly huge.  I was thinking they were
>>>>>> all graphs and problems to be interpreted and worked on and understood,
>>>>>> only to be told later that there were only two graphs and having the
>>>>>> benefit of a sighted person on Monday morning to finally tell me that
>>>>>> they were bits and pieces of the two relatively simple graphs.
>>>>>>
>>>>>> It's enough to make you want to be a Steve Jobs and exit school
>>>>>> prematurely.
>>>>>>
>>>>>> Prof Baldwin, this is certainly not to say I don't appreciate all your
>>>>>> efforts.  In fact, if and when I ever need to take physics, I am
>>>>>> seriously considering relocating to Austin for a semester.
>>>>>>
>>>>>> P.S. I do have Acrobat pro so if you can send me the single page PDF in
>>>>>> question, we can attempt to extract as a single image.
>>>>>>
>>>>>> Ben
>>>>>>
>>>>>>
>>>>>> At 02:56 PM 1/27/2012, you wrote:
>>>>>>
>>>>>>> In a previous post I wrote:
>>>>>>>
>>>>>>> "By the way, I don't know how a blind person would carry out the
>>>>>>> second of the following two steps in John's procedure:
>>>>>>>
>>>>>>> * import the PDF into IVEO Creator Pro.
>>>>>>> * Check the PDF to find which pages have images of interest and emboss
>>>>>>>   those pages.
>>>>>>>
>>>>>>> It seems that checking the pdf to find which pages have images would
>>>>>>> be similar to checking a screen shot of a page to find and crop the
>>>>>>> image. It seems that you would need to be able to see the pdf on the
>>>>>>> IVEO screen to know if it contains an image. I am working with pdf
>>>>>>> files containing anywhere between 30 and 80 pages. Embossing every
>>>>>>> page in order to identify the pages that contain images would not be
>>>>>>> practical."
>>>>>>>
>>>>>>> I have learned how a blind person could find the pages containing the
>>>>>>> images in a pdf file without having to see the screen. Here is one
>>>>>>> procedure for doing that.
>>>>>>>
>>>>>>> When you import a pdf file into Creator Pro, a set of SVG files is
>>>>>>> automatically created in the folder that contains the pdf file. There
>>>>>>> is one SVG file for each page in the pdf file. The file names indicate
>>>>>>> the pdf page number except that pages in a pdf file are typically
>>>>>>> numbered beginning with 1 while the file numbers produced by Creator
>>>>>>> Pro begin with 0. Thus, file number 0 will probably correspond to page
>>>>>>> 1 in the pdf document.
>>>>>>>
>>>>>>> Read the pdf file in your preferred pdf file reader. If from the pdf
>>>>>>> text, you can determine which pages in the pdf file contain images of
>>>>>>> interest, you can record those page numbers using whatever method you
>>>>>>> use to record information of that sort.
>>>>>>>
>>>>>>> Then you can import the pdf file into Creator Pro, producing the set
>>>>>>> of SVG files described above. Then you can open the SVG files that
>>>>>>> contain interesting images in your IVEO viewer software, emboss the
>>>>>>> pages, and proceed as John explained in an earlier post.
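
The page-to-file mapping in the quoted procedure above is a simple off-by-one conversion. A short Python sketch, with an illustrative function name and hypothetical page numbers; it relies on the zero-based file numbering, which the post says is only "probably" how Creator Pro numbers its files.

```python
def svg_index_for_page(pdf_page: int) -> int:
    """Map a 1-based pdf page number to the assumed 0-based SVG file number."""
    if pdf_page < 1:
        raise ValueError("pdf page numbers start at 1")
    return pdf_page - 1

# Hypothetical page numbers recorded while reading the pdf text.
pages_with_images = [1, 3, 17]

print([svg_index_for_page(p) for p in pages_with_images])  # [0, 2, 16]
```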
>>>>>>>
>>>>>>> Dick Baldwin
>>>>>>>
>>>>>>> On Fri, Jan 27, 2012 at 12:47 PM, Richard Baldwin
>>>>>>> <baldwin at dickbaldwin.com> wrote:
>>>>>>>
>>>>>>>> Michael wrote " There is one option I am aware of for a blind person
>>>>>>>> to do this independently, IVEO like John suggested,"
>>>>>>>>
>>>>>>>> I may be wrong, but I didn't get the idea that John's solution will
>>>>>>>> produce an output bitmap file - only an embossed image.
>>>>>>>>
>>>>>>>> I may be wrong again, but as near as I can tell, IVEO doesn't do any
>>>>>>>> image enhancement prior to embossing the image. If I am wrong on
>>>>>>>> these points, John will probably come online and set the record
>>>>>>>> straight.
>>>>>>>>
>>>>>>>> IVEO seems to simply convert the bitmap image to gray scale and
>>>>>>>> emboss the gray scale. While gray scale embossing is okay for some
>>>>>>>> images (especially black and white images), it is definitely not the
>>>>>>>> best option for many images. After all, if you convert 16 million
>>>>>>>> colors to four levels of gray scale, each level of gray scale
>>>>>>>> represents 4 million different colors. Pixels belonging to each set
>>>>>>>> of 4 million colors will not be distinguishable in the gray scale
>>>>>>>> representation.
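
The counting argument in the quoted paragraph above can be illustrated with a small Python sketch. The function name is illustrative, and the luma weights (ITU-R BT.601) are an assumption chosen for the example; how IVEO actually converts to gray scale is not documented in this thread. With weighted luma the four levels do not hold exactly equal numbers of colors, so 4 million per level is an average.

```python
def gray_level(r: int, g: int, b: int, levels: int = 4) -> int:
    """Quantize an 8-bit RGB color to one of `levels` gray levels (0 = darkest)."""
    luma = 0.299 * r + 0.587 * g + 0.114 * b   # assumed BT.601 weighting
    return min(int(luma * levels / 256), levels - 1)

total_colors = 256 ** 3
print(total_colors)        # 16777216
print(total_colors // 4)   # 4194304 colors per gray level on average

# Two clearly different colors that fall into the same gray level.
red, green = (255, 0, 0), (0, 128, 0)
print(gray_level(*red), gray_level(*green))   # 1 1
```

Bright red and dark green look nothing alike on screen, yet under these assumptions both land on the same gray level, which is the loss of information the paragraph describes.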
>>>>>>>>
>>>>>>>> My objective is to gain access to full-color bitmap images so that I
>>>>>>>> can enhance the image for embossing prior to throwing away all of
>>>>>>>> the color information.
>>>>>>>>
>>>>>>>> Embossed versions of bitmap images are often very difficult to
>>>>>>>> understand, even with a decent description. I believe we need to do
>>>>>>>> everything reasonable to improve the understandability of embossed
>>>>>>>> bitmap images. In some cases, image enhancement techniques at the
>>>>>>>> full-color stage can be used to provide those improvements.
>>>>>>>>
>>>>>>>> So, my quest continues, hopefully without having to pay $445.00 for
>>>>>>>> Acrobat Pro, just to get access to the images.
>>>>>>>>
>>>>>>>> The fallback position, of course, is to use screen shots and an
>>>>>>>> image editor program to crop out the individual images, but that
>>>>>>>> approach is not possible for a blind person to use. You can't crop
>>>>>>>> an image out of a screen shot unless you can see the image.
>>>>>>>>
>>>>>>>> By the way, I don't know how a blind person would carry out the
>>>>>>>> second of the following two steps in John's procedure:
>>>>>>>>
>>>>>>>> * import the PDF into IVEO Creator Pro.
>>>>>>>> * Check the PDF to find which pages have images of interest and
>>>>>>>>   emboss those pages.
>>>>>>>>
>>>>>>>> It seems that checking the pdf to find which pages have images would
>>>>>>>> be similar to checking a screen shot of a page to find and crop the
>>>>>>>> image. It seems that you would need to be able to see the pdf on the
>>>>>>>> IVEO screen to know if it contains an image. I am working with pdf
>>>>>>>> files containing anywhere between 30 and 80 pages. Embossing every
>>>>>>>> page in order to identify the pages that contain images would not be
>>>>>>>> practical.
>>>>>>>>
>>>>>>>> Dick Baldwin
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jan 27, 2012 at 11:48 AM, Richard Baldwin
>>>>>>>> <baldwin at dickbaldwin.com> wrote:
>>>>>>>>
>>>>>>>>> Amanda and others,
>>>>>>>>>
>>>>>>>>> I have contacted Adobe technical support. Their solution to the
>>>>>>>>> problem is to purchase Acrobat Pro for $445.00. The tech support
>>>>>>>>> rep told me that their program will extract the pictures intact as
>>>>>>>>> separate bitmap files.
>>>>>>>>>
>>>>>>>>> Dick Baldwin
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jan 27, 2012 at 10:44 AM, Michael Whapples
>>>>>>>>> <mwhapples at aim.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hello,
>>>>>>>>>> From what you are describing, my feeling is that the
>>>>>>>>>> diagrams/images in the PDF in question are created from a number
>>>>>>>>>> of drawing elements rather than a single image object. I'm not an
>>>>>>>>>> expert on PDF, but I think you could think of it like the
>>>>>>>>>> difference of a bitmap being a single element (I think PDF has a
>>>>>>>>>> way to specify the start of a stream object like a bitmap) and an
>>>>>>>>>> SVG being formed from lots of elements like lines and shapes (I
>>>>>>>>>> think in PDF the lines and such like can be created with basic PDF
>>>>>>>>>> drawing facilities so are not in a separate object). When the
>>>>>>>>>> image is formed from lots of elements then it may be hard for the
>>>>>>>>>> software to know what makes up a given diagram in the
>>>>>>>>>> book/document, it just lays it out as specified and you work out
>>>>>>>>>> what's related. I think one way to tell whether you have this sort
>>>>>>>>>> of image is to see if NVDA will read some of the text labels of
>>>>>>>>>> the image, if it does then it's not a pure bitmap (you probably
>>>>>>>>>> could use the read out loud function of Adobe Reader as well).
>>>>>>>>>> Therefore I imagine that without clever recognition algorithms you
>>>>>>>>>> are unlikely to get something which will extract it as you want.
>>>>>>>>>>
>>>>>>>>>> There is one option I am aware of for a blind person to do this
>>>>>>>>>> independently, IVEO like John suggested, however IVEO isn't a
>>>>>>>>>> cheap option and depending on how much is to be done would
>>>>>>>>>> determine whether it's worth the money if providing accessible
>>>>>>>>>> diagrams from PDF was its only use. IVEO does not require a tiger
>>>>>>>>>> printer, swell paper would work, other embossers may (the
>>>>>>>>>> outputting from IVEO is the question as I think it may only output
>>>>>>>>>> to devices appearing as standard printers). Interesting, the IVEO
>>>>>>>>>> route again is requiring a human to make the decision on what
>>>>>>>>>> forms the diagram.
>>>>>>>>>>
>>>>>>>>>> Michael Whapples
>>>>>>>>>>
>>>>>>>>>> -----Original Message----- From: Richard Baldwin
>>>>>>>>>> Sent: Friday, January 27, 2012 3:28 PM
>>>>>>>>>> To: Jamal Mazrui
>>>>>>>>>> Cc: Blind Math list for those interested in mathematics
>>>>>>>>>> Subject: Re: [Blindmath] Extracting bitmap images from pdf files
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Jamal,
>>>>>>>>>>
>>>>>>>>>> It is a great program, easy to use, and probably totally
>>>>>>>>>> accessible. I particularly like the fact that the program doesn't
>>>>>>>>>> require a windows installation. The output data is well organized
>>>>>>>>>> and including the page numbers in the bmp file names is a great
>>>>>>>>>> help in analyzing them.
>>>>>>>>>>
>>>>>>>>>> Unfortunately, the output produced by the program suffers from the
>>>>>>>>>> same issues that I have encountered with all of the other image
>>>>>>>>>> extractor programs that I have tried. A few of the images come out
>>>>>>>>>> intact. Most of the images don't come out intact.
>>>>>>>>>>
>>>>>>>>>> For example, page three of one of the pdf files that I tested has
>>>>>>>>>> a single image of a battery. It is the same image that I enhanced
>>>>>>>>>> and posted in an earlier post. Your program produced 54 bmp files
>>>>>>>>>> for that page. A few of them were icons such as arrows,
>>>>>>>>>> exclamation marks, etc. The remaining bmp files appear to be very
>>>>>>>>>> small pieces of the image of the battery. By the way, I got the
>>>>>>>>>> earlier image of the battery by taking a screen shot of the page
>>>>>>>>>> and using an image editing program to crop out the battery image.
>>>>>>>>>> None of the image extraction programs that I have tested extract
>>>>>>>>>> the image intact.
>>>>>>>>>>
>>>>>>>>>> I don't know anything at all about the internal structure of pdf
>>>>>>>>>> files, and this behavior of breaking an image into many small
>>>>>>>>>> pieces may depend on how the file is constructed in the first
>>>>>>>>>> place. In any event, my immediate problem has to do with a
>>>>>>>>>> specific set of pdf files that are the chapters from a specific
>>>>>>>>>> physics book, so this program doesn't solve my problem.
>>>>>>>>>>
>>>>>>>>>> Thanks for offering the program.
>>>>>>>>>> Dick Baldwin
>>>>>>>>>>
>>>>>>>>>> On Fri, Jan 27, 2012 at 5:18 AM, Jamal Mazrui <empower at smart.net>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> In an attempt to facilitate a free, non-web dependent solution, I
>>>>>>>>>>> have written a Windows console-mode utility called PDF2Images,
>>>>>>>>>>> built with PowerBASIC and a PDF library.  The distribution
>>>>>>>>>>> archive, including documentation and source code, is available at
>>>>>>>>>>>
>>>>>>>>>>> http://empowermentzone.com/pdf2images.zip
>>>>>>>>>>>
>>>>>>>>>>> I am interested in any feedback on how well it works compared to
>>>>>>>>>>> other approaches.
>>>>>>>>>>>
>>>>>>>>>>> Jamal
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>> _______________________________________________
>>>> Blindmath mailing list
>>>> Blindmath at nfbnet.org
>>>> http://nfbnet.org/mailman/listinfo/blindmath_nfbnet.org
>>>> To unsubscribe, change your list options or get your account info for
>>>> Blindmath:
>>>>
>>>> http://nfbnet.org/mailman/options/blindmath_nfbnet.org/bente%40casilenc.com
>>>>
>>>
>>>
>
>
>
>


-- 
Richard G. Baldwin (Dick Baldwin)
Home of Baldwin's on-line Java Tutorials
http://www.DickBaldwin.com

Professor of Computer Information Technology
Austin Community College
(512) 223-4758
mailto:Baldwin at DickBaldwin.com
http://www.austincc.edu/baldwin/


