[Blindmath] Extracting bitmap images from pdf files
Richard Baldwin
baldwin at dickbaldwin.com
Sat Jan 28 01:19:51 UTC 2012
Hi Michael,
My comments are embedded in your text below.
On Fri, Jan 27, 2012 at 3:31 PM, Michael Whapples <mwhapples at aim.com> wrote:
> Sorry, I forgot why you want the bitmap files; it's to enhance the image
> further, not so it could be used by other embossers.
>
rgb] Well, actually I want bitmap files for both purposes. My immediate
need is to help Amanda with the pictures in her physics book. Having said
that, I also want to be able to enhance the bitmap files for embossing.
Enhancement would be nice for a Tiger, but is absolutely critical for
embossers that have no gray scale capability at all, like Amanda's Juliet
embosser. She and I have developed a program that does a respectable job of
enhancing images for a Juliet embosser once you get your hands on the
bitmap file.
rgb] In solving Amanda's problem, I am hopeful that I can develop a
solution for other blind students as well. This issue not only impacts
blind math and physics students, it also impacts every blind student using
college textbooks with inaccessible pictures.
>
> Are there any image printing drivers? I mean, is there anything like those
> software tools that appear as a printer but output to PDF, except that they
> output to bitmap or JPG instead? That would make IVEO do what you want.
>
rgb] I don't know the answer to that question.
>
> However, as I mentioned, if this were the only use you would put IVEO
> to, how does it compare on price with the Acrobat Pro option (provided that
> really works as Adobe suggests)?
rgb] I can't speak for Amanda, but I suspect that either option would be
prohibitively expensive for many blind students.
Thanks for the input.
Dick Baldwin
>
>
> Michael Whapples
>
> -----Original Message----- From: Richard Baldwin
> Sent: Friday, January 27, 2012 6:47 PM
> To: Blind Math list for those interested in mathematics
>
> Subject: Re: [Blindmath] Extracting bitmap images from pdf files
>
> Michael wrote "There is one option I am aware of for a blind person to do
> this independently, IVEO like John suggested,"
>
> I may be wrong, but I didn't get the idea that John's solution will produce
> an output bitmap file - only an embossed image.
>
> I may be wrong again, but as near as I can tell, IVEO doesn't do any image
> enhancement prior to embossing the image. If I am wrong on these points,
> John will probably come online and set the record straight.
>
> IVEO seems to simply convert the bitmap image to gray scale and emboss the
> gray scale. While gray scale embossing is okay for some images (especially
> black and white images), it is definitely not the best option for many
> images. After all, if you convert 16 million colors to four levels of gray
> scale, each level of gray scale represents 4 million different colors.
> Pixels belonging to each set of 4 million colors will not be
> distinguishable in the gray scale representation.
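
The arithmetic in the paragraph above can be checked with a short sketch. It assumes the common 0.299/0.587/0.114 luminance weights and four equal-width gray buckets; neither is known to be what IVEO or any embosser driver actually uses, and the function names are made up for illustration.

```python
# Sketch: how distinct colors collapse when reduced to four gray levels.
# The luminance weights and the equal-width bucketing are assumptions,
# not a description of what IVEO or any embosser driver actually does.

TOTAL_COLORS = 256 ** 3   # 24-bit RGB, about 16 million colors
GRAY_LEVELS = 4

def to_gray(r, g, b):
    """Integer luminance in 0..255 using the 0.299/0.587/0.114 weights."""
    return (299 * r + 587 * g + 114 * b) // 1000

def to_level(gray, levels=GRAY_LEVELS):
    """Map a 0..255 gray value to one of `levels` equal-width buckets."""
    return gray * levels // 256

def embossed_level(rgb):
    """Gray bucket that a full-color pixel would land in."""
    return to_level(to_gray(*rgb))

# Average number of colors sharing one gray level, if spread evenly.
colors_per_level = TOTAL_COLORS // GRAY_LEVELS

red = (255, 0, 0)
green = (0, 130, 0)
print(colors_per_level)
print(embossed_level(red), embossed_level(green))
```

Under these assumptions, a saturated red and a mid green land in the same gray bucket, so a red region next to a green one would emboss as a single undifferentiated area unless the image is enhanced while the color information is still available.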
>
> My objective is to gain access to full-color bitmap images so that I can
> enhance the image for embossing prior to throwing away all of the color
> information.
>
> Embossed versions of bitmap images are often very difficult to understand,
> even with a decent description. I believe we need to do everything
> reasonable to improve the understandability of embossed bitmap images. In
> some cases, image enhancement techniques at the full-color stage can be
> used to provide those improvements.
>
> So, my quest continues, hopefully without having to pay $445.00 for Acrobat
> Pro, just to get access to the images.
>
> The fallback position, of course, is to use screen shots and an image
> editor program to crop out the individual images, but that approach is not
> possible for a blind person to use. You can't crop an image out of a screen
> shot unless you can see the image.
>
> By the way, I don't know how a blind person would carry out the second of
> the following two steps in John's procedure:
>
> * Import the PDF into IVEO Creator Pro.
> * Check the PDF to find which pages have images of interest and emboss
>   those pages.
>
> It seems that checking the pdf to find which pages have images would be
> similar to checking a screen shot of a page to find and crop the image. It
> seems that you would need to be able to see the pdf on the IVEO screen to
> know if it contains an image. I am working with pdf files containing
> anywhere between 30 and 80 pages. Embossing every page in order to identify
> the pages that contain images would not be practical.
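
One possible way to find the pages with images without looking at or embossing them is to lean on an extractor that puts the page number in each output file name, as PDF2Images is described as doing elsewhere in this thread. The sketch below is only an illustration: the `page003_img01.bmp` naming scheme and the `images_per_page` helper are hypothetical, and the pattern would need to be adjusted to the real file names.

```python
import re
from collections import Counter

# Hypothetical naming scheme: "page003_img12.bmp". The real extractor's
# file names may differ; adjust the pattern to match its actual output.
PAGE_PATTERN = re.compile(r"page(\d+)", re.IGNORECASE)

def images_per_page(file_names):
    """Count extracted image files per page number found in each name."""
    counts = Counter()
    for name in file_names:
        match = PAGE_PATTERN.search(name)
        if match:
            counts[int(match.group(1))] += 1
    return dict(sorted(counts.items()))

# Made-up sample names; in practice these would come from os.listdir().
sample = [
    "page003_img01.bmp",
    "page003_img02.bmp",
    "page007_img01.bmp",
    "notes.txt",
]
for page, count in images_per_page(sample).items():
    print(f"page {page}: {count} image file(s)")
```

The printed list is plain text, so a screen reader can read it. A page with an unusually large count may be one where a single figure was stored as many fragments.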
>
> Dick Baldwin
>
> On Fri, Jan 27, 2012 at 11:48 AM, Richard Baldwin
> <baldwin at dickbaldwin.com> wrote:
>
> Amanda and others,
>>
>> I have contacted Adobe technical support. Their solution to the problem is
>> to purchase Acrobat Pro for $445.00. The tech support rep told me that
>> their program will extract the pictures intact as separate bitmap files.
>>
>> Dick Baldwin
>>
>>
>> On Fri, Jan 27, 2012 at 10:44 AM, Michael Whapples <mwhapples at aim.com> wrote:
>>
>> Hello,
>>> From what you are describing, my feeling is that the diagrams/images in
>>> the PDF in question are created from a number of drawing elements rather
>>> than a single image object. I'm not an expert on PDF, but I think you could
>>> think of it like the difference between a bitmap, which is a single element
>>> (I think PDF has a way to specify the start of a stream object like a
>>> bitmap), and an SVG, which is formed from lots of elements like lines and
>>> shapes (I think in PDF the lines and such like can be created with basic
>>> PDF drawing facilities, so they are not in a separate object). When the
>>> image is formed from lots of elements, it may be hard for the software to
>>> know what makes up a given diagram in the book/document; it just lays it
>>> out as specified and you work out what's related. I think one way to tell
>>> whether you have this sort of image is to see if NVDA will read some of the
>>> text labels of the image; if it does, then it's not a pure bitmap (you
>>> probably could use the Read Out Loud function of Adobe Reader as well).
>>> Therefore I imagine that without clever recognition algorithms you are
>>> unlikely to get something which will extract it as you want.
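
A rough way to probe how a given PDF stores its pictures is to count the image objects in the raw file. In PDF, a bitmap is normally stored as an image XObject whose dictionary contains `/Subtype /Image`, while lines and shapes live in page content streams and do not produce such entries. The sketch below is a heuristic only: it ignores inline images, could in principle match the same bytes inside unrelated stream data, and the `count_image_objects` name is made up for illustration.

```python
import re

# Image XObjects carry "/Subtype /Image" in their dictionaries; the
# whitespace between the two names is optional.
IMAGE_MARKER = re.compile(rb"/Subtype\s*/Image\b")

def count_image_objects(pdf_bytes):
    """Heuristic count of image XObject dictionaries in raw PDF bytes."""
    return len(IMAGE_MARKER.findall(pdf_bytes))

# Tiny hand-made fragment for illustration, not a complete PDF file.
fragment = (
    b"5 0 obj << /Type /XObject /Subtype /Image /Width 8 /Height 8 >> endobj\n"
    b"6 0 obj << /Type /XObject /Subtype/Image /Width 4 /Height 4 >> endobj\n"
    b"7 0 obj << /Type /XObject /Subtype /Form >> endobj\n"
)
print(count_image_objects(fragment))
```

For a real file, the bytes would come from something like `open("chapter.pdf", "rb").read()`, where the file name is a placeholder. A count near zero would be consistent with figures drawn from vector elements, while a count far larger than the number of visible figures would be consistent with figures stored as many small bitmap pieces.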
>>>
>>> There is one option I am aware of for a blind person to do this
>>> independently, IVEO like John suggested. However, IVEO isn't a cheap
>>> option, and how much is to be done would determine whether it's worth the
>>> money if providing accessible diagrams from PDF was its only use. IVEO
>>> does not require a Tiger printer; swell paper would work, and other
>>> embossers may (the outputting from IVEO is the question, as I think it may
>>> only output to devices appearing as standard printers). Interestingly, the
>>> IVEO route again requires a human to make the decision on what forms the
>>> diagram.
>>>
>>> Michael Whapples
>>>
>>> -----Original Message----- From: Richard Baldwin
>>> Sent: Friday, January 27, 2012 3:28 PM
>>> To: Jamal Mazrui
>>> Cc: Blind Math list for those interested in mathematics
>>> Subject: Re: [Blindmath] Extracting bitmap images from pdf files
>>>
>>>
>>> Hi Jamal,
>>>
>>> It is a great program, easy to use, and probably totally accessible. I
>>> particularly like the fact that the program doesn't require a Windows
>>> installation. The output data is well organized, and including the page
>>> numbers in the bmp file names is a great help in analyzing them.
>>>
>>> Unfortunately, the output produced by the program suffers from the same
>>> issues that I have encountered with all of the other image extractor
>>> programs that I have tried. A few of the images come out intact. Most of
>>> the images don't come out intact.
>>>
>>> For example, page three of one of the pdf files that I tested has a single
>>> image of a battery. It is the same image that I enhanced and posted in an
>>> earlier post. Your program produced 54 bmp files for that page. A few of
>>> them were icons such as arrows, exclamation marks, etc. The remaining bmp
>>> files appear to be very small pieces of the image of the battery. By the
>>> way, I got the earlier image of the battery by taking a screen shot of the
>>> page and using an image editing program to crop out the battery image. None
>>> of the image extraction programs that I have tested extract the image
>>> intact.
>>>
>>> I don't know anything at all about the internal structure of pdf files, and
>>> this behavior of breaking an image into many small pieces may depend on how
>>> the file is constructed in the first place. In any event, my immediate
>>> problem has to do with a specific set of pdf files that are the chapters
>>> from a specific physics book, so this program doesn't solve my problem.
>>>
>>> Thanks for offering the program.
>>> Dick Baldwin
>>>
>>> On Fri, Jan 27, 2012 at 5:18 AM, Jamal Mazrui <empower at smart.net> wrote:
>>>
>>>> In an attempt to facilitate a free, non-web-dependent solution, I have
>>>> written a Windows console-mode utility called PDF2Images, built with
>>>> PowerBASIC and a PDF library. The distribution archive, including
>>>> documentation and source code, is available at
>>>>
>>>> http://empowermentzone.com/pdf2images.zip
>>>>
>>>>
>>>> I am interested in any feedback on how well it works compared to other
>>>> approaches.
>>>>
>>>> Jamal
>>>>
>>>>
>>>>
>>>>
>>>>
>>> --
>>> Richard G. Baldwin (Dick Baldwin)
>>> Home of Baldwin's on-line Java Tutorials
>>> http://www.DickBaldwin.com
>>>
>>> Professor of Computer Information Technology
>>> Austin Community College
>>> (512) 223-4758
>>> mailto:Baldwin at DickBaldwin.com
>>> http://www.austincc.edu/baldwin/
>>> _______________________________________________
>>> Blindmath mailing list
>>> Blindmath at nfbnet.org
>>> http://nfbnet.org/mailman/listinfo/blindmath_nfbnet.org
>>>
>>> To unsubscribe, change your list options or get your account info for
>>> Blindmath:
>>> http://nfbnet.org/mailman/options/blindmath_nfbnet.org/mwhapples%40aim.com
>>
>>
>
>
--
Richard G. Baldwin (Dick Baldwin)
Home of Baldwin's on-line Java Tutorials
http://www.DickBaldwin.com
Professor of Computer Information Technology
Austin Community College
(512) 223-4758
mailto:Baldwin at DickBaldwin.com
http://www.austincc.edu/baldwin/