[Blindmath] Extracting bitmap images from pdf files

Richard Baldwin baldwin at dickbaldwin.com
Fri Jan 27 18:16:03 UTC 2012


Amanda, you will need to talk to the folks at OSD or directly to the
publisher regarding image files.

Dick Baldwin

On Fri, Jan 27, 2012 at 12:03 PM, Amanda Lacy <lacy925 at gmail.com> wrote:

> I'd think it would be better to put that amount of money toward an IVEO
> system.
>
> What about the publisher? Do you think McGraw Hill has copies of the
> images separate from the text?
>
>
> Amanda
> ----- Original Message -----
> From: "Richard Baldwin" <baldwin at dickbaldwin.com>
> To: "Blind Math list for those interested in mathematics" <blindmath at nfbnet.org>
> Sent: Friday, January 27, 2012 11:48 AM
> Subject: Re: [Blindmath] Extracting bitmap images from pdf files
>
>
>> Amanda and others,
>>
>> I have contacted Adobe technical support. Their solution to the problem is
>> to purchase Acrobat Pro for $445.00. The tech support rep told me that
>> their program will extract the pictures intact as separate bitmap files.
>>
>> Dick Baldwin
>>
>> On Fri, Jan 27, 2012 at 10:44 AM, Michael Whapples <mwhapples at aim.com>
>> wrote:
>>
>>> Hello,
>>> From what you are describing, my feeling is that the diagrams/images in
>>> the PDF in question are created from a number of drawing elements rather
>>> than a single image object. I'm not an expert on PDF, but I think you
>>> could think of it like the difference between a bitmap being a single
>>> element (I think PDF has a way to specify the start of a stream object
>>> like a bitmap) and an SVG being formed from lots of elements like lines
>>> and shapes (I think in PDF the lines and such like can be created with
>>> basic PDF drawing facilities, so they are not in a separate object). When
>>> the image is formed from lots of elements, it may be hard for the
>>> software to know what makes up a given diagram in the book/document; it
>>> just lays it out as specified and you work out what's related. I think
>>> one way to tell whether you have this sort of image is to see if NVDA
>>> will read some of the text labels of the image; if it does, then it's not
>>> a pure bitmap (you could probably use the read out loud function of
>>> Adobe Reader as well). Therefore I imagine that without clever
>>> recognition algorithms you are unlikely to get something which will
>>> extract it as you want.
>>>
>>> There is one option I am aware of for a blind person to do this
>>> independently: IVEO, as John suggested. However, IVEO isn't a cheap
>>> option, and how much is to be done would determine whether it's worth
>>> the money if providing accessible diagrams from PDF was its only use.
>>> IVEO does not require a Tiger printer; swell paper would work, and other
>>> embossers may (the outputting from IVEO is the question, as I think it
>>> may only output to devices appearing as standard printers).
>>> Interestingly, the IVEO route again requires a human to make the
>>> decision on what forms the diagram.
>>>
>>> Michael Whapples
>>>
>>> -----Original Message-----
>>> From: Richard Baldwin
>>> Sent: Friday, January 27, 2012 3:28 PM
>>> To: Jamal Mazrui
>>> Cc: Blind Math list for those interested in mathematics
>>> Subject: Re: [Blindmath] Extracting bitmap images from pdf files
>>>
>>>
>>> Hi Jamal,
>>>
>>> It is a great program, easy to use, and probably totally accessible. I
>>> particularly like the fact that the program doesn't require a Windows
>>> installation. The output data is well organized and including the page
>>> numbers in the bmp file names is a great help in analyzing them.
>>>
>>> Unfortunately, the output produced by the program suffers from the same
>>> issues that I have encountered with all of the other image extractor
>>> programs that I have tried. A few of the images come out intact. Most of
>>> the images don't come out intact.
>>>
>>> For example, page three of one of the pdf files that I tested has a
>>> single image of a battery. It is the same image that I enhanced and
>>> posted in an earlier post. Your program produced 54 bmp files for that
>>> page. A few of them were icons such as arrows, exclamation marks, etc.
>>> The remaining bmp files appear to be very small pieces of the image of
>>> the battery. By the way, I got the earlier image of the battery by
>>> taking a screen shot of the page and using an image editing program to
>>> crop out the battery image. None of the image extraction programs that I
>>> have tested extract the image intact.
>>>
>>> I don't know anything at all about the internal structure of pdf files,
>>> and this behavior of breaking an image into many small pieces may depend
>>> on how the file is constructed in the first place. In any event, my
>>> immediate problem has to do with a specific set of pdf files that are
>>> the chapters from a specific physics book, so this program doesn't solve
>>> my problem.
>>>
>>> Thanks for offering the program.
>>> Dick Baldwin
>>>
>>> On Fri, Jan 27, 2012 at 5:18 AM, Jamal Mazrui <empower at smart.net> wrote:
>>>
>>>> In an attempt to facilitate a free, non-web dependent solution, I have
>>>> written a Windows console-mode utility called PDF2Images, built with
>>>> PowerBASIC and a PDF library.  The distribution archive, including
>>>> documentation and source code, is available at
>>>>
>>>> http://empowermentzone.com/pdf2images.zip
>>>>
>>>>
>>>> I am interested in any feedback on how well it works compared to other
>>>> approaches.
>>>>
>>>> Jamal
>>>>
>>>>
>>>>
>>>>
>>>>
>>> --
>>> Richard G. Baldwin (Dick Baldwin)
>>> Home of Baldwin's on-line Java Tutorials
>>> http://www.DickBaldwin.com
>>>
>>> Professor of Computer Information Technology
>>> Austin Community College
>>> (512) 223-4758
>>> mailto:Baldwin at DickBaldwin.com
>>> http://www.austincc.edu/baldwin/
>>> _______________________________________________
>>> Blindmath mailing list
>>> Blindmath at nfbnet.org
>>> http://nfbnet.org/mailman/listinfo/blindmath_nfbnet.org
>>>
>>> To unsubscribe, change your list options or get your account info for
>>> Blindmath:
>>> http://nfbnet.org/mailman/options/blindmath_nfbnet.org/mwhapples%40aim.com
>>>
>>>
>>>
>>
>>
>> --
>> Richard G. Baldwin (Dick Baldwin)
>> Home of Baldwin's on-line Java Tutorials
>> http://www.DickBaldwin.com
>>
>> Professor of Computer Information Technology
>> Austin Community College
>> (512) 223-4758
>> mailto:Baldwin at DickBaldwin.com
>> http://www.austincc.edu/baldwin/
>>
>
>
>



-- 
Richard G. Baldwin (Dick Baldwin)
Home of Baldwin's on-line Java Tutorials
http://www.DickBaldwin.com

Professor of Computer Information Technology
Austin Community College
(512) 223-4758
mailto:Baldwin at DickBaldwin.com
http://www.austincc.edu/baldwin/


