[Blindmath] problems reading PDF scientific documents

Jonathan Godfrey a.j.godfrey at massey.ac.nz
Wed Jun 6 20:52:07 UTC 2012


Further to Michael's message below, I'd add:

The best Word-to-LaTeX solutions I've tried rely 
on tools like MathType to convert the equations.

The conversions take the highly formatted Word 
document and try to get LaTeX to replicate that 
same look. This means the conversion produces very 
verbose LaTeX that is difficult to read. 
I've taken the conversion table used by my 
favourite tool and deleted as much of the 
formatting as I could so that the output file is 
sufficiently clean. That tool is now a commercial 
product, with conversion assistance built into the service (at a price).

My preferred option now is to use the MathType to 
LaTeX conversion built into MathType 6. I open 
the Word document, select all, and then toggle 
the equations into LaTeX. The result is fairly 
readable, although the LaTeX still isn't as clean 
as I'd prefer. At least I understand why it isn't 
as clean as I'd like, and that's good enough for 
me. This means the tool mentioned above is somewhat redundant now.

Michael mentioned math objects being embedded 
properly in PDF documents. Documents starting 
life as LaTeX currently do not meet the PDF 
standards for accessibility, and work is being 
done on this. I read a paper by Ross Moore (2009) 
that exposes the problems and describes the work 
being done back then. Following up that thread is on my to-do list.

I still believe the best way forward on PDF is to 
ask the author for the original document, even if 
it is a draft version. Converting the LaTeX 
source to an XML document is easy enough (once 
learned) and the results are often good enough to 
rely on. Note that, because of my job, my standard 
is reliability, not just interpretability. This 
means I still need to have equations taken from 
many PDF documents checked by sighted assistance. 
Paranoia is a beautiful thing if used correctly. <sigh>

I do find some difficulty with Word documents 
created in older versions of MathType or by 
people whose actual machine is a Mac. Something 
seems to have changed in documents that were 
edited using Mac versions of Word circa 2003. 
Many equations were corrupted, either by the 
particular software used or by the person driving 
it, but I throw my hands up when faced with a 
document that dates back to that era. Yes, we work 
on these documents all the time in my job; most 
lecturers don't have time to start from scratch 
when writing their notes. We build on the work of 
others all the time, especially in lower-level 
undergraduate courses where very little has 
changed for decades (statistics) or centuries 
(mathematics) aside from the software we use to 
assist us. On this basis I'd hope we could find 
sufficient resources to back up the ones chosen 
by lecturers, so that when the official modes fail 
us, the blind network gives us a safety net.

Jonathan

At 10:28 p.m. 6/06/2012, you wrote:
>I personally have not used infty reader; however, 
>I have heard from some people that they have 
>found it useful. One such person is John Miller, 
>who I think is more likely to be found on the 
>NFB-science list, though he may be on this one. I 
>have also met people here in the UK who have used 
>it and said they had some good results.
>
>However, infty reader is an optical character 
>recognition (OCR) package, so it isn't 
>guaranteed to get everything 100% correct, and the 
>quality of results is highly dependent on the 
>quality of the original document. I think OCR is 
>currently the only way PDF will be made 
>accessible for maths; even the maxtract project 
>I mentioned said that they cannot get all the 
>information they need by parsing the PDF 
>document. I believe there may be some work on 
>including maths in a PDF in an accessible form; 
>however, that is probably years away from 
>completion and will require the PDF to be 
>created in the correct way, so it won't work for current PDF documents.
>
>As for converting Word documents to LaTeX, it's 
>not something I have really done. I do know 
>there is software out there, but I haven't tried 
>any. Normally, if I have needed to read maths in 
>a Word document, I have used MathType from Design 
>Science, which can either export the document 
>to HTML with MathML, which can be viewed in 
>Internet Explorer with MathPlayer and a screen 
>reader, or toggle the equations within the Word 
>document into LaTeX notation (the rest of the 
>document is still normal Word content; only the 
>equations are in LaTeX notation).
>
>As for the price of MathType, maybe your 
>college/university has a license for it which 
>you could use; if not, the student license of MathType isn't too bad on price.
>
>Also, while on the accessibility of Word and maths, I 
>believe Design Science have created some 
>software to allow Word documents with maths to 
>be exported as DAISY books. I don't know what 
>the cost of this is like or how well it works, as I 
>haven't used it. Obviously you would also need a 
>DAISY reader which is capable of maths content 
>(I believe GH-player and Dolphin's DAISY reader 
>both can do this; there may be others).
>
>Michael Whapples
>On 6 Jun 2012, at 00:06, Géssica Michelle dos Santos Pereira wrote:
>
> > Thanks for your advice
> >
> > Michael,
> >
> > Do you think the infty reader really works? I've tried the trial
> > version, but the result is very strange... some characters are
> > left out. Is there anything I have to do before processing the PDF
> > document? I've got another problem now. It only recognizes the
> > alphanumeric characters and not the maths characters. I don't know
> > why, but the output file is a "pdf2txt" instead of a "txt".
> > Could you also tell me how you convert Word format to LaTeX?
> > I've tried GrindEQ, but I think it doesn't recognize the whole
> > document.
> > Thank you for the clue on the new software; I will be watching.
> >
> > ***
> >
> > Lucas,
> >
> > Exactly! I'm from Curitiba! =)
> > During my undergraduate studies the teachers translated the documents for me,
> > something like this: an integral became I [i, j] f(x) dx...
> > But now that I'm doing my master's degree, there are lots of materials to
> > read, and we can't keep translating like that.
> > And what do we do when the teacher doesn't know LaTeX?
> > Of course, let's keep talking!
> >
> > ***
> >
> > José,
> >
> > I've seen other software along the same lines, I mean, software to edit
> > LaTeX documents... but how do I solve the reading problem?
> >
> > ***
> >
> > Best wishes,
> >
> > Géssica Michelle
> >
> >
> >
> >
> > 2012/6/1, Michael Whapples <mwhapples at aim.com>:
> >> Hello,
> >> Further to that last message, one of the projects I am aware of which is
> >> working on the problem of accessing maths in PDF is maxtract and the
> >> blog for it can be found at
> >> http://researchblogs.cs.bham.ac.uk/math-access/category/maxtract/
> >>
> >> This is work in progress and I don't know when they plan to have
> >> something usable out.
> >>
> >> Michael Whapples
> >> On 31/05/2012 10:36, Michael Whapples wrote:
> >>> On 31/05/2012 01:26, Lucas Radaelli wrote:
> >>>> Hey Jessica,
> >>>>
> >>>> If you are from Curitiba we are from the same city! :)
> >>>>
> >>>> I have had a lot of problems in this area too, especially in Brazil,
> >>>> where it seems that nobody understands enough about it to give us a hand.
> >>>>
> >>>>
> >>>> What my teachers did during my undergraduate studies was create the
> >>>> scientific documents in LaTeX from the start, and they offered the normal
> >>>> PDF to the other students. I have never used the program that Michael
> >>>> mentioned, but I'm going to check it out. I have not opened the website
> >>>> yet, but I am almost sure that it costs more than 500 dollars... I hope
> >>>> that I am wrong!
> >>>>
> >>>> You can send me an email too; we can keep in contact to find solutions
> >>>> together here in Brazil.
> >>>>
> >>>> Greetings!
> >>>>
> >>>>
> >>>> 2012/5/30, Michael Whapples<mwhapples at aim.com>:
> >>>>> The infty reader www.inftyproject.org may help. I understand other
> >>>>> things are being developed, but this is the only working solution
> >>>>> for now.
> >>>>>
> >>>>> Michael Whapples
> >>>>>
> >>>>> Sent from my iPod
> >>>>>
> >>>>> On 30 May 2012, at 22:24, Géssica Michelle dos Santos
> >>>>> Pereira<gessicamichelle at gmail.com>  wrote:
> >>>>>
> >>>>>> Dear all,
> >>>>>>
> >>>>>> I am Gessica, from the south of Brazil. I have a visual impairment and
> >>>>>> have difficulty reading scientific PDF documents when they contain
> >>>>>> formulae. I use the JAWS screen reader, and I've heard that LaTeX or
> >>>>>> MathML might be useful, but would you know how to convert PDF
> >>>>>> documents into these formats? Moreover, how do I use them? I
> >>>>>> wonder if you could help me find a solution.
> >>>>>>
> >>>>>> Best regards,
> >>>>>>
> >>>>>> Gessica Michelle
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> Blindmath mailing list
> >>>>>> Blindmath at nfbnet.org
> >>>>>> http://nfbnet.org/mailman/listinfo/blindmath_nfbnet.org
> >>>>>> To unsubscribe, change your list options or get your account info for
> >>>>>> Blindmath:
> >>>>>> 
> http://nfbnet.org/mailman/options/blindmath_nfbnet.org/mwhapples%40aim.com
> >>>>>>
> >>>>>>
> >>> Yes, unfortunately the infty reader software is expensive. It is worth
> >>> noting, though, that infty reader is OCR software for maths, so it could
> >>> also put printed maths from paper into an electronic format such as LaTeX
> >>> or MathML.
> >>>
> >>> Hopefully one of the other projects I know of will allow access to PDF
> >>> at a lower cost.
> >>>
> >>> In many cases the PDF is created from another format such as LaTeX so
> >>> it might be worth contacting the author to see if they can provide the
> >>> document in another format.
> >>>
> >>> Michael Whapples
> >>>
> >>
> >>
> >
>
>

_____
Dr A. Jonathan R. Godfrey
Lecturer in Statistics
Institute of Fundamental Sciences
Massey University
Palmerston North

Office: Science Tower B Room 3.15
Phone: +64-6-356 9099 ext 7705
Mobile: +64-29-538-9814
Home Address: 52 Linton St, Palm. Nth.
Home Phone: +64-6-353 2224 (Just think FLEABAG) 