[Blindmath] pdf intrigue

Rich Caloggero rjc at MIT.EDU
Mon Mar 23 19:20:47 UTC 2009


> The reason for representing strings such as "fi" as single combined
> characters has to do with kerning or ligatures - I can't remember which
> one now, but they're both standard typographical techniques which TeX
> supports, and which improve the quality of the typeset print.

It's odd that producing a PDF directly from source gives the worst results. 
This, to my mind, is the only way to avoid this ligature issue: since the 
exact text is known beforehand, it should be available for retrieval 
when the access technology goes to read the PDF.
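
As a rough illustration of the ligature issue: if a text converter hands 
back the single combined character instead of the separate letters, the 
text can sometimes be repaired afterwards. The sketch below assumes the 
converter emits the Unicode ligature code points (U+FB00 to U+FB04); that 
is an assumption on my part, and it will not help if the converter emits 
something else for those glyphs. The function name expand_ligatures is 
made up for this example.

```python
# Map the common Latin "f" ligature characters back to plain letters.
# Only the five code points below are handled; everything else passes
# through unchanged, so mathematical symbols are left alone.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# str.translate needs a table keyed by code point.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Return text with the ligature characters above spelled out."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # "office file" written with the "ffi" and "fi" ligature characters.
    print(expand_ligatures("o\ufb03ce \ufb01le"))
```

An explicit table is used rather than a blanket Unicode normalization so 
that nothing else in a mathematical document gets rewritten by accident.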

Friends have had good luck with OpenOffice.org in terms of PDF creation. If 
the original document is created using proper tagging of headings, lists, 
tables, etc., OpenOffice.org will respect these structures and convert them 
into correct PDF tags which your adaptive technology can recognize.  The 
issue for me with respect to OpenOffice is the interface. It doesn't work 
well with Jaws on Windows. It's somewhat better on Linux with Orca, but Word 
on Windows still provides a much better interface overall.  Word's 
conversion to PDF is not as good as that done by OpenOffice, however, so use 
OpenOffice to produce PDFs of real-world documents if possible.


Just my two cents...
-- Rich

----- Original Message ----- 
From: "Jason White" <jason at jasonjgw.net>
To: <blindmath at nfbnet.org>
Sent: Friday, March 20, 2009 12:47 AM
Subject: Re: [Blindmath] pdf intrigue


> Jonathan Godfrey <a.j.godfrey at massey.ac.nz> wrote:
>> 3. many pdf files do not convert to text cleanly as spaces are either
>> included in funny places and then not in others. Line breaks between
>> words is another frustration.
>
> I think this is the result of proportional spacing. Basically, Postscript
> and PDF use layout operators to control the position of each character
> precisely. Software that converts the PDF to text has to examine the
> spacing and determine where to place spaces in the output file.
>
> Good typesetting software such as TeX adjusts the spacing between
> characters and between words so as to align both the left and right
> edges of the printed text, which makes the print easier to read when it
> is done well. TeX has a reputation for being particularly good in this
> regard.
>
> Thus the effect of the high-quality justification algorithms is to make it
> harder for PDF to text converters to determine the spacing between words
> correctly.
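
To make the point above concrete, here is a minimal sketch of how a 
converter might guess at word breaks from character positions alone. It 
is not how any particular converter works; the function glyphs_to_text, 
the (character, left edge, right edge) input format and the gap_ratio 
threshold are all assumptions made up for illustration.

```python
def glyphs_to_text(glyphs, gap_ratio=0.3):
    """Rebuild one line of text from positioned characters.

    glyphs: list of (char, x_left, x_right) tuples, sorted left to right.
    A space is inserted wherever the gap between two neighbouring
    characters exceeds gap_ratio times the mean character width.
    """
    if not glyphs:
        return ""
    widths = [right - left for _, left, right in glyphs]
    mean_width = sum(widths) / len(widths)
    threshold = gap_ratio * mean_width

    out = [glyphs[0][0]]
    for (_, _, prev_right), (ch, left, _) in zip(glyphs, glyphs[1:]):
        if left - prev_right > threshold:
            out.append(" ")  # gap looks wide enough to be a word break
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # Tight spacing inside a word, then a wide gap before the next letter.
    print(glyphs_to_text([("a", 0, 5), ("b", 5.5, 10.5), ("c", 14, 19)]))
    # The same letters stretched apart by justification: the guess fails.
    print(glyphs_to_text([("a", 0, 5), ("b", 7, 12)]))
```

With a fixed threshold like this, a line that has been stretched to 
reach the right margin can pick up spaces in the middle of words, and a 
tightly set line can lose them between words.
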
>> 4. If I make a pdf straight from the source code it is often a mess
>> (point 3).
>> 5. If I make the dvi file and then convert to pdf the problems with
>> point 3 remain.
>> 6. When I go through the process of making the dvi file, then the post
>> script file and then  making the pdf from the post script file it ends up
>> considerably easier to read the text with jaws.
>
> I don't know, but I would suggest running pdftotext on the file to see
> whether it does a better job. It's available for Linux - I'm not a
> Windows user, thus I can't comment about Jaws or Adobe.
>> 7. Points 4 and 5 also lead to character strings involving an f are
>> often not converted to text properly. This includes the strings "ff",
>> "fi", and "ffi" just to illustrate three different problems. The
>> laborious creation of the pdf (point 6) seems to work for these
>> character strings.
>
> In some fonts, those strings are actually represented as single characters
> rather than as two separate characters, and the problem is that your text
> converter isn't recognizing this.
>
> The reason for representing strings such as "fi" as single combined
> characters has to do with kerning or ligatures - I can't remember which
> one now, but they're both standard typographical techniques which TeX
> supports, and which improve the quality of the typeset print.