[Blindmath] PDF Postscript Latex - and "Kerligs"

Bernard M Diaz b.m.diaz at liverpool.ac.uk
Fri Mar 20 19:01:55 UTC 2009


Hi,

> I've been watching the discussion re LaTeX and pdf documents over the 
> last week or more and now have clarified something for myself. 
> Hopefully my comments prove useful for others.

(La)TeX (TeX by Knuth, the doyen and theory guru of computer science;
LaTeX built on top of it by Leslie Lamport) is intended for those
seeking precise control over their documents, while Adobe's pdf (other
PDFz exist!) is intended as a "document description data format" for
use with a precise "renderer" (a computer program that converts data
from one form to another, usually 2D, form).

My guess is that LaTeX is good for the blindmath community because
of the precision of control associated with it. The downside, as
Michael W and others have pointed out, is that a) it must be learned,
and b) how does one check "precision"? e=mc**2 is precise only to
me, though I guess you can all attempt a guess at what I mean.
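
To make the point concrete, here is how LaTeX source would pin down
two readings a listener might guess at. This is only a sketch; the
second reading is an illustrative alternative, not something anyone
in the discussion proposed.

```latex
% Reading 1: only c is squared.
\[ e = m c^{2} \]

% Reading 2: the product mc is squared.
\[ e = (m c)^{2} \]
```

The braces and parentheses carry the structure explicitly, which is
what the linear form e=mc**2 leaves to the reader.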

The last point concerns precision, and you'll note I've already
used several implicit (and confusing) definitions of it. Precision
is tough to achieve, and when you do achieve it, one peep's precision
is another person's look-see.

> As part of being a lecturer I am often making notes for students. 
> LaTeX has been my tool of choice for quite some time. I used to 
> generate post script files and print them for the students. With our 
> moves towards less paper and more electronic delivery, I am now 
> making more pdf documents for them instead.

Nothing actually wrong with (Adobe's) Postscript. It's a
programming language and can "precisely" describe any 2 or 3D
scene, and in ASCII to boot. The problems are that a) it is
normally written by and for computers, and b) when it comes to
coding pixel maps (images) it uses one of the neatest things in
computer science (the image operator), but this, and the image
generated, are incomprehensible. This is not a problem with
Postscript but a problem of "images", as everyone in this
discussion group will no doubt appreciate.

> I have also been reading more and more papers from other authors in 
> pdf format via our library's connection to various journal databases.

PDF is an evolving format (Adobe has released control of it, I
understand); the recent Wikipedia page is excellent. It (pdf,
not the article) suffers the same problem as Postscript, namely
the one associated with "rendering" it to the 2D computer screen.

> I have wondered for some time why various different sources led to 
> different quality of pdf documents and have at least for my own pdf 
> creation found out a few things this year.

Entirely to do with the evolution of the format and the nature
of renderers. There is an issue with any PDF that embeds an
image of algebra, of line graphics, or indeed any pix; but that
is another matter.

> Regardless of the source file I use, I find the following results.

Your observations are accurate, but they need the caveat that they
depend on the PDF format version and the renderer being used.

> 1. We can't read post script files - well known fact.

But not one known by, or to me ... :-) It would be more accurate
to say "Most Postscript is not intended to be read by humans ..."

> 2. we can't read the majority of equations in pdf files. (also a well 
> known fact)

Depends on pdf version and whether pix/bitmaps are in use. Adobe
pdfz are the next generation of Adobe Postscript and suffer exactly
the same problems as Postscript. It's just that there is more in
there. Some of this, e.g. scalable vector graphics (the idea behind
SVG), allows the use of affine transformations on real-number-based
data capturing vector representations ... and a deal else ...

The idea has been around for a long time, was taken up by Sun,
and when embedded into XML by Microsoft gained an entirely
new lease of life ~last year. In SVG form, it is immediately of
use to us. However, that depends on whether the math meta structure
has been explicitly captured, cf. e=mc**2 above, which needs
someone to explain the meta structure "exponent", among other
things!
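
For anyone who has not met the affine transformations mentioned
above, here is a minimal Python sketch. It assumes the six-number
matrix order (a, b, c, d, e, f) used by Postscript and PDF matrices
and by SVG's matrix(); the function names apply_affine and compose
are my own inventions for illustration, not part of any of those
formats.

```python
from typing import Tuple

Point = Tuple[float, float]
# Six numbers (a, b, c, d, e, f): x' = a*x + c*y + e, y' = b*x + d*y + f.
Matrix = Tuple[float, float, float, float, float, float]


def apply_affine(m: Matrix, p: Point) -> Point:
    """Map one point through the affine transformation m."""
    a, b, c, d, e, f = m
    x, y = p
    return (a * x + c * y + e, b * x + d * y + f)


def compose(first: Matrix, second: Matrix) -> Matrix:
    """Return the single matrix equivalent to applying first, then second."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = second
    return (
        a2 * a1 + c2 * b1,
        b2 * a1 + d2 * b1,
        a2 * c1 + c2 * d1,
        b2 * c1 + d2 * d1,
        a2 * e1 + c2 * f1 + e2,
        b2 * e1 + d2 * f1 + f2,
    )


# Hypothetical example: double the size of a shape, then shift it.
scale = (2.0, 0.0, 0.0, 2.0, 0.0, 0.0)
translate = (1.0, 0.0, 0.0, 1.0, 10.0, 5.0)
combined = compose(scale, translate)

print(apply_affine(combined, (1.0, 1.0)))  # prints (12.0, 7.0)
```

Note what is and is not captured here: the numbers say exactly where
each point lands, but nothing says what the shape means.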

> 3. many pdf files do not convert to text cleanly as spaces are either 
> included in funny places and then not in others. Line breaks between 
> words is another frustration.

PDFs are generated by a renderer, and it is the renderer that
is the issue. Many, for example, take spaces and replace them
with "tab" based output and then rely on hardware-assisted
tab-stopping; or use "table" formats that move the cursor position
to an arbitrary fixed point, defined in a separate "table language".
[The latter is often a problem if you seek to output "computer code"
using a fixed-width, column-adjusted format ... as I did in an exam
question recently. Jaws read the pdf generated from my MiKTeX
rendition of LaTeX source as "table of 6 columns, row 1 ..." which is
not at all like the program code it was supposed to be.]
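
To illustrate why spaces turn up in funny places, here is a rough
Python sketch of what a text extractor has to do when a page holds
only positioned glyphs and no space characters. Everything in it is
an assumption made for illustration: the (character, x, width) glyph
tuples, the function name glyphs_to_text, and the gap_ratio threshold
are hypothetical and do not describe how Jaws, Acrobat or any
particular renderer actually works.

```python
from typing import List, Tuple

# (character, x position of left edge, advance width), all in the same units.
Glyph = Tuple[str, float, float]


def glyphs_to_text(glyphs: List[Glyph], gap_ratio: float = 0.3) -> str:
    """Rebuild one line of text, guessing a space wherever the gap is large."""
    out = []
    prev_end = None
    prev_width = 0.0
    for ch, x, width in sorted(glyphs, key=lambda g: g[1]):
        if prev_end is not None:
            gap = x - prev_end
            # Heuristic: a gap wider than a fraction of a glyph is a space.
            if gap > gap_ratio * max(prev_width, width):
                out.append(" ")
        out.append(ch)
        prev_end = x + width
        prev_width = width
    return "".join(out)


# "to be" laid out with a 3-unit gap between the words.
line = [("t", 0.0, 5.0), ("o", 5.0, 5.0), ("b", 13.0, 5.0), ("e", 18.0, 5.0)]
# Two letters of one word, slightly letter-spaced.
spaced = [("a", 0.0, 5.0), ("b", 5.5, 5.0)]

print(glyphs_to_text(line))                    # prints "to be"
print(glyphs_to_text(line, gap_ratio=1.0))     # prints "tobe"
print(glyphs_to_text(spaced, gap_ratio=0.05))  # prints "a b"
```

The same glyphs give different text depending on the threshold, which
is one plausible reason why different routes to a pdf read so
differently.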

> 4. If I make a pdf straight from the source code it is often a mess (point 3).

Ah! That depends on your source and renderer. Other renderers may
not generate quite the same mess. Trouble is, there is no standard
for renderer writers to follow, and none of them (to my knowledge)
reads the Blindmath pages.

> 5. If I make the dvi file and then convert to pdf the problems with 
> point 3 remain.

OK. The conversion to that other CS guru's (David Fuchs) dvi format
is done by one renderer. The dvi format is (dangerous statement here)
'closer to SVG' and makes conversion to other formats, by other
renderers, easier. Converting directly to pdf allows the renderer
to take advantage of things that pdf allows (e.g. tabs) ...

> 6. When I go through the process of making the dvi file, then the 
> post script file and then  making the pdf from the post script file 
> it ends up considerably easier to read the text with jaws.

... going via Postscript is more restrictive, and so (oh dear,
I'm saying it again) closeness to SVG means slightly fewer
problems for blindmath'ors.

> 7. Points 4 and 5 also lead to character strings involving an f are 
> often not converted to text properly. This includes the strings "ff", 
> "fi", and "ffi" just to illustrate three different problems. The 
> laborious creation of the pdf (point 6) seems to work for these 
> character strings.

Kerning seeks to avoid space between letters that "seem to fit
together visually", e.g. capital T followed by an a tucked under
the right arm of the T. Ligatures are similar: letters that
fit together, e.g. ff, fj etc. (usually involving ascenders, the
top bit of the f, and descenders, the bottom bit of the j). Knuth
klikes kerligs (kerning plus ligatures). They render the font
"beautiful", so there are a lot of them about with him, and the
rules are a nice bit of computer science, by the way. Many font
designers too like kerligs (notably Eric Gill, understandably I
believe one of Knuth's "heroes", because of his beautiful fonts).

All rather lost though if you want to get to the algebra ...

Ligature pairs (and triplets) are often just a special single
character in the font, and cause problems for Jaws etc., unless Jaws
is wised up with a .sbl file entry, or some such, for the font and
its usage.
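
As a small illustration of the "special single character" problem,
here is a Python sketch that expands ligature characters back into
ordinary letters. It assumes the text pulled out of the pdf contains
the Unicode ligature code points U+FB00 to U+FB04, which is not
guaranteed: some fonts map the ligature glyph to something else
entirely, and then this does nothing. The table and the function name
expand_ligatures are hypothetical, made up for this example.

```python
# Unicode "Alphabetic Presentation Forms" ligatures and their plain spellings.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# str.maketrans accepts a dict mapping single characters to strings.
_TABLE = str.maketrans(LIGATURES)


def expand_ligatures(text: str) -> str:
    """Replace each single-character ligature with its separate letters."""
    return text.translate(_TABLE)


sample = "e\ufb03cient \ufb01t"      # "efficient fit" with ffi and fi ligatures
print(len(sample))                    # prints 10: each ligature counts as one character
print(expand_ligatures(sample))       # prints "efficient fit"
```

An explicit table is used rather than a blanket Unicode normalisation
so that other characters (superscripts in algebra, say) are left
alone.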

> The first few points are thoroughly discussed on this list from time 
> to time. Now if someone can suggest why these other observations are 
> so then maybe I'll learn how to shorten the time taken to get 
> documents that I can at least partially check myself.

OKaay ... can of worms opened and crawling about ... and apologies
to all who believe this has added heat rather than light.

If your algebra is as simple as possible, especially avoiding
explicit spatial alignments, then most renderers will make
a reasonable fist of generating a precise output. If in contrast
there are issues with the linearisation, e.g. of matrices, "piles"
as in algebra sequences, etc ... then there will be precision
issues.

> For the record, the pdf documents that are typeset in LaTeX that have 
> no Greek or complex mathematical operators and  symbols are beautiful 
> to work with, especially with the more recent versions of jaws and 
> acrobat reader. I've come to really like the hyper linking that is 
> possible with LaTeX and pdf documents and given I can read the 
> equations in source code things are really getting quite sweet. Now 
> I'm off to solve the small problem of graphs in statistical analyses <smiles>.

Ah! but you see, you've hit the nail on the head here. A
sinuous line (of more or less snakiness) is what needs to
be captured - the precision required for that is a tad
more tricky to capture than any font however beautiful
which ultimately is what all algebra comes down to!
(whoops ... do I smell burning?)

Knuth has not spoken on this, and in the meantime we must rely
on John Gardner's nice bit of "feely kit" ... which by the way
uses SVGz! e-Nuf said, (and no! I am not an employee, nor
spokesman for him, or his company - if that statement helps
me avoid the flames).

And so, to hell - best regard - Diz.

> Cheers,
> Jonathan
> 
> _____
> Dr A. Jonathan R. Godfrey
> Lecturer in Statistics
> Institute of Fundamental Sciences
> Massey University
> Palmerston North
> Phone: +64-6-356 9099 ext 7705
> Mobile: +64-29-538-9814
> Room: AH2.82
> 
> Home Address: 22 Bond St, Palm. Nth.
> Home Phone: +64-6-353 2224 (or FleaBag if you prefer to remember it that way)

O mio Fleabag, surely ... for us internationals :-)



