[Blindmath] reading math notation in MS Word

Tue Jul 21 19:10:35 UTC 2015

Hello,

As a blind scientist I have an interest in publishing articles in the fields of science, technology, engineering, or mathematics (STEM) and in collaborating with sighted colleagues on this work. My main interest is the underlying science or mathematical principles of the subject matter itself. As lead author I also care about aspects of the document such as layout, and I wish to review the placement of commas, spaces, and periods throughout the document. I sometimes write articles using Windows 7 and Microsoft Word 2010. Sighted colleagues sometimes prefer not to author using the MathType add-on by Design Science. The set-up I am going to describe assumes that I am writing in MS Word without using MathType. If my colleagues did use the MathType add-on, the collaboration process would be much more accessible. I am interested in hearing any suggestions that you might have, and in discussing with Microsoft and other companies a path forward that results in fully accessible STEM information. Please send your suggestions to the e-mail list at nfb-science at nfbnet.org or to me directly at johnmillerphd at hotmail.com.

It is possible in an MS Word document to make a subscript by going to the font section of the home menu and selecting subscript. The superscript and subscript information in narrative text can be captured by saving the MS Word document as "web page, filtered", which creates a .htm file. When MS Word creates a .htm file, a couple of things happen. The file itself is created, as well as a folder containing a number of image files. Some of these image files are .png files. Each .png file contains a sequence of one or more math notation symbols, an entire math equation, or an image of a figure included in the document. Other image files may be .jpg files, each of which contains an image of a figure included in the document. Although a math symbol may appear multiple times in the MS Word document, MS Word generates just a single .png file for that symbol. At the location where the math symbol would normally appear, the .htm file places the name of the .png file for that symbol. Working with a sighted assistant, I created a look-up table in order to make the .htm file more accessible. Each entry pairs one of the .png file names with its accessible translation. The filtered .htm file contains the accessible translated symbols in place of the image file names.
For the moment I display the translated symbols surrounded by double quotes. I also surround the translated symbols with a prefix "<eq" and a suffix ">". I enter the translated symbols in the table by hand using LaTeX notation. Any alternate notation could be used. I wonder what prefix and suffix others would prefer.
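
The look-up step described above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the actual script: the names LOOKUP, IMG_TAG, and translate_images are hypothetical, and the single table entry is the one from the example later in this message.

```python
import re

# Hypothetical look-up table: image file name -> accessible translation
# in LaTeX notation. The one entry comes from the example in this message.
LOOKUP = {
    "image001.png": r"\sigma_{0,lin}",
}

# Matches an <img ...> tag and captures the file name at the end of the
# src path (the folder name before the last "/" is ignored).
IMG_TAG = re.compile(
    r'<img\b[^>]*?\bsrc="(?:[^"]*/)?([^"/]+)"[^>]*>', re.IGNORECASE
)

def translate_images(html, lookup):
    """Replace each <img> tag found in the table with <eq "translation">."""
    def substitute(match):
        name = match.group(1)
        if name in lookup:
            return '<eq "%s">' % lookup[name]
        # Unknown images are left untouched so that they can be added
        # to the table later.
        return match.group(0)
    return IMG_TAG.sub(substitute, html)

sample = '<img width=34 height=37 src="email_ex_files/image001.png">'
print(translate_images(sample, LOOKUP))
```

Leaving unknown image tags in place, instead of deleting them, is a design choice of this sketch: a missing table entry then stays visible in the output.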

My approach provides a good quality text document that I can read independently without sighted assistance. I read the filtered .htm file using refreshable braille and speech. Although the approach requires sighted assistance to prepare the document, preparing the document does not require too much time. Reading the document with the accessible symbols placed as they appear in the print document allows me to verify that the document is correct. The approach is also helpful when receiving an MS Word document prepared by others on a STEM topic. I am interested in exploring other approaches that may be more efficient. It would be ideal if MS Word always generated accessible .htm files. In that case the .htm files would not have references to inaccessible .png files that contain a sequence of math notation symbols or an entire math equation. I am interested in learning if there is a better way to export the Word document so that it preserves subscript information. I am interested in how to use MathType to export to LaTeX. I am also interested in how to get the MS Word file to make good quality braille using the Duxbury braille translator.

The source code for the HTML files contains much distracting markup. I wrote a Python filter that removes much of the HTML markup but keeps some of the markup that is helpful for understanding the content of the paper and some of its format. As you will remember, an HTML tag begins with "<", has some keywords, and then ends with ">". Many tags come in pairs that begin with "<key>" and end with "</key>". Here key is one of any number of commands.
A list of common HTML commands follows:
<p> - begin paragraph
<b> - begin bold
<sub> - begin subscript
<sup> - begin superscript
<span> - used to group inline elements in a document; provides no visual change by itself.
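
A filter of this kind could be sketched in Python roughly as follows. This is a minimal illustration and not the actual filter: the names KEEP, TAG, and filter_markup are hypothetical, the list of kept tags is an assumption based on the commands listed above, and the sample assumes that the image tag has already been replaced by its <eq "..."> translation.

```python
import re

# Tags kept in the output, with their attributes removed. This list is an
# assumption; every other tag (for example span) is deleted.
KEEP = {"p", "b", "sub", "sup"}

# Matches an opening or closing tag: optional "/", tag name, attributes.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)\b[^>]*>")

def filter_markup(html):
    """Strip attributes from kept tags and delete all other tags."""
    def simplify(match):
        slash, name = match.group(1), match.group(2).lower()
        if name == "eq":
            # Translated math markers pass through unchanged.
            return match.group(0)
        if name in KEEP:
            return "<%s%s>" % (slash, name)
        return ""
    return TAG.sub(simplify, html)

sample = r'''<p class=MsoNormal>Initialize a matrix X<sub>m,l,FRU</sub> setting each element
to <span
style='font-size:10.0pt;position:relative;
top:3.0pt'><eq "\sigma_{0,lin}"></span> as
given in Equation 6.</p>'''

print(filter_markup(sample))
```

Only the tags are removed; the text between them, including line breaks, is left as it was.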

Here is an example piece of a .htm file that contains 7 lines, as well as a filtered example output.
The filtered example output is at the end of this file and begins after ".htm filtered example output:".

.htm example input:
<p class=MsoNormal><b>1.1 Sample Heading</b></p>

<p class=MsoNormal>Initialize a matrix X<sub>m,l,FRU</sub> setting each element
to <span
style='font-size:10.0pt;font-family:"Times New Roman","serif";position:relative;
top:3.0pt'><img width=34 height=37 src="email_ex_files/image001.png"></span> as
given in Equation 6.</p>

.htm filtered example output:
<p><b>1.1 Sample Heading</b></p>
<p>Initialize a matrix X<sub>m,l,FRU</sub> setting each element
to <eq "\sigma_{0,lin}"> as
given in Equation 6.</p>

very best,
John Miller