[Nfb-science] Accessible Math

Bryan Duarte bjduarte at asu.edu
Sun Aug 23 20:19:00 UTC 2015


Hello John,

I just wanted to send you a quick note to ask which editor you use when developing Python code. I am taking advanced algorithms this semester as I begin my master's program, and we will be using Python for most of our development. I am comfortable writing Python, but until now I have only used the standard libraries, so a text editor like Vim has been sufficient. This semester requires advanced data structures and algorithms, including binary search trees, depth-first search, breadth-first search, heaps, and image manipulation using matrix math, so I will need more than just the standard libraries. Word completion would also be very helpful when working with these libraries, if at all possible.

Do you have an editor you use when writing Python that gives you access to tools such as auto-indent, word completion, error reports, and line numbers? Thank you again, and I look forward to your reply.

Go Devils!

Bryan Duarte
ASU Software Engineering
QwikEyes CEO

> On Jul 29, 2015, at 5:48 PM, John Miller via Nfb-science <nfb-science at nfbnet.org> wrote:
> 
> Hello,
> This is John Miller. See the python script html2textForBraille.py. This script appears at the end of this message. This is the script that I mentioned earlier which helps with making html versions of technical MS Word documents more accessible. Python and Perl are very portable programming languages that run on a variety of computers and operating systems with very little installation difficulty.  I wrote this program in python because of its portability.  For reading a technical document as a blind person, there are probably three or five solutions that would work equally well.  In a small community of blind individuals some will tend naturally to favor one of the solutions while another group will tend to favor a second solution.  This is only natural.  It is important to note that python and other scripts can hugely improve the accessibility of technical documents.  I encourage any blind person working in a STEM field to use software skills in order to increase personal productivity.  Tailoring an accessible script can improve its usefulness.
> For technical documents I find that I prepare the same document multiple ways.  I prepare it one way for reading with a braille embosser.  In the last step just prior to publication, I review a version of the document made from the pdf file just to make sure the final document contains what I intended.  I may review it a third way focusing on all font size changes, all centering, and all bold or italic information.  I do this to make sure I properly formatted the headers and to make sure I properly transitioned back to standard format when leaving headers.  When possible I ask a sighted colleague to review the final article prior to publication.  My colleague may recommend a more common method for typesetting an equation than I selected.  Their input is invaluable in making sure that figures are sufficiently large for displaying the necessary information in hard copy print or in pdf.  A sighted colleague can also quickly assess if there is too much or too little white space near equations, tables, and figures.  A sighted colleague can assess if page breaks appear sensible.
> Following are some details about the program. The program html2textForBraille.py helps make MS Word-generated HTML files more accessible for reviewing with speech or braille. As you may know, comment lines in python begin with the character "number sign." Functions begin with the word def and function bodies are indented.  Lines in the main program have no indentation.
> The top 18 lines of the program provide some general comments and contain the name of the input file to process, the name of the file that will contain the generated output, and the name of a translation file that must be present.  In order to run the program, change the input file name contained in quotes on line 7 to the desired name.  Change the output file name contained in quotes on line 8 to the desired name.  Also change the look-up file name on line 14 to the desired name.  If you choose to use no look-up translations, still specify a look-up file name and make the specified look-up file empty.  Change prefixName on line 17 in order to change the string that begins all translations from imgLut.txt.  Change suffixName to change the string that ends all translations from imgLut.txt.  To run the program simply type python html2textForBraille.py.  Examine the specified output file with an ASCII editor of choice or on a braille note taker with 8-dot refreshable braille.
> You can search for the comment on line 247 that says "Start main program" to find the beginning of the main features performed by the program.  On line 255 the filter calls a function substituteStringOrDeleteFromList that replaces a number of strings with more readable strings and deletes a list of strings.  On line 258 the filter calls a function delSection that deletes a specified prefix tag, its associated suffix tag, and all the text between these tags.  On line 260 the filter calls removeAllStartAndEndTagsFromList that removes all prefix tags and associated suffix tags but maintains the text between these tags.  On line 263 it calls fillImagesFromLut that replaces image file names with translation symbols for that file.  On line 265 the filter calls makeSmallImgNames that replaces the full file name of images that do not have a translation with a short form of the image file name.  When the file "imgLut.txt" is empty, this function replaces all the image file names in the input file with short names.  The filter calls stripTagFromList on line 267 that removes text from within a prefix tag.  In html a prefix tag such as <p> that means new paragraph may contain additional information inside the prefix tag such as the font size of the information contained in the paragraph.  This function removes that information from the output file.  On line 269 the filter calls reformatTables to make the tables appear more readable.  On line 272 the filter calls removeBlankLines that removes blank lines from the output file.  You may want to comment out this line if you wish to examine the output file and use it to spot large gaps of white space in the document.
> When you run the program it will print to the screen "# Start main program," a number of intermediate print statements, and "# Process complete." When the program is finished the specified output file is ready for review.
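
The difference between the three tag operations described above (delSection, removeAllStartAndEndTags, and stripTag) may be easier to see on a tiny input. What follows is only a sketch, not the script itself: it uses the standard re module instead of the find-and-slice loops in the script, the helper names del_section, remove_tags, and strip_attributes are hypothetical, and it assumes simple tags that are not nested inside themselves.

```python
import re

def del_section(tag, text):
    """Delete <tag ...>, </tag>, and everything between them."""
    pattern = r"<%s\b[^>]*>.*?</%s>" % (tag, tag)
    return re.sub(pattern, "", text, flags=re.DOTALL)

def remove_tags(tag, text):
    """Delete <tag ...> and </tag> but keep the text between them."""
    return re.sub(r"</?%s\b[^>]*>" % tag, "", text)

def strip_attributes(tag, text):
    """Reduce <tag attr=...> to <tag>; content and end tag are untouched."""
    return re.sub(r"<%s\b[^>]*>" % tag, "<%s>" % tag, text)

# Hypothetical fragment in the style of Word-generated HTML.
sample = ('<head><title>t</title></head>'
          '<span lang="EN">x</span>'
          '<p class="MsoNormal">y</p>')

step1 = del_section("head", sample)    # whole head section disappears
step2 = remove_tags("span", step1)     # span tags go, the text "x" stays
step3 = strip_attributes("p", step2)   # <p class=...> becomes plain <p>
print(step3)
```

The same order is used in the main program: sections are deleted first, then tag pairs are removed, and attribute stripping comes last so that later steps can search for plain tags such as <table>.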
> html2textForBraille.py:
> # html2textForBraille.py
> # author: John Miller
> # Copyright July 20, 2015
> # All rights reserved
> # Contact John Miller with questions at johnmillerphd at hotmail.com.
> 
> inputFileName = "miller_spie_2015_input.htm"
> outputFileName = "miller_spie_2015_output.htm"
> # If you do not have an imgLut.txt, create an empty file with this name in the current directory.
> # imgLut.txt contains one entry per line.
> # Each line contains an image file name, followed by a space, followed by double quote, followed by image translation symbols, followed by a double quote.
> # Here is an example entry:
> # miller_spie_2015_input_files/image001.png "\sigma_0"
> imageLutFile = "imgLut.txt"
> # Each translation from imgLut.txt will be preceded by prefixName and followed by suffixName.
> # Just change these strings as desired.
> prefixName = "<$markup "
> suffixName = ">"
> 
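
To make the look-up file format in the comments above concrete, here is a small sketch of reading such entries. It is an illustration under assumptions, not part of the script: parse_lut_line and load_lut are hypothetical names, blank lines are skipped, and the surrounding double quotes are stripped from the translation, whereas the script as posted keeps them.

```python
def parse_lut_line(line):
    """Split one look-up entry into (image file name, translation)."""
    # The file name ends at the first space; the rest is the quoted translation.
    name, _, rest = line.strip().partition(" ")
    return name, rest.strip().strip('"')

def load_lut(lines):
    """Build a dict from look-up lines, skipping blank ones."""
    lut = {}
    for line in lines:
        if line.strip():
            name, translation = parse_lut_line(line)
            lut[name] = translation
    return lut

# The example entry from the script's comments, followed by a blank line.
entries = ['miller_spie_2015_input_files/image001.png "\\sigma_0"\n', "\n"]
lut = load_lut(entries)

# Wrap the translation the way prefixName and suffixName do in the script.
markup = "<$markup " + lut["miller_spie_2015_input_files/image001.png"] + ">"
print(markup)
```

A dictionary keyed on the image file name also means each image tag only needs one look-up, rather than one pass over the document per entry.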
> def delSection(tagName, text):
>   startTag = "<" + tagName
>   endTag = "</" + tagName + ">"
>   tmpText = text
>   while(1):
>     startSectionIndex = tmpText.find(startTag)
>     if startSectionIndex >= 0:
>       endSectionIndex = tmpText.find(endTag)+len(endTag)
>       tmpText = tmpText[:startSectionIndex] + tmpText[endSectionIndex:]
>     else:
>       break
>   return tmpText
> 
> def removeAllStartTags(tagName, text):
>   startTag = "<" + tagName
>   tmpText = text
>   while(1):
>     startSectionIndex = tmpText.find(startTag)
>     if startSectionIndex >= 0:
>       #we assume the next '>' after the start tag is the end of this tag
>       endSectionIndex = tmpText.find('>', startSectionIndex)+1
>       tmpText = tmpText[:startSectionIndex] + tmpText[endSectionIndex:]
>     else:
>       break
>   return tmpText
> 
> def removeAllEndTags(tagName, text):
>   endTag = "</" + tagName + ">"
>   tmpText = text
>   while(1):
>     startSectionIndex = tmpText.find(endTag)
>     if startSectionIndex >= 0:
>       tmpText = tmpText[:startSectionIndex] + tmpText[startSectionIndex+len(endTag):]
>     else:
>       break
>   return tmpText
> 
> def removeAllStartAndEndTags(tagName, text):
>   tmpText = removeAllStartTags(tagName, text)
>   tmpText = removeAllEndTags(tagName, tmpText)
>   return tmpText
> 
> def fillImagesFromLut(imageLutFile, text):
>   lut_fp = open(imageLutFile)
>   imageLut = list(lut_fp)
>   lut_fp.close()
>   tmpText = text
>   for line in imageLut:
>     item = line.split()
>     currentIndex = 0
>     while(1):
>       imgSectionStartIndex = tmpText.find("<img",currentIndex)
>       if imgSectionStartIndex >= 0:
>         #we assume that there is always a 'src=' after a <img
>         sourceImgStartIndex = tmpText.find('src="',imgSectionStartIndex) + 5 #add 5 for 'src="' size
>         sourceImgEndIndex = tmpText.find('"',sourceImgStartIndex)
>         sourceImg = tmpText[sourceImgStartIndex:sourceImgEndIndex]
>         endSectionIndex = tmpText.find('>', imgSectionStartIndex)+1
>         if sourceImg == item[0]:
>           equation = ' '.join(item[1:])
>           #print "startIdx: " + str(sourceImgStartIndex) + ", endIdx: " + str(sourceImgEndIndex)
>           #print "equals: == \n" + sourceImg + "\n" + item[0] + "\n"  + tmpText[imgSectionStartIndex:endSectionIndex] + "\n"
>           tmpText = tmpText[:imgSectionStartIndex] + prefixName + equation + suffixName + tmpText[endSectionIndex:]
>         else:
>           #print "not equal: != \n" + sourceImg + "\n" + item[0] + "\n"
>           currentIndex = endSectionIndex
>       else:
>         break
>   return tmpText
> 
> def makeSmallImgNames(text):
>   tmpText = text
>   startTag = "<img"
>   imageNameStart = "/image"
>   srcTag = 'src="'
>   startSectionIndex = 0
>   startSectionIndexOld = 0
>   while(1):
>     startSectionIndexOld = startSectionIndex
>     startSectionIndex = tmpText.find(startTag,startSectionIndex) + 1
>     #print startSectionIndex
>     if startSectionIndex >= 0 and startSectionIndex > startSectionIndexOld:
>       #we assume the next '>' after the start tag is the end of this tag
>       srcIndex = tmpText.find(srcTag, startSectionIndex) + len(srcTag)
>       imgNameStartIndex = tmpText.find(imageNameStart, srcIndex)+len(imageNameStart)
>       imgNameEndIndex = tmpText.find('"', srcIndex)
>       imgName = tmpText[imgNameStartIndex:imgNameEndIndex]
>       endSectionIndex = tmpText.find('>',imgNameEndIndex)
>       #print imgName
>       tmpText = tmpText[:startSectionIndex - 1 + len(startTag)] + " " + imgName + tmpText[endSectionIndex:]
>     else:
>       break
>   return tmpText
> 
> def stripTag(tagName, text):
>   startTag = "<" + tagName
>   tmpText = text
>   startSectionIndex = 0
>   startSectionIndexOld = 0
>   while(1):
>     startSectionIndexOld = startSectionIndex
>     startSectionIndex = tmpText.find(startTag, startSectionIndex) + 1
>     if startSectionIndex >= 0 and startSectionIndex > startSectionIndexOld:
>       #we assume the next '>' after the start tag is the end of this tag
>       endSectionIndex = tmpText.find('>', startSectionIndex)
>       #print "strip: " + tmpText[startSectionIndex-1:endSectionIndex+1]
>       tmpText = tmpText[:startSectionIndex+len(startTag)-1] + tmpText[endSectionIndex:]
>     else:
>       break
>   return tmpText
> 
> def reformatTables(text):
>   tmpText = text
>   # we are assuming this is called after a strip table tag
>   # this means we can find the table by searching on <table>
>   startSectionIndex = 0
>   startSectionIndexOld = 0
>   while(1):
>     startSectionIndexOld = startSectionIndex
>     startSectionIndex = tmpText.find("<table>", startSectionIndex) + 1
>     if startSectionIndex >= 0 and startSectionIndex > startSectionIndexOld:
>       endSectionIndex = tmpText.find("</table>", startSectionIndex)
>       #print "table: " + tmpText[startSectionIndex-1:endSectionIndex+len("</table>")]
>       tableBody = tmpText[startSectionIndex-1:endSectionIndex+len("</table>")]
>       tableBody = tableBody.replace('\r\n','');
>       # replace end td tag start td tag with quote comma quote
>       tableBody = tableBody.replace('  </td>  <td>  ','","');
>       tableBody = tableBody.replace('<p>','');
>       tableBody = tableBody.replace('</p>','');
>       tableBody = tableBody.replace('</tr>','</tr>\r\n');
>       tableBody = tableBody.replace('<table>','<table>\r\n');
>       # replace last end td tag with quote
>       tableBody = tableBody.replace('  </td> ','"');
>       # replace first start td tag with quote
>       tableBody = tableBody.replace('  <td>  ','"');
>       tableBody = tableBody.replace('<tr>','');
>       tableBody = tableBody.replace('</tr>','');
>       #print "table2: " + tableBody
>       tmpText = tmpText[:startSectionIndex-1] + tableBody + tmpText[endSectionIndex+len("</table>"):]
>     else:
>       break
>   return tmpText
> 
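
The result reformatTables aims for, one line per table row with each cell in double quotes and cells separated by commas, can be sketched in a few lines. This is an assumption-laden illustration and not the script's method: table_to_quoted_rows is a hypothetical helper, it uses the standard re module, and it does not depend on the exact spacing that the string replacements in reformatTables expect.

```python
import re

def table_to_quoted_rows(table_html):
    """Return one '"cell","cell"' line per <tr> row of a simple table."""
    rows = []
    for row in re.findall(r"<tr\b[^>]*>(.*?)</tr>", table_html, flags=re.DOTALL):
        cells = re.findall(r"<td\b[^>]*>(.*?)</td>", row, flags=re.DOTALL)
        # Drop paragraph tags and collapse whitespace inside each cell.
        cleaned = [" ".join(re.sub(r"</?p\b[^>]*>", "", c).split()) for c in cells]
        rows.append(",".join('"%s"' % c for c in cleaned))
    return rows

# Hypothetical two-row table in the style of Word-generated HTML.
sample = ("<table><tr><td><p>Symbol</p></td><td><p>Meaning</p></td></tr>"
          "<tr><td><p>rho</p></td><td><p>density</p></td></tr></table>")

for line in table_to_quoted_rows(sample):
    print(line)
```

Each printed line reads naturally with speech or on a braille display because the row structure is carried by the line breaks and the cell boundaries by the quotes and commas.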
> def removeBlankLines(text):
>   tmpText = text
>   print "# delete '<b>\\r\\n</b>' "
>   tmpText = tmpText.replace('<b>\r\n</b>','')
>   print "# delete '<b></b>' "
>   tmpText = tmpText.replace('<b></b>','')
>   print "# delete '<p></p>' "
>   tmpText = tmpText.replace('<p></p>','')
>   print "# delete '<p> </p>' "
>   tmpText = tmpText.replace('<p> </p>','')
>   lines = tmpText.split('\r\n')
>   newLines = [line for line in lines if line]
>   tmpText = "\r\n".join(newLines)
>   return tmpText
> 
> def removeAllStartAndEndTagsFromList(text):
>   tmpText = text
>   print "# delete <html> and </html>"
>   tmpText = removeAllStartAndEndTags("html",tmpText)
>   print "# delete <body> and </body>"
>   tmpText = removeAllStartAndEndTags("body",tmpText)
>   print "# delete <div> and </div>"
>   tmpText = removeAllStartAndEndTags("div",tmpText)
>   print "# delete <span> and </span>"
>   tmpText = removeAllStartAndEndTags("span",tmpText)
>   print "# delete <a> and </a>"
>   tmpText = removeAllStartAndEndTags("a",tmpText)
>   print "# delete <br> and </br>"
>   tmpText = removeAllStartAndEndTags("br",tmpText)
>   return tmpText
> 
> def stripTagFromList(text):
>   tmpText = text
>   print "# remove all data inside <p>"
>   tmpText = stripTag("p", tmpText)
>   print "# remove all data inside <tr>"
>   tmpText = stripTag("tr", tmpText)
>   print "# remove all data inside <td>"
>   tmpText = stripTag("td", tmpText)
>   print "# remove all data inside <table>"
>   tmpText = stripTag("table", tmpText)
>   print "# remove all data inside <br>"
>   tmpText = stripTag("br", tmpText)
>   print "# remove all data inside <li>"
>   tmpText = stripTag("li", tmpText)
>   print "# remove all data inside <ol>"
>   tmpText = stripTag("ol", tmpText)
>   print "# remove all data inside <h1>"
>   tmpText = stripTag("h1", tmpText)
>   print "# remove all data inside <h2>"
>   tmpText = stripTag("h2", tmpText)
>   print "# remove all data inside <hr>"
>   tmpText = stripTag("hr", tmpText)
>   return tmpText
> 
> def substituteStringOrDeleteFromList(text):
>   # Perform a number of character and string substitutions or deletions.
>   tmpText = text
>   print "# replace '\\xa0' with ' '"
>   tmpText = tmpText.replace('\xa0',' ')
>   print "# replace '\\x85' with '...'"
>   tmpText = tmpText.replace('\x85','...')
>   print """# replace '\\x92' with "'" """
>   tmpText = tmpText.replace('\x92',"'")
>   print """# replace '\\x93' with '"' """
>   tmpText = tmpText.replace('\x93','"')
>   print """# replace '\\x94' with '"' """
>   tmpText = tmpText.replace('\x94','"')
>   print "# replace '\\x95' with '-' "
>   tmpText = tmpText.replace('\x95','-')
>   print "# replace '\\x96' with '-' "
>   tmpText = tmpText.replace('\x96','-')
>   print "# replace '&amp;' with '&'"
>   tmpText = tmpText.replace('&amp;','&')
>   print """# replace '&quot;' with '"' """
>   tmpText = tmpText.replace('&quot;','"')
>   print "# delete '&nbsp;'"
>   tmpText = tmpText.replace('&nbsp;','')
>   print "# replace 'ρ' with 'rho'"
>   tmpText = tmpText.replace('ρ','rho')
>   print "# replace '‑' with '-'"
>   tmpText = tmpText.replace('‑','-')
>   return tmpText
> 
> # Start main program
> print "# Start main program"
> ifp = open(inputFileName)
> lines = list(ifp)
> ifp.close()
> allLines = "".join(lines)
> 
> # Perform a number of character and string substitutions or deletions.
> allLines = substituteStringOrDeleteFromList(allLines)
> # Remove all unnecessary prefix and suffix tags including text between the tags.
> print "# delete <head> section"
> allLines = delSection("head",allLines)
> # Remove all unnecessary prefix and suffix tags.
> allLines = removeAllStartAndEndTagsFromList(allLines)
> # Replace image references with translations from imgLut.txt.
> print "# fill images from lut"
> allLines = fillImagesFromLut(imageLutFile,allLines)
> print "# If imgLut.txt does not contain an entry for the image, then make the image name small."
> allLines = makeSmallImgNames(allLines)
> # Delete unnecessary text inside a prefix tag.
> allLines = stripTagFromList(allLines)
> # Remove tags from tables for a more readable version.
> allLines = reformatTables(allLines)
> # You may want to comment out the next two lines to preserve blank lines.
> print "# remove all blank lines"
> allLines = removeBlankLines(allLines)
> 
> ofp = open(outputFileName,"w")
> ofp.write(allLines)
> ofp.close()
> print "# Process complete"
> 
> very best,
> John



