[nabs-l] Announcing PDF2HTM

Jamal Mazrui empower at smart.net
Sun Jan 25 22:08:55 UTC 2009


From the archive
http://EmpowermentZone.com/pdf2htm.zip

PDF2HTM
Version 1.0
January 25, 2009
Copyright 2009 by Jamal Mazrui
GPL License

PDF2HTM is a command-line utility that converts one or more files from PDF
to HTML format.  The syntax is
pdf2htm.exe SourcePDF
where SourcePDF is either a file name or a wildcard spec such as
*.pdf
Enclose the parameter in quotes if it contains a space.  Each resulting
HTML file has the same name as its source, except with a .htm extension.
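
The naming rule above can be sketched in a few lines of Python.  This is
only an illustration, not the actual code in pdf2htm.py: the function name
plan_conversions is hypothetical, it assumes the wildcard is expanded with
the standard glob module, and it is written for a current Python rather
than Python 2.5.

```python
import glob
import os


def plan_conversions(source_spec):
    """Pair each PDF matching source_spec with its .htm output path.

    Hypothetical helper for illustration; it plans the conversions
    but does not perform them.
    """
    # glob.glob expands a wildcard spec such as *.pdf into matching paths.
    # A plain file name comes back as a one-item list if the file exists,
    # and as an empty list if it does not.
    sources = sorted(glob.glob(source_spec))

    plan = []
    for source in sources:
        # Keep the directory and base name; swap only the extension.
        base, _ext = os.path.splitext(source)
        plan.append((source, base + ".htm"))
    return plan


if __name__ == "__main__":
    for source, target in plan_conversions("*.pdf"):
        print(source, "->", target)
```

Because the spec is passed to the function as a single string, a name
containing a space needs quotes only on the command line, so that the
shell delivers it as one argument.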

This was built with Python 2.5 and the packages PDFMiner and py2exe.  The
top-level script, pdf2htm.py, is an adaptation of the PDFMiner tool called
pdf2txt.py.  The batch file, RunSetup.bat, runs the py2exe script,
setup.py, to create the stand-alone executable, pdf2htm.exe.

All aspects of the HTML format are determined by underlying PDFMiner
routines.  Visual aspects such as fonts are present, but structural
aspects such as headings do not seem to be converted, unfortunately.
Other programmers interested in this project may wish to work on improving
HTML structure.




