[nabs-l] PDF2TXT 3.3 released
Jamal Mazrui
empower at smart.net
Sat Apr 4 15:45:00 UTC 2009
http://EmpowermentZone.com/p2tsetup.exe
PDF2TXT now uses an updated version of the QuickPDF library, which is
commercially available at
http://QuickPDF.com
Several years' worth of fixes and enhancements are incorporated in this
library version compared to the previous one used. Although the source
code to this and other tools used by PDF2TXT is not available, its own
source code in the PowerBASIC language, PDF2TXT.bas, is now installed
along with the executable.
As before, the Image Format checkbox (Alt+F) is available for optical
character recognition (OCR) -- using Google Tesseract technology -- on
image-based PDFs that elude text extraction methods. Due to technical
issues, there is no simple way to abort an OCR process that has already
started. A work-around is now available, however: launch another copy of
PDF2TXT, which clears the deck during its startup phase. The Quit button
(Alt+Q) may then be invoked to close either copy of the program (though
no harm results from both being loaded).
As before, the Grab URL button (Alt+G) gets the address of the current web
page in Internet Explorer and sets it as the PDF source. This now works
with versions of Internet Explorer above 6.0. The feature makes it easy
to download and convert all PDFs linked to a web page.
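For readers curious how PDF links might be gathered from a web page, here
is a minimal sketch in Python. It is not PDF2TXT's own code (which is in
PowerBASIC); the names PdfLinkCollector and find_pdf_links are hypothetical,
and it assumes the page's HTML has already been fetched as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkCollector(HTMLParser):
    """Collects absolute URLs of <a> links whose path ends in .pdf.

    Hypothetical helper for illustration; not part of PDF2TXT.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the address of the page itself.
        absolute = urljoin(self.base_url, href)
        # Test only the path, so a query string after ".pdf" is tolerated.
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.pdf_urls:
            self.pdf_urls.append(absolute)


def find_pdf_links(page_url, html_text):
    """Return the PDF links found in html_text, in page order, no duplicates."""
    collector = PdfLinkCollector(page_url)
    collector.feed(html_text)
    return collector.pdf_urls
```

Each address returned could then be downloaded and handed to a converter.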
An additional hotkey is introduced: F11 for Elevate Version (like the
EdSharp and FileDir programs). This checks whether a newer version of
PDF2TXT is available, and offers to install it. The command makes future
updates to the program particularly convenient to obtain.
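How PDF2TXT decides that a newer version exists is not described here. As
a rough sketch of the comparison step only, assuming dotted version strings
such as "3.3", something like the following Python would do. The function
names are hypothetical.

```python
def parse_version(text):
    """Turn a dotted string such as '3.3' into a tuple of ints, e.g. (3, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


def is_newer(installed, available):
    """Return True if `available` is a later version than `installed`.

    Comparing integer tuples avoids the string-comparison trap in which
    "3.10" would sort before "3.3".
    """
    a = parse_version(installed)
    b = parse_version(available)
    # Pad the shorter tuple with zeros so that "3.3" equals "3.3.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return b > a
```

A caller would offer the update only when is_newer returns True.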
As before, the Extra HTML checkbox (Alt+X) uses a different conversion
technology to produce a .htm conversion in addition to the .txt one. To
further increase conversion options via this checkbox, it now also causes
another technology to be used for producing the .txt file, using the
PDFToText.exe utility that is also separately available at
http://www.foolabs.com/xpdf/home.html
Thus PDF2TXT now incorporates three different .txt conversion methods, a
.htm method, and an OCR one -- all of which are possible in a batch mode
that processes every PDF in a directory. The program has become the most
capable, free converter of PDFs available on Windows!
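For those who want to script a similar batch conversion themselves, here
is a minimal Python sketch built around the xpdf pdftotext utility mentioned
above. It is not how PDF2TXT itself works internally; the function names are
hypothetical, and it assumes pdftotext is installed and on the search path.

```python
import subprocess
from pathlib import Path


def build_commands(directory, tool="pdftotext"):
    """Build one pdftotext command line per PDF found in `directory`.

    Each command writes a .txt file next to its source PDF. The -layout
    option asks pdftotext to preserve the physical layout of the page.
    """
    commands = []
    for pdf in sorted(Path(directory).iterdir()):
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            target = pdf.with_suffix(".txt")
            commands.append([tool, "-layout", str(pdf), str(target)])
    return commands


def convert_directory(directory, tool="pdftotext"):
    """Run every command and return the PDFs that failed to convert.

    Raises FileNotFoundError if the tool itself cannot be found.
    """
    failed = []
    for command in build_commands(directory, tool):
        result = subprocess.run(command)
        if result.returncode != 0:
            failed.append(command[2])
    return failed
```

Keeping the command building separate from the running makes it easy to
inspect what would be executed before converting anything.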
Jamal