[nabs-l] question about Jaws and PDFs

Darrell Shandrow darrell.shandrow at gmail.com
Sun Jan 24 19:21:43 UTC 2010


Hello Rania,

My experience is that PDF2TXT is quite a bit easier to use than Adobe
Reader, but it's free, so you may try it for yourself. :-)
 

-----Original Message-----
From: nabs-l-bounces at nfbnet.org [mailto:nabs-l-bounces at nfbnet.org] On Behalf
Of Rania
Sent: Sunday, January 24, 2010 9:14 AM
To: National Association of Blind Students mailing list
Subject: Re: [nabs-l] question about Jaws and PDFs

Is it easier to use then Adoby? just asking because I don't like Adoby all
that much.
Rania,
"For everyone who thought I couldn't do it.
For everyone who thought I shouldn't do it.
For everyone who said, 'It's impossible."
See you at the finish line."
~Christopher Reeve

----- Original Message -----
From: "Darrell Shandrow" <darrell.shandrow at gmail.com>
To: "'National Association of Blind Students mailing list'" 
<nabs-l at nfbnet.org>
Sent: Sunday, January 24, 2010 9:50 AM
Subject: Re: [nabs-l] question about Jaws and PDFs


> Hello Rania,
>
> I assume you're asking about PDF2TXT? The documentation is below:
>
> Version 3.3
> April 6, 2009
> Copyright 2005 - 2009 by Jamal Mazrui
> LGPL license
> Contents
> Description
> Installation
> Choosing PDF Source and TXT Target
> Text Extraction Settings
> Viewing Area
> Toggling between a File and Folder List
> Configuration Check Boxes
> Action Buttons
> URL Source,
> Hot Keys
> The Log File
> Command Line Operation
> File Association
> Development Notes
> Description
>
> PDF to TXT (also written PDF2TXT) is a free program for converting files 
> in
> Portable Document Format (.pdf extension) to plain text(.txt extension). 
> The
> program lets you convert multiple files in a single, batch operation, 
> either
> from a GUI dialog or a console-mode command line. The resulting text files
> can be read in almost any editing or viewing program. PDF2TXT, itself, 
> also
> includes a plain text view for reading PDF files. The program should work 
> on
> any version of Windows.
>
> Installation
>
> The installation program for PDF2TXT is called p2tsetup.exe. When 
> executed,
> it prompts for an installation folder for the program. The default folder 
> is
> c:\pdf2txt. Although this is not a standard location for programs on a
> Windows computer, a benefit is fewer keystrokes to type whenever you
> manually enter the path to a PDF file or folder. If you want a standard
> installation folder, however, respond to the prompt by entering
> C:\Program Files\PDF to Text
> The installation process creates a program group for PDF2TXT on the 
> Windows
> start menu, containing choices to launch PDF2TXT, read Documentation for
> PDF2TXT, and uninstall PDF2TXT. Also created is a desktop shortcut with an
> associated hot key, enabling PDF2TXT to be conveniently launched by 
> pressing
> Control+Alt+Shift+P. Another shortcut is placed in the Send To folder so
> that a PDF may be viewed in PDF2TXT via the context menu in Windows
> Explorer.
>
>
> Choosing PDF Source and TXT Target
>
> After PDF2TXT is installed, launching it activates a main dialog with
> several capabilities and settings. First, it prompts you to select a PDF
> source. This can be either a single PDF file or a folder containing 
> multiple
> PDF files (another section explains how it can also be an Internet URL). 
> In
> the initial edit box, you can type the full path to the file or folder
> desired. Alternatively, you can tab to buttons that invoke different sub
> dialogs depending on whether you want to choose a file or folder as the 
> PDF
> source. (Yet another option, described later, is to pass the path to the 
> PDF
> source as a parameter on the command line when pdf2txt.exe is launched.)
> By default, the PDF source is the folder c:\pdf2TXT\pdf. Any source may be
> chosen, however, and the program remembers the last one used.
>
> Similarly, an edit box and associated button let you specify the target
> folder for converted files. These will have the same base name, but an
> extension of .txt instead of .pdf. The default target folder is
> c:\pdf2TXT\txt. Note that the PDF source may be either a file or folder, 
> but
> the TXT target is always a folder.
>
>
> Text Extraction Settings
>
> Two settings fundamentally affect how text is extracted from a PDF. If the
> PDF requires a password to unlock its content, type it in the edit box
> provided. If the PDF is an image format without textual characters --  
> e.g.,
> the result of a scan -- mark the checkbox so that optical character
> recognition (OCR) is performed instead of the usual techniques of 
> extracting
> text. This OCR technique was originally posted at
> http://EmpowermentZone.com/pdf2ocr.zip
> OCR is a much slower and more error-prone process, but it may be the best
> option when the usual methods do not work. This technique uses Google
> Tesseract, the best open source OCR available, which is not as good as
> commercial OCR packages. Due to technical issues, there is not a simple 
> way
> of aborting an OCR process that has already started. It is possible,
> however, by launching another copy of PDF2TXT, which clears the deck 
> during
> its startup phase.
>
> Another checkbox lets you additionally produce a .htm target file -- in 
> HTML
> format. This uses a different conversion technique, originally posted at
> http://EmpowermentZone.com/pdf2htm.zip
>
> This may be worth trying if the .txt result is unsatisfactory. It may also
> be useful for webmasters who want to post AN HTML alternative to a PDF. 
> The
> conversion translates visual aspects of the PDF such as fonts, but not
> structural elements such as headings, unfortunately. To further increase
> conversion options, a different technique is also used for producing the
> .txt file with this checkbox, using the PDFToText.exe utility that is also
> seperately available at
> http://www.foolabs.com/xpdf/home.html
>
>
> Viewing Area
>
> Within the main dialog, a read-only, multi-line edit control serves as a
> viewing area between the source and target controls just discussed. This
> scrollable view can show one of three kinds of information: (1) the text 
> of
> a PDF, 2) a list of PDF files, or (3) the results of a batch conversion. 
> The
> label for the viewing area changes to indicate the kind of information 
> being
> shown: "View file," "View folder," or "View results."
> You can navigate the viewing area with standard windows keystrokes, e.g.,
> Control+Home or Control+End to go to the top or bottom of text. Control+F
> lets you search forward for a string of characters, and Control+Shift+F 
> lets
> you search backward. F3 searches for the same string again in the forward
> direction, and Shift+F3 searches again backward. Control+G lets you go to 
> a
> percent completion point through the file being viewed. Control+K sets a
> bookmark for the file, Control+Shift+K clears it, and Alt+K goes to it.
>
> You can press Shift with arrow keys to select text or Control+A to select
> all. Alternatively, you can press F8 to set the starting point of a
> selection, navigate to the ending point desired, and then press Shift+F8 
> to
> select the text between these points.
>
> Press Control+C to copy selected text to the clipboard. Alternatively, 
> press
> Control+Shift+C, or Alt+F8, to copy and append to the clipboard, adding to
> rather than replacing its existing text. A form feed or page break 
> character
> (ANSI code 12) will separate each clip copied there. Control+F8 is a
> shortcut that copies all text in the viewing area without having to select
> it first, equivalent to Control+A followd by Control+C.
>
> If you invoke the Open button and choose a PDF from its sub dialog, the 
> text
> of the PDF will be placed in the viewing area, and keyboard focus will go
> there. If you invoke the Select button to choose a PDF folder instead of a
> file, its list of PDFs will be shown. A status bar at the bottom of the
> dialog indicates the current position in the viewing area.
>
>
> Toggling between a File and Folder List
>
> The Look button behaves in a special way when the viewing area has focus. 
> If
> you press Alt+L when in the viewing area, PDF2TXT will toggle between a
> folder and file view. If viewing a folder, PDF2TXT will switch to a view 
> of
> the file that was on the line containing the caret. If viewing a file,
> PDF2TXT will switch to a view of the folder that contained the file. In
> addition, PDF2TXT will automatically search for the name of the file last
> viewed and place the caret just after it if found.
> This feature lets you easily explore the PDFs in a folder, one after
> another. Initially, You might display a list of files by pressing Alt+L 
> when
> the PDF source is a folder. You can then arrow down through the list until
> you find a PDF you want to view. At that point, press Alt+L to view the
> file. When you want to continue exploring the folder list again, press 
> Alt+L
> to return to it at the position of the file you last viewed.
>
>
> Configuration Check Boxes
>
> Four check boxes let you configure PDF2TXT. The one labeled "Include
> subfolders," will look for PDF files not only in the specified folder, but
> in subfolders under it. For example, you could probably convert many PDF
> files on your computer by checking this setting and specifying the c:\ 
> root
> folder as the PDF source! This setting is unchecked by default.
> The check box labeled "Move PDF when done" will transfer a PDF to a
> subfolder called "Done" after a successful conversion. This is a subfolder
> of the PDF2TXT program folder, with a default location of c:\pdf2TXT\done.
> The benefit of this check box is that PDF files are stored away for backup
> after they have been converted to text. This setting is unchecked by
> default.
>
> The checkbox labeled "Replace TXT if found" determines whether to skip a
> conversion if a corresponding target file already exists. If you do not
> check the setting to move source files when done, you may want to check 
> this
> setting so that unnecessary time is not spent on repeatedly converting PDF
> files left in the source folder, since they then will be skipped if
> corresponding target files already exist. This setting is checked by
> default.
>
> The Append check box determines whether a detailed conversion log file is
> newly created each time a conversion is run. This setting is checked by
> default so that previous information is not lost. A section below further
> describes the log file.
>
>
> Action Buttons
>
> The remaining controls of the main dialog are buttons that perform various
> actions. The Convert button is the default: the one that will be activated
> by pressing Enter on any control except another button. The viewing area
> will show the results of a batch conversion. This information includes the
> number of pages in each PDF converted. It also indicates when a conversion
> was either not possible or was skipped because the target file already
> existed and you chose not to replace files.
> Press Escape if you need to abort a batch conversion of many files that is
> taking too long! Note that this program is relatively quick, however,
> compared to other available methods of converting PDF files to text.
> Moreover, its batch mode feature lets you run conversions unattended.
>
> The source for a conversion is treated differently if the viewing area has
> focus. If viewing a list of PDFs in a folder or on a web page, then 
> PDF2TXT
> regards the source as the file name on the current line (the one 
> containing
> the caret). Thus, you can cursor to a PDF of interest and press Enter to
> convert it to text. If successfully converted, PDF2TXT assumes you may 
> also
> want to examine its content in the viewing area, so a Look command is
> automatically performed as well (see below). If there is a conversion 
> error,
> however, PDF2TXT leaves the error message in the viewing area. If you have
> been examining a list of PDFs and decide you want to convert them all 
> rather
> than a single file, navigate to the top line of the viewing area that 
> lists
> the number of PDFs in the list, and then press Enter.
>
> If the source edit box already specifies what you want to view, or a path 
> is
> easy to type into it, then the Look button is quicker to use than the Open
> or Select sub dialog. Activating the Look button takes the current source
> specification and goes to a view of either the text of a source file or 
> the
> list of a source folder, putting focus in the view area so you can read 
> the
> information.
>
> The Defaults button restores the default configuration settings of 
> PDF2TXT.
> You can use it to return to the initial folders and checkbox settings.
>
> The Explorer button lets you browse the source, target, or done folder 
> with
> Windows Explorer. It allows you to examine files that either have been
> converted or would not convert--thus needing other approaches to access
> their content.
>
> The Quit button closes PDF2TXT. Alt+F4 does the same thing.
>
> The Help button displays this complete documentation in the default web
> browser. For context-sensitive help on a particular control, press F1 when
> it has focus. Hence, you can tab through the dialog and press F1 on each
> control to learn how to use it.
>
>
> URL Source,
>
> If you are connected to the Internet, you can specify a URL as a PDF 
> source
> instead of a file or folder on your local computer. The URL can be the
> complete download path to a PDF on the Internet. Alternatively, the URL 
> can
> be the path to a web page containing one or more links to PDF files. You 
> can
> use Internet Explorer to navigate to such a web page and then invoke the
> "Grab URL" button to put its URL into the source edit box of PDF2TXT.
> The Look button works with a URL source similarly to a local file or 
> folder.
> For example, you can press Alt+L to view a list of PDFs on a web page. The
> toggling feature, described above, is also supported, allowing you to
> consecutively examine the PDFs linked to a web page. If you view a PDF on
> the Internet, PDF2TXT will automatically download a copy to the PDF
> subfolder of the program folder, e.g., to
> c:\pdf2txt\pdf
>
> The Convert button also works with a URL source. Thus, you can easily
> convert all PDFs on a web page with a single batch operation!
>
>
> Hot Keys
>
> Almost all controls of PDF2TXT are directly usable with unique, mnemonic 
> Alt
> key combinations based on the initial letter of the control's label. Thus,
> as you become familiar with the controls, you can operate them more 
> quickly
> with hot keys rather than navigating to them with the tab key or mouse. 
> For
> example, press Alt+P to go to the edit box for typing a PDF source, or 
> Alt+S
> to select a source folder from a tree view of your computer. Press Alt+L 
> to
> look at a file or folder, or Alt+V to red what is already in the viewing
> area. Press Alt+I to toggle the "Include subfolders" setting, or Alt+D to
> restore all defaults. The text extraction settings in the second row of
> controls use a letter corresponding to the second syllable or word, i.e.,
> Alt+W for the Password edit box and Alt+F for the Image Format checkbox.
>
> The Log File
>
> The conversion log file is named log.txt and located in the Done subfolder
> of the PDF2TXT program folder. It records information about each attempt 
> to
> convert a PDF to TXT file. It indicates whether the conversion succeeded
> (meaning any resulting text), and then lists many attributes of the PDF,
> including security settings that could explain a failed conversion.
> There is a choice to view the log file in the PDF2TXT program group off 
> the
> Start Menu. You can also get to the file via the Explore button of the
> PDF2TXT program, choosing the Done folder to navigate with Windows 
> Explorer.
> Additionally, you can open the file in another application through its
> direct path (default settings):
> c:\pdf2txt\done\log.txt
>
> If the log file grows larger than you want, simply delete it or uncheck 
> the
> setting that configures PDF2TXT to append to an existing log file. Each 
> use
> of the Convert button would then generate a new log file. This information
> is more detailed than the results placed in the viewing area.
>
>
> Command Line Operation
>
> The pdf2txt.exe executable may be run with various command line 
> parameters.
> The parameters can set values for controls in the main dialog. Parameters
> can also cause PDF2TXT to run in an automatic, console mode--without a
> dialog box or further user intervention involved.
> When the .pdf extension is associated with the PDF2TXT program (explained 
> in
> another section), Windows Explorer or Internet Explorer will open a PDF 
> file
> by launching PDF2TXT with the name of the PDF passed as a parameter on the
> command line. If PDF2TXT is launched with more than one command line
> parameter, however, the program will assume you want to run it in console
> rather than GUI mode. The syntax for parameters is described as follows. 
> If
> a PDF source file, folder, or URL is specified, it must be the first
> parameter. If a TXT target folder is specified, it must be the second
> parameter. The source or target must be enclosed in quotes if its name
> contains spaces.
>
> All parameters besides source and target names begin with a space and
> forward slash (/), followed by the hot key letter in the dialog
> corresponding to the setting affected. A trailing plus (+) sign in the
> parameter indicates a status of On, and a minus (-) sign indicates Off. 
> The
> plus sign can also be omitted to indicate On. Capitalization does not
> matter. Here is a list of parameters:
>
> a = Automatic, console mode (use /a- to force GUI mode with multiple
> parameters)
> i = Include subfolders
> m = Move PDF when done
> r = Replace TXT if found
> d = Default settings (no /d- is defined)
> g = Grab URL as source from Internet Explorer (no /g- is defined)
>
> For example, to convert all files using default settings except for the 
> Move
> setting, you could enter:
> pdf2txt /d /m
>
> To use current settings except grab a URL as source, enter:
> pdf2txt /a /g
>
> To convert files from a temporary folder to the current folder, enter:
> pdf2txt "c:\temp files" .
>
> To do the same, but in GUI rather than console mode, enter:
> pdf2txt "c:\temp files" . /a-
>
> For greater console mode convenience, another version of PDF2TXT, having 
> the
> abbreviated name p2t.exe, is also available in the program folder. This
> version only runs in console mode, whether zero, one, or more parameters 
> are
> specified. It uses "standard output" to display conversion results. The
> shorter executable name means less characters to type on the command line.
> For example, to run a batch conversion in console mode using the current
> settings of PDF2TXT, you could simply enter
> p2t
>
> Like DOS commands generally, the above assumes that you have either made
> c:\pdf2txt the current directory or included it in a PATH statement.
>
>
> File Association
>
> The PDF2TXT group on the Start Menu contains shortcuts for changing what
> program automatically opens a file with a .pdf extension in Windows
> Explorer. If you decide that you like the interface of PDF2TXT enough to
> make it the default program for PDF files, you can set the file 
> association
> accordingly. Later, if you decide you want to return to the conventional
> association, you can do that, too.
> When the .pdf extension is associated with PDF2TXT, an application such as
> Windows Explorer when opening a file, or Internet Explorer after 
> downloading
> a file, will pass the name of the PDF as a command-line parameter to
> pdf2txt.exe. When the program is launched in this way, it automatically
> invokes the Look button, placing text of the PDF in the viewing area and
> putting keyboard focus there.
>
>
> Development Notes
>
> I welcome comments and suggestions on PDF to TXT. For the technically
> curious, I developed it with the PowerBASIC programming language from
> http://PowerBASIC.com
> and a couple of third party libraries: EZGUI from
> http://EZGUI.com
> and QuickPDF from
> http://QuickPDF.com
> An alternate text extraction technique is tried if the first one fails,
> using the GetText.exe utility that is also available seperately at
> http://www.kryltech.com
> The file GetText.txt in the PDF2TXT program folder contains the license 
> for
> this utility.
>
> The OCR is done by incorporating the open source PDF2OCR package, 
> available
> at
> http://EmpowermentZone.com/pdf2ocr.zip
>
> Some status messages are spoken with the JAWS, System Access, or 
> Window-Eyes
> screen reader if currently active. These direct speech messages are 
> produced
> with APIs via a component of the SayTools library, which is also available
> seperately at
> http://EmpowermentZone.com/saysetup.exe
>
> The PowerBASIC code to PDF2TXT, itself (but not commercial libraries 
> used),
> is open source under the Lesser General Public License (LGPL), documented 
> at
> http://gnu.org
>
> This Windows program is the successor to my first version of PDF2TXT,
> developed several years ago as a DOS-based, command-line only utility. 
> Ideas
> and feedbak from the discussion list
> ProgrammingBlind at FreeLists.org
> have aided the design and testing of PDF2TXT. The latest version is
> available at the same address,
> http://EmpowermentZone.com/p2tsetup.exe
>
> You can download it with the Elevate Version hotkey, F11. This checks
> whether a newer version is available, and offers to install it.
>
> Jamal Mazrui
> jamal at EmpowermentZone.com
>
> -----Original Message-----
> From: nabs-l-bounces at nfbnet.org [mailto:nabs-l-bounces at nfbnet.org] On 
> Behalf
> Of Rania
> Sent: Sunday, January 24, 2010 7:21 AM
> To: National Association of Blind Students mailing list
> Subject: Re: [nabs-l] question about Jaws and PDFs
>
> I have never herd about this!
> Can you give me more information?
> Rania,
> "For everyone who thought I couldn't do it.
> For everyone who thought I shouldn't do it.
> For everyone who said, 'It's impossible."
> See you at the finish line."
> ~Christopher Reeve
>
> ----- Original Message -----
> From: "Darrell Shandrow" <darrell.shandrow at gmail.com>
> To: "'National Association of Blind Students mailing list'"
> <nabs-l at nfbnet.org>
> Sent: Sunday, January 24, 2010 6:31 AM
> Subject: Re: [nabs-l] question about Jaws and PDFs
>
>
>> Hello Rachel,
>>
>> In order to help you in the most effective way possible, let me start by
>> asking you some questions. Don't worry if you can't answer them all. Just
>> do
>> your best and I'll guide you to the rest of the needed answers.
>>
>> What version of JAWS are you running? Do you have Adobe Reader on your
>> computer? If so, which version? Do you have any scanning and reading
>> products like Kurzweil K1000 or OpenBook installed on your computer? If
>> so,
>> what versions?
>>
>> There are a number of ways to read PDF documents. Some PDFs are fully
>> accessible, many can be read with some difficulty and far too many remain
>> completely out of our reach without a significant amount of expensive
>> assistive technology.
>>
>> There is one free solution that can read many PDF documents. It is called
>> PDF2TXT, and it has been developed by a blind computer programmer. Visit
>> http://www.empowermentzone.com/p2tsetup.exe to install the program.
>>
>> Regards,
>>
>> Darrell
>>
>>
>>
>> _______________________________________________
>> nabs-l mailing list
>> nabs-l at nfbnet.org
>> http://www.nfbnet.org/mailman/listinfo/nabs-l_nfbnet.org
>> To unsubscribe, change your list options or get your account info for
>> nabs-l:
>>
>
http://www.nfbnet.org/mailman/options/nabs-l_nfbnet.org/raniaismail04%40gmai
> l.com
>
>
> _______________________________________________
> nabs-l mailing list
> nabs-l at nfbnet.org
> http://www.nfbnet.org/mailman/listinfo/nabs-l_nfbnet.org
> To unsubscribe, change your list options or get your account info for
> nabs-l:
>
http://www.nfbnet.org/mailman/options/nabs-l_nfbnet.org/darrell.shandrow%40g
> mail.com
>
>
> _______________________________________________
> nabs-l mailing list
> nabs-l at nfbnet.org
> http://www.nfbnet.org/mailman/listinfo/nabs-l_nfbnet.org
> To unsubscribe, change your list options or get your account info for 
> nabs-l:
>
http://www.nfbnet.org/mailman/options/nabs-l_nfbnet.org/raniaismail04%40gmai
l.com 


_______________________________________________
nabs-l mailing list
nabs-l at nfbnet.org
http://www.nfbnet.org/mailman/listinfo/nabs-l_nfbnet.org
To unsubscribe, change your list options or get your account info for
nabs-l:
http://www.nfbnet.org/mailman/options/nabs-l_nfbnet.org/darrell.shandrow%40g
mail.com





More information about the NABS-L mailing list