[Dtb-talk] Bookshare.org, internet archive and other automated DAISY production

Greg Kearney gkearney at gmail.com
Thu May 13 23:16:31 UTC 2010


It was never my intention to be overly critical of Bookshare, Internet Archive or any other automated producer. They do provide a wealth of books that might not otherwise be accessible and that is a good thing.

However, automated systems are not, on their own, going to be able to detect where a chapter starts, for example. I wonder if these books might be a starting point for human intervention to make a second, more complete version. For example, a full text, full audio version which could be played back on non-TTS enabled devices such as the NLS player.

There are any number of issues involved here. Just as an example, the copyrighted works at the Internet Archive site are DAISY text only with NLS encryption applied to them. This is the first time I have encountered NLS encryption applied to a text only book, and I assume these will read in NLS enabled devices such as the VictorReader Stream. While this method ensures that only those with access to an NLS authorised player can access the books in the U.S., the Internet Archive seems unaware that the NLS scheme is only used in the U.S. and that blind and print disabled persons in other parts of the world are locked out of access to the books. I am sure this was not what Brewster Kahle intended, or at least I would hope not.

Gregory Kearney | Manager Accessible Media
Association for the Blind of WA - Guide Dogs WA
PO Box 101, Victoria Park WA 6979 | 61 Kitchener Ave, Victoria Park WA 6100
Tel: 08 9311 8246 | Fax: 08 9361 8696 | www.guidedogswa.com.au
Tel: 307-224-4022 (North America)
Email: greg.kearney at guidedogswa.com.au
Email: gkearney at gmail.com

On 14/05/2010, at 12:29 AM, Burke, Dan (DSS) wrote:

> Greetings all --
> 
> A couple of things below from the Bookshare home page -- the point is, I
> guess, that Bookshare has really moved access to books forward by a huge
> leap.  At the same time it's put 70,000 titles into circulation for us,
> it has also had to push hard to get the quality of its products better
> and better.  Remember when the rating "Fair" was among those that we
> might download?  Well, they're not offered anymore.  I agree that the
> titles are not always perfect, some are rated as acceptable sometimes
> that should have been rejected.    But as my father always reminded me
> -- saying it was a Chinese proverb - the journey of a thousand miles
> begins with the first step.  
> 
> So, it's a question of whether the glass is half full or half empty.  Or
> more appropriately, are duplicate pages better or worse than missing
> pages? (grin)
> 
> I personally don't scan perfect books, though I do my best.  I know that
> others contributing their scans are doing their best as well.  A few
> years ago I mentioned to a Bookshare staff member that they ought to
> create a list of the Pulitzer Prize, National Book Award, and Nobel
> literature prize titles they had.  So there you go -- another volunteer
> puts those lists together for the web sites.  Last summer I mentioned to
> another Bookshare staff member that those lists - great as they are -
> didn't have a single poetry prize listed, and I scanned and contributed
> a few and it looks like other volunteers are doing the same.  These are
> books I want to read and I want to read them with my Braille display,
> not have them read to me by even the best NLS narrator or have a screen
> reader read them to me.    Braille is the only answer to me with my
> poetry obsession, and in that category alone Bookshare has put more
> poetry in Braille out there in just a few years than I can possibly
> read.  That gives me choices I wouldn't otherwise have.
> 
> The only way we'll get perfect and navigable electronic books -- DAISY,
> I hope -- is when the publishing industry figures out that's the best
> way to go, and either contributes their files to Bookshare or other such
> entity, or creates them themselves.
> 
> Bookshare's best argument is the statement from the home page that
> immediately follows:
> 
> *Bookshare dramatically increases the accessibility of books. 
> 
> *Simon & Schuster Provides Digital Files to Bookshare.
> *Merriam-Webster Signs Agreement to Provide Digital Files with Worldwide
> Rights to Bookshare
> 
> 
> Best -
> 
> Dan
> 
> Dan Burke
> Assistant Director/Assistive Technology Coordinator
> 
> Disability Services for Students
> The University of Montana
> Emma B. Lommasson Center 154
> Missoula, MT 59812
> 
> 406.24.4424
> 406.243.5330 FAX
> 
> www.umt.edu/disability
> 
> -----Original Message-----
> From: dtb-talk-bounces at nfbnet.org [mailto:dtb-talk-bounces at nfbnet.org]
> On Behalf Of Greg Kearney
> Sent: Wednesday, May 12, 2010 8:12 PM
> To: Discussion of Digital Talking Books
> Subject: Re: [Dtb-talk] Bookshare.org,internet archive and other
> automated DAISY production
> 
> That's right, automated processes cannot detect chapters, subsections and
> so on. However, at a very basic standard all the books should pass the
> validator in DAISY Pipeline, and that is not always the case. For example,
> I find books with duplicate page numbers all the time.
> 
> 
> Gregory Kearney | Manager Accessible Media
> Association for the Blind of WA - Guide Dogs WA
> PO Box 101, Victoria Park WA 6979 | 61 Kitchener Ave, Victoria Park WA
> 6100
> Tel: 08 9311 8246 | Fax: 08 9361 8696 | www.guidedogswa.com.au
> Tel: 307-224-4022 (North America)
> Email: greg.kearney at guidedogswa.com.au
> Email: gkearney at gmail.com
> 
> On 13/05/2010, at 10:08 AM, David Andrews wrote:
> 
>> Two things ... we can pass all the resolutions we want, but it won't
>> make a bit of difference with them.  That isn't how change will happen
>> there.
>> 
>> Secondly, the volunteers do not mark up daisy, they prepare files, and
>> daisy and brf are generated from the same source file automatically by
>> their back end.  While I think we could potentially get more markup in
>> their books, it will never be perfect, or even close, because of the
>> nature of the process, and is unlikely to be consistent from book to
>> book.
>> 
>> Dave
>> 
>> At 08:29 PM 5/12/2010, you wrote:
>>> Hello Dave and everyone,
>>> 
>>>   I'm not sure what the status is now but several years ago we passed a
>>> resolution urging them to proofread their Braille files to be sure they
>>> comply with NLS and Braille textbook standards. Perhaps a similar resolution
>>> should be passed urging all of these folks to mark up their books in DAISY
>>> to improve their navigability.
>>> 
>>> Peter Donahue
>>> 
>>> ----- Original Message -----
>>> From: "David Andrews" <dandrews at visi.com>
>>> To: "Discussion of Digital Talking Books" <dtb-talk at nfbnet.org>
>>> Sent: Wednesday, May 12, 2010 7:43 PM
>>> Subject: Re: [Dtb-talk] Bookshare.org, internet archive and other automated
>>> DAISY production
>>> 
>>> 
>>> They do not "markup in DAISY."  They scan the files and prepare them
>>> in Kurzweil, Word, or another program, submitting an RTF file to
>>> Bookshare.  They are required to put pages in; at one time they
>>> weren't.  I can't speak for BSO, to know if they are going to up the
>>> requirements again.
>>> 
>>> Dave
>>> 
>>> At 06:54 PM 5/12/2010, you wrote:
>>>> Hello Dave and everyone,
>>>> 
>>>>    But are they teaching these people how to properly mark up books in
>>>> DAISY as Greg suggested?
>>>> 
>>>> Peter Donahue
>>>> 
>>>> ----- Original Message -----
>>>> From: "Andrews, David B B (DEED)" <David.B.Andrews at state.mn.us>
>>>> To: "Discussion of Digital Talking Books" <dtb-talk at nfbnet.org>
>>>> Sent: Wednesday, May 12, 2010 3:28 PM
>>>> Subject: Re: [Dtb-talk] Bookshare.org, internet archive and other automated
>>>> DAISY production
>>>> 
>>>> 
>>>> Peter:
>>>> 
>>>> All that sounds good, but the reality is that with a large number of
>>>> volunteers, both scanners and proofers, and a wide variety of tools,
>>>> consistent, reliable content is going to be difficult to achieve.
>>>> 
>>>> I think though that Bookshare.org is probably moving away from volunteers.
>>>> They are getting more and more from publishers directly, and I believe they
>>>> are paying people in India to do input, and staff is doing some too. It is
>>>> my guess that ultimately volunteer-produced content will become a small part
>>>> of their overall operation.
>>>> 
>>>> Dave
>>>> 
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: dtb-talk-bounces at nfbnet.org [mailto:dtb-talk-bounces at nfbnet.org] On
>>>> Behalf Of Peter Donahue
>>>> Sent: Wednesday, May 12, 2010 1:36 PM
>>>> To: Discussion of Digital Talking Books
>>>> Subject: Re: [Dtb-talk] Bookshare.org, internet archive and other automated
>>>> DAISY production
>>>> 
>>>> Hello everyone,
>>>> 
>>>>    It's obvious that bookshare.org and company need to teach their
>>>> volunteers the proper way to mark up DAISY books in addition to scanning
>>>> them. I will admit to being very disappointed after reading just how great
>>>> bookshare.org's DAISY books would be only to find that they were nothing but
>>>> a practical joke. What a great idea for a resolution.
>>>> 
>>>> Peter Donahue
>>>> 
>>>> ----- Original Message -----
>>>> From: "Burke, Dan (DSS)" <burke at mso.umt.edu>
>>>> To: "Discussion of Digital Talking Books" <dtb-talk at nfbnet.org>
>>>> Sent: Wednesday, May 12, 2010 10:40 AM
>>>> Subject: Re: [Dtb-talk] Bookshare.org,internet archive and other automated
>>>> DAISY production
>>>> 
>>>> 
>>>> Bookshare's limited use of navigation built on inclusion of headings and
>>>> matching pages is not just an automation issue -- it is also a matter
>>>> of the volunteer submissions.  It's pretty easy to scan a book, a good
>>>> bit trickier to know how to use the pagination features of MS Word to
>>>> make the prefatory pages one set of page numbers, and the remaining
>>>> pages normal page numbers.  And then the headings too ...   I doubt that
>>>> most volunteers know how to do such things.
>>>> 
>>>> That's why we end up with a couple of headings in the beginning of the
>>>> book and none for the chapters.  Bookshare could promote the use of such
>>>> improvements in raw scans before submission.  Some of the books I have
>>>> read from the -- I think publisher contributions -- have been
>>>> well-formed and highly navigable.
>>>> 
>>>> On the other hand, as I have had more and more experience with creating
>>>> and reading Daisy books with navigable headings and so forth, my
>>>> expectations have risen accordingly.  I wish Bookshare would at least do
>>>> more to promote increased inclusion of headings in submitted rich
>>>> text files - they aren't at all difficult to do.
>>>> 
>>>> Dan
>>>> 
>>>> Dan Burke
>>>> Assistant Director/Assistive Technology Coordinator
>>>> 
>>>> Disability Services for Students
>>>> The University of Montana
>>>> Emma B. Lommasson Center 154
>>>> Missoula, MT 59812
>>>> 
>>>> 406.24.4424
>>>> 406.243.5330 FAX
>>>> 
>>>> www.umt.edu/disability
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: dtb-talk-bounces at nfbnet.org [mailto:dtb-talk-bounces at nfbnet.org]
>>>> On Behalf Of Jim Barbour
>>>> Sent: Tuesday, May 11, 2010 11:38 PM
>>>> To: Robert Jaquiss; Discussion of Digital Talking Books
>>>> Subject: Re: [Dtb-talk] Bookshare.org, internet archive and other
>>>> automated DAISY production
>>>> 
>>>> Hey Everyone,
>>>> 
>>>> The challenges Robert outlines below are the same challenges we, as blind
>>>> college students, have faced for decades.  I'd love to see these
>>>> issues tackled and solved.  However, even if they aren't solved, daisy
>>>> books are a huge step forward from cassette tapes.
>>>> 
>>>> Automating the daisy production process is a trade off.  On the plus
>>>> side, there will be more daisy books available, since the production
>>>> process is less limited by the number of volunteers available.  On the
>>>> minus side, the quality of the markup will be limited to what can be
>>>> reliably done using automation.
>>>> 
>>>> I personally think that automation wins and that we should not hold up
>>>> a workable solution while we search for a nearer perfect one.
>>>> 
>>>> Just my $.02
>>>> 
>>>> Jim Barbour
>>>> 
>>>> On Tue, May 11, 2010 at 10:53:50PM -0500, Robert Jaquiss wrote:
>>>>> Hello Greg:
>>>>> 
>>>>>    It is my opinion that DAISY books should absolutely reflect the
>>>>> structure of the original printed book. If they don't, how could a
>>>>> student deal with a teacher's instructions to turn to page XX? In
>>>>> situations where a citation is needed, a reader couldn't produce a
>>>>> professionally acceptable citation. If the DAISY book is to be used
>>>>> to produce braille, proper pagination is a must. Sections and
>>>>> chapters also must be preserved.
>>>>> 
>>>>> Regards,
>>>>> 
>>>>> Robert Jaquiss
>>>>> 
>>>>> ----- Original Message ----- From: "Greg Kearney"
>>>>> <gkearney at gmail.com>
>>>>> To: "daisy group" <dmfc-ig at mail.daisy.org>; "Discussion of Digital
>>>>> Talking Books" <dtb-talk at nfbnet.org>
>>>>> Sent: Tuesday, May 11, 2010 8:41 PM
>>>>> Subject: [Dtb-talk] Bookshare.org,internet archive and other
>>>>> automated DAISY production
>>>>> 
>>>>> 
>>>>>> I have been thinking of late about the various attempts to
>>>>>> generate DAISY from fully automated systems such as seems to be
>>>>>> the case at Bookshare.org, Internet Archive and some other
>>>>>> sources.
>>>>>> 
>>>>>> The issue I have with these books is that their DAISY structure
>>>>>> does not reflect the printed book. In most cases that I have seen
>>>>>> the book is a single long heading level 1 with perhaps page
>>>>>> numbering in place. In many cases these page numbers are again not
>>>>>> reflective of the printed book. For example, I have found books
>>>>>> with duplicate page numbers, where there is a page
>>>>>> number 4 in the front matter and a page number 4 in the body
>>>>>> matter.
>>>>>> 
>>>>>> More troubling however is the lack of navigation to chapters and
>>>>>> other subsections of the books. This is of particular concern in
>>>>>> non-fiction text.
>>>>>> 
>>>>>> Am I just being overly picky here? What do you all think?
>>>>>> 
>>>>>> Gregory Kearney | Manager Accessible Media
>>>>>> Association for the Blind of WA - Guide Dogs WA
>>>>>> PO Box 101, Victoria Park WA 6979 | 61 Kitchener Ave, Victoria
>>>>>> Park WA 6100
>>>>>> Tel: 08 9311 8246 | Fax: 08 9361 8696 | www.guidedogswa.com.au
>>>>>> Tel: 307-224-4022 (North America)
>>>>>> Email: greg.kearney at guidedogswa.com.au
>>>>>> Email: gkearney at gmail.com
>>>> 
>> 
>>                       David Andrews:  dandrews at visi.com
>> Follow me on Twitter:  http://www.twitter.com/dandrews920
>> 
>> 
>> _______________________________________________
>> Dtb-talk mailing list
>> Dtb-talk at nfbnet.org
>> http://www.nfbnet.org/mailman/listinfo/dtb-talk_nfbnet.org
>> To unsubscribe, change your list options or get your account info for Dtb-talk:
>> http://www.nfbnet.org/mailman/options/dtb-talk_nfbnet.org/gkearney%40gmail.com
> 




