[NFB-Science] Emacspeak --- A Speech Odyssey

David Andrews dandrews920 at comcast.net
Sun Aug 11 10:32:39 UTC 2024


EMACSPEAK The Complete Audio Desktop - Wednesday, July 31, 2024 at 4:25 PM


Emacspeak --- A Speech Odyssey






1. Dedication: To My Guiding Eyes Aster, Hubbell and Tilden


<https://emacspeak.sourceforge.net/raman/aster-labrador/>Aster Labrador
(2/15/1987 - 12/05/1999)

<https://emacspeak.sourceforge.net/raman/hubbell-labrador/>Hubbell Labrador
(12/21/1997 - 4/11/2011)

<https://emacspeak.sourceforge.net/raman/tilden-labrador/>Tilden Labrador
(8/4/2009 - 9/3/2022)



2. Key Insights

    * 
<https://www.drdobbs.com/user-interface-a-means-to-an-end/184410453>User 
interface is a means to an end.
    * Open Source is essential for discovering new interaction paradigms.
    * This is not mere idealism. Openness is a 
key enabler for creating user journeys that were 
not envisioned by a system’s designers.
    * <https://www.gnu.org/s/emacs/>Emacs and 
<https://en.wikipedia.org/wiki/TeX>TeX are good 
exemplars. They permit maximal freedom when seen 
from the viewpoint of user extensibility and 
creativity. TeX enabled 
<https://emacspeak.blogspot.com/2022/12/aster-spoken-math-on-emacspeak-audio_21.html>Audio 
System For Technical Readings (AsTeR); Emacs 
enabled <https://emacspeak.sourceforge.net>Emacspeak.
    * Rapid, reliable task completion is the most 
important metric and trumps secondary items such 
as eye-candy; the latter only leads to bloat, as evinced by the HTML Web.
    * Having a well-identified problem when designing a system is paramount.
    * Usability is important, but to matter, the 
system needs to be useful first.
    * Ease of use by itself is often marketing hype.
    * Useful systems are fun to learn and give 
back more than what you put in with respect to time and effort.
    * A steep learning curve in and of itself is 
not to be feared; it can be fun to learn and gets you farther faster.
    * True empowerment: Ensure that the user grows continuously.


3. Emacspeak: The Complete Audio Desktop

    * Emacspeak, started in September 1994, was 
released as Open Source in 
<https://tvraman.github.io/emacspeak//web/releases/release-3.0.html>April 
1995.
    * The goal was to create a system for daily 
use that doubled as a research work-bench for 
developing an auditory interface.
    * Speech and auditory output would be treated as first-class citizens.
    * The time felt right with respect to 
building a system that enabled eyes-free access to the emerging Web.
    * 
<https://emacspeak.sourceforge.net/turning-twenty.html>Emacspeak 
At Twenty was published in September 2014 and 
traced the evolution of the project.
    * Now, this article gives a bird's-eye 
overview of the last 10 years by loosely 
following the logical structure of the Turning Twenty paper.
    * In the process, we identify the dreams that 
have come to pass as well as the expectations 
that have failed to materialize, both attributable 
to developments in the larger Internet eco-system.
    * But never fear, though some of these may be 
superficially disappointing, they likely herald 
the nature of bigger and better things to come!
    * As a proof-point, in 1994, I could not have 
imagined the impact that the world of 
Internet-centered computing and the accompanying 
information revolution would have on the state of information access.
    * Conversely, I boldly (and incorrectly) 
predicted that the arrival of mobile devices and 
mainstream speech interfaces would herald the 
move to a Web of information where there would be 
a clean separation between application back-ends 
and various client-specific front-ends. See 
<https://emacspeak.sourceforge.net/raman/publications/specialized-browsers/>Specialized 
Browsers and 
<https://youtube.com/watch?v=TZKwvBkS5cs>The Web, 
The Way You Want. Distinguished Lecture Series, UW Oct 2007.
    * The above still makes sense from the view 
of scalable software architecture. However, the 
rapid growth of the Web economy has also resulted 
in an even faster race to the bottom where 
applications continue to be built and re-built 
every few years for the next best thing; welcome 
to the write once, debug everywhere world all over again!
    * Case in point: today we have smart phones, 
smart watches and smart speakers, but each of 
these requires targeted front-ends if one wishes 
to bring the riches of the Internet to them.
    * So the larger the Web gets, the fewer 
devices it becomes available on: a classic downward spiral.

Share And Enjoy: The Best Is Yet To Come!


3.1. How To Read This Document

    * I recommend reading the Turning Twenty paper to get a full overview.
    * Then, read this paper a section at a time, 
while referring back to the parallel section in 
the Turning Twenty paper to understand how things have evolved.
    * Make sure to skim or deep-dive into the references in both papers.


4. Using UNIX With Speech Output: 2024

    * In 2024 UNIX equates mostly to various 
Linux distributions, and from the Emacspeak 
perspective, they are all made mostly equal.
    * Variations do exist and running 
bleeding-edge distributions can come with issues, 
e.g., unstable versions of the underlying audio infrastructure.
    * Yes, 30 years and counting, Linux Audio is 
still a work in progress, though I hope Pipewire 
will be the last of these tidal shifts.
    * Linux is moving to Wayland; expect that transition to be choppy.
    * Native applications are mostly gone bar the 
shouting. In this context, where most users 
access things through a mainstream Web browser, 
Emacspeak users access everything through Emacs.
    * The above, when done right, is hugely 
empowering; when done badly, it’s extremely 
limiting; see later sections of this paper on the 
continuing evolution of the Web.


5. Key Enabler: Emacs And Lisp Advice

    * Advice in Emacs, as implemented in the classic advice package, is rock-solid.
    * There is a newer nadvice that is part of 
Emacs that Emacspeak does not use.
    * There are no plans to migrate to nadvice 
since that is a lot of busy work in my view and 
any such migration would be difficult to test for correctness.
    * The classic advice package may be removed 
from Emacs at some point in the future, but never 
fear; it’ll be bundled with Emacspeak if that 
becomes necessary. This is a feature of Free 
Software and is a great example of what that Freedom entails.
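
For readers unfamiliar with advice, the idea is that extra behavior is attached to an existing function from the outside, without editing that function's source. In Emacs Lisp this is written with defadvice (classic advice) or advice-add (nadvice). The following is only a rough analogy in Python, not Emacs Lisp and not Emacspeak code; the names speak, advise_after and next_line are hypothetical and exist purely for illustration.

```python
import functools

spoken = []  # stand-in for a speech queue; a real system would send text to a TTS server


def speak(text):
    """Record text that would be spoken (hypothetical helper)."""
    spoken.append(text)


def advise_after(func, after):
    """Wrap func so that after(result) runs once func returns; func itself is untouched."""
    @functools.wraps(func)
    def advised(*args, **kwargs):
        result = func(*args, **kwargs)
        after(result)
        return result
    return advised


def next_line(lines, index):
    """An ordinary, speech-unaware command: move down one line, clamped to the last line."""
    return min(index + 1, len(lines) - 1)


buffer_lines = ["first line", "second line", "third line"]

# Attach speech as after-advice without editing next_line's source.
next_line = advise_after(next_line, lambda index: speak(buffer_lines[index]))

position = next_line(buffer_lines, 0)
```

After the call, position holds the new line index and the spoken list holds the text of the line that was moved to; the original command never needed to know about speech.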


6. Key Component: Text To Speech (TTS)

    * Speech output, especially unencumbered 
text-to-speech, is just as much a challenge as it was 30 years ago.
    * In the bigger picture, early instances of 
using TTS for voice assistants have driven the 
industry toward natural sounding voices.
    * The above sounds attractive on the surface, 
but a price we have paid is the loss of 
fine-grained control over voice parameters, 
emotion, stress and other supra-linguistic features.
    * I believe these to be essential for 
delivering good auditory interfaces and remain 
optimistic that these will indeed arrive in a 
future iteration of speech interaction.
    * Things appear to be coming full circle: 
Emacspeak started with the hardware Dectalk; now, 
the 
<https://github.com/dectalk/dectalk.git>Software 
Dectalk is increasingly becoming the primary 
choice on Linux; see this 
<https://raw.githubusercontent.com/tvraman/emacspeak/master/servers/software-dtk/Readme.org>Readme 
for setup instructions.
    * Viavoice Outloud from Voxin is still 
supported. However, you can no longer buy new 
licenses. If you have already purchased a license, it’ll continue to work.
    * The Vocalizer voices that Voxin now sells do not work with Emacspeak.
    * The other choice on Linux is ESpeak, which 
will hopefully continue to be free, albeit of much lower quality.
    * The future, as ever, is unpredictable and new 
voices may well show up, especially those powered 
by on-device Large Language Models (LLMs).
    * On non-free platforms, there is usable TTS 
on the Mac, now supported by the new SwiftMac server for Emacspeak.


7. Emacspeak And Software Development

    * Magit as a Git porcelain is perhaps the 
biggest leap forward with respect to software development.
    * New completion frameworks such as company 
and consult come a close second in enhancing productivity.
    * Completion strategies such as fuzzy and 
flex provide enhanced completion.
    * 
<https://emacspeak.blogspot.com/2018/06/effective-suggest-and-complete-in-eyes.html>Effective 
Suggest And Complete In An Eyes-free Environment 
explains the higher-level concept involved in defining such strategies.
    * The ability to introspect code via eglot 
turns Emacs into a powerful and meaningful IDE; I 
say meaningful because this brings the best 
features of an integrated development environment 
while leaving behind the eye-candy that tends to bloat commercial IDEs.
    * Packages like transient enable 
discoverable, rapid keyboard access to complex nested-menu driven interfaces.
    * 
<https://emacspeak.blogspot.com/2023/09/emacs-ergonomics-dont-punish-your.html>Ergonomic 
keybindings under X, using 
<https://github.com/alols/xcape>xcape to minimize 
chording, have been a significant win in the last two years.
    * Jupyter is the generalization of IPython 
notebooks to Julia, Python and R. The news here 
isn’t all good; IPython notebooks are 
well-designed with respect to not getting locked 
into any given implementation. However, in 
practice, front-ends depend on Javascript in the browser.
    * Consequently, Emacs packages for IPython 
Notebooks, e.g., package ein, are no longer maintained.
    * Developing in higher-level languages 
continues to be very well supported in Emacspeak.
    * The re-emergence of Common Lisp in the last 
20 years, thanks to 
<https://asdf.common-lisp.dev/asdf.html>asdf and 
<https://www.quicklisp.org/>quicklisp as a 
network-aware package manager and build tool, has 
once again made Lisp development using Emacs Slime a productive experience.
    * In 2022, I updated 
<https://emacspeak.blogspot.com/2022/12/aster-spoken-math-on-emacspeak-audio_21.html>Audio 
System For Technical Readings (AsTeR), my PhD 
project from 1993, to run under SBCL with a 
freshly implemented Emacs front-end.
    * So now I can listen to Math content just as 
well as I could 30 years ago!


8. Emacspeak And Authoring Documents

    * Package org is to authoring as magit is to 
software development with respect to productivity gains.
    * Org has existed since circa 2006 in my 
Emacs setup; but it continues to give and give plentifully.
    * Where I once authored technical papers in 
LaTeX using auctex, used nxml for HTML, etc., I 
now mostly write everything in org-mode and 
export to the relevant target format.
    * Integrating various search engines in Emacs 
makes authoring content extremely productive.
    * Integrated access to spell-checking 
(flyspell), dictionaries, translation engines, and 
other language tools makes for a powerful authoring work-bench.
    * Extending org-mode with custom link types 
enables smart note taking with hyperlinks to 
relevant portions of an audio stream; see article 
<https://emacspeak.blogspot.com/2022/10/learn-smarter-by-taking-rich-hypertext.html>Learn 
Smarter By Taking Rich Hypertext Notes.


9. Emacspeak And The Web In 2024

    * Packages shr and eww arrived around 2014. 
But in 2024, they can be said to have truly landed.
    * 2014 also marked the explicit take-over of 
the stewardship of the HTML Web by the browser 
vendors from the W3C; I say explicit because the 
W3C had already thrown in the towel in the preceding decade.
    * This has led to a Web of content created 
using the assembly language of divs, spans and 
Javascript under the flag of HTML5; the result 
is a tangled web of spaghetti that everyone loves to hate.
    * In this context, see 
<https://idlewords.com/talks/website_obesity.htm>Tag 
Soup, Scripts And Obfuscation: How The Web Was 
Broken for a good overview of HTML’s obesity problem.
    * For better or worse, the investment in XML 
and display-independent content is now a complete 
write-off, at least on the surface.
    * So what next: wait for the spaghetti monster 
to show up for lunch? Humor aside, that monster 
may well be called AI, though whether today’s Web 
gives that monster life, indigestion, 
constipation, dysentery or hallucinations is a 
story to be written in the coming years.
    * I say on the surface above because the 
welcome re-emergence of ATOM and RSS feeds is 
perhaps a silent acknowledgement that bloated Web 
pages are now unusable even for users who can see.
    * Package elfeed has emerged as a powerful feed-manager for Emacs.
    * Emacspeak implements RSS and ATOM support 
using XSLT; those features now shine brighter 
with mainstream news sites reviving their support for content feeds.
    * Browsers like Mozilla now implement content 
filters, a euphemism for scraping off visual 
eye-candy and related cruft to reveal the 
underlying content. These are now available as 
plugins (see 
<https://github.com/eafer/rdrview>RDRView for an 
example). Emacspeak leverages this to make the Web more readable.
    * Package url-template and 
emacspeak-websearch continue to give in plenty, 
though they do require continuous updating.
    * Web APIs come and go, so that space is in a state of constant change.
    * The state of web applications is perhaps 
the most concerning from an Emacspeak 
perspective, and I do not see that changing in 
the short-term. There are no incentives for Web 
providers to free their applications from the 
tangled Web of spaghetti they have woven around themselves.
    * But as with everything else in our 
industry, it is precisely when something feels 
completely entrenched that users rebel and 
innovations emerge to move us to the next phase, so fingers crossed.
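
Feeds remain attractive for eyes-free use because an ATOM or RSS document is structured data: titles and links can be pulled out directly, with no page layout to strip away. The sketch below illustrates that point in Python using the standard library's ElementTree; it is not how Emacspeak does it (Emacspeak uses XSLT, as noted above), and the sample feed, its example.org URLs and the function name atom_entries are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The Atom namespace as defined by the Atom Syndication Format.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# A tiny hand-written feed used only as sample input.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <entry>
    <title>First post</title>
    <link href="https://example.org/first"/>
  </entry>
  <entry>
    <title>Second post</title>
    <link href="https://example.org/second"/>
  </entry>
</feed>
"""


def atom_entries(xml_text):
    """Return a list of (title, href) pairs, one per entry in an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href", "") if link is not None else ""
        entries.append((title.strip(), href))
    return entries


headlines = atom_entries(SAMPLE_FEED)
```

Each pair in headlines is ready to be spoken or turned into a link, which is the kind of display-independent content the bloated HTML Web makes hard to recover.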


10. Audio Formatting: Generalizing Aural CSS

    * Audio formatting with Aural CSS support is 
stable, with new enhancements supporting more TTS engines.
    * Support for parallel streams of TTS using 
separate outputs to left/right channels is a big 
win and enables more efficient interaction.
    * Support for various Digital Signal 
Processing (DSP) filters enables rich auditory 
effects like binaural audio and spatial audio.
    * 
<https://emacspeak.blogspot.com/2015/12/soundscapes-on-emacspeak-audio-desktop.html>Soundscapes 
implemented via package boodler make for a 
pleasant and relaxing auditory environment.
    * Enabling virtual sound devices via Pipewire 
for 5.1 and 7.1 spatial audio significantly enhances the auditory experience.


11. Conversational Gestures For The Audio Desktop

    * Parallel streams of audio, combined with 
more ergonomic keybindings, are the primary enhancements in this area.
    * Parallel streams of speech, e.g., a 
separate notification stream on the left or right 
ear, help increase the bandwidth of communication.
    * Notifications can thus be delivered without 
having to stop the primary speech output.


12. Accessing Media Streams

    * Emacspeak support for rich multimedia is now much more robust.
    * Emacs package empv is a powerful tool for 
locating, organizing and playing local and remote 
media streams ranging from music and audio books to radio stations and Podcasts.
    * This makes media streams from a large 
number of providers ranging from the BBC to 
Youtube available via a consistent keyboard interface.
    * This experience is augmented by a 
collection of smart content locators on the 
Emacspeak desktop; see the relevant blog article 
titled 
<https://emacspeak.blogspot.com/2024/03/updated-smart-media-selector-for-audio.html>smart 
media selectors.


13. Electronic Books: Ubiquitous Access To Books

    * Emacspeak modules for Epub and Bookshare 
continue to provide good books integration.
    * There are smart book locators analogous to 
the locators for media content.
    * Emacspeak speech-enables Calibre for 
working with local electronic libraries.


14. Leveraging Computational Tools: From SQL And R To IPython Notebooks

    * This area continues to provide a rich collection of packages.
    * Newer highlights include sage interaction for symbolic computation.
    * Emacspeak speech-enables packages gptel and 
ellama for working with local and network LLMs.


15. Social Web: Mail, Messaging And Blogging

    * This is a space that is definitely regressing.
    * The previous decade was marked by open APIs 
to many social Web platforms.
    * Over time these first regressed with respect to privacy.
    * Then they turned into walled gardens in their own right.
    * Finally, the Web APIs, other than the kind 
embedded in Javascript, have started disappearing.
    * Looking back, the only social platform I 
now use is Blogger for hosting my Emacspeak Blog; 
it has a somewhat usable API, albeit guarded by a 
difficult-to-use OAuth interface that requires 
signing in via a mainstream browser.
    * IMAP continues to survive as an open email 
protocol, though its days may well be numbered.
    * The die is already cast with respect to 
mere mortals being able to set up and host their 
own email; witness the complexity in setting up the 
Emacspeak mailing list in 2023 vs 1993!
    * This is an area that is likely to get worse 
before it gets better, thanks to the spammers; 
more’s the pity, since Internet Email is perhaps 
the single-most impactful technology with respect 
to leveling the communications playing field.
    * The disappearance of APIs mentioned above 
also means that today the only usable chat 
service on an open platform like Emacspeak is the 
venerable Internet Relay Chat (IRC).


16. The RESTful Web: Web Wizards And URL Templates For Faster Access

    * This area continues to thrive, either 
because of or despite the best and worst 
efforts of application providers on the Web.
    * Twenty years on (this feature originally 
landed in 2000), Emacspeak has a far richer 
collection of filters, preprocessors and 
post-processors that enable ever more powerful 
Web wizards. See the relevant 
<https://tvraman.github.io/emacspeak/manual/URL-Templates.html>chapter 
in the Emacspeak manual for the automatically updated list of URL Templates.
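
As a rough illustration of the URL template idea, the sketch below treats a template as a URL with named placeholders that are filled in with URL-encoded user input. This is a Python analogy only: the real templates are defined in Emacs Lisp by the url-template package and have their own format, which is not reproduced here. The function name expand_template and the example.org URL are hypothetical.

```python
from urllib.parse import quote_plus


def expand_template(template, **fields):
    """Fill named placeholders in a URL template with URL-encoded values."""
    encoded = {name: quote_plus(str(value)) for name, value in fields.items()}
    return template.format(**encoded)


# A made-up search service used only to show the shape of a template.
SEARCH_TEMPLATE = "https://example.org/search?q={query}&lang={lang}"

url = expand_template(SEARCH_TEMPLATE, query="speech odyssey", lang="en")
```

The value of the approach is that the user answers one or two prompts and lands directly on the content, bypassing the interactive front page of the site.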


17. Mashing It Up: Leveraging AI And The Web

    * Developing solutions by combining various 
API-based services on the Web has all but 
disappeared, unless one is willing to commit 
fully to the Javascript-powered Web hosted in a 
Web browser, something I hope I never have to accept.
    * So for now, I’ll keep well away and count my blessings.
    * The next chapter of the mash-up story may 
well be based around Generative AI using LLMs. In 
effect, LLMs trained on Web content define a 
platform for generating content mash-ups. The 
issue at present is that they are just as likely 
to produce meaningless mush, something that may 
get better as the field gets a handle on cleaning up Web content.
    * Notice that we are now back to the 
previously unsolved problem of cleaning up the 
HTML Web; with LLMs, we’ll just have an order of 
magnitude more documents than the 2^W postulated 
by 
<https://emacspeak.sourceforge.net/raman/publications/beyond-web20-cacm-2009/>Beyond 
Web 2.0, Communications Of The ACM, 2009.


18. The Final Word: Donald E Knuth (DEK)

    * The best theory is inspired by practice. 
The best practice is inspired by theory.
    * The enjoyment of one’s tools is an 
essential ingredient of successful work.
    * Easy things are often amusing and relaxing, 
but their value soon fades. Greater pleasure, 
deeper satisfaction, and higher wages are 
associated with genuine accomplishments, with the 
successful fulfillment of a challenging task.
    * 
<https://www.azquotes.com/author/8177-Donald_Knuth>Computer 
Programming Is An Art.

The best example of the above is of course 
<https://en.wikipedia.org/wiki/TeX>Knuth’s TeX 
work that was motivated by his own 
dissatisfaction with the tools available to him 
at the time for typesetting his magnum opus The 
Art Of Computer Programming (TAOCP). It is 
something I’ve looked up to ever since my time as 
a graduate student at Cornell.

The Emacspeak Speech Odyssey outlined in this 
paper is, in some small measure, my own personal 
experience of the sentiments he expresses.

T. V. Raman, San Jose, CA, August 1, 2024.


19. References

    * 
<https://www.drdobbs.com/user-interface-a-means-to-an-end/184410453>User 
Interface is a means to an end, DDJ 1997.
    * <https://www.gnu.org/s/emacs/>GNU Emacs
    * <https://en.wikipedia.org/wiki/TeX>Knuth’s TeX
    * 
<https://emacspeak.blogspot.com/2022/12/aster-spoken-math-on-emacspeak-audio_21.html>Audio 
System For Technical Readings
    * 
<https://tvraman.github.io/emacspeak//web/releases/release-3.0.html>Announcing 
Emacspeak: April 1995
    * 
<https://emacspeak.sourceforge.net/turning-twenty.html>Emacspeak At Twenty
    * 
<http://www.cs.washington.edu/htbin-post/mvis/mvis?ID=636>The 
Web, The Way You Want. Distinguished Lecture Series, UW Oct 2007
    * 
<https://emacspeak.sourceforge.net/raman/publications/specialized-browsers/>Specialized 
Browsers
    * 
<https://tvraman.github.io/emacspeak/web/01-gemini.ogg>An 
Ode To Emacspeak: The Best Is Yet To Come
    * <https://github.com/dectalk/dectalk.git>Software Dectalk on Github
    * 
<https://raw.githubusercontent.com/tvraman/emacspeak/master/servers/software-dtk/Readme.org>Dectalk 
setup instructions
    * 
<https://emacspeak.blogspot.com/2018/06/effective-suggest-and-complete-in-eyes.html>Effective 
Suggest And Complete In An Eyes-free Environment
    * <https://asdf.common-lisp.dev/asdf.html>Common Lisp: asdf
    * <https://www.quicklisp.org/>Common Lisp: Quicklisp
    * 
<https://emacspeak.blogspot.com/2015/12/soundscapes-on-emacspeak-audio-desktop.html>Soundscapes 
on the Emacspeak Audio Desktop
    * <https://en.wikipedia.org/wiki/REST>RESTful Web
    * 
<https://emacspeak.blogspot.com/2023/09/emacs-ergonomics-dont-punish-your.html>Ergonomic 
keybindings
    * <https://github.com/alols/xcape>Minimize chording with XCape
    * 
<https://emacspeak.blogspot.com/2022/10/learn-smarter-by-taking-rich-hypertext.html>Learn 
Smarter By Taking Rich Hypertext Notes
    * 
<https://idlewords.com/talks/website_obesity.htm>Tag 
Soup, Scripts And Obfuscation: How The Web Was Broken
    * <https://github.com/eafer/rdrview>Readable Web Pages: RDRView
    * 
<https://emacspeak.blogspot.com/2024/03/updated-smart-media-selector-for-audio.html>smart 
media selectors
    * 
<https://emacspeak.sourceforge.net/raman/publications/beyond-web20-cacm-2009/>Beyond 
Web 2.0, Communications Of The ACM, 2009
    * 
<https://tvraman.github.io/emacspeak/manual/URL-Templates.html>Emacspeak 
Manual: URL Templates
    * 
<http://emacspeak.blogspot.com/2007/07/emacspeak-and-beautiful-code.html>Beautiful 
Code: An overview of the Emacspeak architecture, O’Reilly Media, 2007.
    * 
<https://www-cs-faculty.stanford.edu/~knuth/taocp.html>The 
Art Of Computer Programming (TAOCP)

https://emacspeak.blogspot.com/2024/07/emacspeak-speech-odyssey.html


