[Nfbmt] Facebook automatically adding descriptive alt text to pictures

Bruce&Joy Breslauer breslauerj at gmail.com
Tue Apr 5 19:53:34 UTC 2016


This may interest some of you.  Joy  

 

Ask a member of Facebook’s growth team what feature played the biggest role
in getting the company to a billion daily users, and they’ll likely tell you
it was photos. The endless stream of pictures, which users have been able to
upload since 2005, a year after Facebook’s launch, makes the social network
irresistible to a global audience. It’s difficult to imagine Facebook without
photos. Yet for millions of blind and visually impaired people, that’s been
the reality for over a decade.

 

Not anymore. Today Facebook will begin automatically describing the content
of photos to blind and visually impaired users. Called "automatic alternative
text," the feature was created by Facebook’s 5-year-old accessibility team.
Led by Jeff Wieland, a former user researcher in Facebook’s product group,
the team previously built closed captioning for videos and implemented an
option to increase the default font size on Facebook for iOS, a feature 10
percent of Facebook users take advantage of.

 

Automatic alt text, which is coming to iOS today and later to Android and the
web, recognizes objects in photos using machine learning, a technique that
builds artificial intelligence by training algorithms to make predictions
from examples. If you show a piece of software enough pictures of a dog, for
example, in time it will learn to identify a dog in a photograph. Automatic
alt text identifies things in Facebook photos, then uses the iPhone’s
VoiceOver feature to read descriptions of the photos out loud to users. While
still in its early stages, the technology can reliably identify concepts in
categories including transportation ("car," "boat," "airplane"), nature
("snow," "ocean," "sunset"), sports ("basketball court"), and food ("sushi").
The technology can also describe people ("baby," "smiling," "beard"), and
identify a selfie.
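
Facebook has not published the code behind automatic alt text, but the flow
described here (detect concepts, keep the confident ones, hand a short phrase
to VoiceOver) can be sketched. In the illustration below, the Concept shape,
the composeAltText function, and the confidence numbers are all hypothetical;
the "Image may contain" phrasing and the 80 percent cutoff come from the
feature as shipped and from the thresholds discussed later in the piece.

```typescript
// Hypothetical shape of a single detection; Facebook has not published its API.
interface Concept {
  label: string;      // e.g. "pizza", "smiling", "outdoor"
  confidence: number; // model confidence, 0..1
}

// Keep only the concepts the model is sure about, then join them into the
// short phrase a screen reader such as VoiceOver speaks aloud.
function composeAltText(concepts: Concept[], threshold = 0.8): string {
  const kept = concepts
    .filter((c) => c.confidence >= threshold)
    .sort((a, b) => b.confidence - a.confidence)
    .map((c) => c.label);
  // When nothing clears the bar, fall back to a generic label rather than guess.
  return kept.length > 0 ? `Image may contain: ${kept.join(", ")}` : "Photo";
}

// The "Sunday night splurge" photo from the demo later in the article,
// with invented scores:
console.log(composeAltText([
  { label: "pizza", confidence: 0.93 },
  { label: "food", confidence: 0.88 },
  { label: "olives", confidence: 0.41 }, // below threshold, so it is dropped
]));
// -> "Image may contain: pizza, food"
```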

 

Last week, I traveled to Facebook’s accessibility lab in Menlo Park to see
the technology in action. Wieland was there, along with Matt King, a Facebook
engineer who is blind. King, who was born with limited sight and became blind
in college, has been advocating for more accessible computers since the
1980s. Today, he represents Facebook on a World Wide Web Consortium (W3C)
group responsible for the technical specifications that make web pages
accessible.

 

The primary way that blind people access the internet is through a screen
reader — software that describes the elements displayed on a screen (a link,
a button, some text, and so on) and makes it possible to interact with them.
The web has evolved over the years to be friendlier to blind people. For
example, the downward-facing triangle you see on every Facebook post, which
allows you to hide the post or report it as spam, gets described by the
screen reader not as a triangle but as "story options, collapsed pop-up
button." That way, blind users know they can interact with it.

 

But much of the web has long been out of reach for blind people. "You used to
hear file names, and you didn’t know if they were clickable," King says. "It
was a big Easter Egg hunt — and it wasn’t any fun at all. Even when I found
the eggs, a lot of the eggs were photos. People talk in pictures, and talking
in pictures is inherently out of reach for me." Facebook considered a range
of approaches to the problem. "We don’t want to add a lot of friction," King
says. "We could probably require people when they upload a photo: ‘please
describe this for blind people.’ It would drive people nuts — that would
never work at scale." (This is the actual approach Twitter is taking to the
problem, though adding descriptions is optional.)

 

Facebook’s scale is enormous: each day, users upload 2 billion photos across
Facebook, Instagram, Messenger, and WhatsApp. And so the accessibility team
turned to Facebook’s artificial intelligence division, which is building
software that recognizes images automatically. "We need a solution to that
problem if people who cannot see photos and understand what’s in them are
going to be part of the community and get the same enjoyment and benefit out
of the platform as the people who can," King says.

 

In a demonstration, King pulled up a few stories on Facebook that include
photos. He set the screen to black so we couldn’t see anything. If you’d like
to re-think everything you ever thought you knew about web design, watch a
blind person use the internet for five minutes. King normally has his screen
reader speak to him incredibly quickly — the slightest audio cues now orient
him on the page, reading Facebook posts out loud, identifying links, and
exposing various buttons. His fingers were a blur as he entered commands on a
standard MacBook Air. Save for the handful of words that described what we
were seeing on Facebook, I remained totally lost until King turned the screen
back on.

 

One Facebook post had a photo with the caption "Sunday night splurge," and
the description read aloud by the phone was "pizza, food." When King turned
the screen back on, there was a photo of a giant pepperoni pizza with olives.
Another photo had the caption "celebrations," and the phone described the
photo as "three people smiling outdoors." It turned out to be … three people
smiling outdoors. "Now I’m really understanding the essence of the story,"
King says. "Sometimes it’s just really amazing what one word can do."

 

Facebook is not alone in using machine learning to understand photos; it’s
one of a few things artificial intelligence can currently do with any level
of sophistication. Similar technology powers keyword searches in Google
Photos and Flickr. But the technology is still prone to errors, and millions
of objects have yet to be parsed. Last year, Google was forced to apologize
after Photos tagged two black people as "gorillas."

 

By default, Facebook will only suggest a description for a photo if it is 80
confident that it knows what it’s looking at. But in sensitive cases —
including ones involving race, the company told me — it will require a much
higher level of confidence before offering a suggestion. When it isn’t
confident, Facebook simply won’t suggest a description. "In some cases, no
data is better than bad data," Wieland says.
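
The article gives the 80 percent default but not the stricter cutoff or the
exact list of sensitive categories, so both are placeholders in this sketch
of the tiered policy.

```typescript
// Tiered confidence policy: 0.8 is the default the article reports; the
// sensitive-category list and its stricter cutoff are illustrative guesses.
const SENSITIVE_CATEGORIES = new Set(["race", "ethnicity", "religion"]);
const DEFAULT_THRESHOLD = 0.8;
const SENSITIVE_THRESHOLD = 0.98; // "much higher"; the real value is unpublished

function shouldDescribe(category: string, confidence: number): boolean {
  const cutoff = SENSITIVE_CATEGORIES.has(category)
    ? SENSITIVE_THRESHOLD
    : DEFAULT_THRESHOLD;
  // Below the cutoff the description is withheld entirely:
  // "no data is better than bad data."
  return confidence >= cutoff;
}

shouldDescribe("food", 0.85); // true: clears the 0.8 default
shouldDescribe("race", 0.85); // false: sensitive concepts need far more certainty
```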

 

It’s a cliché for tech companies to describe a project as "just the
beginning," but in this case it feels particularly true. Today it only works
on one platform, and only in English. There are still millions of objects
that Facebook can’t recognize with 80 percent confidence. ("Pizza" it knows.
"Pepperoni pizza with olives" is still a ways away.) But the team is already
pushing hard on two new tools: recognizing objects in videos, a technology it
first demonstrated in November; and something it calls "visual Q&A," which
will allow users to ask questions about pictures and receive an answer from
Facebook’s AI. You might ask who is in a photo, for example, and it would
tell you the names of the Facebook friends who appear in it.

 

At this stage, automatic alt text is a fascinating demonstration of
technology. But at scale, it could also represent a growth opportunity —
people with disabilities have been less likely to use Facebook on average,
for obvious reasons. "Inclusion is really powerful and exclusion is really
painful," King says. "The impact of doing something like this is really
telling people who are blind, your ability to participate in the social
conversation that’s going on around the world is really important to us. It’s
saying as a person, you matter, and we care about you. We want to include
everybody — and we’ll do what it takes to include everybody."

 



