<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body dir="auto">Anne Taylor is a senior project manager at Microsoft and a former NFBK member. She also worked for the NFB in Baltimore. <div></div><div><br></div><div><h1 class="title" style="line-height: 1.4em; -webkit-hyphens: manual; margin-bottom: 1em; max-width: 100%;"><span style="font-size: 28px; background-color: rgba(255, 255, 255, 0);">Decades of computer vision research, one ‘Swiss Army knife’ - Next at Microsoft</span></h1><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">When Anne Taylor walks into a room, she wants to know the same things that any person would.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">Where is there an empty seat? Who is walking up to me, and is that person smiling or frowning? What does that sign say?</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">For Taylor, who is blind, there aren’t always easy ways to get this information. Perhaps another person can direct her to her seat, describe her surroundings or make an introduction.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">There are apps and tools available to help visually impaired people, she said, but they often only serve one limited function and they aren’t always easy to use. 
It’s also possible to ask other people for help, but most people prefer to navigate the world as independently as possible.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">That’s why, when Taylor arrived at Microsoft about a year ago, she immediately got interested in working with a group of researchers and engineers on a project that she affectionately calls a potential “Swiss Army knife” of tools for visually impaired people.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">“I said, ‘Let’s do something that really matters to the blind community,’” said Taylor, a senior project manager who works on ways to make Microsoft products more accessible. “Let’s find a solution for a scenario that really matters.”</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">That project is <a href="https://youtu.be/3WP7Id8SxYQ" style="text-decoration: none; max-width: 100%;">Seeing AI</a>, a research project that uses computer vision and natural language processing to describe a person’s surroundings, read text, answer questions and even identify emotions on people’s faces. Seeing AI, which can be used as a cell phone app or via smart glasses from <a href="http://www.pivothead.com/" style="text-decoration: none; max-width: 100%;">Pivothead</a>, made its public debut at the company’s <a href="http://build.microsoft.com/" style="text-decoration: none; max-width: 100%;">Build conference</a> this week. 
It does not currently have a release date.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">Taylor said Seeing AI provides another layer of information for people who also are using mobility aids such as white canes and guide dogs.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">“This app will help level the playing field,” Taylor said.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">At the same conference, Microsoft also unveiled <a href="https://www.captionbot.ai/" style="text-decoration: none; max-width: 100%;">CaptionBot,</a> a demonstration site that can take any image and provide a detailed description of it.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);"><strong style="max-width: 100%;">Very deep neural networks, natural language processing and more<br style="max-width: 100%;"></strong>Seeing AI and CaptionBot represent the latest advances in this type of technology, but they are built on decades of cutting-edge research in fields including computer vision, image recognition, natural language processing and machine learning.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">In recent years, a spate of breakthroughs has allowed computer vision researchers to do things they might not have thought possible even a few years before.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">“Some people 
would describe it as a miracle,” said <a href="http://research.microsoft.com/en-us/people/xiaohe/" style="text-decoration: none; max-width: 100%;">Xiaodong He</a>, a senior Microsoft researcher who is leading the image captioning effort that is part of <a href="https://www.microsoft.com/cognitive-services" style="text-decoration: none; max-width: 100%;">Microsoft Cognitive Services</a>. “The intelligence we can say we have developed today is so much better than six years ago.”</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">The field is moving so fast that it’s <a href="http://research.microsoft.com/pubs/264408/ImageCaptionInWild.pdf" style="text-decoration: none; max-width: 100%;">substantially better </a>than even six months ago, he said. For example, <a href="http://research.microsoft.com/people/ktran/" style="text-decoration: none; max-width: 100%;">Kenneth Tran</a>, a senior research engineer on his team who is leading the development effort, recently figured out a way to make the image captioning system more than 20 times faster, allowing people who use tools like Seeing AI to get the information they need much more quickly.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">A major a-ha moment came a few years ago, when researchers hit on the idea of using deep neural networks, which roughly mimic the biological processes of the human brain, for machine learning.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">Machine learning is the general term for a process in which systems get better at doing something as they are given more training data about that task. 
For example, a computer scientist who wants to build an app that helps bicyclists recognize when cars are coming up behind them would feed the computer many pictures of cars, so that the app learns to recognize the difference between a car and, say, a sign or a tree.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">Computer scientists had used neural networks before, but not in this way, and the new approach resulted in big leaps in computer vision accuracy.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">Several months ago, Microsoft researchers <a href="http://research.microsoft.com/en-us/people/jiansun/" style="text-decoration: none; max-width: 100%;">Jian Sun</a> and <a href="http://research.microsoft.com/en-us/um/people/kahe/" style="text-decoration: none; max-width: 100%;">Kaiming He</a> made another big leap when they unveiled a new system that uses very deep neural networks – called <a href="http://arxiv.org/abs/1512.03385" style="text-decoration: none; max-width: 100%;">residual neural networks</a> – to correctly identify photos. The <a href="http://blogs.microsoft.com/next/2015/12/10/microsoft-researchers-win-imagenet-computer-vision-challenge/" style="text-decoration: none; max-width: 100%;">new approach</a> to recognizing images resulted in huge improvements in accuracy. 
The researchers shocked the academic community and won two major contests, the <a href="http://www.image-net.org/" style="text-decoration: none; max-width: 100%;">ImageNet</a> and <a href="http://mscoco.org/home/" style="text-decoration: none; max-width: 100%;">Microsoft Common Objects in Context</a> challenges.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);"><strong style="max-width: 100%;">Tools to recognize and accurately describe images<br style="max-width: 100%;"></strong>That approach is now being used by Microsoft researchers who are working on ways to not just recognize images but also write captions about them. This research, which combines image recognition with natural language processing, can help people who are visually impaired get an accurate description of an image. It also has applications for people who need information about an image but can’t look at it, such as when they are driving.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">The image captioning work also has received <a href="https://blogs.technet.microsoft.com/inside_microsoft_research/2015/06/11/microsoft-researchers-tie-for-best-image-captioning-technology/" style="text-decoration: none; max-width: 100%;">accolades for its accuracy</a> as compared to other research projects, and it is the basis for the capabilities in Seeing AI and CaptionBot. 
Now, the researchers are working on expanding the training set so it can give users a deeper sense of the world around them.</span></p><div class="clear" style="max-width: 100%; clear: both;"><a href="https://mscorpmedia.azureedge.net/mscorpmedia/2016/03/FSPB4720.jpg" style="text-decoration: none; max-width: 100%; background-color: rgba(255, 255, 255, 0);"><font color="#000000"><img src="https://mscorpmedia.azureedge.net/mscorpmedia/2016/03/FSPB4720-1024x634.jpg" alt="Margaret Mitchell" width="643" height="398" scale="0" class="extendsBeyondTextColumn" style="max-width: none; margin: 0.5em auto 0.5em -20px; display: block; height: auto; width: 375px;"></font></a><p style="max-width: 100%; font-style: italic;"><span style="background-color: rgba(255, 255, 255, 0);">Margaret Mitchell</span></p></div><p style="max-width: 100%;"><font color="#000000"><span style="background-color: rgba(255, 255, 255, 0);"><a href="http://m-mitchell.com/" style="text-decoration: none; max-width: 100%;">Margaret Mitchell</a>, a Microsoft researcher who specializes in natural language processing and has been one of the industry’s leading researchers on image captioning, said she and her colleagues also are looking at ways a computer can describe an image in a more human way.</span></font></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">For example, while a computer might accurately describe a scene as “a group of people that are sitting next to each other,” a person may say that it’s “a group of people having a good time.” The challenge is to help the technology understand what a person would think was <a href="http://arxiv.org/abs/1512.06974" style="text-decoration: none; max-width: 100%;">most important, and worth saying</a>, about the picture.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">“There’s a separation between what’s in an image and what we say about the image,” said Mitchell, who also is one 
of the leads on the Seeing AI project.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">Other Microsoft researchers are developing ways that the latest image recognition tools can provide more thorough explanations of pictures. For example, instead of just describing an image as “a man and a woman sitting next to each other,” it would be more helpful for the technology to say, “Barack Obama and Hillary Clinton are posing for a picture.”</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">That’s where <a href="http://research.microsoft.com/en-us/people/leizhang/" style="text-decoration: none; max-width: 100%;">Lei Zhang</a> comes in.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">When you search the Internet for an image today, chances are high that the search engine is relying on text associated with that image to return a picture of Kim Kardashian or Taylor Swift.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">Zhang, a senior researcher at Microsoft, is working with researchers including Yandong Guo on a system that uses machine learning to identify celebrities, politicians and public figures based on the elements of the image rather than the text associated with it.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">Zhang’s research will be included in the latest vision tools that are part of <a href="https://www.microsoft.com/cognitive-services" style="text-decoration: none; max-width: 100%;">Microsoft Cognitive Services</a>. That’s a set of tools based on Microsoft’s cutting-edge machine learning research, which developers can use to build apps and services that do things like recognize faces, identify emotions and distinguish various voices. 
Those tools also have provided the technical basis for Microsoft showcase apps and demonstration websites such as <a href="http://how-old.net/" style="text-decoration: none; max-width: 100%;">how-old.net</a>, which guesses a person’s age, and <a href="http://news.microsoft.com/features/fetch-new-microsoft-garage-app-uses-artificial-intelligence-to-name-that-breed/" style="text-decoration: none; max-width: 100%;">Fetch</a>, which can identify a dog’s breed.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">Microsoft Cognitive Services is an example of what is becoming a more common phenomenon – the lightning-fast transfer of the latest research advances into products that people can actually use. The engineers who work on Microsoft Cognitive Services say their job is a bit like solving a puzzle, and the pieces are the latest research.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">“All these pieces come together and we need to figure out, how do we present those to an end user?” said Chris Buehler, a software engineering manager who works on Microsoft Cognitive Services.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);"><strong style="max-width: 100%;">From research project to helpful product</strong><br style="max-width: 100%;">Seeing AI, the research project that could eventually help visually impaired people, is another example of how fast research can become a really helpful tool. 
It was conceived at last year’s <a href="http://blogs.microsoft.com/firehose/2015/07/27/oneweek-hackathon-2015-heard-around-the-world/" style="text-decoration: none; max-width: 100%;">//oneweek Hackathon</a>, an event in which Microsoft employees from across the company work together to try to make a crazy idea become a reality.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">The group that built Seeing AI included researchers and engineers from all over the world who were attracted to the project because of the technological challenges and, in many cases, also because they had a personal reason for wanting to help visually impaired people operate more independently.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">“We basically had this super team of different people from different backgrounds, working to come up with what was needed,” said Anirudh Koul, who has been a lead on the Seeing AI project since its inception and became interested in it because his grandfather is losing his ability to see.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">For Taylor, who joined Microsoft to represent the needs of blind people, it was a great experience that also resulted in a potential product that could make a real difference in people’s lives.</span></p><p style="max-width: 100%;"><span style="background-color: rgba(255, 255, 255, 0);">“We were able to come up with this one Swiss Army knife that is so valuable,” she said. 
</span></p><p style="max-width: 100%;">This article is online at: </p><p style="max-width: 100%;"><a href="http://blogs.microsoft.com/next/2016/03/30/decades-of-computer-vision-research-one-swiss-army-knife/#sm.00002h8xm51d70fekv7feuqdwllvq">http://blogs.microsoft.com/next/2016/03/30/decades-of-computer-vision-research-one-swiss-army-knife/#sm.00002h8xm51d70fekv7feuqdwllvq</a></p><p style="max-width: 100%;"><br></p></div></body></html>