Google has launched Auto-Caption, a new service for YouTube, its video website. The service converts speech in videos into readable text, making them accessible to deaf viewers and to those who do not speak English.
As is its usual style, Google has also added a translation option for the auto-captioned text. Although not perfect, the results are surprisingly good. I tried it myself on Apple CEO Steve Jobs's iPad keynote address, and the accuracy was acceptable, though with some funny mix-ups in places.
Such mistakes have started spreading on social media under the label 'YouTube caption fail', as users come across funny lines that do not carry over well into written text.
Using the same technology, you can now upload your manually created captions along with the video. With the auto-timing feature, YouTube automatically matches each piece of text to the right moment in the video, so you no longer have to assign time codes yourself, as you had to before.
You can export the text with the automatically created timings for use in third-party software, which was impossible in the past. However, this process may take some time, depending on the duration of the video.
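To give an idea of what working with exported timings might look like in third-party software, here is a minimal Python sketch that reads timed captions. It assumes the export uses the common SubRip (.srt) layout, which may not match what YouTube actually produces; the sample captions and the `to_delta` and `parse_srt` helpers are made up for illustration.

```python
import re
from datetime import timedelta

# Hypothetical sample in SubRip (.srt) layout; the real export may differ.
SAMPLE = """\
1
00:00:01,000 --> 00:00:03,500
Good morning and thank you for coming.

2
00:00:03,500 --> 00:00:06,250
Today we have something new to show you.
"""

TIME_RE = re.compile(r"(\d+):(\d+):(\d+),(\d+)")

def to_delta(stamp):
    """Convert an 'HH:MM:SS,mmm' stamp into a timedelta."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(stamp.strip()).groups())
    return timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms)

def parse_srt(text):
    """Return a list of (start, end, caption) tuples from SubRip text."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip blocks without an index, a timing line, and text
        start, end = lines[1].split("-->")
        cues.append((to_delta(start), to_delta(end), " ".join(lines[2:])))
    return cues

# Print each caption with its start time in seconds.
for start, end, caption in parse_srt(SAMPLE):
    print(f"{start.total_seconds():6.2f}s  {caption}")
```

Once the captions are in this form, another program can shift, merge, or translate them without touching the video itself.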
The Google speech technology team faced some challenges when developing this service. They were planning to apply the technology to all videos in all languages, but the huge vocabulary, background noise, low-quality recordings, different accents and the difficulty of telling speech from singing were all obstacles, so for now it is available only for English videos.
The added value is that all this occurs within the cloud. The text is not permanently merged into the video, as translated subtitles were in VCD movies, so the service can be improved at any time without redoing what has already been done.
If Google enables this service for more languages in the future, it can be applied instantly. This is useful not only for YouTube users but for Google’s core search as well, where you can now search for what was said in a video.
For example, suppose you remember a phrase spoken in a movie but not which movie it was. Searching the auto-captioned text of videos is the solution!
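The idea can be sketched in a few lines of Python. This is only an illustration of searching caption text, not a description of how Google's search works; the video titles, the captions and the `find_phrase` helper are all invented for the example.

```python
# Hypothetical caption data: video title -> list of (start_seconds, text).
CAPTIONS = {
    "Movie A": [(12.0, "I will be right back."), (95.5, "Nobody saw it coming.")],
    "Movie B": [(40.0, "We are going to need a bigger plan."), (61.2, "Right on time.")],
}

def find_phrase(captions, phrase):
    """Return (title, start_seconds, text) for each caption containing phrase."""
    needle = phrase.casefold()  # case-insensitive comparison
    hits = []
    for title, cues in captions.items():
        for start, text in cues:
            if needle in text.casefold():
                hits.append((title, start, text))
    return hits

# Look up a half-remembered phrase and report where it was spoken.
for title, start, text in find_phrase(CAPTIONS, "bigger plan"):
    print(f"{title} at {start:.1f}s: {text}")
```

Because each caption carries its start time, a match tells you not only which video contains the phrase but also where in the video to jump.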
Large universities like Stanford, Berkeley and Columbia are already partnering with Google to make the best use of this service and give students full access to the content of their lectures.
Though some may not see the importance of this service, it means a wider audience for content creators without any additional effort. Anyone who edits video also knows how hard it is to add subtitles and modify text in their own video files.
What I like about this range of services is how Google has built on its existing services to produce a new one with ease: Auto-Caption uses the same voice recognition algorithms as Google Voice, and the text translation comes from Google Translate.
They did not invent anything from scratch; they simply made use of what already existed. All they did was assemble these services on their video website, just as Android, their mobile phone operating system, is built entirely around the various Google Apps (and I do not think anyone would pay it much attention if they were not there).
While the Apps exist on other phones too, on Android they are built in and work well; as soon as you log on, the device prompts you to connect your Google account.
I am sure that other video websites will also provide such features eventually, but I wonder if they will use some Google services such as translation. Or will they create something from scratch? That is the challenge.