Wednesday, June 11, 2014

The Web Speech API and WebSphere Portal 8

Speech recognition software has come a long way. Companies like AT&T Bell Labs, IBM, and Nuance Communications have long been leaders in the field, and commercial products like IBM ViaVoice and Dragon NaturallySpeaking were game changers in the speech recognition industry. The backbone technologies of speech software (e.g., Hidden Markov Models, noise filtering, acoustic signal processing) were originally developed decades ago, but they are finding new applications today in products and services like gaming consoles, smartphones, customer help centers, and infotainment systems. Speech recognition is becoming an important part of the way we live, with applications crossing multiple industries, from enforcing traffic laws to educational training and medical transcription. The pervasiveness of speech-enabled products and services will likely lead to further innovation and refinement. This article shows a rudimentary way to bring speech recognition to your web experience using a few lines of JavaScript, CSS, and HTML code in the WebSphere Portal 8 environment.

By leveraging the JavaScript API defined in the Web Speech API specification, we can tap into the browser’s audio stream to transcribe speech to text. Currently, the only browser that supports the Web Speech API specification is Google Chrome. As noted in the cited W3C document, the Web Speech API specification is not a World Wide Web Consortium (W3C) standard, nor is it on track to become one.
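Because support is limited to Chrome, it is worth feature-detecting the API before wiring anything up. A minimal sketch is shown below; the `speechRecognitionSupported` helper is a name introduced here for illustration, not part of the specification:

```javascript
// Returns true when the prefixed Web Speech API constructor is available
// on the given global object (window in a browser).
function speechRecognitionSupported(globalObj) {
  return !!globalObj && 'webkitSpeechRecognition' in globalObj;
}

// In a browser, guard all speech wiring behind the check:
if (typeof window !== 'undefined' && speechRecognitionSupported(window)) {
  // Safe to construct a recognition instance here.
  var recognition = new window.webkitSpeechRecognition();
}
```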

To enable speech recognition in the browser, there are a handful of events and attributes we need to handle and define. The start, end, result, and error events provide the triggering points needed to initiate and terminate the speech recognition feature on the web page. There are also a few attributes we need to set to assist in the transcription process.

Explanation of Core JavaScript Functions and Attributes
For this implementation, the anonymous functions correlating to the start, end, result, and error events defined in the Web Speech API specification are described below. The start and end events are triggered by button clicks; the end event can also be triggered by non-detected speech or by an error reported by the speech recognition service (in our case, the service hosted by Google).
In the example code below, we adapted the W3C Web Speech API specification sample code as our foundation:
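A rough sketch of those handlers, loosely adapted from that sample code, follows. The `collectTranscripts` helper and the handler comments are illustrative names introduced here; the element ids match the search.jsp snippet later in this article:

```javascript
// Accumulate final and interim transcripts from a recognition result event,
// the way the onresult handler in the W3C sample code does.
function collectTranscripts(event) {
  var finalText = '';
  var interimText = '';
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var transcript = event.results[i][0].transcript;
    if (event.results[i].isFinal) {
      finalText += transcript;
    } else {
      interimText += transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// Wire the four events onto a recognition instance (Chrome only).
if (typeof webkitSpeechRecognition !== 'undefined') {
  var recognition = new webkitSpeechRecognition();
  recognition.onstart = function () {
    // e.g. swap the microphone image to its animated "on" state
  };
  recognition.onerror = function (event) {
    // e.g. show the disabled microphone image; event.error describes the cause
  };
  recognition.onend = function () {
    // e.g. restore the microphone "off" image
  };
  recognition.onresult = function (event) {
    var t = collectTranscripts(event);
    document.getElementById('final_span').innerHTML = t.finalText;
    document.getElementById('interim_span').innerHTML = t.interimText;
  };
}
```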

The SpeechRecognition (e.g. webkitSpeechRecognition) JavaScript object attributes utilized in this example are defined below:
  • continuous: Controls whether the service returns a single result and stops (discrete) or keeps transcribing (continuous)
  • interimResults: Indicates whether interim (not yet final) transcripts should be returned
  • lang: Defines the language of the recognition (e.g. "en-US")

Please reference the Web Speech API specification for a complete list of method, attribute, and event definitions.
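For instance, the three attributes above could be set as follows; the `configureRecognition` wrapper is a name introduced here for illustration:

```javascript
// Apply the three attributes used in this example to a recognition instance.
function configureRecognition(recognition, locale) {
  recognition.continuous = true;      // keep transcribing across pauses
  recognition.interimResults = true;  // deliver partial transcripts as you speak
  recognition.lang = locale;          // e.g. "en-US", matching the "locale" parameter
  return recognition;
}

if (typeof webkitSpeechRecognition !== 'undefined') {
  var recognition = configureRecognition(new webkitSpeechRecognition(), 'en-US');
}
```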

Enabling Speech Recognition in WebSphere Portal 8
Copy the images, CSS, and JavaScript files from the “SpeechAPI.zip” file to its corresponding Portal theme static resource folders (e.g. <Theme>/images, <Theme>/css, <Theme>/js).


For a quick test, edit the “search.jsp” file from “<WPS_HOME>\PortalServer\theme\wp.theme.modules\webapp\installedApps\ThemeModules.ear\ThemeModules.war\themes\html\dynamicSpots\modules\search\”:
Include the following JSTL variable declarations at the top of the “search.jsp” file:
<!-- START: Speech Recognition JSTL variables -->
<c:set var="sr_basePath" value="/wps/mycontenthandler/dav/fs-type1/themes/portal8WebSpeechTheme"/>
<c:set var="sr_imgPath" value="${sr_basePath}/images"/>
<c:set var="sr_cssPath" value="${sr_basePath}/css"/>
<c:set var="sr_jsPath" value="${sr_basePath}/js"/>
<!-- END: Speech Recognition JSTL variables -->  

Include the following HTML input element after the existing input element (id="wpthemeSearchBoxInput") in the “search.jsp” file:
<input class="wpthemeSearchText" id="sr_microphone_button" type="button" title="Click to start speaking" alt="Microphone Off" style="width: 22px; height: 22px; vertical-align: middle; background-image: url('${sr_imgPath}/microphoneOff_22pxs.png');" onclick="WebSpeechHelper.prototype.toggleStartStop(event, stateInfo)">

Include the following HTML snippet right after the last <div> in the “search.jsp” file:
<!-- START: Speech Recognition HTML -->
<div id="sr_webSpeechAPIContainer">
<!-- Pull in Speech Recognition Resources -->
<LINK rel="stylesheet" type="text/css" href="${sr_cssPath}/speechRecognition.css">
<SCRIPT src="${sr_jsPath}/speechRecognition.js"></SCRIPT>
<div id="sr_results" class="sr_results">
<span id="final_span"></span> 
<span id="interim_span"></span>
</div>
<SCRIPT>
// Configuration for the speech recognition helper
var jsonObj = {
    "parameters": [{
        "locale": "en-US",
        "imagePath": "${sr_imgPath}",
        "microphoneOnImage": "microphoneOnAnimated_22pxs.gif",
        "microphoneOffImage": "microphoneOff_22pxs.png",
        "microphoneDisabledImage": "microphoneDisabled_22pxs.png",
        "microPhoneButtonID": "sr_microphone_button",
        "finalSpanID": "final_span",
        "interimSpanID": "interim_span",
        "searchBoxID": "wpthemeSearchBoxInput"
    }]
};
var stateInfo = new SpeechStateInformation(jsonObj);
WebSpeechHelper.prototype.initializeVoiceRecognition(stateInfo);
</SCRIPT>
</div>
<!-- END: Speech Recognition HTML -->


*** Please reference the “search.jsp” file, included in this blog, to verify that the code-placement is correct. ***

After modifying “search.jsp”, log into WebSphere Portal with Google Chrome (version 25 or above) to see the microphone image appear to the left of the search icon:



When you click the microphone image, the browser asks you to “allow” microphone usage, and the microphone image animates while it is enabled. In this example, I searched for “web content manager”. (Note: click the native search button after you’re finished speaking.)




Summary
We hope this article has helped introduce the possibilities of adding speech capabilities to the user experience in WebSphere Portal 8 and beyond. The referenced example and resources should help you get started exploring this emerging trend of voice-enabled input in the digital experience. Many possibilities exist for reacting to and enabling voice input and interactions within the IBM Digital Experience platform, and we look forward to increasing browser support and adoption in the near future. The solution presented here could also be abstracted into a portal theme module for reuse with the modular theme framework. To keep the voice API integration easy to follow, we have not packaged the examples as Portal 8 modular theme contributions as we normally would; a production-ready solution would include the modular theme contribution configuration so that resource aggregation and minification are addressed and conform to Portal 8 best practices.



Richard Yu is a Senior Consultant at Prolifics with over 11 years of experience in application design, development, and migration. He has worked on projects of varying complexity and magnitude in healthcare, manufacturing, and government. He holds two Master's degrees from NYU Polytechnic School of Engineering and a Bachelor's from Stony Brook University.