www.jlion.com

Monday, April 24, 2006

I'm starting to become more comfortable with Speech Server and to have a better understanding of how it works. The part of Speech Server that I struggled with most was the conceptual model. Speech Server is actually a javascript (client-side) object, so much of the logic of a Speech Server application is written in javascript. Then there's the "semantic item," which at this point I envision as a sort of holding area for variable data, and as a great way for the server side of an application (database access, etc.) to communicate with the client side and Speech Server.

Speech Server requires a grammar for recognizing speech. A grammar is the template that speech must match up against to be recognized, and the fact that such a template exists helps the speech recognition be more accurate by narrowing down the possibilities. For example, if the grammar specifies that a word must be either donut, muffin or apple, then Speech Server knows that the user didn't say "donate" -- more likely that he or she said "donut".
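The donut/muffin/apple example could be written as a W3C SRGS grammar, which is the XML format Speech Server grammars use. This is just a minimal sketch; the rule name "food" is my own:

```xml
<grammar version="1.0" xml:lang="en-US" mode="voice" root="food"
         xmlns="http://www.w3.org/2001/06/grammar">
  <!-- The recognizer will only match one of these three words,
       which is what rules out near-misses like "donate". -->
  <rule id="food" scope="public">
    <one-of>
      <item>donut</item>
      <item>muffin</item>
      <item>apple</item>
    </one-of>
  </rule>
</grammar>
```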

Here are a few tricks that I've discovered:

1) If you want the speech engine to speak a number digit by digit (1910 as "one nine one zero" instead of "one thousand nine hundred ten"), use this javascript: sMyNumber.split("").join(",");
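Wrapped up as a function (the name spellOutDigits is my own), the trick looks like this:

```javascript
// Insert commas between the digits so the TTS engine reads each
// digit individually: "1910" becomes "1,9,1,0".
function spellOutDigits(sNumber) {
    return sNumber.split("").join(",");
}
```

Feeding "1,9,1,0" to the synthesizer comes out as "one, nine, one, zero" rather than "one thousand nine hundred ten".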

2) To bypass a prompt database (prerecorded speech) and force use of the speech synthesizer, use the <peml:tts> tag. For example, if the prompt is: "You have selected <peml:tts>Cantelever Street</peml:tts>. Is that correct?"

3) To react differently the second (or third) time through a prompt, you can use a javascript function like the following:

var miRepeat = 0;
function GetTimesThrough()
{
    var iReturnValue = miRepeat;
    miRepeat++;
    return iReturnValue;
}

The function is then referenced from a prompt function. Note that you may have to do something different if any of the prompts have autopostback checked.
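As a sketch of how the counter might drive a prompt function (the prompt wording and the name PromptForFlavor are my own illustration):

```javascript
var miRepeat = 0;
function GetTimesThrough()
{
    var iReturnValue = miRepeat;
    miRepeat++;
    return iReturnValue;
}

// Returns the full prompt the first time through, and progressively
// shorter re-prompts on later passes.
function PromptForFlavor()
{
    switch (GetTimesThrough())
    {
        case 0:
            return "Would you like a donut, a muffin, or an apple?";
        case 1:
            return "Sorry, I didn't catch that. Donut, muffin, or apple?";
        default:
            return "Please say donut, muffin, or apple.";
    }
}
```

Each call to PromptForFlavor advances the counter, so the caller (the prompt function hookup) doesn't need to track the pass number itself.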

4) The Speech ListSelector control seems like a great way to let users select something from a list that is compiled at runtime, since the ListSelector can be bound to an array. I have not yet tested this in production, but in development it seems to work just fine.

5) Making outbound calls requires that a message queue be set up. The TAS then has to be told to listen to the message queue. The message should contain the URL of the speech app to be executed. It's up to the speech app to get the target telephone number and any other necessary info from the query string of the URL.

6) I'm working with an Intel board, and in order to get outbound calling to work I had to configure the CAM to dedicate two of the four lines to outbound calling.

Wednesday, April 12, 2006

For the last few days I've been trying to get the Microsoft Speech Server SDK 1.1 to work on a PC that had a bunch of software installed, including both VS2003 and VS2005. I was getting error messages like "microphone wizard failed to initialize" and "The speech recognition stream was terminated" from the speech control panel.

This mystified me, and I made the bad assumption that DotNet 2.0 was the problem, based on previous issues I'd had with Reporting Services 2003 vs 2005. Working from that assumption, I set up a virtual PC with just XP SP2, VS2003 and the SDK -- lo and behold: it worked! This seemed to confirm my assumption, so I set about duplicating that configuration on the PC I was using by installing a second instance of XP that I planned to use just for Speech Server development.

I had XP installed and was just at the point where I needed to join the network domain when the boss wandered over to chat. I described the error and went to demonstrate it by bringing up the speech control panel, and that's when I noticed an "add" button in the profile area. I knew the configure microphone button wouldn't work -- and it didn't, which I was able to demo for the boss -- but when I clicked the add profile button, it did work: the speech engine started up and allowed itself to be trained.

Once I trained the speech engine using this add profile button, the configure microphone button started working, as did the Speech SDK speech debugger. Odd that the default profile didn't work on that PC but did in the virtual PC... and I've just duplicated this behavior on a laptop that I have at home: error messages prior to adding a profile, then once a new profile has been added and trained, the SDK seems to work fine.