Building Alexa Skills (some small challenges I hit)

Now that version one of Math Mania is out in the wild, I thought it would be good to share a few more details about the tech behind it, and more specifically the technical challenges I ran into while building it.

So first, a little high-level technical background on developing an Alexa Skill, for those of you who haven’t yet read the documentation.

Basically, developing Alexa Skills requires three core steps.

1. Define your intents. These are the things your skill is going to look for/handle, along with high-level details about what parameters each one will be looking for. An example:

{
  "intents": [
    {
      "intent": "AnswerIntent",
      "slots": [
        {
          "name": "Direction",
          "type": "direction"
        },
        {
          "name": "Answer",
          "type": "NUMBER"
        }
      ]
    }
  ]
}

2. Define your example phrases. Examples of all the different phrases someone might use to trigger a given intent. An example:

AnswerIntent ah {one|Answer}

3. Provide your custom code (the code that gets executed when a given intent is triggered by your skill).
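To make step three a little more concrete, here's a minimal sketch of what the routing side of that custom code can look like, written in TypeScript for a Node-style Lambda. This isn't my actual code: the request/response shapes are simplified from the Alexa format, and the welcome/fallback wording is made up.

// Minimal sketch of the custom-code side as a Node-style AWS Lambda handler.
// Simplified: a real handler also receives a context object and should handle
// LaunchRequest/SessionEndedRequest properly.

interface AlexaEvent {
  request: {
    type: string; // e.g. "LaunchRequest" or "IntentRequest"
    intent?: {
      name: string;
      slots?: Record<string, { name: string; value?: string }>;
    };
  };
}

// Build a plain-text speech response in the Alexa response format.
function speak(text: string, shouldEndSession = false) {
  return {
    version: "1.0",
    response: {
      outputSpeech: { type: "PlainText", text },
      shouldEndSession,
    },
  };
}

export function handler(event: AlexaEvent) {
  if (event.request.type === "IntentRequest" && event.request.intent) {
    switch (event.request.intent.name) {
      case "AnswerIntent": {
        // Pull the spoken number out of the Answer slot from the schema above.
        const answer = event.request.intent.slots?.Answer?.value;
        return speak(`You said ${answer ?? "something I couldn't parse"}.`);
      }
      default:
        return speak("Sorry, I didn't catch that. Let's try again.");
    }
  }
  return speak("Welcome to Math Mania!");
}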

With the combination of the three things above, Alexa will know how to take a verbal command sent to your skill like:

“six”

and (mostly) direct it to the right bit of your custom code to handle the answer of six.

Kind of magic (and awesome) when you really step back and think about it.

But it’s still not 100% perfect. Of course there’s the obvious stuff, like the fact that it doesn’t always accurately translate what you say into the text sent to your code…but there are also some less obvious issues (from a developer’s standpoint) that I ran into and that caused me a bit of trouble.

Here are the primary challenges I hit in building version one of Math Mania:

1. Alexa doesn’t have the concept of a “default” intent yet (or I couldn’t find the documentation or details on it if it does)…so I had to design an “UnsureIntent” to try to catch all the cases where you didn’t say something the app was expecting (which can be quite often, actually).

In the documentation, they show a pretty simple example of a Skill that only has two intents…one has slots and one does not…but in Math Mania, I had a number of intents that actually have no slots (perhaps that’s a design flaw on my part, though I think it maps pretty well to the real world). For example, I wanted to support a “Start game” command, a “restart” command, a “Score” command, a “Help” command, a “Repeat” command, and a “Settings” command. In addition to all of these, if you say something strange or unexpected (or that Alexa just heard wrong), I wanted the system to treat that as a “Repeat” command.

Seems simple enough, right? The trouble is that all of these intents are essentially defined with one- or two-word example phrases…so when Alexa gets something that it doesn’t understand, but recognizes as one, two, or more words…which intent should it route the request to?

From my testing, it seems like it defaults to the first intent in the list that expects/accepts that many words…so my solution was to create an UnsureIntent with example phrases that were one, two, three, four, etc. words long and put it first in my list of intents (and example phrases). And when that intent is called, it really just acts like a “repeat” intent (with slightly different wording so that the user knows the system didn’t understand vs. just a straight up repeat command).

It’s a total hack, but it seems to (mostly) work for now. Still, it would be a lot easier to just be able to define a “default” intent that all requests get routed to if/when Alexa has no flipping clue what you said or meant…
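For reference, the UnsureIntent’s example phrases ended up looking roughly like the lines below (the exact filler words don’t matter; the word counts are what do the work), with the intent itself listed first in the schema:

UnsureIntent blah
UnsureIntent blah blah
UnsureIntent blah blah blah
UnsureIntent blah blah blah blah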

2. AWS Lambda has no session or state information from one request to the next…so you have to use the Alexa session or an external web service to hack the concept of one together. (I chose to use a custom REST-based web service rather than the Alexa session, DynamoDB, or S3 because I didn’t want the extra cost, or the additional technical challenges, introduced into this experiment; i.e. I already knew how to do that really easily, and I wanted to store some data, like your highest score and streak, that lasts longer than just your current session.)

Wait, isn’t this post supposed to be talking about Alexa? Why mention AWS Lambdas?

Because the recommended way of serving up your custom code (step #3 mentioned above) is to use AWS Lambdas.

The big advantages here are that A.) it auto-scales for you, and B.) Amazon takes care of many security issues for you (if you don’t use Lambdas, you have to host an SSL-based web service, which is an additional cost to the developer).

But of course there are some trade-offs to using the Lambda service (the primary one for me being that it’s a bit more complex to not go 100% Amazon for all aspects [e.g. DynamoDB]).
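As a rough sketch of the state workaround from point #2 (again in TypeScript): you can round-trip data through the Alexa sessionAttributes for anything that only needs to live within the current session, or call out to your own web service for anything that needs to outlive it. The endpoint, path, and field names below are all hypothetical:

// Sketch: two ways to fake state between requests. The hostname, path, and
// payload fields are hypothetical, not the real Math Mania service.

import * as https from "https";

interface GameState {
  score: number;
  streak: number;
}

// Option A: sessionAttributes are echoed back to you on the next request in
// the same session, so state can ride along inside the response itself.
function buildResponse(text: string, state: GameState) {
  return {
    version: "1.0",
    sessionAttributes: { gameState: state },
    response: {
      outputSpeech: { type: "PlainText", text },
      shouldEndSession: false,
    },
  };
}

// Option B: POST the state to your own REST service so high scores and
// streaks survive beyond the current session.
function saveState(userId: string, state: GameState): Promise<void> {
  const body = JSON.stringify({ userId, ...state });
  return new Promise((resolve, reject) => {
    const req = https.request(
      {
        hostname: "example.com", // hypothetical service
        path: "/mathmania/state",
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "Content-Length": Buffer.byteLength(body),
        },
      },
      (res) => {
        res.resume(); // we only care that the write succeeded
        res.on("end", () => resolve());
      }
    );
    req.on("error", reject);
    req.end(body);
  });
}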

3. The original version of this game is intended to be played in 60-second bursts; however, I couldn’t figure out how to actually accomplish this within Alexa. A.) The conversation is fairly slow from both sides (so you wouldn’t want the timer running during that period), and B.) the timer should actually start when the response is sent and stop when the answer is received. I didn’t see an obvious way of implementing a proper timer.

The best I could do is timestamps and timestamp comparisons within my web service layer (or possibly shoved into the Alexa session), but it won’t really be an accurate timing of the gameplay. Still, a future version probably will do just that (with extra padding on the time limit thrown in to account for the speed of the conversation and communication layers).
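In sketch form, that timestamp approach is just something like this (the padding value is arbitrary; it only exists to absorb the conversational round trip):

// Sketch: approximating a 60-second game with timestamps, since there's no
// real timer primitive available to the skill itself.

const GAME_LENGTH_MS = 60_000;
const PADDING_MS = 5_000; // arbitrary slack for the speech round trip

// Record a start time when the first question goes out (stored in the
// session attributes or in the web service layer).
function startTimer(): { startedAt: number } {
  return { startedAt: Date.now() };
}

// When an answer comes back, compare elapsed time against the padded limit.
function isTimeUp(timer: { startedAt: number }): boolean {
  return Date.now() - timer.startedAt > GAME_LENGTH_MS + PADDING_MS;
}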

4. It took about 5 days from submission to review…and I didn’t pass the initial review (so it really took about 10 days from first submission to final approval). I didn’t pass for very good reasons, but they were all only clear after A.) having built and submitted the app, and B.) having someone from the Alexa approval team point out specifics with examples and suggestions (in my opinion the documentation is great on technical specifics and fairly vague on usage/scenario specifics).

5. Once the app is ‘approved’ there’s no obvious/easy/great way to share that news with the world…no link or app store listing to specifically point people to (that I could find)…the best I could come up with was a tweet with basic install instructions and, of course, the blog post I wrote yesterday.

6. A much, much smaller issue, but the documentation is still a bit inconsistent on a few points. The getting-started documentation mentions that you can add images to cards (cards are what get displayed within the Alexa app when a user is interacting with your skill), but there’s no information about this in the actual documentation (which basically says you can only pass plain text at this time).
