Why Amazon Alexa Voice Assistant?

The Shoop Alexa Skill was designed after taking the time to understand the platform and seeing its potential to help users solve simple customer service problems.

After researching the Siri API in iOS 10, I came to the conclusion that, at the time (September 2016), Apple couldn’t support my requirements and I would have to wait for iOS 11.

I started to explore the Amazon Echo and Alexa offering and soon understood that these devices were already widely used by our customers, and that designing a custom Skill for Alexa would be very simple to achieve.

The idea in the Alpha version was to design the skill around four key requirements, which are as follows. This insight was based on interviews and questionnaires that asked our users which tasks they felt would be best achieved via this new voice interface.

The equipment needed to access the Alexa assistant is an iOS or Android device for the Alexa app, an Amazon Echo or Echo Dot, and an active internet connection.

When designing a custom skill, Amazon places particular emphasis on error handling, as the user has nothing but Alexa’s spoken response as the primary reply to their command, with only a limited, simple UI in the app as a backup.
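As a rough illustration (a sketch only, assuming the standard Alexa custom-skill JSON response format; the function name, wording and card title are hypothetical and not taken from the actual Shoop skill), an error response can carry the spoken message as the primary channel, a reprompt to keep the conversation open, and a simple card as the backup UI in the app:

```python
def build_error_response(problem, retry_prompt):
    """Sketch of an Alexa custom-skill error response: speech is the
    primary channel, a reprompt keeps the session open, and a Simple
    card acts as the limited backup UI shown in the Alexa app."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": problem},
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": retry_prompt}
            },
            "card": {
                "type": "Simple",
                "title": "Shoop",  # hypothetical card title
                "content": problem,
            },
            # Keep the session open so the user can answer the reprompt.
            "shouldEndSession": False,
        },
    }
```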

The skill interface is also where we specify what the skill is called, so a user can invoke it by name when talking to an Alexa-enabled device.

Skill invocation name

Defining the skill’s interaction model is what trains the skill interface so that it knows how to listen for the user’s spoken words. It resolves the spoken words to the specific intent events you define: you declare which words should map to a particular intent name in the interaction model by providing a list of sample utterances.

A sample utterance is a string that represents a possible way the user may talk to the skill.

Utterances are used to generate a natural language understanding model that resolves the user’s voice to our skill’s intents.
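As a hedged sketch (the intent name and phrasings below are invented for illustration, not the Shoop skill’s actual utterances), sample utterances in the developer console of the time were plain-text lines pairing an intent name with one way of phrasing the request:

```python
# Hypothetical sample utterances for an imagined "GetOrderStatusIntent".
# In the developer console these are entered as plain-text lines in the
# form "IntentName utterance"; here they are grouped per intent.
SAMPLE_UTTERANCES = {
    "GetOrderStatusIntent": [
        "where is my order",
        "what is the status of my order",
        "track my latest order",
        "has my order been shipped yet",
    ],
}
```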

We also declare an intent schema in the interaction model. The intent schema tells the skill interface which intents the skill service implements. Once we provide the sample utterances and the intent schema, the interaction model can be built.
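A minimal sketch of an intent schema, assuming a hypothetical GetOrderStatusIntent alongside Amazon’s built-in intents (this mirrors the JSON entered into the developer console at the time, expressed here as a Python dict):

```python
# Hypothetical intent schema: declares which intents the skill service
# implements, alongside Amazon's built-in help, stop and cancel intents.
INTENT_SCHEMA = {
    "intents": [
        {"intent": "GetOrderStatusIntent"},  # hypothetical custom intent
        {"intent": "AMAZON.HelpIntent"},
        {"intent": "AMAZON.StopIntent"},
        {"intent": "AMAZON.CancelIntent"},
    ]
}
```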

Recap: utterance definitions resolve spoken words to intents in the skill interaction model. Once the list of sample utterances is added to the online interaction model, the user’s spoken words can be matched against the resulting model and resolved to an intent.

The skill interface configurator is the second part of creating a skill, and is where we specify those utterances. The skill interface is responsible for processing the user’s spoken words: it handles the translation from the user’s audio into events the skill service can handle, and it sends those events to the skill service so the event handlers can do their work.
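As a minimal sketch of the skill service side (not the Shoop skill’s actual code; it assumes an AWS Lambda-style handler receiving the standard Alexa request envelope, and the intent name and replies are hypothetical), an event handler dispatches on the request type and intent name and returns a response:

```python
def speech_response(text, end_session=False):
    """Wrap plain text in the standard Alexa response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Hypothetical skill service entry point: receives the event the
    skill interface produced from the user's speech and returns a reply."""
    request = event["request"]

    if request["type"] == "LaunchRequest":
        return speech_response("Welcome to the skill. How can I help?")

    if request["type"] == "IntentRequest":
        intent_name = request["intent"]["name"]
        if intent_name == "GetOrderStatusIntent":  # hypothetical intent
            return speech_response("Your latest order is on its way.")
        if intent_name in ("AMAZON.StopIntent", "AMAZON.CancelIntent"):
            return speech_response("Goodbye.", end_session=True)

    # Anything unrecognised falls back to a gentle re-prompt.
    return speech_response("Sorry, I didn't catch that. Please try again.")
```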

Sample utterances need to be comprehensive, although the interaction model generally handles variations automatically.

When designing custom voice user journeys for the Shoop skill, you need to design for success but also for failure. Understanding how Alexa handles these errors is vitally important, as the user will receive any errors via audio only. Unlike visual user interfaces, the only way to interact with an error and move the request forward towards the desired outcome is through conversation. See below for examples.

Understanding how each command is constructed allows you to design for errors as well as successes. When creating commands for your skill you cannot design for every outcome, but with the aid of user testing you can find patterns in how users navigate your skill, how they phrase their requests, and how your skill handles user-generated errors.

Testing to see how a user actually phrases a command is crucial to designing a custom skill that can be used by a wide spectrum of society. The larger the skill becomes, the larger the range of possible inputs.

Without a user testing methodology, if a user’s expected outcome does not occur and they are faced with a bare error and nothing to guide them, they will simply become frustrated, deem that the skill does not work, and possibly delete it. User testing allows the business and developers to improve the custom skill.

As voice assistants currently stand, they are made up of two parts: the voice reply, and the visual card delivered via the Alexa app.

Designing for voice and understanding how errors and commands should be created is only one half of the custom design. Designing for the visual cards is crucial to getting the most out of the Amazon assistant system. Audio answers can differ from visual answers, allowing you to create answers that fit the context and situation.
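To illustrate (a hedged sketch; the wording is invented, not taken from the Shoop skill), a single response can carry a short spoken answer alongside a more detailed card in the app:

```python
# Hypothetical response where what Alexa says differs from what the
# Alexa app shows: the audio stays short, the card carries the detail.
RESPONSE = {
    "version": "1.0",
    "response": {
        "outputSpeech": {
            "type": "PlainText",
            "text": "Your order is on its way and should arrive tomorrow.",
        },
        "card": {
            "type": "Simple",
            "title": "Order status",
            "content": (
                "Dispatched yesterday. Estimated delivery: tomorrow. "
                "Track the parcel in your account for live updates."
            ),
        },
        "shouldEndSession": True,
    },
}
```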

Prototyping

So how do you prototype a voice interface app if all you have is thin air?

Currently there are a few different tools I use to prove the concept after the paper stages. Firstly, you can test either within the Amazon app, or via third-party apps on iOS and macOS/Windows 10.

The SaySpring service enables you to set up basic conversation flows within the app, for example a simple question with answers. You cannot use anything with slots, as far as I am aware. This might become the Axure of voice one day, but not currently.

These tools give you a great idea of all the basic error states you need to have covered before submission. They also allow you to demonstrate in front of others how a question-and-answer conversation could work. Ultimately this brings the project to life.

Below is a typical user journey showing how to link a user’s account with Alexa.

Linking the account
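For context (a minimal sketch, not the Shoop skill’s actual implementation; the handler name and wording are invented), the skill service can check for an access token in the session and, when none is present, respond with a LinkAccount card that sends the user to the Alexa app to sign in:

```python
def handle_request(event):
    """Hypothetical account-linking check: if the session carries no
    access token, ask the user to link their account in the Alexa app."""
    access_token = event["session"]["user"].get("accessToken")

    if access_token is None:
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {
                    "type": "PlainText",
                    "text": "Please link your account in the Alexa app to continue.",
                },
                # The LinkAccount card prompts account linking in the app.
                "card": {"type": "LinkAccount"},
                "shouldEndSession": True,
            },
        }

    # With a token present, carry on and handle the intent as normal.
    ...
```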