
Working with an innovative, technology-focused design agency, I was asked to help the Bacardi team design a voice-first Alexa skill for the Amazon Echo ecosystem of devices.
Having previously worked on the Alexa platform within the hotel and drinks space, I was asked by AKQA to assist in the creation of their next Alexa-based project.
The focus was to be on culture, luxury and movies. Called ‘The Green Room’, the new skill taps into the latest voice technology and the functionality of the Amazon Echo ecosystem across all devices (the Echo Show, Spot, Dot, Look and classic Echo). ‘The Green Room’ means people are just a few words away from relaxing with a great cocktail, inspiring stories and witty banter.

I worked with AKQA and Bacardi, initially under the Grey Goose Vodka brand, advising them on how to create a plan and a concept that captured all the ideas, both above the line and below the line.
Ideas were captured via detailed user flows, which comprised user utterances, Alexa responses and questions, plus errors and logic. Once created, the flows were refined and discussed at length with the team to make sure the logic was sound and clear. We then used a text-to-voice tool to user test our conversations.
When exploring new interfaces with clients, I first ask myself: “Where is the added value for the user?” With that in mind, we worked closely with the client and their data insights team to find the real value for the user and to answer the question: why are users going to keep coming back to this skill?
I organised and ran multiple workshops to outline the real aim of this skill for its users. The workshops were created with two aims in mind. The first was to educate the client. This was crucial, as the client’s industry was drinks, not voice systems. We advised on VUI best practice and showed them how other industries were using voice-enabled systems, with a focus on Amazon’s Alexa.
This also allowed us to learn about the drinks industry’s point of view on voice-enabled systems. The second aim was to gather data about what Bacardi users did, so we ran one-on-one interviews to really understand their points of view. With this data in mind, we followed up with a second workshop to create an idea, or more to the point a series of ideas, which could later be designed as functions for the skill. We settled on which types of Alexa functions we needed, which led us to design a Custom Interaction Model and a Flash Briefing that would support voice-only devices as well as screen-enabled ones.

The voice-activated skill is designed to inspire people to learn more about making cocktails and to be entertained. Users ask Alexa to help them find items from Bacardi’s collection.
The emphasis of the Alexa skill is on fun and ease of use. We avoided focusing on eCommerce and instead focused on inspiration and lead generation. Users can save recommendations as favourites and listen to them later, and request a call back in order to book the room.
Armed with this knowledge from the workshops, plus interviews with key stakeholders, we designed Phase 1 and Phase 2 of the new skill.
Key to designing an Alexa skill is understanding the current limitations of the Alexa platform. We currently have access to Messages and Notifications via a ‘white list’ from Amazon, although this depends on working closely with Amazon.
Overcoming Limitations and Platform Features
Beyond the simple limitations of the Alexa platform (for example custom skill creation, the notification white list and Reminders integration), knowing which limitation might affect a particular project is key. In this project, knowing how a user journey can be affected by Alexa’s Default and Streaming modes was very important, as this affects the creative possibilities when developing an idea at the conceptual stage. Ultimately this informs the copywriters and creative directors about what they can create on the platform or, just as important, what the platform does not let them do.
As this project was centred around a podcast (Streaming) and a mixologist (Active), exploring solutions was very important. Furthermore, explaining the overall structure of a custom skill was important for fitting creative ideas into the platform. Knowing where a copywriter can and cannot put copy, and which mode the skill is in, affects them a lot. The difference between the two Alexa modes is explained below.
1) Default (Active) mode
In Default mode, conversations are two-way. The end user speaks to give the Alexa platform utterances it can work with and, generally, if the platform understands those utterances it responds with relevant spoken or visual content. This creates an active conversation with either the Alexa default system or an open session of a custom skill.
If the user does not respond after an Alexa response, they are prompted with a re-prompt. This occurs after 8 seconds. If the user still does not respond after that, the Alexa system closes the session and effectively closes the custom skill.
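As a rough illustration of the re-prompt behaviour described above, here is a minimal Python sketch of the JSON body a custom skill could return to keep the session open. The function name and the example copy are hypothetical and were not part of the project; the field names follow the standard Alexa response format as I understand it, so check them against the Amazon docs before relying on them.

```python
import json


def build_ask_response(speech_text: str, reprompt_text: str) -> dict:
    """Build a custom-skill response that speaks, then waits for the user.

    Hypothetical helper for illustration only.
    """
    return {
        "version": "1.0",
        "response": {
            # What Alexa says first.
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            # What Alexa says if the user stays silent after the first prompt.
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt_text}
            },
            # False keeps the session open, so the user stays "in the skill".
            "shouldEndSession": False,
        },
    }


if __name__ == "__main__":
    # Example copy is invented for this sketch.
    response = build_ask_response(
        "Welcome to The Green Room. Would you like a cocktail or a story?",
        "You can say cocktail, or you can say story.",
    )
    print(json.dumps(response, indent=2))
```

For copywriters, the practical point is that every turn in Default mode needs two pieces of copy: the main line and the re-prompt.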
So, at the time of writing, Default mode has the following limitation.
- A response from Alexa can be no longer than 90 seconds, whether as an audio clip, as text-to-speech, or as a combination of the two.
- E.g. 10 seconds of Alexa talking + 80 seconds of audio clip, or 10 seconds of Alexa + 50 seconds of audio clip + 30 seconds of Alexa.
Please see the Alexa Docs for further reference on this subject.
https://developer.amazon.com/docs/custom-skills/speech-synthesis-markup-language-ssml-reference.html#audio
Therefore, the longest you can go before user input is necessary is 90 seconds before the first prompt, then another 90 seconds before the final prompt, totalling 180 seconds of content that can play without interaction before you are thrown “out of the skill”.
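The time budget above is simple arithmetic, and a small Python sketch can make it easy to check a planned response against it. Everything here is hypothetical illustration: the `Segment` type and function names are mine, the 90-second figure is taken from this article rather than read from the platform, and the durations have to be measured by whoever writes the copy.

```python
from dataclasses import dataclass
from typing import List

# Per-response limit as stated in this article (an assumption, not an API value).
MAX_RESPONSE_SECONDS = 90


@dataclass
class Segment:
    kind: str       # "speech" for text-to-speech, "audio" for an audio clip
    seconds: float  # measured or estimated duration


def response_duration(segments: List[Segment]) -> float:
    """Total length of one response, speech and audio combined."""
    return sum(segment.seconds for segment in segments)


def fits_in_one_response(segments: List[Segment]) -> bool:
    """True if the combined segments stay within the per-response limit."""
    return response_duration(segments) <= MAX_RESPONSE_SECONDS


def max_unattended_seconds() -> int:
    """One full response plus one full re-prompt, as described above."""
    return 2 * MAX_RESPONSE_SECONDS


if __name__ == "__main__":
    plan = [Segment("speech", 10), Segment("audio", 50), Segment("speech", 30)]
    print(response_duration(plan), fits_in_one_response(plan))
    print(max_unattended_seconds())
```

A check like this is useful at the concept stage, because it tells the creative team early whether a piece of content belongs in Default mode at all.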
In Default mode, you remain “in the skill”. This means you can use simple interactions without needing to say the invocation name every time. For further clarity see the Amazon docs.
2) Streaming mode
In contrast to Default (Active) mode, the Alexa platform also allows you to play content in Streaming mode. This does not require users to speak to the platform. In this mode you can play a piece of content with no time limit, and there are only two types of user input available: streaming controls and invocation phrases.
Streaming controls mean that as you listen to the content, you can say the following phrases at any time:
- Alexa, stop
- Alexa, pause
- Alexa, play
- Alexa, volume up
- Alexa, volume down
More details and the full list are here:
https://developer.amazon.com/post/Tx1DSINBM8LUNHY/New-Alexa-Skills-Kit-ASK-Feature-Audio-Streaming-in-Alexa-Skills
No other inputs will be recognised unless they are part of an invocation phrase.
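To show how a skill hands over to Streaming mode, here is a minimal Python sketch of a response that starts long-form audio playback. The function name, the token and the example URL are hypothetical; the `AudioPlayer.Play` directive and its fields reflect the Alexa audio streaming interface as I understand it, so treat this as a sketch rather than the project's actual implementation.

```python
import json


def build_stream_response(stream_url: str, token: str, offset_ms: int = 0) -> dict:
    """Build a response that starts streaming audio and leaves Default mode.

    Hypothetical helper for illustration only.
    """
    if not stream_url.startswith("https://"):
        # Assumption: the stream has to be served over HTTPS.
        raise ValueError("stream_url must start with https://")
    return {
        "version": "1.0",
        "response": {
            "directives": [
                {
                    "type": "AudioPlayer.Play",
                    # Replace whatever is currently playing or queued.
                    "playBehavior": "REPLACE_ALL",
                    "audioItem": {
                        "stream": {
                            "url": stream_url,
                            # Opaque identifier the skill chooses for this stream.
                            "token": token,
                            "offsetInMilliseconds": offset_ms,
                        }
                    },
                }
            ],
            # The session ends here; from now on only streaming controls
            # and invocation phrases reach the skill.
            "shouldEndSession": True,
        },
    }


if __name__ == "__main__":
    # Placeholder URL and token, invented for this sketch.
    response = build_stream_response(
        "https://example.com/green-room/episode-1.mp3", "episode-1"
    )
    print(json.dumps(response, indent=2))
```

Note that the session is closed in this response, which is why copy cannot be placed between the start and end of a stream in the same way as in Default mode.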