Notes on designing a VSK skill.

Designing for Alexa Video Skill Kit

From large-screen experience design to screenless interfaces and then on to set-top box interfaces, 2019 and 2020 have been really varied in what I have been asked to explore. I found myself going off in quite a few different directions.

From the very outset of understanding what Alexa is, we’ve been introduced to custom skills, built with what is otherwise known as ASK, the Alexa Skills Kit. This was the space promoted by Amazon to give creators and brands somewhere to make their own. Since then we’ve seen a new set of skills being promoted and created by Amazon. These are the “Pre-Made API skills”.

When I saw the name “Pre-Made API”, I wasn’t too keen, because adding endpoints to an API isn’t easy, so I knew that I would be boxed into the limited capabilities of this API. That isn’t a bad thing; it’s just good to be aware of, and to know what advantages this kind of skill can give you compared to a custom skill, which can sometimes feel quite limited as well. Furthermore, I knew I would have to explain these limitations to non-developers, so I wanted to get a good understanding of VSK for Fire and Echo devices.

When I was asked to bring video content to the Echo Show, my first thoughts to myself were: make a custom skill with conversational design principles, add a player, make sure the skill doesn’t fall asleep, and quiet the skill while the user watches the video. My first question was how long each video would be, since if they had been shorter than 10 minutes I might have considered a Flash Briefing Pre-Made API skill.

So, since I knew the videos would be longer than 10 minutes, it was going to be either a custom skill or a VSK. The next conversation was to figure out what they wanted from a skill. Did they need both or just one? Did they need custom intents? Because if so, a VSK wouldn’t be possible.
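That early triage can be written down as a rough sketch. This is only my reading of the reasoning in these notes, not an Amazon rule: the 10-minute threshold is the one I used above, and the names `VideoBrief` and `shortlist_skill_types` are made up for illustration.

```python
from dataclasses import dataclass
from typing import List

# Threshold used in these notes for considering a Flash Briefing skill.
# This is the author's working assumption, not an official limit.
FLASH_BRIEFING_MAX_MINUTES = 10


@dataclass
class VideoBrief:
    """Hypothetical summary of what the content owner is asking for."""
    longest_video_minutes: float
    needs_custom_intents: bool


def shortlist_skill_types(brief: VideoBrief) -> List[str]:
    """Return the skill types still worth discussing for this brief."""
    options: List[str] = []
    # Short clips might fit a Flash Briefing style skill.
    if brief.longest_video_minutes < FLASH_BRIEFING_MAX_MINUTES:
        options.append("flash_briefing")
    # A custom skill is always a candidate.
    options.append("custom_skill")
    # A VSK cannot define custom intents, so drop it if they are required.
    if not brief.needs_custom_intents:
        options.append("vsk")
    return options


if __name__ == "__main__":
    print(shortlist_skill_types(VideoBrief(45, needs_custom_intents=False)))
```

For long videos with no need for custom intents, this leaves the same two candidates I ended up weighing: a custom skill or a VSK.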

Suddenly I found myself going through the Alexa dev docs again, seeing how you can bring live TV to an Echo Show device. This led me to leave the confines of the Alexa custom skills docs and enter the world of Fire TV and voice on Fire OS, which can also be on Alexa devices, and to understand what catalog ingestion is and its limitations compared to others. So really the first question to answer was: what exactly is a video skill, and how can it help me?

What is the Video Skills Kit and how can it help me design a solution?

Firstly, let’s get the definition out there. Before I start to design any experience, knowing what the technology can do by default can save time. So I generally start by reading the dev docs plus other sources. This is what the dev docs tell you.

The Video Skill Kit (VSK) API is a set of APIs that enable the far-field control of video devices and streaming services using an Alexa device. The Video Skill API provides both customers and developers with a consistent experience and interaction model. Alexa can determine which devices and services the user has and which top-titles are available for them to watch.

Amazon Developer Docs

Furthermore, VSK has far deeper integration with first-party Alexa search and feels much more like an ecosystem of devices, since the user doesn’t need to invoke the skill on a device but can in fact interact with that skill via a number of different inputs, such as attached Echo Dots, Fire TV devices and remotes.

I guess this would be a good point to say that VSKs are still very new. At the time of writing I still only have one VSK on my Echo Show device, and that’s Prime, but I have seen many in development. Features and functionality will improve over time. As Amazon continues to enrich the features of Video Skills, content providers will be able to offer an improved user experience.

How can this help me design a solution?

From a product designer’s point of view, once I had done my reading and explored the market of existing VSKs, I had a fair understanding of what I could create. Feeding back to the wider team and answering any questions about the technology, I found myself comparing it to custom skills, as this is what I knew.

So then I asked myself: do I really need a VSK? No matter how much I wanted that full creative branded space, I found that the VSK offered a few things that just made it better for what we needed. Firstly, I needed catalogue ingestion, and as you can see on the link, there is no mention of custom skills. So if we wanted to have streaming TV, we needed a VSK.

Secondly, discoverability is an issue faced by Alexa. Unlike with third-party skills, the user doesn’t need to enable the skill and then open it to get search results. This is very helpful, but for me the bigger win is the branding opportunity of having your logo and title within the Video Skills section. Again, custom skills can’t be placed there.

Thirdly, it’s the deeper integration with Alexa on a first-party level. If a user asks to play a certain genre, or content that can be found on a particular channel or sport, Alexa will then deliver the most relevant results, making searching and discovering content easier for the user. No need to open a skill and then search from within there.

I did face two issues and one fact. Firstly, if your organisation doesn’t support OAuth and you have paid-for content, then you have a problem. There are ways to create workarounds, but this will involve conversations. Secondly, from a visual point of view you can’t change the font from Amazon’s default font to your brand’s. I did ask, but at the moment this isn’t something that has been allowed for. Thirdly, and this isn’t an issue but more how the system works, you can’t create any custom intents. This isn’t too much of a problem, but it does need to be stated and explained to stakeholders.
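To make those three points easier to bring to a stakeholder meeting, here is a small sketch that turns a project’s wishes into a list of talking points. It only encodes what I ran into, as described above. The function name `vsk_talking_points` and its parameters are hypothetical, and it doesn’t call any Amazon API.

```python
from typing import List


def vsk_talking_points(
    supports_oauth: bool,
    has_paid_content: bool,
    wants_brand_font: bool,
    wants_custom_intents: bool,
) -> List[str]:
    """Return the VSK constraints from these notes that apply to a project."""
    points: List[str] = []
    # Paid-for content without OAuth support was the main problem I hit.
    if has_paid_content and not supports_oauth:
        points.append("issue: paid content without OAuth needs a workaround")
    # The default Amazon font could not be swapped for a brand font.
    if wants_brand_font:
        points.append("issue: the default Amazon font cannot be changed")
    # Not a bug, just how the system works: no custom intents in a VSK.
    if wants_custom_intents:
        points.append("fact: custom intents cannot be created")
    return points


if __name__ == "__main__":
    for point in vsk_talking_points(
        supports_oauth=False,
        has_paid_content=True,
        wants_brand_font=True,
        wants_custom_intents=False,
    ):
        print(point)
```

An empty list doesn’t mean a VSK is the right choice. It only means none of these three constraints gets in the way.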

On a side note, it’s well worth exploring whether you really need a VSK or could just start with the Media Session API. Yes, it’s another thing to read all about, but if your organisation needs voice command functionality on a TV, this could be a good stepping stone.