In this blog post, we will see how Cortana skills can play short audio clips and compose the Text-to-Speech (TTS) responses that Cortana speaks on behalf of your skill. We will leverage the SSML support in the Cortana Skills Kit to achieve this.
If you haven’t already, check out our first post on creating your Cortana skill and the second on personalizing it for your skill’s context.
SSML is an XML-based markup language that skill developers can use to specify the text that Cortana converts to speech. Using SSML improves the quality of the synthesized speech over sending Cortana plain text. Cortana’s implementation of SSML is based on the World Wide Web Consortium’s Speech Synthesis Markup Language Version 1.0. Check out the complete documentation here on how you can leverage SSML in your skill.
In this blog post, we will focus specifically on using the SSML audio tag in a Cortana skill to play audio as part of the speech response. This tag is useful in a variety of skill response scenarios, such as playing existing audio snippets, responding to a user request with music, or playing music as a progress indicator while performing a long-running operation.
Let’s begin!
(Go through Steps 1 to 3 from our first blog post if you have never built a skill before. Pick up from here when you are finishing up Step 4 there.)
Change the switch-case block in the Run method of the run.csx file to:
switch (activity.GetActivityType())
{
    case ActivityTypes.Message:
        // Reply with display text plus an SSML Speak property that plays an mp3 clip.
        var client = new ConnectorClient(new Uri(activity.ServiceUrl));
        var reply = activity.CreateReply();
        reply.Text = "Hello World! Now playing a sample mp3 file using SSML.";
        reply.Speak = @"<speak version=""1.0"" xml:lang=""en-US"">Hello World! Testing a sample mp3 file. <audio src=""https://www.sample-videos.com/audio/mp3/crowd-cheering.mp3""/> </speak>";
        await client.Conversations.ReplyToActivityAsync(reply);
        break;
    case ActivityTypes.ContactRelationUpdate:
    case ActivityTypes.Typing:
    case ActivityTypes.DeleteUserData:
    case ActivityTypes.Ping:
    default:
        // All other activity types are ignored by this sample.
        log.Error($"Unknown activity type ignored: {activity.GetActivityType()}");
        break;
}
You will see that we used the <speak> element to enclose our SSML markup. It is the root element of an SSML-based response, and all the individual elements you use to compose the response (like the audio tag used here) need to be enclosed within it.
For a full list of elements you can leverage, check out the documentation here.
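If you end up composing SSML in several handlers, a small helper can keep the <speak> wrapper consistent. The sketch below is our own illustration and not part of the Bot Builder SDK; the WrapInSpeak name is hypothetical, and it assumes the fragment you pass in is already well-formed SSML.

// Hypothetical helper: wraps an SSML fragment in the <speak> root element
// that Cortana expects, so individual replies only build the inner markup.
static string WrapInSpeak(string ssmlBody)
{
    return $@"<speak version=""1.0"" xml:lang=""en-US"">{ssmlBody}</speak>";
}

// Usage, equivalent to the hand-written string in the snippet above:
// reply.Speak = WrapInSpeak(@"Hello World! Testing a sample mp3 file. <audio src=""https://www.sample-videos.com/audio/mp3/crowd-cheering.mp3""/>");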
In this example, we are using the <audio> element to specify the file we want to play in our response. The audio file you play via SSML must be (a small validation sketch follows this list):
- In MP3 format (MPEG v2, 48 kbps bit rate, 16,000 Hz sample rate).
- Hosted on an Internet-accessible HTTPS endpoint.
- Less than ninety (90) seconds in length.
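If you want a quick guard against audio links that break the HTTPS requirement above, you can validate the URL before embedding it in the markup. This is a minimal sketch under our own assumptions: the BuildAudioTag name is ours, and checking the URL alone obviously cannot verify the encoding, bit rate, or length of the file itself.

using System;
using System.Security; // in run.csx, these usings go at the top of the file

// Hypothetical helper: builds an <audio> element for an HTTPS-hosted MP3.
static string BuildAudioTag(string audioUrl)
{
    var uri = new Uri(audioUrl);
    if (uri.Scheme != Uri.UriSchemeHttps)
    {
        throw new ArgumentException("Cortana requires audio to be hosted on an HTTPS endpoint.", nameof(audioUrl));
    }

    // Escape the URL so characters like & don't break the surrounding SSML (XML).
    return $@"<audio src=""{SecurityElement.Escape(audioUrl)}""/>";
}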
What should you do when you have a longer audio stream? Good question, and an apt topic for our next blog post. 🙂
For now, save the file and test your Cortana skill. Check out Steps 4 and 5 from the first blog post for instructions on how to invoke your skill.
You should be able to hear the audio clip play as part of the spoken response from Cortana when you invoke the skill.
Now, let’s go ahead and try another SSML element to see how we can make Cortana pause after she says “Hello World!”. Edit the <speak> element to use the <break> element:
<speak version="1.0" xml:lang="en-US">Hello World! <break time="500ms" /> Testing a sample mp3 file. It works! <audio src="https://www.sample-videos.com/audio/mp3/crowd-cheering.mp3"/> </speak>
Here, we are asking Cortana to pause for 500ms before saying the next word.
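If you are editing run.csx, remember that this SSML lives inside a C# verbatim string, so every quote in the markup has to be doubled. Only the Speak property changes from the earlier snippet; the rest of the Message case (and reply.Text) stays the same.

reply.Speak = @"<speak version=""1.0"" xml:lang=""en-US"">Hello World! <break time=""500ms"" /> Testing a sample mp3 file. It works! <audio src=""https://www.sample-videos.com/audio/mp3/crowd-cheering.mp3""/> </speak>";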
Save the file and test by invoking your skill on Cortana.
You can do much more using other elements like the <prosody>, <phoneme> and <say-as> elements. Refer to the documentation for usage details of these and other tags.
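As a quick taste (our own illustration, not an official sample), the snippet below slows down and raises the volume of one phrase with <prosody> and reads out a date with <say-as>; double-check the supported attribute values in the documentation before relying on them.

reply.Speak = @"<speak version=""1.0"" xml:lang=""en-US"">
    <prosody rate=""slow"" volume=""loud"">Here are your results.</prosody>
    Your appointment is on <say-as interpret-as=""date"" format=""mdy"">12/25/2017</say-as>.
</speak>";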
Please let us know below if you have any questions. We would also love to hear about specific topics for blogs that you would like to see and we would be happy to oblige!
Have a great day!