WPF Dartboard scoring application

This guide shows you how to mix text with speech in a simple program that synthesizes XHTML Transitional blog posts into the Wave format, for your iPod, using SAPI 5.3, and encodes them into the Windows Media Audio format, for your Zune, using the Windows Media Encoder 9 Series API. In the end, you'll be able to make podcasts from nothing more than a standard plain-text RSS 2.0 feed, and you'll be able to play them on at least two of the most popular media players on the market.
Difficulty: Intermediate
Time Required: 6-10 hours
Cost: Free
Software: Microsoft .NET Framework 3.0 Redistributable Package, Windows Media Encoder 9 Series, Visual Studio Express C# or VB
Hardware:
Download: Source Code
Introduction
You probably read technology news (e.g. Coding4Fun) daily... Coding4Fun does not deliver podcasts - yet - and many web sites are in a similar situation. Microsoft Anna, the new text-to-speech (TTS) voice in Windows Vista, sounds more human than previous Microsoft voices like Mary, Mike or Sam; as time goes by, voices will become more and more natural and you won't even be able to tell the difference between a synthesized and a real human voice.
Wouldn't it be great to be able to convert blog posts into podcasts? You’ll even get to sync these synthesized podcasts with your Zune and iPod and listen to them wherever you are and whenever you want.
Prerequisites
Please download the Microsoft .NET Framework 3.0 Redistributable Package and install it on your computer (there is no need to install the package on Windows Vista). .NET Framework 3.0 gives you the opportunity to take advantage of the latest Speech API (SAPI 5.3); however, only Windows Vista comes with Microsoft Anna, as the voice is built right into the OS. If you're running a previous version of Windows, the program will use a previous version of the TTS engine and won't sound as clear and crisp as you would expect.
You'll also need the Windows Media Encoder 9 Series to encode a Wave file as Windows Media Audio.
I hope you are already using Visual Studio Express C# or VB since you're on the Coding4Fun web site (either choice is fine, as the sample is available in both languages).
Devices and Audio Formats
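Synthesized podcasts should work on at least these devices: the iPod, which plays the Wave (.wav) output, and the Zune, which plays the Windows Media Audio (.wma) output.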
Now that you have everything up and ready, let's get started: it's Coding4Fun time!
Getting Started with the Speech Synthesizer
The Speech Synthesizer is initialized with a Rate of -1 (values can range between -10 and +10), a Volume of 80 (values can range between 0 and 100) and a Female Adult EN-US Voice (Microsoft Anna). Please note that these voice values are only hints: a voice will be selected regardless of which voices are installed, so on a system without Microsoft Anna you will hear a different voice. SpeechSynthesizerRate, SpeechSynthesizerVolume and the other settings can be easily modified inside "SpeakRssPodcast.exe.config".
Visual C#
// Synthesizer
this._synthesizer = new SpeechSynthesizer();
// Rate
this._synthesizer.Rate = this._settings.SpeechSynthesizerRate;
// Volume
this._synthesizer.Volume = this._settings.SpeechSynthesizerVolume;
// Voice
this._synthesizer.SelectVoiceByHints(this._settings.SpeechSynthesizerVoiceGender,
    this._settings.SpeechSynthesizerVoiceAge, this._settings.SpeechSynthesizerVoiceAlternate,
    this._settings.SpeechSynthesizerVoiceCulture);
// Speak Progress
this._synthesizer.SpeakProgress += new EventHandler<SpeakProgressEventArgs>(_synthesizer_SpeakProgress);
Visual Basic
' Synthesizer
Me._synthesizer = New SpeechSynthesizer()
' Rate
Me._synthesizer.Rate = Me._settings.SpeechSynthesizerRate
' Volume
Me._synthesizer.Volume = Me._settings.SpeechSynthesizerVolume
' Voice
Me._synthesizer.SelectVoiceByHints(Me._settings.SpeechSynthesizerVoiceGender, _
    Me._settings.SpeechSynthesizerVoiceAge, Me._settings.SpeechSynthesizerVoiceAlternate, _
    Me._settings.SpeechSynthesizerVoiceCulture)
' Speak Progress
AddHandler Me._synthesizer.SpeakProgress, AddressOf Me._synthesizer_SpeakProgress
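Because the voice is chosen from hints, it can help to see which voices are actually installed and which one the synthesizer ends up using. The following is a minimal sketch rather than part of the sample: the class name VoiceCheck is made up for illustration, the hint values are hard-coded instead of being read from the settings file, and it assumes a Windows machine with a reference to System.Speech.

```csharp
using System;
using System.Globalization;
using System.Speech.Synthesis;

// Hypothetical helper class, not part of the original sample.
static class VoiceCheck
{
    static void Main()
    {
        using (SpeechSynthesizer synth = new SpeechSynthesizer())
        {
            // List every voice the speech engine knows about.
            foreach (InstalledVoice voice in synth.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine("{0} ({1}, {2}, {3})",
                    info.Name, info.Gender, info.Age, info.Culture);
            }

            // Same hints the article describes: Female, Adult, EN-US (assumed values).
            synth.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Adult, 0,
                new CultureInfo("en-US"));

            // Report the voice that was actually selected.
            Console.WriteLine("Selected: " + synth.Voice.Name);
        }
    }
}
```

On Windows Vista the selected voice should be Microsoft Anna; on earlier systems expect whatever installed voice best matches the hints.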
Getting Started with the Windows Media Encoder
The Windows Media Encoder is initialized with a default source group that has a single audio source; the source uses the Windows Media Audio 8 for Dial-up Modem (32 Kbps) profile for encoding a voice-only Wave, which is enough to keep the audio quality high and the size small.
Visual C#
// Encoder
this._encoder = new WMEncoder();
// Source Group Collection
IWMEncSourceGroupCollection sourceGroupColl = this._encoder.SourceGroupCollection;
// Source Group
IWMEncSourceGroup sourceGroup = sourceGroupColl.Add(this._settings.WindowsMediaEncoderSourceGroup);
// Source
this._source = sourceGroup.AddSource(WMENC_SOURCE_TYPE.WMENC_AUDIO);
// Profile
foreach (IWMEncProfile profile in this._encoder.ProfileCollection)
{
    //Console.WriteLine(profile.Name);
    if (profile.Name == this._settings.WindowsMediaEncoderProfile)
        sourceGroup.set_Profile(profile); // Use Profile in VB
}
Visual Basic
' Encoder
Me._encoder = New WMEncoder()
' Source Group Collection
Dim sourceGroupColl As IWMEncSourceGroupCollection = Me._encoder.SourceGroupCollection
' Source Group
Dim sourceGroup As IWMEncSourceGroup = sourceGroupColl.Add(Me._settings.WindowsMediaEncoderSourceGroup)
' Source
Me._source = sourceGroup.AddSource(WMENC_SOURCE_TYPE.WMENC_AUDIO)
' Profile
For Each profile As IWMEncProfile In Me._encoder.ProfileCollection
    ' Console.WriteLine(profile.Name)
    If (profile.Name = Me._settings.WindowsMediaEncoderProfile) Then
        sourceGroup.Profile = profile ' Use set_Profile in C#
    End If
Next
Removing XHTML Tags
When you write a post on your blog, you're probably also embedding tags such as IMG, DIV and SPAN; such tags should not be spoken as they don't contain relevant audio data. The following piece of code wraps the post in a minimal XHTML document, loads it as XML and returns the inner text, which drops every tag. You could use regular expressions to achieve the same result as it's just a matter of choice - I like it this way because it's easier to understand.
Visual C#
private string buildPlainTextFromXHTML(string xhtmlText)
{
    try
    {
        // Wrap the post in a full XHTML document so that the DTD entities resolve
        XmlDocument xhtmlDoc = new XmlDocument();
        xhtmlDoc.LoadXml(@"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>" +
            "<html xmlns='http://www.w3.org/1999/xhtml'>" +
            "<head><title></title></head>" +
            "<body>" + xhtmlText + "</body>" +
            "</html>");
        // InnerText concatenates the text nodes and drops all tags
        return xhtmlDoc.InnerText;
    }
    catch (Exception)
    {
        // The post is not well-formed XHTML: speak the fallback prompt instead
        return this._settings.SpeechPromptException;
    }
}
Visual Basic
Private Function buildPlainTextFromXHTML(ByVal xhtmlText As String) As String
    Try
        ' Wrap the post in a full XHTML document so that the DTD entities resolve
        Dim xhtmlDoc As XmlDocument = New XmlDocument
        xhtmlDoc.LoadXml("<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' 'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'>" & _
            "<html xmlns='http://www.w3.org/1999/xhtml'>" & _
            "<head><title></title></head>" & _
            "<body>" & xhtmlText & "</body>" & _
            "</html>")
        ' InnerText concatenates the text nodes and drops all tags
        Return xhtmlDoc.InnerText
    Catch ex As Exception
        ' The post is not well-formed XHTML: speak the fallback prompt instead
        Return Me._settings.SpeechPromptException
    End Try
End Function
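To see the effect of InnerText on a small fragment, here is a stripped-down sketch of the same idea. It is not the sample's code: the class and method names are hypothetical, and it deliberately omits the DOCTYPE so that nothing is fetched from the network, which means named entities such as &nbsp; would not resolve here.

```csharp
using System;
using System.Xml;

// Hypothetical demo class, not part of the original sample.
static class XhtmlTextDemo
{
    // Wraps a well-formed XHTML fragment in a root element and returns its text only.
    public static string ToPlainText(string xhtmlFragment)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<body>" + xhtmlFragment + "</body>");
        return doc.InnerText;
    }

    static void Main()
    {
        string post = "<div><p>Hello <span>world</span>!</p><img src='a.png' alt='' /></div>";
        Console.WriteLine(ToPlainText(post)); // prints "Hello world!"
    }
}
```

The IMG element contributes nothing because attribute values are not part of the inner text, which is exactly the behaviour wanted for speech.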
Building Prompts
What are prompts? A Prompt is an object that describes what the speech synthesizer should say (the text to be spoken), but also how it should say it (emphasis, rate and volume). For more details, please check the PromptBuilder class. The following function gets called every time a post is processed - my sample takes the latest three available posts.
Visual C#
private Prompt buildItemPrompt(string itemTitle, DateTime itemPubDate, string itemDescription)
{
    PromptBuilder pb = new PromptBuilder(this._settings.SpeechSynthesizerVoiceCulture);
    pb.StartParagraph(this._settings.SpeechSynthesizerVoiceCulture);
    pb.AppendText(String.Format(this._settings.SpeechPromptRssItem,
        itemTitle, itemPubDate.ToLongDateString(),
        buildPlainTextFromXHTML(itemDescription)), PromptEmphasis.Moderate);
    pb.EndParagraph();

    return new Prompt(pb);
}
Visual Basic
Private Function buildItemPrompt(ByVal itemTitle As String, ByVal itemPubDate As DateTime, ByVal itemDescription As String) As Prompt
    Dim pb As PromptBuilder = New PromptBuilder(Me._settings.SpeechSynthesizerVoiceCulture)
    pb.StartParagraph(Me._settings.SpeechSynthesizerVoiceCulture)
    pb.AppendText(String.Format(Me._settings.SpeechPromptRssItem, _
        itemTitle, itemPubDate.ToLongDateString(), _
        buildPlainTextFromXHTML(itemDescription)), PromptEmphasis.Moderate)
    pb.EndParagraph()

    Return New Prompt(pb)
End Function
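A PromptBuilder ultimately produces SSML markup, and printing that markup is a handy way to check what the synthesizer will be asked to say. The sketch below is an illustration only: the class name, the sample strings and the choice of a strong emphasis followed by a medium break are assumptions, not what the sample does. It needs a reference to System.Speech.

```csharp
using System;
using System.Globalization;
using System.Speech.Synthesis;

// Hypothetical demo class, not part of the original sample.
static class PromptSketch
{
    // Builds a one-paragraph prompt and returns the SSML it would send to the engine.
    public static string BuildSsml(string title, string body)
    {
        CultureInfo culture = new CultureInfo("en-US"); // assumed culture
        PromptBuilder pb = new PromptBuilder(culture);
        pb.StartParagraph(culture);
        pb.AppendText(title, PromptEmphasis.Strong); // stress the title
        pb.AppendBreak(PromptBreak.Medium);          // short pause before the body
        pb.AppendText(body);
        pb.EndParagraph();
        return pb.ToXml();
    }

    static void Main()
    {
        Console.WriteLine(BuildSsml("Sample title", "Sample body text."));
    }
}
```

Inspecting the output this way makes it easier to experiment with emphasis and pauses before committing to a long synthesis run.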
Speaking Wave Podcasts
I have explicitly told the speech synthesizer to change its output to a file; otherwise, the prompts would have been played to the default audio device (the speakers). It then speaks two prompts: the first prompt contains the name of the channel, and the second prompt contains the title, publish date and description of the post.
Visual C#
try
{
    // Start Speak
    this._synthesizer.SetOutputToWaveFile(waveFullPath);
    this._synthesizer.Speak(buildChannelPrompt(channelTitle));
    this._synthesizer.Speak(buildItemPrompt(itemTitle, itemPubDate, itemDescription));
    // Console
    Console.WriteLine();
    Console.WriteLine();
    // Stop Speak
    this._synthesizer.SetOutputToNull();
    if (this._settings.EncodeAsWindowsMediaAudio)
        encodeWave(waveFullPath);
}
catch (Exception ex)
{
    // Console
    Console.ForegroundColor = ConsoleColor.Red;
    Console.WriteLine(String.Format(this._settings.ConsoleExceptionMessage, ex.Message));
    Console.ResetColor();
}
Visual Basic
Try
    ' Start Speak
    Me._synthesizer.SetOutputToWaveFile(waveFullPath)
    Me._synthesizer.Speak(buildChannelPrompt(channelTitle))
    Me._synthesizer.Speak(buildItemPrompt(itemTitle, itemPubDate, itemDescription))
    ' Console
    Console.WriteLine()
    Console.WriteLine()
    ' Stop Speak
    Me._synthesizer.SetOutputToNull()
    If Me._settings.EncodeAsWindowsMediaAudio Then
        encodeWave(waveFullPath)
    End If
Catch ex As Exception
    ' Console
    Console.ForegroundColor = ConsoleColor.Red
    Console.WriteLine(String.Format(Me._settings.ConsoleExceptionMessage, ex.Message))
    Console.ResetColor()
End Try
Encoding as Windows Media Audio Podcasts
Encoding from one format to the other is quite simple to do, as seen below. Be careful to always call Flush when the encoder has finished (stopped) encoding, so that you don't leave the newly converted file in an inconsistent state.
Visual C#
private void encodeWave(string waveFileName)
{
    try
    {
        this._source.SetInput(waveFileName, String.Empty, String.Empty);
        this._encoder.File.LocalFileName = String.Format(this._settings.WindowsMediaAudioFile, waveFileName);
        // Start Encode
        this._encoder.PrepareToEncode(true);
        this._encoder.Start();
        // Wait for the encoder to catch up
        while (this._encoder.RunState != WMENC_ENCODER_STATE.WMENC_ENCODER_STOPPED)
        {
            Console.WriteLine(this._encoder.Statistics.EncodingTime);
        }
        this._encoder.Flush();
    }
    catch (Exception ex)
    {
        // Console
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine(String.Format(this._settings.ConsoleExceptionMessage, ex.Message));
        Console.ResetColor();
    }
}
Visual Basic
Private Sub encodeWave(ByVal waveFileName As String)
    Try
        Me._source.SetInput(waveFileName, String.Empty, String.Empty)
        Me._encoder.File.LocalFileName = String.Format(Me._settings.WindowsMediaAudioFile, waveFileName)
        ' Start Encode
        Me._encoder.PrepareToEncode(True)
        Me._encoder.Start()
        ' Wait for the encoder to catch up
        While (Me._encoder.RunState <> WMENC_ENCODER_STATE.WMENC_ENCODER_STOPPED)
            Console.WriteLine(Me._encoder.Statistics.EncodingTime)
        End While
        Me._encoder.Flush()
    Catch ex As Exception
        ' Console
        Console.ForegroundColor = ConsoleColor.Red
        Console.WriteLine(String.Format(Me._settings.ConsoleExceptionMessage, ex.Message))
        Console.ResetColor()
    End Try
End Sub
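The wait loop above spins as fast as it can and floods the console while the encoder works. One possible refinement, shown here only as a sketch, is to poll at a fixed interval and give up after a timeout. Everything in it is hypothetical (the PollingSketch class, the Condition delegate, the interval and timeout values), and a counter stands in for the encoder so that the example runs anywhere.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper class, not part of the original sample.
static class PollingSketch
{
    // Returns true once the awaited condition holds.
    public delegate bool Condition();

    // Polls isDone every pollInterval until it returns true or the timeout elapses.
    // Returns true if the condition was met, false on timeout.
    public static bool WaitUntil(Condition isDone, TimeSpan pollInterval, TimeSpan timeout)
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (!isDone())
        {
            if (clock.Elapsed >= timeout)
                return false;
            Thread.Sleep(pollInterval);
        }
        return true;
    }

    static void Main()
    {
        // Stand-in for the encoder: reports "stopped" on the fifth poll.
        int polls = 0;
        bool finished = WaitUntil(
            delegate { polls++; return polls >= 5; },
            TimeSpan.FromMilliseconds(10),
            TimeSpan.FromSeconds(2));
        Console.WriteLine("finished={0}, polls={1}", finished, polls); // prints "finished=True, polls=5"
    }
}
```

In encodeWave, the condition would be the same check the loop already makes, namely whether RunState equals WMENC_ENCODER_STATE.WMENC_ENCODER_STOPPED, with Flush called only when WaitUntil returns true.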
Works on iPod
I own a Black 30GB iPod (Video) and can confirm that the synthesized podcasts sound as expected on the device.
However, I do not own a Zune and cannot vouch that it will play these podcasts - it should work!
Demo
To illustrate how these synthesized podcasts sound, please listen to this Wave sample (.wav) or this Windows Media Audio sample (.wma); both were created from the post "Windows Vista Editions: What's right for you?" in the Microsoft Windows Vista RSS feed. You'll agree that it sounds pretty good... Why don't you give it a try? Change the RssFeedUri to target your blog and run the program; there you go, you have created your own podcasts without a microphone or a recording studio.
Conclusion
You have reached the end of this short guide that showed you how to enhance the world of standard plain text blogs; I hope you have enjoyed reading the article as much as I have enjoyed writing and coding the sample. Please use my Windows Live Messenger Id windowslive@borza.ro to talk with me in case you need further assistance. Thanks to the Microsoft Academic Program Team Romania for support.
Improvements
What should you do to improve what I have already done?
Bio
Paul-Valentin Borza is in his second year of study at the Babes-Bolyai University of Cluj-Napoca, Faculty of Mathematics and Computer Science. Since 2005, he has been involved in the Microsoft Student Partners - Microsoft Academic Program Romania. He can be reached through his web site at www.borza.ro.
Great article! The second link however seems to point to a "Page Not Found" too.
For a while I've been trying to convert SAPI 5.1 output directly to MP3 format and managed to get the LAME ACM codec loaded, but it appears to keep writing Wave files and not really encoding. By any chance, would you have any thoughts to offer?
I was hoping to see your source code to see if I could use SAPI 5.3 to do the same thing, so I'd appreciate if you can point us to the source.
Thanks and keep it up!
@Rukmal fixed source code link