Synthesized Podcasts for your Zune and iPod using SAPI



This guide takes you on a speech-enabled adventure: you'll build a simple program that synthesizes XHTML Transitional blog posts into the Wave format (for your iPod) using SAPI 5.3, and then encodes them into the Windows Media Audio format (for your Zune) using the Windows Media Encoder 9 Series API. In the end, you'll be able to make podcasts from nothing more than a standard plain-text RSS 2.0 feed, and play them on two of the most popular media players on the market.
Paul-Valentin Borza

Difficulty: Intermediate
Time Required: 6-10 hours
Cost: Free
Software: Microsoft .NET Framework 3.0 Redistributable Package, Windows Media Encoder 9 Series, Visual Studio Express (C# or VB)
Download: Source Code


You probably read technology news (e.g. Coding4Fun) daily. Coding4Fun does not deliver podcasts yet, and many web sites are in a similar situation. Microsoft Anna, the new text-to-speech (TTS) voice in Windows Vista, sounds more human than previous Microsoft voices like Mary, Mike or Sam; as time goes by, voices should become more and more natural, to the point where it may be hard to tell a synthesized voice from a real human one.
Wouldn't it be great to be able to convert blog posts into podcasts? You’ll even get to sync these synthesized podcasts with your Zune and iPod and listen to them wherever you are and whenever you want.


Please download the Microsoft .NET Framework 3.0 Redistributable Package and install it on your computer (there is no need to install the package on Windows Vista). .NET Framework 3.0 gives you the opportunity to take advantage of the latest Speech API (SAPI 5.3); however, only Windows Vista comes with Microsoft Anna, as the voice is built right into the OS. In case you're running a previous version of Windows, the program will use a previous version of the TTS engine and won't sound as clear and crisp as you would expect.
You'll also need the Windows Media Encoder 9 Series to encode a Wave file as Windows Media Audio.
I hope you are already using Visual Studio Express C# or VB since you're on the Coding4Fun web site (either choice is fine, as the sample is available in both languages).

Devices and Audio Formats

Synthesized podcasts should work on at least these devices:

  • iPod: Wave (.wav)
  • Zune: Windows Media Audio (.wma)

Now that you have everything up and ready, let's get started: it's Coding4Fun time!

Getting Started with the Speech Synthesizer

The Speech Synthesizer is initialized with a Rate of -1 (values range from -10 to +10), a Volume of 80 (values range from 0 to 100) and a Female Adult EN-US Voice (Microsoft Anna). Please note that the gender, age and culture are only hints, so a voice will be selected even if Microsoft Anna is not installed. SpeechSynthesizerRate, SpeechSynthesizerVolume and the other settings can easily be modified inside "SpeakRssPodcast.exe.config".

Visual C#

    // Synthesizer
    this._synthesizer = new SpeechSynthesizer();
    // Rate (-10 to +10)
    this._synthesizer.Rate = this._settings.SpeechSynthesizerRate;
    // Volume (0 to 100)
    this._synthesizer.Volume = this._settings.SpeechSynthesizerVolume;
    // Voice (gender, age, alternate and culture are hints)
    this._synthesizer.SelectVoiceByHints(this._settings.SpeechSynthesizerVoiceGender,
        this._settings.SpeechSynthesizerVoiceAge, this._settings.SpeechSynthesizerVoiceAlternate,
        this._settings.SpeechSynthesizerVoiceCulture);
    // Speak Progress
    this._synthesizer.SpeakProgress += new EventHandler<SpeakProgressEventArgs>(_synthesizer_SpeakProgress);

Visual Basic

    ' Synthesizer
    Me._synthesizer = New SpeechSynthesizer()
    ' Rate (-10 to +10)
    Me._synthesizer.Rate = Me._settings.SpeechSynthesizerRate
    ' Volume (0 to 100)
    Me._synthesizer.Volume = Me._settings.SpeechSynthesizerVolume
    ' Voice (gender, age, alternate and culture are hints)
    Me._synthesizer.SelectVoiceByHints(Me._settings.SpeechSynthesizerVoiceGender, _
        Me._settings.SpeechSynthesizerVoiceAge, Me._settings.SpeechSynthesizerVoiceAlternate, _
        Me._settings.SpeechSynthesizerVoiceCulture)
    ' Speak Progress
    AddHandler Me._synthesizer.SpeakProgress, AddressOf Me._synthesizer_SpeakProgress
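
Because the rate and volume are read from a config file, a typo there could push them outside the ranges quoted above. Here is a minimal sketch of a guard you could run before assigning the values to the synthesizer. SynthesizerSettingsGuard and its two methods are hypothetical names used for illustration; they are not part of the downloadable sample.

```csharp
using System;

// Hypothetical helper, not part of the original sample.
public static class SynthesizerSettingsGuard
{
    // Keeps the speaking rate inside the -10..+10 range quoted in the article.
    public static int ClampRate(int rate)
    {
        return Math.Max(-10, Math.Min(10, rate));
    }

    // Keeps the volume inside the 0..100 range quoted in the article.
    public static int ClampVolume(int volume)
    {
        return Math.Max(0, Math.Min(100, volume));
    }
}
```

With such a guard, the Rate assignment could read this._synthesizer.Rate = SynthesizerSettingsGuard.ClampRate(this._settings.SpeechSynthesizerRate); silently clamping is a design choice, and you may prefer to report the bad setting instead.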

Getting Started with the Windows Media Encoder

The Windows Media Encoder is initialized with a default source group that has a single audio source; the source uses the Windows Media Audio 8 for Dial-up Modem (32 Kbps) profile for encoding a voice-only Wave, which is enough to keep the audio quality high and the size small.

Visual C#

    // Encoder
    this._encoder = new WMEncoder();
    // Source Group Collection
    IWMEncSourceGroupCollection sourceGroupColl = this._encoder.SourceGroupCollection;
    // Source Group
    IWMEncSourceGroup sourceGroup = sourceGroupColl.Add(this._settings.WindowsMediaEncoderSourceGroup);
    // Source (audio only)
    this._source = sourceGroup.AddSource(WMENC_SOURCE_TYPE.WMENC_AUDIO);
    // Profile: pick the installed profile whose name matches the setting
    foreach (IWMEncProfile profile in this._encoder.ProfileCollection)
    {
        //Console.WriteLine(profile.Name);
        if (profile.Name == this._settings.WindowsMediaEncoderProfile)
            sourceGroup.set_Profile(profile); // Use Profile in VB
    }

Visual Basic

    ' Encoder
    Me._encoder = New WMEncoder()
    ' Source Group Collection
    Dim sourceGroupColl As IWMEncSourceGroupCollection = Me._encoder.SourceGroupCollection
    ' Source Group
    Dim sourceGroup As IWMEncSourceGroup = sourceGroupColl.Add(Me._settings.WindowsMediaEncoderSourceGroup)
    ' Source (audio only)
    Me._source = sourceGroup.AddSource(WMENC_SOURCE_TYPE.WMENC_AUDIO)
    ' Profile: pick the installed profile whose name matches the setting
    For Each profile As IWMEncProfile In Me._encoder.ProfileCollection
        'Console.WriteLine(profile.Name)
        If (profile.Name = Me._settings.WindowsMediaEncoderProfile) Then
            sourceGroup.Profile = profile ' Use set_Profile in C#
        End If
    Next

Removing XHTML Tags

When you write a post on your blog, you probably also embed tags such as IMG, DIV and SPAN; these tags should not be spoken, as they carry markup rather than text worth hearing. The following piece of code wraps the post in a minimal XHTML document, loads it into an XmlDocument and returns the inner text, which leaves the tags behind. You could use regular expressions to achieve the same result, as it's just a matter of choice; I like it this way because it's easier to understand.

Visual C#

    private string buildPlainTextFromXHTML(string xhtmlText)
    {
        try
        {
            // Wrap the post in a complete XHTML document so it can be parsed as XML
            XmlDocument xhtmlDoc = new XmlDocument();
            xhtmlDoc.LoadXml(@"<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' ''>" +
                "<html xmlns=''>" +
                "<head><title></title></head>" +
                "<body>" + xhtmlText + "</body>" +
                "</html>");
            // InnerText concatenates the text nodes and drops the tags
            return xhtmlDoc.InnerText;
        }
        catch (Exception)
        {
            // The post is not well-formed XHTML: speak a fallback message instead
            return this._settings.SpeechPromptException;
        }
    }

Visual Basic

    Private Function buildPlainTextFromXHTML(ByVal xhtmlText As String) As String
        Try
            ' Wrap the post in a complete XHTML document so it can be parsed as XML
            Dim xhtmlDoc As XmlDocument = New XmlDocument
            xhtmlDoc.LoadXml("<!DOCTYPE html PUBLIC '-//W3C//DTD XHTML 1.0 Transitional//EN' ''>" & _
                "<html xmlns=''>" & _
                "<head><title></title></head>" & _
                "<body>" & xhtmlText & "</body>" & _
                "</html>")
            ' InnerText concatenates the text nodes and drops the tags
            Return xhtmlDoc.InnerText
        Catch ex As Exception
            ' The post is not well-formed XHTML: speak a fallback message instead
            Return Me._settings.SpeechPromptException
        End Try
    End Function
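
For comparison, here is a rough sketch of the regular-expression route. It is not the code from the sample: PlainText.FromHtml is a hypothetical name, and WebUtility.HtmlDecode assumes .NET Framework 4 or later (on older frameworks, HttpUtility.HtmlDecode from System.Web plays the same role). The pattern is deliberately naive, so expect it to stumble on comments, script blocks and attribute values that contain a ">" character.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original sample.
public static class PlainText
{
    public static string FromHtml(string html)
    {
        if (String.IsNullOrEmpty(html))
            return String.Empty;

        // Replace every tag with a space so neighbouring words do not fuse together.
        string text = Regex.Replace(html, "<[^>]+>", " ");
        // Turn entities such as &amp; back into the characters they stand for.
        text = WebUtility.HtmlDecode(text);
        // Collapse the runs of whitespace left behind by the removed tags.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

Unlike the XmlDocument approach, this version does not need the post to be well-formed XML, which is the main reason to consider it for feeds that carry sloppy HTML.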

Building Prompts

What are prompts? A Prompt is an object that describes what the speech synthesizer should say (the text to be spoken), but also how it should say it (emphasis, rate and volume). For more details, please check the PromptBuilder class. The following function gets called every time a post is processed - my sample takes the latest three available posts.

Visual C#

    private Prompt buildItemPrompt(string itemTitle, DateTime itemPubDate, string itemDescription)
    {
        PromptBuilder pb = new PromptBuilder(this._settings.SpeechSynthesizerVoiceCulture);
        pb.StartParagraph(this._settings.SpeechSynthesizerVoiceCulture);
        // Fill the prompt template with the title, the date and the tag-free description
        pb.AppendText(String.Format(this._settings.SpeechPromptRssItem,
            itemTitle, itemPubDate.ToLongDateString(),
            buildPlainTextFromXHTML(itemDescription)), PromptEmphasis.Moderate);
        pb.EndParagraph();

        return new Prompt(pb);
    }

Visual Basic

    Private Function buildItemPrompt(ByVal itemTitle As String, ByVal itemPubDate As DateTime, ByVal itemDescription As String) As Prompt
        Dim pb As PromptBuilder = New PromptBuilder(Me._settings.SpeechSynthesizerVoiceCulture)
        pb.StartParagraph(Me._settings.SpeechSynthesizerVoiceCulture)
        ' Fill the prompt template with the title, the date and the tag-free description
        pb.AppendText(String.Format(Me._settings.SpeechPromptRssItem, _
            itemTitle, itemPubDate.ToLongDateString(), _
            buildPlainTextFromXHTML(itemDescription)), PromptEmphasis.Moderate)
        pb.EndParagraph()

        Return New Prompt(pb)
    End Function
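
The three arguments of buildItemPrompt come from the RSS 2.0 feed. The sketch below shows one way to pull the title, publish date and description of the first few items out of a feed with XmlDocument. It is an illustration rather than the code from the sample: RssItem and RssReader are hypothetical names, the feed is assumed to list the newest posts first, and the date parsing is best-effort, since feeds in the wild write pubDate in more ways than DateTime.TryParse understands.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml;

// Hypothetical container for the three values buildItemPrompt needs.
public sealed class RssItem
{
    public string Title;
    public DateTime PubDate;
    public string Description;
}

// Hypothetical helper, not part of the original sample.
public static class RssReader
{
    // Returns at most maxItems items, in feed order (assumed to be newest first).
    public static List<RssItem> ReadLatest(string rssXml, int maxItems)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(rssXml);

        List<RssItem> items = new List<RssItem>();
        foreach (XmlNode node in doc.SelectNodes("/rss/channel/item"))
        {
            if (items.Count >= maxItems)
                break;

            RssItem item = new RssItem();
            item.Title = ChildText(node, "title");
            item.Description = ChildText(node, "description");

            // pubDate is optional; PubDate stays DateTime.MinValue if it is missing or unparsable.
            DateTime pubDate;
            if (DateTime.TryParse(ChildText(node, "pubDate"), CultureInfo.InvariantCulture,
                    DateTimeStyles.AdjustToUniversal, out pubDate))
                item.PubDate = pubDate;

            items.Add(item);
        }
        return items;
    }

    // Returns the text of the named child element, or an empty string if it is absent.
    private static string ChildText(XmlNode parent, string name)
    {
        XmlNode child = parent.SelectSingleNode(name);
        return child == null ? String.Empty : child.InnerText;
    }
}
```

Calling RssReader.ReadLatest(feedXml, 3) would then give the latest three posts, each ready to be handed to buildItemPrompt.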

Speaking Wave Podcasts

I have explicitly told the speech synthesizer to send its output to a Wave file; otherwise, the prompts would be played on the default audio device (the speakers). It then speaks two prompts: the first contains the name of the channel, and the second contains the title, publish date and description of the post. Once both have been spoken, the output is set to null, which releases the file so that it can be encoded.

Visual C#

    try
    {
        // Start Speak
        this._synthesizer.SetOutputToWaveFile(waveFullPath);
        this._synthesizer.Speak(buildChannelPrompt(channelTitle));
        this._synthesizer.Speak(buildItemPrompt(itemTitle, itemPubDate, itemDescription));
        // Console
        Console.WriteLine();
        Console.WriteLine();
        // Stop Speak
        this._synthesizer.SetOutputToNull();
        if (this._settings.EncodeAsWindowsMediaAudio)
            encodeWave(waveFullPath);
    }
    catch (Exception ex)
    {
        // Console
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine(String.Format(this._settings.ConsoleExceptionMessage, ex.Message));
        Console.ResetColor();
    }

Visual Basic

    Try
        ' Start Speak
        Me._synthesizer.SetOutputToWaveFile(waveFullPath)
        Me._synthesizer.Speak(buildChannelPrompt(channelTitle))
        Me._synthesizer.Speak(buildItemPrompt(itemTitle, itemPubDate, itemDescription))
        ' Console
        Console.WriteLine()
        Console.WriteLine()
        ' Stop Speak
        Me._synthesizer.SetOutputToNull()
        If Me._settings.EncodeAsWindowsMediaAudio Then
            encodeWave(waveFullPath)
        End If
    Catch ex As Exception
        ' Console
        Console.ForegroundColor = ConsoleColor.Red
        Console.WriteLine(String.Format(Me._settings.ConsoleExceptionMessage, ex.Message))
        Console.ResetColor()
    End Try

Encoding as Windows Media Audio Podcasts

Encoding from one format to the other is quite simple to do, as seen below. Be careful to always call Flush when the encoder has finished (stopped) encoding, so that you don't leave the newly converted file in an inconsistent state.

Visual C#

    private void encodeWave(string waveFileName)
    {
        try
        {
            this._source.SetInput(waveFileName, String.Empty, String.Empty);
            this._encoder.File.LocalFileName = String.Format(this._settings.WindowsMediaAudioFile, waveFileName);
            // Start Encode
            this._encoder.PrepareToEncode(true);
            this._encoder.Start();
            // Wait for the encoder to catch up
            while (this._encoder.RunState != WMENC_ENCODER_STATE.WMENC_ENCODER_STOPPED)
            {
                Console.WriteLine(this._encoder.Statistics.EncodingTime);
            }
            this._encoder.Flush();
        }
        catch (Exception ex)
        {
            // Console
            Console.ForegroundColor = ConsoleColor.Red;
            Console.WriteLine(String.Format(this._settings.ConsoleExceptionMessage, ex.Message));
            Console.ResetColor();
        }
    }

Visual Basic

    Private Sub encodeWave(ByVal waveFileName As String)
        Try
            Me._source.SetInput(waveFileName, String.Empty, String.Empty)
            Me._encoder.File.LocalFileName = String.Format(Me._settings.WindowsMediaAudioFile, waveFileName)
            ' Start Encode
            Me._encoder.PrepareToEncode(True)
            Me._encoder.Start()
            ' Wait for the encoder to catch up
            While (Me._encoder.RunState <> WMENC_ENCODER_STATE.WMENC_ENCODER_STOPPED)
                Console.WriteLine(Me._encoder.Statistics.EncodingTime)
            End While
            Me._encoder.Flush()
        Catch ex As Exception
            ' Console
            Console.ForegroundColor = ConsoleColor.Red
            Console.WriteLine(String.Format(Me._settings.ConsoleExceptionMessage, ex.Message))
            Console.ResetColor()
        End Try
    End Sub
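
The wait loop above spins as fast as it can until the encoder stops, which keeps a CPU core busy and floods the console. If that bothers you, a polling loop with a short sleep and a timeout is a gentler alternative. The sketch below is an assumption-laden illustration rather than part of the sample: the Condition delegate and the Polling.WaitUntil helper are hypothetical names, and the poll interval and timeout are yours to choose.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical delegate: returns true once the awaited state has been reached.
public delegate bool Condition();

// Hypothetical helper, not part of the original sample.
public static class Polling
{
    // Checks isDone, sleeping pollInterval between checks.
    // Returns true if isDone reported true, false if the timeout elapsed first.
    public static bool WaitUntil(Condition isDone, TimeSpan pollInterval, TimeSpan timeout)
    {
        Stopwatch elapsed = Stopwatch.StartNew();
        while (!isDone())
        {
            if (elapsed.Elapsed >= timeout)
                return false;
            Thread.Sleep(pollInterval);
        }
        return true;
    }
}
```

In encodeWave, the condition would be an anonymous method such as delegate { return this._encoder.RunState == WMENC_ENCODER_STATE.WMENC_ENCODER_STOPPED; }, and Flush would still be called once WaitUntil returns true.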

Works on iPod

I own a Black 30GB iPod (Video) and can confirm that the synthesized podcasts sound as expected on the device.

Figure: Synthesized podcasts on iPod

However, I do not own a Zune and cannot vouch that it will play these podcasts, although it should.


To illustrate how these synthesized podcasts sound, please listen to this Wave sample (.wav) or this Windows Media Audio sample (.wma); both were created from the post "Windows Vista Editions: What's right for you?" in the Microsoft Windows Vista RSS feed. You'll agree that it sounds pretty good. Why don't you give it a try? Change the RssFeedUri setting to target your blog and run the program; there you go, you have created your own podcasts without a microphone or a recording studio.


You have reached the end of this short guide that showed you how to enhance the world of standard plain text blogs; I hope you have enjoyed reading the article as much as I have enjoyed writing and coding the sample. Please use my Windows Live Messenger Id to talk with me in case you need further assistance. Thanks to the Microsoft Academic Program Team Romania for support.


What should you do to improve what I have already done?

  • Use the PromptBuilder.AppendAudio member to add extra flavor to the podcasts; moreover, you could alternate sounds (use melodies that were released under a Creative Commons license).
  • Use the Regex class to convert HTML posts to plain text.
  • Encode synthesized audio to the MP3 format; in addition, you should also add tags.


Paul-Valentin Borza is in his second year of study at the Babes-Bolyai University of Cluj-Napoca, Faculty of Mathematics and Computer Science. Since 2005, he has been involved in the Microsoft Student Partners - Microsoft Academic Program Romania. He can be reached through his web site at

The Discussion

  • Great article! The second link however seems to point to a "Page Not Found" too.

    For a while I've been trying to convert SAPI 5.1 output directly to MP3 format and managed to get the LAME ACM codec loaded, but it appears to keep writing Wave files and not really encoding. By any chance, would you have any thoughts to offer?

    I was hoping to see your source code to see if I could use SAPI 5.3 to do the same thing, so I'd appreciate if you can point us to the source.

    Thanks and keep it up!

  • @Rukmal fixed source code link
