Technical Details

Challenges

Sky.fm streams are transferred over HTTP using the Shoutcast protocol, and WP7 has no support for Shoutcast streams. Fortunately, there's a mechanism, called "media stream source", that allows applications to parse unsupported media containers and feed the actual audio and/or video data to the Microsoft-implemented codecs.

Implementing a media stream source for streaming audio is hard:

  • If you want reliable playback over real-life mobile Internet, a robust download buffer is a must. You'll get no help from the framework; you have to implement it yourself.
  • To parse an MPEG audio stream, you have to parse the frame headers. Given that there are three versions of the standard (MPEG-1, MPEG-2 and the unofficial MPEG-2.5), it's not a trivial task.
  • To initialize the HE-AAC audio codec, you have to provide data structures that are only documented in the ISO/IEC 14496-3 standard, which costs 240 USD to download and requires some understanding of very specific things like SBR and PS.
  • The API Microsoft offers for reading from the HTTP response stream lacks timeout support. For reliable error handling, you must implement your own timeout mechanism.
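To illustrate what parsing MPEG audio frame headers involves, here is a minimal Python sketch (not the app's C# code). It checks the 11-bit frame sync, decodes the version and layer bits, looks up the bitrate and sample rate tables, and computes the frame length so a parser can hop from one frame to the next. The names `parse_header` and `FrameHeader` are hypothetical, and the sketch ignores the optional CRC and free-format bitrates.

```python
from dataclasses import dataclass
from typing import Optional

# Version ID bits: 0b00 = MPEG-2.5, 0b01 = reserved, 0b10 = MPEG-2, 0b11 = MPEG-1
_VERSIONS = {0b00: "2.5", 0b10: "2", 0b11: "1"}
# Layer bits: 0b01 = Layer III, 0b10 = Layer II, 0b11 = Layer I (0b00 is reserved)
_LAYERS = {0b01: 3, 0b10: 2, 0b11: 1}
# Bitrates in kbit/s, indexed by the 4-bit bitrate index (0 = "free", 15 = invalid)
_BITRATES = {
    ("1", 1): [0, 32, 64, 96, 128, 160, 192, 224, 256, 288, 320, 352, 384, 416, 448],
    ("1", 2): [0, 32, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 384],
    ("1", 3): [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320],
    ("2", 1): [0, 32, 48, 56, 64, 80, 96, 112, 128, 144, 160, 176, 192, 224, 256],
    ("2", 2): [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],
    ("2", 3): [0, 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160],
}
_SAMPLE_RATES = {"1": [44100, 48000, 32000], "2": [22050, 24000, 16000], "2.5": [11025, 12000, 8000]}


@dataclass
class FrameHeader:
    version: str       # "1", "2" or "2.5"
    layer: int         # 1, 2 or 3
    bitrate: int       # bit/s
    sample_rate: int   # Hz
    padding: int       # 0 or 1
    channels: int      # 1 (mono) or 2
    frame_length: int  # bytes, including the 4-byte header
    samples: int       # PCM samples per channel in this frame


def parse_header(b: bytes) -> Optional[FrameHeader]:
    """Return the parsed header, or None if the first 4 bytes are not a valid frame header."""
    if len(b) < 4:
        return None
    h = int.from_bytes(b[:4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:  # 11-bit frame sync must be all ones
        return None
    version = _VERSIONS.get((h >> 19) & 0b11)
    layer = _LAYERS.get((h >> 17) & 0b11)
    bitrate_index = (h >> 12) & 0xF
    rate_index = (h >> 10) & 0b11
    if version is None or layer is None or bitrate_index in (0, 15) or rate_index == 3:
        return None  # reserved values; "free" bitrate (index 0) is rejected for simplicity
    table_version = "1" if version == "1" else "2"  # MPEG-2.5 shares the MPEG-2 bitrate tables
    bitrate = _BITRATES[(table_version, layer)][bitrate_index] * 1000
    sample_rate = _SAMPLE_RATES[version][rate_index]
    padding = (h >> 9) & 1
    channels = 1 if ((h >> 6) & 0b11) == 0b11 else 2  # channel mode 0b11 = mono
    if layer == 1:
        samples = 384
        frame_length = (12 * bitrate // sample_rate + padding) * 4
    elif layer == 2 or version == "1":
        samples = 1152
        frame_length = 144 * bitrate // sample_rate + padding
    else:  # Layer III in MPEG-2 / MPEG-2.5 carries half as many samples per frame
        samples = 576
        frame_length = 72 * bitrate // sample_rate + padding
    return FrameHeader(version, layer, bitrate, sample_rate, padding, channels, frame_length, samples)
```

For example, the common header bytes `FF FB 90 64` decode as MPEG-1 Layer III, 128 kbit/s, 44.1 kHz, stereo, giving a 417-byte frame; the next header should start exactly that many bytes later.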

But the media stream source is only half of the task. To make the radio play in the background, you also have to implement two things called “audio player agent” and “audio streaming agent”.

  • Both your agents and your media stream source live in a separate process. Microsoft added background agents in Windows Phone 7.1 "Mango"; before that, every WP7 application had only one process. Unfortunately, they forgot to add the important parts of the .NET framework responsible for inter-process communication.
  • Lack of error handling in the API. In most cases, if you do something wrong, either the whole background process or one of your agents silently quits. If you're lucky, you'll get an exception like "COM error, code= E_FAIL". Sometimes you get no message at all: it just doesn't work.
  • Design errors in the API.
  • Bugs in the API.
  • Insufficient documentation. Most MSDN articles are about scheduled tasks, leaving you to guess which parts of those articles also relate to audio playback agents. Microsoft only has trivial samples like “synthesize a sine wave sound”.

And the following points apply to both media stream source and background agents:

  • Strict time constraints. If you’re not prompt enough, the OS will conclude you’ve failed, and close the stream/agent.
  • If the background process uses more than 15 MB of RAM, it's silently killed.
  • When a WP7 device is plugged into a PC with a USB cable, the phone recognizes the PC as an Ethernet network card. What's worse, while it is plugged in, the phone can only use this Ethernet connection for Internet access, so you can't debug your application while it uses a GSM/3G or WiFi data connection.
  • In the worst case (Lumia 510, Lumia 610), the device CPU runs at only 800 MHz. And unlike most normal applications, an audio player can easily run for hours, so every CPU tick drains valuable energy from the battery.
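Given the 15 MB limit, one way to build the robust download buffer mentioned earlier is to allocate a single fixed-size array up front and use it as a circular FIFO, so the buffer storage never grows or gets reallocated during playback. Below is a minimal Python sketch under that assumption; the `RingBuffer` class is hypothetical and is not the app's actual buffer.

```python
class RingBuffer:
    """Fixed-capacity byte FIFO backed by one bytearray that is allocated once."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = bytearray(capacity)  # never resized after this point
        self._start = 0                  # index of the oldest unread byte
        self._count = 0                  # number of unread bytes currently stored

    def __len__(self) -> int:
        return self._count

    def free_space(self) -> int:
        return len(self._buf) - self._count

    def write(self, data: bytes) -> int:
        """Copy as much of data as fits and return the number of bytes accepted."""
        n = min(len(data), self.free_space())
        end = (self._start + self._count) % len(self._buf)
        first = min(n, len(self._buf) - end)    # part that fits before the physical end
        self._buf[end:end + first] = data[:first]
        self._buf[0:n - first] = data[first:n]  # wrapped-around remainder (may be empty)
        self._count += n
        return n

    def read(self, size: int) -> bytes:
        """Remove and return up to size of the oldest bytes."""
        n = max(0, min(size, self._count))
        first = min(n, len(self._buf) - self._start)
        out = bytes(self._buf[self._start:self._start + first]) + bytes(self._buf[0:n - first])
        self._start = (self._start + n) % len(self._buf)
        self._count -= n
        return out
```

In this sketch, the network reader would call `write()` and pause downloading while `free_space()` is 0, and the media stream source would call `read()` when it needs audio data; a 512 KB buffer would be `RingBuffer(512 * 1024)`.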

What Helped

For debugging, I relied heavily on tracing. I implemented a syslog (RFC 5426) client in my WP7 code and installed a free program called "SysRose Syslog Desktop" on my PC. This way, I'm able to read the diagnostic output of my app while it's playing streams over WiFi. I gathered memory usage statistics the same way.
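A syslog-over-UDP tracer can be quite small. The Python sketch below is an assumption about how such a client might look, not my WP7 code: it formats each trace line as an RFC 5424 message and sends it as a single UDP datagram, as RFC 5426 specifies, to the default port 514. The facility (local0), the host and app names, and the helper names are hypothetical placeholders; whether a particular desktop viewer expects RFC 5424 or the older BSD syslog format is worth checking.

```python
import socket
from datetime import datetime, timezone
from typing import Optional

FACILITY_LOCAL0 = 16  # "local use 0" facility code from RFC 5424
SEVERITY = {"error": 3, "warning": 4, "info": 6, "debug": 7}  # RFC 5424 severity codes


def format_syslog(message: str, severity: str = "debug", app_name: str = "radio",
                  hostname: str = "phone", facility: int = FACILITY_LOCAL0,
                  timestamp: Optional[datetime] = None) -> bytes:
    """Build one RFC 5424 message: <PRI>1 TIMESTAMP HOST APP PROCID MSGID SD MSG."""
    pri = facility * 8 + SEVERITY[severity]
    ts = (timestamp or datetime.now(timezone.utc)).isoformat(timespec="milliseconds")
    # "-" is the RFC 5424 NILVALUE for the unused PROCID, MSGID and STRUCTURED-DATA fields
    return f"<{pri}>1 {ts} {hostname} {app_name} - - - {message}".encode("utf-8")


class SyslogUdpClient:
    """Fire-and-forget tracer: one message per UDP datagram, no delivery guarantee."""

    def __init__(self, host: str, port: int = 514) -> None:
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def log(self, message: str, severity: str = "debug") -> None:
        self._sock.sendto(format_syslog(message, severity), self._addr)

    def close(self) -> None:
        self._sock.close()
```

Usage would be along the lines of `SyslogUdpClient("192.168.1.10").log("buffer 42% full", "info")`, where the address is a hypothetical PC on the same WiFi network. UDP fits this job because a lost trace line costs nothing, while a blocking or retrying logger would compete with playback for CPU time.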

Async CTP is extremely valuable. The code of both the agents and the stream source is highly asynchronous, yet it creates no threads besides those already in the CLR thread pool, requires very little thread synchronization, and so far I've encountered very few concurrency-related issues.

Country music as a symptom that the code works is rewarding.

What Didn’t

Open source libraries. They just don’t work.

When playback failed, I initially tried to play the next stream using the same media stream source instance. Wrong. When the two streams meet (even though this always happens on a frame boundary, and both streams have the same format), WP7's audio codec produces about 1 second of loud noise, which is unacceptable. The correct way is to play the next URI using a new media stream source instance, i.e. as a new audio track. To accomplish that, the media stream source requires cooperation from the audio player agent, but at least they live in the same process.

Fun Facts

In January 2013, there was a memory management issue in Microsoft's background audio player. It looks like when the OS read the audio data from the Stream object I provided in the MediaStreamSource.ReportGetSampleCompleted method, they forgot to reuse the read buffer and allocated a new one every time they read the audio sample data I'd prepared for them. The evidence is here. However, they silently fixed it in a subsequent Windows Phone OS update.

At the start of playback, when listening to a premium channel, the remote server can, and will, send a lot of pre-buffered data if requested. The WP7 CPU is single-core. My own buffer is 512 KB, but there's also an undocumented amount of buffering on the OS side. When the Internet connection is fast enough, processing that amount of data overwhelms the CLR thread pool and/or the Async CTP task scheduler so much that the subsequent read callback has no time to execute. Because of that, in 30% of cases the read from the web response timed out within the first few seconds of playback. This caused my audio player to skip to the next stream of the same station, which usually (in 70% of cases) succeeded, but the playback obviously stopped for buffering. A single "await TaskEx.Yield();" statement before every Stream.ReadAsync(...) call was enough to re-schedule the tasks, give the read callbacks time to execute, and solve the problem.
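Python's asyncio offers a rough analogue of this fix, sketched below under the assumption that the response body is read in a loop: `await asyncio.sleep(0)` yields to the event loop much like `TaskEx.Yield()`, and `asyncio.wait_for` supplies the per-read timeout that the WP7 API lacks. The helper names `read_with_timeout` and `pump` and the 10-second default are hypothetical; asyncio runs on a single thread, so this illustrates the pattern rather than reproducing the CLR thread pool behavior.

```python
import asyncio
from typing import Callable


async def read_with_timeout(reader: asyncio.StreamReader, size: int, timeout: float) -> bytes:
    # Analogue of "await TaskEx.Yield();": let other queued callbacks run before reading
    await asyncio.sleep(0)
    # Raises asyncio.TimeoutError if no data arrives within `timeout` seconds
    return await asyncio.wait_for(reader.read(size), timeout)


async def pump(reader: asyncio.StreamReader, sink: Callable[[bytes], None],
               chunk_size: int = 8192, timeout: float = 10.0) -> None:
    """Copy the stream into sink until EOF; a stall longer than timeout raises TimeoutError."""
    while True:
        chunk = await read_with_timeout(reader, chunk_size, timeout)
        if not chunk:  # empty result means EOF
            return
        sink(chunk)
```

A caller that catches `asyncio.TimeoutError` from `pump` could then treat the stall as a failed stream and move on to the next URI, in line with the recovery described above.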

Initial version January, 2013, last update April, 2014.