Way Out West Hack Battle

I went to the ‘Way Out West Hack Battle’ with Per Thulin from Youtify, and we got quite a few new nice features done for Youtify (see http://blog.youtify.com/2012/08/wowhack-lastfm-recommendations-and.html) including Last.fm scrobbling and Last.fm recommendations.

I really wanted to finish a method to let the Echo Nest do analysis of Youtube videos, but I couldn’t get it to work without doing it on the server. I’ll probably have to add a Flash demuxing plugin to Aurora.js and remux the audio as MPEG1/MPEG4 (depending on the audio codec) before submitting it to the Echo Nest.

The event itself was nice, and even though they claimed that they had only planned it all two weeks or so before, almost all essentials were covered, including food, non-crashing wifi, ethernet for everyone, and a relatively quiet room for sleeping. A few people were complaining about the lack of a constant source of coffee, but I think they managed to survive. Everyone submitting a hack also got a festival pass for Way Out West, which was a nice touch, and almost everyone seemed excited about being able to see Kraftwerk play tomorrow.

The only thing that was a bit sad was that there was no real presentation of the winners. It was just an informal thing, which means I don’t know who won. But if they get another chance next year, that would be an easy thing to improve.

If there is another Way Out West Hack Battle next year, and you have a chance to come to Sweden, take it. You won’t regret it.

Exploding dice

It is too warm to do any actual work in Sweden today, so I instead sat in my apartment, playing Classic Battletech with myself. After a while I thought that, since I am an adult and can have any imaginary friends I like, I wanted to incorporate some Mechwarrior (the accompanying role-playing game to Classic Battletech) characters to create a more interesting environment to lose time in.

One of the characteristics of the Mechwarrior game is that it uses 2d10 (two ten-sided dice) for almost anything, but with the special rule that any 10 is rolled again, adding to the total. For example, if I roll a 4 and a 7, my result is 11, but if I roll a 6 and a 10, I get to roll once more, maybe a result of 3, for a total of 19. You can continue rolling dice for a long while; in theory, an infinitely long while.

But what is the mean value of such a roll?

We’ll reduce the 2d10 case to a single d10 (one ten-sided die): the two dice are independent and identical, so the mean of 2d10 is just twice the mean of one die. To calculate the mean of one die, we just have to sum up each possible value, multiplied by its probability. Simple.

$$\frac{1 + \dots + 9}{10} + \frac{11 + \dots + 19}{100} + \frac{21 + \dots + 29}{1000} + \dots$$

Except that the sum is not really that simple, and not really that general. We can simplify each addend: if we are using \(n\)-sided dice and we are in the \(k\)-th explosion, the possible totals are \(kn + 1\) to \(kn + (n - 1)\), each with probability \(1 / n^{k + 1}\), so the addend is

$$\frac{k \times n \times (n - 1) + n(n - 1) / 2}{n^{k + 1}} = \frac{(n - 1) \times (k + 1 / 2)}{n^k}$$

Then we can build the sum,

$$\sum_{k = 0}^{\infty}{\frac{(n - 1) \times (k + 1 / 2)}{n^k}}$$

and we can calculate some prefix sums for \(n = 10\), to some arbitrary precision,

\(k\) \(\sum\)
0 4.50000
1 5.85000
2 6.07500
3 6.10650
4 6.11055
5 6.11105
6 6.11110

and we see that it seems to tend towards \(\frac{55}{9}\), which is \(\frac{10}{9}\) times the mean of a normal die (5.5). In the general case, for \(n > 1\), the sum converges to

$$\frac{n(n + 1)}{2(n - 1)}$$

or \(\frac{n}{n - 1}\) times better than the regular, non-exploding die.
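For reference, here is a sketch of one way to reach that closed form (not necessarily how the result above was found). It uses the two standard series \(\sum_{k \ge 0} x^k = \frac{1}{1 - x}\) and \(\sum_{k \ge 0} k x^k = \frac{x}{(1 - x)^2}\) with \(x = 1 / n\):

$$\sum_{k = 0}^{\infty}{\frac{(n - 1) \times (k + 1 / 2)}{n^k}} = (n - 1) \left( \frac{1 / n}{(1 - 1 / n)^2} + \frac{1}{2} \times \frac{1}{1 - 1 / n} \right) = \frac{n}{n - 1} + \frac{n}{2} = \frac{n(n + 1)}{2(n - 1)}$$

For \(n = 10\) this gives \(\frac{110}{18} = \frac{55}{9}\), which matches the prefix sums.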

Comparing to D20

Another common system with similar qualities is D20, which uses a single d20 (one twenty-sided die), with the special property that a 20 always succeeds. Let us compare the two systems, based on the TN (target number) that you need to reach to succeed.

In both systems, the lowest possible roll always fails.

TN d20 2d10
2 95.0% 99.0%
3 90.0% 99.0%
4 85.0% 97.0%
5 80.0% 94.0%
6 75.0% 90.0%
7 70.0% 85.0%
8 65.0% 79.0%
9 60.0% 72.0%
10 55.0% 64.0%
11 50.0% 55.0%
12 45.0% 47.0%
13 40.0% 39.8%
14 35.0% 33.4%
15 30.0% 27.8%
16 25.0% 23.0%
17 20.0% 19.0%
18 15.0% 15.8%
19 10.0% 13.4%
20 5.0% 11.8%
21 5.0% 10.0%
22 5.0% 8.4%
23 5.0% 7.0%
24 5.0% 5.7%
25 5.0% 4.6%
26 5.0% 3.7%
27 5.0% 3.0%
28 5.0% 2.4%
29 5.0% 2.0%
30 5.0% 1.7%

The most interesting part of the table is around TN 13, where a single exploding die stops being an automatic success (one explosion guarantees a total of at least 12, but not 13). This means that the probabilities go a bit funny there, which is something to watch out for when designing a game with exploding dice. The same thing happens higher up in the table as well, but there it doesn’t make much difference.
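The 2d10 column can be recomputed with a few lines of Javascript. This is only a sketch: the function names are made up for this example, and the distribution of a single die is cut off after a fixed number of explosions (the missing tail is counted as a success, which is harmless for the target numbers in the table).

```javascript
// Distribution of one exploding die, truncated after `maxExplosions` rerolls.
// dist[v] is the probability of a total of exactly v.
function explodingDie(sides, maxExplosions) {
  const dist = new Array(sides * (maxExplosions + 1)).fill(0);
  for (let k = 0; k <= maxExplosions; k++) {
    for (let face = 1; face < sides; face++) {
      dist[k * sides + face] = Math.pow(1 / sides, k + 1);
    }
  }
  return dist;
}

// Chance that two exploding dice total at least `target`.
// The lowest possible roll (two ones) always fails, as in the table.
function successChance(target, sides = 10, maxExplosions = 6) {
  const die = explodingDie(sides, maxExplosions);
  let failure = 0;
  for (let a = 1; a < die.length; a++) {
    for (let b = 1; b < die.length; b++) {
      if (a + b < target || a + b === 2) failure += die[a] * die[b];
    }
  }
  return 1 - failure;
}

// Single d20: a 1 always fails and a 20 always succeeds.
function d20Chance(target) {
  return Math.max(Math.min(21 - target, 19), 1) / 20;
}

for (const tn of [11, 13, 20]) {
  console.log(tn, (100 * d20Chance(tn)).toFixed(1), (100 * successChance(tn)).toFixed(1));
}
```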

Conclusion

In the end, if you want your rolls to be average more often, fail less often, but with a quite exciting system for heroic success, use the exploding dice system. If you want a system where it is easier to calculate the probabilities, use a single-die system with automatic success.

I calculated everything by hand, so be a bit wary about my calculations. I only did some quick numerical checking, and the results ended up similar.
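A quick numerical check of the mean could look something like the sketch below. The function names are invented for this example; it compares the prefix sums and the closed form with a simple simulation that uses `Math.random`.

```javascript
// Partial sums of the series for an n-sided exploding die.
function prefixSums(n, terms) {
  const sums = [];
  let total = 0;
  for (let k = 0; k < terms; k++) {
    total += (n - 1) * (k + 0.5) / Math.pow(n, k);
    sums.push(total);
  }
  return sums;
}

// Closed form n(n + 1) / (2(n - 1)).
function closedFormMean(n) {
  return n * (n + 1) / (2 * (n - 1));
}

// Roll one exploding die: keep rolling and adding while the top face comes up.
function rollExploding(n, random = Math.random) {
  let total = 0;
  let face;
  do {
    face = 1 + Math.floor(random() * n);
    total += face;
  } while (face === n);
  return total;
}

// Average of `rolls` simulated exploding dice.
function simulatedMean(n, rolls, random = Math.random) {
  let sum = 0;
  for (let i = 0; i < rolls; i++) sum += rollExploding(n, random);
  return sum / rolls;
}

console.log(prefixSums(10, 7));
console.log(closedFormMean(10), simulatedMean(10, 200000));
```

The simulated value is random, so it will only agree with the closed form to a couple of decimals.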

Modern Browsers

Today Google released a doodle. It was well executed and fun, and I think that Robert Moog would have enjoyed it. But this isn’t about the doodle, it is about a small piece of text that Google shows just beneath it.

Upgrade to a modern browser and see what this doodle can really do.

This piece of text (and a link to the Google Chrome download page) shows up in all non-Chrome browsers, implying that they are not ‘modern’.

I have been guilty of using the term as well, once during feature sniffing, mostly to mean ‘not Internet Explorer older than X’ and I am sorry about that, especially since Microsoft seems to be taking development of Internet Explorer 10 very seriously.

But we should investigate what ‘modern’ means to Google. Someone hinted that you need to support the non-standard Web Audio API to be detected as ‘modern’, so I booted up WebKit Nightly (built with the Web Audio API enabled), went to Google and got the same message.

Guessing that it was a lot simpler, I switched back to Firefox and changed the UA string to something that looked like Chrome, and suddenly the message was gone. When I switched back to the default UA string, it showed up again.

Just to make sure that I didn’t do anything wrong, and because I was curious, I investigated other browsers. Internet Explorer does get the message and the doodle, and so does Opera. The iPhone and most silly UA strings do not get the doodle, and therefore do not get the message either. I don’t have any Android devices, but I assume that you would either get both the message and the doodle, or neither, depending on whether your device supports Flash.

In the end, the conclusion is that a ‘modern browser’ according to Google is a browser which includes ‘Chrome’ in its UA string and supports Flash or the Web Audio API.

Can we instead, on production sites, standardize on something like “this site requires (experimental) features not yet present in your browser” (thanks @getify for the idea), with a link to instructions on how to update the browser or, if it is a browser-specific feature, information about the feature and why it isn’t yet supported in their browser of choice?
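As a rough sketch of what that could look like, the message can be driven by the features that are actually missing instead of by the UA string. The function names and the feature list here are just examples, not a recommendation of specific APIs to test for.

```javascript
// Names of the required features that are missing from `scope`
// (normally `window` in a browser).
function missingFeatures(scope, required) {
  return required.filter((name) => !(name in scope));
}

// Build the message, or return null when nothing is missing.
function upgradeMessage(missing) {
  if (missing.length === 0) return null;
  return "This site requires (experimental) features not yet present in your browser: " +
    missing.join(", ");
}

// In a browser this would be missingFeatures(window, [...]);
// a plain object stands in for `window` so the example runs anywhere.
const fakeBrowser = { AudioContext: function () {} };
console.log(upgradeMessage(missingFeatures(fakeBrowser, ["AudioContext", "Worker"])));
```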

Note

If you are trying to reproduce results, make sure that you’re using google.com in English, the text about a ‘modern browser’ doesn’t show up otherwise.

Problems with a ‘pure Javascript’ implementation of H.264

I have written a lot of audio decoders in Javascript, and helped write a few more. I have never tackled video, for a few reasons. I’ll try to sum up why there will probably never be a video decoder implemented in ‘pure’ Javascript, and the methods with which I think it will be implemented instead.

Even the most high-end audio codecs are designed to also work on really low-end DSP devices. ALAC (Apple Lossless Audio Codec), for example, decodes stereo fine in software on one of the 90 MHz ARM7TDMI cores in the original iPod. AAC requires a bit more, but it is still within the reach of software on a relatively slow processor, like a Pentium or G3. A modern ARM processor can decode MP3 at a clock speed of a mere 10 MHz, and with a bit more, AAC, which essentially is the most demanding codec that you’ll meet on the web.

Video codecs, on the other hand, are an entirely different story. The 2.4 GHz Core 2 Duo in my laptop (a Macbook Pro) has serious problems decoding high-end H.264 (1080p Hi10P, for example) with FFmpeg. My desktop, a reasonably modern Xeon quad-core, handles these videos fine using FFmpeg, but with significant load. Note that this is with an implementation that is hand-optimized with assembly. To improve the situation, we cannot depend on hardware support either, because it is often out of date. No graphics card in my collection supports this profile in hardware yet, for example.

On top of these problems, there are some serious limitations in Javascript/ECMAScript that make it a bad platform for video decoding. And while it is a very cool demo of emscripten, these are some of the reasons why I don’t think that Broadway.js will ever be able to decode H.264 in any sort of sane capacity using merely emscripten and some minor optimizations done by hand, without a radically different Javascript engine to support it.

Floating-point

Essentially all operations in Javascript operate on floating-point numbers, and this is not likely to change in the future. For audio codecs, this is not really a problem since they tend to be designed in a way that you can implement them as both fixed-point and floating-point.

Video codecs, on the other hand, tend to rely a lot on fixed-point for optimization, and H.264 is even designed to avoid needing floating-point as much as possible. Even the discrete cosine transform and motion compensation in H.264 are modified to operate on fixed-point numbers instead of floating-point.

The reason for this is that modern processors can often process fixed-point operations much faster, especially the 8 and 16 bit operations that are the most common. These short integer instructions often have at least 4 times the throughput of double precision floating-point. Certain complex instructions like division make the difference irrelevant, and in many cases require fallback to floating-point, but these operations are extremely uncommon in H.264.
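To illustrate what integer-only means here, this is a sketch of the 4-point butterfly of the H.264 4x4 forward core transform, which needs nothing but additions, subtractions and shifts. The function name is made up, a decoder would use the matching inverse transform plus scaling, and in Javascript the values are still numbers that the engine has to prove are integers.

```javascript
// One row of the H.264 forward core transform, i.e. multiplication by
//   [ 1  1  1  1 ]
//   [ 2  1 -1 -2 ]
//   [ 1 -1 -1  1 ]
//   [ 1 -2  2 -1 ]
// written as a butterfly with integer adds and shifts only.
function forwardTransform4(input) {
  const x = Int32Array.from(input);
  const s0 = x[0] + x[3];
  const s1 = x[1] + x[2];
  const d0 = x[0] - x[3];
  const d1 = x[1] - x[2];
  return Int32Array.of(
    s0 + s1,
    (d0 << 1) + d1,
    s0 - s1,
    d0 - (d1 << 1)
  );
}

console.log(forwardTransform4([1, 2, 3, 4]));
```

A native implementation would run this on 16 bit lanes, several rows at a time, which is exactly the part that is out of reach from Javascript.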

SIMD

This is before the SIMD penalty is added for Javascript. Because current Javascript engines only use scalar operations, a significant part of the execution hardware (1/2 to 1/4) spends most of its time idling.

Most native decoders use SIMD instructions, which give them access to 8-16 times more throughput per core for simple operations. In addition, there are special instructions for optimizing MPEG codecs, which give a quite measurable speedup that you are unlikely to get without hand-optimized code.

Threads

To provide the final blow against current Javascript, there is very little possibility for shared memory multicore programming in a browser. Workers are not good enough to do this. I haven’t actually measured it, as I do not plan to implement a video decoder with workers, but I think that the cost of communication and the latency are currently too high for it to make sense.

Only using a single core on a processor that has 2-8 is another problem that would keep a Javascript implementation from ever competing with a native implementation.

Solution

There are two obvious solutions to all of these problems that are being prototyped on the web right now: WebCL and Rivertrail. Both of these are mainly designed to solve the threading problem, which is likely not the biggest issue, but it is still significant.

Rivertrail could solve most of these problems since it is currently based on Intel’s OpenCL runtime, which has good optimizers. It isn’t designed for this very specialized task, and while it does allow you to reduce precision, it doesn’t allow direct access to integer or media instructions, but it is a much better option than pure Javascript and with the addition of an integer API to Javascript, this could easily turn into the preferred method.

WebCL (OpenCL for the web) on the other hand, already solves most or all of these problems since it is essentially a massively parallel C with SIMD and device-specific extensions. It even allows for the GPU to pick up most of the burden, which is in many cases preferable to running on the CPU due to the extra computational power available.

There are probably other solutions as well, but just hoping that single-threaded Javascript with double precision floating-point will ever be enough is naïve and counter-productive in my opinion. Mobile devices have special concerns of their own, and WebCL is in a great position to solve these too in the future.

And while I would love to be proved wrong about this, I don’t think I will be for a long while, and at that point, there will be more advanced codecs and higher resolutions around to target.

Music Hack Day Amsterdam

I think I might have one of the best jobs in the world. Last weekend the Official.fm Labs team went to Music Hack Day: Amsterdam, and I must say, I had a most wonderful time.

If you ever have a chance to visit an MHD, you really should. The investment is only time and the rewards can be immense; for me, that reward was meeting some really nice people.

I met some old and new friends. Per and Kalle from Youtify were there hacking and made some great progress, adding profile pages and allowing you to follow other people’s playlists. If you haven’t been using Youtify for your Youtube music needs, you should try it, it is really nice.

The guys from Zvooq, in addition to being awesome people, seem to have great ideas on how to improve the situation for artists and music consumers. I hope I’ll be able to test Zvooq soon, but for now it seems to only be available in Russia and a few other countries.

I also met a lot of other people. I unfortunately didn’t get the names of most of them, but they were really nice, and one of the most interesting things about Music Hack Day is that people with significantly different skill sets interact with each other. I learned a bit about music theory, and I think I gave some nice hints about what I know.

The location, Nederlands Instituut voor Mediakunst, was a fantastic venue. I didn’t see all of the exhibits, but they seem to be doing some serious hacking even when we’re not around, so if you have the opportunity, visit them and take the tour.

The organizers, Roeland P. Landegent and his team, deserve credit as well. It was a well organized event, and all our hacker needs were fulfilled.

The only bad thing about the whole event was that I got stuck digging a deep rabbit-hole for myself, so I won’t release my work yet (I plan to convert it into an XPCOM component as soon as I get some time away from schoolwork). It is essentially an implementation of the ‘Simple Audio’ API that I wrote about earlier, but it is now renamed Mio (澪) because ‘Simple Audio’ didn’t give the right vibe: the API isn’t designed to be simple, it is designed to be flexible and powerful.

The rest of the Official.fm Labs team were working on a sort of voice controller for games, where you control your character through pitch and intensity. They were having some problems with the short-time Fourier transform, but otherwise I think they have most of it nailed down now; they managed to get a demo working at least.

A final open question, Spotify, when is there going to be a Music Hack Day Sweden?

Tweeting about Computer Science education

I sent out a tweet today, and I realized only in retrospect that some people reacted very negatively to it. I am sorry about that; my intention was not to insult anyone, or their education.

This was the tweet, modified for formatting; the parentheses contain a clarification that I also posted on Twitter.

Why do we educate computer scientists to get (obtain) developers?

We wouldn’t educate structural engineers to get (obtain) masons…

I did not intend to insult anyone, but I can see why it did and how I did it.

It was not an attack on computer science education at all. It was merely a comment on the fact that a lot of people seem to think that you should (and need to) study computer science to become a developer.

Dijkstra once commented,

Computer science is no more about computers than astronomy is about telescopes.

And he summarized my intention in a way that is a bit less hostile.

What I meant was that computer science does not teach you how to code, how to write documentation, how to use source control, write good issues, etc. I am not good at a lot of these things, and I doubt that if I wrote a thesis in computer science and got a degree in CS, I would magically learn these skills.

No, you need to learn that somewhere else. And no amount of data structures or natural language processing or theory about compilers and so on will ever make you good at these skills, which I consider essential before I would ever consider myself a good software developer.

I am not even sure if I am good enough to call myself a developer; I am at most a hacker. I might be naïve, but I think that if I practice, I will get better. With practice I think I can get to the level that I can call myself a software developer without feeling that I have serious holes in my skillset.

Even if I magically learned what was required for every computer science course here in Lund, I think I would still feel that I have those same holes. I would definitely be a better hacker and I would certainly be a much better computer scientist, but I still would not be able to write awesome documentation, or write eloquent code.

But computer science lecturers teach you about computer science, and computer science is not only about computers and code, as Dijkstra said.

Computer science is a wonderful branch of science that has produced an immense amount of value, and do not even consider that learning things will make you a worse developer, especially not computer science. Learning things will always make you better, especially as a software developer, and learning computer science is awesome.

But what Dijkstra said about Computer Science is not true about software development. Software development is about computers and people.

For me, a software developer is someone that produces tools that turn people into better people, more productive people, happy people; they turn theory into actual working programs, they are people who generate value. And only to some extent is it through writing code: some developers design the structure, some write documentation, some test applications, and so on. In a lot of these situations, a computer science education can be really helpful.

A small part of those skills is picked up at university studying computer science (or anything else), and if I could pick up the rest while studying, I would really love to. But I do not think that the current system of education is good at teaching all of the skills that are necessary for being a good software developer.

I am sorry if I offended anyone, if you think I am wrong, please leave a comment.

Note

  1. Computer Science is a horrible name, in Swedish we sometimes call it ‘Datalogi’ which is a less horrible name.

Testing numerical accuracy of browsers

According to the standard, only the arithmetic operations in Javascript need to be correctly rounded; the functions in Math do not have any accuracy requirements.

But out in the real world, browsers are a bit better than that. We have a feeling that the functions in Math are reasonably accurate, but if you need to be convinced (like me) then you should look at https://github.com/JensNockert/accuracy.js which fuzz tests most of the operations in Math that have a tendency to be inaccurate.

If you want to be even more convinced, generate more test cases using generate.rb.
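To give an idea of what such a test measures, here is a minimal sketch that counts how many representable doubles lie between a result and its expected value. It is not the code from accuracy.js: the function names are made up, and it assumes that the test cases are pairs of an input and a correctly rounded expected result generated elsewhere with higher precision.

```javascript
// Map a double to a BigInt so that neighbouring doubles map to
// neighbouring integers (NaN is not handled).
function orderedBits(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x);
  const bits = view.getBigInt64(0);
  // Negative doubles are sign-magnitude, so mirror them around zero.
  return bits < 0n ? -(bits & 0x7fffffffffffffffn) : bits;
}

// Distance between two doubles, in units in the last place.
function ulpDistance(a, b) {
  const d = orderedBits(a) - orderedBits(b);
  return d < 0n ? -d : d;
}

// Worst error of `fn` over a list of [input, expected] pairs.
function worstError(fn, cases) {
  let worst = 0n;
  for (const [input, expected] of cases) {
    const d = ulpDistance(fn(input), expected);
    if (d > worst) worst = d;
  }
  return worst;
}

console.log(worstError(Math.sqrt, [[4, 2], [9, 3], [0.25, 0.5]]));
```

A correctly rounded function has a worst error of zero on such a list; anything larger is a rounding difference.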

Ps. sin, cos and tan are missing, their periodicity makes them hard to fuzz using this technique.

Update

  1. I fuzzed on Windows as well, and Chrome on Windows does not provide sqrt with correct rounding, a bug has been filed. Firefox and Opera provide as much precision on Windows as on OS X.

Simple Audio

We have been discussing audio a lot at Official.fm Labs, and since we’re working with audio in different ways and have different views on what should be a first step, I am throwing out a proposal for them, and for you.

In addition to this one, there are at least two more proposals (which are a lot less sketchy and have partial implementations) for real-time audio on the web, https://dvcs.w3.org/hg/audio/raw-file/tip/streams/StreamProcessing.html from Mozilla’s Robert O’Callahan and https://dvcs.w3.org/hg/audio/raw-file/tip/webaudio/specification.html from Google’s Chris Rogers.

Both proposals are designs based on graphs, containing various nodes. But their proposals are many pages long and I don’t have the time or energy for that, so I will try to show you that audio in the browser can be described on a napkin (using both sides).

My idea is that audio is not really that complicated. You do not need a significant amount of routing or specialist code in the browser, because you can do all of that in Javascript once you have a few of the basic building blocks. Then, when there are performance issues or other limitations, you add the more advanced features.

A first implementation should provide a seed, not a forest. To do that, we need a way to read and/or write audio samples to a stream and receive events relating to that stream.

This is essentially what the Mozilla Audio Data API does, but with a more expressive API.

Note that I wrote this in a few hours, so nothing is fixed, especially not names. Also, as an OS X zealot, I have been inspired quite a bit by Core Audio and any similarities are probably not coincidence.

Usage Scenarios

  • Playing short sounds with low latency and accurate timing: Useful for games and similar applications where sounds react to user interaction.
  • Playing longer audio segments: Useful for music players, for example, or other streaming uses.
  • Capture audio from a microphone: Useful for all sorts of audio conferencing needs, for example something like Teamspeak or Skype in a browser.
  • Bypassing the lack of codec support in the HTML5 media elements: Useful for anyone with files in any format that is not supported in all browsers, like MP3, AAC, Vorbis, ALAC, FLAC, etc.

These are the usage scenarios that I am considering at first, and I think they cover 90% of the applications that need audio (right now). If we look at which traditional applications that include audio are most popular, we notice that teleconferencing, games and music will almost certainly come out on top for almost every user.

And while I think the last 10% are absolutely awesome as well, (an HTML5 digital audio workstation for example) I think that they can wait a bit until we have the basics before we go on to the really mind-blowing stuff.

Features

  1. Writing audio to devices.
  2. Reading audio from devices (adding support for audio or video elements should be trivial).
  3. Accurate timing, relatively low latency.
  4. Events from the audio subsystem.
  5. Easy to implement.
  6. Designed for future extensibility instead of providing a kitchen sink now.

Accurate timing and low latency are very important for certain kinds of games. If you need to wait 100ms from Mario hitting the coin until the sound starts playing, players will be confused and the experience will be bad. For multimedia, the audio needs to be in sync with whatever else is happening.

Events are required: all applications should be able to act correctly on hot-plugging events and so on. For example, if a user without a microphone uses an application where you can call landlines, then plugging in a microphone should directly enable audio input without a reload.

The same thing should be supported if for example a USB headset is connected while on a music site, the site should be able to react to this, and play through the headset instead.

It should also be relatively easy to implement, and in the future extend for more advanced functionality.

API Overview

  1. An AudioContext, referring to the whole state of the audio subsystem of the browser.
  2. An AudioStream interface, representing a single audio stream, which can support both input and output.
  3. An AudioStreamDescription interface, a description of the data flowing in the stream.
  4. An AudioBuffer interface, contains data for a set of channels.
  5. An AudioTimeStamp interface, contains data detailing a specific point in time, relative to the clock driving a specific stream.

The Audio Context

The audio context is essentially a singleton. You can create multiple contexts, but they are all just views of the same global state, and the streams available should be the same in all contexts.

    interface AudioContext {
        readonly attribute AudioStream[] streams;
        readonly attribute AudioStream defaultInputStream;
        readonly attribute AudioStream defaultOutputStream;
    }

Create a context via

var audio = new AudioContext()

Attributes

  • streams: An array of AudioStreams that are available for output or input.
  • defaultInputStream: An AudioStream representing the default input device, or null if there are no input devices.
  • defaultOutputStream: An AudioStream representing the default output device, or null if there are no output devices.

Events

  • NewStreamAvailable: Contains the new stream.
  • DefaultInputStreamChanged: Contains the new default input stream, and the old default input stream. It is triggered when for example, the user plugs in a microphone.
  • DefaultOutputStreamChanged: Contains the new default output stream, and the old default output stream. It is triggered when for example, the user plugs in a pair of headphones.

Discussion

To begin with, the only exposed streams would be the defaultInputStream and the defaultOutputStream. More advanced applications, like a web-based digital audio workstation, could require additional streams to support a large number of channels, or to provide, for example, a DJ with two different outputs.

The AudioContext should be accessible from a worker, allowing audio processing to be done in a separate context from the rest of the application to provide latency sensitive applications with a more stable environment, less affected by garbage collection pauses.

There are a lot more things that would be interesting to send events about from an audio perspective, but which possibly should not be in the audio context, the first thing that pops to mind is an event when a device returns from deep sleep, to allow applications to prevent the accidental output when resuming for example a laptop.

In addition, there should be a method that allows you to create streams from media elements, allowing the programmer to post-process the audio in for example a video.

Audio Stream Description

An audio stream description describes the current (or, in the future, the desired) state of an audio stream. It is designed to be able to hold a lot of information that is useless in the normal case of uncompressed linear PCM.

    interface AudioStreamDescription {
        readonly attribute DOMString identifier;
        readonly attribute double sampleRate;
        readonly attribute DOMString[] channels;
        readonly attribute short bitsPerChannel;

        /* Only for formats with a fixed frame-size */
        readonly attribute long bytesPerFrame;

        /* PCM specific attributes */
        readonly attribute DOMString sampleType;
        readonly attribute DOMString endian;
        readonly attribute DOMString aligned;
        readonly attribute boolean interleaved;
    }

Attributes

  • identifier: Always ‘Linear PCM’ for now
  • sampleRate: The sampling rate in samples per second
  • channels: The canonical name of each channel
  • bitsPerChannel: The number of useful bits in each sample
  • bytesPerFrame: The number of bytes in each frame, including padding

PCM specific attributes

  • sampleType: ‘float’, ‘signed-integer’, ‘unsigned-integer’
  • endian: ‘big’, ‘little’
  • aligned: ‘packed’, ‘high’, ‘low’
  • interleaved: boolean
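As an illustration, a description of 16 bit stereo PCM might look like the object below. Everything in this sketch is an assumption layered on top of the proposal: the channel names, the helper functions and the idea of mapping a description to a typed array are not part of it.

```javascript
// A hypothetical description of 16 bit stereo PCM, using the attribute names above.
const cdQuality = {
  identifier: "Linear PCM",
  sampleRate: 44100,
  channels: ["left", "right"], // made-up channel names
  bitsPerChannel: 16,
  bytesPerFrame: 4, // 2 channels, 2 bytes per sample
  sampleType: "signed-integer",
  endian: "little",
  aligned: "packed",
  interleaved: true,
};

// Pick a typed array matching a packed description, or null if none fits.
// Typed arrays use the byte order of the host, so `endian` is ignored here.
function typedArrayFor(description) {
  const table = {
    "float/32": Float32Array,
    "float/64": Float64Array,
    "signed-integer/8": Int8Array,
    "signed-integer/16": Int16Array,
    "signed-integer/32": Int32Array,
    "unsigned-integer/8": Uint8Array,
    "unsigned-integer/16": Uint16Array,
    "unsigned-integer/32": Uint32Array,
  };
  const key = description.sampleType + "/" + description.bitsPerChannel;
  return description.aligned === "packed" && key in table ? table[key] : null;
}

// Number of whole frames that fit in `byteLength` bytes.
function frameCount(description, byteLength) {
  return Math.floor(byteLength / description.bytesPerFrame);
}

console.log(typedArrayFor(cdQuality).name, frameCount(cdQuality, 4096));
```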

Notes

Different codecs need different attributes, and if an attribute does not make sense for a specific stream, then it should not include it in the description.

To begin with, there is only need for Linear PCM, since that is the format of almost all modern hardware. But in the future, more complex format descriptions would be required, especially to describe more complex formats that could be extracted from for example a media element.

Another thing that might need a change is the channel descriptions, some sort of location or something could be useful, so maybe they should be changed from strings to objects.

In some future, with bitstreaming of audio, or codecs exposed to Javascript, more complex features might be required from the stream description.

Audio Time Stamp

A time stamp object simply represents a point in time, relative to the clock for a specific direction in a stream.

    interface AudioTimeStamp {
        readonly attribute AudioStream stream;
        readonly attribute DOMString direction;
        readonly attribute Date hostTime;
        readonly attribute double sampleTime;
    }

Attributes

  • stream: the audio stream that this time stamp is relative to
  • direction: ‘output’ or ‘input’, the direction of the stream that the timestamp is relative to
  • hostTime: a date, the time when the first sample will be played, or when the first sample was captured
  • sampleTime: Number of samples passed since the stream started (as a double, since a 32-bit integer would only hold precision for about 3 hours at 192kHz; a double, on the other hand, is fine for roughly 1500 years).
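The numbers for sampleTime can be checked with a little arithmetic. This sketch assumes that the integer in question is a signed 32-bit integer (what the bitwise operators in Javascript produce), and that a double counts samples exactly up to 2^53.

```javascript
const sampleRate = 192000; // samples per second

// Hours of audio before a signed 32-bit sample counter overflows.
const int32Hours = Math.pow(2, 31) / sampleRate / 3600;

// Years of audio before a double can no longer count every sample exactly.
const doubleYears = Math.pow(2, 53) / sampleRate / (365.25 * 24 * 3600);

console.log(int32Hours.toFixed(1) + " hours", doubleYears.toFixed(0) + " years");
```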

Notes

Some additional time measurement systems could be included, like the relative time in seconds, etc. But that feels unnecessary for a first implementation.

Audio Buffer

The simplest object. You do not create these yourself; they are passed to you when you need them.

interface AudioBuffer {

readonly attribute ArrayBuffer data;

readonly attribute AudioTimeStamp timeStamp;

readonly attribute long channels;

}

Attributes

  • data: An ArrayBuffer containing data, or into which you need to write data.
  • timeStamp: The time at which the buffer ‘starts’, or null.
  • channels: The number of channels interleaved in this buffer.

Notes

This is the construction I am least sure about. Currently the channels could be inferred from the stream description, and the entire object could be replaced by a simple ArrayBuffer. I am not sure if there are any requirements for extensibility either; Google has some extra properties for these that, while useful, could also be inferred from the stream description.

Audio Streams

An audio stream represents a stream of audio data to/from a device or media element. The simplest way to get a stream is to get the default ones,

interface AudioStreams {

readonly attribute AudioStreamDescription input;

readonly attribute AudioStreamDescription output;

}

Attributes

There are two interesting attributes on the stream:

  • input: The input format.
  • output: The output format.

For most streams, only one of them is not null.

Events

  • inputDescriptionUpdated: contains the stream, the new AudioStreamDescription.
  • outputDescriptionUpdated: contains the stream, the new AudioStreamDescription.
  • processAudio: contains the stream, an array of AudioBuffers to read from, and an array of AudioBuffers to write to.
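To make the processAudio event a bit more concrete, here is a minimal sketch of a handler that fills the output buffers with a sine tone. Everything that is not in the lists above is an assumption: the event property names (`stream`, `inputBuffers`, `outputBuffers`), the 32-bit float interleaved sample layout, the sample rate, and the use of a plain EventTarget as a stand-in for a real stream.

```javascript
// Stand-in for a real output stream; a real implementation would provide it.
const stream = new EventTarget();

const sampleRate = 44100; // assumed; would come from the stream description
const frequency = 440;    // test tone in Hz
let framesWritten = 0;    // running position, so the phase stays continuous

stream.addEventListener('processAudio', (event) => {
  // Hypothetical property name: the buffers we are asked to fill.
  for (const buffer of event.outputBuffers) {
    const samples = new Float32Array(buffer.data); // a view, not a copy
    const frames = samples.length / buffer.channels;
    for (let frame = 0; frame < frames; frame++) {
      const time = (framesWritten + frame) / sampleRate;
      const value = Math.sin(2 * Math.PI * frequency * time);
      // Interleaved layout: write the same value to every channel.
      for (let channel = 0; channel < buffer.channels; channel++) {
        samples[frame * buffer.channels + channel] = value;
      }
    }
    framesWritten += frames;
  }
});

// Simulate the implementation asking for 256 stereo frames.
const event = new Event('processAudio');
event.stream = stream;
event.inputBuffers = [];
event.outputBuffers = [
  { data: new ArrayBuffer(256 * 2 * 4), timeStamp: null, channels: 2 }
];
stream.dispatchEvent(event);
```

In a real implementation the dispatching would of course be done by the browser or runtime, not by the application.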

Notes

There is a lot of room for improvement here; reconfiguring the stream is an obvious first step. Another is actually exposing the device that the stream belongs to, which could include a device name and so on, but there are possible privacy concerns with that.

It is possible that output and input streams should be split into two different interfaces, but both APIs are essentially equivalent for most purposes.

Garbage Collector

It can be important that the processAudio event is triggered just before garbage collection (depending on the length of the collection and the buffer states). This lets the application fill all buffers before the collection pause, minimizing the risk of underflow.

If the application allocates significant amounts of memory during this callback, the garbage collector could trigger anyway, but a careful programmer should be able to achieve pauseless playback in this manner.
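What ‘careful’ could mean in practice is roughly this: allocate everything up front and only write into existing arrays inside the callback. The two helper functions below are hypothetical and only illustrate the difference between an allocating and a non-allocating way of applying a gain.

```javascript
// Allocates a new Float32Array on every call, which feeds the garbage
// collector if it is done for every buffer.
function applyGainAllocating(input, gain) {
  return input.map((sample) => sample * gain);
}

// Writes into a caller-provided array instead; nothing is allocated per call.
function applyGainInPlace(input, output, gain) {
  for (let i = 0; i < input.length; i++) {
    output[i] = input[i] * gain;
  }
  return output;
}

const input = new Float32Array([0.5, -0.5, 0.25]);
const output = new Float32Array(input.length); // allocated once, up front

const fresh = applyGainAllocating(input, 0.5);       // a new array each time
const reused = applyGainInPlace(input, output, 0.5); // the same array each time
```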

Battery

Running on mobile devices is important, and the API can easily handle different devices and usage scenarios. When a low-powered device needs audio, it simply provides larger buffers to the application. This increases latency, but in return the device does not need to power up the processor as often.
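The trade-off between buffer size, latency and wake-ups is simple arithmetic. The sketch below uses a hypothetical helper and made-up buffer sizes; it only shows how the two quantities move in opposite directions.

```javascript
// Hypothetical helper: what a given buffer size costs in latency, and how
// often the application has to be woken up to fill it.
function bufferTradeoff(framesPerBuffer, sampleRate) {
  return {
    latencyMs: 1000 * framesPerBuffer / sampleRate,
    callbacksPerSecond: sampleRate / framesPerBuffer
  };
}

// Example values only; real devices would pick their own sizes.
const desktop = bufferTradeoff(256, 48000);
const mobile = bufferTradeoff(4096, 48000);

console.log(desktop.latencyMs.toFixed(1) + ' ms, ' + desktop.callbacksPerSecond + ' callbacks/s');
console.log(mobile.latencyMs.toFixed(1) + ' ms, ' + mobile.callbacksPerSecond.toFixed(1) + ' callbacks/s');
```

With these numbers the larger buffer adds about 80 ms of latency, but the processor is woken up 16 times less often.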

In addition, a low-powered device could provide streams with lower sampling rates, which could in some cases reduce the amount of processing required.

Security

I am not sure how you should request input device access from the user, and it is possibly out of scope for the API. But in a browser, the user needs to be asked for permission before any input device is activated, or a massive privacy breach is bound to happen.

In addition, if additional information about audio streams were provided (like the audio hardware name), it could be an information leak that, when combined with other information, could uniquely identify a user.

Inside of something like Node.js, no additional permissions compared to a regular native application would be required, so all of the API could be accessed by default.

Advantages

Compared to the Web Audio proposal

  • It is a lot simpler to understand and work with for basic Javascript audio playback.
  • Supports input.

Compared to the Media Stream proposal

  • It is a lot simpler to understand and work with for basic Javascript audio playback.
  • Allows Javascript to generate samples outside of a worker.

Disadvantages

  • It does not support a lot of built-in effects, for example.
  • It does not provide a way to interact with media elements (but this could easily be fixed with some extra work).
  • It is a lower-level API that might require libraries to provide a higher-level API, for example for constructing processing graphs or an API for playing short effects for games.

Notes

If you have any good ideas for names or otherwise, throw them in my direction on Twitter (@jensnockert), by email (jens@aventine.se) or here.

All events should be implemented with something like the DOM events and addEventListener, to make the system work as well together with normal web applications as possible.

Also, apart from the interaction with media elements, the API is not restricted to browsers. A Node.js implementation should be possible and would be useful for certain desktop and server applications.

TBD 2012, Malmö, Sweden

I spent the weekend at the TBD 2012 Hackathon at Djäkne Kaffebar in Malmö.

It was a great event full of great people. Even though it was the first event in the series, it was really well organized in almost every way: the food was great, the location was good, the wifi was not too bad, etc.

I really hope that it can become a regular event. From my perspective Malmö and Lund seem to be getting a lot hotter when it comes to software development, and these kinds of events are a really nice way to meet people and bring a lot of value to all developers in the region.

During the event I mostly worked on Youtify, a service for being able to consume music from the cloud in a more structured manner. I added an Official.fm backend for it, and fixed some small bugs.

Meeting the Youtify guys was a wonderful opportunity. Both the service and the team are great, and it felt great to be able to help them a bit. I am excited about being able to meet them again at Music Hack Day - Amsterdam.

During the time not spent coding I mostly talked to other people. One of the most interesting conversations of the weekend was with filmmaker Simon Klose, who is currently finishing a documentary about the Pirate Bay called The Pirate Bay: Away From Keyboard. At TBD, however, he was trying to revolutionize the way documentaries are consumed.

The proof of concept that his group produced, called the ‘Linkontrol’, was intended to improve the experience of the film by combining film and hypermedia in a really slick way, and it was quite awesome to say the least. It used popcorn.js and you can read a bit about it at his blog.

A mere textual description does not really do the concept or the demo justice, so if you meet any of them, do try to get a demo. It was quite exquisite, and of course took the audience award (they really deserved it).

But there were two small things that were a bit odd, making the atmosphere a bit tense.

First, during the presentations, one team called ‘Tunafish’ forced us to take down the stream and ‘sign’ a verbal non-disclosure agreement since their idea was ‘secret’. The other thing that annoyed me was that the big prize was money.

I do not really mind people working on commercial projects at a hackathon, I do not even mind if they just move their work there. But I kind of expect people who work on secret stuff to keep the secret details secret by simply not telling anyone about them, instead of holding a presentation about them and then forcing people not to tell others.

About the money, I wish they had just converted it to some kind of token or luxury that you usually do not buy. A nice bottle of wine, chocolate, medals, t-shirts, or maybe some sort of experience for the winning team. When you do something for money, it feels a lot like work, and I am not at a hackathon to work, I am there to have a good time.

In the end, it was a great time, and I will most likely be there if it is repeated, but if those two small things got fixed, then it would be most awesome.