Environment and Feature Detection
A warning: most of these extensions may currently have extreme security issues; they are prototypes, after all. Use a separate browser instance and profile for this series.
If you already have a working install of WebCL and River Trail, you can skip this part. Do not assume that you have them just because you have the latest version of your browser, because they are not available yet in any browser without a special build or extension.
Preparation
The first step is to install OpenCL drivers for all your devices.
If you are running OS X (Snow Leopard or Lion) then all OpenCL drivers are already installed and you’re good to go.
If you are running Linux or Windows, then you might need to install some OpenCL drivers. If your CPU is supported by both the AMD and Intel SDKs, then I recommend installing both.
If you have an nVidia or AMD graphics card, you probably already have OpenCL drivers installed for them (they are included in the graphics drivers), but you should make sure they are the latest version.
Now we’re on to a few semi-optional tools that you will probably want, but can do without if you prefer,
Git is my version control system of choice, and you will probably want to check out the repositories of examples on Github.
Coffeescript is a thin wrapper around Javascript that I happen to like; it has a slightly more Pythonesque syntax and is really nice. A lot of the support libraries and a few of the examples will be written in Coffeescript, and you might want to be able to recompile them.
Installing WebCL
Now we can start installing the WebCL prototypes.
If you are running Linux or Windows, then you need to first install a 32-bit version of Firefox 10 and make sure that you have installed Firebug into your new profile, then you can install the Nokia WebCL prototype.
If you are running OS X, then you need to install the Samsung WebCL prototype based on WebKit. It is a bit complicated since you need to compile it from scratch.
Just follow the included readme. Partway through the build you might hit some compilation errors, but they are easy to fix.
Installing River Trail
To allow River Trail code to be accelerated via OpenCL you need to install the River Trail extension.
On Windows or Linux it really does need the Intel OpenCL driver; the AMD OpenCL driver or the driver for your graphics card is not enough. But if your computer does not support the Intel OpenCL driver, then you can still execute River Trail code using a normal Javascript engine.
On OS X, the built-in OpenCL drivers are fine.
Detecting WebCL and River Trail
So, after installing all that, we need a simple way to check that it is working. The easiest way is to check out https://github.com/JensNockert/tools-for-the-next-generation with git (or download an archive of the repository from Github).
Under “01 - Feature Detection” there are two HTML files, webcl.html and rivertrail.html, that contain feature detection code. Try both to make sure that your setup works. If everything installed correctly, it will look something like this for River Trail,
[Screenshot: River Trail feature detection output]
And something like this for WebCL,
[Screenshot: WebCL feature detection output]
The code is not really that spectacular, but feel free to check out the source and see my horrible DOM manipulation code. (Hook me up with a pull request if you enjoy that kind of stuff)
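If you just want the gist of what such a check does, here is a minimal sketch. It is not the code from the repository, and the global names it looks for (`WebCL`, `WebCLComputeContext` and `ParallelArray`) are my assumptions about what the prototypes expose, so adjust them to whatever your build actually provides.

```javascript
// Minimal feature-detection sketch. The global names checked here are
// assumptions about what the prototypes expose, not a documented contract.
function detectFeatures(globalObject) {
  // Assumed WebCL entry points: `WebCL` (Nokia prototype for Firefox)
  // and `WebCLComputeContext` (Samsung prototype for WebKit).
  var hasWebCL = typeof globalObject.WebCL !== "undefined" ||
                 typeof globalObject.WebCLComputeContext !== "undefined";

  // Assumed River Trail entry point: a global `ParallelArray` constructor.
  var hasRiverTrail = typeof globalObject.ParallelArray === "function";

  return { webcl: hasWebCL, riverTrail: hasRiverTrail };
}
```

In a page you would call it as `detectFeatures(window)` and print the two flags; passing the global object in as an argument just makes the function easy to try outside a browser.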
About OpenCL
Make sure you have webcl.html open in a browser, and make a small note of the structure of the information.
The first level in the output (“Apple” in my screenshot) is the OpenCL platform name, and underneath it are all the OpenCL devices that belong to that platform. Note that a single piece of hardware can show up as a device under multiple platforms.
In OpenCL there are two domains where code can execute: on the host (in WebCL this is the browser), or on a device that is connected to the host. The code we run on the host we call the ‘Application’, and the code we run on devices we call ‘Kernels’.
And as we will learn in future lessons, calling kernels is different from how we call normal functions on the host. Another important thing to note about kernels is that they are not written in Javascript, but in a high-performance variant of C.
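To make the split between application and kernel a bit more concrete, here is a rough sketch of how a kernel typically lives inside the host code: as a plain string of OpenCL C. The kernel name `vector_add` and its arguments are made up for illustration, and I am deliberately not showing any WebCL calls here, since the prototypes differ in how they compile and launch kernels.

```javascript
// An OpenCL C kernel kept as a plain string on the host side.
// The kernel name `vector_add` and its arguments are illustrative only.
var kernelSource = [
  "__kernel void vector_add(__global const float* a,",
  "                         __global const float* b,",
  "                         __global float* result) {",
  "  // Each work-item handles one element, picked by its global id.",
  "  size_t i = get_global_id(0);",
  "  result[i] = a[i] + b[i];",
  "}"
].join("\n");
```

The host hands this string to the OpenCL compiler at run time, which is why the kernel language and the application language do not need to have anything in common.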
Summary
To summarize what you should install,
- Browser capable of WebCL
- Browser capable of accelerated River Trail
and make sure they work. The rest is mainly sugar that could help you reach that goal.
Notes
I will be using Firefox most of the time, but the example code that does not depend on a specific feature should be portable to most major platforms (Firefox, Chrome, Safari and Opera.)
Any specific feature dependencies will be noted in the corresponding article (and please point it out in the comments if it is not.)
Edits
- Updated for Firefox 10
Presenting Hydra
I just want to present Hydra, a small library I developed to enable applications to have a unified interface to the WebCL prototypes, even before the specification is ready.
I hope that within a year or so it will be useless, but for now it is pretty nifty, and it allows me to use the same example code for Windows, Linux and OS X in my “Tools for the next generation of Web Applications” series.
In addition to providing a unified interface, it also fixes some small bugs in the different prototypes.
It is available on Github at https://github.com/JensNockert/hydra under the Simplified BSD license, so you’re essentially allowed to use it for whatever you want.
If you find any bugs or have a feature request, add an issue on Github or send me a tweet (@jensnockert).
Tools for the next generation of Web Applications: Introduction
I do not know how the web will evolve in the future. I don’t think that anybody knows what the web of 2020 will look like, or what applications will be popular then.
But regardless of the direction the web evolves, we will undoubtedly see more and more complex client-side applications being developed. And a lot of the applications that were traditionally native applications will probably migrate to the browser within this time frame.
The migrating applications might include everything from games to large simulations to image and video editing, and everything in between. Your imagination is hopefully the only limit to what you will be able to achieve.
Betting on the web is one of the safest bets you can make: it is simply the platform that is most accessible to people today, and the platform people care about the most.
The goal of this series of articles is to give you some insight into some techniques, frameworks and tools that might be useful to build this new generation of applications, or to allow you to improve your current applications.
The tools that I am most interested in are tools that enable a new class of applications that we previously could not build in the browser without plugins, and we’ll primarily focus on the subset of these that are almost purely about increasing performance.
For example,
- Faster Javascript engines
- Typed Arrays
- SIMD Intrinsics
- Workers
- River Trail
- WebCL
- WebGL (for computing, not graphics)
But to understand these new tools of the web, we need to understand the native libraries and features that power them.
The faster Javascript engines of the future, typed arrays and any SIMD intrinsics are designed to accelerate each thread of your application. They do so by allowing us to utilize each processing core in a better way, and to program ‘closer to the metal’.
River Trail and WebCL utilize OpenCL to allow a piece of code to run on multiple processor cores, and in the case of WebCL, to run on graphics cards and even specialist OpenCL accelerators right from your browser.
If you haven’t heard of OpenCL, it is a framework for heterogeneous computing designed by Khronos (who also maintain the OpenGL standard), and it allows you to execute kernels written in a high-performance variant of C on just about any processor around. OpenCL supports everything from large clusters down to small embedded systems.
Single-threaded
Currently there are two engines that I enjoy coding for: the new Spidermonkey with type inference introduced in Firefox 9, which is really nice, and the V8 / Crankshaft engine used in Chrome. But the Javascript engines of the future have a lot more in store for us, and all of them are already picking up steam.
For example, Mozilla is currently working on Ionmonkey. Ionmonkey is a new whole-method JIT for Spidermonkey that hopefully brings some significant speedup for many types of code (and especially the type of code that we are interested in). It isn’t ready yet, but there are already some public benchmarks that let us follow how it develops.
Internet Explorer 9 introduced the new Chakra engine, which has some interesting features that will probably migrate to other engines. For example, it compiles code on a separate thread, allowing it to load code faster and start executing it quicker. And I am convinced that Internet Explorer 10 will introduce features that will allow Internet Explorer to defend its position as the most widely used browser.
One of these features that will be included in the next version of Internet Explorer (but is already supported in all other major browsers) is one of the most significant API developments in high-performance Javascript during the last few years: typed arrays. Typed arrays behave in most respects like regular Javascript arrays, but they have a fixed type and length. This on one hand gives Javascript programmers a nice way to interact with binary data and on the other hand gives the Javascript engines a lot more opportunities for optimization.
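A small sketch of what ‘fixed type and length’ means in practice, using only the standard typed array constructors. The variable names are mine.

```javascript
// A typed array has a fixed element type and a fixed length.
var samples = new Float32Array(4);   // four 32-bit floats, initialised to 0

samples[0] = 1.5;        // stored as-is, 1.5 is exactly representable
samples[1] = 0.1;        // rounded to the nearest 32-bit float
samples[10] = 42;        // out of range: ignored, the length stays 4

var bytes = new Uint8Array(2);
bytes[0] = 255;
bytes[1] = 256;          // wraps around modulo 256, so 0 is stored
```

Because the engine knows up front that every element is a 32-bit float (or a byte), it can store the data compactly and skip the type checks a regular array needs.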
The Google Chrome team also introduced NaCl (Native Client) last year, which is a reasonably interesting proposition from a performance standpoint, since it allows you to replace some of your Javascript with native code. It seems like you should be able to implement an OpenCL to NaCl compiler, which could be very interesting. Unfortunately, since it uses binaries instead of source code, it is very hard to inspect the scripts, unlike in Javascript.
The two other browser vendors, Webkit (Apple and friends) and Opera, recently shipped new browser engines that support all the engine-level features we currently expect, and they are very likely to stay competitive in the future.
But there are a lot of other features that are in the planning stage. A pet feature of mine, for example, is SIMD intrinsics. SIMD (Single-Instruction Multiple-Data) is a method for improving computational throughput in modern processors by performing the same operation in parallel on multiple pieces of data.
These SIMD intrinsics are very simple functions that essentially map down to a few simple SIMD assembly instructions. The Javascript engine would be aware of how these functions work, and generate special optimizations for them.
This is mainly an optimization, but it would also allow us to write more easily readable code when manipulating ‘strange types’ in Javascript, for example, long (64-bit) integers.
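To show why this matters for readability, here is a sketch of what adding two unsigned 64-bit integers looks like in plain Javascript today. The `{ hi, lo }` pair of unsigned 32-bit halves is just a representation I picked for the example, not an existing API; an intrinsic would replace the whole function with a single call.

```javascript
// Emulated unsigned 64-bit addition. A value is a pair { hi, lo } of
// unsigned 32-bit halves; this representation is an assumption made
// for the sketch, not an existing API.
function add64(a, b) {
  var lo = (a.lo + b.lo) >>> 0;            // low half, wrapped to 32 bits
  var carry = lo < a.lo ? 1 : 0;           // a wrapped low half means a carry
  var hi = (a.hi + b.hi + carry) >>> 0;    // high half, wraps modulo 2^64
  return { hi: hi, lo: lo };
}
```

And that is only addition; multiplication and division get considerably worse.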
While there are currently not even any proposals of how these SIMD intrinsics should behave, there is still a high probability that we will see something along those lines in a future revision of the Javascript language.
This leads us to more complex parallelization features that introduce more than one thread of execution.
Multi-threaded
Workers are currently the only way of executing Javascript in parallel that is widely supported in current browsers, but they are not really designed for the task. They are designed to allow for background tasks, but are not really suitable for computation on their own.
But make sure that you do not forget about them, because they are a good fallback and can be a force multiplier when combined with more advanced features.
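As a rough sketch of the fallback idea: before any workers get involved you need to cut the problem into pieces. The helper below is hypothetical (the name `splitRange` is mine), and the worker script name in the comment is a placeholder.

```javascript
// Split the index range [0, length) into `parts` contiguous chunks of
// near-equal size, one per worker. `splitRange` is a hypothetical helper.
function splitRange(length, parts) {
  var chunks = [];
  var base = Math.floor(length / parts);
  var extra = length % parts;     // the first `extra` chunks get one more item
  var start = 0;
  for (var i = 0; i < parts; i++) {
    var size = base + (i < extra ? 1 : 0);
    chunks.push({ start: start, end: start + size });
    start += size;
  }
  return chunks;
}

// In a browser, each chunk could then be handed to its own worker, e.g.
//   var worker = new Worker("compute.js");   // "compute.js" is a placeholder
//   worker.postMessage(chunk);
```

The messaging between the page and its workers is exactly the overhead that makes workers awkward for fine-grained computation, so the chunks need to be reasonably large to pay off.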
River Trail utilizes OpenCL to execute Javascript kernels on a multi-core CPU using a friendly API. I am quite convinced that it will be a popular choice in the future.
The most compelling feature of River Trail is that it is tightly integrated with the browser, and therefore allows for a lot more optimization than WebCL (or OpenCL) allows. Don’t be surprised if a future River Trail implementation outpaces WebCL significantly on short kernels where OpenCL imposes too high a communication overhead.
Another interesting thing is that River Trail can be combined with a lot of other performance increasing features in Javascript, for example SIMD intrinsics, which (like the SIMD features in OpenCL) could significantly increase performance and readability for certain kernels.
WebCL is essentially the big brother of River Trail: it exposes the full OpenCL API to the web programmer and allows you to use unmodified OpenCL kernels in your application. It is essentially the more flexible version of River Trail, and is designed to allow you to use any OpenCL accelerator in the system, including graphics processors and so on.
WebCL is also the API that we will be using the most throughout this series, mainly since the compilers are mature, and it is also the language and framework that I am most familiar with.
But River Trail has some interesting opportunities that we won’t see in WebCL, since it could be more tightly integrated into the browser at a future point. For example, an implementation of River Trail could significantly reduce the communication overhead required to run kernels on the CPU, which is currently quite significant in OpenCL.
WebCL currently has the advantage that the infrastructure is a bit more mature on the kernel side; on the Javascript side both technologies are noticeably not ready.
WebGL on the other hand has the advantage that it is reasonably mature, and allows execution on just about every graphics card available.
But I generally wouldn’t recommend using it for computation unless you have very specific requirements, since even the simplest tasks can easily turn extremely complex unless you’re very good at GLSL and WebGL. It is simply not designed for computation, only graphics.
Conclusion
There are many tools and frameworks already available in a pre-release form for us to play with, and the best way to get used to them is to actually use them. Just be aware of their changing and non-final nature, and use this time to your advantage: most developers won’t start using these tools until they are almost ready, and by then it is too late to influence their growth.
In addition to tools that ‘merely’ grant us faster performance, we have a lot of tools that simply allow us to do a lot of things that we could not do before, but those are interesting enough to get their own introductions when we meet them later in the series.
The next episode will be a shorter one, and contain instructions on how to set up our development environment on Windows or Linux. For example, installing different OpenCL drivers, WebCL plugins and River Trail.
Notes
There are currently at least three different implementations of WebCL,
Accelerating Javascript via SIMD
Mozilla has a bug relating to the lack of SIMD instructions in Javascript. What they want to do is essentially add an assembly language to the web, to allow for a performance increase on computationally intensive code.
This technology competes directly with WebCL and NaCl, and is in many ways a very good one: it would provide many of the advantages of NaCl or WebCL with a different set of disadvantages.
But there are a few problems. It could in theory allow you to build scripts that only run on certain browsers on certain CPUs. Of course, WebGL and typed arrays already give Javascript most of this disadvantage, since typed arrays expose the native endianness of the hardware and WebGL may require specific graphics hardware.
There are a few ways to allow the Javascript programmer high performance primitives for building applications, some better than others.
Raw Assembly - Very High Performance
You could simply let the programmer write a piece of code in assembly as a string, or as a separate script linked to your Javascript, in the same way that you do it in C with many common compilers. This gives programmers full flexibility when writing code and access to the lowest level possible.
Advantages
- Very powerful (all CPU-specific features exposed)
- Very fast
- Much code already available
- No one expects it to be portable
Disadvantages
- How do we know the code is safe?
- A language within a language, with very different semantics
- Need to keep track of register usage and restrict memory accesses
And while it would allow the programmer to interact with the CPU at the lowest level, that is also a significant problem. Ensuring that native code is safe is hard, really hard, which would get any inline assembly proposal shot down pretty quickly. Another thing that could shoot it down is that it is non-portable between the two major architectures, x86 and x86-64.
This would essentially be the same thing as NaCl.
Standard Intrinsics - High Performance
Let Javascript programmers use intrinsics identical or similar to those exposed by C compilers. These are also architecture specific, or specific to a family of CPUs (for example x86 and x86-64 sharing SSE intrinsics, or PowerPC and Power sharing Altivec/VMX), and generally map to a single instruction or a few instructions.
They are often designed to allow the compiler to do register allocation and provide a friendlier API than the raw instructions. On Intel for example, they are three-operand instead of two-operand.
Advantages
- Very powerful (most CPU-specific features exposed)
- Very fast (with a good compiler and optimizer)
- Much code already available
- No one expects it to be portable
Disadvantages
- May need to add data types (__m64… for Intel, float32x4_t… for ARM NEON)
- Possibly hard to restrict memory accesses
- Modifies the Javascript runtime
An API like this would allow programmers to take existing code, rip out the surrounding C and replace it with Javascript, while the kernel written using intrinsics would run unmodified. That allows a quick speedup in many common algorithms without writing a lot of code.
And the programmer will never expect this code to be portable between processors or browsers, which allows us to remove support in the future, or change the implementation. But future browsers or uncommon processors might simply not work, or run unaccelerated Javascript instead, making the application unnecessarily slow.
On the other hand, with a good compiler and optimizer, it would allow Javascript developers to write application kernels that execute as fast as the hardware allows, which could make the approach quite interesting.
Javascript Specific Intrinsics - High Performance
Javascript-specific intrinsics would still be CPU specific, and expose all or most functionality of the CPU, but with intrinsics optimized for security and use with Javascript. I guess they would only operate on memory (Typed Arrays), but they could also introduce special vector types.
Advantages
- Powerful (all features we need could be exposed)
- Very fast (with a good compiler and optimizer)
- Nicer syntax
- No one expects it to be portable
- Could be designed to allow for Javascript fallbacks
Disadvantages
- Hard to implement 64-bit integer support
- Possibly hard to restrict memory accesses
- Modifies the Javascript runtime
Such an API would share most of the advantages and disadvantages of the standard intrinsics, but would trade the ability to use existing code for a nicer API or better performance, depending on the implementation.
Generic Intrinsics - Low Performance
Providing processor-independent intrinsics that can optionally be accelerated with a SIMD unit would be the last type of implementation based on intrinsic functions, and it is quite an interesting one.
Such an API could in theory be supported on all processors, and accelerate code even on processors without SIMD support (like the nVidia Tegra 2) or on architectures that people won’t write platform specific code for (like MIPS).
Advantages
- Portable
- Nice syntax
- Could still be much faster than ‘pure’ Javascript
- Would be a nice way to add 64-bit integers
Disadvantages
- Modifies the Javascript runtime
- Needs fallbacks on unsupported architectures
- Does not expose all capabilities
- Exposes non-accelerated capabilities
- Possibly very complicated
From a pure performance standpoint it is probably the worst design, but it is also the design that would face the least resistance, since it is in theory portable. It would be useful even if it only covered a subset of the common instructions.
Floating-point operations would probably be the most useful subset, since floats are probably the most common data type in most WebGL applications, games and simulations. They are also supported pretty evenly in all SIMD implementations we care about, which gives us a pretty good starting point on what would be important.
Even without trying to generate SIMD code, such an API could probably allow for pretty good acceleration, making it low-hanging fruit for browser developers, and for web developers. And combined with a good optimizing JIT it could probably bring performance that is quite close to writing raw assembly language.
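To give an idea of what a generic intrinsic could look like, here is a sketch. Keep in mind that nothing like this exists yet: the name `float32x4Add` and its signature are entirely hypothetical, and the body is simply the plain Javascript fallback that an engine without SIMD support would run.

```javascript
// Hypothetical generic intrinsic: add four floats at once. An engine that
// knows this function could emit one SIMD instruction (for example SSE
// ADDPS or NEON VADD.F32); this body is the plain Javascript fallback.
function float32x4Add(a, aOffset, b, bOffset, out, outOffset) {
  out[outOffset]     = a[aOffset]     + b[bOffset];
  out[outOffset + 1] = a[aOffset + 1] + b[bOffset + 1];
  out[outOffset + 2] = a[aOffset + 2] + b[bOffset + 2];
  out[outOffset + 3] = a[aOffset + 3] + b[bOffset + 3];
}

var a = new Float32Array([1, 2, 3, 4]);
var b = new Float32Array([10, 20, 30, 40]);
var sum = new Float32Array(4);
float32x4Add(a, 0, b, 0, sum, 0);   // sum now holds 11, 22, 33, 44
```

Operating on typed arrays with explicit offsets keeps the memory accesses easy for the engine to bounds-check, which is one way around the ‘hard to restrict memory accesses’ problem of the CPU-specific designs.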
Generic Vector/Matrix API - Very Low Flexibility
A Javascript vector/matrix API could expose most of the required floating point functionality that we would get with SIMD, except that it would be much slower for small vectors, making it a lot less useful for WebGL/Games etc.
Just exposing something like BLAS has some advantages, since programmers are used to it, and it has very high-speed implementations on every platform with any kind of support for floating-point. Also, if the system has coprocessors with significant floating point capability (like a GPU), the chance that they implement a fast BLAS is pretty high, which could be important for performance on future embedded platforms like phones or tablets.
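For readers who have not used BLAS, this is roughly the granularity such an API works at. The sketch below is a plain Javascript version of the classic AXPY operation; the function is mine, only the name and the meaning (y = alpha * x + y) follow the BLAS convention.

```javascript
// BLAS-style AXPY: y = alpha * x + y, computed in place on typed arrays.
// The name follows the BLAS convention; the function itself is a sketch.
function axpy(alpha, x, y) {
  if (x.length !== y.length) {
    throw new Error("axpy: x and y must have the same length");
  }
  for (var i = 0; i < x.length; i++) {
    y[i] = alpha * x[i] + y[i];
  }
  return y;
}
```

One call processes a whole vector, which is great for vectors with thousands of elements, but for a four-element vector the call overhead easily outweighs the work.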
Advantages
- Good for scientific computing
- Easy to use
- Very optimized
- Safe
- Portable
Disadvantages
- No media instruction support
- Slow for small vectors/matrices
While BLAS would provide a high-performance API, I have a hard time seeing that it would be useful in the domains that Javascript is used in, web applications. There are probably few Javascript applications that solve large systems of linear equations or do many large matrix-matrix multiplications, so while I really like BLAS, I think it will only add complexity to the browsers for no significant gain right now.
It also seems to just be a less flexible version of WebCL, which isn’t really what a SIMD API is supposed to compete with.
Conclusion
So there is definitely a point in doing a general API for graphics and floating-point operations; it would also be useful for audio processing and so on. But for integer and media instructions, where there is a significant spread in implementations, I cannot really see how a generic API is going to work.
LLVM provides general instructions and types for SIMD, with target-specific instructions in addition to that. So some sort of mix is definitely possible, where there is a basic subset that is enabled on all CPUs, and a larger set of media instructions and instructions that are hard to emulate quickly that is only available on specific targets.
For Aurora.js (my multimedia framework), I don’t think most of the floating-point operations will be of much use, and most of a generic SIMD API would probably be concerned with floats, at least to begin with. But I am still interested in working on a possible proposal that would provide a few basic primitives that could really help some of the audio processing, and probably help graphics-related tasks a lot.
But the most important thing is probably that it could accelerate graphics-intensive applications quite a bit, for example when doing matrix and vector math on fixed-length vectors. I assume that is pretty common in most WebGL applications and most games, but it could probably be useful even in ‘normal’ web applications.
A wishlist for ES6
First, I am not a web designer, and I realize that my needs are not the same as everyone else’s. But I am going to argue that there are only a few features that need to be added to Javascript in the next version (ES6 or Harmony).
- Typed Arrays
- An improved numerical library
- Continuations
And you will probably have your own list, but if you don’t agree with at least the first two, then you are probably wrong. They are really important, both in the browser and outside. The third is more of a personal preference, but I think it is probably the one change that would improve the language the most.
Typed Arrays
Khronos’ typed arrays are essentially just an optimization: they work just like arrays, but you can only store a specific type, and their size is specified when the object is created.
This gives a bit of extra performance for many kinds of applications. And in the future we will probably write a lot more applications that manipulate binary data in Javascript. In addition, the interface is pretty good and provides sugar for different sorts of type-conversion. It does not introduce any new syntax and most browsers support them to a certain degree already, so backwards compatibility isn’t a problem.
The only thing missing in the current specification is a way to determine the native endianness of the machine the code is executing on, and you therefore need a small method to do it for you.
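Such a method can be quite short. Here is one possible sketch, using only the standard `ArrayBuffer`, `Uint16Array` and `Uint8Array` constructors; the function name is my own.

```javascript
// Detect the native byte order by writing a 16-bit value and looking at
// which byte ends up first in memory. `isLittleEndian` is a made-up name.
function isLittleEndian() {
  var buffer = new ArrayBuffer(2);
  new Uint16Array(buffer)[0] = 0x0102;
  // Little-endian machines store the low byte (0x02) first.
  return new Uint8Array(buffer)[0] === 0x02;
}
```

It works because both views share the same underlying buffer, so the byte view shows the 16-bit value exactly as the hardware laid it out.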
Float64Arrays are not implemented in Safari, and DataViews are not implemented in Safari, Firefox or Opera. Typed Arrays are not supported at all in IE before version 10.
Numerical Library
The numerical library in Javascript (the Math module) is not only exceptionally sparse; the specification also does not define how accurate its functions have to be. Math.sin(x) could return 4 and still follow the specification.
This makes coding to the specification impossible, since only the name of the function and a few edge cases are defined. Four is a valid approximation of the value of sine; sin(x) = cos(x) - 1 is another approximation that is legal but not very good, as is sin(x) = 1 - e^x, which quickly becomes an extremely bad approximation.
The proposal for ES6 isn’t any better, since it does not include specifications either. The correct solution to the problem is probably that all operations should be correctly rounded (as in the IEEE 754-2008 specification), but OpenCL (see Chapter 7) provides relative error bounds, and that would be an acceptable solution to the specification problem as well.
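To illustrate what a relative error bound buys you, here is a small sketch of the kind of check a specification with bounds would make possible. The helper name and the bound of 1e-15 are my own picks for the example; they are not taken from any specification.

```javascript
// Check a result against a reference value using a relative error bound.
// `withinRelativeError` and the chosen bound are illustrative assumptions.
function withinRelativeError(actual, expected, bound) {
  if (expected === 0) {
    return Math.abs(actual) <= bound;   // fall back to an absolute bound at zero
  }
  return Math.abs(actual - expected) / Math.abs(expected) <= bound;
}

// sin(pi / 6) is exactly 0.5 mathematically.
var ok = withinRelativeError(Math.sin(Math.PI / 6), 0.5, 1e-15);
var bad = withinRelativeError(4, 0.5, 1e-15);   // false: 4 is nowhere near 0.5
```

With a bound written into the standard, an engine returning 4 would fail such a check and be non-conforming; today there is nothing to test against.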
There are also some significant functions in IEEE 754-2008 that are not included in the current ES6 proposal; the most significant is the fused multiply-add, which is quite slow to emulate in software.
OpenCL also provides a lot of useful functions that may be interesting to implement, especially for games and other applications that require geometry and colour operations. And if the new correctly rounded functions are too slow for embedded devices, or for specific applications, then adding native, fast, half (or single) versions of some of the functions could make sense. (see the OpenCL specification again)
This change would not really affect any applications, if anything, it would make the output of a Javascript application using the Math module a lot more predictable. It also does not introduce any new syntax, and should be easy to add.
Continuations
Adding continuations is more of a personal thing. I find them extremely useful in other languages, and I would find them extremely useful for handling the whole callback-inception that you usually get stuck in while coding Javascript.
It is a more controversial extension to the language since it actually involves syntax, but on the other hand, a reasonable Python-style implementation is already available in Spidermonkey (under the name generators), so any syntax changes are arguably already there.
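Here is a sketch of how generators can flatten nested callbacks. Note that it uses the `function*` syntax that was later standardized in ES2015, which differs from the older generator syntax in Spidermonkey, and that `run` and `doubleLater` are hypothetical helpers written for this example.

```javascript
// Drive a generator that yields "thunks": functions taking a single
// node-style callback (error, value). `run` is a hypothetical helper.
function run(generatorFunction, done) {
  var iterator = generatorFunction();

  function step(error, value) {
    var result;
    try {
      // Resume the generator, injecting either the value or the error.
      result = error ? iterator.throw(error) : iterator.next(value);
    } catch (thrown) {
      return done(thrown);
    }
    if (result.done) {
      return done(null, result.value);
    }
    result.value(step);   // wait for the next asynchronous result
  }

  step();
}

// Stand-in for an asynchronous operation; it calls back immediately
// to keep the sketch self-contained.
function doubleLater(x) {
  return function (callback) { callback(null, x * 2); };
}

var output;
run(function* () {
  var a = yield doubleLater(1);   // reads like sequential code
  var b = yield doubleLater(a);
  return a + b;
}, function (error, value) { output = value; });
```

The body of the generator reads top to bottom, while `run` takes care of resuming it whenever a callback fires, so the nesting disappears from the application code.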
Summary
I don’t really think that we need a lot of the syntax that people are proposing. While some of it is reasonable (blocks, for example), I do not feel that we really need it.
The same goes for classes and so on. We should be relatively restrictive about adding syntax, since it restricts our ability to extend the syntax in the future. On the other hand, we can be relatively liberal when adding library functions, which can be deprecated in later versions of the standard.