Cheerp – A C/C++ compiler for web applications

pierre-renaux · on Aug 12, 2014

If anyone from Cheerp reads this: I use emscripten (https://github.com/kripken/emscripten) at the moment. Is Cheerp any better?

Specifically, does it compile faster, generate a smaller final .js output, or produce "faster" code?

flohofwoe · on Aug 12, 2014

I'm not from Cheerp, but it looks like the main difference is this (taken from their front page):

Dynamic memory management. C++ objects are translated directly to JS objects, without the proxy of an emulated, flat memory space. Allow your applications to exploit the JavaScript VM garbage collector and co-exist with fair, on-demand memory allocation.

I don't think that's a good thing though, since the reason why emscripten compiled code is fast is because it has a flat memory space and doesn't create expensive JS objects.

Only advantage IMHO is that you don't need to allocate a big chunk of memory ahead of time (although I think emscripten still allows a growable heap as an option, but with a performance penalty).

dochtman · on Aug 12, 2014

The Emscripten author wrote a blog post when Cheerp (then called duetto) was first announced:

http://mozakai.blogspot.co.uk/2013/11/c-to-javascript-emscri...

azakai · on Aug 12, 2014

As far as I know that comparison is still up to date, but if the Cheerp devs are here maybe they can comment.

I may do a followup post now, since Cheerp is at 1.0. Seems like a good time to do some benchmarking.

kolodny · on Aug 12, 2014

I just ran the HelloClient.cpp from their example page [1] and the compiled file was 1.7 MB.

[1] http://www.leaningtech.com/cheerp/examples/

mx267 · on Aug 12, 2014

Even though js files are usually delivered compressed with gzip (which is very effective on unminified js files), we are aware of the issue and we are working on reducing output size.

An integrated code minimizer is in the pipeline for release 1.1. For the medium term we are working on an optimizer that cuts away unneeded libc++ initialization, which currently prevents the dead code eliminator from pruning unused libc++ code (a large part of that 1.7 MB).

So, expect big improvements in this area!

Disclaimer: I am one of the founders of Leaning Technologies.

neuromancer2701 · on Aug 12, 2014

From my understanding it also allows you to write your front and back end in one application and language. Then it compiles the front end to JS and the backend to C++ executable. I think this is where a large advantage can be seen. edit: it automates the connection between the front and backends tying the two together.

adamnemecek · on Aug 12, 2014

Reminds me of Wt http://www.webtoolkit.eu/wt

jbarrow · on Aug 12, 2014

This is somewhat off-topic, but they turned user-scalability off, making it impossible to zoom out of the pre-zoomed-in mobile site.

Other than that, it's an interesting proposition and I look forward to trying it out.

mx267 · on Aug 12, 2014

Thanks for the report, we have fixed that issue today.

ksikka · on Aug 12, 2014

"Cheerp compiles to native binary code backend, JavaScript frontend, and automatically generates RPC communication code"

The RPC feature sounds like it could be immensely useful for day-to-day web development. But the C/C++ is a deal-breaker for most web developers. Are there any projects that only do the RPC autogeneration stuff?

Also can frontend/backend code be shared (if there are no dependencies to the browser/unix system)?

mx267 · on Aug 12, 2014

Yes, code without client- or server-specific dependencies can be shared.

Kip9000 · on Aug 12, 2014

I wonder what advantages choosing C++ to write web apps would bring. At the end of the day it's all JavaScript so performance is rather moot point.

humanrebar · on Aug 12, 2014

> At the end of the day it's all JavaScript so performance is rather moot point

That's probably true, but C++ does provide very powerful compile-time evaluation mechanisms like templates, constexpr (compile-time evaluated constants and functions), and template metaprogramming. That being said, compile-time C++ can look very gross since most of the features are accidental.

However, there are a lot of huge performance benefits to moving computation to compile time: selecting the most optimal algorithm for a type, guaranteeing you don't mismatch your units, precompiling search (glob, regex) engines, eliminating dead code, and so on.

So I would be shocked if at least some javascript applications couldn't be much faster and more correct by writing in C++ first and compiling to js. That being said, it's probably not in scope for a minimum-viable product, so improving already business-critical parts of the code base might be the best place to start.
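
A minimal sketch of two of the compile-time techniques listed above (compile-time evaluated functions and unit checking), assuming a C++11 or later compiler. The names `factorial`, `Quantity`, `Meters`, `Seconds` and `total_distance` are made up for illustration; none of this is specific to Cheerp or emscripten.

```cpp
#include <cassert>

// Compile-time evaluated function: the compiler can fold the result
// into a constant, so no work is left for run time.
constexpr unsigned long long factorial(unsigned n) {
    return n <= 1 ? 1ULL : n * factorial(n - 1);
}
static_assert(factorial(5) == 120, "checked by the compiler, not at run time");

// Empty tag types give each unit its own distinct C++ type.
struct MetersTag {};
struct SecondsTag {};

template <typename Tag>
struct Quantity {
    double value;
};

// Addition is only defined for two quantities carrying the same tag.
template <typename Tag>
constexpr Quantity<Tag> operator+(Quantity<Tag> a, Quantity<Tag> b) {
    return Quantity<Tag>{a.value + b.value};
}

using Meters = Quantity<MetersTag>;
using Seconds = Quantity<SecondsTag>;

constexpr Meters total_distance(Meters a, Meters b) { return a + b; }

// Meters{1.0} + Seconds{2.0} does not compile: template argument
// deduction finds no single Tag that matches both operands.
```

The unit check costs nothing in the generated code, because the tag types are empty and only exist for the type checker.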

pkolaczk · on Aug 12, 2014

Metaprogramming and type-safety facilities of C++ are very limited compared to other languages like Haskell or Scala. And for the latter there is a JS compiler as well.

flohofwoe · on Aug 12, 2014

For me it's portability first, decent performance second and third the 'seamless distribution model' of the web (no installation, no app shops, no code signing, just a URL).

I wouldn't write a web-only app in C++, but if the same code needs to run on other platforms as well (e.g. native mobile, desktop or gaming consoles) then C/C++ is basically the only choice. Also, at least in emscripten, compiled code is faster than hand-written JS for several reasons (asm.js, LLVM optimizer passes, no gc), so it makes sense to write stuff like physics/pathfinding/AI engines in C/C++ and compile it to JS.

Plus, it's fun to write a desktop OpenGL demo, flip a build system switch and run it in the browser ;)

swah · on Aug 12, 2014

> it's all JavaScript so performance is rather moot point

If that was the case, http://asmjs.org/ wouldn't be a thing.

oso2k · on Aug 12, 2014

According to the Chrome guys...it isn't a thing [0][1], and I tend to agree. They've kept up, and often exceeded FF, IMO, for real code by making a better JIT compiler. They still haven't implemented asm.js and yet their "naive" approach to asm.js style code is on par.

[0] https://code.google.com/p/v8/issues/detail?id=2599#c53

[1] http://mrale.ph/blog/2013/03/28/why-asmjs-bothers-me.html

gcp · on Aug 12, 2014

The links you pasted show Firefox currently being 3x faster than Chrome when running asm.js code, so I'm not sure what your point is. V8 is good at generating fast code without the hints, but obviously the extra restrictions help the optimizer. Even Google admits this when they try to promote Dart.

asm.js does little for you if you write "real JS", but this thread is about cross-compiling other languages, which clearly has real world use.

swah · on Aug 13, 2014

I'm not a compiler expert but I also have the feeling that a new bytecode would be better.

But if I recall correctly, there are very good reasons why we don't have that in our browsers; security concerns maybe? Anyone?

Or http://hn.algolia.com, here I go again!

(What a great post, thanks for linking. Even though I follow mraleph on twitter, I had never opened his website. Probably because it all looks like scary compiler stuff.)

aboodman · on Aug 12, 2014

Another major benefit (of emscripten too) is that if you have significant non-presentation client-side logic, it can be shared between iOS, Android, and the Web.

For example, take something like Dropbox's Carousel architecture, which shares the client-side cache, data model, and sync client between iOS and Android via a C++ library. Something like Cheerp or Emscripten allows you to also use this same library on the web.

Touche · on Aug 12, 2014

That's actually not true. Emscripten compiles to code that is much faster than hand-written JavaScript. In this case they are not using Emscripten or a similar technique but are compiling to regular old JavaScript, so performance is probably the same. Rather, this might be for people who just like C++, like what GWT is for Java devs.

Kip9000 · on Aug 12, 2014

You misunderstood. People mostly write in C++ when performance is critical. Why? C++ offers deterministic memory management, specialization to the CPU instruction set, better use of CPU caches, etc. Those performance benefits would not translate when you cross-compile to something like JavaScript. I get the point about converting legacy desktop apps to web apps, but how would it handle the architectural differences (desktop apps don't scale)?

humanrebar · on Aug 12, 2014

> People mostly write with C++ when performance is critical.

Performance is a big reason people use C++, but there are many others. For example, people also write C++ when correctness is critical (bank software, safety-critical systems, etc.).

pkolaczk · on Aug 12, 2014

Writing for correctness in a language with no memory-safety, weak type system and, till not very long ago, no standardized memory-model, and UB in every other paragraph of the specs? Good joke. If they are really doing this, they have no idea what they are doing. There are plenty of languages better for safety-critical systems than C++; even widely hated, boring Java is much better.

humanrebar · on Aug 12, 2014

It's a bit off topic, but I'm being descriptive (based on professional experience) and not normative when I say C++ (and C and Ada) are used for safety-critical software. Javascript and functional languages certainly aren't.

The vagaries of the standard aren't issues since safety critical software is validated on a per-platform and per-compiler basis. Memory safety is certainly a concern during the development process but more importantly (!) GC languages (like javascript and Scala) do not have provably (for some value of provably) deterministic execution times. Determinism is also a problem for lazy languages (like Haskell).

I say that to say the zero-cost abstractions of C++ is a huge benefit when writing safety-critical software. The level of control over emitted binaries is essential and fairly rare in programming languages.

> If they are really doing this, they have no idea what they are doing.

Well, I can't speak for entire industries, but that's a pretty sweeping generalization. You might be surprised what the challenges are when writing safety-critical systems software. That being said, I'm sure large portions of the industry are ripe for innovation.

pkolaczk · on Aug 13, 2014

> GC languages (like javascript and Scala) do not have provably (for some value of provably) deterministic execution times

This is not unique to GC languages; it applies to any language with dynamic memory management. C++ new/delete and the STL abstractions built on top of them are not provably deterministic either. And if you program without ever touching dynamic memory (statically allocating and pooling everything), then GC is not a concern.
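
To make "statically allocating and pooling everything" concrete, here is a rough sketch of a fixed-capacity object pool. `StaticPool` and its member names are hypothetical, and the sketch assumes `T` is default-constructible and `N` is greater than zero. All storage lives inside the pool object, so `acquire()` and `release()` never touch the heap and each runs in a constant number of steps.

```cpp
#include <cassert>
#include <cstddef>

// Fixed-capacity pool: slots are reserved up front and recycled through
// an index-based free list instead of being allocated on demand.
template <typename T, std::size_t N>
class StaticPool {
public:
    StaticPool() {
        // Chain every slot onto the free list; the value N marks the end.
        for (std::size_t i = 0; i < N; ++i) {
            next_[i] = i + 1;
        }
    }

    // Returns a pointer to a free slot, or nullptr when the pool is exhausted.
    // The slot holds whatever value its previous user left in it.
    T* acquire() {
        if (free_head_ == N) return nullptr;
        std::size_t i = free_head_;
        free_head_ = next_[i];
        ++in_use_;
        return &slots_[i];
    }

    // Hands a slot obtained from acquire() back to the pool.
    void release(T* p) {
        assert(p >= slots_ && p < slots_ + N);
        std::size_t i = static_cast<std::size_t>(p - slots_);
        next_[i] = free_head_;
        free_head_ = i;
        --in_use_;
    }

    std::size_t in_use() const { return in_use_; }

private:
    T slots_[N] = {};
    std::size_t next_[N];
    std::size_t free_head_ = 0;
    std::size_t in_use_ = 0;
};
```

A `StaticPool<Message, 64>` declared at namespace scope would be placed in static storage, which is the pattern the comment describes; the same approach works in a garbage-collected language by reusing objects instead of creating new ones.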

> I say that to say the zero-cost abstractions of C++ is a huge benefit when writing safety-critical software.

You're talking about performance now, not correctness. All those benefits get lost once translated to JS.

Kip9000 · on Aug 12, 2014

That was going to be my response..

pkolaczk · on Aug 12, 2014

I was replying to the statement: "For example, people also write C++ when correctness is critical". And I say if someone writes C++ in a correctness critical system, he must not know what he is doing. Even if you're using only STL, RAII and following all the good coding practices, there are still so many ways to screw things up, that C++ is among the worst choices in this regard.

s_baby · on Aug 12, 2014

NASA and the military don't know what they're doing? C++ is the go-to language in mission-critical systems.

pkolaczk · on Aug 12, 2014

NASA and the military use a whole lot of different technologies and languages, including, but not limited to, assembly, C, C++, Ada, Haskell, Coq, Python and Java. Saying they use C++ for correctness, when they are using Coq or Haskell as well, is, again, an exaggeration. I believe they use C/C++ more for performance and low memory overhead than for its correctness-related features.

flohofwoe · on Aug 12, 2014

At least when compiled with emscripten, the same memory-access-related optimizations used in native code also apply to cross-compiled JS code, since the memory layout is exactly the same. emscripten uses a big linear memory buffer (a single big JS typed array) as heap and dlmalloc for dynamic memory management within this heap. If your code is cache-friendly in the natively compiled version, it will also be cache-friendly in emscripten-compiled code. Cheerp seems to use a different approach though, and seems to generate 'traditional' high-level JS objects (which also means it requires garbage collection, which emscripten-generated code doesn't).
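
The flat-heap idea described above can be sketched in C++ itself. This is an analogy under stated assumptions, not emscripten's actual implementation: `LinearHeap` and its methods are hypothetical names, a `std::vector` of bytes stands in for the JS typed array, and a trivial bump allocator stands in for dlmalloc (it never frees anything).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// One contiguous buffer plays the role of the whole heap. A "pointer" is
// just a byte offset into that buffer, so object layout is the same as in
// a native build and no per-object garbage-collected wrapper is created.
class LinearHeap {
public:
    // Offset 0 is reserved as the null address, so allocation starts at 8.
    explicit LinearHeap(std::size_t bytes) : mem_(bytes), top_(8) {
        assert(bytes >= 8);
    }

    // Bump allocation rounded up to 8 bytes. Returns 0 when out of memory.
    std::uint32_t alloc(std::size_t bytes) {
        if (bytes > mem_.size()) return 0;
        std::size_t rounded = (bytes + 7) & ~static_cast<std::size_t>(7);
        if (rounded > mem_.size() - top_) return 0;
        std::uint32_t addr = static_cast<std::uint32_t>(top_);
        top_ += rounded;
        return addr;
    }

    // Typed store and load at a byte offset, like indexing a typed array view.
    void store_i32(std::uint32_t addr, std::int32_t v) {
        assert(addr + sizeof v <= mem_.size());
        std::memcpy(&mem_[addr], &v, sizeof v);
    }

    std::int32_t load_i32(std::uint32_t addr) const {
        assert(addr + sizeof(std::int32_t) <= mem_.size());
        std::int32_t v;
        std::memcpy(&v, &mem_[addr], sizeof v);
        return v;
    }

private:
    std::vector<std::uint8_t> mem_;
    std::size_t top_;
};
```

Values allocated one after another end up next to each other in the buffer, which is why cache-friendly native code stays cache-friendly under this model.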

Touche · on Aug 12, 2014

You do get some of those performance benefits when you cross-compile. JS VMs have different paths for different types of code and the type of code that Emscripten produces goes through a faster path because more is known about the code ahead of time.

illumen · on Aug 12, 2014

You also get 1.4 MB of code, which takes 1 hour to load on a mobile connection, and reimplements functionality that the host JS/browser provides, which goes much quicker ;) Most C++ code does not care about compiled size (especially with templates!!!), but JS code often does, so it is written in a smaller way.

flohofwoe · on Aug 12, 2014

True, you need to be sensible about compiled code size, which traditionally wasn't important for C/C++ code, and it's also true that the C++ std lib or careless use of templates can bloat the code size. On the other hand, three.js, which is basically the standard lib for 3D rendering on the web, comes at 0.5 MB minified, which isn't exactly small either ;)

aikah · on Aug 12, 2014

While true, if your "C++" hits the DOM in the client, you'll have the same problems as hand-written JS.

virmundi · on Aug 12, 2014

Probably none. However, it could act as a gateway drug for C++ desktop folk. They do exist.

tylermac1 · on Aug 12, 2014

We do exist. The fables are true.

recentdarkness · on Aug 12, 2014

So it's like GWT for C++

shmerl · on Aug 12, 2014

A different idea, but still an interesting C++ Web development framework: http://cppcms.com

azakai · on Aug 12, 2014

I'm a bit puzzled by the rebranding. First of all, the logo and the name sound a bit similar to Twitter. And I'm not sure what bird noises have to do with a compiler?

The previous name, Duetto, made sense to me, since they compile to both client and server, so it's like two things running in harmony, which is a duet in music.

_kst_ · on Aug 12, 2014

I know C and C++ pretty well; I wonder what this "C/C++" language is like.

pmelendez · on Aug 12, 2014

The subset that is the intersection of both? Which would be a subset of C. (Although I am pretty sure that wasn't what the author had in mind.)

oso2k · on Aug 12, 2014

Bjarne has always maintained that C++ is (mostly) backwards compatible with C. And that's what most people mean by C/C++.

_kst_ · on Aug 12, 2014

I know that C++ is mostly, but not entirely, backwards compatible with C. That doesn't answer the question of what "C/C++" means.

They're two different languages. The problem is that different people mean different things by "C/C++". Some people mean "C and C++"; others just seem to be unaware that they're distinct.

multimillion · on Aug 12, 2014

Cheerp, like clang (upon which it is based), can compile both C and C++ files.

keithnoizu · on Aug 12, 2014

Interesting, although if you want massive performance on a web application I'd probably go for Erlang unless you're doing something computationally complex on the backend. It would be nice to be able to write code once and use it in multiple places, but there's often so much extra work involved in that, that for many applications it makes more sense to just use interfaces etc.

Neat project either way.

farresito · on Aug 12, 2014

If you want massive performance, you most likely want to avoid erlang; if you want massive scalability, erlang might very well be a great fit. I think it's quite different.

keithnoizu · on Aug 12, 2014

Performance in terms of request throughput you can serve per minute for some given set of CPU and RAM.

Erlang processing is slower than C by quite a bit, but the reduced threading costs tend to make up for it when you are in a scenario where you need to serve multiple requests simultaneously.

In an apples-to-apples comparison, well-written C is going to beat Erlang on this every time, but once you start getting into threading, mutex locks etc., the equation starts to shift more in Erlang's favor _as long_ as each request you are serving is not highly computational in nature and would involve semaphores and so on for thread management in C.