I had exactly this issue when trying to make my OpenSCAD port multithreaded in asm.js: serialising and deserialising each worker's results took longer than the time saved by running them in parallel.
Unfortunately I couldn't even use transferable objects, as the C++ objects in question were CGAL Nef polyhedra, and once copied across they would be garbage. It would work, though, if your objects can all be allocated into one specific buffer (a scheme I have since discovered is called a 'bump allocator').
https://groups.google.com/forum/m/#!searchin/emscripten-disc...
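To illustrate the buffer idea, here is a minimal sketch, written in Python for brevity rather than the C++/asm.js involved above. `BumpArena`, `push_list` and `read_list` are hypothetical names, not from any library. Allocation just advances an offset into one bytearray, and objects refer to each other by offset rather than by address, so the whole buffer can be copied (standing in for posting or transferring an ArrayBuffer to another worker) and still be read correctly on the other side. Pointer-heavy structures like CGAL's don't work this way, which is why they come out as garbage after a raw copy.

```python
import struct


class BumpArena:
    """Hypothetical bump allocator over a single bytearray.

    Allocation only advances `top`; nothing is freed individually.
    Objects link to each other by offset into `buf`, never by address,
    so a byte-for-byte copy of the buffer stays valid.
    """

    def __init__(self, capacity: int) -> None:
        self.buf = bytearray(capacity)
        self.top = 0  # offset of the next free byte

    def alloc(self, size: int, align: int = 8) -> int:
        # Round `top` up to the requested (power-of-two) alignment.
        start = (self.top + align - 1) & ~(align - 1)
        end = start + size
        if end > len(self.buf):
            raise MemoryError("arena exhausted")
        self.top = end
        return start

    def reset(self) -> None:
        # Free everything at once by rewinding the top offset.
        self.top = 0


# A linked-list node: (value, offset of next node); -1 means "no next node".
NODE = struct.Struct("<qq")


def push_list(arena: BumpArena, values: list[int]) -> int:
    """Build a singly linked list inside the arena and return the head offset."""
    head = -1
    for v in reversed(values):
        off = arena.alloc(NODE.size)
        NODE.pack_into(arena.buf, off, v, head)
        head = off
    return head


def read_list(buf: bytes, head: int) -> list[int]:
    """Walk the list using nothing but the buffer and offsets."""
    out = []
    off = head
    while off != -1:
        value, off = NODE.unpack_from(buf, off)
        out.append(value)
    return out


arena = BumpArena(1024)
head = push_list(arena, [1, 2, 3])
# Copy only the used bytes; stands in for sending the buffer to another worker.
copied = bytes(arena.buf[:arena.top])
print(read_list(copied, head))  # [1, 2, 3]
```

The receiving side needs only the bytes and the head offset, with no per-object serialisation step. The trade-off is that individual objects can't be freed; the arena is discarded or reset as a whole.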