Andrew: Vendors of multicore processors have expressed frustration at the difficulty of moving developers to this model. As a former professor, what thoughts do you have on this transition and how to make it happen? Is it a question of proper tools, such as better native support for concurrency in languages, or of execution frameworks? Or are there other solutions?
Donald: I don’t want to duck your question entirely. I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the "Itanium" approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.
Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX.[1]
How many programmers do you know who are enthusiastic about these promised machines of the future? I hear almost nothing but grief from software people, although the hardware folks in our department assure me that I’m wrong.
I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.
Even if I knew enough about such methods to write about them in TAOCP, my time would be largely wasted, because soon there would be little reason for anybody to read those parts. (Similarly, when I prepare the third edition of Volume 3 I plan to rip out much of the material about how to sort on magnetic tapes. That stuff was once one of the hottest topics in the whole software field, but now it largely wastes paper when the book is printed.)
The machine I use today has dual processors. I get to use them both only when I’m running two independent jobs at the same time; that’s nice, but it happens only a few minutes every week. If I had four processors, or eight, or more, I still wouldn’t be any better off, considering the kind of work I do—even though I’m using my computer almost every day during most of the day. So why should I be so happy about the future that hardware vendors promise? They think a magic bullet will come along to make multicores speed up my kind of work; I think it’s a pipe dream. (No—that’s the wrong metaphor! "Pipelines" actually work for me, but threads don’t. Maybe the word I want is "bubble.")
From the opposite point of view, I do grant that web browsing probably will get better with multicores. I’ve been talking about my technical work, however, not recreation. I also admit that I haven’t got many bright ideas about what I wish hardware designers would provide instead of multicores, now that they’ve begun to hit a wall with respect to sequential computation. (But my MMIX design contains several ideas that would substantially improve the current performance of the kinds of programs that concern me most—at the cost of incompatibility with legacy x86 programs.)
Knuth comes across as living in a bit of denial here... Sure, the frequency speedups were much nicer, but faced with hardware limits that make those impossible, he should point out that the future lies in finding algorithms best suited to multicore. They surely deserve their own volume of TAOCP, and I don't see how this would be wasted research for the next 10+ years.
There are some amazing lock-free data structures/algorithms out there which should be taught in any CS curriculum.
Intel has been releasing some great tools to aid with multicore development since they realized years ago that this was the only way to get more performance.
> Knuth comes across as living in a bit of denial here... Sure the frequency speedups were much nicer, but faced with hardware limits that make this impossible
It's easier to say "the hardware guys are dropping the ball" than to acknowledge silly things like "Physics".
Now, he could try to argue that hardware should be responsible for parallelizing things under the covers, and that would be a bit more reasonable, though I would still have to disagree. CS folks are the ones who are supposed to be discovering algorithms; find them, and have the hardware guys stick them in once they are known.
Yes, some of those lock-free structures are amazing.
However, if you read the fine print, it turns out that they are quite often slower than lock-based exclusion on top of one of Knuth's structures. The atomic primitive operations still require cross-core communication, which only gets more expensive with more cores.
Most applications simply don't parallelize perfectly. We're going to have to face the fact that not only is the free lunch over, the lunch we have to pay for isn't always as satisfying.
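To make the trade-off above concrete, here is a minimal Treiber stack, the classic lock-free structure, sketched in hypothetical C++ (not code from this thread; it deliberately ignores the ABA problem and safe memory reclamation that production implementations must solve). Every compare-and-swap attempt on the shared head, successful or not, is exactly the cross-core cache traffic being described:

```cpp
#include <atomic>

// Minimal lock-free Treiber stack of ints. push/pop spin on
// compare-and-swap (CAS) against a single shared head pointer;
// under contention every CAS attempt bounces the head's cache
// line between cores.
struct Node {
    int value;
    Node* next;
};

struct TreiberStack {
    std::atomic<Node*> head{nullptr};

    void push(int v) {
        Node* n = new Node{v, head.load()};
        // On failure, CAS reloads the current head into n->next and retries.
        while (!head.compare_exchange_weak(n->next, n)) {}
    }

    // Pops the top value into out; returns false if the stack is empty.
    bool pop(int& out) {
        Node* h = head.load();
        while (h && !head.compare_exchange_weak(h, h->next)) {}
        if (!h) return false;
        out = h->value;
        delete h;  // NOTE: only safe because this demo is single-threaded (ABA/reclamation ignored)
        return true;
    }
};
```

In practice a plain `std::mutex` around an ordinary linked list is often faster, which is the "fine print" point above.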
I think Knuth here has hit the nail on the head (as usual). There are plenty of applications of parallelism but they don't affect everything.
I am going to add something to this. There are a lot of areas where existing performance on a single core is certainly good enough, and where optimizing for a single core provides the ideal combination of performance and maintainability. I think back to some of the weird multi-row insert bugs I have run into on MySQL and suspect they are thread-concurrency problems, whereas with PostgreSQL (single-threaded, process-per-connection model) things just hum along well.
We are hence at a point where, for the most part, processors don't need to offer better single-task performance and software developers don't need to demand it, and where the tasks that do need more can frequently be split off into some sort of component model, with either separate threads or separate processes.
At the same time when we look at server software, we are usually talking about parallel jobs, and so those of us who do technical work on server software in fact do see performance increases. I mean if I am running a browser on my box which connects to a web server on my box which uses an async API to hit the database server also on my box, these multiple cores come in very handy.
This being said, I do wonder if a better use of additional capacity at some point would be in increasing cache sizes and memory throughput rather than adding more cores.
> I have to think about some of the weird multi-row insert bugs I have run into on MySQL and think that they are thread concurrency problems, but with PostgreSQL (single threaded model) things just hum along well.
Surely that shows that the issue is with multi-threaded programming being prone to bugs, rather than with parallelism in general? It seems to me that many of the people advocating parallelism as the future of software believe we also have to invent better ways to manage that parallelism, current methods (particularly threading) being deficient. For example, see Guy Steele's work on Fortress. [1]
Of course the issue is not with parallelism in general. PostgreSQL does well with the process model, although this means that a single query can't have portions of its work executed in parallel. Multiple sessions with multiple queries can be executed in parallel with no problem, however.
Note that the big open source projects (GridSQL and Postgres-XC) which remove this limitation currently do so through a different method, without breaking the single-threaded process model.
So the question is how intimately connected portions that run in parallel should be.
Multiprocessing doesn't help with TeX because compiling a document is an inherently sequential problem, the way the language was designed. One can imagine a layout language that sucks less (TeX is very powerful, but one of the worst-designed languages in existence) that could manage to render different parts of the document in parallel.
> Surely, for example, multiple processors are no help to TeX.
But but but, the STEPS project at http://vpri.org did find a way! Basically, each character is an object, placed relative to the one just before it. Sure, you'd have to wait for the previous character before you place the next one, but you could easily deal with each paragraph separately…
I'd like to blame hardware vendors as well, but I think they did their best to let Moore's law give a free ride to single-threaded programming. The first one who didn't would have been driven out of the market. And now that we know parallelism can be applied quite pervasively, it'd be our fault not to do it.
Pretty sure you can't do what you are saying under some layout considerations. For example, how do you lay out a paragraph if you don't know whether or not it will be split across a page boundary?
It ends up being iterative. First you lay out every paragraph in parallel, then lay out the paragraphs on the page, then re-run layout on any paragraph that needs to be split, etc.
The entire document could be laid out as if the page length was infinite, and then -- depending on desired page length -- the page breaks could be inserted between the appropriate lines.
In any case, in a typical document the overwhelming majority of paragraphs won't be affected by page breaks. So their layout could be parallelized regardless of where the page breaks wind up occurring.
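The infinite-page scheme described above can be sketched in a few lines of hypothetical C++ (the constants `WORDS_PER_LINE` and `LINES_PER_PAGE` and the whole layout model are invented for illustration). The first pass treats every paragraph independently, so it parallelizes trivially; the pagination pass is sequential, but it only has to flag the few paragraphs that straddle a page boundary for re-layout:

```cpp
#include <vector>

// Toy model: a paragraph's layout is just its line count.
struct Paragraph {
    int words;
    int lines = 0;
    bool dirty = false;
};

const int WORDS_PER_LINE = 10;   // invented constants for the sketch
const int LINES_PER_PAGE = 40;

// Pass 1: lay out one paragraph as if the page were infinite.
// Each call is independent, so this step is embarrassingly parallel.
void layout(Paragraph& p) {
    p.lines = (p.words + WORDS_PER_LINE - 1) / WORDS_PER_LINE;
}

// Pass 2: insert page breaks sequentially; mark any paragraph that
// straddles a boundary as dirty. Returns how many need re-layout
// (a second pass over only those can again run in parallel).
int paginate(std::vector<Paragraph>& doc) {
    int line = 0, dirty = 0;
    for (auto& p : doc) {
        int start_page = line / LINES_PER_PAGE;
        int end_page = (line + p.lines - 1) / LINES_PER_PAGE;
        if (start_page != end_page) { p.dirty = true; ++dirty; }
        line += p.lines;
    }
    return dirty;
}
```

In a typical document the dirty count is a small fraction of the total, which is the point above: most of the parallel first-pass work survives wherever the page breaks land.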
When laying out a paragraph near a page boundary, you don't simply insert the page break wherever you want. In fact, depending on whether orphaned lines are to be avoided, you may wind up adjusting the inter-word spacing of several lines' worth of characters.
That said, the other poster is likely correct that you can probably just do an optimistic run in parallel across all paragraphs, and then rerun any that are "dirty" from other considerations. In this regard, it is a lot like multiplying two large numbers. There are some parts that are pretty much only done sequentially, but some could likely be guessed or pinned down in parallel.
The question remains: how much benefit do you really get from this, over just speeding up the sequential operation? Especially in any layout that has figures or, heaven help you, columns. A single "dirty" find could render all of the parallel work worthless. The best example I could give would be a parallel "adder" adding 9999 to 1111. Since every single digit has a carry, the whole operation winds up being sequential anyway.
Source: http://www.informit.com/articles/article.aspx?p=1193856