Andrew: Vendors of multicore processors have expressed frustration at the difficulty of moving developers to this model. As a former professor, what thoughts do you have on this transition and how to make it happen? Is it a question of proper tools, such as better native support for concurrency in languages, or of execution frameworks? Or are there other solutions?
Donald: I don’t want to duck your question entirely. I might as well flame a bit about my personal unhappiness with the current trend toward multicore architecture. To me, it looks more or less like the hardware designers have run out of ideas, and that they’re trying to pass the blame for the future demise of Moore’s Law to the software writers by giving us machines that work faster only on a few key benchmarks! I won’t be surprised at all if the whole multithreading idea turns out to be a flop, worse than the "Itanium" approach that was supposed to be so terrific—until it turned out that the wished-for compilers were basically impossible to write.
Let me put it this way: During the past 50 years, I’ve written well over a thousand programs, many of which have substantial size. I can’t think of even five of those programs that would have been enhanced noticeably by parallelism or multithreading. Surely, for example, multiple processors are no help to TeX.[1]
How many programmers do you know who are enthusiastic about these promised machines of the future? I hear almost nothing but grief from software people, although the hardware folks in our department assure me that I’m wrong.
I know that important applications for parallelism exist—rendering graphics, breaking codes, scanning images, simulating physical and biological processes, etc. But all these applications require dedicated code and special-purpose techniques, which will need to be changed substantially every few years.
Even if I knew enough about such methods to write about them in TAOCP, my time would be largely wasted, because soon there would be little reason for anybody to read those parts. (Similarly, when I prepare the third edition of Volume 3 I plan to rip out much of the material about how to sort on magnetic tapes. That stuff was once one of the hottest topics in the whole software field, but now it largely wastes paper when the book is printed.)
The machine I use today has dual processors. I get to use them both only when I’m running two independent jobs at the same time; that’s nice, but it happens only a few minutes every week. If I had four processors, or eight, or more, I still wouldn’t be any better off, considering the kind of work I do—even though I’m using my computer almost every day during most of the day. So why should I be so happy about the future that hardware vendors promise? They think a magic bullet will come along to make multicores speed up my kind of work; I think it’s a pipe dream. (No—that’s the wrong metaphor! "Pipelines" actually work for me, but threads don’t. Maybe the word I want is "bubble.")
From the opposite point of view, I do grant that web browsing probably will get better with multicores. I’ve been talking about my technical work, however, not recreation. I also admit that I haven’t got many bright ideas about what I wish hardware designers would provide instead of multicores, now that they’ve begun to hit a wall with respect to sequential computation. (But my MMIX design contains several ideas that would substantially improve the current performance of the kinds of programs that concern me most—at the cost of incompatibility with legacy x86 programs.)
Knuth comes across as living in a bit of denial here... Sure, the frequency speedups were much nicer, but faced with hardware limits that make those impossible, he should point out that the future lies in finding algorithms best suited to multicore. They surely deserve their own volume of TAOCP, and I don't see how this would be wasted research for the next 10+ years.
There are some amazing lock-free data structures/algorithms out there which should be taught in any CS curriculum.
Intel has been releasing some great tools to aid with multicore development since they realized years ago that this was the only way to get more performance.
> Knuth comes across as living in a bit of denial here... Sure the frequency speedups were much nicer, but faced with hardware limits that make this impossible
It's easier to say "the hardware guys are dropping the ball" than to acknowledge silly things like "Physics".
Now, he could try to argue that hardware should be responsible for parallelizing things under the covers, and that would be a bit more reasonable, though I would still have to disagree. CS folks are the ones who are supposed to be discovering algorithms; find them, and have the hardware guys stick them in once they are known.
Yes, some of those lock-free structures are amazing.
However, if you read the fine print, it turns out that they are quite often slower than lock-based exclusion on top of one of Knuth's structures. The atomic primitive operations still require cross-core communication, which only gets more expensive with more cores.
Most applications simply don't parallelize perfectly. We're going to have to face the fact that not only is the free lunch over, the lunch we have to pay for isn't always as satisfying.
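To make the trade-off above concrete, here is a minimal Treiber stack, the classic lock-free structure, sketched in hypothetical C++ (not code from this thread; it deliberately ignores the ABA problem and safe memory reclamation that production implementations must solve). Every compare-and-swap attempt on the shared head, successful or not, is exactly the cross-core cache traffic being described:

```cpp
#include <atomic>

// Minimal lock-free Treiber stack of ints. push/pop spin on
// compare-and-swap (CAS) against a single shared head pointer;
// under contention every CAS attempt bounces the head's cache
// line between cores.
struct Node {
    int value;
    Node* next;
};

struct TreiberStack {
    std::atomic<Node*> head{nullptr};

    void push(int v) {
        Node* n = new Node{v, head.load()};
        // On failure, CAS reloads the current head into n->next and retries.
        while (!head.compare_exchange_weak(n->next, n)) {}
    }

    // Pops the top value into out; returns false if the stack is empty.
    bool pop(int& out) {
        Node* h = head.load();
        while (h && !head.compare_exchange_weak(h, h->next)) {}
        if (!h) return false;
        out = h->value;
        delete h;  // NOTE: only safe because this demo is single-threaded (ABA/reclamation ignored)
        return true;
    }
};
```

In practice a plain `std::mutex` around an ordinary linked list is often faster, which is the "fine print" point above.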
I think Knuth here has hit the nail on the head (as usual). There are plenty of applications of parallelism but they don't affect everything.
I am going to add something to this. There are a lot of areas where existing performance on a single core is certainly good enough, and where optimizing for a single core provides the ideal combination of performance and maintainability. I think back to some of the weird multi-row insert bugs I have run into on MySQL and suspect they are thread-concurrency problems, whereas with PostgreSQL (single-threaded, process-per-connection model) things just hum along well.
We are hence at a point where, for the most part, processors don't need to offer better single-task performance and software developers don't need to demand it, and where the tasks that do need more can frequently be split off into some sort of component model, with either separate threads or separate processes.
At the same time when we look at server software, we are usually talking about parallel jobs, and so those of us who do technical work on server software in fact do see performance increases. I mean if I am running a browser on my box which connects to a web server on my box which uses an async API to hit the database server also on my box, these multiple cores come in very handy.
This being said, I do wonder if a better use of additional capacity at some point would be in increasing cache sizes and memory throughput rather than adding more cores.
> I have to think about some of the weird multi-row insert bugs I have run into on MySQL and think that they are thread concurrency problems, but with PostgreSQL (single threaded model) things just hum along well.
Surely that shows that the issue is with multi-threaded programming being prone to bugs, rather than with parallelism in general? It seems to me that many of the people advocating parallelism as the future of software believe we also have to invent better ways to manage that parallelism, current methods (particularly threading) being deficient. For example, see Guy Steele's work on Fortress. [1]
Of course the issue is not with parallelism in general. PostgreSQL does well with the process model, although this means that a single query can't have portions of its work executed in parallel. Multiple sessions with multiple queries can be executed in parallel with no problem, however.
Note that the big open source projects (GridSQL and Postgres-XC) which remove this limitation currently do so through a different method, without breaking the single-threaded process model.
So the question is how intimately connected portions that run in parallel should be.
Multiprocessing doesn't help with TeX because compiling a document is an inherently sequential problem, the way the language was designed. One can imagine a layout language that sucks less (TeX is very powerful, but one of the worst-designed languages in existence) that could manage to render different parts of the document in parallel.
> Surely, for example, multiple processors are no help to TeX.
But but but, the STEPS project at http://vpri.org did find a way! Basically, each character is an object, placed relative to the one just before it. Sure, you'd have to wait for the previous character before you place the next one, but you could easily deal with each paragraph separately…
I'd like to blame hardware vendors as well, but I think they did their best to let Moore's law give a free ride to single-threaded programming. The first one who didn't would have been driven out of the market. And now that we know parallelism can be applied quite pervasively, it'd be our fault not to do it.
Pretty sure you can't do what you are saying under some layout considerations. For example, how do you lay out a paragraph if you don't know whether or not it will be split across a page boundary?
It ends up being iterative. First you lay out every paragraph in parallel, then lay out the paragraphs on the page, then re-run layout on any paragraph that needs to be split, etc.
The entire document could be laid out as if the page length was infinite, and then -- depending on desired page length -- the page breaks could be inserted between the appropriate lines.
In any case, in a typical document the overwhelming majority of paragraphs won't be affected by page breaks. So their layout could be parallelized regardless of where the page breaks wind up occurring.
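The infinite-page scheme described above can be sketched in a few lines of hypothetical C++ (the constants `WORDS_PER_LINE` and `LINES_PER_PAGE` and the whole layout model are invented for illustration). The first pass treats every paragraph independently, so it parallelizes trivially; the pagination pass is sequential, but it only has to flag the few paragraphs that straddle a page boundary for re-layout:

```cpp
#include <vector>

// Toy model: a paragraph's layout is just its line count.
struct Paragraph {
    int words;
    int lines = 0;
    bool dirty = false;
};

const int WORDS_PER_LINE = 10;   // invented constants for the sketch
const int LINES_PER_PAGE = 40;

// Pass 1: lay out one paragraph as if the page were infinite.
// Each call is independent, so this step is embarrassingly parallel.
void layout(Paragraph& p) {
    p.lines = (p.words + WORDS_PER_LINE - 1) / WORDS_PER_LINE;
}

// Pass 2: insert page breaks sequentially; mark any paragraph that
// straddles a boundary as dirty. Returns how many need re-layout
// (a second pass over only those can again run in parallel).
int paginate(std::vector<Paragraph>& doc) {
    int line = 0, dirty = 0;
    for (auto& p : doc) {
        int start_page = line / LINES_PER_PAGE;
        int end_page = (line + p.lines - 1) / LINES_PER_PAGE;
        if (start_page != end_page) { p.dirty = true; ++dirty; }
        line += p.lines;
    }
    return dirty;
}
```

In a typical document the dirty count is a small fraction of the total, which is the point above: most of the parallel first-pass work survives wherever the page breaks land.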
When laying out a paragraph near a page boundary, you don't simply insert the page break wherever you want. In fact, depending on whether orphaned lines are to be avoided, you may wind up adjusting the inter-word spacing of several lines' worth of characters.
That said, the other poster is likely correct that you can probably just do an optimistic run in parallel across all paragraphs, and then rerun any that are "dirty" from other considerations. In this regard, it is a lot like multiplying two large numbers. There are some parts that are pretty much only done sequentially, but some could likely be guessed or pinned down in parallel.
The question remains: how much benefit do you really get from this, over just speeding up the sequential operation? Especially in any layout that has figures or, heaven help you, columns. A single "dirty" find could render all of the parallel work worthless. The best example I could give would be a parallel "adder" adding 9999 to 1111. Since every single digit has a carry, the whole operation winds up being sequential anyway.
Source: http://www.informit.com/articles/article.aspx?p=1193856