The Itanium processor, part 2: Instruction encoding, templates, and stops (msdn.com)
62 points by Mister_Snuggles on July 28, 2015 | hide | past | favorite | 16 comments


VLIW processors with dynamic configuration of the opcode instruction groups have a lot of potential in the future, in my opinion. E.g. 128- or 256-bit "opcode groups" with variable-size opcodes, using their own temporary registers, even custom programmable opcodes for ad-hoc opcode compression, etc.

The failure of the Itanium against cheap x86 chips doing "macro-op" execution more efficiently than Itanium's explicit VLIW is the other side of the same coin: whether explicit or implicit, the way to increase single-thread IPC (instructions per clock) is to do more in parallel and hide execution latency. So VLIW-esque execution, whether explicit or hidden behind an abstraction, seems unavoidable.


Parallel, speculative execution is what's unavoidable. The VLIW idea of statically compiler-scheduled parallelism and speculation has not been successful for general-purpose code. The last generation of Itanium abandoned explicit, statically scheduled parallelism and switched to out-of-order execution, like every other fast general-purpose CPU.


> has not been successful for general-purpose code

Let me interject that it has also not been successful for AMD's line of video cards - they used a VLIW architecture before transitioning to a SIMD architecture codenamed Graphics Core Next (GCN) in 2011. nVidia had been using SIMD for quite a while before that, but some of their latest cards have a few VLIW features, such as the option to explicitly specify parallelism between instructions.


Apparently they switched because VLIW wasn't good for GPGPU workloads, which is in keeping with the general theme: VLIW works very well on certain highly specific, DSP-ish workloads.


Is the Itanium bundling described in the article a pretty common pattern in CPU architecture? The only CPU architectures I knew fairly intimately (Analog Devices Blackfin and SHARC - neither particularly mainstream) had a limited version of this and, at least in Blackfin's case, it was pretty key to writing reasonably performant code.


It's fairly common in newer ISAs.

For a very modern take, [shameless plug] the Mill CPU instruction encoding talk is very entertaining: http://millcomputing.com/topic/instruction-encoding/

(Mill team)


It's also very common in older ISAs, especially DSPs. As pointed out elsewhere, it's common in VLIWs (which is generally "older" ISAs).

A good example is the Motorola 56K DSP family. Instructions are 24 bits wide (24-bit-wide memory!). They come in a few formats, which break down into two major types: a complex instruction, or a simple instruction plus up to two memory moves and address updates.

It's basically designed around performing a single-cycle 24x24->56-bit multiply-accumulate with two memory reads and address increments, with zero loop overhead. The rest of the ISA really, really suffers for it :) But it's impressive when it's doing exactly what it was designed to do! It's a great example of a domain-specific ISA.
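To make the parallelism concrete, here is a toy Python model of the behavior described above (the register names `x0`, `y0`, `r0`, `r4` echo the 56K's X/Y-memory operand and address registers, but the model is an illustration, not the actual encoding or cycle-accurate behavior):

```python
# Toy model of one 56K-style "MAC cycle": a 24x24 -> 56-bit
# multiply-accumulate performed in parallel with two memory reads
# and two address-register post-increments.

MASK56 = (1 << 56) - 1  # the accumulator is 56 bits wide

def mac_cycle(acc, x0, y0, xmem, ymem, r0, r4):
    """acc += x0*y0; in parallel: x0 = xmem[r0++], y0 = ymem[r4++]."""
    acc = (acc + x0 * y0) & MASK56   # multiply-accumulate, 56-bit wrap
    x0, y0 = xmem[r0], ymem[r4]      # the two parallel memory reads
    return acc, x0, y0, r0 + 1, r4 + 1

# Dot product with the classic preload-then-loop pattern:
xmem, ymem = [1, 2, 3], [4, 5, 6]
acc, x0, y0, r0, r4 = 0, xmem[0], ymem[0], 1, 1
for _ in range(2):
    acc, x0, y0, r0, r4 = mac_cycle(acc, x0, y0, xmem, ymem, r0, r4)
acc = (acc + x0 * y0) & MASK56       # final accumulate, no fetch needed
```

On real hardware each `mac_cycle` call is one instruction word and one clock cycle, which is exactly why the zero-overhead loop matters.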

Also worth noting that in 56K, there is no scoreboarding or other dependency tracking to automatically insert pipeline bubbles. That's the job of the hand-coder/assembler/compiler. If you get it wrong, it simply does something undefined. It's "fun" when that happens.


Yes - this seems to be a common design in DSPs. TI's C64x series is similar: 'instructions' are 32 bits wide and 8 of them are bundled together and issued simultaneously as a mammoth 256-bit instruction 'packet'.
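The C6x family splits those 8-word fetch packets into smaller "execute packets" using a parallel bit (the p-bit) in each instruction word: when an instruction's p-bit is set, the next instruction executes in parallel with it. A rough Python sketch of that grouping rule (illustrative only; it ignores real C6x constraints such as functional-unit assignment):

```python
# Group 32-bit instruction words from a fetch packet into execute
# packets. Convention here: bit 0 of each word is the p-bit; p=1 means
# "the next word runs in parallel with me", p=0 ends the packet.

def split_execute_packets(fetch_packet):
    packets, current = [], []
    for word in fetch_packet:
        current.append(word)
        if (word & 1) == 0:        # p-bit clear: execute packet ends here
            packets.append(current)
            current = []
    if current:                    # p-bit chain ran off the end of the packet
        packets.append(current)
    return packets

# Words with p-bits 1,1,0 / 1,0 / 0 / 1,0 -> packets of 3, 2, 1, 2:
sizes = [len(p) for p in split_execute_packets([1, 1, 0, 1, 0, 0, 1, 0])]
```

So all 8 instructions are fetched at once, but how many actually issue together in a given cycle is up to the p-bits the compiler set.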


What are some examples besides Mill? To be honest, I am totally ignorant of what new ISAs are out there. The only things I can think of are AArch64 and RISC-V (not commercially relevant, but well-publicised) and they are both pretty standard.


No, it's a VLIW thing.


I'm not sure I'd even go that far. Traditionally the idea behind VLIW is that the instruction format matches the superscalar pipeline setup, so in each cycle you explicitly issue an instruction to each of the available ports in the architecture.

IA64 had wide "bundles" for superscalar issue, but it explicitly avoided exposing the individual instructions as "ports" and instead had a more flexible setup where, really, the bundles contained "just a bunch of independent instructions". The architecture assumed they'd all be issued in parallel (so there were rules about dependencies between instructions in a bundle), but made no promises about whether they actually could be, or even whether or not more than one bundle could be issued simultaneously.
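For reference, the bundle layout itself is simple: each 128-bit IA-64 bundle is a 5-bit template field followed by three 41-bit instruction slots (5 + 3*41 = 128 exactly). A minimal Python sketch of extracting those fields, treating the bundle as one 128-bit integer:

```python
# Split a 128-bit IA-64 bundle into its template and three 41-bit
# instruction slots. Layout: bits 0-4 = template, then slots 0, 1, 2.

SLOT_MASK = (1 << 41) - 1

def decode_bundle(bundle: int):
    """Return (template, slot0, slot1, slot2) for a 128-bit bundle."""
    template = bundle & 0x1F                  # 5-bit template field
    slot0 = (bundle >> 5) & SLOT_MASK         # bits 5..45
    slot1 = (bundle >> 46) & SLOT_MASK        # bits 46..86
    slot2 = (bundle >> 87) & SLOT_MASK        # bits 87..127
    return template, slot0, slot1, slot2
```

The template value is what tells the hardware which execution-unit types the three slots hold and where the stops (serialization points) fall; decoding a slot's actual instruction is a separate, much bigger job.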


Itanium is the first processor architecture in whose design patent lawyers were heavily involved. (I try to spend my personal learning time on tech that isn't patented.)


Honest question: like what technology? Is there an area of technology that is relatively patent-free?


No one talks about patents on programming-language features. So, there is an example for you. In the 1990s or early 2000s, the designers of a programming language named Curl out of MIT tried to protect their design with patents, but they were a very small minority among PL designers (at least at that time and since that time), and I don't think their PL has any remaining users.

Because of the nature of the patent system, it is impossible to state definitively that none of the currently popular PLs have any patent encumbrances, but we can say that none of the project leaders or main contributors to the design of any of the popular PLs have been accused of pursuing patents on their creations. Nor does anyone AFAIK complain or warn about any popular PL's being patent-encumbered.

Although most designers of instruction-set architectures that have seen significant economic use have pursued patents on this or that feature or technique, Itanium is AFAIK the only one where the patent lawyers were involved in the early stages of the design of the architecture.


This is the first time I've seen PL used as an acronym for programming language.


It's used quite often among PL nerds: http://lambda-the-ultimate.org/



