Re: COMP: Moore's Law

Eugene Leitl (
Thu, 10 Jun 1999 14:35:25 -0700 (PDT) writes:

> >physics itself. Let's shift the complexity load to the software,
> >where it belongs.
> Even with a factor of ten slowdown compared to hardware?

Mapping an algorithm onto reconfigurable hardware is *faster* than running it on general-purpose hardware, unless you have a silicon foundry inside your computer that can churn out new dedicated ASICs at a MHz rate. With reconfigurable architectures, you swap out virtual circuitry, not code. In fact, reconfigurable hardware allows the creation of very dynamic, hyperactive machines with unified data and code, the most efficient things theoretically possible. These things are very new, so even academia doesn't quite know how to tackle them yet. You certainly can't go Darwin in machina (the next thing after OOP) without them.
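
To make "swap out virtual circuitry, not code" a bit more concrete: FPGA-style fabrics build logic from small lookup tables (LUTs) whose function is fixed entirely by a configuration bit pattern. The following is only a minimal sketch, assuming an idealized 4-input LUT with no routing fabric; the names `LUT4` and `truth_table` are hypothetical.

```python
class LUT4:
    """Idealized 4-input lookup table, the basic reconfigurable logic cell.

    Its behaviour is set by a 16-bit configuration word (one output bit per
    input combination). Reprogramming the cell means loading a new word.
    """

    def __init__(self, config: int) -> None:
        self.load(config)

    def load(self, config: int) -> None:
        # Swap in new "virtual circuitry" by replacing the truth table.
        if not 0 <= config < 1 << 16:
            raise ValueError("config must be a 16-bit truth table")
        self.config = config

    def __call__(self, a: int, b: int, c: int, d: int) -> int:
        # The four input bits form an index into the truth table.
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (self.config >> index) & 1


def truth_table(fn) -> int:
    """Build a 16-bit configuration word from a 4-argument boolean function."""
    word = 0
    for index in range(16):
        a, b, c, d = index & 1, (index >> 1) & 1, (index >> 2) & 1, (index >> 3) & 1
        if fn(a, b, c, d):
            word |= 1 << index
    return word


cell = LUT4(truth_table(lambda a, b, c, d: a & b & c & d))  # 4-input AND
print(cell(1, 1, 1, 1), cell(1, 0, 1, 1))                   # 1 0
cell.load(truth_table(lambda a, b, c, d: a ^ b ^ c ^ d))    # same cell, now parity
print(cell(1, 0, 0, 0), cell(1, 1, 0, 0))                   # 1 0
```

The same physical cell computes AND and then parity; only the configuration word changed.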

> No, I was pointing out that Intel have been able to keep putting out new,
> faster CPUs at a tremendous rate by reusing as much of their old design
> as possible, rather than trying to come up with brand-new architectures;
> when they try that -- with Merced, for example -- they fail. Whether the
> chips themselves are any good is a completely different matter.

Merced is yet another 64-bit CPU. It is not particularly innovative, and it was mostly designed by HP, not Intel.

> That's precisely my point; new technologies tend to take longer to build
> and cost more... and that's getting worse.

What makes it worse is a widespread low-risk attitude, which is gaining ground. Of course, it is precisely such global technology stalls that allow dramatic revolutionary bursts. Essentially, we haven't seen a single revolution in computing since the 1940s.

> >Oh, but there is. These gate delays add up after a while. One needs to
> >streamline/radically simplify the architecture to go to higher clock
> >rates. Structure shrink alone can't do it forever, you know.
> Yes, but there's still plenty of room in the 80x86 architecture, and they

These days the 80x86 architecture is mostly emulated. I'm really looking forward to seeing what Transmeta is going to produce.
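
Roughly what "mostly emulated" means in practice: cores such as Intel's P6 (Pentium Pro/II) and AMD's K6 decode x86 instructions into simpler RISC-like internal operations. The toy below is only an illustration, assuming a made-up textual instruction format; `MicroOp`, `translate` and the temporaries `t0`/`t1` are hypothetical, it handles only two-operand ALU instructions such as ADD or SUB, and real decoders are far more involved.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    op: str     # micro-op kind: "load", "store", or an ALU op such as "add"
    dst: str    # destination register or memory address
    src: tuple  # source operands


def translate(instr: str) -> list[MicroOp]:
    """Split a toy two-operand CISC ALU instruction into RISC-like micro-ops."""
    mnemonic, operands = instr.split(maxsplit=1)
    dst, src = (s.strip() for s in operands.split(","))
    ops: list[MicroOp] = []
    # A memory source operand is first loaded into a temporary register.
    if src.startswith("["):
        ops.append(MicroOp("load", "t1", (src.strip("[]"),)))
        src = "t1"
    if dst.startswith("["):
        # Read-modify-write on memory becomes load, ALU op, store.
        addr = dst.strip("[]")
        ops.append(MicroOp("load", "t0", (addr,)))
        ops.append(MicroOp(mnemonic.lower(), "t0", ("t0", src)))
        ops.append(MicroOp("store", addr, ("t0",)))
    else:
        ops.append(MicroOp(mnemonic.lower(), dst, (dst, src)))
    return ops


for uop in translate("ADD [ebx], eax"):
    print(uop)
```

A single "ADD [ebx], eax" thus turns into three simple operations that a RISC-style core can schedule independently.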

> could bolt on a 64-bit kludge just like the prior 32-bit kludge. The main
> aim of Merced and the Camino chip seems to be locking people into a
> proprietary Intel architecture so they can eliminate competition and boost
> profits, not any essential technical improvements.

Of course; that is the essence of Wintel's success. What I don't understand is why, after all these years, people are still buying it hook, line and sinker.

> >There is no need to go dedicated. If there are hundreds or thousands
> >identical CPUs in each desktop there is sufficient horsepower to do
> >anything in software.
> Why pay for hundreds of expensive CPUs if you can do the same job with one
> CPU and nine dedicated support chips?

Why pay for one expensive CPU weighed down by legacy ballast, and invest in nine other hideously complex designs (possibly more complex than the CPU itself), each requiring its own resources at the fab, when you could churn out ~500-1000 CPUs for roughly $500 in production costs?
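
The per-CPU figure implied by that estimate is plain division. This trivial sketch only reuses the rough numbers from the text (about $500 total for 500 to 1000 CPUs), which are estimates, not measured data; `cost_per_cpu` is a hypothetical helper.

```python
def cost_per_cpu(total_cost_usd: float, cpu_count: int) -> float:
    """Production cost per CPU when a batch of cpu_count CPUs costs total_cost_usd."""
    return total_cost_usd / cpu_count


# Rough figures from the text: ~$500 for ~500 to 1000 CPUs.
for count in (500, 1000):
    print(f"{count} CPUs: ${cost_per_cpu(500, count):.2f} each")
```

That works out to somewhere between fifty cents and a dollar per CPU.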

> >God, we can do embedded RAM now.
> But it's very hard to do anything useful with embedded RAM because any
> reasonable amount bloats the die size so much. My graphics card has 32MB

That's perhaps because people have a strange notion of what a reasonable amount is. You can implement a pretty frisky 32-bit CPU core plus networking in ~30 kTransistors, and I would guess you'd get semi-quantitative die yield assuming 1-MBit grains. You can fit a nanokernel OS into 4 to 12 kBytes, especially if you use threaded code (which requires a dual-stack architecture; since the shallow stacks are part of the CPU, there is context switch overhead). Of course, few people are comfortable with a style of coding in which subroutine calls make up well over 20% of all instructions. Now assume programming in an asynchronous message-passing OOP model, with an average object size of a few hundred bytes and hard memory grains of about a MBit, and you'll see why seasoned programmers have a problem with that.
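
Threaded code is the classic Forth technique: a program is a list of references to other words, and the interpreter keeps two stacks, a data stack for operands and a return stack for nested calls. Here is a minimal sketch in Python, assuming a dictionary-based word table; `run`, `words` and the word names are hypothetical, and on the hardware described above both stacks would be shallow on-chip stacks rather than Python lists.

```python
from __future__ import annotations


def add(stack: list) -> None:
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)


def mul(stack: list) -> None:
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)


def dup(stack: list) -> None:
    stack.append(stack[-1])


def run(words: dict, entry: str) -> list[int]:
    """Execute threaded code with separate data and return stacks.

    A word is either a primitive (a callable acting on the data stack) or a
    thread (a list of literals and word names).
    """
    data: list[int] = []                  # data (operand) stack
    rstack: list[tuple[list, int]] = []   # return stack: (thread, resume index)
    thread, ip = words[entry], 0
    while True:
        if ip == len(thread):             # end of the current thread
            if not rstack:
                return data
            thread, ip = rstack.pop()     # return to the caller
            continue
        item = thread[ip]
        ip += 1
        if isinstance(item, int):
            data.append(item)             # literal: push onto the data stack
        elif callable(words[item]):
            words[item](data)             # primitive: operate on the data stack
        else:
            rstack.append((thread, ip))   # call: save the return point
            thread, ip = words[item], 0


words = {
    "+": add, "*": mul, "dup": dup,
    "square": ["dup", "*"],
    "main": [3, "square", 4, "square", "+"],
}
print(run(words, "main"))  # [25]
```

Even in this tiny program, a sizable share of the executed items are calls and returns, which is the coding style the paragraph says many programmers are uncomfortable with.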

I have no idea how that CPU architecture scales to a 1-kBit bus, but I strongly suspect it scales roughly linearly.

> of RAM; you're not going to fit that into a single chip with a graphics
> controller that already has close to ten million transistors and get any
> kind of affordable yield. Plus, of course, once you've built your chip
> with 32MB of RAM you can't then expand it without replacing the entire
> chip.

That's another reason people don't do it: they operate under unvoiced assumptions. Sony's design uses 4-MByte grains, which is right at the edge of feasibility, IMO. If I were going to build a rendering engine, I'd distribute it either by bitplanes or as a display mosaic. Engines look very different depending on whether you operate on an entire screen line at once or do things the voxel way.
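
The two distribution schemes can be sketched as simple ownership rules: in a mosaic, each node owns a rectangular tile of the screen; with bitplanes, node k holds bit k of every pixel. This is a minimal sketch under assumed parameters (a hypothetical 640x480 display, a 4x4 node grid, 32 bits per pixel); the function names are hypothetical.

```python
def mosaic_owner(x: int, y: int, width: int, height: int,
                 cols: int, rows: int) -> tuple:
    """Return (col, row) of the node whose tile contains pixel (x, y)."""
    tile_w = -(-width // cols)    # ceiling division so edge tiles cover any remainder
    tile_h = -(-height // rows)
    return x // tile_w, y // tile_h


def split_bitplanes(pixels: list, depth: int) -> list:
    """Split pixel values into `depth` bitplanes; plane k holds bit k of every pixel."""
    return [[(p >> k) & 1 for p in pixels] for k in range(depth)]


def merge_bitplanes(planes: list) -> list:
    """Inverse of split_bitplanes: reassemble pixel values from their bitplanes."""
    return [sum(bit << k for k, bit in enumerate(bits)) for bits in zip(*planes)]


# Hypothetical example: 640x480 display on a 4x4 grid of nodes gives 160x120 tiles.
print(mosaic_owner(639, 479, 640, 480, 4, 4))  # (3, 3)
tile_bits = 160 * 120 * 32                     # assumed 32 bits per pixel
print(tile_bits, tile_bits <= 1 << 20)         # 614400 True
```

Under these assumptions, one tile's framebuffer (614,400 bits) fits within a single 1-MBit memory grain, so each node can render its tile out of local memory.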

> >There is a lot of silicon
> >real estate out there on these 300 mm wafers. And quantitive yield can
> >do wonders to prices.
> You should really talk to the people who've tried WSI before making claims
> as to how wonderful it's going to be. The only company I know of who ever
> did it are Anamartic, and they had a hell of a time making it work; do they
> even exist anymore?

The processes allowing RAM/logic integration are brand new, and thus currently accessible only to major players. There is no way a small company could go WSI and succeed; also consider that you couldn't sell such architectures. You could emulate a legacy system on them, but it would be no faster, and possibly slower, because of the intrinsically sequential nature of legacy systems.

The reasons why we don't have WSI yet are mostly not technical. It is because people don't want to learn.
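
The yield argument behind WSI can be put in numbers with the textbook Poisson yield model, Y = exp(-A * D0), where A is the area of one grain and D0 is the density of killer defects. This is only a sketch with made-up parameters (500 grains of 0.1 cm^2 at 0.5 defects/cm^2, not real process data), and it assumes defective grains can simply be mapped out. Under that assumption, the expected number of working grains stays high, while the chance that every grain on the wafer works is essentially zero. The helper names are hypothetical.

```python
import math


def grain_yield(area_cm2: float, defects_per_cm2: float) -> float:
    """Poisson model: probability that a grain has no killer defect."""
    return math.exp(-area_cm2 * defects_per_cm2)


def wafer_summary(n_grains: int, area_cm2: float, defects_per_cm2: float):
    """Return (per-grain yield, expected good grains, P(all grains good))."""
    y = grain_yield(area_cm2, defects_per_cm2)
    expected_good = n_grains * y   # what counts if bad grains are mapped out
    all_good = y ** n_grains       # what counts if the wafer must be flawless
    return y, expected_good, all_good


# Hypothetical parameters, for illustration only.
y, good, flawless = wafer_summary(500, 0.1, 0.5)
print(f"per-grain yield {y:.3f}, expected good grains {good:.0f}, P(flawless) {flawless:.1e}")
```

The contrast between the last two numbers is the whole case for defect-tolerant wafer-scale designs built from small grains.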

> Mark