COMP: AI substrate

Eugene Leitl (
Sun, 25 Jul 1999 23:31:32 -0700 (PDT)

Little noticed by the mainstream, the hitherto special-purpose DSPs have matured into full computer systems on a chip. For instance, the Analog Devices SHARC DSP family has a new member with the following intriguing characteristics:

600 MFlops peak
4 (soon 8) MBit on-die RAM with 1.6 GByte/s bandwidth
6 byte-wide links, capable of 600 MByte/s total transfer
27 mm x 27 mm PBGA with 4 W maximal heat dissipation
capability to boot from link
glueless integration into 3D DSP arrays by direct linking
pricing $10 in 100k quantities (Real Soon Now)

Obviously, by simply gluing arrays of chips (~256 per board) onto 500 mm x 500 mm supports (say, perforated copper), linking nearest neighbours, stacking 16 such units (~200 mm high) and interconnecting the stack, we arrive at a machine with 4 kCPU, 2 GByte RAM, 17 kW thermal dissipation, 2.5 TFlops peak, 6.5 TByte/s memory bandwidth and 2.5 TByte/s aggregate link bandwidth in a volume of 0.05 m^3, for a raw silicon price of ~50 k$: the price of a current server or a mid-range engineering workstation.
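The totals above are just the per-chip figures multiplied by the chip count. A minimal Python sketch of that arithmetic, assuming a 16 x 16 layout per board (one way to fit ~256 chips on each support) and treating the quoted $10 as the entire per-chip cost; `Chip` and `array_totals` are illustrative names, not anything from the datasheet:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    """Per-chip figures as quoted above."""
    mflops: float = 600.0           # peak MFlops
    ram_mbit: float = 4.0           # on-die RAM, MBit
    mem_bw_gbyte_s: float = 1.6     # on-die RAM bandwidth, GByte/s
    link_bw_mbyte_s: float = 600.0  # total link transfer, MByte/s
    watts: float = 4.0              # maximal heat dissipation, W
    price_usd: float = 10.0         # quoted price in 100k quantities


def array_totals(chip, chips_per_side=16, layers=16, units=1,
                 board_m=0.5, stack_m=0.2):
    """Aggregate figures for `units` stacks of `layers` square boards.

    The 16 x 16 chips per board is an assumption, not a figure from the text.
    """
    n = chips_per_side * chips_per_side * layers * units
    return {
        "cpus": n,
        "ram_gbyte": n * chip.ram_mbit / 8 / 1024,  # MBit -> MByte -> GByte
        "power_kw": n * chip.watts / 1e3,
        "peak_tflops": n * chip.mflops / 1e6,
        "mem_bw_tbyte_s": n * chip.mem_bw_gbyte_s / 1e3,
        "link_bw_tbyte_s": n * chip.link_bw_mbyte_s / 1e6,
        "volume_m3": board_m * board_m * stack_m * units,
        "silicon_kusd": n * chip.price_usd / 1e3,
    }


for name, value in array_totals(Chip()).items():
    print(f"{name:>16}: {value:g}")
```

This gives 4096 CPUs, 2 GByte, ~16.4 kW (rounded up to 17 kW in the text), ~2.46 TFlops, ~6.55 TByte/s, ~2.46 TByte/s, 0.05 m^3 and ~41 k$ of silicon. Passing `units=20` gives the scaled-up version: 81,920 CPUs and roughly 0.8 M$ of raw silicon, before boards, plumbing and power supplies.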

(17 kW in so small a volume could easily be removed by liquid cooling, using a coolant such as a fluorocarbon or mineral oil.)
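How much coolant that takes follows from the heat balance m_dot = P / (c_p * delta_T). A minimal sketch, assuming a specific heat of roughly 1.7 kJ/(kg K) for mineral oil and a 20 K rise from inlet to outlet; both numbers are illustrative assumptions, and `coolant_flow_kg_per_s` is a hypothetical helper:

```python
def coolant_flow_kg_per_s(power_w, cp_j_per_kg_k, delta_t_k):
    """Mass flow (kg/s) at which the coolant warms by delta_t_k while absorbing power_w.

    Steady-state heat balance: power = mass_flow * specific_heat * temperature_rise.
    """
    if cp_j_per_kg_k <= 0 or delta_t_k <= 0:
        raise ValueError("specific heat and temperature rise must be positive")
    return power_w / (cp_j_per_kg_k * delta_t_k)


# Assumed values: ~17 kW load, c_p ~ 1.7 kJ/(kg K) for mineral oil, 20 K rise.
flow = coolant_flow_kg_per_s(17_000, 1_700, 20)
print(f"{flow:.2f} kg/s")  # 0.50 kg/s
```

Since the flow is inversely proportional to the allowed temperature rise, doubling the rise halves the required flow.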

As this is a 3D lattice, I can scale it more or less easily (the main issue being coolant plumbing) to 20 such units for a budget of 1 M$. We're talking ~100 k computational elements here (20 x 4096 = 81,920). (I am not glossing over the reliability issue: dead units can easily be managed in software if the machine runs nonbrittle (fault-tolerant) codes.)
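One simple way software could manage dead units is to hand each failed node's share of the work to the nearest live node in the lattice. A minimal sketch, assuming work is statically partitioned one share per node and that lattice (Manhattan) distance is a good enough proxy for cost; `remap_dead` and `neighbours` are hypothetical names, and a real fault-tolerant code would also have to detect failures and route messages around dead nodes:

```python
import random
from collections import deque
from itertools import product

# The six nearest-neighbour link directions of a 3D lattice.
OFFSETS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def neighbours(node, dims):
    """Yield the up-to-six nearest neighbours of `node` inside a dims-sized box."""
    for offset in OFFSETS:
        nb = tuple(c + d for c, d in zip(node, offset))
        if all(0 <= c < size for c, size in zip(nb, dims)):
            yield nb


def remap_dead(dims, dead):
    """Map every node to the live node responsible for its share of the work.

    Live nodes keep their own work. Each dead node's share goes to the nearest
    live node by lattice distance, found with a breadth-first search. If no node
    is alive at all, dead nodes are left out of the mapping.
    """
    assignment = {}
    for node in product(*(range(size) for size in dims)):
        if node not in dead:
            assignment[node] = node
            continue
        seen, queue = {node}, deque([node])
        while queue:
            current = queue.popleft()
            if current not in dead:
                assignment[node] = current
                break
            for nb in neighbours(current, dims):
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    return assignment


if __name__ == "__main__":
    # A 16 x 16 x 16 stack (4096 nodes) with 40 randomly failed chips.
    rng = random.Random(0)
    dims = (16, 16, 16)
    all_nodes = list(product(*(range(size) for size in dims)))
    dead = set(rng.sample(all_nodes, 40))
    load = {}
    for worker in remap_dead(dims, dead).values():
        load[worker] = load.get(worker, 0) + 1
    print("max shares on one live node:", max(load.values()))
```

The search stops at the first live node it reaches, so a dead node surrounded by other dead nodes still finds the closest survivor, at the price of that survivor carrying extra load.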

Now, we can easily dismiss the above as the ravings of a geek struck with arithmomania. However, I cannot help but wonder what a 1 M$ machine built from such mid-1999 technology could accomplish, _if properly programmed_ (and even barring progress in nanotechnology, we can assume Moore's law will hold until ~2010).

Oh, we can't program it properly? But a 100 kCPU machine can shuffle opcodes damn quickly, 24 h/day, 365 d/year. Despite all the Kozas of this world, we cannot mutate assembler opcodes in a nonbrittle manner. However, nothing prevents us from making a robust mutator function our fitness goal: find the code that mutates machine code without breaking it too often. Feed the best-performing mutation-function specimens back into the population searching for the best mutation function. (Did somebody just say "autocatalysis"?) Mix, stir, repeat until done.
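The meta-level search can be sketched in miniature. In this toy version, assumed purely for illustration, "machine code" is a made-up 8-opcode stack language (opcodes 5-7 are undefined), a mutator is a (rate, opcode weights) pair rather than code, and a mutator's fitness is the fraction of its offspring that differ from the parent yet still run. Mutators evolve by truncation selection plus random perturbation; the autocatalytic step of letting the best mutators rewrite the mutators themselves is not modelled. All names are hypothetical:

```python
import math
import random

# Toy instruction set (hypothetical): opcodes 0-4 are defined, 5-7 are undefined.
N_OPCODES = 8
PUSH1, ADD, DUP, SWAP, POP = range(5)
STACK_EFFECT = {PUSH1: 1, ADD: -1, DUP: 1, SWAP: 0, POP: -1}
MIN_DEPTH = {PUSH1: 0, ADD: 2, DUP: 1, SWAP: 2, POP: 1}


def runs_ok(program):
    """True if no undefined opcode or stack underflow occurs.

    Only the stack depth decides whether a program breaks, so values are not tracked.
    """
    depth = 0
    for op in program:
        if op not in MIN_DEPTH or depth < MIN_DEPTH[op]:
            return False
        depth += STACK_EFFECT[op]
    return True


def random_valid_program(rng, length=12):
    """Build a working program, picking only opcodes legal at the current depth."""
    program, depth = [], 0
    for _ in range(length):
        op = rng.choice([o for o in MIN_DEPTH if depth >= MIN_DEPTH[o]])
        program.append(op)
        depth += STACK_EFFECT[op]
    return program


def mutate(program, mutator, rng):
    """Replace each opcode, with probability `rate`, by one drawn from `weights`."""
    rate, weights = mutator
    return [rng.choices(range(N_OPCODES), weights=weights)[0] if rng.random() < rate else op
            for op in program]


def fitness(mutator, parents, rng, trials=5):
    """Fraction of offspring that differ from their parent and still run."""
    good = 0
    for parent in parents:
        for _ in range(trials):
            child = mutate(parent, mutator, rng)
            if child != parent and runs_ok(child):
                good += 1
    return good / (len(parents) * trials)


def perturb(mutator, rng):
    """Offspring mutator: jitter the rate additively and the weights multiplicatively."""
    rate, weights = mutator
    new_rate = min(1.0, max(0.01, rate + rng.gauss(0, 0.05)))
    return new_rate, [w * math.exp(rng.gauss(0, 0.3)) for w in weights]


def evolve_mutators(generations=40, pop_size=20, seed=1):
    """Truncation-selection search over mutators; returns (best-score history, best mutator)."""
    rng = random.Random(seed)
    parents = [random_valid_program(rng) for _ in range(20)]
    population = [(rng.uniform(0.05, 0.5), [rng.uniform(0.01, 1.0) for _ in range(N_OPCODES)])
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((fitness(m, parents, rng), m) for m in population),
                        key=lambda pair: pair[0], reverse=True)
        history.append(scored[0][0])
        survivors = [m for _, m in scored[: pop_size // 2]]
        population = survivors + [perturb(rng.choice(survivors), rng)
                                  for _ in range(pop_size - len(survivors))]
    return history, scored[0][1]


if __name__ == "__main__":
    history, (rate, weights) = evolve_mutators()
    total = sum(weights)
    print(f"best fitness: first gen {history[0]:.2f}, last gen {history[-1]:.2f}")
    print(f"rate {rate:.2f}, weight share per opcode:", [round(w / total, 2) for w in weights])
```

With this naive fitness the search is likely to drift toward mutators that mostly substitute `PUSH1`, which can never fail in this language: a reminder that the fitness goal has to reward useful variation, not just survival.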

Hey, can someone spare a coupla megabucks?