Little noticed by the mainstream, the hitherto special-purpose DSPs have matured into full computer systems on a chip. For instance, the Analog Devices SHARC DSP family
http://products.analog.com/products_html/list_gen_98_2_1.html
has a new member with the following intriguing characteristics:
600 MFlops peak
4 (soon 8) MBit on-die RAM with 1.6 GByte/s bandwidth
6 byte-wide links, capable of 600 MByte/s total transfer
27 mm x 27 mm PBGA with 4 W maximal heat dissipation
capability to boot from link
glueless integration into 3D DSP arrays by direct linking
pricing $10 in 100k quantities (Real Soon Now)
Obviously, by simply gluing arrays of chips onto a 500 mm x 500 mm
support (say, perforated copper), connecting nearest neighbours with
links, stacking 16 such boards (~200 mm) and interconnecting the stack,
we arrive at 4 kCPU, 2 GByte RAM, 17 kW thermal dissipation, 2.5
TFlops peak, 6.5 TByte/s memory bandwidth and 2.5 TByte/s aggregate
link bandwidth in a volume of 0.05 m^3, for a raw silicon price of
~50 k$: the price of a current server or a mid-range engineering
workstation.
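For anyone who wants to check the arithmetic, here is a minimal sketch that simply multiplies the per-chip figures out. The 16 x 16 chips per board is my assumption (it is what 4 kCPU over 16 boards implies, at a pitch of roughly 31 mm for a 27 mm package); the function and constant names are made up for illustration.

```python
# Per-chip figures taken from the feature list in the post.
MFLOPS_PER_CHIP = 600      # peak MFlops
RAM_MBIT_PER_CHIP = 4      # on-die RAM, MBit
MEM_BW_GBYTE_S = 1.6       # on-die RAM bandwidth, GByte/s
LINK_BW_MBYTE_S = 600      # all six links together, MByte/s
WATTS_PER_CHIP = 4         # maximal dissipation, W
DOLLARS_PER_CHIP = 10      # projected 100k-quantity price, $

# Assumption (not stated in the post): 16 x 16 chips per board,
# which is what 4 kCPU spread over 16 boards implies.
CHIPS_PER_SIDE = 16
BOARDS = 16


def stack_totals(chips_per_side=CHIPS_PER_SIDE, boards=BOARDS):
    """Aggregate figures for one stack of boards."""
    chips = chips_per_side * chips_per_side * boards
    return {
        "chips": chips,
        "tflops": chips * MFLOPS_PER_CHIP / 1e6,
        "ram_gbyte": chips * RAM_MBIT_PER_CHIP / 8 / 1024,
        "mem_bw_tbyte_s": chips * MEM_BW_GBYTE_S / 1e3,
        "link_bw_tbyte_s": chips * LINK_BW_MBYTE_S / 1e6,
        "power_kw": chips * WATTS_PER_CHIP / 1e3,
        "silicon_kusd": chips * DOLLARS_PER_CHIP / 1e3,
        "volume_m3": 0.5 * 0.5 * 0.2,  # 500 mm x 500 mm x ~200 mm
    }


if __name__ == "__main__":
    for name, value in stack_totals().items():
        print(f"{name:>16}: {value:g}")
```

Under that assumption the stack holds 4096 chips, and power and silicon price come out near 16.4 kW and 41 k$, so the 17 kW and ~50 k$ quoted above are rounded up somewhat.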
(17 kW in so small a volume could easily be removed with a liquid
heat exchanger, using a fluorocarbon or a mineral oil as coolant.)
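As a rough sanity check on the cooling claim, the required coolant flow follows from power = mass flow x specific heat x temperature rise. The sketch below assumes ballpark properties for mineral oil (specific heat about 1.9 kJ/(kg K), density about 0.85 kg/l) and a 10 K rise across the stack. None of these values come from the post, and the function name is hypothetical.

```python
def coolant_flow_l_per_s(power_w, cp_j_per_kg_k, density_kg_per_l, delta_t_k):
    """Volumetric coolant flow (l/s) needed to carry power_w away
    while the coolant warms up by delta_t_k."""
    mass_flow_kg_s = power_w / (cp_j_per_kg_k * delta_t_k)
    return mass_flow_kg_s / density_kg_per_l


if __name__ == "__main__":
    # Assumed ballpark values for mineral oil; not from the post.
    flow = coolant_flow_l_per_s(
        power_w=17_000,         # stack dissipation quoted in the post
        cp_j_per_kg_k=1900,     # assumed specific heat
        density_kg_per_l=0.85,  # assumed density
        delta_t_k=10,           # assumed coolant temperature rise
    )
    print(f"required flow: {flow:.2f} l/s")
```

With these assumed values the flow comes out at about 1 litre per second. A different coolant or allowed temperature rise shifts the number proportionally.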
As this is a 3D lattice, I can more or less easily (the main obstacle being coolant plumbing) scale this to 20 such stacks for a budget of 1 M$. We're talking ~100 k computational elements here. (I do not gloss over the reliability issue: dead units can be easily managed by software if running nonbrittle (fault-resistant) codes.)
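The post does not say how the software would manage dead units. One possible sketch, assuming each of the six links goes to one nearest neighbour in a cubic lattice, is to route messages around failed nodes with a plain breadth-first search. The function name `route`, the coordinate scheme and the `dead` set are all hypothetical, and a real machine would want something distributed rather than a global search.

```python
from collections import deque

# The six nearest-neighbour directions of a cubic lattice (one per link).
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def route(src, dst, size, dead):
    """Return a shortest path of live nodes from src to dst on a
    size x size x size lattice, or None if dst is unreachable."""
    if src in dead or dst in dead:
        return None
    parent = {src: None}          # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:   # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        x, y, z = node
        for dx, dy, dz in NEIGHBOURS:
            nxt = (x + dx, y + dy, z + dz)
            inside = all(0 <= c < size for c in nxt)
            if inside and nxt not in dead and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


if __name__ == "__main__":
    # Node (1, 0, 0) has failed, so the path has to detour around it.
    print(route((0, 0, 0), (2, 0, 0), size=3, dead={(1, 0, 0)}))
```

The point is only that a lattice offers many alternative paths, so a single dead chip costs a short detour rather than the whole machine.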
Now, we can easily dismiss the above as the ravings of a geek struck with arithmomania. However, I cannot help but wonder what such a 1 M$ machine, built with mid-1999 technology, could accomplish _if properly programmed_. (Even barring progress in nanotechnology, we can assume Moore will hold up to ~2010.)
Hey, can someone spare a coupla megabucks?