> > [ Linux can do SMP, SMP is poor parallelism, when mainstreams maspar? ]
>
> I agree that eventually we will have to move away from SMP. Too much
> contention in those systems to be particularly scalable. 5 years isn't such
I think SMP is a very broken architecture. It speculates on memory
bandwidth, a scarce commodity, and it requires good caches, which are
difficult/costly to do, and do not at all help if there is little or no
data/code locality.
> a bad estimate. I think the limiting factor won't be hardware, but
Of course. Software _determines_ the course of hardware development
nowadays, and software development is lagging notoriously behind what
hardware can do. The customer mainstream clings to braindead designs, and
calls this investment protection. The only way to leave a local minimum
behind is to do de novo design. Ain't that supposed to be the chiefest
advantage of human design versus darwinian evolution, to discontinuously
tunnel through interim fitness walls? We boasted too soon, it seems.
> software. OSs will have to support more truly parallel architectures before
> the hardware will become popular. The first large-scale OS to adopt these
> architectures will probably be one of the quickly evolving ones, like Linux.
Linux is great, but Unix can't be scaled down to nanokernel size, alas.
Dedicated OOP DSP OSes are much better candidates for maspar systems,
imo. It will get really exciting when GA-grown ANNs jump from DSP
clusters to dedicated ANN chips. Probably, the need for neural DSP will
pioneer this, other fields (robotics, generic control, ALife AI) will
gladly accept the torch. Now imagine entire countries encrusted with
boxes full of ANN chips, mostly idle, locally connected by fiber links...
Though agoric computing will inhibit that somewhat, the phase transition
to >web is writ all over the wall, in neon letters light-minutes-large...
> The hardware is already starting to get there. We are starting to see
> multiple buses becoming available on PC motherboards, and fully integrated
Many DSPs (Sharc, TI, &c) already offer several high-bandwidth links.
Theoretically, a single macroscopic (albeit short) serial bus can carry
100 MBytes/s, optical links several orders of magnitude more.
(High-clocked stuff must be done in optics anyway, for dissipation
and signal reasons).
> L2 caches (like the P6) are a good start towards eliminating resource
> contention in multiprocessor systems. The one thing that will take the
Caches are evil. Cache transistors provide no extra storage, and take
4-6 times the transistor resources of an equivalent DRAM. Putting
caches on die bloats die extremely, which kills die yield and thus makes
the result unsuitable for WSI. Cache consistency is a problem. Added
latency in case of cache miss is a problem. Increased design complexity,
prolonged design time and increased probability of a design glitch are a
problem. Decreased operating frequency due to circuit and design
complexity is a problem. Lots of ugly & hairy things.
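
To put rough numbers on the miss-latency point: a minimal sketch using the
textbook average-access-time estimate (hit time plus miss rate times miss
penalty). The function name and the cycle counts are my own illustrative
assumptions, not measurements of any real part.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles.

    hit_time     -- cycles spent on the cache lookup (paid on every access)
    miss_rate    -- fraction of accesses that miss, between 0.0 and 1.0
    miss_penalty -- extra cycles to fetch from DRAM after a miss
    """
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must lie in [0, 1]")
    return hit_time + miss_rate * miss_penalty


# Hypothetical numbers: 1-cycle lookup, 20-cycle DRAM access.
good_locality = average_access_time(1, 0.05, 20)  # most accesses hit
no_locality = average_access_time(1, 1.0, 20)     # every access misses
uncached = 20                                     # plain DRAM, no lookup

print(good_locality, no_locality, uncached)
```

With good locality the cache wins handily. With none, every access pays
the lookup on top of the DRAM fetch, so the cached system comes out slower
than the uncached one.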
> longest is breaking out of the shared memory model. Most of the rest of the
> required technology is available and supported.
>
> I am not too sure that the shared memory is really such a bad idea, in terms
> of efficiency. I think what *really* needs to be improved is the general
Shared memory contradicts the demand for locality. Data/code should
reside in the utmost immediate vicinity of the ALU, ideally being a single
entity (CAMs qualify here best). Because of constraints of our spacetime
(just 3 dimensions, the curled-up ones, alas, inaccessible), and, even
worse, of silicon photolitho technology, which is a fundamentally
2d-technique, the conflict between locality and making the same data
accessible to several processors is unresolvable. Caches are no good, and
open a wholly new can of worms... (cache consistency in shared-memory
architectures is a nightmare, see KSR).
> memory architecture currently used. If they used some type of fine grained
> switching matrix mechanism, maybe something similar to the old Burroughs
If we are to accept the scheme by which our relativistic universe works,
we must adopt a large-scale asynchronous, locally-coupled, nanograin
message-passing OO paradigm. Crossbars, whether vanilla, or perfect
shuffle, are excellent for asynchronous OOP, provided the topology allows
trivial routing. Hypergrid does.
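
What "trivial routing" can look like, as a sketch only: I assume here that
a hypergrid is an n-dimensional mesh whose nodes are addressed by integer
coordinate tuples and linked to their nearest neighbours along each axis.
Under that assumption a node needs nothing but its own address and the
destination address to pick the next hop (dimension-order routing). The
names next_hop and route are made up for illustration.

```python
def next_hop(current, destination):
    """Return the neighbour to forward to, or current if already there.

    Corrects the lowest-numbered axis that still differs, one step at a
    time. No routing tables, no global state.
    """
    if len(current) != len(destination):
        raise ValueError("addresses must have the same dimensionality")
    for axis, (c, d) in enumerate(zip(current, destination)):
        if c != d:
            step = 1 if d > c else -1
            return current[:axis] + (c + step,) + current[axis + 1:]
    return current


def route(source, destination):
    """List every node a message visits from source to destination."""
    path = [source]
    node = source
    while node != destination:
        node = next_hop(node, destination)
        path.append(node)
    return path


print(route((0, 0, 0), (2, 1, 0)))
```

The hop count equals the sum of the per-axis coordinate differences, and
each decision is purely local, which is what the asynchronous,
locally-coupled scheme asks for.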
> B5000 series mainframes, a lot of memory contention could be eliminated.
> This of course in addition to speeding up the entire memory architecture
> altogether.
Alas, there are physical limits to that, particularly when it comes to
off-die memory. Locality strikes yet again, aargh.
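
A back-of-envelope sketch of that limit: how far a signal can get within
one clock period. The speed of light in vacuum is exact; the velocity
factor of 0.5 for an electrical trace is an assumed ballpark, and the
function name is made up for illustration.

```python
C_VACUUM_M_PER_S = 299_792_458.0  # speed of light in vacuum


def reach_per_cycle_m(clock_hz, velocity_factor=0.5, round_trip=True):
    """Distance in metres a signal can cover in one clock period.

    velocity_factor -- signal speed as a fraction of c (assumed value)
    round_trip      -- halve the reach if request and reply must both
                       fit into the same cycle
    """
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    if not 0.0 < velocity_factor <= 1.0:
        raise ValueError("velocity_factor must lie in (0, 1]")
    one_way = velocity_factor * C_VACUUM_M_PER_S / clock_hz
    return one_way / 2 if round_trip else one_way


for clock_hz in (100e6, 1e9, 10e9):
    reach_cm = reach_per_cycle_m(clock_hz) * 100
    print(f"{clock_hz / 1e6:8.0f} MHz: about {reach_cm:.2f} cm per cycle")
```

The reach shrinks in inverse proportion to the clock rate, so past a
certain frequency anything off-die is simply too far away to answer
within a cycle, whatever the memory technology.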
> -James Rogers
> jamesr@best.com
>
ciao,
'gene
______________________________________________________________________________
|mailto:ui22204@sunmail.lrz-muenchen.de |transhumanism >H, cryonics, |
|mailto:Eugene.Leitl@uni-muenchen.de |nanotechnology, etc. etc. |
|mailto:c438@org.chemie.uni-muenchen.de |"deus ex machina, v.0.0.alpha" |
|icbmto:N 48 10'07'' E 011 33'53'' |http://www.lrz-muenchen.de/~ui22204 |