Software/Hardware Architectures (WAS: RE: Human minds on Windows(?))

Eugene Leitl (eugene.leitl@lrz.uni-muenchen.de)
Tue, 13 Jul 1999 23:28:46 -0700 (PDT)

Billy Brown writes:

> > Hey, so they really call OS.send.message() every few 100 machine
> > instructions, or so, and context-switch every 1 us? Really?
>
> Why on Earth would you want to do that? Even for massively parallel

Because that's the only way you can do things on a fine-grain maspar system. I believe we were talking about portability/migration issues...

> architectures you are better off either using large CPU/memory blocks, or

Uh, there _are_ no large contiguous memory blocks/monster CPUs in a maspar fine-grain system. For price (yield), reliability, footprint and thermal dissipation reasons you shouldn't imagine something like the ASCI Red, but a midi tower stuffed with a few hundred to a thousand VLIW CPUs, each with a few MBit of on-die RAM, interconnected by multiple/redundant fast (multi-GBps, <<10 ns latency) serial links running a primitive switchable protocol. But I seem to be repeating myself...
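
To make that concrete, here is what a header for such a primitive switchable link protocol might look like. This is a sketch of my own in C; the names and field widths are inventions for illustration, not an existing spec:

  #include <stdint.h>

  /* Hypothetical packet header for a primitive, hardware-switchable
   * serial-link protocol; all names and widths are illustrative. */
  typedef struct {
      uint16_t dest;   /* grid address of the destination node       */
      uint16_t src;    /* originator, so an ack/result can come back */
      uint8_t  flags;  /* e.g. ACK_REQUESTED, IS_REPLY               */
      uint8_t  hop;    /* hop count, decremented by each switch      */
      uint16_t len;    /* payload length in bytes (a few kBytes max) */
      /* payload follows; routed by the link hardware without
       * touching the CPUs of intermediate nodes */
  } msg_header;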

> running conventional apps in a virtual machine. At any rate, Redmond

As I said, you *could* run conventional apps, but not at practical speeds. It would be even less usable than Bochs (http://www.bochs.com/).

> designs for the hardware that is actually in use (big surprise), so they
> only context switch every millisecond or two.

Context switches are intrinsically expensive if done on anything other than (bi)stack machines (http://www.cs.cmu.edu/~koopman/stack_computers/). These cannot be adequately programmed in C-type languages.

> They haven't gotten around to re-writing the entire OS this way yet, but

Thank god, or open-source OSes would be in trouble ;)

> everything new *is* done that way. In Office 2000, for instance, every
> spreadsheet cell is indeed an object (and so is every other recognizable
> program element). The same goes for ADO, MTS, and recent versions of most

Yes, but are these asynchronous objects?

> On a modern CPU a context switch isn't any big deal. You don't want to do

No, it is a very big deal, because you have to save the context, and there is a lot of it: register sets, stack frames, you name it. You might not notice the machinery for the overhead and the delays it introduces in a 50-MTransistor bloatware CPU, but we're not talking about any such thing. You certainly can't beat the context-switch times of a modern MISC CPU.
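
To see why, compare what the two designs have to move on every switch. A minimal C sketch (illustrative only; real switch paths live in assembly, and the layouts are my own assumptions):

  #include <stdint.h>
  #include <string.h>

  #define NREGS 32                 /* integer registers; FP adds more */

  /* Conventional register machine: the whole register file plus
   * control state must be spilled and reloaded on every switch. */
  typedef struct { uint32_t regs[NREGS]; uint32_t pc, sp, psw; } reg_ctx;

  void reg_machine_switch(reg_ctx *cpu, reg_ctx *save, const reg_ctx *load)
  {
      memcpy(save, cpu, sizeof *cpu);  /* ~35 words out to memory */
      memcpy(cpu, load, sizeof *cpu);  /* ~35 words back in       */
  }

  /* MISC-style (bi)stack machine: the live state *is* the two
   * stacks, so a switch amounts to re-pointing them. */
  typedef struct { uint32_t *dsp, *rsp; uint32_t pc; } stack_ctx;

  void stack_machine_switch(stack_ctx *cpu, stack_ctx *save,
                            const stack_ctx *load)
  {
      *save = *cpu;                    /* three words, done       */
      *cpu  = *load;
  }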

> it every other instruction, but there isn't any good reason to do that in
> the first place. You can certainly do it anywhere there is a reason to
> without having to worry about it affecting your performance.

Of course context switching is a bad idea, but we're using it only to simulate something essentially parallel. After all, the same code should run on a single-node and on a multi-node machine, right?
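
A minimal sketch of what I mean, with invented names: each object exposes a single step() entry point, and only the driver loop differs between the single-node and the multi-node build:

  /* Hypothetical object model: the same step() code runs either
   * time-sliced on one node or one-object-per-CPU on many. */
  typedef struct object {
      void (*step)(struct object *self);  /* process one message   */
      void *state;                        /* a few kBytes of state */
  } object;

  #ifdef SINGLE_NODE
  /* One CPU: simulate parallelism by round-robin stepping. */
  void run(object *objs, int n)
  {
      for (;;)
          for (int i = 0; i < n; i++)
              objs[i].step(&objs[i]);
  }
  #else
  /* Many CPUs: each node owns one object; no context switch at all. */
  void run_node(object *mine)
  {
      for (;;)
          mine->step(mine);
  }
  #endif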

> > Multithreading!=asynchronous message passing on many tiny objects.
> > We're talking about several thousands primitive (few kBytes) objects
> > which send message packets which are routed by hardware directly --
> > while the originator code may or may not wait for the ack/result
> > to arrive. If this exists at all, it is an academic curiosity at best
> > (Thinking Machines might qualify, though I really doubt they exploited
> > their options fully every time).
>
> It doesn't exist because there is no reason to do it. The current model

Of course there is a reason: we need to migrate to massively parallel hardware while keeping our apps running for the time being, remember?

> does exactly the same thing, but the objects are 10-100 times as big and
> communicate about 10% as often. Asynchronous calls are used whenever
> they actually do something for you (in most cases they don't work, because
> you can't proceed with the current operation until you get your results
> back).

This is strange, for the world is an intrinsically parallel place. Many things are happening simultaneously, and are only coupled locally, if at all. This is most naturally expressed as a large number of asynchronous objects which are most naturally run on a large parallel machine.

If you think that most of the world is sequential, you must be a programmer ;)
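
For concreteness, here is a toy, fully in-process stand-in for such asynchronous objects (everything below is invented for illustration): one fixed-size mailbox per object, a non-blocking send, and a receive the object polls whenever it pleases:

  #include <stdio.h>
  #include <string.h>

  #define NOBJ  4
  #define QLEN  8
  #define MSGSZ 32

  typedef struct {
      char q[QLEN][MSGSZ];
      int  head, tail;
  } mailbox;

  static mailbox box[NOBJ];

  /* Non-blocking send: enqueue and return at once (fire and forget);
   * on real maspar hardware the routing would happen in the links. */
  int msg_send(int dest, const char *msg)
  {
      mailbox *m = &box[dest];
      if ((m->tail + 1) % QLEN == m->head) return -1;  /* box full */
      strncpy(m->q[m->tail], msg, MSGSZ - 1);
      m->q[m->tail][MSGSZ - 1] = '\0';
      m->tail = (m->tail + 1) % QLEN;
      return 0;
  }

  /* Non-blocking receive: returns 0 if nothing has arrived yet. */
  int msg_recv(int self, char *out)
  {
      mailbox *m = &box[self];
      if (m->head == m->tail) return 0;
      strcpy(out, m->q[m->head]);
      m->head = (m->head + 1) % QLEN;
      return 1;
  }

  int main(void)
  {
      char buf[MSGSZ];
      msg_send(1, "cell A1 changed");    /* originator does not wait */
      if (msg_recv(1, buf))
          printf("object 1 got: %s\n", buf);
      return 0;
  }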

> Or, to put it another way, the reason current apps wouldn't benefit from
> your fast chip/small memory parallel processing architecture is because most
> of the tasks they do are inherently linear, not because they are poorly
> written. The only way to speed up a linear process is to give it a single
> very fast thread of execution. That's why massively parallel machines are
> generally reserved for inherently parallel sorts of computation.

Sorry, but this is nonsense. That particular machine by my desk is running ~100 processes, each of which even has some massively parallel aspects. There shouldn't be more than one process per CPU in most cases, and hence there is no need for context switching, nor for an MMU for address-space protection (nor for a cache, because the memory is on-die).

Searching is intrinsically parallel; so is rendering, so is neural DSP, so is simulation of any kind. While I type this into emacs sequentially, don't tell me GC can't profit from a little parallelism, or the GUI, or the mp3 process in the background, the file-system reindexing, the web server, the molecular dynamics process, the FASTA search, the parallel make, etc. All this off the top of my head; there is surely lots more to it.
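
Searching alone makes the point. A minimal pthreads sketch (my own illustration, nothing more) that splits one linear scan over a few workers; compile with -pthread:

  #include <pthread.h>
  #include <stdio.h>

  #define N       (1 << 20)
  #define WORKERS 4

  static int data[N];
  static const int target = 424242;

  typedef struct { int lo, hi, found; } slice;

  static void *scan(void *arg)
  {
      slice *s = arg;
      s->found = -1;
      for (int i = s->lo; i < s->hi; i++)
          if (data[i] == target) { s->found = i; break; }
      return NULL;
  }

  int main(void)
  {
      pthread_t t[WORKERS];
      slice s[WORKERS];
      data[777777] = target;                    /* plant one hit */
      for (int w = 0; w < WORKERS; w++) {
          s[w].lo = w * (N / WORKERS);
          s[w].hi = (w + 1) * (N / WORKERS);
          pthread_create(&t[w], NULL, scan, &s[w]);
      }
      for (int w = 0; w < WORKERS; w++) {
          pthread_join(t[w], NULL);
          if (s[w].found >= 0)
              printf("hit at %d (worker %d)\n", s[w].found, w);
      }
      return 0;
  }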

> Now, if you take a close look at modern PC architectures you'll see that
> there is an emerging trend towards increasing parallelism in the areas where
> it is useful. Servers often have multiple CPUs, since they have to handle

There is no noticeable parallelism in the modern PC except inside the CPU. Also, I wouldn't say what little parallelism is in there is there because it is particularly useful. People wouldn't be building Beowulfs if one could purchase such machines on the free market, right?

> many different requests simultaneously. Video subsystems often incorporate

In the absence of crossbars to memory, SMPs suck massively. Global shared memory is a myth anyway, so we'd better get used to message passing.

> several DSPs, and the trend seems to be towards using more and more of them.

There are no parallel DSP arrays in any PC video subsystem I am aware of. In fact, I cannot readily think of any DSP-array application anywhere in the mainstream.

> With modems, sound cards and other specialized functions turning into
> software for DSP chips, it would not be at all surprising if the PC of the

Right now people attempt to reduce hardware prices by moving functionality into software; see Windows printers, memory-mapped video, etc.

> future had a large array of DSPs for parallel-processing tasks. However, it
> will still need that fast Pentium-whatever chip for handling more linear
> jobs in a timely fashion.

Non sequitur. A Pentium is a legacy processor if I ever saw one. There is absolutely no need for a dedicated head in a lattice of modern DSPs.