Re: tech snippets

James Rogers (jamesr@best.com)
Thu, 19 Dec 1996 15:59:23 -0800


At 05:09 PM 12/19/96 +0100, you wrote:
>On Mon, 16 Dec 1996, James Rogers wrote:
>
>>
>> >It seems, MMX multimedia instructions performance is distinctly inferior
>> >to vanilla PowerPC.
>> >
>> I agree. I think MMX is an excuse to have mediocre floating point
>> performance. I have always had reservations about specialty, high-level
>
>Very few vanilla applications actually require floats, actually. Some new
>games do, but this is sloppy programming: fixed-point or scaled integers,
>especally xxl integers can substitute any float. Even scientific number
>crunching can be done without floats.

I do often use fixed-point numbers in place of floating point (32-bit
int for fraction and 32 or 64-bit int for integer portion). For some
applications it is more convenient (and more accurate) to use fixed point
calculations. However, for some types of floating point intensive
computation, I doubt this is the _fastest_ method.
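
To make the layout above concrete, here is a minimal sketch of 32.32 fixed
point (32-bit fraction, 32-bit integer portion). The names FRAC_BITS,
to_fixed, fx_add and fx_mul are made up for illustration, not taken from any
library. Python integers never overflow, so this shows only the scaling
arithmetic, not the wraparound you would have to handle in real 64-bit
registers.

```python
FRAC_BITS = 32             # assumed layout: 32 bits of fraction
ONE = 1 << FRAC_BITS       # fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to 32.32 fixed point, rounding to nearest."""
    return int(round(x * ONE))


def to_float(f):
    """Convert a 32.32 fixed-point value back to a float for display."""
    return f / ONE


def fx_add(a, b):
    """Addition needs no rescaling: both operands share the same scale."""
    return a + b


def fx_mul(a, b):
    """The raw product carries 64 fraction bits; shift back down to 32."""
    return (a * b) >> FRAC_BITS
```

Addition is a plain integer add, which is why fixed point can be exact where
floats round. Multiplication needs a double-width intermediate product plus
a shift, and that extra work is part of why I doubt it is the fastest route
for fp-intensive code.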

>Both PentiumPro and PowerPC, especially in cheap SMP machines are serious
>candidates to obsoletify workstations in number crunching applications.
>After mainframe has sunk below the surface of the tar pit, workstation's
>death knell is beginning to toll, already.
>
>> I think the P6 is Intel's first x86 CPU with adequate (although modest)
>> floating point performance. If I was going to do serious floating point
>> number crunching though, I would definitely go with PA-RISC, Alpha, or
>> PowerPC architectures. I think as Intel's architectures get faster, they
>
>Are you sure you get the most bang for the $$? PA-RISC is horrendously
>expensive, as are the better Alphas. Don't know much about PowerPC,
>though, but then there are not many vendors shipping them (Apple is
>specializing in crippled architectures, as usual).

Granted, fast fp RISC machines are expensive, but if you need a workstation
with high-end fp capability, this is the way to go. "Consumer" CPUs,
especially SMP, will get you more bang for the buck, but SMP is only useful
for a subset of computational problems. The performance gap is getting
pretty narrow though. Using SPECmarks as a reference, the highest-end
workstation CPUs only offer roughly twice the performance of the highest-end
"consumer" CPUs. For some classes of problems, a 4-way SMP P6 or PowerPC
system would seriously out-perform many high-end workstations at a fraction
of the cost. Case in point: An Intergraph P6 SMP graphics workstation
(with custom acceleration hardware) will out-perform any SGI graphics system
costing less than $100k. The Intergraph system will only cost you $25k
because it uses off the shelf components, with the sole exception of the
hardware accelerator (which is compatible with any NT workstation).

I think this convergence of performance will kill a significant number of
the RISC vendors unless prices converge as well.

>> are going to find it harder and harder to sell their chips if they don't
>> seriously upgrade their floating point hardware. Fabulous integer
>> performance isn't enough for most high-end, computation heavy software,
>> especially in the realm of multimedia where you have a lot of realtime
>> signal processing and transforms.
>
>The more reason for doing it in 128 bit integers, and with a maspar
>pipeline (one CPU for each processing stage).

I never really thought about it in this sense. I suppose it WOULD be
possible to pipeline integer CPUs to get fast floating point performance.
It probably never occurred to me because this is contrary to conventional
thinking.

This could be the basis for a flexible, fast computing architecture.
Pipeline simple 128-bit (or even 256-bit) ALUs. The ALUs, by nature, would
be really small, simple, and very fast. You could build a simple floating
point processor with many parallel pipelines using fewer than 1 million
transistors (trivial these days). The fp throughput would be enormous, and
I suspect that you could build a veritable supercomputer on a chip or MCM
this way. And depending on how it was designed, you could have arbitrary
hardware supported precision, up to the point of the total number of
pipelines on the chip or MCM.
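
One way to picture the arbitrary-precision part is as a software model in
which each loop iteration stands in for one fixed-width ALU and hands its
carry to the next. This is only a sketch under my own assumptions: the
128-bit width comes from the discussion above, but the names (LIMB_BITS,
split_limbs, add_limbs) and the simple ripple-carry scheme are hypothetical,
not a description of any real design.

```python
LIMB_BITS = 128                       # width of each hypothetical ALU
LIMB_MASK = (1 << LIMB_BITS) - 1


def split_limbs(value, n_limbs):
    """Split a non-negative integer into n_limbs words, least significant first."""
    return [(value >> (i * LIMB_BITS)) & LIMB_MASK for i in range(n_limbs)]


def join_limbs(limbs):
    """Reassemble a list of words (least significant first) into one integer."""
    return sum(limb << (i * LIMB_BITS) for i, limb in enumerate(limbs))


def add_limbs(a, b):
    """Add two equal-length limb lists; returns (result limbs, carry out)."""
    result, carry = [], 0
    for x, y in zip(a, b):            # one iteration per ALU stage
        total = x + y + carry
        result.append(total & LIMB_MASK)
        carry = total >> LIMB_BITS    # carry passed on to the next stage
    return result, carry
```

With n stages chained this way, the working precision is n * 128 bits, which
is the sense in which precision would be bounded by the number of pipelines
available.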

-James Rogers
jamesr@best.com