Re: Mostly stuff about software (was Homeless + Jobs, Lots of stuff about Software world)

From: James Rogers (jamesr@best.com)
Date: Mon Sep 25 2000 - 07:11:47 MDT


On Sun, 24 Sep 2000, Samantha Atkins wrote:
>
> I would be very curious how you would obsolete the attitudes and beliefs
> that big government grows out of with only technology. Maybe
> something that raises the general IQ across the board?

Governments spend most of their time managing perceived scarcity of one
sort or another. Eliminating scarcity, or finding something to replace
government "management" of it, will do the trick. At the very least it
will put them on rather shaky defensive ground.

> Of course we are still in the early days of hardware improvement and of
> human/machine interaction. I believe the day will come when you don't
> explain a complex design requirement to the machine at any deeper level
> than you would to a skilled human. Getting there will require a lot of
> work across multiple disciplines.

For the most part, this effectively requires an AI. There is a lot of
context, and there are many implied design parameters, in even the
simplest of applications, many of which tend to be valid largely from
a human perspective. Creating the specification for something like a
spreadsheet application would be a monstrous undertaking, and I think
one would need to spell out things that human operators consider
"obvious". Many aspects of design are considered "best choice" on a
rather subjective (i.e. arguable) basis. Basically, I think the UI
would suffer if it were generated by a computer, although the backend
would probably come out just fine. For something like a database
engine, I expect that it would be relatively easy to create a code
generator that runs off a human-level spec, but only because the
tool/design space is relatively limited.

> > The biggest problem with runtime code generators is debugging the
> > resulting mess. However, it has allowed me to work on some
> > of the many interesting problems of self-observation. Designing methods to
> > resolve issues such as detecting complex and non-procedural
> > infinite loops (e.g. infinite loops caused by how the data interacts with
> > the code at runtime, without compile-time knowledge of what the data can
> > look like) has been fun.
>
> If the code-generator is well-designed / tested and builds in run-time
> checks there is not quite so onerous a problem with debugging is there?
> Some of the things you mention sound like good fun indeed.

Very true. It was a matter of "getting from here to there"; I didn't
start out with very many runtime checks at all. Backtracking from the
algorithm's results to the code generation system is not trivial.
There is a lot to be learned by doing this type of stuff that you are
not likely to run into in your average programming job.

> Good C programmers do not generally encapsulate data structures and the
> functions for manipulating those data structures well, or at least not
> in a way that enforces that encapsulation except by convention.

True, but I don't see this as too important; convention can go a long
way in this regard. Nonetheless, enforced encapsulation is one of the
things I like about C++.

> Good C
> programmers do not generally think through polymorphism/genericity.

The *good* ones do when necessary. I only go through the effort when I
anticipate that I will want to reuse the code at a later date, since
doing so rarely produces the optimal code for the problem at hand. The
problem is that when I write nice OO C++, I "see" it in terms of the C
equivalent with a pretty wrapper around it. Some C++ compilers are
nothing more than preprocessors that emit C. It is not difficult for a
competent C programmer to create the functional equivalent of any C++
construct, though it will probably be a little more verbose.
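
Just to make that concrete, here is a rough sketch (toy code, with
made-up names like shape_vtbl and circle, not anything I actually
ship) of the usual C stand-in for a C++ virtual function: a struct
whose first member points to a table of function pointers.

    #include <stdio.h>

    /* Hand-rolled equivalent of a C++ vtable: the "object" starts with a
     * pointer to a table of function pointers. */
    struct shape_vtbl {
        double (*area)(const void *self);
    };

    struct circle {
        const struct shape_vtbl *vtbl;  /* must be the first member */
        double radius;
    };

    static double circle_area(const void *self)
    {
        const struct circle *c = self;
        return 3.141592653589793 * c->radius * c->radius;
    }

    static const struct shape_vtbl circle_vtbl = { circle_area };

    /* "Virtual" call site: works for any struct whose first member is a
     * shape_vtbl pointer. */
    static double shape_area(const void *shape)
    {
        const struct shape_vtbl *const *vp = shape;
        return (*vp)->area(shape);
    }

    int main(void)
    {
        struct circle c = { &circle_vtbl, 2.0 };
        printf("area = %f\n", shape_area(&c));
        return 0;
    }

Add a constructor, a destructor slot, and some discipline about
ownership, and you have most of what the C++ compiler would have
generated for you, just spelled out by hand.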

> And
> good C and C++ programmers generally believe a lot of mystical claptrap
> about their ability to manage memory without a formal GC. And no, using
> reference counting and "discipline" is NOT an acceptable solution.

I both agree and disagree. For most applications I agree that memory
management is an issue, and there is nothing like a good GC when I am
feeling lazy. However, for some applications you *need* to do your own
memory management, for any number of reasons; GCs are a poor solution
in a number of design spaces. One of the things that annoys me about
Java, and that will forever relegate it to second-string status, is
that you have no control over system resources. I have written very
mature and powerful memory managers that I use for certain classes of
applications, and pointer managers that I use for just about
everything else when working in C/C++. I guess I object to being
forced to use a one-size-fits-all solution.
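
To give a flavor of what I mean, here is a toy sketch of a
fixed-size-block pool allocator (illustrative only, with made-up
names; a real manager is considerably more involved):

    #include <stdlib.h>

    /* Toy fixed-size-block pool: carve one malloc'd slab into equal blocks
     * and keep the free ones on an intrusive singly-linked list.  Allocation
     * and release are O(1), with no per-block bookkeeping overhead. */
    struct pool {
        void  *slab;       /* the one big allocation */
        void  *free_list;  /* head of the free-block list */
        size_t block_size;
    };

    int pool_init(struct pool *p, size_t block_size, size_t nblocks)
    {
        char *base;
        size_t i;

        /* Round the block size up so every block can hold a pointer and
         * stays reasonably aligned. */
        if (block_size < sizeof(void *))
            block_size = sizeof(void *);
        block_size = (block_size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

        base = malloc(block_size * nblocks);
        if (base == NULL)
            return -1;

        p->slab = base;
        p->block_size = block_size;
        p->free_list = NULL;
        for (i = 0; i < nblocks; i++) {      /* thread blocks onto the list */
            void *block = base + i * block_size;
            *(void **)block = p->free_list;
            p->free_list = block;
        }
        return 0;
    }

    void *pool_alloc(struct pool *p)
    {
        void *block = p->free_list;          /* NULL when the pool is empty */
        if (block != NULL)
            p->free_list = *(void **)block;
        return block;
    }

    void pool_free(struct pool *p, void *block)
    {
        *(void **)block = p->free_list;      /* push back onto the list */
        p->free_list = block;
    }

    void pool_destroy(struct pool *p)
    {
        free(p->slab);
        p->slab = p->free_list = NULL;
    }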

I don't think C/C++ programmers need something to clean up their
pointers. Mechanisms to keep track of them for the programmer (with
some transactional context) should be sufficient and will allow more
flexibility and control. GCs like the ones in Java force me to give up
too much control in some contexts. A garbage collector is an idealized
solution, but it falls short in some real-world applications.
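
Along the same lines, here is a toy sketch of the sort of pointer
tracking I have in mind (again purely illustrative, made-up names):
every allocation is recorded against a context, and the whole context
is released in one call when the "transaction" is done.

    #include <stdlib.h>

    /* Toy "pointer manager": allocations made through a context are
     * recorded, and the whole context is released in one call.  The
     * programmer keeps control of exactly when memory is reclaimed;
     * nothing runs behind his back. */
    struct ptr_ctx {
        void **ptrs;   /* every pointer handed out by this context */
        size_t count;
        size_t cap;
    };

    void ptr_ctx_init(struct ptr_ctx *ctx)
    {
        ctx->ptrs = NULL;
        ctx->count = 0;
        ctx->cap = 0;
    }

    void *ptr_ctx_alloc(struct ptr_ctx *ctx, size_t size)
    {
        void *p;

        if (ctx->count == ctx->cap) {        /* grow the tracking table */
            size_t newcap = ctx->cap ? ctx->cap * 2 : 16;
            void **tmp = realloc(ctx->ptrs, newcap * sizeof(void *));
            if (tmp == NULL)
                return NULL;
            ctx->ptrs = tmp;
            ctx->cap = newcap;
        }
        p = malloc(size);
        if (p != NULL)
            ctx->ptrs[ctx->count++] = p;
        return p;
    }

    /* "End of transaction": free everything allocated in this context. */
    void ptr_ctx_release(struct ptr_ctx *ctx)
    {
        size_t i;
        for (i = 0; i < ctx->count; i++)
            free(ctx->ptrs[i]);
        free(ctx->ptrs);
        ptr_ctx_init(ctx);
    }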

> I pick Python/C++ (for delimited tight things)/Java (mostly because of
> size of Java savvy population and prejudice although it has a few
> worthwhile features)/Lisp/Scheme/Smalltalk.

Python is very good; I really like it. I use Java largely because it has
become a de facto standard for some application spaces and it is tolerable
for a lot of the stuff I get paid good money to do. However, it has some
weaknesses that I dislike a good bit. What I am really waiting for is
the day when Python starts to push Perl out of the Big Script domain.
Perl should have stopped adding features at version 4.x.

> All else being equal I will
> reach for Python or Lisp first when attempting to model a problem with
> Smalltalk running a close second. If I am coding some tight data
> structure I will go to C/C++ as a sort of universal assembler.

The great thing about languages like Java, Python, et al. is that they
bind well into a C/C++ environment. I frequently combine environments
in one application to use the unique capabilities of each. The ability
to mix interpreted and compiled languages in the same binary is an
incredibly powerful capability that, in my opinion, hasn't been fully
exploited. You really can build a chimera.
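
As a minimal sketch of what that looks like from the C side (this uses
the standard Python C API; the embedded script is just a placeholder),
pulling the Python interpreter into a C binary takes only a few calls:

    /* Build along the lines of: cc embed.c -I<python include dir> -lpython<ver> */
    #include <Python.h>

    int main(void)
    {
        Py_Initialize();             /* bring the interpreter up in-process */

        /* Hand the high-level part of the problem to Python... */
        PyRun_SimpleString(
            "values = [3, 1, 2]\n"
            "values.sort()\n"
            "print 'sorted:', values\n");

        Py_Finalize();               /* ...and shut it back down; the C side
                                        keeps doing the tight, fast work. */
        return 0;
    }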

> Modeling and coding closer to the actual problem space is more
> important. Granted that many people using OO don't understand that.

I agree with this, but it is not nearly as applicable to most of the
software I work on as it may be to many programmers. It depends on the
design space of the project. I do OOD only on rare occasions.

> Clusters are a tool for network topology and dependable service
> delivery. Again, they should not be visible (except at the lowest
> levels) to the programming/application space. If they are then
> something is very wrong.

Idealistically yes; realistically, definitely not. Clusters exist
both for availability and performance reasons. Resource and performance
limitations must necessarily drive design in a manner that requires
awareness of the underlying resource parameters. I don't think it is
even possible to create an abstract clustering interface that would be
remotely efficient if the application programmer did not control the
trajectory of their code. Without detailed knowledge of the
application code that only the programmer would have, the cluster
could actually cause a dramatic slowdown in application performance.
It is similar to the reason database kernels manage their own
resources rather than having the OS do it: the OS isn't optimized for
database engines, because it has to work well for all applications,
and that can have an enormous impact on resource-intensive
applications. The differences in performance one would see in a
similar situation on a cluster are magnified by the relatively limited
bandwidth and high latency between nodes.

> One of the things that make multi-threading and transactional integrity
> hard is that most language environments give only a few blunt tools for
> really addressing concurrency issues and most of the tools given have
> gross impedance mismatches with the language. We do not to this day
> have good long-transaction models or tools.

The problem is that the limitations are mathematical/fundamental; you
can't fight them, you have to work with them. I have found most of
the thread API implementations to actually be fairly elegant, though
there are a few turds in the bunch (e.g. Win32 threads). The only
thread model I don't like is the one in Java, largely because it makes
you jump through some very inelegant hoops to do some types of
complicated thread manipulation.
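
For what it's worth, here is a minimal POSIX threads sketch (a
mutex-protected counter shared by two threads), which reads about as
simply as the problem itself and is roughly what I mean by an elegant
thread API:

    #include <pthread.h>
    #include <stdio.h>

    /* Two threads bump a shared counter; a mutex keeps the increments
     * from interleaving. */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static long counter = 0;

    static void *worker(void *arg)
    {
        int i;
        (void)arg;
        for (i = 0; i < 100000; i++) {
            pthread_mutex_lock(&lock);
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        printf("counter = %ld\n", counter);   /* always 200000 */
        return 0;
    }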

There are no good long-transaction models because the concept of a
"long-transaction model" doesn't really make sense. If most of your
transactions are "long", then by that very fact they aren't ("long"
only has meaning relative to the rest of the workload), and any model
that makes guarantees about the performance of long transactions will
grossly undermine the average performance of the system, which is
hardly a net improvement. Most database engines today can be
configured to "guarantee" long-running transactions for those who
really need it, but maintaining the ACID properties of such
transactions often imposes a terrible performance cost.

Note that there is no intrinsic difference between long and short
transactions; the terms are merely a convention for describing the
distribution of transaction lengths. Current transaction models
describe optimal systems regardless of absolute transaction length.
However, transactions that deviate significantly from the average
transaction on a given system will always suffer degraded performance.
Since "long" transactions by definition deviate significantly from the
average transaction length and are therefore rare, most DBMSs
sacrifice the performance of long-running transactions for the sake of
those that run closer to the average, maximizing overall performance.
This is why many large database applications split their queries
across multiple systems by transaction type (e.g. OLAP versus OLTP):
each system then sees a narrow distribution of transaction lengths and
has no "long" transactions to worry about.

> Again, most programmers should have no need to "get" cluster design.

Only because they don't write cluster apps.

> In a well designed system there is little need for most application
> programmers and even many system programmers to worry about all of these
> issues much of the time. A large part of successful systems programming
> is keeping these issues out of the face of application programmers.

Wishful thinking. A good example is the modern RDBMS. All the
transaction management and data access complexity is hidden from the
application programmer; they merely have to program the business
logic.

Unfortunately, that is exactly what happens all too often. Yes, you
can ignore the internal mechanisms of the engine, but it often comes
at the expense of an order of magnitude or more of performance. This
is great if you have infinite resources at your disposal, but it
crumbles badly when real limits exist. There is no solution that will
work optimally for all possible implementations, good or bad. The fact
is, most database systems are highly tuned to provide very good
performance across a great number of application spaces, yet
programmers regularly ruin system performance because they have no
clue how their code actually impacts the database, and therefore run
grossly suboptimal sets of operations against the engine. For every
database engine there is a set of operations that will make it perform
poorly, and the number of times programmers actually use this set of
Very Bad operations in practice is far greater than the Million Monkey
Theory would predict.
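
A classic member of the Very Bad set, sketched below with a
hypothetical db_query() standing in for whatever database interface is
actually in use: firing one little query per row from the application
instead of letting the engine do the work as a set.

    #include <stdio.h>

    /* db_query() is a hypothetical stand-in for whatever database call
     * interface is actually in use; only the shape of the workload
     * matters here. */
    extern void db_query(const char *sql);

    /* The Very Bad way: one statement per row, fired from an
     * application-side loop (the ids presumably came from an earlier
     * query).  The engine never sees the work as a whole, so it can't
     * plan it as a whole. */
    void balances_one_row_at_a_time(const long *ids, int n)
    {
        char sql[128];
        int i;
        for (i = 0; i < n; i++) {
            sprintf(sql,
                    "SELECT balance FROM accounts WHERE customer_id = %ld",
                    ids[i]);
            db_query(sql);
        }
    }

    /* The set-based way: one statement, and the optimizer gets to choose
     * the join strategy, the access paths, and the amount of I/O. */
    void balances_as_a_set(void)
    {
        db_query("SELECT c.customer_id, a.balance "
                 "FROM customers c, accounts a "
                 "WHERE a.customer_id = c.customer_id "
                 "  AND c.region = 'WEST'");
    }

Both forms are "correct", but only the second one gives the engine a
fighting chance.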

This isn't a case of having an interface that just isn't good enough.
It is a case of there being fundamental limits to how much stupidity
and ignorance software can cover for. Even the most elegant and robust
algorithms can give a poor showing if used poorly.

> Cut, paste, modify sucks big-time and produces krap systems. Components
> are more difficult to write and can't be written well in broken toy
> languages.

I didn't say it was necessarily good, just "viable".

> Doing components well also requires some advances in our
> ability to model some aspects of the semantics that we cannot model well
> today. I do not understand your comment about data awareness re
> components. From outside the component its data awareness is irrelevant
> to the user as it gives a message/capability/functional interface only.
> Would you want more than that?

What I meant was that components should work with essentially all
conceivable data, i.e. be able to adapt themselves to the data spaces
they are applied to. A weakness of components is that they focus on
the algorithms rather than the data.

You can more or less componentize simple algorithms like sorting
because the data is simple in nature. But for complex problems, such
as a component that generically does cost-based routing through a
complex web of heterogeneous nodes (where each node may be a small
world unto itself), generic component design becomes much trickier.
The algorithms are simple, but the data can occur in very complex
forms. This is the primary reason I think generic components as
implemented today are weak: they cannot adapt effectively to
relatively simple algorithms that encompass very complex data spaces,
only to complex algorithm spaces with simple data. As a result, every
problem with a complex data space becomes a custom software solution.
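
Sorting is the canonical easy case: the component only needs to know
how big an element is and how to compare two of them, which is exactly
the contract the C library's qsort() exposes.

    #include <stdio.h>
    #include <stdlib.h>

    /* The sorting "component" (qsort) only needs an element size and a
     * comparison function; everything it must know about the data fits
     * in that tiny contract. */
    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a;
        int y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        int values[] = { 42, 7, 19, 3 };
        size_t n = sizeof(values) / sizeof(values[0]);
        size_t i;

        qsort(values, n, sizeof(values[0]), cmp_int);

        for (i = 0; i < n; i++)
            printf("%d ", values[i]);
        printf("\n");
        return 0;
    }

Try to write the equivalent one-size-fits-all component for the
routing problem above and the contract balloons until it is
effectively a custom program.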

I think it is arguable that you would actually see greater cost
savings in a great many applications from components designed to
generically handle complex data spaces than from components designed
to handle complex algorithms.

User interface components are another thing altogether.

> > But components don't really matter; I have strong doubts as to
> > whether components are a correct solution in the long-term anyway.
> > Components exist more to solve a human weakness in software
> > implementation than to solve any particular weakness intrinsic
> > to software design itself. I can't imagine why a smart machine would
> > use components to write software.
>
> That is like saying that standardized reuseable parts in hardware exist
> only to solve particular weaknesses intrinsic to hardware design.

The weakness is human. We use components because the computational
effort required to maximally optimize every part of a design is too
high. Component reuse produces a less efficient design, but this is
outweighed by cost savings in the development cycle.

> smart machine would use components in order to not reinvent/reimplement
> the wheel every time it is needed, much as reasonably intelligent human
> software designers also attempt to do. It is not possible to
> think/design at increasingly more complex levels without reusing levels
> of parts (components) already sufficiently mastered, generalized and
> packaged.

Again, this is a computational limitation. A sufficiently smart
computer that can actually keep track of the intimate details of a
billion little parts will have no need to use components. There is
also a cost to adapting components to work together rather than
writing a perfectly fitted piece. Humans use components to reduce bug
counts and to find/fix design flaws quicker. But for a computer that
writes bug-free code the first time, every time, this won't be much of
a feature.

-James Rogers
 jamesr@best.com


