From: Smigrodzki, Rafal (SmigrodzkiR@msx.upmc.edu)
Date: Wed Feb 06 2002 - 21:52:12 MST
I just finished reading CFAI and GISAI, and here are some questions and comments:
Of course, SIAI <../GISAI/meta/glossary.html> knows of only one current
project advanced enough to even begin implementing the first baby steps
toward Friendliness - but where there is one today, there may be a dozen
### Which other AI projects are on the right track?
That is, conscious reasoning can replace the "damage signal" aspect of pain.
If the AI successfully solves a problem, the AI can choose to increase the
priority or devote additional computational power to whichever subheuristics
or internal cognitive events were most useful in solving the problem,
replacing the positive-feedback aspect of pleasure.
### Do you think that the feedback loops focusing the AI on a particular
problem might (in any sufficiently highly organized AI) give rise to qualia
analogous to our feelings of "intellectual unease" and "rapture of an
insight"?
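The priority-feedback mechanism quoted above can be sketched as a simple weight update: after a success, the AI boosts the share of computational budget given to whichever subheuristics contributed. This is only a toy illustration of the idea, not SIAI's actual design; the heuristic names and numbers are invented.

```python
# Toy sketch of the positive-feedback mechanism described above:
# after a success, boost the weight (priority / compute share) of the
# subheuristics that were useful, replacing the "pleasure" signal.

def reinforce(weights, useful, factor=1.5):
    """Boost useful subheuristics, then renormalize so the total
    computational budget stays fixed."""
    boosted = {h: w * (factor if h in useful else 1.0)
               for h, w in weights.items()}
    total = sum(boosted.values())
    return {h: w / total for h, w in boosted.items()}

weights = {"analogy": 1.0, "search": 1.0, "abstraction": 1.0}
weights = reinforce(weights, useful={"search"})
print(weights)  # "search" now holds the largest share of the budget
```

Because the weights are renormalized, reinforcing one subheuristic implicitly deprioritizes the others, which is one way the "focusing" effect Rafal asks about could arise.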
The absoluteness of "The end does not justify the means" is the result of
the Bayesian Probability Theorem <../../GISAI/meta/glossary.html> applied to
internal cognitive events. Given the cognitive event of a human thinking
that the end justifies the means, what is the probability that the end
actually does justify the means? Far, far less than 100%, historically
speaking. Even the cognitive event "I'm a special case for [reason X] and am
therefore capable of safely reasoning that the end justifies the means" is,
historically speaking, often dissociated from external reality. The rate of
hits and misses is not due to the operation of ordinary rationality, but to
an evolutionary bias towards self-overestimation. There's no Bayesian
binding <../../GISAI/meta/glossary.html> between our subjective experience
of feeling justified and the external event of actually being justified, so
our subjective experience cannot license actions that would be dependent on
being actually justified.
### I think I can dimly perceive a certain meaning in the above paragraph,
with which I could agree (especially in hindsight, after reading 3.4.3:
Causal validity semantics <design/structure/causal.html>). Yet, without
recourse to the grand finale, this paragraph is very cryptic and, under some
interpretations, unacceptable to me.
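The Bayesian argument quoted above can be made concrete with a back-of-the-envelope calculation. All of the numbers below are invented purely for illustration; the point is only that when "feeling justified" is weakly coupled to "being justified", the posterior stays far below certainty.

```python
# Hypothetical numbers, purely to illustrate the Bayesian point.
p_justified = 0.05          # prior: the end actually justifies the means
p_feel_given_just = 0.9     # if truly justified, you probably feel justified
p_feel_given_unjust = 0.4   # evolutionary self-overestimation: you often
                            # feel justified even when you are not

# P(feels justified), by the law of total probability
p_feel = (p_feel_given_just * p_justified
          + p_feel_given_unjust * (1 - p_justified))

# Bayes: P(actually justified | feels justified)
p_just_given_feel = p_feel_given_just * p_justified / p_feel
print(round(p_just_given_feel, 3))  # 0.106 -- far, far less than 100%
```

The weak "Bayesian binding" shows up as the gap between the 0.9 likelihood and the roughly 0.1 posterior: the subjective experience barely moves the probability of actually being justified.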
I think that undirected evolution is unsafe, and I can't think of any way to
make it acceptably safe. Directed evolution might be made to work, but it
will still be substantially less safe than self-modification. Directed
evolution will also be extremely unsafe unless pursued with Friendliness in
mind and with a full understanding of non-anthropomorphic minds. Another
academically popular theory is that all people are blank slates, or that all
altruism is a child goal of selfishness - evolutionary psychologists know
better, but some of the social sciences have managed to totally insulate
themselves from the rest of cognitive science, and there are still AI people
who are getting their psychology from the social sciences.
### Is altruism something other than a child goal of selfishness? It was my
impression that evolutionary psychology predicts the emergence of altruism
as a result of natural selection. Since natural selection is not
goal-oriented, the goals defining the base of the goal system (at least if
you use the derivative validity rule), are the goals present in the
subjects of natural selection - selfish organisms, which in the service of
their own survival have to develop (secondarily) altruistic impulses. As the
derivative validity rule is itself the outcome of ethical reasoning, one
could claim that it cannot invalidate the goals from which it is derived,
thus sparing us a total meltdown of the goal system and shutdown.
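Rafal's picture of altruism as a child goal of selfish supergoals, with validity derived from the root, can be sketched as a toy goal hierarchy. The structure and goal names below are invented for illustration and are not CFAI's actual goal-system representation.

```python
# Toy goal hierarchy: validity flows from parent to child
# (the "derivative validity" rule), so an altruistic child goal
# still traces back to a selfish supergoal.

class Goal:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def root(self):
        """Trace derivative validity back to the source supergoal."""
        g = self
        while g.parent is not None:
            g = g.parent
        return g

survival = Goal("own survival")               # selfish supergoal
reputation = Goal("good reputation", survival)
altruism = Goal("help others", reputation)    # secondarily altruistic impulse

print(altruism.root().name)  # prints "own survival"
```

On this toy model, the question in the quoted passage becomes: does anything license a child goal ("help others") once its root is recognized as selfish, or does the derivative validity rule, being itself a product of that same system, lack the authority to invalidate it?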
The semantics of objectivity are also ubiquitous because they fit very well
into the way our brain processes statements; statements about morality
(containing the word "should") are not evaluated by some separate, isolated
subsystem, but by the same stream of consciousness that does everything else
in the mind. Thus, for example, we cognitively expect the same kind of
coherence and sensibility from morality as we expect from any other fact in
### It is likely that there are specialized cortical areas, mainly in the
frontopolar and ventromedial frontal cortices, involved in the processing of
ethics-related information. Many of us are perfectly capable of double- and
triple-thinking about ethical issues, as your examples of self-deception
testify, while similar feats of mental juggling are not possible in the
arena of mathematics or motorcycle maintenance.
Actually, rationalization does not totally disjoint morality and actions; it
simply gives evolution a greater degree of freedom by loosely decoupling the
two. Every now and then, the gene pool or the memetic environment spits out
a genuine altruist; who, from evolution's perspective, may turn out to be a
lost cause. The really interesting point is that evolution is free to load
us with beliefs and adaptations which, if executed in the absence of
rationalization, would turn us into total altruists ninety-nine point nine
percent of the time. Thus, even though our "carnal" desires are almost
entirely observer-centered, and our social desires are about evenly split
between the personal and the altruistic, the adaptations that control our
moral justifications have strong biases toward moral symmetry, fairness,
truth, altruism, working for the public benefit, and so on.
### In my very personal outlook, the "moral justifications" are the results
of advanced information processing applied in the service of "carnal"
desires, supplemented by innate, evolved biases. The initial supergoals are
analyzed, their implications for action under various conditions are
explored, and the usual normative human comes to recognize the superior
effectiveness of fairness, truth, etc., for survival in a social situation.
As a result the initial supergoals are overwritten by new content (at least
to some degree, dictated by the ability to deceive others). As much as the
imprint of my 4-year-old self in my present mind might object, I am forced
to accept the higher Kohlberg-stage rules. Do you think that the Friendly AI
will have some analogue of such (higher) levels? Can you hypothesize about
the supergoal content of such a level? Could it be translated back for
unenhanced humans, or would it be accessible only to highly improved minds?
An AI's complete mind-state at any moment in time is the result of a long
causal chain. We have, for this moment, stopped speaking in the language of
desirable and undesirable, or even true and false, and are now speaking
strictly about cause and effect. Sometimes the causes described may be
beliefs existing in cognitive entities, but we are not obliged to treat
these beliefs as beliefs, or consider their truth or falsity; it suffices to
treat them as purely physical events with purely physical consequences.
This is the physicalist perspective, and it's a dangerous place for humans
to be. I don't advise that you stay too long. The way the human mind is set
up to think about morality, just imagining the existence of a physicalist
perspective can have negative emotional effects. I do hope that you'll hold
off on drawing any philosophical conclusions until the end of this topic at
the very least.
### If I understand this paragraph correctly, my way of thinking about
myself has been physicalist for the past 10 - 15 years, yet it failed to
produce negative emotional effects. I am an information processing routine,
with the current self-referential goal of preserving its continued
existence, all in the context of 15×10^9 years of the Universe's evolution.
Even this self-referential supergoal can be explicitly renounced if it
becomes advantageous for the routine's survival (as in joining the Borg to
escape the Klingons).
----
The rule of derivative validity - "Effects cannot have greater validity than their causes." - contains a flaw; it has no tail-end recursion. Of course, so does the rule of derivative causality - "Effects have causes" - and yet, we're still here; there is Something rather than Nothing. The problem is more severe for derivative validity, however. At some clearly defined point after the Big Bang, there are no valid causes (before the rise of self-replicating chemicals on Earth, say); then, at some clearly defined point in the future (i.e., the rise of homo sapiens sapiens), there are valid causes. At some point, an invalid cause must have had a valid effect. To some extent you might get around this by saying that, e.g., self-replicating chemicals or evolved intelligences are pattern-identical with (represent) some Platonic valid cause - a low-entropy cause, so that evolved intelligences in general are valid causes - but then there would still be the question of what validates the Platonic cause. And so on.

The rule of derivative validity is embedded very deeply in the human mind. It's the ultimate drive behind our search for the Meaning of Life. It's the reason why we instinctively dislike circular logic. It's a very powerful shaper(!). Just violating it arbitrarily, to trick the AI into doing something, or in the belief that it doesn't really matter... well, that wouldn't be safe (4), because that kind of "selfishness" is designated as an extraneous cause by quite a few deeper shapers. Of course, I'm omitting the possibility that the programmer personally believes that kind of logic is okay (i.e., would use it herself), in which case things would probably come out okay, though I personally would worry that this programmer, or her shaper network, had too high a tolerance for circular logic...

### I think it's possible for a human to have both a limited use of derivative validity and a circular (or self-referential) basis for the goal system. See my comments above.
----
We want a Meaning of Life that can be explained to a rock, in the same way that the First Cause (whatever it is) can be explained to Nothingness. We want what I call an "objective morality" - a set of moral propositions, or propositions about differential desirabilities, that have the status of provably factual statements, without derivation from any previously accepted moral propositions. We want a tail-end recursion to the rule of derivative validity. Without that, then yes - in the ultimate sense described above, Friendliness is unstable.

### I do agree with the last sentence. A human's self-Friendliness is inherently unstable, too.

----
As General Intelligence and Seed AI describes a seed AI capable of self-improvement, so Creating Friendly AI describes a Friendly AI capable of self-correction. A Friendly AI is stabilized, not by objective morality - though I'll take that if I can get it - but by renormalization, in which the whole passes judgement on the parts, and on its own causal history.

### This seems to summarize my goal system's functioning, with the importance of personal history and the use of a wide range of cognitive tools to derive rules and to slowly change the goal system.

----
Even if an extraneous cause affects a deep shaper, even deep shapers don't justify themselves; rather than individual principles justifying themselves - as would be the case with a generic goal system protecting absolute supergoals - there's a set of mutually reinforcing deep principles that resemble cognitive principles more than moral statements, and that are stable under renormalization. Why "resemble cognitive principles more than moral statements"? Because the system would distrust a surface-level moral statement capable of justifying itself!

### Can you give examples of such deep moral principles?
----
Humanity is diverse, and there's still some variance even in the panhuman layer, but it's still possible to conceive of a description of humanity, and not just of any one individual human, by superposing the sum of all the variances in the panhuman layer into one description of humanity. Suppose, for example, that any given human has a preference for X; this preference can be thought of as a cloud in configuration space. Certain events very strongly satisfy the metric for X; others satisfy it more weakly; other events satisfy it not at all. Thus, there's a cloud in configuration space, with a clearly defined center. If you take something in the panhuman layer (not the personal layer) and superimpose the clouds of all humanity, you should end up with a slightly larger cloud that still has a clearly defined center. Any point that is squarely in the center of the cloud is "grounded in the panhuman layer of humanity".

### What if the shape of the superposition turns out to be more complicated, with the center of mass falling outside the maxima of the superposition? In that case, implementing a Friendliness focused on this center would have outcomes distasteful to all humans, and finding alternative criteria for Friendliness would be highly nontrivial.

----
### And a few more comments: I wonder if you read Lem's "Golem XIV"? Oops, Google says you did read it. Of course.

In a post on Exilist you say that uploading is a post-Singularity technology. While I intuitively feel that true AI will be built well before the computing power becomes available for an upload, I would imagine it should be possible to do uploading without AI. After all, you need just some improved scanning methods, and with laser tissue machining, quantum-dot antibody labeling, and high-resolution confocal microscopy, as well as the proteome project, this might be realistic in as little as 10 years (a guess).
With a huge computer but no AI, the scanned data would give you a human mind in a box, amenable to some enhancement. What do you think about using interactions between a nascent AI and the upload(s), with reciprocal rounds of enhancement and ethical-system transfer, to develop Friendliness?

And, by the way, I do think that CFAI and GISAI are wonderful intellectual achievements.

Rafal
This archive was generated by hypermail 2.1.5 : Fri Nov 01 2002 - 13:37:38 MST