Remember how, back in September, I said I was taking a bit of time off
from the paper "Coding a Transhuman AI" to write a small subpaper called
"Friendly AI"? Well, the first version of "Friendly AI" is now being
circulated for commentary, and "Friendly AI" is more than twice as long as
the rest of CaTAI put together - 740K versus 360K. Hence the eight
months. It's all here - everything from Bayesian programmer-affirmed
supergoals to ethical injunctions, human psychology and AI psychology,
self-improvement and directed evolution, causal rewrite semantics and
programmer-independence, subgoal stomps and wireheading. The answer to
your question IS here.
The Singularity Institute is pleased to announce that the "open commentary" version of the long-awaited "Friendly AI" is now available to the academic and futurist communities.
Complete table of contents:
Preface [2K]
INIT [1K]
4.1: Design requirements of Friendliness [50K]
  4.1.1: Envisioning perfection
  4.1.2: Assumptions "conservative" for Friendly AI
  4.1.3: Strong Singularity, seed AI, the Transition Guide...
  4.1.4: Goal-oriented behavior
Interlude: The story of a blob [12K]
4.2: Humans, AIs, and SIs: Beyond anthropomorphism [78K]
  4.2.1: Reinventing retaliation
  4.2.2: Selfishness is an evolved trait
    4.2.2.1: Pain and pleasure
      4.2.2.1.1: FoF: Wireheading 1
    4.2.2.2: Anthropomorphic capitalism
    4.2.2.3: Mutual friendship
    4.2.2.4: A final note on selfishness
  4.2.3: Observer-biased beliefs evolve in imperfectly deceptive social organisms
  4.2.4: Anthropomorphic political rebellion is just plain silly
  Interlude: Movie cliches about AIs
  4.2.5: Review of the AI Advantage
Interlude: Beyond the adversarial attitude [17K]
4.3: Design of Friendship systems [0K]
  4.3.1: Generic goal systems [78K]
    4.3.1.1: Generic goal system functionality
    4.3.1.2: Layered mistake detection
      4.3.1.2.1: FoF: Autonomic blindness
    4.3.1.3: FoF: Non-malicious mistake
    4.3.1.4: Injunctions
      4.3.1.4.1: Anthropomorphic injunctions
      4.3.1.4.2: Adversarial injunctions
      4.3.1.4.3: AI injunctions
    4.3.1.5: Ethical injunctions
      4.3.1.5.1: Anthropomorphic ethical injunctions
      4.3.1.5.2: AI ethical injunctions
    4.3.1.6: FoF: Subgoal stomp
    4.3.1.7: Emergent phenomena in generic goal systems
      4.3.1.7.1: Convergent subgoals
      4.3.1.7.2: Habituation
      4.3.1.7.3: Anthropomorphic satisfaction
  4.3.2: Seed AI goal systems [105K]
    4.3.2.1: Equivalence of self and self-image
    4.3.2.2: Coherence and consistency through self-production
      4.3.2.2.1: Look-ahead: Coherent supergoals
    4.3.2.3: Programmer-assisted Friendliness
      4.3.2.3.1: Unity of will
      4.3.2.3.2: Cooperative safeguard: "Preserve transparency"
      4.3.2.3.3: Absorbing assists into the system
      4.3.2.3.4: Programmer-created beliefs must be truthful...
    4.3.2.4: Wisdom tournaments
      4.3.2.4.1: Wisdom tournament structure
    4.3.2.5: FoF: Wireheading 2
    4.3.2.6: Directed evolution in goal systems
      4.3.2.6.1: Anthropomorphic evolution
      4.3.2.6.2: Evolution and Friendliness
      4.3.2.6.3: Conclusion: Evolution is not safe
    4.3.2.7: FAI hardware: The flight recorder
  Interlude: Why structure matters [7K]
  4.3.3: Friendly goal systems [4K]
    4.3.3.1: External reference semantics [67K]
      4.3.3.1.1: Bayesian sensory binding
      4.3.3.1.2: External objects and external referents...
        4.3.3.1.2.1: Flexibility of conclusions...
      4.3.3.1.3: Bayesian reinforcement
        4.3.3.1.3.1: Bayesian reinforcement...
        4.3.3.1.3.2: Perseverant affirmation...
      4.3.3.1.4: Bayesian programmer affirmation...
    4.3.3.2: Shaper/anchor semantics [59K]
      4.3.3.2.1: "Travel AI": Convergence begins to dawn
      4.3.3.2.2: Some forces that shape Friendliness
      4.3.3.2.3: Beyond rationalization
      4.3.3.2.4: Shapers of philosophies
        4.3.3.2.4.1: SAS: Correction of programmer errors
        4.3.3.2.4.2: SAS: Programmer-independence
        4.3.3.2.4.3: SAS: Grounding for ERS...
      4.3.3.2.5: Anchors
        4.3.3.2.5.1: Positive anchors
        4.3.3.2.5.2: Negative anchors
        4.3.3.2.5.3: Anchor abuse
      4.3.3.2.6: Shaper/anchor semantics require intelligence...
    4.3.3.3: Causal rewrite semantics [37K]
      4.3.3.3.1: The physicalist explanation of Friendly AIs
      4.3.3.3.2: Causal rewrites and extraneous causes
      4.3.3.3.3: The rule of derivative validity
      4.3.3.3.4: Truly perfect Friendliness
      4.3.3.3.5: The acausal level
      4.3.3.3.6: Renormalization...
    4.3.3.4: The secret actual definition of Friendliness [8K]
      4.3.3.4.1: Requirements for "sufficient" convergence
  4.3.4: Developmental Friendliness [28K]
    4.3.4.1: Teaching Friendliness content
      4.3.4.1.1: Trainable differences for causal rewrites
    4.3.4.2: Commercial Friendliness and research Friendliness
      4.3.4.2.1: When Friendliness becomes necessary
      4.3.4.2.2: Evangelizing Friendliness
    4.3.4.3: "In case of Singularity, break glass"...
      4.3.4.3.1: The Bayesian Boundary
      4.3.4.3.2: Controlled ascent
  Interlude: Of Transition Guides and Sysops [10K]
    The Transition Guide
    The Sysop Scenario
4.4: Policy implications [34K]
  4.4.1: Comparative analyses
    4.4.1.1: FAI relative to other technologies
    4.4.1.2: FAI relative to computing power
    4.4.1.3: FAI relative to unFriendly AI
    4.4.1.4: FAI relative to social awareness
    4.4.1.5: Conclusions from comparative analysis
  4.4.2: Policies and effects
    4.4.2.1: Regulation (-)
    4.4.2.2: Relinquishment (-)
    4.4.2.3: Selective support (+)
  4.4.3: Recommendations
4.5: Miscellaneous [4K]
  4.5.1: Relevant literature
END [1K]
Appendix 4.A: Friendly AI Guides and References [0K]
  4.A.1: Indexed FAQ [27K]
  4.A.2: Complete Table of Contents [0K]
This is not the official launch of Friendly AI; this is the "open commentary" version we're circulating in the community first. However, you are politely requested to check the Indexed FAQ before sending in your commentary, since we've already heard quite a few questions about Friendly AI, and your comments may have already been taken into account.
--
Eliezer S. Yudkowsky                          http://singinst.org/
Research Fellow, Singularity Institute for Artificial Intelligence