feat: initial setup of the LLM Wiki second brain

- Create CLAUDE.md (vault operating rules, Karpathy's ten LLM Wiki rules)
- Create 나의 핵심 맥락.md (architect profile, purpose of the second brain, core sources)
- Establish the raw/ structure (preserve the existing 설계원칙 material under book/; add articles/, repos/, notes/)
- Initialize wiki/ (index.md, log.md, and the concepts/, sources/, patterns/ folders)
- Initialize output/
- Preserve the existing prompt-pattern files under LLMWiki/

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author: minsung
Date: 2026-04-30 14:34:29 +09:00
Commit: 44e26d6972 (parent: d7a123de97)
48 changed files with 14334 additions and 0 deletions

raw/CLAUDE.md Normal file

@@ -0,0 +1,19 @@
# raw/ — Immutable originals
Files in this folder are never modified; they may only be added to.
## Subfolders
| Folder | Contents |
|------|------|
| `book/` | Book source text, scans, excerpts |
| `articles/` | Blog posts, web articles |
| `repos/` | GitHub repository analyses |
| `notes/` | Personal memos, insights |
## File rules
- File names: the `{source}-{number}.md` format is recommended
- Images go in a `{filename}_images/` folder
- Preserve originals exactly — no summarizing or editing
- Summaries and interpretation belong in `wiki/` only

raw/articles/.gitkeep Normal file

@@ -0,0 +1,207 @@
# SOFTWARE DESIGN FOR FLEXIBILITY
How to Avoid Programming Yourself into a Corner
Chris Hanson and Gerald Jay Sussman
![](설계원칙-001-020_images/_page_0_Picture_3.jpeg)
# **Software Design for Flexibility**
How to Avoid Programming Yourself into a Corner
Chris Hanson and Gerald Jay Sussman
foreword by Guy L. Steele Jr.
The MIT Press
Cambridge, Massachusetts
London, England
#### © 2021 Massachusetts Institute of Technology
This work is subject to a
Creative Commons Attribution-ShareAlike 4.0 International License.
To view a copy of this license, visit <http://creativecommons.org/licenses/by-sa/4.0/>.
![](설계원칙-001-020_images/_page_3_Picture_4.jpeg)
Subject to such license, all rights are reserved.
This book was set in Computer Modern by the authors with the LaTeX typesetting system.
Library of Congress Cataloging-in-Publication Data
Names: Hanson, Chris (Christopher P.), author. | Sussman, Gerald Jay, author.
Title: Software design for flexibility : how to avoid programming yourself into a corner / Chris Hanson and Gerald Jay Sussman ; foreword by Guy L. Steele Jr.
Description: Cambridge, Massachusetts : The MIT Press, [2021] | Includes bibliographical references and index.
Identifiers: LCCN 2020040688 | ISBN 9780262045490 (hardcover)
Subjects: LCSH: Software architecture. | Software patterns.
Classification: LCC QA76.76.D47 H35 2021 | DDC 005.1/112 — dc23
LC record available at <https://lccn.loc.gov/2020040688>
10 9 8 7 6 5 4 3 2 1
A computer is like a violin. You can imagine a novice trying first a phonograph and then a violin. The latter, he says, sounds terrible. That is the argument we have heard from our humanists and most of our computer scientists. Computer programs are good, they say, for particular purposes, but they aren't flexible. Neither is a violin, or a typewriter, until you learn how to use it.
Marvin Minsky, "Why Programming Is a Good Medium for Expressing Poorly-Understood and Sloppily-Formulated Ideas," in *Design and Planning* (1967)
# Contents
- Foreword
- Preface
- Acknowledgments
- 1: Flexibility in Nature and in Design
- 2: Domain-Specific Languages
- 3: Variations on an Arithmetic Theme
- 4: Pattern Matching
- 5: Evaluation
- 6: Layering
- 7: Propagation
- 8: Epilogue
- Appendix A: Supporting Software
- Appendix B: Scheme
- References
- Index
- List of Exercises
#### **List of figures**
#### Chapter 1
- Figure 1.1 The superheterodyne plan, invented by Major Edwin Armstrong in 1918,…
- Figure 1.2 Exploratory behavior can be accomplished in two ways. In one way a g…
#### Chapter 2
- Figure 2.1 The composition f ∘ g of functions f and g is a new function that is…
- Figure 2.2 In parallel-combine the functions f and g take the same number of ar…
- Figure 2.3 In spread-combine the n + m arguments are split between the functions…
- Figure 2.4 The combinator spread-combine is really a composition of two parts. …
- Figure 2.5 The combinator (discard-argument 2) takes a three-argument function…
- Figure 2.6 The combinator ((curry-argument 2) 'a 'b 'c) specifies three of the …
- Figure 2.7 The combinator (permute-arguments 1 2 0 3) takes a function f of fou…
#### Chapter 3
- Figure 3.1 A trie can be used to classify sequences of features. A trie is a di…
#### Chapter 7
- Figure 7.1 Kanizsa's triangle is a classic example of a completion illusion. Th…
- Figure 7.2 The angle θ of the triangle to the distant star erected on the semim…
- Figure 7.3 Here we see a "wiring diagram" of the propagator system constructed …
- Figure 7.4 The constraint propagator constructed by c:* is made up of three dir…
# <span id="page-8-0"></span>**Foreword**
Sometimes when you're writing a program, you get stuck. Maybe it's because you realize you didn't appreciate some aspect of the problem, but all too often it's because you made some decision early in the program design process, about a choice of data structure or a way of organizing the code, that has turned out to be too limiting, and also to be difficult to undo.
This book is a master class in specific program organization strategies that maintain flexibility. We all know by now that while it is very easy to declare an array of fixed size to hold data to be processed, such a design decision can turn out to be an unpleasant limitation that may make it impossible to handle input lines longer than a certain length, or to handle more than a fixed number of records. Many security bugs, especially in the code for the Internet, have been consequences of allocating a fixed-size memory buffer and then failing to check whether the data to be processed would fit in the buffer. Dynamically allocated storage (whether provided by a C-style malloc library or by an automatic garbage collector), while more complicated, is much more flexible and, as an extra benefit, less error-prone (especially when the programming language always checks array references to make sure the index is within bounds). That's just a very simple example.
A number of early programming language designs in effect made a design commitment to reflect the style of hardware organization called the *Harvard architecture*: the code is *here*, the data is *there*, and the job of the code is to massage the data. But an inflexible, arm's-length separation between code and data turns out to be a significant limitation on program organization. Well before the end of the twentieth century, we learned from functional programming
languages (such as ML, Scheme, and Haskell) and from object-oriented programming languages (such as Simula, Smalltalk, C++, and Java) that there are advantages to being able to treat code as data, to treat data as code, and to bundle smallish amounts of code and related data together rather than organizing code and data separately as monolithic chunks. The most flexible kind of data is a record structure that can contain not only "primitive data items" such as numbers and characters but also references to executable code, such as a function. The most powerful kind of code constructs other code that has been bundled with just the right amount of curated data; such a bundle is not just a "function pointer" but a *closure* (in a functional language) or an *object* (in an object-oriented language).
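The bundling of code with curated data described above can be sketched in a few lines of Scheme. (The names `make-adder` and `add-5` are ours, chosen for illustration; this is not code from the book.)

```scheme
;; A closure bundles behavior with curated data: make-adder returns a
;; procedure that carries the value of n in its captured environment.
(define (make-adder n)
  (lambda (x) (+ x n)))

(define add-5 (make-adder 5))

(add-5 3)   ; evaluates to 8: the 5 travels with the returned procedure
```

An object in an object-oriented language plays the same role: a record of data traveling together with the code that knows how to use it.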
Jerry Sussman and Chris Hanson draw on their collective century of programming experience to present a set of techniques, developed and tested during decades of teaching at MIT, that further extend this basic strategy for flexibility. Don't just use functions; use *generic* functions, which are open-ended in a way that plain functions are not. Keep functions small. Often the best thing for a function to return is another function (that has been customized with curated data). Be prepared to treat data as code, perhaps even to the extreme of creating a new embedded programming language within your application if necessary. (That is one view of how the Scheme language got its start: the MacLisp dialect of Lisp did not support a completely general form of function closure, so Jerry and I simply used MacLisp to code an embedded dialect of Lisp that did support the kind of function closure we needed.) Be prepared to replace a data structure with a more general data structure that subsumes the original and extends its capabilities. Use automatic constraint propagation to avoid a premature commitment to which data items are inputs and which are outputs.
This book is not a survey, or a tutorial—as I said before, it is a master class. In each chapter, watch as two experts demonstrate an advanced technique by incrementally developing a chunk of working code, explaining the strategy as they go, occasionally
pausing to point out a pitfall or to remove a limitation. Then be prepared, when called on, to demonstrate the technique yourself, by extending a data structure or writing additional code—and then to use your imagination and creativity to go beyond what they have demonstrated. The ideas in this book are rich and deep; close attention to both the prose and the code will be rewarded.
> Guy L. Steele Jr.
> Lexington, Massachusetts
> August 2020
# <span id="page-11-0"></span>**Preface**
We have all spent too much time trying to deform an old piece of code so that it could be used in a way that we didn't realize would be needed when we wrote it. This is a terrible waste of time and effort. Unfortunately, there are many pressures on us to write code that works very well for a very specific purpose, with few reusable parts. But we think that this is not necessary.
It is hard to build systems that have acceptable behavior over a larger class of situations than was anticipated by their designers. The best systems are evolvable: they can be adapted to new situations with only minor modification. How can we design systems that are flexible in this way?
It would be nice if all we had to do to add a new feature to a program was to add some code, without changing the existing code base. We can often do this by using certain organizing principles in the construction of the code base and incorporating appropriate hooks at that time.
Observations of biological systems tell us a great deal about how to make flexible and evolvable systems. Techniques originally developed in support of symbolic artificial intelligence can be viewed as ways of enhancing flexibility and adaptability in programs and other engineered systems. By contrast, common practice of computer science actively discourages the construction of systems that are easily modified for use in novel settings.
We have often programmed ourselves into corners and had to expend great effort refactoring code to escape from those corners. We have now accumulated enough experience to feel that we can identify, isolate, and demonstrate strategies and techniques that we have found to be effective for building large systems that can be
adapted for purposes that were not anticipated in the original design. In this book we share some of the fruits of our over 100 years of programming experience.
#### **This book**
This book was developed as the result of teaching computer programming at MIT. We started this class many years ago, intending to expose advanced undergraduate students and graduate students to techniques and technologies that are useful in the construction of programs that are central to artificial intelligence applications, such as mathematical symbolic manipulation and rule-based systems. We wanted the students to be able to build these systems flexibly, so that it would be easier to combine such systems to make even more powerful systems. We also wanted to teach students about dependencies—how they can be tracked, and how they can be used for explanation and to control backtracking.
Although the class was and is successful, it turned out that in the beginning we did not have as much understanding of the material as we originally believed. So we put a great deal of effort into sharpening our tools and making our ideas more precise. We now realize that these techniques are not just for artificial intelligence applications. We think that anyone who is building complex systems, such as computer-language compilers and integrated development environments, will benefit from our experience. This book is built on the lectures and problem sets that are now used in our class.
#### **The contents**
There is much more material in this book than can be covered in a single-semester class. So each time we offer the class we pick and choose what to present. Chapter 1 is an introduction to our
programming philosophy. Here we show *flexibility* in the grand context of nature and of engineering. We try to make the point that flexibility is as important an issue as efficiency and correctness. In each subsequent chapter we introduce techniques and illustrate them with sets of exercises. This is an important organizing principle for the book.
In chapter 2 we explore some universally applicable ways of building systems with room to grow. A powerful way to organize a flexible system is to build it as an assembly of domain-specific languages, each appropriate for easily expressing the construction of a subsystem. Here we develop basic tools for the development of domain-specific languages: we show how subsystems can be organized around mix-and-match parts, how they can be flexibly combined with *combinators*, how *wrappers* can be used to generalize parts, and how we can often simplify a program by abstracting out a domain model.
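In that spirit, the two simplest combinators can be sketched as follows. (This is a simplified version of the idea; the book's chapter 2 implementations also manage arity information.)

```scheme
;; compose: the output of g feeds the input of f.
(define (compose f g)
  (lambda args
    (f (apply g args))))

;; parallel-combine: f and g each see the same arguments,
;; and h combines their two results.
(define (parallel-combine h f g)
  (lambda args
    (h (apply f args) (apply g args))))

;; Example: square after increment.
((compose (lambda (x) (* x x)) (lambda (x) (+ x 1))) 4)   ; evaluates to 25
```

Because each combinator returns an ordinary procedure, the results can themselves be combined, which is what makes a mix-and-match family of parts possible.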
In chapter 3 we introduce the extremely powerful but potentially dangerous flexibility technique of predicate-dispatched *generic procedures*. We start by generalizing arithmetic to deal with symbolic algebraic expressions. We then show how such a generalization can be made efficient by using type tags for data, and we demonstrate the power of the technique with the design of a simple, but easy to elaborate, adventure game.
In chapter 4 we introduce symbolic *pattern matching*, first to enable term-rewriting systems, and later, with *unification*, to show how type inference can easily be made to work. Here we encounter the need for *backtracking* because of segment variables. Unification is the first place where we see the power of representing and combining *partial-information* structures. We end the chapter with extending the idea to matching general graphs.
In chapter 5 we explore the power of *interpretation* and *compilation*. We believe that programmers should know how to escape the confines of whatever programming language they must use by making an interpreter for a language that is more appropriate for expressing the solution to the current problem. We also show how to naturally incorporate backtracking search by implementing
nondeterministic amb in an interpreter/compiler system, and how to use *continuations*.
In chapter 6 we show how to make systems of *layered data* and *layered procedures*, where each data item can be annotated with a variety of metadata. The processing of the underlying data is not affected by the metadata, and the code for processing the underlying data does not even know about or reference the metadata. However, the metadata is processed by its own procedures, effectively in parallel with the data. We illustrate this by attaching units to numerical quantities and by showing how to carry dependency information, giving the provenance of data, as derived from the primitive sources.
This is all brought together in chapter 7, where we introduce *propagation* to escape from the expression-oriented paradigm of computer languages. Here we have a wiring-diagram vision of connecting modules together. This allows the flexible incorporation of multiple sources of partial information. Using layered data to support tracking of dependencies enables the implementation of *dependency-directed backtracking*, which greatly reduces the search space in large and complex systems.
This book can be used to make a variety of advanced classes. We use the combinator idea introduced in chapter 2 and the generic procedures introduced in chapter 3 in all subsequent chapters. But patterns and pattern matching from chapter 4 and evaluators from chapter 5 are not used in later chapters. The only material from chapter 5 that is needed later is the introduction to amb in sections 5.4 and 5.4.1. The layering idea in chapter 6 is closely related to the idea of generic procedures, but with a new twist. The use of layering to implement dependency tracking, introduced as an example in chapter 6, becomes an essential ingredient in propagation (chapter 7), where we use the dependencies to optimize backtracking search.
#### **Scheme**
The code in this book is written in Scheme, a mostly functional language that is a variant of Lisp. Although Scheme is not a popular language, or widely used in an industrial context, it is the right choice for this book. [1](#page-16-0)
<span id="page-15-0"></span>The purpose of this book is the presentation and explanation of programming ideas. The presentation of example code to elucidate these ideas is shorter and simpler in Scheme than in more popular languages, for many reasons. And some of the ideas would be nearly impossible to demonstrate using other languages.
Languages other than those in the Lisp family require lots of ceremony to say simple things. The only thing that makes our code long-winded is that we tend to use long descriptive names for computational objects.
The fact that Scheme syntax is extremely simple—it is just a representation of the natural parse tree, requiring minimal parsing—makes it easy to write programs that manipulate program texts, such as interpreters, compilers, and algebraic expression manipulators.
It is important that Scheme is a permissive rather than a normative language. It does not try to prevent a programmer from doing something "stupid." This allows us to play powerful games, like dynamically modulating the meanings of arithmetic operators. We would not be able to do this in a language that imposes more restrictive rules.
Scheme allows assignment but encourages functional programming. Scheme does not have static types, but it has very strong dynamic typing that allows safe dynamic storage allocation and garbage collection: a user program cannot manufacture a pointer or access an arbitrary memory location. It is not that we think static types are not a good idea. They certainly are useful for the early exorcism of a large class of bugs. And Haskell-like type systems can be helpful in thinking out strategies. But for this book the intellectual overhead of static types would inhibit consideration of potentially dangerous strategies of flexibility.
Also Scheme provides special features, such as reified continuations and dynamic binding, that are not available in most other languages. These features allow us to implement such powerful mechanisms as nondeterministic amb in the native language (without a second layer of interpretation).
<span id="page-16-0"></span>[1](#page-15-0) We provide a short introduction to Scheme in Appendix B.
# <span id="page-17-0"></span>**Acknowledgments**
This book would not have been possible without the help of a great number of MIT students who have been in our classes. They actually worked the problems and often told us about bad choices we made and things we did wrong! We are especially indebted to those students who served as teaching assistants over the years. Michael Blair, Alexey Radul, Pavel Panchekha, Robert L. McIntyre, Lars E. Johnson, Eli Davis, Micah Brodsky, Manushaqe Muco, Kenny Chen, and Leilani Hendrina Gilpin have been especially helpful.
Many of the ideas presented here were developed with the help of friends and former students. Richard Stallman, Jon Doyle, David McAllester, Ramin Zabih, Johan deKleer, Ken Forbus, and Jeff Siskind all contributed to our understanding of dependency-directed backtracking. And our understanding of propagation, in chapter 7, is the result of years of work with Richard Stallman, Guy Lewis Steele Jr., and Alexey Radul.
We are especially grateful for the help and support of the functional-programming community, and especially of the Scheme Team. Guy Steele coinvented Scheme with Gerald Jay Sussman back in the 1970s, and he has given a guest lecture in our class almost every year. Arthur Gleckler, Guillermo Juan Rozas, Joe Marshall, James S. Miller, and Henry Manyan Wu were instrumental in the development of MIT/GNU Scheme. Taylor Campbell and Matt Birkholz have made major contributions to that venerable system. We also want to thank Will Byrd and Michael Ballantyne for their help with understanding unification with segment variables.
Hal Abelson and Julie Sussman, coauthors with Gerald Jay Sussman of *Structure and Interpretation of Computer Programs*, helped form our ideas for this book. In many ways this book is an advanced sequel to SICP. Dan Friedman, with his many wonderful students and friends, has made deep contributions to our understanding of programming. We have had many conversations about the art of programming with some of the greatest wizards, such as William Kahan, Richard Stallman, Richard Greenblatt, Bill Gosper, and Tom Knight. Working with Jack Wisdom for many years on mathematical dynamics helped clarify many of the issues that we address in this book.
Sussman wants to especially acknowledge the contributions of his teachers: ideas from discussions with Marvin Minsky, Seymour Papert, Jerome Lettvin, Joel Moses, Paul Penfield, and Edward Fredkin appear prominently in this text. Ideas from Carl Hewitt, David Waltz, and Patrick Winston, who were contemporaneous students of Minsky and Papert, are also featured here. Jeff Siskind and Alexey Radul pointed out and helped with the extermination of some very subtle bugs.
Chris learned a great deal about large-scale programming while working at Google and later at Datera; this experience has influenced parts of this book. Arthur Gleckler provided useful feedback on the book in biweekly lunches. Mike Salisbury was always excited to hear about the latest developments during our regular meetings at Google. Hongtao Huang and Piyush Janawadkar read early drafts of the book. A special thanks goes to Rick Dukes, the classmate at MIT who introduced Chris to the lambda papers and set him on the long road towards this book.
We thank the MIT Department of Electrical Engineering and Computer Science and the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) for their hospitality and logistical support. We acknowledge the Panasonic Corporation (formerly the Matsushita Electric Industrial Corporation) for support of Gerald Jay Sussman through an endowed chair. Chris Hanson was also partially supported by CSAIL and later by Google for this work.
Julie Sussman, PPA, provided careful reading and serious criticism that forced us to reorganize and rewrite major parts of the text. She has also developed and maintained Gerald Jay Sussman over these many years.
Elizabeth Vickers, spouse of many years, provided a supporting and stable environment for both Chris and their children, Alan and Erica. Elizabeth also cooked many excellent meals for both authors during the long work sessions in Maine. Alan was an occasional but enthusiastic reader of early drafts.
Chris Hanson and Gerald Jay Sussman
---
## Diagram pages
### Page 1
![Page 1 — diagram](설계원칙-001-020_images/page_1.png)

Binary file not shown (56 KiB).

Binary file not shown (7.5 KiB).

Binary file not shown (1.1 MiB).

@@ -0,0 +1,209 @@
# **Flexibility in Nature and in Design**
It is difficult to design a mechanism of general utility that does any particular job very well, so most engineered systems are designed to perform a specific job. General-purpose inventions, such as the screw fastener, are rare and of great significance. The digital computer is a breakthrough of this kind, because it is a universal machine that can simulate any other information-processing machine. [1](#page-20-0) We write software that configures our computers to effect this simulation for the specific jobs that we need done.
<span id="page-0-0"></span>We have been designing software to do particular jobs very well, as an extension of past engineering practice. Each piece of software is designed to do a relatively narrow job. As the problem to be solved changes, the software must be changed. But small changes to the problem do not often entail only small changes to the software. Software is designed too tightly for there to be much flexibility. As a consequence, systems cannot evolve gracefully. They are brittle and must be replaced with entirely new designs as the problem domain changes. [2](#page-20-1) This is slow and expensive.
<span id="page-0-1"></span>Our engineered systems do not have to be brittle. The Internet has been extended from a small system to one of global scale. Our cities evolve organically, to accommodate new business models, life styles, and means of transportation and communication. Indeed, from observation of biological systems we see that it is possible to build systems that can be adapted to changes in the environment, both individually and as an evolutionary ensemble. Why is this not the way we design and build most software? There are historical reasons, but the main reason is that we don't know how to do this
generally. At this moment it is an accident if a system turns out to be robust in the face of changes in requirements.
### **Additive programming**
Our goal in this book is to investigate how to construct computational systems so that they can be easily adapted to changing requirements. One should not have to modify a working program. One should be able to add to it to implement new functionality or to adjust old functions for new requirements. We call this *additive programming*. We explore techniques to add functionality to an existing program without breaking it. Our techniques do not guarantee that the additions are correct: the additions must themselves be debugged; but they should not damage existing functionality accidentally.
Many of the techniques we explore in this book are not novel: some of them date back to the early days of computing! They are also not a comprehensive set, but simply some that we have found useful. Our intention is not to promote the use of these techniques, but to encourage a style of thinking that is focused on flexibility.
In order for additive programming to be possible, it is necessary to minimize the assumptions about how a program works and how it will be used. Assumptions made during the design and construction of a program may reduce the possible future extensions of the program. Instead of making such assumptions, we build our programs to make just-in-time decisions based on the environment that the program is running in. We will explore several techniques that support this kind of design.
We can always combine programs to get the union of the behaviors that each supports. But we want the whole to be more than the sum of its parts; we want the parts of the combined system to cooperate to give the system capabilities that no one part can provide by itself. But there are tradeoffs here: the parts that we combine to make a system must sharply separate concerns. If a part does one thing extremely well, it is easier to reuse, and also easier to debug, than one that combines several disparate capabilities. If we
want to build additively, it is important that the individual pieces combine with minimal unintended interactions.
To facilitate additive programming, it is necessary that the parts we build be as simple and general as we can make them. For example, a part that accepts a wider range of inputs than is strictly necessary for the problem at hand will have a wider applicability than one that doesn't. And families of parts that are built around a standardized interface specification can be mixed and matched to make a great variety of systems. It is important to choose the right abstraction level for our parts, by identifying the domain of discourse for the family and then building the family for that domain. We start consideration of these requirements in chapter 2.
For maximum flexibility the range of outputs of a part should be quite small and well defined—much smaller than the range of acceptable inputs for any part that might receive that output. This is analogous to the static discipline in the digital abstraction that we teach to students in introductory computer systems subjects [126]. The essence of the digital abstraction is that the outputs are always better than the acceptable inputs of the next stage, so that noise is suppressed.
In software engineering this principle is enshrined as "Postel's law" in honor of Internet pioneer Jon Postel. In RFC760 [97], describing the Internet protocol, he wrote: "The implementation of a protocol must be robust. Each implementation must expect to interoperate with others created by different individuals. While the goal of this specification is to be explicit about the protocol, there is the possibility of differing interpretations. In general, an implementation should be conservative in its sending behavior, and liberal in its receiving behavior." This is usually summarized as "Be conservative in what you do, be liberal in what you accept from others."
Using more general parts than appear to be necessary builds a degree of flexibility into the entire structure of our systems. Small perturbations of the requirements can be tolerated, because every component is built to accept perturbed (noisy) inputs.
A family of mix-and-match parts for a particular domain of discourse is the foundation of a *domain-specific language*. Often the best way to attack a family of hard problems is to make a language—a set of primitives, means of combination, and means of abstraction—that makes the solutions for those problems easy to express. So we want to be able to erect appropriate domain-specific languages as needed, and to combine such languages flexibly. We start thinking about domain-specific languages in chapter 2. More powerfully, we can implement such languages by direct evaluation. We expand on this idea in chapter 5.
One strategy for enhancing flexibility, which should be familiar to many programmers, is *generic dispatch*. We will explore this extensively in chapter 3. Generic dispatch is often a useful way to extend the applicability of a procedure by adding additional handlers (methods) based on details of the arguments passed to the procedure. By requiring handlers to respond to disjoint sets of arguments, we can avoid breaking an existing program when a new handler is added. However, unlike the generic dispatch in the typical object-oriented programming context, our generic dispatch doesn't involve ideas like classes, instances, and inheritance. These weaken the separation of concerns by introducing spurious ontological commitments.
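A minimal predicate-dispatched generic procedure can be sketched as follows. (This sketch is our own; what chapter 3 develops is considerably more capable, but the additive character is the same.)

```scheme
;; Handlers are (applicability-predicate . handler) pairs, tried in order.
(define generic-handlers '())

(define (define-handler! applicable? handler)
  (set! generic-handlers
        (cons (cons applicable? handler) generic-handlers)))

(define (generic-call . args)
  (let loop ((hs generic-handlers))
    (cond ((null? hs) (error "No applicable handler:" args))
          ((apply (caar hs) args) (apply (cdar hs) args))
          (else (loop (cdr hs))))))

;; Additive extension: each handler claims a disjoint set of arguments,
;; so installing a new one cannot break existing behavior.
(define-handler! number? (lambda (x) (* 2 x)))
(define-handler! symbol? (lambda (x) (list '* 2 x)))

(generic-call 3)    ; evaluates to 6
(generic-call 'a)   ; evaluates to (* 2 a)
```

Note that dispatch here depends only on predicates over the arguments, with no classes, instances, or inheritance involved.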
A quite different strategy, to be explored in chapter 6, is to *layer* both data and procedures. This exploits the idea that data usually has associated metadata that can be processed alongside the data. For example, numerical data often has associated units. We will show how providing the flexibility of adding layers after the fact can enhance a program with new functionality, without any change to the original program.
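As a toy illustration of the units example (our own representation, not the book's layered-data machinery), a value can carry a units layer that the base arithmetic never inspects:

```scheme
;; A layered value is a base number paired with a metadata layer (units).
(define (with-units value units) (cons value units))
(define (base-value x) (if (pair? x) (car x) x))
(define (units-of x)   (if (pair? x) (cdr x) '()))

;; A layered multiply: the base layer multiplies numbers as usual;
;; the units layer is processed alongside, invisibly to the base code.
(define (layered-* a b)
  (with-units (* (base-value a) (base-value b))
              (append (units-of a) (units-of b))))

(layered-* (with-units 3 '(meter)) (with-units 4 '(second)))
; evaluates to (12 meter second)
```

The point of the layering discipline is that `*` on the base values is untouched: the metadata layer could be added after the fact without changing the original numeric code.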
We can also build systems that combine multiple sources of *partial information* to obtain more complete answers. This is most powerful when the contributions come from independent sources of information. In chapter 4 we will see how type inference is really a matter of combining multiple sources of partial information. Locally deducible clues about the type of a value, for example that a numerical comparison requires numerical inputs and produces a
boolean output, can be combined with other local type constraints to produce nonlocal type constraints.
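One very reduced way to picture this combination (an invented sketch, not the book's type-inference machinery): represent each local clue as a set of possible types and merge clues by intersection, so independent local constraints narrow the nonlocal conclusion:

```python
# Partial type information as sets of possibilities; combining clues is
# set intersection. (Purely illustrative; real inference in chapter 4
# propagates constraints through expressions.)

ANY = frozenset({"number", "boolean", "string"})

def combine(*clues):
    result = ANY
    for clue in clues:
        result &= clue
        if not result:
            raise TypeError("contradictory type constraints")
    return result

# x appears in a numerical comparison (must be a number) and in a context
# allowing number or string; the intersection is the combined constraint.
x_type = combine(frozenset({"number"}),
                 frozenset({"number", "string"}))
```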
In chapter 7 we will see a different way to combine partial information. The distance to a nearby star can be estimated geometrically, by parallax: measuring the angle by which the star image shifts against the background sky as the Earth revolves around the Sun. The distance to the star can also be estimated by consideration of its brightness and its spectrum, using our understanding of stellar structure and evolution. Such estimates can be combined to get estimates that are more accurate than the individual contributions.
A dual idea is the use of *degeneracy*: having multiple ways to compute something, which can be combined or modulated as needed. There are many valuable uses for degeneracy, including error detection, performance management, and intrusion detection. Importantly, degeneracy is also additive: each contributing part is self-contained and can produce a result by itself. One interesting use of degeneracy is to dynamically select from different implementations of an algorithm depending on context. This avoids the need to make assumptions about how the implementation will be used.
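The error-detection use of degeneracy can be sketched in a few lines (an illustrative Python example, with invented names): two genuinely different methods compute the same result, each self-contained and usable alone, and their agreement is used as a cross-check:

```python
# Degeneracy as error detection: two independent implementations of the
# same function, cross-checked against each other. A bug in one is
# unlikely to reproduce in the other.

def sum_iterative(n):               # method 1: accumulate term by term
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_closed_form(n):             # method 2: Gauss's closed formula
    return n * (n + 1) // 2

def checked_sum(n):
    a, b = sum_iterative(n), sum_closed_form(n)
    assert a == b, "degenerate methods disagree: possible bug"
    return a
```

Either method alone satisfies the specification; running both buys detection of a fault in either.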
Design and construction for flexibility has definite costs. A procedure that can take a greater variety of inputs than are necessary for solving the current problem will have more code than absolutely necessary and will take more thinking by the programmer than absolutely necessary. The same goes for generic dispatch, layering, and degeneracy, each of which involves constant overheads in memory space, compute time, and/or programmer time. But the principal cost of software is the time spent by programmers over the lifetime of the product, including maintenance and adaptations that are needed for changing requirements. Designs that minimize rewriting and refactoring reduce the overall costs to the incremental additions rather than complete rewrites. In other words, long-term costs are additive rather than multiplicative.
# **1.1 Architecture of computation**
<span id="page-5-0"></span>A metaphor from architecture may be illuminating for the kind of system that we contemplate. After understanding the nature of the site to be built on and the requirements for the structure to be constructed, the design process starts with a *parti*: an organizing principle for the design. [3](#page-20-2) The *parti* is usually a sketch of the geometric arrangement of parts. The *parti* may also embody abstract ideas, such as the division into "served spaces" and "servant spaces," as in the work of Louis Isadore Kahn [130]. This decomposition is intended to divide the architectural problem into parts by separating out infrastructural support, such as the hallways, the restrooms, the mechanical rooms, and the elevators, from the spaces to be supported, such as the laboratories, classrooms, and offices in an academic building.
The *parti* is a model, but it is usually not a completely workable structure. It must be elaborated with functional elements. How do we fit in the staircases and elevators? Where do the HVAC ducts, the plumbing, the electrical and communications distribution systems go? How will we run a road to accommodate the delivery patterns of service vehicles? These elaborations may cause modifications of the *parti*, but the *parti* continues to serve as a scaffold around which these elaborations are developed.
In programming, the *parti* is the abstract plan for the computations to be performed. At small scale the *parti* may be an abstract algorithm and data-structure description. In larger systems it is an abstract composition of phases and parallel branches of a computation. In even larger systems it is an allocation of capabilities to logical (or even physical) locales.
Traditionally, programmers have not been able to design as architects. In very elaborate languages, such as Java, the *parti* is tightly mixed with the elaborations. The "served spaces," the expressions that actually describe the desired behavior, are horribly conflated with the "servant spaces," such as the type declarations,
<span id="page-6-0"></span>the class declarations, and the library imports and exports. [4](#page-20-3) More spare languages, such as Lisp or Python, leave almost no room for the servant spaces, and attempts to add declarations, even advisory ones, are shunned because they impede the beauty of the exposed *parti*.
The architectural *parti* should be sufficiently complete to allow the creation of models that can be used for analysis and criticism. The skeleton plan of a program should be adequate for analysis and criticism, but it should also be executable, for experiment and for debugging. Just as an architect must fill in the *parti* to realize the structure being designed, a programmer must elaborate the plan to realize the computational system required. Layering (introduced in chapter 6) is one way to build systems that allow this kind of elaboration.
# **1.2 Smart parts for flexibility**
Large systems are composed of many smaller components, each of which contributes to the function of the whole either by directly providing a part of that function or by cooperating with other components to which it is interconnected in some pattern specified by the system architect to establish a required function. A central problem in system engineering is the establishment of interfaces that allow the interconnection of components so that the functions of those components can be combined to build compound functions.
For relatively simple systems the system architect may make formal specifications for the various interfaces that must be satisfied by the implementers of the components to be interconnected. Indeed, the amazing success of electronics is based on the fact that it is feasible to make such specifications and to meet them. High-frequency analog equipment is interconnected with coaxial cable with standardized impedance characteristics, and with standardized families of connectors [4]. Both the function of a
component and its interface behavior can usually be specified with only a few parameters [60]. In digital systems things are even clearer: there are static specifications of the meanings of signals (the digital abstraction); there are dynamic specifications of the timing of signals [126]; and there are mechanical specifications of the form factors of components. [5](#page-21-0)
<span id="page-7-0"></span>Unfortunately, this kind of a priori specification becomes progressively more difficult as the complexity of the system increases. We could specify that a chess-playing program plays a *legal* game— that it doesn't cheat—but how would one begin to specify that it plays a *good* game of chess? Our software systems are built with large numbers of custom-made highly specialized parts. The difficulty of specifying software components is exacerbated by the individualized nature of the components.
By contrast, biology constructs systems of enormous complexity without very large specifications (considering the problem to be solved!). Every cell in our bodies is a descendant of a single zygote. All the cells have exactly the same genetic endowment (about 1 GByte of ROM!). However, there are skin cells, neurons, muscle cells, etc. The cells organize themselves to be discrete tissues, organs, and organ systems. Indeed, the 1 GByte of ROM specifies how to build the enormously complex machine (the human) from a huge number of failure-prone parts. It specifies how to operate those basic parts and how to configure them. It also specifies how to operate that compound machine reliably, over a great range of hostile conditions, for a very long life span, and how to defend that machine from others that would love to eat it!
If our software components were simpler or more general they would have simpler specifications. If the components were able to adapt themselves to their surroundings, the precision of their specification would be less important. Biological systems exploit both of these strategies to build robust complex organisms. The difference is that the biological cells are dynamically configurable, and able to adapt themselves to their context. This is possible because the way a cell differentiates and specializes depends on its
environment. Our software doesn't usually have this ability, and consequently we must adapt each part by hand. How could biology possibly work?
<span id="page-8-0"></span>Consider another example. We know that the various components of the brain are hooked together with enormous bundles of neurons, and there is nowhere near enough information in the genome to specify that interconnect in any detail. It is likely that the various parts of the brain learn to communicate with each other, based on the fact that they share important experiences. [6](#page-21-1) So the interfaces must be self-configuring, based on some rules of consistency, information from the environment, and extensive exploratory behavior. This is pretty expensive in boot-up time (it takes some years to configure a working human), but it provides a kind of robustness that is not found in our engineered entities to date.
<span id="page-8-1"></span>One idea is that biological systems use contextual signals that are informative rather than imperative. [7](#page-21-2) There is no master commander saying what each part must do; instead the parts choose their roles based on their surroundings. The behaviors of cells are not encoded in the signals; they are separately expressed in the genome. Combinations of signals just enable some behaviors and disable others. This weak linkage allows variation in the implementation of the behaviors that are enabled in various locales without modification of the mechanism that defines the locales. So systems organized in this way are evolvable in that they can accommodate adaptive variation in some locales without changing the behavior of subsystems in other locales.
Traditionally, software systems are built around an imperative model, in which there is a hierarchy of control built into the structure. The individual pieces are assumed to be dumb actors that do what they are told. This makes adaptation very difficult, since all changes must be reflected in the entire control structure. In social systems, we are well aware of the problems with strict power structures and centralized command. But our software follows this flawed model. We can do better: making the parts smarter and
individually responsible streamlines adaptation, since only those parts directly affected by a change need to respond.
#### **Body plans**
<span id="page-9-1"></span>All vertebrates have essentially the same body plan, yet the variation in details is enormous. Indeed, all animals with bilateral symmetry share homeobox genes, such as the Hox complex. Such genes produce an approximate coordinate system in the developing animal, separating the developing animal into distinct locales. [8](#page-21-3) The locales provide context for a cell to differentiate. And information derived from contact with its neighbors produces more context that selects particular behaviors from the possible behaviors that are available in the cell's genetic program. [9](#page-21-4) Even the methods of construction are shared—the morphogenesis of ducted glands, and organs such as lungs and kidneys, is based on one embryological trick: the invagination of epithelium into mesenchyme automagically [10](#page-21-5) produces a branching maze of blind-end tubules surrounded by differentiating mesenchyme. [11](#page-21-6)
<span id="page-9-4"></span><span id="page-9-3"></span><span id="page-9-2"></span><span id="page-9-0"></span>Good engineering has a similar flavor, in that good designs are modular. Consider the design of a radio receiver. There are several grand "body plans" that have been discovered, such as direct conversion, TRF (tuned radio frequency), and superheterodyne. Each has a sequence of locales, defined by the engineering equivalent of a Hox complex, that patterns the system from the antenna to the output transducer. For example, a superheterodyne receiver ([figure](#page-10-0) 1.1) has a standard set of locales (from nose to tail).
<span id="page-10-0"></span>![](설계원칙-021-044_images/_page_10_Figure_0.jpeg)
**[Figure](#page-9-0) 1.1** The superheterodyne plan, invented by Major Edwin Armstrong in 1918, is still the dominant "body plan" for radio receivers.
The modules identified in this plan each decompose into yet other modules, such as oscillators, mixers, filters, and amplifiers, and so on down to the individual electronic components. Additionally, each module can be instantiated in many possible ways: the RF section may be just a filter, or it may be an elaborate filter and amplifier combination. Indeed, in an analog television receiver part of the output of the mixer is processed as AM by the video chain and another part is processed as FM to produce the audio. And some sections, such as the converter, may be recursively elaborated (as if parts of the Hox complex were duplicated!) to obtain multiple-conversion receivers.
In biological systems this structure of compartments is also supported at higher levels of organization. There are tissues that are specialized to become boundaries of compartments, and tubes that interconnect them. Organs are bounded by such tissues and interconnected by such tubes, and the entire structure is packaged to fit into coeloms, which are cavities lined with specialized tissues in higher organisms.
Similar techniques can be used in software. A body plan is just a wrapper that combines partially specified components. This is a kind of *combinator*: a thing that combines subparts together into a larger part. It is possible to create *combinator languages*, in which
the components and the composite all have the same interface specification. In a combinator language, it is possible to build arbitrarily large composites from small numbers of mix-and-match components. The self-similar structures make combination easy. In chapter 2 we will begin to build combinator-based software, and this theme will run through all of the rest of the book.
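A minimal combinator sketch (in Python rather than the book's Scheme, with invented names): every component and every composite is a one-argument function, so the shared interface lets composites nest and recombine exactly like primitives:

```python
# Two combinators over a uniform interface (value -> value). Because a
# composite has the same shape as a part, composition nests arbitrarily.

def compose(f, g):                  # pipeline: feed f's output into g
    return lambda x: g(f(x))

def parallel(f, g, merge):          # fan out to f and g, then merge results
    return lambda x: merge(f(x), g(x))

inc = lambda x: x + 1
double = lambda x: 2 * x

# composites can themselves be combined, just like primitives:
pipeline = compose(compose(inc, double), inc)        # ((x+1)*2)+1
fanout = parallel(inc, double, lambda a, b: a + b)   # (x+1)+(2x)
```

This self-similarity is what makes "arbitrarily large composites from small numbers of mix-and-match components" possible.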
Something similar can be done with domain-specific languages. By making an abstraction of the domain, we can use the same domain-independent code in different domains. For example, numerical integrators are useful in any domain that has numerical aspects, regardless of the domain. Another example is pattern matching in chapter 4, which can be applied to a wide variety of domains.
<span id="page-11-0"></span>Biological mechanisms are universal in that each component can, in principle, act as any other component. Analog electronics components are not universal in that sense. They do not adapt themselves to their surroundings based on local signaling. But there are universal electrical building blocks (a programmable computer with analog interfaces, for example!). [12](#page-22-0) For low-frequency applications one can build analog systems from such blocks. If each block had all of the code required to be any block in the system, but was specialized by interactions with its neighbors, and if there were extra unspecialized "stem cells" in the package, then we could imagine building self-reconfiguring and self-repairing analog systems. But for now we still design and build these parts individually.
In programming we do have the idea of a universal element: the *evaluator*. An evaluator takes a description of some computation to be performed and inputs to that computation. It produces the outputs that would arise if we passed the inputs to a bespoke component that implemented the desired computation. In computation we have a chance to pursue the powerfully flexible strategy of embryonic development. We will elaborate on the use of evaluator technology in chapter 5.
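As a toy demonstration of the evaluator as a universal element (a hedged sketch in Python, handling only prefix arithmetic expressions): one fixed procedure, given a *description* of a computation plus inputs, behaves like any bespoke component implementing that computation:

```python
# A toy evaluator: expressions are nested tuples like ("*", "w", "h"),
# variables are strings, constants are numbers. One evaluator stands in
# for any special-purpose arithmetic component.

import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr, env):
    if isinstance(expr, str):                  # a variable: look it up
        return env[expr]
    if isinstance(expr, (int, float)):         # a constant: itself
        return expr
    op, *args = expr                           # an application: recurse
    return OPS[op](*(evaluate(a, env) for a in args))

# the same evaluator acts as a multiplier, an adder, ... per description:
area = evaluate(("*", "w", "h"), {"w": 3, "h": 4})
```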
# **1.3 Redundancy and degeneracy**
<span id="page-12-0"></span>Biological systems have evolved a great deal of robustness. One of the characteristics of biological systems is that they are redundant. Organs such as the liver and kidney are highly *redundant*: there is vastly more capacity than is necessary to do the job, so a person missing a kidney or part of a liver suffers no obvious incapacity. Biological systems are also highly *degenerate*: there are usually many ways to satisfy a given requirement. [13](#page-22-1) For example, if a finger is damaged, there are ways that the other fingers may be configured to pick up an object. We can obtain the necessary energy for life from a great variety of sources: we can metabolize carbohydrates, fats, and proteins, even though the mechanisms for digestion and for extraction of energy from each of these sources is quite distinct.
The genetic code is itself degenerate, in that the map from codons (triples of nucleotides) to amino acids is not one-to-one: there are 64 possible codons to specify only about 20 possible amino acids [86, 54]. As a consequence, many point mutations (changes of a single nucleotide) do not change the protein specified by a coding region. Also, quite often the substitution of one amino acid with a similar one does not impair the biological activity of a protein. These degeneracies provide ways that variation can accumulate without obvious phenotypic consequences. Furthermore, if a gene is duplicated (not an uncommon occurrence), the copies may diverge silently, allowing the development of variants that may become valuable in the future, without interfering with current viability. In addition, the copies can be placed under different transcriptional controls.
<span id="page-12-1"></span>Degeneracy is a product of evolution, and it certainly enables evolution. Probably degeneracy is itself selected for, because only creatures that have significant amounts of degeneracy are sufficiently adaptable to allow survival as the environment changes. [14](#page-22-2) For example, suppose we have some creature (or engineered system) that is degenerate in that there are several very
different independent mechanisms to achieve some essential function. If the environment changes (or the requirements change) so that one of the ways of achieving an essential function becomes untenable, the creature will continue to live and reproduce (the system will continue to satisfy its specifications). But the subsystem that has become inoperative is now open to mutation (or repair), without impinging on the viability (or current operation) of the system as a whole.
The theoretical structure of physics is deeply degenerate. For example, problems in classical mechanics can be approached in multiple ways. There is the Newtonian formulation of vectoral mechanics and the Lagrangian and Hamiltonian formulations of variational mechanics. If both vectoral mechanics and either form of variational mechanics are applicable, they produce equivalent equations of motion. For analysis of systems with dissipative forces like friction, vectoral mechanics is effective; variational methods are not well suited for that kind of system. Lagrangian mechanics is far better than vectoral mechanics for dealing with systems with rigid constraints, and Hamiltonian mechanics provides the power of canonical transformations to help understand systems using the structure of phase space. Both the Lagrangian and Hamiltonian formulations help us with deep insights into the role of symmetries and conserved quantities. The fact that there are three overlapping ways of describing a mechanical system, which agree when they are all applicable, gives us multiple avenues of attack on any problem [121].
Engineered systems may incorporate some redundancy, in critical systems where the cost of failure is extreme. But they almost never intentionally incorporate degeneracy of the kind found in biological systems, except as a side effect of designs that are not optimal. [15](#page-22-3)
<span id="page-13-0"></span>Degeneracy can add value to our systems: as with redundancy, we can cross-check the answers of degenerate computations to improve robustness. But degenerate computations are not just redundant but *different* from one another, meaning that a bug in one is
unlikely to affect the others. This is a positive characteristic not only for reliability but also for security, as a successful attack must compromise multiple degenerate parts.
When degenerate parts generate partial information, the result of their combination can be better than any individual result. Some navigation systems use this idea to combine several positional estimates to generate a highly accurate result. We will explore the idea of combining partial information in chapter 7.
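A tiny sketch of the idea (illustrative Python; the star-distance numbers are invented): if each source reports an interval it is sure contains the true value, intersecting the intervals yields an estimate tighter than either source alone:

```python
# Combining partial information: each estimate is a closed interval
# [lo, hi]; the combination is their intersection. An empty intersection
# signals inconsistent sources.

def intersect(lo1, hi1, lo2, hi2):
    lo, hi = max(lo1, lo2), min(hi1, hi2)
    if lo > hi:
        raise ValueError("inconsistent estimates")
    return lo, hi

# hypothetical distance to a star: parallax gives 4.0-4.6 parsecs,
# photometry gives 4.3-5.1; the combination is narrower than both.
combined = intersect(4.0, 4.6, 4.3, 5.1)
```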
# **1.4 Exploratory behavior**
<span id="page-14-2"></span>One of the most powerful mechanisms of robustness in biological systems is exploratory behavior. [16](#page-22-4) The idea is that the desired outcome is produced by a [generate-and-test](#page-14-0) mechanism (see figure 1.2). This organization allows the generator mechanism to be general and to work independently of the testing mechanism that accepts or rejects a particular generated result.
<span id="page-14-1"></span><span id="page-14-0"></span>![](설계원칙-021-044_images/_page_14_Figure_4.jpeg)
**[Figure](#page-14-1) 1.2** Exploratory behavior can be accomplished in two ways. In one way a generator proposes an action (or a result), which may be explicitly rejected by a tester. The generator then must propose an alternative. Another way is that the generator produces all of the alternatives, without feedback, and a filter selects one or more that are acceptable.
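The second organization in the figure, where the generator produces candidates without feedback and a filter selects, can be sketched in a few lines (an illustrative Python example): the generator knows nothing about the tester, so either can be swapped independently:

```python
# Generate-and-test: the generator proposes all candidates with no
# feedback; the tester is an independent acceptance rule.

def generate():                     # generator: an endless stream of candidates
    n = 0
    while True:
        yield n
        n += 1

def accept(n):                      # tester: knows nothing of the generator
    return n > 0 and n % 7 == 0

def first_acceptable(gen, test):
    for candidate in gen:
        if test(candidate):
            return candidate

result = first_acceptable(generate(), accept)
```

Replacing `accept` with a different predicate changes the outcome without touching the generator, and vice versa.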
For example, an important component of the rigid skeleton that supports the shape of a cell is an array of microtubules. Each microtubule is made up of protein units that aggregate to form it. Microtubules are continually created and destroyed in a living cell; they are created growing out in all directions. However, only microtubules that encounter a kinetochore or other stabilizer in the cell membrane are stable, thus supporting the shape determined by the positions of the stabilizers [71]. So the mechanism for growing and maintaining a shape is relatively independent of the mechanism for specifying the shape. This mechanism partly determines the shapes of many types of cells in a complex organism, and it is almost universal in animals.
Exploratory behavior appears at all levels of detail in biological systems. The nervous system of a growing embryo produces a vastly larger number of neurons than will persist in the adult. Those neurons that find appropriate targets in other neurons, sensory organs, or muscles will survive, and those that find no targets kill themselves. The hand is fashioned by production of a pad and deletion, by apoptosis (programmed cell death), of the material between the fingers [131]. Our bones are continually being remodeled by osteoblasts (which build bone) and osteoclasts (which destroy bone). The shape and size of the bones is determined by constraints determined by their environment: the parts that they must be associated with, such as muscles, ligaments, tendons, and other bones.
Because the generator need not know about how the tester accepts or rejects its proposals, and the tester need not know how the generator makes its proposals, the two parts can be independently developed. This makes adaptation and evolution more efficient, because a mutation to one or the other of these two subsystems need not be accompanied by a complementary mutation to the other. However, this isolation can be expensive because of the wasted effort of generation and rejection of failed proposals. [17](#page-22-5)
<span id="page-15-0"></span>Indeed, generate and test is a metaphor for all of evolution. The mechanisms of biological variation are random mutations:
modifications of the genetic instructions. Most mutations are neutral in that they do not directly affect fitness because of degeneracy in the systems. Natural selection is the test phase. It does not depend on the method of variation, and the method of variation does not anticipate the effect of selection.
<span id="page-16-0"></span>There are even more striking phenomena: even in closely related creatures some components that end up almost identical in the adult are constructed by entirely different mechanisms in the embryo. [18](#page-22-6) For distant relationships, divergent mechanisms for constructing common structures may be attributed to "convergent evolution," but for close relatives it is more likely evidence for separation of levels of detail, in which the result is specified in a way that is somewhat independent of the way it is accomplished.
Engineered systems may show similar structure. We try to separate specification from implementation: there are often multiple ways to satisfy a specification, and designs may choose different implementations. The best method to use to sort a data set depends on the expected size of the data set, as well as the computational cost of comparing elements. The appropriate representation of a polynomial depends on whether it is sparse or dense. But if choices like these are made dynamically (an unusual system) they are deterministic: we do not see many systems that simultaneously try several ways to solve a problem and use the one that converges first (what are all those cores for, anyway?). It is even rare to find systems that try multiple methods sequentially: if one method fails try another. We will examine use of backtracking to implement generate-and-test mechanisms in pattern matching in chapter 4. We will learn how to build automatic backtracking into languages in chapter 5. And we will learn how to build a dependency-directed backtracking mechanism that extracts as much information as possible from failures in chapter 7.
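The "if one method fails try another" pattern, which the paragraph notes is rare, is easy to express in miniature (a hedged Python sketch with invented names; the book's dependency-directed backtracking in chapter 7 is far more sophisticated):

```python
# Sequential fallback: try a fast, narrowly applicable method first; on
# failure, fall through to a general one. Failure is signaled by raising.

def try_methods(methods, *args):
    failures = []
    for method in methods:
        try:
            return method(*args)
        except Exception as e:       # this method failed; try the next
            failures.append(e)
    raise RuntimeError(f"all methods failed: {failures}")

def closed_form(xs):                 # fast, but only for arithmetic progressions
    step = xs[1] - xs[0]
    if any(b - a != step for a, b in zip(xs, xs[1:])):
        raise ValueError("not an arithmetic progression")
    return len(xs) * (xs[0] + xs[-1]) // 2

def brute_force(xs):                 # always applicable, but slower
    return sum(xs)

total = try_methods([closed_form, brute_force], [1, 4, 9, 16])
```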
# **1.5 The cost of flexibility**
> Lisp programmers know the value of everything but the cost of nothing.
>
> *Alan Perlis, paraphrasing Oscar Wilde*
We have noted that generality and evolvability are enhanced in systems that use generics, layers, redundancy, degeneracy, and exploratory behavior. Each of these is expensive, when looked at in isolation. A mechanism that works over a wide range of inputs must do more to get the same result than a mechanism specialized to a particular input. A redundant mechanism has more parts than an equivalent nonredundant mechanism. A degenerate mechanism appears even more extravagant. And a mechanism that explores by generate-and-test methods can easily get into an infeasible exponential search. Yet these are key ingredients in evolvable systems. Perhaps to make truly robust systems we must be willing to pay for what appears to be a rather elaborate and expensive infrastructure.
Part of the problem is that we are thinking about cost in the wrong terms. Use of time and space matters, but our intuition about where those costs come from is poor. Every engineer knows that evaluating the real performance of a system involves extensive and careful measurements that often show that the cost is in surprising places. As complexity increases, this will only get harder. But we persist in doing premature optimization at all levels of our programs without knowing its real value.
Suppose we separate the parts of a system that have to be fast from the parts that have to be smart. Under this policy, the cost of generality and evolvability can be confined to the parts that have to be smart. This is an unusual perspective in computing systems, yet it is ubiquitous in our life experience. When we try to learn a new skill, for example to play a musical instrument, the initial stages involve conscious activity to connect the intended effect to the physical movements required to produce it. But as the skill is mastered, most of the work is done without conscious attention.
This is essential to being able to play at speed, because the conscious activity is too slow.
A similar argument is found in the distinction between hardware and software. Hardware is designed for efficiency, at the cost of having a fixed interface. One can then build software on top of that interface—in effect creating a virtual machine—using software. That extra layer of abstraction incurs a well-known cost, but the tradeoff is well worth the generality that is gained. (Otherwise we'd still be programming in assembly language!) The point here is that this layered structure provides a way to have both efficiency and flexibility. We believe that requiring an entire system to be implemented in the most efficient possible way is counterproductive, preventing the flexibility for adapting to future needs. The real cost of a system is the time spent by programmers in designing, understanding, maintaining, modifying, and debugging the system. So the value of enhanced adaptability may be even more extreme. A system that is easily adapted and maintained eliminates one of the largest costs: teaching new programmers how the existing system works, in all its gory detail, so that they know where to reach in and modify the code. Indeed, the cost of our brittle infrastructure probably greatly exceeds the cost of flexible design, both in the cost of disasters and in the lost opportunity costs due to the time of redesign and rebuilding. And if a significant fraction of the time spent reprogramming a system for a new requirement is replaced by having that system adapt itself to the new situation, that can be an even bigger win.
### **The problem with correctness**
> To the optimist, the glass is half full. To the pessimist, the glass is half empty. To the engineer, the glass is twice as big as it needs to be.
>
> *author unknown*
But there may be an even bigger cost to building systems in a way that gives them a range of applicability greater than the set of situations that we have considered at design time. Because we intend to be willing to apply our systems in contexts for which they were not designed, we cannot be sure that they work correctly!
In computer science we are taught that the "correctness" of software is paramount, and that correctness is to be achieved by establishing formal specification of components and systems of components and by providing proofs that the specifications of a combination of components are met by the specifications of the components and the pattern by which they are combined. [19](#page-23-0) We assert that this discipline makes systems more brittle. In fact, to make truly robust systems we must discard such a tight discipline.
<span id="page-19-0"></span>The problem with requiring proofs is that it is usually harder to prove general properties of general mechanisms than it is to prove special properties of special mechanisms used in constrained circumstances. This encourages us to make our parts and combinations as special as possible so we can simplify our proofs. But the combination of tightly specialized parts is brittle—there is no room for variation! [20](#page-23-1)
<span id="page-19-2"></span><span id="page-19-1"></span>We are not arguing against proofs. They are wonderful when available. Indeed, they are essential for critical system components, such as garbage collectors (or ribosomes). [21](#page-23-2) However, even for safety-critical systems, such as autopilots, the restriction of applicability to situations for which the system is provably correct as specified may actually contribute to unnecessary failure. Indeed, we want an autopilot to make a good-faith attempt to safely fly an airplane that is damaged in a way not anticipated by the designer!
We are arguing against the discipline of *requiring* proofs: the requirement that everything must be proved to be applicable in a situation before it is allowed to be used in that situation excessively inhibits the use of techniques that could enhance the robustness of designs. This is especially true of techniques that allow a method to be used, on a tight leash, outside of its proven domain, and
techniques that provide for future expansion without putting limits on the ways things can be extended.
Unfortunately, many of the techniques we advocate make the problem of proof much more difficult, if not practically impossible. On the other hand, sometimes the best way to attack a problem is to generalize it until the proof becomes simple.
- <span id="page-20-0"></span>[1](#page-0-0) The discovery of the existence of universal machines by Alan Turing [124], and the fact that the set of functions that can be computed by Turing machines is equivalent to both the set of functions representable in Alonzo Church's *λ* calculus [17, 18, 16] and the general recursive functions of Kurt Gödel [45] and Jacques Herbrand [55], ranks among the greatest intellectual achievements of the twentieth century.
- <span id="page-20-1"></span>[2](#page-0-1) Of course, there are some wonderful exceptions. For example, Emacs [113] is an extensible editor that has evolved gracefully to adapt to changes in the computing environment and to changes in its users' expectations. The computing world is just beginning to explore "engineered frameworks," for example, Microsoft's .net and Sun's Java. These are intended to be infrastructures to support evolvable systems.
- <span id="page-20-2"></span>[3](#page-5-0) A *parti* (pronounced parTEE) is the central idea of an architectural work: it is "the [architectural] composition being conceived as a whole, with the detail being filled in later." [62]
- <span id="page-20-3"></span>[4](#page-6-0) Java *does* support interfaces, which could be considered a kind of *parti*, in that they are an abstract representation of the program. But a *parti* combines both abstract and concrete components, while a Java interface is wholly abstract. Not to mention that over-use of interfaces is considered a "code smell" by many programmers.
- <span id="page-21-0"></span>[5](#page-7-0) *The TTL Data Book for Design Engineers* [123] is a classic example of a successful set of specifications for digital-system components. TTL specifies several internally consistent "families" of small-scale and medium-scale integrated-circuit components. The families differ in such characteristics as speed and power dissipation, but not in function. The specification describes the static and dynamic characteristics of each family, the functions available in each family, and the physical packaging for the components. The families are cross-consistent as well as internally consistent in that each function is available in each family, with the same packaging and a consistent nomenclature for description. Thus a designer may design a compound function and later choose the family for implementation. Every good engineer (and biologist!) should be familiar with the lessons of TTL.
- <span id="page-21-1"></span>[6](#page-8-0) An elementary version of this self-configuring behavior has been demonstrated by Jacob Beal in his S.M. thesis [9].
- <span id="page-21-2"></span>[7](#page-8-1) Kirschner and Gerhart examine this [70].
- <span id="page-21-3"></span>[8](#page-9-1) This is a very vague description of a complex process involving gradients of morphogens. We do not intend to get more precise here, as this is not about biology, but rather about how biology can inform engineering.
- <span id="page-21-4"></span>[9](#page-9-2) We have investigated some of the programming issues involved in this kind of development in our Amorphous Computing project [2].
- <span id="page-21-5"></span>[10](#page-9-3) Automagically: "Automatically, but in a way which, for some reason (typically because it is too complicated, or too ugly, or perhaps even too trivial), the speaker doesn't feel like explaining." From *The Hacker's Dictionary* [117, 101]
- <span id="page-21-6"></span>[11](#page-9-4) One well-studied example of this kind of mechanism is the formation of the submandibular gland of the mouse. See, for example, the treatment in [11] or the summary in [7] section 3.4.3.
- <span id="page-22-0"></span>[12](#page-11-0) Piotr Mitros has developed a novel design strategy for building analog circuits from potentially universal building blocks. See [92].
- <span id="page-22-1"></span>[13](#page-12-0) Although clear in extreme cases, the distinction biologists make between redundancy and degeneracy is fuzzy at the boundary. For more information see [32].
- <span id="page-22-2"></span>[14](#page-12-1) Some computer scientists have used simulation to investigate the evolution of evolvability [3].
- <span id="page-22-3"></span>[15](#page-13-0) Indeed, one often hears arguments against building degeneracy into an engineered system. For example, in the philosophy of the computer language Python it is claimed: "There should be one—and preferably only one—obvious way to do it." [95]
- <span id="page-22-4"></span>[16](#page-14-2) This thesis is nicely explored in the book of Kirschner and Gerhart [70].
- <span id="page-22-5"></span>[17](#page-15-0) This expense can be greatly reduced if there is sufficient information present to quickly reduce the number of candidates that must be tested. We will examine a very nice example of this optimization in chapter 7.
- <span id="page-22-6"></span>[18](#page-16-0) The cornea of a chick and the cornea of a mouse are almost identical, but the morphogenesis of these two are not at all similar: the order of the morphogenetic events is not even the same. Bard [7] section 3.6.1 reports that having divergent methods of forming the same structures in different species is common. He quotes a number of examples. One spectacular case is that the frog *Gastrotheca riobambae* (see del Pino and Elinson [28]) develops ordinary frog morphology from an embryonic disk, whereas other frogs develop from an approximately spherical embryo.
- <span id="page-23-0"></span>[19](#page-19-0) It is hard, and perhaps impossible, to specify a complex system. As noted on page 7, it is easy to specify that a chess player must play legal chess, but how would we specify that it plays well? And unlike chess, whose rules do not change, the specifications of most systems are dynamically changing as the conditions of their usage change. How do we specify an accounting system in the light of rapidly changing tax codes?
- <span id="page-23-1"></span>[20](#page-19-1) Indeed, Postel's Law (on page 3) is directly in opposition to the practice of building systems from precisely and narrowly specified parts: Postel's law instructs us to make each part more generally applicable than absolutely necessary for any particular application.
- <span id="page-23-2"></span>[21](#page-19-2) A subtle bug in a primitive storage management subsystem, like a garbage collector, is extremely difficult to debug—especially in a system with concurrent processes! But if we keep such subsystems simple and small they can be specified and even proved "correct" with a tractable amount of work.

# **Variations on an Arithmetic Theme**
In this chapter we introduce the extremely powerful but potentially dangerous flexibility technique of predicate-dispatched *generic procedures*. We start out in the relatively calm waters of arithmetic, modulating the meanings of the operator symbols.
We first generalize arithmetic to deal with symbolic algebraic expressions, and then to functions. We use a combinator system where the elements being combined are packages of arithmetic operations.
But soon we want even more flexibility. So we invent dynamically extensible generic procedures, where the applicability of a handler is determined by predicates on the supplied arguments. This is very powerful and great fun. Using generic procedures to extend the arithmetic to operate on "differential objects," we get automatic differentiation with very little work!
Predicate dispatch is pretty expensive, so we investigate ways to ameliorate that expense. In the process we invent a kind of tagged data, where a tag is just a way of memoizing the value of a predicate. To finish the chapter we demonstrate the power of generic procedures with the design of a simple, but easy to elaborate, adventure game.
# **3.1 Combining arithmetics**
Suppose we have a program that computes some useful numerical results. It depends on the meanings of the arithmetic operators that
are referenced by the program text. These operators can be extended to work on things other than the numbers that were expected by the program. With these extensions the program may do useful things that were not anticipated when the program was written. A common pattern is a program that takes numerical weights and other arguments and makes a linear combination by adding up the weighted arguments. If the addition and multiplication operators are extended to operate on tuples of numbers as well as on the original numbers, the program can make linear combinations of vectors. This kind of extension can work because the set of arithmetic operators is a well-specified and coherent entity. Extensions of numerical programs with more powerful arithmetic can work, unless the new quantities do not obey the constraints that were assumed by the author of the program. For example, multiplication of matrices does not commute, so extension of a numerical program that depends on the fact that multiplication of numbers is commutative will not work. We will ignore this problem for now.
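The weighted-sum pattern can be seen outside Scheme too. Here is a hypothetical Python sketch (not from the book) in which duck typing plays the role of operator extension: the same linear-combination routine works unchanged for any values that support + and *.

```python
def linear_combination(weights, things):
    # Works for any objects supporting + and *: integers,
    # floats, complex numbers standing in for 2-vectors, ...
    total = weights[0] * things[0]
    for w, x in zip(weights[1:], things[1:]):
        total = total + w * x
    return total

print(linear_combination([2, 3], [10, 20]))      # plain numbers
print(linear_combination([2, 3], [1 + 2j, 4j]))  # "vectors" (complex)
```

The program was written with numbers in mind, yet the complex-number call works because the operators it uses were (implicitly) extended.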
#### **3.1.1 A simple ODE integrator**
A differential equation is a description of how the state of a system changes as an independent variable is varied; this is called the *evolution* of the system's state. <sup>1</sup> We can approximate the evolution of a system's state by sampling the independent variable at various points and approximating the state change at each sample point. This process of approximation is called *numerical integration*.
Let's investigate the generality of numerical operations in a numerical integrator for second-order ordinary differential equations. We will use an integrator that samples its independent variable at uniform intervals, each of which is called a *step*. Consider this equation:
$$D^2x(t) = F(t, x(t))$$
(3.1)
The essential idea is that a discrete approximation to the second derivative of the unknown function is a linear combination of second derivatives of some previous steps. The particular coefficients are chosen by numerical analysis and are not of interest here.
$$\frac{x(t+h) - 2x(t) + x(t-h)}{h^2} = \sum_{j=0}^{k} A(j)F(t-jh, x(t-jh))$$
(3.2)
where *h* is the step size and *A* is the array of magic coefficients. For example, Störmer's integrator of order 2 is
$$x(t+h) - 2x(t) + x(t-h)$$
$$= \frac{h^2}{12} (13F(t, x(t)) - 2F(t-h, x(t-h)) + F(t-2h, x(t-2h)))$$
(3.3)
To use this to compute the future of *x* we write a program. The procedure returned by stormer-2 is an integrator for a given function and step size that, given a history of values of *x*, produces an estimate of the value of *x* at the next time, *x*(*t* + *h*). The procedures t and x extract previous times and values of *x* from the history: (x 0 history) returns *x*(*t*), (x 1 history) returns *x*(*t* − *h*), and (x 2 history) returns *x*(*t* − 2*h*). We access the time of a step from a history similarly: (t 1 history) returns *t* − *h*.
```
(define (stormer-2 F h)
(lambda (history)
(+ (* 2 (x 0 history))
(* -1 (x 1 history))
(* (/ (expt h 2) 12)
(+ (* 13 (F (t 0 history) (x 0 history)))
(* -2 (F (t 1 history) (x 1 history)))
(F (t 2 history) (x 2 history)))))))
```
The procedure returned by stepper takes a history and returns a new history advanced by *h* for the given integrator.
```
(define (stepper h integrator)
(lambda (history)
(extend-history (+ (t 0 history) h)
(integrator history)
history)))
```
The procedure stepper is used in the procedure evolver to produce a procedure step that will advance a history by one step. The step procedure is used in the procedure evolve that advances the history by a given number of steps of size *h*. We explicitly use specialized integer arithmetic here (the procedures named n:> and n:-) for counting steps. This will allow us to use different types of arithmetic for everything else without affecting simple counting. 2
```
(define (evolver F h make-integrator)
(let ((integrator (make-integrator F h)))
(let ((step (stepper h integrator)))
(define (evolve history n-steps)
(if (n:> n-steps 0)
(evolve (step history) (n:- n-steps 1))
history))
evolve)))
```
A second-order differential equation like equation 3.1 generally needs two initial conditions to determine a unique trajectory: *x*(*t*<sub>0</sub>) and *Dx*(*t*<sub>0</sub>) are sufficient to get *x*(*t*) for all *t*. But the Störmer multistep integrator we are using requires three history values, *x*(*t*<sub>0</sub>), *x*(*t*<sub>0</sub> − *h*), and *x*(*t*<sub>0</sub> − 2*h*), to compute the next value *x*(*t*<sub>0</sub> + *h*). So to evolve the trajectory with this integrator we must start with an initial history that has three past values of *x*.
Consider the very simple differential equation:
$$D^2x(t) + x(t) = 0$$
In the form shown in equation 3.1 the right-hand side is:
```
(define (F t x) (- x))
```
Because all the solutions of this equation are linear combinations of sinusoids, we can get the simple sine function by initializing the history with three values of the sine function:
```
(define numeric-s0
(make-initial-history 0 .01 (sin 0) (sin -.01) (sin -.02)))
```
where the procedure make-initial-history takes the following arguments:
```
(make-initial-history t h x(t) x(t-h) x(t-2h))
```
Using Scheme's built-in arithmetic, after 100 steps of size *h* = .01 we get a good approximation to sin(1):
```
(x 0 ((evolver F .01 stormer-2) numeric-s0 100))
.8414709493275624
(sin 1)
.8414709848078965
```
#### **3.1.2 Modulating arithmetic operators**
Let's consider the possibility of modulating what is meant by addition, multiplication, etc., for new data types unimagined by our example's programmer. Suppose we change our arithmetic operators to operate on and produce symbolic expressions rather than numerical values. This can be useful in debugging purely numerical calculations, because if we supply symbolic arguments we can examine the resulting symbolic expressions to make sure that the program is calculating what we intend it to. This can also be the basis of a partial evaluator for optimization of numerical programs.
Here is one way to accomplish this goal. We introduce the idea of an *arithmetic package*. An arithmetic package, or just *arithmetic*, is a map from operator names to their operations (implementations). We can install an arithmetic in the user's read-eval-print environment to replace the default bindings of the operators named in the arithmetic with the arithmetic's implementations.
The procedure make-arithmetic-1 generates a new arithmetic package. It takes a name for the new arithmetic, and an operation-generator procedure that, given an operator name, constructs an *operation*, here a handler procedure, for that operator. The procedure make-arithmetic-1 calls the operation-generator procedure with each arithmetic operator, accumulating the results into a new arithmetic package. For symbolic arithmetic, the operation is implemented as a procedure that creates a symbolic expression by consing the operator name onto the list of its arguments.
```
(define symbolic-arithmetic-1
(make-arithmetic-1 'symbolic
(lambda (operator)
(lambda args (cons operator args)))))
```
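The same idea can be sketched in Python (hypothetical code, names invented here): an arithmetic package is just a map from operator names to handlers, and the symbolic handlers build expression trees, using tuples where the Scheme uses lists.

```python
def make_arithmetic_1(name, operation_generator):
    # an arithmetic package: operator name -> handler procedure
    return {op: operation_generator(op) for op in ('+', '-', '*', '/')}

symbolic_arithmetic_1 = make_arithmetic_1(
    'symbolic',
    lambda operator: (lambda *args: (operator,) + args))

print(symbolic_arithmetic_1['+']('a', 'b'))   # ('+', 'a', 'b')
```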
To use this newly defined arithmetic, we install it. This redefines the arithmetic operators to use this arithmetic: 3
```
(install-arithmetic! symbolic-arithmetic-1)
```
install-arithmetic! changes the values of the user's global variables that are the names of the arithmetic operators defined in the arithmetic to their values in that arithmetic. For example, after this install:
```
(+ 'a 'b)
(+ a b)
(+ 1 2)
(+ 1 2)
```
Now we can observe the result of taking one step of the Störmer evolution: 4 5
```
(pp (x 0
((evolver F 'h stormer-2)
(make-initial-history 't 'h 'xt 'xt-h 'xt-2h)
1)))
(+ (+ (* 2 xt) (* -1 xt-h))
(* (/ (expt h 2) 12)
```
```
(+ (+ (* 13 (negate xt)) (* -2 (negate xt-h)))
(negate xt-2h))))
```
We could easily produce simplified expressions by replacing the cons in symbolic-arithmetic-1 with an algebraic simplifier, and then we would have a symbolic manipulator. (We will explore algebraic simplification in section 4.2.)
This transformation was ridiculously easy, and yet our original design didn't make any provisions for symbolic computation. We could just as easily add support for vector arithmetic, matrix arithmetic, etc.
#### **Problems with redefining operators**
The ability to redefine operators *after the fact* gives both extreme flexibility and ways to make whole new classes of bugs! (We anticipated such a problem in the evolver procedure and avoided it by using the special arithmetic operators n:> and n:- for counting steps.)
There are more subtle problems. A program that depends on the exactness of operations on integers may not work correctly for inexact floating-point numbers. This is exactly the risk that comes with the evolution of biological or technological systems— some mutations will be fatal! On the other hand, some mutations will be extremely valuable. But that risk must be balanced against the cost of narrow and brittle construction.
Indeed, it is probably impossible to prove very much about a program when the primitive procedures can be redefined, except that it will work when restricted to the types it was defined for. This is an easy but dangerous path for generalization.
## **3.1.3 Combining arithmetics**
The symbolic arithmetic cannot do numerical calculation, so we have broken our integration example by replacing the operator definitions. We really want an operator's action to depend on its arguments: for example, numerical addition for (+ 1 2) but
building a list for (+ 'a 'b). Thus the arithmetic packages must be able to determine which handler is appropriate for the arguments tendered.
#### **An improved arithmetic abstraction**
By annotating each operation with an *applicability specification*, often shortened to just an *applicability*, we can combine different kinds of arithmetic. For example, we can combine symbolic and numeric arithmetic so that a combined operation can determine which implementation is appropriate for its arguments.
An applicability specification is just a list of *cases*, each of which is a list of predicates, such as number? or symbolic?. A procedure is deemed applicable to a sequence of arguments if the arguments satisfy one of the cases—that is, if each predicate in the case is true of the corresponding argument. For example, for binary arithmetic operators, we would like the numeric operations to be applicable in just the case (number? number?) and the symbolic operations to be applicable in these cases: ((number? symbolic?) (symbolic? number?) (symbolic? symbolic?)).
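The applicability check itself is simple. Here is a hypothetical Python sketch of it, with ad hoc number? and symbolic? predicates standing in for the book's:

```python
def is_applicable(applicability, args):
    # applicability: a list of cases; each case is a list of
    # one-argument predicates, one per argument position
    return any(len(case) == len(args) and
               all(pred(arg) for pred, arg in zip(case, args))
               for case in applicability)

def is_number(x): return isinstance(x, (int, float))
def is_symbolic(x): return isinstance(x, (str, tuple))

numeric_cases = [[is_number, is_number]]
symbolic_cases = [[is_number, is_symbolic],
                  [is_symbolic, is_number],
                  [is_symbolic, is_symbolic]]

print(is_applicable(numeric_cases, [1, 2]))      # True
print(is_applicable(symbolic_cases, ['a', 2]))   # True
```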
We use make-operation to make an operation that includes an applicability for the handler procedure, like this:
```
(define (make-operation operator applicability procedure)
(list 'operation operator applicability procedure))
```
It is then possible to get the applicability for an operation:
```
(define (operation-applicability operation)
(caddr operation))
```
We introduce an abstraction for writing applicability information for an operation. The procedure all-args takes two arguments, the first being the number of arguments that the operation accepts (its *arity*, as on page 26), and the second being a predicate that must be true of each argument. It returns an applicability specification that can be used to determine if the operation is applicable to the
arguments supplied to it. In a numeric arithmetic, each operation takes numbers for each of its arguments.
Using all-args we can implement an operation constructor for the simplest operations:
```
(define (simple-operation operator predicate procedure)
(make-operation operator
(all-args (operator-arity operator)
predicate)
procedure))
```
We will also find it useful to have a *domain predicate* that is true for the objects (such as functions or matrices) that a given arithmetic's operations take as arguments—for example, number? for numeric arithmetic. To support this more elaborate idea we will create a constructor make-arithmetic for arithmetic packages. The procedure make-arithmetic is like make-arithmetic-1 (see page 71) but has additional arguments.
```
(make-arithmetic name
domain-predicate
base-arithmetic-packages
map-of-constant-name-to-constant
map-of-operator-name-to-operation)
```
An arithmetic package produced by make-arithmetic has a name that is useful for debugging. It has the domain predicate noted above. It has a list of arithmetic packages, called the *bases*, that the new arithmetic will be built from. In addition, the arithmetic will contain a set of named constants, and a set of operators along with their corresponding operations. The final two arguments are used to generate these sets.
An example of the use of a base arithmetic is vectors. A vector is represented as an ordered sequence of coordinates: consequently an arithmetic on vectors is defined in terms of arithmetic on its coordinates. So the base arithmetic for a vector arithmetic is the appropriate arithmetic for the vector's coordinates. A vector arithmetic with numeric coordinates will use a numeric arithmetic as its base, while a vector arithmetic with symbolic coordinates will
use a symbolic arithmetic as its base. For brevity, we often use the term "over" to specify the base, as in "vectors over numbers" or "vectors over symbols."
The base arithmetics also determine the constants and operators that the derived arithmetic will define. The defined constants will be the union of the constants defined by the bases, and the defined operators will be the union of their operators. If there are no bases, then standard sets of constant and operator names will be defined.
Using these new capabilities, we can define a numeric arithmetic with applicability information. Since numeric arithmetic is built on the Scheme substrate, the appropriate handler for the operator for Scheme number arguments is just the value of the operator symbol for the Scheme implementation. Also, certain symbols, such as the identity constants for addition and multiplication, are specially mapped.
```
(define numeric-arithmetic
(make-arithmetic 'numeric number? '()
(lambda (name) ;constant generator
(case name
((additive-identity) 0)
((multiplicative-identity) 1)
(else (default-object))))
(lambda (operator) ;operation generator
(simple-operation operator number?
(get-implementation-value
(operator->procedure-name operator))))))
```
The last two lines of this code find the procedure defined by the Scheme implementation that is named by the operator. 6
We can similarly write the symbolic-extender constructor to construct a symbolic arithmetic based on a given arithmetic.
```
(define (symbolic-extender base-arithmetic)
(make-arithmetic 'symbolic symbolic? (list base-arithmetic)
(lambda (name base-constant) ;constant generator
base-constant)
(let ((base-predicate
(arithmetic-domain-predicate base-arithmetic)))
(lambda (operator base-operation) ;operation generator
(make-operation operator
```
```
(any-arg (operator-arity operator)
symbolic?
base-predicate)
(lambda args
(cons operator args)))))))
```
One difference between this and the numeric arithmetic is that the symbolic arithmetic is applicable whenever *any* argument is a symbolic expression. <sup>7</sup> This is indicated by the use of any-arg rather than all-args; any-arg matches if at least one of the arguments satisfies the predicate passed as the second argument, and all the other arguments satisfy the predicate passed as the third argument. 8 Also notice that this symbolic arithmetic is based on a provided base-arithmetic, which will allow us to build a variety of such arithmetics.
Applicability specifications are not used as guards on the handlers: they do not prevent the application of a handler to the wrong arguments. The applicability specifications are used only to distinguish among the possible operations for an operator when arithmetics are combined, as explained below.
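Plausible Python versions of all-args and any-arg (sketches under the representation used above, where an applicability is a list of predicate lists) might look like this:

```python
def all_args(arity, predicate):
    # a single case: every argument must satisfy predicate
    return [[predicate] * arity]

def any_arg(arity, predicate, base_predicate):
    # every case in which at least one slot satisfies predicate
    # and each remaining slot satisfies base_predicate
    return [[predicate if (bits >> i) & 1 else base_predicate
             for i in range(arity)]
            for bits in range(1, 2 ** arity)]

print(len(all_args(2, callable)))          # 1 case
print(len(any_arg(2, callable, callable))) # 3 cases
```

For a binary operator, any-arg yields exactly the three mixed cases listed earlier for symbolic arithmetic.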
#### **A combinator for arithmetics**
The symbolic and numeric arithmetics are of the same shape, by construction. The symbolic-extender procedure produces an arithmetic with the same operators as the base arithmetic it is given. Making a combinator language for building composite arithmetics from parts might be a good approach.
The procedure add-arithmetics, below, is a combinator for arithmetics. It makes a new arithmetic whose domain predicate is the disjunction of the given arithmetics' domain predicates, and each of whose operators is mapped to the union of the operations for the given arithmetics. 9
```
(define (add-arithmetics . arithmetics)
(add-arithmetics* arithmetics))
(define (add-arithmetics* arithmetics)
```
```
(if (n:null? (cdr arithmetics))
(car arithmetics) ;only one arithmetic
(make-arithmetic 'add
(disjoin*
(map arithmetic-domain-predicate
arithmetics))
arithmetics
constant-union
operation-union)))
```
The third argument to make-arithmetic is a list of the arithmetic packages being combined. The arithmetic packages must be compatible in that they specify operations for the same named operators. The fourth argument is constant-union, which combines multiple constants. Here this selects one of the argument constants for use in the combined arithmetic; later we will elaborate on this. 10
```
(define (constant-union name . constants)
(let ((unique
(remove default-object?
(delete-duplicates constants eqv?))))
(if (n:pair? unique)
(car unique)
(default-object))))
```
The last argument is operation-union, which constructs the operation for the named operator in the resulting arithmetic. An operation is applicable if it is applicable in any of the arithmetics that were combined.
```
(define (operation-union operator . operations)
(operation-union* operator operations))
(define (operation-union* operator operations)
(make-operation operator
(applicability-union*
(map operation-applicability operations))
(lambda args
(operation-union-dispatch operator
operations
args))))
```
The procedure operation-union-dispatch must determine the operation to use based on the arguments supplied. It chooses the operation from the given arithmetics that is appropriate to the given arguments and applies it to the arguments. If more than one of the given arithmetics has an applicable operation, the operation from the first arithmetic in the arguments to add-arithmetics is chosen.
```
(define (operation-union-dispatch operator operations args)
(let ((operation
(find (lambda (operation)
(is-operation-applicable? operation args))
operations)))
(if (not operation)
(error "Inapplicable operation:" operator args))
(apply-operation operation args)))
```
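The dispatch machinery can be sketched compactly in Python (hypothetical code; the pair-of-lists representation of an operation is invented here). Handlers are tried in order, so the first arithmetic passed to the union wins ties, as in the text.

```python
def is_applicable(cases, args):
    return any(len(case) == len(args) and
               all(pred(arg) for pred, arg in zip(case, args))
               for case in cases)

def operation_union(operations):
    # operations: (applicability, handler) pairs, tried in order
    def dispatch(*args):
        for cases, handler in operations:
            if is_applicable(cases, args):
                return handler(*args)
        raise TypeError('Inapplicable operation: %r' % (args,))
    return dispatch

def is_num(x): return isinstance(x, (int, float))
def is_sym(x): return isinstance(x, (str, tuple))

add = operation_union([
    ([[is_num, is_num]], lambda a, b: a + b),
    ([[is_num, is_sym], [is_sym, is_num], [is_sym, is_sym]],
     lambda a, b: ('+', a, b)),
])
print(add(1, 2))     # 3
print(add(1, 'a'))   # ('+', 1, 'a')
```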
A common pattern is to combine a base arithmetic with an extender on that arithmetic. The combination of numeric arithmetic and a symbolic arithmetic built on numeric arithmetic is such a case. So we provide an abstraction for that pattern:
```
(define (extend-arithmetic extender base-arithmetic)
(add-arithmetics base-arithmetic
(extender base-arithmetic)))
```
We can use extend-arithmetic to combine the numeric arithmetic and the symbolic arithmetic. Since the applicability cases are disjoint—all numbers for numeric arithmetic and at least one symbolic expression for symbolic arithmetic—the order of arguments to add-arithmetics is irrelevant here, except for possible performance issues.
```
(define combined-arithmetic
(extend-arithmetic symbolic-extender numeric-arithmetic))
(install-arithmetic! combined-arithmetic)
```
Let's try the composite arithmetic:
```
(+ 1 2)
3
```
```
(+ 1 'a)
(+ 1 a)
(+ 'a 2)
(+ a 2)
(+ 'a 'b)
(+ a b)
```
The integrator still works numerically (compare page 70):
```
(define numeric-s0
(make-initial-history 0 .01 (sin 0) (sin -.01) (sin -.02)))
(x 0 ((evolver F .01 stormer-2) numeric-s0 100))
.8414709493275624
```
It works symbolically (compare page 72):
```
(pp (x 0
((evolver F 'h stormer-2)
(make-initial-history 't 'h 'xt 'xt-h 'xt-2h)
1)))
(+ (+ (* 2 xt) (* -1 xt-h))
(* (/ (expt h 2) 12)
(+ (+ (* 13 (negate xt)) (* -2 (negate xt-h)))
(negate xt-2h))))
```
And it works in combination, with numeric history but symbolic step size h:
```
(pp (x 0 ((evolver F 'h stormer-2) numeric-s0 1)))
(+ 9.999833334166664e-3
(* (/ (expt h 2) 12)
-9.999750002487318e-7))
```
Notice the power here. We have combined code that can do symbolic arithmetic and code that can do numeric arithmetic. We have created a system that can do arithmetic that depends on both abilities. This is not just the union of the two abilities— it is the cooperation of two mechanisms to solve a problem that neither could solve by itself.
#### **3.1.4 Arithmetic on functions**
Traditional mathematics extends arithmetic on numerical quantities to many other kinds of objects. Over the centuries "arithmetic" has been extended to complex numbers, vectors, linear transformations and their representations as matrices, etc. One particularly revealing extension is to functions. We can combine functions of the same type using arithmetic operators:
```
(f + g)(x) = f(x) + g(x)
(f - g)(x) = f(x) - g(x)
(fg)(x) = f(x)g(x)
(f/g)(x) = f(x)/g(x)
```
The functions that are combined must have the same domain and codomain, and an arithmetic must be defined on the codomain.
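The pointwise lifting can be sketched in one line of Python (a hypothetical illustration, not the book's mechanism): a binary codomain operation becomes a binary operation on functions.

```python
import math

def pointwise(op):
    # lift a binary codomain operation to functions:
    # (f op g)(x) = op(f(x), g(x))
    return lambda f, g: (lambda *args: op(f(*args), g(*args)))

fn_add = pointwise(lambda a, b: a + b)
cos_plus_sin = fn_add(math.cos, math.sin)
print(cos_plus_sin(3))   # same as math.cos(3) + math.sin(3)
```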
The extension to functions is not hard. Given an arithmetic package for the codomain of the functions that we wish to combine, we can make an arithmetic package that implements the function arithmetic, assuming that functions are implemented as procedures.
```
(define (pure-function-extender codomain-arithmetic)
(make-arithmetic 'pure-function function?
(list codomain-arithmetic)
(lambda (name codomain-constant) ; *** see below
(lambda args codomain-constant))
(lambda (operator codomain-operation)
(simple-operation operator function?
(lambda functions
(lambda args
(apply-operation codomain-operation
(map (lambda (function)
(apply function args))
functions))))))))
```
Notice that the constant generator (with comment \*\*\*) must produce a constant function for each codomain constant. For example, the additive identity for functions must be the function of any number of arguments that returns the codomain additive
identity. Combining a functional arithmetic with the arithmetic that operates on the codomains makes a useful package:
```
(install-arithmetic!
(extend-arithmetic pure-function-extender
numeric-arithmetic))
((+ cos sin) 3)
-.8488724885405782
(+ (cos 3) (sin 3))
-.8488724885405782
```
By building on combined-arithmetic we can get more interesting results:
```
(install-arithmetic!
(extend-arithmetic pure-function-extender
combined-arithmetic))
((+ cos sin) 3)
-.8488724885405782
((+ cos sin) 'a)
(+ (cos a) (sin a))
(* 'b ((+ cos sin) (+ (+ 1 2) 'a)))
(* b (+ (cos (+ 3 a)) (sin (+ 3 a))))
```
The mathematical tradition also allows one to mix numerical quantities with functions by treating the numerical quantities as constant functions of the same type as the functions they will be combined with.
$$(f+1)(x) = f(x) + 1 \tag{3.4}$$
We can implement the coercion of numerical quantities to constant functions quite easily, by minor modifications to the procedure pure-function-extender:
```
(define (function-extender codomain-arithmetic)
(let ((codomain-predicate
(arithmetic-domain-predicate codomain-arithmetic)))
(make-arithmetic 'function
(disjoin codomain-predicate function?)
(list codomain-arithmetic)
(lambda (name codomain-constant)
codomain-constant)
(lambda (operator codomain-operation)
(make-operation operator
(any-arg (operator-arity operator)
function?
codomain-predicate)
(lambda things
(lambda args
(apply-operation codomain-operation
(map (lambda (thing)
;; here is the coercion:
(if (function? thing)
(apply thing args)
thing))
things)))))))))
```
To allow the coercion of codomain quantities, such as numbers, to constant functions, the domain of the new function arithmetic must contain both the functions and the elements of the codomain of the functions (the possible values of the functions). The operator implementation is applicable if any of the arguments is a function; and functions are applied to the arguments that are given. Note that the constant generator for the make-arithmetic doesn't need to rewrite the codomain constants as functions, since the constants can now be used directly.
With this version we can
```
(install-arithmetic!
(extend-arithmetic function-extender combined-arithmetic))
((+ 1 cos) 'a)
(+ 1 (cos a))
(* 'b ((+ 4 cos sin) (+ (+ 1 2) 'a)))
(* b (+ 4 (cos (+ 3 a)) (sin (+ 3 a))))
```
This raises an interesting problem: we have symbols, such as a and b, that represent literal numbers, but nothing to represent literal functions. For example, if we write
```
(* 'b ((+ 'c cos sin) (+ 3 'a)))
```
our arithmetic will treat c as a literal number. But we might wish to have c be a literal function that combines as a function. It's difficult to do this with our current design, because c carries no type information, and the context is insufficient to distinguish usages.
But we can make a literal function that has no properties except for a name. Such a function just attaches its name to the list of its arguments.
```
(define (literal-function name)
(lambda args
(cons name args)))
```
With this definition we can have a literal function c correctly combine with other functions:
```
(* 'b ((+ (literal-function 'c) cos sin) (+ (+ 1 2) 'a)))
(* b (+ (+ (c (+ 3 a)) (cos (+ 3 a))) (sin (+ 3 a))))
```
This is a narrow solution that handles a useful case.
#### **3.1.5 Problems with combinators**
The arithmetic structures we have been building up to now are an example of the use of combinators to build complex structures by combining simpler ones. But there are some serious drawbacks to building this system using combinators. First, some properties of the structure are determined by the means of combination. For example, we pointed out that add-arithmetics prioritized its arguments, such that their order can matter. Second, the layering implicit in this design, such that the codomain arithmetic must be constructed prior to the function arithmetic, means that it's impossible to augment the codomain arithmetic after the function arithmetic has been constructed. Finally, we might wish to define an arithmetic for functions that return functions. This cannot be done in a general way within this framework, without introducing another mechanism for self reference, and self reference is cumbersome to arrange.
Combinators are powerful and useful, but a system built of combinators is not very flexible. One problem is that the shapes of the parts must be worked out ahead of time: the generality that will be available depends on the detailed plan for the shapes of the parts, and there must be a localized plan for how the parts are combined. This is not a problem for a well-understood domain, such as arithmetic, but it is not appropriate for open-ended construction. In section 3.2 we will see how to add new kinds of arithmetic incrementally, without having to decide where they go in a hierarchy, and without having to change the existing parts that already work.
Another problem with combinators is that the behavior of any part of a combinator system must be independent of its context. A powerful source of flexibility that is available to a designer is to build systems that *do* depend upon their context. By varying the context of a system we can obtain variation of the behavior. This is quite dangerous, because it may be hard to predict how a variation will behave. However, carefully controlled variations can be useful.
# **Exercise 3.1: Warmup with boolean arithmetic**
In digital design the boolean operations *and*, *or*, and *not* are written with the operators \*, +, and -, respectively.
There is a Scheme predicate boolean? that is true only of #t and #f. Use this to make a boolean arithmetic package that can be combined with the arithmetics we have. Note that all other arithmetic operators are undefined for booleans, so the appropriate result of applying something like cos to a boolean is to report an error.
The following template could help get you started:
```
(define boolean-arithmetic
(make-arithmetic 'boolean boolean? '()
(lambda (name)
(case name
((additive-identity) #f)
((multiplicative-identity) #t)
(else (default-object))))
(lambda (operator)
(let ((procedure
(case operator
((+) <...>)
((-) <...>)
((*) <...>)
((negate) <...>)
(else
(lambda args
(error "Operator undefined in Boolean"
operator))))))
(simple-operation operator boolean? procedure)))))
```
In digital design the operator - is typically used only as a unary operator and is realized as negate. When an arithmetic is installed, the binary operators +, \*, -, and / are generalized to be *n*-ary operators.
The unary application (- *operand*) is transformed by the installer into (negate *operand*). Thus to make - work, you will need to define the unary boolean operation for the operator negate.
## **Exercise 3.2: Vector arithmetic**
We will make and install an arithmetic package on geometric vectors. This is a big assignment that will bring to the surface many of the difficulties and inadequacies of the system we have developed so far.
**a.** We will represent a vector as a Scheme vector of numerical quantities. The elements of a vector are coordinates relative to some Cartesian axes. There are a few issues here. Addition (and subtraction) is defined only for vectors of the same dimension, so your arithmetic must know about dimensions. First, make an arithmetic that defines only addition, negation, and subtraction of vectors over a base arithmetic of operations applicable to the coordinates of vectors. Applying any other operation to a vector should report an error. Hint: The following procedures will be helpful:
```
(define (vector-element-wise element-procedure)
(lambda vecs ; Note: this takes multiple vectors
(ensure-vector-lengths-match vecs)
(apply vector-map element-procedure vecs)))
(define (ensure-vector-lengths-match vecs)
(let ((first-vec-length (vector-length (car vecs))))
(if (any (lambda (v)
(not (n:= (vector-length v)
first-vec-length)))
vecs)
(error "Vector dimension mismatch:" vecs))))
```
The use of apply here is subtle. One way to think about it is to imagine that the language supported an ellipsis like this:
```
(define (vector-element-wise element-procedure)
(lambda (v1 v2 ...)
(vector-map element-procedure v1 v2 ...)))
```
Build the required arithmetic and show that it works for numerical vectors and for vectors with mixed numerical and symbolic coordinates.
**b.** Your vector addition required addition of the coordinates. The coordinate addition procedure could be the value of the + operator that will be made available in the user environment by install-arithmetic!, or it could be the addition operation from the base arithmetic of your vector extension. Either of these would satisfy many tests, and using the installed addition may actually be more general. Which did you use? Show how to implement the other choice. How does this choice affect your ability to make future extensions to this system? Explain your reasoning.
Hint: A nice way to control the interpretation of operators in a procedure is to provide the procedure to use for each operator as arguments to a "maker procedure" that returns the procedure needed. For example, to control the arithmetic operations used in vector-magnitude one might write:
```
(define (vector-magnitude-maker + * sqrt)
(let ((dot-product (dot-product-maker + *)))
(define (vector-magnitude v)
(sqrt (dot-product v v)))
vector-magnitude))
```
**c.** What shall we do about multiplication? First, for two vectors it is reasonable to define multiplication to be their dot product. But there is a bit of a problem here. You need to be able to use both the addition and multiplication operations, perhaps from the arithmetic on the coordinates. This is not hard to solve. Modify your vector arithmetic to define multiplication of two vectors as their dot product. Show that your dot product works.

**d.** Add vector magnitude to your vector arithmetic, extending the numerical operator magnitude to give the length of a vector. The code given above is most of the work!

**e.** Multiplication of a vector by a scalar or multiplication of a scalar by a vector should produce the scalar product (the vector with each coordinate multiplied by the scalar). So multiplication can mean either dot product or scalar product, depending on the types of its arguments. Modify your vector arithmetic to make this work. Show that your vector arithmetic can handle both dot product and scalar product. Hint: The operation-union procedure on page 78 enables a very elegant way to solve this problem.
## **Exercise 3.3: Ordering of extensions**
Consider two possible orderings for combining your vector extension (exercise 3.2) with the existing arithmetics:
```
(define vec-before-func
(extend-arithmetic
function-extender
(extend-arithmetic vector-extender combined-arithmetic)))
(define func-before-vec
(extend-arithmetic
vector-extender
(extend-arithmetic function-extender combined-arithmetic)))
```
How does the ordering of extensions affect the properties of the resulting arithmetic? The following procedure makes points on the unit circle:
```
(define (unit-circle x)
(vector (sin x) (cos x)))
```
If we execute each of the following expressions in environments resulting from installing vec-before-func and func-before-vec:
```
((magnitude unit-circle) 'a)
((magnitude (vector sin cos)) 'a)
```
The result (unsimplified) should be:
```
(sqrt (+ (* (sin a) (sin a)) (* (cos a) (cos a))))
```
However, each of these expressions fails with one of the two orderings of the extensions. Is it possible to make an arithmetic for which both evaluate correctly? Explain.
# **3.2 Extensible generic procedures**
Systems built by combinators, as in section 3.1, result in beautiful diamond-like systems. This is sometimes the right idea, and we will see it arise again, but it is very hard to add to a diamond. If a system is built as a ball of mud, it is easy to add more mud. 11
One organization for a ball of mud is a system erected on a substrate of extensible generic procedures. Modern dynamically typed programming languages, such as Lisp, Scheme, and Python, usually have built-in arithmetic that is generic over a variety of types of numerical quantities, such as integers, floats, rationals, and complex numbers [115, 64, 105]. But systems built on these languages are usually not easily extensible after the fact.
The problems we indicated in section 3.1.5 are the result of using the combinator add-arithmetics. To solve these problems we will abandon that combinator. However, the arithmetic package abstraction is still useful, as is the idea of an extender. We will build an arithmetic package in which the operations use generic procedures that can be dynamically augmented with new behavior. We can then extend the generic arithmetic and add the extensions to the generic arithmetic. 12
We will start by implementing generic procedures, which are procedures that can be dynamically extended by adding handlers after the generic procedures are defined. A generic procedure is a dispatcher combined with a set of *rules*, each of which describes a handler that is appropriate for a given set of arguments. Such a rule combines a handler with its applicability.
Let's examine how this might work, by defining a generic procedure named plus that works like addition with numeric and symbolic quantities:
```
(define plus (simple-generic-procedure 'plus 2 #f))
(define-generic-procedure-handler plus
(all-args 2 number?)
(lambda (a b) (+ a b)))
(define-generic-procedure-handler plus
(any-arg 2 symbolic? number?)
(lambda (a b) (list '+ a b)))
(plus 1 2)
3
(plus 1 'a)
(+ 1 a)
(plus 'a 2)
(+ a 2)
(plus 'a 'b)
(+ a b)
```
The procedure simple-generic-procedure takes three arguments: The first is an arbitrary name to identify the procedure when debugging; the second is the procedure's arity. The third argument is used to provide a default handler; if none is supplied (indicated by #f), then if no specific handler is applicable an error is signaled. Here plus is bound to the new generic procedure returned by simple-generic-procedure. It is a Scheme procedure that can be called with the specified number of arguments.
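The default-handler argument deserves a small illustration. The following sketch (the name plus-ish and its behavior are assumptions for illustration, not from the text) supplies a procedure instead of #f, so it runs whenever no specific handler matches:

```
;; Sketch: a generic procedure with an explicit default handler.
(define plus-ish
  (simple-generic-procedure 'plus-ish 2
    (lambda (a b) (list 'unknown-plus a b))))

(define-generic-procedure-handler plus-ish
  (all-args 2 number?)
  (lambda (a b) (+ a b)))

(plus-ish 1 2)      ; the number handler applies => 3
(plus-ish "x" "y")  ; no handler applies, so the default runs
```

With the #f default, the second call would instead signal an error.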
The procedure define-generic-procedure-handler adds a rule to an existing generic procedure. Its first argument is the generic procedure to be extended; the second argument is an applicability specification (as on page 73) for the rule being added; and the third argument is the handler for arguments that satisfy that specification.
```
(define-generic-procedure-handler generic-procedure
applicability
handler-procedure)
```
It is often necessary to specify a rule in which different arguments are of different types. For example, to make a vector arithmetic package we need to specify the interpretation of the \* operator. If both arguments are vectors, the appropriate handler computes the dot product. If one argument is a scalar and the other is a vector, then the appropriate handler scales the vector elements by the scalar. The applicability argument is the means by which this is accomplished.
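As a rough sketch of how such a rule set might look, using the plus-style interface above (the name v:* is an illustrative assumption, and the applicabilities are written directly as lists of argument-predicate lists rather than built with the applicability combinators):

```
(define v:* (simple-generic-procedure 'v:* 2 #f))

;; Both arguments vectors: dot product.
(define-generic-procedure-handler v:*
  (list (list vector? vector?))
  (lambda (u v)
    (reduce + 0 (map * (vector->list u) (vector->list v)))))

;; One scalar and one vector, in either order: scale the vector.
(define-generic-procedure-handler v:*
  (list (list number? vector?)
        (list vector? number?))
  (lambda (a b)
    (if (number? a)
        (vector-map (lambda (x) (* a x)) b)
        (vector-map (lambda (x) (* b x)) a))))
```

The second rule's applicability lists two argument patterns, so one handler covers both orders of scalar and vector.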
The simple-generic-procedure constructor we used above to make the generic procedure plus is created with the procedure generic-procedure-constructor
```
(define simple-generic-procedure
(generic-procedure-constructor make-simple-dispatch-store))
```
where make-simple-dispatch-store is a procedure that encapsulates a strategy for saving, retrieving, and choosing a handler.
The generic-procedure-constructor takes a dispatch-store constructor and produces a generic-procedure constructor that itself takes three arguments—a name that is useful in debugging, an arity, and a default handler to be used if there are no applicable handlers. If the default handler argument is #f, the default handler signals an error:
```
((generic-procedure-constructor dispatch-store-constructor)
name
arity
default-handler)
```
The reason why generic procedures are made in this way is that we will need families of generic procedures that differ in the choice of dispatch store.
In section 3.2.3, we will see one way to implement this mechanism. But first let's see how to use it.
#### **3.2.1 Generic arithmetic**
We can use this new generic-procedure mechanism to build arithmetic packages in which the operators map to operations that are implemented as generic procedures. This will allow us to make self-referential structures. For example, we might want to make a generic arithmetic that includes vector arithmetic where both the vectors and the components of a vector are manipulated by the same generic procedures. We could not build such a structure using just add-arithmetics introduced earlier.
```
(define (make-generic-arithmetic dispatch-store-maker)
(make-arithmetic 'generic any-object? '()
constant-union
(let ((make-generic-procedure
(generic-procedure-constructor
dispatch-store-maker)))
(lambda (operator)
(simple-operation operator
any-object?
(make-generic-procedure
operator
(operator-arity operator)
#f))))))
```
The make-generic-arithmetic procedure creates a new arithmetic. For each arithmetic operator, it constructs an operation that is applicable to any arguments and is implemented by a generic procedure. (The predicate any-object? is true of anything.) We can install this arithmetic in the usual way.
But first, let's define some handlers for the generic procedures. It's pretty simple to do now that we have the generic arithmetic object. For example, we can grab the operations and constants from any already-constructed arithmetic.
```
(define (add-to-generic-arithmetic! generic-arithmetic
arithmetic)
(add-generic-arith-constants! generic-arithmetic
arithmetic)
(add-generic-arith-operations! generic-arithmetic
arithmetic))
```
This takes a generic arithmetic package and an ordinary arithmetic package with the same operators. It merges constants into the generic arithmetic using constant-union. And for each operator of the given arithmetic it adds a handler to the corresponding generic procedure.
Adding a handler for a particular operator uses the standard generic procedure mechanism, extracting the necessary applicability and procedure from the arithmetic's operation.
```
(define (add-generic-arith-operations! generic-arithmetic
arithmetic)
(for-each
(lambda (operator)
(let ((generic-procedure
(simple-operation-procedure
(arithmetic-operation operator
generic-arithmetic)))
(operation
(arithmetic-operation operator arithmetic)))
(define-generic-procedure-handler
generic-procedure
(operation-applicability operation)
(operation-procedure operation))))
(arithmetic-operators arithmetic)))
```
The add-generic-arith-operations! procedure finds, for each operator in the given arithmetic, the generic procedure that must be augmented. It then defines a handler for that generic procedure that is the handler for that operator in the given arithmetic, using the applicability for that handler in the given arithmetic.
The code for adding the constants from an arithmetic to the generic arithmetic is similar. For each constant name in the generic arithmetic it finds the entry in the association of names to constant values in the generic arithmetic. It then replaces the constant value with the constant-union of the existing constant and the constant it got for that same name from the given arithmetic.
```
(define (add-generic-arith-constants! generic-arithmetic
arithmetic)
(for-each
(lambda (name)
(let ((binding
(arithmetic-constant-binding name
generic-arithmetic))
(element
(find-arithmetic-constant name arithmetic)))
(set-cdr! binding
(constant-union name
(cdr binding)
element))))
(arithmetic-constant-names generic-arithmetic)))
```
### **Fun with generic arithmetics**
We can add many arithmetics to a generic arithmetic to give it interesting behavior:
```
(let ((g
(make-generic-arithmetic make-simple-dispatch-store)))
(add-to-generic-arithmetic! g numeric-arithmetic)
(add-to-generic-arithmetic! g
(function-extender numeric-arithmetic))
(add-to-generic-arithmetic! g
(symbolic-extender numeric-arithmetic))
(install-arithmetic! g))
```
This produces a generic arithmetic that combines numeric arithmetic with symbolic arithmetic over numeric arithmetic and function arithmetic over numeric arithmetic:
```
(+ 1 3 'a 'b)
(+ (+ 4 a) b)
```
And we can even run some more complex problems, as on page 79:
```
(pp (x 0 ((evolver F 'h stormer-2) numeric-s0 1)))
(+ 9.999833334166664e-3
(* (/ (expt h 2) 12)
-9.999750002487318e-7))
```
As before, we can mix symbols and functions:
```
(* 'b ((+ cos sin) 3))
(* b -.8488724885405782)
```
but the following will signal an error, trying to add the symbolic quantities (cos a) and (sin a) as numbers:
```
(* 'b ((+ cos sin) 'a))
```
We get this error because cos and sin are numeric operators, like +. Since we have symbolic arithmetic over numeric arithmetic, these operators are extended so that for symbolic input, here a, they produce symbolic outputs, (cos a) and (sin a). We also added function arithmetic over numeric arithmetic, so if functions are numerically combined (here by +) their outputs may be combined only if the outputs are numbers. But the symbolic results cannot be added numerically. This is a consequence of the way we built the arithmetic g.
But there is magic in generic arithmetic. It can be closed: all extensions to the generic arithmetic can be made over the generic arithmetic!
```
(let ((g
(make-generic-arithmetic make-simple-dispatch-store)))
(add-to-generic-arithmetic! g numeric-arithmetic)
(extend-generic-arithmetic! g symbolic-extender)
(extend-generic-arithmetic! g function-extender)
(install-arithmetic! g))
```
Here we use a new procedure extend-generic-arithmetic! that captures a common pattern.
```
(define (extend-generic-arithmetic! generic-arithmetic
extender)
(add-to-generic-arithmetic! generic-arithmetic
(extender generic-arithmetic)))
```
Now we can use complex mixed expressions, because the functions are defined over the generic arithmetic:
```
(* 'b ((+ 'c cos sin) (+ 3 'a)))
(* b (+ (+ c (cos (+ 3 a))) (sin (+ 3 a))))
```
We can even use functions that return functions:
```
(((+ (lambda (x) (lambda (y) (cons x y)))
(lambda (x) (lambda (y) (cons y x))))
3)
4)
(+ (3 . 4) (4 . 3))
```
So perhaps we have achieved nirvana?
## **3.2.2 Construction depends on order!**
Unfortunately, there is a severe dependence on the order in which rules are added to the generic procedures. This is not surprising, because the construction of the generic procedure system is by assignment. We can see this by changing the order of construction:
```
(let ((g
(make-generic-arithmetic make-simple-dispatch-store)))
(add-to-generic-arithmetic! g numeric-arithmetic)
(extend-generic-arithmetic! g function-extender) ;*
(extend-generic-arithmetic! g symbolic-extender) ;*
(install-arithmetic! g))
```
and then we will find that the example
```
(* 'b ((+ 'c cos sin) (+ 3 'a)))
```
which worked in the previous arithmetic, fails because the symbolic arithmetic captures (+ 'c cos sin) to produce a symbolic expression, which is not a function that can be applied to (+ 3 a). The problem is that the applicability of the symbolic operation for + accepts arguments with at least one symbolic argument and other arguments from the domain predicate of the base. But the symbolic arithmetic was created over the generic arithmetic as a base, and the domain predicate of a generic arithmetic accepts anything! There is also a function operation for + that is applicable to the same arguments, but it has not been chosen because of the accidental ordering of the extensions. Unfortunately, the choice of rule is ambiguous. It would be better to not have more than one applicable operation.
One way to resolve this problem is to restrict the symbolic quantities to represent numbers. We can do this by building our generic arithmetic so that the symbolic arithmetic is over the numeric arithmetic, as we did on page 92, rather than over the entire generic arithmetic:
```
(let ((g
(make-generic-arithmetic make-simple-dispatch-store)))
(add-to-generic-arithmetic! g numeric-arithmetic)
(extend-generic-arithmetic! g function-extender)
(add-to-generic-arithmetic! g
(symbolic-extender numeric-arithmetic))
(install-arithmetic! g))
```
This works, independent of the ordering, because there is no ambiguity in the choice of rules. So now the 'c will be interpreted as a constant to be coerced to a constant function by the function extender.
```
(* 'b ((+ 'c cos sin) (+ 3 'a)))
(* b (+ (+ c (cos (+ 3 a))) (sin (+ 3 a))))
```
Unfortunately, we may want to have symbolic expressions over other quantities besides numbers. We cannot yet implement a general solution to this problem. But if we really want a literal function named c, we can use literal-function as we did earlier:
```
(* 'b ((+ (literal-function 'c) cos sin) (+ 3 'a)))
(* b (+ (+ (c (+ 3 a)) (cos (+ 3 a))) (sin (+ 3 a))))
```
This will work independent of the order of construction of the generic arithmetic.
With this mechanism we are now in a position to evaluate the Stormer integrator with a literal function:
```
(pp (x 0 ((evolver (literal-function 'F) 'h stormer-2)
(make-initial-history 't 'h 'xt 'xt-h 'xt-2h)
1)))
(+ (+ (* 2 xt) (* -1 xt-h))
(* (/ (expt h 2) 12)
(+ (+ (* 13 (f t xt))
(* -2 (f (- t h) xt-h)))
(f (- t (* 2 h)) xt-2h))))
```
This is pretty ugly, and it would be worse if we looked at the output of two integration steps. But it is interesting to look at the result of simplifying a two-step integration. Using a magic symbolic-expression simplifier we get a pretty readable expression. This can be useful for debugging a numerical process.
```
(+ (* 2 (expt h 2) (f t xt))
(* -1/4 (expt h 2) (f (+ (* -1 h) t) xt-h))
(* 1/6 (expt h 2) (f (+ (* -2 h) t) xt-2h))
(* 13/12
(expt h 2)
(f (+ h t)
(+ (* 13/12 (expt h 2) (f t xt))
(* -1/6 (expt h 2) (f (+ (* -1 h) t) xt-h))
(* 1/12 (expt h 2) (f (+ (* -2 h) t) xt-2h))
(* 2 xt)
(* -1 xt-h))))
(* 3 xt)
(* -2 xt-h))
```
For example, notice that there are only four distinct top-level calls to the acceleration function f. The second argument to the fourth top-level call uses three calls to f that have already been computed. If we eliminate common subexpressions we get:
```
(let* ((G84 (expt h 2)) (G85 (f t xt)) (G87 (* -1 h))
(G88 (+ G87 t)) (G89 (f G88 xt-h)) (G91 (* -2 h))
(G92 (+ G91 t)) (G93 (f G92 xt-2h)))
(+ (* 2 G84 G85)
(* -1/4 G84 G89)
(* 1/6 G84 G93)
(* 13/12 G84
(f (+ h t)
(+ (* 13/12 G84 G85)
(* -1/6 G84 G89)
(* 1/12 G84 G93)
(* 2 xt)
(* -1 xt-h))))
(* 3 xt)
(* -2 xt-h)))
```
Here we clearly see that there are only four distinct calls to f. Though each integration step in the basic integrator makes three calls to f, the two steps overlap on two intermediate calls. While this is obvious for such a simple example, we see how symbolic evaluation might help in understanding a numerical computation.
#### **3.2.3 Implementing generic procedures**
We have used generic procedures to do amazing things. But how do we make such a thing work?
### **Making constructors for generic procedures**
On page 89 we made a simple generic procedure constructor:
```
(define simple-generic-procedure
(generic-procedure-constructor make-simple-dispatch-store))
```
The procedure generic-procedure-constructor is given a "dispatch strategy" procedure; it returns a generic-procedure constructor that takes a name, an arity, and a default-handler specification. When this procedure is called with these three arguments it returns a generic procedure that it associates with a newly constructed metadata store for that procedure, which holds the name, the arity, an instance of the dispatch strategy, and the
default handler, if any. The dispatch-strategy instance will maintain the handlers, their applicabilities, and the mechanism for deciding which handler to choose for given arguments to the generic procedure.
The code that implements generic-procedure-constructor is:
```
(define (generic-procedure-constructor dispatch-store-maker)
(lambda (name arity default-handler)
(let ((metadata
(make-generic-metadata
name arity (dispatch-store-maker)
(or default-handler
(error-generic-procedure-handler name)))))
(define (the-generic-procedure . args)
(generic-procedure-dispatch metadata args))
(set-generic-procedure-metadata! the-generic-procedure
metadata)
the-generic-procedure)))
```
This implementation uses the-generic-procedure, an ordinary Scheme procedure, to represent the generic procedure, and a metadata store (for rules, etc.) that determines the procedure's behavior. This store is associated with the generic procedure using a "sticky note" (as on page 28) and can later be obtained by calling generic-procedure-metadata. This allows procedures such as define-generic-procedure-handler to modify the metadata of a given generic procedure.
The argument to generic-procedure-constructor is a procedure that creates a dispatch store for saving and retrieving handlers. The dispatch store encapsulates the strategy for choosing a handler.
Here is the simple dispatch-store constructor we have used so far. The dispatch store is implemented as a message-accepting procedure:
```
(define (make-simple-dispatch-store)
(let ((rules '()) (default-handler #f))
(define (get-handler args)
;; body will be shown in text below.
...)
(define (add-handler! applicability handler)
;; body will be shown in text below.
...)
(define (get-default-handler) default-handler)
(define (set-default-handler! handler)
(set! default-handler handler))
(lambda (message) ; the simple dispatch store
(case message
((get-handler) get-handler)
((add-handler!) add-handler!)
((get-default-handler) get-default-handler)
((set-default-handler!) set-default-handler!)
((get-rules) (lambda () rules))
(else (error "Unknown message:" message))))))
```
The simple dispatch store just maintains a list of the rules, each of which pairs an applicability with a handler. When the get-handler internal procedure is called with arguments for the generic procedure, it scans the list sequentially for a handler whose applicability is satisfied by the arguments tendered; it returns the handler, or #f if it doesn't find one:
```
(define (get-handler args)
(let ((rule
(find (lambda (rule)
(predicates-match? (car rule) args))
rules)))
(and rule (cdr rule))))
```
There are many possible strategies for choosing handlers to run. The above code returns the first applicable handler in the list. Another strategy is to return all applicable handlers. If more than one handler is applicable, perhaps all should be tried (in parallel?) and the results compared! Passing a dispatch-store constructor as an argument to generic-procedure-constructor allows the strategy to be chosen when the generic-procedure constructor is created, rather than being hard-coded into the implementation.
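For instance, a variant of get-handler inside make-simple-dispatch-store might collect every applicable rule and refuse to choose when the dispatch is ambiguous. This is a sketch of one such alternative, not the book's implementation:

```
;; Sketch: an ambiguity-detecting variant of get-handler, to be
;; defined inside make-simple-dispatch-store where rules and
;; predicates-match? are in scope.
(define (get-handler args)
  (let ((matching
         (filter (lambda (rule)
                   (predicates-match? (car rule) args))
                 rules)))
    (cond ((null? matching) #f)          ; no applicable handler
          ((null? (cdr matching))        ; exactly one: unambiguous
           (cdr (car matching)))
          (else
           (error "Ambiguous applicable handlers:" args)))))
```

Such a strategy would have turned the order-dependence bug of section 3.2.2 into an immediate error rather than a silently wrong choice.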
#### **Adding handlers to generic procedures**
The handler definition procedure (see below) adds new rules by calling the internal procedure add-handler! of the dispatch store. For make-simple-dispatch-store above, add-handler! adds the new rule to the front of the list of rules. (But if there was already a rule for handling that applicability, it just replaces the handler.)
```
(define (add-handler! applicability handler)
(for-each (lambda (predicates)
(let ((p (assoc predicates rules)))
(if p
(set-cdr! p handler)
(set! rules
(cons (cons predicates handler)
rules)))))
applicability))
```
The define-generic-procedure-handler procedure uses the metadata table to get the metadata record for the generic procedure. It asks the dispatch store for the add-handler! procedure and uses that procedure to add a rule to the metadata that associates the applicability with the handler. The dispatch-store instance is retrieved from the metadata of the generic procedure by generic-metadata-dispatch-store.
```
(define (define-generic-procedure-handler generic-procedure
applicability
handler)
(((generic-metadata-dispatch-store
(generic-procedure-metadata generic-procedure))
'add-handler!)
applicability
handler))
```
Finally, the heart of the mechanism is the dispatch, called by a generic procedure (the-generic-procedure on page 97), which finds an appropriate handler and applies it. The default handler, as supplied during construction of the generic procedure, is called if there is no applicable handler. 13
```
(define (generic-procedure-dispatch metadata args)
(let ((handler
(get-generic-procedure-handler metadata args)))
(apply handler args)))
(define (get-generic-procedure-handler metadata args)
(or ((generic-metadata-getter metadata) args)
((generic-metadata-default-getter metadata))))
```
#### **The power of extensible generics**
Construction of a system on a substrate of extensible generic procedures is a powerful idea. In our example it is possible to define what is meant by addition, multiplication, etc., for new data types unimagined by the language designer. For example, if the arithmetic operators of a system are implemented as extensible generics, a user may extend them to support arithmetic on quaternions, vectors, matrices, integers modulo a prime, functions, tensors, differential forms, and so on. This is not just making new capabilities possible; it also extends old programs, so a program that was written to manipulate simple numerical quantities may become useful for manipulating scalar-valued functions.
We have seen that there are potential problems associated with this use of extensible generic procedures. On the other hand, some "mutations" will be extremely valuable. For example, it is possible to extend arithmetic to symbolic quantities. The simplest way to do this is to make a generic extension to all of the operators to take symbolic quantities as arguments and return a data structure representing the indicated operation on the arguments. With the addition of a simplifier of algebraic expressions we suddenly have a symbolic manipulator. This is useful in debugging purely numerical calculations, because if we give them symbolic arguments we can examine the resulting symbolic expressions to make sure that the program is calculating what we intend it to. It is also the basis of a partial evaluator for optimization of numerical programs. And functional differentiation can be viewed as a generic extension of arithmetic to a compound data type (see section 3.3). The scmutils system we use to teach classical mechanics [121] implements differentiation in exactly this way.
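To make this concrete, here is a minimal sketch of such a symbolic extension. The predicate symbolic? and the choice to represent an operation as a list are assumptions for illustration; they are not the book's actual symbolic layer, which also covers compound expressions and all the operators.

```
;; A minimal sketch of a symbolic extension (illustrative only).
;; symbolic? is a hypothetical predicate for symbolic quantities:
(define (symbolic? object)
  (symbol? object))

;; Extend the generic + so that a symbolic argument produces a
;; data structure representing the indicated operation:
(define-generic-procedure-handler +
  (match-args symbolic? any-object?)
  (lambda (x y) (list '+ x y)))
```

With such handlers installed for every operator and argument order, (+ 'a 3) would evaluate to the expression (+ a 3), ready to be fed to an algebraic simplifier.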
# **Exercise 3.4: Functional values**
The generic arithmetic structure allows us to close the system so that functions that return functions can work, as in the example
```
(((* 3
(lambda (x) (lambda (y) (+ x y)))
(lambda (x) (lambda (y) (vector y x))))
'a)
4)
(* (* 3 (+ a 4)) #(4 a))
```
- **a.** How hard is it to arrange for this to work in the purely combinator-based arithmetic introduced in section 3.1? Why?
- **b.** Exercise 3.3 on page 86 asked about the implications of ordering of vector and functional extensions. Is the generic system able to support both expressions discussed there (and copied below)? Explain.
```
((magnitude unit-circle) 'a)
((magnitude (vector sin cos)) 'a)
```
**c.** Is there any good way to make the following work at all?
```
((vector cos sin) 3)
#(-.9899924966004454 .1411200080598672)
```
Show code that makes this work or explain the difficulties.
# **Exercise 3.5: A weird bug**
Consider the +-like ("plus-like") procedure in arith.scm, shown below, which implements *n*-ary procedures + and \* as part of installing an arithmetic. It returns a pair of a name and a procedure; the installer will bind the name to the procedure.
It seems that it is written to execute the get-identity procedure that computes the identity every time the operation is
called with no arguments.
```
(define (+-like operator identity-name)
(lambda (arithmetic)
(let ((binary-operation
(find-arithmetic-operation operator arithmetic)))
(and binary-operation
(let ((binary
(operation-procedure binary-operation))
(get-identity
(identity-name->getter identity-name
arithmetic)))
(cons operator
(lambda args
(case (length args)
((0) (get-identity))
((1) (car args))
(else (pairwise binary args))))))))))
```
Perhaps the identity for an operator should be computed only once, not every time the handler is called. As a consequence, it is proposed that the code should be modified as follows:
```
(define (+-like operator identity-name)
(lambda (arithmetic)
(let ((binary-operation
(find-arithmetic-operation operator arithmetic)))
(and binary-operation
(let ((binary
(operation-procedure binary-operation))
(identity
((identity-name->getter identity-name
arithmetic))))
(cons operator
(lambda args
(case (length args)
((0) identity)
((1) (car args))
(else (pairwise binary args))))))))))
```
However, this has a subtle bug! Can you elicit the bug? Can you explain it?
## **Exercise 3.6: Matrices**
Matrices are ubiquitous in scientific and technical computing.
**a.** Make and install an arithmetic package for matrices of numbers, with operations +, -, negate, and \*. This arithmetic needs to be able to know the number of rows and the number of columns in a matrix, since matrix multiplication is defined only if the number of columns in the first matrix is equal to the number of rows in the second one.
Make sure that your multiplier can multiply a matrix with a scalar or with a vector. For matrices to play well with vectors you probably need to distinguish row vectors and column vectors. How does this affect the design of the vector package? (See exercise 3.2 on page 85.)
You may assume that the vectors and matrices are of small dimension, so you do not need to deal with sparse representations. A reasonable representation of a matrix is a Scheme vector in which each element is a Scheme vector representing a row.
- **b.** Vectors and matrices may contain symbolic numerical quantities. Make this work.
- **c.** Matrix inversion is appropriate for your arithmetic. If a symbolic matrix is dense, the inverse may take space that is factorial in the dimension. Why?
Note: We are not asking you to implement matrix inversion.
# **Exercise 3.7: Literal vectors and matrices**
It is also possible to have arithmetic on literal matrices and literal vectors with an algebra of symbolic expressions of vectors and matrices. Can you make symbolic algebra of these compound
structures play well with vectors and matrices that have symbolic numerical expressions as elements? Caution: This is quite hard. Perhaps it is appropriate as part of a long-term project.
# **3.3 Example: Automatic differentiation**
One remarkable application of extensible generic procedures is *automatic differentiation*. <sup>14</sup> This is a beautiful way to obtain a program that computes the derivative of the function computed by a given program. <sup>15</sup> Automatic differentiation is now an important component in machine learning applications.
We will see that a simple way to implement automatic differentiation is to extend the generic arithmetic primitives to work with *differential objects*, a new compound data type. This will enable the automatic differentiation of symbolic as well as numerical functions. It will also enable us to make automatic differentiation work with higher-order procedures—procedures that return other procedures as values.
Here is a simple example of automatic differentiation to illustrate what we are talking about:
```
((derivative (lambda (x) (expt x 3))) 2)
12
```
Note that the derivative of the function that computes the cube of its argument is a new function, which when given 2 as its argument returns 12 as its value.
If we extend the arithmetic to handle symbolic expressions, and we do some algebraic simplification on the result, we get:
```
((derivative (lambda (x) (expt x 3))) 'a)
(* 3 (expt a 2))
```
And the full power of the programming language is available, including higher-order procedures. This kind of system is useful in
working with the very large expressions that occur in interesting physics problems. 16
Let's look at a simple application: the computation of the roots of an equation by Newton's method. The idea is that we want to find values of *x* for which *f* (*x*) = 0. If *f* is sufficiently smooth, and we have a sufficiently close guess *x*<sup>0</sup> , we can improve the guess by computing a new guess *x*<sup>1</sup> by the formula:
$$x_{n+1} = x_n - \frac{f(x_n)}{Df(x_n)}$$
This can be repeated, as necessary, to get a sufficiently accurate result. An elementary program to accomplish this is:
```
(define (root-newton f initial-guess tolerance)
(let ((Df (derivative f)))
(define (improve-guess xn)
(- xn (/ (f xn) (Df xn))))
(let loop ((xn initial-guess))
(let ((xn+1 (improve-guess xn)))
(if (close-enuf? xn xn+1 tolerance)
xn+1
(loop xn+1))))))
```
Notice that the local procedure named Df in root-newton is a procedure that computes the derivative of the function computed by the procedure passed in as *f*.
For example, suppose we want to know the angle *θ* in the first quadrant for which cos(*θ*) = sin(*θ*). (The answer is *π*/4 ≈ 0.7853981633974484.) We can write:
```
(define (cs theta)
(- (cos theta) (sin theta)))
(root-newton cs 0.5 1e-8)
.7853981633974484
```
This result is correct to full machine accuracy.
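The predicate close-enuf? is not shown in the text; a plausible definition, assuming a simple absolute-error test, is the following sketch (the actual system may well use a relative-error test):

```
;; One plausible definition of close-enuf? (an assumption,
;; not the book's code): absolute difference within tolerance.
(define (close-enuf? a b tolerance)
  (< (abs (- a b)) tolerance))
```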
#### **3.3.1 How automatic differentiation works**
The program for automatic differentiation is directly derived from the definition of the derivative. Suppose that given a function *f* and a point *x* in its domain, we want to know the value of the function at a nearby point *f* (*x* + Δ*x*), where Δ*x* is a small increment. The derivative of a function *f* is defined to be the function *Df* whose value for particular arguments *x* is something that can be "multiplied" by an increment Δ*x* of the argument to get the best possible linear approximation to the increment in the value of *f*:
$$f(x + \Delta x) \approx f(x) + Df(x) \Delta x$$
We implement this definition using a data type that we call a *differential object*. A differential object [*x, δx*] can be thought of as a number with a small increment, *x* + *δx*. But we treat it as a new numerical quantity similar to a complex number: it has two components, a *finite part* and an *infinitesimal part*. <sup>17</sup> We extend each primitive arithmetic function to work with differential objects: each primitive arithmetic function *f* must know its derivative function *Df* , so that:
$$[x, \delta x] \xrightarrow{f} [f(x), Df(x)\delta x]$$
(3.5)
Note that the derivative of *f* at the point *x*, *Df* (*x*), is the coefficient of *δx* in the infinitesimal part of the resulting differential object.
Now here is the powerful idea: If we then pass the result of *f* ([*x, δx*]) (equation 3.5) through another function *g*, we obtain the chain-rule answer we would hope for:
$$[f(x), Df(x)\delta x] \stackrel{g}{\longmapsto} [g(f(x)), Dg(f(x))Df(x)\delta x]$$
Thus, if we can compute the results of all primitive functions on differential objects, we can compute the results of all compositions of functions on differential objects. Given such a result, we can extract the derivative of the composition: the derivative is the coefficient of the infinitesimal increment in the resulting differential object.
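To see the chain rule emerge, trace a small composition such as $h(x) = \sin(x^2)$ through equation 3.5:

$$[x, \delta x] \xrightarrow{(\cdot)^2} [x^2,\ 2x\,\delta x] \xrightarrow{\sin} [\sin(x^2),\ \cos(x^2)\,2x\,\delta x]$$

The coefficient of $\delta x$ in the result is $2x\cos(x^2)$, which is exactly $Dh(x)$.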
To extend a generic arithmetic operator to compute with differential objects, we need only supply a procedure that computes the derivative of the primitive arithmetic function that the operator names. Then we can use ordinary Scheme compositions to get the derivative of any composition of primitive functions. 18
Given a procedure implementing a unary function f, the procedure derivative produces a new procedure the-derivative that computes the derivative of the function computed by f. <sup>19</sup> When applied to some argument, x, the derivative creates a new infinitesimal increment dx and adds it to the argument to get the new differential object [*x, δx*] that represents *x* + *δx*. The procedure f is then applied to this differential object and the derivative of f is obtained by extracting the coefficient of the infinitesimal increment dx from the value:
```
(define (derivative f)
(define (the-derivative x)
(let* ((dx (make-new-dx))
(value (f (d:+ x (make-infinitesimal dx)))))
(extract-dx-part value dx)))
the-derivative)
```
The procedure make-infinitesimal makes a differential object whose finite part is zero and whose infinitesimal part is dx. The procedure d:+ adds differential objects. The details will be explained in section 3.3.3.
#### **Extending the primitives**
We need to make handler procedures that extend the primitive arithmetic generic procedures to operate on differential objects. For each unary procedure we have to make the finite part of the result
and the infinitesimal part of the result, and we have to put the results together, as expressed in equation 3.5. So the handler for a unary primitive arithmetic procedure that computes function *f* is constructed by diff:unary-proc from the procedure f for *f* and the procedure df for its derivative *Df*. These are glued together using special addition and multiplication procedures d:+ and d:\* for differential objects, to be explained in section 3.3.3.
```
(define (diff:unary-proc f df)
(define (uop x) ; x is a differential object
(let ((xf (finite-part x))
(dx (infinitesimal-part x)))
(d:+ (f xf) (d:* (df xf) dx))))
uop)
```
For example, the sqrt procedure handler for differential objects is just:
```
(define diff:sqrt
(diff:unary-proc sqrt (lambda (x) (/ 1 (* 2 (sqrt x))))))
```
The first argument of diff:unary-proc is the sqrt procedure and the second argument is a procedure that computes the derivative of sqrt.
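Applied to a differential object, this handler computes precisely the instance of equation 3.5 for the square root:

$$[x, \delta x] \stackrel{\sqrt{\phantom{x}}}{\longmapsto} \left[\sqrt{x},\ \frac{1}{2\sqrt{x}}\,\delta x\right]$$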
We add the new handler to the generic sqrt procedure using
```
(assign-handler! sqrt diff:sqrt differential?)
```
where differential? is a predicate that is true only of differential objects. The procedure assign-handler! is just shorthand for a useful pattern:
```
(define (assign-handler! procedure handler . preds)
(define-generic-procedure-handler procedure
(apply match-args preds)
handler))
```
And the procedure match-args makes an applicability specification from a sequence of predicates.
Handlers for other unary primitives are straightforward: 20
```
(define diff:exp (diff:unary-proc exp exp))
(define diff:log (diff:unary-proc log (lambda (x) (/ 1 x))))
(define diff:sin (diff:unary-proc sin cos))
(define diff:cos
  (diff:unary-proc cos (lambda (x) (* -1 (sin x)))))
```
Binary arithmetic operations are a bit more complicated.
$$f(x + \Delta x, y + \Delta y) \approx f(x, y) + \partial_0 f(x, y) \Delta x + \partial_1 f(x, y) \Delta y$$
(3.6)
where $\partial_0 f$ and $\partial_1 f$ are the partial derivative functions of f with respect to the two arguments. Let f be a function of two arguments; then $\partial_0 f$ is a new function of two arguments that computes the partial derivative of f with respect to its first argument:
$$\partial_0 f(x,y) = \left. \frac{\partial}{\partial u} f(u,v) \right|_{u=x,v=y}$$
So the rule for binary operations is
$$([x, \delta x], [y, \delta y]) \xrightarrow{f} [f(x, y), \partial_0 f(x, y) \delta x + \partial_1 f(x, y) \delta y]$$
To implement binary operations we might think that we could simply follow the plan for unary operations, where d0f and d1f are the two partial derivative functions:
```
(define (diff:binary-proc f d0f d1f)
  (define (bop x y)
    (let ((dx (infinitesimal-part x))
          (dy (infinitesimal-part y))
          (xf (finite-part x))
          (yf (finite-part y)))
      (d:+ (f xf yf)
           (d:+ (d:* dx (d0f xf yf))
                (d:* (d1f xf yf) dy)))))
  bop)
```
This is a good plan, but it isn't quite right: it doesn't ensure that the finite and infinitesimal parts are consistently chosen for the two arguments. We need to be more careful about how we choose the parts. We will explain this technical detail and fix it in section 3.3.3, but let's go with this approximately correct code for now.
Addition and multiplication are straightforward, because the partial derivatives are simple, but division and exponentiation are more interesting. We show the assignment of handlers only for diff:+ because all the others are similar.
```
(define diff:+
  (diff:binary-proc +
                    (lambda (x y) 1)
                    (lambda (x y) 1)))

(assign-handler! + diff:+ differential? any-object?)
(assign-handler! + diff:+ any-object? differential?)

(define diff:*
  (diff:binary-proc *
                    (lambda (x y) y)
                    (lambda (x y) x)))

(define diff:/
  (diff:binary-proc /
                    (lambda (x y) (/ 1 y))
                    (lambda (x y) (* -1 (/ x (square y))))))
```
The handler for exponentiation $f(x, y) = x^y$ is a bit more complicated. The partial with respect to the first argument is simple: $\partial_0 f(x, y) = yx^{y-1}$. But the partial with respect to the second argument is usually $\partial_1 f(x, y) = x^y \log x$, except for some special cases:
```
(define diff:expt
(diff:binary-proc expt
(lambda (x y)
(* y (expt x (- y 1))))
(lambda (x y)
(if (and (number? x) (zero? x))
(if (number? y)
(if (positive? y)
0
(error "Derivative undefined: EXPT"
x y))
0)
(* (log x) (expt x y))))))
```
## **Extracting the derivative's value**
To compute the value of the derivative of a function, we apply the function to a differential object and obtain a result. We have to extract the derivative's value from that result. There are several possibilities that must be handled. If the result is a differential object, we have to pull the derivative's value out of the object. If the result is not a differential object, the derivative's value is zero. There are other cases that we have not mentioned. This calls for a generic procedure with a default that produces a zero.
```
(define (extract-dx-default value dx) 0)
(define extract-dx-part
(simple-generic-procedure 'extract-dx-part 2
extract-dx-default))
```
In the case where a differential object is returned, the coefficient of dx is the required derivative. This will turn out to be a bit complicated, but the basic idea can be expressed as follows:
```
(define (extract-dx-differential value dx)
(extract-dx-coefficient-from (infinitesimal-part value)
dx))
(define-generic-procedure-handler extract-dx-part
(match-args differential? diff-factor?)
extract-dx-differential)
```
The reason this is not quite right is that for technical reasons the structure of a differential object is more complex than we have already shown. It will be fully explained in section 3.3.3.
Note: We made the extractor generic to enable future extensions to functions that return functions or compound objects, such as vectors, matrices, and tensors. (See exercise 3.12 on page 124.)
Except for the fact that there may be more primitive operators and data structures to be included, this is all that is really needed to implement automatic differentiation! All of the procedures referred to in the handlers are the usual generic procedures on arithmetic; they may include symbolic arithmetic and functional arithmetic.
### **3.3.2 Derivatives of n-ary functions**
For a function with multiple arguments we need to be able to compute the partial derivatives with respect to each argument. One way to do this is: 21
```
(define ((partial i) f)
(define (the-derivative . args)
(if (not (< i (length args)))
(error "Not enough arguments for PARTIAL" i f args))
(let* ((dx (make-new-dx))
(value
(apply f (map (lambda (arg j)
(if (= i j)
(d:+ arg
(make-infinitesimal dx))
arg))
args (iota (length args))))))
(extract-dx-part value dx)))
the-derivative)
```
Here we are extracting the coefficient of the infinitesimal dx in the result of applying f to the arguments supplied with the ith argument incremented by dx. 22
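For example, with numeric arguments the two partials of multiplication behave as expected. This is a usage sketch that follows from the definitions above; the commented values are what the rules of section 3.3.1 give for these inputs:

```
;; Partial derivatives of f(x, y) = x * y at the point (3, 5):
(((partial 0) (lambda (x y) (* x y))) 3 5)   ; partial in x: 5
(((partial 1) (lambda (x y) (* x y))) 3 5)   ; partial in y: 3
```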
Now consider a function *g* of two arguments. Expanding on equation 3.6 we find that the derivative *Dg* is multiplied by a vector of increments to the arguments:

$$g(x + \Delta x, y + \Delta y) \approx g(x, y) + \left[ \partial_0 g(x, y) \;\; \partial_1 g(x, y) \right] \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}$$

The derivative *Dg* of *g* at the point *x, y* is the pair of partial derivatives in square brackets. The inner product of that *covector* of partials with the *vector* of increments is the increment to the function *g*. The general-derivative procedure computes this result:
```
(define (general-derivative g)
(define ((the-derivative . args) . increments)
(let ((n (length args)))
(assert (= n (length increments)))
(if (= n 1)
(* ((derivative g) (car args))
(car increments))
(reduce (lambda (x y) (+ y x))
0
(map (lambda (i inc)
(* (apply ((partial i) g) args)
inc))
(iota n)
increments)))))
the-derivative)
```
Unfortunately general-derivative does not return the structure of partial derivatives. It is useful in many contexts to have a derivative procedure gradient that actually gives the covector of partial derivatives. (See exercise 3.10.)
# **Exercise 3.8: Partial derivatives**
Another way to think about partial derivatives is in terms of *λ*-calculus currying. Draw a diagram of how the data must flow. Use currying to fix the arguments that are held constant, producing a one-argument procedure that the ordinary derivative will be applied to. Write that version of the partial derivative procedure.
# **Exercise 3.9: Adding handlers**
There are primitive arithmetic functions for which we did not add handlers for differential objects, for example tan.
- **a.** Add handlers for tan and atan1 (atan1 is a function of one argument).
- **b.** It would be really nice to have atan optionally take two arguments, as in the Scheme Report [109], because we usually want to preserve the quadrant we are working in. Fix the generic procedure atan to do this correctly—using atan1 for one argument and atan2 if given two arguments. Also, install an atan2 handler for differentials. Remember, it must coexist with the atan1 handler.
# **Exercise 3.10: Vectors and covectors**
As described above, the idea of derivative can be generalized to functions with multiple arguments. The gradient of a function of multiple arguments is the covector of partial derivatives with respect to each of the arguments.
- **a.** Develop data types for vectors and covectors such that the value of *Dg*(*x, y*) is the covector of partials. Write a gradient procedure that delivers that value. Remember, the product of a vector and a covector should be their inner product—the sum of the componentwise products of their elements.
- **b.** Notice that if the input to a function is a vector, that is similar to multiple inputs, so the output of the gradient should be a covector. Note also that if the input to a function is a covector, then the output of the gradient should be a vector. Make this work.
#### **3.3.3 Some technical details**
Although the idea behind automatic differentiation is not complicated, there are a number of subtle technical details that must be addressed for it to work correctly.
### **Differential algebra**
If we want to compute a second derivative we must take a derivative of a derivative function. The evaluation of such a function will have two infinitesimals in play. To enable the computation of multiple derivatives and derivatives of functions of several variables we define an algebra of differential objects in "infinitesimal space." The objects are multivariate power series in which no infinitesimal increment has exponent greater than one. 23
A differential object is represented by a tagged list of the terms of a power series. Each term has a coefficient and a list of infinitesimal incremental factors. The terms are kept sorted, in descending order. (Order is the number of incrementals. So *δxδy* is higher order than *δx* or *δy*.) Here is a quick and dirty implementation: 24
```
(define differential-tag 'differential)
(define (differential? x)
(and (pair? x) (eq? (car x) differential-tag)))
(define (diff-terms h)
(if (differential? h)
(cdr h)
(list (make-diff-term h '()))))
```
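As a concrete illustration, here is a sketch of the representation of the object $3 + \delta x$. The factor is written as the symbol dx for readability, although make-new-dx actually creates fresh factor objects:

```
;; The differential object 3 + dx as a tagged term list.
;; The higher-order term (the one with a factor) comes first:
(differential (diff-term 1 (dx))   ; infinitesimal part: 1 * dx
              (diff-term 3 ()))    ; finite part: 3
```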
The term list is just the cdr of the differential object. However, if we are given an object that is not explicitly a differential object, for example a number, we coerce it to a differential object with a single term and with no incremental factors. When we make a differential object from a (presorted) list of terms, we always try to return a simplified version, which may be just a number, which is not explicitly a differential object:
```
(define (make-differential terms)
(let ((terms ; Nonzero terms
(filter
(lambda (term)
(let ((coeff (diff-coefficient term)))
(not (and (number? coeff) (= coeff 0)))))
terms)))
(cond ((null? terms) 0)
((and (null? (cdr terms))
;; Finite part only:
(null? (diff-factors (car terms))))
(diff-coefficient (car terms)))
((every diff-term? terms)
(cons differential-tag terms))
(else (error "Bad terms")))))
```
In this implementation the terms are also represented as tagged lists, each containing a coefficient and an ordered list of factors.
```
(define diff-term-tag 'diff-term)
(define (make-diff-term coefficient factors)
(list diff-term-tag coefficient factors))
(define (diff-term? x)
(and (pair? x) (eq? (car x) diff-term-tag)))
(define (diff-coefficient x)
(cadr x))
(define (diff-factors x)
(caddr x))
```
To compute derivatives we need to be able to add and multiply differential objects:
```
(define (d:+ x y)
(make-differential
(+diff-termlists (diff-terms x) (diff-terms y))))
(define (d:* x y)
(make-differential
(*diff-termlists (diff-terms x) (diff-terms y))))
```
and we also need this:
```
(define (make-infinitesimal dx)
(make-differential (list (make-diff-term 1 (list dx)))))
```
Addition of term lists is where we enforce and use the sorting of terms, with higher-order terms coming earlier in the lists. We can add two terms only if they have the same factors. And if the sum of the coefficients is zero we do not include the resulting term.
```
(define (+diff-termlists l1 l2)
  (cond ((null? l1) l2)
        ((null? l2) l1)
        (else
         (let ((t1 (car l1)) (t2 (car l2)))
           (cond ((equal? (diff-factors t1)
                          (diff-factors t2))
                  (let ((newcoeff
                         (+ (diff-coefficient t1)
                            (diff-coefficient t2))))
                    (if (and (number? newcoeff)
                             (= newcoeff 0))
                        (+diff-termlists (cdr l1) (cdr l2))
                        (cons (make-diff-term newcoeff
                                              (diff-factors t1))
                              (+diff-termlists (cdr l1)
                                               (cdr l2))))))
                 ((diff-term>? t1 t2)
                  (cons t1 (+diff-termlists (cdr l1) l2)))
                 (else
                  (cons t2 (+diff-termlists l1 (cdr l2)))))))))
```
Multiplication of term lists is straightforward, if we can multiply individual terms. The product of two term lists l1 and l2 is the term list resulting from adding up the term lists resulting from multiplying every term in l1 by every term in l2.
```
(define (*diff-termlists l1 l2)
(reduce (lambda (x y)
(+diff-termlists y x))
'()
(map (lambda (t1)
(append-map (lambda (t2)
(*diff-terms t1 t2))
l2))
l1)))
```
A term has a coefficient and a list of factors (the infinitesimals). In a differential object no term may have an infinitesimal with an exponent greater than one, because *δx*<sup>2</sup> = 0. Thus, when we multiply two terms we must check that the lists of factors we are merging have no factors in common. This is the reason that \*diff-terms returns a list of the product term or an empty list, to be appended in \*diff-termlists. We keep the factors sorted when we merge the two lists of factors; this makes it easier to sort the terms.
```
(define (*diff-terms x y)
(let ((fx (diff-factors x)) (fy (diff-factors y)))
(if (null? (ordered-intersect diff-factor>? fx fy))
(list (make-diff-term
(* (diff-coefficient x) (diff-coefficient y))
(ordered-union diff-factor>? fx fy)))
'())))
```
### **Finite and infinitesimal parts**
A differential object has a finite part and an infinitesimal part. Our diff:binary-proc procedure on page 109 is not correct for differential objects with more than one infinitesimal. To ensure that the parts of the arguments x and y are selected consistently we actually use:
```
(define (diff:binary-proc f d0f d1f)
(define (bop x y)
(let ((factor (maximal-factor x y)))
(let ((dx (infinitesimal-part x factor))
(dy (infinitesimal-part y factor))
(xe (finite-part x factor))
(ye (finite-part y factor)))
(d:+ (f xe ye)
(d:+ (d:* dx (d0f xe ye))
(d:* (d1f xe ye) dy))))))
bop)
```
where factor is chosen by maximal-factor so that both x and y contain it in a term with the largest number of factors.
The finite part of a differential object is all terms except for terms containing the maximal factor in a term of highest order, and the
infinitesimal part is the remaining terms, all of which contain that factor.
Consider the following computation:

$$f([x, \delta x], [y, \delta y]) = [f(x, y),\ \partial_0 f(x, y)\,\delta x + \partial_1 f(x, y)\,\delta y + \partial_0 \partial_1 f(x, y)\,\delta x \delta y]$$

The highest-order term is $\partial_0 \partial_1 f(x, y) \cdot \delta x \delta y$. It is symmetrical with respect to *x* and *y*. The crucial point is that we may break the differential object into parts in any way consistent with any one of the maximal factors (here *δx* or *δy*) being primary. It doesn't matter which is chosen, because mixed partials of **R → R** commute. 25
```
(define (finite-part x #!optional factor)
(if (differential? x)
(let ((factor (default-maximal-factor x factor)))
(make-differential
(remove (lambda (term)
(memv factor (diff-factors term)))
(diff-terms x))))
x))
(define (infinitesimal-part x #!optional factor)
(if (differential? x)
(let ((factor (default-maximal-factor x factor)))
(make-differential
(filter (lambda (term)
(memv factor (diff-factors term)))
(diff-terms x))))
0))
(define (default-maximal-factor x factor)
(if (default-object? factor)
(maximal-factor x)
factor))
```
## **How extracting really works**
As explained on page 114, to make it possible to take multiple derivatives or to handle functions with more than one argument, a
differential object is represented as a multivariate power series in which no infinitesimal increment has exponent greater than one. Each term in this series has a coefficient and a list of infinitesimal incremental factors. This complicates the extraction of the derivative with respect to any one incremental factor. Here is the real story:
In the case where a differential object is returned we must find those terms of the result that contain the infinitesimal factor dx for the derivative we are evaluating. We collect those terms, removing dx from each. If there are no terms left after taking out the ones with dx, the value of the derivative is zero. If there is exactly one term left, which has no differential factors, then the coefficient of that term is the value of the derivative. But if there are remaining terms with differential factors, we must return the differential object with those residual terms as the value of the derivative.
```
(define (extract-dx-differential value dx)
(let ((dx-diff-terms
(filter-map
(lambda (term)
(let ((factors (diff-factors term)))
(and (memv dx factors)
(make-diff-term (diff-coefficient term)
(delv dx factors)))))
(diff-terms value))))
(cond ((null? dx-diff-terms) 0)
((and (null? (cdr dx-diff-terms))
(null? (diff-factors (car dx-diff-terms))))
(diff-coefficient (car dx-diff-terms)))
(else (make-differential dx-diff-terms)))))
(define-generic-procedure-handler extract-dx-part
(match-args differential? diff-factor?)
extract-dx-differential)
```
## **Higher-order functions**
For many applications we want our automatic differentiator to work correctly for functions that return functions as values:
```
(((derivative
(lambda (x)
(lambda (y z)
(* x y z))))
2)
3
4)
;Value: 12
```
Including literal functions and partial derivatives makes this even more interesting.
```
((derivative
(lambda (x)
(((partial 1) (literal-function 'f))
x 'v)))
'u)
(((partial 0) ((partial 1) f)) u v)
```
And things can get even more complicated:
```
(((derivative
(lambda (x)
(derivative
(lambda (y)
((literal-function 'f)
x y)))))
'u)
'v)
(((partial 0) ((partial 1) f)) u v)
```
Making this work introduces serious complexity in the procedure extract-dx-part.
If the result of applying a function to a differential object is a function—a derivative of a derivative, for example—we need to defer the extraction until that function is called with arguments:
In a case where a function is returned, as in
```
(((derivative
(lambda (x)
(derivative
(lambda (y)
(* x y)))))
'u)
```
```
'v)
1
```
we cannot extract the derivative until the function is applied to arguments. So we defer the extraction until we get the value resulting from that application. We extend our generic extractor:
```
(define (extract-dx-function fn dx)
(lambda args
(extract-dx-part (apply fn args) dx)))
(define-generic-procedure-handler extract-dx-part
(match-args function? diff-factor?)
extract-dx-function)
```
Unfortunately, this version of extract-dx-function has a subtle bug. <sup>26</sup> Our patch is to wrap the body of the new deferred procedure with code that remaps the factor dx to avoid the unpleasant conflict. So, we change the handler for functions to:
```
(define (extract-dx-function fn dx)
(lambda args
(let ((eps (make-new-dx)))
(replace-dx dx eps
(extract-dx-part
(apply fn
(map (lambda (arg)
(replace-dx eps dx arg))
args))
dx)))))
```
This creates a brand-new factor eps and uses it to stand for dx in the arguments, thus preventing collision with any other instances of dx.
Replacement of the factors is itself a bit more complicated, because the code has to grovel around in the data structures. We will make the replacement a generic procedure, so we can extend it to new kinds of data. The default is that the replacement is just the identity on the object:
```
(define (replace-dx-default new-dx old-dx object) object)
(define replace-dx
```
```
(simple-generic-procedure 'replace-dx 3
replace-dx-default))
```
For a differential object we have to actually go in and substitute the new factor for the old one, and we have to keep the factor lists sorted:
```
(define (replace-dx-differential new-dx old-dx object)
(make-differential
(sort (map (lambda (term)
(make-diff-term
(diff-coefficient term)
(sort (substitute new-dx old-dx
(diff-factors term))
diff-factor>?)))
(diff-terms object))
diff-term>?)))
(define-generic-procedure-handler replace-dx
(match-args diff-factor? diff-factor? differential?)
replace-dx-differential)
```
Finally, if the object is itself a function we have to defer it until arguments are available to compute a value:
```
(define (replace-dx-function new-dx old-dx fn)
(lambda args
(let ((eps (make-new-dx)))
(replace-dx old-dx eps
(replace-dx new-dx old-dx
(apply fn
(map (lambda (arg)
(replace-dx eps old-dx arg))
args)))))))
(define-generic-procedure-handler replace-dx
(match-args diff-factor? diff-factor? function?)
replace-dx-function)
```
This is quite a bit more complicated than we might expect. It actually does three replacements of the differential factors. This is to prevent collisions with factors that may be free in the body of fn, inherited from the lexical environment in which the function fn was defined.<sup>27</sup>
# **Exercise 3.11: The bug!**
Before we became aware of the bug pointed out in footnote 26 on page 121, the procedure extract-dx-function was written:
```
(define (extract-dx-function fn dx)
(lambda args
(extract-dx-part (apply fn args) dx)))
```
Demonstrate the reason for the use of the replace-dx wrapper by constructing a function whose derivative is wrong with this earlier version of extract-dx-function but is correct in the fixed version. This is not easy! You may want to read the references pointed at in footnote 26.
#### **3.3.4 Literal functions of differential arguments**
For simple arguments, applying a literal function is just a matter of constructing the expression that is the application of the function expression to the arguments. But literal functions must also be able to accept differential objects as arguments. When that happens, the literal function must construct (partial) derivative expressions for the arguments that are differentials. For the *i*th argument of an *n*-argument function the appropriate derivative expression is:
```
(define (deriv-expr i n fexp)
  (if (= n 1)
      `(derivative ,fexp)
      `((partial ,i) ,fexp)))
```
Some arguments may be differential objects, so a literal function must choose, for each argument, a finite part and an infinitesimal part. Just as for binary arithmetic handlers, the maximal factor must be consistently chosen. Our literal functions are able to take many arguments, so this may seem complicated, but we wrote the maximal-factor procedure to handle many arguments. This is explained in section 3.3.3.
If there are no differential objects among the arguments we just cons up the required expression. If there are differential objects we need to make a derivative of the literal function. To do this we find a maximal factor from all of the arguments and separate out the finite parts of the arguments—the terms that do not have that factor. (The infinitesimal parts are the terms that have that factor.) The partial derivatives are themselves literal functions with expressions that are constructed to include the argument index. The resulting differential object is the inner product of the partial derivatives at the finite parts of the arguments with the infinitesimal parts of the arguments.
This is all brought together in the following procedure:
```
(define (literal-function fexp)
(define (the-function . args)
(if (any differential? args)
(let ((n (length args))
(factor (apply maximal-factor args)))
(let ((realargs
(map (lambda (arg)
(finite-part arg factor))
args))
(deltargs
(map (lambda (arg)
(infinitesimal-part arg factor))
args)))
(let ((fxs (apply the-function realargs))
(partials
(map (lambda (i)
(apply (literal-function
(deriv-expr i n fexp))
realargs))
(iota n))))
(fold d:+ fxs
(map d:* partials deltargs)))))
      `(,fexp ,@args)))
the-function)
```
# **Exercise 3.12: Functions with structured values**

We made the extract-dx-part procedure generic (page 110) so we could extend it for values other than differential objects and functions. Extend extract-dx-part to work with derivatives of functions that return vectors. Note: You also have to extend the replace-dx generic procedure (page 122) in the extractor.
# **3.4 Efficient generic procedures**
In section 3.2.3 we dispatched to a handler by finding an applicable rule using the dispatch store provided in the metadata:
```
(define (generic-procedure-dispatch metadata args)
(let ((handler
(get-generic-procedure-handler metadata args)))
(apply handler args)))
```
The implementation of the dispatch store (on page 98) that we used (on page 89) to make the simple-generic-procedure constructor was rather crude. The simple dispatch store maintains the rule set as a list of rules. Each rule is represented as a pair of an applicability and a handler. The applicability is a list of lists of predicates to apply to tendered arguments. The way a generic procedure constructed by simple-generic-procedure finds an appropriate handler is to scan the list of rules sequentially, looking for an applicability that is satisfied by the arguments.
This is seriously inefficient, because the applicability of many rules may have the same predicate in a given operand position: For example, for multiplication in a system of numerical and symbolic arithmetic there may be many rules whose first predicate is number?. So the number? predicate may be applied many times before finding an applicable rule. It would be good to organize the rules so that finding an applicable one does not perform redundant tests. This is usually accomplished by the use of an index.
#### **3.4.1 Tries**
One simple index mechanism is based on the *trie*.<sup>28</sup>
A trie is traditionally a tree structure, but more generally it may be a directed graph. Each node in the trie has edges connecting to successor nodes. Each edge has an associated predicate. The data being tested is a linear sequence of features, in this case the arguments to a generic procedure.
Starting at the root of the trie, the first feature is taken from the sequence and is tested by each predicate on an edge emanating from the root node. The successful predicate's edge is followed to the next node, and the process repeats with the remainder of the sequence of features. When we run out of features, the current node will contain the associated value, in this case an applicable handler for the arguments.
It is possible that at any node, more than one predicate may succeed. If this happens, then all of the successful branches must be followed. Thus there may be multiple applicable handlers, and there must be a separate means of deciding what to do.
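The walk just described can be sketched concretely. This is a minimal illustration under assumed names and a made-up node representation, not the book's implementation:

```
;; A minimal sketch of the trie walk described above (names and
;; node representation are assumptions for illustration).
(define (make-node) (vector '() #f))            ; edges, value
(define (node-edges node) (vector-ref node 0))

(define (add-edge! node predicate)
  (let ((next (make-node)))
    (vector-set! node 0
                 (cons (cons predicate next) (node-edges node)))
    next))

;; Follow every edge whose predicate accepts the next feature;
;; when the features run out, the current node is a match.
(define (matching-nodes node features)
  (if (null? features)
      (list node)
      (append-map (lambda (edge)
                    (if ((car edge) (car features))
                        (matching-nodes (cdr edge) (cdr features))
                        '()))
                  (node-edges node))))
```

Because several predicates on the same node may accept a feature, matching-nodes naturally returns a list of terminal nodes, matching the observation that all successful branches must be followed.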
Here is how we can use a trie. Evaluating the following sequence of commands will incrementally construct the trie shown in figure 3.1.

![](설계원칙-162-170_images/_page_1_Figure_5.jpeg)

**Figure 3.1** A trie can be used to classify sequences of features. A trie is a directed graph in which each edge has a predicate. Starting at the root, the first feature is tested by each predicate on an edge proceeding from the root. If a predicate is satisfied, the process moves to the node at the end of that edge and the next feature is tested. This is repeated with successive features. The classification of the sequence is the set of terminal nodes arrived at.
```
(define a-trie (make-trie))
```
We can add an edge to this trie
```
(define s (add-edge-to-trie a-trie symbol?))
```
where add-edge-to-trie returns the new node that is at the target end of the new edge. This node is reached by being matched against a symbol.
We can make chains of edges, which are referenced by lists of the corresponding edge predicates
```
(define sn (add-edge-to-trie s number?))
```
The node sn is reached from the root via the path (list symbol? number?). Using a path, there is a simpler way to make a chain of edges than repeatedly calling add-edge-to-trie:
```
(define ss (intern-path-trie a-trie (list symbol? symbol?)))
```
We can add a value to any node (here we show symbolic values, but we will later store values that are procedural handlers):
```
(trie-has-value? sn)
#f
(set-trie-value! sn '(symbol number))
(trie-has-value? sn)
#t
(trie-value sn)
(symbol number)
```
We can also use a path-based interface to set values
```
(set-path-value! a-trie (list symbol? symbol?)
'(symbol symbol))
(trie-value ss)
(symbol symbol)
```
Note that both intern-path-trie and set-path-value! reuse existing nodes and edges when possible, adding edges and nodes
where necessary.
Now we can match a feature sequence against the trie we have constructed so far:
```
(equal? (list ss) (get-matching-tries a-trie '(a b)))
#t
(equal? (list s) (get-matching-tries a-trie '(c)))
#t
```
We can also combine matching with value fetching. The procedure get-a-value finds all matching nodes, picks one that has a value, and returns that value.
```
(get-a-value a-trie '(a b))
(symbol symbol)
```
But not all feature sequences have an associated value:
```
(get-a-value a-trie '(-4))
;Unable to match features: (-4)
```
We can incrementally add values to nodes in the trie:
```
(set-path-value! a-trie (list negative-number?)
'(negative-number))
(set-path-value! a-trie (list even-number?)
'(even-number))
(get-all-values a-trie '(-4))
((even-number) (negative-number))
```
where get-all-values finds all the nodes matching a given feature sequence and returns their values.
Given this trie implementation, we can make a dispatch store that uses a trie as its index:
```
(define (make-trie-dispatch-store)
  (let ((delegate (make-simple-dispatch-store))
        (trie (make-trie)))
    (define (get-handler args)
      (get-a-value trie args))
    (define (add-handler! applicability handler)
      ((delegate 'add-handler!) applicability handler)
      (for-each (lambda (path)
                  (set-path-value! trie path handler))
                applicability))
    (lambda (message)
      (case message
        ((get-handler) get-handler)
        ((add-handler!) add-handler!)
        (else (delegate message))))))
```
We make this dispatch store simple by delegating most of the operations to a simple dispatch store. The operations that are not delegated are add-handler!, which stores the handler both in the simple dispatch store and in the trie, and get-handler, which uses only the trie for access. The simple dispatch store manages the default handler and also the set of rules, which is useful for debugging. This is a simple example of the use of delegation to extend an interface, as opposed to the better-known idea of inheritance.
### **Exercise 3.13: Trie rules**
To make it easy to experiment with different dispatch stores, we gave generic-procedure-constructor and make-generic-arithmetic the dispatch store maker. For example, we can build a full generic arithmetic as on page 95 but using make-trie-dispatch-store as follows:
```
(define trie-full-generic-arithmetic
  (let ((g (make-generic-arithmetic make-trie-dispatch-store)))
    (add-to-generic-arithmetic! g numeric-arithmetic)
    (extend-generic-arithmetic! g function-extender)
    (add-to-generic-arithmetic! g
     (symbolic-extender numeric-arithmetic))
    g))
(install-arithmetic! trie-full-generic-arithmetic)
```
**a.** Does this make any change to the dependence on order that we wrestled with in section 3.2.2?

**b.** In general, what characteristics of the predicates could produce situations where there is more than one appropriate handler for a sequence of arguments?

**c.** Are there any such situations in our generic arithmetic code?
We have provided a crude tool to measure the effectiveness of our dispatch strategy. By wrapping any computation with with-predicate-counts we can find out how many times each dispatch predicate is called in an execution. For example, evaluating (fib 20) in a generic arithmetic with a trie-based dispatch store may yield something like this:<sup>29</sup>
```
(define (fib n)
(if (< n 2)
n
(+ (fib (- n 1)) (fib (- n 2)))))
(with-predicate-counts (lambda () (fib 20)))
(109453 number)
(109453 function)
(54727 any-object)
(109453 symbolic)
6765
```
## **Exercise 3.14: Dispatch efficiency: gotcha!**
Given this performance tool it is instructive to look at executions of
```
(define (test-stormer-counts)
(define (F t x) (- x))
(define numeric-s0
  (make-initial-history 0 .01 (sin 0) (sin -.01) (sin -.02)))
(with-predicate-counts
(lambda ()
(x 0 ((evolver F 'h stormer-2) numeric-s0 1)))))
```
for the rule-list-based dispatch in make-simple-dispatch-store, in the arithmetic you get by:
```
(define full-generic-arithmetic
  (let ((g (make-generic-arithmetic make-simple-dispatch-store)))
(add-to-generic-arithmetic! g numeric-arithmetic)
(extend-generic-arithmetic! g function-extender)
(add-to-generic-arithmetic! g
(symbolic-extender numeric-arithmetic))
g))
(install-arithmetic! full-generic-arithmetic)
```
and the trie-based version (exercise 3.13), in the arithmetic you get by:
```
(install-arithmetic! trie-full-generic-arithmetic)
```
For some problems the trie should have much better performance than the simple rule list. We expect that the performance will be better with the trie if we have a large number of rules with the same initial segment.
Understanding this is important, because it appears counterintuitive that the trie sometimes does not improve performance. We explicitly introduced the trie to avoid redundant calls. Explain this phenomenon in a concise paragraph.
For an additional insight, look at the performance of (fib 20) in the two implementations.
When more than one handler is applicable for a given sequence of arguments, it is not clear how to use those handlers; addressing this situation is the job of a *resolution policy*. There are many considerations when designing a resolution policy. For example, a policy that chooses the most specific handler is often a good policy; however, we need more information to implement such a policy. Sometimes it is appropriate to run all of the applicable handlers and compare their results. This can be used to catch errors and provide a kind of redundancy. Or if we have partial information provided by each handler, such as a numerical interval, the results of different handlers can be combined to provide better information.
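One of the policies mentioned above, running every applicable handler and comparing results, can be sketched as follows. This is an illustration under assumed names, not the book's code:

```
;; A hedged sketch of the redundancy policy described above:
;; apply every applicable handler and insist that they agree.
(define (redundant-dispatch handlers)
  (lambda args
    (let ((results
           (map (lambda (handler) (apply handler args))
                handlers)))
      (if (every (lambda (result)
                   (equal? result (car results)))
                 (cdr results))
          (car results)
          (error "Applicable handlers disagree:" results)))))
```

A policy for partial information, such as numerical intervals, would instead combine the results (for example by intersecting the intervals) rather than demanding equality.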
#### **3.4.2 Caching**
With the use of tries we have eliminated redundant evaluation of argument predicates. We can do better by using abstraction to eliminate the evaluation of predicates altogether. A predicate identifies a set of objects that are distinguished from all other objects; in other words, the predicate and the set it distinguishes are effectively the same. In our trie implementation, we use the equality of the predicate procedures to avoid redundancy; otherwise we would have redundant edges in the trie and it would be no help at all. This is also why the use of combinations of predicates doesn't mix well with the trie implementation.
The problem here is that we want to build an index that discriminates objects according to predicates, but the opacity of procedures makes them unreliable when used as keys to the index. What we'd really like is to assign a name to the set distinguished by a given predicate. If we had a way to get that name from a given object by superficial examination, we could avoid computing the predicate at all. This name is a "type"; but in order to avoid confusion we will refer to this name as a *tag*.
Given a way to get a tag from an object, we can make a cache that saves the handler resulting from a previous dispatch and reuses it for other dispatches whose arguments have the same tag pattern. But in the absence of explicitly attached tags, there are limitations to this approach, because we can only discriminate objects that share an implementation-specified representation. For example, it's easy to distinguish between a number and a symbol, but it's not easy to distinguish a prime number, because it's unusual for an implementation to represent them specially.
We will return to the problem of explicit tagging in section 3.5, but in the meantime it is still possible to make a useful cache using the representation tags from the Scheme implementation. Given an implementation-specific procedure implementation-type-name to obtain the representation tag of an object, we can make a cached dispatch store:
```
(define a-cached-dispatch-store
(cache-wrapped-dispatch-store (make-trie-dispatch-store)
implementation-type-name))
```
This dispatch store wraps a cache around a trie dispatch store, but it could just as well wrap a simple dispatch store.
The heart of the cached dispatch store is a memoizer built on a hash table. The key for the hash table is the list of representation tags extracted by the implementation-type-name procedure from the arguments. By passing implementation-type-name into this dispatch-store wrapper (as get-key) we can use it to make cached dispatch stores for more powerful tag mechanisms that we will develop soon.
```
(define (cache-wrapped-dispatch-store dispatch-store get-key)
(let ((get-handler
(simple-list-memoizer
eqv?
(lambda (args) (map get-key args))
(dispatch-store 'get-handler))))
(lambda (message)
(case message
((get-handler) get-handler)
(else (dispatch-store message))))))
```
The call to simple-list-memoizer wraps a cache around its last argument, producing a memoized version of it. The second argument specifies how to get the cache key from the procedure's arguments. The eqv? argument specifies how the tags will be identified in the cache.
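A plausible shape for simple-list-memoizer is sketched below. This is inferred from the call site above, not the book's actual definition: it caches results keyed by the list produced by the key extractor, comparing keys element-wise with the given equality predicate.

```
;; A hedged sketch of simple-list-memoizer (assumed, not the
;; book's code).  A production version would use a hash table;
;; the alist keeps the sketch self-contained.
(define (simple-list-memoizer elt= args->key procedure)
  (let ((cache '()))                     ; alist of (key . result)
    (define (key=? k1 k2)
      (and (= (length k1) (length k2))
           (every elt= k1 k2)))
    (define (lookup key entries)
      (cond ((null? entries) #f)
            ((key=? key (caar entries)) (car entries))
            (else (lookup key (cdr entries)))))
    (lambda args
      (let ((key (args->key args)))
        (let ((entry (lookup key cache)))
          (if entry
              (cdr entry)
              (let ((result (apply procedure args)))
                (set! cache (cons (cons key result) cache))
                result)))))))
```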
## **Exercise 3.15: Cache performance**
Using the same performance tool we introduced for exercise 3.14 on page 130, make measurements for execution of (test-stormercounts) and (fib 20) in the cached version of dispatch with the same generic arithmetics explored in exercise 3.14. Record your results. How do they compare?

### **3.5 Efficient user-defined types**
In section 3.4.2 we introduced tags as part of a caching mechanism for dispatch. Each argument is mapped to a tag, and the list of tags is then used as a key in a cache to obtain the handler. If the cache has a handler associated with this list of tags, it is used. If not, the trie of predicates is used to find the appropriate handler and it is entered into the cache associated with the list of tags.
This mechanism is pretty crude: the predicates that can be used for the applicability specifications are restricted to those that always give the same boolean value for any two objects with the same tag. So the discrimination of types cannot be any finer than the available tags. The tags were implementation-specific symbols, such as pair, vector, or procedure. So this severely limits the possible predicates. We could not have rules that distinguish between integers that satisfy even-integer? and integers that satisfy odd-integer?, for example.
What is needed is a system of tagging that makes it computationally easy to obtain the tag associated with a data item, but where the tags are not restricted to a small set of implementation-specific values. This can be accomplished by attaching a tag to each data item, either with an explicit data structure or via a table of associations.
We have several problems interwoven here: we want to use predicates in applicability specifications; we want an efficient mechanism for dispatch; and we want to be able to specify relationships between predicates that can be used in the dispatch. For example, we want to be able to say that the predicate integer? is the disjunction of the predicates even-integer? and odd-integer?, and also that integer? is the disjunction of positive-integer?, negative-integer?, and zero?.
To capture such relationships we need to put metadata on the predicates; but adding an associative lookup to get the metadata of a predicate, as we did with the arity of a function (on page 28), adds
too much overhead, because the metadata will contain references to other tags, and chasing these references must be efficient.
One way out is to register the needed predicates. Registration creates a new kind of tag, a data structure that is associated with the predicate. The tag will be easy to attach to objects that are accepted by the predicate. The tag will provide a convenient place to store metadata.
We will construct a system in which each distinct object can have only one tag and where relationships between predicates can be declared. This may appear to be overly simple, but it is adequate for our purposes.
#### **3.5.1 Predicates as types**
Let's start with some *simple predicates*. For example, the primitive procedure exact-integer? is preregistered in our system as a simple predicate:
```
(predicate? exact-integer?)
#t
```
Now let's define a new predicate that's not a primitive. We will build it on this particularly slow test for prime numbers.
```
(define (slow-prime? n)
(and (n:exact-positive-integer? n)
(n:>= n 2)
(let loop ((k 2))
(or (n:> (n:square k) n)
(and (not (n:= (n:remainder n k) 0))
(loop (n:+ k 1)))))))
```
Note that all of the arithmetic operators are prefixed with n: to ensure that we get the underlying Scheme operations.
We construct the prime-number? abstract predicate, with a name for use in error messages and a criterion, slow-prime?, for an object to be considered a prime number:
```
(define prime-number?
(simple-abstract-predicate 'prime-number slow-prime?))
```
The procedure simple-abstract-predicate creates an *abstract predicate*, which is a clever trick for memoizing the result of an expensive predicate (in this case slow-prime?). An abstract predicate has an associated constructor that is used to make a *tagged object*, consisting of the abstract predicate's tag and an object. The constructor requires that the object to be tagged satisfies the expensive predicate. The resulting tagged object satisfies the abstract predicate, as well as carrying its tag. Consequently the tagged object can be tested for the property defined by the expensive predicate by using the fast abstract predicate (or, equivalently, by dispatching on its tag).
For example, the abstract predicate prime-number? is used to tag objects that are verified prime numbers, for the efficient implementation of generic dispatch. This is important because we do not want to execute slow-prime? during the dispatch to determine whether a number is prime. So we build a new *tagged object*, which contains both a *tag* (the tag for prime-number?) and a *datum* (the raw prime number). When a generic procedure is handed a tagged object, it can efficiently retrieve its tag and use that as a cache key.
In order to make tagged objects, we use predicate-constructor to get the constructor associated with the abstract predicate:
```
(define make-prime-number
(predicate-constructor prime-number?))
(define short-list-of-primes
(list (make-prime-number 2)
(make-prime-number 7)
(make-prime-number 31)))
```
The constructor make-prime-number requires that its argument be prime, as determined by slow-prime?: the only objects that can be tagged by this constructor are prime numbers.
```
(make-prime-number 4)
;Ill-formed data for prime-number: 4
```
#### **3.5.2 Relationships between predicates**
The sets that we can define with abstract predicates can be related to one another. For example, the primes are a subset of the positive integers. The positive integers, the even integers, and the odd integers are subsets of the integers. This is important because any operation that is applicable to an integer is applicable to any element of any subset, but there are operations that can be applied to an element of a subset that cannot be applied to all elements of an enclosing superset. For example, the even integers can be halved without leaving a remainder, but that is not true of the full integers.
When we defined prime-number?, we effectively defined a set of objects. But that set has no relation to the set defined by exact-integer?:
```
(exact-integer? (make-prime-number 2))
#f
```
We would like these sets to be properly related, which is done by adding some metadata to the predicates themselves:
```
(set-predicate<=! prime-number? exact-integer?)
```
This procedure set-predicate<=! modifies the metadata of its argument predicates to indicate that the set defined by the first argument is a (non-strict) subset of the set defined by the second argument. In our case, the set defined by prime-number? is declared to be a subset of the set defined by exact-integer?. Once this is done, exact-integer? will recognize our objects:
```
(exact-integer? (make-prime-number 2))
#t
```
#### **3.5.3 Predicates are dispatch keys**
The abstract predicates we have defined are suitable for use in generic dispatch. Even better, they can be used as cache keys to make dispatch efficient. As we described above, when a predicate is registered, a new tag is created and associated with the predicate. All
we need is a way to get the tag for a given object: the procedure get-tag does this.
If we pass get-tag to cache-wrapped-dispatch-store as its get-key argument, we have a working implementation. However, since the set defined by a predicate can have subsets, we need to consider a situation where there are multiple potential handlers for some given arguments. There are a number of possible ways to resolve this situation, but the most common is to identify the "most specific" handler by some means, and invoke that one. Since the subset relation is a partial order, it may not be clear which handler is most specific, so the implementation must resolve the ambiguity by independent means.
Here is one such implementation. It uses a procedure rule< to sort the matching rules into an appropriate order, then chooses a handler from the result.<sup>30</sup>
```
(define (make-subsetting-dispatch-store-maker choose-handler)
(lambda ()
(let ((delegate (make-simple-dispatch-store)))
(define (get-handler args)
(let ((matching
(filter (lambda (rule)
(is-generic-handler-applicable?
rule args))
((delegate 'get-rules)))))
(and (n:pair? matching)
(choose-handler ; from sorted handlers
(map cdr (sort matching rule<))
((delegate 'get-default-handler))))))
(lambda (message)
(case message
((get-handler) get-handler)
(else (delegate message)))))))
```
The procedure make-most-specific-dispatch-store chooses the first of the sorted handlers to be the effective handler:
```
(define make-most-specific-dispatch-store
(make-subsetting-dispatch-store-maker
(lambda (handlers default-handler)
(car handlers))))
```
Another possible choice is to make a "chaining" dispatch store, in which each handler gets an argument that can be used to invoke the next handler in the sorted sequence. This is useful for cases where a subset handler wants to extend the behavior of a superset handler rather than overriding it. We will see an example of this in the clock handler of the adventure game in section 3.5.4.
```
(define make-chaining-dispatch-store
(make-subsetting-dispatch-store-maker
(lambda (handlers default-handler)
(let loop ((handlers handlers))
(if (pair? handlers)
(let ((handler (car handlers))
(next-handler (loop (cdr handlers))))
(lambda args
(apply handler (cons next-handler args))))
default-handler)))))
```
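To make the chaining convention concrete, here is a hypothetical pair of definitions (the names, predicates, and handlers are invented for illustration, not taken from the game). The handler receives the next handler in the sorted sequence as an extra first argument, and may call it to get the superset behavior before adding its own:

```
;; Hypothetical illustration of a chaining handler.  The
;; generic procedure describe and the predicate movable-thing?
;; are assumptions, not the book's definitions.
(define describe
  (chaining-generic-procedure 'describe 1
    (lambda (thing) (list 'object))))   ; default handler

(define-generic-procedure-handler describe
  (match-args movable-thing?)
  (lambda (next-handler thing)
    ;; Extend the superset description rather than replace it.
    (cons 'movable (next-handler thing))))
```

A subset handler that wants to override rather than extend simply ignores its next-handler argument.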
Either one of these dispatch stores can be made into a cached dispatch store by adding a caching wrapper:
```
(define (make-cached-most-specific-dispatch-store)
(cache-wrapped-dispatch-store
(make-most-specific-dispatch-store)
get-tag))
(define (make-cached-chaining-dispatch-store)
(cache-wrapped-dispatch-store
(make-chaining-dispatch-store)
get-tag))
```
Then we create the corresponding generic-procedure constructors:
```
(define most-specific-generic-procedure
(generic-procedure-constructor
make-cached-most-specific-dispatch-store))
(define chaining-generic-procedure
(generic-procedure-constructor
make-cached-chaining-dispatch-store))
```
### **3.5.4 Example: An adventure game**
One traditional way to model a world is "object-oriented programming." The idea is that the world being modeled is made up of objects, each of which has independent local state, and the coupling between the objects is loose. Each object is assumed to have particular behaviors. An object may receive messages from other objects, change its state, and send messages to other objects. This is very natural for situations where the behavior we wish to model does not depend on the collaboration of multiple sources of information: each message comes from one other object. This is a tight constraint on the organization of a program.
There are other ways to break a problem into pieces. We have looked at "arithmetic" enough to see that the meaning of an operator, such as \*, can depend on the properties of multiple arguments. For example, the product of a number and a vector is a different operation from the product of two vectors or of two numbers. This kind of problem is naturally formulated in terms of generic procedures.<sup>31</sup>
Consider the problem of modeling a world made of "places," "things," and "people" with generic procedures. How should the state variables that are presumed to be local to the entities be represented and packaged? What operations are appropriately generic over what kinds of entities? Since it is natural to group entities into types (or sets) and to express some of the operations as appropriate for all members of an inclusive set, how is subtyping to be arranged? Any object-oriented view will prescribe specific answers to these design questions; here we have more freedom, and must design the conventions that will be used.
To illustrate this process we will build a world for a simple adventure game. There is a network of rooms connected by passages and inhabited by a variety of creatures, some of which are *autonomous* in that they can wander around. There is an *avatar* that is controlled by the player. There are things, some of which can be picked up and carried by the creatures. There are ways that the creatures can interact: a troll can bite another creature and damage it; any creature can take a thing carried by another creature.
Every entity in our world has a set of named properties. Some of these are fixed and others are changeable. For example, a room has exits to other rooms. These represent the topology of the network and cannot be changed. A room also has contents, such as the creatures who are currently in the room and things that may be acquired. The contents of a room change as creatures move around and as they carry things to and from other rooms. We will computationally model this set of named properties as a table from names to property values.
There is a set of generic procedures that are appropriate for this world. For example, some things, such as books, creatures, and the avatar, are movable. In every case, moving a thing requires deleting it from the contents of the source, adding it to the contents of the destination, and changing its location property. This operation is the same for books, people, and trolls, all of which are members of the "movable things" set.
A book can be read; a person can say something; a troll can bite a creature. To implement these behaviors there are specific properties of books that are different from the properties of people or those of trolls. But these different kinds of movable things have some properties in common, such as location. So when such a thing is instantiated, it must make a table for all of its properties, including those inherited from more inclusive sets. The rules for implementing the behavior of operators such as move must be able to find appropriate handlers for manipulating the properties in each case.
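The move operation described above might be sketched like this. The property accessors here (get-property, set-property!) are assumed helpers over the per-entity property table; the book develops its own machinery for this.

```
;; A hedged sketch of the generic move described above: delete
;; the thing from the source's contents, add it to the
;; destination's contents, and update the thing's own location
;; property.  get-property and set-property! are assumptions.
(define (move! thing source destination)
  (set-property! source 'contents
                 (delete thing (get-property source 'contents)))
  (set-property! destination 'contents
                 (cons thing (get-property destination 'contents)))
  (set-property! thing 'location destination))
```

In the book's design this operation would itself be a generic procedure, so that subsets of movable things can extend or specialize its behavior.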
#### **The game**
Our game is played on a rough topological map of MIT. There are various autonomous agents (non-player characters), such as students and officials. The registrar, for example, is a troll. There are movable and immovable things, and movable things can be taken by an autonomous agent or the player's avatar. Although this game has little detail, it can be expanded to be very interesting.
We create a session with an avatar named gjs who appears in a random place. The game tells the player about the environment of the avatar.
```
(start-adventure 'gjs)
You are in dorm-row
You see here: registrar
You can exit: east
```
Since the registrar is here it is prudent to leave! (He may bite, and after enough bites the avatar will die.)
```
(go 'east)
gjs leaves via the east exit
gjs enters lobby-7
You are in lobby-7
You can see: lobby-10 infinite-corridor
You can exit: up west east
alyssa-hacker enters lobby-7
alyssa-hacker says: Hi gjs
ben-bitdiddle enters lobby-7
ben-bitdiddle says: Hi alyssa-hacker gjs
registrar enters lobby-7
registrar says: Hi ben-bitdiddle alyssa-hacker gjs
```
Notice that several autonomous agents arrive after the avatar, and that they do so one at a time. So we see that the report is for an interval of simulated time rather than a summary of the state at an instant. This is an artifact of our implementation rather than a deliberate design choice.
Unfortunately the registrar has followed, so it's time to leave again.
```
(say "I am out of here!")
gjs says: I am out of here!
(go 'east)
gjs leaves via the east exit
gjs enters lobby-10
You are in lobby-10
You can see: lobby-7 infinite-corridor great-court
You can exit: east south west up
```
```
(go 'up)
gjs leaves via the up exit
gjs enters 10-250
You are in 10-250
You see here: blackboard
You can exit: up down
```
Room 10-250 is a lecture hall, with a large blackboard. Perhaps we can take it?
```
(take-thing 'blackboard)
blackboard is not movable
```
So sad—gjs loves blackboards. Let's keep looking around.
```
(go 'up)
gjs leaves via the up exit
gjs enters barker-library
You are in barker-library
You see here: engineering-book
You can exit: up down
An earth-shattering, soul-piercing scream is heard...
```
Apparently, a troll (maybe the registrar) has eaten someone. However, here is a book that should be takable, so we take it and return to the lecture hall.
```
(take-thing 'engineering-book)
gjs picks up engineering-book
(go 'down)
gjs leaves via the down exit
gjs enters 10-250
You are in 10-250
Your bag contains: engineering-book
You see here: blackboard
You can exit: up down
```
From the lecture hall we return to lobby-10, where we encounter lambda-man, who promptly steals our book.
```
(go 'down)
gjs leaves via the down exit
gjs enters lobby-10
gjs says: Hi lambda-man
```
```
You are in lobby-10
Your bag contains: engineering-book
You see here: lambda-man
You can see: lobby-7 infinite-corridor great-court
You can exit: east south west up
alyssa-hacker enters lobby-10
alyssa-hacker says: Hi gjs lambda-man
lambda-man takes engineering-book from gjs
gjs says: Yaaaah! I am upset!
```
#### **The object types**
To create an object in our game, we define some properties with make-property, define a type predicate with make-type, get the predicate's associated instantiator with type-instantiator, and call that instantiator with appropriate arguments.
How do we make a troll? The make-troll constructor for a troll takes arguments that specify the values for properties that are specific to the particular troll being constructed. The troll will be created in a given place with a restlessness (proclivity to move around), an acquisitiveness (proclivity to take things), and a hunger (proclivity to bite other people).
```
(define (create-troll name place restlessness hunger)
  (make-troll 'name name
              'location place
              'restlessness restlessness
              'acquisitiveness 1/10
              'hunger hunger))
```
We create two trolls: grendel and registrar. They are initially placed in random places, with some random proclivities.
```
(define (create-trolls places)
  (map (lambda (name)
         (create-troll name
                       (random-choice places)
                       (random-bias 3)
                       (random-bias 3)))
       '(grendel registrar)))
```
The procedure random-choice randomly selects one item from the list it is given. The procedure random-bias chooses a number (in this case 1, 2, or 3) and returns its reciprocal.
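The descriptions of these two helpers suggest implementations along the following lines. This is a plausible sketch based only on the behavior described in the text, not the book's actual code; `random` here is MIT/GNU Scheme's primitive, which returns an integer in `[0, n)` for an integer argument.

```
;; Sketches of the random helpers as described in the text;
;; plausible definitions, not the book's actual code.

(define (random-choice items)
  ;; Select one item from a nonempty list, uniformly at random.
  (list-ref items (random (length items))))

(define (random-bias n)
  ;; Choose an integer from 1 through n and return its reciprocal.
  (/ 1 (+ 1 (random n))))
```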
The troll type is defined as a predicate that is true only of trolls. The make-type procedure is given a name for the type and a descriptor of the properties that are specific to trolls. (Only trolls have a hunger property.)
```
(define troll:hunger
  (make-property 'hunger 'predicate bias?))
(define troll?
  (make-type 'troll (list troll:hunger)))
```
The troll is a specific type of autonomous agent. Thus the set of trolls is a subset of (<=) the set of autonomous agents.
```
(set-predicate<=! troll? autonomous-agent?)
```
The constructor for trolls is directly derived from the predicate that defines the type, as is the accessor for the hunger property.
```
(define make-troll
  (type-instantiator troll?))
(define get-hunger
  (property-getter troll:hunger troll?))
```
Autonomous agents are occasionally stimulated by the "clock" to take some action. The distinctive action of the troll is to bite other people.
```
(define-clock-handler troll? eat-people!)
```
A biased coin is flipped to determine whether the troll is hungry at the moment. If it is hungry it looks for other people (trolls are people too!), and if there are some it chooses one to bite, causing the victim to suffer some damage. The narrator describes what happens.
```
(define (eat-people! troll)
  (if (flip-coin (get-hunger troll))
      (let ((people (people-here troll)))
        (if (n:null? people)
            (narrate! (list (possessive troll) "belly rumbles")
                      troll)
            (let ((victim (random-choice people)))
              (narrate! (list troll "takes a bite out of" victim)
                        troll)
              (suffer! (random-number 3) victim))))))
```
The procedure flip-coin generates a random fraction between 0 and 1. If that fraction is greater than the argument, it returns true. The procedure random-number returns a positive number less than or equal to its argument.
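Following the descriptions just given (and only those descriptions — these are not the book's definitions), the two procedures could be sketched as:

```
;; Sketches matching the text's descriptions above;
;; hedged guesses, not the book's actual definitions.

(define (flip-coin bias)
  ;; True when a random fraction in [0, 1) exceeds the given bias.
  (> (random 1.0) bias))

(define (random-number n)
  ;; A positive integer less than or equal to n.
  (+ 1 (random n)))
```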
The procedure narrate! is used to add narration to the story. The second argument to narrate! (troll in the above code) may be anything that has a location. The narrator announces its first argument in the location thus determined. One can only hear that announcement if one is in that location.
We said that a troll is a kind of autonomous agent. The autonomous agent type is defined by its predicate, which specifies the properties that are needed for such an agent. We also specify that the set of autonomous agents is a subset of the set of all persons.
```
(define autonomous-agent:restlessness
  (make-property 'restlessness 'predicate bias?))
(define autonomous-agent:acquisitiveness
  (make-property 'acquisitiveness 'predicate bias?))
(define autonomous-agent?
  (make-type 'autonomous-agent
             (list autonomous-agent:restlessness
                   autonomous-agent:acquisitiveness)))
(set-predicate<=! autonomous-agent? person?)
```
The constructor for trolls specified values for the properties restlessness and acquisitiveness, which are needed to make an autonomous agent, in addition to the hunger property specific to
trolls. Since trolls are autonomous agents, and autonomous agents are persons, there must also be values for the properties of a person and all its supersets. In this system almost all properties have default values that are automatically filled if not specified. For example, all objects need names; the name was specified in the constructor for trolls. But a person also has a health property, necessary to accumulate damage, and this property value was not explicitly specified in the constructor for trolls.
#### **The generic procedures**
Now that we have seen how objects are built, we will look at how to implement their behavior. Specifically, we will see how generic procedures are an effective tool for describing complex behavior.
We defined get-hunger, which is used in eat-people!, in terms of property-getter. A getter for a property of objects of a given type is implemented as a generic procedure that takes an object as an argument and returns the value of the property.
```
(define (property-getter property type)
  (let ((procedure                ; the getter
         (most-specific-generic-procedure
          (symbol 'get- (property-name property))
          1                       ; arity
          #f)))                   ; default handler
    (define-generic-procedure-handler procedure
      (match-args type)
      (lambda (object)
        (get-property-value property object)))
    procedure))
```
This shows the construction of a generic procedure with a generated name (for example get-hunger) that takes one argument, and the addition of a handler that does the actual access. The last argument to most-specific-generic-procedure is the default handler for the procedure; specifying #f means that the default is to signal an error.
We also used define-clock-handler to describe an action to take when the clock ticks. That procedure just adds a handler to a
generic procedure clock-tick!, which is already constructed.
```
(define (define-clock-handler type action)
  (define-generic-procedure-handler clock-tick!
    (match-args type)
    (lambda (super object)
      (super object)
      (action object))))
```
This generic procedure supports "chaining," in which each handler gets an extra argument (in this case super) that when called causes any handlers defined on the supersets of the given object to be called. The arguments passed to super have the same meaning as the arguments received here; in this case there's just one argument and it is passed along. This is essentially the same mechanism used in languages such as Java, though in that case it's done with a magic keyword rather than an argument.
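To make the chaining concrete, here is a minimal sketch — not the book's dispatch machinery — of how a list of handlers, ordered most-specific first, could be threaded together so that each one receives a `super` procedure invoking the next:

```
;; Minimal sketch of handler chaining: each handler receives a
;; `super` procedure that invokes the next (more general) handler.
;; Illustrative only; the book's dispatch store is more elaborate.

(define (chain-handlers handlers)
  ;; handlers is ordered most-specific first.
  (if (null? handlers)
      (lambda (object) 'done)          ; end of the chain
      (let ((handler (car handlers))
            (rest (chain-handlers (cdr handlers))))
        (lambda (object)
          (handler rest object)))))

;; Example: a troll action that runs after its supertype's action.
(define tick!
  (chain-handlers
   (list (lambda (super object)        ; troll-specific handler
           (super object)              ; run supertype handlers first
           (display "troll acts\n"))
         (lambda (super object)        ; autonomous-agent handler
           (display "agent moves\n")))))
```

Calling `(tick! some-troll)` prints the agent line before the troll line, matching the order described above: the specific action happens after those of the supersets.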
The clock-tick! procedure is called to trigger an action, not to compute a value. Notice that the action we specify will be taken after any actions specified by the supersets. We could have chosen to do the given action first and the others later, just by changing the order of the calls.
The real power of the generic procedure organization is illustrated by the mechanisms for moving things around. For example, when we pick up the engineering book, we move it from the room to our bag. This is implemented with the move! procedure:
```
(define (move! thing destination actor)
  (generic-move! thing
                 (get-location thing)
                 destination
                 actor))
```
The move! procedure is implemented in terms of a more general procedure generic-move! that takes four arguments: the thing to be moved, the thing's current location, its destination location, and the actor of the move procedure. This procedure is generic because the movement behavior potentially depends on the types of all of the arguments.
When we create generic-move! we also specify a very general handler to catch cases that are not covered by more specific handlers (for specific argument types).
```
(define generic-move!
  (most-specific-generic-procedure 'generic-move! 4 #f))

(define-generic-procedure-handler generic-move!
  (match-args thing? container? container? person?)
  (lambda (thing from to actor)
    (tell! (list thing "is not movable")
           actor)))
```
The procedure tell! sends the message (its first argument) to the actor that is trying to move the thing. If the actor is the avatar, the message is displayed.
In the demo we picked up the book. We did that by calling the procedure take-thing with the name engineering-book. This procedure resolves the name to the thing and then calls take-thing!, which invokes move!:
```
(define (take-thing name)
  (let ((thing (find-thing name (here))))
    (if thing
        (take-thing! thing my-avatar)))
  'done)

(define (take-thing! thing person)
  (move! thing (get-bag person) person))
```
There are two procedures here. The first is a user-interface procedure to give the player a convenient way of describing the thing to be taken by giving its name. It calls the second, an internal procedure that is also used in other places.
To make this work we supply a handler for generic-move! that is specialized to moving mobile things from places to bags:
```
(define-generic-procedure-handler generic-move!
  (match-args mobile-thing? place? bag? person?)
  (lambda (mobile-thing from to actor)
    (let ((new-holder (get-holder to)))
      (cond ((eqv? actor new-holder)
             (narrate! (list actor
                             "picks up" mobile-thing)
                       actor))
            (else
             (narrate! (list actor
                             "picks up" mobile-thing
                             "and gives it to" new-holder)
                       actor)))
      (if (not (eqv? actor new-holder))
          (say! new-holder (list "Whoa! Thanks, dude!")))
      (move-internal! mobile-thing from to))))
```
If the actor is taking the thing, the actor is the new-holder. But it is possible that the actor is picking up the thing in the place and putting it into someone else's bag!
The say! procedure is used to indicate that a person has said something. Its first argument is the person speaking, and the second argument is the text being spoken. The move-internal! procedure actually moves the object from one place to another.
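The text does not show move-internal!, but the opening description of movement — delete the thing from the source's contents, add it to the destination's contents, and change its location property — suggests a sketch like the following. The helpers remove-thing!, add-thing!, and set-location! are hypothetical names for those three operations, not the book's API.

```
;; Plausible sketch of move-internal!.  The three helpers are
;; hypothetical names for the operations described in the text
;; (contents removal, contents addition, location update).

(define (move-internal! mobile-thing from to)
  (remove-thing! from mobile-thing)   ; delete from source contents
  (add-thing! to mobile-thing)        ; add to destination contents
  (set-location! mobile-thing to))    ; update the location property
```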
To drop a thing we use the procedure drop-thing to move it from our bag to our current location:
```
(define (drop-thing name)
  (let ((thing (find-thing name my-avatar)))
    (if thing
        (drop-thing! thing my-avatar)))
  'done)

(define (drop-thing! thing person)
  (move! thing (get-location person) person))
```
The following handler for generic-move! enables dropping a thing. The actor may be dropping a thing from its own bag or it might pick up something from another person's bag and drop it.
```
(define-generic-procedure-handler generic-move!
  (match-args mobile-thing? bag? place? person?)
  (lambda (mobile-thing from to actor)
    (let ((former-holder (get-holder from)))
      (cond ((eqv? actor former-holder)
             (narrate! (list actor
                             "drops" mobile-thing)
                       actor))
            (else
             (narrate! (list actor
                             "takes" mobile-thing
                             "from" former-holder
                             "and drops it")
                       actor)))
      (if (not (eqv? actor former-holder))
          (say! former-holder
                (list "What did you do that for?")))
      (move-internal! mobile-thing from to))))
```
Yet another generic-move! handler provides for gifting or stealing something, by moving a thing from one bag to another bag. Here the behavior depends on the relationships among the actor, the original holder of the thing, and the final holder of the thing.
```
(define-generic-procedure-handler generic-move!
  (match-args mobile-thing? bag? bag? person?)
  (lambda (mobile-thing from to actor)
    (let ((former-holder (get-holder from))
          (new-holder (get-holder to)))
      (cond ((eqv? from to)
             (tell! (list new-holder "is already carrying"
                          mobile-thing)
                    actor))
            ((eqv? actor former-holder)
             (narrate! (list actor
                             "gives" mobile-thing
                             "to" new-holder)
                       actor))
            ((eqv? actor new-holder)
             (narrate! (list actor
                             "takes" mobile-thing
                             "from" former-holder)
                       actor))
            (else
             (narrate! (list actor
                             "takes" mobile-thing
                             "from" former-holder
                             "and gives it to" new-holder)
                       actor)))
      (if (not (eqv? actor former-holder))
          (say! former-holder (list "Yaaaah! I am upset!")))
      (if (not (eqv? actor new-holder))
          (say! new-holder
                (list "Whoa! Where'd you get this?")))
      (if (not (eqv? from to))
          (move-internal! mobile-thing from to)))))
```
Another interesting case is the motion of a person from one place to another. This is implemented by the following handler:
```
(define-generic-procedure-handler generic-move!
  (match-args person? place? place? person?)
  (lambda (person from to actor)
    (let ((exit (find-exit from to)))
      (cond ((or (eqv? from (get-heaven))
                 (eqv? to (get-heaven)))
             (move-internal! person from to))
            ((not exit)
             (tell! (list "There is no exit from" from
                          "to" to)
                    actor))
            ((eqv? person actor)
             (narrate! (list person "leaves via the"
                             (get-direction exit) "exit")
                       from)
             (move-internal! person from to))
            (else
             (tell! (list "You can't force"
                          person
                          "to move!")
                    actor))))))
```
There can be many other handlers, but the important thing to see is that the behavior of the move procedure can depend on the types of all of the arguments. This provides a clean decomposition of the behavior into separately understandable chunks. It is rather difficult to achieve such an elegant decomposition in a traditional object-oriented design, because in such a design one must choose one of the arguments to be the principal dispatch center. Should it be the thing being moved? the source location? the target location? the actor? Any one choice will make the situation more complex than necessary.
As Alan Perlis wrote: "It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures."
#### **Implementing properties**
We saw that the objects in our game are created by defining some properties with make-property, defining a type predicate with
make-type, getting the predicate's associated instantiator with type-instantiator, and calling that instantiator with appropriate arguments. This simple description hides a complex implementation that is worth exploring.
The interesting aspect of this code is that it provides a simple and flexible mechanism for managing the properties that are associated with a type instance, which is robust when subtyping is used. Properties are represented by abstract objects rather than names, in order to avoid namespace conflicts when subtyping. For example, a type mammal might have a property named forelimb that refers to a typical front leg. A subtype bat of mammal might have a property with the same name that refers to a different object, a wing! If the properties were specified by their names, then one of these types would need to change its name. In this implementation, the property objects are specified by themselves, and two properties with the same name are distinct.
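The mammal/bat example can be made concrete. Assuming make-property accepts a name with an empty property list, two calls with the same name yield distinct property objects (each call constructs a fresh record, as the implementation below shows), so a subtype's property can coexist with a like-named property of its supertype:

```
;; Two properties may share a name yet remain distinct objects,
;; so a subtype can shadow a supertype's property without conflict.
;; Sketch assuming make-property accepts a name alone.

(define mammal:forelimb (make-property 'forelimb))  ; a front leg
(define bat:forelimb    (make-property 'forelimb))  ; a wing

;; Same name, different property objects:
;;   (eq? (property-name mammal:forelimb)
;;        (property-name bat:forelimb))   ; => #t
;;   (eq? mammal:forelimb bat:forelimb)   ; => #f
```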
The procedure make-property creates a data type containing a name, a predicate, and a default-value supplier. Its first argument is the property's name, and the rest of the arguments are a property list with additional metadata about the property. For example, see the definition of troll:hunger on page 143. We will ignore how the property list is parsed since it's not interesting.<sup>32</sup>
```
(define (make-property name . plist)
  (guarantee n:symbol? name)
  (guarantee property-list? plist)
  (%make-property name
                  (get-predicate-property plist)
                  (get-default-supplier-property plist)))
```
A property is implemented as a Scheme *record* [65], which is a data structure that consists of a set of named fields. It is defined by elaborate syntax that specifies a constructor, a type predicate, and an accessor for each field:
```
(define-record-type <property>
  (%make-property name predicate default-supplier)
  property?
  (name property-name)
  (predicate property-predicate)
  (default-supplier property-default-supplier))
```
We chose to give the primitive record constructor %make-property a name with an initial percent sign (%). We often use the initial percent sign to indicate a low-level procedure that will not be used except to support a higher-level abstraction. The %make-property procedure is used only in make-property, which in turn is used by other parts of the system.
Given a set of properties, we can construct a type predicate:
```
(define (make-type name properties)
  (guarantee-list-of property? properties)
  (let ((type
         (simple-abstract-predicate name instance-data?)))
    (%set-type-properties! type properties)
    type))
```
```
A type predicate is an ordinary abstract predicate (see page 134) along with the specified properties, which are stored in an association using %set-type-properties!. Those specified properties aren't used by themselves; instead they are aggregated with the properties of the supersets of this type. The object being tagged satisfies instance-data?. It is an association from the properties of this type to their values.
```
(define (type-properties type)
  (append-map %type-properties
              (cons type (all-supertypes type))))
```
And type-instantiator builds the instantiator, which accepts a property list using property names as keys, parses that list, and uses the resulting values to create the instance data, which associates each property of this instance with its value. It also calls the set-up! procedure, which gives us the ability to do type-specific initialization.
```
(define (type-instantiator type)
  (let ((constructor (predicate-constructor type))
        (properties (type-properties type)))
    (lambda plist
      (let ((object
             (constructor (parse-plist plist properties))))
        (set-up! object)
        object))))
```
## **Exercise 3.16: Adventure warmup**
Load the adventure game and start the simulation by executing the command (start-adventure *your-name*). Walk your avatar around. Find some takable object and take it. Drop the thing you took in some other place.
## **Exercise 3.17: Health**
Change the representation of the health of a person to have more possible values than are given in the initial game. Scale your representation so that the probability of death from a troll bite is the same as it was before you changed the representation. Also make it possible to recover from a nonfatal troll bite, or other loss of health, by some cycles of rest.
## **Exercise 3.18: Medical help**
Make a new place, the medical center. Make it easily accessible from the Green building and the Gates tower. If a person who suffers a nonfatal injury (perhaps from a troll bite) makes it to the medical center, their health may be restored.
## **Exercise 3.19: A** *palantir*
Make a new kind of thing called a *palantir* (a "seeing stone," as in Tolkien's *Lord of the Rings*). Each instance of a *palantir* can communicate with any other instance; so if there is a *palantir* in lobby-10 and another in dorm-row, you can observe the goings-on in
dorm-row by looking into a *palantir* in lobby-10. (Basically, a *palantir* is a magical surveillance camera and display.)
Plant a few immovable *palantiri* in various parts of the campus, and enable your avatar to use one. Can you keep watch on the positions of your friends? Of the trolls?
Can you make an autonomous person other than your avatar use a *palantir* for some interesting purpose? The university's president might be a suitable choice.
## **Exercise 3.20: Invisibility**
Make an "Invisibility Cloak" that any person (including an avatar) can acquire to become invisible, thus invulnerable to attacks by trolls. However, the cloak must be discarded (dropped) after a short time, because possession of the cloak slowly degrades the person's health.
## **Exercise 3.21: Your turn**
Now that you have had an opportunity to play with our "world" of characters, places, and things, extend this world in some substantial way, limited only by your creativity. One idea is to have mobile places, such as elevators, which have entrances and exits that change with time, and are perhaps controllable by persons. But that is just one suggestion—invent something you like!
## **Exercise 3.22: Multiple players**
This is a pretty big project rather than a simple exercise.
- **a.** Extend the adventure game so that there can be multiple players, each controlling a personal avatar.
- **b.** Make it possible for players to be on different terminals.
## **3.6 Summary**
The use of generic procedures introduced in this chapter is both powerful and dangerous—it is not for the faint of heart. Allowing the programmer to dynamically change the meanings of the primitive operators of the language can result in unmanageable code. But if we are careful to extend operators to only new types of arguments, without changing their behavior on the original types, we can get powerful extensions without breaking any old software. Most programming languages do not allow the freedom to modify the existing behavior of primitive operators, for good reason. However, many of the ideas here are portable and can be safely used. For example, in many languages, as diverse as C++ and Haskell, one can overload operators to have new meanings on user-defined types.
Extensions of arithmetic are pretty tame, but we must be aware of the problems that can come up, and the subtle bugs that can be evoked: addition of integers is associative, but addition of floating-point numbers is not associative; multiplication of numbers is commutative, but multiplication of matrices is not. And if we extend addition to be concatenation of strings, that extension is not commutative. On the good side, it is straightforward to extend arithmetic to symbolic expressions containing literal numbers as well as purely numerical quantities. It is not difficult, but lots of work, to continue to expand to functions, vectors, matrices, and tensors. However, we eventually run into real problems with the ordering of extensions—symbolic vectors are not the same as vectors with symbolic coordinates! We also can get into complications with the typing of symbolic functions.
One beautiful example of the power of extensible generics is the almost trivial implementation of forward-mode automatic differentiation by extending each primitive arithmetic procedure to handle differential objects. However, making this work correctly with higher-order functions that return functions as values was
difficult. (Of course, most programmers writing applications that need automatic differentiation do not need to worry about this complication.)
In our system the "type" is represented by a predicate that is true of elements of that type. In order to make this efficient we introduced a predicate registration and tagging system that allowed us to add declarations of relationships among the types. For example, we could have prime numbers be a subset of the integers, so numbers that satisfy the user-defined prime? predicate automatically satisfy the integer? predicate.
Once we have user-defined types with declared subset relationships, we enter a new realm of possibilities. We demonstrated this with a simple but elegantly extensible adventure game. Because our generic procedures dispatch on the types of all of their arguments, the descriptions of the behaviors of the entities in our adventure game are much simpler and more modular than they would be if we dispatched on the first argument to produce a procedure that dispatched on the second argument, and so on. So modeling these behaviors in a typical single-dispatch object-oriented system would be more complicated.
We used tagged data to efficiently implement extensible generic procedures. The data was tagged with the information required to decide which procedures to use to implement the indicated operations. But once we have the ability to tag data, there are other uses tags can be put to. For example, we may tag data with its provenance, or how it was derived, or the assumptions it was based on. Such audit trails may be useful for access control, for tracing the use of sensitive data, or for debugging complex systems [128]. So there is power in the ability to attach arbitrary tags to any data item, in addition to the use of tags to determine the handlers for generic procedures.
- 1 ODE means "ordinary differential equation," meaning a differential equation with a single independent variable.
- 2 Because we anticipated varying the meanings of many operators in the MIT/GNU Scheme system, we made a special set of operators that name primitive procedures we might need later. We named the copies with the prefix n:. In MIT/GNU Scheme the original primitive procedures are always available, with their original names, in the system-global-environment, so we could have chosen to get them from there.
- 3 A recent Scheme standard [109] introduced "libraries," which provide a way to specify bindings of the free references in a program. We could use libraries to connect an arithmetic with the code that uses it. But here we demonstrate the ideas by modifying the read-eval-print environment.
- 4 The procedure pp prints a list "prettily" by using line breaks and indentation to reveal the list's structure.
- 5 You may have noticed that in these symbolic expressions the additions and multiplications are expressed as binary operations, even though in Scheme they are allowed to take many arguments; the installer implements the *n*-ary versions as nested binary operations. Similarly, the unary - is converted to negate. Subtractions and divisions with multiple arguments are also realized as nested binary operations.
- 6 The procedure default-object produces an object that is different from any possible constant. The procedure default-object? identifies that value.
- 7 Another difference you may have noticed is that the constantgenerator and operation-generator procedures for the numeric arithmetic have only one formal parameter, while the generator procedures for the symbolic extender have two. The symbolic arithmetic is built on a base arithmetic, so the constant or operation for the base arithmetic is given to the generator.
- 8 The call (any-arg 3 p1? p2?) will produce an applicability specification with seven cases, because there are seven ways that this applicability can be satisfied:
```
((p2? p2? p1?) (p2? p1? p2?) (p2? p1? p1?)
 (p1? p2? p2?) (p1? p2? p1?) (p1? p1? p2?)
 (p1? p1? p1?))
```
- 9 disjoin\* is a predicate combinator. It accepts a list of predicates and produces the predicate that is their disjunction.
- 10 Making this arbitrary choice is not really reasonable. For example, a vector's zero is not only distinct from the numerical zero, but also is not the same for vectors of different dimension. We have chosen to ignore this problem here.
- 11 At the APL-79 conference Joel Moses is reported to have said: "APL is like a beautiful diamond—flawless, beautifully symmetrical. But you can't add anything to it. If you try to glue on another diamond, you don't get a bigger diamond. Lisp is like a ball of mud. Add more and it's still a ball of mud—it still looks like Lisp." But Joel denies that he said this.
- 12 A mechanism of this sort is implicit in most "object-oriented languages," but it is usually tightly bound to ontological mechanisms such as inheritance. The essential idea of extensible generics appears in SICP [1] and is usefully provided in tinyCLOS [66] and SOS [52]. A system of extensible generics, based on predicate dispatching, is used to implement the mathematical representation system in SICM [121]. A nice exposition of predicate dispatching is given by Ernst [33]. The idea that generic procedures are a powerful tool has been percolating in the Lisp community for decades. The fullest development of these ideas is in the Common Lisp Object System (CLOS) [42]. The underlying structure is beautifully expressed in the Metaobject Protocol [68]. It is further elaborated in the "Aspect-oriented programming" movement [67].
- 13 generic-metadata-getter and generic-metadata-default-getter retrieve the get-handler procedure and the get-default-handler procedure from the dispatch-store instance stored in the metadata of the generic procedure.
- 14 The term *automatic differentiation* was introduced by Wengert [129] in 1964.
- 15 The derivative here is the derivative of a function, not the derivative of an expression. If *f* is a function, the derivative *Df* of *f* is a new function, which when applied to *x* gives a value *Df* (*x*). Its relation to an expression derivative is:
$$Df(t) = \frac{d}{dx}f(x)\Big|_{x=t}$$
- 16 The automatic differentiation code we present here is derived from the code that we wrote to support the advanced classical mechanics class that Sussman teaches at MIT with Jack Wisdom [121, 122].
- 17 Differential objects like these are sometimes referred to as *dual numbers*. Dual numbers, introduced by Clifford in 1873 [20], extend the real numbers by adjoining one new element *ε* with the property *ε*<sup>2</sup> = 0. However, in order to conveniently compute multiple derivatives (and derivatives of functions with multiple arguments) it helps to introduce a new infinitesimal part for each independent variable. So our differential algebra space is much more complicated than the single-*ε* dual number space. Our differential objects are also something like the hyperreal numbers, invented by Edwin Hewitt in 1948 [59].
- 18 This idea was "discovered" by Dan Zuras (then of Hewlett Packard Corporation) and Gerald Jay Sussman in an all-night programming binge in 1992. We assumed at the time that this had also been discovered by many others, and indeed it had [129, 12], but we were overjoyed when we first understood the idea ourselves! See [94] for a formal exposition of automatic differentiation.
- 19 We will get to binary functions soon. This is just to make the idea clear before things get complicated. We will extend to *n*-ary functions in section 3.3.2.
- 20 We are showing the definitions of handlers but we are not showing the assignment of the handlers here.
- 21 For an alternative strategy, see exercise 3.8 on page 113.
- 22 The procedure iota returns a list of consecutive integers from 0 through (length args).
- 23 The formal algebraic details were clarified by Hal Abelson around 1994, as part of an effort to fix a bug. The code was painfully reworked in 1997 by Sussman with the help of Hardy Mayer and Jack Wisdom.
- 24 A nicer version would use record structures, but that would be harder to debug without having a way to print them nicely.
- 25 The fact that any factor of any highest-order term in the series can be used was a central insight of Hal Abelson in the 1994 revision of this idea.
- 26 A bug of this class was pointed out to us by Alexey Radul in 2011. The general problem was first identified by Siskind and Perlmutter in 2005 [111]: the differential tags created to distinguish the infinitesimals incrementing an argument for a derivative calculation can be confused in the evaluation of a derivative of a function whose value is a function. The deferred derivative procedure may be called more than once, using the tag that was created for the outer derivative calculation. More recently, Jeff Siskind showed us another bug that plagued our patch for the first one: there was a potential collision between a tag occurring in an argument and a tag inherited from the lexical scope of a derivative function. These very subtle bugs are explained, along with a careful analysis of ways to fix them, in a beautiful paper by Manzyuk et al. [87].
- 27 This is carefully explained in Manzyuk et al. [87].
- 28 The trie data structure was invented by Edward Fredkin in the early 1960s.
- 29 The names printed for predicates by with-predicate-counts do not end in a question mark; for example the name printed for the predicate number? is simply number. The reason for this is obscure, and the curious are welcome to track it down in the code.
- 30 The procedure is-generic-handler-applicable? abstracts the handler checking that we previously did using predicates-match? in get-handler on page 98. This gives us a hook for later elaboration.
- 31 In languages such as Haskell and Smalltalk, multiple arguments are handled by dispatching on the first argument, producing an object that then dispatches on the second argument, etc.
- 32 The make-property procedure uses a helper called guarantee to do argument checking. The first argument to guarantee is a predicate (preferably a registered predicate) and the second argument is an object to be tested. There may be a third argument, to identify the caller. If the object doesn't satisfy the predicate, guarantee signals an error. The procedure guarantee-list-of works similarly except that it requires the object to be a list of elements satisfying the predicate. We have used assert earlier in this text. assert is more convenient for posing assertions that must be true where they are made. guarantee is preferable for the more restricted case of argument type checking.

# **Layering**
In section 1.1 we alluded to the idea that programming could learn from the practice of architecture. A programmer might start with an executable skeleton plan (a *parti*) to help try out an idea. When the *parti* looks good, the programmer could elaborate it with more information.
For example, declared implementation types may enable the compilation of efficient code and inhibit the occurrence of type errors. Declared dimensions and units may be added to prevent some bugs and support documentation. Assertions of predicates can help with the localization of errors that occur at run time and they could support the automatic or manual construction of proofs of "correctness." Declarations of how much precision is needed for some numerical quantities and operations can give clarity to numerical analysis problems. Suggestions of alternative implementations can enable useful degeneracy in an implementation. We can track the provenance of a result by carrying dependencies.
But the usual way of adding these important and powerful features to the text of a program turns the program text into a tangled mess. To continue with the architecture analogy, it does not separate the served spaces from the servant spaces. The separation of the "essential" features of a program (the code that defines its behavior) from the "accidental" ones (e.g., type information for a compiler or code for logging) has been an important issue. Aspect-oriented programming [67] was an attempt to address part of this problem, by explicitly identifying "cross-cutting concerns" such as
logging. Layering is another way to effect the separation. The ability to annotate any piece of data or code with other data or code is a crucial mechanism in building flexible systems. The decoration of a value is a generalization of the tagging used to support extensible generic operations. Here we introduce the idea of *layered programming*. Both the data and the procedures that process it will be made up of multiple layers that enable additive annotation without introducing clutter.
### 6.1 Using layers
Layers give us the ability to sketch out a computation and then elaborate that computation with metadata that is processed along with the computation. Let's consider some annotations that we think may be valuable in many situations. For example, suppose we are interested in using Newton's force law for gravity:
```
(define (F m1 m2 r)
(/ (* G m1 m2) (square r)))
```
This is a simple numerical calculation, but we can elaborate it to carry support information and units.
We find Newton's constant G by looking up a recent measurement published by NIST:
```
(define G
(layered-datum 6.67408e-11
unit-layer (unit 'meter 3 'kilogram -1 'second -2)
support-layer (support-set 'CODATA-2018)))
```
Here we show the numerical value of the measurement, the units of that measurement ($m^3/(kg\,s^2)$), and the source of the data (its support). We could extend this to also carry the uncertainty in the measurement as a range in another layer, but we won't do that here.
We can also find the mass of the Earth, the mass of the Moon, and the distance to the Moon (semimajor axis) from other sources:
```
(define M-Earth
(layered-datum 5.9722e24
unit-layer (unit 'kilogram 1)
support-layer
(support-set 'Astronomical-Almanac-2016)))
(define M-Moon
(layered-datum 7.342e22
unit-layer (unit 'kilogram 1)
support-layer
(support-set 'NASA-2006)))
(define a-Moon
(layered-datum 384399e3
unit-layer (unit 'meter 1)
support-layer
(support-set 'Wieczorek-2006)))
```
Now we can ask the question, "What is the gravitational force of attraction between the Earth and the Moon at that distance?" and we will get the answer:
```
(pp (F M-Earth M-Moon a-Moon))
#[layered-datum 1.9805035857209e20]
(base-layer 1.9805035857209e20)
(unit-layer (unit kilogram 1 meter 1 second -2))
(support-layer
(support-set Wieczorek-2006
NASA-2006
Astronomical-Almanac-2016
CODATA-2018))
```
The result gives the numerical value, the units of that result, and the sources that the result depended upon.
### **6.2 Implementation of layering**
There are two parts to layering. The first is that it must be possible to create a datum that contains multiple layers of information. In our example, we used layered-datum to do this. The second part is that we need to be able to enhance a procedure so that it can process
each layer (somewhat) independently. A procedure enhanced in this way is called a *layered procedure*.
We also need a way to assign names to layers. Every layer must have a name, so that the layer in a datum can be specified. The name is also used by a layered procedure to connect the processing for that layer to the corresponding layers in the incoming data. We have written our example to use variables to refer to layer names, as in unit-layer, which is bound to the name for the unit layer. This makes the user interface independent of the details of how a layer name is specified; this will turn out to be useful.
Another aspect of layer naming is that there must be a distinguished *base layer*, which represents the underlying computation being performed. In our example using layereddatum, the base layer's value is distinguished by being the first argument and by not having an associated name.
Layered data can be built from simple data structures. We can use any convenient data structure that can associate a layer name with a value and that permits many such associations. A special name can be used to identify the base layer, making the data structure simple and uniform.
Building layered procedures is more complicated, because the processing for most layers will need some information from the computation in the base layer. For example, suppose we are multiplying two numbers that carry support information. Normally, the support of the result is the union of the supports of the arguments. But suppose one argument has a base-layer value of zero; then the support of the result is the support of the zero, and the support of the other argument is irrelevant.
The base layer must not depend on any non-base layer because that violates the idea of the base layer: that it is an independent computation that the other layers enhance. And a non-base layer should not depend on another non-base layer. A non-base layer generally shouldn't share information with another non-base layer since its behavior would be different depending on the presence or absence of the other layer. This would be inconsistent with our general approach of building additive programs.
So building a layered procedure involves a balance between sharing information from the base layer to the non-base layers and isolating layers in most other cases. We will address this in the next sections as we explore the details of implementing layering.
#### **6.2.1 Layered data**
A layered data item is a base value annotated with extra information about that value. The annotation is an association of layer names with their values. For example, the number 2 may be the base value in many data items: if we are dealing in potatoes there may be a 2 dollar price tag on a 2-pound bag of potatoes. Each of these instances of the number 2 must be a distinct data item, with different values (dollars or pounds) for the units layer. There may be other layers as well: the 2-dollar price may have information saying how it was derived from the price paid to the farmer and the cost of transportation and processing.
To address this issue we introduce the *layered datum*. A layered datum is represented as a bundle that contains an association of layers and their values. So a 2-pound quantity of potatoes and a 2 dollar price for potatoes will be separate layered data items:
```
(define (make-layered-datum base-value alist)
(if (null? alist)
base-value
(let ((alist
(cons (cons base-layer base-value)
alist)))
(define (has-layer? layer)
(and (assv layer alist) #t))
(define (get-layer-value layer)
(cdr (assv layer alist)))
(define (annotation-layers)
(map car (cdr alist)))
(bundle layered-datum?
has-layer? get-layer-value
annotation-layers))))
```
The associations between layers and their values are represented as an *association list*, or *alist*—a list of keyvalue pairs.
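To make that shape concrete, here is a tiny illustrative sketch of such an alist and a lookup. (Symbols stand in for the layer keys here; the real keys are layer bundles, which assv compares with eqv?.)

```scheme
;; Illustrative only: symbols stand in for the real layer bundles.
(define example-alist
  (list (cons 'base-layer 2)          ; the base value
        (cons 'unit-layer 'dollar)))  ; an annotation layer's value

(cdr (assv 'unit-layer example-alist))   ; → dollar
```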
For convenience, we provide layered-datum, which takes its layer arguments in *property-list* form (alternating layer and value, as in the examples on page 300) and calls make-layered-datum with the corresponding alist.
```
(define (layered-datum base-value . plist)
(make-layered-datum base-value (plist->alist plist)))
```
This design provides great flexibility. There may be many different kinds of layered data, and for each there is no a priori commitment to any particular layer or number of layers. The only common feature is that each layered datum has a distinguished layer, the base-layer, which contains the object that all the other layer values are annotations on.
Each layer is represented by a bundle that embodies the specifics of that layer. The simplest is the base layer:
```
(define base-layer
(let ()
(define (get-name) 'base)
(define (has-value? object) #t)
(define (get-value object)
(if (layered-datum? object)
(object 'get-layer-value base-layer)
object))
(bundle layer? get-name has-value? get-value)))
```
This shows the primary operation of a layer: the get-value operation that fetches the layer value, if present, or returns a default. In the case of the base layer, the default is the object itself.
The *annotation layers* have a little more complexity. In addition to the above, they also manage a set of named procedures that will be explored when we look at layered procedures. The makeannotation-layer procedure provides the common infrastructure used by all annotation layers; it calls its constructor argument to supply the layer-specific parts.
```
(define (make-annotation-layer name constructor)
(define (get-name) name)
(define (has-value? object)
(and (layered-datum? object)
(object 'has-layer? layer)))
(define (get-value object)
(if (has-value? object)
(object 'get-layer-value layer)
(layer 'get-default-value)))
(define layer
(constructor get-name has-value? get-value))
layer)
```
We use make-annotation-layer to construct the units layer:
```
(define unit-layer
(make-annotation-layer 'unit
(lambda (get-name has-value? get-value)
(define (get-default-value)
unit:none)
(define (get-procedure name arity)
See definition on page 308.)
(bundle layer?
get-name has-value? get-value
get-default-value get-procedure))))
```
This implementation shows the rest of the layer structure: a provider for the default value, and the procedure get-procedure that implements this layer's support for layered procedures, which we will examine in the next section (page 308).
As a convenience for a common use case, layer-accessor creates an accessor procedure that is equivalent to calling a layer's get-value delegate:
```
(define (layer-accessor layer)
(lambda (object)
(layer 'get-value object)))
(define base-layer-value
(layer-accessor base-layer))
```
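As a quick check of these accessors, based only on the definitions above, base-layer-value behaves as the identity on plain data and extracts the base layer from a layered datum:

```scheme
;; On a plain (non-layered) value the base layer's get-value
;; returns the object itself:
(base-layer-value 42)   ; → 42

;; On a layered datum it fetches the base layer's value:
(base-layer-value
 (layered-datum 42 unit-layer (unit 'meter 1)))   ; → 42
```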
#### **6.2.2 Layered procedures**
Procedures are also data that can be layered. A layered procedure is similar to a generic procedure, in which there are handlers for different argument types. A layered procedure instead provides implementations for separate layers in the incoming data, and
processes all of them to produce a layered result. [1] For example, when combining a numeric layer with a units layer, the procedure can process the numeric parts of the arguments using its numeric layer, and also process the units parts of the arguments using its units layer.
In the numerical example shown in section 6.1, the code F for Newton's force represents the *parti*, the essential plan for the computation to be performed. It operates on numbers; the units annotate the numbers. The layered generic procedures that implement the arithmetic operators, such as multiplication, have a base component that operates on the numbers in the base layer and they have other components, one for each layer that might annotate the numerical base layer. The units layer is an annotation layer that gives more information about the data and the computation, but is not essential to the computation.
In a layered system the base layer must be able to compute without reference to the other layers. But the annotation layers may need access to the values that are in the base layer. If an annotation layer of an argument is missing, the procedure's annotation layer may use a default value or simply not run. In any case, the base layer always runs.
To construct a layered procedure, we need a unique name and arity for the procedure, and a base-procedure to implement the base computation:
```
(define (make-layered-procedure name arity base-procedure)
(let* ((metadata
(make-layered-metadata name arity base-procedure))
(procedure
(layered-procedure-dispatcher metadata)))
(set-layered-procedure-metadata! procedure metadata)
procedure))
```
Information about the layered procedure is kept in metadata for that procedure. The metadata also manages the handlers for the base layer and the annotation layers.
The metadata for a layered procedure is implemented as a bundle. It is created with the name of the layered procedure, its
arity, and the base-procedure (the handler for the base layer). The metadata provides access to each of these. It also provides set-handler! for assigning a handler for an annotation layer and get-handler for retrieving the handler for an annotation layer.
Each annotation layer, for example the unit-layer, provides get-procedure that when given a procedure name and arity returns the appropriate handler for that procedure name and arity for that layer. The get-handler provided by the layered metadata first checks if it has a handler for that layer. If so it returns that handler; otherwise it returns the result of the layer's get-procedure.
```
(define (make-layered-metadata name arity base-procedure)
(let ((handlers (make-weak-alist-store eqv?)))
(define (get-name) name) (define (get-arity) arity)
(define (get-base-procedure) base-procedure)
(define has? (handlers 'has?))
(define get (handlers 'get))
(define set-handler! (handlers 'put!))
(define (get-handler layer)
(if (has? layer)
(get layer)
(layer 'get-procedure name arity)))
(bundle layered-metadata?
get-name get-arity get-base-procedure
get-handler set-handler!)))
```
The actual work of applying a layered procedure is done by layered-procedure-dispatcher. The dispatcher must be able to access and apply the base procedure and the annotation layer procedures that are associated with the layered procedure. All of this information is provided by the metadata.
```
(define (layered-procedure-dispatcher metadata)
(let ((base-procedure (metadata 'get-base-procedure)))
(define (the-layered-procedure . args)
(let ((base-value
(apply base-procedure
(map base-layer-value args)))
(annotation-layers
(apply lset-union eqv?
(map (lambda (arg)
(if (layered-datum? arg)
(arg 'annotation-layers)
'()))
args))))
(make-layered-datum base-value
(filter-map ; drops #f values
(lambda (layer)
(let ((handler (metadata 'get-handler layer)))
(and handler
(cons layer
(apply handler base-value args)))))
annotation-layers))))
the-layered-procedure))
```
When called, a layered procedure first calls base-procedure on the base-layer values of the arguments to get the base value. It also determines which annotation layers are applicable by examining each of the arguments; if there are no annotation layers that have handlers, then the result is just the base-layer value, because make-layered-datum (on page 303) will return the unannotated base value. Otherwise, each applicable layer's handler is called to produce a value for that layer. The layer-specific handler is given access to the computed base-value and the arguments to the layered procedure; it does not need any layer values other than its own and those of the base layer. Generally, the result is a layered datum containing the base value and the values of the applicable annotation layer handlers.
To see how this works in practice, let's look at the implementation for the units layer (on page 304). The get-procedure handler of the units layer (below) looks up the layer-specific procedure by name if the layered procedure's name is an arithmetic operator, and then calls the layer-specific procedure with the units from each argument. (There is a special exception for expt, whose second argument is not decorated with units—it is a number.) For other procedures, the units handling is undefined, so get-procedure returns #f to indicate that.
```
(define (get-procedure name arity)
(if (operator? name)
(let ((procedure (unit-procedure name)))
(case name
((expt)
(lambda (base-value base power)
(procedure (get-value base)
(base-layer-value power))))
(else
(lambda (base-value . args)
(apply procedure (map get-value args))))))
#f))
```
Notice that because get-procedure is an internal procedure of unit-layer, it has access to the units layer's get-value inherited from make-annotation-layer (on page 304). We will see unit-procedure when we talk about the units implementation in section 6.3.1.
Let's look at an example. Consider the simple procedure square that squares its argument.
```
(define (square x) (* x x))
```
We make a layered version of our square procedure, giving the numerical version to the base layer.
```
(define layered-square
(make-layered-procedure 'square 1 square))
```
This layered squaring procedure behaves the same as the base version:
```
(layered-square 4)
16
(layered-square 'm)
(* m m)
```
However, if we provide an argument with a units layer, both the base layer and units layer will be processed separately and combined in the output:
```
(pp (layered-square
(layered-datum 'm
unit-layer (unit 'kilogram 1))))
#[layered-datum (* m m)]
(base-layer (* m m))
(unit-layer (unit kilogram 2))
```
### **6.3 Layered arithmetic**
Now that we know how to make layered procedures, we can add layers to an arithmetic. All that is required is to build an arithmetic with a layered procedure for each operation supplied in the base arithmetic. We start with a pleasant arithmetic
```
(define (generic-symbolic)
(let ((g (make-generic-arithmetic
make-simple-dispatch-store)))
(add-to-generic-arithmetic! g numeric-arithmetic)
(extend-generic-arithmetic! g function-extender)
(extend-generic-arithmetic! g symbolic-extender)
g))
```
and build an extender to handle the layers on that substrate:
```
(define generic-with-layers
(let ((g (generic-symbolic)))
(extend-generic-arithmetic! g layered-extender)
g))
```
The layered extender has to do a bit of work. It makes a layered extension arithmetic that operates on layered data. The domain predicate of the layered extension arithmetic is layered-datum?. The base predicate for the layered operations is just the domain predicate of the underlying arithmetic, with the extra provision that it must reject layered data items. [2] The constants are the base constants, and for each arithmetic operator the operation is a layered procedure applicable if any argument is layered, with the base procedure inherited from the underlying arithmetic.
```
(define (layered-extender base-arith)
(let ((base-pred
(conjoin (arithmetic-domain-predicate base-arith)
(complement layered-datum?))))
(make-arithmetic (list 'layered
(arithmetic-name base-arith))
layered-datum?
(list base-arith)
(lambda (name base-value)
base-value)
(lambda (operator base-operation)
(make-operation operator
(any-arg (operator-arity operator)
layered-datum?
base-pred)
(make-layered-procedure operator
(operator-arity operator)
(operation-procedure base-operation)))))))
```
Nearly all of this is boilerplate, including leaving the constant objects alone and requiring that at least one argument to an operation be layered. The only interesting part is the final three lines, in which the base arithmetic's operation procedure is wrapped in a layered procedure. The operator is used as the name of the layered procedure, so that each layer can provide special handling should that operation need it.
#### **6.3.1 Unit arithmetic**
We need an arithmetic of units for the units annotation layer on an arithmetic. A unit specification has named base units, and an exponent for each base unit. [3] In the units arithmetic, the product of unit specifications is a new unit specification where the exponent of each base unit is the sum of the exponents of the corresponding base units in the arguments.
```
(unit:* (unit 'kilogram 1 'meter 1 'second -1)
(unit 'second -1))
(unit kilogram 1 meter 1 second -2)
```
Here we assume that the base units are just named by symbols, such as kilogram.
#### **Representation of unit specifications**
To make it easy to create a unit specification, we represent it externally as a property list (with alternating keys and values) of base unit names and exponents.
But internally, it is convenient to represent a unit specification as a tagged alist; so we must convert a raw property list to the alist representation, using plist->alist. We keep the alists sorted by the base unit name. In this conversion we do some error checking. The argument list to unit must be in the form of a property list. The exponent associated with each base unit name must be an exact rational number (usually an integer). It is an error if a named base unit is duplicated. The sort by base unit names will signal an error if the base unit name is not a symbol.
```
(define (unit . plist)
(guarantee plist? plist 'unit)
(let ((alist
(sort (plist->alist plist)
(lambda (p1 p2)
(symbol<? (car p1) (car p2))))))
(if (sorted-alist-repeated-key? alist)
(error "Base unit repeated" plist))
(for-each (lambda (p)
(guarantee exact-rational? (cdr p)))
alist)
(alist->unit alist)))
(define (sorted-alist-repeated-key? alist)
(and (pair? alist)
(pair? (cdr alist))
(or (eq? (caar alist) (caadr alist))
(sorted-alist-repeated-key? (cdr alist)))))
```
The procedure alist->unit just attaches a unique tag to an alist; and unit->alist extracts the alist from a unit specification:
```
(define (alist->unit alist)
(cons %unit-tag alist))
(define (unit->alist unit)
(guarantee unit? unit 'unit->alist)
(cdr unit))
```
Here, the value of %unit-tag is just a unique symbol that we use to head a unit specification alist. To make the printed output of unit specifications look like the property lists that we give to unit to make a unit specification, we arrange that the Scheme printer prints unit specifications in property list form. This magic arrangement (not shown here) is triggered by the %unit-tag symbol at the head of the list.
The predicate unit? is true if its argument is a legitimate unit specification:
```
(define (unit? object)
(and (pair? object)
(eq? (car object) %unit-tag)
(list? (cdr object))
(every (lambda (elt)
(and (pair? elt)
(symbol? (car elt))
(exact-rational? (cdr elt))))
(cdr object))))
```
### **Unit arithmetic operations**
We construct the unit arithmetic as a mapping between the operator name and the operation that implements the required behavior. Pure numbers, like *π*, are unitless. When a quantity with units is multiplied by a unitless number, the result is the units of the quantity with units. So the unit arithmetic needs a multiplicative identity for unitless numbers—this is unit:none. The procedure simple-operation combines the operator, the test for applicability, and the procedure that implements the operation:
```
(define (unit-arithmetic)
(make-arithmetic 'unit unit? '()
(lambda (name)
(if (eq? name 'multiplicative-identity)
unit:none
(default-object)))
(lambda (operator)
(simple-operation operator
unit?
(unit-procedure operator)))))
```
We call unit-procedure to get the appropriate procedure for each operator:
```
(define (unit-procedure operator)
(case operator
((*) unit:*)
((/) unit:/)
((remainder) unit:remainder)
((expt) unit:expt)
((invert) unit:invert)
((square) unit:square)
((sqrt) unit:sqrt)
((atan) unit:atan)
((abs ceiling floor negate round truncate)
unit:simple-unary-operation)
((+ - max min)
unit:simple-binary-operation)
((acos asin cos exp log sin tan)
unit:unitless-operation)
((angle imag-part magnitude make-polar make-rectangular
real-part)
;; first approximation:
unit:unitless-operation)
(else
(if (eq? 'boolean (operator-codomain operator))
(if (n:= 1 (operator-arity operator))
unit:unary-comparison
unit:binary-comparison)
unit:unitless-operation))))
```
For each case above we must provide the appropriate operation. For example, to multiply two unit quantities we must add corresponding exponents and elide any base unit that has zero exponent:
```
(define (unit:* u1 u2)
(alist->unit
(let loop ((u1 (unit->alist u1)) (u2 (unit->alist u2)))
(if (and (pair? u1) (pair? u2))
(let ((factor1 (car u1)) (factor2 (car u2)))
(if (eq? (car factor1) (car factor2)) ; same unit
(let ((n (n:+ (cdr factor1) (cdr factor2))))
(if (n:= 0 n)
(loop (cdr u1) (cdr u2))
(cons (cons (car factor1) n)
(loop (cdr u1) (cdr u2)))))
(if (symbol<? (car factor1) (car factor2))
(cons factor1 (loop (cdr u1) u2))
(cons factor2 (loop u1 (cdr u2))))))
(if (pair? u1) u1 u2)))))
```
Some operators, such as remainder, expt, invert, square, sqrt, and atan, require special treatment. The rest of the operators fit into a few simple classes. Simple unary operations, like negate, just propagate the units of their argument to their result:
```
(define (unit:simple-unary-operation u)
u)
```
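Of the operators singled out above for special treatment, expt is easy to sketch: raising a quantity to a power scales every base-unit exponent by that power. This is a hedged sketch, not the book's definition (which is not shown in this section); the raw-arithmetic name n:* is an assumption, by analogy with the n:+ and n:= used in unit:* above.

```scheme
;; Sketch of unit:expt (assumed, not shown in the text):
;; scale each base-unit exponent by the power.
(define (unit:expt u power)
  (alist->unit
   (map (lambda (p)
          (cons (car p) (n:* (cdr p) power)))  ; n:* is assumed
        (unit->alist u))))
```

For example, squaring a velocity in meter·second⁻¹ would yield meter²·second⁻².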
But some, like the implementation of addition, check that they are not "combining apples and oranges":
```
(define (unit:simple-binary-operation u1 u2)
(if (not (unit=? u1 u2))
(error "incompatible units:" u1 u2))
u1)
```
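The predicate unit=? used here is not defined in this section. Under the representation above, a minimal sketch is possible: since unit alists are kept sorted by base-unit name, structural equality suffices. This is an assumption about the actual implementation.

```scheme
;; Sketch of unit=? (assumed): sorted alists make structural
;; equality a valid equivalence test.
(define (unit=? u1 u2)
  (equal? (unit->alist u1) (unit->alist u2)))
```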
# **Exercise 6.1: Derived units**
Although the unit computation given above is correct and reasonably complete, it is not very nice to use. For example, the unit specification for kinetic energy (as shown on page 316) is:
```
(unit kilogram 1 meter 2 second -2)
```
This is correct in terms of the International System of Units (SI) base units {kilogram, meter, second}, but it would be much nicer if expressed in terms of joules, the SI derived unit of energy:
```
(unit joule 1)
```
The full system of SI base units is {kilogram, meter, second, ampere, kelvin, mole, candela}, and there is an approved set of derived units. For example:
- newton = kilogram·meter·second⁻²
- joule = newton·meter
- coulomb = ampere·second
- watt = joule·second⁻¹
- volt = watt·ampere⁻¹
- ohm = volt·ampere⁻¹
- siemens = ohm⁻¹
- farad = coulomb·volt⁻¹
- weber = volt·second
- henry = weber·ampere⁻¹
- hertz = second⁻¹
- tesla = weber·meter⁻²
- pascal = newton·meter⁻²
- **a.** Make a procedure that takes a unit description in terms of SI base units and, if possible, makes a simpler description using derived units.
- **b.** The expression of a unit description in terms of the derived units is not unique—there may be many such equivalent descriptions. This is similar to a problem of algebraic simplification, but the criterion of "simpler" is not obvious. Make a nice version that you like and explain why you like it.
- **c.** It is nice to be able to use the standard abbreviations and multipliers for the units. For example, 1 mA is the nice way to write 0.001 A or 1/1000 ampere. Design and implement a simple extensible system that allows the use of these notational conveniences for both input and output. But remember that "syntactic sugar causes cancer of the semicolon."
### **6.4 Annotating values with dependencies**
One kind of annotation that a programmer may want to deploy in some parts of a program is the tracking of dependencies. Every piece of data (or procedure) came from somewhere. Either it entered the computation as a premise that can be labeled with its external provenance, or it was created by combining other data. We can provide primitive operations of the system with annotation layers that, when processing data with justifications, can annotate the results with appropriate justifications.
Justifications can be at differing levels of detail. The simplest kind of justification is just a set of those premises that contributed to the new data. A procedure such as addition can form a sum with a justification that is just the union of the premises of the justifications of the addends that were supplied. Multiplication is similar, but a zero multiplicand is sufficient to force the product to be zero, so the justifications of the other factors do not need to be included in the justification of the zero product.
Such simple justifications can be computed and carried without much more than a constant overhead, but they can be invaluable in debugging complex processes and in the attribution of credit or blame for outcomes of computations. Just this much is sufficient to support dependency-directed backtracking. (See section 7.5.)
Externally supplied data can be annotated with a *premise* that identifies its origin. More generally, any data value can be annotated with a set of premises, which is called its *support set*. The support set annotating a datum is usually referred to as its *support*. When a support-aware procedure is applied to multiple arguments, it must combine the support sets of the arguments to represent the support of the result.
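The core bookkeeping can be sketched in a few lines of Python (an illustration only; the book's Scheme implementation appears in section 6.4.1, and the class and field names here are invented):

```python
# Illustrative support tracking: each value carries a frozenset of
# premise labels, and addition unions the supports of its arguments.
from dataclasses import dataclass

@dataclass(frozen=True)
class Supported:
    value: float
    support: frozenset

    def __add__(self, other):
        return Supported(self.value + other.value,
                         self.support | other.support)

a = Supported(3.0, frozenset({'cph'}))
b = Supported(4.0, frozenset({'gjs'}))
c = a + b   # value 7.0, supported by both premises
```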
Managing support sets is a straightforward application of our layered data mechanism. We add a support layer to our generic arithmetic to handle support sets. It coexists with other layers, such as the units layer. So this is an additive feature.
On page 309 we built an arithmetic that supports layered data and procedures:
```
(define generic-with-layers
(let ((g (generic-symbolic)))
(extend-generic-arithmetic! g layered-extender)
g))
(install-arithmetic! generic-with-layers)
```
We don't need to specify what layers are to be supported by layered-extender, since it automatically uses the layers in each layered procedure's arguments. So if, say, + is called with arguments that have units, then the result will also have units. But if none of the arguments have units, then neither does the result, and the unit addition procedure is not invoked. Similarly, if the arguments have support, then the result will have support. But if the arguments do not have support, the result will not have support, and the support addition procedure is not invoked.
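This dispatch rule, in which a layer's handler runs only when some argument actually carries that layer, can be sketched as follows (a Python caricature with invented names, not the layered-extender machinery):

```python
# Caricature of layered dispatch: a layered datum is a dict mapping
# layer names to values; the 'base' layer is always present.
def layered_add(*args):
    # Per-layer combiner, paired with the default value used for
    # arguments that lack the layer.
    handlers = {
        'base':    (sum, 0),
        'support': (lambda vs: set().union(*vs), set()),
    }
    result = {}
    for layer, (combine, default) in handlers.items():
        # The base layer always runs; other layers run only when some
        # argument carries them.
        if layer == 'base' or any(layer in a for a in args):
            result[layer] = combine([a.get(layer, default) for a in args])
    return result

plain  = layered_add({'base': 1}, {'base': 2})    # no support layer invoked
tagged = layered_add({'base': 1, 'support': {'cph'}},
                     {'base': 2})                 # support layer appears
```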
For example, we can define the kinetic energy of a particle with mass m and velocity v:
```
(define (KE m v)
(* 1/2 m (square v)))
```
Now we can see the result of evaluating the kinetic energy on some arguments:
```
(pp (KE (layered-datum 'm
unit-layer (unit 'kilogram 1)
support-layer (support-set 'cph))
(layered-datum 'v
unit-layer (unit 'meter 1 'second -1)
support-layer (support-set 'gjs))))
#[layered-datum (* (* 1/2 m) (square v))]
(base-layer (* (* 1/2 m) (square v)))
(unit-layer (unit kilogram 1 meter 2 second -2))
(support-layer (support-set gjs cph))
```
We supply each argument with annotations for the units layer and the support layer. For the support layer we give a set of premises (the support set). Here, each argument is supported by a single
premise, cph and gjs respectively. The value is a layered object with three layers: the base generic arithmetic layer value is the appropriate algebraic expression; the units are correct; and the support set is the set of named premises that contributed to the value.
Here we accepted the definition of KE without supplying explicit support for that procedure. More generally, we might want to add such support. For example, we may want to say that KE is supported by a premise KineticEnergy-classical. Then if we find a result of some complex computation that seems wrong, we can find out which procedures contributed to the wrong answer, as well as the numerical or symbolic input values that were used. We will attack this problem in exercise 6.2.
Not all premises that appear in the arguments to a computation need to appear in a result. For example, if a factor contributing to a product is zero, that is sufficient reason for the product to be zero, independent of any other finite factors. This is illustrated by supplying a zero mass:
```
(pp (KE (layered-datum 0
          unit-layer (unit 'kilogram 1)
          support-layer (support-set 'jems))
        (layered-datum 'v
          unit-layer (unit 'meter 1 'second -1)
          support-layer (support-set 'gjs))))
#[layered-datum 0]
(base-layer 0)
(unit-layer (unit kilogram 1 meter 2 second -2))
(support-layer (support-set jems))
```
Here the support for the numeric value of the result being zero is just the support supplied for the zero value for the mass.
#### **6.4.1 The support layer**
Now we will see how the support layer is implemented. It is somewhat different from the units layer, because units can be
combined without any reference to the base layer, whereas the support layer needs to look at the base layer for some operations.
The support layer is somewhat simpler than the units layer, because all but three of the arithmetic operators use the default: the support set of the result is the union of the support sets of the arguments.
```
(define support-layer
(make-annotation-layer 'support
(lambda (get-name has-value? get-value)
(define (get-default-value)
(support-set))
(define (get-procedure name arity)
(case name
((*) support:*)
((/) support:/)
((atan2) support:atan2)
(else support:default-procedure)))
(bundle layer?
get-name has-value? get-value
get-default-value get-procedure))))
(define support-layer-value
(layer-accessor support-layer))
(define (support:default-procedure base-value . args)
(apply support-set-union (map support-layer-value args)))
```
Multiplication is the first interesting case. The support layer needs to look at the values of the base arithmetic arguments to determine the computation of support. If either argument is zero, then the support for the result is only the support for the zero argument.
```
(define (support:* base-value arg1 arg2)
  (let ((v1 (base-layer-value arg1))
        (v2 (base-layer-value arg2))
        (s1 (support-layer-value arg1))
        (s2 (support-layer-value arg2)))
    (if (exact-zero? v1)
        (if (exact-zero? v2)
            (if (< (length (support-set-elements s1))
                   (length (support-set-elements s2)))
                s1
                s2)        ;arbitrary
            s1)
        (if (exact-zero? v2)
            s2
            (support-set-union s1 s2)))))
```
Division (and arctangent, not shown) also has to examine the base layer to deal with zero arguments. If the dividend is zero, that is sufficient to support the result that the quotient is zero. The divisor won't ever be zero because the base-layer computation will have signaled an error and this code won't be run.
```
(define (support:/ base-value arg1 arg2)
(let ((v1 (base-layer-value arg1))
(s1 (support-layer-value arg1))
(s2 (support-layer-value arg2)))
(if (exact-zero? v1)
s1
(support-set-union s1 s2))))
```
These optimizations for \* and / make sense only when we can prove that an argument is really zero, not an unsimplified symbolic expression. (But if an expression simplifies to exact zero we can use that fact!)
```
(define (exact-zero? x)
(and (n:number? x) (exact? x) (n:zero? x)))
```
The support-set abstraction is implemented as a list starting with the symbol support-set
```
(define (%make-support-set elements)
(cons 'support-set elements))
(define (support-set? object)
(and (pair? object)
(eq? 'support-set (car object))
(list? (cdr object))))
(define (support-set-elements support-set)
(cdr support-set))
```
along with a few extra utilities to complete the abstraction.
```
(define (make-support-set elements)
(if (null? elements)
%empty-support-set
(%make-support-set (delete-duplicates elements))))
(define (support-set . elements)
(if (null? elements)
%empty-support-set
(%make-support-set (delete-duplicates elements))))
(define %empty-support-set
(%make-support-set '()))
(define (support-set-empty? s)
(null? (support-set-elements s)))
```
We need to be able to compute the union of support sets and adjoin new elements to them. Since we chose to keep our elements in a list, we can use the lset library from Scheme. [4](#page-30-3)
```
(define (support-set-union . sets)
(make-support-set
(apply lset-union eqv?
(map support-set-elements sets))))
(define (support-set-adjoin set . elts)
(make-support-set
(apply lset-adjoin eqv? (support-set-elements set) elts)))
```
### **Exercise 6.2: Procedural responsibility**
The support layer based on arithmetic is extremely low level. Every primitive arithmetic operation is support-aware, and there is no way to bypass that work for common conditions. There needs to be a means of abstraction. For example, suppose we have a procedure that computes the numerical definite integral of a function. The units of the numerical value of the integral are the product of the units of the numerical value of the integrand and the units of the numerical value of the limits of integration. (The units of the upper and lower limit must be the same!) However, it is not a good idea to carry the units computation through all of the detailed arithmetic
going on in the integration process. It should be possible to annotate the integrator so that the result has the correct units without requiring every internal addition and multiplication to be a layered procedure operating on layered data.
- **a.** Make it possible to allow compound procedures that may be built out of the primitive arithmetic procedures (or possibly not) to modify the support of their results by adding a premise (such as "made by George").
- **b.** Allow compound procedures to be executed in a way that hides their bodies from the support layer. Thus, for example, a trusted library procedure may annotate its result with appropriate support, but the operations in its body will not incur the overhead of computing the support of intermediate results.
- **c.** The support layer is organized around the operators of an arithmetic system. But sometimes it is useful to distinguish the specific occurrences of an operator. For example, when dealing with numerical precision it is not very helpful to say that a loss of significance is due to subtraction of almost equal quantities. It would be more helpful to show the particular instance of subtraction that is the culprit. Is there some way to add the ability to identify instances of an operator to the support layer?
### **Exercise 6.3: Paranoid programming**
Sometimes we are not confident that a library procedure does what we expect. In that case it is prudent to "wrap" the library procedure with a test that checks its result. For example, we may be using a program solve that takes as inputs a set of equations and a set of unknowns, that may occur in the equations, producing a set of substitutions for the unknowns that satisfy the equations. We might want to wrap the solve program with a wrapper that checks that the result of substituting the outputs into the input equations indeed makes them tautologies. But we don't want such a paranoia
wrapper to appear as part of our *parti*. How can this sort of thing be implemented as a layer? Explain your design and implement it.
### **Exercise 6.4: IDE for layered programs**
This exercise is a major design project: the invention of and development of an IDE (Integrated Development Environment) for layered systems.
The idea of layered programs, using layered data and layered procedures, is a very nice idea. The goal is to be able to annotate programs with useful and executable metadata—such as type declarations, assertions, units, and support—without cluttering the text of the base program. However, the text of the program must be linked with the text of the annotations, so that as any part of the program is edited, the related layers are also edited. For example, suppose it is necessary to edit the base procedure of some layered procedure. The layers may be information like type declarations or how it handles units and support sets. It would be nice for the editor to show us these layers and how they are connected to the text of the base program, when necessary. Perhaps edits to the text of the base program entail edits to the annotation layers. Sometimes this can be done automatically, but often the programmer must edit the layers.
- **a.** Imagine what you would like to see in an IDE to support the development of layered systems. What would you like to see on a screen? How would you keep the parts that are edited synchronized?
- **b.** Emacs is a powerful infrastructure for building such an IDE. It supports multiple windows and per-window editing modes. It has syntactic support for many computer languages, including Scheme. There are Emacs subsystems, like org-mode, that have the flavor of a layered structure for documents. Can this be extended to help with layered programming? Sketch out a way to build your IDE using Emacs.
- **c.** Build a small but extensible prototype on the Emacs base, and try it out. What problems do you encounter? Did Emacs really provide a good place to start? If not, why not? Report on your experiment.
- **d.** If your prototype was promising, develop a solid system and make it into a loadable Emacs library, so we can all use your great system.
#### **6.4.2 Carrying justifications**
More complex justifications may also record the particular operations that were used to make the data. This kind of annotation can be used to provide explanations (proofs), but it is intrinsically expensive in space—potentially linear in the number of operations performed. However, sometimes it is appropriate to attach a detailed audit history describing the derivation of a data item, to allow some later process to use the derivation for some purpose or to evaluate the validity of the derivation for debugging. [5](#page-30-4)
<span id="page-26-0"></span>For many purposes, such as legal arguments, it is necessary to know the provenance of data: where it was collected, how it was collected, who collected it, how the collection was authorized, etc. The detailed derivation of a piece of evidence, giving the provenance of each contribution, may be essential to determining if it is admissible in a trial.
<span id="page-26-1"></span>The symbolic arithmetic that we built in section 3.1 is one way this can be done. In fact, if symbolic arithmetic is used as a layer on numeric arithmetic, then every numerical value is annotated with its derivation. The symbolic arithmetic annotation could be very expensive, because the symbolic expression for an application of a numerical operator includes the symbolic expressions of its inputs. However, because we need only include a pointer to each input, the space and time cost of annotating each operation is often acceptable. [6](#page-30-5) So one may overlay this kind of justification when it is necessary to provide an explanation, or even temporarily, to track a difficult-to-catch bug.
### **Exercise 6.5: Justifications**
Sketch out the issues involved in carrying justifications for data. Notice that the reason for a value depends on the values that it was derived from and the way those values were combined. What do we do if the reason for a value is some numerically weighted combination of many factors, as in a deep neural network? This is a research question that we need to address to make the systems that affect us accountable.
### **6.5 The promise of layering**
We have only scratched the surface of what can be done with an easy and convenient mechanism for layering of data and programs. It is an open area of research. The development of systems to support such layering can have huge consequence for the future.
Sensitivity analysis is an important feature that can be built using annotated data and layered procedures. For example, in mechanics, if we have a system that evolves the solution of a system of differential equations from some initial conditions, it is often valuable to understand the way a tube of trajectories that surround a reference trajectory deforms. This is usually accomplished by integrating a variational system along with the reference trajectory. Similarly, it may be possible to carry a probability distribution of values around a nominal value along with the nominal value computed in some analyses. This may be accomplished by annotating the values with distributions and providing the operations with overlaying procedures to combine the distributions, guided by the nominals, perhaps implementing Bayesian analysis. Of course, to do this well is not easy.
An even more exciting but related idea is that of perturbational programming. By analogy with the differential equations example, can we program symbolic systems to carry a "tube" of variations
around a reference trajectory, thus allowing us to consider small variations of a query? Consider, for example, the problem of doing a search. Given a set of keywords, the system does some magic that comes up with a list of documents that match the keywords. Suppose we incrementally change a single keyword. How sensitive is the search to that keyword? More important, is it possible to reuse some of the work that was done getting the previous result in the incrementally different search? We don't know the answers to these questions, but if it is possible, we want to be able to capture the methods by a kind of perturbational program, built as an overlay on the base program.
#### **Dependencies mitigate inconsistency**
Dependency annotations on data give us a powerful tool for organizing human-like computations. For example, all humans harbor mutually inconsistent beliefs: an intelligent person may be committed to the scientific method yet have a strong attachment to some superstitious or ritual practices; a person may have a strong belief in the sanctity of all human life, yet also believe that capital punishment is sometimes justified. If we were really logicians this kind of inconsistency would be fatal: if we really were to simultaneously believe both propositions P and NOT P, then we would have to believe all propositions! But somehow we manage to keep inconsistent beliefs from inhibiting all useful thought. Our personal belief systems appear to be locally consistent, in that there are no contradictions apparent. If we observe inconsistencies we do not crash; we may feel conflicted or we may chuckle.
We can attach to each proposition a set of supporting assumptions, allowing deductions to be conditional on the assumption set. Then, if a contradiction occurs, a process can determine the particular "nogood set" of inconsistent assumptions. The system can then "chuckle," realizing that no deductions based on any superset of those assumptions can be believed. This chuckling process, dependency-directed backtracking, can be used to optimize a complex search process, allowing a search to make the
best use of its mistakes. But enabling a process to simultaneously hold beliefs based on mutually inconsistent sets of assumptions without logical disaster is revolutionary.
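The nogood-set bookkeeping can be sketched directly (an invented representation, not an implementation from chapter 7):

```python
# Sketch of the "chuckle": once a set of assumptions is known to be
# inconsistent, any belief resting on a superset of it is rejected
# outright, without redoing the deduction that exposed the contradiction.
nogoods = []

def note_contradiction(assumptions):
    """Record a discovered inconsistent assumption set."""
    nogoods.append(frozenset(assumptions))

def believable(assumptions):
    """Worth pursuing only if no known nogood is contained in it."""
    s = frozenset(assumptions)
    return not any(bad <= s for bad in nogoods)

note_contradiction({'A', 'B'})
# {'A', 'C'} contains no complete nogood, so it is still believable;
# {'A', 'B', 'C'} contains the nogood {'A', 'B'} and is rejected.
```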
#### **Restrictions on the use of data**
Data is often encumbered by restrictions on the ways it may be used. These encumbrances may be determined by statute, by contract, by custom, or by common decency. Some of these restrictions are intended to control the diffusion of the data, while others are intended to delimit the consequences of actions predicated on that data.
The allowable uses of data may be further restricted by the sender: "I am telling you this information in confidence. You may not use it to compete with me, and you may not give it to any of my competitors." Data may also be restricted by the receiver: "I don't want to know anything about this that I may not tell my spouse."
Although the details may be quite involved, as data is passed from one individual or organization to another, the restrictions on the uses to which it may be put are changed in ways that can often be formulated as algebraic expressions. These expressions describe how the restrictions on the use of a particular data item may be computed from the history of its transmission: the encumbrances that are added or deleted at each step. When parts of one data set are combined with parts of another data set, the restrictions on the ways that the extracts may be used and the restrictions on the ways that they may be combined must determine the restrictions on the combination. A formalization of this process is a *data-purpose algebra* [53] description.
Data-purpose algebra layers can be helpful in building systems that track the distribution and use of sensitive data to enable auditing and to inhibit the misuse of that data. But this kind of application is much larger than just a simple matter of layering. To make it effective requires ways of ensuring the security of the process, to prevent leakage through uncontrolled channels or
compromise of the tracking layers. There is a great deal of research to be done here.
- <span id="page-30-0"></span>[1](#page-7-0) Note that a layer's implementation for a layered procedure may itself be a generic procedure. Likewise, a handler for a generic procedure may be a layered procedure.
- <span id="page-30-1"></span>[2](#page-11-0) The procedures conjoin and complement are combinators for predicates: conjoin makes a new predicate that is the boolean and of its arguments, and complement makes a new predicate that is the negation of its argument.
- <span id="page-30-2"></span>[3](#page-12-0) Watch out! The "base units" are not to be confused with the base-layer in our layered-data system. A system of units is built on a set of base units, such as kilograms, meters, and seconds. There are derived units, such as the newton, which is a combination of the base units: 1 N = 1 kg·m·s^-2.
- <span id="page-30-3"></span>[4](#page-23-0) If the support sets get large we can try to represent them much more efficiently, but here we are dealing with only small sets.
- <span id="page-30-4"></span>[5](#page-26-0) In Patrick Suppes's beautiful *Introduction to Logic* [118] the proofs are written in four columns. The columns are an identifier for the line, the statement for that line, the rule that was used for deriving that line from previous lines, and the set of premises that support the line. This proof structure is actually the inspiration for the way we carry justifications and support sets.
- <span id="page-30-5"></span>[6](#page-26-1) This is not really true. The problem is that the composition of numerical operations may incur no significant memory access cost, but the construction of a symbolic expression, however small, requires access to memory. And memory access time is huge compared with the time to do arithmetic in CPU registers. Sigh...

# **Epilogue**
Serious engineering is only a few thousand years old. Our attempts at deliberately producing very complex robust systems are immature at best. We have yet to glean the lessons that biological evolution has learned over the last few billion years.
We have been more concerned with efficiency and correctness than with the kind of robustness of biological systems that comes from optimizing evolvability, flexibility, and resistance to attack. This is sensible for developing mission-critical systems that have barely enough resources to perform their function. However, the rapid advance of microelectronics has alleviated the resource problem for most applications. Our increasing dependence on computational and communications infrastructure, and the development of ever more sophisticated attacks on that infrastructure, make it imperative that we turn our attention to robustness.
We are not advocating biomimetics; but observations of biological systems give us hints about how to incorporate powerful principles of robustness into our engineering practice. Many of these principles are in direct conflict with the established practices of optimization of efficiency and of the ability to prove correctness. In this book we deliberately violate these established practices to explore the possibilities of optimizing for flexibility. A motivation of our approach is the observation that most systems that have survived the test of time are built as an assembly of domain-specific languages, each of which is appropriate to make some parts of the system easy to construct.
As part of the effort to build artificially intelligent symbolic systems, the AI community has incidentally developed technological tools that can be used to support principles of flexible and robust design. For example, rather than thinking of backtracking as a method of organizing search, we can employ it to increase the general applicability of components in a complex system that organizes itself to meet externally imposed constraints. We believe that by pursuing this new synthesis we will obtain better hardware and software systems.
We started out in chapter 2 with some rather unobjectionable techniques that are universally applicable. We introduced the strategy of building systems of combinators—libraries of parametric parts that have standardized interfaces. Such parts can be combined in many ways to meet a great variety of needs. We demonstrated how this idea can be used to simplify the construction of a language of regular-expression matchers. We introduced systems of wrappers that allow us to adapt parts to applications with different standards than the parts were built to, and we used this to make a language of unit-conversion wrappers. We progressed to build a rule interpreter for a language to express the rules of board games like checkers.
In chapter 3 we embarked on an exciting and dangerous adventure: we investigated what can be done if we are allowed to modulate the meanings of the primitive procedures of a language. We extended arithmetic to handle symbolic expressions and functions, as well as numbers. We created extensible generic procedures and used the extension mechanism to integrate forward-mode automatic differentiation into our arithmetic. This kind of extension is dangerous, but if we are careful, we can make old programs have new abilities without losing their old abilities. To make this strategy efficient and even more powerful, we proceeded to explore user-defined types, with declarable subtype relationships, and we used that to make a simple but easily extensible adventure game.
Pattern matching and pattern-directed invocation, introduced in chapter 4, are crucial techniques for erecting domain-specific languages. We started with term-rewriting rules for algebraic
simplification. We then showed an elegant strategy for compiling patterns into a composition of elementary pattern matchers in a system of pattern-matching combinators. We then expanded our pattern-matching tools to allow pattern variables on both sides of a match, implementing unification, which we then used to make an elementary type-inference system. Finally, we built matchers that match arbitrary graphs, not just expression trees, and used graphs and graph matching to express the rules for moves in chess in an elegant manner.
Because all sane computer languages are universal, programmers do not have an excuse that a solution cannot be expressed in some language. If seriously pressed, good programmers can make an interpreter or compiler for any language they please in any language they are stuck with. This is not very hard, but it is probably the most powerful move a programmer can make. In chapter 5 we showed how to make increasingly powerful languages by interpretation and compilation. We started with a simple applicative-order interpreter for a Scheme-like language. For extensibility, the interpreter was built on generic procedures. We then extended it to allow procedure definitions to declare lazy formal parameters. Next we compiled the language to a combination of execution procedures—a system of combinators. We then added a model of nondeterministic evaluation, with the amb operator. Finally, we showed how by exposing the underlying continuations we could arrange to get the power of amb in the underlying Scheme system.

In chapter 6 we began to explore multilayer computations, based on a novel mechanism closely related to generic procedures. For example, we modified our arithmetic so that a program that computes numerical results from numerical arguments could be extended, without modification, to compute the same results, augmented with units. The units of the result are automatically derived from the units of the inputs, and combinations are checked for consistent units: adding 5 kilograms to 2 meters will signal an error. We used the same layering mechanism to augment programs to carry dependencies, so that a result automatically has reference to the sources of the ingredients that went into making that result.
The propagator model of chapter 7 is really a way of thinking about the plumbing of large systems. Although, in the examples we show in this chapter, the propagators are all simple arithmetic functions or relations, the idea is far more general. A propagator could be hardware or software. It could be a simple function or a huge computer doing an enormous crunch. If it is software, it could be written in any language. Indeed, a system of propagators does not have to be homogeneous. Different propagators may be constructed differently. Cells may be specialized to hold different kinds of information and they may merge information in their own favorite way. The communication between propagators and cells may be signals on a chip or on a global network. All that matters is the protocol for a propagator to query a cell and to add information to a cell.
In this book we introduced many programming ideas. It is now up to you to evaluate them and perhaps apply them.
# **A**
# **Appendix: Supporting Software**
All of the code shown in this book and the infrastructure code that supports it can be downloaded as an archive file from <http://groups.csail.mit.edu/mac/users/gjs/sdf.tgz>
The archive is organized as a directory tree, where each subdirectory approximately corresponds with a section of this book. The software runs in MIT/GNU Scheme version 10.1.10 or later, which can be obtained from <http://www.gnu.org/software/mit-scheme>
The software uses a number of features specific to the MIT/GNU implementation, so it won't work with other distributions. It should be possible to port it to another distribution, but we have not tried this and it is likely to require some work. Because this is free software (licensed under the GPL) you may modify it and distribute it to others.
The software archive is a tar file called sdf.tgz, which can be unpacked using the command
```
tar xf .../sdf.tgz
```
This tar command produces a directory sdf in whatever directory the tar command is executed in.
The primary interface to the software archive is a management program, which is distributed with the archive. To use this program, start MIT/GNU Scheme and load it like this:
```
(load ".../sdf/manager/load")
```
where .../ refers to the directory in which the archive was unpacked. The manager creates a single definition in the global
environment, called manage. Once loaded, it's not necessary to reload the manager unless a new instance of Scheme is started.
Suppose you are working on section 4.2 "Term rewriting," and you'd like to play with the software or work on an exercise. The loader for code in that section is stored in the subdirectory .../sdf/term-rewriting, along with files that are specific to that section. But you do not need to know how the loader works. (Of course, you may read the manager code. It is pretty interesting.)
The manage command
```
(manage 'new-environment 'term-rewriting)
```
will create a new top-level environment, load all of the necessary files for that section, and move the read-eval-print loop into that environment. After you are done with that section, you can use the manage command to load the software for another section by replacing term-rewriting with the name corresponding to the new section.
Usually, the name of a subdirectory can be used as an argument to (manage 'new-environment ...). When used in this context, the subdirectory name is called a *flavor*. However, some of the subdirectories have multiple flavors, and in those cases the available flavor names differ from the subdirectory names.
The correspondence between sections of the book and subdirectories/flavors in the archive can be found in the file
```
.../sdf/manager/sections.scm
```
In addition, there are two special subdirectories: common holds shared files that are used extensively; and manager holds the implementation of manage.
The software management program manage has many other useful abilities. Among them are managing working environments by name, finding the files that define a name and those that refer to it, and running unit tests. For more information refer to the documentation that is included in the manager subdirectory.
Using the software may require additional steps that are not spelled out in the book text, such as initialization. Every subdirectory contains tests: any file named test-*FOO*.scm is a "standard" test, using a testing framework similar to those of other programming languages. Additionally, the load-spec files in each subdirectory may contain references to tests, marked with the inline-test? symbol, that use a different testing framework that is similar to read-eval-print loop transcripts. Look there for examples of how to run the programs.
# **Appendix: Scheme**
Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. Scheme demonstrates that a very small number of rules for forming expressions, with no restrictions on how they are composed, suffice to form a practical and efficient programming language that is flexible enough to support most of the major programming paradigms in use today.
*IEEE Standard for the Scheme Programming Language* [61], p. 3
Here we give an elementary introduction to the Scheme dialect of Lisp. For a longer introduction see the textbook *Structure and Interpretation of Computer Programs (SICP)* [1].
For a more precise explanation of the language see the IEEE standard [61] and the Seventh *Revised Report on the Algorithmic Language Scheme (R7RS)* [109].
Some of the programs in this book depend on nonstandard features in MIT/GNU Scheme; for documentation of this system see the *MIT/GNU Scheme Reference Manual* [51]. Also, for Scheme features that are documented elsewhere the index to the Reference Manual provides pointers to the appropriate documents.
# **B.1 Essential Scheme**
Scheme is a simple programming language based on expressions. An expression names a value. For example, the numeral 3.14 names an approximation to a familiar number, and the numeral 22/7 names another approximation to it. There are primitive expressions, such as numerals, that we directly recognize, and there are compound expressions of several kinds.
Compound expressions are delimited by parentheses. Those that start with distinguished keywords, such as if, are called *special forms*. Those that are not special forms, called *combinations*, denote applications of procedures to arguments.
### **Combinations**
A *combination*—also called a *procedure application*—is a sequence of expressions delimited by parentheses:
```
(operator operand-1 ... operand-n)
```
The first subexpression in a combination, called the *operator*, is taken to name a procedure, and the rest of the subexpressions, called the *operands*, are taken to name the arguments to that procedure. The value returned by the procedure when applied to the given arguments is the value named by the combination. For example,
```
(+ 1 2.14)
3.14
(+ 1 (* 2 1.07))
3.14
```
<span id="page-8-0"></span>are both combinations that name the same number as the numeral 3.14. [1](#page-26-0) In these cases the symbols + and \* name procedures that add and multiply, respectively. If we replace any subexpression of any expression with an expression that names the same thing as the
original subexpression, the thing named by the overall expression remains unchanged.
Note that in Scheme every parenthesis is essential: you cannot add extra parentheses or remove any.
### **Lambda expressions**
<span id="page-9-0"></span>Just as we use numerals to name numbers, we use lambda expressions to name procedures. [2](#page-26-1) For example, the procedure that squares its input can be written:
```
(lambda (x) (* x x))
```
This expression can be read: "The procedure of one argument, *x*, that multiplies *x* by *x*." Of course, we can use this expression in any context where a procedure is needed. For example,
```
((lambda (x) (* x x)) 4)
16
```
The general form of a lambda expression is
```
(lambda formal-parameters body)
```
where *formal-parameters* is (usually) a parenthesized list of symbols that will be the names of the formal parameters of the procedure. When the procedure is applied to arguments, the formal parameters will have the arguments as their values. The *body* is an expression that may refer to the formal parameters. The value of a procedure application is the value of the body of the procedure with the arguments substituted for the formal parameters. [3](#page-27-0)
<span id="page-9-1"></span>In the example shown above, the symbol x is the only formal parameter of the procedure named by (lambda (x) (\* x x)). That procedure is applied to the value of the numeral 4, so in the body, (\* x x), the symbol x has the value 4, and the value of the combination ((lambda (x) (\* x x)) 4) is 16.
We said "usually" above because there are exceptions. Some procedures, such as the procedure that multiplies numbers, named
by the symbol \*, can take an indefinite number of arguments. We will explain how to do that later (on page 389).
### **Definitions**
We can use the define special form to give a name to any object. We say that the name identifies a *variable* whose value is the object. For example, if we make the definitions
```
(define pi 3.141592653589793)
(define square (lambda (x) (* x x)))
```
we can then use the symbols pi and square wherever the numeral or the lambda expression could appear. For example, the area of the surface of a sphere of radius 5 is
```
(* 4 pi (square 5))
314.1592653589793
```
Procedure definitions may be expressed more conveniently using "syntactic sugar." The squaring procedure may be defined
```
(define (square x) (* x x))
```
which we may read: "To square *x* multiply *x* by *x*."
In Scheme, procedures are *first-class* objects: they may be passed as arguments, returned as values, and incorporated into data structures. For example, it is possible to make a procedure that implements the mathematical notion of the composition of two functions: [4](#page-27-1)
```
(define compose
  (lambda (f g)
    (lambda (x)
      (f (g x)))))
((compose square sin) 2)
.826821810431806
(square (sin 2))
.826821810431806
```
One thing to notice is that the values of f and g in the returned procedure, (lambda (x) (f (g x))), are the values of the formal parameters of the outer procedure, (lambda (f g) ...). This is the essence of the lexical scoping discipline of Scheme. The value of any variable is obtained by finding its binding in the lexically apparent context. There is an implicit context for all the variables defined globally by the system. (For example, + is globally bound by the system to the procedure that adds numbers.)
Using the syntactic sugar shown above for square, we can write the definition of compose more conveniently:
```
(define (compose f g)
  (lambda (x)
    (f (g x))))
```
In MIT/GNU Scheme we can use the sugar recursively, to write:
```
(define ((compose f g) x)
  (f (g x)))
```
Sometimes it is advantageous to make a definition local to another definition. For example, we may define compose as follows:
```
(define (compose f g)
  (define (fog x)
    (f (g x)))
  fog)
```
The name fog is not defined outside the definition of compose, so it is not particularly useful in this case, but larger chunks of code are often easier to read if internal pieces are given names. Internal definitions must always precede any expressions that are not definitions in the body of the procedure.
### **Conditionals**
Conditional expressions may be used to choose among several expressions to produce a value. For example, a procedure that implements the absolute value function may be written:
```
(define (abs x)
  (cond ((< x 0) (- x))
        ((= x 0) x)
        ((> x 0) x)))
```
The conditional cond takes a number of *clauses*. Each clause has a *predicate expression*, which may be either true or false, and a *consequent expression*. The value of the cond expression is the value of the consequent expression of the first clause for which the corresponding predicate expression is true. The general form of a conditional expression is
```
(cond (predicate-1 consequent-1)
      ...
      (predicate-n consequent-n))
```
For convenience there is a special keyword else that can be used as the predicate in the last clause of a cond.
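A small sketch of else in use (the procedure name sign is ours, not from the text):

```
(define (sign x)
  (cond ((< x 0) -1)
        ((> x 0) 1)
        (else 0)))

(sign -7)
-1
(sign 0)
0
```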
The if special form provides another way to make a conditional when there is only a binary choice to be made. For example, because we have to do something special only when the argument is negative, we could have defined abs as:
```
(define (abs x)
  (if (< x 0)
      (- x)
      x))
```
The general form of an if expression is
```
(if predicate consequent alternative)
```
If the *predicate* is true the value of the if expression is the value of the *consequent*, otherwise it is the value of the *alternative*.
# **Recursive procedures**
Given conditionals and definitions, we can write recursive procedures. For example, to compute the *n*th factorial number we may write:
```
(define (factorial n)
  (if (= n 0)
      1
      (* n (factorial (- n 1)))))
(factorial 6)
720
(factorial 40)
815915283247897734345611269596115894272000000000
```
### **Local names**
A let expression is used to give names to objects in a local context. For example,
```
(define (f radius)
  (let ((area (* 4 pi (square radius)))
        (volume (* 4/3 pi (cube radius))))
    (/ volume area)))
(f 3)
1
```
The general form of a let expression is
```
(let ((variable-1 expression-1)
      ...
      (variable-n expression-n))
  body)
```
The value of the let expression is the value of the *body* expression in the context where the variables *variable-i* have the values of the expressions *expression-i*. The expressions *expression-i* may not refer to any of the variables *variable-j* given values in the let expression.
A let\* expression is the same as a let expression except that an expression *expression-i* may refer to variables *variable-j* given values earlier in the let\* expression.
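For example, a sketch in which a later binding refers to an earlier one (the name cylinder-volume is ours; pi and square are as defined above):

```
(define (cylinder-volume radius height)
  (let* ((base-area (* pi (square radius)))
         (volume (* base-area height)))  ; volume refers to base-area
    volume))
```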
A slight variant of the let expression provides a convenient way to write a loop. We can write a procedure that implements an alternative algorithm for computing factorials as follows:
```
(define (factorial n)
  (let factlp ((count 1) (answer 1))
    (if (> count n)
        answer
        (factlp (+ count 1) (* count answer)))))
(factorial 6)
720
```
Here, the symbol factlp following the let is locally defined to be a procedure that has the variables count and answer as its formal parameters. It is called the first time with 1 and 1 as arguments, initializing the loop. Whenever the procedure named factlp is called later, these variables get new values that are the values of the operand expressions (+ count 1) and (\* count answer).
An equivalent way to express this procedure has an explicitly defined internal procedure:
```
(define (factorial n)
  (define (factlp count answer)
    (if (> count n)
        answer
        (factlp (+ count 1) (* count answer))))
  (factlp 1 1))
```
The procedure factlp is defined locally; it exists only in the body of factorial. Because factlp is lexically enclosed in the definition of factorial, the value of n in its body is the value of the formal parameter of factorial.
## **Compound data—lists, vectors, and records**
Data can be glued together to form compound data structures. A *list* is a data structure in which the elements are linked sequentially. A *vector* is a data structure in which the elements are packed in a linear array. New elements can be added to lists, but to access the *n*th element of a list takes computing time proportional to *n*. By contrast, a vector is of fixed length, and its elements can be accessed in constant time. A *record* is similar to a vector, except that its fields are addressed by names rather than index numbers. Records also
provide new data types, which are distinguishable by type predicates and are guaranteed to be different from other types.
Compound data objects are constructed from components by procedures called *constructors* and the components are accessed by *selectors*.
<span id="page-15-0"></span>The procedure list is the constructor for lists. The predicate list? is true of any list, and false of all other types of data. [5](#page-27-2) For example,
```
(define a-list (list 6 946 8 356 12 620))
a-list
(6 946 8 356 12 620)
(list? a-list)
#t
(list? 3)
#f
```
Here #t and #f are the printed representations of the boolean values true and false. [6](#page-27-3)
<span id="page-15-1"></span>Lists are built from pairs. A *pair* is made using the constructor cons. The selectors for the two components of the pair are car and cdr (pronounced "could-er"). [7](#page-27-4)
```
(define a-pair (cons 1 2))
a-pair
(1 . 2)
(car a-pair)
1
(cdr a-pair)
2
```
A list is a chain of pairs, such that the car of each pair is the list element and the cdr of each pair is the next pair, except for the last cdr, which is a distinguishable value called the empty list and written (). Thus,
```
(car a-list)
6
(cdr a-list)
(946 8 356 12 620)
(car (cdr a-list))
946
(define another-list
  (cons 32 (cdr a-list)))
another-list
(32 946 8 356 12 620)
(car (cdr another-list))
946
```
The lists a-list and another-list share their tail (their cdr).
The predicate pair? is true of pairs and false of all other types of data. The predicate null? is true only of the empty list.
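For example, with a-list as defined above:

```
(pair? a-list)
#t
(pair? 3)
#f
(null? '())
#t
(null? a-list)
#f
```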
Vectors are simpler than lists. There is a constructor vector that can be used to make vectors and a selector vector-ref for accessing the elements of a vector. In Scheme all selectors that use a numerical index are zero-based:
```
(define a-vector
  (vector 37 63 49 21 88 56))
a-vector
#(37 63 49 21 88 56)
(vector-ref a-vector 3)
21
(vector-ref a-vector 0)
37
```
The printed representation of a vector is distinguished from the printed representation of a list by the character # before the initial parenthesis.
There is a predicate vector? that is true of vectors and false for all other types of data.
Scheme provides a numerical selector for the elements of a list, list-ref, analogous to the selector for vectors:
```
(list-ref a-list 3)
356
(list-ref a-list 0)
6
```
Records are more involved, as they must be declared before they can be constructed. A simple record declaration might be
```
(define-record-type point
  (make-point x y)
  point?
  (x point-x)
  (y point-y))
```
After this declaration, we can make and use points:
```
(define p (make-point 1 2))
(point? p)
#t
(point-x p)
1
(point-y p)
2
```
The elements of lists, vectors, and records may be any kind of data, including numbers, procedures, lists, vectors, and records. Numerous other procedures for manipulating lists, vectors, and records can be found in the Scheme online documentation.
## **Procedures with an indefinite number of arguments**
The procedures that we have seen are specified with a list of formal parameters that are bound to the arguments that the procedure is called with. However, there are many procedures that take an indefinite number of arguments. For example, the arithmetic procedure that multiplies numbers can take any number of
arguments. To define such a procedure we specify the formal parameters as a single symbol rather than a list of symbols. The single symbol is then bound to a list of the arguments that the procedure is called with. For example, given a binary multiplier \*:binary we can write
```
(define * (lambda args (accumulate *:binary 1 args)))
```

where accumulate is just

```
(define (accumulate proc initial lst)
  (if (null? lst)
      initial
      (proc (car lst)
            (accumulate proc initial (cdr lst)))))
```
Sometimes we want a procedure that takes some named arguments and an indefinite number of others. In a procedure definition a parameter list that has a dot before the last parameter name (called *dotted-tail notation*) indicates that the parameters before the dot will be bound to the initial arguments, and the final parameter will be bound to a list of any remaining arguments. In the example of \* above there are no initial arguments, so the value of args is a list of all the arguments. Thus, alternatively, we could define \* as:
```
(define (* . args) (accumulate *:binary 1 args))
```
The procedure named by - is more interesting, as it requires at least one argument: when given one argument - negates it; when given more than one argument it subtracts the rest from the first:
```
(define (- x . ys)
  (if (null? ys)  ; Only one argument?
      (-:unary x)
      (-:binary x (accumulate +:binary 0 ys))))
```
This can also be written
```
(define -
  (lambda (x . ys)
    (if (null? ys)
        (-:unary x)
        (-:binary x (accumulate +:binary 0 ys)))))
```
Parameters like args and ys in the examples above are called *rest parameters* because they are bound to the rest of the arguments.
## **Symbols**
Symbols are a very important kind of primitive data type that we use to make programs and algebraic expressions. You probably have noticed that Scheme programs look just like lists. In fact, they *are* lists. Some of the elements of the lists that make up programs are symbols, such as + and vector. [8](#page-27-5)
<span id="page-19-0"></span>If we are to make programs that can manipulate programs, we need to be able to write an expression that names such a symbol. This is accomplished by the mechanism of *quotation*. The name of the symbol + is the expression '+, and in general the name of an expression is the expression preceded by a single quote character. Thus the name of the expression (+ 3 a) is '(+ 3 a).
We can test if two symbols are identical by using the predicate eq?. For example, we can write a program to determine if an expression is a sum:
```
(define (sum? expression)
  (and (pair? expression)
       (eq? (car expression) '+)))
(sum? '(+ 3 a))
#t
(sum? '(* 3 a))
#f
```
Consider what would happen if we left out the quote in the expression (sum? '(+ 3 a)). If the variable a had the value 4, we would be asking if 7 is a sum. But what we wanted to know was whether the expression (+ 3 a) is a sum. That is why we need the quote.
## **Backquote**
To manipulate patterns and other forms of list-based syntax, it is often useful to intersperse quoted and evaluated parts in the same expression. Lisp systems provide a mechanism called *quasiquotation* that makes this easy.
<span id="page-20-0"></span>Just as we use the apostrophe character to indicate regular quotation, we use the backquote character to indicate quasiquotation. [9](#page-27-6) We specify such a partially quoted expression as a list in which the parts to be evaluated are prefixed with the comma character. For example,
```
`(a b ,(+ 20 3) d)
(a b 23 d)
```
The backquote mechanism also provides for "splicing" into a list expression: an evaluated subexpression produces a list, which is then spliced into the enclosing list. For example,
```
`(a b ,@(list (+ 20 3) (- 20 3)) d)
(a b 23 17 d)
```
Consult the Scheme Report [109] for a more detailed explanation of quasiquotation.
## **Effects**
<span id="page-20-1"></span>Sometimes we need to perform an action, such as plotting a point or printing a value, in the process of a computation. Such an action is called an *effect*. [10](#page-27-7) For example, to see in more detail how the factorial program computes its answer, we can interpolate a write-line statement in the body of the factlp internal procedure to print a list of the count and the answer for each iteration:
```
(define (factorial n)
  (let factlp ((count 1) (answer 1))
    (write-line (list count answer))
    (if (> count n)
        answer
        (factlp (+ count 1) (* count answer)))))
```
When we call the modified factorial procedure we can watch the counter being incremented and the answer being built:
```
(factorial 6)
(1 1)
(2 1)
(3 2)
(4 6)
(5 24)
(6 120)
(7 720)
720
```
The body of every procedure or let, as well as the consequent of every cond clause, allows statements that have effects to be used. The effect statement generally has no useful value. The final expression in the body or clause produces the value that is returned. In this example the if expression produces the value of the factorial.
# **Assignments**
Effects like printing a value or plotting a point are pretty benign, but there are more powerful (and thus dangerous) effects, called *assignments*. An assignment *changes* the value of a variable or an entry in a data structure. Almost everything we are computing is a mathematical function: for a particular input it always produces the same result. However, with assignment we can make objects that change their behavior as they are used. For example, we can use set! to make a device that increments a count every time we call it: [11](#page-28-0)
```
(define (make-counter)
  (let ((count 0))
    (lambda ()
      (set! count (+ count 1))
      count)))
```
Let's make two counters:
```
(define c1 (make-counter))
(define c2 (make-counter))
```
These two counters have independent local state. Calling a counter causes it to increment its local state variable, count, and return its value.
```
(c1)
1
(c1)
2
(c2)
1
(c1)
3
(c2)
2
```
For assigning to the elements of a data structure, such as a pair, a list, or a vector, Scheme provides:
```
(set-car! pair new-value)
(set-cdr! pair new-value)
(list-set! list index new-value)
(vector-set! vector index new-value)
```
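For example, mutating a fresh pair and a fresh vector (the names pr and vec are ours):

```
(define pr (cons 1 2))
(set-car! pr 10)
pr
(10 . 2)

(define vec (vector 1 2 3))
(vector-set! vec 0 10)
vec
#(10 2 3)
```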
A record may be defined to allow assignments to its fields (compare page 388):
```
(define-record-type point
  (make-point x y)
  point?
  (x point-x set-x!)
  (y point-y set-y!))

(define p (make-point 1 2))
(point-x p)
1
(point-y p)
2
(set-x! p 3)
(point-x p)
3
(point-y p)
2
```
<span id="page-23-0"></span>In general, it is good practice to avoid assignments when possible, but if you need them they are available. [12](#page-28-1)
# **B.2 More advanced stuff**
Scheme provides many more powerful features, but we won't try to describe them here. For example, you will probably want to know about hash tables. In general, the best sources are the
*Revised Report on the Algorithmic Language Scheme (R7RS)* [109] and the *MIT/GNU Scheme Reference Manual* [51]. But here are two fairly complex features that you may need to reference while reading this book:
### **Dynamic binding**
We sometimes want to specify the way in which some evaluation or action will be accomplished—for example, to specify the radix to use when printing a number. To do this we use an object called a *parameter*.
For example, the Scheme procedure number->string produces a character string that represents a number in a given radix:
```
(number->string 100 2)
"1100100"
(number->string 100 16)
"64"
```
Suppose we want to use number->string in many places in a complex program that we run by calling myprog, but we want to be able to control the radix used when the program is run. We can accomplish this by making a parameter radix with the default value 10:
```
(define radix (make-parameter 10))
```
The value of a parameter is obtained by calling the parameter with no arguments:
```
(radix)
10
```
We define a specialized version of number->string to use instead of number->string:
```
(define (number->string-radix number)
  (number->string number (radix)))
```
In an execution of (myprog), every call to number->string-radix will produce a decimal string, because the default value of (radix) is 10. However, we can wrap our program with parameterize to change the execution to use another radix:
```
(parameterize ((radix 2))
  (myprog))
```
The syntax of parameterize is the same as the syntax of let, but it can be used only for parameters created by make-parameter.
### **Bundles**
MIT/GNU Scheme provides a simple mechanism for building a collection of related procedures with shared state: a *bundle*. A bundle is a procedure that delegates to a collection of named procedures: the first argument to the bundle is the name of the delegate to use, and the rest of the arguments are passed to the specified delegate. This is similar to the way that some object-oriented languages work, but much simpler, and without classes or inheritance.
A bundle is sometimes called a *message-accepting procedure*, where the message type is the delegate name and the message body is the arguments. [13](#page-28-2) This emphasizes that the bundle supports a message-passing protocol and can be thought of as a node in a communications network.
<span id="page-25-0"></span>Here is a simple example:
```
(define (make-point x y)
  (define (get-x) x)
  (define (get-y) y)
  (define (set-x! new-x) (set! x new-x))
  (define (set-y! new-y) (set! y new-y))
  (bundle point? get-x get-y set-x! set-y!))
```
The procedure make-point defines four internal procedures, which share the state variables x and y. The bundle macro creates a bundle procedure, for which those procedures are the delegates.
The first argument to the bundle macro is a predicate, which is created with make-bundle-predicate. The bundle that is created will satisfy this predicate:
```
(define point? (make-bundle-predicate 'point))
(define p1 (make-point 3 4))
(define p2 (make-point -1 1))
(point? p1)
#t
(point? p2)
#t
(point? (lambda (x) x))
#f
```
The argument to make-bundle-predicate is a symbol that is used to identify the predicate when debugging.
If a predicate is not needed, bundle alternatively accepts #f as a first argument. In that case there will be no way to distinguish the created bundle procedure from other procedures.
The remaining arguments to the bundle macro are the names of the delegate procedures: get-x, get-y, set-x!, and set-y!. These names are looked up in the lexical environment of the macro to get the corresponding delegate procedures. A bundle procedure is then created, containing an association from each name to its delegate procedure.
When the resulting bundle procedure is called, its first argument is a symbol that must be the name of one of the delegate procedures. The association is used to select the named delegate procedure, which is then called with the bundle procedure's remaining arguments as its arguments.
It is easier to use a bundle than to describe it:
```
(p1 'get-x)
3
(p1 'get-y)
4
(p2 'get-x)
-1
(p2 'get-y)
1
(p1 'set-x! 5)
(p1 'get-x)
5
(p2 'get-x)
-1
```
- <span id="page-26-0"></span>[1](#page-8-0) In examples we show the value that would be printed by the Scheme system using *slanted* characters following the input expression.
- <span id="page-26-1"></span>[2](#page-9-0) The logician Alonzo Church [16] invented *λ* notation to allow the specification of an anonymous function of a named parameter: *λx*[expression in *x*]. This is read, "That function of one argument whose value is obtained by substituting the argument for *x* in the indicated expression."
- <span id="page-27-0"></span>[3](#page-9-1) We say that the formal parameters are *bound* to the arguments, and the *scope* of the binding is the body of the procedure.
- <span id="page-27-1"></span>[4](#page-10-0) The examples are indented to help with readability. Scheme does not care about extra white space, so we may add as much as we please to make things easier to read.
- <span id="page-27-2"></span>[5](#page-15-0) A *predicate* is a procedure that returns true or false. By Scheme cultural convention, we usually give a predicate a name ending with a question mark (?), except for the elementary arithmetic comparison predicates: =, <, >, <=, and >=. This is just a stylistic convention. To Scheme the question mark is just an ordinary character.
- <span id="page-27-3"></span>[6](#page-15-1) It is convenient, but irritating to some, that the conditional expressions (if and cond) treat any predicate value that is not explicitly #f as true.
- <span id="page-27-4"></span>[7](#page-15-2) These names are accidents of history. They stand for "Contents of the Address part of Register" and "Contents of the Decrement part of Register" of the IBM 704 computer, which was used for the first implementation of Lisp in the late 1950s. Scheme is a dialect of Lisp.
- <span id="page-27-5"></span>[8](#page-19-0) A symbol may have any number of characters. A symbol may not normally contain whitespace or delimiter characters, such as parentheses, brackets, quotation marks, comma, or #; but there are special notations that allow any characters to be included in a symbol's name.
- <span id="page-27-6"></span>[9](#page-20-0) On an American keyboard the backquote character `` ` `` is the lowercase character on the key that has the tilde character `~` as the uppercase character.
- <span id="page-27-7"></span>[10](#page-20-1) This is computer-science jargon. An effect is a change to something. For example, write-line changes the display by printing something to the display.
- <span id="page-28-0"></span>[11](#page-21-0) It is another cultural convention that we terminate the name of a procedure that has "side effects" with an exclamation point (!). This warns the reader that changing the order of effects may change the results of running the program.
- <span id="page-28-1"></span>[12](#page-23-0) The discipline of programming without assignments is called *functional programming*. Functional programs are generally easier to understand and have fewer bugs than *imperative programs*.
- <span id="page-28-2"></span>[13](#page-25-0) This terminology dates back to the ACTOR framework [58] and the Smalltalk programming language [46].
