structures play well with vectors and matrices that have symbolic numerical expressions as elements? Caution: This is quite hard. Perhaps it is appropriate as part of a long-term project.
# **3.3 Example: Automatic differentiation**
One remarkable application of extensible generic procedures is *automatic differentiation*. <sup>14</sup> This is a beautiful way to obtain a program that computes the derivative of the function computed by a given program. <sup>15</sup> Automatic differentiation is now an important component in machine learning applications.
We will see that a simple way to implement automatic differentiation is to extend the generic arithmetic primitives to work with *differential objects*, a new compound data type. This will enable the automatic differentiation of symbolic as well as numerical functions. It will also enable us to make automatic differentiation work with higher-order procedures—procedures that return other procedures as values.
Here is a simple example of automatic differentiation to illustrate what we are talking about:
```
((derivative (lambda (x) (expt x 3))) 2)
12
```
Note that the derivative of the function that computes the cube of its argument is a new function, which when given 2 as its argument returns 12 as its value.
If we extend the arithmetic to handle symbolic expressions, and we do some algebraic simplification on the result, we get:
```
((derivative (lambda (x) (expt x 3))) 'a)
(* 3 (expt a 2))
```
And the full power of the programming language is available, including higher-order procedures. This kind of system is useful in working with the very large expressions that occur in interesting physics problems.<sup>16</sup>
Let's look at a simple application: the computation of the roots of an equation by Newton's method. The idea is that we want to find values of *x* for which *f* (*x*) = 0. If *f* is sufficiently smooth, and we have a sufficiently close guess *x*<sub>0</sub>, we can improve the guess by computing a new guess *x*<sub>1</sub> by the formula:
$$x_{n+1} = x_n - \frac{f(x_n)}{Df(x_n)}$$
This can be repeated, as necessary, to get a sufficiently accurate result. An elementary program to accomplish this is:
```
(define (root-newton f initial-guess tolerance)
  (let ((Df (derivative f)))
    (define (improve-guess xn)
      (- xn (/ (f xn) (Df xn))))
    (let loop ((xn initial-guess))
      (let ((xn+1 (improve-guess xn)))
        (if (close-enuf? xn xn+1 tolerance)
            xn+1
            (loop xn+1))))))
```
Notice that the local procedure named Df in root-newton is a procedure that computes the derivative of the function computed by the procedure passed in as *f*.
For example, suppose we want to know the angle *θ* in the first quadrant for which cos(*θ*) = sin(*θ*). (The answer is *π*/4 ≈ 0.7853981633974484.) We can write:
```
(define (cs theta)
(- (cos theta) (sin theta)))
(root-newton cs 0.5 1e-8)
.7853981633974484
```
This result is correct to full machine accuracy.
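As a further usage sketch (this assumes root-newton, derivative, and the generic arithmetic described in this section are loaded), we can find the golden ratio, the positive root of *x*² − *x* − 1:

```
;; Newton's method starting from 2.0; the root is the golden ratio.
(root-newton (lambda (x) (- (* x x) x 1)) 2.0 1e-8)
;; ≈ 1.6180339887
```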
#### **3.3.1 How automatic differentiation works**
The program for automatic differentiation is directly derived from the definition of the derivative. Suppose that given a function *f* and a point *x* in its domain, we want to know the value of the function at a nearby point *f* (*x* + Δ*x*), where Δ*x* is a small increment. The derivative of a function *f* is defined to be the function *Df* whose value for particular arguments *x* is something that can be "multiplied" by an increment Δ*x* of the argument to get the best possible linear approximation to the increment in the value of *f*:
$$f(x + \Delta x) \approx f(x) + Df(x) \Delta x$$
We implement this definition using a data type that we call a *differential object*. A differential object [*x, δx*] can be thought of as a number with a small increment, *x* + *δx*. But we treat it as a new numerical quantity similar to a complex number: it has two components, a *finite part* and an *infinitesimal part*. <sup>17</sup> We extend each primitive arithmetic function to work with differential objects: each primitive arithmetic function *f* must know its derivative function *Df* , so that:
$$[x, \delta x] \xrightarrow{f} [f(x), Df(x)\delta x]$$
(3.5)
Note that the derivative of *f* at the point *x*, *Df* (*x*), is the coefficient of *δx* in the infinitesimal part of the resulting differential object.
Now here is the powerful idea: If we then pass the result of *f* ([*x, δx*]) (equation 3.5) through another function *g*, we obtain the chain-rule answer we would hope for:
$$[f(x), Df(x)\delta x] \stackrel{g}{\longmapsto} [g(f(x)), Dg(f(x))Df(x)\delta x]$$
Thus, if we can compute the results of all primitive functions on differential objects, we can compute the results of all compositions of functions on differential objects. Given such a result, we can extract the derivative of the composition: the derivative is the coefficient of the infinitesimal increment in the resulting differential object.
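For example, tracing the function *x* ↦ sin(*x*²) through two applications of the rule in equation 3.5:

$$[x, \delta x] \xrightarrow{(\cdot)^2} [x^2,\; 2x\,\delta x] \xrightarrow{\sin} [\sin(x^2),\; \cos(x^2)\,2x\,\delta x]$$

The coefficient of *δx* in the result, 2*x* cos(*x*²), is exactly the derivative of sin(*x*²), obtained without any symbolic manipulation of the composite expression.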
To extend a generic arithmetic operator to compute with differential objects, we need only supply a procedure that computes the derivative of the primitive arithmetic function that the operator names. Then we can use ordinary Scheme compositions to get the derivative of any composition of primitive functions.<sup>18</sup>
Given a procedure implementing a unary function f, the procedure derivative produces a new procedure the-derivative that computes the derivative of the function computed by f. <sup>19</sup> When applied to some argument, x, the derivative creates a new infinitesimal increment dx and adds it to the argument to get the new differential object [*x, δx*] that represents *x* + *δx*. The procedure f is then applied to this differential object and the derivative of f is obtained by extracting the coefficient of the infinitesimal increment dx from the value:
```
(define (derivative f)
(define (the-derivative x)
(let* ((dx (make-new-dx))
(value (f (d:+ x (make-infinitesimal dx)))))
(extract-dx-part value dx)))
the-derivative)
```
The procedure make-infinitesimal makes a differential object whose finite part is zero and whose infinitesimal part is dx. The procedure d:+ adds differential objects. The details will be explained in section 3.3.3.
#### **Extending the primitives**
We need to make handler procedures that extend the primitive arithmetic generic procedures to operate on differential objects. For each unary procedure we have to make the finite part of the result
and the infinitesimal part of the result, and we have to put the results together, as expressed in equation 3.5. So the handler for a unary primitive arithmetic procedure that computes function *f* is constructed by diff:unary-proc from the procedure f for *f* and the procedure df for its derivative *Df*. These are glued together using special addition and multiplication procedures d:+ and d:\* for differential objects, to be explained in section 3.3.3.
```
(define (diff:unary-proc f df)
(define (uop x) ; x is a differential object
(let ((xf (finite-part x))
(dx (infinitesimal-part x)))
(d:+ (f xf) (d:* (df xf) dx))))
uop)
```
For example, the sqrt procedure handler for differential objects is just:
```
(define diff:sqrt
(diff:unary-proc sqrt (lambda (x) (/ 1 (* 2 (sqrt x))))))
```
The first argument of diff:unary-proc is the sqrt procedure and the second argument is a procedure that computes the derivative of sqrt.
We add the new handler to the generic sqrt procedure using
```
(assign-handler! sqrt diff:sqrt differential?)
```
where differential? is a predicate that is true only of differential objects. The procedure assign-handler! is just shorthand for a useful pattern:
```
(define (assign-handler! procedure handler . preds)
  (define-generic-procedure-handler procedure
    (apply match-args preds)
    handler))
```
And the procedure match-args makes an applicability specification from a sequence of predicates.
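The procedure match-args belongs to the book's generic-procedure machinery. As a rough mental model only (a minimal sketch, not the actual implementation), one can imagine it building a predicate over an argument list:

```
;; A minimal sketch of the idea behind match-args -- NOT the book's
;; implementation: build an applicability test that checks each
;; argument against the corresponding predicate.
(define (match-args . preds)
  (lambda (args)
    (and (= (length preds) (length args))
         (every (lambda (pred arg) (pred arg))
                preds args))))

((match-args number? symbol?) (list 1 'a))   ;; → #t
((match-args number? symbol?) (list 'a 1))   ;; → #f
```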
Handlers for other unary primitives are straightforward:<sup>20</sup>
```
(define diff:exp (diff:unary-proc exp exp))
(define diff:log (diff:unary-proc log (lambda (x) (/ 1 x))))
(define diff:sin (diff:unary-proc sin cos))
(define diff:cos
  (diff:unary-proc cos (lambda (x) (* -1 (sin x)))))
```
Binary arithmetic operations are a bit more complicated:
$$f(x + \Delta x, y + \Delta y) \approx f(x, y) + \partial_0 f(x, y) \Delta x + \partial_1 f(x, y) \Delta y$$
(3.6)
where $\partial_0 f$ and $\partial_1 f$ are the partial derivative functions of f with respect to the two arguments. Let f be a function of two arguments; then $\partial_0 f$ is a new function of two arguments that computes the partial derivative of f with respect to its first argument:
$$\partial_0 f(x,y) = \left. \frac{\partial}{\partial u} f(u,v) \right|_{u=x,v=y}$$
So the rule for binary operations is
$$([x, \delta x], [y, \delta y]) \xrightarrow{f} [f(x, y), \partial_0 f(x, y) \delta x + \partial_1 f(x, y) \delta y]$$
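For instance, instantiating this rule for multiplication, where $\partial_0 f(x, y) = y$ and $\partial_1 f(x, y) = x$, recovers the familiar product rule:

$$([x, \delta x], [y, \delta y]) \xrightarrow{\times} [xy,\; y\,\delta x + x\,\delta y]$$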
To implement binary operations we might think that we could simply follow the plan for unary operations, where d0f and d1f are the procedures for the two partial derivative functions:
```
(define (diff:binary-proc f d0f d1f)
  (define (bop x y)
    (let ((dx (infinitesimal-part x))
          (dy (infinitesimal-part y))
          (xf (finite-part x))
          (yf (finite-part y)))
      (d:+ (f xf yf)
           (d:+ (d:* (d0f xf yf) dx)
                (d:* (d1f xf yf) dy)))))
  bop)
```
This is a good plan, but it isn't quite right: it doesn't ensure that the finite and infinitesimal parts are consistently chosen for the two arguments. We need to be more careful about how we choose the parts. We will explain this technical detail and fix it in section 3.3.3, but let's go with this approximately correct code for now.
Addition and multiplication are straightforward, because the partial derivatives are simple, but division and exponentiation are more interesting. We show the assignment of handlers only for diff:+ because all the others are similar.
```
(define diff:+
  (diff:binary-proc +
                    (lambda (x y) 1)
                    (lambda (x y) 1)))

(assign-handler! + diff:+ differential? any-object?)
(assign-handler! + diff:+ any-object? differential?)

(define diff:*
  (diff:binary-proc *
                    (lambda (x y) y)
                    (lambda (x y) x)))

(define diff:/
  (diff:binary-proc /
                    (lambda (x y)
                      (/ 1 y))
                    (lambda (x y)
                      (* -1 (/ x (square y))))))
```
The handler for exponentiation $f(x, y) = x^y$ is a bit more complicated. The partial with respect to the first argument is simple: $\partial_0 f(x, y) = yx^{y-1}$. But the partial with respect to the second argument is usually $\partial_1 f(x, y) = x^y \log x$, except for some special cases:
```
(define diff:expt
(diff:binary-proc expt
(lambda (x y)
(* y (expt x (- y 1))))
(lambda (x y)
(if (and (number? x) (zero? x))
(if (number? y)
(if (positive? y)
0
(error "Derivative undefined: EXPT"
x y))
0)
(* (log x) (expt x y))))))
```
#### **Extracting the derivative's value**
To compute the value of the derivative of a function, we apply the function to a differential object and obtain a result. We have to extract the derivative's value from that result. There are several possibilities that must be handled. If the result is a differential object, we have to pull the derivative's value out of the object. If the result is not a differential object, the derivative's value is zero. There are other cases that we have not mentioned. This calls for a generic procedure with a default that produces a zero.
```
(define (extract-dx-default value dx) 0)
(define extract-dx-part
(simple-generic-procedure 'extract-dx-part 2
extract-dx-default))
```
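The default handler is what makes derivatives of constant functions come out right: the body never touches the differential object, the value is an ordinary number, and extract-dx-part falls through to the default. A sketch (assuming the full system in this section is loaded):

```
;; f ignores its argument, so f(x + dx) is the plain number 7
;; and the default handler reports a zero derivative.
((derivative (lambda (x) 7)) 5)
;; → 0
```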
In the case where a differential object is returned, the coefficient of dx is the required derivative. This will turn out to be a bit complicated, but the basic idea can be expressed as follows:
```
(define (extract-dx-differential value dx)
(extract-dx-coefficient-from (infinitesimal-part value)
dx))
(define-generic-procedure-handler extract-dx-part
(match-args differential? diff-factor?)
extract-dx-differential)
```
The reason this is not quite right is that for technical reasons the structure of a differential object is more complex than we have already shown. It will be fully explained in section 3.3.3.
Note: We made the extractor generic to enable future extensions to functions that return functions or compound objects, such as vectors, matrices, and tensors. (See exercise 3.12 on page 124.)
Except for the fact that there may be more primitive operators and data structures to be included, this is all that is really needed to implement automatic differentiation! All of the procedures referred to in the handlers are the usual generic procedures on arithmetic; they may include symbolic arithmetic and functional arithmetic.
### **3.3.2 Derivatives of n-ary functions**
For a function with multiple arguments we need to be able to compute the partial derivatives with respect to each argument. One way to do this is:<sup>21</sup>
```
(define ((partial i) f)
(define (the-derivative . args)
(if (not (< i (length args)))
(error "Not enough arguments for PARTIAL" i f args))
(let* ((dx (make-new-dx))
(value
(apply f (map (lambda (arg j)
(if (= i j)
(d:+ arg
(make-infinitesimal dx))
arg))
args (iota (length args))))))
(extract-dx-part value dx)))
the-derivative)
```
Here we are extracting the coefficient of the infinitesimal dx in the result of applying f to the arguments supplied with the *i*th argument incremented by dx.<sup>22</sup>
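A usage sketch (assuming the full system in this section is loaded): the partial derivatives of multiplication at (3, 4) are the opposite arguments, since $\partial_0(xy) = y$ and $\partial_1(xy) = x$:

```
;; Partial derivative with respect to the first argument:
(((partial 0) (lambda (x y) (* x y))) 3 4)
;; → 4
;; ... and with respect to the second:
(((partial 1) (lambda (x y) (* x y))) 3 4)
;; → 3
```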
Now consider a function *g* of two arguments. Expanding on equation 3.6 we find that the derivative *Dg* is multiplied by a vector of increments to the arguments:
$$g(x + \Delta x, y + \Delta y) \approx g(x, y) + \left[\, \partial_0 g(x, y) \quad \partial_1 g(x, y) \,\right] \cdot \begin{bmatrix} \Delta x \\ \Delta y \end{bmatrix}$$
The derivative *Dg* of *g* at the point *x, y* is the pair of partial derivatives in square brackets. The inner product of that *covector* of partials with the *vector* of increments is the increment to the function *g*. The general-derivative procedure computes this result:
```
(define (general-derivative g)
(define ((the-derivative . args) . increments)
(let ((n (length args)))
(assert (= n (length increments)))
(if (= n 1)
(* ((derivative g) (car args))
(car increments))
(reduce (lambda (x y) (+ y x))
0
(map (lambda (i inc)
(* (apply ((partial i) g) args)
inc))
(iota n)
increments)))))
the-derivative)
```
Unfortunately general-derivative does not return the structure of partial derivatives. It is useful in many contexts to have a derivative procedure gradient that actually gives the covector of partial derivatives. (See exercise 3.10.)
# **Exercise 3.8: Partial derivatives**
Another way to think about partial derivatives is in terms of *λ*-calculus currying. Draw a diagram of how the data must flow. Use currying to fix the arguments that are held constant, producing a one-argument procedure that the ordinary derivative will be applied to. Write that version of the partial derivative procedure.
# **Exercise 3.9: Adding handlers**
There are primitive arithmetic functions for which we did not add handlers for differential objects, for example tan.
- **a.** Add handlers for tan and atan1 (atan1 is a function of one argument).
- **b.** It would be really nice to have atan optionally take two arguments, as in the Scheme Report [109], because we usually want to preserve the quadrant we are working in. Fix the generic procedure atan to do this correctly—using atan1 for one argument and atan2 if given two arguments. Also, install an atan2 handler for differentials. Remember, it must coexist with the atan1 handler.
# **Exercise 3.10: Vectors and covectors**
As described above, the idea of derivative can be generalized to functions with multiple arguments. The gradient of a function of multiple arguments is the covector of partial derivatives with respect to each of the arguments.
- **a.** Develop data types for vectors and covectors such that the value of *Dg*(*x, y*) is the covector of partials. Write a gradient procedure that delivers that value. Remember, the product of a vector and a covector should be their inner product—the sum of the componentwise products of their elements.
- **b.** Notice that if the input to a function is a vector, that is similar to multiple inputs, so the output of the gradient should be a covector. Note also that if the input to a function is a covector, then the output of the gradient should be a vector. Make this work.
#### **3.3.3 Some technical details**
Although the idea behind automatic differentiation is not complicated, there are a number of subtle technical details that must be addressed for it to work correctly.
#### **Differential algebra**
If we want to compute a second derivative we must take a derivative of a derivative function. The evaluation of such a function will have two infinitesimals in play. To enable the computation of multiple derivatives and derivatives of functions of several variables we define an algebra of differential objects in "infinitesimal space." The objects are multivariate power series in which no infinitesimal increment has exponent greater than one.<sup>23</sup>
A differential object is represented by a tagged list of the terms of a power series. Each term has a coefficient and a list of infinitesimal incremental factors. The terms are kept sorted, in descending order. (Order is the number of incrementals. So *δxδy* is higher order than *δx* or *δy*.) Here is a quick and dirty implementation:<sup>24</sup>
```
(define differential-tag 'differential)
(define (differential? x)
(and (pair? x) (eq? (car x) differential-tag)))
(define (diff-terms h)
(if (differential? h)
(cdr h)
(list (make-diff-term h '()))))
```
The term list is just the cdr of the differential object. However, if we are given an object that is not explicitly a differential object, for example a number, we coerce it to a differential object with a single term and with no incremental factors. When we make a differential object from a (presorted) list of terms, we always try to return a simplified version, which may be just a number, which is not explicitly a differential object:
```
(define (make-differential terms)
(let ((terms ; Nonzero terms
(filter
(lambda (term)
(let ((coeff (diff-coefficient term)))
(not (and (number? coeff) (= coeff 0)))))
terms)))
(cond ((null? terms) 0)
((and (null? (cdr terms))
;; Finite part only:
(null? (diff-factors (car terms))))
(diff-coefficient (car terms)))
((every diff-term? terms)
(cons differential-tag terms))
(else (error "Bad terms")))))
```
In this implementation the terms are also represented as tagged lists, each containing a coefficient and an ordered list of factors.
```
(define diff-term-tag 'diff-term)
(define (make-diff-term coefficient factors)
(list diff-term-tag coefficient factors))
(define (diff-term? x)
(and (pair? x) (eq? (car x) diff-term-tag)))
(define (diff-coefficient x)
(cadr x))
(define (diff-factors x)
(caddr x))
```
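For example (assuming the constructors above, and using the symbol dx1 to stand for a factor made by make-new-dx), the differential object representing 5 + *δx* is built from a presorted term list, highest order first:

```
;; Terms must be presorted in descending order:
;; the delta-x term (one factor) before the finite term (no factors).
(make-differential
 (list (make-diff-term 1 '(dx1))
       (make-diff-term 5 '())))
;; → (differential (diff-term 1 (dx1)) (diff-term 5 ()))
```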
To compute derivatives we need to be able to add and multiply differential objects:
```
(define (d:+ x y)
(make-differential
(+diff-termlists (diff-terms x) (diff-terms y))))
(define (d:* x y)
(make-differential
(*diff-termlists (diff-terms x) (diff-terms y))))
```
and we also need this:
```
(define (make-infinitesimal dx)
(make-differential (list (make-diff-term 1 (list dx)))))
```
Addition of term lists is where we enforce and use the sorting of terms, with higher-order terms coming earlier in the lists. We can add two terms only if they have the same factors. And if the sum of the coefficients is zero we do not include the resulting term.
```
(define (+diff-termlists l1 l2)
(cond ((null? l1) l2)
((null? l2) l1)
(else
(let ((t1 (car l1)) (t2 (car l2)))
(cond ((equal? (diff-factors t1) (diff-factors
t2))
(let ((newcoeff (+ (diff-coefficient t1)
(diff-coefficient t2))))
(if (and (number? newcoeff)
(= newcoeff 0))
(+diff-termlists (cdr l1) (cdr l2))
(cons
(make-diff-term newcoeff
(diff-factors t1))
(+diff-termlists (cdr l1)
(cdr l2))))))
((diff-term>? t1 t2)
(cons t1 (+diff-termlists (cdr l1) l2)))
(else
(cons t2
(+diff-termlists l1 (cdr l2)))))))))
```
Multiplication of term lists is straightforward, if we can multiply individual terms. The product of two term lists l1 and l2 is the term list resulting from adding up the term lists resulting from multiplying every term in l1 by every term in l2.
```
(define (*diff-termlists l1 l2)
(reduce (lambda (x y)
(+diff-termlists y x))
'()
(map (lambda (t1)
(append-map (lambda (t2)
(*diff-terms t1 t2))
l2))
l1)))
```
A term has a coefficient and a list of factors (the infinitesimals). In a differential object no term may have an infinitesimal with an exponent greater than one, because (*δx*)<sup>2</sup> = 0. Thus, when we multiply two terms we must check that the lists of factors we are merging have no factors in common. This is the reason that \*diff-terms returns a list of the product term or an empty list, to be appended in \*diff-termlists. We keep the factors sorted when we merge the two lists of factors; this makes it easier to sort the terms.
```
(define (*diff-terms x y)
(let ((fx (diff-factors x)) (fy (diff-factors y)))
(if (null? (ordered-intersect diff-factor>? fx fy))
(list (make-diff-term
(* (diff-coefficient x) (diff-coefficient y))
(ordered-union diff-factor>? fx fy)))
'())))
```
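A consequence worth checking (a sketch assuming the procedures above, plus the supporting ordered-intersect and ordered-union, are loaded): multiplying an infinitesimal by itself yields 0, because the factor lists share dx, \*diff-terms returns the empty term list, and make-differential simplifies the empty termlist to the number 0:

```
;; (delta-x)^2 = 0: the shared factor dx kills the product term.
(d:* (make-infinitesimal 'dx) (make-infinitesimal 'dx))
;; → 0
```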
#### **Finite and infinitesimal parts**
A differential object has a finite part and an infinitesimal part. Our diff:binary-proc procedure on page 109 is not correct for differential objects with more than one infinitesimal. To ensure that the parts of the arguments x and y are selected consistently we actually use:
```
(define (diff:binary-proc f d0f d1f)
(define (bop x y)
(let ((factor (maximal-factor x y)))
(let ((dx (infinitesimal-part x factor))
(dy (infinitesimal-part y factor))
(xe (finite-part x factor))
(ye (finite-part y factor)))
(d:+ (f xe ye)
(d:+ (d:* dx (d0f xe ye))
(d:* (d1f xe ye) dy))))))
bop)
```
where factor is chosen by maximal-factor so that both x and y contain it in a term with the largest number of factors.
The finite part of a differential object is all terms except for terms containing the maximal factor in a term of highest order, and the
infinitesimal part is the remaining terms, all of which contain that factor.
Consider the following computation:
$$f([x, \delta x], [y, \delta y]) = [f(x, y),\; \partial_0 f(x, y)\delta x + \partial_1 f(x, y)\delta y + \partial_0 \partial_1 f(x, y)\,\delta x \delta y]$$
The highest-order term is *∂*<sub>0</sub>*∂*<sub>1</sub> *f* (*x, y*) · *δxδy*. It is symmetrical with respect to *x* and *y*. The crucial point is that we may break the differential object into parts in any way consistent with any one of the maximal factors (here *δx* or *δy*) being primary. It doesn't matter which is chosen, because mixed partials of functions **R** × **R** → **R** commute.<sup>25</sup>
```
(define (finite-part x #!optional factor)
(if (differential? x)
(let ((factor (default-maximal-factor x factor)))
(make-differential
(remove (lambda (term)
(memv factor (diff-factors term)))
(diff-terms x))))
x))
(define (infinitesimal-part x #!optional factor)
(if (differential? x)
(let ((factor (default-maximal-factor x factor)))
(make-differential
(filter (lambda (term)
(memv factor (diff-factors term)))
(diff-terms x))))
0))
(define (default-maximal-factor x factor)
(if (default-object? factor)
(maximal-factor x)
factor))
```
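A small check of these selectors (assuming the procedures above are loaded, with the symbol dx standing in for a factor made by make-new-dx):

```
;; d represents 3 + delta-x.
(define d (d:+ 3 (make-infinitesimal 'dx)))
(finite-part d)          ;; → 3
(infinitesimal-part d)   ;; → (differential (diff-term 1 (dx)))
```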
#### **How extracting really works**
As explained on page 114, to make it possible to take multiple derivatives or to handle functions with more than one argument, a
differential object is represented as a multivariate power series in which no infinitesimal increment has exponent greater than one. Each term in this series has a coefficient and a list of infinitesimal incremental factors. This complicates the extraction of the derivative with respect to any one incremental factor. Here is the real story:
In the case where a differential object is returned we must find those terms of the result that contain the infinitesimal factor dx for the derivative we are evaluating. We collect those terms, removing dx from each. If there are no terms left after taking out the ones with dx, the value of the derivative is zero. If there is exactly one term left, which has no differential factors, then the coefficient of that term is the value of the derivative. But if there are remaining terms with differential factors, we must return the differential object with those residual terms as the value of the derivative.
```
(define (extract-dx-differential value dx)
(let ((dx-diff-terms
(filter-map
(lambda (term)
(let ((factors (diff-factors term)))
(and (memv dx factors)
(make-diff-term (diff-coefficient term)
(delv dx factors)))))
(diff-terms value))))
(cond ((null? dx-diff-terms) 0)
((and (null? (cdr dx-diff-terms))
(null? (diff-factors (car dx-diff-terms))))
(diff-coefficient (car dx-diff-terms)))
(else (make-differential dx-diff-terms)))))
(define-generic-procedure-handler extract-dx-part
(match-args differential? diff-factor?)
extract-dx-differential)
```
#### **Higher-order functions**
For many applications we want our automatic differentiator to work correctly for functions that return functions as values:
```
(((derivative
(lambda (x)
(lambda (y z)
(* x y z))))
2)
3
4)
;Value: 12
```
Including literal functions and partial derivatives makes this even more interesting.
```
((derivative
(lambda (x)
(((partial 1) (literal-function 'f))
x 'v)))
'u)
(((partial 0) ((partial 1) f)) u v)
```
And things can get even more complicated:
```
(((derivative
(lambda (x)
(derivative
(lambda (y)
((literal-function 'f)
x y)))))
'u)
'v)
(((partial 0) ((partial 1) f)) u v)
```
Making this work introduces serious complexity in the procedure extract-dx-part.
If the result of applying a function to a differential object is a function—a derivative of a derivative, for example—we need to defer the extraction until that function is called with arguments:
In a case where a function is returned, as in
```
(((derivative
   (lambda (x)
     (derivative
      (lambda (y)
        (* x y)))))
  'u)
 'v)
1
```
we cannot extract the derivative until the function is applied to arguments. So we defer the extraction until we get the value resulting from that application. We extend our generic extractor:
```
(define (extract-dx-function fn dx)
(lambda args
(extract-dx-part (apply fn args) dx)))
(define-generic-procedure-handler extract-dx-part
(match-args function? diff-factor?)
extract-dx-function)
```
Unfortunately, this version of extract-dx-function has a subtle bug. <sup>26</sup> Our patch is to wrap the body of the new deferred procedure with code that remaps the factor dx to avoid the unpleasant conflict. So, we change the handler for functions to:
```
(define (extract-dx-function fn dx)
(lambda args
(let ((eps (make-new-dx)))
(replace-dx dx eps
(extract-dx-part
(apply fn
(map (lambda (arg)
(replace-dx eps dx arg))
args))
dx)))))
```
This creates a brand-new factor eps and uses it to stand for dx in the arguments, thus preventing collision with any other instances of dx.
Replacement of the factors is itself a bit more complicated, because the code has to grovel around in the data structures. We will make the replacement a generic procedure, so we can extend it to new kinds of data. The default is that the replacement is just the identity on the object:
```
(define (replace-dx-default new-dx old-dx object) object)
(define replace-dx
```
```
(simple-generic-procedure 'replace-dx 3
replace-dx-default))
```
For a differential object we have to actually go in and substitute the new factor for the old one, and we have to keep the factor lists sorted:
```
(define (replace-dx-differential new-dx old-dx object)
(make-differential
(sort (map (lambda (term)
(make-diff-term
(diff-coefficient term)
(sort (substitute new-dx old-dx
(diff-factors term))
diff-factor>?)))
(diff-terms object))
diff-term>?)))
(define-generic-procedure-handler replace-dx
(match-args diff-factor? diff-factor? differential?)
replace-dx-differential)
```
Finally, if the object is itself a function we have to defer it until arguments are available to compute a value:
```
(define (replace-dx-function new-dx old-dx fn)
(lambda args
(let ((eps (make-new-dx)))
(replace-dx old-dx eps
(replace-dx new-dx old-dx
(apply fn
(map (lambda (arg)
(replace-dx eps old-dx arg))
args)))))))
(define-generic-procedure-handler replace-dx
(match-args diff-factor? diff-factor? function?)
replace-dx-function)
```
This is quite a bit more complicated than we might expect. It actually does three replacements of the differential factors. This is to prevent collisions with factors that may be free in the body of fn that are inherited from the lexical environment of definition of the function fn.<sup>27</sup>
# **Exercise 3.11: The bug!**
Before we became aware of the bug pointed out in footnote 26 on page 121, the procedure extract-dx-function was written:
```
(define (extract-dx-function fn dx)
(lambda args
(extract-dx-part (apply fn args) dx)))
```
Demonstrate the reason for the use of the replace-dx wrapper by constructing a function whose derivative is wrong with this earlier version of extract-dx-part but is correct in the fixed version. This is not easy! You may want to read the references pointed at in footnote 26.
#### **3.3.4 Literal functions of differential arguments**
For simple arguments, applying a literal function is just a matter of constructing the expression that is the application of the function expression to the arguments. But literal functions must also be able to accept differential objects as arguments. When that happens, the literal function must construct (partial) derivative expressions for the arguments that are differentials. For the *i*th argument of an *n*-argument function the appropriate derivative expression is:
```
(define (deriv-expr i n fexp)
  (if (= n 1)
      `(derivative ,fexp)
      `((partial ,i) ,fexp)))
```
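A quick usage sketch (note that deriv-expr must use quasiquote, the backquote, so that the unquoted i and fexp are substituted into the template):

```
(deriv-expr 0 1 'f)   ;; → (derivative f)
(deriv-expr 1 2 'f)   ;; → ((partial 1) f)
```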
Some arguments may be differential objects, so a literal function must choose, for each argument, a finite part and an infinitesimal part. Just as for binary arithmetic handlers, the maximal factor must be consistently chosen. Our literal functions are able to take many arguments, so this may seem complicated, but we wrote the maximal-factor procedure to handle many arguments. This is explained in section 3.3.3.
If there are no differential objects among the arguments we just cons up the required expression. If there are differential objects we need to make a derivative of the literal function. To do this we find a maximal factor from all of the arguments and separate out the finite parts of the arguments—the terms that do not have that factor. (The infinitesimal parts are the terms that have that factor.) The partial derivatives are themselves literal functions with expressions that are constructed to include the argument index. The resulting differential object is the inner product of the partial derivatives at the finite parts of the arguments with the infinitesimal parts of the arguments.
This is all brought together in the following procedure:
```
(define (literal-function fexp)
(define (the-function . args)
(if (any differential? args)
(let ((n (length args))
(factor (apply maximal-factor args)))
(let ((realargs
(map (lambda (arg)
(finite-part arg factor))
args))
(deltargs
(map (lambda (arg)
(infinitesimal-part arg factor))
args)))
(let ((fxs (apply the-function realargs))
(partials
(map (lambda (i)
(apply (literal-function
(deriv-expr i n fexp))
realargs))
(iota n))))
(fold d:+ fxs
(map d:* partials deltargs)))))
`(,fexp ,@args)))
the-function)
```
# **Exercise 3.12: Functions with structured values**