Asp Forum - SICP's description of pointers

chong.franklin

2/1/2016 6:34:00 AM

This quote is from
SICP (https://mitpress.mit.edu/sicp/full-text/book/book-Z-H-15.html#...) that I think is talking about pointers/references in programming languages.

"As we have seen, pairs provide a primitive "glue" that we can use to construct compound data objects. Figure 2.2 shows a standard way to visualize a pair--in this case, the pair formed by (cons 1 2). In this representation, which is called box-and-pointer notation, each object is shown as a pointer to a box. The box for a primitive object contains a representation of the object. For example, the box for a number contains a numeral. The box for a pair is actually a double box, the left part containing (a pointer to) the car of the pair and the right part containing the cdr."

In this case the book is talking about pairs (E.G. (cons 1 2)) and how they are represented. However we can also use pairs to construct a list like this:

(cons 1 (cons 2 '()))

Though box-and-pointer-notation is just a notation and useless, I think this looks a lot like a linked list. As I understand it, a linked list is a data
structure that contains a value and a pointer to another linked list. Having said that I think cons can be constructed like a linked list. I'm confused
by:

"The box for a pair is actually a double box, the left part containing (a pointer to) the car of the pair and the right part containing the cdr."

I originally thought the pointer should be on cdr because that would be the next list if we were constructing a list through pairs.

I think this might be a different kind of pointer all together. What exactly does a pointer mean in this case? The only pointer I know is pointers used
in c and c++.

1 Answer

Pascal J. Bourguignon

2/1/2016 7:16:00 PM

franklin <chong.franklin@gmail.com> writes:

> I think this might be a different kind of pointer all together. What
> exactly does a pointer mean in this case? The only pointer I know is
> pointers used in c and c++.

C and C++ pointers are something way more complex, so I would find it
difficult to explain pointers in general, and pointers as an
implementation technique in lisp, using or making reference to C or C++
pointers.

Instead, I will show you how you can implement cons car cdr using a
memory model that is commonly found in processor machines. There are
variants of this model, but the fundamentals are those:

A machine memory is a vector where objects can be stored, indexed by
small integers.

So, in Lisp, this gives:

(defparameter *nil* 0)
(defparameter *memory-size* 8192)
(defparameter *memory* (make-array *memory-size* :initial-element *nil*))

(defun .cons (a d) â?¦) --> c
(defun .car (c) â?¦) --> a
(defun .cdr (c) â?¦) --> d

Now, cons will have to allocate a new pair, to store A and D.

So we will need a way to manage our machine memory, to know what memory
slot is free and what memory slot is not free. One simple way to do it,
is to use a heap pointer. A pointer is simply an index into the memory
vector, a small integer then. We will assume that all the memory from this
heap pointer to the end of the memory is free. When we need to allocate
a new memory block, we will take it from the memory cell "pointed to" by
this heap pointer, and we will increment the heap pointer by the size of
the new block.

We will store the hear pointer in the memory, so the first slot (number
0) will be occupied by it, and the first free slot will be the slot
number 1.

(define-symbol-macro heap-pointer (aref *memory* 0))

(defun initialize-memory ()
(setf heap-pointer 1))

(defun allocate (size)
(if (< (+ heap-pointer size) *memory-size*)
(prog1 heap-pointer
(incf heap-pointer size))
(progn
(garbage-collector)
(if (< (+ heap-pointer size) *memory-size*)
(prog1 heap-pointer
(incf heap-pointer size))
(error 'out-of-memory)))))

So now we can implement cons, car and cdr:

(defun .cons (a d)
(let ((c (allocate 2)))
(setf (aref *memory* c) d
(aref *memory* (1+ c)) a)
c))

(defun .car (c) (aref *memory* (1+ c)))
(defun .cdr (c) (aref *memory* c))
(defun (setf .car) (a c) (setf (aref *memory* (1+ c)) a))
(defun (setf .cdr) (d c) (setf (aref *memory* c) d))

(defun .null (o) (= *nil* o))
(defun .list (&rest elements)
(let* ((head (.cons *nil* *nil*))
(tail head))
(dolist (e elements)
(setf (.cdr tail) (.cons e *nil*)
tail (.cdr tail)))
(.cdr head)))

(let ((c (.cons 'alpha (.cons 'beta *nil*))))
(prin1 (.car c)) (terpri)
(prin1 (.car (.cdr c))) (terpri)
(princ (if (.null (.cdr (.cdr c)))
"nil"
"error"))
(terpri)
(format t "pointer to the pair c: ~D~%" c))
alpha
beta
nil
pointer to the pair c: 14

(defun memory-dump (&key (start 0) (end *memory-size))
(loop :for address :from start :below end
:do (format t "~4D: ~S~%" address (aref *memory* address))))

(memory-dump :end 20)
0: 16
1: 1
2: 2
3: 1
4: 0
5: 2
6: 4
7: 1
8: 0
9: beta
10: 8
11: alpha
12: 0
13: beta
14: 12
15: alpha
16: 0
17: 0
18: 0
19: 0

So, here you can see that pointer = address = small integer that is an
index in the memory vector. Calling .cons "allocates" two slots in this
vectors, incrementing the heap pointer (which is stored at the address
0). .cons then stores the d and a values in those two slots. Notice
that I made the choice of storing the d value in the smallest index:
this is an optimization that is^w was often made in real processors,
since this allows to follow the chain of cdr faster, without having to
compute an offset.

You can see from my memory dump that I made a few tests, but the final
..cons cell allocated was at the addres 14. At this address, you can see
the cdr, which is a pointer to the slot at address (index) 12, and the
car, which is at the address 15: alpha. If you follow the pointer in the
cdr, at address 12 you will find another cdr whose value is the pointer
0 (*nil*), and whose car (at address 13) is beta.

You may notice that there's now way, by looking at what's in the memory,
to distinguish a pointer from a small integer. This can be a
problem. How do we know that the cons cell c is:
`(alpha . ,address-to-the-cons-cell-in-the-cdr)
and not:
(alpha . 12)

(.cons alpha 12) would produce the value 12, in its cdrâ?¦

There are basically two solutions:

- use a hardware memory that has tag bits, that allows us to specify the
type of the data stored in the memory cell (this is what we had on the
lisp machines, and also on the 7090 in a way).

- use a usual hardware memory, but reserve up some bits to store a type
tag. We can do that for example, by storing numbers by shifting them
one bit to the left, and by using pointers to the car, instead of the
cdr of the cons cells.

So the integer 12 will be stored as 24
and the pointer to the cons cell at the address 12 will be stored as
13.

Any even integer in the memory will represent an integer half of it,
and any odd integer in the memory will represent a pointer to the cons
cell, with car = (aref *memory* c) and cdr = (aref *memory* (1- c)).
Notice, if we do that, we can switch back a and d, so that we don't
have to use the 1- offset for the cdr, but for the car again. To
represent .nil, we will initialize a special .cons cell, with .nil in
its .car and and in its .cdr (it was a trick used in the early lisp,
where the symbols were implemented using cons cells, and it happened
that by storing NIL at the address 0, you had 0 in the car and in the
cdr of the cons cell representing the symbol NIL (which is the
historical reason why (car nil) -> nil and (cdr nil) -> nil in most
lisps). Nowadays, symbols have a specific representation, with a
different type tag, so we would have to use up more bits to store the
type tags, which would be beyond this simple example (but you can do
it as an exercise, adding characters, vectors and strings too (so you
can store the symbol name as a string, and strings as vectors of
characters, and characters as a specific type having as code an
integer).

(defparameter *memory-size* 8192)
(defparameter *memory* (make-array *memory-size* :initial-element 0))

(define-symbol-macro heap-pointer (aref *memory* 0))
(define-symbol-macro .nil 3)
(defun .null (o) (= .nil o))

(defun initialize-memory ()
(fill *memory* 0)
(setf heap-pointer 4)
(setf (.car .nil) .nil
(.cdr .nil) .nil))

;; So initially we have:
;; 0: heap-pointer
;; 1: --- not used
;; 2: (.car .nil)
;; 3: (.cdr .nil)
;; 4: --- first free cell
;; 5: â?¦

(defun allocate (size)
(if (< (+ heap-pointer size) *memory-size*)
(prog1 heap-pointer
(incf heap-pointer size))
(progn
(garbage-collector)
(if (< (+ heap-pointer size) *memory-size*)
(prog1 heap-pointer
(incf heap-pointer size))
(error 'out-of-memory)))))

(defun tag-cons (c) (1+ c))
(defun tag-integer (i) (ash i 1))
(defun .consp (o) (oddp o))
(defun .integerp (o) (evenp o))
(defun untag (o)
(cond ((.consp o) (1- c))
((.integerp o) (ash o -1))))

(defun .cons (a d)
(let ((c (allocate 2)))
(setf (aref *memory* c) a
(aref *memory* (1+ c)) d)
(tag-cons c)))

;; Alternatively, we could write:
#-(and)
(defun .cons (a d)
(let ((c (allocate 2)))
(tag-cons c)
(setf (.car c) a
(.cdr c) d)
c))
;; but it would be a little less optimized ;-)

;; Now that we have tagged values in memory, we need a printer to
;; recover the external representation (untagged):
(defun .prin1 (o)
(cond ((.null o)
(princ "nil"))
((.consp o)
(princ "(")
(.prin1 (.car o))
(princ " . ")
(.prin1 (.cdr o))
(princ ")")
o)
((.integerp o)
(prin1 (untag o)))
(t
(prin1 o))))

;; and we can do type checking:
(defun .car (c) (cond ((.consp c) (aref *memory* (1- c)))
((.null c) c)
(t (error 'type-error :datum c :expected-type '.cons))) )
(defun .cdr (c) (cond ((.consp c) (aref *memory* c))
((.null c) c)
(t (error 'type-error :datum c :expected-type '.cons))))

(defun (setf .car) (a c) (cond ((.consp c) (setf (aref *memory* (1- c)) a))
((.null c) c)
(t (error 'type-error :datum c :expected-type '.cons))))
(defun (setf .cdr) (d c) (cond ((.consp c) (setf (aref *memory* c) d))
((.null c) c)
(t (error 'type-error :datum c :expected-type '.cons))))

(defun .+ (a b) (+ a b)) ; no need to untag, since our tag bit is 0,
; 0+0=0, no carry.
(defun .* (a b) (* (untag a) b) ; we need to adjust one multiplicand.
;; so using this low order bits tag type has a rather inexpensive impact
;; on fixnum operations.

(defun .list (&rest elements)
(let* ((head (.cons *nil* *nil*))
(tail head))
(dolist (e elements)
(setf (.cdr tail) (.cons e *nil*)
tail (.cdr tail)))
(.cdr head)))

(defun memory-dump (&key (start 0) (end *memory-size*))
(loop :for address :from start :below end
:do (format t "~4D: ~S~%" address (aref *memory* address))))

(initialize-memory)
(let ((c (.cons (tag-integer 41)
(.cons (tag-integer 42)
(.cons (tag-integer 43)
.nil)))))
(.prin1 c) (terpri)
(format t "pointer to the pair c: ~D~%" c))
(41 . (42 . (43 . nil)))
pointer to the pair c: 9

(memory-dump :end 15)
0: 10
1: 0
2: 3
3: 3
4: 86
5: 3
6: 84
7: 5
8: 82
9: 7
10: 0
11: 0
12: 0
13: 0
14: 0

--
__Pascal Bourguignon__ http://www.informat...
â??The factory of the future will have only two employees, a man and a
dog. The man will be there to feed the dog. The dog will be there to
keep the man from touching the equipment.â? -- Carl Bass CEO Autodesk

comp.lang.lisp

SICP's description of pointers

chong.franklin

Pascal J. Bourguignon

x Login to ForumsZone