AWL

Academic Word List

A Project for Teaching and Learning Academic English Vocabulary

Wednesday, 10 16th

Last updateFri, 05 Sep 2014 10am

Object Oriented Programming

Search:

OBJECT ORIENTED PROGRAMMING

 

Preface

No programming technique solves all problems.

No programming language produces only correct results.

No programmer should start each project from scratch.

Object-oriented programming is the current cure-all — although it has been around for much more then ten years. At the core, there is little more to it then finally applying the good programming principles which we have been taught for more then twenty years. C++ (Eiffel, Oberon-2, Smalltalk ... take your pick) is the

New Language because it is object-oriented — although you need not use it that way if you do not want to (or know how to), and it turns out that you can do just as well with plain ANSI-C. Only object-orientation permits code reuse between projects — although the idea of subroutines is as old as computers and good programmers always carried their toolkits and libraries with them.

This book is not going to praise object-oriented programming or condemn the Old Way. We are simply going to use ANSI-C to discover how object-oriented programming is done, what its techniques are, why they help us solve bigger problems, and how we harness generality and program to catch mistakes earlier. Along the way we encounter all the jargon — classes, inheritance, instances, linkage, methods, objects, polymorphisms, and more — but we take it out of the realm of magic and see how it translates into the things we have known and done all along.

I had fun discovering that ANSI-C is a full-scale object-oriented language. To share this fun you need to be reasonably fluent in ANSI-C to begin with — feeling comfortable with structures, pointers, prototypes, and function pointers is a must.

Working through the book you will encounter all the newspeak — according to Orwell and Webster a language ‘‘designed to diminish the range of thought’’ — and I will try to demonstrate how it merely combines all the good programming principles that you always wanted to employ into a coherent approach. As a result, you may well become a more proficient ANSI-C programmer.

The first six chapters develop the foundations of object-oriented programming with ANSI-C. We start with a careful information hiding technique for abstract data types, add generic functions based on dynamic linkage and inherit code by judicious lengthening of structures. Finally, we put it all together in a class hierarchy that makes code much easier to maintain.

Programming takes discipline. Good programming takes a lot of discipline, a large number of principles, and standard, defensive ways of doing things right. Programmers use tools. Good programmers make tools to dispose of routine tasks once and for all. Object-oriented programming with ANSI-C requires a fair amount of immutable code — names may change but not the structures. Therefore, in chapter seven we build a small preprocessor to create the boilerplate required. It looks like yet another new object-oriented dialect language (yanoodl perhaps?) but it should not be viewed as such — it gets the dull parts out of the way and lets us concentrate on the creative aspects of problem solving with better techniques. Ooc  (sorry) is pliable: we have made it, we understand it and can change it, and it writes the ANSI-C code just like we would.

The following chapters refine our technology. In chapter eight we add dynamic type checking to catch our mistakes earlier on. In chapter nine we arrange for automatic initialization to prevent another class of bugs. Chapter ten introduces delegates and shows how classes and callback functions cooperate to simplify, for example, the constant chore of producing standard main programs. More chapters are concerned with plugging memory leaks by using class methods, storing and loading structured data with a coherent strategy, and disciplined error recovery through a system of nested exception handlers.

Finally, in the last chapter we leave the confines of ANSI-C and implement the obligatory mouse-operated calculator, first for curses and then for the X Window System. This example neatly demonstrates how elegantly we can design and implement using objects and classes, even if we have to cope with the idiosyncrasies of foreign libraries and class hierarchies.

Each chapter has a summary where I try to give the more cursory reader a rundown on the happenings in the chapter and their importance for future work. Most chapters suggest some exercises; however, they are not spelled out formally, because I firmly believe that one should experiment on one’s own. Because we are building the techniques from scratch, I have refrained from making and using a massive class library, even though some examples could have benefited from it. If you want to understand object-oriented programming, it is more important to first master the techniques and consider your options in code design; dependence on somebody else’s library for your developments should come a bit later.

An important part of this book is the enclosed source floppy — it has a DOS file system containing a single shell script to create all the sources arranged by chapter.

There is a ReadMe file — consult it before you say make. It is also quite instructive to use a program like diff and trace the evolution of the root classes and ooc reports through the later chapters.

The techniques described here grew out of my disenchantment with C++ when

I needed object-oriented techniques to implement an interactive programming language and realized that I could not forge a portable implementation in C++. I turned to what I knew, ANSI-C, and I was perfectly able to do what I had to. I have shown this to a number of people in courses and workshops and others have used the methods to get their jobs done. It would have stopped there as my footnote to

a fad, if Brian Kernighan and my publishers, Hans-Joachim Niclas and John Wait, had not encouraged me to publish the notes (and in due course to reinvent it all once more). My thanks go to them and to all those who helped with and suffered through the evolution of this book. Last not least I thank my family — and no, object-orientation will not replace sliced bread.

 

 

Hollage, October 1993

Axel-Tobias Schreiner

 

 

 

 

 

 

 

 

 

 

 

Abstract Data Types

Information Hiding

1.1 Data Types

Data types are an integral part of every programming language. ANSI-C has int, double and char to name just a few. Programmers are rarely content with what’s available and a programming language normally provides facilities to build new data types from those that are predefined. A simple approach is to form aggregates such as arrays, structures, or unions. Pointers, according to C. A. R. Hoare ‘‘a step from which we may never recover,’’ permit us to represent and manipulate data of essentially unlimited complexity.

What exactly is a data type? We can take several points of view. A data type is a set of values — char typically has 256 distinct values, int has many more; both are evenly spaced and behave more or less like the natural numbers or integers of mathematics. double once again has many more values, but they certainly do not behave like mathematics’ real numbers.

Alternatively, we can define a data type as a set of values plus operations to work with them. Typically, the values are what a computer can represent, and the operations more or less reflect the available hardware instructions. int in ANSI-C does not do too well in this respect: the set of values may vary between machines, and operations like arithmetic right shift may behave differently.

More complicated examples do not fare much better. Typically we would define an element of a linear list as a structure

Type def struct node {

struct node * next;

... information ...

} node; and for the operations we specify function headers like

node * head (node * elt, const node * tail);

This approach, however, is quite sloppy. Good programming principles dictate that we conceal the representation of a data item and declare only the possible manipulations.

1.2 Abstract Data Types

We call a data type abstract, if we do not reveal its representation to the user. At a theoretical level this requires us to specify the properties of the data type by mathematical axioms involving the possible operations. For example, we can remove an element from a queue only as often as we have added one previously, and we retrieve the elements in the same order in which they were added.

_

D_a_t_a_ T_y_p_e_s_ —__ _

In_f_o_rm__a_t_io_n_ H__id_in__g

Abstract data types offer great flexibility to the programmer. Since the representation is not part of the definition, we are free to choose whatever is easiest or most efficient to implement. If we manage to distribute the necessary information correctly, use of the data type and our choice of implementation are totally independent.

Abstract data types satisfy the good programming principles of information hiding and divide and conquer. Information such as the representation of data items isgiven only to the one with a need to know: to the implementer and not to the user.

With an abstract data type we cleanly separate the programming tasks of implementation and usage: we are well on our way to decompose a large system into smaller modules.

1.3 An Example — Set

So how do we implement an abstract data type? As an example we consider a set of elements with the operations add, find, and drop.* They all apply to a set and an element and return the element added to, found in, or removed from a set. Find can be used to implement a condition contains which tells us whether an element is already contained in a set.

Viewed this way, set is an abstract data type. To declare what we can do with

a set, we start a header file Set.h:

#ifndef SET_H

#define SET_H

extern const void * Set;

void * add (void * set, const void * element);

void * find (const void * set, const void * element);

void * drop (void * set, const void * element);

int contains (const void * set, const void * element);

#endif

The preprocessor statements protect the declarations: no matter how many times we include Set.h, the C compiler only sees the declarations once. This technique of protecting header files is so standard, that the GNU C preprocessor recognizes it and does not even access such a file when its protecting symbol is defined.

Set.h is complete, but is it useful? We can hardly reveal or assume less: Set will have to somehow represent the fact, that we are working with sets; add() takes an element, adds it to a set, and returns whatever was added or alreadypresent in the set; find() looks for an element in a set and returns whatever ispresent in the set or a null pointer; drop() locates an element, removes it from aset, and returns whatever was removed; contains() converts the result of find() into a truth value.

 

* Unfortunately, remove is an ANSI-C library function to remove a file. If we used this name for a set function, we could no longer include stdio.h.

The generic pointer void * is used throughout. On the one hand it makes it impossible to discover what a set looks like, but on the other hand it permits us to pass virtually anything to add() and the other functions. Not everything will behave like a set or an element — we are sacrificing type security in the interest of information hiding. However, we will see in chapter 8 that this approach can be made completely secure.

1.4 Memory Management

We may have overlooked something: how does one obtain a set? Set is a pointer, not a type defined by typedef; therefore, we cannot define local or global variables of type Set. Instead, we are only going to use pointers to refer to sets and elements, and we declare source and sink of all data items in new.h: void * new (const void * type, ...); void delete (void * item);

Just like Set.h this file is protected by a preprocessor symbol NEW_H. The text only shows the interesting parts of each new file, the source diskette contains the complete code of all examples. new() accepts a descriptor like Set and possibly more arguments for initialization and returns a pointer to a new data item with a representation conforming to the descriptor. delete() accepts a pointer originally produced by new() and recycles the associated resources. new() and delete() presumably are a frontend to the ANSI-C functions calloc() and free(). If they are, the descriptor has to indicate at least how much memory is required.

1.5 Object

If we want to collect anything interesting in a set, we need another abstract data type Object described by the header file Object.h: extern const void * Object; /* new(Object); */

int differ (const void * a, const void * b); differ() can compare objects: it returns true if they are not equal and false if they are. This description leaves room for the functionality of strcmp(): for some pairs of objects we might choose to return a negative or positive value to specify an ordering.

Real life objects need more functionality to do something useful. For the moment, we restrict ourselves to the bare necessities for membership in a set. If we built a bigger class library, we would see that a set — and in fact everything else — is an object, too. At this point, a lot of functionality results more or less for free.

 

1.6 An Application

With the header files, i.e., the definitions of the abstract data types, in place we can

write an application main.c:

#include <stdio.h>

#include "new.h"

#include "Object.h"

#include "Set.h"

int main ()

{ void * s = new(Set);

void * a = add(s, new(Object));

void * b = add(s, new(Object));

void * c = new(Object);

if (contains(s, a) && contains(s, b))

puts("ok");

if (contains(s, c))

puts("contains?");

if (differ(a, add(s, a)))

puts("differ?");

if (contains(s, drop(s, a)))

puts("drop?");

delete(drop(s, b));

delete(drop(s, c));

return 0;

}

We create a set and add two new objects to it. If all is well, we find the objects in the set and we should not find another new object. The program should simply print ok.

The call to differ() illustrates a semantic point: a mathematical set can only contain one copy of the object a; an attempt to add it again must return the original object and differ() ought to be false. Similarly, once we remove the object, it should no longer be in the set.

Removing an element not in a set will result in a null pointer being passed to delete(). For now, we stick with the semantics of free() and require this to be acceptable.

1.7 An Implementation — Set

main.c will compile successfully, but before we can link and execute the program, we must implement the abstract data types and the memory manager. If an object stores no information and if every object belongs to at most one set, we can represent each object and each set as small, unique, positive integer values used as indices into an array heap[]. If an object is a member of a set, its array element contains the integer value representing the set. Objects, therefore, point to the set containing them.

This first solution is so simple that we combine all modules into a single file Set.c. Sets and objects have the same representation, so new() pays no attention to the type description. It only returns an element in heap[] with value zero:

#if ! defined MANY || MANY < 1

#define MANY 10

#endif

static int heap [MANY];

void * new (const void * type, ...)

{ int * p; /* & heap[1..] */

for (p = heap + 1; p < heap + MANY; ++ p)

if (! * p)

break;

assert(p < heap + MANY);

* p = MANY;

return p;

}

We use zero to mark available elements of heap[]; therefore, we cannot return a reference to heap[0] — if it were a set, its elements would contain the index value zero.

Before an object is added to a set, we let it contain the impossible index value

MANY so that new() cannot find it again and we still cannot mistake it as a member of any set.

new() can run out of memory. This is the first of many errors, that ‘‘cannot happen’’. We will simply use the ANSI-C macro assert() to mark these points. A more realistic implementation should at least print a reasonable error message or use a general function for error handling which the user may overwrite. For our purpose of developing a coding technique, however, we prefer to keep the code uncluttered. In chapter 13 we will look at a general technique for handling exceptions.

delete() has to be careful about null pointers. An element of heap[] is recycled

by setting it to zero:

void delete (void * _item)

{ int * item = _item;

if (item)

{ assert(item > heap && item < heap + MANY);

* item = 0;

}

}

We need a uniform way to deal with generic pointers; therefore, we prefix their names with an underscore and only use them to initialize local variables with the desired types and with the appropriate names.

A set is represented in its objects: each element points to the set. If an element contains MANY, it can be added to the set, otherwise, it should already be in the set because we do not permit an object to belong to more than one set.

void * add (void * _set, const void * _element)

{ int * set = _set;

const int * element = _element;

assert(set > heap && set < heap + MANY);

assert(* set == MANY);

assert(element > heap && element < heap + MANY);

if (* element == MANY)

* (int *) element = set — heap;

else

assert(* element == set — heap);

return (void *) element;

}

assert() takes out a bit of insurance: we would only like to deal with pointers into heap[] and the set should not belong to some other set, i.e., its array element value ought to be MANY.

The other functions are just as simple. find() only looks if its element contains the proper index for the set:

void * find (const void * _set, const void * _element)

{ const int * set = _set;

const int * element = _element;

assert(set > heap && set < heap + MANY);

assert(* set == MANY);

assert(element > heap && element < heap + MANY);

assert(* element);

return * element == set — heap ? (void *) element : 0;

}

contains() converts the result of find() into a truth value:

int contains (const void * _set, const void * _element)

{

return find(_set, _element) != 0;

}

drop() can rely on find() to check if the element to be dropped actually belongs to the set. If so, we return it to object status by marking it with MANY: void * drop (void * _set, const void * _element)

{ int * element = find(_set, _element);

if (element)

* element = MANY;

return element;

}

If we were pickier, we could insist that the element to be dropped not belong to another set. In this case, however, we would replicate most of the code of find()

in drop(). Our implementation is quite unconventional. It turns out that we do not need differ() to implement a set. We still need to provide it, because our application uses this function.

int differ (const void * a, const void * b)

{

return a != b;

}

Objects differ exactly when the array indices representing them differ, i.e., a simple pointer comparison is sufficient.

We are done — for this solution we have not used the descriptors Set and

Object but we have to define them to keep our C compiler happy: const void * Set; const void * Object;

We did use these pointers in main() to create new sets and objects.

1.8 Another Implementation — Bag

Without changing the visible interface in Set.h we can change the implementation.

This time we use dynamic memory and represent sets and objects as structures:

struct Set { unsigned count; };

struct Object { unsigned count; struct Set * in; };

count keeps track of the number of elements in a set. For an element, count records how many times this element has been added to the set. If we decrement count each time the element is passed to drop() and only remove the elementonce count is zero, we have a Bag, i.e., a set where elements have a reference

count.

Since we will use dynamic memory to represent sets and objects, we need to initialize the descriptors Set and Object so that new() can find out how much memory to reserve:

static const size_t _Set = sizeof(struct Set);

static const size_t _Object = sizeof(struct Object);

const void * Set = & _Set;

const void * Object = & _Object;

new() is now much simpler:

void * new (const void * type, ...)

{ const size_t size = * (const size_t *) type;

void * p = calloc(1, size);

assert(p);

return p;

}

delete() can pass its argument directly to free() — in ANSI-C a null pointer may be passed to free().

add() has to more or less believe its pointer arguments. It increments the element’s reference counter and the number of elements in the set:

void * add (void * _set, const void * _element)

{ struct Set * set = _set;

struct Object * element = (void *) _element;

assert(set);

assert(element);

if (! element —> in)

element —> in = set;

else

assert(element —> in == set);

++ element —> count, ++ set —> count;

return element;

}

find() still checks, if the element points to the appropriate set:

void * find (const void * _set, const void * _element)

{ const struct Object * element = _element;

assert(_set);

assert(element);

return element —> in == _set ? (void *) element : 0;

}

contains() is based on find() and remains unchanged. If drop() finds its element in the set, it decrements the element’s reference count and the number of elements in the set. If the reference count reaches zero, the element is removed from the set:

void * drop (void * _set, const void * _element)

{ struct Set * set = _set;

struct Object * element = find(set, _element);

if (element)

{ if (—— element —> count == 0)

element —> in = 0;

—— set —> count;

}

return element;

}

We can now provide a new function count() which returns the number of elements

in a set: unsigned count (const void * _set) { const struct Set * set = _set; assert(set);

return set —> count;

}

Of course, it would be simpler to let the application read the component .count directly, but we insist on not revealing the representation of sets. The overhead ofa function call is insignificant compared to the danger of an application being able tooverwrite a critical value.

Bags behave differently from sets: an element can be added several times; it will only disappear from the set, once it is dropped as many times as it was added.

Our application in section 1.6 added the object a twice to the set. After it is dropped from the set once, contains() will still find it in the bag. The test program now has the output ok drop?

 

1.9 Summary

For an abstract data type we completely hide all implementation details, such as the representation of data items, from the application code.

The application code can only access a header file where a descriptor pointer represents the data type and where operations on the data type are declared as functions accepting and returning generic pointers.

The descriptor pointer is passed to a general function new() to obtain a pointer to a data item, and this pointer is passed to a general function delete() to recycle the associated resources.

Normally, each abstract data type is implemented in a single source file.

Ideally, it has no access to the representation of other data types. The descriptor pointer normally points at least to a constant size_t value indicating the space requirements of a data item.

1.10 Exercises

If an object can belong to several sets simultaneously, we need a different representation for sets. If we continue to represent objects as small unique integer values, and if we put a ceiling on the number of objects available, we can represent

a set as a bitmap stored in a long character string, where a bit selected by the object value is set or cleared depending on the presence of the object in the set.

A more general and more conventional solution represents a set as a linear list of nodes storing the addresses of objects in the set. This imposes no restriction on objects and permits a set to be implemented without knowing the representation of an object.

For debugging it is very helpful to be able to look at individual objects. A reasonably general solution are two functions int store (const void * object, FILE * fp);

int storev (const void * object, va_list ap);

store() writes a description of the object to the file pointer. storev() uses va_arg()

to retrieve the file pointer from the argument list pointed to by ap. Both functions return the number of characters written. storev() is practical if we implement the

 Following function for sets:

int apply (const void * set,

int (* action) (void * object, va_list ap), ...);

apply() calls action() for each element in set and passes the rest of the argument

list. action() must not change set but it may return zero to terminate apply() early.

apply() returns true if all elements were processed.

 

2

Dynamic Linkage

Generic Functions

 

2.1 Constructors and Destructors

Let us implement a simple string data type which we will later include into a set. For a new string we allocate a dynamic buffer to hold the text. When the string is deleted, we will have to reclaim the buffer. new() is responsible for creating an object and delete() must reclaim the resources it owns. new() knows what kind of object it is creating, because it has the description of the object as a first parameter. Based on the parameter, we could use a chain of if statements to handle each creation individually. The drawback is that new() would explicitly contain code for each data type which we support. delete(), however, has a bigger problem. It, too, must behave differently based on the type of the object being deleted: for a string the text buffer must be freed; for an object as used in chapter 1 only the object itself has to be reclaimed; and a set may have acquired various chunks of memory to store references to its elements.

We could give delete() another parameter: either our type descriptor or the function to do the cleaning up, but this approach is clumsy and error-prone. There is a much more general and elegant way: each object must know how to destroy its own resources. Part of each and every object will be a pointer with which we can locate a clean-up function. We call such a function a destructor for the object. Now new() has a problem. It is responsible for creating objects and returning pointers that can be passed to delete(), i.e., new() must install the destructor information in each object. The obvious approach is to make a pointer to the destructor part of the type descriptor which is passed to new(). So far we need something like the following declarations:

struct type {

size_t size; /* size of an object */

void (* dtor) (void *); /* destructor */

};

struct String {

char * text; /* dynamic string */

const void * destroy; /* locate destructor */

};

struct Set {

... information ...

const void * destroy; /* locate destructor */

};

It looks like we have another problem: somebody needs to copy the destructor pointer dtor from the type description to destroy in the new object and the copy may have to be placed into a different position in each class of objects.

Initialization is part of the job of new() and different types require different work

new() may even require different arguments for different types: new(Set); /* make a set */ new(String, "text"); /* make a string */

For initialization we use another type-specific function which we will call a constructor.

Since constructor and destructor are type-specific and do not change, we pass both to new() as part of the type description.

Note that constructor and destructor are not responsible for acquiring and releasing the memory for an object itself — this is the job of new() and delete().

The constructor is called by new() and is only responsible for initializing the memory area allocated by new(). For a string, this does involve acquiring another piece of memory to store the text, but the space for struct String itself is allocated by new(). This space is later freed by delete(). First, however, delete() calls the destructor which essentially reverses the initialization done by the constructor before

delete() recycles the memory area allocated by new().

 

2.2 Methods, Messages, Classes and Objects delete() must be able to locate the destructor without knowing what type of objectit has been given. Therefore, revising the declarations shown in section 2.1, wemust insist that the pointer used to locate the destructor must be at the beginningof all objects passed to delete(), no matter what type they have.

What should this pointer point to? If all we have is the address of an object, this pointer gives us access to type-specific information for the object, such as its destructor function. It seems likely that we will soon invent other type-specific functions such as a function to display objects, or our comparison function differ(), or a function clone() to create a complete copy of an object. Therefore we will use

a pointer to a table of function pointers.

Looking closely, we realize that this table must be part of the type description passed to new(), and the obvious solution is to let an object point to the entire type description:

struct Class {

size_t size;

void * (* ctor) (void * self, va_list * app);

void * (* dtor) (void * self);

void * (* clone) (const void * self);

int (* differ) (const void * self, const void * b);

};

struct String {

const void * class; /* must be first */

char * text;

};

struct Set {

const void * class; /* must be first */

...

};

Each of our objects starts with a pointer to its own type description, and through this type description we can locate type-specific information for the object: .size is the length that new() allocates for the object; .ctor points to the constructor called by new() which receives the allocated area and the rest of the argument list passed to new() originally; .dtor points to the destructor called by delete() which receives the object to be destroyed; .clone points to a copy function which receives the object to be copied; and .differ points to a function which compares its object to something else.

Looking down this list, we notice that every function works for the object through which it will be selected. Only the constructor may have to cope with a partially initialized memory area. We call these functions methods for the objects.

Calling a method is termed a message and we have marked the receiving object of the message with the parameter name self. Since we are using plain C functions, self need not be the first parameter.

Many objects will share the same type descriptor, i.e., they need the same amount of memory and the same methods can be applied to them. We call all objects with the same type descriptor a class; a single object is called an instance of the class. So far a class, an abstract data type, and a set of possible values together with operations, i.e., a data type, are pretty much the same.

An object is an instance of a class, i.e., it has a state represented by the memory allocated by new() and the state is manipulated with the methods of its class. Conventionally speaking, an object is a value of a particular data type.

 

2.3 Selectors, Dynamic Linkage, and Polymorphisms

Who does the messaging? The constructor is called by new() for a new memory

area which is mostly uninitialized:

void * new (const void * _class, ...)

{ const struct Class * class = _class;

void * p = calloc(1, class —> size);

assert(p);

* (const struct Class **) p = class;

if (class —> ctor)

{ va_list ap;

va_start(ap, _class);

p = class —> ctor(p, & ap);

va_end(ap);

}

return p;

}

The existence of the struct Class pointer at the beginning of an object is extremely

important. This is why we initialize this pointer already in new():

_1_4_________________________________________2__D_y_n_a_m_i_c _L_in_k_a_g_e_ _—__ G_e_n_e_r_ic_ _F_u_n_c_ti_o_n_s

p •

object

...........

class

size

ctor

dtor

clone

differ

struct Class

The type description class at the right is initialized at compile time. The object is created at run time and the dashed pointers are then inserted. In the assignment

* (const struct Class **) p = class;

p points to the beginning of the new memory area for the object. We force a conversion of p which treats the beginning of the object as a pointer to a struct Class and set the argument class as the value of this pointer.

Next, if a constructor is part of the type description, we call it and return its result as the result of new(), i.e., as the new object. Section 2.6 illustrates that a clever constructor can, therefore, decide on its own memory management.

Note that only explicitly visible functions like new() can have a variable parameter list. The list is accessed with a va_list variable ap which is initialized using the macro va_start() from stdarg.h. new() can only pass the entire list to the constructor; therefore, .ctor is declared with a va_list parameter and not with its own variable parameter list. Since we might later want to share the original parameters among several functions, we pass the address of ap to the constructor — when it returns, ap will point to the first argument not consumed by the constructor. delete() assumes that each object, i.e., each non-null pointer, points to a type description. This is used to call the destructor if any exists. Here, self plays the role of p in the previous picture. We force the conversion using a local variable cp and very carefully thread our way from self to its description: void delete (void * self)

{ const struct Class ** cp = self;

if (self && * cp && (* cp) —> dtor)

self = (* cp) —> dtor(self);

free(self);

}

The destructor, too, gets a chance to substitute its own pointer to be passed to free() by delete(). If the constructor decides to cheat, the destructor thus has a chance to correct things, see section 2.6. If an object does not want to be deleted, its destructor would return a null pointer.

All other methods stored in the type description are called in a similar fashion.

In each case we have a single receiving object self and we need to route the method call through its descriptor:

int differ (const void * self, const void * b)

{ const struct Class * const * cp = self;

assert(self && * cp && (* cp) —> differ);

return (* cp) —> differ(self, b);

}

The critical part is, of course, the assumption that we can find a type description pointer * self directly underneath the arbitrary pointer self. For the moment at least, we guard against null pointers. We could place a ‘‘magic number’’ at the beginning of each type description, or even compare * self to the addresses or an address range of all known type descriptions, but we will see in chapter 8 that we can do much more serious checking. In any case, differ() illustrates why this technique of calling functions is called dynamic linkage or late binding: while we can call differ() for arbitrary objects as long as they start with an appropriate type description pointer, the function that actually does the work is determined as late as possible — only during execution of

the actual call, not before.

We will call differ() a selector function. It is an example of a polymorphic function,

i.e., a function that can accept arguments of different types and act differently on them based on their types. Once we implement more classes which all contain .differ in their type descriptors, differ() is a generic function which can be applied to

any object in these classes.

We can view selectors as methods which themselves are not dynamically linked but still behave like polymorphic functions because they let dynamically linked functions do their real work.

Polymorphic functions are actually built into many programming languages, e.g., the procedure write() in Pascal handles different argument types differently, and the operator + in C has different effects if it is called for integers, pointers, or floating point values. This phenomenon is called overloading: argument types and the operator name together determine what the operator does; the same operator name can be used with different argument types to produce different effects.

There is no clear distinction here: because of dynamic linkage, differ() behaves like an overloaded function, and the C compiler can make + act like a polymorphic function — at least for the built-in data types. However, the C compiler can create different return types for different uses of the operator + but the function differ() must always have the same return type independent of the types of its arguments.

Methods can be polymorphic without having dynamic linkage. As an example, consider a function sizeOf() which returns the size of any object:

size_t sizeOf (const void * self)

{ const struct Class * const * cp = self;

assert(self && * cp);

return (* cp) —> size;

}

All objects carry their descriptor and we can retrieve the size from there. Notice the difference:

void * s = new(String, "text");

assert(sizeof s != sizeOf(s)); sizeof is a C operator which is evaluated at compile time and returns the number of bytes its argument requires. sizeOf() is our polymorphic function which at run time returns the number of bytes of the object, to which the argument points.

2.4 An Application

While we have not yet implemented strings, we are still ready to write a simple test program. String.h defines the abstract data type: extern const void * String;

All our methods are common to all objects; therefore, we add their declarations to the memory management header file new.h introduced in section 1.4:

void * clone (const void * self);

int differ (const void * self, const void * b);

size_t sizeOf (const void * self);

The first two prototypes declare selectors. They are derived from the corresponding components of struct Class by simply removing one indirection from the

declarator. Here is the application:

#include "String.h"

#include "new.h"

int main ()

{ void * a = new(String, "a"), * aa = clone(a);

void * b = new(String, "b");

printf("sizeOf(a) == %u\n", sizeOf(a));

if (differ(a, b))

puts("ok");

if (differ(a, aa))

puts("differ?");

if (a == aa)

puts("clone?");

delete(a), delete(aa), delete(b);

return 0;

}

We create two strings and make a copy of one. We show the size of a String object — not the size of the text controlled by the object — and we check that twodifferent texts result in different strings. Finally, we check that a copy is equal butnot identical to its original and we delete the strings again. If all is well, the programwill print something like

sizeOf(a) == 8

ok

 

2.5 An Implementation — String

We implement strings by writing the methods which need to be entered into the type description String. Dynamic linkage helps to clearly identify which functions need to be written to implement a new data type.

The constructor retrieves the text passed to new() and stores a dynamic copy in the struct String which was allocated by new():

struct String {

const void * class; /* must be first */

char * text;

};

static void * String_ctor (void * _self, va_list * app)

{ struct String * self = _self;

const char * text = va_arg(* app, const char *);

self —> text = malloc(strlen(text) + 1);

assert(self —> text);

strcpy(self —> text, text);

return self;

}

In the constructor we only need to initialize .text because new() has already set up .class.

The destructor frees the dynamic memory controlled by the string. Since

delete() can only call the destructor if self is not null, we do not need to check

things:

static void * String_dtor (void * _self)

{ struct String * self = _self;

free(self —> text), self —> text = 0;

return self;

}

String_clone() makes a copy of a string. Later both, the original and the copy,

will be passed to delete() so we must make a new dynamic copy of the string’s

text. This can easily be done by calling new():

static void * String_clone (const void * _self)

{ const struct String * self = _self;

return new(String, self —> text);

}

String_differ() is certainly false if we look at identical string objects and it is

true if we compare a string with an entirely different object. If we really compare

two distinct strings, we try strcmp():

static int String_differ (const void * _self, const void * _b)

{ const struct String * self = _self;

const struct String * b = _b;

if (self == b)

return 0;

if (! b || b —> class != String)

return 1;

return strcmp(self —> text, b —> text);

}

Type descriptors are unique — here we use that fact to find out if our second argument really is a string.

All these methods are static because they should only be called through new(), delete(), or the selectors. The methods are made available to the selectors by way of the type descriptor:

#include "new.r"

static const struct Class _String = {

sizeof(struct String),

String_ctor, String_dtor,

String_clone, String_differ

};

const void * String = & _String;

String.c includes the public declarations in String.h and new.h. In order to properly initialize the type descriptor, it also includes the private header new.r which contains the definition of the representation for struct Class shown in section 2.2.

 

2.6 Another Implementation — Atom

To illustrate what we can do with the constructor and destructor interface we implement atoms. An atom is a unique string object; if two atoms contain the same strings, they are identical. Atoms are very cheap to compare: differ() is true if the two argument pointers differ. Atoms are more expensive to construct and destroy: we maintain a circular list of all atoms and we count the number of times an atom is cloned:

struct String {

const void * class; /* must be first */

char * text;

struct String * next;

unsigned count;

};

static struct String * ring; /* of all strings */

static void * String_clone (const void * _self)

{ struct String * self = (void *) _self;

++ self —> count;

return self;

}

Our circular list of all atoms is marked in ring, extends through the .next component, and is maintained by the string constructor and destructor. Before the constructor saves a text it first looks through the list to see if the same text is already stored. The following code is inserted at the beginning of String_ctor():

if (ring)

{ struct String * p = ring;

do

if (strcmp(p —> text, text) == 0)

{ ++ p —> count;

free(self);

return p;

}

while ((p = p —> next) != ring);

}

else

ring = self;

self —> next = ring —> next, ring —> next = self;

self —> count = 1;

If we find a suitable atom, we increment its reference count, free the new string object self and return the atom p instead. Otherwise we insert the new string object into the circular list and set its reference count to 1.

The destructor prevents deletion of an atom unless its reference count is decremented to zero. The following code is inserted at the beginning of String_dtor():

if (—— self —> count > 0)

return 0;

assert(ring);

if (ring == self)

ring = self —> next;

if (ring == self)

ring = 0;

else

{ struct String * p = ring;

while (p —> next != self)

{ p = p —> next;

assert(p != ring);

}

p —> next = self —> next;

}

If the decremented reference count is positive, we return a null pointer so that delete() leaves our object alone. Otherwise we clear the circular list marker if our string is the last one or we remove our string from the list.

With this implementation our application from section 2.4 notices that a cloned string is identical to the original and it prints

sizeOf(a) == 16

ok

clone?

 

2.7 Summary

Given a pointer to an object, dynamic linkage lets us find type-specific functions: every object starts with a descriptor which contains pointers to functions applicable to the object. In particular, a descriptor contains a pointer to a constructor which initializes the memory area allocated for the object, and a pointer to a destructor which reclaims resources owned by an object before it is deleted.

We call all objects sharing the same descriptor a class. An object is an instance of a class, type-specific functions for an object are called methods, and messages are calls to such functions. We use selector functions to locate and call dynamically linked methods for an object.

Through selectors and dynamic linkage the same function name will take different actions for different classes. Such a function is called polymorphic. Polymorphic functions are quite useful. They provide a level of conceptual abstraction: differ() will compare any two objects — we need not remember which particular brand of differ() is applicable in a concrete situation. A cheap and very convenient debugging tool is a polymorphic function store() to display any object on a file descriptor.

 

2.8 Exercises

To see polymorphic functions in action we need to implement Object and Set with dynamic linkage. This is difficult for Set because we can no longer record in the set elements to which set they belong.

There should be more methods for strings: we need to know the string length, we want to assign a new text value, we should be able to print a string. Things get interesting if we also deal with substrings.

Atoms are much more efficient, if we track them with a hash table. Can the value of an atom be changed?

String_clone() poses an subtle question: in this function String should be the same value as self > class. Does it make any difference what we pass to new()?

 

3 Programming Savvy

Arithmetic Expressions

Dynamic linkage is a powerful programming technique in its own right. Rather than writing a few functions, each with a big switch to handle many special cases, we can write many small functions, one for each case, and arrange for the proper function to be called by dynamic linkage. This often simplifies a routine job and it usually results in code that can be extended easily.

As an example we will write a small program to read and evaluate arithmetic expressions consisting of floating point numbers, parentheses and the usual operators for addition, subtraction, and so on. Normally we would use the compiler generator tools lex and yacc to build that part of the program which recognizes an arithmetic expression. This book is not about compiler building, however, so just this once we will write this code ourselves.

 

3.1 The Main Loop

The main loop of the program reads a line from standard input, initializes things so that numbers and operators can be extracted and white space is ignored, calls up a function to recognize a correct arithmetic expression and somehow store it, and finally processes whatever was stored. If things go wrong, we simply read the next input line. Here is the main loop:

#include <setjmp.h>

static enum tokens token; /* current input symbol */

static jmp_buf onError;

int main (void)

{ volatile int errors = 0;

char buf [BUFSIZ];

if (setjmp(onError))

++ errors;

while (gets(buf))

if (scan(buf))

{ void * e = sum();

if (token)

error("trash after sum");

process(e);

delete(e);

}

return errors > 0;

}

void error (const char * fmt, ...)

{ va_list ap;

va_start(ap, fmt);

vfprintf(stderr, fmt, ap), putc(’\n’, stderr);

va_end(ap);

longjmp(onError, 1);

}

The error recovery point is defined with setjmp(). If error() is called somewhere in the program, longjmp() continues execution with another return from setjmp(). In this case, the result is the value passed to longjmp(), the error is counted, and the next input line is read. The exit code of the program reports if any errors were encountered.

3.2 The Scanner

In the main loop, once an input line has been read into buf[], it is passed to scan(), which for each call places the next input symbol into the variable token. At the end of a line token is zero:

#include <ctype.h>

#include <errno.h>

#include <stdlib.h>

#include "parse.h" /* defines NUMBER */

static double number; /* if NUMBER: numerical value */

static enum tokens scan (const char * buf)

/* return token = next input symbol */

{ static const char * bp;

if (buf)

bp = buf; /* new input line */

while (isspace(* bp))

++ bp;

if (isdigit(* bp) || * bp == ’.’)

{ errno = 0;

token = NUMBER, number = strtod(bp, (char **) & bp);

if (errno == ERANGE)

error("bad value: %s", strerror(errno));

}

else

token = * bp ? * bp ++ : 0;

return token;

}

We call scan() with the address of an input line or with a null pointer to continue work on the present line. White space is ignored and for a leading digit or decimal point we extract a floating point number with the ANSI-C function strtod(). Any other character is returned as is, and we do not advance past a null byte at the end of the input buffer.

The result of scan() is stored in the global variable token — this simplifies the recognizer. If we have detected a number, we return the unique value NUMBER and we make the actual value available in the global variable number.

 

3.3 The Recognizer

At the top level expressions are recognized by the function sum() which internally calls on scan() and returns a representation that can be manipulated by process() and reclaimed by delete().

If we do not use yacc we recognize expressions by the method of recursive descent where grammatical rules are translated into equivalent C functions. For example: a sum is a product, followed by zero or more groups, each consisting of an addition operator and another product. A grammatical rule like

sum : product { +|— product }...

is translated into a C function like

void sum (void)

{

product();

for (;;)

{ switch (token) {

case ’+’:

case ’—’:

scan(0), product(); continue;

}

return;

}

}

There is a C function for each grammatical rule so that rules can call each other. Alternatives are translated into switch or if statements, iterations in the grammar produce loops in C. The only problem is that we must avoid infinite recursion. token always contains the next input symbol. If we recognize it, we must call scan(0) to advance in the input and store a new symbol in token.

 

3.4 The Processor

How do we process an expression? If we only want to perform simple arithmetic on numerical values, we can extend the recognition functions and compute the result as soon as we recognize the operators and the operands: sum() would expect a double result from each call to product(), perform addition or subtraction as soon as possible, and return the result, again as a double function value.

If we want to build a system that can handle more complicated expressions we need to store expressions for later processing. In this case, we can not only perform arithmetic, but we can permit decisions and conditionally evaluate only part of an expression, and we can use stored expressions as user functions within other expressions. All we need is a reasonably general way to represent an expression.

The conventional technique is to use a binary tree and store token in each node:

 

struct Node {

enum tokens token;

struct Node * left, * right;

};

This is not very flexible, however. We need to introduce a union to create a node in which we can store a numerical value and we waste space in nodes representing unary operators. Additionally, process() and delete() will contain switch statements which grow with every new token which we invent.

 

3.5 Information Hiding

Applying what we have learned thus far, we do not reveal the structure of a node at all. Instead, we place some declarations in a header file value.h:

const void * Add;

...

void * new (const void * type, ...);

void process (const void * tree);

void delete (void * tree);

Now we can code sum() as follows:

#include "value.h"

static void * sum (void)

{ void * result = product();

const void * type;

for (;;)

{ switch (token) {

case ’+’:

type = Add;

break;

case ’—’:

type = Sub;

break;

default:

return result;

}

scan(0);

result = new(type, result, product());

}

}

product() has the same architecture as sum() and calls on a function factor() to recognize numbers, signs, and a sum enclosed in parentheses:

static void * sum (void);

static void * factor (void)

{ void * result;

switch (token) {

case ’+’:

scan(0);

return factor();

case ’—’:

scan(0);

return new(Minus, factor());

default:

error("bad factor: ’%c’ 0x%x", token, token);

case NUMBER:

result = new(Value, number);

break;

case ’(’:

scan(0);

result = sum();

if (token != ’)’)

error("expecting )");

}

scan(0);

return result;

}

Especially in factor() we need to be very careful to maintain the scanner invariant: token must always contain the next input symbol. As soon as token is consumed we need to call scan(0).

 

3.6 Dynamic Linkage

The recognizer is complete. value.h completely hides the evaluator for arithmetic expressions and at the same time specifies what we have to implement. new() takes a description such as Add and suitable arguments such as pointers to the operands of the addition and returns a pointer representing the sum:

struct Type {

void * (* new) (va_list ap);

double (* exec) (const void * tree);

void (* delete) (void * tree);

};

void * new (const void * type, ...)

{ va_list ap;

void * result;

assert(type && ((struct Type *) type) —> new);

va_start(ap, type);

result = ((struct Type *) type) —> new(ap);

* (const struct Type **) result = type;

va_end(ap);

return result;

}

We use dynamic linkage and pass the call to a node-specific routine which, in the

case of Add, has to create the node and enter the two pointers:

struct Bin {

const void * type;

void * left, * right;

};

static void * mkBin (va_list ap)

{ struct Bin * node = malloc(sizeof(struct Bin));

assert(node);

node —> left = va_arg(ap, void *);

node —> right = va_arg(ap, void *);

return node;

}

Note that only mkBin() knows what node it creates. All we require is that the various nodes start with a pointer for the dynamic linkage. This pointer is entered by new() so that delete() can reach its node-specific function:

void delete (void * tree)

{

assert(tree && * (struct Type **) tree

&& (* (struct Type **) tree) —> delete);

(* (struct Type **) tree) —> delete(tree);

}

static void freeBin (void * tree)

{

delete(((struct Bin *) tree) —> left);

delete(((struct Bin *) tree) —> right);

free(tree);

}

Dynamic linkage elegantly avoids complicated nodes. .new() creates precisely the right node for each type description: binary operators have two descendants, unary operators have one, and a value node only contains the value. delete() is a very simple function because each node handles its own destruction: binary operators delete two subtrees and free their own node, unary operators delete only one subtree, and a value node will only free itself. Variables or constants can even remain behind — they simply would do nothing in response to delete().

3.7 A Postfix Writer

So far we have not really decided what process() is going to do. If we want to emit a postfix version of the expression, we would add a character string to the struct Type to show the actual operator and process() would arrange for a single output line indented by a tab:

void process (const void * tree)

{

putchar(’\t’);

exec(tree);

putchar(’\n’);

}

exec() handles the dynamic linkage:

static void exec (const void * tree)

{

assert(tree && * (struct Type **) tree

&& (* (struct Type **) tree) —> exec);

(* (struct Type **) tree) —> exec(tree);

}

and every binary operator is emitted with the following function:

static void doBin (const void * tree)

{

exec(((struct Bin *) tree) —> left);

exec(((struct Bin *) tree) —> right);

printf(" %s", (* (struct Type **) tree) —> name);

}

The type descriptions tie everything together:

static struct Type _Add = { "+", mkBin, doBin, freeBin };

static struct Type _Sub = { "—", mkBin, doBin, freeBin };

const void * Add = & _Add;

const void * Sub = & _Sub;

It should be easy to guess how a numerical value is implemented. It is represented

as a structure with a double information field:

struct Val {

const void * type;

double value;

};

static void * mkVal (va_list ap)

{ struct Val * node = malloc(sizeof(struct Val));

assert(node);

node —> value = va_arg(ap, double);

return node;

}

Processing consists of printing the value:

static void doVal (const void * tree)

{

printf(" %g", ((struct Val *) tree) —> value);

}

We are done — there is no subtree to delete, so we can use the library function

free() directly to delete the value node:

static struct Type _Value = { "", mkVal, doVal, free };

const void * Value = & _Value;

A unary operator such as Minus is left as an exercise.

 

3.8 Arithmetic

If we want to do arithmetic, we let the execute functions return double values to be printed in process(): static double exec (const void * tree)

{

return (* (struct Type **) tree) —> exec(tree);

}

void process (const void * tree)

{

printf("\t%g\n", exec(tree));

}

For each type of node we need one execution function which computes and returns the value for the node. Here are two examples: static double doVal (const void * tree)

{

return ((struct Val *) tree) —> value;

}

static double doAdd (const void * tree)

{

return exec(((struct Bin *) tree) —> left) +

exec(((struct Bin *) tree) —> right);

}

static struct Type _Add = { mkBin, doAdd, freeBin };

static struct Type _Value = { mkVal, doVal, free };

const void * Add = & _Add;

const void * Value = & _Value;

 

3.9 Infix Output

Perhaps the highlight of processing arithmetic expressions is to print them with a minimal number of parentheses. This is usually a bit tricky, depending on who is responsible for emitting the parentheses. In addition to the operator name used for postfix output we add two numbers to the struct Type:

struct Type {

const char * name; /* node’s name */

char rank, rpar;

void * (* new) (va_list ap);

void (* exec) (const void * tree, int rank, int par);

void (* delete) (void * tree);

};

.rank is the precedence of the operator, starting with 1 for addition. .rpar is set for operators such as subtraction, which require their right operand to be enclosed in parentheses if it uses an operator of equal precedence. As an example consider

$ infix

1 + (2 — 3)

1 + 2 — 3

1 — (2 — 3)

1 — (2 — 3)

This demonstrates that we have the following initialization: static struct Type _Add = {"+", 1, 0, mkBin, doBin, freeBin}; static struct Type _Sub = {"—", 1, 1, mkBin, doBin, freeBin};

The tricky part is for a binary node to decide if it must surround itself with parentheses. A binary node such as an addition is given the precedence of its superior and a flag indicating whether parentheses are needed in the case of equal precedence. doBin() decides if it will use parentheses: static void doBin (const void * tree, int rank, int par) { const struct Type * type = * (struct Type **) tree;

par = type —> rank < rank

|| (par && type —> rank == rank);

if (par) putchar(’(’);

If our node has less precedence than its superior, or if we are asked to put up parentheses on equal precedence, we print parentheses. In any case, if our description has .rpar set, we require only of our right operand that it put up extra parentheses:

exec(((struct Bin *) tree) —> left, type —> rank, 0);

printf(" %s ", type —> name);

exec(((struct Bin *) tree) —> right,

type —> rank, type —> rpar);

if (par) putchar(’)’);

}

The remaining printing routines are significantly simpler to write.

 

3.10 Summary

Three different processors demonstrate the advantages of information hiding.Dynamic linkage has helped to divide a problem into many very simple functions.The resulting program is easily extended — try adding comparisons and an operatorlike ? : in C.

 

4 Inheritance

Code Reuse and Refinement

4.1 A Superclass — Point

In this chapter we will start a rudimentary drawing program. Here is a quick test for one of the classes we would like to have:

#include "Point.h"

#include "new.h"

int main (int argc, char ** argv)

{ void * p;

while (* ++ argv)

{ switch (** argv) {

case ’p’:

p = new(Point, 1, 2);

break;

default:

continue;

}

draw(p);

move(p, 10, 20);

draw(p);

delete(p);

}

return 0;

}

For each command argument starting with the letter p we get a new point which is drawn, moved somewhere, drawn again, and deleted. ANSI-C does not include standard functions for graphics output; however, if we insist on producing a picture we can emit text which Kernighan’s pic [Ker82] can understand:

$ points p

"." at 1,2

"." at 11,22

The coordinates do not matter for the test — paraphrasing a commercial and

OOspeak: ‘‘the point is the message.’’

What can we do with a point? new() will produce a point and the constructor expects initial coordinates as further arguments to new(). As usual, delete() will recycle our point and by convention we will allow for a destructor. draw() arranges for the point to be displayed. Since we expect to work with other graphical objects — hence the switch in the test program — we will provide dynamic linkage for draw().

move() changes the coordinates of a point by the amounts given as arguments.

 

If we implement each graphical object relative to its own reference point, we will be able to move it simply by applying move() to this point. Therefore, we should be able to do without dynamic linkage for move().

 

4.2 Superclass Implementation — Point

The abstract data type interface in Point.h contains the following:

extern const void * Point; /* new(Point, x, y); */

void move (void * point, int dx, int dy);

We can recycle the new.? files from chapter 2 except that we remove most

methods and add draw() to new.h:

void * new (const void * class, ...);

void delete (void * item);

void draw (const void * self);

The type description struct Class in new.r should correspond to the method

declarations in new.h:

struct Class {

size_t size;

void * (* ctor) (void * self, va_list * app);

void * (* dtor) (void * self);

void (* draw) (const void * self);

};

The selector draw() is implemented in new.c. It replaces selectors such as differ() introduced in section 2.3 and is coded in the same style:

void draw (const void * self)

{ const struct Class * const * cp = self;

assert(self && * cp && (* cp) —> draw);

(* cp) —> draw(self);

}

After these preliminaries we can turn to the real work of writing Point.c, the implementation of points. Once again, object-orientation has helped us to identify precisely what we need to do: we have to decide on a representation and implement a constructor, a destructor, the dynamically linked method draw() and the statically linked method move(), which is just a plain function. If we stick with two-dimensional, Cartesian coordinates, we choose the obvious representation:

struct Point {

const void * class;

int x, y; /* coordinates */

};

The constructor has to initialize the coordinates .x and .y — by now absolutely routine:

static void * Point_ctor (void * _self, va_list * app)

{ struct Point * self = _self;

self —> x = va_arg(* app, int);

self —> y = va_arg(* app, int);

return self;

}

It turns out that we do not need a destructor because we have no resources to reclaim before delete() does away with struct Point itself. In Point_draw() we print the current coordinates in a way which pic can understand:

static void Point_draw (const void * _self)

{ const struct Point * self = _self;

printf("\".\" at %d,%d\n", self —> x, self —> y);

}

This takes care of all the dynamically linked methods and we can define the type descriptor, where a null pointer represents the non-existing destructor: static const struct Class _Point = {

sizeof(struct Point), Point_ctor, 0, Point_draw

};

const void * Point = & _Point;

move() is not dynamically linked, so we omit static to export it from Point.c and we do not prefix its name with the class name Point:

void move (void * _self, int dx, int dy)

{ struct Point * self = _self;

self —> x += dx, self —> y += dy;

}

This concludes the implementation of points in Point.? together with the support for dynamic linkage in new.?.

 

4.3 Inheritance — Circle

A circle is just a big point: in addition to the center coordinates it needs a radius.

Drawing happens a bit differently, but moving only requires that we change the coordinates of the center.

This is where we would normally crank up our text editor and perform source code reuse. We make a copy of the implementation of points and change those parts where a circle is different from a point. struct Circle gets an additional component:

int rad;

This component is initialized in the constructor

self —> rad = va_arg(* app, int);

and used in Circle_draw():

printf("circle at %d,%d rad %d\n",

self —> x, self —> y, self —> rad);

 

We get a bit stuck in move(). The necessary actions are identical for a point and a circle: we need to add the displacement arguments to the coordinate components. However, in one case, move() works on a struct Point, and in the other case, it works on a struct Circle. If move() were dynamically linked, we could provide two different functions to do the same thing, but there is a much better way.

Consider the layout of the representations of points and circles:

point

class

x

y

struct Point

circle

class

x

y

rad

struct Circle

......

. . . . . . . . . . .. . . . . .

The picture shows that every circle begins with a point. If we derive struct Circle by adding to the end of struct Point, we can pass a circle to move() because theinitial part of its representation looks just like the point which move() expects toreceive and which is the only thing that move() can change. Here is a sound wayto make sure the initial part of a circle always looks like a point:struct Circle { const struct Point _; int rad; };We let the derived structure start with a copy of the base structure that we areextending. Information hiding demands that we should never reach into the basestructure directly; therefore, we use an almost invisible underscore as its name andwe declare it to be const to ward off careless assignments.This is all there is to simple inheritance: a subclass is derived from a superclass(or base class) merely by lengthening the structure that represents an object of thesuperclass.

Since representation of the subclass object (a circle) starts out like the representation of a superclass object (a point), the circle can always pretend to be a point — at the initial address of the circle’s representation there really is a point’s representation. It is perfectly sound to pass a circle to move(): the subclass inherits the methods of the superclass because these methods only operate on that part of the subclass’ representation that is identical to the superclass’ representation for which the methods were originally written. Passing a circle as a point means converting from a struct Circle * to a struct Point *. We will refer to this as an up-cast from a subclass to a superclass — in ANSI-C it can only be accomplished with an explicitconversion operator or through intermediate void * values.

It is usually unsound, however, to pass a point to a function intended for circles such as Circle_draw(): converting from a struct Point * to a struct Circle * is only permissible if the point originally was a circle. We will refer to this as a down-cast from a superclass to a subclass — this requires explicit conversions or void * values, too, and it can only be done to pointers to objects that were in the subclass to begin with.

 

4.4 Linkage and Inheritance

move() is not dynamically linked and does not use a dynamically linked method to do its work. While we can pass points as well as circles to move(), it is not really a polymorphic function: move() does not act differently for different kinds of objects, it always adds arguments to coordinates, regardless of what else might be attached to the coordinates.

The situation is different for a dynamically linked method like draw(). Let us look at the previous picture again, this time with the type descriptions shown explicitly: point

When we up-cast from a circle to a point, we do not change the state of the circle,

i.e., even though we look at the circle’s struct Circle representation as if it were a

struct Point, we do not change its contents. Consequently, the circle viewed as a point still has Circle as a type description because the pointer in its .class component has not changed. draw() is a selector function, i.e., it will take whatever argument is passed as self, proceed to the type description indicated by .class, and call the draw method stored there.

A subclass inherits the statically linked methods of its superclass — those methods operate on the part of the subclass object which is already present in the superclass object. A subclass can choose to supply its own methods in place of the dynamically linked methods of its superclass. If inherited, i.e., if not overwritten, the superclass’ dynamically linked methods will function just like statically linked methods and modify the superclass part of a subclass object. If overwritten, the subclass’ own version of a dynamically linked method has access to the full representation of a subclass object, i.e., for a circle draw() will invoke

Circle_draw() which can consider the radius when drawing the circle.

 

4.5 Static and Dynamic Linkage

A subclass inherits the statically linked methods of its superclass and it can choose to inherit or overwrite the dynamically linked methods. Consider the declarations for move() and draw(): void move (void * point, int dx, int dy); void draw (const void * self);

We cannot discover the linkage from the two declarations, although the implementation of move() does its work directly, while draw() is only the selector function which traces the dynamic linkage at runtime. The only difference is that we declare

a statically linked method like move() as part of the abstract data type interface in Point.h, and we declare a dynamically linked method like draw() with the memory management interface in new.h, because we have thus far decided to implement the selector function in new.c.

Static linkage is more efficient because the C compiler can code a subroutine call with a direct address, but a function like move() cannot be overwritten for a subclass. Dynamic linkage is more flexible at the expense of an indirect call — we have decided on the overhead of calling a selector function like draw(), checking the arguments, and locating and calling the appropriate method. We could forgo the checking and reduce the overhead with a macro* like

#define draw(self) \

((* (struct Class **) self) —> draw (self)) but macros cause problems if their arguments have side effects and there is no clean technique for manipulating variable argument lists with macros. Additionally, the macro needs the declaration of struct Class which we have thus far made available only to class implementations and not to the entire application.

Unfortunately, we pretty much decide things when we design the superclass.

While the function calls to the methods do not change, it takes a lot of text editing, possibly in a lot of classes, to switch a function definition from static to dynamic linkage and vice versa. Beginning in chapter 7 we will use a simple preprocessor to simplify coding, but even then linkage switching is error-prone. In case of doubt it is probably better to decide on dynamic rather than static linkage even if it is less efficient. Generic functions can provide a useful conceptual abstraction and they tend to reduce the number of function names which we need to remember in the course of a project. If, after implementing all required classes, we discover that a dynamically linked method was never overwritten, it is a lot less trouble to replace its selector by its single implementation, and even waste its slot in struct Class, than to extend the type description and correct all the initializations.

* In ANSI-C macros are not expanded recursively so that a macro may hide a function by the same

name.

 

4.6 Visibility and Access Functions

We can now attempt to implement Circle_draw(). Information hiding dictates that we use three files for each class based on a ‘‘need to know’’ principle. Circle.h contains the abstract data type interface; for a subclass it includes the interface file of the superclass to make declarations for the inherited methods available:

#include "Point.h" extern const void * Circle; /* new(Circle, x, y, rad) */

The interface file Circle.h is included by the application code and for the implementation of the class; it is protected from multiple inclusion.

The representation of a circle is declared in a second header file, Circle.r. For a subclass it includes the representation file of the superclass so that we can derive the representation of the subclass by extending the superclass:

#include "Point.r"

struct Circle { const struct Point _; int rad; };

The subclass needs the superclass representation to implement inheritance: struct

Circle contains a const struct Point. The point is certainly not constant — move() will change its coordinates — but the const qualifier guards against accidentallyoverwriting the components. The representation file Circle.r is only included for theimplementation of the class; it is protected from multiple inclusion.

 

Finally, the implementation of a circle is defined in the source file Circle.c which includes the interface and representation files for the class and for object management:

BLOG COMMENTS POWERED BY DISQUS

About AWL

The AWL is the most recent and widely referred word list for teaching and learning academic vocabulary. The AWL was developed by Averil Coxhead from the Victoria University of Wellington,

Read More...

Contact us

  • Add: Prof. Amit Hiray
    Assistant Professor
    MIT Academy of Engineering,
    Alandi (D) Pune – 412 105
  • Tel: +91-9923083701