Designing and Coding Reusable C++
Martin Carroll and Margaret Ellis
Chapter 1. Introduction to reusability
Essential properties of reusable code:
- It is easy to find and to understand
- There is a reasonable assurance that it is correct
- It requires no separation from any containing code
- It requires no changes to be used in a new program
Myths of reuse:
- Reuse will resolve the software crisis.
- All code should be reusable.
- Reusing code is always preferable to coding from scratch.
- Object-oriented languages make writing reusable code easy.
Nontechnical obstacles:
- The author of Widget must have suspected that a
reusable version of Widget would be useful.
- The author of Widget must have expected to be rewarded
for writing Widget reusably and making it available.
- Someone must maintain Widget.
- The eventual user of Widget must suspect that
Widget exists and must be able to find it.
- The eventual user of Widget must be able to obtain
Widget.
- There must be no legal obstacles to reuse of Widget.
- The user of Widget must be rewarded for reusing
Widget.
Technical obstacles:
- Reusable code must work in many contexts.
- We almost never know all the contexts.
- User requirements often conflict.
- We cannot provide everything everyone wants.
- The contexts change.
Chapter 2. Class design
Every C++ class (whether reusable or not) should represent some
abstraction. Functions should represent abstract behavior. [p13]
Attempts to define a minimal standard interface for all classes,
although well motivated, are misguided. No function should be provided
by every class. The argument in support of this claim works as follows:
for each function that might be proposed for the minimal standard
interface, it is possible to describe a class that should not provide
that function. [p18]
Two operations require special mention because they have a
reputation for being generally useful in spite of their undesirable
properties: the shallow and deep copy operations. For most real
classes, neither shallow nor deep copy correctly implements the copy
constructor. For nontrivial classes, shallow and deep copy operations
usually have an undesirable property: they do not preserve program
invariants. [p25]
Library designers must pay careful attention to conversions.
"Fanout" can be defined as the number of other types that a type can be
converted to implicitly. Large fanouts are undesirable because they
are potential causes of ambiguity. [p35]
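A minimal sketch of limiting fanout, assuming a hypothetical `Rational` class and hypothetical `describe` overloads. If both conversion operators below were implicit, `Rational` would have a fanout of 2 and a call such as `describe(half)` would be ambiguous. Marking the conversion to `long` as `explicit` (a C++11 feature) keeps the fanout at 1 while still allowing that conversion through a cast.

```cpp
// Hypothetical Rational class, used only to illustrate fanout.
class Rational {
public:
    Rational(long num = 0, long den = 1) : num_(num), den_(den) {}

    // Implicit conversion: double is the only type Rational converts to
    // implicitly, so its fanout is 1.
    operator double() const { return static_cast<double>(num_) / den_; }

    // Explicit conversion (C++11): usable only with a cast, so it adds
    // nothing to the fanout and cannot make overload resolution ambiguous.
    explicit operator long() const { return num_ / den_; }

private:
    long num_;
    long den_;
};

// Overloads that a Rational argument would make ambiguous if both
// conversion operators were implicit.
inline int describe(double) { return 1; }
inline int describe(long) { return 2; }
```

With this design, `describe(half)` selects the `double` overload, and `static_cast<long>(Rational(7, 2))` yields 3.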
The interface of a C++ library should use "const" everywhere it
applies - that is, everywhere that the use of "const" makes a promise
that the library keeps. Failure to use "const" maximally can cause
problems for library users. [p38]
The regular functions (functions whose semantics are the same in all
well-designed classes - the copy constructor, the destructor, the principal
assignment operator, and the equality and inequality operators [p15]) should
implement the same semantics in all classes.
Although there is no minimal standard interface, the nice functions
(the default constructor, the destructor, the copy constructor, the
assignment operator, and the equality operator) should be provided by
most classes. No function should be provided by all classes. The
shallow and deep copy operations should be provided by almost no
classes.
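As a sketch, a hypothetical `Name` class provides each of the nice functions (plus inequality), with copying and equality given consistent semantics: a copy always compares equal to its original.

```cpp
#include <string>
#include <utility>

// Hypothetical value class providing the "nice" functions.
class Name {
public:
    Name() = default;                                  // default constructor
    explicit Name(std::string s) : text_(std::move(s)) {}
    Name(const Name&) = default;                       // copy constructor
    Name& operator=(const Name&) = default;            // assignment operator
    ~Name() = default;                                 // destructor

    // Equality compares abstract values, so a copy compares equal to its
    // original, as the regular functions' common semantics require.
    friend bool operator==(const Name& a, const Name& b) { return a.text_ == b.text_; }
    friend bool operator!=(const Name& a, const Name& b) { return !(a == b); }

private:
    std::string text_;
};
```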
Careful thought should be given to uniformity of interface for classes
within a library, but consistency should not be so rigidly adhered to that
it renders the interface of a class inappropriate or counterintuitive.
When deciding what conversions to provide, library designers should
provide sensible conversions while preventing multiple ownership, avoid
nonsensible conversions when possible, and limit fanout.
Use of const in libraries also requires attention. In general, libraries
should implement abstract const in their interfaces, and they should use
the const keyword every place it makes a promise that the library keeps.
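A sketch of abstract const, assuming a hypothetical `Samples` class: `sum()` is declared `const` because it does not change the object's abstract value, even though it updates a private cache declared `mutable`. Because `size()` and `sum()` are `const`, a function such as the hypothetical `mean` can accept a `const Samples&`; had they not been declared `const`, such callers could not use them at all.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class illustrating abstract (logical) const.
class Samples {
public:
    void add(double x) {
        values_.push_back(x);
        cacheValid_ = false;                // abstract value changed
    }

    std::size_t size() const { return values_.size(); }

    // Promises not to change the abstract value; may update the cache.
    double sum() const {
        if (!cacheValid_) {                 // recompute lazily
            double s = 0.0;
            for (double v : values_) s += v;
            cachedSum_ = s;                 // allowed: cachedSum_ is mutable
            cacheValid_ = true;
        }
        return cachedSum_;
    }

private:
    std::vector<double> values_;
    mutable double cachedSum_ = 0.0;
    mutable bool cacheValid_ = true;        // the sum of no values is 0
};

// Usable with a const reference only because size() and sum() are const.
inline double mean(const Samples& s) { return s.size() ? s.sum() / s.size() : 0.0; }
```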
Chapter 3. Extensibility
A user might want to inherit a class's implementation but not its
interface. Private derivation accomplishes this kind of inheritance.
[p49]
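For example, a hypothetical `Stack` template might reuse `std::vector`'s implementation through private derivation, exposing only the operations a stack should have.

```cpp
#include <vector>

// Hypothetical Stack: inherits vector's implementation, not its interface.
template <typename T>
class Stack : private std::vector<T> {
    using Base = std::vector<T>;
public:
    void push(const T& x) { Base::push_back(x); }
    void pop() { Base::pop_back(); }
    const T& top() const { return Base::back(); }
    using Base::empty;   // selectively re-expose chosen members
    using Base::size;
};
```

Because the derivation is private, user code cannot call `insert` or `operator[]` on a `Stack`, nor convert a `Stack<int>&` to a `std::vector<int>&`.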
The ability to pass a pointer or reference to an object of type X
to a function declared to take a pointer or reference to a type from
which X directly or indirectly inherits is called "substitutability".
[p50]
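A small illustration with a hypothetical widget hierarchy: `label` is declared to take a reference to `Widget`, and substitutability lets a caller pass an `IconButton`, which inherits from `Widget` indirectly through `Button`.

```cpp
#include <string>

// Hypothetical hierarchy: IconButton -> Button -> Widget.
class Widget {
public:
    virtual ~Widget() = default;
    virtual std::string name() const { return "widget"; }
};

class Button : public Widget {
public:
    std::string name() const override { return "button"; }
};

class IconButton : public Button {
public:
    std::string name() const override { return "icon button"; }
};

// Declared in terms of Widget; accepts any type that inherits from Widget.
inline std::string label(const Widget& w) { return "[" + w.name() + "]"; }
```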
There are costs associated with providing extensibility. Occasionally, a
reasonable alternative to designing a C++ library extensibly is to provide
all the functionality users will ever want so that they do not need to extend
the library's classes.
More often, users will want extensibility. Extensibility in C++ is
provided primarily through inheritance. Properly defining the
inheritance semantics of a class and assuming only those semantics
throughout the library are essential to writing an extensible class.
The burden for successful inheritance rests partly on the user -
inheritance will not be successful if a publicly derived type does not
adhere to the inheritance semantics of its intended base classes.
It can be difficult to derive from classes not written carefully to allow
inheritance. The obstacles to inheritability are as follows:
- Nonvirtual member functions
- Overprotection of data and function members
- Undermodularization of member functions
- Use of friends
- Excess data members
- Nonvirtual derivations
- Inheritance-preventing member functions
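The first obstacle can be sketched with hypothetical `Report` and `AuditReport` classes. Because `Report::print` is nonvirtual, the derived class's `print` merely hides it, and library code that works through a `Report&` still calls the base version; only the virtual `title` is customized.

```cpp
#include <string>

// Hypothetical library class whose print() was not declared virtual.
class Report {
public:
    virtual ~Report() = default;
    std::string print() const { return "plain report"; }    // nonvirtual
    virtual std::string title() const { return "Report"; }  // virtual
};

// A user tries to customize both functions in a derived class.
class AuditReport : public Report {
public:
    std::string print() const { return "audit report"; }    // hides, does not override
    std::string title() const override { return "Audit"; }
};

// Library code that works through a base-class reference.
inline std::string render(const Report& r) { return r.title() + ": " + r.print(); }
```

Here `render(AuditReport())` yields `"Audit: plain report"`, even though `AuditReport::print` returns `"audit report"` when called directly.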
Because most of the obstacles to inheritability cannot occur in "interface"
classes, libraries for which extensibility is important should "interface" all
the classes whose interfaces users might wish to inherit.
[An interface class is a class containing no data members,
all of whose member functions are pure virtual, and all of whose base
classes are interface classes. A class X is
interfaced if either X is an interface class or
each public member function of X is declared in at least one
interface class from which X directly or indirectly
inherits.]
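A sketch of these definitions, assuming hypothetical `Printable`, `Sizable`, and `Document` classes. The two interface classes have no data members and only pure virtual functions; their destructors are declared pure virtual too, which still requires a definition. `Document` is interfaced in the sense above, on the assumption that constructors (which an interface class cannot declare) are not counted.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Interface classes: no data members, every member function pure virtual.
class Printable {
public:
    virtual std::string text() const = 0;
    virtual ~Printable() = 0;             // pure, but must still be defined
};
inline Printable::~Printable() {}

class Sizable {
public:
    virtual std::size_t length() const = 0;
    virtual ~Sizable() = 0;
};
inline Sizable::~Sizable() {}

// Interfaced class: each public member function (other than the
// constructor) is declared in an interface class it inherits from.
class Document : public Printable, public Sizable {
public:
    explicit Document(std::string body) : body_(std::move(body)) {}
    std::string text() const override { return body_; }
    std::size_t length() const override { return body_.size(); }
private:
    std::string body_;
};
```

A user who wants only the interface of `Document` can derive directly from `Printable` and `Sizable` without inheriting any of its implementation.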
Chapter 4. Efficiency
One well-known technique for reducing code size is never to put the
definitions of two large functions in the same library implementation
file, if it is possible for a program to need one but not the other of
them. We shall call this technique "source-file partitioning".
[p86]
Explicit and implicit "inline" declarations are only a request to
the compiler. All current [1995] C++ compilers have limits on the
functions they can inline (goto statements, loops, recursion, or a length
of more than 15 statements can inhibit inlining). If a function f that
is declared inline is not inline expanded at one or more call sites in
a translation unit, then many compilers will generate in that
translation unit an out-of-line copy of f with internal
linkage. If such a copy is generated in n translation units, the
executable file will contain n copies of f. The amount of code
devoted to "outlined inlines" can be significant if programmers are not
careful about which functions they declare inline. [p87]
Programmers tend to think that inlining speeds execution, that it may
cause code bloat, and that it should be considered only for small
functions. Each of these beliefs may or may not be justified. [p89]
Returning references from functions has the advantage of being more
efficient than return by value. Returning references from functions
has two disadvantages: it makes user code more error prone, and it
restricts the ways a class can be implemented. [p94]
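A sketch of both disadvantages, using a hypothetical `Person` class. `first()` returns a reference and so commits every future implementation to storing a `std::string` member; `full()` returns by value and leaves the implementation free to compute its result. The commented-out line shows how a returned reference can dangle in user code.

```cpp
#include <string>
#include <utility>

// Hypothetical class with one accessor of each kind.
class Person {
public:
    Person(std::string first, std::string last)
        : first_(std::move(first)), last_(std::move(last)) {}

    // By reference: no copy, but the class must store a std::string
    // that outlives the call.
    const std::string& first() const { return first_; }

    // By value: costs a copy, but the result need not be stored anywhere.
    std::string full() const { return first_ + " " + last_; }

private:
    std::string first_;
    std::string last_;
};

inline Person makePerson() { return Person("Ada", "Lovelace"); }

// Error-prone use: the reference refers into a temporary Person.
//   const std::string& bad = makePerson().first();  // dangles after this statement
// Safe use: copy the result while the temporary is still alive.
//   std::string ok = makePerson().first();
```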
C++ libraries should generally free resources that they have
acquired as soon as possible. [p99]
On many systems, the stack space that is available to a program is
significantly less than the heap space available. For this reason,
huge objects should be allocated on the heap, rather than declared on
the stack. [p101]
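A minimal sketch, assuming a hypothetical `Grid` type of 1,048,576 `double`s (8 MiB where `double` occupies 8 bytes, which exceeds the default stack size on some systems). Allocating it with `std::make_unique` places it in the free store, and the `std::unique_ptr` frees it as soon as the function returns.

```cpp
#include <array>
#include <memory>

// Hypothetical "huge" object type.
using Grid = std::array<double, 1024 * 1024>;

inline double sumOfGrid(double fill) {
    // Free-store allocation instead of a local `Grid g;` on the stack.
    auto grid = std::make_unique<Grid>();
    grid->fill(fill);
    double total = 0.0;
    for (double v : *grid) total += v;
    // The unique_ptr releases the memory when the function returns,
    // so the resource is held no longer than necessary.
    return total;
}
```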
Efficiency is a crucial property for reusable code.
Build time is particularly important for development teams. Minimizing
the amount of code that a library includes, preinstantiating templates,
defining function templates inline, hoisting template code, and using
pointer containers can help keep down build times.
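Hoisting template code means moving the parts of a template that do not depend on its parameters into a non-template class, so that this code is compiled once rather than once per instantiation. A sketch with hypothetical names `PtrStackBase` and `PtrStack`, which is also an instance of a pointer container:

```cpp
#include <cstddef>
#include <vector>

// Type-independent part, compiled once for all element types.
class PtrStackBase {
protected:
    void pushPtr(void* p) { items_.push_back(p); }
    // Precondition: the stack is not empty.
    void* popPtr() { void* p = items_.back(); items_.pop_back(); return p; }
    std::size_t count() const { return items_.size(); }
private:
    std::vector<void*> items_;
};

// Thin, type-safe template layer: each instantiation adds only small
// inline casts rather than a full copy of the container code.
template <typename T>
class PtrStack : private PtrStackBase {
public:
    void push(T* p) { pushPtr(p); }
    T* pop() { return static_cast<T*>(popPtr()); }
    std::size_t size() const { return count(); }
};
```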
Library implementors can reduce users' code size by partitioning the
library's source files and by ensuring that functions declared inline are not
laid down out of line. The library implementation itself should use as few
templates as possible.
To many users, the most important measure of efficiency is run time.
Run time can often be improved significantly through appropriate inlining.
It is not always obvious, however, which functions to inline. Returning
references is a technique for improving run time, but it can make user code
more error prone and limit the ways a class can be implemented.
Free-store and stack space must also be used efficiently. Being careful
to use efficient algorithms and freeing resources as soon as possible are two
of the best ways to minimize use of space. Large objects usually should be
created in the free store rather than on the stack.
Unfortunately, efficiency trades off with almost every other desirable
property of a C++ library. In particular, designing a library to be as
efficient as possible usually renders that library more difficult to implement
and to use.
Chapter 5. Errors
In practice, checking two kinds of invariants, function
preconditions and representation invariants, can detect many errors.
[p113]
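A sketch of both checks, assuming a hypothetical `BoundedCounter` class, with `assert` standing in for whatever checking mechanism a library actually uses (note that `assert` does nothing when `NDEBUG` is defined):

```cpp
#include <cassert>

// Hypothetical bounded counter illustrating both kinds of checks.
class BoundedCounter {
public:
    explicit BoundedCounter(int limit) : value_(0), limit_(limit) {
        assert(limit >= 0);          // precondition of the constructor
        assert(invariant());
    }

    void increment() {
        assert(value_ < limit_);     // precondition: not already at the limit
        ++value_;
        assert(invariant());         // representation invariant still holds
    }

    int value() const { return value_; }

private:
    // Representation invariant: 0 <= value_ <= limit_.
    bool invariant() const { return 0 <= value_ && value_ <= limit_; }

    int value_;
    int limit_;
};
```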
Writers of reusable code must ensure that their code is exception
safe - that it behaves correctly even when an exception is thrown.
[p126] A class X is exception safe if it is impossible for an
exception thrown during execution of any of X's member functions
to cause the user of X to be left with an inconsistent X
object. [p128]
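One common way to make assignment exception safe is the copy-and-swap idiom, sketched here for a hypothetical `IntBuffer` class. All work that can throw happens on a temporary copy before `*this` is modified, so an exception leaves the target object unchanged.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical dynamic array with an exception-safe assignment operator.
class IntBuffer {
public:
    explicit IntBuffer(std::size_t n = 0) : size_(n), data_(n ? new int[n]() : nullptr) {}

    IntBuffer(const IntBuffer& other)
        : size_(other.size_), data_(other.size_ ? new int[other.size_] : nullptr) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }

    IntBuffer& operator=(const IntBuffer& other) {
        IntBuffer tmp(other);   // may throw std::bad_alloc; *this is untouched
        swap(tmp);              // cannot throw
        return *this;           // tmp's destructor frees the old buffer
    }

    ~IntBuffer() { delete[] data_; }

    void swap(IntBuffer& other) noexcept {
        std::swap(size_, other.size_);
        std::swap(data_, other.data_);
    }

    std::size_t size() const { return size_; }
    int& operator[](std::size_t i) { return data_[i]; }
    int operator[](std::size_t i) const { return data_[i]; }

private:
    std::size_t size_;
    int* data_;
};
```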
Code intended for reuse must consider whether to detect and how to handle
any error that might arise. Invariants can be used to detect many kinds
of errors. Libraries should make good use of function preconditions and
representation invariants.
Different variants of a library may handle errors differently. Here are
the most common ways to handle an error:
- Correct the problem and continue execution.
- Exit or abort (not acceptable for many libraries).
- Throw an exception.
- Create a nil value.
- Interpret invalid data as valid.
- Do not detect the error (and therefore have undefined behavior).
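Two of these strategies can be sketched with hypothetical functions `digitValueOrThrow` and `digitValueOrNil`, two variants of the same operation. One throws `std::invalid_argument`; the other creates a nil value using `std::optional` (a C++17 library type, used here as a modern stand-in for a library-specific nil value).

```cpp
#include <optional>
#include <stdexcept>

// Variant 1: throw an exception when the input is invalid.
inline int digitValueOrThrow(char c) {
    if (c < '0' || c > '9')
        throw std::invalid_argument("not a decimal digit");
    return c - '0';
}

// Variant 2: create a nil value; the caller must check for it.
inline std::optional<int> digitValueOrNil(char c) {
    if (c < '0' || c > '9')
        return std::nullopt;
    return c - '0';
}
```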
Among the errors that library designers must consider is exhaustion of
system resources. The stack might overflow, the free store might be
exhausted, or some file system limit might be reached, to name three
possibilities.
With the introduction of exceptions to the C++ language, special care
must be taken to ensure that reusable code is exception safe. Classes must
be designed so objects are not rendered inconsistent when an exception is
thrown. Libraries must be designed to avoid other ill effects from nonlocal
flow of control when an exception is thrown.
Chapter 6. Conflict
When two libraries conflict, use of both of them in a single program
will be difficult, if not impossible. To maximize reusability, library
designers should avoid conflicting with other code. Use of sound naming
conventions and the namespace construct is essential for all global,
public macro, and environmental names defined by a library unless the
library can safely be unclean. Good-citizen libraries avoid another
form of conflict: conflicting attempts to own global or
application-specific resources.
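A sketch with two hypothetical libraries, `medlib` and `billing`, that each define a class named `Record` and a function named `version`. Because each library places its names in its own namespace, a program can use both. Macros are not scoped by namespaces, so public macro names still need a library-specific prefix (for example, `MEDLIB_`).

```cpp
#include <string>

// Two hypothetical libraries that use the same names.
namespace medlib {
    struct Record { std::string patient; };
    inline std::string version() { return "medlib 1.0"; }
}

namespace billing {
    struct Record { double amount; };
    inline std::string version() { return "billing 2.3"; }
}

// A program can use both libraries by qualifying the names.
inline std::string versions() { return medlib::version() + ", " + billing::version(); }
```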
Chapter 7. Compatibility
Almost every change to a C++ library is source incompatible in
theory. [p159]
Library developers should be concerned with providing backward
compatibility for their current users and with anticipating forward
compatibility so that they can provide backward compatibility in future
releases. A library should try to provide source compatibility, link
compatibility, and run compatibility whenever possible. Some libraries
will also try to provide process compatibility. Providing compatibility
requires careful thought about changes to a library. Deprecating
(discouraging its use in the documentation), rather than removing,
functionality provides source compatibility and allows users to change
their code at their convenience.
Incompatibilities between releases of libraries should always be
documented clearly, along with instructions on how to upgrade user
programs. Library providers should also be aware of the possibility
that users are relying on undocumented properties of a library.
Chapter 8. Inheritance hierarchies
There is some confusion among C++ programmers about whether to base
a design for a class hierarchy on templates or on inheritance. Programmers
sometimes overuse inheritance. [p192]
Very popular (yet unfounded and contradictory) inheritance hierarchy
design rules include:
- Singly rooted hierarchies are best.
- Multiply rooted hierarchies are best.
- Shallow and wide hierarchies are best. The depth of a hierarchy
should be no more than seven plus-or-minus two.
- Deep and narrow hierarchies are best. The fanout of a hierarchy
should be no more than seven plus-or-minus two.
The appropriate rootedness, depth, and fanout for an inheritance hierarchy
depend on the domain the hierarchy is intended to model and on the desired
properties of the hierarchy.
The design of a reusable library can be based on one of several inheritance
hierarchy styles or on a combination of styles. These include:
- Direct hierarchy
Design as usual - map domain types directly to C++ classes.
Intermixes interface and implementation. Minimizes the number of
classes. Implementation changes generally cause link
incompatibilities.
- Interfaced hierarchy
Mirror every direct hierarchy class with a pure abstract class
(interface only). This doubles the number of classes but improves
extensibility and link compatibility.
- Interfaced + Factory hierarchy
Add an "object factory" to the interfaced hierarchy. Each factory
encapsulates all client creation services (clients never "new" their
own instances). Clients can now write programs that will need no
recompiling whatsoever to upgrade to a new release of the library.
- Handle hierarchy
Wrap every direct hierarchy class with a "handle" class. The
original class becomes a hidden "body" or "representation" class. Each
"handle" class is a thin veneer that simply delegates all user requests
to its contained "body" class. Any change to the implementation of a
"body" class will be link compatible for clients of its encapsulating
"handle" class. Not as extensible as Direct and Interfaced
hierarchies.
- Interfaced Handle hierarchy
Add a third set of interface classes (body classes are the first
set, and handle classes are the second set) to reinstate some amount of
extensibility.
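A handle class can be sketched as follows, using a hypothetical `Account` handle with a hidden `Account::Body`. This is the single-class form often called the "pimpl" idiom rather than a full handle hierarchy. Because users' code holds only a pointer to the body, members can be added to `Body` without changing `sizeof(Account)` or the handle's function signatures, which is what allows link compatibility across such changes.

```cpp
#include <memory>
#include <string>
#include <utility>

// --- What users see (would normally live in a header) ---
class Account {
public:
    explicit Account(std::string owner);
    ~Account();
    Account(const Account&) = delete;
    Account& operator=(const Account&) = delete;

    void deposit(long cents);             // simply delegates to the body
    long balance() const;
    std::string owner() const;

private:
    class Body;                           // declared only; layout hidden
    std::unique_ptr<Body> body_;
};

// --- Library implementation (would normally live in a .cpp file) ---
class Account::Body {
public:
    explicit Body(std::string o) : owner(std::move(o)) {}
    std::string owner;
    long cents = 0;                       // may change between releases
};

Account::Account(std::string owner) : body_(std::make_unique<Body>(std::move(owner))) {}
Account::~Account() = default;            // defined where Body is complete
void Account::deposit(long cents) { body_->cents += cents; }
long Account::balance() const { return body_->cents; }
std::string Account::owner() const { return body_->owner; }
```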
Although a direct inheritance hierarchy is the easiest style to implement
and understand as well as the most efficient, interfaced hierarchies,
object factories, and handle hierarchies facilitate link compatibility
between releases of a library. Further, interfaced hierarchies increase
a library's extensibility. The table summarizes the most important
differences among the hierarchy styles. As always, no single design is
best for all libraries. Library designers must decide which is the
best choice for their library and their users.
Hierarchy style | Complexity | Efficiency | Extensibility | Link compatibility
---|---|---|---|---
Direct | simple | good | mediocre | minimal
Interfaced | complex | reduced | good | partial
Interfaced + Factory | complex | reduced | good | total
Handle | simple | reduced | poor | total
Interfaced Handle | complex | reduced | good | total
Library designers should be careful not to use inheritance when
use of templates would produce a better design.
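To make the contrast concrete, here is a sketch of a stack designed both ways, with hypothetical names. The inheritance-based `ObjectStack` requires every element to derive from a common root and forces users to downcast; the template-based `Stack` works for any copyable type with no casts and no per-element allocation.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Inheritance-based design: elements must derive from a common root.
struct Object { virtual ~Object() = default; };
struct IntObject : Object {
    explicit IntObject(int v) : value(v) {}
    int value;
};

class ObjectStack {
public:
    void push(std::unique_ptr<Object> o) { items_.push_back(std::move(o)); }
    Object& top() { return *items_.back(); }   // caller must downcast
private:
    std::vector<std::unique_ptr<Object>> items_;
};

// Template-based design: type-safe for any copyable T.
template <typename T>
class Stack {
public:
    void push(const T& x) { items_.push_back(x); }
    const T& top() const { return items_.back(); }
private:
    std::vector<T> items_;
};
```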
Chapter 9. Portability
The more portable code is, the more reusable it is. [p203]
Portability often trades off with efficiency and ease of
implementation. Specifically, portable code that is easy to implement
is often not efficient enough on one or more platforms. [p204]
Currently, writing highly portable C++ code is challenging. Part of the
challenge comes from the continuing evolution of the C++ language.
There is controversy over how an implementation should interpret
certain constructs. Further, many implementations of the language are
not complete.
Even after the ANSI/ISO C++ standard is finalized, the language will
allow legal programs that will not be portable. C++ inherits from C
many undefined, unspecified, and implementation-defined behaviors and
adds a few new ones. Memory and object layout, in particular, need
careful attention in code that must be portable.
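For example, writing a struct's raw bytes to a file or a network depends on its padding, alignment, size, and byte order, all of which can vary between implementations. A more portable sketch, assuming a hypothetical `Sample` record, encodes each field explicitly in a fixed (big-endian) byte order using fixed-width integer types (which the standard makes optional, though mainstream platforms provide them).

```cpp
#include <cstdint>

// Hypothetical record; its in-memory layout is not relied on.
struct Sample {
    std::uint16_t id;
    std::uint32_t value;
};

// Encodes s into exactly 6 bytes, big-endian, independent of host layout.
inline void encode(const Sample& s, unsigned char out[6]) {
    out[0] = static_cast<unsigned char>(s.id >> 8);
    out[1] = static_cast<unsigned char>(s.id & 0xFF);
    out[2] = static_cast<unsigned char>(s.value >> 24);
    out[3] = static_cast<unsigned char>((s.value >> 16) & 0xFF);
    out[4] = static_cast<unsigned char>((s.value >> 8) & 0xFF);
    out[5] = static_cast<unsigned char>(s.value & 0xFF);
}

// Decodes the 6-byte form produced by encode.
inline Sample decode(const unsigned char in[6]) {
    Sample s;
    s.id = static_cast<std::uint16_t>((in[0] << 8) | in[1]);
    s.value = (std::uint32_t{in[2]} << 24) | (std::uint32_t{in[3]} << 16) |
              (std::uint32_t{in[4]} << 8) | std::uint32_t{in[5]};
    return s;
}
```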
Template instantiation mechanisms vary considerably among C++
implementations. Some automatic instantiation schemes require template
code to be organized in specific ways, but the requirements vary from
implementation to implementation. Manual instantiation schemes use a
variety of directives to give users control over template
instantiation. Thus, porting code that uses templates can involve some
effort.
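One directive that the ANSI/ISO standard did adopt is the explicit instantiation definition, sketched here for a hypothetical `Bag` template. A library could preinstantiate commonly used arguments this way; since C++11, other translation units can write `extern template class Bag<int>;` to avoid instantiating it again.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical library template.
template <typename T>
class Bag {
public:
    void add(const T& x) { items_.push_back(x); }
    std::size_t count() const { return items_.size(); }
private:
    std::vector<T> items_;
};

// Explicit instantiation definition: generates Bag<int> and all of its
// members in this translation unit, regardless of any automatic scheme.
template class Bag<int>;
```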
Finally, portability can be complicated for programs that depend on
standard as well as nonstandard run-time libraries, system commands, file
systems, and window systems.
Chapter 10. Using other libraries
We discuss drawbacks of using other libraries in code intended for
reuse: requiring users to obtain the reused code, concerns about
efficiency, the potential for name-space conflicts introduced by
reusing other libraries, and the problem of synchronizing releases of
libraries. [p233]
There are strong reasons to prefer reusing another library's
collection class. First, why should we write yet another collection
class if a suitable one already exists? Second, if we are the aspiring
authors of a medical library, not a container class library, we might
not have experience writing high-quality container classes. Third,
because we are providers of a library intended for reuse, we would like
to set a good example by practicing reuse ourselves. [p234]
Using other libraries eases implementation but brings the problems of
acquisition, conflict, release synchronization, and possibly
efficiency.
Self-contained libraries avoid these problems, but they trade off ease of
implementation, ease of use, and efficiency. Some programmers also use
the library that otherwise would have been reused. Self-contained libraries
can reduce ease of use for those programmers by requiring them to learn
multiple interfaces and to write and invoke conversion functions explicitly.
Self-contained libraries can also cause such users' executables to be bloated.
Finally, self-contained libraries isolate themselves from other libraries -
with both desirable and undesirable effects.
Chapter 11. Documentation
Code that is not documented properly is not reusable. [p245]
Producing high-quality documentation often takes a significant
fraction of the time required to design, implement, and test a
library. The need for good documentation is one of the reasons
developing reusable code is more expensive than producing single-use
code. [p245]
A library should be documented while it is being
designed and implemented. Documenting usually reveals ways to improve
the code being documented. Postponing the effort of documenting, even
for apparently simple library facilities, is a mistake. Donald Knuth's
observations on designing the typesetting language TeX are germane:
The designer of a new system must not only be the implementor and the
first large-scale user; the designer should also write the first user
manual ... If I had not participated fully in all these activities,
literally hundreds of improvements would never have been made, because
I would never have thought of them or perceived why they were
important. [p246]
Good documentation is crucial for reusable code. Every reusable C++
library should be accompanied by at least a design paper, a set of
tutorials, and a reference manual.
The design paper for a library should discuss significant decisions made
in the design of the library, why each was decided the way it was, for whom
the library is intended, and what the library provides those users.
Tutorials should be written clearly and simply, and should be written
appropriately for the background of the library's intended users. They
should discuss the library functionality in terms of abstract values, not
implementation. Examples in tutorials should show legal, correct code.
A library reference manual should define the abstraction of each library
class, show the syntactic interface for each class, give the semantics of
each function in the interface of each class, and present any restrictions on
template arguments.
Chapter 12. Miscellaneous topics
If a C++ library defines and uses any nonsimple, nonlocal, static
objects, it will be possible for a user of the library to successfully
build a program that uses an object before it is constructed
(unless the library implementors have taken precautions to prevent such
uses; see Section 12.1.6). Such a program will have undefined
behavior. [p266]
Two classes are coupled if either class's interface or
implementation uses the other class ... Some programmers believe that
coupling is generally undesirable; others believe that it is usually a
good idea. Actually, coupling has both advantages and disadvantages.
[p283]
Designers of C++ libraries need to be aware of the static
initialization problem. We recommend that libraries not define and use
nonsimple, nonlocal, static objects. Instead, libraries should use
objects in the free store. To allocate and initialize such objects,
libraries can use init functions, init checks, and init objects. Each
of these approaches has disadvantages.
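An init check can be sketched as follows, assuming a hypothetical unit registry. The library never defines a nonsimple static object; it keeps a statically initialized null pointer and creates the object in the free store on first use. Some disadvantages are visible in the sketch: every access pays for the check, the code is not thread-safe as written, and the object is never destroyed.

```cpp
#include <map>
#include <string>

// A pointer with a constant initializer is statically initialized, so it is
// already null before any nonlocal object's constructor runs.
static std::map<std::string, int>* registryPtr = nullptr;

// Every access goes through this function, which performs the init check.
static std::map<std::string, int>& registry() {
    if (registryPtr == nullptr)
        registryPtr = new std::map<std::string, int>;  // created in the free store
    return *registryPtr;
}

// Hypothetical public library functions.
void defineUnit(const std::string& name, int scale) { registry()[name] = scale; }

int unitScale(const std::string& name) {
    const auto& r = registry();
    auto it = r.find(name);
    return it == r.end() ? 0 : it->second;
}
```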
Localizing costs is an important consideration for any library design.
If a program does not use a feature of a library, it should not incur costs
associated with the presence of that feature.
Containers are an important kind of reusable class. Designers of
container classes should be careful to make those classes either
endogenous (contained values are stored directly in the underlying data
structures) or exogenous (contained values are stored in separate
objects), but not a hybrid. [A mistake some library designers make is
to provide a class that for some operations models an endogenous
container that contains things of type T*, and for other
operations models an exogenous container that contains things of type
T. (p278)] Designing iterators with the right semantics requires
attention and care as well.
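The distinction can be shown with standard vectors of a hypothetical `Part` type: `std::vector<Part>` behaves as an endogenous container (it stores copies of the values), while `std::vector<Part*>` behaves as an exogenous one (it refers to objects stored elsewhere).

```cpp
#include <string>
#include <vector>

// Hypothetical element type.
struct Part { std::string name; };

// Endogenous: the values themselves live in the container's storage;
// inserting copies the value.
using EndoParts = std::vector<Part>;

// Exogenous: the container holds pointers to Parts stored elsewhere;
// it neither copies nor owns them.
using ExoParts = std::vector<Part*>;
```

Changing the original `Part` after insertion is not seen through `EndoParts` but is seen through `ExoParts`; a container that mixed these behaviors across its operations would be the hybrid the text warns against.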
Usually, coupling of classes within a library simplifies implementation
of the library. Coupling sometimes makes the library easier to use; at other
times, it makes the library more difficult to use. Library designers must
therefore weigh carefully the advantages and disadvantages of any proposed
couplings.
Sometimes, making a difficult decision can be avoided by deferring it to
users. A common technique for deferring a decision is to allow the user to
specify a parameter for a template.
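A sketch of deferring a decision through a template parameter, assuming a hypothetical `FixedArray` template and two hypothetical error-handling policies. The library does not choose between throwing and ignoring out-of-range indexes; each user selects a policy, and the default throws.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical policies for handling an out-of-range index.
struct ThrowOnError {
    static void outOfRange() { throw std::out_of_range("index out of range"); }
};
struct IgnoreErrors {
    static void outOfRange() {}   // the caller gets a default value instead
};

template <typename T, std::size_t N, typename ErrorPolicy = ThrowOnError>
class FixedArray {
public:
    T get(std::size_t i) const {
        if (i >= N) { ErrorPolicy::outOfRange(); return T(); }
        return data_[i];
    }
    void set(std::size_t i, const T& x) {
        if (i >= N) { ErrorPolicy::outOfRange(); return; }
        data_[i] = x;
    }
private:
    T data_[N] = {};
};
```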