Large-Scale C++ Software Design
John Lakos
- logical design
- that which pertains to modeling, IsA, HasA, UsesA, classes, functions,
operators, public vs private, member function vs free function, virtual vs
non-virtual
- physical design
- that which pertains to partitioning, DependsOn, files, directories,
libraries, compile and link coupling, cycles
- component
- the smallest unit of physical design
- package
- a collection of components organized as a physically cohesive unit
- encapsulation
- contained implementation details (type, data, or function) are not
accessible programmatically through the interface of the component; a
logical property of design
- insulation
- contained implementation details (type, data, or function) can be
altered without forcing clients of the component to recompile; a
physical property of design
- internal linkage
- name is local to its translation unit and cannot collide at link
time with an identical name in another translation unit
- external linkage
- name can interact with other translation units at link time
(e.g. global data)
- compile-time dependency
- Y depends on X if x.h is needed in order to compile y.c
- link-time dependency
- Y depends on X if y.o contains undefined symbols that x.o will help
resolve (either directly or indirectly) at link time
- handle
- a class that maintains a pointer to an object that is programmatically
accessible through the public interface of the handle class [430]
Politically incorrect
- The component is the fundamental unit of design.
- All classes for a component go in one header and one source file.
- not a header-source file pair per class
- Physical design must run in parallel with logical design.
- Common sense in physical design is more important than ideological
integrity in logical design.
- 12-hour builds are the real enemy
- Clients must #include everything they are directly or indirectly
dependent upon.
- components should not do #include's for the client
- The theory of OO encapsulation is often the problem.
- file-scoped free data and functions can be good
- duplication of code can be okay
- Manager classes are okay.
- this may be the wild west, but we're still engaged in
engineering
- Encapsulation does not remotely approximate insulation.
- protected and private members significantly impact
clients
- A small set of coding rules is more important than high-powered tools.
- making dependencies explicit and straightening them out is the
real path to progress
- Compile-time dependencies are inconvenient; link-time dependencies are
a killer.
- Compile-time dependencies are mostly a local issue, and
are dominant in 50k LOC projects.
- Link-time dependencies are dominant for large projects.
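A minimal sketch of why encapsulation is not insulation. The two namespaces stand in for two revisions of one hypothetical header (the `Stack` class and its member names are illustrative, not taken from the book). The public interface is identical, yet the private change alters the object size, so every client that includes the header would have to recompile.

```cpp
// Two hypothetical revisions of the same component header, placed in
// separate namespaces so they can be compared side by side.

namespace v1 {
class Stack {
    int d_data[16];   // private, so encapsulated, yet still in the header
    int d_length;
  public:
    Stack() : d_length(0) {}
    void push(int value) { d_data[d_length++] = value; }  // sketch: no bounds check
    int top() const { return d_data[d_length - 1]; }
};
}  // namespace v1

namespace v2 {
class Stack {
    int d_data[64];   // only a private detail changed
    int d_length;
  public:
    Stack() : d_length(0) {}
    void push(int value) { d_data[d_length++] = value; }  // sketch: no bounds check
    int top() const { return d_data[d_length - 1]; }
};
}  // namespace v2
```

Client code written against `v1::Stack` compiles unchanged against `v2::Stack`, but the object layout differs, which is exactly the compile-time coupling that insulation removes.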
Summary
-
Small project experience does not scale to large projects.
-
A sound physical design is essential to the success of larger systems.
-
Logical design addresses only architectural issues; physical design
addresses organizational issues. [100]
-
Physical design must be a primary consideration from the very start.
(Physical design will influence logical design almost from the outset.)
-
Common sense design rules make physical dependencies explicit. [ch 2]
-
The include graph should alone be sufficient to infer all physical
dependencies within a system (provided the system compiles). [134]
-
The component (not the class) is the fundamental unit of design. [ch 3]
-
Hierarchical testing improves reliability while reducing costs. [ch 4]
-
In most real-world situations, large designs must be levelizable if
they are to be tested effectively. [171]
- Each leaf abstraction can be tested in isolation.
- Higher-level abstractions can be tested in a minimal context.
- Only the value added at each level needs to be tested.
- The complexity of a test mirrors the complexity of the component itself.
-
CCD (Cumulative Component Dependency) is a metric for monitoring the
link-time cost of incremental regression testing. [ch 4]
-
The primary purpose of CCD is to quantify the change in overall
coupling resulting from a minor perturbation to a given architecture.
[197]
-
Minimizing CCD for a given set of components is a design goal. [200]
-
Techniques exist for untangling cyclically-dependent designs. [ch 5]
-
Allowing two components to "know" about each other via #include
directives implies cyclic physical dependencies. [205]
-
Cyclic physical dependencies among components degrade design quality and
inhibit understanding, testing, and reuse. [185]
-
Acyclic physical dependencies can dramatically reduce link-time costs
associated with developing, maintaining, and testing large systems.
[189]
-
Techniques exist for insulating clients from implementation details. [ch 6]
-
Not all components should have insulating interfaces.
-
The package extends these concepts to larger projects. [ch 7]
-
Minimizing the time it takes to recompile after a source-code change
can significantly reduce the cost of development. [517]
-
Packaging subsystems so as to minimize the cost of linking to other
subsystems is a design goal. [275]
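A rough sketch of how the CCD metric mentioned in the summary could be computed, assuming CCD is the sum, over all components, of the number of components needed to test each one incrementally (the component itself plus everything it transitively depends on). The graph representation and function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// deps[i] lists the components that component i depends on directly.
// Integer indices stand in for hypothetical component names.
typedef std::vector<std::vector<int> > DependencyGraph;

// Depth-first walk marking every component reachable from 'node'.
static void visit(const DependencyGraph& deps, int node, std::vector<bool>& seen)
{
    if (seen[node]) {
        return;
    }
    seen[node] = true;
    for (std::size_t i = 0; i < deps[node].size(); ++i) {
        visit(deps, deps[node][i], seen);
    }
}

// Number of components needed to link a test driver for 'node'
// (the component itself is counted).
int componentCost(const DependencyGraph& deps, int node)
{
    std::vector<bool> seen(deps.size(), false);
    visit(deps, node, seen);
    int count = 0;
    for (std::size_t i = 0; i < seen.size(); ++i) {
        if (seen[i]) {
            ++count;
        }
    }
    return count;
}

// CCD: sum of the per-component costs over the whole system.
int cumulativeComponentDependency(const DependencyGraph& deps)
{
    int total = 0;
    for (std::size_t i = 0; i < deps.size(); ++i) {
        total += componentCost(deps, static_cast<int>(i));
    }
    return total;
}
```

For three components in a chain (each depending only on the one below it) this gives 1 + 2 + 3 = 6; for three components in a cycle it gives 3 + 3 + 3 = 9, which shows how cycles inflate the metric.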
Miscellaneous
-
Place a redundant include guard around each include directive. [85]
-
Explicitly include all header files you depend on; do not rely on one header
file to include another. [113]
-
The dominant purpose of a name prefix is to identify uniquely the physical
package in which the component or class is defined. [490]
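A sketch of the redundant include guard from the first item above. The `Polygon` component and the `INCLUDED_...` guard names are hypothetical. Standard headers do not define such a macro, so this sketch defines `INCLUDED_VECTOR` itself after the include; that part is an assumption of the example, not something the notes prescribe.

```cpp
// polygon.h (hypothetical component header)
#ifndef INCLUDED_POLYGON
#define INCLUDED_POLYGON

// Redundant (external) guard: when the macro is already defined, the
// preprocessor skips the #include line without reopening the file.
#ifndef INCLUDED_VECTOR
#include <vector>
#define INCLUDED_VECTOR   // assumption: defined here because <vector> does not
#endif

// For a user header that defines its own guard macro, the form would be:
//   #ifndef INCLUDED_POINT
//   #include "point.h"
//   #endif

class Polygon {
    std::vector<int> d_xs;   // x coordinates of the vertices
    std::vector<int> d_ys;   // y coordinates of the vertices
  public:
    void addVertex(int x, int y) { d_xs.push_back(x); d_ys.push_back(y); }
    int numVertices() const { return static_cast<int>(d_xs.size()); }
};

#endif  // INCLUDED_POLYGON
```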
Ground rules
-
All definitions with external linkage should be declared in the component's
.h file.
-
When something has external linkage, it's not okay to forward declare it
in the .c file.
-
Only #include what is needed to compile in isolation.
-
Don't rely on other files to #include .h files that you "depend" on.
-
The component's .h file must be the first #include in the component's .c file.
-
Then #include all other .h files from least global to most global.
Major design rules:
-
Put global data in a struct. [70]
-
Avoid free functions (except operator functions) at file scope in .h
files. [72]
-
Avoid free functions with external linkage (including operator
functions) in .c files. [72]
-
Avoid preprocessor macros in .h files. [75]
-
Only classes, structs, unions, and free operator functions should be
declared at file scope in .h files. [77]
-
Only classes, structs, unions, and inline functions should be defined
at file scope in .h files. [77]
-
Include a header file only if you make direct substantive use of a class or
free function defined in the header. [135]
-
A .h file should only export what it must.
-
All definitions with external linkage go in the component's .c file and the
declarations go in the component's .h file. [115]
-
No local declarations in the .c file for entities with external
linkage. These declarations belong in the appropriate .h file and they
are accessed by #include'ing that .h file.
-
Do not use a local declaration for a non-local definition.
Include the necessary .h file instead. [119]
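A small sketch of the "put global data in a struct" rule from the list above. The struct and member names are hypothetical. The comments mark which line would live in the .h file and which in the .c file of the component.

```cpp
// Instead of a bare global such as
//     int g_debugLevel;
// the data is scoped inside a struct, so its name cannot collide with
// other globals at link time.

struct GlobalConfig {           // hypothetical name
    static int s_debugLevel;    // declaration: belongs in the .h file
};

int GlobalConfig::s_debugLevel = 0;   // definition: belongs in the .c file
```

Clients then write `GlobalConfig::s_debugLevel`, which makes the owning component visible at every point of use.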
Examples of internal linkage:
- file scope variables and functions
- static int x;
- static int foo() { ... }
- const data definitions
- const double PI = 3.14159;
- enum type and legal values
- enum Boolean { TRUE = 1, FALSE = 0 };
- typedefs
- inline functions
- inline int operator==( ... ) { ... }
- class forward declaration
- class declarations and definitions
- class Point { int d_x; public: int x() const { return d_x; } };
Examples of external linkage:
- anything that produces symbols in the .o file
- global data (including enum instances)
- unresolved function reference/call
- non-inline member functions
- non-inline, non-static free functions
- class static data member declarations and definitions
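A short sketch collecting a few of the cases from the two lists above in one translation unit. All names are hypothetical, and only cases whose linkage is uncontroversial are shown.

```cpp
// Internal linkage: names local to this translation unit.
static int s_callCount = 0;                     // file-scope static data
static int nextId() { return ++s_callCount; }   // file-scope static function
const double PI = 3.14159;                      // const data at file scope

// External linkage: names that produce symbols visible to the linker.
int g_total = 0;                                // global data

int addToTotal(int n)                           // non-inline, non-static free function
{
    g_total += n;
    return g_total;
}

class Counter {
  public:
    static int s_instances;                     // static data member declaration
    int doubled(int x) const;                   // non-inline member function
};

int Counter::s_instances = 0;                   // static data member definition

int Counter::doubled(int x) const
{
    return 2 * x;
}
```

Another .c file could define its own `static int s_callCount` or `nextId` without conflict, whereas a second definition of `g_total` or `addToTotal` would be a link-time error.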
Components
-
The root names of the .c and .h files should match exactly. [110]
-
The .c file of every component should include its own .h file as the first
line of code (even before system include files). [110]
-
Logical entities declared within a component should not be
defined outside that component. [108]
-
A component defining a function will usually have a physical dependency on
any component defining a type used by that function. [127]
-
Avoid cyclic dependencies among components. [185]
-
Friendship within a component is an implementation detail of that component.
[137]
-
Granting (local) friendship to classes defined within the same component does
not violate encapsulation. [139]
-
Friendship affects access privilege but not implied dependency. [141]
Granting friendship does not create dependencies but can induce physical
coupling in order to preserve encapsulation. [308]
-
Escalating the level at which encapsulation occurs can remove the need to
grant private access to cooperating components within a subsystem. [315]
-
Defining an iterator class along with a container class in the same component
enables user extensibility, improves maintainability, and enhances reusability
while preserving encapsulation. [140]
-
Minimizing the number and size of exported header files enhances
usability. [503]
-
Minimizing the use of externally defined types in a component's
interface facilitates reuse in a wider variety of contexts. [558]
-
A good test for encapsulation is to see whether a given interface will
simultaneously support two significantly different implementation
strategies without modification. [562]
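A sketch of the container-plus-iterator arrangement described in the list above, assuming both classes live in the same hypothetical component. The iterator is a local friend, so the container exposes no accessors to its representation, and clients can still write their own traversals.

```cpp
class IntStackIter;   // defined below, in the same component

class IntStack {
    int d_items[8];   // fixed capacity keeps the sketch short
    int d_length;
    friend class IntStackIter;   // local friendship: an implementation detail
  public:
    IntStack() : d_length(0) {}
    void push(int value) { d_items[d_length++] = value; }   // sketch: no bounds check
};

// Visits the elements from the top of the stack to the bottom.
class IntStackIter {
    const IntStack& d_stack;
    int d_index;
  public:
    explicit IntStackIter(const IntStack& stack)
    : d_stack(stack), d_index(stack.d_length - 1) {}
    bool isValid() const { return d_index >= 0; }
    void next() { --d_index; }
    int current() const { return d_stack.d_items[d_index]; }
};
```

A client sums the elements with `for (IntStackIter it(s); it.isValid(); it.next()) sum += it.current();` without `IntStack` having to grow a new member function for each traversal.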
Testing
-
Distributing system testing throughout the design hierarchy can be much
more effective per testing dollar than testing at only the highest
level interface. [159]
-
Testing a component in isolation is an effective way to ensure
reliability. [162]
-
Hierarchical testing requires a separate test driver for every
component. [175]
-
Testing only the functionality directly implemented within a component
enables the complexity of the test to be proportional to the complexity
of the component. [178]
-
Components that use objects "in name only" can be thoroughly tested,
independently of the named object. [250]
Levelization (breaking dependencies)
- Escalation.
Moving mutually dependent functionality higher in the physical
hierarchy. [325]
-
If peer components are cyclically dependent, it may be possible to
escalate the interdependent functionality from each of these components
to static members in a potentially new higher-level component that
depends on each of the original components. [220]
- Demotion.
If peer components are cyclically dependent, it may be possible to demote
the interdependent functionality from each of these components to a
potentially new lower-level (shared) component upon which each of the
original components depends. [229]
-
Demoting common code enables independent reuse. [234]
-
Escalating policy and demoting infrastructure can combine to enhance
independent reuse. [235]
- Opaque pointers.
A pointer is said to be opaque if the definition of the type to
which it points is not included in the current translation unit [251].
Components that use objects "in name only" can be thoroughly tested,
independently of the named object.
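A minimal sketch of an opaque pointer, with hypothetical class names. `Widget` holds and returns a `Screen` pointer but never dereferences it, so this component compiles (and can be tested) without the definition of `Screen`.

```cpp
class Screen;   // used "in name only": no #include of a screen header

class Widget {
    Screen *d_parent_p;   // opaque pointer: stored and returned, never dereferenced
  public:
    explicit Widget(Screen *parent) : d_parent_p(parent) {}
    Screen *parent() const { return d_parent_p; }
};
```

Only a client that actually calls a member function on the returned `Screen` needs to include the header that defines it.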
- Dumb data.
Refers to a generalization of the concept of
opaque pointers. Dumb data is any kind of information that an object
holds but does not know how to interpret. Such data must be used in the
context of another object, usually at a higher level. [257]
-
Dumb data can be used to break "in name only" dependencies, facilitate
testability, and reduce implementation size. However, opaque pointers
can preserve type safety and encapsulation; dumb data, in general,
cannot. [264]
- Redundancy.
Deliberately repeating code or data in order to avoid unwanted
coupling brought on by reuse. [269]
-
The additional coupling of some forms of reuse may outweigh the
advantage gained from that reuse. [269]
-
Supplying a small amount of redundant data can enable the use of an
object "in name only", thus eliminating the cost of linking to the
definition of that object's type. [271]
- Callbacks.
A callback is a function, provided by a client to a subsystem,
that allows the callee to perform a specific operation in the context
of the caller. [275]
-
The indiscriminate use of callbacks can lead to designs that are
difficult to understand, debug, and maintain. [279]
-
The need for callbacks can be a symptom of a poor overall architecture.
[282]
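A sketch of a callback as described above, using a plain function pointer. All names are hypothetical. The lower-level function `processItems` knows nothing about its client; the client supplies both the function and a `void *` context, which the subsystem passes through without interpreting it.

```cpp
// Subsystem side: depends only on this function-pointer type.
typedef void (*ProgressCallback)(int percentDone, void *clientData);

void processItems(int count, ProgressCallback callback, void *clientData)
{
    for (int i = 1; i <= count; ++i) {
        // ... hypothetical per-item work would go here ...
        if (callback) {
            callback(100 * i / count, clientData);   // runs in the client's context
        }
    }
}

// Client side: defines what "progress" means for this caller.
struct ProgressLog {
    int d_calls;         // number of times the callback ran
    int d_lastPercent;   // most recent percentage reported
};

void recordProgress(int percentDone, void *clientData)
{
    ProgressLog *log = static_cast<ProgressLog *>(clientData);
    ++log->d_calls;
    log->d_lastPercent = percentDone;
}
```

The `void *` parameter gives up type safety, which is one reason the notes warn against using callbacks indiscriminately.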
- Manager class.
The idea of a "mediator". Creating a class that owns and coordinates
lower-level objects. Often makes a system easier to understand and
maintain. [290]
- Factoring.
Factoring means extracting pockets of cohesive functionality and moving
them to a lower level where they can be independently tested and reused.
[294]
-
Factoring a concrete class into two classes containing higher and lower
levels of functionality can facilitate levelization. [241]
-
Factoring an abstract base class into two classes - one defining a pure
interface, the other defining its partial implementation - can facilitate
levelization. [241]
-
Factoring a system into smaller components makes it both more flexible
and also more complex, since there are now more physical pieces to work
with. [243]
- Escalating encapsulation.
The idea of a "facade" or "wrapper". Moving the point at which
implementation details are hidden from clients to a higher level in the
physical hierarchy. Escalating the level at which encapsulation occurs
can remove the need to grant private access to cooperating components
within a subsystem. [315]
-
What is and what is not an implementation detail depends on the level
of abstraction within the physical hierarchy. [313]
OOD
-
A protocol class is a nearly perfect insulator. [386]
-
A protocol class can be used to eliminate both compile- and link-time
dependencies. [389]
-
Holding only a single opaque pointer to a structure containing all of a
class's private members enables a concrete class to insulate its
implementation from its clients. [402]
-
All fully insulated implementations can be modified without affecting
any header file. [404]
-
Deciding what to insulate, when, and how much involves many trade-offs,
including the cost of insulation itself. [448-462]
-
Settling for less than full encapsulation is sometimes the right
choice. [571]
-
Establishing hierarchical ownership of lower-level objects makes a system
easier to understand and more maintainable (triangle decomposition).
"Dumb-bell" decompositions are bad. [290]
-
Hiding header files from clients is no substitute for proper
encapsulation. It may make programmatic access difficult, but it is
routinely still possible. Additionally, clients will still be impacted
if implementation choices change. [316]
-
Virtual functions implement variation in behavior; data members
implement variation in value. [601]
-
Member functions that are not public expose general users to
uninsulated implementation details. [613]
-
A variety of problems can be solved by adding an extra level of
indirection. [671]
-
Design patterns are an effective way of communicating reusable concepts
and ideas at an architectural level. [731]
-
Design patterns, like the design process itself, address both logical
and physical issues. [732]
-
Self-registering objects are "cute". Non-invasive extension mechanisms
are an inappropriate dream.
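A sketch of a protocol class as referred to at the top of this list, assuming the usual shape: no data members, a non-inline virtual destructor, and otherwise only pure virtual functions. `Shape`, `Rectangle`, and `totalArea` are hypothetical names.

```cpp
// Protocol class: clients compile against this interface only.
class Shape {
  public:
    virtual ~Shape();                  // defined out-of-line, below
    virtual double area() const = 0;   // pure virtual: no implementation exposed
};

Shape::~Shape() {}   // would live in the component's .c file

// One concrete implementation; clients of Shape never need to see it.
class Rectangle : public Shape {
    double d_width;
    double d_height;
  public:
    Rectangle(double width, double height) : d_width(width), d_height(height) {}
    virtual ~Rectangle() {}
    virtual double area() const { return d_width * d_height; }
};

// Client code written purely in terms of the protocol.
double totalArea(const Shape *const *shapes, int count)
{
    double sum = 0.0;
    for (int i = 0; i < count; ++i) {
        sum += shapes[i]->area();
    }
    return sum;
}
```

Changing `Rectangle`'s data members would force only the code that creates rectangles to recompile; `totalArea` and anything else that sees just `Shape` would be unaffected.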
C++
-
Default arguments can be an effective alternative to function
overloading, especially where insulation is not relevant. [619]
-
Never pass a user-defined type to a function by value; pass it by const
reference. [622]
-
Whenever a parameter is passed by reference or pointer, and it is neither
modified nor stored, the parameter should be declared const. [629]
-
Avoid declaring parameters passed by value as const. [629]
-
Passing in the address of a previously constructed object to be
assigned the return value (called return by argument) can improve
performance while preserving total encapsulation. [565]
-
Returning a non-const object from a const member function can rupture
the const-correctness of a system. [607]
-
Avoid declaring a function inline whose body produces object code that is
larger than the object code produced by the equivalent non-inline function
call itself. [631]
-
Avoid declaring a function inline that the compiler will not inline. [632]
-
Explicitly declare (either public or private) the constructor and
assignment operator for any class defined in a header file, even when
the default implementations are adequate. [650]
-
In general, an object cannot be copied (or moved) using a bitwise copy.
[721]
-
In every class that declares or is derived from a class that declares a virtual
function, explicitly declare the destructor as the first virtual function in the
class and define it out-of-line. [651]
-
In classes that do not otherwise declare virtual functions, explicitly declare
the destructor as non-virtual and define it appropriately (either inline or
out-of-line). [654]
-
Supplying support for derived-class authors in the form of protected
member functions of a base class exposes public clients of the base
class to uninsulated implementation details of the derived classes. [364]
-
The construction of each non-local static object in a program
potentially contributes to invocation time. [533]
-
Avoid "hiding" a non-virtual base class function in a derived class. [602]
-
Static member functions are commonly used to implement non-primitive
functionality in a separate utility class. [604]
-
Avoid "cast" operators, especially to fundamental integral types; make the
conversion explicit instead. [649]
-
Constructors that enable implicit conversion, especially from widely
used or fundamental types (e.g., int), erode the safety afforded by
strong typing. [646]
-
Instrumenting global operators new and delete is a simple but effective
way to understand and test the behavior of dynamic memory allocation
within a system. [692]
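A sketch of the const-correctness rupture mentioned in the list above, using a hypothetical `Account` class. Because `const` on a member function applies to the pointer member and not to what it points at, the compiler accepts a const member function that hands out a writable reference.

```cpp
class Account {
    int *d_balance_p;                       // owned; allocated in the constructor

    Account(const Account&);                // copying disabled: declared, not defined
    Account& operator=(const Account&);

  public:
    explicit Account(int balance) : d_balance_p(new int(balance)) {}
    ~Account() { delete d_balance_p; }

    int balance() const { return *d_balance_p; }

    // Compiles, but lets callers modify a const Account.
    int& rawBalance() const { return *d_balance_p; }

    // Safer alternative: a const function returns a const reference.
    const int& readOnlyBalance() const { return *d_balance_p; }
};
```

With `const Account a(10);` the statement `a.rawBalance() = 99;` compiles and changes the observable state of a const object, while `a.readOnlyBalance() = 99;` is rejected by the compiler.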
General programming
-
Use assert statements to document the assumptions made in implementation. [90]
-
For functions that return an error status, an integral value of 0
should always mean success. [615]
-
Functions that answer a yes-or-no question should be worded appropriately
(e.g. isValid()) and return an int value of either 0 (no) or 1 (yes). [617]
-
In a procedural interface, having clients explicitly destroy only those
objects that they explicitly create reduces confusion over ownership
and can lead to improved performance. [437]
-
Functions should never take the address of an argument and store it in
a location that persists after the function returns. If an address must be
stored, the client should be required to pass an address explicitly. [626]
-
Avoid using short in the interface; use int instead. [633]
-
Avoid using unsigned in the interface; use int instead. [637,667]
-
Avoid using long in the interface; use assert(sizeof(int) >= 4) and use either
int or a user-defined large-integer type instead. [642]
-
Avoid using float or long double in the interface; use double instead. [645]
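A small sketch of the status and yes-or-no conventions from the list above. The function names are hypothetical. `parseDigit` returns 0 on success and loads its result through a pointer argument; `isDigit` answers a yes-or-no question with 1 or 0.

```cpp
// Answers a yes-or-no question: returns 1 (yes) or 0 (no).
int isDigit(char c)
{
    return (c >= '0' && c <= '9') ? 1 : 0;
}

// Returns 0 on success and a non-zero value on failure.
// On failure '*result' is left unchanged.
int parseDigit(char c, int *result)
{
    if (!isDigit(c)) {
        return -1;   // any non-zero value signals an error
    }
    *result = c - '0';
    return 0;
}
```

A caller can then write `if (0 != parseDigit(c, &value)) { /* handle error */ }`, and different non-zero values remain available to distinguish kinds of failure.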