Clean Code: A Handbook of Agile Software Craftsmanship by Robert C. Martin
Writing clean code is what you must do in order to call yourself a professional. There is no reasonable excuse for doing anything less than your best.
Introduction
A good rule of thumb is that clean code lets a project scale, ship quickly, and stay easy to read and write. Very often, however, programmers concentrate on implementation details and on making the code work rather than on its quality. After all, a working mess is better than nothing. This is a bad choice! As the code grows in complexity, we find ourselves wading through bad code, re-reading it needlessly and fixing bugs that spring up in unexpected places.
As a novice, I am guilty of this. The worst part is, I often commit this crime unknowingly.
Personally, I find that writing clean code is not intuitive. It doesn’t come naturally to me, no matter how hard I rack my brain. Even when I make the effort to double-check my work, I have difficulty identifying areas of improvement in my code. There have been instances where I thought my code looked fine, felt satisfied with myself, hit commit and push, and several hours later a code reviewer pointed out its flaws, leaving me thinking: Aha, why didn’t I realise that myself? Perhaps I face this issue because of my limited experience and limited exposure to the good practices of more experienced programmers. This is bad and I don’t want it to continue. In fact, this issue only came to light recently, in my final year of university, and I am still making a conscious effort to rectify it. It is also why I have decided to pick up some books that touch on this topic and, hopefully, glean some wisdom to help me in my quest to become a better programmer.
Today’s article is about the book Clean Code: A Handbook of Agile Software Craftsmanship. I found it to be quite a good read, albeit a rather lengthy one. It showcases the thought processes of good programmers and the tricks, techniques, and tools that they use to write clear, understandable code. Oh boy, I wish someone had recommended this book to me earlier. This article documents what I learned, based on my own understanding of the book. It covers selected chapters rather than the entire book. I recommend reading the whole book yourself if you want a full understanding of the author’s intention.
In case the article ends up being lengthy, here is the big picture:
- What does writing clean code entail
- Crafting Meaningful Names
- Things to note about Functions
- Things to note about comments
- How to format code
- The difference between objects and data structures
- How to perform Error Handling
- Things to note about Classes
(1) What does writing Clean Code entail?
Writing clean code is similar to painting a picture. As observers, we are able to judge whether a painting is good or bad. However, how many of us actually know how to paint? According to the book, writing clean code requires the disciplined use of little techniques applied through a painstakingly acquired sense of “cleanliness”. This sense is called “code-sense” (think of ball sense in sports). A programmer with code-sense can look at a messy module, see the options, and plot a sequence of behaviour-preserving transformations that turn it into an elegantly coded system.
The book then shares definitions of clean code from other programmers and concludes that none of the different schools of thought is absolutely right. Just as we cannot agree on which school of martial arts is best, there is no clear-cut definition of clean code. Each programmer’s definition has its own validity, and accepting one definition does not invalidate another.
However, we can all come to a consensus that clean code is easy to read and write.
(2) Crafting Meaningful Names
Use Intention Revealing Names:
(I found this to be the most useful tip)
Without the need for comments, the name of a variable, function or class should tell you why it exists, what it does, and how it is used.
We need to make sure that the code does not implicitly require the reader to know some facts beforehand.
Let’s say we have the following code:
The problem is not the simplicity of the code but the implicity of the code: the degree to which the context is not explicit in the code itself.
The code implicitly requires that we know the answers to questions such as:
1. What kinds of things are in theList?
2. What is the significance of the 0th subscript of an item in theList?
3. What is the significance of the value 4?
4. How is the return value of this function used?
The answers to these questions are not present in the code sample, but they could have been. A rewrite of the function could be as seen:
We could also go further and write a simple class for cells instead of using an array of ints. It can include an intention-revealing function (call it isFlagged) to hide the magic numbers. This then results in a new version of the function as shown below:
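The listings this section refers to are not reproduced in the text, so here is a minimal sketch reconstructed from the description. The `Cell` and `GameBoard` class names, the constant names, and the assumption that slot 0 holds a status where 4 means "flagged" are illustrative, and the intermediate rewrite (named constants on raw arrays) is folded into `Cell`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cell type: wraps the raw int[] so the magic numbers live in one place.
class Cell {
    static final int STATUS_INDEX = 0; // assumed: slot 0 holds the cell status
    static final int FLAGGED = 4;      // assumed: status 4 means "flagged"
    private final int[] values;

    Cell(int... values) { this.values = values; }

    boolean isFlagged() { return values[STATUS_INDEX] == FLAGGED; }
}

class GameBoard {
    // Before: nothing says what theList holds, why index 0, or why 4.
    static List<int[]> getThem(List<int[]> theList) {
        List<int[]> list1 = new ArrayList<>();
        for (int[] x : theList)
            if (x[0] == 4)
                list1.add(x);
        return list1;
    }

    // After: same logic, but the names answer the four questions.
    static List<Cell> getFlaggedCells(List<Cell> gameBoard) {
        List<Cell> flaggedCells = new ArrayList<>();
        for (Cell cell : gameBoard)
            if (cell.isFlagged())
                flaggedCells.add(cell);
        return flaggedCells;
    }
}
```

Both functions have the same shape and complexity; only the names and the small wrapper class changed.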
Avoid Disinformation:
We should always ensure that names are accurate and that the implementation matches what the name promises. Beware of names that vary only in small ways, for example names that differ by a single character.
Make Meaningful Distinctions:
Avoid giving names that are hard to distinguish from one another.
Imagine finding classes named as
- Customer
- CustomerObject
or
- getActiveAccount();
- getActiveAccounts();
- getActiveAccountInfo();
What should you understand as the distinction? How are the programmers in this project supposed to know which of these functions to call? Therefore, distinguish names in such a way that the reader knows what the differences offer.
Class Names:
Classes and objects should have noun or noun phrase names like Customer, WikiPage, Account, and AddressParser. Avoid words like Manager, Processor, Data, or Info in the name of a class. A class name should not be a verb.
Method Names:
Methods should have verb or verb phrase names like postPayment, deletePage, or save.
Accessors, mutators, and predicates should be named for their value and prefixed with get, set, and is.
Pick One Word per Concept:
For the entire project, make sure to stick with one word for one abstract idea. For example, having fetch, retrieve, and get as equivalent methods of different classes is confusing. How do you remember which method name corresponds to which class?
Add Meaningful Context to variable names:
Most of the time, you’ll need to enclose names in well-named classes, functions, or namespaces to put them in context for your readers. Prefixing the name is a last resort when all else fails.
Consider variables with names like firstName, lastName, street, houseNumber, area, state, and zipcode. When viewed as a whole, they clearly form a part of an address. But what if you just saw the state variable being used alone in a method? Will you be able to infer that it was a part of an address?
You can add context by using prefixes: addrFirstName, addrLastName, addrState, and so on. At least readers will understand that these variables are part of a larger structure. Of course, a better solution is to create a class named Address. Then, even the compiler knows that the variables belong to a bigger concept.
Example:
Consider the method in Listing 2–1. Do the variables need a more meaningful context? The function name provides only part of the context; the algorithm provides the rest. Once you read through the function, you see that the three variables, number, verb, and pluralModifier, are part of the “guess statistics” message. Unfortunately, the context must be inferred.
The code below is much better, although longer.
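The improved listing is not reproduced here. The sketch below follows the idea described above: the three variables become fields of a class, so their context is explicit. The class and method names are my recollection of the book's example and should be treated as assumptions.

```java
// The three variables now clearly belong to the "guess statistics" message.
class GuessStatisticsMessage {
    private String number;
    private String verb;
    private String pluralModifier;

    String make(char candidate, int count) {
        createPluralDependentMessageParts(count);
        return String.format("There %s %s %s%s", verb, number, candidate, pluralModifier);
    }

    private void createPluralDependentMessageParts(int count) {
        if (count == 0) {
            thereAreNoLetters();
        } else if (count == 1) {
            thereIsOneLetter();
        } else {
            thereAreManyLetters(count);
        }
    }

    private void thereAreManyLetters(int count) {
        number = Integer.toString(count);
        verb = "are";
        pluralModifier = "s";
    }

    private void thereIsOneLetter() {
        number = "1";
        verb = "is";
        pluralModifier = "";
    }

    private void thereAreNoLetters() {
        number = "no";
        verb = "are";
        pluralModifier = "s";
    }
}
```

The class is longer than a single function would be, but each small method now has an obvious place and purpose.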
(3) Things to note about Functions
Keep functions Small!:
They should hardly ever be 20 lines long! They should usually be shorter than Listing 3–2! Indeed, Listing 3–2 should be shortened to Listing 3–3.
This implies that the blocks within if statements, else statements, while statements and so on should be one line long, and that line should probably be a function call. It also means functions should not be large enough to hold nested structures, so the indent level of a function should not be greater than one or two.
Do One Thing:
Functions should do one thing, and one thing only.
To ensure that our functions are only doing “one thing,” we can follow the rule of making sure that the statements within our function are all at the same level of abstraction.
Consider a function that violates this rule. It contains concepts at a very high level of abstraction, such as getHtml(); others at an intermediate level of abstraction, such as String pagePathName = PathParser.render(pagePath); and still others at a low level, such as .append("\n").
Switch Statements:
It is hard to make a switch statement small, because by their nature switch statements do N things. We cannot always avoid them, but with the help of our friend polymorphism we can make sure that each switch statement is buried in a low-level class and is never repeated.
Problems with this function: First, it’s big, and it’ll get bigger as more employee types are introduced. Second, it very clearly does more than one thing. Third, it violates the Single Responsibility Principle (SRP) because there is more than one reason for it to change. Fourth, it violates the Open Closed Principle (OCP) because it must change whenever new types are added. But possibly the worst problem with this function is that there are an unlimited number of other functions that could potentially have the same structure. For example we could have
isPayday(Employee e, Date date)
or
deliverPay(Employee e, Money pay)
All of these functions would also need to check the type of employee and hence would have the same deleterious switch structure.
The solution to this problem (see Listing 3–5) is to bury the switch statement in the basement of an ABSTRACT FACTORY. The factory will use the switch statement to create appropriate instances of the derivatives of Employee, and the various functions, such as calculatePay, isPayday, and deliverPay, will be dispatched polymorphically through the Employee interface. Each subclass of Employee then contains its own implementation of these three functions.
The general rule for switch statements is that they can be tolerated if they appear only once, are used to create polymorphic objects, and are hidden behind an inheritance relationship so that the rest of the system can’t see them.
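The idea can be sketched as follows. This is a simplified illustration rather than the book's listing: the factory is a static method, money is a plain `long` of cents, only two of the three functions are shown, and the pay and payday rules are made-up placeholders.

```java
enum EmployeeType { COMMISSIONED, HOURLY, SALARIED }

// Hypothetical raw data record; the field names are assumptions.
class EmployeeRecord {
    final EmployeeType type;
    final long monthlyBaseCents;

    EmployeeRecord(EmployeeType type, long monthlyBaseCents) {
        this.type = type;
        this.monthlyBaseCents = monthlyBaseCents;
    }
}

abstract class Employee {
    protected final EmployeeRecord record;

    Employee(EmployeeRecord record) { this.record = record; }

    abstract long calculatePayCents();
    abstract boolean isPayday(int dayOfMonth);
}

// The rules below are placeholders; only the structure matters.
class SalariedEmployee extends Employee {
    SalariedEmployee(EmployeeRecord r) { super(r); }
    long calculatePayCents() { return record.monthlyBaseCents; }
    boolean isPayday(int dayOfMonth) { return dayOfMonth == 1; }
}

class HourlyEmployee extends Employee {
    HourlyEmployee(EmployeeRecord r) { super(r); }
    long calculatePayCents() { return record.monthlyBaseCents / 4; }
    boolean isPayday(int dayOfMonth) { return dayOfMonth % 7 == 0; }
}

class CommissionedEmployee extends Employee {
    CommissionedEmployee(EmployeeRecord r) { super(r); }
    long calculatePayCents() { return record.monthlyBaseCents / 2; }
    boolean isPayday(int dayOfMonth) { return dayOfMonth == 1 || dayOfMonth == 15; }
}

class EmployeeFactory {
    // The only switch on employee type in the whole system.
    static Employee makeEmployee(EmployeeRecord r) {
        switch (r.type) {
            case COMMISSIONED: return new CommissionedEmployee(r);
            case HOURLY:       return new HourlyEmployee(r);
            case SALARIED:     return new SalariedEmployee(r);
            default: throw new IllegalArgumentException("Unknown employee type: " + r.type);
        }
    }
}
```

Callers write `EmployeeFactory.makeEmployee(record).calculatePayCents()` and never inspect the type themselves, so adding a new employee type touches only the factory and one new subclass.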
Function Arguments:
The ideal number of arguments for a function is zero. More than three arguments requires very special justification, and even then should be avoided.
Consider, for instance, the StringBuffer in the example. We could have passed it around as an argument rather than making it an instance variable, but then our readers would have had to interpret it each time they saw it. For example, includeSetupPage() is easier to understand than includeSetupPageInto(newPage_content). The argument is at a different level of abstraction than the function name and forces you to know a detail (in other words, StringBuffer) that isn’t particularly important at that point.
Arguments are even harder from a testing point of view. Imagine the difficulty of writing all the test cases to ensure that all the various combinations of arguments work properly.
(a-Single Argument)
There are two common reasons to pass a single argument into a function:
- Asking a question about that argument.
- Operating on that argument, transforming it into something else and returning it.
Another use of a single-argument function is an event, where there is an input argument but no output argument. The overall program is meant to interpret the function call as an event and use the argument to alter the state of the system, for example, void passwordAttemptFailedNtimes(int attempts).
(b-Flag Arguments)
Passing a Boolean into a function is considered bad practice because it complicates the signature of the method, loudly indicating that the function does more than one thing. If the flag is true, it does one thing. If the flag is false, it does another.
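One common remedy is to split the function in two, one per branch of the flag. The sketch below is illustrative only: `TestPageRenderer` and its return strings are hypothetical stand-ins.

```java
class TestPageRenderer {
    // Before: a call site reads render(true), and the reader must look up what true means.
    String render(boolean isSuite) {
        return isSuite ? renderForSuite() : renderForSingleTest();
    }

    // After: each function does one thing and the call site explains itself.
    String renderForSuite() { return "suite page"; }

    String renderForSingleTest() { return "single test page"; }
}
```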
(c-Two arguments)
A function with two arguments is harder to understand than a function with one argument. For example, writeField(name) is easier to understand than writeField(output_stream, name).
The user is required to know what the ordering of the arguments should be…of course you are not able to avoid them sometimes. For example, in the construction of a Point, we require two arguments.
But you should be aware that they come at a cost, and you should take advantage of whatever mechanisms are available to convert them into monadic functions (functions with one argument).
(d-Argument Object)
When a function seems to need more than two or three arguments, it is likely that some of those arguments ought to be wrapped into a class of their own.
For instance, consider the difference between the following two declarations:
Circle makeCircle(double x, double y, double radius);
Circle makeCircle(Point center, double radius);
(e-Verbs and Keywords)
In the case of a function with one argument, the function and argument should form a very nice verb/noun pair. For example, write(name) or writeField(name).
Have No Side Effects:
Your function should only perform something that it promises. It should not perform any hidden actions such as making unexpected changes to the variables of its own class.
(Output Arguments):
In general output arguments should be avoided. If your function must change the state of something, have it change the state of its owning object rather than change the state of the input argument.
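A small sketch of the difference, using a hypothetical `Report` class and footer text:

```java
class ReportUtil {
    // Before: the argument is silently used as an output. A reader of
    // appendFooter(report) cannot tell whether report is input or output.
    static void appendFooter(StringBuilder report) {
        report.append("\n-- end of report --");
    }
}

// After: the function changes the state of its owning object instead.
class Report {
    private final StringBuilder body;

    Report(String text) { body = new StringBuilder(text); }

    void appendFooter() { body.append("\n-- end of report --"); }

    String text() { return body.toString(); }
}
```

With the second form the call reads `report.appendFooter()`, which leaves no doubt about what is being modified.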
Command Query Separation:
Functions should either only change the state of an object, or only return some information about that object, but not both.
public boolean set(String attribute, String value);
This function sets the value of a named attribute and returns true if it is successful and false if no such attribute exists. This leads to odd statements in the caller, such as:
if (set("username", "unclebob"))…
Imagine yourself as the reader. It might be difficult to identify what is happening here, since it is not clear whether the word “set” is a verb or an adjective.
- Is it asking whether the “username” attribute was previously set to “unclebob”?
- Is it asking whether the “username” attribute was successfully set to “unclebob”?
We could try to resolve this by renaming the set function to setAndCheckIfExists() but it does not improve the readability.
A better proposed solution is to separate the command from the query so that ambiguity cannot occur.
if (attributeExists("username")) {
  setAttribute("username", "unclebob");
…
}
Prefer Exception to Returning Error Codes:
Returning error codes from command functions is a subtle violation of the previous point on command query separation, because it encourages commands to be used as expressions in the predicates of if statements.
if (deletePage(page) == E_OK)
This does not suffer from verb/adjective confusion, but it does lead to deeply nested structures.
Furthermore, the caller must act on the error immediately.
On the other hand, if you use exceptions instead of returned error codes, then the error processing code can be separated from the happy path code and can be simplified:
On this note, Try/Catch blocks could make the structure of the code confusing as it mixes error processing with normal processing. Therefore, we should extract the bodies of the try and catch blocks out into functions of their own.
In the above, the delete function is all about error processing. The deletePageAndAllReferences function is concerned only with the processes of fully deleting a page and ignores error processing. This provides a nice separation that makes the code easier to understand and modify.
Functions should do one thing, and error handling is one thing. Thus, a function that handles errors should do nothing else. This implies (as in the example above) that if the keyword try exists in a function, it should be the very first word in the function and that there should be nothing after the catch/finally blocks.
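The listing discussed above is not reproduced in the text. A minimal sketch of the same structure follows. The `Wiki` class, its in-memory sets, and `PageNotFoundException` are hypothetical stand-ins for the real page store and logger.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class PageNotFoundException extends Exception {
    PageNotFoundException(String name) { super("No such page: " + name); }
}

class Wiki {
    private final Set<String> pages = new HashSet<>();
    private final Set<String> references = new HashSet<>();
    final List<String> errorLog = new ArrayList<>();

    void addPage(String name) {
        pages.add(name);
        references.add(name);
    }

    boolean hasPage(String name) { return pages.contains(name); }

    // Error handling only: try is the first word, nothing follows the catch.
    void delete(String name) {
        try {
            deletePageAndAllReferences(name);
        } catch (PageNotFoundException e) {
            logError(e);
        }
    }

    // Happy path only: no error processing in here.
    private void deletePageAndAllReferences(String name) throws PageNotFoundException {
        deletePage(name);
        references.remove(name);
    }

    private void deletePage(String name) throws PageNotFoundException {
        if (!pages.remove(name)) {
            throw new PageNotFoundException(name);
        }
    }

    private void logError(Exception e) { errorLog.add(e.getMessage()); }
}
```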
(4) Things to note about Comments
Bad code is not compensated by comments.
In general, comments can be used for
- Explanation of intent behind a decision.
- Clarification. Make the argument or return value clear.
- Warning of consequences.
- TODO comments.
- Amplification of something that might seem inconsequential.
(5) How to format code
Lines of code that are tightly related should appear vertically dense. Concepts that are closely related should belong to the same source file.
Variable declarations: Variables should be declared as close to their usage as possible. Because our functions are short, local variables should appear at the top of each function.
Control variables: Should be declared within the loop statement.
Instance variables: Should be declared at the top of the class. This is because most of the time, they are expected to be used by most of the methods within the class.
Dependent Functions: If one function calls another, they should be vertically close. We try to position the caller above the callee, so that functions near the top of the file call the functions below them.
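These placement rules can be seen together in a small sketch. The `InvoicePrinter` class and its currency constant are hypothetical.

```java
import java.util.List;

class InvoicePrinter {
    // Instance variables sit at the top of the class.
    private static final String CURRENCY = "USD";
    private final List<Long> lineItemCents;

    InvoicePrinter(List<Long> lineItemCents) { this.lineItemCents = lineItemCents; }

    // The caller comes first; its callees follow directly below, in call order.
    String print() {
        long total = totalCents(); // local variable declared at the top of the function
        return formatAmount(total);
    }

    private long totalCents() {
        long total = 0;
        for (long cents : lineItemCents) { // control variable declared in the loop statement
            total += cents;
        }
        return total;
    }

    private String formatAmount(long cents) {
        return String.format("%d.%02d %s", cents / 100, cents % 100, CURRENCY);
    }
}
```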
(6) The difference between Objects and Data Structures
Data Abstraction:
Listing 6–1 is a data structure while 6–2 is an object.
In Listing 6–2, there is no way you can tell whether the implementation is in rectangular or polar coordinates. Listing 6–1, on the other hand, is very clearly implemented in rectangular coordinates, and it forces us to manipulate those coordinates independently. This exposes implementation. Indeed, it would expose implementation even if the variables were private and we were using single variable getters and setters.
A class exposes abstract interfaces that allow its users to manipulate the essence of the data, without having to know its implementation.
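Since the two listings are not reproduced here, the following sketch shows the contrast. The interface follows the shape described in the text. The names `ConcretePoint`, `AbstractPoint`, and `PolarBackedPoint` are my own, and the polar-backed implementation is just one possible choice that callers cannot detect.

```java
// Data structure: the rectangular representation is fully exposed.
class ConcretePoint {
    public double x;
    public double y;
}

// Object: callers see an abstraction, not a representation.
interface AbstractPoint {
    double getX();
    double getY();
    void setCartesian(double x, double y);
    double getR();
    double getTheta();
    void setPolar(double r, double theta);
}

// One possible implementation, stored in polar form behind the interface.
class PolarBackedPoint implements AbstractPoint {
    private double r;
    private double theta;

    public double getX() { return r * Math.cos(theta); }
    public double getY() { return r * Math.sin(theta); }

    public void setCartesian(double x, double y) {
        r = Math.hypot(x, y);
        theta = Math.atan2(y, x);
    }

    public double getR() { return r; }
    public double getTheta() { return theta; }

    public void setPolar(double r, double theta) {
        this.r = r;
        this.theta = theta;
    }
}
```

Note that the interface forces both coordinates to be set together, as an atomic operation, instead of one field at a time.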
Data/Object Anti-Symmetry:
Objects hide their data behind abstractions and expose functions that operate on that data. Data structures expose their data and have no meaningful functions.
Example One with Procedural Solution:
The Geometry class operates on the three shape classes. The Shape classes are simple data structures without any behaviour. All the behaviours are kept in the Geometry class.
If a new perimeter() function were added to Geometry, the Shape classes would be unaffected! The same holds for any other classes that depend on the shapes. On the other hand, if we added a new shape, we would have to change all the functions in Geometry to deal with it.
Example two with Object Oriented Solution:
Here the area() method is polymorphic and no Geometry class is necessary. So if we add a new shape, none of the existing functions are affected, but if we add a new function all of the shapes must be changed!
Procedural code (code using data structures) makes it easy to add new functions without changing the existing data structures. OO code, on the other hand, makes it easy to add new classes without changing existing functions.
Procedural code makes it hard to add new data structures because all the functions must change. OO code makes it hard to add new functions because all the classes must change.
In any complex system there are going to be times when we want to add new data types rather than new functions. For these use cases, objects and OO are most appropriate. On the other hand, there will also be times when we’ll want to add new functions as opposed to data types. In that case procedural code and data structures will be more appropriate.
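The two styles can be put side by side in a compact sketch. It uses only two shapes, and the `SquareData`/`CircleData` names are hypothetical, chosen to keep the procedural and object-oriented versions apart.

```java
// Procedural style: plain data structures, behaviour lives in Geometry.
class SquareData { double side; }
class CircleData { double radius; }

class Geometry {
    // Adding perimeter() here touches no shape class.
    // Adding a new shape means editing every function in this class.
    static double area(Object shape) {
        if (shape instanceof SquareData) {
            SquareData s = (SquareData) shape;
            return s.side * s.side;
        }
        if (shape instanceof CircleData) {
            CircleData c = (CircleData) shape;
            return Math.PI * c.radius * c.radius;
        }
        throw new IllegalArgumentException("Unknown shape: " + shape);
    }
}

// Object-oriented style: each shape carries its own behaviour.
interface Shape {
    // Adding a new Shape touches no existing code.
    // Adding perimeter() here means editing every implementing class.
    double area();
}

class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

class Circle implements Shape {
    private final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}
```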
The Law of Demeter:
A module should not know about the inner workings of the objects it manipulates.
A method f of a class C should only call the methods of these:
- C
- An object created by f
- An object passed as an argument to f
- An object held in an instance variable of C
Talk to friends, not to strangers.
The following code appears to violate the Law of Demeter (among other things) because it calls the getScratchDir() function on the return value of getOptions() and then calls getAbsolutePath() on the return value of getScratchDir().
final String outputDir=ctxt.getOptions().getScratchDir().getAbsolutePath();
Train Wrecks: The above code is sloppy. It should be broken into:
The containing module knows that the ctxt object contains options, which contain a scratch directory, which has an absolute path. This is a lot of information for a single function to be aware of.
Whether this is a violation of Demeter depends on whether or not ctxt, Options, and ScratchDir are objects or data structures. If they are objects, then their internal structure should be hidden rather than exposed, and so knowledge of their inner workings is a clear violation of the Law of Demeter. On the other hand, if ctxt, Options, and ScratchDir are just data structures with no behavior, then they naturally expose their internal structure, and so Demeter does not apply. In this case, the code will be:
final String outputDir = ctxt.options.scratchDir.absolutePath;
If ctxt is an object, we should be telling it to do something, not asking about its internal properties. So let’s trace back to why we wanted the absolute path of the scratch directory in the first place. The intention was to create a scratch file of a given name.
So, we can change the code to be
BufferedOutputStream bos = ctxt.createScratchFileStream(classFileName);
This allows ctxt to hide its internals and prevent the current function from having to violate the Law of Demeter.
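The sketch below puts the two approaches next to each other. It is a simplification under stated assumptions: `Context` and `Options` are minimal stand-ins, and the hypothetical `scratchFilePathFor` returns a path string instead of opening a stream so the example needs no file system access.

```java
import java.io.File;

class Options {
    private final File scratchDir;

    Options(File scratchDir) { this.scratchDir = scratchDir; }

    File getScratchDir() { return scratchDir; }
}

class Context {
    private final Options options;

    Context(Options options) { this.options = options; }

    Options getOptions() { return options; }

    // Tell, don't ask: the caller states what it wants, and the navigation
    // through options and the scratch directory stays inside Context.
    String scratchFilePathFor(String fileName) {
        return new File(options.getScratchDir(), fileName).getAbsolutePath();
    }
}

class TrainWreckDemo {
    // The chained call split into steps. It reads better, but the caller
    // still knows the whole path from ctxt down to the absolute path.
    static String outputDirOf(Context ctxt) {
        Options opts = ctxt.getOptions();
        File scratchDir = opts.getScratchDir();
        return scratchDir.getAbsolutePath();
    }
}
```

Splitting the chain only fixes the formatting. Moving the work into `Context` is what removes the caller's knowledge of the internal structure.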
(7) How to perform Error Handling
Use Exception rather than Return codes:
As explained earlier, return codes mean the caller must check for errors immediately after the call. Hence, we avoid them and use exceptions with try/catch instead.
Write your Try-Catch-Finally Statement First:
We can understand Try/Catch blocks to be like transactions. Your catch has to leave your program in a consistent state, no matter what happens in the try. Therefore it is good practice to start with a try-catch-finally statement when you are writing code that could throw exceptions.
Don’t return Null:
If you are tempted to return null from a method, consider throwing an exception or returning a SPECIAL CASE object instead. If you are calling a null-returning method from a third-party API, consider wrapping that method with a method that either throws an exception or returns a special case object.
Don’t pass Null:
Returning null from methods is bad, but passing null into methods is worse. Unless you are working with an API which expects you to pass null, you should avoid passing null in your code whenever possible. This is to avoid NullPointerException.
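A minimal sketch of both guidelines, using a hypothetical `TeamDirectory` class. An empty list plays the role of the special case, and `Objects.requireNonNull` is one way to fail fast when a caller passes null.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class TeamDirectory {
    private final Map<String, List<String>> membersByTeam = new HashMap<>();

    // Don't pass null: reject it at the boundary with a clear message
    // instead of failing somewhere deeper in the code later on.
    void addTeam(String team, List<String> members) {
        Objects.requireNonNull(team, "team must not be null");
        Objects.requireNonNull(members, "members must not be null");
        membersByTeam.put(team, members);
    }

    // Don't return null: an unknown team yields an empty list, so callers
    // can loop over the result without a null check.
    List<String> membersOf(String team) {
        List<String> members = membersByTeam.get(team);
        return members == null ? Collections.<String>emptyList() : members;
    }
}
```

A caller can then write `for (String member : directory.membersOf("unknown")) { ... }` and the loop body simply never runs.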
(8) Things to note about Classes
Class Organisation:
A class should begin with a list of variables. Public static constants, if any, should come first. Then private static variables, followed by private instance variables. Most of the time, it is hard to come up with a good reason to keep public variables.
Public functions should follow the list of variables. Private utility functions called by a public function are usually placed right below it.
Encapsulation: While we try to keep our variables and utility functions private, sometimes we need to make them protected or accessible to items in the same package so that they can be accessed by a test.
Classes should be small!:
With functions, we measured size by counting physical lines. With classes, we measure them by their number of responsibilities.
Avoid God classes: for example, a class that exposes about 70 public methods.
Even if the class only has 5 functions, it can still have too many responsibilities.
To avoid God classes, we can use the process of naming a class to help us gauge its size. The name of a class should describe what responsibilities it fulfills. If we cannot derive a concise name for a class, then it’s likely that we have to break it down further. The more ambiguous the class name, the more likely it has too many responsibilities. For example, class names including words like Processor or Manager or Super often hint at a class having too many responsibilities.
We should also be able to write a brief description of the class in about 25 words, without using the words “if,” “and,” “or,” or “but.”
In the above example, how would we describe the SuperDashboard? “The SuperDashboard provides access to the component that last held the focus, and it also allows us to track the version and build numbers.” The first “and” is a hint that SuperDashboard has too many responsibilities.
So how do we decide to break down the class? We can make use of SRP to help us.
The Single Responsibility Principle (SRP) states that a class or module should have one, and only one, reason to change. Classes should have one responsibility — one reason to change.
The SuperDashboard class in Listing 10–2 has two reasons to change. First, it manages version information that would need to be updated every time the software gets shipped. Second, it manages Java Swing components. If we change the Swing code, we’ll almost certainly want to update the version number, but the opposite isn’t always true: we might change the version details based on changes to other code in the system.
Trying to identify responsibilities (reasons to change) often helps us recognize and create better abstractions in our code.
In the above case, we can easily extract all three SuperDashboard methods that deal with version information into a separate class named Version. (See Listing 10–3.)
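The listings are not reproduced here, so the following is a sketch of the split. The three `Version` accessors follow the responsibilities described above. The `Dashboard` class, which uses a plain string in place of a Swing component, is a hypothetical stand-in for what remains.

```java
// One responsibility: version information. Changes when a release ships.
class Version {
    private final int major;
    private final int minor;
    private final int build;

    Version(int major, int minor, int build) {
        this.major = major;
        this.minor = minor;
        this.build = build;
    }

    int getMajorVersionNumber() { return major; }
    int getMinorVersionNumber() { return minor; }
    int getBuildNumber() { return build; }
}

// One responsibility: focus tracking. Changes when the UI changes.
// A String stands in for the Swing component to keep the sketch self-contained.
class Dashboard {
    private String lastFocusedComponent = "";

    String getLastFocusedComponent() { return lastFocusedComponent; }

    void setLastFocused(String componentName) { lastFocusedComponent = componentName; }
}
```

Each class can now be described in one short sentence without an "and", and `Version` is easy to reuse elsewhere.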
It is a widely held belief that having a large number of small, single-purpose classes makes seeing the big picture more difficult as readers need to navigate from class to class in order to do so. However, a system with many small classes has no more moving parts than one with a few large classes. So the question is: Do you want to arrange your tools into toolboxes with several small drawers, each containing well-defined and well-labeled components? Or do you want a few drawers that you just toss everything into?
We want our systems to be made up of many small classes rather than a few large ones. Each small class encapsulates a single responsibility, has a single reason to change, and works with a few others to achieve the desired system behaviors.
Cohesion
In general, the more instance variables a method manipulates, the more cohesive that method is to its class. A class in which each variable is used by each method is maximally cohesive.
We aim to create classes with high cohesion.
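A small stack is a classic example of a highly cohesive class. In the sketch below, which is modelled on the kind of example the book uses (the name `IntStack` and the choice of exception are my own), there are only two instance variables, and `size()` is the only method that does not use both.

```java
import java.util.LinkedList;
import java.util.List;

class IntStack {
    private int topOfStack = 0;
    private final List<Integer> elements = new LinkedList<>();

    int size() {
        return topOfStack;
    }

    void push(int element) {
        topOfStack++;
        elements.add(element);
    }

    int pop() {
        if (topOfStack == 0) {
            throw new IllegalStateException("pop on empty stack");
        }
        int element = elements.get(--topOfStack);
        elements.remove(topOfStack); // removes by index, not by value
        return element;
    }
}
```

When a class grows a group of variables that only a few of its methods touch, that group is often a smaller class trying to get out.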
Conclusion
That’s the end of my article. Hopefully it is helpful!