Friday 16 December 2016

Rebase as an Integration Strategy for Feature Branches

There are generally two reasons for using git rebase: 1) to tidy up or rearrange commits that aren't yet in public use, and 2) as a strategy for integrating branches. This post discusses the second use case. Rebase gets a lot of bad press, and I think this is partly due to misunderstanding; it's like a dog that people kick, it bites someone, then it gets put down. So let's try to understand it, and then play fetch with it or something instead.

Most people are familiar with merging as an integration strategy. The problem with merging is that it creates a non-linear history polluted with merge commits. The murky history manifests as ambiguity in tools like git log, and as increased difficulty in using git bisect; it generally makes archaeological spelunking and working with history harder. Merges do have some benefits, however: they naturally work well with pull requests, and they preserve branch history (which may or may not be valuable to you). For a full discussion of the pros and cons, check out this excellent article.

Merge and Rebase Workflows

Starting steps: create a feature branch off the tip of an up-to-date, upstream master

 $ git checkout master
 $ git pull
 $ git checkout -b feature-branch

State of Play
      C          feature-branch
     /
A---B---D        master

A normal merge workflow
... do some work, stage it
 $ git commit -m "C"
 $ git checkout master
 $ git merge feature-branch
 $ git push origin master
      C            feature-branch
     /   \
A---B---D---M      master (M is the new merge commit)

A normal rebase workflow
... do some work, stage it
 $ git commit -m "C"
 $ git rebase master
 $ git push origin feature-branch:master
Post Rebase
          C'       feature-branch
         /
A---B---D        master

The Rebase Workflow Analysed

The rebase command above takes master, forward-ports C on top of it, and sets the result to be feature-branch; this can be thought of as 'rebasing' the feature commit onto the updated master. Through this process C is rewritten with a different SHA: it is now a different commit, C'. This can be a sticking point in understanding, but an important property of a commit is that it is immutable. Rebases rewrite commits, whereas merges don't.

The git push means: push the ref before the colon to the ref after the colon on the remote origin. If your git push is rejected as a non-fast-forward, either you are doing something wrong, or someone has pushed in the time since you pulled, the blighters; re-pull master and rebase onto it again.
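The rejection-and-recovery dance above can be sketched end-to-end in a throwaway repository. Everything here is invented for illustration (the temp directory, the two clones and the commit messages), and a local bare repository stands in for the real remote origin:

```shell
# Sketch only: simulate a rejected non-fast-forward push, then recover
# by re-fetching master and rebasing onto it.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git -c init.defaultBranch=master init -q --bare origin.git

# First clone seeds upstream master with commit A.
git -c init.defaultBranch=master clone -q origin.git alice
git -C alice -c user.name=a -c user.email=a@example.com commit -q --allow-empty -m "A"
git -C alice push -q origin master

# Second clone branches off master and commits C on feature-branch.
git -c init.defaultBranch=master clone -q origin.git bob
git -C bob checkout -q -b feature-branch
git -C bob -c user.name=b -c user.email=b@example.com commit -q --allow-empty -m "C"

# Meanwhile D lands on upstream master, so bob's push is now rejected.
git -C alice -c user.name=a -c user.email=a@example.com commit -q --allow-empty -m "D"
git -C alice push -q origin master
git -C bob push -q origin feature-branch:master 2>/dev/null || echo "push rejected (non-fast-forward)"

# Recovery: re-fetch master and rebase C on top of it, producing C'.
git -C bob fetch -q origin
git -C bob -c user.name=b -c user.email=b@example.com rebase -q origin/master
git -C bob push -q origin feature-branch:master

# Upstream master is now the linear history A, D, C'.
git -C origin.git log --format=%s master
```

The final log on origin contains no merge commits: just A, then D, then the rewritten C', which is exactly the clean linear history we were after.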

Consider Dry Run

If you are worried about what you are committing, note that you can always see what you are about to push using the --dry-run argument to git push, which stops short of sending the actual update; you can then run git log on the SHA range it outputs.

 $ git push origin feature-branch:master --dry-run
    d4f3294..6c53234 feature-branch -> master
 $ git log -U3 d4f3294..6c53234

On a personal note, I generally prefer rebase over merge for integrating feature branches; I am a bit keen on a nice clean history. The importance of well-kept history is a topic for a future post.

Sunday 11 December 2016

Coupling Under Christmas Trees

So I have been reading a couple of programming books recently. Reading reviews of these books, I encountered a common complaint: that they 'state the obvious' or 'say things one already knows'.


Some Amazon Reviews

'If you have 7+ years of java development dont buy it... Most of what he has said took me some time to work out for myself' - Clean Code
'A lot of common sense and stuff a seasoned programmer probably already know' - Clean Code
'What little it does say has been said several times before' - Pragmatic Programmer

I have also found this to be the case; however, I do not think it necessarily detracts from the work. Having been programming for several years, you tend to pick up common patterns, smells and designs; they tend to 'fall out' of the code and into use over time, forming implicitly in your monkey brain. There is a risk, however, that they become malformed, having formed in isolation from wider discussion. There is a danger of developing bad habits, and you know what they say about old dogs, never mind old programmers.
'The limits of my language mean the limits of my world' - Wittgenstein

Covering these topics can serve to cement your understanding and formalise definitions of concepts or patterns which perhaps you had already grasped by a thread, but had not fully unwound in deeper analysis. Indeed, formalisation is an important process: a common language and nomenclature is a prerequisite for effective discussion and deeper analysis. Also, many of the most important truths that merit discussion are self-evident ones, like 'tests are good' and 'duplication is bad'.

The Bit In Which I Try and Justify The Slightly Erotic Title

Cohesion and coupling are two such concepts. I have found that a good understanding of these topics, alongside their conscious application, can help you to write more extensible and maintainable software. Since we spend the vast majority of our time maintaining rather than creating, optimising for maintenance can save you a lot of hassle down the line.



In The Pragmatic Programmer, a helicopter's controls are used as an example of a tightly coupled system. Indeed, helicopters are considered considerably more difficult to fly than aeroplanes, as constant action is required to keep them in the air; left to their own devices, they will quickly plummet. Sound like any software systems you know? Every control input creates a side effect that requires the application of another control to resolve, and that correction requires further corrections, and so on, ad infinitum.

'I never liked riding in helicopters because there's a fair probability that the bottom part will get going around as fast as the top part' - Lt. Col. John Wittenborn, USAFR

Coupling is frequently expressed in terms of orthogonality, a term derived from mathematics but essentially a grown-up synonym for de-coupled; you too can use this term to befuddle new developers.

Working with a non-orthogonal (tightly coupled) system, any change can lead to a host of side effects, reducing confidence in your ability to make changes; this can result in inertia and paralysis, as you do not fully comprehend the effects of potential changes. This is exacerbated by the fact that such a system is likely not well tested, because the coupling is prohibitively tight, rendering isolation of components for unit testing difficult. The resultant dearth of test coverage further contributes to a deeper, more pungent rot. Maintaining such a system is akin to some sort of feverish festive nightmare in which you are trapped in an eternal game of Jenga, but you are alarmed to discover that your hands have turned into claws.

'Talk to friends not strangers' - Clean Code

Orthogonality is often also expressed in reference to the 'Law of Demeter', which outlines design guidelines useful for designing an orthogonal system:
- Each unit should have only limited knowledge about other units: only units "closely" related to the current unit.
- Each unit should only talk to its friends; don't talk to strangers.
- Only talk to your immediate friends.

Something most developers have at least some understanding of is the train wreck. In addition to being generally bad practice, the presence of train wrecks is frequently a sign of a breach of the Law of Demeter. Train wrecks are long chained calls, symptomatic of overly friendly objects that have no respect for each other and are privy to each other's dirty implementation details.

To borrow an example from Clean Code:

final String outputDir = ctxt.getOptions().getScratchDir().getAbsolutePath();

Again, experienced developers will certainly have encountered this 'code smell', and will likely have realised that it is a rotten one. Now, this example may benefit from being broken into several lines, but our code here should certainly not have intimate knowledge of the implementation of scratchDir. Suppose this code is run on a test system that does not have access to the production file structure: our code is tightly coupled to the implementation of scratchDir, and any change to the interface of scratchDir will necessitate changes here and in the multitude of other distantly related callers.
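One possible shape of a fix (a sketch only: Ctxt, Options and ScratchDir echo the Clean Code example, but scratchFilePath is a method invented here for illustration, not a real API) is to let callers ask their immediate friend to do the work, rather than digging through the object graph themselves:

```java
class ScratchDir {
    private final String absolutePath;
    ScratchDir(String absolutePath) { this.absolutePath = absolutePath; }
    String getAbsolutePath() { return absolutePath; }
}

class Options {
    private final ScratchDir scratchDir;
    Options(ScratchDir scratchDir) { this.scratchDir = scratchDir; }
    ScratchDir getScratchDir() { return scratchDir; }
}

class Ctxt {
    private final Options options;
    Ctxt(Options options) { this.options = options; }
    Options getOptions() { return options; }

    // Demeter-friendly: the navigation happens in one place, inside Ctxt,
    // so callers no longer depend on Options or ScratchDir at all.
    String scratchFilePath(String fileName) {
        return options.getScratchDir().getAbsolutePath() + "/" + fileName;
    }
}

public class DemeterExample {
    public static void main(String[] args) {
        Ctxt ctxt = new Ctxt(new Options(new ScratchDir("/tmp/scratch")));

        // Train wreck: coupled to three levels of implementation detail.
        String outputDir = ctxt.getOptions().getScratchDir().getAbsolutePath();

        // One immediate friend, one question.
        String path = ctxt.scratchFilePath("out.class");

        System.out.println(outputDir); // prints /tmp/scratch
        System.out.println(path);      // prints /tmp/scratch/out.class
    }
}
```

Now a change to how the scratch directory is implemented ripples no further than Ctxt, and test systems can substitute a Ctxt without reproducing the production file structure.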

The Model, View, Controller (MVC) design pattern is a well-known method for increasing maintainability via decoupling and the breaking out of responsibilities.

Christmas Trees

Imagine if you never threw away a Christmas decoration, and instead hung them all on your Christmas tree; you featured on one of those Hoarders TV shows on Channel 5, it's the Christmas special. So you kept all the crappy decorations made out of tissue paper that you made in primary school, and the top of your tree suffers angelic congestion issues the envy of heaven. Under layer upon layer of decaying tinsel, the poor old tree strains under the additional weight of your magpie-like compulsion.

You may have had the displeasure of encountering an object or class that bears a resemblance to such a tree, one where every bauble of information is pinned onto it until it begins to creak and strain under the complexity. Every time you receive a request to add extra functionality in this area, you groan, hold your breath, do it, wince, stack the technical debt higher, defer the refactor, and try really hard to forget.

The Single Responsibility Principle

It is likely that such an overburdened class has many responsibilities and is in violation of the SRP (Single Responsibility Principle). This states that a class should have only one responsibility, and thus generally only one reason to change; this rule helps us keep class size reasonable. So, our class should have one over-arching, high-level responsibility, with complex logic embodied in collaborating classes composed into, or used in conjunction with, our class. The SRP helps us reason in terms of the language-agnostic concept of responsibilities, in contrast to line count. This deserves its own post.
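A minimal sketch of the principle, with names invented for illustration: an Invoice that previously computed totals, formatted itself and saved itself is split so that each class has one responsibility and one reason to change:

```java
import java.util.List;
import java.util.Locale;

// Invoice now knows only its own numbers.
class Invoice {
    private final List<Double> lineAmounts;
    Invoice(List<Double> lineAmounts) { this.lineAmounts = lineAmounts; }
    double total() {
        return lineAmounts.stream().mapToDouble(Double::doubleValue).sum();
    }
}

// Presentation is a separate responsibility: change the report format,
// and only this class changes.
class InvoiceFormatter {
    String asText(Invoice invoice) {
        return String.format(Locale.ROOT, "TOTAL: %.2f", invoice.total());
    }
}

// Persistence is a separate responsibility again: swap the store,
// and neither Invoice nor InvoiceFormatter is touched.
class InvoiceRepository {
    void save(Invoice invoice) { /* write to some store */ }
}

public class SrpExample {
    public static void main(String[] args) {
        Invoice invoice = new Invoice(List.of(9.99, 5.01));
        System.out.println(new InvoiceFormatter().asText(invoice)); // prints TOTAL: 15.00
    }
}
```

Each class now has a single reason to change: new tax rules touch Invoice, a new report layout touches InvoiceFormatter, a new database touches InvoiceRepository.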

Low Cohesion and its Effects

The presence of tight coupling and low cohesion is usually a good indicator of the benefits of a refactor, and represents technical debt, but the subject of refactoring also deserves its own post! For now, let's try to define cohesion. Cohesion is a measure of how well an object hangs together as a logical whole. In our Christmas tree example, we note that the parts bear little relevance to each other, colours and styles clashing garishly. One informal measure relates a class's methods to its instance variables: a class whose methods each use only a small subset of its variables is said to have low cohesion, while a class in which every variable is used by every method is said to be maximally cohesive. Note that maximum cohesion is rarely a goal, but low cohesion is undesirable.
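A small sketch of that measure, using invented names: the Employee class below has two clusters of fields that never interact, a classic low-cohesion smell, while the two classes it splits into each have every method using every field:

```java
// Low cohesion: payroll methods ignore the address fields and vice versa,
// so the class is really two unrelated classes stapled together.
class Employee {
    double hourlyRate; double hoursWorked;  // used only by grossPay()
    String street; String city;             // used only by mailingLabel()
    double grossPay()     { return hourlyRate * hoursWorked; }
    String mailingLabel() { return street + ", " + city; }
}

// Cohesive replacement: every method uses all of its class's fields.
class Payroll {
    private final double hourlyRate, hoursWorked;
    Payroll(double hourlyRate, double hoursWorked) {
        this.hourlyRate = hourlyRate; this.hoursWorked = hoursWorked;
    }
    double grossPay() { return hourlyRate * hoursWorked; }
}

class Address {
    private final String street, city;
    Address(String street, String city) { this.street = street; this.city = city; }
    String mailingLabel() { return street + ", " + city; }
}

public class CohesionExample {
    public static void main(String[] args) {
        System.out.println(new Payroll(10.0, 8.0).grossPay());               // prints 80.0
        System.out.println(new Address("1 Main St", "London").mailingLabel()); // prints 1 Main St, London
    }
}
```

Payroll and Address can now change, be tested and be reused independently, which is precisely what the stapled-together Employee prevented.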

Similarly to a tightly coupled class, in an uncohesive class it is often difficult to make changes with confidence, fully aware of the knock-on effects, and to maintain comprehensive tests. The difficulty in testing becomes apparent if you follow a thought exercise.

Take an uncohesive class and replace one of its methods with a function call in which each class variable is an argument, and the return value is a composite of all the class attributes plus the method's own return value. Now attempt to write exhaustive unit tests for this function over the range of possible inputs and outputs. Hard, eh? We can see that the potential for side effects of such methods is large, and that testing them is a combinatoric disaster. In addition, it is likely impossible to keep a full understanding of such a class in your head; thus we can see how low cohesion is detrimental to effective maintenance and extensibility.

Bit of a rambling one, but food for thought; it links into some other topics I'd like to discuss. Anyway, I hope that this can help you to write more maintainable and extensible code, or at the least provoke thought and discussion that does so.

Until next time