A graph showing the relationship between service size and the typical number of services affected by a change |
This overhead, along with many others such as frequent re-implementation of common functionality, is an accepted cost of using microservices as services are naturally more granular, composable and reusable.
There are two accepted processes for version controlling microservices, the 'monorepo' and the repository per service 'manyrepo' approach. With the monorepo, all services are kept in the same source control repository. When studying microservices from a theoretical perspective it seems logical to add yet another layer of separation between services by adopting manyrepo but there are some real issues and overheads associated with doing so, which your author has recently been experiencing first hand! Below I have attempted to communicate the relative pros and cons for the 'manyrepo' approach.
Pros
- The separation of repos with explicit dependencies upon one another makes the scope of any commit easy to reason about. That is, to say which services a commit affects and requires deployment of. With a single repo 'the seams' between services aren't as well defined / bounded. This makes it harder to release services independently with confidence, thus hampering the continuous deployment of independent releases that is considered crucial to microservices based architectures. That is to say the monorepo, to an extent encourages lock-step releases.
- It scales better with large organisations of many developers, it ensures that developer's version control workflows do not become congested.
- It helps repos stay smaller and so keeps inbound network traffic for pulls smaller for developers providing a faster workflow.
Cons
- With multiple repos changelogs become fragmented across multiple pull requests making it harder to review, deploy and roll back a single feature. Indeed, deploying a feature can require deployments across mulitiple services, making it more difficult to track the status of features through environments. There is a lot of version control overhead here.
- Making common changes to each service. Making a certain change to N services requires N git pushes.
- It is harder to detect API breakage as the tests in the consumer of the API will not run until a change to that service is pushed. Note that this can be resolved with 'contract testing', that is, testing of the API of a service within the service itself, which you do not get for free in the consumer.
- It is more difficult to build a continuous integration system where all dependencies are not immediately available via the same repository.
This a very interesting topic that definitely warrants further discussion and debate, indeed the the choice of monorepo vs manyrepo has interesting implications for release strategy and whether supporting varied permutations of co-existing versions is worth the very real and visible overhead that it incurs.
Lots of the workflow pros of the manyrepo don't really come into affect until you have a very large engineering staff which most companies won't / are unlikely to ever have. Also google have solved some of these problems for monorepos by doing fancy stuff with virtual filesystems.
Personally I considered the manyrepo to be superior until I had to experience the increased developer overhead first hand. However it remains to be seen if this encouragement to think about services as separate distinct entities is worth it in the long run.