Monorepo


This document outlines best practice for using monorepos at CRUK for JavaScript and TypeScript projects.

Info: The following content is recommended.

Rationale

The rationale for using the monorepo approach:

Engineers can change multiple components in a single PR. Refactors and adjustments to one package can be applied across all packages in a single commit/PR.
All product components and packages are stored in a single place, making source code easier to find.
Works well alongside a microservice architecture where each component is its own package within a monorepo.
Testing of dependencies can be done in the same PR.
Keeping the monorepo at the product level allows git history against a single product and keeps the repo a manageable size. It also allows access restriction at the product level.
Commands can be run across all packages in monorepo workspaces. This saves going into every directory and running commands (in the right order).

For using workspaces and monorepo tools:

All dependencies can be retrieved for every project in one command. This saves time and space.
Monorepo tools offer several advantages over traditional workspaces, including result caching, dependency visualisation, distributed task execution, versioning and publishing, and editor integrations.

Description

A monorepo is a software development strategy where the source code for multiple projects and packages is stored in a single git repository.

At CRUK, monorepos are applied at the product level. This means that the frontend, backend, infrastructure and shared packages for a given product are stored within the same repository. The monorepo approach should be used for new products and rewrites.

Choosing the Right Repository Strategy

While monorepos offer many advantages, they also come with challenges and considerations. Here are some recommendations on when each approach is appropriate:

Monorepos: Ideal for projects with multiple components that need to be developed and maintained together. They work well for teams making frequent changes across components and ensuring consistency. However, as the monorepo grows, it may require additional tools to manage dependencies, builds, and CI/CD pipelines.
Polyrepos: Polyrepos use multiple repositories for different components or services. Each repo is managed separately, making it easier to handle independent projects without affecting others. This approach can reduce the complexity of managing a large codebase and allow teams to work more autonomously.
Microrepos: Microrepos are tiny, single-purpose repositories. They are ideal for small projects with few dependencies, making them easy to develop and deploy. This approach can simplify the development and deployment process for individual services but may lead to challenges in managing multiple repositories and ensuring consistency across them.

Workspaces

Workspaces allow multiple projects within a monorepo to reference each other and reflect code changes instantly. A monorepo can use different kinds of workspaces to manage its modules. The most prominent are npm, Yarn and pnpm workspaces.

A good comparison of these can be found here

With this setup, engineers only need to run npm ci, yarn install or pnpm install in the root directory to retrieve all dependencies for the packages in the monorepo.
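As a rough illustration of the setup described above, the sketch below models the root package.json of a monorepo that uses npm or Yarn workspaces (pnpm declares its workspaces in a separate pnpm-workspace.yaml file instead). The product name and the packages/* glob are assumptions chosen for the example, not taken from an actual CRUK repository.

```typescript
// Hypothetical shape of the root package.json in a workspace-based monorepo.
interface RootManifest {
  name: string;
  private: boolean;
  workspaces: string[];
}

// Assumed values, for illustration only.
const rootManifest: RootManifest = {
  name: "example-product",
  private: true, // the root package itself is not meant to be published
  workspaces: ["packages/*"], // each directory under packages/ is a workspace
};

// Serialise the object as it would appear in package.json.
const rootManifestJson: string = JSON.stringify(rootManifest, null, 2);
console.log(rootManifestJson);
```

With a manifest like this at the root, a single install in the root directory resolves the dependencies of every package matched by the glob.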

Scaling Monorepos

Monorepos can scale in two ways, depending on the architecture and tools used:

Vertical Scaling: Similar to monolithic apps, monorepos can scale by adding more resources (e.g., CPU, memory) to a single machine. This has limits based on the machine's capacity.
Horizontal Scaling: Monorepos can also scale by spreading the workload across multiple machines. This requires additional tools and infrastructure for managing distributed builds, tests, and deployments. Tools like Nx and Rush can help by enabling distributed task execution and caching.

Horizontal scaling improves performance, speeds up build times, and uses resources more efficiently. However, the effort needed depends on how loosely or tightly coupled the monorepo packages are. Loosely coupled packages are easier to scale horizontally, while tightly coupled packages may require more work to manage dependencies.
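To make the idea of horizontal scaling more concrete, the sketch below spreads a set of independent tasks across several machines using a simple greedy rule: take the longest task first and give it to the machine with the least work so far. This is only an illustration under assumed inputs. The task names and durations are hypothetical, and tools such as Nx and Rush use their own scheduling strategies.

```typescript
interface Task {
  name: string;
  seconds: number; // estimated duration (assumed, for illustration)
}

interface Agent {
  tasks: string[];
  totalSeconds: number;
}

// Assign each task, longest first, to the agent with the least work so far.
function distribute(tasks: Task[], agentCount: number): Agent[] {
  if (agentCount < 1) {
    throw new Error("agentCount must be at least 1");
  }
  const agents: Agent[] = Array.from({ length: agentCount }, () => ({
    tasks: [],
    totalSeconds: 0,
  }));
  const longestFirst = [...tasks].sort((a, b) => b.seconds - a.seconds);
  for (const task of longestFirst) {
    const leastLoaded = agents.reduce((min, agent) =>
      agent.totalSeconds < min.totalSeconds ? agent : min
    );
    leastLoaded.tasks.push(task.name);
    leastLoaded.totalSeconds += task.seconds;
  }
  return agents;
}

// Hypothetical packages and test durations.
const plan = distribute(
  [
    { name: "frontend:test", seconds: 120 },
    { name: "backend:test", seconds: 90 },
    { name: "shared:test", seconds: 30 },
    { name: "infrastructure:test", seconds: 60 },
  ],
  2
);
console.log(plan);
```

With these inputs each of the two machines ends up with 150 seconds of work, compared with 300 seconds on a single machine. The sketch assumes the tasks are independent; tightly coupled packages would also need their dependency order respected, which is where the extra effort mentioned above comes from.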

Tools

A monorepo is set up simply by having multiple packages within a single git repository. The monorepo can also include workspaces and, in most cases, this is enough for building and maintaining the packages. There are, however, monorepo tools that can aid development. These tools offer dependency management, caching and versioning, amongst other benefits (detailed in the rationale).

Current trends of various JS/TS monorepo tools are available here.

The two most widely used tools at CRUK for managing monorepos are Nx and Rush. Both are described in more detail below, and example repos are listed in the Examples section.

Specific tool use is not enforced and is provided as a guide; it is up to the product team to decide what tool to use.

Nx

Instructions on how to add Nx to an existing monorepo can be found at the link above.

The repository setup for an Nx monorepo (by default) is:

nx-repo/
  packages/
  package.json
  nx.json

Here we have a package.json at the root with repo-wide dependencies (e.g. eslint, prettier, lerna). The nx.json file holds the configuration for Nx itself, such as where packages are located, versioning and the npm client. Finally, the packages/ directory contains a sub-directory for each package in the repo.

Commands can be run across all packages. This is useful, for instance, for GitHub Actions to run unit tests across all packages via nx run-many --target=test.
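Running a command across all packages only works reliably if each package is processed after the packages it depends on. The sketch below shows one way to derive such an order with a depth-first topological sort. It illustrates the general idea rather than how Nx is implemented, and the package names and dependency map are hypothetical.

```typescript
// Maps each package to the workspace packages it depends on (hypothetical).
type DependencyMap = Record<string, string[]>;

// Returns the packages in an order where dependencies come before dependants.
function buildOrder(deps: DependencyMap): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (pkg: string): void => {
    if (state.get(pkg) === "done") return;
    if (state.get(pkg) === "visiting") {
      throw new Error(`Circular dependency involving ${pkg}`);
    }
    state.set(pkg, "visiting");
    for (const dep of deps[pkg] ?? []) {
      visit(dep);
    }
    state.set(pkg, "done");
    order.push(pkg);
  };

  for (const pkg of Object.keys(deps)) {
    visit(pkg);
  }
  return order;
}

// Assumed example: two applications that both depend on a shared package.
const exampleOrder = buildOrder({
  frontend: ["shared"],
  backend: ["shared"],
  shared: [],
});
console.log(exampleOrder); // shared is listed before frontend and backend
```

A circular dependency between packages has no valid order, so the sketch throws instead of guessing one.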

Rush

Rush is another monorepo tool developed by Microsoft. It has similar features to Lerna and is a professional, fast solution for managing multiple packages within a monorepo.

Currently, Rush has issues with both npm (v6 and higher) and Yarn workspaces, detailed over here. It is best suited to pnpm workspaces.

Details on setting up a new Rush monorepo can be found here.

The repository setup for a Rush monorepo (by default) is:

rush-repo/
  package.json
  rush.json

Here we have a package.json at the root with repo-wide dependencies (e.g. eslint, prettier, lerna). The rush.json file holds the configuration for Rush itself, such as where packages are located, versioning and the npm client. Rush has no default convention for package location, so the team can choose one. It may be worth following the same pattern as Lerna by using a packages/ directory, to align Lerna and Rush monorepos across the organisation.

Rush can run commands against all packages using custom commands.

Other options

Lerna

The Nx team currently maintains Lerna, and Lerna uses Nx for most operations under the hood.

The general advice is to set up Nx first on new projects and consider using Lerna only if additional features like versioning and publishing are required.

Existing projects with Lerna can continue using it.

Turborepo

Turborepo (Turbo) is a rising contender backed by Vercel, the company behind Next.js. It doesn't include all the features offered by Nx, but without that added complexity it is relatively straightforward to set up. Setup instructions are included here.

The repository setup for a turbo monorepo (by default) is:

turbo-repo/
  package.json
  turbo.json

turbo.json defines all the configuration required for Turbo, detailed here.

Turborepo supports both monorepos and polyrepos. It has not yet been used in CRUK projects because it is relatively new (released in December 2021), but its use is not discouraged.

No build tools

Monorepo tools offer certain features like caching, parallel operations and publishing that traditional workspaces don't. If build tools don't benefit a project, you might consider excluding them altogether and relying on the capabilities of npm/yarn/pnpm workspaces instead.
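As an illustration of relying on workspaces alone, the sketch below builds the command that runs a single package.json script in every workspace for each package manager. It assumes npm 7 or later, Yarn 1 (classic) and pnpm; the helper name runInAllWorkspaces is hypothetical and other versions may need different flags.

```typescript
type PackageManager = "npm" | "yarn" | "pnpm";

// Builds the command that runs one package.json script in every workspace.
// Assumes npm 7 or later, Yarn 1 (classic) and pnpm respectively.
function runInAllWorkspaces(manager: PackageManager, script: string): string {
  switch (manager) {
    case "npm":
      // --if-present skips workspaces that do not define the script
      return `npm run ${script} --workspaces --if-present`;
    case "yarn":
      return `yarn workspaces run ${script}`;
    case "pnpm":
      return `pnpm --recursive run ${script}`;
  }
}

const managers: PackageManager[] = ["npm", "yarn", "pnpm"];
for (const manager of managers) {
  console.log(runInAllWorkspaces(manager, "test"));
}
```

These commands cover running scripts across packages, but not the caching or publishing features mentioned above, which is the trade-off to weigh when leaving out a dedicated tool.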

Examples

This repository is set up as a monorepo using Lerna with Yarn Workspaces. Activity Management and PersonHub are also set up as monorepos, using Nx and Rush respectively. ECTF currently uses Nx with Yarn workspaces.

Lerna with Yarn Workspaces: https://github.com/CRUKorg/engineering-guidebook
NX with PNPM workspaces: https://github.com/CRUKorg/activity-management
Rush: https://github.com/CRUKorg/cruk_personhub_integration
Nx with Yarn Workspaces: https://github.com/CRUKorg/finding-clinical-trials

References & Further Reading

Wikipedia Monorepo (https://en.wikipedia.org/wiki/Monorepo)
Why monorepo (https://rushjs.io/pages/intro/why_mono/)
Why Lerna and Yarn Workspaces is a Perfect Match for Building Mono-Repos (https://doppelmutzi.github.io/monorepo-lerna-yarn-workspaces/)
Workspaces in Yarn (https://classic.yarnpkg.com/blog/2017/08/02/introducing-workspaces/)
Mono Repository Tool Comparison (https://gist.github.com/morewry/d3419a38d74590493042544d4afa49a7)
Javascript monorepo tooling (https://dev.to/hipstersmoothie/javascript-monorepo-tooling-48b9)
