Project Archival
This document outlines the best practices for archiving an engineering project at Cancer Research UK.
The following practices are recommended.
Rationale
As business requirements and software best practices change, some projects will become redundant over time. As part of our workflow, we should ensure that redundant projects are flagged and that their removal is actively planned. Failing to do so can leave an unneeded project's code or infrastructure in place, causing confusion for whoever maintains it. In addition, actively archiving unnecessary assets will improve our sustainability.
Typically, a software project becomes redundant when another project, either internal or external to CRUK, solves the same problem more effectively. When this happens, the focus tends to be on fleshing out the new solution, and the previous solution receives attention mainly as a source of data to extract. Shifting focus to the new solution without a plan to properly phase out the old one is what keeps the old project's artifacts around.
Flagging a legacy project for archival
This step is crucial to kicking off the whole process, and is often missed when a new solution to a problem comes along. Once a proof of concept for the new solution has been approved for further development, with the aim of ultimately deploying it to production, an Epic to retire the legacy system should be raised in the ticket backlog. This should ensure that the work required to properly shut down the legacy project is visible and treated as part of the project team's work.
Documentation
Once a project has been identified as a legacy project, its documentation should make this explicit. Any documentation relating to the legacy project should clearly link to the equivalent documentation for the new project. Additionally, any GitHub repositories involved in the legacy project should have their description updated to explicitly label the project as '[LEGACY]', preferably at the start of the description.
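When several repositories are involved, relabelling them could be scripted. Below is a minimal sketch in Python, assuming the GitHub REST API's "get a repository" and "update a repository" endpoints (`GET`/`PATCH https://api.github.com/repos/{owner}/{repo}`) and a token with admin rights on the repository. The function names are hypothetical, and only the standard library is used.

```python
import json
import urllib.request
from typing import Optional

LEGACY_LABEL = "[LEGACY]"


def legacy_description(description: Optional[str]) -> str:
    """Return the description with the [LEGACY] label at the start.

    Idempotent: a description that already starts with the label is
    returned unchanged, so the script is safe to re-run.
    """
    text = (description or "").strip()
    if text.startswith(LEGACY_LABEL):
        return text
    return f"{LEGACY_LABEL} {text}".strip()


def label_repository_as_legacy(owner: str, repo: str, token: str) -> str:
    """Fetch a repository's description, prefix it with [LEGACY] and save it.

    Returns the description GitHub reports after the update.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}"
    headers = {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }

    # Read the current description so existing text is preserved.
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as resp:
        current = json.load(resp).get("description")

    # Send only the field being changed.
    body = json.dumps({"description": legacy_description(current)}).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={**headers, "Content-Type": "application/json"},
        method="PATCH",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)["description"]
```

For example, a repository described as "Donation API" would become "[LEGACY] Donation API". Keeping the label logic in a separate, idempotent function means the script can be re-run safely across a list of repositories.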
Data
While a project may no longer be required, the data it used may still be needed for various reasons. How a legacy project's data is handled will vary significantly depending on that project's context. That said, there are a few guiding principles that should be followed:
- If the data is intentionally being retained for reasons other than support, any personally identifiable information in it should be removed or anonymised.
- When data is archived, any database queries that were run to retrieve it should be archived alongside it. This can help future engineers identify whether anything was overlooked during the archival process.
- Any archived data should have a documented retention policy that states the conditions under which the data is no longer required. Once those conditions are met, the data should be promptly deleted.
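One way to keep archived data, the queries that produced it and its retention policy together is a small manifest saved next to the data export. Below is a minimal sketch in Python, assuming the data is exported as rows of dictionaries and that the PII columns are known in advance. The `ArchiveManifest` schema and `redact_pii` helper are hypothetical, expressing the retention condition as a single date is a simplification, and dropping columns is only one sanitisation approach (masking or pseudonymising may suit some datasets better).

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import Iterable


@dataclass
class ArchiveManifest:
    """Metadata saved alongside an archived dataset (hypothetical schema)."""

    dataset: str
    # The exact queries that were run to extract the archived data.
    queries: list[str]
    # Why the data is being kept, e.g. an audit requirement.
    retention_reason: str
    # Simplification: the retention condition is a single date.
    delete_after: date
    # PII columns that were removed before archiving.
    redacted_columns: list[str] = field(default_factory=list)

    def is_due_for_deletion(self, today: date) -> bool:
        """Return True once the retention period has ended."""
        return today > self.delete_after

    def to_json(self) -> str:
        """Serialise the manifest so it can be stored next to the data export."""
        payload = asdict(self)
        payload["delete_after"] = self.delete_after.isoformat()
        return json.dumps(payload, indent=2)


def redact_pii(rows: Iterable[dict], pii_columns: set[str]) -> list[dict]:
    """Return copies of the rows with the given PII columns dropped.

    The input rows are not modified.
    """
    return [
        {key: value for key, value in row.items() if key not in pii_columns}
        for row in rows
    ]
```

Recording the queries and the redacted columns gives a future engineer what they need to spot an oversight without access to the original system, and a scheduled job could call `is_due_for_deletion` to flag data that should now be deleted.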
Infrastructure
Where applicable, any servers relating to the legacy system should have redirects enabled that send supporters and CRUK staff to the preferred system. These redirects may need to cover non-obvious URLs, so every domain through which a project is reachable should be listed in its documentation, even if that domain isn't intended for public use (for example, a default domain provided by AWS). This ensures that we serve a consistent user journey.
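A redirect for a legacy service can map every documented domain, including non-public ones, to the preferred system while keeping the path and query string, so deep links still land somewhere sensible. Below is a minimal sketch using only Python's standard library (`http.server`). The domain names and base URL are hypothetical placeholders. In practice the same rule might instead live in a load balancer, CDN or web server configuration.

```python
from http.server import BaseHTTPRequestHandler
from typing import Optional

# Hypothetical: every domain the legacy project answers on, as listed in its
# documentation, including a non-public default cloud-provider domain.
LEGACY_DOMAINS = {
    "legacy.example.org",
    "www.legacy.example.org",
    "legacy-app.eu-west-2.elb.amazonaws.com",
}

# Hypothetical base URL of the preferred system.
PREFERRED_BASE_URL = "https://new.example.org"


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the URL to redirect to, or None if the host is not a legacy domain.

    Any port is stripped from the Host header; the path and query string
    are preserved.
    """
    hostname = host.split(":", 1)[0].lower()
    if hostname not in LEGACY_DOMAINS:
        return None
    return PREFERRED_BASE_URL + path


class LegacyRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # self.path includes the query string, e.g. "/donate?ref=email".
        target = redirect_target(self.headers.get("Host", ""), self.path)
        if target is None:
            # A request on an unlisted domain suggests the documented
            # domain list is incomplete.
            self.send_error(404, "Unknown host")
            return
        # 301 signals a permanent move to browsers and search engines.
        self.send_response(301)
        self.send_header("Location", target)
        self.end_headers()

    do_HEAD = do_GET
```

The handler could be served locally with `http.server.HTTPServer(("", 8080), LegacyRedirectHandler).serve_forever()`. Returning a 404 for unknown hosts, rather than redirecting everything, makes gaps in the documented domain list visible instead of hiding them.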