Logging
Core principles
The most important part of logging is capturing errors: when something breaks we want to know what broke and how to fix it. When something goes wrong on the server, logging is fairly straightforward; not only do we own the servers or lambdas, but we often have out of the box solutions for tracking errors. It is a bit more complicated on the frontend. If an error happens in the browser we need to know as much about it as possible so we can attempt to find out what happened and fix the issue. However, we are dealing with a few more unknowns: what browser is the user on, do they have an issue with their connection speed, how are we tracking caught and uncaught errors, and how will stack traces work if the code is minified?
The simple answer is that we are currently sending everything to Datadog RUM (Real User Monitoring). Datadog can capture a whole bunch of logs, but it's the RUM part that is concerned with the user's frontend experience.
https://app.datadoghq.eu/rum/list
Setting up Datadog
Datadog should work with your SSO login. If you have any issues with access please contact the cloud and hosting team for support, or ping someone on the #ask-sysops Slack channel.
In the side panel of Datadog there is a section called UX Monitoring; click on RUM Applications. If you can't see the add application button then you will need to find someone with the correct access to set your app up for you.
Once the app has been added to Datadog, click on 'Edit Application' and you will see, in the "Instrument your application" section, the code snippet that you need to add into your app to get Datadog to work. Make sure that you cut the correct holes in your CSP in order for Datadog to connect: https://docs.datadoghq.com/real_user_monitoring/faq/content_security_policy/.
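As a rough illustration, this is the sort of CSP header you might end up with in a Next.js app. This is a minimal sketch assuming the EU Datadog site and Session Replay (which needs `worker-src blob:`); check the linked Datadog docs for the exact domains your setup needs.

```ts
// next.config.js — a minimal sketch, assuming the EU Datadog site (datadoghq.eu).
// Only the Datadog-relevant directives are shown here; your real CSP will have more.
module.exports = {
  async headers() {
    return [
      {
        source: "/:path*",
        headers: [
          {
            key: "Content-Security-Policy",
            value:
              "connect-src 'self' https://browser-intake-datadoghq.eu; worker-src 'self' blob:;",
          },
        ],
      },
    ];
  },
};
```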
![Datadog RUM code snippet](../../../static/img/best-practices/datadog_rum_snippet.png)
Once initialised in your frontend app (client side, not server side), Datadog should be enabled on `window` and you should be able to add error events. Here is an example service file for Datadog:
```ts
import { datadogRum } from "@datadog/browser-rum";
import { isBrowser } from "src/utils/browserUtils";
import { version } from "../../package.json";

const ENV_NAME = process.env.NEXT_PUBLIC_ENV_NAME || "";

export const setUpDataDog = () => {
  datadogRum.init({
    applicationId: "****-*******-******-******-********",
    clientToken: "************************",
    site: "datadoghq.eu",
    service: "****************",
    env: ENV_NAME,
    // Specify a version number to identify the deployed version of your application in Datadog
    version,
    sampleRate: 100,
    trackInteractions: true,
    defaultPrivacyLevel: "mask-user-input",
  });
  datadogRum.startSessionReplayRecording();
};

export const trackDataDogError = (err: Error, extraInfo?: object) => {
  if (!isBrowser) return;
  if (ENV_NAME === "development") return;
  datadogRum.addError(err, extraInfo);
};
```
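To get this running, call `setUpDataDog` once on the client. A minimal sketch for a Next.js pages router app follows; the import path for the service file is an assumption, so adjust it to wherever the file lives in your app.

```tsx
// src/pages/_app.tsx — a minimal sketch; the service file path is assumed.
import type { AppProps } from "next/app";
import { useEffect } from "react";
import { setUpDataDog } from "src/services/datadog";

const App = ({ Component, pageProps }: AppProps) => {
  useEffect(() => {
    // useEffect only runs in the browser, so this is safely client side only.
    setUpDataDog();
  }, []);
  return <Component {...pageProps} />;
};

export default App;
```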
Largest Contentful Paint
Largest Contentful Paint (LCP) is a metric that measures the time it takes for the largest content element on a webpage to become visible to the user. It is part of the Core Web Vitals, a set of performance metrics defined by Google to quantify user experience on the web.
A fast LCP ensures that the user can see and interact with the primary content quickly, leading to a better user experience.
Ideal LCP Values
- Good: Less than 2.5 seconds.
- Needs Improvement: Between 2.5 and 4.0 seconds.
- Poor: More than 4.0 seconds.
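If you want to sanity check LCP locally before trusting the dashboards, the standard `PerformanceObserver` browser API reports LCP entries. A minimal sketch:

```ts
// Log LCP candidates to the console using the standard PerformanceObserver
// web API — handy for local sanity checks alongside the Datadog dashboards.
const lcpObserver = new PerformanceObserver((entryList) => {
  for (const entry of entryList.getEntries()) {
    // startTime is when the current largest contentful element rendered, in ms.
    console.log("LCP candidate:", entry.startTime, entry);
  }
});
lcpObserver.observe({ type: "largest-contentful-paint", buffered: true });
```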
Setup
To set up monitoring for LCP, clone the following alert: CRUK Datadog Monitor and add the desired recipients.
This alert triggers when anomaly detection for LCP exceeds the threshold for 15 minutes or more.
Error boundaries
Datadog should capture and log all uncaught errors, but ideally all errors should be caught within your app. Wherever you catch an error you want to track it and send it to Datadog. You might have an app level or route level error boundary.
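For example, here is a minimal sketch of a React error boundary that reports to Datadog. The import path and fallback UI are assumptions; adapt them to your app.

```tsx
// A minimal sketch of an app or route level error boundary reporting to Datadog.
import React from "react";
import { trackDataDogError } from "src/services/datadog"; // assumed path

type Props = { children: React.ReactNode };
type State = { hasError: boolean };

class DatadogErrorBoundary extends React.Component<Props, State> {
  state: State = { hasError: false };

  static getDerivedStateFromError(): State {
    // Switch to the fallback UI on the next render.
    return { hasError: true };
  }

  componentDidCatch(error: Error, errorInfo: React.ErrorInfo) {
    // Send the caught render error to Datadog with React's component stack.
    trackDataDogError(error, { componentStack: errorInfo.componentStack });
  }

  render() {
    return this.state.hasError ? <p>Something went wrong.</p> : this.props.children;
  }
}

export default DatadogErrorBoundary;
```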
Sometimes frameworks have their own error boundaries; Next.js has its own error pages, so you might do something like this:
```tsx
// src/pages/_error
import { NextPageContext } from "next";
import { trackDataDogError } from "src/services/datadog"; // the service file above; adjust the path to suit
import ErrorPage from "src/components/ErrorPage";

type Props = {
  statusCode: number;
  err?: Error;
};

const Error = ({ statusCode, err }: Props) => {
  if (err) {
    trackDataDogError(err);
  }
  return <ErrorPage statusCode={statusCode} error={err} />;
};

Error.getInitialProps = ({ res, err }: NextPageContext) => {
  const statusCode = res ? res.statusCode : err ? err.statusCode : 404;
  return { statusCode, err };
};

export default Error;
```
One way of catching errors is with `try/catch` blocks, or by using `.catch()` blocks with thenables and async code, for example:
```ts
const oneTimeLogin: OneTimeLoginType | void = await fetch(
  `${OAUTH_BASE_URL}${finaliseRegistration.oneTimeLogin.urlPath}?noninteractive=true`,
  {
    credentials: "include",
  },
)
  .then((res) => res.json() as Promise<OneTimeLoginType>)
  .catch((err) => {
    trackDataDogError(err as Error, {
      component: "ActivityManagementJourney",
    });
  });
```
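The same pattern with `async/await` and `try/catch` might look like this. The endpoint, `Profile` type and component name are illustrative, not from a real journey.

```ts
// A sketch of the try/catch equivalent of the example above.
import { trackDataDogError } from "src/services/datadog"; // assumed path

type Profile = { id: string; name: string };

const loadProfile = async (): Promise<Profile | undefined> => {
  try {
    const res = await fetch("/api/profile", { credentials: "include" });
    return (await res.json()) as Profile;
  } catch (err) {
    // Track the failure so we hear about it, then let the UI fall back gracefully.
    trackDataDogError(err as Error, { component: "ProfilePage" });
    return undefined;
  }
};
```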
It's worth mentioning that our ESLint config should warn you if you have an unhandled promise, telling you that you need to set up a `.catch()`, and Datadog tracking is exactly the sort of thing you should be doing inside that catch block. Remember, with all caught errors, if you aren't sending them somewhere you are never going to know about them. If something has broken for our end users we definitely want to know about it.
Logging and GDPR
We have been advised by our GDPR representative that if a user is known to us, or if a user has submitted data, we are allowed to store personally identifiable information like user IDs in our logs. The caveat is that this must not include sensitive data like religious, political, sexual or health information.
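If you do attach a known user's ID to your RUM events, the SDK has a `setUser` API for exactly this. A minimal sketch; stick to plain identifiers you are allowed to log.

```ts
import { datadogRum } from "@datadog/browser-rum";

// Attach a known user's ID to subsequent RUM events. Keep it to plain
// identifiers — never sensitive category data like health or religion.
export const identifyUser = (userId: string) => {
  datadogRum.setUser({ id: userId });
};
```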
Source Maps
When the end user receives our JS it is usually in the form of a minified bundle. This is great for performance because we are sending smaller files down the wire, but if an error happens and we look at the stack trace it won't be very human readable or useful: most of the variable names will have been replaced with a character or two and there are zero spaces in the code. This is where source maps are super useful. They are an additional file that can for the most part be ignored, but when debugging they let you turn the minified code back into something human readable.
Source maps are essentially different for each deploy, and you can usually upload or expose versioned source maps to tools like Datadog either automatically or ad hoc when trying to debug. Different build tools have different ways to generate and expose source maps, so it's hard to write a one size fits all guide on how to generate them and upload them to Datadog.
If for whatever reason you can't upload source maps, you can do other things to help you debug issues, like adding some contextual info to the second param of trackDataDogError.
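For example, something along these lines; the field names and component are illustrative, not a convention we enforce.

```ts
// Illustrative extra context for debugging without source maps, using the
// trackDataDogError helper from the service file above (assumed path).
import { version } from "../../package.json";
import { trackDataDogError } from "src/services/datadog";

declare const riskyOperation: () => void; // stand-in for whatever might throw

try {
  riskyOperation();
} catch (err) {
  trackDataDogError(err as Error, {
    component: "DonationForm", // where the error was caught
    appVersion: version, // ties the stack trace to a specific deploy
    route: window.location.pathname, // which page the user was on
  });
}
```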
Uploading Source Maps
TBC
- Uploading source maps to Datadog (https://docs.datadoghq.com/real_user_monitoring/guide/upload-javascript-source-maps/?tab=webpackjs#instrument-your-code)