Metrics, Metrics Everywhere! Where to start with web performance metrics

It might start with you noticing your website feels a bit sluggish to load. Then a little bit of digging starts to show some large JavaScript files and images are being included on the page. Before you know it, you're into a world of performance metrics, acronyms, numbers and terminology.

It all gets pretty overwhelming. How do you know how well you're doing? Which metrics should you use? Where do you even start when looking at your own performance? Let's start by stepping back and remembering why performance is important.

Whilst performance is inherently technical in its implementation, it has a direct impact on the overall user experience. Users come to your site to complete a task such as reading an article or making a purchase. Anything which gets in the way of the user completing their task risks distracting them and may lead people to abandon your site – costing you both page views and revenue.

"1.0 second is about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay."

Jakob Nielsen, Usability Engineering 1993

A number of usability studies show that there are boundaries of acceptable response times in application design, after which the user will perceive there to be a delay. Typical recommendations, such as those in Usability Engineering by Jakob Nielsen, are to have a 0.1 second response time for something to feel instant, or under 1 second to avoid disrupting the user's flow of thought.

These recommendations are nothing new. Usability Engineering was published in 1993 and references work going back to 1968, way before the technologies we use each day became ubiquitous. However, they are still relevant today as they describe human behaviours which only change slowly over time. Whilst internet speeds have hugely increased since 1993, so has the complexity of the websites we're building. It doesn't matter whether you're building with standard HTML and CSS with just a sprinkling of JavaScript, or a full JavaScript application using React. Typical users won't know the difference (and nor should they) and will have the same expectations on response times.

A number of case studies demonstrate the correlation between page load times and user engagement, which helps validate these guidelines. For example, the BBC found that, for every additional second a page takes to load, 10 per cent of users leave the site.

User-centric performance metrics

As there is a known correlation between performance and user engagement, what we want to find are the metrics which best measure performance as perceived by users. Google provide four questions to keep in mind when finding your performance metrics.

Is it happening?: Did the navigation start successfully? Has the server responded?
Is it useful?: Has enough content rendered that users can engage with it?
Is it usable?: Can users interact with the page, or is it busy?
Is it delightful?: Are the interactions smooth and natural, free of lag and jank?

We're going to use these questions to help ensure we use the right metrics for understanding a typical user experience on our site. We can think of these as user-centric performance metrics.

Web pages don't just appear in the browser in their fully formed state. They are typically made up of dozens of different types of network request all loading in real-time, visible to the user.

In many ways, this progressive loading is one of the best "features" of the web – it enables the page to be useful even if it has not fully loaded or when some of the requests have failed. It also gives clearer feedback to the user that something is happening than an empty screen. The downside to this is it makes poor performance painfully visible to the user.

If your page is dependent on lots of requests before it can display anything to the user then this will feel slower than a site which displays content very quickly but then continues loading for a longer overall period. This is known as perceived performance and is where art gets mixed with science as you utilise the progressive nature of the web to ensure your webpages aren't just technically fast but feel it too. Sometimes changes which make your site slower will actually make the page feel faster – which makes measuring based on perception really important.

“Good user experience is not captured by a single point in time. It's composed of a series of key milestones in your users' journey. Understand the different metrics and track the ones that are important to your users' experience.”

Google, How To Think About Speed Tools

This whole progressive loading situation means that there is no single metric that categorically tells you how well you're doing. Instead, you need to use multiple metrics that cover the key milestones in the page load. This is especially true as the various metrics that are available all take a different perspective – some measure when something is visible on the page whilst others might check how much time the CPU is spending processing long tasks.

For example, we want to know how long it takes until the user can see something on the page so that we can answer the "is it happening?" question. To measure this, we can use the metric First Contentful Paint.

Frames capturing the key milestones in the page load

Similarly, the point where the majority of our content is visible on the page will answer the "is it useful?" question which can be done using Largest Contentful Paint. Time to Interactive helps answer "is it usable?" by tracking when enough of the logic of the page has loaded for users to be able to interact with it, whilst First Input Delay helps with "is it delightful?" by letting us know just how quickly our pages can respond to interaction to enable a smooth experience without lag.

Visualising these milestones as a graph helps highlight some of the potential pitfalls of only considering the overall load time. Focusing on the early milestones may mean the page content starts to appear quickly, but if the page is unresponsive because of a slow Time to Interactive then it'll feel broken to a user. Similarly, if you focus just on reducing Time to Interactive but users are still spending a long time with a blank screen as the page loads they will feel like the page is unresponsive and abandon the site before it's even finished loading.

Sample graph showing how different the perceived performance of different pages can be, even if the overall loading time is the same. The line graph plots time on the x-axis against percentage complete on the y-axis with three example lines: one linear, one with a slow start to symbolise the "is it happening?" and "is it useful?" questions, and one with a slow finish to represent the "is it usable?" question.

User-centric metrics are an area that has been rapidly evolving over the last few years. A lot of the performance metrics that have historically been used, like the onload event, tell us about performance as seen internally by the browser, which doesn't necessarily translate into how that is then perceived by the user. User-centric metrics help fill this gap with ever-increasing accuracy.

Within this space there are still a number of different metrics available, with some being best suited to lab-based synthetic testing whilst others are best suited to data from real user monitoring (RUM).

I typically use 6 metrics for getting a complete understanding of the page load from start to finish:

First Contentful Paint
Largest Contentful Paint
Time to Interactive
Total Blocking Time
First Input Delay
Cumulative Layout Shift

Google's Web Vitals initiative aims to simplify and standardise web performance metrics. An evolving subset of these, known as Core Web Vitals, act as the key indicators of page experience that all sites should pay attention to and will be used as a ranking factor in search results. The Core Web Vital metrics are currently Largest Contentful Paint, First Input Delay and Cumulative Layout Shift.

A key consideration when choosing your metrics is how you will go about measuring them. You'll want to make sure you're picking the metrics that can be measured by your performance monitoring tooling. Some metrics are unique to either synthetic or RUM tools, whilst others can be measured by both.

With the Core Web Vital metrics, Google has committed to making these available across a range of their tools and services giving you a number of ways you can start tracking your performance.

Proxy metrics

User-centric metrics help you measure your performance as experienced by users. What they don't tell you is why your site took the time it did to load. To understand the why, we need to look at the factors which influenced your performance.

The metrics which tell us about these factors are known as proxy metrics – on their own they only give you an indication of what your performance might be like, but combining them with user-centric metrics helps you diagnose and improve your performance.

For each user-centric metric there are a number of different proxy metrics that could be contributing towards it. Think of user-centric metrics as the ones you benchmark against and monitor over time whilst your proxy metrics are more like a toolbox of metrics where you'll focus on the relevant ones for a shorter period of time as-and-when required.

As user-centric metrics are looking at key milestones of the page load they tend to use time as the unit they measure. Proxy metrics use a wider range of units, including time, quantity and size.

There are a lot of proxy metrics to choose from. Here are a few examples:

Time to First Byte – how long it takes for the server to start sending data to the user
Total number of assets – how many files does the page need for it to load correctly?
Size of page – size in bytes of the various assets on the page
JavaScript Long Tasks – amount of time spent on particularly CPU-intensive processing
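As a rough sketch of how a few of these proxy metrics could be pulled together, the TypeScript below summarises a list of resource records. The `ResourceRecord` shape and the `summariseProxyMetrics` name are hypothetical, chosen purely for illustration; the field names mirror the browser's Resource Timing entries, but plain objects are used so the example runs outside a browser.

```typescript
// Hypothetical record shape, loosely modelled on the browser's
// PerformanceResourceTiming entries. Plain objects keep the sketch portable.
interface ResourceRecord {
  name: string; // URL of the asset
  transferSize: number; // bytes transferred over the network
}

interface ProxySummary {
  timeToFirstByteMs: number;
  assetCount: number;
  totalBytes: number;
}

// responseStartMs: time from the start of navigation until the first byte
// of the document arrived, in milliseconds.
function summariseProxyMetrics(
  responseStartMs: number,
  resources: ResourceRecord[],
): ProxySummary {
  const totalBytes = resources.reduce((sum, r) => sum + r.transferSize, 0);
  return {
    timeToFirstByteMs: responseStartMs,
    assetCount: resources.length,
    totalBytes,
  };
}
```

In a browser, the inputs could come from `performance.getEntriesByType("navigation")` (for `responseStart`) and `performance.getEntriesByType("resource")`. Bear in mind that `transferSize` can be reported as 0 for cached assets, so treat the byte total as an indication rather than an exact figure.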

Example

It's taking a long time for your page to start to display anything when it loads. Typically, this means you'll have a slow First Contentful Paint time as your page isn't painting anything for the user to see.

To understand the why, there are a few proxy metrics we could look at.

Is this because your server is slow at responding to requests? Check your Time to First Byte as this forms a large part of a typical FCP time. Alternatively, this could be related to how your assets are loaded on the page. Do you have lots of CSS and JavaScript files required for your page to load? How large are they? Is any of the JavaScript blocking rendering? Does your page require JavaScript in order for it to render?

Digging into these metrics gives you an in-depth understanding of how your page is behaving, and why you have a slow First Contentful Paint.

Recommended metrics

Earlier, I mentioned the six user-centric metrics that I typically use when looking at web performance. There's plenty of in-depth information available elsewhere on what these metrics mean, how to measure them and what impacts them, but here's a quick summary for each of them.

First Contentful Paint

The time taken, in seconds, for the page to first start displaying text or images. 2 seconds or less is considered fast. This metric is useful for demonstrating to the user that something is happening – the page has started loading.

Typically, First Contentful Paint can be negatively impacted by slow server response times or by relying purely on client-side rendering for large JavaScript applications.
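To make this a little more concrete, here is a minimal sketch of reading First Contentful Paint from a list of paint entries. The `PaintEntry` shape and function name are my own for illustration; the entry name `first-contentful-paint` is the one browsers report through the Paint Timing API.

```typescript
// Minimal stand-in for the browser's paint timing entries.
interface PaintEntry {
  name: string; // e.g. "first-paint" or "first-contentful-paint"
  startTime: number; // milliseconds since navigation started
}

// Returns First Contentful Paint in seconds, or undefined if the page
// has not painted any content yet.
function firstContentfulPaintSeconds(entries: PaintEntry[]): number | undefined {
  const fcp = entries.find((entry) => entry.name === "first-contentful-paint");
  return fcp === undefined ? undefined : fcp.startTime / 1000;
}
```

In a browser the entries would typically come from `performance.getEntriesByType("paint")`.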

https://web.dev/first-contentful-paint/

First Contentful Paint as the first key milestone in the page load

Largest Contentful Paint

Point in the page load when the main content has loaded, defined as the largest image or text block within the viewport. This is a good measurement of how long it takes for the page content to be visible on the page, and therefore for the page to start being useful. Reaching Largest Contentful Paint within 2.5 seconds is considered good.
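Browsers report Largest Contentful Paint as a series of candidates, emitting a new entry each time a larger element renders, so the most recent candidate is the one to keep. A small sketch of that idea, with a hypothetical `LcpCandidate` shape standing in for the browser's entries:

```typescript
// Stand-in for the browser's "largest-contentful-paint" entries.
interface LcpCandidate {
  startTime: number; // milliseconds since navigation started
  size: number; // area of the element, in pixels
}

// Candidates are assumed to be in the order the browser reported them,
// so the last one is the current Largest Contentful Paint.
function largestContentfulPaintMs(candidates: LcpCandidate[]): number | undefined {
  const latest = candidates[candidates.length - 1];
  return latest === undefined ? undefined : latest.startTime;
}
```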

https://web.dev/lcp/

Largest Contentful Paint arriving part way through the page load

First Input Delay

The delay, or latency, between a user first interacting with your site, such as by pressing a button, and your code being able to respond to that interaction. This isn't a measurement of how long your event handler itself takes, but instead how long the browser is blocked from executing your code due to the main thread being occupied by other tasks. A First Input Delay of 100 ms or less is considered good – which corresponds to the maximum time usability studies provide in order for the interaction to feel instant.

As First Input Delay requires a human interaction it can only be measured with real user monitoring tools. See Total Blocking Time for a synthetic alternative.
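The distinction between the delay and the handler time is easier to see in code. In this sketch the `FirstInputEntry` shape is a simplified stand-in for the browser's `first-input` entry, which exposes `startTime`, `processingStart` and `processingEnd`; the function name is hypothetical.

```typescript
// Simplified stand-in for the browser's "first-input" entry.
interface FirstInputEntry {
  startTime: number; // when the user interacted (ms)
  processingStart: number; // when the browser could start running handlers (ms)
  processingEnd: number; // when the handlers finished (ms)
}

// First Input Delay only covers the wait before handlers start running.
// The time spent inside the handlers (processingEnd - processingStart)
// is deliberately not included.
function firstInputDelayMs(entry: FirstInputEntry): number {
  return entry.processingStart - entry.startTime;
}
```

For example, an interaction at 1000 ms whose handler starts at 1080 ms and finishes at 1300 ms has a First Input Delay of 80 ms, even though the user waited 300 ms for the handler to complete.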

https://web.dev/fid/

Time to Interactive

Time to Interactive helps you find that moment in time where your page goes beyond just looking like it's loaded and is actually usable. It does this by looking for when there are no longer long JavaScript tasks running in the browser and there aren't many active network requests.
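A heavily simplified sketch of that search is shown below. It only considers long tasks and ignores network activity entirely, and the five-second quiet window is my assumption based on the published definition of the metric, so treat it as an illustration of the idea rather than how any tool actually calculates it. All names are hypothetical.

```typescript
interface LongTask {
  startTime: number; // ms since navigation started
  duration: number; // ms
}

// Simplified estimate: starting from First Contentful Paint, find the first
// moment that is followed by a quiet window containing no long tasks.
// Assumption: network requests are ignored and the window is 5 seconds.
function estimateTimeToInteractiveMs(
  fcpMs: number,
  longTasks: LongTask[],
  quietWindowMs: number = 5000,
): number {
  const sorted = [...longTasks].sort((a, b) => a.startTime - b.startTime);
  let candidate = fcpMs;
  for (const task of sorted) {
    const taskEnd = task.startTime + task.duration;
    if (taskEnd <= candidate) continue; // finished before the candidate point
    if (task.startTime - candidate >= quietWindowMs) break; // quiet window found
    candidate = taskEnd; // task interrupts the window, so restart after it
  }
  return candidate;
}
```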

If you're relying on frameworks like React or Angular it's typically Time to Interactive which will suffer as browsers now need to process large amounts of JavaScript in order to get the page interactive. Google recommends a Time to Interactive of under 5 seconds.

https://web.dev/tti/

Time to Interactive highlighted towards the end of the page load

Total Blocking Time

Total Blocking Time makes for a good synthetic partner to First Input Delay, and typically correlates well with Time to Interactive but there are some differences in the story it tells us.

It's calculated by combining the amount of time over 50 milliseconds spent on each long task between First Contentful Paint and Time to Interactive. So rather than telling us when the page is fully interactive, Total Blocking Time tells us just how much work it was doing to get to that state.
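The calculation can be sketched in a few lines of TypeScript. The names are hypothetical, and clipping tasks that straddle First Contentful Paint or Time to Interactive is my assumption about how to handle the edges.

```typescript
interface LongTask {
  startTime: number; // ms since navigation started
  duration: number; // ms
}

const BLOCKING_THRESHOLD_MS = 50;

// Sums the portion of each task beyond 50 ms, counting only the time that
// falls between First Contentful Paint and Time to Interactive.
function totalBlockingTimeMs(tasks: LongTask[], fcpMs: number, ttiMs: number): number {
  let total = 0;
  for (const task of tasks) {
    // Assumption: clip tasks that straddle the FCP or TTI boundary.
    const start = Math.max(task.startTime, fcpMs);
    const end = Math.min(task.startTime + task.duration, ttiMs);
    const clippedDuration = end - start;
    if (clippedDuration > BLOCKING_THRESHOLD_MS) {
      total += clippedDuration - BLOCKING_THRESHOLD_MS;
    }
  }
  return total;
}
```

So three tasks of 250 ms, 90 ms and 30 ms inside the window contribute 200 ms, 40 ms and nothing respectively, giving a Total Blocking Time of 240 ms.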

If your page has a long Total Blocking Time then there's a very high chance that a user attempting to interact with the page as it loads will find the page "locked" and noticeably unresponsive – creating that link with First Input Delay.

300 milliseconds or less is considered a good Total Blocking Time.

https://web.dev/tbt/

Cumulative Layout Shift

Rather than measuring time, Cumulative Layout Shift is instead a score showing how much your layout moves around (shifts) for a typical user. These layout shifts are likely to be visually jarring for the user and could be caused by things like your webfonts loading, or new elements being inserted causing the rest of the layout to shift.

Cumulative Layout Shift is the sum of all layout shifts experienced on a page. Each layout shift is calculated by multiplying two fractions – impact and distance. The impact fraction measures how much of the viewport was impacted in the frames before and after the layout shift, whereas the distance fraction measures the distance within the viewport that the element moved between the two frames.

layout shift score = impact fraction * distance fraction

Combining both impact and distance helps to ensure we track both small elements moving around the page a lot, and larger elements moving about a little bit on the page. A Cumulative Layout Shift of less than 0.1 is considered good.
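As a worked example of the formula above, this sketch scores individual shifts and sums them, following the sum-based definition given here. The `LayoutShift` shape and function names are hypothetical.

```typescript
interface LayoutShift {
  impactFraction: number; // share of the viewport affected, 0 to 1
  distanceFraction: number; // distance moved relative to the viewport, 0 to 1
}

// layout shift score = impact fraction * distance fraction
function layoutShiftScore(shift: LayoutShift): number {
  return shift.impactFraction * shift.distanceFraction;
}

// Cumulative Layout Shift as the sum of every individual shift score.
function cumulativeLayoutShift(shifts: LayoutShift[]): number {
  return shifts.reduce((sum, shift) => sum + layoutShiftScore(shift), 0);
}
```

A shift affecting 75% of the viewport and moving by 25% of it scores 0.75 * 0.25 = 0.1875, which on its own is already above the 0.1 target.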

https://web.dev/cls/
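Pulling together the targets quoted in the six summaries above, a simple check like the following could flag which metrics need attention. This is only a sketch: the names are hypothetical, and for simplicity every target is treated as "less than or equal to", even where the text says "under" or "less than".

```typescript
type MetricName = "FCP" | "LCP" | "FID" | "TTI" | "TBT" | "CLS";

// Targets as quoted in the summaries above. Times are in milliseconds;
// CLS is a unitless score.
const GOOD_THRESHOLDS: Record<MetricName, number> = {
  FCP: 2000,
  LCP: 2500,
  FID: 100,
  TTI: 5000,
  TBT: 300,
  CLS: 0.1,
};

function meetsTarget(metric: MetricName, value: number): boolean {
  return value <= GOOD_THRESHOLDS[metric];
}

// Returns the metrics whose measured value misses its target.
function metricsMissingTarget(
  measurements: Partial<Record<MetricName, number>>,
): MetricName[] {
  return (Object.keys(measurements) as MetricName[]).filter((metric) => {
    const value = measurements[metric];
    return value !== undefined && !meetsTarget(metric, value);
  });
}
```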

Lighthouse Aggregate Scores

We've established that it takes multiple user-centric metrics to really understand the user experience of your site performance.

One of the most popular synthetic performance testing tools, Lighthouse, provides an aggregate percentage score for your performance based on six different user-centric metrics.

This gives you a nice and straightforward score, but on its own it doesn't help you with understanding the overall user experience or how to go about improving it. The Lighthouse documentation itself advises that it might be "more useful to think of your site performance as a distribution of scores, rather than a single number".

Weightings for performance scoring in Lighthouse 6: 15% First Contentful Paint, 15% Speed Index, 25% Largest Contentful Paint, 15% Time to Interactive, 25% Total Blocking Time and 5% Cumulative Layout Shift
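To show how those weightings combine, here is a small sketch of the weighted average. Lighthouse first converts each raw metric value into a score between 0 and 1; that conversion isn't reproduced here, so the per-metric scores are simply taken as inputs. The constant and function names are hypothetical.

```typescript
// Lighthouse 6 weightings, as listed above.
const LIGHTHOUSE_6_WEIGHTS = {
  firstContentfulPaint: 0.15,
  speedIndex: 0.15,
  largestContentfulPaint: 0.25,
  timeToInteractive: 0.15,
  totalBlockingTime: 0.25,
  cumulativeLayoutShift: 0.05,
} as const;

type LighthouseMetric = keyof typeof LIGHTHOUSE_6_WEIGHTS;

// metricScores: each metric already scored between 0 (poor) and 1 (good).
// Returns the overall score as a whole-number percentage.
function overallPerformanceScore(metricScores: Record<LighthouseMetric, number>): number {
  let weighted = 0;
  for (const metric of Object.keys(LIGHTHOUSE_6_WEIGHTS) as LighthouseMetric[]) {
    weighted += LIGHTHOUSE_6_WEIGHTS[metric] * metricScores[metric];
  }
  return Math.round(weighted * 100);
}
```

Because Largest Contentful Paint and Total Blocking Time carry half the weight between them, a poor score in either drags the overall number down far more than a poor Cumulative Layout Shift does.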

Summary

Performance is best measured as the timing of a number of key milestones as the page loads. These milestones are known as user-centric metrics, giving us an understanding of how users perceive your performance.

Pick a small number of metrics that you can measure over time using the performance tooling you have available, such as Lighthouse or SpeedCurve.

Start with user-centric metrics, but use proxy metrics as secondary information to help delve into the technical reasons behind the timings they show.

Don't limit the proxy metrics you use – think of them like a toolbox where you pick the right metrics for the area you are focusing on.

Cover photo by patricia serna on Unsplash.