Enumeration
Find all content and functionality on a website to get an idea of the attack surface, often through fuzzing.
For a quick recursive map of a website, the feroxbuster tool has great defaults. While it uses a medium-sized wordlist to test for non-404-like responses, it also parses links and directory listings in responses to discover even more content. While it does not offer much customization, it's great for a first scan if you need something quick:
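A minimal first scan could look like this (the target URL is a placeholder):

```sh
# Recursively map the site with the default wordlist and settings,
# extracting links from responses along the way
feroxbuster -u https://example.com
```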
For more control over your scan, ffuf is a great choice. It allows you to easily create your own rules for exactly how the website should be fuzzed: where inputs are placed, what is put there, and how a good response is defined.
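As a sketch, a basic directory scan places the FUZZ keyword where each word from the wordlist should be inserted (the URL, wordlist, and filtered response size below are placeholders):

```sh
# Fuzz the path; -fs filters out responses with a known "not found" size
ffuf -u https://example.com/FUZZ -w common.txt -fs 4242
```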
Check out FFUF.me for a great tutorial on how to use various options in the tool.
There is also a ffuf module in my default tool!
Good results come from good wordlists. You also don't want to wait weeks for a scan to complete, so a short but packed wordlist is often the best choice, though this depends on your test. The SecLists repository is a collection of many such wordlists for all kinds of purposes, including discovering web content:
- common.txt: 4715 common web paths (small), alphabetically ordered
- raft-large-files.txt & raft-large-directories.txt: ~100,000 total files and directories (large), ordered by count
- subdomains-5000.txt: top 5000 subdomains (small), ordered by count (see the example below)
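As an example of how such a wordlist could be used (the hostname and filter size are placeholders), subdomains can be fuzzed with ffuf by placing the FUZZ keyword in the Host header:

```sh
# Virtual host discovery: fuzz the Host header against the target
ffuf -u https://example.com -H "Host: FUZZ.example.com" \
     -w subdomains-5000.txt -fs 4242
```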
Another, more recent resource is the autogenerated wordlists from Assetnote:
As opposed to the brute-force methods shown above, most public websites get indexed many times by search engines and other services. We can use these to find content that was indexed but is not easily findable, by crafting more complex queries.
Google search indexes lots of web pages and makes them easily searchable. This is nice for us web testers because we can ask it for pages from a certain site and get lots of results. We can also ask for more specific and obscure results.
One simple way to ensure we only get pages on the target domain is to use the site: keyword. Simply put your domain in there to only find results on that host.
site:gitbook.com
Then we can add things like ext: to specify the file extension of the webpage.
site:gitbook.com ext:pdf
Another useful trick is the - sign. Use this with any keyword to exclude any results that match that word.
site:gitbook.com ext:pdf -files.gitbook.com
When looking at a result from your query, you might find a page that has some interesting content in the description but appears offline when you click the link. Google has a previous (cached) version of the site with the content, but right now you can only see a preview. To view it, you can click the three dots after the result, press the arrow down, and view Cached. Another way to manually do this for any URL is by prefixing it with cache:, for example:
cache:https://gitbook.com/about
A more powerful version of a search engine cache is the Wayback Machine, which archives snapshots of websites at specific times. If a website was changed, or some information was removed, it can often still be found using this tool. Simply search for a URL and you'll find a calendar full of snapshots to choose from.
There may be a lot of snapshots and different pages. To analyze the results there are a few options. Changes tracks changes in the HTML code delivered to the browser and shows you at what points the biggest changes happened, letting you pick two snapshots and compare them. This way you won't have to search endlessly to find that one snapshot where the page changed.
Another useful option is URLs, which lists all known URLs in a searchable table. The waybackurls tool can also extract all these URLs for you to analyze locally with more tools, and can be a very effective way of finding many pages with parameters too.
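A quick sketch of pulling these URLs locally (the domain is a placeholder):

```sh
# Fetch all archived URLs for the domain and deduplicate them
waybackurls example.com | sort -u > urls.txt
# URLs with parameters are often the most interesting injection points
grep '=' urls.txt
```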
Here is a polyglot payload I made of a few different injection attacks with various pieces of syntax. If any part of this payload is removed, transformed, or causes errors on the target, you might have injected something, and it is worth reverse engineering which part of the payload caused it to see if it is exploitable (url-encoded, JSON):
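One possible shape for such a polyglot (a minimal sketch, not an exhaustive payload: it combines quote/backtick string breaking, HTML tag injection, and template expression syntax):

```
'"`</textarea></script><svg/onload=alert()>{{7*7}}${7*7}
```

If a reflected 7*7 turns into 49, for example, some template engine evaluated it; if the quotes disappear or cause errors, a string context may be breakable.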
Here is another one specifically for blind command injection, which tries to work in as many different contexts as possible with filter bypasses. If the application waits for any multiple of 5 seconds, it has likely worked and you can try more targeted payloads (url-encoded, JSON):
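A minimal sketch of the idea, assuming a sh-like shell: the payload below triggers sleep 5 whether the input lands unquoted, inside single quotes, or inside double quotes:

```
;sleep 5;'|sleep 5|'"|sleep 5|"
```

In each of those contexts, one of the sleep 5 commands survives the surrounding quoting (as a separate command or as part of a pipeline), so a roughly 5-second delay hints that one of them executed.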
For less attack-focused fuzzing, it is sometimes useful to find which characters are allowed, to give you ideas on possible bypasses. Python's string.printable variable contains all printable ASCII characters. You can input this string and see if anything is blocked. If you only get a simple "error" message, you can use binary search: remove half of the payload and see which half causes the error, keeping in mind that there may be multiple bad characters (url-encoded, JSON):
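A quick way to generate url-encoded and JSON-escaped versions of this character set (a sketch using Python one-liners):

```sh
# URL-encode all printable ASCII characters for use in a query parameter
python3 -c 'import string, urllib.parse; print(urllib.parse.quote(string.printable))'
# JSON-escape the same set for use in a JSON body
python3 -c 'import string, json; print(json.dumps(string.printable))'
```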