Enumeration
Find all content and functionality on a website to get an idea of the attack surface, often through fuzzing.
Find Content
For a quick recursive map of a website, the feroxbuster tool has great defaults. It uses a medium-sized wordlist to test for non-404-like responses, and it also parses links and directory listings in responses to discover even more content. It does not offer much customization for a scan, but it is great for a first scan if you need something quick.
For more control over your scan, ffuf is a great choice. It allows you to easily create your own rules for exactly how the website should be fuzzed, like where inputs are placed, what is put there, and how a good response is defined.
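The three rules above (where the input goes, what is put there, and what counts as a good response) can be sketched in a few lines of Python. This is only an illustration of the idea, not how ffuf is implemented: the `FUZZ` placeholder mirrors the keyword ffuf uses, while `fuzz`, `fake_fetch`, the `target.example` host, and the `(status, size)` response shape are all assumptions made up for this sketch. The HTTP client is passed in as a function so the example runs offline.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical response shape for this sketch: (status_code, body_length).
Response = Tuple[int, int]


def fuzz(
    template: str,
    words: Iterable[str],
    fetch: Callable[[str], Response],
    hide_status: frozenset = frozenset({404}),
    hide_size: frozenset = frozenset(),
) -> Iterator[Tuple[str, int, int]]:
    """Yield (url, status, size) for every word whose response is not filtered out."""
    for word in words:
        word = word.strip()
        if not word or word.startswith("#"):
            continue  # skip blank lines and wordlist comments
        url = template.replace("FUZZ", word)  # where the input is placed
        status, size = fetch(url)
        if status in hide_status or size in hide_size:
            continue  # not a "good" response under the chosen rules
        yield url, status, size


def fake_fetch(url: str) -> Response:
    """Stand-in for a real HTTP client (assumption), so the sketch runs offline."""
    known = {
        "https://target.example/admin": (200, 1532),
        "https://target.example/login": (302, 0),
    }
    return known.get(url, (404, 9))


hits = list(fuzz("https://target.example/FUZZ", ["admin", "nope", "login"], fake_fetch))
for url, status, size in hits:
    print(status, size, url)
```

Filtering on body size as well as status code is useful when a site answers every unknown path with the same "soft 404" page and a 200 status.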
Check out FFUF.me for a great tutorial on how to use various options in the tool.
There is also a ffuf module in my default tool!
Wordlists
Good results come from good wordlists. You also don't want to wait weeks for a scan to complete, so a short but dense wordlist is often the best choice, although this depends on your test. The SecLists repository is a collection of many such wordlists for all kinds of purposes, including discovering web content:
common.txt: 4715 common web paths (small), alphabetically ordered
raft-large-files.txt & raft-large-directories.txt: ~100,000 total files and directories (large), ordered by count
subdomains-5000.txt: top 5000 subdomains (small), ordered by count
Another, more recent resource is the autogenerated wordlists from Assetnote.
Find Technologies
Passively Find Content
As an alternative to the brute-force methods shown above, we can rely on the fact that most public websites get indexed many times by search engines and other services. By creating more complex queries, we can find content that was indexed but is not easy to find in ordinary results.
Googling
Google search indexes lots of web pages and makes them easily searchable. This is nice for us web testers because we can ask it for pages from a certain site, and it gives us lots of results. We can also ask for more specific and obscure results.
One simple way to ensure we only get pages on the target domain is to use the site: keyword. Simply put your domain after it to only find results on that host.
site:gitbook.com
Then we can add things like ext: to specify the file extension of the webpage.
site:gitbook.com ext:pdf
Another useful trick is the - sign. Prefix any keyword with it to exclude results that match that word.
site:gitbook.com ext:pdf -files.gitbook.com
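When you try many combinations of these operators, it can help to build the queries programmatically. The sketch below only composes the query string from the three operators shown above and URL-encodes it into a search link. The helper names `build_dork` and `search_url` are made up for this example.

```python
from urllib.parse import quote_plus


def build_dork(site: str, ext: str = "", exclude: tuple = ()) -> str:
    """Compose a query from the site:, ext: and - operators."""
    parts = [f"site:{site}"]
    if ext:
        parts.append(f"ext:{ext}")
    parts.extend(f"-{term}" for term in exclude)  # each excluded keyword gets a - prefix
    return " ".join(parts)


def search_url(query: str) -> str:
    """URL-encode the query so it can be opened directly in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = build_dork("gitbook.com", ext="pdf", exclude=("files.gitbook.com",))
print(query)
print(search_url(query))
```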
Viewing Cache
When looking at a result from your query, you might find a page that has some interesting content in the description but appears offline when you click the link. Google may still hold a previous (cached) version of the page with that content, even though the result only shows a preview of it. Prefix the URL with cache: to request the cached version:
cache:https://gitbook.com/about
Internet Archive: Wayback Machine
A more powerful version of a search engine cache is the Wayback Machine, which archives snapshots of websites at specific times. If a website was changed, or some information was removed, it can often still be found using this tool. Simply search for a URL and you'll find a calendar full of snapshots to choose from.
There may be a lot of snapshots and different pages. To analyze the results there are a few options, like Changes, which tracks changes in the HTML code delivered to the browser and shows you at what points the biggest changes happened. You can then use the @ icons to compare them. This way you won't have to search endlessly to find that one snapshot where the page changed.
Another useful option is URLs, which lists all known URLs in a searchable table. The waybackurls tool can also extract all these URLs for you to analyze locally with other tools, which can be a very effective way of finding many pages with parameters too.
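If you prefer to script this step, the Wayback Machine exposes its index through the CDX server API. The sketch below is a rough equivalent of the idea, not the waybackurls implementation: it builds a CDX query URL for a domain and filters a JSON response down to the URLs that carry a query string. The function names and the sample response are assumptions for illustration. Fetching the URL is left out so the example runs offline.

```python
import json
from urllib.parse import urlencode, urlsplit

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str) -> str:
    """Build a CDX query that lists each archived URL under the domain once."""
    params = {
        "url": f"{domain}/*",   # everything below the domain
        "output": "json",
        "fl": "original",       # only return the original URL field
        "collapse": "urlkey",   # collapse repeated captures of the same URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def urls_with_parameters(cdx_json: str) -> list:
    """Return the unique archived URLs that contain a query string."""
    rows = json.loads(cdx_json)
    urls = [row[0] for row in rows[1:]]  # with output=json the first row is a header
    return sorted({u for u in urls if urlsplit(u).query})


# Made-up sample of what a response body could look like.
sample = json.dumps([
    ["original"],
    ["https://gitbook.com/about"],
    ["https://gitbook.com/search?q=test"],
    ["https://gitbook.com/search?q=test"],
])

print(cdx_query_url("gitbook.com"))
print(urls_with_parameters(sample))
```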
Fuzzing Inputs / Polyglots
Here is a polyglot payload I made of a few different injection attacks with various pieces of syntax. If any part of this payload is removed or interpreted differently by the target, you might have injected something, and it is worth working out which part of the payload caused it to see if it is exploitable (url encoded, JSON).
Here is another specifically for blind command injection that tries to work in as many different contexts as possible with filter bypasses. If the application waits for any multiple of 5 seconds, it has likely worked and you can try more targeted payloads (url encoded, JSON).
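The "multiple of 5 seconds" check is easy to get wrong by eye when the application is slow to begin with, so it helps to compare against a baseline request. A minimal sketch, assuming you have already measured the normal response time (`baseline`) and the response time with the payload (`elapsed`); the function name and the one-second tolerance are arbitrary choices for this example:

```python
def delay_multiple(elapsed: float, baseline: float, step: float = 5.0, tolerance: float = 1.0) -> int:
    """Return n >= 1 if the extra response time is roughly n * step seconds, else 0."""
    extra = elapsed - baseline
    n = round(extra / step)  # nearest multiple of the sleep duration
    if n >= 1 and abs(extra - n * step) <= tolerance:
        return n
    return 0


print(delay_multiple(5.3, baseline=0.2))   # about one sleep executed
print(delay_multiple(10.4, baseline=0.3))  # about two sleeps executed
print(delay_multiple(3.0, baseline=0.2))   # slow, but not near a multiple of 5
```

A result above 1 suggests the payload ran in more than one context, which is itself a hint about where the injection happens.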
For less attack-focused fuzzing it is sometimes useful to find what characters are allowed, to give you ideas on possible bypasses. Python's string.printable variable contains all printable ASCII characters. You can input this string and see if anything is blocked. If you only get a simple "error" message, you can use binary search: send half of the payload at a time and keep splitting the half that still fails until you find the character that causes the error (keep in mind that there may be multiple) (url encoded, JSON).
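The binary search described above can be automated. The sketch below assumes you can wrap the target in a function that returns True when a payload is rejected. Here that is the made-up `fake_filter`, which stands in for a real request. It also assumes the filter blocks individual characters rather than specific sequences. Both halves are searched, so multiple blocked characters are found.

```python
import string
from typing import Callable, List


def find_blocked(chars: str, is_blocked: Callable[[str], bool]) -> List[str]:
    """Return every character in chars that triggers the filter on its own."""
    if not chars or not is_blocked(chars):
        return []  # nothing in this chunk is rejected
    if len(chars) == 1:
        return [chars]  # narrowed down to a single offending character
    mid = len(chars) // 2
    # Search both halves, since more than one character may be blocked.
    return find_blocked(chars[:mid], is_blocked) + find_blocked(chars[mid:], is_blocked)


def fake_filter(payload: str) -> bool:
    """Hypothetical target: rejects any input containing <, ' or a backslash."""
    return any(c in "<'\\" for c in payload)


blocked = find_blocked(string.printable, fake_filter)
print(blocked)
```

Each blocked character costs roughly log2(100) extra requests, instead of one request per character in string.printable.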