Enumeration

Find all content and functionality on a website to get an idea of the attack surface, often through fuzzing.

Find Content

For a quick recursive map of a website, the feroxbuster tool has great defaults. It tests a medium-sized wordlist for non-404-like responses, and also parses links and directory listings in responses to discover even more content. While it does not offer as much customization over where and how inputs are fuzzed, it's great for a first scan if you need something quick:

$ feroxbuster -u http://example.com
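If you do want to tune it, a few flags go a long way (the wordlist path and extensions below are just examples, see feroxbuster --help for the rest):

$ feroxbuster -u http://example.com -w raft-small-words.txt -x php,html -d 2 -o ferox.txt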

For more control over your scan, ffuf is a great choice. It allows you to easily create your own rules for exactly how the website should be fuzzed, like where inputs are placed, what is put there, and how a good response is defined.

Check out FFUF.me for a great tutorial on how to use various options in the tool.

Examples
# Simplest example, using a wordlist at the start of a path and auto-calibrating
$ ffuf -u http://example.com/FUZZ -w common.txt -ac
# Probe for unknown virtual hosts on a domain by changing the Host header
$ ffuf -u http://example.com/ -H 'Host: FUZZ.example.com' -w subdomains.txt
# Find parameters that alter the response
$ ffuf -u http://example.com/?FUZZ=1 -w parameters.txt
# Use payload fuzzing to do less guesswork, for example Path Traversal
$ ffuf -u http://example.com/?page=FUZZ -w path-traversal.txt

# POST with JSON data and a fuzzed value, filtering out responses matching the 'error' RegEx
$ ffuf -X POST -u http://example.com/ -H 'Content-Type: application/json' -d '{"name": "FUZZ", "anotherkey": "anothervalue"}' -fr 'error' -w values.txt
# Fuzz multiple parameters and values at the same time, matching reflected values
$ ffuf -u http://example.com/?PARAM=VAL -w params.txt:PARAM -w values.txt:VAL -mr "VAL"
# POST form data using process substitution for a 1-100 sequence of IDs
$ ffuf -X POST -u http://example.com/ -H 'Content-Type: application/x-www-form-urlencoded' -d 'id=FUZZ&action=view' -fs 1341 -w <(seq 1 100)

There is also a ffuf module in my default tool!

$ default ffuf content http://example.com/
$ default ffuf param http://example.com/page
$ default ffuf vhost example.com

$ default ffuf auto example.com  # An attempt at combining content and vhost

Wordlists

Good results come from good wordlists. You also don't want to wait weeks for a scan to complete, so a short but packed wordlist is often the best choice, though this depends on your test. The SecLists repository (https://github.com/danielmiessler/SecLists) is a collection of many such wordlists for all kinds of purposes, including discovering web content.

Another, more recent resource is the autogenerated wordlists from Assetnote (https://wordlists.assetnote.io/).
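As a rough sketch, you could clone SecLists and point ffuf at one of its web-content lists (the path below follows the repository's current layout):

$ git clone https://github.com/danielmiessler/SecLists.git
$ ffuf -u http://example.com/FUZZ -w SecLists/Discovery/Web-Content/raft-small-words.txt -ac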

Find Technologies

Passively Find Content

As opposed to the brute-force methods shown above, most public websites are indexed over and over by search engines and other services. By crafting specific queries, we can use these indexes to find content that was recorded but is not easily discoverable otherwise.

Googling

Google search indexes lots of web pages and makes them easily searchable. This is nice for us web testers, because we can ask it for pages from a certain site and it gives us lots of results. We can also ask for more specific and obscure results.

One simple way to ensure we only get pages on the target domain is to use the site: keyword. Simply put your domain in there to only find results on that host.

site:gitbook.com

Then we can add things like ext: to specify the file extension of the webpage.

site:gitbook.com ext:pdf

Another useful trick is the - sign. Use this with any keyword to exclude results that match that word.

site:gitbook.com ext:pdf -files.gitbook.com

Viewing Cache

When looking at a result from your query, you might find a page that has some interesting content in the description, but appears offline when you click the link. Google keeps a previous (cached) version of the page with that content, which you can view directly with the cache: keyword.

cache:https://gitbook.com/about

Internet Archive: Wayback Machine

A more powerful version of a search engine cache is the Wayback Machine, which archives snapshots of websites at specific times. If a website was changed, or some information was removed, it can often still be found using this tool. Simply search for a URL and you'll find a calendar full of snapshots to choose from.

There may be a lot of snapshots and different pages. To analyze the results there are a few options, like Changes, which tracks changes in the HTML code delivered to the browser, shows you at what points the biggest changes happened, and lets you pick two snapshots to compare. This way you won't have to search endlessly to find that one snapshot where the page changed.

Another useful option is URLs, which lists all known URLs in a searchable table. The waybackurls tool can also extract all these URLs for you to analyze locally with more tools, and can be a very effective way of finding many pages with parameters too.

cat domains.txt | waybackurls | tee wayback-urls.txt
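The same data comes from the Wayback Machine's public CDX API, which you can query directly with curl if you want more control over filters (a minimal sketch, see the CDX API documentation for all parameters):

$ curl 'https://web.archive.org/cdx/search/cdx?url=example.com/*&collapse=urlkey&fl=original' | tee wayback-urls.txt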

Fuzzing Inputs / Polyglots

Here is a polyglot payload I made that combines a few different injection attacks with various pieces of syntax. If any part of this payload is removed or interpreted differently by the target, you might have injected something, and it is worth working out which part of the payload caused it to see if it is exploitable.

Generic Payload
|:<?=z>"\"a'\'b`\`/../${{<%[%'"}}%s);&|👨‍💻c
d\
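To avoid fighting your shell's quoting when sending it, one option is to save the payload to a file and let curl URL-encode it (the q parameter and target here are placeholders):

$ curl -sG 'http://example.com/' --data-urlencode 'q@polyglot.txt'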

Here is another one, specifically for blind command injection, that tries to work in as many different contexts as possible with filter bypasses. If the application waits for any multiple of 5 seconds, it has likely worked and you can try more targeted payloads:

Blind Command Injection
/*$(sleep 5)`sleep 5``*/-sleep(5)-'/*$(sleep 5)`sleep 5` #*/-sleep(5)||'"||sleep(5)||"/*`*/
sleep 5
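To spot the delay, curl can print the total request time, which you can compare against a baseline request (the URL, parameter, and payload.txt are placeholders):

$ curl -sG -o /dev/null -w '%{time_total}\n' 'http://example.com/' --data-urlencode 'cmd@payload.txt'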

For less attack-focused fuzzing, it is sometimes useful to find out which characters are allowed, to give you ideas for possible bypasses. Python's string.printable variable contains all printable ASCII characters. You can input this string and see if anything is blocked. If you only get a simple "error" message, you can use binary search: remove half of the payload and see which character causes the error (keep in mind that there may be multiple):

>>> import string
>>> string.printable
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
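If requests are cheap, an alternative to bisecting is simply testing one character per request. A sketch, assuming a q parameter and an 'error' marker in blocked responses:

$ chars=$(python3 -c 'import string; print(string.printable[:-6], end="")')  # printable minus whitespace
$ for ((i=0; i<${#chars}; i++)); do c=${chars:i:1}; curl -sG 'http://example.com/' --data-urlencode "q=$c" | grep -q 'error' && echo "blocked: $c"; done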
