Recon Series: Big Automation (Part-3)

In the previous two editions of this series on recon, we began by understanding the significance of recon in identifying hidden vulnerabilities. Part 1 introduced subdomain enumeration, passive and active recon, and valuable tools. Part 2 took us deeper into directory bruteforcing, public archive URL fetching, parameter discovery, and advanced dorking techniques.
Now, in Part 3, we are going to learn how to automate the majority of those open-source tools and processes to make our lives easier. This walkthrough is not just about automation, but about automation at scale. Almost every bug bounty hunter uses some form of automation in their process, and the tools mentioned in the previous two parts, along with all of their different options, are not meant to be typed by hand. Just imagine a scenario where you get 10K subdomains back from subdomain enumeration; anyone wise enough knows it is not feasible to type out the full command: ffuf -u https://sub1.target.com -w seclists/Discovery/Web-Content/directory-list-2.3-medium.txt
for every subdomain that was found!
Most bug bounty hunters will at least do something like:
while read -r sub; do ffuf -u "https://$sub" -w seclists/Discovery/Web-Content/directory-list-2.3-medium.txt | tee -a "$sub.directories"; done < subs.txt
While this can be helpful for smaller, mundane penetration testing tasks, it is nowhere near efficient enough if you want to wake up every day with a cup of coffee and some actionable intel to begin your bug hunting for the day, and maybe even land some low-hanging bugs, who knows!
Since the scale of bug bounty scopes and targets is really large these days, with thousands of subdomains and new changes and functionalities being introduced continuously, we have to up our recon game to match that scale. So it is really important to design an infrastructure which regularly scans all the targets from a database, so that we are among the first to detect a new subdomain or a new endpoint on a target. Moreover, we must also set up a notification mechanism which automatically alerts us if something interesting has popped up on one of our targets.
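To make the notification part concrete, here is a minimal sketch (my own, not from this series) that chains subfinder, tomnomnom's anew and projectdiscovery's notify; the domain and file names are placeholders, and notify is assumed to already be configured with a Slack or Discord webhook:

# enumerate, keep only subdomains we have never seen before, push the new ones to chat
subfinder -d target.com -silent | anew known_subdomains.txt | notify -silent

anew appends only previously unseen lines to known_subdomains.txt and echoes them onwards, so nothing gets sent on days when nothing new shows up.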
This blog will completely skip the code and only go over how to design an architecture which can be implemented in our automation and is highly scalable.
Before even beginning to write the code, we must come up with a reliable, scalable, upgradable and maintainable architecture for our recon process. This is a crucial step if we don't want to start again from the ground up after realising an error in our assumptions.
First, let's get a basic understanding of what I mean by reliable, scalable, upgradable and maintainable out of the way.
Since open-source tools form the backbone of our entire process, we have to ensure that a change in or failure of any of these tools does not cause an avalanche effect. We need to assume that any of them can fail at any time; in other words, reliability directly relates to error handling.
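As an example of what that error handling can look like, here is a tiny bash wrapper of my own (the timeout, retry count and sleep values are arbitrary) which keeps a hanging or crashing tool from taking the whole pipeline down:

# retry a command up to 3 times, killing it if it hangs for more than 10 minutes
run_with_retry() {
  for attempt in 1 2 3; do
    timeout 600 "$@" && return 0
    echo "[!] attempt $attempt failed: $*" >&2
    sleep 10
  done
  echo "[!] giving up on: $*" >&2
  return 1
}

run_with_retry subfinder -d target.com -o subs.txt   # example usage with a placeholder domain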
As there is a really wide variety of targets out there, we must ensure that our process scales to fit the target scope. This means we should use only as many resources as the target demands. The first thing that comes to mind here is cloud-based solutions, and indeed we have to move to the cloud if we want to be scalable, since cloud providers can automatically scale up and down based on load. There are basically two design choices for scaling: scaling vertically (giving a single machine more CPU and memory) and scaling horizontally (adding more workers and distributing jobs among them).
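The horizontal idea can be sketched even on a single machine before touching the cloud; the snippet below (placeholder file names, an arbitrary worker count of 20) simply fans the same job out across parallel workers, which is exactly what a fleet of cloud instances does at a larger scale:

# distribute the ffuf jobs over 20 parallel workers instead of one sequential loop
mkdir -p output
cat subs.txt | xargs -P 20 -I {} ffuf -u https://{} -w seclists/Discovery/Web-Content/directory-list-2.3-medium.txt -o output/{}.json -of json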
New CVEs and exploitation techniques appear nearly every week thanks to the ever-growing research in cyber security by fantastic researchers. To keep our process up to date with the latest exploits, we want the ability to easily add new commands and PoCs to our code. Nuclei templates are the best example of how this can be achieved.
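To illustrate with nuclei itself (the file names below are placeholders), pulling the community templates is all it takes to teach the pipeline new checks, with zero changes to our own code:

# refresh the template library, then run only high-impact checks against live hosts
nuclei -update-templates
nuclei -l live_hosts.txt -severity critical,high -o nuclei_findings.txt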
We must ensure that our infrastructure is divided into small modules, each performing one specific action: for example, one module to run commands defined in configuration files, a separate module for spawning workers to execute jobs, another module to queue jobs, and so on. This greatly reduces the time needed to debug and fix errors, since you only need to find the relevant module and make changes in a single place.
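As a rough sketch of that "run commands from a configuration file" module (everything here, including commands.conf and its {TARGET}/{OUT} placeholder syntax, is hypothetical), the runner does nothing except substitute values and execute each line:

# commands.conf could contain templated lines such as:
#   subfinder -d {TARGET} -o {OUT}/subs.txt
#   httpx -l {OUT}/subs.txt -o {OUT}/alive.txt
TARGET="$1"
OUTDIR="output/$TARGET"
mkdir -p "$OUTDIR"
while IFS= read -r template; do
  cmd=${template//\{TARGET\}/$TARGET}      # fill in the placeholders
  cmd=${cmd//\{OUT\}/$OUTDIR}
  echo "[*] running: $cmd"
  timeout 1800 bash -c "$cmd" || echo "[!] failed: $cmd" >&2
done < commands.conf

If this module misbehaves, nothing else has to change: the queueing and worker modules only ever hand it a target and wait for its exit code.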
Now that you have a clear understanding of the key aspects our project must include, we can start thinking about the architecture. We are going to divide the whole process into modules to accomplish the task. A high-level overview of our architecture looks as seen in the image below.
Some things to pay attention to in this architecture:
[NOTE] The precise technical implementation of the process has been abstracted away in this blog; the technologies mentioned throughout are only examples of how a specific purpose can be achieved, ignoring their limitations.
Before ending this blog, I just want to say that before building a recon process, you must always conduct thorough research on how to outsource as much as possible. For example, AWS Fargate can handle all of the scaling for us, which is far better than drowning in frustration trying to implement your own logic for automatically spawning and stopping containers. But make sure you fully understand every moving piece of your automation, as that understanding can save you both time and money.
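For example (a hedged sketch: the cluster, task definition, subnet and container names are placeholders for whatever you would register yourself), kicking off one containerised scan becomes a single call and Fargate worries about capacity:

# submit one recon job as a Fargate task; AWS handles provisioning and teardown
aws ecs run-task \
  --cluster recon-cluster \
  --launch-type FARGATE \
  --task-definition recon-worker:1 \
  --network-configuration 'awsvpcConfiguration={subnets=[subnet-0123456789abcdef0],assignPublicIp=ENABLED}' \
  --overrides '{"containerOverrides":[{"name":"recon","command":["./scan.sh","target.com"]}]}'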
Always remember that automation is not equivalent to automatically finding great bugs; it should be seen as a tool that helps you focus on what is actually interesting rather than wasting time on false signals.