Email Harvesting (theHarvester) - Uses publicly available information and databases such as search engines and Netcraft. Useful for gathering emails, subdomains, hosts, employee names, open ports, and banners from public sources such as search engines, PGP key servers, and the SHODAN computer database.
theHarvester --help
theHarvester -d microsoft.com -l 500 -b google
theHarvester -d microsoft.com -l 500 -b linkedin
theHarvester -d certifiedhacker.com -l 300 -b all -f report
<aside> 🚨 If you are having trouble exporting the HTML file, keep in mind that a huge amount of information is being collected; reduce the number of search engines/sources instead of using them all at once.
</aside>
Sometimes the output file is written by default to /usr/lib/python3/dist-packages/theHarvester/ instead of the current directory.
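If the report does not show up where you ran the command, a small helper can check both places. This is a hypothetical function (not part of theHarvester); it only looks in the current directory and in the package directory mentioned above, and the file name you pass (e.g. `report.html`) is an assumption that depends on your `-f` value and theHarvester version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the full path of a theHarvester output file,
# checking the current directory first, then the package directory.
find_report() {
  local name="$1"
  local dir
  for dir in "$PWD" /usr/lib/python3/dist-packages/theHarvester; do
    if [ -f "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  # Not found in either location: report on stderr and signal failure.
  echo "report '$name' not found" >&2
  return 1
}
```

Usage: `find_report report.html`, then copy the printed path somewhere writable before post-processing it.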
grep -Po '(?<=<hostname>)[^\s]+?(?=</hostname>)' theHarvester_results.xml | sort -uf | tee -a subdomains.txt
grep -Po '(?<=<email>)[^\s]+?(?=</email>)' theHarvester_results.xml | sort -uf | tee -a emails.txt
grep -Po '(?<=record:"people",result:").+?(?="})' theHarvester_results.xml.html | sort -u
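A quick way to see what the hostname and email extraction does is to run it on a tiny sample file. The XML below is made up: it only mimics the `<hostname>` and `<email>` tags these patterns look for and is not a real theHarvester report. It needs GNU grep with PCRE support (`-P`).

```shell
#!/usr/bin/env bash
# Sketch: run the hostname/email extraction on a small, made-up XML sample.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data using reserved example domains/IPs.
cat > theHarvester_results.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<theHarvester>
<email>alice@example.com</email>
<email>Alice@example.com</email>
<email>bob@example.com</email>
<host><ip>192.0.2.10</ip><hostname>www.example.com</hostname></host>
<host><ip>192.0.2.11</ip><hostname>mail.example.com</hostname></host>
<host><ip>192.0.2.10</ip><hostname>www.example.com</hostname></host>
</theHarvester>
EOF

# Lookbehind/lookahead keep only the text between the tags; -o prints each
# match on its own line; sort -uf removes duplicates, ignoring case.
grep -Po '(?<=<hostname>)[^\s]+?(?=</hostname>)' theHarvester_results.xml \
  | sort -uf | tee -a subdomains.txt
grep -Po '(?<=<email>)[^\s]+?(?=</email>)' theHarvester_results.xml \
  | sort -uf | tee -a emails.txt
```

The duplicate `www.example.com` and the case-only duplicate of Alice's address each collapse to a single line, leaving two subdomains and two emails.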
Also leave an interval between runs: many search engines detect too many requests coming from your IP, and they may block the requests or occasionally serve a CAPTCHA.
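Both tips (fewer sources instead of `-b all`, and a pause between requests) can be combined in a small loop. This is a hypothetical helper, not a theHarvester feature: the `-l 300` limit, the `report_<source>` file names and the interval are example values, and the valid source names depend on your installed version (check `theHarvester --help`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run theHarvester one source at a time with a pause
# between runs instead of querying everything at once with -b all.
# Usage: harvest_throttled DOMAIN INTERVAL_SECONDS SOURCE...
# Set DRY_RUN=1 to print the commands instead of running them.
harvest_throttled() {
  local domain="$1" interval="$2"
  shift 2
  local remaining=$#
  local src
  for src in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      printf 'theHarvester -d %s -l 300 -b %s -f report_%s\n' "$domain" "$src" "$src"
    else
      theHarvester -d "$domain" -l 300 -b "$src" -f "report_$src"
    fi
    remaining=$((remaining - 1))
    # Wait before the next source; skip the wait after the last one.
    if [ "$remaining" -gt 0 ]; then
      sleep "$interval"
    fi
  done
}
```

Preview first with `DRY_RUN=1 harvest_throttled certifiedhacker.com 60 google linkedin`, then drop `DRY_RUN=1` to actually run it. Separate output files per source also keep each report smaller, which helps with the HTML export issue noted above.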