theHarvester: Fast OSINT Recon — Subdomains, Emails & Hosts
When an authorized engagement starts, one of the fastest tools for getting an initial view of an organization’s digital footprint is theHarvester. It is a CLI tool that puts dozens of public sources behind a single command and returns subdomains, email addresses, IPs, hosts, and employee names, often within seconds. In this article we look at how to install it, which sources deliver the best results, and how to integrate it into a complete OSINT workflow.
Legal & ethical scope: theHarvester queries public sources about a domain. Use it only inside engagements with written authorization, against your own domains, against bug bounty entities that are in scope, or against lab placeholders (example.com, test.local). Using it against third parties without authorization may violate the GDPR and equivalent privacy laws.
What theHarvester is
theHarvester (Christian Martorella / laramies, on GitHub) is one of the longest-standing OSINT tools, designed for fast external reconnaissance. It aggregates data from:
- Search engines: Bing, Yahoo, Baidu, Qwant, DuckDuckGo.
- DNS & CT sources: crt.sh, CertSpotter, DNSdumpster, ThreatCrowd, HackerTarget.
- Code & metadata: GitHub, GitLab.
- Threat intel: Shodan, VirusTotal, SecurityTrails, Censys, AlienVault OTX (with API keys).
- Additional subdomain datasets: Anubis-DB, RapidDNS.
It is the “Swiss army knife” for the first recon pass — fast, broad, and easy to drop into scripts.
Where it fits in the methodology
Phase: Passive External Reconnaissance. MITRE ATT&CK: T1590 — Gather Victim Network Information and T1589 — Gather Victim Identity Information.
In a typical workflow:
- theHarvester for a fast first pass (subdomains + emails).
- Amass for deep enumeration.
- Recon-ng for structured storage in a database.
- Maltego for visualization.
Installation
Kali / Parrot
Pre-installed. Launch with:
theHarvester -h
From source (always latest)
git clone https://github.com/laramies/theHarvester.git
cd theHarvester
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r requirements/base.txt
python3 theHarvester.py -h
Docker
docker pull theharvester/theharvester
docker run --rm -it theharvester/theharvester -d example.com -b bing,duckduckgo
API keys — essential for full coverage
The api-keys.yaml file (usually at ~/.theHarvester/ or at the project root) accepts:
apikeys:
  bing:
    key: YOUR_BING_KEY
  shodan:
    key: YOUR_SHODAN_KEY
  securityTrails:
    key: YOUR_SECTRAILS_KEY
  github:
    key: YOUR_GITHUB_TOKEN
  hunter:
    key: YOUR_HUNTER_KEY
  virustotal:
    key: YOUR_VT_KEY
Free tiers are sufficient for educational use. Never commit this file to a public repository.
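Before a run it can help to see which sources in api-keys.yaml actually have a key set. The sketch below is a minimal, stdlib-only reader that assumes the simple layout shown above (a source name followed by a single key: entry). It is not a general YAML parser, the function name is hypothetical, and treating values that start with YOUR_ as placeholders is an assumption of this example.

```python
def sources_with_keys(yaml_text):
    """Map each source under 'apikeys:' to True if its 'key:' looks filled in.

    Assumes one 'key:' field per source; values starting with 'YOUR_' are
    treated as unfilled placeholders (an assumption of this sketch).
    """
    status = {}
    current = None
    for raw in yaml_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or line == "apikeys:":
            continue
        name, _, value = line.partition(":")
        value = value.strip().strip("'\"")
        if name == "key" and current is not None:
            status[current] = bool(value) and not value.startswith("YOUR_")
        elif not value:
            current = name  # start of a new source block, e.g. "shodan:"
            status.setdefault(current, False)
    return status
```

Reading the file with pathlib.Path(...).read_text() and passing the text in gives a quick list of sources worth including in -b.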
Core parameters
Parameter | Description |
-d <domain> | Target domain (for example, example.com). |
-b <source> | Source or sources (comma-separated, or all). |
-l <limit> | Result limit per source. |
-S <start> | Start offset for paginated sources (capital -S in 4.x releases; lowercase -s is the short form of --shodan). |
--dns-lookup | DNS resolution on discovered hosts. |
--dns-brute | Subdomain brute force with a wordlist. |
--take-over | Check for subdomain takeover vulnerabilities (CNAMEs pointing to unused services). |
--shodan | Look up discovered hosts on Shodan for services and banners. |
-f <file> | Base name for the report files (recent 4.x releases write JSON and XML; older releases wrote HTML, so confirm with theHarvester -h). |
--screenshot <dir> | Screenshots of discovered hosts (requires Playwright). |
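These flags compose into a single command line. As a hedged sketch, the helper below builds that argument list in Python so a script can both log it and run it; build_cmd is a hypothetical name, and it only uses flags that appear in the table.

```python
import shlex

def build_cmd(domain, sources, limit=None, dns_lookup=False, outfile=None):
    """Assemble a theHarvester argument list from the options in the table."""
    cmd = ["theHarvester", "-d", domain, "-b", ",".join(sources)]
    if limit is not None:
        cmd += ["-l", str(limit)]
    if dns_lookup:
        cmd.append("--dns-lookup")
    if outfile:
        cmd += ["-f", outfile]
    return cmd

cmd = build_cmd("example.com", ["bing", "duckduckgo", "crtsh"],
                limit=500, outfile="example_basic")
print(shlex.join(cmd))
# theHarvester -d example.com -b bing,duckduckgo,crtsh -l 500 -f example_basic
```

Passing the list form to subprocess.run(cmd) avoids shell quoting problems when a domain or file name comes from user input.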
Practical examples (placeholder targets)
1. Basic search for subdomains and emails
theHarvester -d example.com -b bing,duckduckgo,crtsh -l 500 -f example_basic
Produces example_basic.json plus a companion report file (XML in recent 4.x releases, HTML in older ones), listing every discovered subdomain, email, and IP.
2. Targeted at DNSdumpster
theHarvester -d example.com --dns-lookup -b dnsdumpster
DNSdumpster frequently surfaces subdomains that other sources miss.
3. Full pass with all free sources
theHarvester -d example.com \
-b bing,duckduckgo,baidu,crtsh,dnsdumpster,hackertarget,rapiddns,anubis,certspotter \
-l 1000 --dns-lookup \
-f example_full
4. Subdomain brute force
theHarvester -d example.com --dns-brute
Active mode — sends DNS queries. Confirm it is inside the engagement scope.
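To make clear why this counts as active, here is a rough sketch of what a DNS brute force does conceptually: it prepends wordlist entries to the domain and tries to resolve each candidate. This is not theHarvester's implementation; the function name and the injectable resolver are assumptions made for illustration and testing.

```python
import socket

def brute_subdomains(domain, words, resolve=socket.gethostbyname):
    """Return {hostname: ip} for wordlist candidates that resolve.

    With the default resolver, every candidate triggers a real DNS query.
    """
    found = {}
    for word in words:
        label = word.strip().lower()
        if not label:
            continue  # skip blank wordlist lines
        host = f"{label}.{domain}"
        try:
            found[host] = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
    return found
```

Each wordlist entry becomes one query that the target's authoritative DNS servers may log, which is why the mode belongs in the rules of engagement.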
5. Subdomain takeover check
theHarvester -d example.com -b crtsh --take-over
Identifies subdomains pointing to abandoned services (for example, an unused S3 bucket or an orphaned GitHub Pages target) — high-severity findings for the report.
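The idea behind such a check can be sketched as matching each subdomain's CNAME target against hosting suffixes commonly involved in takeovers. The suffix list and function below are illustrative assumptions, not theHarvester's actual fingerprint set, and a match is only a candidate: the target still has to be verified manually as unclaimed.

```python
# Illustrative suffixes only; real tools maintain much larger fingerprint lists.
TAKEOVER_PRONE_SUFFIXES = (
    ".github.io",
    ".s3.amazonaws.com",
    ".herokuapp.com",
    ".azurewebsites.net",
)

def takeover_candidates(cnames):
    """Given {subdomain: cname_target}, return entries whose target is on a flagged service."""
    flagged = {}
    for sub, target in cnames.items():
        normalized = target.rstrip(".").lower()  # DNS answers often end with a dot
        if normalized.endswith(TAKEOVER_PRONE_SUFFIXES):
            flagged[sub] = normalized
    return flagged
```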
6. Pipeline for a real workflow
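# 1) Subdomain harvesting
theHarvester -d example.com -b all -l 1000 -f harvest_out
# 2) Extract hostnames from the JSON (drop any ":ip" suffix on resolved entries)
jq -r '.hosts[]' harvest_out.json | cut -d: -f1 | sort -u > subs.txt
# 3) Probe live hosts
httpx -l subs.txt -silent -o live.txt
# 4) Strip scheme, port, and path: httpx writes URLs, nmap -iL expects hosts
sed -E 's#^https?://##; s#[:/].*$##' live.txt | sort -u > live_hosts.txt
# 5) Port scan with Nmap (only against in-scope hosts)
nmap -iL live_hosts.txt -sV -T3 -oN nmap_results.txt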
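The extraction step can also be done in Python when the host list needs cleaning before probing. The sketch below assumes the JSON report has a top-level hosts array (as the jq filter above does) and that entries may carry a :ip suffix after resolution; both are assumptions to verify against your own output file. It also drops anything outside the target domain, and the function name is hypothetical.

```python
import json

def in_scope_hosts(report_json, domain):
    """Extract unique, lower-cased hostnames under `domain` from a report string."""
    report = json.loads(report_json)
    domain = domain.lower()
    hosts = set()
    for entry in report.get("hosts", []):
        host = entry.split(":", 1)[0].strip().lower()  # drop a trailing ":ip" if present
        if host == domain or host.endswith("." + domain):
            hosts.add(host)
    return sorted(hosts)
```

Filtering on the domain suffix keeps look-alike names such as evil-example.com out of the list that later feeds active scanning.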
Common mistakes
- All sources without API keys: Using -b all means many sources fail without keys. Pick a targeted set.
- Outdated Python: New versions require Python 3.10+. Use pyenv on older systems.
- No cross-validation: Every source delivers different results. Combine with Amass and subfinder.
- Rate-limit without fallback: After many runs, Google/Bing throttle you. Prefer DuckDuckGo and crt.sh as resilient fallbacks.
- Active mode out of scope: --dns-brute and --screenshot perform active queries; confirm they are inside engagement scope.
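Cross-validation is easy to script once each tool's hosts are in a list. The following sketch merges results and records which tools reported each host; the function name and the input shape (tool name mapped to hostnames) are assumptions for illustration.

```python
from collections import defaultdict

def merge_sources(results):
    """Given {tool_name: iterable_of_hosts}, return {host: sorted tools that found it}."""
    seen = defaultdict(set)
    for tool, hosts in results.items():
        for host in hosts:
            seen[host.strip().lower()].add(tool)
    return {host: sorted(tools) for host, tools in sorted(seen.items())}
```

Hosts reported by only one tool are worth a second look: they are either the unique value that tool adds or stale data.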
Defensive / Blue team perspective
- Self-recon: Run theHarvester against your own domains periodically — find forgotten assets before an attacker does.
- DNS hygiene:
- Disable public AXFR.
- Clean up old CNAMEs (reduces subdomain-takeover risk).
- Use DNSSEC and keep public hostnames to a minimum.
- Email obfuscation: Avoid publishing corporate emails — prefer contact forms.
- Certificate Transparency monitoring: Alert on new certificates for your domains — detects phishing and typosquat.
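Certificate Transparency monitoring boils down to diffing newly observed certificate names against a known baseline. The sketch below works on records shaped like crt.sh's JSON output, where each record has a name_value field holding one or more newline-separated names. Fetching the data and the alerting step are left out, and the function name is hypothetical.

```python
def new_cert_names(ct_entries, baseline):
    """Return certificate names in ct_entries that are not in the known baseline."""
    known = {name.lower() for name in baseline}
    observed = set()
    for entry in ct_entries:
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name:
                observed.add(name)
    return sorted(observed - known)
```

Run on a schedule, anything this returns is either a new asset to add to the inventory or a certificate nobody on the team requested.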
Best practices
- Start with passive sources, escalate to --dns-brute only where required.
- Record command + timestamp + flags for every run in the pentest report.
- Combine with Amass, subfinder, and assetfinder for completeness.
- Export to JSON so you can filter with jq and pipe into other tools.
- For deeper analysis, load the results into Maltego or Recon-ng.
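Recording each run is simple to automate. As a minimal sketch (the record fields and function name are assumptions, not a reporting standard), the helper below turns a command into a JSON line with a UTC timestamp that can be appended to an engagement log.

```python
import json
import shlex
from datetime import datetime, timezone

def run_record(cmd, note=""):
    """Build a JSON line capturing the exact command and a UTC timestamp."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "command": shlex.join(cmd),
        "note": note,
    })
```

Appending one such line per invocation gives the report an exact, ordered history of what was run and when.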
Summary
theHarvester is one of the fastest ways to get a first picture of an organization’s digital footprint. Used casually, it returns basic results; in a serious engagement, with the right API keys and combined with Amass and Recon-ng, it is one of the most effective tools of the Information Gathering phase.
Next steps
- Amass — deep subdomain enumeration
- Recon-ng — Part 1
- Maltego — OSINT link analysis
- External references: theHarvester on GitHub, MITRE T1590.

