theHarvester: Fast OSINT Recon — Subdomains, Emails & Hosts

When an authorized engagement starts, theHarvester is one of the fastest tools for getting an initial view of an organization’s digital footprint. It is a CLI tool that puts dozens of public sources behind a single command and returns subdomains, email addresses, IPs, hosts, and employee names in seconds. This article covers how to install it, which sources deliver the best results, and how to integrate it into a complete OSINT workflow.

Legal & ethical scope: theHarvester queries public sources about a domain. Use it only inside engagements with written authorization, against your own domains, against bug bounty targets that are in scope, or against lab placeholders (example.com, test.local). Running it against third parties without authorization may violate the GDPR and equivalent privacy laws.

What theHarvester is

theHarvester, created by Christian Martorella (laramies on GitHub), is one of the longest-standing OSINT tools, designed for fast external reconnaissance. It aggregates data from:

  • Search engines: Bing, Yahoo, Baidu, Qwant, DuckDuckGo.
  • DNS & CT sources: crt.sh, CertSpotter, DNSdumpster, ThreatCrowd, HackerTarget.
  • Code & metadata: GitHub, GitLab.
  • Threat intel: Shodan, VirusTotal, SecurityTrails, Censys, AlienVault OTX (with API keys).
  • Additional passive DNS datasets: Anubis-DB, RapidDNS.

It is the “Swiss army knife” for the first recon pass — fast, broad, and easy to drop into scripts.

Where it fits in the methodology

Phase: Passive External Reconnaissance. MITRE ATT&CK: T1590 — Gather Victim Network Information and T1589 — Gather Victim Identity Information.

In a typical workflow:

  • theHarvester for a fast first pass (subdomains + emails).
  • Amass for deep enumeration.
  • Recon-ng for structured storage in a database.
  • Maltego for visualization.

Installation

Kali / Parrot

Pre-installed. Launch with:

theHarvester -h

From source (always latest)

git clone https://github.com/laramies/theHarvester.git
cd theHarvester
python3 -m venv venv
source venv/bin/activate
python3 -m pip install -r requirements/base.txt
python3 theHarvester.py -h

Docker

docker pull theharvester/theharvester
docker run --rm -it theharvester/theharvester -d example.com -b bing,duckduckgo

API keys — essential for full coverage

The api-keys.yaml file (usually at ~/.theHarvester/ or at the project root) holds one entry per source, for example:

apikeys:
  bing:
    key: YOUR_BING_KEY
  shodan:
    key: YOUR_SHODAN_KEY
  securityTrails:
    key: YOUR_SECTRAILS_KEY
  github:
    key: YOUR_GITHUB_TOKEN
  hunter:
    key: YOUR_HUNTER_KEY
  virustotal:
    key: YOUR_VT_KEY

Free tiers are sufficient for educational use. Never commit this file to a public repository.
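
One way to lower the risk of leaking the keys file is to restrict its permissions and list it in .gitignore before the first commit. The sketch below is only an illustration: the function name protect_keys_file and its two arguments are hypothetical and not part of theHarvester.

```shell
#!/bin/sh
# Hypothetical helper: lock down a keys file and make sure Git ignores it.
# Usage: protect_keys_file <keys-file> <repo-dir>
protect_keys_file() {
  local keys_file="$1" repo_dir="$2" name
  name="$(basename "$keys_file")"

  # Owner-only read/write, so other local users cannot read the keys.
  chmod 600 "$keys_file" || return 1

  # Add the file name to .gitignore only if it is not already listed.
  touch "$repo_dir/.gitignore"
  if ! grep -qxF "$name" "$repo_dir/.gitignore"; then
    printf '%s\n' "$name" >> "$repo_dir/.gitignore"
  fi
}
```

For a source checkout this could be called as protect_keys_file ./api-keys.yaml . from the project root.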

Core parameters

-d <domain>          Target domain (for example, example.com).
-b <source>          Source or sources (comma-separated, or all).
-l <limit>           Result limit per source.
-S <start>           Start offset for paginated sources (lowercase -s is the short form of --shodan).
--dns-lookup         DNS resolution on discovered hosts.
--dns-brute          Subdomain brute force with a wordlist.
--take-over          Check for subdomain takeover vulnerabilities (CNAMEs pointing to unused services).
--shodan             Look up discovered hosts on Shodan for services and banners.
-f <file>            Write results to report files (JSON, plus XML or HTML depending on the release).
--screenshot <dir>   Screenshots of discovered hosts (requires Playwright).

Practical examples (placeholder targets)

1. Basic search for subdomains and emails

theHarvester -d example.com -b bing,duckduckgo,crtsh -l 500 -f example_basic

Produces example_basic.json, plus an XML or HTML report depending on the release, with every discovered subdomain, email, and IP.

2. Targeted at DNSdumpster

theHarvester -d example.com --dns-lookup -b dnsdumpster

DNSdumpster frequently surfaces subdomains that other sources miss.

3. Full pass with all free sources

theHarvester -d example.com \
  -b bing,duckduckgo,baidu,crtsh,dnsdumpster,hackertarget,rapiddns,anubis,certspotter \
  -l 1000 --dns-lookup \
  -f example_full
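
Long -b lists get hard to maintain inline. A minimal sketch, assuming the sources are kept in a plain-text file with one name per line: the helper below drops comments and blank lines and joins the rest with commas. The function name join_sources and the file name sources.txt are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a one-source-per-line file into a comma-separated
# list suitable for the -b flag.
join_sources() {
  # Strip "#" comments and whitespace, drop empty lines, join with commas.
  sed -e 's/#.*$//' -e 's/[[:space:]]//g' "$1" | grep -v '^$' | paste -sd, -
}

# Example invocation (not executed here):
# theHarvester -d example.com -b "$(join_sources sources.txt)" -l 1000 -f example_full
```

Keeping the list in a file also makes it easy to record exactly which sources were used in a given run.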

4. Subdomain brute force

theHarvester -d example.com --dns-brute

Active mode — sends DNS queries. Confirm it is inside the engagement scope.

5. Subdomain takeover check

theHarvester -d example.com -b crtsh --take-over

Identifies subdomains pointing to abandoned services (for example, an unused S3 bucket or an orphaned GitHub Pages target) — high-severity findings for the report.

6. Pipeline for a real workflow

# 1) Subdomain harvesting
theHarvester -d example.com -b all -l 1000 -f harvest_out

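# 2) Extract hostnames from the JSON (some entries may carry a ":ip" suffix)
jq -r '.hosts[]' harvest_out.json | sed 's/:.*$//' | sort -u > subs.txt

# 3) Probe live hosts (httpx prints URLs such as https://host)
httpx -l subs.txt -silent -o live_urls.txt

# 4) Port scan with Nmap (only against in-scope hosts)
#    Nmap expects bare hostnames, so strip the scheme, port, and path first
sed -E 's#^https?://##; s#[:/].*$##' live_urls.txt | sort -u > live_hosts.txt
nmap -iL live_hosts.txt -sV -T3 -oN nmap_results.txt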
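
Public sources sometimes return names that do not belong to the target, so it can help to filter the harvested list against the authorized domain before any probing or scanning. The following is a sketch under the assumption that the scope is a single apex domain and its subdomains; the function name in_scope is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read hostnames on stdin and keep only the apex domain
# and its subdomains. Usage: in_scope example.com < subs.txt
in_scope() {
  local domain="$1"
  # Lowercase, strip any ":ip" suffix, then keep exact matches of the apex
  # or names that end in ".<apex>".
  tr '[:upper:]' '[:lower:]' | sed 's/:.*$//' |
    awk -v d="$domain" '
      $0 == d { print; next }
      length($0) > length(d) && substr($0, length($0) - length(d)) == ("." d) { print }
    ' | sort -u
}
```

Matching on the ".<apex>" suffix rather than a plain substring avoids accepting names such as notexample.com or example.com.evil.net. A real engagement scope may list several domains or explicit exclusions, which this sketch does not handle.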

Common mistakes

  • All sources without API keys: Using -b all means many sources fail without keys. Pick a targeted set.
  • Outdated Python: New versions require Python 3.10+. Use pyenv on older systems.
  • No cross-validation: Every source delivers different results. Combine with Amass and subfinder.
  • Rate-limit without fallback: After many runs, search engines such as Bing throttle or block your queries. Keep DuckDuckGo and crt.sh as resilient fallbacks.
  • Active mode out of scope: --dns-brute and --screenshot perform active queries — confirm they are inside engagement scope.
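
For the cross-validation point, combining results from several tools is mostly a normalization and de-duplication step. A minimal sketch, assuming each tool has already written one hostname per line to its own file; the function name merge_hosts and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: merge host lists from several tools into one
# de-duplicated list. Usage: merge_hosts file1 [file2 ...]
merge_hosts() {
  # Concatenate inputs, remove carriage returns, lowercase, strip a trailing
  # dot (fully qualified form), drop blank lines, and de-duplicate.
  cat "$@" | tr -d '\r' | tr '[:upper:]' '[:lower:]' |
    sed -e 's/\.$//' -e '/^$/d' | sort -u
}
```

For example, merge_hosts harvester_hosts.txt amass_hosts.txt subfinder_hosts.txt > all_hosts.txt would produce a single list for the later steps.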

Defensive / Blue team perspective

  • Self-recon: Run theHarvester against your own domains periodically — find forgotten assets before an attacker does.
  • DNS hygiene:
    • Disable public AXFR.
    • Clean up old CNAMEs (reduces subdomain-takeover risk).
    • Use DNSSEC and keep public hostnames to a minimum.
  • Email obfuscation: Avoid publishing corporate emails — prefer contact forms.
  • Certificate Transparency monitoring: Alert on new certificates for your domains and lookalike names, which helps detect phishing and typosquatting.
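
Periodic self-recon is most useful when each run is compared with the previous one, so that only newly seen hostnames need review. A sketch of that comparison, assuming two files with one hostname per line from consecutive runs; the function name new_hosts and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print hostnames present in the current run but not in
# the previous one. Usage: new_hosts previous.txt current.txt
new_hosts() {
  local previous="$1" current="$2" tmp_prev tmp_cur
  tmp_prev="$(mktemp)" || return 1
  tmp_cur="$(mktemp)" || return 1

  # comm requires sorted input, so sort both lists first.
  sort -u "$previous" > "$tmp_prev"
  sort -u "$current" > "$tmp_cur"

  # -13 suppresses lines unique to the first file and lines common to both,
  # leaving only lines unique to the second (current) file.
  comm -13 "$tmp_prev" "$tmp_cur"

  rm -f "$tmp_prev" "$tmp_cur"
}
```

Run from a scheduled job, the output could feed an alert whenever it is non-empty.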

Best practices

  • Start with passive sources, escalate to --dns-brute only where required.
  • Record command + timestamp + flags for every run in the pentest report.
  • Combine with Amass, subfinder, and assetfinder for completeness.
  • Export to JSON so you can filter with jq and pipe into other tools.
  • For deeper analysis, load the results into Maltego or Recon-ng.
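
Recording the command, timestamp, and flags is easier when it happens automatically. A small wrapper can append each invocation to a log before running it. This is a sketch, not a feature of theHarvester: the function name run_logged and the log file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: log a UTC timestamp and the command line, then run it.
# Usage: run_logged <log-file> <command> [args...]
run_logged() {
  local log_file="$1"
  shift
  # Tab-separated: ISO 8601 UTC timestamp, then the command as invoked.
  # Note: "$*" flattens the arguments, so original quoting is not preserved.
  printf '%s\t%s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$*" >> "$log_file"
  "$@"
}

# Example invocation (not executed here):
# run_logged recon.log theHarvester -d example.com -b crtsh -f example_basic
```

The resulting log can be pasted into the report appendix as the record of what was run and when.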

Summary

theHarvester is one of the fastest ways to get a first picture of an organization’s digital footprint. Used casually, it returns basic results; in a serious engagement, with the right API keys and combined with Amass and Recon-ng, it is one of the most effective tools of the Information Gathering phase.

Next steps

For complete training in OSINT and external reconnaissance, explore the courses at Audax Cybersecurity Academy.
