Information gathering on a domain

2023/03/25

Recently, I was asked what one can find about a website, or in more general terms, a domain. Since this is a very broad and vague question you can dig very deep into this topic.

I’ll try to list some tools I use(d) but don’t expect it to be definitive or complete.

Prerequisites

I assume you know how to work with a terminal, install software and find your way around a Linux system.

Kali Linux was used for my experiments. Of course you may use any other Linux distribution. Some tools are just regular websites that don’t require anything other than a simple browser. Personally, I prefer the command line since you can chain commands and there are no captchas. But sometimes there is just no such alternative.

target.corp is a fictional example domain. Some other well-known domains are used to discuss details.

Warning

Even though I do my best, I can be wrong. I won’t take any responsibility should you be harmed or suffer consequences. You should always consider the possibility that an adversary is trying to attack you, or that you are working with malicious content.

Remember that websites offering free services may track you. Also be aware that querying services may leave a trail of “fingerprints”. OPSEC implications are not discussed; it is left to the reader to first understand the tools and then use them.

You have been warned. ⚠️

“Passive” / silent

Here, passive means “you are not directly doing a port scan or heavily hammering the webserver”, which could leave obvious traces and create a lot of noise. Hence the quotes.

whois

In a nutshell: whois target.corp or whois 1.2.3.4

Use case: Find out who owns a domain or IP.

whois is probably best known for querying domain or IP address block information, basically telling us who owns a resource.

$ whois wikipedia.org
Domain Name: wikipedia.org
Registry Domain ID: d1a549fdfc3c4dd389c3c575a889efb1-LROR
Registrar WHOIS Server: http://whois.markmonitor.com
Registrar URL: http://www.markmonitor.com
Updated Date: 2022-12-17T09:19:13Z
Creation Date: 2001-01-13T00:12:14Z
Registry Expiry Date: 2024-01-13T00:12:14Z
Registrar: MarkMonitor Inc.
Registrar IANA ID: 292
Registrar Abuse Contact Email: abusecomplaints@markmonitor.com
Registrar Abuse Contact Phone: +1.2083895740
Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited
Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
Registry Registrant ID: REDACTED FOR PRIVACY
Registrant Name: REDACTED FOR PRIVACY
Registrant Organization: Wikimedia Foundation, Inc.
Registrant Street: REDACTED FOR PRIVACY
Registrant City: REDACTED FOR PRIVACY
Registrant State/Province: CA
Registrant Postal Code: REDACTED FOR PRIVACY
Registrant Country: US
Registrant Phone: REDACTED FOR PRIVACY
Registrant Phone Ext: REDACTED FOR PRIVACY
Registrant Fax: REDACTED FOR PRIVACY
Registrant Fax Ext: REDACTED FOR PRIVACY
[...]
Name Server: ns0.wikimedia.org
Name Server: ns1.wikimedia.org
Name Server: ns2.wikimedia.org

Nowadays whois records usually show very little information. This may be because the registrar acts as a proxy or because the owner has turned on privacy protection. Without proper means you won’t get any further here.

It is also possible to use some web tools for this purpose like https://who.is or https://whois.domaintools.com

The same applies if you want to check the owner of an IP address.

$ whois 20.53.203.50
[...]

NetRange:       20.33.0.0 - 20.128.255.255
CIDR:           20.48.0.0/12, 20.33.0.0/16, 20.128.0.0/16, 20.36.0.0/14, 20.40.0.0/13, 20.34.0.0/15, 20.64.0.0/10
NetName:        MSFT
NetHandle:      NET-20-33-0-0-1
Parent:         NET20 (NET-20-0-0-0-0)
NetType:        Direct Allocation
OriginAS:       
Organization:   Microsoft Corporation (MSFT)
RegDate:        2017-10-18
Updated:        2021-12-14
Ref:            https://rdap.arin.net/registry/ip/20.33.0.0
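
Because whois output is plain text, the interesting fields can be filtered out with standard tools. The following is only a minimal sketch: the function name whois_fields is made up for illustration, and the field list is an assumption based on the two sample outputs above. Other registries use different labels, so expect to adjust it.

```shell
# whois_fields: read whois output on stdin and print a few "Key: Value" lines.
# The list of keys is an assumption taken from the sample outputs; it is not
# a complete or universal set of whois field names.
whois_fields() {
    grep -E '^[[:space:]]*(Registrar|Creation Date|Registry Expiry Date|Registrant Organization|Name Server|Organization|NetRange|CIDR):' \
        | sed -E 's/^[[:space:]]+//'
}

# Usage (performs a real query, so it is left commented out):
#   whois wikipedia.org | whois_fields
#   whois 20.53.203.50  | whois_fields
```

Note that the pattern requires the colon directly after the key, so "Registrar:" is printed while "Registrar IANA ID:" or "Registrar URL:" are not.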

dig

In a nutshell: dig +noall +answer +multiline target.corp any @<RESOLVERIP> or dig +noall +answer -x 1.2.3.4

Use case: Resolve to an IP and vice versa.

Resolving a domain to an IP address can be done with dig. There are several flags you can use to get just the IP or much more verbose output, depending on your needs. It also lets you specify which DNS resolver to use, should your default one block certain requests.

$ dig target.corp

; <<>> DiG 9.18.12 <<>> target.corp
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24151
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;target.corp                IN    A

;; ANSWER SECTION:
target.corp.            43200    IN    A    1.2.3.4
target.corp.            43200    IN    A    5.6.7.8

;; Query time: 66 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)

In this verbose output you simply get the A-record and also the server that responded to your query.

With the switch -t you can specify which record type to query (AAAA, MX, TXT etc.), or use any to query all DNS records at once. However, I have noticed that some DNS servers have stopped answering this kind of query, so you may have to ask a different DNS server, which you can specify with @9.9.9.9, for instance.

Depending on what is stored in the zone, you can get a lot of extras like DNSKEY, RRSIG etc.

$ dig +noall +answer +multiline target.corp any @9.9.9.9
target.corp.    3600 IN    TXT "v=spf1 redirect=_spf.google.com"
target.corp.    10800 IN NSEC3PARAM 1 0 3 EA33014A
target.corp.    10800 IN NS ns1.isp.com.
target.corp.    10800 IN NS ns2.isp.net.
target.corp.    3600 IN    MX 10 aspmx.l.google.com.
target.corp.    10800 IN A 1.2.3.4

This gives grepable output for further use with pipes.
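
As an illustration of such a pipe, here is a small sketch that filters the answer lines by record type and prints only the record data. The function name dig_records is hypothetical, and it assumes the "name TTL class type data" column layout shown above.

```shell
# dig_records TYPE: read "dig +noall +answer" output on stdin and print only
# the record data of the given type (e.g. NS, MX, A).
# Assumes the column layout: name, TTL, class, type, data.
dig_records() {
    awk -v type="$1" '$4 == type {
        $1 = $2 = $3 = $4 = ""   # drop name, TTL, class and type
        sub(/^ +/, "")           # trim the separators left behind
        print
    }'
}

# Usage (performs a real query, so it is left commented out):
#   dig +noall +answer target.corp any @9.9.9.9 | dig_records NS
```

The type is matched case-sensitively, so pass it in upper case as dig prints it.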

Another handy feature is the reverse IP lookup: seeing which hostname a specific IP address resolves to. Let’s say you have an IP address in your logs but you don’t know which server it was.

$ dig +noall +answer -x 1.2.3.4
4.3.2.1.in-addr.arpa. 589    IN    PTR    vpn-gateway.target.corp

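Note that this won’t always work since the nameserver must be able to answer the PTR query. The PTR record is controlled by whoever manages the IP block (usually the ISP), not by the owner of the domain. In the days of dynamic DNS services it is quite possible that you end up in this situation: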

$ dig +noall +answer secrethost.dynamic.dns
secrethost.dynamic.dns. 10800 IN A 100.1.2.3

$ dig +noall +answer -x 100.1.2.3
3.2.1.100.in-addr.arpa. 9000    IN    PTR    3.2.1.100.dsl.customer.isp
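
The name that -x queries is just the IPv4 octets in reverse order under in-addr.arpa, as visible in the answers above. A small sketch that builds this name by hand (rev_arpa is a made-up helper name, and it handles IPv4 only):

```shell
# rev_arpa: print the in-addr.arpa name that "dig -x" looks up for an IPv4
# address. Prints nothing if the input does not consist of four dotted parts.
rev_arpa() {
    printf '%s\n' "$1" | awk -F. 'NF == 4 { print $4 "." $3 "." $2 "." $1 ".in-addr.arpa." }'
}

# Usage (performs a real query, so it is left commented out):
#   dig +noall +answer -t PTR "$(rev_arpa 1.2.3.4)"
```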

It is also possible to use websites like https://mxtoolbox.com or https://toolbox.googleapps.com/apps/dig

geoiplookup

Example: geoiplookup 1.2.3.4

geoiplookup lets you look up the originating country of an IP address.

ipwho.is

Another method is to query an online service like https://ipwhois.io/ or https://ipinfo.io/ (account required but apparently free).

Example:

$ curl "http://ipwho.is/188.184.9.234" | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   750    0   750    0     0   2681      0 --:--:-- --:--:-- --:--:--  2688
{
  "ip": "188.184.9.234",
  "success": true,
  "type": "IPv4",
  "continent": "Europe",
  "continent_code": "EU",
  "country": "Switzerland",
  "country_code": "CH",
  "region": "Geneva",
  "region_code": "GE",
  "city": "Geneva",
  "latitude": 46.2043907,
  "longitude": 6.1431577,
  "is_eu": false,
  "postal": "1204",
  "calling_code": "41",
  "capital": "Bern",
  "borders": "AT,DE,FR,IT,LI",
  "flag": {
    "img": "https://cdn.ipwhois.io/flags/ch.svg",
    "emoji": "🇨🇭",
    "emoji_unicode": "U+1F1E8 U+1F1ED"
  },
  "connection": {
    "asn": 513,
    "org": "CERN - European Organization for Nuclear Research",
    "isp": "CERN - European Organization for Nuclear Research",
    "domain": "cern.ch"
  },
  "timezone": {
    "id": "Europe/Zurich",
    "abbr": "CET",
    "is_dst": false,
    "offset": 3600,
    "utc": "+01:00",
    "current_time": "<TIMESTAMP>"
  }
}

Bonus: What’s my IP again?

Let’s say you want to know your own public IP. What do you do? Right, you open a browser and type “what’s my ip” into Google. Did I mention I hate leaving the shell?

In a nutshell:

curl ifconfig.me
curl ipconfig.sh
curl icanhazip.com
curl api.ipify.org

... etc.
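
Since any of these services can be down or return an error page, it may be worth trying several in turn and checking that the answer looks like an IP address. A minimal sketch under these assumptions: the helper names is_ipv4 and my_ip are made up, only IPv4 is handled (curl -4), and the check is purely about the format, it does not verify that each octet is at most 255.

```shell
# is_ipv4: succeed if the argument looks like a dotted IPv4 address.
# Format check only; it does not validate the 0-255 range of each octet.
is_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
}

# my_ip: ask the services one after another and print the first answer
# that looks like an IPv4 address. Returns 1 if none of them answered.
my_ip() {
    for svc in ifconfig.me icanhazip.com api.ipify.org; do
        ip="$(curl -4 -fsS --max-time 5 "$svc" 2>/dev/null)" || continue
        if is_ipv4 "$ip"; then
            printf '%s\n' "$ip"
            return 0
        fi
    done
    return 1
}

# Usage (contacts the services above, so it is left commented out):
#   my_ip
```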

urlscan.io

Use case: Find out what a website does, a sandbox for websites.

This web service scans a URL and provides a lot of insight: not only the IP and the geolocation, it also tries to analyze the content. The results show an estimate of potential threats, detected technologies, other domains on the same IP or ASN and the reuse of resources. The reuse of resources may help you pivot to further websites and uncover a connected online presence. Some features are restricted to logged-in users or are only available with a paid subscription.

web.archive.org

Use case: Go back in time, website archive.

Going back in time to see changes and older versions of a website can give further insights, especially when earlier versions contained more (unredacted) information that can give new leads on where to find connections.

There are several sites archiving the internet.
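
For the shell-minded: the Internet Archive offers an availability endpoint that returns the closest snapshot for a URL as JSON. The sketch below only builds the request URL. The helper name wayback_api_url is made up, and it assumes a plain domain or URL without characters that would need URL encoding.

```shell
# wayback_api_url URL [YYYYMMDD]: print the request URL for the Wayback
# Machine availability endpoint, optionally asking for the snapshot closest
# to the given date. No URL encoding is done here.
wayback_api_url() {
    url="https://archive.org/wayback/available?url=$1"
    if [ -n "${2:-}" ]; then
        url="$url&timestamp=$2"
    fi
    printf '%s\n' "$url"
}

# Usage (performs a real request, so it is left commented out):
#   curl -s "$(wayback_api_url target.corp 20200101)" | jq '.archived_snapshots'
```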

dnsdumpster.com

Use case: Find subdomains and connected resources.

Getting more information about other (not so obvious) subdomains can lead to more data to work on, especially if the target set up a service and forgot about it, or thinks that mysupersecretsubdomain.target.corp is secret enough. dnsdumpster will basically list domains and subdomains related to your query, sometimes even disclosing internal hostnames.

viewdns.info

Use case: DNS lookups

viewdns provides a wide range of lookups: reverse IP lookup, reverse whois lookup, IP history, whois, ping, port scan etc. The reverse whois lookup is particularly interesting for finding other domains that have been registered with a given email address.

An alternative may be https://www.reversewhois.io

theHarvester

In a nutshell: theHarvester -d target.corp -l 100 -b all

Use case: Find subdomains and emails

theHarvester leverages several services (some requiring an account) to do recon on a domain. It will gather not only IPs and subdomains, but also email addresses of the given domain. Apparently it can also take screenshots of the subdomains.

$ theHarvester -d kali.org -l 100 -b all
*******************************************************************
*  _   _                                            _             *
* | |_| |__   ___    /\  /\__ _ _ ____   _____  ___| |_ ___ _ __  *
* | __|  _ \ / _ \  / /_/ / _` | '__\ \ / / _ \/ __| __/ _ \ '__| *
* | |_| | | |  __/ / __  / (_| | |   \ V /  __/\__ \ ||  __/ |    *
*  \__|_| |_|\___| \/ /_/ \__,_|_|    \_/ \___||___/\__\___|_|    *
*                                                                 *
* theHarvester 4.2.0                                              *
* Coded by Christian Martorella                                   *
* Edge-Security Research                                          *
* cmartorella@edge-security.com                                   *
*                                                                 *
*******************************************************************

[*] Target: kali.org

[*] Searching Certspotter.
[*] Searching Baidu.
[*] Searching Duckduckgo.
[*] Searching Hackertarget.
[*] Searching CRTsh.
[*] Searching Otx.
[*] Searching Qwant.
[*] Searching Rapiddns.
[*] Searching Dnsdumpster.
[*] Searching Omnisint.
[*] Searching Threatminer.
[*] Searching Sublist3r.

[*] ASNS found: 2
--------------------
AS13335
AS16276

[*] Interesting Urls found: 5
--------------------
http://http.kali.org/
https://www.kali.org/
https://www.kali.org/blog/kali-linux-2023-1-release/
https://www.kali.org/get-kali/
https://www.kali.org/index.min.css?ver=76b50bef7fea6f7f064462013b07b7a4

[*] LinkedIn Links found: 0
---------------------

[*] IPs found: 88
-------------------
23.92.17.15
23.239.31.82
35.185.44.232
45.33.71.210
45.33.83.49
45.33.88.48
45.56.107.246
[snip]
209.126.116.149
217.70.184.56
2606:4700::6812:49f
2606:4700::6812:59f
2606:4700:3034::ac43:a5ec
2a06:98c1:3121::3

[*] Emails found: 1
----------------------
devel@kali.org

[*] Hosts found: 482
---------------------
10cake.kali.org:104.18.4.159, 104.18.5.159
10cake.kali.org:104.18.5.159
10year.kali.org:104.18.5.159
10year.kali.org:104.18.5.159, 104.18.4.159
apollo.kali.org:23.239.31.82
apollo.kali.org
[snip]
www.kali.org:kalilinux.gitlab.io
www.pkg.kali.org
www.status.kali.org
www.tools.kali.org
zeus.kali.org:144.217.77.182

EmailFinder

In a nutshell: emailfinder -d target.corp

Use case: Find email addresses

EmailFinder lets you do recon on email addresses related to this domain that can be used for further investigation. This can be used to find out where the address has been used for other accounts like social media or forums.

$ emailfinder -d kali.org
     __________      ________________
________  ____/_________  __ \__  __ \
_  _ \_  /_   __  __ \_  / / /_  /_/ /
/  __/  __/   _  / / /  /_/ /_  _, _/
\___//_/      /_/ /_//_____/ /_/ |_|


|_ Author: @JosueEncinar
|_ Description: Search emails from a domain through search engines.
|_ Version: 0.3.0b
|_ Usage: emailfinder -d domain.com

Searching in google...
Searching in bing...
Searching in baidu...
Searching in yandex...
[+] Bing discovered 3 emails
[+] bing done!
[!]  yandex error YandexDetection, Robot detected
[i] Baidu did not discover any email IDs
[+] baidu done!
[+] Google discovered 1 emails
[+] google done!

Total emails: 3
----------------
devel@kali.org
steev@kali.org
arnaudr@kali.org

Google Dork

Use case: Find hidden or forgotten gems on the internet

Even though there are automated tools to search and crawl for email addresses, using Google Dorks is a very powerful way of enriching your information set, because just knowing that a domain, email address or name exists doesn’t give you the context or the “intelligence” to see further. It is also quite possible that the automated tools just don’t return all data, or any data at all.

E.g. you could search for intext:"@target.corp" to see where email addresses show up, but be advised that it’s better to use a full email address like intext:"user@target.corp"

A source for such queries is Google Hacking Database (GHDB)

HIBP

Use case: Find user activity

The collected email addresses may show up in leaks, which gives you a further indication of where they were used. https://haveibeenpwned.com/ may give you further pointers where to look for accounts and activities. Possibly one is able to find nicknames or usernames on these platforms.

sherlock

In a nutshell: sherlock <username>

Use case: Find where a username is used. Works better with a rather unique nickname.

With usernames at hand you can try and see if they were re-used on other social media platforms. sherlock-project/sherlock automates the hunt for them and checks over 400 sites in total. This way you may be able to broaden your view and knowledge of your target.

$ sherlock stevejobs
[*] Checking username stevejobs on:

[+] 7Cups: https://www.7cups.com/@stevejobs
[+] 8tracks: https://8tracks.com/stevejobs
[+] 9GAG: https://www.9gag.com/u/stevejobs
[+] About.me: https://about.me/stevejobs
[snip]
[+] pikabu: https://pikabu.ru/@stevejobs
[+] pr0gramm: https://pr0gramm.com/user/stevejobs
[+] wykop.pl: https://www.wykop.pl/ludzie/stevejobs

[*] Search completed with 179 results
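
With this many hits you will probably want just the URLs for further processing. A small sketch that extracts them from the output format shown above (sherlock_urls is a made-up name, and the "[+] Site: URL" line format is assumed from this sample and may differ between versions):

```shell
# sherlock_urls: read sherlock's console output on stdin and print only the
# URLs of the "[+] Site: URL" lines. Assumes site names contain no colon.
sherlock_urls() {
    sed -n 's/^\[+\] [^:]*: //p'
}

# Usage (runs a real search, so it is left commented out):
#   sherlock stevejobs | sherlock_urls
```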

shodan.io

Use case: Find services running on an IP or similar services by fingerprint.

Most people know shodan as a service to search for vulnerable devices, but looking up the resolved IP can also give insights into other things like additional hostnames, other running services and their versions, and further information like certificates. Some features require a paid account, yet the freely available data could be interesting enough to find further valuable information.

“Active” / noisy

These tools will be very noisy and leave a lot of fingerprints showing that you were looking into the domain. Since these tools are rather advanced I won’t go into much detail but rather give a short example command, also for my own notes.

dnsenum

In a nutshell: dnsenum --noreverse target.corp

Use case: Enumerate subdomains

SparrowOchon/dnsenum2 will try to brute force subdomains using a pre-defined list

$ dnsenum --noreverse kali.org
dnsenum VERSION:1.2.6

-----   kali.org   -----


Host's addresses:
__________________

kali.org.                                0        IN    A        50.116.58.136

[snip]

Brute forcing with /usr/share/dnsenum/dns.txt:
_______________________________________________

archive.kali.org.                        0        IN    CNAME    hera.kali.org.
hera.kali.org.                           0        IN    A        192.99.45.140
backup.kali.org.                         0        IN    CNAME    polyhymnia.kali.org.
polyhymnia.kali.org.                     0        IN    A        54.39.103.103
bugs.kali.org.                           0        IN    A        192.124.249.169
forums.kali.org.                         0        IN    A        192.124.249.12
http.kali.org.                           0        IN    CNAME    hebe.kali.org.
hebe.kali.org.                           0        IN    A        192.99.200.113
old.kali.org.                            0        IN    CNAME    terpsichore.kali.org.
terpsichore.kali.org.                    0        IN    A        54.39.49.227
www.kali.org.                            0        IN    A        104.18.4.159
www.kali.org.                            0        IN    A        104.18.5.159


kali.org class C netranges:
____________________________

 50.116.58.0/24
 54.39.49.0/24
 54.39.103.0/24
 104.18.4.0/24
 104.18.5.0/24
 192.99.45.0/24
 192.99.200.0/24
 192.124.249.0/24


kali.org ip blocks:
____________________

 50.116.58.136/32
 54.39.49.227/32
 54.39.103.103/32
 104.18.4.159/32
 104.18.5.159/32
 192.99.45.140/32
 192.99.200.113/32
 192.124.249.12/32
 192.124.249.169/32

done.
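
The "class C netranges" section is essentially every discovered address cut down to its /24 network. If you have a list of IPs from another tool, the same summary can be sketched like this (class_c_ranges is a made-up name, IPv4 only):

```shell
# class_c_ranges: read IPv4 addresses on stdin (one per line) and print the
# unique /24 networks they belong to, sorted numerically by octet.
class_c_ranges() {
    awk -F. 'NF == 4 { print $1 "." $2 "." $3 ".0/24" }' \
        | sort -t. -k1,1n -k2,2n -k3,3n -u
}

# Usage, assuming a file ips.txt with one address per line:
#   class_c_ranges < ips.txt
```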

dnsrecon

In a nutshell: dnsrecon -d target.corp -D /usr/share/dnsrecon/subdomains-top1mil-5000.txt -t brt

Use case: Enumerate subdomains

darkoperator/dnsrecon will try to brute force subdomains using a pre-defined list

Example:

$ dnsrecon -d target.corp -D /usr/share/dnsrecon/subdomains-top1mil-5000.txt -t brt
[*] Using the dictionary file: /usr/share/dnsrecon/subdomains-top1mil-5000.txt (provided by user)
[*] brt: Performing host and subdomain brute force against target.corp...
[+]      A blog.target.corp 12.34.56.78
[+]      AAAA blog.target.corp 2a02:1234:5678:12:34:56:78:C
[+]      A email.target.corp 23.34.45.67
[+]      A test2.target.corp 100.100.100.100
[+] 4 Records Found

dirb

In a nutshell: dirb https://www.target.corp /usr/share/dirb/wordlists/small.txt

Use case: Bruteforce for unknown, hidden or forgotten folders on a webserver

dirb is a simple bruteforcer that scans for directories on a webserver using a wordlist. This could help uncover “hidden” folders or (sensitive) content that someone forgot to remove.

Example:

$ dirb https://www.target.corp /usr/share/dirb/wordlists/small.txt

-----------------
DIRB v2.22
By The Dark Raver
-----------------

START_TIME: <DATE>
URL_BASE: https://www.target.corp/
WORDLIST_FILES: /usr/share/dirb/wordlists/small.txt

-----------------

GENERATED WORDS: 959

---- Scanning URL: https://www.target.corp/ ----
==> DIRECTORY: https://www.target.corp/intern/
==> DIRECTORY: https://www.target.corp/css/
==> DIRECTORY: https://www.target.corp/img/
==> DIRECTORY: https://www.target.corp/private/
==> DIRECTORY: https://www.target.corp/support/

---- Entering directory: https://www.target.corp/intern/ ----

---- Entering directory: https://www.target.corp/css/ ----

---- Entering directory: https://www.target.corp/img/ ----

---- Entering directory: https://www.target.corp/private/ ----

---- Entering directory: https://www.target.corp/support/ ----

-----------------
END_TIME: <DATE>
DOWNLOADED: 5754 - FOUND: 5

Note that there is also dirbuster, a similar tool with a GUI.
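
Conceptually, this kind of brute forcing is nothing more than requesting one URL per wordlist entry and looking at the status code. The following is a heavily simplified sketch of that idea, not a description of how dirb is actually implemented: the function names are made up, only directories are tried, and everything that is not a 404 or a failed connection counts as "found".

```shell
# dir_candidates BASE_URL WORDLIST: print one candidate directory URL per
# wordlist line, skipping blank lines and lines starting with "#".
dir_candidates() {
    base="${1%/}"    # strip a trailing slash so we do not emit "//"
    grep -Ev '^[[:space:]]*(#|$)' "$2" | while IFS= read -r word; do
        printf '%s/%s/\n' "$base" "$word"
    done
}

# probe: read URLs on stdin, request each one and print "CODE URL" for every
# response that is neither a 404 nor a failed connection (curl reports 000).
probe() {
    while IFS= read -r url; do
        code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")"
        if [ "$code" != "404" ] && [ "$code" != "000" ]; then
            printf '%s %s\n' "$code" "$url"
        fi
    done
    return 0
}

# Usage (this is as noisy as dirb itself, so it is left commented out):
#   dir_candidates https://www.target.corp /usr/share/dirb/wordlists/small.txt | probe
```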

spiderfoot

In a nutshell: spiderfoot -l 127.0.0.1:8080 and point your browser to http://127.0.0.1:8080

Use case: Intensive recon on a domain, IP, bitcoin address, email, phone number or username

smicallef/spiderfoot describes itself as “automates OSINT for threat intelligence and mapping your attack surface.” It has a web interface and a CLI that allow you to specify a target to scan. It combines several sources and methods to get a very detailed picture, ranging from cryptocurrency, enumeration and threat intelligence to lookups, document analysis etc.

Due to the sheer number of possibilities and features I will skip any further introduction and just point to their project website.

nmap

⚠️ Before you run nmap, make sure you understand what you are doing! This is NOT a tool you should play around with. ⚠️

Use case: Knock on every port of the target, fingerprint the services and scan it.

nmap is THE port scanner par excellence! But it is also very noisy and aggressive by default, so you will probably trigger a lot of detection systems. On the other hand, a public-facing webserver is literally bombarded with port scans daily. So in the end it is up to you to decide how much you want to scan the host.

Since using and explaining nmap literally fills books and a long manpage, I’d like to emphasize that you should read the manual first: Chapter 15. Nmap Reference Guide | Nmap Network Scanning. Not only to understand every possible switch and technique, but also to properly interpret the results.

For the impatient, a few examples:
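
What follows is a cautious sketch rather than a recommendation. The wrapper names is_own_network and careful_scan are made up, and the wrapper simply refuses any target that is not localhost or a private (RFC 1918) IPv4 address. Keep in mind that a private address is no proof that the device is yours.

```shell
# is_own_network: succeed only for "localhost" and for loopback or RFC 1918
# private IPv4 addresses. Hostnames other than localhost are rejected.
is_own_network() {
    case "$1" in
        localhost) return 0 ;;
        *[!0-9.]*) return 1 ;;   # anything that is not digits and dots
        127.*|10.*|192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *) return 1 ;;
    esac
}

# careful_scan: slow (-T2) TCP connect scan (-sT) of the 100 most common
# ports, refused for any target outside the ranges accepted above.
careful_scan() {
    if ! is_own_network "$1"; then
        echo "refusing to scan $1: not a local or private address" >&2
        return 1
    fi
    nmap -sT -T2 --top-ports 100 "$1"
}

# Other common invocations (read the reference guide before using them):
#   nmap -sn 192.168.1.0/24           # host discovery only, no port scan
#   nmap -sV -p 22,80,443 192.168.1.5 # service/version detection on 3 ports
```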

However, I strongly emphasize that you first learn how to use nmap before you launch it against anything other than your own laptop or some smart device you own in your home network.

Wrap up!

Even though I didn’t go into the OSINT part itself, I think that just doing recon gives enough information to start working and digging further. The intelligence part would require knowing the goal better and is not the topic of this post. Using the right tools to map a domain’s footprint can significantly broaden your possibilities to analyze it by giving you a more complete picture of what is lying out there to be looked at.

Good luck! And stay safe out there.