Internet Mapping Project: database information

Methods

For a given network, we generate a possible host name and send probe packets towards it, first with a TTL of 1, then 2, and so on until we reach the host, get an ICMP error of some sort, detect a loop, or reach a hop that doesn't respond to several pings.

If we actually reached a host, it is recorded in the database as a target to be used in future scans.
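
The probing loop described above can be sketched roughly as follows. This is an illustration only, not the project's scanner: the send_probe callback, the retry count, the TTL limit, and the outcome strings are all assumptions made for the example.

```python
def trace(send_probe, max_ttl=30, retries=3):
    """Walk TTLs upward until one of the stopping conditions is met.

    send_probe(ttl) is a hypothetical callback returning (hop, kind):
    hop is the responder's address, or None on timeout; kind is
    "reply" (target reached), "ttl_exceeded", or "icmp_error".
    """
    path = []
    for ttl in range(1, max_ttl + 1):
        hop, kind = None, None
        for _ in range(retries):          # several pings per hop
            hop, kind = send_probe(ttl)
            if hop is not None:
                break
        if hop is None:                   # hop never answered
            return path, "no_response"
        if hop in path:                   # same router seen twice
            return path + [hop], "loop"
        path.append(hop)
        if kind == "reply":               # reached the host
            return path, "reached"
        if kind == "icmp_error":          # some ICMP error came back
            return path, "icmp_error"
    return path, "max_ttl"
```

Keeping the packet I/O behind a callback lets the stopping logic be exercised with canned replies, without raw sockets.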

Path database format

Each network target has a single text line in the database. Here is a sample line picked at random from a recent database:
200.106.184.0/24	Probe=20060103: Target=20060103:200.106.184.1 Path=20060103,somerset,ping:65.198.68.33,157.130.95.173,152.63.18.166,152.63.19.33,152.63.21.125,204.255.168.62,67.17.67.114,64.214.174.146,200.47.216.153,200.47.216.154,200.106.184.1;R96,342,280,304,508,470,594,676,10247,220,136;S769,812,865,917,975,1034,1098,1171,2200,2216,2232;T255,254,252,251,251,244,243,243,244,243,242;I18433,0,0,0,0,0,0,0,32943,24044,6961
Here is the line broken up for readability, with comments:
200.106.184.0/24	The target CIDR block
Probe=20060103:		The date of the test. (Actually no longer needed)
Target=20060103:200.106.184.1
			The actual endpoint IP address.  May be missing
Path=20060103,somerset,ping:
			Date of the scan (ought to be in Ken Thompson seconds)
			Source of the scan
			protocol used
65.198.68.33,157.130.95.173,152.63.18.166,152.63.19.33,152.63.21.125,204.255.168.62,67.17.67.114,64.214.174.146,200.47.216.153,200.47.216.154,200.106.184.1;
			The IP path, in IPv4 or IPv6 addresses.  May have the word
			"HOLE" or "STEALTH" for hops that didn't respond.
			The last hop might be ! for incomplete paths: see
			below.
R96,342,280,304,508,470,594,676,10247,220,136;
			Round-trip times for each hop, in milliseconds.  This
			list is not necessarily monotonically increasing.
S769,812,865,917,975,1034,1098,1171,2200,2216,2232;
			Time stamps for the round trip times.  Not sure how these
			are useful.
T255,254,252,251,251,244,243,243,244,243,242;
			TTLs of the return packets.
I18433,0,0,0,0,0,0,0,32943,24044,6961
			IP ID fields of the returned packets. 
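
As a rough illustration of the layout above, the following sketch splits a Path value into its header, hop list, and per-hop lists. The function name and the dictionary keys are invented for this example, and the code assumes every R, S, T, and I entry is an integer, as in the sample.

```python
# The Path value from the sample line, verbatim.
SAMPLE_PATH = (
    "20060103,somerset,ping:"
    "65.198.68.33,157.130.95.173,152.63.18.166,152.63.19.33,152.63.21.125,"
    "204.255.168.62,67.17.67.114,64.214.174.146,200.47.216.153,"
    "200.47.216.154,200.106.184.1;"
    "R96,342,280,304,508,470,594,676,10247,220,136;"
    "S769,812,865,917,975,1034,1098,1171,2200,2216,2232;"
    "T255,254,252,251,251,244,243,243,244,243,242;"
    "I18433,0,0,0,0,0,0,0,32943,24044,6961"
)

# Hypothetical key names for the lettered lists.
LIST_NAMES = {"R": "rtt_ms", "S": "timestamps", "T": "ttls", "I": "ip_ids"}


def parse_path_value(value):
    """Split a Path value into header fields, hops, and per-hop lists."""
    # Only the first colon ends the header; IPv6 hops may contain more.
    header, _, body = value.partition(":")
    date, source, protocol = header.split(",")
    sections = body.split(";")
    parsed = {
        "date": date,
        "source": source,
        "protocol": protocol,
        "hops": sections[0].split(","),
    }
    for section in sections[1:]:
        key = LIST_NAMES.get(section[:1])
        if key:
            parsed[key] = [int(item) for item in section[1:].split(",")]
    return parsed


parsed = parse_path_value(SAMPLE_PATH)
for hop, rtt in zip(parsed["hops"], parsed["rtt_ms"]):
    print(hop, rtt)
```

Hops are kept as strings so that "HOLE", "STEALTH", and a trailing "!" code pass through untouched.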

Fields

The first field is the network target in the obvious CIDR form. All four octets are required, to simplify processing by various Unix filters. The remaining fields are blank-separated, and have the format
        label=value    (or)
        label=date:value
The date is yyyymmdd, not Y10K-ready. The labels may be:
   Path       a comma-separated sequence of IP numbers, possibly followed
              by a completion code and a list of round-trip times in milliseconds
   Probe      when this path was last checked
   Target     a host on the destination network, if found
   Whiner     date and email address of someone who doesn't want to be scanned
   Asnpath    not used
   Name       name of the network.  Not implemented yet.
   Complete   path scan completion code.  deprecated.
   Pathdate   path date. deprecated.
There may be multiple paths for different dates in early versions of the Internet mapping data. If other fields are duplicated, only the newest is kept.
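
The field rules above might be handled along these lines. This is a sketch under stated assumptions: the function names and result layout are invented, a value is treated as dated only when the text before its first colon starts with an eight-digit date, and any comma-separated items after that date (as in the Path field) are kept as "qualifiers".

```python
# Abbreviated from the sample line: only the first two hops and RTTs are kept.
SAMPLE = ("200.106.184.0/24\tProbe=20060103: Target=20060103:200.106.184.1 "
          "Path=20060103,somerset,ping:65.198.68.33,157.130.95.173;R96,342")


def parse_field(token):
    """Parse one 'label=value' or 'label=date:value' token."""
    label, _, rest = token.partition("=")
    prefix, colon, value = rest.partition(":")
    date, *qualifiers = prefix.split(",")
    if colon and len(date) == 8 and date.isdigit():
        return label, {"date": date, "qualifiers": qualifiers, "value": value}
    # No recognizable date: the whole right-hand side is the value.
    return label, {"date": None, "qualifiers": [], "value": rest}


def parse_line(line):
    """Parse one database line, keeping every Path and the newest of the rest."""
    network, *tokens = line.split()
    fields, paths = {}, []
    for token in tokens:
        label, entry = parse_field(token)
        if label == "Path":
            paths.append(entry)           # multiple paths are allowed
        elif (label not in fields
              or (entry["date"] or "") >= (fields[label]["date"] or "")):
            fields[label] = entry         # yyyymmdd strings sort by date
    return {"network": network, "fields": fields, "paths": paths}


record = parse_line(SAMPLE)
print(record["network"], record["fields"]["Target"]["value"])
```

Because dates are fixed-width yyyymmdd strings, comparing them as text is enough to pick the newest duplicate.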

Scan termination codes

Each path may have a termination code if the target is not reached. The code is a "!" followed by a single character, as listed in this code excerpt:
                        case Complete:          code = "";      break;
                        case Loop:              code = "!L";    break;
                        case Filtered:          code = "!F";    break;
                        case HostUnreachable:   code = "!H";    break;
                        case NetUnreachable:    code = "!N";    break;
                        case OddUnreachable:    code = "!O";    break;
                        case Terminated:        code = "!T";    break;
                        case Incomplete:        code = "!?";    break;
Early databases have "?" instead of "!?".

These may be followed by a semicolon and a comma-separated list of round-trip times in milliseconds. Note: these are not necessarily monotonic: times include routing variations and dead-packet processing times, which appear to be slow-pathed in most routers.
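
A reader of the database could map these codes back to their names with something like the sketch below. It assumes the code appears as the final comma-separated entry of the hop list, which the format description suggests but does not state outright; the function name and the "Unknown" fallback are invented for the example.

```python
# Codes from the listing above, mapped back to their case names.
TERMINATION = {
    "!L": "Loop",
    "!F": "Filtered",
    "!H": "HostUnreachable",
    "!N": "NetUnreachable",
    "!O": "OddUnreachable",
    "!T": "Terminated",
    "!?": "Incomplete",
}


def split_termination(hops):
    """Return (hops without any trailing code, termination name)."""
    if hops and hops[-1] == "?":          # early databases wrote "?" for "!?"
        return hops[:-1], "Incomplete"
    if hops and hops[-1].startswith("!"):
        return hops[:-1], TERMINATION.get(hops[-1], "Unknown")
    return hops, "Complete"               # no code: the target was reached


print(split_termination(["65.198.68.33", "157.130.95.173", "!H"]))
```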

Bogus IP addresses

Some hosts return a bogus address for a traceroute. Very early databases usually show these as 255.255.255.255 or 0.0.0.0; recent ones show "HOLE". In the database, these hosts are now recorded as HOLE, but used to be 255.255.255.255, 255.255.x.y, or anything above 224.0.0.0. These nodes do not stand for the same router, and it would be a tortured graph to try to link them together. Our layout software generates a separate alias for each instance of these: 255.0.0.1, 255.0.0.2, etc.
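
The aliasing idea can be sketched as follows. This is not the layout software itself: the function names are invented, and treating 224.0.0.0 itself (rather than only addresses above it) as bogus is an assumption.

```python
import ipaddress
from itertools import count

ALIAS_BASE = int(ipaddress.IPv4Address("255.0.0.0"))
BOGUS_START = ipaddress.IPv4Address("224.0.0.0")
ZERO = ipaddress.IPv4Address("0.0.0.0")


def is_bogus(hop):
    """True for HOLE, 0.0.0.0, and IPv4 addresses at or above 224.0.0.0."""
    if hop == "HOLE":
        return True
    try:
        addr = ipaddress.IPv4Address(hop)
    except ValueError:                    # IPv6, STEALTH, termination codes
        return False
    return addr == ZERO or addr >= BOGUS_START


def alias_bogus_hops(paths):
    """Give every bogus hop its own alias: 255.0.0.1, 255.0.0.2, ..."""
    serial = count(1)
    return [
        [str(ipaddress.IPv4Address(ALIAS_BASE + next(serial))) if is_bogus(hop) else hop
         for hop in path]
        for path in paths
    ]


print(alias_bogus_hops([["65.198.68.33", "HOLE", "200.106.184.1"]]))
```

A single counter shared across all paths keeps two unrelated HOLE entries from ever collapsing into one node.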

Label database format

Each line of the label database has three fields: an IP address, a label, and a DNS server address:
64.213.54.138	Munis.s8-1-0-10-0.ar1.BOS1.gblx.net	204.178.16.6
134.130.9.233	pfo-stone-1.noc.RWTH-Aachen.DE	65.198.68.67
195.119.204.10	(ns.global-ip.net)	204.178.16.49
200.10.224.118	no-host.nap.telefonicamundo.cl	204.178.16.6
66.250.10.30	g0-1.na01.b015466-1.den01.atlas.cogentco.com	65.198.68.66
212.20.158.129	wtnet.demarc.cogentco.com	204.178.16.6
66.80.89.1	atm019.edge1.iad.megapath.net	204.178.16.49
217.47.106.243	(ns0.bt.net)	65.198.68.66
68.86.209.30	te-9-1-ur01.carlisle.pa.panjde.comcast.net	204.178.16.6
64.81.92.1	gw081-092-001-lax1.dsl-isp.net	65.198.68.67

The label may be the name, the site that said the name doesn't exist (if in parentheses), or simply the IP address if no response was obtained. The label represents the first PTR record returned.
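
A minimal sketch of reading one label line and telling the three label cases apart is shown below. The function name and the "kind" values are invented for illustration, and fields are assumed to be separated by whitespace, as they are in the samples.

```python
def parse_label_line(line):
    """Split a label database line and classify its label."""
    ip, label, dns_server = line.split()
    if label.startswith("(") and label.endswith(")"):
        # Parenthesized: the site that said the name doesn't exist.
        kind, label = "nonexistent", label[1:-1]
    elif label == ip:
        # The bare address: no response was obtained.
        kind = "address"
    else:
        kind = "name"
    return {"ip": ip, "label": label, "kind": kind, "dns_server": dns_server}


print(parse_label_line("195.119.204.10\t(ns.global-ip.net)\t204.178.16.49"))
print(parse_label_line("64.81.92.1\tgw081-092-001-lax1.dsl-isp.net\t65.198.68.67"))
```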