Is it possible to model the network layer of the Internet for my graph theory class? I can get traceroutes and BGP feeds to fill a data set...
Yes, but there will inevitably be well-documented failings in modeling the Internet using traceroutes and BGP feeds. For example, BGP only shows preferred paths, not all paths between potential sources and destinations, so it will not identify paths between ASes that are not used.
At the same time, if you just want data to try out your tools, you can assemble an incomplete set of nodes and adjacencies into a data set to work on. Later on, I’ll suggest that this data set be augmented with empirical data from conversations.
Modeling the Internet seems to be a popular pastime. I have seen at least a dozen conferences where the Internet was modeled based on these easily accessible measuring points.
Here are some guiding suggestions I provide to modelers when asked.
First, the Internet can certainly be modeled as nodes and edges, and nodes can be at the macro layer (ASes connected by peering sessions) or at the micro layer (routers connected by links). But if one stops there and goes off applying graph theory, doing random walks, failure analyses, etc., then one ignores a simple but critical aspect of the Internet ecosystem: its dynamics. Here are some of the dynamics I check for when someone describes their model of the Internet.
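As a rough illustration of building such an incomplete data set, here is a minimal Python sketch that turns observed AS paths into an adjacency map. The function name, the sample paths, and the AS numbers (taken from the range reserved for documentation) are all hypothetical; real input would come from your BGP feeds or from traceroutes mapped to ASes.

```python
from collections import defaultdict


def adjacencies_from_as_paths(as_paths):
    """Build an undirected AS adjacency map from observed AS paths.

    Only adjacencies that appear on an observed (preferred) path are
    recorded, so the result is incomplete by construction.
    """
    graph = defaultdict(set)
    for path in as_paths:
        # Collapse consecutive repeats of the same AS (AS-path prepending).
        hops = [asn for i, asn in enumerate(path) if i == 0 or asn != path[i - 1]]
        # Each consecutive pair of ASes on the path is one adjacency.
        for a, b in zip(hops, hops[1:]):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical sample paths, using documentation-reserved AS numbers.
sample_paths = [
    [64496, 64497, 64498],
    [64496, 64497, 64497, 64499],
    [64500, 64498],
]
graph = adjacencies_from_as_paths(sample_paths)
```

The resulting map can be handed to whatever graph tooling you are testing. Keep in mind that an adjacency missing from the map only means it was not observed, not that it does not exist.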
Top 10 Internet model checklist
1) Not all nodes are created equal. Specifically, eyeballs love content. While there may be eyeball-to-eyeball traffic such as peer-to-peer traffic, there is typically not much traffic between content providers. So consider differentiating access-heavy and content-heavy endpoints.
2) Traffic between content and eyeballs tends to be highly asymmetric today. The ratio may be as high as 30:1 in the case of long-running video. Given that this may be more than 50% of Internet traffic, it is an important dynamic to model.
3) Not all edges are created equal. There are peering and transit interconnect relationships, and they need to be modeled differently since these interconnects provide a different reach (customer routes vs. the entire Internet) and different economics (free vs. metered transit measured in $/Mbps).
4) Nodes can be categorized into groups with the same power positions, and corresponding similarity in motivations and therefore similarity in behaviors. For example, as a group, Tier 1 ISPs peer in a full mesh in every interconnect region within their home market, but they tend to peer only with each other. They will deny peering to others. Tier 2 ISPs tend to peer in a sparse mesh, depending on geography and peering inclination. They are therefore more inclined to peer than the Tier 1s, but note that there is a spectrum of peering inclinations here. These are powerful abstractions based on real-world dynamics.
5) Access networks often have sole access to eyeballs on their network. Almost all of these customers are single-homed, so the only way to reach them is via the access network. This dynamic should show up in any realistic model of the Internet.
6) The motivation to peer is largely based on self-interest, driven by issues such as
Many will simply accept peering without consideration of any of these issues, but a good model will have some of these “selective” criteria modeled.
You can look at my study of 28 peering policies to see samples of the peering requirements that may be scrutinized.
7) Transit prices are asymptotically approaching $0/Mbps, and they also exhibit a wide variance across each market and set of providers. Make sure any economic model takes these two things into account.
8) Most Internet attachments have a "no peering" policy. Almost all web portals, enterprises, content providers, etc. will likely never peer: they simply don’t have the expertise or inclination. It may not be strategic, or they may simply be unaware of the value of peering. These attachments are only customers of Internet bandwidth.
9) Several observed behaviors are worth noting:
Operational gear rarely gets disconnected. Internet attachment arrangements may stay long past when they make sense.
Peering policies and behaviors change only rarely.
Peering policies, when they do change, tend to become more restrictive.
10) The market is indeed imperfect and irrational. The market participants are not always rational and often don't do the math. There is great information asymmetry in the market, and few do continuous trajectory analyses.
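To make a few of the checklist items concrete (node roles, edge types, traffic asymmetry, metered transit), here is a minimal Python sketch of the kind of attributes a model might carry. Every class, field, and function name is hypothetical, and so are the numbers in the usage lines; the 30:1 default simply reuses the upper bound mentioned in item 2.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ACCESS = "access"    # eyeball-heavy network
    CONTENT = "content"  # content-heavy network
    TRANSIT = "transit"  # sells Internet transit


class LinkKind(Enum):
    PEERING = "peering"  # reach: each other's customer routes, no per-Mbps fee
    TRANSIT = "transit"  # reach: the entire Internet, metered in $/Mbps


@dataclass
class Network:
    name: str
    role: Role
    peering_policy: str = "no peering"  # most attachments never peer (item 8)


@dataclass
class Interconnect:
    customer_or_peer: str
    provider_or_peer: str
    kind: LinkKind
    price_per_mbps: float = 0.0  # only meaningful for transit links


def reverse_traffic(content_to_eyeball_mbps, asymmetry_ratio=30.0):
    """Traffic in the eyeball-to-content direction for a given asymmetry ratio."""
    return content_to_eyeball_mbps / asymmetry_ratio


def metered_cost(link, billed_mbps):
    """Cost of carrying billed_mbps over a link; peering is modeled as free."""
    if link.kind is LinkKind.PEERING:
        return 0.0
    return link.price_per_mbps * billed_mbps


# Hypothetical usage: a content network reaching an access network.
video = Network("video-site", Role.CONTENT)
isp = Network("cable-isp", Role.ACCESS, peering_policy="selective")
transit_link = Interconnect("video-site", "transit-provider", LinkKind.TRANSIT, price_per_mbps=0.5)
peering_link = Interconnect("video-site", "cable-isp", LinkKind.PEERING)
upstream_mbps = reverse_traffic(3000.0)
```

This is only a starting point. The behavioral items in the checklist (selective peering criteria, inertia, irrational participants) would need rules layered on top of these attributes.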
My suggestion is