The basic principle of passive measurement is shown in figure 2. There are three main situations that define what the two entities represent. In the first situation, the first entity represents the entire Internet and the second a single machine. In the next situation, the first entity is the outside world as seen by an organisation, and the second entity is that organisation's internal network. A good example of this is a university's Internet connection and its internal LAN. The final situation is a backbone link, where the two entities are both simply sections of the Internet.
A monitor `snoops' on all the traffic flowing between these two entities. What this monitor does with the traffic depends on the aim of the system, and on which of the situations listed above applies. Passive measurement systems fall into two major categories. The first deals with the captured data in real-time, for example by looking at each packet and counting the number of bytes passing the monitor every second or every minute. These statistics are very small when compared to the amount of data that could pass the monitor. They can be used, for example, to see whether the available bandwidth is being fully utilised, whether saturation is a problem, or whether there are peak times where more bandwidth could be required. The second type of passive measurement creates files containing copies of all, or a proportion, of the traffic seen on the link over a certain time period. These trace files can then be post processed. This allows advanced computation to be carried out that would be impossible in real-time, and also preserves the data for further analysis at a later point.
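As a minimal sketch of the real-time category, the following Python fragment sums packet lengths into fixed-width time intervals. The function name and the input format (pairs of arrival time in seconds and packet length in bytes) are assumptions made for illustration; they are not taken from any particular monitoring system.

```python
from collections import defaultdict


def bytes_per_interval(packets, interval=1.0):
    """Sum packet lengths into fixed-width time intervals.

    packets  -- iterable of (timestamp_seconds, length_bytes) pairs
                (a hypothetical input format chosen for this sketch)
    interval -- width of each interval in seconds
    Returns a dict mapping interval index to total bytes seen.
    """
    totals = defaultdict(int)
    for timestamp, length in packets:
        # Interval 0 covers [0, interval), interval 1 covers
        # [interval, 2 * interval), and so on.
        index = int(timestamp // interval)
        totals[index] += length
    return dict(totals)


sample = [(0.1, 100), (0.9, 50), (1.2, 1500), (3.0, 40)]
print(bytes_per_interval(sample))
```

Only one counter per interval is kept, however many packets pass the monitor, which is why the resulting statistics stay small.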
Trace based systems have one significant requirement. As the data will be post processed, additional information must be saved with each packet to indicate the time at which it arrived. The accuracy of this timestamping process directly affects the accuracy of the results that can be drawn from a trace file. The issues related to this simple concept of timestamping form some of the hardest problems in passive measurement. A discussion of the problems that I have encountered follows in the next sections.
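To illustrate what saving a timestamp with each packet involves, the sketch below stores every packet behind a small fixed-size record header. The layout (seconds, microseconds, captured length, original length) is loosely modelled on the per-packet header of the classic pcap file format. The little-endian byte order, the function names and the integer microsecond timestamps are assumptions of this example, not a description of the trace format used in this project.

```python
import struct

# seconds, microseconds, captured length, original length (all unsigned 32-bit)
RECORD_HEADER = struct.Struct("<IIII")


def pack_record(timestamp_us, packet, snaplen=None):
    """Prefix a packet with its arrival time and lengths.

    timestamp_us -- arrival time in integer microseconds (assumed input)
    snaplen      -- if given, keep only the first snaplen bytes of the packet
    """
    seconds, microseconds = divmod(timestamp_us, 1_000_000)
    captured = packet if snaplen is None else packet[:snaplen]
    header = RECORD_HEADER.pack(seconds, microseconds, len(captured), len(packet))
    return header + captured


def unpack_record(buffer, offset=0):
    """Return (timestamp_us, original_length, captured_bytes) for one record."""
    seconds, microseconds, captured_len, original_len = RECORD_HEADER.unpack_from(
        buffer, offset
    )
    start = offset + RECORD_HEADER.size
    data = buffer[start:start + captured_len]
    return seconds * 1_000_000 + microseconds, original_len, data


record = pack_record(1_000_000_123_456, b"abcdef", snaplen=4)
print(unpack_record(record))
```

Whatever resolution the stored timestamp has, post processing can only be as accurate as the clock that produced it.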
Real-time analysis suits high speed networks, where the volume of data on the link is too large to record copies of it to disk or even to memory. This is likely to occur in the third configuration discussed, where the monitor is on a high capacity network backbone. Real-time analysis also suits projects that want to monitor links for long periods of time, for instance weeks or months. The trade off for long term or high speed measurement is detail. The detail provided by these systems is often of limited use for in-depth traffic analysis such as that required by this project.
Trace based systems can cover a range of network speeds. The faster the network, the smaller the proportion of data that can be saved. One very common subset of the data that is saved is the IP and transport layer headers. The IP header provides information on the source of the datagram, the destination of the datagram, the length of the datagram and which transport protocol is carried in the payload. The transport layer header can give an indication of what type of traffic was contained within the packet, but the limitations of this have to be understood, and care must be taken when claiming that a packet contains a certain type of data. These problems will be discussed later, in section 5.1, with a focus on identifying VoIP data.
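The IP header fields mentioned above can be extracted from the first 20 bytes of an IPv4 datagram. The following is a minimal sketch, assuming the buffer starts at the IPv4 header with no link layer framing in front of it; the function name and the returned dictionary keys are chosen for this example only.

```python
import socket
import struct


def parse_ipv4_header(data):
    """Extract source, destination, length and protocol from an IPv4 header."""
    if len(data) < 20:
        raise ValueError("need at least 20 bytes for an IPv4 header")
    # Fixed part of the IPv4 header, in network byte order.
    (version_ihl, _tos, total_length, _ident, _flags_frag,
     _ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", data[:20])
    if version_ihl >> 4 != 4:
        raise ValueError("not an IPv4 datagram")
    return {
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
        "total_length": total_length,               # header plus payload, in bytes
        "header_length": (version_ihl & 0x0F) * 4,  # IHL is counted in 32-bit words
        "protocol": protocol,                       # e.g. 6 = TCP, 17 = UDP
    }


example = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 17, 0,
                      bytes([10, 0, 0, 1]), bytes([192, 168, 1, 2]))
print(parse_ipv4_header(example))
```

The protocol field only names the transport protocol. Any conclusion about the application carried inside has to be drawn from the transport header, with the caveats noted above.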
Header traces are commonly used for both of the first two passive measurement configurations discussed, and wherever else network speeds allow traces to be taken. Full capture of all data on a link is normally restricted to the first situation, because the data rates created by a single computer are low when compared to backbones and gateways. Full capture allows complete analysis of the actual data passing on the network, which could be used for debugging purposes, and also allows later `playback' of the entire data stream.
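The difference between a header trace and full capture can be pictured as a truncation step applied to each packet before it is stored. The sketch below keeps only the IPv4 header and the TCP or UDP header of a packet. It assumes the buffer starts at the IPv4 header, and the function name is hypothetical; real capture tools often truncate to a fixed snapshot length instead.

```python
def header_only(packet):
    """Return just the IPv4 header and TCP/UDP header of a raw IPv4 packet."""
    if len(packet) < 20:
        return packet  # too short to hold a full IPv4 header; keep as is
    ip_header_len = (packet[0] & 0x0F) * 4   # IHL, counted in 32-bit words
    protocol = packet[9]
    if protocol == 6 and len(packet) >= ip_header_len + 13:
        # TCP: the data offset is the upper 4 bits of byte 12 of the TCP header.
        transport_len = (packet[ip_header_len + 12] >> 4) * 4
    elif protocol == 17:
        transport_len = 8                    # the UDP header is always 8 bytes
    else:
        transport_len = 0                    # other protocols: keep the IP header only
    return packet[:ip_header_len + transport_len]


# A 20-byte IPv4 header (protocol 17 = UDP), an 8-byte UDP header, 100 payload bytes.
udp_packet = bytes([0x45]) + bytes(8) + bytes([17]) + bytes(10) + bytes(8) + bytes(100)
print(len(udp_packet), len(header_only(udp_packet)))
```

Full capture corresponds to skipping this step and storing the packet unchanged.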
One other common subset of data captured is the physical layer headers. This is used primarily in ATM networks, but this type of capture has limited use for IP level analysis and has not featured as part of this project. As discussed in section 2.3, this project has made use of IP header traces exclusively. For this reason the rest of this section will refer only to trace based systems, although much of the discussion applies to both trace based and real-time analysis.