1.10.2 The Future of IPv4

As mentioned earlier, allocation of blocks of IP addresses is the responsibility of the Internet Assigned Numbers Authority. IANA long ago delegated the job of allocating network prefixes to individual sites; they limited themselves to handing out /8 blocks (class A blocks) to the five regional registries, which are

• ARIN – North America
• RIPE – Europe, the Middle East and parts of Asia
• APNIC – East Asia and the Pacific
• AfriNIC – most of Africa
• LACNIC – Central and South America

As of the end of January 2011, the IANA finally ran out of /8 blocks. There is a table at http://www.iana.org/assignments/ipv4-address-space/ipv4-address-space.xml of all IANA assignments of /8 blocks; examination of the table shows all have now been allocated.

In September 2015, ARIN ran out of its pool of IPv4 addresses. Most of ARIN’s customers are ISPs, which can now obtain new IPv4 addresses only by buying unused address blocks from other organizations.

A few months after the IANA pool ran out in 2011, Microsoft purchased 666,624 IP addresses (2604 Class C blocks) in a Nortel bankruptcy auction for $7.5 million, or about $11.25 per address. Three years later, per-address prices fell to half that, but, by 2019, had climbed to the $20-and-up range.

It is possible that the market for IPv4 address blocks will continue to develop; alternatively, this turn of events may accelerate implementation of IPv6, which has 128-bit addresses.

An IPv4 address price in the range of $20 is unlikely to have much impact in residential Internet access, where annual connection fees are often $600. Large organizations use NAT (7.7 Network Address Translation) extensively, leading to the need for only a small number of globally visible addresses. The IPv4 address shortage does not even seem to have affected wireless networking.
It does, however, lead to inefficient routing tables, as sites that might once have had a single /17 address block – and thus a single backbone forwarding-table entry – might now be spread over more than a hundred /24 blocks and concomitant forwarding entries.

An Introduction to Computer Networks, Release 1.9.21

1.11 DNS

IP addresses are hard to remember (nearly impossible in IPv6). The domain name system, or DNS (7.8 DNS), comes to the rescue by creating a way to convert hierarchical text names to IP addresses. Thus, for example, one can type www.luc.edu instead of 147.126.1.230. Virtually all Internet software uses the same basic library calls to convert DNS names to actual addresses.

One thing DNS makes possible is changing a website’s IP address while leaving the name alone. This allows moving a site to a new provider, for example, without requiring users to learn anything new. It is also possible to have several different DNS names resolve to the same IP address, and – through some modest trickery – have the http (web) server at that IP address handle the different DNS names as completely different websites.

DNS is hierarchical and distributed. In looking up cs.luc.edu four different DNS servers may be queried: for the so-called “DNS root zone”, for edu, for luc.edu and for cs.luc.edu. Searching a hierarchy can be cumbersome, so DNS search results are normally cached locally. If a name is not found in the cache, the lookup may take a couple of seconds. The DNS hierarchy need have nothing to do with the IP-address hierarchy.

1.12 Transport

The IP layer gets packets from one node to another, but it is not well-suited to transport. First, IP routing is a “best-effort” mechanism, which means packets can and do get lost sometimes. Additionally, data that does arrive can arrive out of order. Finally, IP only supports sending to a specific host; normally, one wants to send to a given application running on that host.
Email and web traffic, or two different web sessions, should not be commingled!

The Transport layer is the layer above the IP layer that handles these sorts of issues, often by creating some sort of connection abstraction. Far and away the most popular mechanism in the Transport layer is the Transmission Control Protocol, or TCP. TCP extends IP with the following features:

• reliability: TCP numbers each packet, keeps track of which are lost, and retransmits them after a timeout. It holds early-arriving out-of-order packets for delivery at the correct time. Every arriving data packet is acknowledged by the receiver; timeout and retransmission occur when an acknowledgment packet isn’t received by the sender within a given time.
• connection-orientation: Once a TCP connection is made, an application sends data simply by writing to that connection. No further application-level addressing is needed. TCP connections are managed by the operating-system kernel, not by the application.
• stream-orientation: An application using TCP can write 1 byte at a time, or 100 kB at a time; TCP will buffer and/or divide up the data into appropriately sized packets.
• port numbers: these provide a way to specify the receiving application for the data, and also to identify the sending application.
• throughput management: TCP attempts to maximize throughput, while at the same time not contributing unnecessarily to network congestion.

TCP endpoints are of the form ⟨host,port⟩; these pairs are known as socket addresses, or sometimes as just sockets, though the latter refers more properly to the operating-system objects that receive the data sent to the socket addresses. Servers (or, more precisely, server applications) listen for connections to sockets they have opened; the client is then any endpoint that initiates a connection to a server.
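The relationship between a listening server socket and its clients can be sketched with Python's standard socket module. This is a minimal illustration, not part of the original text: the function name demo_connections is hypothetical, everything runs on the loopback address 127.0.0.1, and the server port is chosen by the kernel rather than being a well-known port such as 80.

```python
import socket
import threading


def demo_connections(num_clients=3):
    """Open several TCP connections to one listening socket on loopback.

    Returns (server_addr, client_addrs, peer_addrs), where each address is a
    (host, port) pair. Hypothetical helper, for illustration only.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.settimeout(5.0)               # avoid hanging forever if something fails
    server.bind(("127.0.0.1", 0))        # port 0: let the kernel pick a free port
    server.listen(num_clients)
    server_addr = server.getsockname()   # the server's (host, port) socket address

    accepted = []                        # (connection, peer address) pairs

    def accept_all():
        for _ in range(num_clients):
            accepted.append(server.accept())

    acceptor = threading.Thread(target=accept_all)
    acceptor.start()

    # Every client connects to the SAME server socket address.
    clients = [socket.create_connection(server_addr) for _ in range(num_clients)]
    acceptor.join()

    client_addrs = [c.getsockname() for c in clients]   # client-side (host, port)
    peer_addrs = [peer for _, peer in accepted]         # same pairs, as the server sees them

    for c in clients:
        c.close()
    for conn, _ in accepted:
        conn.close()
    server.close()
    return server_addr, client_addrs, peer_addrs


if __name__ == "__main__":
    server_addr, client_addrs, _ = demo_connections()
    print("server socket address:", server_addr)
    for addr in client_addrs:
        print("  client socket address:", addr)
```

Each client ends up with its own kernel-assigned port, so all the connections share one server socket address yet remain distinct.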
When you enter a host name in a web browser, it opens a TCP connection to the server’s port 80 (the standard web-traffic port), that is, to the server socket with socket-address ⟨server,80⟩. If you have several browser tabs open, each might connect to the same server socket, but the connections are distinguishable by virtue of using separate ports (and thus having separate socket addresses) on the client end (that is, your end).

A busy server may have thousands of connections to its port 80 (the web port) and hundreds of connections to port 25 (the email port). Web and email traffic are kept separate by virtue of the different ports used. All those clients to the same port, though, are kept separate because each comes from a unique ⟨host,port⟩ pair. A TCP connection is determined by the ⟨host,port⟩ socket address at each end; traffic on different connections does not intermingle. That is, there may be multiple independent connections to ⟨www.luc.edu,80⟩. This is somewhat analogous to certain business telephone numbers of the “operators are standing by” type, which support multiple callers at the same time to the same toll-free number. Each call to that number is answered by a different operator (corresponding to a different CPU process), and different calls do not “overhear” each other.

TCP uses the sliding-windows algorithm, 6 Abstract Sliding Windows, to keep multiple packets en route at any one time. The window size represents the number of packets simultaneously in transit (TCP actually keeps track of the window size in bytes, but packets are easier to visualize). If the window size is 10 packets, for example, then at any one time 10 packets are in transit (perhaps 5 data packets and 5 returning acknowledgments). Assuming no packets are lost, then as each acknowledgment arrives the window “slides forward” by one packet. The data packet 10 packets ahead is then sent, to maintain a total of 10 packets on the wire.
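The sliding of the window can be traced with a short simulation. The sketch below rests on simplifying assumptions that are not in the text: no packets are lost, acknowledgments arrive in order, and the window is counted in packets rather than bytes. The function name sliding_window_trace and its parameters are hypothetical.

```python
from collections import deque


def sliding_window_trace(window, first, last):
    """Return the sequence of ("send", n) and ("ack", n) events for a
    lossless, in-order sliding-windows transfer of packets first..last."""
    events = []
    in_flight = deque()          # packets sent but not yet acknowledged
    next_to_send = first

    # Fill the initial window.
    while next_to_send <= last and len(in_flight) < window:
        in_flight.append(next_to_send)
        events.append(("send", next_to_send))
        next_to_send += 1

    # Each arriving ACK slides the window forward by one packet,
    # which permits sending one more data packet (if any remain).
    while in_flight:
        acked = in_flight.popleft()
        events.append(("ack", acked))
        if next_to_send <= last:
            in_flight.append(next_to_send)
            events.append(("send", next_to_send))
            next_to_send += 1
    return events


if __name__ == "__main__":
    for kind, number in sliding_window_trace(window=3, first=1, last=5):
        print(kind, number)
```

After the initial burst, every "ack" event is followed immediately by one "send" event for as long as data remains, so the number of outstanding packets stays at the window size.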
For example, consider the moment when the ten packets 20-29 are in transit. When ACK[20] is received, the number of packets outstanding drops to 9 (packets 21-29). To keep 10 packets in flight, Data[30] is sent. When ACK[21] is received, Data[31] is sent, and so on. Sliding windows minimizes the effect of store-and-forward delays, and propagation delays, as these then only count once for the entire windowful and not once per packet. Sliding windows also provides an automatic, if partial, brake on congestion: the queue at any switch or router along the way cannot exceed the window size. In this it compares favorably with constant-rate transmission, which, if the available bandwidth falls below the transmission rate, always leads to overflowing queues and to a significant percentage of dropped packets. Of course, if the window size is too large, a sliding-windows sender may also experience dropped packets. The ideal window size, at least from a throughput perspective, is such that it takes one round-trip time to send an entire window, so that the next ACK will always be arriving just as the sender has finished transmitting the window. Determining this ideal size, however, is difficult; for one thing, the ideal size varies with network load. As a result, TCP approximates the ideal size. The most common TCP strategy – that of so-called TCP Reno – is that the window size is slowly raised until packet loss occurs, which TCP takes as a sign that it has reached the limit of available network resources. At that point the window size is reduced to half its previous value, and the slow climb resumes. The effect is a “sawtooth” graph of window size with time, which oscillates (more or less) around the “optimal” window size. For an idealized sawtooth graph, see 13.1.1 The Somewhat-Steady State; for some “real” (simulation-created) sawtooth graphs see 16.4.1 Some TCP Reno cwnd graphs. 
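The sawtooth can be reproduced with a few lines of simulation. This is an idealized sketch rather than TCP Reno itself: it assumes the window grows by one packet per round trip, that a loss occurs exactly when the window exceeds a fixed network limit, and it omits slow start and Reno's loss-recovery details. The function name reno_sawtooth and the parameter capacity are hypothetical.

```python
def reno_sawtooth(capacity, start=1, steps=40):
    """Return the window size at each round trip under idealized
    additive-increase / multiplicative-decrease.

    capacity: largest window (in packets) the network carries without loss
              (an assumed fixed value; real limits vary with load).
    """
    cwnd = start
    history = []
    for _ in range(steps):
        history.append(cwnd)
        if cwnd > capacity:
            # Loss detected: cut the window to half its previous value.
            cwnd = max(1, cwnd // 2)
        else:
            # No loss: raise the window slowly, one packet per round trip.
            cwnd += 1
    return history


if __name__ == "__main__":
    print(reno_sawtooth(capacity=10, steps=20))
    # → [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 5, 6, 7, 8, 9, 10, 11, 5, 6]
```

After the initial climb, the window oscillates between roughly half the limit and just above it, which is the sawtooth pattern described above.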
While this window-size-optimization strategy has its roots in attempting to maximize the available bandwidth, it also has the effect of greatly limiting the number of packet-loss events. As a result, TCP has come to be the Internet protocol charged with reducing (or at least managing) congestion on the Internet, and – relatedly – with ensuring fairness of bandwidth allocations to competing connections. Core Internet routers – at least in the classical case – essentially have no role in enforcing congestion or fairness restrictions at all. The Internet, in other words, places responsibility for congestion avoidance cooperatively into the hands of end users. While “cheating” is possible, this cooperative approach has worked remarkably well.

While TCP is ubiquitous, the real-time performance of TCP is not always consistent: if a packet is lost, the receiving TCP host will not turn over anything further to the receiving application until the lost packet has been retransmitted successfully; this is often called head-of-line blocking. This is a serious problem for sound and video applications, which can discretely handle modest losses but which have much more difficulty with sudden large delays. A few lost packets ideally should mean just a few brief voice dropouts (pretty common on cell phones) or flicker/snow on the video screen (or just reuse of the previous frame); both of these are better than pausing completely.

The basic alternative to TCP is known as UDP, for User Datagram Protocol. UDP, like TCP, provides port numbers to support delivery to multiple endpoints within the receiving host, in effect to a specific process on the host. As with TCP, a UDP socket consists of a ⟨host,port⟩ pair. UDP also includes, like TCP, a checksum over the data.
However, UDP omits the other TCP features: there is no connection setup, no lost-packet detection, no automatic timeout/retransmission, and the application must manage its own packetization. This simplicity should not be seen as all negative: the absence of connection setup means data transmission can get started faster, and the absence of lost-packet detection means there is no head-of-line blocking. See 11 UDP Transport.

The Real-time Transport Protocol, or RTP, sits above UDP and adds some additional support for voice and video applications.

1.12.1 Transport Communications Patterns

The two “classic” traffic patterns for Internet communication are these:

• Interactive or bursty communications such as via ssh or telnet, with long idle times between short bursts
• Bulk file transfers, such as downloading a web page

TCP handles both of these well, although its congestion-management features apply only when a large amount of data is in transit at once. Web browsing is something of a hybrid; over time, there is usually considerable burstiness, but individual pages now often exceed 1 MB.

To the above we might add request/reply operations, eg to query a database or to make DNS requests. TCP is widely used here as well, though most DNS traffic still uses UDP. There are periodic calls for a new protocol specifically addressing this pattern, though at this point the use of TCP is well established. If a sequence of request/reply operations is envisioned, a single TCP connection makes excellent sense, as the connection-setup overhead is minimal by comparison. See also 11.5 Remote Procedure Call (RPC) and 12.22.2 SCTP.

This century has seen an explosion in streaming video (20.3.2 Streaming Video), in lengths from a few minutes to a few hours. Streaming radio stations might be left playing indefinitely.
TCP generally works well here, assuming the receiver can get, say, a minute ahead, buffering the video that has been received but not yet viewed. That way, if there is a dip in throughput due to congestion, the receiver has time to recover. Buffering works a little less well for streaming radio, as the listener doesn’t want to get too far behind, though ten seconds is reasonable. Fortunately, audio bandwidth is smaller.

Another issue with streaming video is the bandwidth demand. Most streaming-video services attempt to estimate the available throughput, and then adapt to that throughput by changing the video resolution (20.3 Real-time Traffic).

Typically, video streaming operates on a start/stop basis: the sender pauses when the receiver’s playback buffer is “full”, and resumes when the playback buffer drops below a certain threshold.

If the video (or, for that matter, voice audio) is interactive, there is much less opportunity for stream buffering. If someone asks a simple question on an Internet telephone call, they generally want an answer more or less immediately; they do not expect to wait for the answer to make it through the other party’s stream buffer. 200 ms of buffering is noticeable. Here we enter the realm of genuine real-time traffic (20.3 Real-time Traffic). UDP is often used to avoid head-of-line blocking. Lower bandwidth helps; voice-grade communications traditionally need only 8 kB/sec, less if compression is used. On the other hand, there may be constraints on the variation in delivery time (known as jitter; see 20.11.3 RTP Control Protocol for a specific numeric interpretation). Interactive video, with its much higher bandwidth requirements, is more difficult; fortunately, users seem to tolerate the common pauses and freezes.

Within the Transport layer, essentially all network connections involve a client and a server.
Often this pattern is repeated at the Application layer as well: the client contacts the server and initiates a login session, or browses some web pages, or watches a movie. Sometimes, however, Application-layer exchanges fit the peer-to-peer model better, in which the two endpoints are more-or-less co-equals. Some examples include

• Internet telephony: there is no benefit in designating the party who places the call as the “client”
• Message passing in a CPU cluster, often using 11.5 Remote Procedure Call (RPC)
• The routing-communication protocols of 9 Routing-Update Algorithms. When router A reports to router B we might call A the client, but over time, as A and B report to one another repeatedly, the peer-to-peer model makes more sense.
• So-called peer-to-peer file-sharing, where individuals exchange files with other individuals (as opposed to “cloud-based” file-sharing in which the “cloud” is the server).

RFC 5694 contains additional discussion of peer-to-peer patterns.

1.12.2 Content-Distribution Networks

Sites with an extremely large volume of content to distribute often turn to a specialized communication pattern called a content-distribution network or CDN. To reduce the amount of long-distance traffic, or to reduce the round-trip time, a site replicates its content at multiple datacenters (also called Points of Presence (PoPs), nodes, access points or edge servers). When a user makes a request (eg for a web page or a video), the request is routed to the nearest (or approximately nearest) datacenter, and the content is delivered from there.

CDN Mapping

For a geographical map of the servers in the NetFlix CDN as of 2016, see [BCTCU16]. The map was created solely through end-user measurements. Most of the servers are in North and South America and Europe.

Large web pages typically contain both static content and also individualized dynamic content.
On a typical Facebook page, for example, the videos and javascript might be considered static, while the individual wall posts might be considered dynamic. The CDN may cache all or most of the static content at each of its edge servers, leaving the dynamic content to come from a centralized server. Alternatively, the dynamic content may be replicated at each CDN edge node as well, though this introduces some real-time coordination issues. If dynamic content is not replicated, the CDN may include private high-speed links between its nodes, allowing for rapid low-congestion delivery to any node. Alternatively, CDN nodes may simply communicate using the public Internet. Finally, the CDN may (or may not) be configured to support fast interactive traffic between nodes, eg teleconferencing traffic, as is outlined in 20.6.1 A CDN Alternative to IntServ. Organizations can create their own CDNs, but often turn to specialized CDN providers, who often combine their CDN services with website-hosting services. In principle, all that is needed to create a CDN is a multiplicity of datacenters, each with its own connection to the Internet; private links between datacenters are also common. In practice, many CDN providers also try to build direct connections with the ISPs that serve their customers; the Google Edge Network above does this. This can improve performance and reduce traffic costs; we will return to this in 10.6.7.1 MED values and traffic engineering. Finding the edge server that is closest to a given user is a tricky issue. There are three techniques in common use. In the first, the edge servers are all given different IP addresses, and DNS is configured to have users receive the IP address of the closest edge server, 7.8 DNS. In the second, each edge server has the same IP address, and anycast routing is used to route traffic from the user to the closest edge server, 10.6.8 BGP and Anycast. 
Finally, for HTTP applications a centralized server can look up the approximate location of the user, and then redirect the web page to the closest edge server.

1.13 Firewalls

One problem with having a program on your machine listening on an open TCP port is that someone may connect and then, using some flaw in the software on your end, do something malicious to your machine. Damage can range from the unintended downloading of personal data to compromise and takeover of your entire machine, making it a distributor of viruses and worms or a steppingstone in later break-ins of other machines.

A strategy known as buffer overflow (22.2 Stack Buffer Overflow) has been the basis for a great many total-compromise attacks. The idea is to identify a point in a server program where it fills a memory buffer with network-supplied data without careful length checking; almost any call to the C library function gets(buf) will suffice. The attacker then crafts an oversized input string which, when read by the server and stored in memory, overflows the buffer and overwrites subsequent portions of memory, typically containing the stack-frame pointers. The usual goal is to arrange things so that when the server reaches the end of the currently executing function, control is returned not to the calling function but instead to the attacker’s own payload code located within the string.

A firewall is a mechanism to block connections deemed potentially risky, eg those originating from outside the site. Generally ordinary workstations do not ever need to accept connections from the Internet; client machines instead initiate connections to (better-protected) servers. So blocking incoming connections works reasonably well; when necessary (eg for games) certain ports can be selectively unblocked.

The original firewalls were built into routers.
Incoming traffic to servers was often blocked unless it was sent to one of a modest number of “open” ports; for non-servers, typically all inbound connections were blocked. This allowed internal machines to operate reasonably safely, though being unable to accept incoming connections is sometimes inconvenient.

Nowadays per-host firewalls – in addition to router-based firewalls – are common: you can configure your workstation not to accept inbound connections to most (or all) ports regardless of whether software on the workstation requests such a connection. Outbound connections can, in many cases, also be prevented.

The typical home router implements something called network-address translation (7.7 Network Address Translation), which, in addition to conserving IPv4 addresses, also provides firewall protection.

1.14 Some Useful Utilities

There exists a great variety of useful programs for probing and diagnosing networks. Here we list a few of the simpler, more common and available ones; some of these are addressed in more detail in subsequent chapters. Some of these, like ping, are generally present by default; others will have to be installed from somewhere.

ping

Ping is useful to determine if another machine is accessible, eg

    ping www.cs.luc.edu
    ping 147.126.1.230

See 7.11 Internet Control Message Protocol for how it works. Sometimes ping fails because the necessary packets are blocked by a firewall.

ifconfig, ipconfig, ip

To find your own IP address you can use ipconfig on Windows, ifconfig on Linux and Macintosh systems, or the newer ip addr list on Linux. The output generally lists all active interfaces but can be restricted to selected interfaces if desired. The ip command in particular can do many other things as well. The Windows command netsh interface ip show config also provides IP addresses.

nslookup, dig and host

This trio of programs, all developed by the Internet Systems Consortium, are all used for DNS lookups.
They differ in convenience and options. The oldest is nslookup, the one with the most options (by a rather wide margin) is dig, and the newest and arguably most convenient for normal usage is host.

    nslookup intronetworks.cs.luc.edu
    Non-authoritative answer:
    Name: intronetworks.cs.luc.edu
    Address: 162.216.18.28

    dig intronetworks.cs.luc.edu
    ...
    ;; ANSWER SECTION:
    intronetworks.cs.luc.edu. 86400 IN A 162.216.18.28
    ...

    host intronetworks.cs.luc.edu
    intronetworks.cs.luc.edu has address 162.216.18.28
    intronetworks.cs.luc.edu has IPv6 address 2600:3c03::f03c:91ff:fe69:f438

See 7.8.2 nslookup and dig.

traceroute

This lists the route from you to a remote host:

    traceroute intronetworks.cs.luc.edu
     1 147.126.65.1 (147.126.65.1) 0.751 ms 0.753 ms 0.783 ms
     2 147.126.95.54 (147.126.95.54) 1.319 ms 1.286 ms 1.253 ms
     3 12.31.132.169 (12.31.132.169) 1.225 ms 1.231 ms 1.193 ms
     4 cr83.cgcil.ip.att.net (12.123.7.46) 4.983 ms cr84.cgcil.ip.att.net (12.123.7.170) 4.825 ms 4.812 ms
     5 cr83.cgcil.ip.att.net (12.123.7.46) 4.926 ms 4.904 ms 4.888 ms
     6 cr1.cgcil.ip.att.net (12.122.99.33) 5.043 ms cr2.cgcil.ip.att.net (12.122.132.109) 5.343 ms 5.317 ms
     7 gar13.cgcil.ip.att.net (12.122.132.121) 3.879 ms 18.347 ms ggr4.cgcil.ip.att.net (12.122.133.33) 2.987 ms
     8 chi-b21-link.telia.net (213.248.87.253) 2.344 ms 2.305 ms 2.409 ms
     9 nyk-bb2-link.telia.net (80.91.248.197) 24.065 ms nyk-bb1-link.telia.net (213.155.136.70) 24.986 ms nyk-bb2-link.telia.net (62.115.137.58) 23.158 ms
    10 nyk-b3-link.telia.net (62.115.112.255) 23.557 ms 23.548 ms nyk-b3-link.telia.net (80.91.248.178) 24.510 ms
    11 netaccess-tic-133837-nyk-b3.c.telia.net (213.248.99.90) 23.957 ms 24.382 ms 24.164 ms
    12 0.e1-4.tbr1.mmu.nac.net (209.123.10.101) 24.922 ms 24.737 ms 24.754 ms
    13 207.99.53.42 (207.99.53.42) 24.024 ms 24.249 ms 23.924 ms

The last router (and intronetworks.cs.luc.edu itself) don’t respond to the traceroute packets, so the list is not quite complete. The Windows tracert utility is functionally equivalent. See 7.11.1 Traceroute and Time Exceeded for further information.

Traceroute sends, by default, three probes for each router. Sometimes the responses do not all come back from the same router, as happened above at routers 4, 6, 7, 9 and 10. Router 9 sent back three distinct responses.

On Linux systems the mtr command may be available as an alternative to traceroute; it repeats the traceroute at one-second intervals and generates cumulative statistics.

route and netstat

The commands route, route print (Windows), ip route show (Linux), and netstat -r (all systems) display the host’s local IP forwarding table. For workstations not acting as routers, this includes the route to the default router and, usually, not much else. The default route is sometimes listed as destination 0.0.0.0 with netmask 0.0.0.0 (equivalent to 0.0.0.0/0).

The command netstat -a shows the existing TCP connections and open UDP sockets.

netcat

The netcat program, often called nc, allows the user to create TCP or UDP connections and send lines of text back and forth. It is seldom included by default. See 11.1.4 netcat and 12.6.2 netcat again.

WireShark

This is a convenient combination of packet capture and packet analysis, from wireshark.org. See 12.4 TCP and WireShark and 8.11 Using IPv6 and IPv4 Together for examples. WireShark was originally named Ethereal. An earlier command-line-only packet-capture program is tcpdump, though WireShark has greatly expanded support for packet-format decoding.
Both WireShark and tcpdump support both live packet capture and reading from .pcap (packet capture) and .pcapng (next generation) files. WireShark is the only non-command-line program listed here. It is sometimes desired to monitor packets on a remote system. If X-windows is involved (eg on Linux), this can be done by logging in from one’s local system using ssh -X, which enables X-windows forwarding, and then starting wireshark (or perhaps sudo wireshark) from the command line. Other alternatives include tcpdump and tshark; the latter is part of the WireShark distribution and supports the same packet-decoding facilities as WireShark. Finally, there is termshark, a frontend for tshark that offers a terminal-based interface reasonably similar to WireShark’s graphical interface.
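To give a feel for what a .pcap file contains, the following sketch reads the classic .pcap layout using only Python's struct module: a 24-byte global header followed by one 16-byte record header per captured packet. It is a rough illustration rather than a substitute for tcpdump or tshark. It assumes the classic microsecond-timestamp format (magic number 0xa1b2c3d4) and does not handle .pcapng files. The helper names build_pcap and read_pcap are hypothetical, and the capture is built in memory so that no file or capture privileges are needed.

```python
import io
import struct

PCAP_MAGIC = 0xA1B2C3D4   # classic .pcap with microsecond timestamps
LINKTYPE_ETHERNET = 1


def build_pcap(packets, linktype=LINKTYPE_ETHERNET, snaplen=65535):
    """Build little-endian .pcap bytes from (ts_sec, ts_usec, data) tuples."""
    # Global header: magic, version 2.4, timezone offset, sigfigs, snaplen, link type.
    out = struct.pack("<IHHiIII", PCAP_MAGIC, 2, 4, 0, 0, snaplen, linktype)
    for ts_sec, ts_usec, data in packets:
        # Record header: timestamp, captured length, original length.
        out += struct.pack("<IIII", ts_sec, ts_usec, len(data), len(data))
        out += data
    return out


def read_pcap(stream):
    """Return (linktype, [(ts_sec, ts_usec, orig_len, data), ...])."""
    header = stream.read(24)
    if len(header) < 24:
        raise ValueError("truncated pcap global header")
    # The magic number reveals the byte order used by the capturing host.
    if struct.unpack("<I", header[:4])[0] == PCAP_MAGIC:
        endian = "<"
    elif struct.unpack(">I", header[:4])[0] == PCAP_MAGIC:
        endian = ">"
    else:
        raise ValueError("not a classic .pcap stream")
    _major, _minor, _zone, _sigfigs, _snaplen, linktype = struct.unpack(
        endian + "HHiIII", header[4:])

    packets = []
    while True:
        record = stream.read(16)
        if len(record) < 16:
            break                          # end of capture
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", record)
        packets.append((ts_sec, ts_usec, orig_len, stream.read(incl_len)))
    return linktype, packets


if __name__ == "__main__":
    raw = build_pcap([(1, 500, b"abc"), (2, 0, b"hello")])
    print(read_pcap(io.BytesIO(raw)))
```

The packet bytes returned here are undecoded; interpreting them as Ethernet, IP and TCP headers is the packet-format decoding that WireShark and tshark provide.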