CptS 302: TCP/IP Networking

TCP/IP is the networking protocol of Unix
TCP/IP is also the networking protocol of the Internet
TCP/IP has enabled the Internet to work and grow as well as it has
The Internet has helped TCP/IP become the primary protocol in use today
History
- 1969 - ARPANET started by DARPA (Defense Advanced Research Projects Agency)
  - Eventually became NSFNET, connecting research sites as well as defense sites
- mid-1980s, Internet was mostly original ARPANET sites and universities
- late-1980s, many commercial sites on Internet, commercial backbone forming
- 1995, NSFNET backbone turned off
- Mid-to-late 1990s, Internet is "discovered" and becomes a mass-market phenomenon
Today
- ICANN, the Internet Corporation for Assigned Names and Numbers, is closest to being "in charge" of the Internet
- IETF, the Internet Engineering Task Force, manages the technical standards by which the Internet runs
- ISOC, the Internet Society, is the organization which includes the IETF and several other organizations
RFCs
- Most of the documentation for how TCP/IP and the various assorted protocols work is kept in RFCs.
- RFC stands for "Request for Comments," though a published RFC isn't refined through a comment process. All of the commenting is done before the RFC becomes official.
- A central site for RFCs is http://www.rfc-editor.org/

TCP/IP layers

Application layer
- The applications using the network
Transport layer
- Communication among programs
Network layer
- Basic communication, addressing, and routing
Link layer
- Network hardware & device drivers
Physical layer
- The hardware it's running on

Layer              Protocols and examples
Application Layer  HTTP, telnet, DNS, NFS, etc.
Transport Layer    TCP, UDP
Network Layer      IP, ICMP
Link Layer         ARP, device drivers
Physical Layer     Twisted-pair, fiber

Packets

At each layer, data is encapsulated into chunks we'll generically call "packets"
- Technically, Ethernet chunks are called "frames," IP chunks are called "packets," and TCP chunks are called "segments."
The packaging includes things such as a destination address
The actual data in the packet is called the "payload"
The text discusses link layer framing in a bit of detail which we won't worry about here
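
The nesting described above can be sketched in a few lines of Python. This is only an illustration: the bracketed "headers" are made-up placeholders, far simpler than real TCP, IP, or Ethernet headers, and the MAC address is hypothetical.

```python
def encapsulate(header: bytes, payload: bytes) -> bytes:
    """Prefix a payload with a header, as each layer does on the way down."""
    return header + payload

# Application-layer data becomes the payload of the layer below it.
app_data = b"GET / HTTP/1.0\r\n\r\n"
segment = encapsulate(b"[TCP dport=80]", app_data)            # transport layer
packet = encapsulate(b"[IP dst=134.121.1.1]", segment)        # network layer
frame = encapsulate(b"[ETH dst=aa:bb:cc:dd:ee:ff]", packet)   # link layer

print(frame)
```

Each layer treats everything handed to it from above as opaque payload, which is why the original application data ends up at the very end of the frame.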

Addressing

There are many possible addresses for a computer
- A MAC (media access control) address for the network interface itself
- An IP address to identify the host on the network
  - (Actually, a particular network interface on a host)
  - MAC addresses and IP addresses are mapped to each other via the Address Resolution Protocol, ARP
- A hostname for humans
  - Hostnames and IP addresses are mapped to each other via the Domain Name System, which we'll discuss separately
  - (Actually, there are other ways of mapping hostnames, including static file, NIS, and NIS+)
- A port
  - These are, in essence, "sub-addresses" on an IP address
In addition to host-specific (unicast) addresses, there are addresses that address multiple hosts
- Multicast addresses address several hosts at a time
- Broadcast addresses address all hosts on the network
- Specifics of these addressing schemes are covered in the text
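
As a rough sketch of the unicast/multicast/broadcast distinction, Python's standard ipaddress module can classify an address relative to the network it sits on. The function name address_kind is made up for this example, and the /21 network is the WSU subnet size used later in these notes.

```python
import ipaddress

def address_kind(address: str, network: str) -> str:
    """Classify an IPv4 address relative to the network it lives on."""
    ip = ipaddress.ip_address(address)
    net = ipaddress.ip_network(network)
    if ip.is_multicast:  # anything in 224.0.0.0/4
        return "multicast"
    if ip == net.broadcast_address or str(ip) == "255.255.255.255":
        return "broadcast"
    return "unicast"

print(address_kind("134.121.1.1", "134.121.0.0/21"))
print(address_kind("134.121.7.255", "134.121.0.0/21"))
print(address_kind("224.0.0.1", "134.121.0.0/21"))
```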

IP Addresses

An IP address is 4 bytes
- Usually written as 4 decimal numbers, separated by dots
  - e.g. 134.121.1.1
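
To make the "4 bytes" concrete, here is a small sketch that converts the dotted form into its raw bytes and into a single 32-bit number. The helper name dotted_to_bytes is an invention for this example.

```python
def dotted_to_bytes(dotted: str) -> bytes:
    """Turn '134.121.1.1' into the 4 raw bytes it stands for."""
    parts = [int(p) for p in dotted.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted-quad IPv4 address: {dotted!r}")
    return bytes(parts)

raw = dotted_to_bytes("134.121.1.1")
as_int = int.from_bytes(raw, "big")  # network byte order is big-endian

print(list(raw))     # [134, 121, 1, 1]
print(hex(as_int))   # 0x86790101
```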
Part of an IP address identifies the network, and part addresses the host
- The first byte (sometimes called an "octet") is always network
- The exact division between host and network is specified with a mask
Traditionally, the division has been on byte boundaries, with addresses grouped into various classes
- Class A
  - One byte for network, three for host
  - 1-126 for first byte
  - Very few, hard to get
  - Usually subnetted
    - 16,777,214 possible hosts (2^24 - 2)
- Class B
  - Two bytes for network, two for host
  - 128-191 for first byte
  - Not very many, relatively hard to get
  - Usually subnetted
    - 65,534 possible hosts (2^16 - 2)
    - WSU has a class B address (134.121.x.x)
      - Originally wasn't subnetted
      - Caused a lot of problems
- Class C
  - Three bytes for network, one for host
  - 192-223 for first byte
  - Only 254 hosts possible
  - Groups of Class C addresses are often assigned to a single organization
- 127 is a special network called "loopback"
  - 127.0.0.1 is called "localhost" and is normally the only address used on the loopback network
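
The first-byte ranges listed above translate directly into a lookup. The sketch below is only the traditional classful rule; the function name address_class is made up, and anything outside the ranges covered in these notes is lumped together as "other".

```python
def address_class(dotted: str) -> str:
    """Return the traditional class of an IPv4 address from its first byte."""
    first = int(dotted.split(".")[0])
    if 1 <= first <= 126:
        return "A"
    if first == 127:
        return "loopback"
    if 128 <= first <= 191:
        return "B"
    if 192 <= first <= 223:
        return "C"
    return "other"  # ranges not covered in these notes

print(address_class("134.121.1.1"))  # B
print(address_class("127.0.0.1"))    # loopback
```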
Class A and Class B are generally too large to be practical, so they are usually subdivided, or subnetted
- The byte boundary between host and network is moved to a lower bit, effectively dividing the network into reasonably-sized chunks
The "subnet mask" defines the boundary
- Portion of mask covered by 1s is the network, portion covered by 0s is host
- Example
  - Unicorn is 134.121.1.1, mask is 255.255.248.0
  - In Hex, that's 86.79.1.1, and FF.FF.F8.0
  - In binary:

   IP    1000 0110 0111 1001 0000 0001 0000 0001
   Mask  1111 1111 1111 1111 1111 1000 0000 0000

Sometimes written as a "/XX" where "XX" is the number of bits that are "1" in the netmask
- So, for WSU, it would be "/21"
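
The Unicorn example can be checked with a few bitwise operations. This sketch uses Python's standard ipaddress module only to convert between dotted form and integers; the arithmetic itself is a plain AND with the mask.

```python
import ipaddress

ip = int(ipaddress.ip_address("134.121.1.1"))
mask = int(ipaddress.ip_address("255.255.248.0"))

network = ip & mask                  # bits under the 1s of the mask
host = ip & ~mask & 0xFFFFFFFF       # bits under the 0s of the mask
prefix_len = bin(mask).count("1")    # the "/XX" notation

print(ipaddress.ip_address(network))  # 134.121.0.0
print(host)                           # 257
print(prefix_len)                     # 21
```

The host part is 257 because the low 11 bits of the address are 001 0000 0001.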
Subnetting (and grouping of Class C addresses) has complicated the arithmetic of IP addressing
- The text mentions some ways of dealing with this
- But the "outside world" still sees the class address
IP address shortage
- If you do the math, you can see that the limited number of IP addresses, coupled with the "Internet boom," means we will eventually run out of IP addresses
  - In the mid to late 90s, this looked like it would happen any day
- Two solutions, one short term and one long term, have been developed
  - Short term -- CIDR
    - Classless Inter-Domain Routing
    - Gets rid of "classes" and attempts to allocate IP addresses more efficiently
    - CIDR has helped a lot
      - There's a nifty graph of how the IPv4 address space is currently allocated at http://www.caida.org/outreach/resources/learn/ipv4space/
      - This is a different address than mentioned in the book on p. 279. This is the current URL
      - CAIDA is the Cooperative Association for Internet Data Analysis and probably has the coolest collection of graphs, charts, and data about the Internet anywhere.
  - Long term -- IPv6
    - A new version of IP
    - Addresses are 16 bytes
      - Gives 665,570,793,348,866,943,898,599 addresses per square meter of the surface of the planet Earth
      - If addresses are "realistically" subdivided (meaning not all addresses are readily usable), that still gives between 1,564 (pessimistic estimate) and 3,911,873,538,269,506,102 (optimistic estimate) addresses per square meter of the surface of the Earth
      - We shouldn't run out anytime soon
    - Still being tested and worked out
    - CIDR and NAT (described below) have reduced the pressure on the IPv4 address space, so it's unclear when (if ever?) we'll see IPv6
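
The addresses-per-square-meter figure above is easy to sanity-check. The sketch below assumes a surface area for the Earth of roughly 510 million square kilometers; that value is an assumption of this example, not something from the text, so only the order of magnitude should be compared.

```python
# Assumed: Earth's surface is about 510 million km^2 = 5.1e14 m^2.
EARTH_SURFACE_M2 = 5.1e14

total_addresses = 2 ** 128  # 16 bytes = 128 bits
per_square_meter = total_addresses / EARTH_SURFACE_M2

print(f"{total_addresses:.3e} IPv6 addresses in total")
print(f"{per_square_meter:.3e} addresses per square meter")
```

With this assumed area the result lands in the 10^23 range, the same order of magnitude as the figure quoted above.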

CIDR

Does away with the "class" scheme described above
Benefits
- Has kept the size of routing tables reasonable
- Allows for more efficient allocation of IP addresses
How it works
- Networks may be split at any bit, rather than just at byte boundaries, as is the case with Class A, B, and C addresses
- The network can still be subnetted internally
- The "advertised route" (what the rest of the Internet sees) looks something like 134.121.0.0/16 (what WSU looks like in CIDR)
  - Specifies the network number (what you'd get if you ANDed any address on the network with the netmask), and the size of the netmask
  - Routers keep this info. If they get a packet destined for, say, 134.121.7.85, they AND the destination with the netmasks they know about, find that it matches 134.121.0.0/16, and send the packet to the router advertised for that network.
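
Here is a minimal sketch of that lookup. The routing table and next-hop names are hypothetical, and picking the most specific (longest) matching prefix is the usual rule routers apply when more than one route matches; real routers use much faster data structures than a linear scan.

```python
import ipaddress

# Hypothetical routing table: advertised prefix -> next-hop router name.
ROUTES = {
    "134.121.0.0/16": "wsu-border",
    "128.138.0.0/16": "colorado-border",
    "0.0.0.0/0": "default-upstream",
}

def next_hop(destination: str) -> str:
    """Pick the next hop whose advertised prefix best matches the destination."""
    dest = ipaddress.ip_address(destination)
    matches = []
    for prefix, hop in ROUTES.items():
        net = ipaddress.ip_network(prefix)
        # Same test as ANDing the destination with the netmask and
        # comparing the result to the network number.
        if int(dest) & int(net.netmask) == int(net.network_address):
            matches.append((net.prefixlen, hop))
    # The default route (/0) always matches, so matches is never empty.
    return max(matches)[1]

print(next_hop("134.121.7.85"))  # wsu-border
print(next_hop("192.0.2.10"))    # default-upstream
```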
Pacific Bell has a pretty good page on CIDR at http://public.pacbell.net/dedicated/cidr.html

Private addresses and NAT

Another help with the shortage of IP addresses has been Network Address Translation (NAT)
- Called "IP Masquerading" in Linux
- Allows all internal hosts to be on a "private network" with a private address space
Three ranges of IP addresses are reserved for private use
- The rule is that you're never supposed to let a packet for these addresses onto the Internet
- The ranges are 10.0.0.0/8 (one "Class A" range), 172.16.0.0/12 (16 "Class B" ranges), and 192.168.0.0/16 (256 "Class C" ranges, which happens to be the size of one "Class B" range)
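
A border device needs to be able to tell whether an address falls in one of these ranges. A minimal sketch, using the standard ipaddress module (the name is_private_address is made up for this example):

```python
import ipaddress

PRIVATE_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_private_address(address: str) -> bool:
    """True if the address is in one of the three reserved private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in PRIVATE_RANGES)

print(is_private_address("172.31.255.255"))  # True
print(is_private_address("172.32.0.1"))      # False
print(is_private_address("134.121.1.1"))     # False
```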
With NAT, hosts are given addresses in one of these ranges, and these addresses are translated before going out onto the Internet
- Generally, the host at the border figures out where the internal host was trying to reach, and contacts that destination on behalf of the internal host. Any responses are passed on to the internal host by the border router.
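
One common way to do the translation is a table that maps each internal (address, port) pair to a port on the border host's public address. The sketch below is a toy model of that idea, not a description of any particular NAT implementation; the class name, the starting port number, and the public address 192.0.2.1 (a documentation-only address) are all assumptions of this example.

```python
import itertools

class ToyNat:
    """Toy port-mapping NAT: tracks mappings only, moves no real packets."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._next_port = itertools.count(first_port)
        self._out = {}  # (private_ip, private_port) -> public_port
        self._in = {}   # public_port -> (private_ip, private_port)

    def outbound(self, private_ip: str, private_port: int):
        """Return the (public_ip, public_port) the outside world will see."""
        key = (private_ip, private_port)
        if key not in self._out:
            port = next(self._next_port)
            self._out[key] = port
            self._in[port] = key
        return self.public_ip, self._out[key]

    def inbound(self, public_port: int):
        """Return the internal host for a reply, or None if nothing maps."""
        return self._in.get(public_port)

nat = ToyNat("192.0.2.1")
print(nat.outbound("10.0.0.5", 5000))  # ('192.0.2.1', 40000)
print(nat.inbound(40000))              # ('10.0.0.5', 5000)
print(nat.inbound(12345))              # None
```

The last line shows why unsolicited traffic from the outside has nowhere to go: a mapping exists only after an internal host has sent something out.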
This scheme is not without disadvantages
- By default, nothing internal can receive connections initiated from the outside
  - No web servers or mail servers
- Some applications may have problems, because they embed IP addresses in their data.

We won't go into the details of IPv6 addressing, routing, ARP, DHCP, or PPP. We also won't get into the security issues mentioned in the text. Which brings us to...

Adding machines to a network

So, what do you need to do to add a Unix computer to a network?
Most systems now give you a simple setup application, or even ask at install/setup time.
- But it's still helpful to know what's going on "behind the scenes"
Basic steps
- Assign hostname and IP address
- Set up host to start & configure network at boot time
- Set up routing
- Set up DNS resolver
Assigning hostname and IP address
- Generally, you manage this with DNS (which will be a separate chapter)
- Can also be handled with NIS, NIS+, and the file /etc/hosts
- /etc/hosts should at least have the local host, as both its hostname and "localhost". It might also have some other hosts that you might want/need to know about before the DNS server is available.
- The hostname on the computer is set by the "hostname" command at some time during the boot process
  - Where it gets the info is very OS-specific, see the book for details on a few OSes
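
The /etc/hosts format is simple enough to read by hand: an address followed by one or more names, with # starting a comment. The sketch below parses a made-up sample rather than the real file; the sample contents and the function name parse_hosts are assumptions of this example.

```python
# Hypothetical file contents: the local host under both of its names.
SAMPLE_HOSTS = """\
# address      names
127.0.0.1      localhost
134.121.1.1    unicorn
"""

def parse_hosts(text: str) -> dict:
    """Map each hostname in hosts-format text to its address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            table[name] = address
    return table

hosts = parse_hosts(SAMPLE_HOSTS)
print(hosts["localhost"])  # 127.0.0.1
print(hosts["unicorn"])    # 134.121.1.1
```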
Configuring network interfaces
- The network interfaces must be configured and started up
- The command that does this is ifconfig
  - Usually run at boot time and also can be done on the fly
    - (Unlike Windows implementations before Windows 2000.)
- You usually won't run the ifconfig command directly
  - Syntax of doing so is discussed in the book, or in the man page, if you need it
- Again, there is some OS-specific location for the data that ifconfig uses
- I've sometimes found it helpful to look at the startup scripts to find where ifconfig is called, so that I can figure out where and how the data is stored
- One handy syntax is ifconfig interface-name, which shows the current configuration of that interface
Setting up routing
- Usually, just adding a default route is enough
- Again this is defined somewhere OS-specific
  - The command that actually gets run is route add default gateway-ip-address
- The current routing table can be viewed with netstat -nr (the -r shows the routing table; the -n tells it to show IP addresses rather than names, which happens to be a more legible format)
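
The decision a host makes with just a local subnet and a default route can be sketched as follows. The subnet is the /21 from the earlier Unicorn example; the gateway address 134.121.0.1 is hypothetical, as is the function name first_hop.

```python
import ipaddress

LOCAL_NET = ipaddress.ip_network("134.121.0.0/21")
DEFAULT_GATEWAY = "134.121.0.1"  # hypothetical gateway address

def first_hop(destination: str) -> str:
    """Where this host sends a packet first: the destination or the gateway."""
    if ipaddress.ip_address(destination) in LOCAL_NET:
        return destination      # same subnet: deliver directly
    return DEFAULT_GATEWAY      # everything else goes via the default route

print(first_hop("134.121.7.85"))   # 134.121.7.85
print(first_hop("128.138.242.1"))  # 134.121.0.1
```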
Configuring DNS resolver
- Assuming you use DNS (which is a pretty safe assumption), you need to tell the system how to use it
- The configuration file is /etc/resolv.conf
  - Not OS-specific!
- The syntax is a "search" line (listing domain name suffixes to try adding to find a match), followed by one or more "nameserver" lines which define the IP addresses of your nameservers
  - Sometimes there is a "domain" line instead of "search." This is an older configuration.
  - The second nameserver will only be tried if the first is unavailable
  - The third nameserver will only be tried if the first and second are unavailable
- The book's example:

search cs.colorado.edu colorado.edu
nameserver 128.138.242.1
nameserver 128.138.243.151
nameserver 192.108.21.1
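
To show how the two kinds of lines are used, the sketch below parses the example above and builds the full names that would be tried for a short hostname. It is a simplification of what a real resolver does, and the hostname "anchor" and the function name parse_resolv_conf are made up for this example.

```python
RESOLV_CONF = """\
search cs.colorado.edu colorado.edu
nameserver 128.138.242.1
nameserver 128.138.243.151
nameserver 192.108.21.1
"""

def parse_resolv_conf(text: str):
    """Return (search_suffixes, nameservers) from resolv.conf-style text."""
    search, nameservers = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue  # blank line or comment
        keyword, values = fields[0], fields[1:]
        if keyword in ("search", "domain"):
            search = values
        elif keyword == "nameserver" and values:
            nameservers.append(values[0])  # order matters: first is tried first
    return search, nameservers

search, nameservers = parse_resolv_conf(RESOLV_CONF)
candidates = [f"anchor.{suffix}" for suffix in search]

print(candidates)   # ['anchor.cs.colorado.edu', 'anchor.colorado.edu']
print(nameservers)
```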

Some OSes have a configuration that tells the system in what order (or even if) to use /etc/hosts, NIS, NIS+, and/or DNS. Details are in the book.
