Rainbow Unit: Networks Big and Small
2B: The Infrastructure of the Internet
Technical Overview
The hardware infrastructure of the Internet operates at layers 1 and 2 of the OSI model. Layer 1 provides the cable and radio wave media that interconnect devices, along with the Network Interface Controller (NIC) installed within the computing device to which the media connects. When formally connected to a network, the NIC becomes a node on the network. Layer 2 of the OSI model provides the identification mechanisms for the node. A computing device can have one or more NICs. For instance, your laptop may be simultaneously connected to a network with both a wired Ethernet media and NIC and a WiFi media and NIC, and your smartphone with a cell radio wave media and NIC and also a WiFi media and NIC. Each NIC is uniquely identifiable so that information is correctly delivered to the appropriate device. To direct the flow of information between nodes, there must be an interconnect device or a combination of devices to facilitate communications. The only exception is when two nodes use the NIC, node identifiers, and media to do direct peer-to-peer communications.
Computer Network Building Blocks
There are four main components needed for a computer or other computing device to join as a node on a local area network (LAN) within a building such as a home, library, office, or cafe.
Node Identifier
Any device directly connected to the network that has been assigned a unique identifier on that network. Examples include:
- MAC address (also known as the hardware, physical, or Ethernet address): The serial number for Ethernet cards.
- IP address: The address used by the Internet protocol.
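To make the idea of a node identifier more concrete, the short Python sketch below formats a 48-bit MAC address the way it is usually written, as six hexadecimal bytes separated by colons. The helper name `format_mac` is made up for this illustration. `uuid.getnode()` is part of the Python standard library, but it may return a random number when it cannot read a hardware address, so treat the printed value as a hint rather than an authoritative record.

```python
import uuid


def format_mac(value: int) -> str:
    """Format a 48-bit integer as six colon-separated hexadecimal bytes."""
    raw = value.to_bytes(6, byteorder="big")
    return ":".join(f"{byte:02x}" for byte in raw)


if __name__ == "__main__":
    # uuid.getnode() reports one hardware address of this computer as an integer.
    # If none can be read, Python substitutes a random 48-bit number instead.
    print(format_mac(uuid.getnode()))
```

A device with several NICs has several MAC addresses; this sketch only shows whichever one Python happens to report.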
Network Interface Controller (NIC)
The hardware necessary for a node to connect to a network. Examples include:
- Ethernet controller (wired Cat5/Cat6 or wireless/WiFi): Used for LANs.
- MODEM (cable, DSL, dialup): Used for traditional internetworking with the Internet.
- Optical Network Terminal (ONT): Used for fiber to the home Internet.
Media
The communications technology used to connect nodes. Examples include:
- Copper transmits low-voltage electricity (e.g. wired Ethernet, DSL, cable, dialup).
- Fiber optic transmits light (e.g. fiber to the home).
- Radio waves (e.g. wireless Ethernet or Bluetooth).
Interconnect Device
A device used to interconnect nodes. Examples include:
- Switch or hub: Wired Ethernet LAN.
- Access point: Wireless Ethernet (WiFi) LAN.
- Router or gateway: Builds an Internet by connecting different LANs together.
- Hotspot:
  - Popular alternative term to access point on a WiFi LAN; or
  - Mobile device (e.g. smartphone, Coolpad Surf, NetStick USB Modem, or MiFi 8000) that is both a cell-based wireless router and a WiFi access point providing Internet access through the cell-based Internet connection.[1]
Nodes interconnect with other nodes in different ways, depending on how far they reach geographically, how many people are meant to use them, and who primarily owns or controls them. Some cover a very small area and may be used for very specific devices, while others are more general, cover larger areas, and are especially effective for use on the Internet.
Network Areas of Coverage
- Personal Area Networks (PAN) provide a simple computer network organized around a few personal devices allowing transfer of files, photos, and music without the use of the Internet or your home’s local network. Two common examples would be your Bluetooth headset or keyboard/mouse. Depending on the Bluetooth range selected (or chosen for you), this could span 3 feet, 10 feet, or 100 feet. Beyond Bluetooth, other common PAN connectivities include Infrared (IR), USB, ZigBee, Wi-Fi, and radio frequency (RF, including short distance AM and FM radio).
- The simplest type of Internet-based area network is a Local Area Network (LAN). A LAN is a network with connected devices in a close geographical range. It is generally owned, managed, and used by people in a building. For example, connecting to a WiFi network at a coffee shop or library would mean your device would be a node on the cafe or library’s publicly accessible LAN. Many public spaces may have a second, private LAN for use by staff only.
- A Wireless LAN (WLAN) is another name for a LAN used over WiFi. In some cases, a router is used to strategically isolate the wired Ethernet LAN from the wireless Ethernet LAN and may therefore distinguish between the networks specifically using LAN vs. WLAN connectivity.
- A Metropolitan Area Network (MAN) is a collection of LANs and devices in an area the size of a city. A version of a MAN is a Campus Area Network (CAN), which would be a network the size of a college, organization, or business campus. These types of networks are typically community-owned and/or managed and may also provide the infrastructure for one or more local/regional Community Networks.
- A Wide Area Network (WAN) covers the size of a state, country, hemisphere, or globe. A WAN is comprised of multiple MANs and/or LANs interconnected through a backbone or core transmission line or set of lines. The primary WAN we know of today is the Internet, a federation of local and regional networks interconnected through transmission lines typically owned and managed by one or more Internet Service Providers (ISP: the business that provides connections to each LAN), Network Service Providers (NSPs: the business(es) that provide connections between ISPs), and Backbone Providers (the business(es) that provide the more extended connections between NSPs).
The backbone of the Internet, that part serviced by Network Service Providers and Backbone Providers, is constructed using a fiber optic cable infrastructure. Rather than using electrical signals, glass fibers carry light, with upwards of a thousand fibers located within a single cable. It is often the case that more fibers are included within a cable than are needed at the time of installation (called dark fiber) to allow for future growth without additional installation expense. Further, Wavelength Division Multiplexing (WDM) allows multiple different wavelengths of light to be distributed on each strand of fiber (multiplexed) and then later separated (de-multiplexed), transmitting multiple communication streams simultaneously through a single light pulse. As technology continues to improve, replacing multiplexers with newer models allows still more data to be transferred over existing lines without the expense of installing new cables. The data itself is transferred using pulses of light transmitted using light-emitting diodes (LEDs) or small lasers. This can be done at very high speeds and over very long distances with less susceptibility to interference. A few different techniques are used to separate the wavelengths of light so that multiple communication streams, each at a high frequency, can share a strand, adding capacity on top of the high frequencies themselves. This opens up data transfer rates using fiber optics that are 20 to 1,000 times faster than cable and outdoor WiFi Internet service and for a larger customer base. As Susan Crawford points out in her 2018 book Fiber: The coming tech revolution—and why America might miss it, “If the information-carrying capacity of copper wire is like a two-inch-wide pipe, fiber optic is like a river fifteen miles wide.”[2]
Within the United States, on the other hand, most Internet Service Providers make use of existing communication technologies developed for phone and cable television to also provide Internet access. Indeed, it has often been marketed as the “triple play,” a package providing these three services at a discount compared to the purchase of each one individually from the provider, or from several different providers. In some cases, a provider primarily uses one technology, such as the cable Internet used by Xfinity/Comcast. In other cases, depending on geographic location, you can get Internet service from a single provider through several technologies, for instance from AT&T via copper Digital Subscriber Line (DSL) or fiber optics Internet lines, as well as via radio waves through their wireless phone services.
Internet Service Provider Technologies
Digital Subscriber Line (DSL)
- Adds two channels to standard phone line for Internet
- Hub and spoke (dedicated line) topology; full duplex
- In the US, DSL prioritizes download speeds
Cable Internet
- Redirects a cable channel to be used for Internet
- Neighborhood shares bus topology; full duplex
- In the US, cable Internet prioritizes download speeds
Cell-Based Internet
- 3G adds the EV-DO (Verizon, Sprint/Nextel) or HSDPA (AT&T, T-Mobile) protocol to cell voice’s protocol
- 4G adds the WiMax (Sprint) or LTE (Verizon, AT&T) standard to cell’s voice protocol
- Equivalent to bus (shared) topology; half duplex
- Prioritizes download speeds
Satellite Internet
- Indoor Unit (IDU) provides a modem connecting premises router to antenna dish (outdoor unit, or ODU).
- The very-small-aperture terminal (VSAT) dish antenna, which can also be used for satellite television service, requires a clear line of sight to facilitate microwave communications directly with the geostationary satellite serving the Internet provider, or to a shared Gateway Earth Station (gateway hub) that then connects to the satellite.
- Broadband speeds have improved considerably, with download speeds now reaching up to 40 Mbps. Download speeds are prioritized over upload speeds.
- Latency of signal, the delay between end node data transfers, is typically over 500 milliseconds. Wired Ethernet on a LAN is typically below 2 milliseconds; regional copper Internet latency is typically below 10 milliseconds. Latencies above 100 milliseconds can be problematic for some Internet applications, such as live-stream conferencing and online gaming.
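A back-of-the-envelope calculation helps explain why satellite latency sits near half a second. The Python sketch below estimates the minimum delay for a request and its reply through one geostationary satellite. The altitude (about 35,786 km) and the speed of light are outside figures not taken from the text above, and the sketch assumes the satellite is directly overhead and ignores all processing and routing delay, so real values will be higher.

```python
# Assumed figures (not from the text above): geostationary altitude and speed of light.
GEO_ALTITUDE_KM = 35_786
SPEED_OF_LIGHT_KM_PER_S = 299_792.458


def geostationary_round_trip_ms(altitude_km: float = GEO_ALTITUDE_KM) -> float:
    """Minimum request/response delay through one geostationary satellite.

    A request travels up to the satellite and down to the ground station,
    and the reply makes the same two legs in reverse: four legs in total.
    """
    legs = 4
    seconds = legs * altitude_km / SPEED_OF_LIGHT_KM_PER_S
    return seconds * 1000


if __name__ == "__main__":
    print(f"{geostationary_round_trip_ms():.0f} ms before any processing delay")
```

The result is a little under 500 milliseconds from distance alone, which is consistent with typical latencies of over 500 milliseconds once equipment delays are added.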
Community Wireless
- Uses standard wireless Ethernet (WiFi) outdoors; anyone can use off-the-shelf equipment to create a network
- Equivalent to bus (shared) topology; typically half duplex
- Symmetric upload and download speeds
Fiber Optics
- Ultra-high speed communications technology with one or more channels for Internet
- Hub and spoke (dedicated) topology; full duplex
- Symmetric upload and download speeds
For most homes, community organizations, and small office/home office contexts, a gateway router is used that provides a WAN port to connect the media leading to the first router of the Internet Service Provider. While sometimes this WAN port may need to first connect to a DSL/cable modem or a fiber optics Optical Network Terminal (ONT), in other cases this interconnect device is integrated into the router. Typically, a gateway router will also incorporate both wired Ethernet switch and WiFi access point interconnect devices for interconnectivity on the LAN side of the router. In addition, a gateway router typically integrates a DHCP server that dynamically or statically assigns IP addresses to connected nodes on the LAN. The router will be configured to route essential Internet “phone book” type lookups to a designated ISP or third-party DNS server that contains a database of public IP addresses and associated IP names. All of these additional services facilitate its core function as the router between the LAN and the WAN.
We’ve worked through quite a few underlying concepts related to computer networks. Before moving into our first exercise, take a few minutes to review what we’ve already covered and also get a glimpse at materials we’ll be covering next by watching Carrie Anne Philbin’s introduction to Computer Networks, Crash Course Computer Science episode #28:
Exercise: Listing Building Blocks of Your Computing Devices
Before moving on further, take some time to look at your own Local Area Network (LAN), whether it’s the one in your place of residence, the LAN of a family or friend, or that of your workplace, library, community center, etc. To the extent possible, do this at a physical location where you can see the various network building blocks, and maybe do it with the person who took the lead in setting things up if you weren’t the one who did that.
Download the Excel spreadsheet template.
- Begin with the information regarding the Internet Service Provider connecting the premises to the Internet. What type of communication technology and media is used to connect the premises to the Internet? What are the specified upload and download speeds for the service being provided? Are there any monthly data caps limiting use? As other things come to mind, also include notes on this information.
- Next, explore the gateway router that provides the first hop between the WAN and the LAN you are documenting. To the extent possible, document the MAC and IP addresses of the router’s WAN Network Interface Controller, and also that of the router’s LAN NIC (remember, many devices have multiple network interface controllers, and therefore multiple node identifiers). Are there additional devices between the outdoor media and the router, such as a modem or optical network terminal? Does the router provide one or more LAN ports? Does it serve as a wireless access point? What are the settings for these? Does it provide a DHCP server giving IP addresses to other LAN nodes, and if so, what are the settings for this service? Are any IP addresses reserved for specific devices? Does it forward port requests that come to its WAN IP address to specific LAN node IP addresses? Are there any security access controls, block sites, and block services being used?
- Finally, do all you can to document every node connected on this LAN, noting the type of device, the media being used, the assigned MAC and IP address, and any notes you think would be helpful to keep on record. Remember that your laptop, printer, Raspberry Pi, and other devices might be connected through multiple means such as via WiFi and also wired Ethernet. You might also document devices that are connected to multiple different networks, for instance a smartphone that is connected to the LAN using WiFi and also to a cell network. And there may be devices like that smartphone that also serve as hotspot routers, connecting some devices to the Internet via the cell network instead of the LAN’s ISP.
Key Takeaways
This exercise synthesizes concepts integral to computer and network building blocks. It also demonstrates an important principle regarding documentation. It’s easy to forget some of this information until computer network trouble is encountered. Filling out this form and occasionally updating it before again tucking it away in an easy-to-access file folder can prove of significant value when troubleshooting an array of computer network issues.
Also, it is sometimes helpful to have these notes on hand and add to them strategically in combination with the Network Troubleshooting chapter.
More on IP Addresses and IP Names
When we type in a URL, or Uniform Resource Locator, in a web browser, we’re almost always typing in an Internet Protocol name. Consider for instance the URL to this book:
https://iopn.library.illinois.edu/pressbooks/demystifyingtechnology/
Hypertext Transfer Protocol (HTTP) is the ever-present client-server protocol we have used for the last several decades to move information across the Internet. In this case, the first part, https, indicates the resource we’re searching for is the secured HTTP protocol (HTTPS). The second part indicates the providing resource is the web server with name iopn.library.illinois.edu. The last two parts indicate the directory and subdirectory in which the specific resources being requested are located. Not specified is the specific file, which in this case probably defaults to index.html, index.php, or something similar.
But as with our phone system, the name doesn’t truly get you in. Rather, the IP name needs to be associated with an IP address to pull up a web page, just as a person’s or organization’s name needs to be associated with a phone number to make a phone call. As of this writing, to get to the website, iopn.library.illinois.edu is actually first converted to the IP address 130.126.162.192 in order to access the server. We can do this ourselves by typing in:
https://130.126.162.192/pressbooks/demystifyingtechnology/
Only IP names are converted to IP addresses. The directory and subdirectory listings use whatever characters were used to create those directories.
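The parts of the book's URL described above can also be pulled apart programmatically. The following Python sketch uses the standard library's `urllib.parse.urlsplit`; the function name `describe_url` and the dictionary keys are illustrative choices, not terms from any standard.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the parts discussed above."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "ip_name": parts.hostname,
        # Drop empty strings left by the leading and trailing slashes.
        "directories": [segment for segment in parts.path.split("/") if segment],
    }


if __name__ == "__main__":
    book = "https://iopn.library.illinois.edu/pressbooks/demystifyingtechnology/"
    for label, value in describe_url(book).items():
        print(f"{label}: {value}")
```

Only the `ip_name` part is ever handed to DNS for conversion to an IP address; the protocol and directory parts are used as written.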
The Formation of IP Domain Names
The basic structure of the Internet came out of research launched in 1973 through funding from the U.S. Defense Advanced Research Projects Agency (DARPA). Researchers developed a system of protocols known as the Transmission Control Protocol (TCP) and Internet Protocol (IP), or TCP/IP Protocol Suite.
In 1983, a conceptual framework for domain names was established through Request for Comments (RFC) 882, in order to support the growing number of applications spanning multiple hosts, networks, and finally the Internet.[3] RFCs 883 and 973 expanded the domain name system (DNS) to build an intentionally extensible system. In 1987, two new RFCs made 882, 883, and 973 obsolete. These were “Domain Names – Concepts and Facilities, Request for Comments 1034” and “Domain Names – Implementation and Specification, Request for Comments 1035.”
These, too, have since had a range of RFC updates related to specific components of DNS: 1101, 1183, 1348, 1876, 1982, 2065, 2181, 2308, 2535, 4033, 4034, 4035, 4343, 4592, 5936, 8020, 8482.
Design Goals of DNS
The primary design goal of the domain name system (DNS) “is a consistent name space which will be used for referring to resources. In order to avoid the problems caused by ad hoc encodings, names should not be required to contain network identifiers, addresses, routes, or similar information as part of the name.”[4]
Today, we have a range of top-level domains, some of which are based on organization type (e.g., .gov, .edu), geographic location (e.g., .uk, .es), or general category (e.g., .org, .com, .net, .site).
Individuals and organizations can apply for second-level domains they can then use on the Internet (e.g., illinois.edu, raspberrypi.org, adafruit.com, wolske.site).
Individuals and organizations can then create subdomains to extend their DNS tree to represent sub-groupings (e.g., ischool.illinois.edu, makecode.adafruit.com, martin.wolske.site).
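The layering of top-level domains, second-level domains, and subdomains can be read directly off an IP name from right to left. The Python sketch below does this with a deliberately naive rule: it assumes the top-level domain is always a single label such as "edu" or "com". That assumption fails for suffixes like "co.uk", so this is an illustration of the tree structure rather than a dependable parser. The function name `domain_levels` is invented for the example.

```python
def domain_levels(ip_name: str) -> dict:
    """Break an IP name into top-level, second-level, and subdomain labels.

    Simplifying assumption: the top-level domain is a single label such as
    "edu" or "com". Multi-label suffixes like "co.uk" would need a lookup table.
    """
    labels = ip_name.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError("expected at least a second-level and a top-level label")
    return {
        "top_level": labels[-1],
        "second_level": ".".join(labels[-2:]),
        "subdomains": labels[:-2],
    }


if __name__ == "__main__":
    for name in ("illinois.edu", "ischool.illinois.edu", "makecode.adafruit.com"):
        print(name, "->", domain_levels(name))
```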
Fortunately, another thing the RFCs for domain names came up with was a solution to the pesky phone system.
Consider that to call someone with our phone we need to know a series of numbers in order to dial them, employ our smartphone’s contacts app, or create our own name/phone number listing system that informally maps a name we can remember to that series of numbers we need to dial.
By contrast, all IP names that are to be accessible need to be formally mapped within a DNS server. Each registered second-level domain runs its own local DNS server that holds the authoritative mappings. Top-level domain providers then run DNS servers that map the second-level domains, like illinois.edu, with the authoritative DNS servers for that domain. Internet service providers also run DNS servers that can be used to temporarily remember mappings for a set period of time, generally as defined by the second-level authoritative DNS servers. These are used by our local area networks so that we only have to type in illinois.edu in our web browser, and not 192.17.172.3.
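The idea of a DNS server that temporarily remembers mappings can be sketched as a small cache with an expiry time. The Python class below is a toy model only: `CachingResolver`, its `authoritative_lookup` callback, and the 300-second lifetime are all hypothetical, and real DNS servers are far more involved. It assumes the authoritative source returns both an address and the number of seconds the answer may be remembered.

```python
import time


class CachingResolver:
    """Toy stand-in for an ISP's caching DNS server (illustrative only)."""

    def __init__(self, authoritative_lookup, clock=time.monotonic):
        # authoritative_lookup(name) must return (ip_address, ttl_seconds).
        self._lookup = authoritative_lookup
        self._clock = clock
        self._cache = {}  # name -> (ip_address, expiry_time)

    def resolve(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]  # still fresh: answer from memory
        address, ttl = self._lookup(name)  # ask the authoritative server
        self._cache[name] = (address, now + ttl)
        return address


if __name__ == "__main__":
    # Hypothetical authoritative data; the address is the one quoted in the text.
    records = {"illinois.edu": ("192.17.172.3", 300)}
    resolver = CachingResolver(lambda name: records[name])
    print(resolver.resolve("illinois.edu"))  # first call asks the authoritative server
    print(resolver.resolve("illinois.edu"))  # second call is served from the cache
```

Letting the authoritative side set the lifetime mirrors the point above that the remembering period is generally defined by the second-level domain's own servers.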
IP Addresses
Internet Protocol addresses are the formal identifier of a node on a TCP/IP network. These addresses are used to route messages between source and destination across a network. Introduced in 1983, IP version 4 addresses use a 32-bit number broken into four 8-bit numbers separated by periods. When working in 8-bit binary notation, the decimal equivalent ranges between zero and 255. That is, an IPv4 IP address can range from 0.0.0.0 to 255.255.255.255. Protocols and policies have been developed to provide clear guidance regarding these addresses.
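To see the 32-bit structure directly, the Python sketch below uses the standard `ipaddress` module to break an IPv4 address into its four 8-bit numbers and show each one in binary. The helper names `octets` and `as_binary` are made up for this example.

```python
import ipaddress


def octets(address: str) -> list:
    """Return the four 8-bit numbers of a dotted IPv4 address as integers."""
    return list(ipaddress.IPv4Address(address).packed)


def as_binary(address: str) -> str:
    """Show each octet as eight binary digits."""
    return ".".join(f"{octet:08b}" for octet in octets(address))


if __name__ == "__main__":
    sample = "192.17.172.3"  # address quoted earlier in this chapter
    print(octets(sample))
    print(as_binary(sample))
    print(int(ipaddress.IPv4Address(sample)))  # the same address as one 32-bit number
    print(2 ** 32)  # how many distinct 32-bit values exist
```

Because each of the four parts is eight bits wide, none of them can exceed 255, which is why 255.255.255.255 is the highest possible IPv4 address.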
Almost all IP addresses, ones such as 192.17.172.3, are publicly accessible over the Internet. While some are not formally mapped to domain names widely known across the Internet, and some have strong security measures to restrict access, these numbers all can work across the Internet as needed/desired. For this reason, any router that is publicly available over the Internet (linking a Local Area Network to Wide Area Networks) must have one of these public IP addresses. The router you use at your home, office, or other organization to connect to the Internet through an Internet Service Provider is typically assigned one of these public IP addresses. Often, it’s only given to you temporarily, and may change dynamically in structured or semi-structured ways. But for those needing to ensure reliable access to nodes, for instance to web or database servers, you might purchase a static IP address so that you can set up routing information in a DNS server.
In the late 1990s, the Internet began running into the limits of IP addresses in version 4. As a 32-bit number, the maximum number of addresses available was 4,294,967,296. While that seems like a lot, given an increasing number of people have several different Internet “smart” devices in addition to their own laptop, four billion addresses isn’t nearly enough. These addresses were given out in formal ways that suited the 1970s’ and 1980s’ understanding of the limited uses of the Internet, a far cry from what really evolved. IPv6 was ratified as an Internet Standard in 2017 but has yet to be fully implemented. It uses 128-bit addresses, allowing about 3.4×10^38 possible addresses.
Private IP Addresses
In creating the Internet Protocol, there were several blocks of IP addresses that were made private. Anyone can have access to any of these without any required registration of them. The only caveat is that they are meant for use on private networks and cannot be routed through the public Internet.
The Internet Engineering Task Force (IETF) directed the Internet Assigned Numbers Authority (IANA) to reserve the following Internet Protocol Version 4 (IPv4) address ranges for use on private networks:
IP address range | Maximum number of addresses available to a single Local Area Network |
---|---|
10.0.0.0 – 10.255.255.255 | 16,777,216 |
172.16.0.0 – 172.31.255.255 | 1,048,576 |
192.168.0.0 – 192.168.255.255 | 65,536 |
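The size of each private range follows from how many bits are left free within it, and Python's standard `ipaddress` module can do the counting. The sketch below writes the three ranges in CIDR notation (for example `10.0.0.0/8`), a shorthand not used in the table above, and adds a hypothetical helper, `private_block_for`, that reports which private block, if any, an address falls in.

```python
import ipaddress

# The three private IPv4 ranges from the table, written in CIDR notation.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def private_block_for(address: str):
    """Return the private block containing the address, or None if it is public."""
    ip = ipaddress.ip_address(address)
    for block in PRIVATE_BLOCKS:
        if ip in block:
            return block
    return None


if __name__ == "__main__":
    for block in PRIVATE_BLOCKS:
        print(f"{block[0]} - {block[-1]}: {block.num_addresses:,} addresses")
    print(private_block_for("192.168.1.20"))
    print(private_block_for("130.126.162.192"))
```

Checking the addresses you recorded in the earlier LAN documentation exercise against these blocks is a quick way to tell which ones are private to your network.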
Private IPv4 addresses are widely used today. They allow a home or organization to create personal private networks for internal use and then set up routers to translate traffic (NAT, or Network Address Translation) meant to pass between that private network and the public Internet. Or more likely, when you purchased Internet access through an Internet Service Provider, the router they provided came set up with a Dynamic Host Configuration Protocol (DHCP) server. This router hands out private IP addresses to nodes like your laptops, desktops, phones, and printers, whether wired or connecting to the router via WiFi. That router also performs NAT, translating between those private addresses and the public Wide Area Network IP address assigned to it by that Internet Service Provider.
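One way to picture what a NAT router keeps track of is a table that pairs each private address and port with a port on the router's single public address. The Python class below is a heavily simplified sketch of that port-based form of translation. `ToyNat`, the starting port number, and the public address `203.0.113.7` (taken from a range reserved for documentation) are all assumptions for illustration; real routers also track protocols, timeouts, and much more.

```python
class ToyNat:
    """Very simplified picture of network address translation (illustrative only)."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self._next_port = first_port
        self._outbound = {}  # (private_ip, private_port) -> public_port
        self._inbound = {}   # public_port -> (private_ip, private_port)

    def translate_outbound(self, private_ip: str, private_port: int):
        """Rewrite a LAN source address to the router's public address."""
        key = (private_ip, private_port)
        if key not in self._outbound:
            self._outbound[key] = self._next_port
            self._inbound[self._next_port] = key
            self._next_port += 1
        return self.public_ip, self._outbound[key]

    def translate_inbound(self, public_port: int):
        """Find which LAN node a reply arriving on a public port belongs to."""
        return self._inbound.get(public_port)


if __name__ == "__main__":
    nat = ToyNat(public_ip="203.0.113.7")  # documentation-only example address
    print(nat.translate_outbound("192.168.1.20", 51000))
    print(nat.translate_outbound("192.168.1.21", 51000))
    print(nat.translate_inbound(40000))
```

Two LAN nodes can use the same private port because the router gives each conversation its own public port, which is how many devices can share one public IP address.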
Take a few minutes to join Carrie Anne Philbin in this introduction to the Internet, from Crash Course Computer Science episode #29:
Exercise: Internet Detective
When the Internet is working at expected levels, we generally don’t think about the extensive collection of sociotechnical artifacts needed for one node to communicate with another node on a local network, let alone the many more that allow us to connect when the end-points cross the globe. When things are less than optimal, our responses vary from stepping out for coffee in passive acceptance, to pangs of guilt that we must be doing something wrong, to anger that nothing can be done when essential services are being lost.
This exercise equips us with sleuthing tools as we work to demystify the Internet further.
As explored in the extension chapter on Network Troubleshooting, there is a range of network troubleshooting tools such as ping, traceroute, and speedtest. And there are a couple of tools that are especially helpful when sleuthing IP names: whois and dig. These tools are often installed or available to be installed on different operating systems. They are also integrated into various webpage tool sets.
Before beginning, we must update and upgrade the Raspberry Pi operating system. This is necessary for The General Purpose Raspberry Pi Web Server exercise later, so let’s get a head start. At the same time, we can install whois and dig in the operating system. The advantage is that this ensures the following exercises are completed in a consistent manner.
Enter these commands to update the Raspberry Pi and install the following packages. Note that each of these commands starts with the word “sudo.” This indicates the command that follows (such as apt) is issued as a superuser: a user with administrative privileges. As you go, you may periodically be required to enter “y” for yes, or hit “q” after reading upgrade information, before you can proceed with the upgrade.
pi@raspberrypi:~ $ sudo apt update
pi@raspberrypi:~ $ sudo apt upgrade
pi@raspberrypi:~ $ sudo apt install dnsutils
pi@raspberrypi:~ $ sudo apt install whois
Take a few minutes to test the strategies for tracking down problems and identifying if there’s something we might do about them in Network Troubleshooting. Take a close look especially at traceroute. What can the following tools tell you about a network?
- link lights
- ifconfig/ipconfig
- ping
- traceroute
- speedtest
Let’s do a quick initial exploration with the traceroute command to see where the Adafruit.com website might be located physically:
pi@raspberrypi:~ $ traceroute adafruit.com
For those running the traceroute command in a Windows PowerShell terminal, you’ll type in tracert adafruit.com instead.
I see a first hop that takes me to my gateway router on my LAN, and then the first router of my ISP. A few lines down, I see I’ve reached ntt.net. A quick search on my web browser, typing in ntt.com in the search bar, takes me to NTT Communications, a Global IP network. In my search, it actually defaulted to Japanese which I had my browser translate, indicating this is likely a Japanese-owned corporation providing backbone service in the United States. The 5th hop also includes “chcgil09.us” within the IP name, which suggests the router is probably in Chicago, Illinois, United States. This would make sense as I live and am doing this traceroute just a couple hours south in Champaign, Illinois. Hop seven takes me to an IP address that includes the name “cloudflare” before finally reaching 104.20.38.240, the IP address associated with the IP name adafruit.com as listed at the top of the traceroute.
One thing that isn’t fully clear is the details regarding the second hop of the traceroute, the one that leaves my premises and takes the packets of data to my ISP. This is a place where the ‘dig’ command can sometimes prove very helpful. Before moving on to explore the use of that command, here’s a snapshot of a search I did indicating that second hop was to iTV-3.com, the leaser of the fiber optics municipal area backbone network owned by the cities of Champaign and Urbana in collaboration with the University of Illinois Urbana-Champaign.
Once installation for dnsutils (which includes the dig command) and whois is complete, proceed. Let’s learn more about the dig command, starting with a search of the manual command within Linux before moving on to do a couple of searches specific to the Adafruit.com IP name and IP address.
From the Raspberry Pi command line, type:
pi@raspberrypi:~ $ man dig
Dig (domain information groper) performs Domain Name System (DNS) lookups. It’s a flexible tool which can take some time to understand fully. After skimming through the manual entry, let’s do a couple of quick tests of the command.
pi@raspberrypi:~ $ dig adafruit.com
Here, we see within the “QUESTION SECTION” that we’re looking for type ‘A’ information. In DNS, ‘A’ records store the 32-bit IPv4 address(es) associated with a hostname. Here, we see the IP name adafruit.com actually has been assigned two, 104.20.38.240 and 104.20.39.240.
In a web browser, type in first one, then the other, of these IP addresses. What do you get as a webpage? What does this tell you?
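The name-to-address lookup that dig performed above can be approximated from Python as well. The sketch below uses the standard library's `socket.getaddrinfo`, which asks the operating system's resolver rather than issuing a raw DNS query the way dig does, so results may differ slightly (for example, local hosts-file entries are honored). The function name `ipv4_addresses` and its `resolver` parameter are illustrative; the parameter exists only so the function can be tried without a network connection.

```python
import socket


def ipv4_addresses(ip_name: str, resolver=socket.getaddrinfo) -> list:
    """Return the sorted, de-duplicated IPv4 addresses the resolver reports for a name."""
    results = resolver(ip_name, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({sockaddr[0] for *_, sockaddr in results})


if __name__ == "__main__":
    # Requires a working Internet connection; the answer can change over time.
    print(ipv4_addresses("adafruit.com"))
```

As with dig, one IP name may come back with more than one address, and the addresses you see today may not match those recorded in this chapter.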
Let’s do a reverse lookup to get the ‘PTR’, a pointer from an IP address to the associated canonical name.
pi@raspberrypi:~ $ dig -x 104.20.38.240
Here we see in the “AUTHORITY SECTION” that ARPA provided ‘SOA’, that is, the start of a zone of authority record, in which this IP address is associated with dns.cloudflare.com, the authoritative Domain Name Server for cloudflare.com. According to their “About” page, Cloudflare is a “service that protects websites from all manner of attacks, while simultaneously optimizing performance.”[5] While we don’t know specifically how, we do know that Adafruit Industries is associated with, or makes use of in some way, Cloudflare.
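It may help to see why ARPA shows up in a reverse lookup at all. A reverse lookup is an ordinary DNS question about a specially constructed name: the four numbers of the IPv4 address in reverse order, followed by `in-addr.arpa`. The Python sketch below builds that name with the standard `ipaddress` module; the function name `ptr_query_name` is made up for the example, and the sketch only constructs the name without contacting any DNS server.

```python
import ipaddress


def ptr_query_name(address: str) -> str:
    """Build the name that a reverse (PTR) lookup asks about."""
    return ipaddress.ip_address(address).reverse_pointer


if __name__ == "__main__":
    print(ptr_query_name("104.20.38.240"))
```

Look for this reversed name in the “QUESTION SECTION” of the `dig -x` output to confirm that it matches what dig actually asked.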
Let’s now use the whois command to see if we can learn more about Adafruit and Cloudflare. From the command line, type:
pi@raspberrypi:~ $ whois adafruit.com | less
(NOTE: the pipe symbol, found on the upper right between the “enter” and “backspace” keys on US keyboards, takes the output from one command and passes it to the next command, in this case “less,” which allows us to view the contents one page at a time, moving back and forth within the text.)
Here, we see that the domain “adafruit.com” is registered through NameCheap, Inc. The name was created in May 2005, was last updated July 2018, and will expire May 2026 — IP names are not owned, but only leased.
We then find that the Domain Name Server for the domain name “adafruit.com” is hosted by Cloudflare.com. Adafruit, Inc. may still host the website in-house, or may use another service to do the hosting using an Infrastructure-as-a-Service or Platform-as-a-Service “cloud” web server. From this, the only thing we do know is that Cloudflare is performing as a Domain Name Server for Adafruit. The DNS system actually has quite a range of record types that can be effectively used to support a wide mix of IaaS, PaaS, and in-house infrastructures simultaneously in support of one domain name. As a result, Adafruit.com can actually be making use of a number of different in-house and remote-located services.
Use the up and down arrows to explore further the Registrant, Admin, Tech, and Name Server information records for the Adafruit.com domain name. Then let’s do the same for Cloudflare.com by typing:
pi@raspberrypi:~ $ whois cloudflare.com | less
We do see some information regarding creation, updated, and expiry dates, and also the Name Server for the domain. But we also see that while early in the establishment of the Internet Protocol all records were kept open, today many items can be held private from the public. So we see far less in whois for Cloudflare than we did for Adafruit.
Before moving on, let’s take a quick look at the whois manual:
pi@raspberrypi:~ $ man whois
This description specifies this application searches for an object in the Request for Comments (RFC) 3912 database. Towards the bottom of the manual, NOTES are listed clarifying the search process and some of the underlying official resources searched depending on context.
Key Takeaways
The tools traceroute, dig, and whois can provide us with a considerable amount of information regarding the Internet backbone and the different end-point entities that comprise the social and technical infrastructures that for the most part we just think of as a website. From the LAN of the personal computer running the web browser to the LAN of the computer or system of computers running the web server, there is an array of other LANs running the Domain Name Systems used to go from IP names to IP addresses, and then to specific server LAN locations that may vary depending on personal computer LAN geographic location, and then to various performance and security services, such as that from Cloudflare, that are used to further advance performance, and the many other systems and services hosted on still other LANs, only some of which are using HTTP web services. This Internet web, something that happens out of sight and mind, transfers untold packets of data around the globe every millisecond, coming back to our personal electronic devices as web pages, email, and audio/video communications in what appears to be a single, consolidated information artifact. What we don’t see is the wealth of sociotechnical artifacts all influencing the shaping of this single information artifact we have received or transmitted.
Take a few minutes to check out Warriors of the Net. You can see a unique visualization of packets, routers, and even a guest appearance of the Ping of Death.
Digging Deeper
From Adafruit to Raspberry Pi: We used several tools to explore the hops to Adafruit.com, the IP address of Adafruit, the registrar used to lease the IP name and the Domain Name Server used to associate the IP name with an IP address, and other bits and bobs about the provider of core parts of the toolkit used for hands-on exercises in this textbook. Consider repeating this now to explore further raspberrypi.org. How does this compare and contrast with the adafruit.com exploration? Is there anything new that you learn from this parallel exploration?
The Author’s Site: Some resources for this textbook as well as early releases of revised chapter sections are housed on my own website: http://apcg.wolske.site (UPDATE March 16, 2021: Dreamhost now automatically forwards the apcg.wolske.site subdomain of the wolske.site domain to the IOPN server). I also have a second website, my blog http://martin.wolske.site, housed on my leased domain name wolske.site. I use the domain name, wolske.site, which is hosted by an Infrastructure-as-a-Service (IaaS) provider, Dreamhost. As the iSchool was winding down Prairienet as a regional community network web-hosting service in the early 2000s, we searched for alternative web-hosting services for our non-profit patrons. Dreamhost not only provided web hosting, they also provided domain name registration, both at no cost for 501(c)(3) non-profits. What can you learn about these websites and the IaaS through the use of traceroute, dig, and whois in addition to the things you can learn by using a web browser to go to these sites?
São Tomé e Príncipe, Africa: I’ve valued my time doing participatory action research community inquiry projects in São Tomé e Príncipe, Africa, with citizens of this island nation, to advance their community cultural valued beings and doings. We found it technically impossible to set up community wireless on the island because of volcanic deposits that significantly interrupted Wi-Fi signals. We also found a nation that valued analog interactions within the marketplace and that made use of the public and community radio stations available. And their ISP provided wired broadband that valued upload speeds over download speeds so as to bring forward the information of people to others instead of focusing only on centralized corporate information sources being brought down to the people. But to bring this all to a broader audience, the official website of the nation is not located on the island. Where is it located? Who oversees this website?
To do this last Internet detective dig, consider the different top-level domains that might be used in association with this nation. Within the United States, where the Internet was birthed from ARPANET, we would use the “.gov” top-level domain name for a government, and “.edu” top-level domain name for an educational institution. But other nations needed to either apply to the United States for second-level domains associated with a top-level domain (which they couldn’t do for .gov but could do for .com), or they needed to use their national top-level domain first (e.g., .co.uk specifies a .com site within the United Kingdom). For this detective task, try out different sites, such as stp.gov.st (the site listed within the Wikipedia listing for São Tomé), saotome.st, and saotome.org. Which are trusted sources of information about the nation, if any? Who hosts these websites? Who manages them? What is left unknown through the use of these tools?
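As a small aid for this detective task, the following Python sketch splits a host name into its labels so the top-level domain is easy to pick out before running dig or whois on it. The function names domain_labels and top_level_domain are hypothetical helpers for illustration; the sketch only looks at the text of the name and does not contact any server.

```python
def domain_labels(hostname: str) -> list:
    """Split a host name into its dot-separated labels, left to right."""
    return hostname.strip().rstrip(".").lower().split(".")


def top_level_domain(hostname: str) -> str:
    """Return the right-most label, for example 'st' for 'stp.gov.st'."""
    return domain_labels(hostname)[-1]


# The candidate sites named in the text above.
for host in ("stp.gov.st", "saotome.st", "saotome.org"):
    print(host, "-> top-level domain:", top_level_domain(host))
```

Knowing which top-level domain a name sits under is only a first step: it suggests which registry to ask, not who is actually trustworthy.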
Cloud Computing
When discussing Internet-based applications today, the term cloud or cloud computing is often used. Indeed, sometimes the word cloud is used synonymously with the word Internet. Cloud computing, like the Internet of Things, is an evolving paradigm. For this reason, in 2011 the National Institute of Standards and Technology (NIST), as part of its statutory responsibilities under the Federal Information Security Management Act of 2002, developed a short document highlighting important aspects of cloud computing. The goal was to provide a baseline for further discussion regarding cloud computing and as a means for comparison of cloud services and deployment strategies. Essential characteristics include on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. Together, these provide a means by which the multiple organizational, community, or public consumers to which these services are deployed can each have great individual flexibility and freedom to unilaterally adjust services to fit current and anticipated demands across a range of devices. These essential characteristics and deployment models are associated with one of three service models:
- Software as a Service (SaaS): The consumer uses the provider’s applications running on a cloud infrastructure.
- Platform as a Service (PaaS): The consumer uses the cloud to deploy applications they have created or acquired and that make use of programming languages, libraries, services, and tools provided and/or supported by the provider as part of their platform.
- Infrastructure as a Service (IaaS): The consumer is provided a base of computing resources such as processors, storage, and networks upon which they can deploy and run software.
Others have noted that the distinction between higher-level Platform and lower-level Infrastructure as a Service found within a large data center is not a crisp line and should be considered together as utility computing.[6]
What is not generally recognized in relation to Cloud Computing is that, as part of the Internet with its foundational concepts of the end-to-end protocol creating a federation of the locals, data centers themselves are housed within their own Local Area Networks (LANs), some of which may be located within broader corporate Campus Area Networks (CANs). As part of a federation of the locals, these data centers should not be seen as centralized servers with overriding authority control, but rather local nodes internetworked with other local nodes to provision hardware and software services, a concept that was underlined within the Essential Characteristics section of the NIST Definition of Cloud Computing.
Wouldn’t it therefore be accurate to consider the research servers (e.g., Prairienet Community Network server computers), the University library servers, and University campus infrastructure servers that serve on- and off-campus associates a Cloud service?
Exercise: The General Purpose Raspberry Pi Web Server
In session one of the Rainbow Unit, we did a few different exercises exploring some of the underlying concepts behind the Internet of Things, also highlighted in Limor Fried’s segments “All the Internet of Things.” Of special note now, reflect back on the Hypertext Transfer Protocol (HTTP), an example of a client/server protocol that works at layers 6 and 7 of the OSI model.
Let’s finish this chapter by testing out a more general-purpose installation of HTTP on the Raspberry Pi, setting it up as a web server that can be used on the World Wide Web. While we did create an HTTP-based web server in session 1, the Python library code used created a special-purpose web server. This time we’ll install the Apache web server, a widely used general-purpose web server.[7] Before starting the installation of Apache, take a few minutes to follow through Carrie Anne Philbin’s Crash Course Computer Science episode #30 on the World Wide Web.
About the Apache Web Server
As we explored ways to make use of this brand new thing called a web browser to share our raw data sets across the Internet with other researchers, one of the challenges we faced during my time from 1993-1995 as a post-doctoral researcher on this project at the Neuronal Pattern Analysis group at the Beckman Institute, University of Illinois, was the rapid ongoing changes in the code for the campus’s National Center for Supercomputing Applications (NCSA) HTTP server in support of their just-being-developed Mosaic web browser. What worked on the http server or on the web browser one day might not work the next day. This is the rapid prototyping, fail-forward, growth mindset in action. It also represents those times when innovations go viral while still in their alpha phase, the lab time before an artifact or system enters the beta testing phase and eventual stable release. I was already working with the Linux operating system, version 0.9. I don’t remember the 0.x versions of Mosaic and NCSA HTTP, or if they were even listed yet.
Research institutes serve different roles at different times, but much of the core is on basic research. And so in early 1995 the Apache HTTP server project began as the NCSA research HTTP server and Mosaic web browser development was winding down. The NCSA code was available through what became the free and open source licensing framework, and served as the base for Apache HTTP. The use of existing code and a series of software patches led to the early development team’s use of the pun “A PAtCHy” server, a pun that somehow stuck and became the formal name, respelled Apache HTTP.[8]
The Apache HTTP server remains commonly used, especially on Linux-based computers, serving over 25% of the websites around the world, and is the perfect choice for a general-purpose HTTP server on the Raspberry Pi. The following instructions are a slightly modified version of those found on the Raspberry Pi Foundation documentation site.
Steps
Install Apache
- If you didn’t do so in the “Exercise: Internet Detective” above, first update the available packages by typing the following command into the Terminal:
pi@raspberrypi:~ $ sudo apt update
- Next, install version two of the Apache web server by typing:
pi@raspberrypi:~ $ sudo apt install apache2 -y
Test the web server
By default, Apache puts a test HTML file in the web folder. This default web page is served when you browse to http://localhost/ on the Pi itself, something that works well for those who have their Raspberry Pi attached directly to a keyboard, mouse, and monitor. For everyone, you can also access the test HTML file using something similar to http://192.168.1.10 (whatever the Pi’s IP address is) from another computer on the network. To find the Pi’s IP address, type hostname -I at the command line (or read more about finding your IP address).
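If you would rather find the address from Python than from hostname -I, the sketch below is one common best-effort approach. It assumes the Pi has a single main network route; the probe address 192.0.2.1 is a reserved documentation address picked only for illustration, and the function name primary_ip_address is hypothetical.

```python
import socket


def primary_ip_address() -> str:
    """Best-effort guess at the address this machine uses on the LAN."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only asks the
        # operating system which local address it would route from.
        probe.connect(("192.0.2.1", 80))  # documentation-only address (assumption)
        return probe.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no usable route, so fall back to loopback
    finally:
        probe.close()


print(primary_ip_address())
```

On a Pi with both wired and WiFi connections this returns only one address, whereas hostname -I lists all of them, so treat the result as a hint rather than the full picture.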
Browse to the default web page either on the Pi or from another computer on the network and you should see the Apache2 Debian Default Page with the note “It works!”
Unlike in session one, this time around we do not add a port number after the IP address. HTTP uses port 80 by default. So you could type in http://192.168.1.10:80, but you do not need to, as port 80 is assumed when no port is given.
This also means you could have two web servers running simultaneously, the Apache general-purpose one on port 80, and one of the Internet of Things Python HTTP servers that use port 8401. Check this out by typing into the terminal window:
pi@raspberrypi:~ $ python3 simple_uartWebserverLED.py
Now, in a second web browser window, go to your IP address equivalent of http://192.168.1.10:8401. Voilà! Two web servers on one Raspberry Pi!
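The way a URL maps to a port can be sketched in a few lines of Python. This is an illustration of the default-port rule described above, not how any particular browser is implemented; the helper name effective_port and the small DEFAULT_PORTS table are assumptions made for the example.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two web URL schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the TCP port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:  # an explicit ":8401" style port wins
        return parts.port
    return DEFAULT_PORTS[parts.scheme]


for url in ("http://192.168.1.10",
            "http://192.168.1.10:80",
            "http://192.168.1.10:8401"):
    print(url, "-> port", effective_port(url))
```

The first two URLs reach the same Apache server, while the third reaches the Python server, even though all three name the same Raspberry Pi.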
Initial Reflections
What is a web browser?
From where does a web browser get its web pages?
What’s different about the web pages retrieved from port 80 compared with those retrieved from port 8401? When might this matter?
Changing the default web page
The default web page created during the installation is just an HTML file in the computer’s filesystem. When installing servers, typically also installed is one or more configuration file(s). Within Linux operating systems, these are usually installed in the /etc, or etcetera, directory. In the terminal window, type:
pi@raspberrypi:~ $ ls -al /etc/apache2
Notice a file called apache2.conf. This is the main configuration file used to guide the launch of an Apache HTTP daemon. This provides the start-up information needed for a server to perform in the specified way. The listed apache2.conf and ports.conf files, along with the additional conf-enabled, mods-enabled, and sites-enabled directories, give those who are administering the HTTP server the ability to further innovate-in-use the base Apache software. Changes can be made on the fly, and then integrated into the server through a reload without disrupting ongoing and new page requests.
In computing terms, a daemon is a computer program that runs in the background providing services as needed. In this case, each time it is launched, the HTTP daemon starts itself based on the specifications within the configuration file, then mostly hangs out twiddling its thumbs waiting for a call in asking for something. When the call comes, it gets busy doing its stuff before going back to waiting mode. You’ll often see a running HTTP server daemon listed as httpd. These daemon server applications are a way things can perform a similar manual function to that done by a human call support service where the worker closely follows a checklist. When might it be better for this service to be done by the “thing”? When might it be better for this service to be done by the “person”? In what ways might machine learning and artificial intelligence advance to the point where some of the current “person” tasks may be replaced by a “thing”?
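The wait, answer, wait-again rhythm of a daemon can be seen in miniature with Python’s standard http.server module, the same family of tools used for the session one web server. The sketch below is an illustration only, not how Apache’s httpd is built: the class name HelloHandler, the function start_background_server, and the choice of port 0 (which asks the operating system for any free port) are assumptions made so the example does not collide with Apache on port 80 or the Python server on port 8401.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a short plain-text message."""

    def do_GET(self):
        body = b"hello from a tiny daemon\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real daemon would write to a log file


def start_background_server(port: int = 0) -> HTTPServer:
    """Start the server in a background thread and return it."""
    server = HTTPServer(("127.0.0.1", port), HelloHandler)
    # serve_forever() sits idle until a request arrives, then handles it.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_background_server()
conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
conn.request("GET", "/")  # the "call" that wakes the waiting server
reply = conn.getresponse()
status, text = reply.status, reply.read().decode()
conn.close()
print(status, text.strip())

server.shutdown()      # ask the background loop to stop
server.server_close()  # release the port
```

Between the start of the server and the request, the background thread does nothing but wait, which is the “twiddling its thumbs” phase described above.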
To review these configuration files for Apache, head over to /etc/apache2. It is especially recommended to skim the apache2.conf and ports.conf files.
We don’t need to review the configuration files to know that the default location for HTML files in Apache is generally under the /var, or variable, directory. When we install Apache, a new directory /var/www is created to specifically host World Wide Web data. And Apache goes on to create a test index.html HTML file. (In the /etc/apache2 configuration files, index.html is one of the default files looked for when someone requests a parent page for a website. This is why we can get a webpage by typing in something like www.raspberrypi.org. This web browser request, combined with the configuration of the HTTP server, results in the return of www.raspberrypi.org/index.html or something similar, as specified within the HTTP server configuration file.)
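The idea of a default index file can be sketched as a small lookup rule. The Python below is a deliberately simplified illustration, assuming a document root of /var/www/html and a single default file name; the function name resolve_request is hypothetical, and a real server such as Apache applies many more rules (including security checks) that are left out here.

```python
from pathlib import PurePosixPath


def resolve_request(url_path: str,
                    document_root: str = "/var/www/html",
                    index_names=("index.html",)) -> list:
    """Map a requested URL path to the file path(s) a server might try.

    Simplified sketch: no security checks, no per-directory settings.
    """
    root = PurePosixPath(document_root)
    relative = url_path.lstrip("/")
    if relative == "" or url_path.endswith("/"):
        # A request for a directory: try each default index file in order.
        return [str(root / relative / name) for name in index_names]
    return [str(root / relative)]


print(resolve_request("/"))
print(resolve_request("/mypage.html"))
print(resolve_request("/", index_names=("index.html", "index.php")))
```

Passing more than one name in index_names shows why the order of default files matters: the first one that exists would be the page returned.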
Let’s go over to the HTML directory space and edit the index.html file /var/www/html/index.html. Even as we do this, let’s keep the python3 HTTP server running for the moment as we’ll bring these together in a later step. So navigate to this directory in a second terminal window so that we can have a look at what’s inside of this test page:
pi@raspberrypi:~ $ cd /var/www/html
pi@raspberrypi:/var/www/html $ ls -al
This shows that by default there is one file in /var/www/html/ called index.html and it is owned by the root user (as is the enclosing folder). In order to edit the file, you need to change its ownership to your own username. Change the owner of the file (the default pi user is assumed here) by typing into the terminal:
pi@raspberrypi:/var/www/html $ sudo chown pi: index.html
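To check ownership from Python rather than from ls -al, a short sketch using the standard os and pwd modules is shown below. The helper names file_owner and can_edit are hypothetical, and the demonstration runs against a temporary scratch file so that it works on any Linux machine, not only on a Pi with Apache installed.

```python
import os
import pwd
import tempfile


def file_owner(path: str) -> str:
    """Return the user name that owns the file, or the numeric id as text."""
    uid = os.stat(path).st_uid
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:  # the id has no entry in the system's user list
        return str(uid)


def can_edit(path: str) -> bool:
    """True when the current user has write permission on the file."""
    return os.access(path, os.W_OK)


# Demonstrate on a scratch file owned by whoever runs this script.
with tempfile.NamedTemporaryFile() as scratch:
    print(file_owner(scratch.name), can_edit(scratch.name))
```

On the Pi, calling file_owner("/var/www/html/index.html") before and after the chown command is one way to confirm that the change of ownership took effect.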
You can now try editing this file using nano and then refreshing the browser to see the web page change. To do so, in the terminal window, type:
pi@raspberrypi:/var/www/html $ nano index.html
Next, scroll down through the <head> section, which contains a range of metadata and also various Cascading Style Sheets (CSS) code, until you see the tags </head><body> on two separate lines.
Diving Deeper
Consider some quick changes such as changing the title “Apache2 Debian Default Page” and the note “It works!” to something different. You might also add in additional <p></p> paragraphs sharing your thoughts at this moment.
When done editing the html, type CTRL-O to save the file by the same name. You don’t need to exit, but can now refresh the browser to see the web page changes. This may take some trial and error to get working — keep failing forward!
Once done testing out changes of the default index.html file, type CTRL-X to exit nano.
Creating our own webpage
Starting at minute 4:05, Carrie Anne Philbin gives a quick example of an HTML page.
To create new web pages, you’ll need to run nano or your favorite text editor with super user privileges as the /var/www/html directory is owned by user “root.” For example, to create the page mypage.html, type:
pi@raspberrypi:/var/www/html $ sudo nano mypage.html
Once you’ve opened the text editor, take a few minutes to explore creating a webpage. If you haven’t done this before, you can test out Philbin’s example. Or if you’ve done web page creating directly through text editing before, feel encouraged to use your own creativity to create a new page. Once you think you have something reasonable, save the file (you don’t need to exit, though) and then go to a web browser and go to the new page. In my case, I’d type in:
http://10.0.0.30/mypage.html
Creating an Internet of Things iframe page
The iframe tag is used to embed another document within the current HTML document. The iframe tag is used throughout this textbook to bring in YouTube videos. And we can now use this frame to bring in a web page from another server — in this case, that created by the HTTP server run through simple_uartWebserverLED.py.
Open up a new text file called iot.html in the /var/www/html directory, the same directory that holds index.html, so that Apache can serve it. If I’m using nano, I’d type:
pi@raspberrypi:/var/www/html $ sudo nano iot.html
Now, create a page that looks something like:
<html>
<head><title>IoT Gathering Page</title></head>
<body>
<h1>Internet of Things Gathering Page</h1>
<p>This page is used to gather datasets from various
local Internet of Things devices
</p>
<iframe width="1024" height="768"
src="http://10.0.0.30:8401"
frameborder="0" allowfullscreen>
</iframe>
</body>
</html>
Notice that it is a combination of static HTML and also an iframe tag bringing in source data from the other HTTP server running on the Raspberry Pi. Make sure you’ve listed the IP address of your own Raspberry Pi instead of my 10.0.0.30 Raspberry Pi address. Then save the document, make sure your Python3 program simple_uartWebserverLED.py is up and running, go to your web browser, and open up both the Python page directly as we did in session one, and also through the Apache server as we’ve just set up now. For my Raspberry Pi, with an IP address of 10.0.0.30, I’d go to:
http://10.0.0.30:8401/
http://10.0.0.30/iot.html
And remember, if you have other LAN-based computing devices with a web browser installed, such as your smartphone, you can use those to access these pages as well.
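Because the only part of this page that changes from one Raspberry Pi to another is the address inside the iframe, the page can also be generated rather than typed by hand. The Python sketch below is one hedged way to do that using the standard string.Template class; the function name build_iot_page is hypothetical, and the sketch only builds the text of the page, leaving it to you to save it as iot.html with the appropriate privileges.

```python
from string import Template

# The same page as above, with the host and port left as placeholders.
PAGE = Template("""<html>
<head><title>IoT Gathering Page</title></head>
<body>
<h1>Internet of Things Gathering Page</h1>
<p>This page is used to gather datasets from various
local Internet of Things devices
</p>
<iframe width="1024" height="768"
src="http://$host:$port"
frameborder="0" allowfullscreen>
</iframe>
</body>
</html>
""")


def build_iot_page(host: str, port: int = 8401) -> str:
    """Fill in the Pi's address and the Python server's port."""
    return PAGE.substitute(host=host, port=port)


print(build_iot_page("10.0.0.30"))
```

Replace 10.0.0.30 with the address of your own Raspberry Pi, just as in the hand-typed version.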
Install PHP
When we used the Python3 BaseHTTPRequestHandler library in session one, we were able to bring together some static HTML page tags with some Python programming code to create a dynamic web page whose data was continuously updated. With the iframe tag, we were able to bring this dynamic web page data into an otherwise static web page we were creating. Let’s finish this exercise by installing PHP.
“PHP (recursive acronym for PHP: Hypertext Preprocessor) is a widely-used open source general-purpose scripting language that is especially suited for web development and can be embedded into HTML.”[9] To allow your Apache server to process PHP files, you’ll need to install the latest version of PHP and the PHP module for Apache. While still in the /var/www/html directory, type the following command to install these:
pi@raspberrypi:/var/www/html $ sudo apt install php libapache2-mod-php -y
Now, rename the index.html file to index-orig.html:
pi@raspberrypi:/var/www/html $ sudo mv index.html index-orig.html
and create the file index.php:
pi@raspberrypi:/var/www/html $ sudo nano index.php
Put some PHP content in it:
<?php echo "hello world"; ?>
Now save and refresh your browser. You should see the phrase “hello world” displayed. This is not dynamic, but still served by PHP. Try something dynamic:
<?php echo date('Y-m-d H:i:s'); ?>
Underneath the hood, PHP now asks the operating system for the current date and time each time the page is requested and brings that information back into a dynamic web page. This is comparable to Python’s use of /opt/vc/bin/vcgencmd measure_temp to get the current temperature of the Raspberry Pi, like we did in session one of the Rainbow Unit.
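For comparison with the Python servers from session one, the same timestamp can be produced in Python with the standard datetime module. This is a sketch for comparison only; the name timestamp and the constant PHP_STYLE are hypothetical, and the format string is chosen to mirror the 'Y-m-d H:i:s' layout used in the PHP line above.

```python
from datetime import datetime

# Four-digit year, two-digit month and day, then 24-hour time,
# mirroring PHP's 'Y-m-d H:i:s' layout.
PHP_STYLE = "%Y-%m-%d %H:%M:%S"


def timestamp(now=None) -> str:
    """Format the given moment, or the current moment if none is given."""
    moment = now if now is not None else datetime.now()
    return moment.strftime(PHP_STYLE)


print(timestamp())
```

Each run prints a different value, which is the sense in which the page is dynamic: the text is computed at the moment of the request rather than stored in the file.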
To add in some static text using standard HTML tags, consider adding an <h1>Title of page</h1> line above the PHP code, and maybe some <p>paragraphs of text</p> after the PHP to see what it looks like.
Going a step further, you can show a rich listing of your PHP info at the bottom by adding an additional PHP hypertext preprocessor call:
<?php phpinfo(); ?>
Key Takeaways
While a Python3-based HTTP server can prove of value in support of a specific task, it is the general-purpose HTTP server which has become the foundation of the Internet’s World Wide Web. In this exercise, we’ve seen how flexible server platforms like Apache have been developed using the open protocols and standards of the Internet and an open community of developers to work on a range of different computers and to meet a range of different use objectives. This design strategy provides opportunities for innovation-in-use, something we tested as we moved from initial installation of Apache, to the creation of a first HTML page, to the addition of an Internet of Things iframe, to the addition of the PHP Hypertext Preprocessor. We’ve also seen how, through the use of the default port 80 with the Apache server and an alternate port 8401 for the Python3 server, one computer could run multiple distinct HTTP servers simultaneously, providing us even greater flexibility to innovate-in-use.
Given this, what would it take for your sociotechnical networked information system to now become a cloud platform? What if, over time, the context and design parameters that resulted in the creation of this networked information system become hidden? Is it possible this, too, could become part of the mystical cloud that others use? In what ways is it of value to work in ways that continually demystify technology? Is this even possible?
Wrap Up
As noted earlier in this session, internetworking can happen in many shapes and forms which regularly incorporate the Internet suite of protocols and applications. The key is the level to which we enter into use of an Internet-based tool from a ‘thing-oriented’ compared to a ‘person-oriented’ framing. Through the exercises within this part of the session, we’ve made use of different research methods to explore key social and technical aspects of the Internet suite, seeking to decodify key terms and concepts related to networked information systems. The infrastructure of the Internet provides an amazing opening for us to build from ‘person-centered’ digital internets past and present, and to also open new critical lenses to further challenge the dominant ‘thing-oriented’ framings through new pathways as we move forward within the information sciences.
[1] Mobile Beacon and Mobile Citizen are providers for such devices, dedicated to serving schools, libraries, and nonprofits.
[2] Susan P. Crawford, Fiber: The Coming Tech Revolution and Why America Might Miss It (New Haven: Yale University Press, 2018), 6.
[3] P. Mockapetris, “Domain Names – Concepts and Facilities,” Network Working Group, November 1983, https://www.ietf.org/rfc/rfc882.txt.
[4] P. Mockapetris, “Domain Names – Concepts and Facilities, Request for Comments 1034,” Network Working Group, November 1987, https://tools.ietf.org/html/rfc1034.
[5] “About Cloudflare,” Cloudflare, accessed July 18, 2020, https://www.cloudflare.com/about-overview/.
[6] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, et al., “A View of Cloud Computing,” Communications of the ACM 53, no. 4 (April 2010): 50–58. https://doi.org/10.1145/1721654.1721672.
[7] Apache® is either a registered trademark or trademark of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks. “Debian” and the Debian Logo are trademarks of Software in the Public Interest, Inc.
[8] “Apache Server Frequently Asked Questions,” Apache, accessed July 20, 2020, https://web.archive.org/web/19970106233141/http://www.apache.org/docs/misc/FAQ.html#relate.
[9] “What Is PHP?” PHP Manual, accessed July 21, 2020, https://www.php.net/manual/en/intro-whatis.php.
Glossary
Media: Used to interconnect devices on a network, and made of four primary materials: coaxial copper cable, twisted pair copper cable, fiber optics cable, and radio waves.
Network interface controller (NIC): The hardware necessary for a node to connect to a network. For example, an Ethernet card (wired or wireless) is used for a LAN connection. A modem (cable, DSL, dialup) is used for traditional Internet. Optical Network Terminals (ONT) are used for fiber to the home.
Internet Protocol (IP): A key communications protocol for routing data within networks, thus enabling the Internet. This protocol delivers packets from a sender to the destination and requires IP addresses for routing these packets. Domain Name Server (DNS) services are used to associate IP names with specific IP addresses.
Node: Any device directly connected to the network that has been assigned a unique identifier or address on that network, such as a MAC address (also known as the hardware, physical, or Ethernet address; the serial number for Ethernet cards) or an IP address (the address used by the Internet protocol).
Interconnect device: A device used to connect nodes together. A switch or hub is used with wired Ethernet, an access point is used for wireless Ethernet (WiFi), and a router or gateway builds an Internet by connecting different LANs together.
Fiber optics: Ultra-high-speed communications technology with one or more channels for Internet. Hub and spoke (dedicated) topology with symmetric upload and download speeds.
Router: An interconnect device used to transfer data from one Local Area Network (LAN) to another LAN connected to the router.
Modem: A portmanteau of the words Modulator-Demodulator. This device converts the binary zeros and ones provided by computing devices into signals that can be transmitted through a network, and converts received signals back again.
Dynamic Host Configuration Protocol (DHCP): A network management protocol that is part of the Internet protocol suite. DHCP servers dynamically or statically assign IP addresses to connected nodes on the local area network (LAN) so that they can communicate with other IP networks.
Domain Name System (DNS): A naming system which translates domain names to IP addresses. This ensures a consistent name space for information resources.
Daemon: A computer program that runs in the background providing services as needed. In this case, each time it is launched, the HTTP daemon starts itself based on the specifications within the configuration file, then mostly hangs out twiddling its thumbs waiting for a call asking for something. When the call comes, it gets busy doing its stuff before going back to waiting mode. You'll often see a running HTTP server daemon listed as httpd.