COMP30023 - Computer Systems
The Internet
- Models
  - Layers
  - Connections
  - OSI
  - TCP/IP
- HTTP
- Domain Name System - DNS
- Email
  - Overview
  - SMTP
  - MIME
- Streaming
Layers
Sockets
Git
TCP congestion control
Address switching
Routing
Layer 2
Process management
Memory Management
Cryptography
Secure communication
System security
Future of Computing Systems

COMP30023 - Computer Systems

The Internet

Aggregation of many smaller networks. No single point of control.

Models

OSI
- Standardised pre-implementation
- Planned
  - ISO group
TCP/IP
- Standardised post-implementation
- More spontaneous
  - Independent people working
- Designed without many considerations
  - Security back-fitted

Layers

Networks consist of a stack of layers, each offering services to those above it. Services provided to those above with primitives. Interactions between layers provided by protocols.

Connections

TCP is connection oriented.

Connect, use disconnect
Similar to telephone connection

UDP is connection-less

Send message through intermediate nodes
Similar to text or mail

OSI

Layer	Name	Unit exchanged
7	Application	APDU
6	Presentation	PPDU
5	Session	SPDU
4	Transport	TPDU
3	Network	Packet
2	Data link	Frame
1	Physical	Bit

TCP/IP

Layer	Name
4	Application
3	Transport
2	Internet
1	Host-to-network

Designed to be independent of data link and physical layers.

HTTP

Hypertext Transfer Protocol. Application layer level. Referenced by URLs (URL \(\approx\) protocol + host + file location)

Overview

Client initiates TCP connection to server. Server accepts TCP connection from client. HTTP messages exchanged between client and server.

Connections

HTTP 1.0 - Single use connection
HTTP 1.1 - Persistent connection, additional headers
HTTP 2.0 - Speed improvements

Persistence

Non-persistent connections need a new connection to be established for every file to be sent. Can be done in parallel. Persistent uses an established connection to receive data. Reduces connection setup overhead and response time. Can pipeline requests.

Request methods

Method	Safe	Idempotent	Cacheable
GET	Y	Y	Y
HEAD	Y	Y	Y
POST	N	N	Y/N
PUT	N	Y	N
DELETE	N	Y	N
CONNECT	N	N	N
OPTIONS	Y	Y	N
TRACE	Y	Y	N
PATCH	N	N	N

Idempotent - identical requests have same effect. Safe - only for retrieval, shouldn't change state.

HTTP response codes

Code	Meaning	Examples
1xx	Information	100 - agree to handle request
2xx	Success	200 - success, 204 - no content
3xx	Redirect	301 - page moved, 304 - cache still valid
4xx	Client Error	403 - forbidden, 404 - page not found
5xx	Server Error	500 - internal server error, 503 - try again later

Headers

Can be sent in request and response. Carry information about the message.

Processing

Client side

Plugins/extensions - integrated into browser

Direct access to content

Helper - separate program instantiated by browser, access local cache

PDF viewer, word, etc.

Server side

Static Page

5 steps

Accept TCP from client
Identify file requested
Get file from local storage
Send file to client
Release connection

Multi-threaded server can handle multiple requests in parallel

Dynamic content

Runs series of steps

Resolve name of web page requested
Perform access control on page
Check cache
Fetch page from disk or program
Determine rest of response
Return response to client
Make entry in server log

Web cache

Access local (recently fetched) pages instead of sending remote requests. Used by sending conditional get requests. Can have a local proxy cache for many local clients acting as a client for the “real” servers.

Proxy

Can cache, help with security and share IP address. Browser sends requests to proxy and proxy checks cache and forwards request to server.

Cookies

Stores data on the client (<4Kb). 5 fields:

Domain
Path
Content
Expiry
Security

Used to keep state on the client.

Static content

HTML documents consisting of web page components. HTML is plain-text markup encoding content and presentation. Web page components are structural divisions, restricted tag sets, attributes and values (e.g. img)

Dynamic content

Generated on the fly for the client from applications running on server. Sent in HTML/CSS/etc. and XML, like static content.

Scripting

Server side - PHP. Client side - Interactive applications using JavaScript, Java, ActiveX, AJAX.

Domain Name System - DNS

Maps URL domain to IP address. IP addresses can be thought of numerical identifiers, ideally would uniquely identify a socket/jack but often not the case.

Records

Type	Meaning	Value
SOA	Start of authority	Parameters for zone
A	IPV4 address of host	32-bit integer
AAAA	IPV6 address of host	128-bit integer
MX	Mail exchange	Priority, domain willing to accept email
NS	Name server	Name server for domain
CNAME	Canonical name	Domain name
PTR	Pointer	Alias for IP address
SPF	Sender policy framework	Text encoding of mail sending policy
SRV	Service	Host that provides it
TXT	Text	Descriptive ASCII text

Registering a new domain

Register name with DNS registrar. Provide names, IP addresses of authoritative name server (primary and secondary). Registrar inserts two RRs into TLD server. Create authoritative server with A record.

Resolving a domain

Asks a local DNS server for a domain. If not known, sends query to authoritative name server and then to root name server for answer. Queries are subject to timers to avoid long delays. A hosts files contains hard coded mappings of names to addresses.

127.0.0.1 localhost

Security

Original design had no security. Allowed DNS spoofing and flooding. DNSSEC and root signing were created as solutions to this.

Email

Overview

Submission - Sender's mail agent sends message to message transfer agent
Message Transfer - Message is sent with SMTP to a different message transfer agent where it sits in the mailbox
Final Delivery - Receiver fetches message from mailbox

User agent organises messages and formats them such that they can be sent. Message consists of a header (from, to, subject, etc.) and body.

Headers

User available: To, Cc, Bcc, From, Date, Subject. Hidden: Message-Id, In-Reply-To, References, Reply-To, Sender, Return-Path, Keywords, Received.

SMTP

Simple Message Transfer Protocol. Uses port 25, with direct transfer. Transfer consists of a handshake (greeting), transfer of message and closure. Messages are 7 bit ASCII.

MIME

Multipurpose Internet Mail Extensions. Adds more features to mail. Adds additional message headers:

MIME-Version
Content-Description
Content-Id
Content-Transfer-Encoding
Content-Type

Content Types

Type	Example Sub-types	Description
text	plain, HTML. XML, CSS	Text in various formats
image	GIF, JPEG, tiff, png	Picture
audio	basic, MPEG, mp4	sounds
video	MPEG, mp4, quicktime	movies
model	vrml	3d model
application	octet-stream, PDF, zip	data from application
message	HTTP, RFC822	encapsulated message
multipart	mixed, alternative, parallel	combination of types

Message transfer access

Transfer with SMTP. Delivery though local, POP3 (Post Office Protocol), IMAP (Internet Mail Access Protocol), HTTP (web-mail).

POP3

Transaction consists of authorisation, transaction, update. Syntax consists of USER/PASS, LIST, RETR/DELE, QUIT (update). Retrieving mail causes it to delete on the remote server.

IMAP

Keeps state across sessions. Retains mailbox on server and remotely accesses it. Requires larger server infrastructure.

Streaming

40% of web traffic. Utilising sockets over ports 80 and 443. Doesn't require client to keep requesting (unlike HTTP)

Layers

Presentation Layer

OSI layer 6. Provides:

Encryption
Compression
Data conversion (e.g. CR/LF to LF)
Mapping between character sets (ASCII to UTF8)

Now done by applications. Don't apply evenly to all applications, so IETF groups them into application layer.

Session Layer

OSI Layer 5. Provides:

Authentication
Authorisation
Session restoration

Examples:

Remote procedure call
Point-to-point tunnelling protocol (PPTP)
Password(/Extensible) Authentication Protocol (PAP/EAP)

Often used between protocols in layer 2/3

Transport Layer

Overview

Sends data to the right application and makes sure all the data arrived. Sits between the application and network layers. Can provide a reliable service on top of an unreliable network. Encapsulates message, message gets frame, packet and segments headers in addition to payload.

Terminology

Segments - transport layer
Packets - network layer
Frames - data link layer

Addresses using ports in a full 5-tuple.

Source IP Address
Source Port
Destination IP Address
Destination Port
Protocol

Ports are 16 bit numbers (0-65535)

UDP

User Datagram Protocol. Connection-less communication, header followed by payload. Headers contain source and destination ports, payload handled by process attached to port. Uses a 32 bit header containing source and destination addresses, message length and a pseudo-checksum.

Strengths

Multiplexing/de-multiplexing
No delay waiting to recover lost packets

Weaknesses

No control flow, error control, re-transmission of bad segments

Good for simple client-server communication and real time services.

UDP can be good for Remote Procedure Calls when call is idempotent, TCP for non-idempotent calls. RPC allows calling of procedures on a remote machine as if they were on the local machine.

Real-Time Transport Protocol (RTP) is application layer, but uses UDP for transport. Also provides services to applications so is semi-transport layer also. Multiplexes several streams into a single UDP stream. Has a header that includes payload type, segment number and timestamp.

TCP

Transmission Control Protocol.

Primitives

Primitive	Packet Sent	Meaning
LISTEN	(none)	Block until something tries to connect
CONNECT	CONNECTION REQ	Actively attempt to establish a connection
SEND	DATA	Send information
RECEIVE	(none)	Block until DATA packet arrives
DISCONNECT	DISCONNECTION REQ	This side wants to release the connection

Overview

Provides a connection oriented protocol over connection-less network layer. Providing a connection based protocol increases reliability. Breaks down a message into separate IP datagrams and reconstructs them at the other end. Sends using sockets named by the 5-tuple. Needs an explicit connection establishment by sending and receiving hosts.

Features

Full duplex

Data in both directions simultaneously

End-to-end

Exact pairs of senders and receivers

Byte streams, not message streams

Message boundaries not preserved

Buffer capable

TCP entity can choose to buffer prior to sending, unless PUSH or URGENT flags are set.

Properties

20-60 byte header, zero or more data bytes. TCP entity decides how large segments are subject to IP payload (< 65515 bytes) and Maximum Transfer Units (~1500 bytes). Uses a sliding window protocol for reliable data delivery and congestion control.

Name	Description
Source port	Sending port
Destination port	Receiving port
Sequence number	If SYN=1: initial number, if SYN=0: accumulated number of first data byte in segment
Acknowledgement number	If ACK=1: next number that sender of ACK is expecting
Data offset	Size of TCP header (20-60 bytes)
Flags	Single bit flags (SYN, ACK, RST, FIN, etc)
Window size	Size of receive window, how much sender is willing to receive

Three way handshake

Establishes a reliable connection, even if set up packets get lost and initial sequence for sliding window. Both sides agree with strategy before submitting segments.

Synchronisation

SYN for synchronisation during establishment. SYN or FIN increments sequence number. Sequence number is offset by a random number, where only the offset matters. Acknowledgement number is the next byte the sender expects to receive. Bytes are received without gaps, needs to wait for segment, even if later segments are received.

Request is SYN=1, ACK=0 Reply is SYN=1, ACK=1. SYN used in both request and accept, ACK distinguishing between the two. After setup sequence number with the first byte of payload is sent with the acknowledgement number being the next byte sender expects to receive.

Re-transmission

Each segment has an associated re-transmission timer (RTO) Initialised with a default value and updated based on network performance. If timer expires before an ACK is received, next segment is sent.

If the receiver gets a segment with a higher than expected sequence number (segment lost), sends ACK with sequence number expecting. This is a duplicate of previously send ACK (DupACK). With 3 DupACK, sender resends lost segment, known as fast re-transmission.

Closing

FIN flag used to close connection. Once acknowledged, no further data can be sent from sender to receiver, but data can continue in the other direction. Will continue to re-transmit unacknowledged segments. Typically requires a FIN and ACK in each direction to close connection, although can be optimised down to 3 if receiver sends FIN with its ACK.

RST can be used to hard close a connection. States sender is closing and will not listen for further updates. Sent in reply to a packet to a 5-tuple without an open connection. FIN is preferred, as shows an orderly shutdown.

Sockets

Sockets provide an application an interface to the kernel for networking. Sockets addresses are denoted by the 5-tuple (protocol, source-IP, source port, destination-IP, destination port).

Primitives

State	Description
SOCKET	Creates a new communication endpoint
BIND	Associate a local address with a socket
LISTEN	Announce willingness to accept connections; give queue size
ACCEPT	Passively establish an incoming connection
CONNECT	Actively attempt to establish a connection
SEND	Send some data over a connection (write())
RECEIVE	Receive some data from the connection (read())
CLOSE	Release the connection

States - low level

Order	State	Description
1	CLOSED	No connection is active or pending
2	LISTEN	The server is waiting for an incoming call
3	SYN RCVD	A connection request has arrived; wait for ACK
3	SYN SENT	The application as started to open a connection
4	ESTABLISHED	The normal data transfer state
5a	FIN WAIT 1	The application has said it is finished
6a	FIN WAIT 2	The other side has agreed to release
7	TIME WAIT	Wait for all packets to die off
6b	CLOSING	Both sides have tried to close simultaneously
5b	CLOSE WAIT	The other side has initiated a release
6c	LAST ACK	Wait for all packets to die off

Sockets in C

//Headers
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>
 
//Variables
int listenfd = 0, connfd=0;
char sendBuff[1025];
struct sockaddr_in serv_addr;
 
//Creation
listenfd = socket(AF_INET, SOCK_STREAM, 0); //create socket
memset(&serv_addr, '0', sizeof(serv_addr)); //initialise server address
memset(sendBuff, '0', sizeof(sendBuff)); //initialise send buffer
 
serv_addr.sin_family = AF_INET; //type of address - internet IP
serv_addr.sin_addr.s_addr = htonl(INADDR_ANY); //listen on ANY IP Addr
serv_addr.sin_port = htons(5000); //listen on port 5000
 
//Bind and listen
bind(listenfd, (struct sockaddr*)&serv_addr, sizeof(serv_addr));
listen(listenfd, 10); //maximum number of client connections to queue
 
//Accept, send, close
connfd = accept(listenfd, (struct sockaddr*)NULL, NULL);
 
snprintf(sendBuff, sizeof(sendBuff), "Hello World!");
write(connfd, sendBuff, strlen(sendBuff));
 
close(connfd);
 
//Connect
connect(connfd, (struct sockaddr*)&serv_addr, sizeof(serv_addr));
 
//Receive
while ((n = read(connfd, recvBuff, sizeof(recvBuff) - 1)) > 0 )
{
     //process received buffer
}

TCP Sliding Window

Controlled by receiver. Determines amount of data the receiver is able to accept. Buffers are maintained separately to the application without guarantee that data is immediately sent or read from the respective buffers.

When the window is 0, the sender shouldn't send any data.

Can send URGENT data
Can send “zero window probe”

0 byte segment that causes the receiver to re-announce the next expected byte and window size. Designed to prevent deadlock. Sender may delay sending in order to fill more of the receive window.

Send window

What the sender is able to send

Receive window

Amount of data the receiver is willing to receive - window size in ACK.

Git

Distributed version control software. Used to track and control changes to files over time. Commits are stored as nodes in a DAG. Has a working directory (top level folder with single version of files), staging area (files to be added in the next commit, stored in .git) and .git directory (configuration and repository database). Takes snapshots of file system as opposed to deltas, allows referencing of duplicate files to save space. Allows for easy and cheap branching. Can use delta compression for space saving. Uses a key-value database with SHA1 hashes as the key. git checkout can be used to switch to a particular commit/branch, needs an unambiguous hash prefix. The HEAD value denotes the currently checked out commit. Recommended workflow is to branch and merge in projects. git push, pull and clone are to interact with remote directories, receiving changes, sending changes and getting repositories respectively. Tags can be used to give names to important commits, like release versions.

TCP congestion control

Re-transmission

If a packet receives a packet after the one it was expecting, it sends an ACK for the one it was expecting. If it continues to receive later packets, it continues sending ACKs. If the original ACK and 3 duplicate ACK (dupACKs) are sent, a Fast Re-transmission is triggered and the original expected packet is sent.

Lost packets

To deal with lost packets, originally “Go-back-N” was used, this caused all the packets from the lost one onward to be re-transmitted. This has the advantage of the receiver not needing to store/reorder packets. The alternative is “Selective Repeat”, where only the lost packet is re-transmitted and the packets arrive out of order. Selective Repeat is more complex and helps where loss is common.

Window size

Once the window is full, a deadlock could occur where the window size may change but the server may not know. One solution is for the receiver to send an WindowUpdate, acknowledging the last packet and the new window size. Alternately, when the sender receives a window size 0, it can start a timer and periodically send ZeroWindowProbes to find the window size.

Congestion Control Window

Introduced with selective repeat. Additionally window dynamically adjusted based on network performance. It is maintained by the sender, unlike the sliding window controlled by the receiver.

Slow start algorithm

At first CWND sends 2*maximum segment size. If the segment is acknowledged, CWND++. Each full window doubles CWND, which continues until

a timeout
it reaches a threshold, SSthresh

This achieves exponential growth over each round trip time. Eventually too many segments will be placed on the network, causing congestion and timeout. If segment loss occurs, SSthresh is set to half the current window size and the process starts again. Once the threshold is reached, the growth is slowed to linear, adding 1 MSS to the window for each successful ACK. Now we start from the new SSthresh instead of the original starting value.

Further optimisations

SACK (Selective Acknowledgements) provides the ability to track segments in flight by adding later received packets in SACK. ECN (Explicit Congestion Notification) (ECE & CWR bits set during SYN)

Allows IP layer to indicate congestion without dropping a segment by setting an ECN flag
Receiver indicates to sender by ECE (ECN Echo) flag
Sender acknowledges by setting Congestion Window Reduced Flag (CWR), reacts as if a segment has been lost without actually losing one

Macroscopic changes

Fairness between flows
Response to long round-trip times (RTTs)
Response to random packet loss

Window size

W increases once per window. When a loss occurs, W is halved. With probability p of a loss, the average increase in window size is \[\frac{1-p}{W}-p\frac{W}{2}\] To be in equilibrium, the average increase must be zero, hence \[W\approx\sqrt{2/p}\]

Rate, fairness

The window is sent ⇐ once per RTT, T. Rate is \(\frac{W}{T}\approx\frac{1}{T}\sqrt{2/p}\). For a given loss rate, longer RTT get less rate. If RTT is small, TCP forces the loss rate to be high.

Address switching

Network Layer

The Internet (network) layer is responsible for getting data from the source to the destination. Traffic must be routed effectively. There are 2 types of services on the network layer

Connection-less
- Packet switching (IP)
- Minimum required service: “send packet”
- “Datagram network”
Connection-oriented
- (Virtual) Circuit Switching
  - Asynchronous Transfer Mode - ATM
  - Multi-Protocol Label Switching - MPLS
- “Virtual Circuit” network
- Usually single link of an IP network

Store and forward packet switching

This is like the internet.

Host H1 sending packet to H2

Transmits to nearest router
Packet is buffered while arriving and checksum verified
If valid, packet is stored until the next outgoing interface is free
Router forwards packet to next router in path
Repeat

Connection-less routers store forwarding tables (also called routing tables) on how to get to other routers which are dynamically updated. Connection-oriented routers store in and out tables for sending all packets along the same route.

Connection-oriented vs connection-less

Issue	Datagram Network	Virtual Circuit
Type	Connection-less	Connection-oriented
Addressing	(-) Each packet has full source and destination	(+) Each packet contains a short VC number
State	+ Routers do not hold state information about connections	- Each VC requires router table space
		- Router reboots a problem
Routing	Each packet independent	Defined at set-up
Quality of Service	- Difficult	+ Easy if enough resources
Congestion control	- Difficult	+ Easy if enough resources

Multi-Protocol Label Switching

Primary purpose is Quality of Service.

Prioritising traffic
Service Level Agreements for performance
Reliable connectivity with known parameters

Expensive, roughly 20-100 times more per Mbps.

IPv4

32 bit address. IP addresses are given to interfaces, not hosts. Supply of addresses exhausted.

Types of addresses

Unicast: single destination
Broadcast: send to everyone
Multicast: send to particular set of nodes
Anycast: send to any one of a set of addresses
Geocast: send to all users in geographic area
…

Classes

No longer used but spoken of

Class	Bits	Range
A	0	1.0.0.0 to 127.255.255.255
B	10	128.0.0.0 to 191.255.255.255
C	110	192.0.0.0 to 223.255.255.255
D	1110	224.0.0.0 to 239.255.255.255
E	1111	240.0.0.0 to 255.255.255.255

Class D is for multicast. Class E is reserved for future use.

CIDR

Reduces waste from classes. Specifies number of bits for the network field. Network in top bits, host in bottom. Can be written as a subnet mask of 1's, e.g. /24 is 255.255.255.0. Network number = network mask (bit-wise AND) IP address. Allows for automatic aggregation, reducing the size of the routing table. Overlapping prefixes are eliminated by longest prefix first.

Special IP addresses

Private range	Prefix and mask	number of addresses
10.0.0.0 - 10.255.255.255	10.0.0.0/8 (255.0.0.0)	16,777,216
172.16.0.0 - 172.32.255.255	172.16.0.0/12 (255.240.0.0)	1,048,576
192.168.0.0 - 192.168.255.255	192.168.0.0/16 (255.255.0.0)	65,536

Address	Use
0.0.0.0	Placeholder
000Host	Host on current network
255.255.255.255	Broadcast local network
Network1111	Broadcast distant network
127...*	Loopback

IPv6

128 bit address. Simpler header, improved security, more QOS support.

Subnets

Using prefixes within an organisation to allocate IP addresses. Allows packets to be sent without the router knowing all the hosts.

Network Address Translation (NAT)

Hides multiple interfaces behind a single address. Changes a port number when leaving the NAT.

Steps

Assumes TCP or UDP in source and destination ports
Replaces TCP source address with public IP address
TCP port replaced by index of entry in NAT translation table
IP and TCP check-sums recalculated
When packet arrives, NAT looks up destination from port in TCP header.

Criticisms

Violates IP architectural model where every interface has a unique IP address
Breaks end to end connectivity as only outgoing connections can be made
Changes the internet from connection-less to partly connection-oriented
Violates the layer model by assuming payload contents only TCP or UDP
Causes problems with FTP and other protocols that use multiple connections
Limits number of outgoing connections

Still widely used. Can provide security by shielding from unsolicited incoming connections.

Fragmentation

IP packets have maximum size of 65,535, determined by 16 bit length header. Network links can't handle such sizes. Lower layers need to break up (fragment) packets.

Fragmentation can be due to

Hardware
OS
protocols
standards
reducing transmissions due to errors
efficiency in communication channel

Hosts like larger packets, reduces workload. When a packet transits, maximum packet size may change. e.g. Ethernet is 1500 bytes and WIFI is 2304 bytes. The Maximum Transmission Unit (MTU) is the maximum size for a network. The path MTU is the maximum size for a path through the network, equal to the minimum MTU on each link. Due to the connection-less nature, can't set MTU at sender as path can change. Instead, routers can break packets to be sent individually.

Reassembly is difficult.

Transparent Fragmentation

Reassembly is performed at the next router, routers unaware of fragmentation.

Nontransparent Fragmentation

Reassembly is performed at the destination host. Uses DF (Don't Fragment) and MF (More Fragments) flags. MF=1 for all fragments but last. Fragments have an offset according to size, allowing for packets to be reconstructed out of order in a buffer.

Simple approach, but large reassembly cost and a single lost fragment causes entire packet retransmission. Alternative is path MTU discovery, where each packet is sent with DF=1 and router sends an ICMP (Internet Control Message Protocol) to sender if packet is too large. This may cause initial packets to be dropped but hosts can learn optimal size quickly and reduce subsequent fragmentation. TCP and IP are often implemented together to reduce fragmentation.

IPv4 allows for nontransparent fragmentation or path MTU discovery with a minimum of 576 bytes. IPv6 expects hosts to find optimal MTU without fragmentation with a minimum size of 1280 bytes. ICMP messages can be dropped, causing low volume passes and high volume fails, if in doubt send minimum size.

Routing

Forwarding

Each router has a forwarding table which maps addresses to outgoing interfaces. Upon receiving a packet:

Inspect destination IP address in header
Index into table
Determine outgoing interface
Forward packet out to that interface

This repeats on every router in the path to the destination.

Routing Algorithm

Runs on each router. A good algorithm needs

Correctness
Simplicity
Robustness
Stability
Fairness
Flexibility

An algorithm depends on what it being optimised (packet delay or throughput). The simplest approach is to minimise the number of hops a packet has to make. This reduces packet bandwidth and improves delay, also hopefully reduces physical distance travelled. The actual algorithms assign a cost to each link.

Static routing doesn't adapt to network topology. The table is calculated offline and uploaded to the router on boot, resulting in it not responding to failure. This is reasonable where there is a clear or implicit choice. Adaptive routing dynamically changes to network topology and potentially traffic levels. It may optimise some property: distance, hops, transit time, etc. It may get information from adjacent routers.

Flooding

This is the simplest adaptive routing. It:

Guarantees shortest distance and minimal delay
Useful benchmark
Extremely robust
Highly inefficient
Has to have a way of discarding packets

It is when a router sends a packet along all its links, which results in every router having received the packet at least once.

Adaptive Routing

Utilises the optimality principle in route planning (A subset of a shortest path must also be a shortest path). This can be used to reduce a graph to a tree, involving shortest path algorithms like Dijkstra's.

Link State Routing

Replaced Distance Vector Routing (Bellman-Ford) that had problems converging quickly enough. Variants still used today. Relatively simple:

Discover neighbours
Set cost to neighbours
Construct packet containing knowledge
Send packet to and receive from all other routers
Compute shortest path to each router

Costs can be manually or automatically set, commonly using bandwidth. Building packets is easy, deciding when to build presents problems. Flooding is used to send to all other routers.

Border Gateway Protocol (BGP)

Used to route between networks. Needs to consider politics, opposed to internal networking.

Companies not willing to have their network used by others
ISPs not wanting other ISPs' traffic on their network
No commercial traffic on academic networks
Use one provider over another due to cost
Don't send traffic through certain companies or countries

Routes may be better in some respects, worse in others. Difficult to optimise.

IP Multicasting

One to many communication, useful for streaming live content or updating. Uses D class addresses. Hosts join groups at the request of processes. Routing trees are constructed through multicast algorithms. These help prevent security risks through amplification attacks. Often deployed in organisational networks, not internet as not proven to scale to that extent.

Congestion control

Occurs when they're too many packets on the network. Causes performance to plummet. Ideally would reach saturation and remain there. Different to flow control, which is between hosts. Both have the same solution, which is to slow sending rate. In order from slowest to fastest, the following are solutions to congestion

Network provisioning

Add more physical capacity, slow and expensive but will always work.

Traffic-aware routing

Temporarily reroute traffic around congested areas, however eventually all routes become saturated.

Admission control

Used on virtual circuits to control who can place traffic onto the route

Traffic throttling

Reduce sending rate

Load shedding

Effective for congestion, bad for network utility.

Explicit congestion control

Use of ECN flags in IP header.

00: Not ECN-capable
01 or 10: ECN-capable
11: congestion experienced

If the receiver gets a packet marked as experiencing congestion, it echoes to the sender with the ECE bit set. The sender then reduces its transmission rate and sets the CWR (Congestion Window Reduced) bit in acknowledgement. On receipt of CWR set, receiver stops sending ECE bit. This is closely linked with TCP, blurring the lines between the two.

Internet control protocols

These use the internet to manage functionality.

ICMP: Internet Control Message protocol
DHCP: Dynamic Host Configuration Protocol
ARP: Address Resolution Protocol

ICMP

Used in PING, but can do more.

Message type	Description
Destination unreachable	Packet could not be delivered
Time exceeded	Time to live field hit 0
Parameter problem	Invalid header field
Source quench	Choke packet
Redirect	Teach router about geography
Echo and echo reply	Check if a machine is alive
Timestamp request/reply	Same as echo, but with timestamp
Router advertisement/solicitation	Find a nearby router

Traceroute is an example that exploits the TTL to check the route being used to send a packet. Each packet is sent with an incremented TTL, meaning they hit 0 on successive routers, revealing their IP address.

DHCP

Each host must have a unique IP address (excluding NAT). Manually configuring each host is difficult to do and slow to respond. DHCP automatically allocates IP addresses. A network needs a DHCP server for issuing the addresses. A host sends a DHCP DISCOVER packet which the server receives and responds with a DHCP OFFER containing an IP address. IP addresses contain a lease, after which they will be reclaimed by the server and reissued. Hosts can request renewal before the lease expires. Alternately IP can be tied to a layer 2 address.

MAC (layer-2) address

Layer 2 addresses incorrectly called MAC (Medium Access Control) due to old “shared medium” layer 2 networks. An address is a globally unique identifier usually hard-coded by the manufacturer. They can be between 48 and 64 bits long. Used at the host-to-network/data link layer, sometimes called a physical address.

ARP

Translates addresses between the internet layer and physical network layer (IP to MAC). Works by broadcasting a packet asking who owns an IP address and the owner responds with its MAC address. It is a simple, inefficient protocol. It has many security issues:

No authentication
Caching of responses
ARP spoofing is the gateway for man-in-the-middle attacks
Provides a way of intercepting and spoofing ARP messages to associate the attacker's MAC with another host's IP.

Layer 2

Modern Ethernet like IP and old Ethernet like WiFi. Old Ethernet is a shared medium which just broadcasts packets and packets can collide if broadcast at the same time. The packet format/services are the same so software doesn't need to change.

Process management

Operating system

Provides an abstraction of hardware resources and manages these resources. Provides the concept of:

Processes (running program)
Address spaces (memory)
Files and file system
Input and Output
Protection (manage security)

Processes

A process is a program in execution. Programs are static, processes are dynamic. The state of a program consists of the code or “text” of the program, the values of the variables, the address of the current instruction and some other external data (e.g. current directory). Each process has its own address space.

Most process a user deals with are created by the user, however some are provided by the OS. An example of a system process is a print daemon.

Multi-programming

Conceptually each process has its own CPU. In reality each process runs for a short period of time and take turns. Multi-programming increases system efficiency, if a process is waiting for something another can run in the meantime.

Process creation

Four things need to happen for a process to be created:

System initialisation
Execution of a process creation system call by a running process
A user request to create a new process
Initiation of a batch job

Process termination

Typical conditions which terminate a process:

Normal exit (voluntary)
Error exit (voluntary)
Fatal error (involuntary)
Killed by another process (involuntary)

Process states

The lowest layer of a process-structured operating system handles interrupts and scheduling. Processes can occupy the following states:

Running (using the CPU at that instant)
Ready (runnable, temporarily stopped while another process is running)
Blocked (unable to run until some external event happens)

The kernel

Responsible for managing processes. Runs as the privileged part of the operating system. Also provides services such as IO. Not a process in itself.

Most programs run in an unprivileged mode, and can not issue privileged instructions or access certain parts of memory. The kernel can issue all instructions and access all memory. Instructions and memory locations which could interfere with other processes are privileged. This distinction is the foundation needed by the kernel for its security mechanisms. The privilege level is stored in hardware, given by the program status word (PSW) register.

System calls

System calls allow user processes to ask the kernel to execute privileged instructions and access privileged memory on their behalf. The OS checks these requests before execution, preserving security. To make these more convenient, system calls will typically do several privileged things in one logical operation.

Interrupts

Used by a hardware device to get attention from the CPU. Asynchronous with current process. When an interrupt occurs, CPU hardware saves process' current state in a privileged location until the interrupt is handled. The CPU handles the interrupt in kernel mode.

The interrupt vector is the address of the interrupt handler. The handler

Saves the status of the current process
Services the interrupt
Restores what was saved
Executes a return from interrupt

There may also be pseudo-interrupts which come from the CPU, rather than external hardware. User programs may generate pseudo-interrupts, such as division by zero. Pseudo-interrupts can also be intentionally generated by certain instructions, e.g. TRAP.

Threads

Threads are a sequential execution stream within a process. They can be considered a lightweight process. Threads can communicate with each other without the kernel, they share global variables (address space) and dynamic memory. Threads have their own program counter, registers, stack and state.

A process can have many threads working independently. This is advantageous as they have less overhead compared to processes. The problems involved with threading involve concurrency. As such things like race conditions and deadlocks arise with threads.

Pthreads

POSIX standard API for threads. Global variables are shared across threads. Threads can switch at any time and need to be synchronised.

User-space threads

Threads can be implemented in user or kernel spaces. This means a process can manage its threads or the kernel can. User-space has less overhead but can be blocked by blocking system calls.

Race conditions

When multiple threads access the same object at the same time. Output depends on order of execution.

An idea to eliminate this is to create a critical region where only one thread can access at a time. Requirements for a critical region:

No two processes may simultaneously be in their critical regions
No assumptions on speed or number of CPUs
No process running outside critical region should block other processes
No process should have to wait forever to enter critical region

Shared variable locks can be implemented to guard the critical region.

Methods for avoiding race conditions include:

Disabling interrupts
Strict Alternation
Test and set lock
Sleep and wakeup
Other: Semaphores, Monitors, Message passing

Their implementations are

Busy waiting
Blocking

Strict alternation

The threads have a global turn counter that they use to determine which one can access the critical region. Once a thread has accessed the critical region, it sets the counter to the other thread. The thread then continues until it needs the critical region again and waits until it's its turn again.

Test and set lock

Most CPUs have an instruction to atomically test and set. This can be used to check whether to enter the critical region.

Busy waiting

If a process wants to enter a critical region, it checks if entry is allowed and if not it executes a loop, checking if it is allowed. This can waste CPU and can starve low priority processes.

Blocking

Attempt to enter a critical region, if fails register interest and block. When the critical region is available, the OS will unblock and the process will continue.

Deadlock

Occurs when two threads are waiting for an event that only the other can cause. To deal with deadlocks we can:

Ignore the problem
Detect the lock and try to recover
Avoid by careful resource allocation
Prevention

Scheduling

Ideally the CPU would be used all the time. The problem is deciding which process should run and for how long. Scheduling depends on whether the process is CPU bound or IO bound. It also depends on whether the process is batch, interactive or real time. The scheduler should aim for:

Fairness

All processes get a fair share of CPU

Efficiency

Keep CPU at 100% utilisation

Throughput

Number of process/time

Turnaround time

Time from process start to completion

Response time

Time from request submitted to response produced A scheduling algorithm can be non-preemptive (process runs until it finishes or blocks) or preemptive (process can be suspended after some time). Context switching takes time, need to save process state. Various algorithms exist:

First-come first-served
Shortest process first
Shortest remaining time next
Round-robin
Priority
Multiple queues
Guaranteed
Lottery
Fair-share

Difficult to build a general scheduler.

Memory Management

Process memory has three segments:

Text (program code)
Data (constant data, global variables)
Stack (local variables)

Hierarchy

Memory needs to be:

Fast
Cheap
Large
Non-volatile

In reality there are various types of memory with different properties. There is a hierarchy of memory, with the largest capacity tending to be the slowest at the bottom (disk) and the fastest and smallest at the top (registers). In between there are caches and main memory.

Caches

Caches have fast random access and are managed by hardware. There can be many levels of caches, with the largest tending to be the slowest. Caches store frequently accessed data. A cache hit is when the wanted data is found in the cache, and a miss is when it isn't. There are problems in deciding what and when to move data in and out of cache.

Memory manager

Hardware responsible for

Keeping track of which parts of memory are free
Allocating memory
Freeing memory when no longer used
Protect memory against unauthorised access
Simulate appearance of larger memory by moving between memory and disk

Without a memory manager, programs can access each other's memory posing security problems.

Bitmaps can be used to keep track of what is in use and what is free. This is slow in looking for free space for new programs. Linked lists can be used instead, making free space faster to find.

Allocation has various algorithms to decide how to allocate:

First fit
Next fit
Best fit
Worst fit
Quick fit

Fragmentation

Fragmentation is when there non-consecutive memory is used. Data can be swapped between memory and disk in order to decrease memory usage. This can allow the program to load what it needs into memory at a given time, but adds extra complexity to the program.

Virtual Memory

Allows each program to have its own address space and think it has access to more memory than exists. Breaks up memory into chunks called pages. Virtual pages sit on physical pages and can be moved between physical pages (page frames). As a full page is rarely needed, paging wastes some space for each process, called internal fragmentation.

To map a virtual page to a frame, the offset needs to be found and the frame number needs to be found. This then gives the physical location. If a given page can't be found, a page fault exception is generated.

Translation Look-aside Buffer

Without optimisation, each memory reference would need two accesses to memory, one for page table and one for data. To avoid this, we can use a hardware unit to cache the table.

Page faults

The process can crash if it requests an invalid page. The OS can also suspend the process and create a new page, then rerun the instruction.

Page replacement algorithms

Optimal
Not recently used
First-in first-out (FIFO)
Second-chance
Clock
Least recently used (LRU)
Working set
WSClock

Cryptography

With the exception of the One Time Pad, there is not perfect cryptography algorithm. The aim is to make an attack take sufficiently long that the data is no longer useful. The two main types of cryptography are symmetric, where the same key is used for encryption and decryption, and asymmetric, where two keys are used, one to encrypt and one to decrypt.

Symmetric encryption

A modern example is AES (Advanced Encryption Standard). It is useful if you have some way to securely share the key with those you want to be able to decrypt. It is useful for keeping your own data secure, such as for full disk encryption. AES works by breaking data into blocks.

ECB - Electronic Code-book

Simple, parallelisable but doesn't provide diffusion and may not provide confidentiality. Shouldn't be used. Same plain-text encrypts to same cipher-text. Repeated patterns will be evident in output but at a lower fidelity.

Probabilistic encryption

Won the 2012 Turing award. Provides a secure notion of encryption by ensure the same message encrypted twice does not provide the same cipher-text.

CBC - Cipher Block Chaining

Uses a random, non-reusable initialisation vector. Each block is encrypted sequentially using part of the previous block's cipher-text. Decryption can be done in parallel. Loss or corruption of a block prevents decryption of all subsequent blocks.

Asymmetric cryptography

Commonly called public key cryptography. Consists of a public key and a private key kept by the user/owner. The public key can be received over the internet, making there no need for a secure key exchange. Asymmetric cryptography is slower than symmetric and had size limits. As a result it is often used with symmetric cryptography as a way of securely exchanging a joint secret key.

Public key signatures

Links a key holder to a file. Verify with a publicly available key to prove signer. Large documents have a hash signed instead of signing the whole file. The hash needs to appear random, be hard to invert and have high collision resistance. Effectively a hash is a special lossy compression function.

Certificates

Digitally signed documents proving ownership. In reality the signer is a Certificate Authority (CA) who is already trusted. CA's are often a third party. Certificates can be chained, A signs B who in tern signs C. This means that as A trusts B and B trusts C, A trusts C. The top level of the hierarchy are the trust anchors, who are explicitly trusted. The anchors have self-signed root certificates which they use to sign for others, such as some CA's. Trust anchors can also be explicitly coded (ssh, application specific) or build (PGP web of trust).

Types

Domain Validation
- Most common type
- Ties a certificate to a domain and checks requester has control over domain
- Validation via email/DNS/URL - possible weakness
Organisation Validation
- Ties a certificate to a domain and legal entity
Extended Validation
- Establishes legal entity, jurisdiction and presence of authorised officer
- Offline process, expensive

DV certificates do not establish a link between domain and real world entity.

Revocation

Revocation occurs when a certificate is mistakenly issued or a private key is compromised. Performed via Certificate Revocation Lists and Online Certificate Status Protocol (OCSP). When a root certificate is revoked, all certificates below them become untrusted.

Key sizes

Dependent on the degree of security required. Asymmetric is generally longer than symmetric. RSA - 2056 bits, AES - 128 bits. These keys used to be shorter but grow with computational power.

Randomness

Needed for key generation, IV generation, Padding. Pseudo-random generators are often used. Randomness must never be reused or discover-able It is difficult to generate good randomness as it must be unpredictable. The random generation varies by OS.

Secure communication

Aim to provide secure, private communication between two end points, with integrity checks to ensure data does not change in transit and authentication to establish identities of end points.

CBC Tampering

An attacker can reorder ciphertext or flip bits. Every possible ciphertext corresponds to some valid plaintext. Alters the sent message.

Message Authentication Code (MAC)

Used to identify if a message has been tampered with. Uses a key, k which can be used with the message to test the message. \[t\gets\text{MAC}(k,m);b\gets\text{Verify}(k,m,t)\] b is 0/1 indicating verification status. An adversary can not create m', t' such that the verification returns b=1 for an unknown message.

HMAC is the industry standard which is widely used in practice. \[t\gets\text{Hash}((k\oplus \text{opad}) || \text{Hash}((k\oplus \text{ipad}) || m))\] Here, ipad and opad are fixed constants used for padding.

Generally MAC is used on the ciphertext. This is done so that decryption does not need to be done if MAC fails.

Diffie-Hellman Key Exchange

Fundamental to protocols that rely on TLS, such as HTTPS and SSH. Provides perfect forward secrecy, exposure of keys does not compromise past sessions. Sends information so that both parties can calculate shared key without sending shared key.

Uses a public prime, \(p\), and a generator, \(g\). Parties pick a random value, \(x\), and compute \(X=g^x\mod p\). Parties then exchange their \(X\) values, \(X'\). The parties then calculate the secret \(s=X'^x\mod p = g^{xy}\mod p\). Both parties now have a shared secret without exchanging said secret. It is considered hard to solve the discrete log (find \(x\) after calculation). Provided both parties discard their secrets, even if one of them loses their private key the past communication cannot be decrypted.

TLS

Uses a handshake protocol to establish shared keys then uses a record protocol with the secret keys for data exchange. Requires unencrypted communication to establish common protocols and to share public keys.

Can create a local CA and use for man-in-the-middle attacks. Allows for the transparent interception of TLS communication. Can also be used by antivirus/web protection software. Same can be said of proxies and load balancers.

System security

Gain access to a system to:

Access health records/financial information/personal data
Create a bot
Fraud
Espionage
Show off

An attack can be passive (observing traffic and other activities) or active (changing content).

Access Control Lists

Put on files to restrict who can access them. Unix-like systems have three sets, owner, group, world. Rights determine read, write and execute control.

Specialised hardware

Can use things like:

Encrypted hard drive
Trusted Platform Module
Secure processors:
- Intel SGX
- ARM TrustZone

Encrypted hard drive

Transparently encrypts/decrypts content. The disk driver has an encryption engine. Can be combined with Trusted Platform Module (TPM) to allow decryption only with a certain machine. Data Encryption Key (DEK) is encrypted with Authentication key. Data is encrypted with DEK.

Trusted Platform Module

Cryptoprocessor which stores secret keys and performs some cryptographic operations. Can isolate process code and memory from the rest of the system. Can encrypt all access to hardware from a process.

Code signing

Can sign code with an encrypted hash of the program to ensure integrity.

Covert Channels

Can be used by a process to gather information about another isolated process. Can yield information about process runtime activities. Often things like file locks.

Stenography

Encodes data into another medium such that the original medium is nearly unrecognisably unchanged. Commonly done in images.

Side channels

Observations about running processes based on shared resources. Analysing things like memory, caches, page faults, timing, power analysis. Can gain information about encrypted data. Often due to inadvertent leakage.

Future of Computing Systems

Moving towards more portable, commoditised hardware. There is also the rise of online services. Historically there were large multi-user systems acting as servers, developments such as virtual machines and containers are enabling this workflow again. Software as a Service (SaaS), Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) provide varying levels of remote computing functionality. The moving of applications to a cloud shifts the responsibility for maintenance to the vendor.

Rented infrastructure is often virtualised, meaning the user is responsible for patches and updates. This has advantages of allowing easier hardware scaling and reduced hardware startup costs. Because security is dependent on the user, still need skilled sysadmins.

Platform as a Service provides a framework for development and deployment of applications. This removes the need to support the underlying frameworks and generally provides auto-scaling.

Function as a Service (Serverless) allows a level up from PaaS, allowing a developer to focus on their applications functionality, not its deployment.

All of this is changing the role of a sysadmin. The role is changing from configuration to installation. Will always be needed for hardware and IaaS, as their need will always exist.

Table of Contents

COMP30023 - Computer Systems

The Internet

Models

Layers

Connections

OSI

TCP/IP

HTTP

Overview

Connections

Persistence

Request methods

HTTP response codes

Headers

Processing

Client side

Server side

Static Page

Dynamic content

Web cache

Proxy

Cookies

Static content

Dynamic content

Scripting

Domain Name System - DNS

Records

Registering a new domain

Resolving a domain

Security

Email

Overview

Headers

SMTP

MIME

Content Types

Message transfer access

POP3

IMAP

Streaming

Layers

Presentation Layer

Session Layer

Transport Layer

Overview

UDP

TCP

Primitives

Overview

Features

Properties

Header

Three way handshake

Re-transmission

Closing

Sockets

Primitives

States - low level

Sockets in C

TCP Sliding Window

Git

TCP congestion control

Re-transmission

Lost packets

Window size

Congestion Control Window

Slow start algorithm

Further optimisations

Macroscopic changes

Window size

Rate, fairness

Address switching

Network Layer

Store and forward packet switching

Connection-oriented vs connection-less

Multi-Protocol Label Switching

IPv4

Types of addresses

Classes