$conf['savedir'] = '/app/www/public/data'; notes:comp30023

Table of Contents

COMP30023 - Computer Systems

The Internet

Aggregation of many smaller networks. No single point of control.

Models

Layers

Networks consist of a stack of layers, each offering services to those above it. Services provided to those above with primitives. Interactions between layers provided by protocols.

Connections

TCP is connection oriented.

UDP is connection-less

OSI

Layer Name Unit exchanged
7ApplicationAPDU
6PresentationPPDU
5SessionSPDU
4TransportTPDU
3NetworkPacket
2Data linkFrame
1PhysicalBit

TCP/IP

Layer Name
4Application
3Transport
2Internet
1Host-to-network

Designed to be independent of data link and physical layers.

HTTP

Hypertext Transfer Protocol. Application layer level. Referenced by URLs (URL \(\approx\) protocol + host + file location)

Overview

Client initiates TCP connection to server. Server accepts TCP connection from client. HTTP messages exchanged between client and server.

Connections

Persistence

Non-persistent connections need a new connection to be established for every file to be sent. Can be done in parallel. Persistent uses an established connection to receive data. Reduces connection setup overhead and response time. Can pipeline requests.

Request methods

Method Safe Idempotent Cacheable
GETYYY
HEADYYY
POSTNNY/N
PUTNYN
DELETENYN
CONNECTNNN
OPTIONSYYN
TRACEYYN
PATCHNNN

Idempotent - identical requests have same effect. Safe - only for retrieval, shouldn't change state.

HTTP response codes

Code Meaning Examples
1xxInformation100 - agree to handle request
2xxSuccess200 - success, 204 - no content
3xxRedirect301 - page moved, 304 - cache still valid
4xxClient Error403 - forbidden, 404 - page not found
5xxServer Error500 - internal server error, 503 - try again later

Headers

Can be sent in request and response. Carry information about the message.

Processing

Client side

Direct access to content

PDF viewer, word, etc.

Server side

Static Page

5 steps

  1. Accept TCP from client
  2. Identify file requested
  3. Get file from local storage
  4. Send file to client
  5. Release connection

Multi-threaded server can handle multiple requests in parallel

Dynamic content

Runs series of steps

  1. Resolve name of web page requested
  2. Perform access control on page
  3. Check cache
  4. Fetch page from disk or program
  5. Determine rest of response
  6. Return response to client
  7. Make entry in server log

Web cache

Access local (recently fetched) pages instead of sending remote requests. Used by sending conditional get requests. Can have a local proxy cache for many local clients acting as a client for the “real” servers.

Proxy

Can cache, help with security and share IP address. Browser sends requests to proxy and proxy checks cache and forwards request to server.

Cookies

Stores data on the client (<4Kb). 5 fields:

Used to keep state on the client.

Static content

HTML documents consisting of web page components. HTML is plain-text markup encoding content and presentation. Web page components are structural divisions, restricted tag sets, attributes and values (e.g. img)

Dynamic content

Generated on the fly for the client from applications running on server. Sent in HTML/CSS/etc. and XML, like static content.

Scripting

Server side - PHP. Client side - Interactive applications using JavaScript, Java, ActiveX, AJAX.

Domain Name System - DNS

Maps URL domain to IP address. IP addresses can be thought of numerical identifiers, ideally would uniquely identify a socket/jack but often not the case.

Records

Type Meaning Value
SOAStart of authorityParameters for zone
AIPV4 address of host32-bit integer
AAAAIPV6 address of host128-bit integer
MXMail exchangePriority, domain willing to accept email
NSName serverName server for domain
CNAMECanonical nameDomain name
PTRPointerAlias for IP address
SPFSender policy frameworkText encoding of mail sending policy
SRVServiceHost that provides it
TXTTextDescriptive ASCII text

Registering a new domain

Register name with DNS registrar. Provide names, IP addresses of authoritative name server (primary and secondary). Registrar inserts two RRs into TLD server. Create authoritative server with A record.

Resolving a domain

Asks a local DNS server for a domain. If not known, sends query to authoritative name server and then to root name server for answer. Queries are subject to timers to avoid long delays. A hosts files contains hard coded mappings of names to addresses.

127.0.0.1 localhost

Security

Original design had no security. Allowed DNS spoofing and flooding. DNSSEC and root signing were created as solutions to this.

Email

Overview

  1. Submission - Sender's mail agent sends message to message transfer agent
  2. Message Transfer - Message is sent with SMTP to a different message transfer agent where it sits in the mailbox
  3. Final Delivery - Receiver fetches message from mailbox

User agent organises messages and formats them such that they can be sent. Message consists of a header (from, to, subject, etc.) and body.

Headers

User available: To, Cc, Bcc, From, Date, Subject. Hidden: Message-Id, In-Reply-To, References, Reply-To, Sender, Return-Path, Keywords, Received.

SMTP

Simple Message Transfer Protocol. Uses port 25, with direct transfer. Transfer consists of a handshake (greeting), transfer of message and closure. Messages are 7 bit ASCII.

MIME

Multipurpose Internet Mail Extensions. Adds more features to mail. Adds additional message headers:

Content Types

Type Example Sub-types Description
textplain, HTML. XML, CSSText in various formats
imageGIF, JPEG, tiff, pngPicture
audiobasic, MPEG, mp4sounds
videoMPEG, mp4, quicktimemovies
modelvrml3d model
applicationoctet-stream, PDF, zipdata from application
messageHTTP, RFC822encapsulated message
multipartmixed, alternative, parallelcombination of types

Message transfer access

Transfer with SMTP. Delivery though local, POP3 (Post Office Protocol), IMAP (Internet Mail Access Protocol), HTTP (web-mail).

POP3

Transaction consists of authorisation, transaction, update. Syntax consists of USER/PASS, LIST, RETR/DELE, QUIT (update). Retrieving mail causes it to delete on the remote server.

IMAP

Keeps state across sessions. Retains mailbox on server and remotely accesses it. Requires larger server infrastructure.

Streaming

40% of web traffic. Utilising sockets over ports 80 and 443. Doesn't require client to keep requesting (unlike HTTP)

Layers

Presentation Layer

OSI layer 6. Provides:

Now done by applications. Don't apply evenly to all applications, so IETF groups them into application layer.

Session Layer

OSI Layer 5. Provides:

Examples:

Often used between protocols in layer 2/3

Transport Layer

Overview

Sends data to the right application and makes sure all the data arrived. Sits between the application and network layers. Can provide a reliable service on top of an unreliable network. Encapsulates message, message gets frame, packet and segments headers in addition to payload.

Terminology

Addresses using ports in a full 5-tuple.

Ports are 16 bit numbers (0-65535)

UDP

User Datagram Protocol. Connection-less communication, header followed by payload. Headers contain source and destination ports, payload handled by process attached to port. Uses a 32 bit header containing source and destination addresses, message length and a pseudo-checksum.

Strengths

Weaknesses

Good for simple client-server communication and real time services.

UDP can be good for Remote Procedure Calls when call is idempotent, TCP for non-idempotent calls. RPC allows calling of procedures on a remote machine as if they were on the local machine.

Real-Time Transport Protocol (RTP) is application layer, but uses UDP for transport. Also provides services to applications so is semi-transport layer also. Multiplexes several streams into a single UDP stream. Has a header that includes payload type, segment number and timestamp.

TCP

Transmission Control Protocol.

Primitives

Primitive Packet Sent Meaning
LISTEN(none)Block until something tries to connect
CONNECTCONNECTION REQActively attempt to establish a connection
SENDDATASend information
RECEIVE(none)Block until DATA packet arrives
DISCONNECTDISCONNECTION REQThis side wants to release the connection

Overview

Provides a connection oriented protocol over connection-less network layer. Providing a connection based protocol increases reliability. Breaks down a message into separate IP datagrams and reconstructs them at the other end. Sends using sockets named by the 5-tuple. Needs an explicit connection establishment by sending and receiving hosts.

Features

Data in both directions simultaneously

Exact pairs of senders and receivers

Message boundaries not preserved

TCP entity can choose to buffer prior to sending, unless PUSH or URGENT flags are set.

Properties

20-60 byte header, zero or more data bytes. TCP entity decides how large segments are subject to IP payload (< 65515 bytes) and Maximum Transfer Units (~1500 bytes). Uses a sliding window protocol for reliable data delivery and congestion control.

Name Description
Source portSending port
Destination portReceiving port
Sequence numberIf SYN=1: initial number, if SYN=0: accumulated number of first data byte in segment
Acknowledgement numberIf ACK=1: next number that sender of ACK is expecting
Data offsetSize of TCP header (20-60 bytes)
FlagsSingle bit flags (SYN, ACK, RST, FIN, etc)
Window sizeSize of receive window, how much sender is willing to receive
Three way handshake

Establishes a reliable connection, even if set up packets get lost and initial sequence for sliding window. Both sides agree with strategy before submitting segments.

  1. Synchronisation

SYN for synchronisation during establishment. SYN or FIN increments sequence number. Sequence number is offset by a random number, where only the offset matters. Acknowledgement number is the next byte the sender expects to receive. Bytes are received without gaps, needs to wait for segment, even if later segments are received.

Request is SYN=1, ACK=0 Reply is SYN=1, ACK=1. SYN used in both request and accept, ACK distinguishing between the two. After setup sequence number with the first byte of payload is sent with the acknowledgement number being the next byte sender expects to receive.

Re-transmission

Each segment has an associated re-transmission timer (RTO) Initialised with a default value and updated based on network performance. If timer expires before an ACK is received, next segment is sent.

If the receiver gets a segment with a higher than expected sequence number (segment lost), sends ACK with sequence number expecting. This is a duplicate of previously send ACK (DupACK). With 3 DupACK, sender resends lost segment, known as fast re-transmission.

Closing

FIN flag used to close connection. Once acknowledged, no further data can be sent from sender to receiver, but data can continue in the other direction. Will continue to re-transmit unacknowledged segments. Typically requires a FIN and ACK in each direction to close connection, although can be optimised down to 3 if receiver sends FIN with its ACK.

RST can be used to hard close a connection. States sender is closing and will not listen for further updates. Sent in reply to a packet to a 5-tuple without an open connection. FIN is preferred, as shows an orderly shutdown.

Sockets

Sockets provide an application an interface to the kernel for networking. Sockets addresses are denoted by the 5-tuple (protocol, source-IP, source port, destination-IP, destination port).

Primitives

State Description
SOCKETCreates a new communication endpoint
BINDAssociate a local address with a socket
LISTENAnnounce willingness to accept connections; give queue size
ACCEPTPassively establish an incoming connection
CONNECTActively attempt to establish a connection
SENDSend some data over a connection (write())
RECEIVEReceive some data from the connection (read())
CLOSERelease the connection

States - low level

Order State Description
1CLOSEDNo connection is active or pending
2LISTENThe server is waiting for an incoming call
3SYN RCVDA connection request has arrived; wait for ACK
3SYN SENTThe application as started to open a connection
4ESTABLISHEDThe normal data transfer state
5aFIN WAIT 1The application has said it is finished
6aFIN WAIT 2The other side has agreed to release
7TIME WAITWait for all packets to die off
6bCLOSINGBoth sides have tried to close simultaneously
5bCLOSE WAITThe other side has initiated a release
6cLAST ACKWait for all packets to die off

Sockets in C

//Headers
#include <arpa/inet.h>
#include <sys/socket.h>
#include <netinet/in.h>
 
//Variables
int listenfd = 0, connfd=0;
char sendBuff[1025];
struct sockaddr_in serv_addr;
 
//Creation
listenfd = socket(AF_INET, SOCK_STREAM, 0); //create socket
memset(&serv_addr, '0', sizeof(serv_addr)); //initialise server address
memset(sendBuff, '0', sizeof(sendBuff)); //initialise send buffer
 
serv_addr.sin_family = AF_INET; //type of address - internet IP
serv_addr.sin_addr.s_addr = htonl(INADDR_ANY); //listen on ANY IP Addr
serv_addr.sin_port = htons(5000); //listen on port 5000
 
//Bind and listen
bind(listenfd, (struct sockaddr*)&serv_addr, sizeof(serv_addr));
listen(listenfd, 10); //maximum number of client connections to queue
 
//Accept, send, close
connfd = accept(listenfd, (struct sockaddr*)NULL, NULL);
 
snprintf(sendBuff, sizeof(sendBuff), "Hello World!");
write(connfd, sendBuff, strlen(sendBuff));
 
close(connfd);
 
//Connect
connect(connfd, (struct sockaddr*)&serv_addr, sizeof(serv_addr));
 
//Receive
while ((n = read(connfd, recvBuff, sizeof(recvBuff) - 1)) > 0 )
{
     //process received buffer
}

TCP Sliding Window

Controlled by receiver. Determines amount of data the receiver is able to accept. Buffers are maintained separately to the application without guarantee that data is immediately sent or read from the respective buffers.

When the window is 0, the sender shouldn't send any data.

0 byte segment that causes the receiver to re-announce the next expected byte and window size. Designed to prevent deadlock. Sender may delay sending in order to fill more of the receive window.

What the sender is able to send

Amount of data the receiver is willing to receive - window size in ACK.

Git

Distributed version control software. Used to track and control changes to files over time. Commits are stored as nodes in a DAG. Has a working directory (top level folder with single version of files), staging area (files to be added in the next commit, stored in .git) and .git directory (configuration and repository database). Takes snapshots of file system as opposed to deltas, allows referencing of duplicate files to save space. Allows for easy and cheap branching. Can use delta compression for space saving. Uses a key-value database with SHA1 hashes as the key. git checkout can be used to switch to a particular commit/branch, needs an unambiguous hash prefix. The HEAD value denotes the currently checked out commit. Recommended workflow is to branch and merge in projects. git push, pull and clone are to interact with remote directories, receiving changes, sending changes and getting repositories respectively. Tags can be used to give names to important commits, like release versions.

TCP congestion control

Re-transmission

If a packet receives a packet after the one it was expecting, it sends an ACK for the one it was expecting. If it continues to receive later packets, it continues sending ACKs. If the original ACK and 3 duplicate ACK (dupACKs) are sent, a Fast Re-transmission is triggered and the original expected packet is sent.

Lost packets

To deal with lost packets, originally “Go-back-N” was used, this caused all the packets from the lost one onward to be re-transmitted. This has the advantage of the receiver not needing to store/reorder packets. The alternative is “Selective Repeat”, where only the lost packet is re-transmitted and the packets arrive out of order. Selective Repeat is more complex and helps where loss is common.

Window size

Once the window is full, a deadlock could occur where the window size may change but the server may not know. One solution is for the receiver to send an WindowUpdate, acknowledging the last packet and the new window size. Alternately, when the sender receives a window size 0, it can start a timer and periodically send ZeroWindowProbes to find the window size.

Congestion Control Window

Introduced with selective repeat. Additionally window dynamically adjusted based on network performance. It is maintained by the sender, unlike the sliding window controlled by the receiver.

Slow start algorithm

At first CWND sends 2*maximum segment size. If the segment is acknowledged, CWND++. Each full window doubles CWND, which continues until

This achieves exponential growth over each round trip time. Eventually too many segments will be placed on the network, causing congestion and timeout. If segment loss occurs, SSthresh is set to half the current window size and the process starts again. Once the threshold is reached, the growth is slowed to linear, adding 1 MSS to the window for each successful ACK. Now we start from the new SSthresh instead of the original starting value.

Further optimisations

SACK (Selective Acknowledgements) provides the ability to track segments in flight by adding later received packets in SACK. ECN (Explicit Congestion Notification) (ECE & CWR bits set during SYN)

Macroscopic changes

Window size

W increases once per window. When a loss occurs, W is halved. With probability p of a loss, the average increase in window size is \[\frac{1-p}{W}-p\frac{W}{2}\] To be in equilibrium, the average increase must be zero, hence \[W\approx\sqrt{2/p}\]

Rate, fairness

The window is sent ⇐ once per RTT, T. Rate is \(\frac{W}{T}\approx\frac{1}{T}\sqrt{2/p}\). For a given loss rate, longer RTT get less rate. If RTT is small, TCP forces the loss rate to be high.

Address switching

Network Layer

The Internet (network) layer is responsible for getting data from the source to the destination. Traffic must be routed effectively. There are 2 types of services on the network layer

Store and forward packet switching

This is like the internet.

Host H1 sending packet to H2

  1. Transmits to nearest router
  2. Packet is buffered while arriving and checksum verified
  3. If valid, packet is stored until the next outgoing interface is free
  4. Router forwards packet to next router in path
  5. Repeat

Connection-less routers store forwarding tables (also called routing tables) on how to get to other routers which are dynamically updated. Connection-oriented routers store in and out tables for sending all packets along the same route.

Connection-oriented vs connection-less

Issue Datagram Network Virtual Circuit
TypeConnection-lessConnection-oriented
Addressing(-) Each packet has full source and destination(+) Each packet contains a short VC number
State+ Routers do not hold state information about connections- Each VC requires router table space
- Router reboots a problem
RoutingEach packet independentDefined at set-up
Quality of Service- Difficult+ Easy if enough resources
Congestion control- Difficult+ Easy if enough resources

Multi-Protocol Label Switching

Primary purpose is Quality of Service.

Expensive, roughly 20-100 times more per Mbps.

IPv4

32 bit address. IP addresses are given to interfaces, not hosts. Supply of addresses exhausted.

Types of addresses

Classes

No longer used but spoken of

Class Bits Range
A01.0.0.0 to 127.255.255.255
B10128.0.0.0 to 191.255.255.255
C110192.0.0.0 to 223.255.255.255
D1110224.0.0.0 to 239.255.255.255
E1111240.0.0.0 to 255.255.255.255

Class D is for multicast. Class E is reserved for future use.

CIDR

Reduces waste from classes. Specifies number of bits for the network field. Network in top bits, host in bottom. Can be written as a subnet mask of 1's, e.g. /24 is 255.255.255.0. Network number = network mask (bit-wise AND) IP address. Allows for automatic aggregation, reducing the size of the routing table. Overlapping prefixes are eliminated by longest prefix first.

Special IP addresses

Private range Prefix and mask number of addresses
10.0.0.0 - 10.255.255.25510.0.0.0/8 (255.0.0.0)16,777,216
172.16.0.0 - 172.32.255.255172.16.0.0/12 (255.240.0.0)1,048,576
192.168.0.0 - 192.168.255.255192.168.0.0/16 (255.255.0.0)65,536
Address Use
0.0.0.0Placeholder
000HostHost on current network
255.255.255.255Broadcast local network
Network1111Broadcast distant network
127.*.*.*Loopback

IPv6

128 bit address. Simpler header, improved security, more QOS support.

Subnets

Using prefixes within an organisation to allocate IP addresses. Allows packets to be sent without the router knowing all the hosts.

Network Address Translation (NAT)

Hides multiple interfaces behind a single address. Changes a port number when leaving the NAT.

Steps

Criticisms

Still widely used. Can provide security by shielding from unsolicited incoming connections.

Fragmentation

IP packets have maximum size of 65,535, determined by 16 bit length header. Network links can't handle such sizes. Lower layers need to break up (fragment) packets.

Fragmentation can be due to

Hosts like larger packets, reduces workload. When a packet transits, maximum packet size may change. e.g. Ethernet is 1500 bytes and WIFI is 2304 bytes. The Maximum Transmission Unit (MTU) is the maximum size for a network. The path MTU is the maximum size for a path through the network, equal to the minimum MTU on each link. Due to the connection-less nature, can't set MTU at sender as path can change. Instead, routers can break packets to be sent individually.

Reassembly is difficult.

Reassembly is performed at the next router, routers unaware of fragmentation.

Reassembly is performed at the destination host. Uses DF (Don't Fragment) and MF (More Fragments) flags. MF=1 for all fragments but last. Fragments have an offset according to size, allowing for packets to be reconstructed out of order in a buffer.

Simple approach, but large reassembly cost and a single lost fragment causes entire packet retransmission. Alternative is path MTU discovery, where each packet is sent with DF=1 and router sends an ICMP (Internet Control Message Protocol) to sender if packet is too large. This may cause initial packets to be dropped but hosts can learn optimal size quickly and reduce subsequent fragmentation. TCP and IP are often implemented together to reduce fragmentation.

IPv4 allows for nontransparent fragmentation or path MTU discovery with a minimum of 576 bytes. IPv6 expects hosts to find optimal MTU without fragmentation with a minimum size of 1280 bytes. ICMP messages can be dropped, causing low volume passes and high volume fails, if in doubt send minimum size.

Routing

Forwarding

Each router has a forwarding table which maps addresses to outgoing interfaces. Upon receiving a packet:

This repeats on every router in the path to the destination.

Routing Algorithm

Runs on each router. A good algorithm needs

An algorithm depends on what it being optimised (packet delay or throughput). The simplest approach is to minimise the number of hops a packet has to make. This reduces packet bandwidth and improves delay, also hopefully reduces physical distance travelled. The actual algorithms assign a cost to each link.

Static routing doesn't adapt to network topology. The table is calculated offline and uploaded to the router on boot, resulting in it not responding to failure. This is reasonable where there is a clear or implicit choice. Adaptive routing dynamically changes to network topology and potentially traffic levels. It may optimise some property: distance, hops, transit time, etc. It may get information from adjacent routers.

Flooding

This is the simplest adaptive routing. It:

It is when a router sends a packet along all its links, which results in every router having received the packet at least once.

Adaptive Routing

Utilises the optimality principle in route planning (A subset of a shortest path must also be a shortest path). This can be used to reduce a graph to a tree, involving shortest path algorithms like Dijkstra's.

Replaced Distance Vector Routing (Bellman-Ford) that had problems converging quickly enough. Variants still used today. Relatively simple:

  1. Discover neighbours
  2. Set cost to neighbours
  3. Construct packet containing knowledge
  4. Send packet to and receive from all other routers
  5. Compute shortest path to each router

Costs can be manually or automatically set, commonly using bandwidth. Building packets is easy, deciding when to build presents problems. Flooding is used to send to all other routers.

Border Gateway Protocol (BGP)

Used to route between networks. Needs to consider politics, opposed to internal networking.

Routes may be better in some respects, worse in others. Difficult to optimise.

IP Multicasting

One to many communication, useful for streaming live content or updating. Uses D class addresses. Hosts join groups at the request of processes. Routing trees are constructed through multicast algorithms. These help prevent security risks through amplification attacks. Often deployed in organisational networks, not internet as not proven to scale to that extent.

Congestion control

Occurs when they're too many packets on the network. Causes performance to plummet. Ideally would reach saturation and remain there. Different to flow control, which is between hosts. Both have the same solution, which is to slow sending rate. In order from slowest to fastest, the following are solutions to congestion

Add more physical capacity, slow and expensive but will always work.

Temporarily reroute traffic around congested areas, however eventually all routes become saturated.

Used on virtual circuits to control who can place traffic onto the route

Reduce sending rate

Effective for congestion, bad for network utility.

Explicit congestion control

Use of ECN flags in IP header.

If the receiver gets a packet marked as experiencing congestion, it echoes to the sender with the ECE bit set. The sender then reduces its transmission rate and sets the CWR (Congestion Window Reduced) bit in acknowledgement. On receipt of CWR set, receiver stops sending ECE bit. This is closely linked with TCP, blurring the lines between the two.

Internet control protocols

These use the internet to manage functionality.

ICMP

Used in PING, but can do more.

Message type Description
Destination unreachablePacket could not be delivered
Time exceededTime to live field hit 0
Parameter problemInvalid header field
Source quenchChoke packet
RedirectTeach router about geography
Echo and echo replyCheck if a machine is alive
Timestamp request/replySame as echo, but with timestamp
Router advertisement/solicitationFind a nearby router

Traceroute is an example that exploits the TTL to check the route being used to send a packet. Each packet is sent with an incremented TTL, meaning they hit 0 on successive routers, revealing their IP address.

DHCP

Each host must have a unique IP address (excluding NAT). Manually configuring each host is difficult to do and slow to respond. DHCP automatically allocates IP addresses. A network needs a DHCP server for issuing the addresses. A host sends a DHCP DISCOVER packet which the server receives and responds with a DHCP OFFER containing an IP address. IP addresses contain a lease, after which they will be reclaimed by the server and reissued. Hosts can request renewal before the lease expires. Alternately IP can be tied to a layer 2 address.

MAC (layer-2) address

Layer 2 addresses incorrectly called MAC (Medium Access Control) due to old “shared medium” layer 2 networks. An address is a globally unique identifier usually hard-coded by the manufacturer. They can be between 48 and 64 bits long. Used at the host-to-network/data link layer, sometimes called a physical address.

ARP

Translates addresses between the internet layer and physical network layer (IP to MAC). Works by broadcasting a packet asking who owns an IP address and the owner responds with its MAC address. It is a simple, inefficient protocol. It has many security issues:

Layer 2

Modern Ethernet like IP and old Ethernet like WiFi. Old Ethernet is a shared medium which just broadcasts packets and packets can collide if broadcast at the same time. The packet format/services are the same so software doesn't need to change.

Process management

Operating system

Provides an abstraction of hardware resources and manages these resources. Provides the concept of:

Processes

A process is a program in execution. Programs are static, processes are dynamic. The state of a program consists of the code or “text” of the program, the values of the variables, the address of the current instruction and some other external data (e.g. current directory). Each process has its own address space.

Most process a user deals with are created by the user, however some are provided by the OS. An example of a system process is a print daemon.

Multi-programming

Conceptually each process has its own CPU. In reality each process runs for a short period of time and take turns. Multi-programming increases system efficiency, if a process is waiting for something another can run in the meantime.

Process creation

Four things need to happen for a process to be created:

  1. System initialisation
  2. Execution of a process creation system call by a running process
  3. A user request to create a new process
  4. Initiation of a batch job

Process termination

Typical conditions which terminate a process:

  1. Normal exit (voluntary)
  2. Error exit (voluntary)
  3. Fatal error (involuntary)
  4. Killed by another process (involuntary)

Process states

The lowest layer of a process-structured operating system handles interrupts and scheduling. Processes can occupy the following states:

  1. Running (using the CPU at that instant)
  2. Ready (runnable, temporarily stopped while another process is running)
  3. Blocked (unable to run until some external event happens)

The kernel

Responsible for managing processes. Runs as the privileged part of the operating system. Also provides services such as IO. Not a process in itself.

Most programs run in an unprivileged mode, and can not issue privileged instructions or access certain parts of memory. The kernel can issue all instructions and access all memory. Instructions and memory locations which could interfere with other processes are privileged. This distinction is the foundation needed by the kernel for its security mechanisms. The privilege level is stored in hardware, given by the program status word (PSW) register.

System calls

System calls allow user processes to ask the kernel to execute privileged instructions and access privileged memory on their behalf. The OS checks these requests before execution, preserving security. To make these more convenient, system calls will typically do several privileged things in one logical operation.

Interrupts

Used by a hardware device to get attention from the CPU. Asynchronous with current process. When an interrupt occurs, CPU hardware saves process' current state in a privileged location until the interrupt is handled. The CPU handles the interrupt in kernel mode.

The interrupt vector is the address of the interrupt handler. The handler

There may also be pseudo-interrupts which come from the CPU, rather than external hardware. User programs may generate pseudo-interrupts, such as division by zero. Pseudo-interrupts can also be intentionally generated by certain instructions, e.g. TRAP.

Threads

Threads are a sequential execution stream within a process. They can be considered a lightweight process. Threads can communicate with each other without the kernel, they share global variables (address space) and dynamic memory. Threads have their own program counter, registers, stack and state.

A process can have many threads working independently. This is advantageous as they have less overhead compared to processes. The problems involved with threading involve concurrency. As such things like race conditions and deadlocks arise with threads.

Pthreads

POSIX standard API for threads. Global variables are shared across threads. Threads can switch at any time and need to be synchronised.

User-space threads

Threads can be implemented in user or kernel spaces. This means a process can manage its threads or the kernel can. User-space has less overhead but can be blocked by blocking system calls.

Race conditions

When multiple threads access the same object at the same time. Output depends on order of execution.

An idea to eliminate this is to create a critical region where only one thread can access at a time. Requirements for a critical region:

  1. No two processes may simultaneously be in their critical regions
  2. No assumptions on speed or number of CPUs
  3. No process running outside critical region should block other processes
  4. No process should have to wait forever to enter critical region

Shared variable locks can be implemented to guard the critical region.

Methods for avoiding race conditions include:

Their implementations are

Strict alternation

The threads have a global turn counter that they use to determine which one can access the critical region. Once a thread has accessed the critical region, it sets the counter to the other thread. The thread then continues until it needs the critical region again and waits until it's its turn again.

Test and set lock

Most CPUs have an instruction to atomically test and set. This can be used to check whether to enter the critical region.

Busy waiting

If a process wants to enter a critical region, it checks if entry is allowed and if not it executes a loop, checking if it is allowed. This can waste CPU and can starve low priority processes.

Blocking

Attempt to enter a critical region, if fails register interest and block. When the critical region is available, the OS will unblock and the process will continue.

Deadlock

Occurs when two threads are waiting for an event that only the other can cause. To deal with deadlocks we can:

  1. Ignore the problem
  2. Detect the lock and try to recover
  3. Avoid by careful resource allocation
  4. Prevention

Scheduling

Ideally the CPU would be used all the time. The problem is deciding which process should run and for how long. Scheduling depends on whether the process is CPU bound or IO bound. It also depends on whether the process is batch, interactive or real time. The scheduler should aim for:

All processes get a fair share of CPU

Keep CPU at 100% utilisation

Number of process/time

Time from process start to completion

Time from request submitted to response produced A scheduling algorithm can be non-preemptive (process runs until it finishes or blocks) or preemptive (process can be suspended after some time). Context switching takes time, need to save process state. Various algorithms exist:

Difficult to build a general scheduler.

Memory Management

Process memory has three segments:

  1. Text (program code)
  2. Data (constant data, global variables)
  3. Stack (local variables)

Hierarchy

Memory needs to be:

In reality there are various types of memory with different properties. There is a hierarchy of memory, with the largest capacity tending to be the slowest at the bottom (disk) and the fastest and smallest at the top (registers). In between there are caches and main memory.

Caches

Caches have fast random access and are managed by hardware. There can be many levels of caches, with the largest tending to be the slowest. Caches store frequently accessed data. A cache hit is when the wanted data is found in the cache, and a miss is when it isn't. There are problems in deciding what and when to move data in and out of cache.

Memory manager

Hardware responsible for

Without a memory manager, programs can access each other's memory posing security problems.

Bitmaps can be used to keep track of what is in use and what is free. This is slow in looking for free space for new programs. Linked lists can be used instead, making free space faster to find.

Allocation has various algorithms to decide how to allocate:

Fragmentation

Fragmentation is when there non-consecutive memory is used. Data can be swapped between memory and disk in order to decrease memory usage. This can allow the program to load what it needs into memory at a given time, but adds extra complexity to the program.

Virtual Memory

Allows each program to have its own address space and think it has access to more memory than exists. Breaks up memory into chunks called pages. Virtual pages sit on physical pages and can be moved between physical pages (page frames). As a full page is rarely needed, paging wastes some space for each process, called internal fragmentation.

To map a virtual page to a frame, the offset needs to be found and the frame number needs to be found. This then gives the physical location. If a given page can't be found, a page fault exception is generated.

Translation Look-aside Buffer

Without optimisation, each memory reference would need two accesses to memory, one for page table and one for data. To avoid this, we can use a hardware unit to cache the table.

Page faults

The process can crash if it requests an invalid page. The OS can also suspend the process and create a new page, then rerun the instruction.

Page replacement algorithms

Cryptography

With the exception of the One Time Pad, there is not perfect cryptography algorithm. The aim is to make an attack take sufficiently long that the data is no longer useful. The two main types of cryptography are symmetric, where the same key is used for encryption and decryption, and asymmetric, where two keys are used, one to encrypt and one to decrypt.

Symmetric encryption

A modern example is AES (Advanced Encryption Standard). It is useful if you have some way to securely share the key with those you want to be able to decrypt. It is useful for keeping your own data secure, such as for full disk encryption. AES works by breaking data into blocks.

ECB - Electronic Code-book

Simple, parallelisable but doesn't provide diffusion and may not provide confidentiality. Shouldn't be used. Same plain-text encrypts to same cipher-text. Repeated patterns will be evident in output but at a lower fidelity.

Probabilistic encryption

Won the 2012 Turing award. Provides a secure notion of encryption by ensure the same message encrypted twice does not provide the same cipher-text.

CBC - Cipher Block Chaining

Uses a random, non-reusable initialisation vector. Each block is encrypted sequentially using part of the previous block's cipher-text. Decryption can be done in parallel. Loss or corruption of a block prevents decryption of all subsequent blocks.

Asymmetric cryptography

Commonly called public key cryptography. Consists of a public key and a private key kept by the user/owner. The public key can be received over the internet, making there no need for a secure key exchange. Asymmetric cryptography is slower than symmetric and had size limits. As a result it is often used with symmetric cryptography as a way of securely exchanging a joint secret key.

Public key signatures

Links a key holder to a file. Verify with a publicly available key to prove signer. Large documents have a hash signed instead of signing the whole file. The hash needs to appear random, be hard to invert and have high collision resistance. Effectively a hash is a special lossy compression function.

Certificates

Digitally signed documents proving ownership. In reality the signer is a Certificate Authority (CA) who is already trusted. CA's are often a third party. Certificates can be chained, A signs B who in tern signs C. This means that as A trusts B and B trusts C, A trusts C. The top level of the hierarchy are the trust anchors, who are explicitly trusted. The anchors have self-signed root certificates which they use to sign for others, such as some CA's. Trust anchors can also be explicitly coded (ssh, application specific) or build (PGP web of trust).

Types

DV certificates do not establish a link between domain and real world entity.

Revocation

Revocation occurs when a certificate is mistakenly issued or a private key is compromised. Performed via Certificate Revocation Lists and Online Certificate Status Protocol (OCSP). When a root certificate is revoked, all certificates below them become untrusted.

Key sizes

Dependent on the degree of security required. Asymmetric is generally longer than symmetric. RSA - 2056 bits, AES - 128 bits. These keys used to be shorter but grow with computational power.

Randomness

Needed for key generation, IV generation, Padding. Pseudo-random generators are often used. Randomness must never be reused or discover-able It is difficult to generate good randomness as it must be unpredictable. The random generation varies by OS.

Secure communication

Aim to provide secure, private communication between two end points, with integrity checks to ensure data does not change in transit and authentication to establish identities of end points.

CBC Tampering

An attacker can reorder ciphertext or flip bits. Every possible ciphertext corresponds to some valid plaintext. Alters the sent message.

Message Authentication Code (MAC)

Used to identify if a message has been tampered with. Uses a key, k which can be used with the message to test the message. \[t\gets\text{MAC}(k,m);b\gets\text{Verify}(k,m,t)\] b is 0/1 indicating verification status. An adversary can not create m', t' such that the verification returns b=1 for an unknown message.

HMAC is the industry standard which is widely used in practice. \[t\gets\text{Hash}((k\oplus \text{opad}) || \text{Hash}((k\oplus \text{ipad}) || m))\] Here, ipad and opad are fixed constants used for padding.

Generally MAC is used on the ciphertext. This is done so that decryption does not need to be done if MAC fails.

Diffie-Hellman Key Exchange

Fundamental to protocols that rely on TLS, such as HTTPS and SSH. Provides perfect forward secrecy, exposure of keys does not compromise past sessions. Sends information so that both parties can calculate shared key without sending shared key.

Uses a public prime, \(p\), and a generator, \(g\). Parties pick a random value, \(x\), and compute \(X=g^x\mod p\). Parties then exchange their \(X\) values, \(X'\). The parties then calculate the secret \(s=X'^x\mod p = g^{xy}\mod p\). Both parties now have a shared secret without exchanging said secret. It is considered hard to solve the discrete log (find \(x\) after calculation). Provided both parties discard their secrets, even if one of them loses their private key the past communication cannot be decrypted.

TLS

Uses a handshake protocol to establish shared keys then uses a record protocol with the secret keys for data exchange. Requires unencrypted communication to establish common protocols and to share public keys.

Can create a local CA and use for man-in-the-middle attacks. Allows for the transparent interception of TLS communication. Can also be used by antivirus/web protection software. Same can be said of proxies and load balancers.

System security

Gain access to a system to:

An attack can be passive (observing traffic and other activities) or active (changing content).

Access Control Lists

Put on files to restrict who can access them. Unix-like systems have three sets, owner, group, world. Rights determine read, write and execute control.

Specialised hardware

Can use things like:

Encrypted hard drive

Transparently encrypts/decrypts content. The disk driver has an encryption engine. Can be combined with Trusted Platform Module (TPM) to allow decryption only with a certain machine. Data Encryption Key (DEK) is encrypted with Authentication key. Data is encrypted with DEK.

Trusted Platform Module

Cryptoprocessor which stores secret keys and performs some cryptographic operations. Can isolate process code and memory from the rest of the system. Can encrypt all access to hardware from a process.

Code signing

Can sign code with an encrypted hash of the program to ensure integrity.

Covert Channels

Can be used by a process to gather information about another isolated process. Can yield information about process runtime activities. Often things like file locks.

Stenography

Encodes data into another medium such that the original medium is nearly unrecognisably unchanged. Commonly done in images.

Side channels

Observations about running processes based on shared resources. Analysing things like memory, caches, page faults, timing, power analysis. Can gain information about encrypted data. Often due to inadvertent leakage.

Future of Computing Systems

Moving towards more portable, commoditised hardware. There is also the rise of online services. Historically there were large multi-user systems acting as servers, developments such as virtual machines and containers are enabling this workflow again. Software as a Service (SaaS), Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) provide varying levels of remote computing functionality. The moving of applications to a cloud shifts the responsibility for maintenance to the vendor.

Rented infrastructure is often virtualised, meaning the user is responsible for patches and updates. This has advantages of allowing easier hardware scaling and reduced hardware startup costs. Because security is dependent on the user, still need skilled sysadmins.

Platform as a Service provides a framework for development and deployment of applications. This removes the need to support the underlying frameworks and generally provides auto-scaling.

Function as a Service (Serverless) allows a level up from PaaS, allowing a developer to focus on their applications functionality, not its deployment.

All of this is changing the role of a sysadmin. The role is changing from configuration to installation. Will always be needed for hardware and IaaS, as their need will always exist.