$conf['savedir'] = '/app/www/public/data'; ====== COMP30023 - Computer Systems ====== {{tag>notes unimelb computer_science}} ====== The Internet ====== Aggregation of many smaller networks. No single point of control. ===== Models ===== * OSI * Standardised pre-implementation * Planned * ISO group * TCP/IP * Standardised post-implementation * More spontaneous * Independent people working * Designed without many considerations * Security back-fitted ==== Layers ==== Networks consist of a stack of layers, each offering services to those above it. Services provided to those above with primitives. Interactions between layers provided by protocols. ==== Connections ==== TCP is connection oriented. * Connect, use disconnect * Similar to telephone connection UDP is connection-less * Send message through intermediate nodes * Similar to text or mail ==== OSI ==== ^ Layer^ Name^ Unit exchanged^ |7|Application|APDU| |6|Presentation|PPDU| |5|Session|SPDU| |4|Transport|TPDU| |3|Network|Packet| |2|Data link|Frame| |1|Physical|Bit| ==== TCP/IP ==== ^ Layer^ Name^ |4|Application| |3|Transport| |2|Internet| |1|Host-to-network| Designed to be independent of data link and physical layers. ===== HTTP ===== Hypertext Transfer Protocol. Application layer level. Referenced by URLs (URL \(\approx\) protocol + host + file location) ==== Overview ==== Client initiates TCP connection to server. Server accepts TCP connection from client. HTTP messages exchanged between client and server. === Connections === * HTTP 1.0 - Single use connection * HTTP 1.1 - Persistent connection, additional headers * HTTP 2.0 - Speed improvements ==== Persistence ==== Non-persistent connections need a new connection to be established for every file to be sent. Can be done in parallel. Persistent uses an established connection to receive data. Reduces connection setup overhead and response time. Can pipeline requests. ==== Request methods ==== ^ Method^ Safe^ Idempotent^ Cacheable^ |GET|Y|Y|Y| |HEAD|Y|Y|Y| |POST|N|N|Y/N| |PUT|N|Y|N| |DELETE|N|Y|N| |CONNECT|N|N|N| |OPTIONS|Y|Y|N| |TRACE|Y|Y|N| |PATCH|N|N|N| Idempotent - identical requests have same effect. Safe - only for retrieval, shouldn't change state. ==== HTTP response codes ==== ^ Code^ Meaning^ Examples^ |1xx|Information|100 - agree to handle request| |2xx|Success|200 - success, 204 - no content| |3xx|Redirect|301 - page moved, 304 - cache still valid| |4xx|Client Error|403 - forbidden, 404 - page not found| |5xx|Server Error|500 - internal server error, 503 - try again later| ==== Headers ==== Can be sent in request and response. Carry information about the message. ==== Processing ==== === Client side === * Plugins/extensions - integrated into browser Direct access to content * Helper - separate program instantiated by browser, access local cache PDF viewer, word, etc. === Server side === == Static Page == 5 steps - Accept TCP from client - Identify file requested - Get file from local storage - Send file to client - Release connection Multi-threaded server can handle multiple requests in parallel == Dynamic content == Runs series of steps - Resolve name of web page requested - Perform access control on page - Check cache - Fetch page from disk **or program** - Determine rest of response - Return response to client - Make entry in server log ==== Web cache ==== Access local (recently fetched) pages instead of sending remote requests. Used by sending conditional get requests. Can have a local proxy cache for many local clients acting as a client for the "real" servers. === Proxy === Can cache, help with security and share IP address. Browser sends requests to proxy and proxy checks cache and forwards request to server. ==== Cookies ==== Stores data on the client (<4Kb). 5 fields: * Domain * Path * Content * Expiry * Security Used to keep state on the client. ==== Static content ==== HTML documents consisting of web page components. HTML is plain-text markup encoding content and presentation. Web page components are structural divisions, restricted tag sets, attributes and values (e.g. img) ==== Dynamic content ==== Generated on the fly for the client from applications running on server. Sent in HTML/CSS/etc. and XML, like static content. ==== Scripting ==== Server side - PHP. Client side - Interactive applications using JavaScript, Java, ActiveX, AJAX. ===== Domain Name System - DNS ===== Maps URL domain to IP address. IP addresses can be thought of numerical identifiers, ideally would uniquely identify a socket/jack but often not the case. ==== Records ==== ^ Type^ Meaning^ Value^ |SOA|Start of authority|Parameters for zone| |A|IPV4 address of host|32-bit integer| |AAAA|IPV6 address of host|128-bit integer| |MX|Mail exchange|Priority, domain willing to accept email| |NS|Name server|Name server for domain| |CNAME|Canonical name|Domain name| |PTR|Pointer|Alias for IP address| |SPF|Sender policy framework|Text encoding of mail sending policy| |SRV|Service|Host that provides it| |TXT|Text|Descriptive ASCII text| ==== Registering a new domain ==== Register name with DNS registrar. Provide names, IP addresses of authoritative name server (primary and secondary). Registrar inserts two RRs into TLD server. Create authoritative server with A record. ==== Resolving a domain ==== Asks a local DNS server for a domain. If not known, sends query to authoritative name server and then to root name server for answer. Queries are subject to timers to avoid long delays. A hosts files contains hard coded mappings of names to addresses. 127.0.0.1 localhost ==== Security ==== Original design had no security. Allowed DNS spoofing and flooding. DNSSEC and root signing were created as solutions to this. ===== Email ===== ==== Overview ==== - Submission - Sender's mail agent sends message to message transfer agent - Message Transfer - Message is sent with SMTP to a different message transfer agent where it sits in the mailbox - Final Delivery - Receiver fetches message from mailbox User agent organises messages and formats them such that they can be sent. Message consists of a header (from, to, subject, etc.) and body. === Headers === User available: To, Cc, Bcc, From, Date, Subject. Hidden: Message-Id, In-Reply-To, References, Reply-To, Sender, Return-Path, Keywords, Received. ==== SMTP ==== Simple Message Transfer Protocol. Uses port 25, with direct transfer. Transfer consists of a handshake (greeting), transfer of message and closure. Messages are 7 bit ASCII. ==== MIME ==== Multipurpose Internet Mail Extensions. Adds more features to mail. Adds additional message headers: * MIME-Version * Content-Description * Content-Id * Content-Transfer-Encoding * Content-Type === Content Types === ^ Type^ Example Sub-types^ Description^ |text|plain, HTML. XML, CSS|Text in various formats| |image|GIF, JPEG, tiff, png|Picture| |audio|basic, MPEG, mp4|sounds| |video|MPEG, mp4, quicktime|movies| |model|vrml|3d model| |application|octet-stream, PDF, zip|data from application| |message|HTTP, RFC822|encapsulated message| |multipart|mixed, alternative, parallel|combination of types| === Message transfer access === Transfer with SMTP. Delivery though local, POP3 (Post Office Protocol), IMAP (Internet Mail Access Protocol), HTTP (web-mail). == POP3 == Transaction consists of authorisation, transaction, update. Syntax consists of USER/PASS, LIST, RETR/DELE, QUIT (update). Retrieving mail causes it to delete on the remote server. == IMAP == Keeps state across sessions. Retains mailbox on server and remotely accesses it. Requires larger server infrastructure. ===== Streaming ===== 40% of web traffic. Utilising sockets over ports 80 and 443. Doesn't require client to keep requesting (unlike HTTP) ====== Layers ====== ===== Presentation Layer ===== OSI layer 6. Provides: * Encryption * Compression * Data conversion (e.g. CR/LF to LF) * Mapping between character sets (ASCII to UTF8) Now done by applications. Don't apply evenly to all applications, so IETF groups them into application layer. ===== Session Layer ===== OSI Layer 5. Provides: * Authentication * Authorisation * Session restoration Examples: * Remote procedure call * Point-to-point tunnelling protocol (PPTP) * Password(/Extensible) Authentication Protocol (PAP/EAP) Often used between protocols in layer 2/3 ===== Transport Layer ===== ==== Overview ==== Sends data to the right application and makes sure all the data arrived. Sits between the application and network layers. Can provide a reliable service on top of an unreliable network. Encapsulates message, message gets frame, packet and segments headers in addition to payload. Terminology * Segments - transport layer * Packets - network layer * Frames - data link layer Addresses using ports in a full 5-tuple. * Source IP Address * Source Port * Destination IP Address * Destination Port * Protocol Ports are 16 bit numbers (0-65535) ==== UDP ==== User Datagram Protocol. Connection-less communication, header followed by payload. Headers contain source and destination ports, payload handled by process attached to port. Uses a 32 bit header containing source and destination addresses, message length and a pseudo-checksum. Strengths * Multiplexing/de-multiplexing * No delay waiting to recover lost packets Weaknesses * No control flow, error control, re-transmission of bad segments Good for simple client-server communication and real time services. UDP can be good for Remote Procedure Calls when call is idempotent, TCP for non-idempotent calls. RPC allows calling of procedures on a remote machine as if they were on the local machine. Real-Time Transport Protocol (RTP) is application layer, but uses UDP for transport. Also provides services to applications so is semi-transport layer also. Multiplexes several streams into a single UDP stream. Has a header that includes payload type, segment number and **timestamp**. ==== TCP ==== Transmission Control Protocol. === Primitives === ^ Primitive^ Packet Sent^ Meaning^ |LISTEN|(none)|Block until something tries to connect| |CONNECT|CONNECTION REQ|Actively attempt to establish a connection| |SEND|DATA|Send information| |RECEIVE|(none)|Block until DATA packet arrives| |DISCONNECT|DISCONNECTION REQ|This side wants to release the connection| === Overview === Provides a connection oriented protocol over connection-less network layer. Providing a connection based protocol increases reliability. Breaks down a message into separate IP datagrams and reconstructs them at the other end. Sends using sockets named by the 5-tuple. Needs an explicit connection establishment by sending and receiving hosts. == Features == * Full duplex Data in both directions simultaneously * End-to-end Exact pairs of senders and receivers * Byte streams, not message streams Message boundaries not preserved * Buffer capable TCP entity can choose to buffer prior to sending, unless PUSH or URGENT flags are set. == Properties == 20-60 byte header, zero or more data bytes. TCP entity decides how large segments are subject to IP payload (< 65515 bytes) and Maximum Transfer Units (~1500 bytes). Uses a sliding window protocol for reliable data delivery and congestion control. == Header == ^ Name^ Description^ |Source port|Sending port| |Destination port|Receiving port| |Sequence number|If SYN=1: initial number, if SYN=0: accumulated number of first data byte in segment| |Acknowledgement number|If ACK=1: next number that sender of ACK is expecting| |Data offset|Size of TCP header (20-60 bytes)| |Flags|Single bit flags (SYN, ACK, RST, FIN, etc)| |Window size|Size of receive window, how much sender is willing to receive| == Three way handshake == Establishes a reliable connection, even if set up packets get lost and initial sequence for sliding window. Both sides agree with strategy before submitting segments. -Synchronisation SYN for synchronisation during establishment. SYN or FIN increments sequence number. Sequence number is offset by a random number, where only the offset matters. Acknowledgement number is the next byte the sender expects to receive. Bytes are received without gaps, needs to wait for segment, even if later segments are received. Request is SYN=1, ACK=0 Reply is SYN=1, ACK=1. SYN used in both request and accept, ACK distinguishing between the two. After setup sequence number with the first byte of payload is sent with the acknowledgement number being the next byte sender expects to receive. == Re-transmission == Each segment has an associated re-transmission timer (RTO) Initialised with a default value and updated based on network performance. If timer expires before an ACK is received, next segment is sent. If the receiver gets a segment with a higher than expected sequence number (segment lost), sends ACK with sequence number expecting. This is a duplicate of previously send ACK (DupACK). With 3 DupACK, sender resends lost segment, known as //fast re-transmission//. == Closing == FIN flag used to close connection. Once acknowledged, no further data can be sent from sender to receiver, but data can continue in the other direction. Will continue to re-transmit unacknowledged segments. Typically requires a FIN and ACK in each direction to close connection, although can be optimised down to 3 if receiver sends FIN with its ACK. RST can be used to hard close a connection. States sender is closing and will not listen for further updates. Sent in reply to a packet to a 5-tuple without an open connection. FIN is preferred, as shows an orderly shutdown. ====== Sockets ====== Sockets provide an application an interface to the kernel for networking. Sockets addresses are denoted by the 5-tuple (protocol, source-IP, source port, destination-IP, destination port). ===== Primitives ===== ^ State^ Description^ |SOCKET|Creates a new communication endpoint| |BIND|Associate a local address with a socket| |LISTEN|Announce willingness to accept connections; give queue size| |ACCEPT|Passively establish an incoming connection| |CONNECT|Actively attempt to establish a connection| |SEND|Send some data over a connection (write())| |RECEIVE|Receive some data from the connection (read())| |CLOSE|Release the connection| ===== States - low level ===== ^ Order^ State^ Description^ |1|CLOSED|No connection is active or pending| |2|LISTEN|The server is waiting for an incoming call| |3|SYN RCVD|A connection request has arrived; wait for ACK| |3|SYN SENT|The application as started to open a connection| |4|ESTABLISHED|The normal data transfer state| |5a|FIN WAIT 1|The application has said it is finished| |6a|FIN WAIT 2|The other side has agreed to release| |7|TIME WAIT|Wait for all packets to die off| |6b|CLOSING|Both sides have tried to close simultaneously| |5b|CLOSE WAIT|The other side has initiated a release| |6c|LAST ACK|Wait for all packets to die off| ===== Sockets in C ===== //Headers #include #include #include //Variables int listenfd = 0, connfd=0; char sendBuff[1025]; struct sockaddr_in serv_addr; //Creation listenfd = socket(AF_INET, SOCK_STREAM, 0); //create socket memset(&serv_addr, '0', sizeof(serv_addr)); //initialise server address memset(sendBuff, '0', sizeof(sendBuff)); //initialise send buffer serv_addr.sin_family = AF_INET; //type of address - internet IP serv_addr.sin_addr.s_addr = htonl(INADDR_ANY); //listen on ANY IP Addr serv_addr.sin_port = htons(5000); //listen on port 5000 //Bind and listen bind(listenfd, (struct sockaddr*)&serv_addr, sizeof(serv_addr)); listen(listenfd, 10); //maximum number of client connections to queue //Accept, send, close connfd = accept(listenfd, (struct sockaddr*)NULL, NULL); snprintf(sendBuff, sizeof(sendBuff), "Hello World!"); write(connfd, sendBuff, strlen(sendBuff)); close(connfd); //Connect connect(connfd, (struct sockaddr*)&serv_addr, sizeof(serv_addr)); //Receive while ((n = read(connfd, recvBuff, sizeof(recvBuff) - 1)) > 0 ) { //process received buffer } ===== TCP Sliding Window ===== Controlled by receiver. Determines amount of data the receiver is able to accept. Buffers are maintained separately to the application without guarantee that data is immediately sent or read from the respective buffers. When the window is 0, the sender shouldn't send any data. * Can send URGENT data * Can send "zero window probe" 0 byte segment that causes the receiver to re-announce the next expected byte and window size. Designed to prevent deadlock. Sender may delay sending in order to fill more of the receive window. * Send window What the sender is able to send * Receive window Amount of data the receiver is willing to receive - window size in ACK. ====== Git ====== Distributed version control software. Used to track and control changes to files over time. Commits are stored as nodes in a DAG. Has a working directory (top level folder with single version of files), staging area (files to be added in the next commit, stored in .git) and .git directory (configuration and repository database). Takes snapshots of file system as opposed to deltas, allows referencing of duplicate files to save space. Allows for easy and cheap branching. Can use delta compression for space saving. Uses a key-value database with SHA1 hashes as the key. ''git checkout'' can be used to switch to a particular commit/branch, needs an unambiguous hash prefix. The HEAD value denotes the currently checked out commit. Recommended workflow is to branch and merge in projects. ''git push'', ''pull'' and ''clone'' are to interact with remote directories, receiving changes, sending changes and getting repositories respectively. Tags can be used to give names to important commits, like release versions. ====== TCP congestion control ====== ===== Re-transmission ===== If a packet receives a packet after the one it was expecting, it sends an ACK for the one it was expecting. If it continues to receive later packets, it continues sending ACKs. If the original ACK and 3 duplicate ACK (dupACKs) are sent, a Fast Re-transmission is triggered and the original expected packet is sent. ===== Lost packets ===== To deal with lost packets, originally "Go-back-N" was used, this caused all the packets from the lost one onward to be re-transmitted. This has the advantage of the receiver not needing to store/reorder packets. The alternative is "Selective Repeat", where only the lost packet is re-transmitted and the packets arrive out of order. Selective Repeat is more complex and helps where loss is common. ===== Window size ===== Once the window is full, a deadlock could occur where the window size may change but the server may not know. One solution is for the receiver to send an WindowUpdate, acknowledging the last packet and the new window size. Alternately, when the sender receives a window size 0, it can start a timer and periodically send ZeroWindowProbes to find the window size. ===== Congestion Control Window ===== Introduced with selective repeat. Additionally window dynamically adjusted based on network performance. It is maintained by the sender, unlike the sliding window controlled by the receiver. ==== Slow start algorithm ==== At first CWND sends 2*maximum segment size. If the segment is acknowledged, CWND++. Each full window doubles CWND, which continues until * a timeout * it reaches a threshold, SSthresh This achieves exponential growth over each round trip time. Eventually too many segments will be placed on the network, causing congestion and timeout. If segment loss occurs, SSthresh is set to half the current window size and the process starts again. Once the threshold is reached, the growth is slowed to linear, adding 1 MSS to the window for each successful ACK. Now we start from the new SSthresh instead of the original starting value. ==== Further optimisations ==== SACK (Selective Acknowledgements) provides the ability to track segments in flight by adding later received packets in SACK. ECN (Explicit Congestion Notification) (ECE & CWR bits set during SYN) * Allows IP layer to indicate congestion without dropping a segment by setting an ECN flag * Receiver indicates to sender by ECE (ECN Echo) flag * Sender acknowledges by setting Congestion Window Reduced Flag (CWR), reacts as if a segment has been lost without actually losing one ===== Macroscopic changes ===== * Fairness between flows * Response to long round-trip times (RTTs) * Response to random packet loss ==== Window size ==== W increases once per window. When a loss occurs, W is halved. With probability p of a loss, the average increase in window size is \[\frac{1-p}{W}-p\frac{W}{2}\] To be in equilibrium, the average increase must be zero, hence \[W\approx\sqrt{2/p}\] ==== Rate, fairness ==== The window is sent <= once per RTT, T. Rate is \(\frac{W}{T}\approx\frac{1}{T}\sqrt{2/p}\). For a given loss rate, longer RTT get less rate. If RTT is small, TCP forces the loss rate to be high. ====== Address switching ====== ===== Network Layer ===== The Internet (network) layer is responsible for getting data from the source to the destination. Traffic must be routed effectively. There are 2 types of services on the network layer * Connection-less * Packet switching (IP) * Minimum required service: "send packet" * "Datagram network" * Connection-oriented * (Virtual) Circuit Switching * Asynchronous Transfer Mode - ATM * Multi-Protocol Label Switching - MPLS * "Virtual Circuit" network * Usually single link of an IP network ===== Store and forward packet switching ===== This is like the internet. Host H1 sending packet to H2 - Transmits to nearest router - Packet is buffered while arriving and checksum verified - If valid, packet is stored until the next outgoing interface is free - Router forwards packet to next router in path - Repeat Connection-less routers store forwarding tables (also called routing tables) on how to get to other routers which are dynamically updated. Connection-oriented routers store in and out tables for sending all packets along the same route. ===== Connection-oriented vs connection-less ===== ^ Issue^ Datagram Network^ Virtual Circuit^ |Type|Connection-less|Connection-oriented| |Addressing|(-) Each packet has full source and destination|(+) Each packet contains a short VC number| |State|+ Routers do not hold state information about connections|- Each VC requires router table space| | | |- Router reboots a problem| |Routing|Each packet independent|Defined at set-up| |Quality of Service|- Difficult|+ Easy if enough resources| |Congestion control|- Difficult|+ Easy if enough resources| ===== Multi-Protocol Label Switching ===== Primary purpose is Quality of Service. * Prioritising traffic * Service Level Agreements for performance * Reliable connectivity with known parameters Expensive, roughly 20-100 times more per Mbps. ===== IPv4 ===== 32 bit address. IP addresses are given to interfaces, not hosts. Supply of addresses exhausted. ==== Types of addresses ==== * **Unicast:** single destination * **Broadcast:** send to everyone * **Multicast:** send to particular set of nodes * **Anycast:** send to __any one__ of a set of addresses * **Geocast:** send to all users in geographic area * ... ==== Classes ==== No longer used but spoken of ^ Class^ Bits^ Range^ |A|0|1.0.0.0 to 127.255.255.255| |B|10|128.0.0.0 to 191.255.255.255| |C|110|192.0.0.0 to 223.255.255.255| |D|1110|224.0.0.0 to 239.255.255.255| |E|1111|240.0.0.0 to 255.255.255.255| Class D is for multicast. Class E is reserved for future use. ==== CIDR ==== Reduces waste from classes. Specifies number of bits for the network field. Network in top bits, host in bottom. Can be written as a subnet mask of 1's, e.g. /24 is 255.255.255.0. Network number = network mask (bit-wise AND) IP address. Allows for automatic aggregation, reducing the size of the routing table. Overlapping prefixes are eliminated by longest prefix first. ==== Special IP addresses ==== ^ Private range^ Prefix and mask^ number of addresses^ |10.0.0.0 - 10.255.255.255|10.0.0.0/8 (255.0.0.0)|16,777,216| |172.16.0.0 - 172.32.255.255|172.16.0.0/12 (255.240.0.0)|1,048,576| |192.168.0.0 - 192.168.255.255|192.168.0.0/16 (255.255.0.0)|65,536| ^ Address^ Use^ |0.0.0.0|Placeholder| |000Host|Host on current network| |255.255.255.255|Broadcast local network| |Network1111|Broadcast distant network| |127.*.*.*|Loopback| ===== IPv6 ===== 128 bit address. Simpler header, improved security, more QOS support. ===== Subnets ===== Using prefixes within an organisation to allocate IP addresses. Allows packets to be sent without the router knowing all the hosts. ===== Network Address Translation (NAT) ===== Hides multiple interfaces behind a single address. Changes a port number when leaving the NAT. ==== Steps ==== * Assumes TCP or UDP in source and destination ports * Replaces TCP source address with public IP address * TCP port replaced by index of entry in NAT translation table * IP and TCP check-sums recalculated * When packet arrives, NAT looks up destination from port in TCP header. ==== Criticisms ==== * Violates IP architectural model where every interface has a unique IP address * Breaks end to end connectivity as only outgoing connections can be made * Changes the internet from connection-less to partly connection-oriented * Violates the layer model by assuming payload contents only TCP or UDP * Causes problems with FTP and other protocols that use multiple connections * Limits number of outgoing connections Still widely used. Can provide security by shielding from unsolicited incoming connections. ===== Fragmentation ===== IP packets have maximum size of 65,535, determined by 16 bit length header. Network links can't handle such sizes. Lower layers need to break up (fragment) packets. Fragmentation can be due to * Hardware * OS * protocols * standards * reducing transmissions due to errors * efficiency in communication channel Hosts like larger packets, reduces workload. When a packet transits, maximum packet size may change. e.g. Ethernet is 1500 bytes and WIFI is 2304 bytes. The Maximum Transmission Unit (MTU) is the maximum size for a network. The path MTU is the maximum size for a path through the network, equal to the minimum MTU on each link. Due to the connection-less nature, can't set MTU at sender as path can change. Instead, routers can break packets to be sent individually. Reassembly is difficult. * Transparent Fragmentation Reassembly is performed at the next router, routers unaware of fragmentation. * Nontransparent Fragmentation Reassembly is performed at the destination host. Uses DF (Don't Fragment) and MF (More Fragments) flags. MF=1 for all fragments but last. Fragments have an offset according to size, allowing for packets to be reconstructed out of order in a buffer. Simple approach, but large reassembly cost and a single lost fragment causes entire packet retransmission. Alternative is path MTU discovery, where each packet is sent with DF=1 and router sends an ICMP (Internet Control Message Protocol) to sender if packet is too large. This may cause initial packets to be dropped but hosts can learn optimal size quickly and reduce subsequent fragmentation. TCP and IP are often implemented together to reduce fragmentation. IPv4 allows for nontransparent fragmentation or path MTU discovery with a minimum of 576 bytes. IPv6 expects hosts to find optimal MTU without fragmentation with a minimum size of 1280 bytes. ICMP messages can be dropped, causing low volume passes and high volume fails, if in doubt send minimum size. ====== Routing ====== ===== Forwarding ===== Each router has a forwarding table which maps addresses to outgoing interfaces. Upon receiving a packet: * Inspect destination IP address in header * Index into table * Determine outgoing interface * Forward packet out to that interface This repeats on every router in the path to the destination. ===== Routing Algorithm ===== Runs on each router. A good algorithm needs * Correctness * Simplicity * Robustness * Stability * Fairness * Flexibility An algorithm depends on what it being optimised (packet delay or throughput). The simplest approach is to minimise the number of hops a packet has to make. This reduces packet bandwidth and improves delay, also hopefully reduces physical distance travelled. The actual algorithms assign a cost to each link. Static routing doesn't adapt to network topology. The table is calculated offline and uploaded to the router on boot, resulting in it not responding to failure. This is reasonable where there is a clear or implicit choice. Adaptive routing dynamically changes to network topology and potentially traffic levels. It may optimise some property: distance, hops, transit time, etc. It may get information from adjacent routers. ===== Flooding ===== This is the simplest adaptive routing. It: * Guarantees shortest distance and minimal delay * Useful benchmark * Extremely robust * Highly inefficient * Has to have a way of discarding packets It is when a router sends a packet along all its links, which results in every router having received the packet at least once. ===== Adaptive Routing ===== Utilises the optimality principle in route planning (A subset of a shortest path must also be a shortest path). This can be used to reduce a graph to a tree, involving shortest path algorithms like Dijkstra's. ===== Link State Routing ===== Replaced Distance Vector Routing (Bellman-Ford) that had problems converging quickly enough. Variants still used today. Relatively simple: - Discover neighbours - Set cost to neighbours - Construct packet containing knowledge - Send packet to and receive from all other routers - Compute shortest path to each router Costs can be manually or automatically set, commonly using bandwidth. Building packets is easy, deciding when to build presents problems. Flooding is used to send to all other routers. ===== Border Gateway Protocol (BGP) ===== Used to route between networks. Needs to consider politics, opposed to internal networking. * Companies not willing to have their network used by others * ISPs not wanting other ISPs' traffic on their network * No commercial traffic on academic networks * Use one provider over another due to cost * Don't send traffic through certain companies or countries Routes may be better in some respects, worse in others. Difficult to optimise. ===== IP Multicasting ===== One to many communication, useful for streaming live content or updating. Uses D class addresses. Hosts join groups at the request of processes. Routing trees are constructed through multicast algorithms. These help prevent security risks through amplification attacks. Often deployed in organisational networks, not internet as not proven to scale to that extent. ===== Congestion control ===== Occurs when they're too many packets on the network. Causes performance to plummet. Ideally would reach saturation and remain there. Different to flow control, which is between hosts. Both have the same solution, which is to slow sending rate. In order from slowest to fastest, the following are solutions to congestion * Network provisioning Add more physical capacity, slow and expensive but will always work. * Traffic-aware routing Temporarily reroute traffic around congested areas, however eventually all routes become saturated. * Admission control Used on virtual circuits to control who can place traffic onto the route * Traffic throttling Reduce sending rate * Load shedding Effective for congestion, bad for network utility. ==== Explicit congestion control ==== Use of ECN flags in IP header. * **00:** Not ECN-capable * **01 or 10:** ECN-capable * **11:** congestion experienced If the receiver gets a packet marked as experiencing congestion, it echoes to the sender with the ECE bit set. The sender then reduces its transmission rate and sets the CWR (Congestion Window Reduced) bit in acknowledgement. On receipt of CWR set, receiver stops sending ECE bit. This is closely linked with TCP, blurring the lines between the two. ===== Internet control protocols ===== These use the internet to manage functionality. * **ICMP:** Internet Control Message protocol * **DHCP:** Dynamic Host Configuration Protocol * **ARP:** Address Resolution Protocol ==== ICMP ==== Used in PING, but can do more. ^ Message type^ Description^ |Destination unreachable|Packet could not be delivered| |Time exceeded|Time to live field hit 0| |Parameter problem|Invalid header field| |Source quench|Choke packet| |Redirect|Teach router about geography| |Echo and echo reply|Check if a machine is alive| |Timestamp request/reply|Same as echo, but with timestamp| |Router advertisement/solicitation|Find a nearby router| Traceroute is an example that exploits the TTL to check the route being used to send a packet. Each packet is sent with an incremented TTL, meaning they hit 0 on successive routers, revealing their IP address. ==== DHCP ==== Each host must have a unique IP address (excluding NAT). Manually configuring each host is difficult to do and slow to respond. DHCP automatically allocates IP addresses. A network needs a DHCP server for issuing the addresses. A host sends a DHCP DISCOVER packet which the server receives and responds with a DHCP OFFER containing an IP address. IP addresses contain a lease, after which they will be reclaimed by the server and reissued. Hosts can request renewal before the lease expires. Alternately IP can be tied to a layer 2 address. ==== MAC (layer-2) address ==== Layer 2 addresses incorrectly called MAC (Medium Access Control) due to old "shared medium" layer 2 networks. An address is a globally unique identifier usually hard-coded by the manufacturer. They can be between 48 and 64 bits long. Used at the host-to-network/data link layer, sometimes called a physical address. ==== ARP ==== Translates addresses between the internet layer and physical network layer (IP to MAC). Works by broadcasting a packet asking who owns an IP address and the owner responds with its MAC address. It is a simple, inefficient protocol. It has many security issues: * No authentication * Caching of responses * ARP spoofing is the gateway for man-in-the-middle attacks * Provides a way of intercepting and spoofing ARP messages to associate the attacker's MAC with another host's IP. ====== Layer 2 ====== Modern Ethernet like IP and old Ethernet like WiFi. Old Ethernet is a shared medium which just broadcasts packets and packets can collide if broadcast at the same time. The packet format/services are the same so software doesn't need to change. ====== Process management ====== ===== Operating system ===== Provides an abstraction of hardware resources and manages these resources. Provides the concept of: * Processes (running program) * Address spaces (memory) * Files and file system * Input and Output * Protection (manage security) ===== Processes ===== A process is a program in execution. Programs are static, processes are dynamic. The state of a program consists of the code or "text" of the program, the values of the variables, the address of the current instruction and some other external data (e.g. current directory). Each process has its own address space. Most process a user deals with are created by the user, however some are provided by the OS. An example of a system process is a print daemon. ==== Multi-programming ==== Conceptually each process has its own CPU. In reality each process runs for a short period of time and take turns. Multi-programming increases system efficiency, if a process is waiting for something another can run in the meantime. ==== Process creation ==== Four things need to happen for a process to be created: - System initialisation - Execution of a process creation system call by a running process - A user request to create a new process - Initiation of a batch job ==== Process termination ==== Typical conditions which terminate a process: - Normal exit (voluntary) - Error exit (voluntary) - Fatal error (involuntary) - Killed by another process (involuntary) ==== Process states ==== The lowest layer of a process-structured operating system handles interrupts and scheduling. Processes can occupy the following states: - Running (using the CPU at that instant) - Ready (runnable, temporarily stopped while another process is running) - Blocked (unable to run until some external event happens) ===== The kernel ===== Responsible for managing processes. Runs as the privileged part of the operating system. Also provides services such as IO. Not a process in itself. Most programs run in an unprivileged mode, and can not issue privileged instructions or access certain parts of memory. The kernel can issue all instructions and access all memory. Instructions and memory locations which could interfere with other processes are privileged. This distinction is the foundation needed by the kernel for its security mechanisms. The privilege level is stored in hardware, given by the program status word (PSW) register. ==== System calls ==== System calls allow user processes to ask the kernel to execute privileged instructions and access privileged memory on their behalf. The OS checks these requests before execution, preserving security. To make these more convenient, system calls will typically do several privileged things in one logical operation. ===== Interrupts ===== Used by a hardware device to get attention from the CPU. Asynchronous with current process. When an interrupt occurs, CPU hardware saves process' current state in a privileged location until the interrupt is handled. The CPU handles the interrupt in kernel mode. The interrupt vector is the address of the interrupt handler. The handler * Saves the status of the current process * Services the interrupt * Restores what was saved * Executes a return from interrupt There may also be pseudo-interrupts which come from the CPU, rather than external hardware. User programs may generate pseudo-interrupts, such as division by zero. Pseudo-interrupts can also be intentionally generated by certain instructions, e.g. TRAP. ===== Threads ===== Threads are a sequential execution stream within a process. They can be considered a lightweight process. Threads can communicate with each other without the kernel, they share global variables (address space) and dynamic memory. Threads have their own program counter, registers, stack and state. A process can have many threads working independently. This is advantageous as they have less overhead compared to processes. The problems involved with threading involve concurrency. As such things like race conditions and deadlocks arise with threads. ==== Pthreads ==== POSIX standard API for threads. Global variables are shared across threads. Threads can switch at any time and need to be synchronised. ==== User-space threads ==== Threads can be implemented in user or kernel spaces. This means a process can manage its threads or the kernel can. User-space has less overhead but can be blocked by blocking system calls. ===== Race conditions ===== When multiple threads access the same object at the same time. Output depends on order of execution. An idea to eliminate this is to create a critical region where only one thread can access at a time. Requirements for a critical region: - No two processes may simultaneously be in their critical regions - No assumptions on speed or number of CPUs - No process running outside critical region should block other processes - No process should have to wait forever to enter critical region Shared variable locks can be implemented to guard the critical region. Methods for avoiding race conditions include: * Disabling interrupts * Strict Alternation * Test and set lock * Sleep and wakeup * Other: Semaphores, Monitors, Message passing Their implementations are * Busy waiting * Blocking ==== Strict alternation ==== The threads have a global turn counter that they use to determine which one can access the critical region. Once a thread has accessed the critical region, it sets the counter to the other thread. The thread then continues until it needs the critical region again and waits until it's its turn again. ==== Test and set lock ==== Most CPUs have an instruction to atomically test and set. This can be used to check whether to enter the critical region. ==== Busy waiting ==== If a process wants to enter a critical region, it checks if entry is allowed and if not it executes a loop, checking if it is allowed. This can waste CPU and can starve low priority processes. ==== Blocking ==== Attempt to enter a critical region, if fails register interest and block. When the critical region is available, the OS will unblock and the process will continue. ==== Deadlock ==== Occurs when two threads are waiting for an event that only the other can cause. To deal with deadlocks we can: - Ignore the problem - Detect the lock and try to recover - Avoid by careful resource allocation - Prevention ===== Scheduling ===== Ideally the CPU would be used all the time. The problem is deciding which process should run and for how long. Scheduling depends on whether the process is CPU bound or IO bound. It also depends on whether the process is batch, interactive or real time. The scheduler should aim for: * Fairness All processes get a fair share of CPU * Efficiency Keep CPU at 100% utilisation * Throughput Number of process/time * Turnaround time Time from process start to completion * Response time Time from request submitted to response produced A scheduling algorithm can be non-preemptive (process runs until it finishes or blocks) or preemptive (process can be suspended after some time). Context switching takes time, need to save process state. Various algorithms exist: * First-come first-served * Shortest process first * Shortest remaining time next * Round-robin * Priority * Multiple queues * Guaranteed * Lottery * Fair-share Difficult to build a general scheduler. ====== Memory Management ====== Process memory has three segments: - Text (program code) - Data (constant data, global variables) - Stack (local variables) ===== Hierarchy ===== Memory needs to be: * Fast * Cheap * Large * Non-volatile In reality there are various types of memory with different properties. There is a hierarchy of memory, with the largest capacity tending to be the slowest at the bottom (disk) and the fastest and smallest at the top (registers). In between there are caches and main memory. ===== Caches ===== Caches have fast random access and are managed by hardware. There can be many levels of caches, with the largest tending to be the slowest. Caches store frequently accessed data. A cache hit is when the wanted data is found in the cache, and a miss is when it isn't. There are problems in deciding what and when to move data in and out of cache. ===== Memory manager ===== Hardware responsible for * Keeping track of which parts of memory are free * Allocating memory * Freeing memory when no longer used * Protect memory against unauthorised access * Simulate appearance of larger memory by moving between memory and disk Without a memory manager, programs can access each other's memory posing security problems. Bitmaps can be used to keep track of what is in use and what is free. This is slow in looking for free space for new programs. Linked lists can be used instead, making free space faster to find. Allocation has various algorithms to decide how to allocate: * First fit * Next fit * Best fit * Worst fit * Quick fit ==== Fragmentation ==== Fragmentation is when there non-consecutive memory is used. Data can be swapped between memory and disk in order to decrease memory usage. This can allow the program to load what it needs into memory at a given time, but adds extra complexity to the program. ==== Virtual Memory ==== Allows each program to have its own address space and think it has access to more memory than exists. Breaks up memory into chunks called pages. Virtual pages sit on physical pages and can be moved between physical pages (page frames). As a full page is rarely needed, paging wastes some space for each process, called internal fragmentation. To map a virtual page to a frame, the offset needs to be found and the frame number needs to be found. This then gives the physical location. If a given page can't be found, a page fault exception is generated. === Translation Look-aside Buffer === Without optimisation, each memory reference would need two accesses to memory, one for page table and one for data. To avoid this, we can use a hardware unit to cache the table. ==== Page faults ==== The process can crash if it requests an invalid page. The OS can also suspend the process and create a new page, then rerun the instruction. ==== Page replacement algorithms ==== * Optimal * Not recently used * First-in first-out (FIFO) * Second-chance * Clock * Least recently used (LRU) * Working set * WSClock ====== Cryptography ====== With the exception of the One Time Pad, there is not perfect cryptography algorithm. The aim is to make an attack take sufficiently long that the data is no longer useful. The two main types of cryptography are symmetric, where the same key is used for encryption and decryption, and asymmetric, where two keys are used, one to encrypt and one to decrypt. ===== Symmetric encryption ===== A modern example is AES (Advanced Encryption Standard). It is useful if you have some way to securely share the key with those you want to be able to decrypt. It is useful for keeping your own data secure, such as for full disk encryption. AES works by breaking data into blocks. ==== ECB - Electronic Code-book ==== Simple, parallelisable but doesn't provide diffusion and may not provide confidentiality. Shouldn't be used. Same plain-text encrypts to same cipher-text. Repeated patterns will be evident in output but at a lower fidelity. ==== Probabilistic encryption ==== Won the 2012 Turing award. Provides a secure notion of encryption by ensure the same message encrypted twice does not provide the same cipher-text. ==== CBC - Cipher Block Chaining ==== Uses a random, non-reusable initialisation vector. Each block is encrypted sequentially using part of the previous block's cipher-text. Decryption can be done in parallel. Loss or corruption of a block prevents decryption of all subsequent blocks. ===== Asymmetric cryptography ===== Commonly called public key cryptography. Consists of a public key and a private key kept by the user/owner. The public key can be received over the internet, making there no need for a secure key exchange. Asymmetric cryptography is slower than symmetric and had size limits. As a result it is often used with symmetric cryptography as a way of securely exchanging a joint secret key. ===== Public key signatures ===== Links a key holder to a file. Verify with a publicly available key to prove signer. Large documents have a hash signed instead of signing the whole file. The hash needs to appear random, be hard to invert and have high collision resistance. Effectively a hash is a special lossy compression function. ===== Certificates ===== Digitally signed documents proving ownership. In reality the signer is a Certificate Authority (CA) who is already trusted. CA's are often a third party. Certificates can be chained, A signs B who in tern signs C. This means that as A trusts B and B trusts C, A trusts C. The top level of the hierarchy are the trust anchors, who are explicitly trusted. The anchors have self-signed root certificates which they use to sign for others, such as some CA's. Trust anchors can also be explicitly coded (ssh, application specific) or build (PGP web of trust). ==== Types ==== * Domain Validation * Most common type * Ties a certificate to a domain and checks requester has control over domain * Validation via email/DNS/URL - possible weakness * Organisation Validation * Ties a certificate to a domain and legal entity * Extended Validation * Establishes legal entity, jurisdiction and presence of authorised officer * Offline process, expensive DV certificates do not establish a link between domain and real world entity. ==== Revocation ==== Revocation occurs when a certificate is mistakenly issued or a private key is compromised. Performed via Certificate Revocation Lists and Online Certificate Status Protocol (OCSP). When a root certificate is revoked, all certificates below them become untrusted. ===== Key sizes ===== Dependent on the degree of security required. Asymmetric is generally longer than symmetric. RSA - 2056 bits, AES - 128 bits. These keys used to be shorter but grow with computational power. ===== Randomness ===== Needed for key generation, IV generation, Padding. Pseudo-random generators are often used. Randomness must never be reused or discover-able It is difficult to generate good randomness as it must be unpredictable. The random generation varies by OS. ====== Secure communication ====== Aim to provide secure, private communication between two end points, with integrity checks to ensure data does not change in transit and authentication to establish identities of end points. ===== CBC Tampering ===== An attacker can reorder ciphertext or flip bits. Every possible ciphertext corresponds to some valid plaintext. Alters the sent message. ==== Message Authentication Code (MAC) ==== Used to identify if a message has been tampered with. Uses a key, k which can be used with the message to test the message. \[t\gets\text{MAC}(k,m);b\gets\text{Verify}(k,m,t)\] b is 0/1 indicating verification status. An adversary can not create m', t' such that the verification returns b=1 for an unknown message. HMAC is the industry standard which is widely used in practice. \[t\gets\text{Hash}((k\oplus \text{opad}) || \text{Hash}((k\oplus \text{ipad}) || m))\] Here, ipad and opad are fixed constants used for padding. Generally MAC is used on the ciphertext. This is done so that decryption does not need to be done if MAC fails. ===== Diffie-Hellman Key Exchange ===== Fundamental to protocols that rely on TLS, such as HTTPS and SSH. Provides perfect forward secrecy, exposure of keys does not compromise past sessions. Sends information so that both parties can calculate shared key without sending shared key. Uses a public prime, \(p\), and a generator, \(g\). Parties pick a random value, \(x\), and compute \(X=g^x\mod p\). Parties then exchange their \(X\) values, \(X'\). The parties then calculate the secret \(s=X'^x\mod p = g^{xy}\mod p\). Both parties now have a shared secret without exchanging said secret. It is considered hard to solve the discrete log (find \(x\) after calculation). Provided both parties discard their secrets, even if one of them loses their private key the past communication cannot be decrypted. ===== TLS ===== Uses a handshake protocol to establish shared keys then uses a record protocol with the secret keys for data exchange. Requires unencrypted communication to establish common protocols and to share public keys. Can create a local CA and use for man-in-the-middle attacks. Allows for the transparent interception of TLS communication. Can also be used by antivirus/web protection software. Same can be said of proxies and load balancers. ====== System security ====== Gain access to a system to: * Access health records/financial information/personal data * Create a bot * Fraud * Espionage * Show off An attack can be passive (observing traffic and other activities) or active (changing content). ===== Access Control Lists ===== Put on files to restrict who can access them. Unix-like systems have three sets, owner, group, world. Rights determine read, write and execute control. ===== Specialised hardware ===== Can use things like: * Encrypted hard drive * Trusted Platform Module * Secure processors: * Intel SGX * ARM TrustZone ==== Encrypted hard drive ==== Transparently encrypts/decrypts content. The disk driver has an encryption engine. Can be combined with Trusted Platform Module (TPM) to allow decryption only with a certain machine. Data Encryption Key (DEK) is encrypted with Authentication key. Data is encrypted with DEK. ==== Trusted Platform Module ==== Cryptoprocessor which stores secret keys and performs some cryptographic operations. Can isolate process code and memory from the rest of the system. Can encrypt all access to hardware from a process. ===== Code signing ===== Can sign code with an encrypted hash of the program to ensure integrity. ===== Covert Channels ===== Can be used by a process to gather information about another isolated process. Can yield information about process runtime activities. Often things like file locks. ===== Stenography ===== Encodes data into another medium such that the original medium is nearly unrecognisably unchanged. Commonly done in images. ===== Side channels ===== Observations about running processes based on shared resources. Analysing things like memory, caches, page faults, timing, power analysis. Can gain information about encrypted data. Often due to inadvertent leakage. ====== Future of Computing Systems ====== Moving towards more portable, commoditised hardware. There is also the rise of online services. Historically there were large multi-user systems acting as servers, developments such as virtual machines and containers are enabling this workflow again. Software as a Service (SaaS), Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) provide varying levels of remote computing functionality. The moving of applications to a cloud shifts the responsibility for maintenance to the vendor. Rented infrastructure is often virtualised, meaning the user is responsible for patches and updates. This has advantages of allowing easier hardware scaling and reduced hardware startup costs. Because security is dependent on the user, still need skilled sysadmins. Platform as a Service provides a framework for development and deployment of applications. This removes the need to support the underlying frameworks and generally provides auto-scaling. Function as a Service (Serverless) allows a level up from PaaS, allowing a developer to focus on their applications functionality, not its deployment. All of this is changing the role of a sysadmin. The role is changing from configuration to installation. Will always be needed for hardware and IaaS, as their need will always exist.