$conf['savedir'] = '/app/www/public/data'; notes:elen90066 [DokuWiki]

Site Tools


notes:elen90066

ELEN90066 - Engineering System Design

Sensors and actuators

A sensor is a device that measures a physical quantity. An actuator is a device that modifies a physical quantity. Modelling can have some issues such as physical dynamics, noise, bias, sampling, interactions, faults, etc.

A simple type of magnetometer is the hall effect sensor which uses the deflections of electric current to determine the magnetic field. Accelerometers can be used for navigation, orientation, drop detection, image stabilisation, air bags, etc. Commonly accelerometers are built with a spring and damper on a plate, from which capacitance is measured. Accelerometers cannot distinguish between acceleration and tilt, as gravity looks like acceleration. As a result, accelerometers can be used to measure tilt. Accelerometers pose challenges in separating tilt from acceleration, vibration, spring non-linearities, and double integration gives position drift.

Gyroscopes can be used to measure changes in orientation. An inertial navigation systems are dead reckoning plus GPS, and consist of:

  • GPS Initialisation and correction
  • Three axis gyroscope Orientation
  • Three axis accelerometer Position after correction for orientation

Calibration is an issue with sensors, and calibrating can drastically increase the manufacturing cost. Non-linearity can require correction and feedback can be used to keep the sensor in the operating region. Sampling can cause aliasing and missed events. Noise requires filtering which causes latency. Failures require redundancy to alleviate.

An affine sensor model maps a real quantity to another real quantity. $$F: \mathbb{R}\to\mathbb{R}$$ $$F(x)=ax+b$$ Where $b$ is the bias. The equation needs to be inverted to find $x$, the quantity of interest. Sampling into the digital domain restricts the range into a restricted space (e.g. 16-bit corresponds to 65535 values).

Sampled data is vulnerable to aliasing, where high frequency components masquerade as low frequency components. Careful modelling of the signal sources and analog signal conditioning and/or digital oversampling are required to avoid aliasing.

Faults can be mitigated with redundancy, but there can be problems in introducing multiple sensors. Odd number of sensors allows for a majority in voting.

Motors can be difficult to control as they need to be strong enough for the application, but soft enough not to damage. Electrically a motor appears as: $$v(t)=Ri(t)+L\frac{di(t)}{dt}+k_b\omega(t)$$ There is a back EMF introduced from its movement. Mechanically a motor appears as: $$I\frac{d\omega(t)}{dt}=k_Ti(t)-\nu\omega(t)-\tau(t)$$ There is friction introduced in its working. Both models have a term for current. Microcontrollers can't sustained high current output which is required by motors. Adding transistors adds components and cost. As an alternative Pulse Width Modulation (PWM) can be used if the device tolerates rapid power fluctuations. The ratio of high to low is the duty cycle, and the average voltage is proportional to the duty cycle. $$\text{Duty cycle}=\frac{T_{\text{high}}}{T}$$ This can be good for driving inductive loads, as they tolerate the changes in voltage.

Embedded processors

Embedded processors are more specific than general processors. This results in them being more energy efficient and having specialised hardware. DSPs are specialised for signal processing applications and are designed to support numerically intensive applications. Microcontrollers have a simple CPU which is combined with peripheral devices, which are designed to consume small amounts of memory.

A basic architecture consists of a control unit and a data path, similar to a single purport processor. The data path is general, and the control unit is programmed by memory, and instructs the data path. The control unit takes a sequence of instructions in stored memory, which then need to be broken down into several sub-operations. These sub-operations can consist of:

  • Fetch Get next instruction
  • Decode Determine what the instruction means
  • Fetch operands Move the data from memory to data path register
  • Execute Move data through the ALU
  • Write back Take the output and return it to memory

The Arithmetic and Logic Unit (ALU) takes two inputs and a control input and produces a single data output and a status output. Registers are a sequence of D flip flops which store data and are driven with a common clock. The program counter is a register that holds the memory address of the next instruction to be run. Normally instructions are fetched sequentially, but control transfer instructions allow the changing of the sequence. This can be done with:

  • Branches The next instruction is fetched from elsewhere in memory
  • Subroutines Branches but saves preceding contents to return to
  • Returns Retrieves the saved contents and resumes sequential execution after the subroutine call

Pipelining

Pipelining increases instruction throughput by running independent operations in parallel. This allows instructions to use different parts of the processor at the same time. Between each stage, there are latches that store the result of the previous stage, so that the pipeline can synchronously advance.

There are stages that require feedback to earlier stages (e,g, jumps), which present hazards. Data and control hazards arise from branches, and data hazards arise from memory reads or ALU results.

Data Hazard

Sometimes the result of a later stage is required in an earlier stage. As a result, there are some resources that can't be used simultaneously. A reservation table shows the hardware resources that can be used simultaneously.

The hazard can be dealt with:

  • Explicitly The hazard is documented and the programmer/compiler must deal with it by inserting no-op instructions to advance the pipeline, avoiding the hazard.
  • Interlocks The hardware upon encountering an instruction will detect the hazard and delay its execution until the hazard is removed.

Control hazard

A conditional branch changes the value of the program counter if a specified register is zero. The branch needs to reach the memory stage before the PC can be updated. In a pipelined system, instructions that follow the branch have begun being fetched, decoded and executed despite them not being needed.

The hazard can be avoided by:

  • Delayed branch The hazard is documented and the programmer/compiler ensures that the instructions following the hazard are harmless or do useful work that doesn't depend on the branch being taken
  • Speculative execution The hardware estimates if the branch will be taken, and begins executing the instructions it expects. If the expectation is wrong, the side effects must be undone

Except for explicit pipelines and delayed branches, these techniques introduce variability in the timing of an execution sequence.

Parallelism

Concurrency is central to embedded systems. A program is concurrent if different parts of the program conceptually execute simultaneously. A program is parallel if different parts of the program physically execute simultaneously on distinct hardware. Parallelism can take two forms:

  • Multi-core architectures
  • Instruction Level Parallel (ILP)
    • CISC Instructions
    • Subword Parallelism
    • Superscalar
    • VLIW

Complex Instruction Set Computer (CISC) Instructions

A processor with complex (and generally specialised) instructions are CISC machines. For example, DSPs are CISC and include instructions for FFTs and Viterbi decoding. CISC instruction sets have disadvantages, being they can be hard for a compiler to make use of an instruction set. As a result, the libraries using DSPs are written in assembly.

Subword parallelism

Some processors support subword parallelism, where a wide ALU is divided into narrower slices enabling simultaneous arithmetic or logical operations on smaller words. An example of this is Intel MMX technology. A vector processor is one where the instruction set includes operations for multiple data elements simultaneously. Subword parallelism is a particular form of vector processing.

Superscalar processors

Superscalar processors use hardware that can simultaneously dispatch multiple instructions to distinct hardware units when it detects that the simultaneous dispatch will not change the behaviour of the program. Execution is identical to it being done in sequence. Such processors support out-of-order execution where instructions later in the stream are executed earlier. Superscalar systems have disadvantages for embedded systems as a program's execution time isn't deterministic.

Very Large Instruction Word (VLIW)

Processors intended for embedded applications use VLIW architectures instead of superscalar in order to get more repeatable and predictable timing. VLIW processors have multiple functional units, but each instruction specifies that each functional unit should do, rather than being dynamically done as in superscalar.

Multi-core architectures

Combinations of several processors on a single chip. Embedded systems see advantages in multi-core architectures because real-time and safety critical tasks can have a dedicated processor.

FPGAs can implement multi-core architectures, using soft cores. The hardware function is programmable and can be tightly coupled to custom hardware more easily than off-the-shelf hardware.

Memory architectures

Traditionally memory is used for storage and communication with programs. In embedded systems, memory is typically more constrained than general purpose computing. It can be important to understand these constraints.

Non-volatile memory is memory that preserves its contents when the power is off. For example: disk drives, EPROM, EEPROM, Flash. Volatile memory is memory that loses its contents when the power is off. For example: SRAM, DRAM. SRAM is fast and deterministic, but is power hungry and less dense than DRAM. DRAM is slower than SRAM, has nondeterministic access time, requires periodic refreshes but is denser than SRAM. A boot loader transfers data from non-volatile to volatile memory.

A segment map defines where each device is mapped in logical memory. This can include program memory, data memory, external devices, peripherals. The memory space can be different than the data bus.

Program memory is split into:

  • Statically-allocated memory Compiler chooses the address where to store a variable
  • Stack Dynamically allocated memory with a Last-in First-out (LIFO) strategy
  • Heap Dynamically allocated memory

A static and global variable persists for the whole program execution and exists in statically allocated memory. Normal variables are created in the stack, and exist while the function is in scope. The stack is an area in the memory space where memory is allocated when a procedure is called and freed when the procedure exits. A function can return an address of a variable that exists on the stack but may not exist once the function returns. That address may be overwritten when a new function is called and a new stack frame is pushed.

Recursion pushes a new function onto the stack, which can write to unintended memory and cause unintended behaviour.

The heap holds dynamic memory, and needs to allocated and freed manually. Failure to free can cause leaks, and repeated allocation and freeing can cause fragmentation. Automatic techniques (garbage collection) require stopping everything and reorganising allocated memory, which can be bad for embedded systems.

Memory occupies a hierarchy, utilising caches of varying speeds to store small amounts of memory for quick access. A cache is a subset of the memory space in SRAM, and accessing a bit of memory not in the cache is a cache miss, and requires copying the data to the cache. A scratchpad is a lot of SRAM that occupies a disjoint space to DRAM, and software manages what is stored where. The heirachy can have an overlap in address space, with data being moved to faster memory as it is used. Alternatively each memory can have its own location in address space, with manual finding of data.

A direct mapped cache consists of one line, with a valid bit, a tag and a data block. If the tag of the address matches the tag of the line, then there is a cache hit, otherwise the data needs to be fetched from main memory updating the line. A set associative cache consists of several lines, each like the direct cache line and tag matching is done using associative or content-addressable memory. Here a cache miss requires a replacement policy (e.g. FIFO, LRU).

I/O

IO bridges the cyber and physical worlds, allowing interaction between them. A simple IO often used for GPIO is an open collector circuit. The circuit consists of a register being connected to the collector of a transistor. This allows the same pin to be used for input and output but is current limited.

More complex wired connections can be classified as:

  • Parallel One wire per bit
  • Serial One wire per direction
  • Mixed One or more “lanes”

Parallel connections can suffer from skew, where the signals arrive at different times, and cross talk, where the signals interfere with each other. Asynchronous communication can rely on the clocks of the two devices being close enough that there is minimal clock skew while the packet is being sent. Some protocols include data transmission to help in aligning the clocks for longer packets.

There are two main mechanisms for checking for communication:

  • Polling A main look periodically checking for IO
  • Interrupts External hardware tells the processor when IO is ready, stops what it's doing, reads the input and resumes where it was

Interrupts are more efficient than polling, as there doesn't need to be time wasted checking for input, but is breaks easy control flow of the program. Triggers can be external pins, or internal software interrupts, both triggering a context switch or special instruction execution. The responses to an interrupt include:

  • Disable interrupts
  • Push the current program to the stack
  • Execute instruction at a designated address

A common type of interrupt is a timed interrupt, where the interrupt is triggered after an amount of time by an independent timer. This can be setup in software in the number of cycles which the desired amount of time will require.

Physical systems

Continuous dynamics

The value of a model lies with how well it matches the physical system, but the value of the physical system also depends on the model. We try to build real systems which closely match the models we construct. We can model physical continuous dynamics with differential equations. Often this is expressed with functions of time, e.g. position over time.

A physical object has six degrees of freedom, three translational and three rotational. The angular version of force of torque, where: $$T(t)=rF(t)$$

A system is a function that takes an input signal and produces an output signal. The domain and range of the system are sets of signals, which are functions themselves. The parameters of the system affect the function of the system. A controller alters the input to the system, such that ideally a desired output is achieved. Closed loop control can be used to feed the output into the controller's input in order closer match the desired output.

Discrete dynamics

Discrete systems operate on a sequence of discrete steps or has values taking discrete values. Discrete systems have discrete dynamics.

A pure signal is one which can have only one of two values. It signals an event but does not carry any information about the event. $$P:\mathbb{R}\to\{absent,present\}$$ A discrete actor maps a pure signal to another discrete signal. $$Counter:(\mathbb{R}\to\{absent,present\})^P\to(\mathbb{R}\to\{absent\}\cup\mathbb{N})$$ Where $P$ is the number of pure input signals. The actor reacts to the inputs and changes its internal state. The state is the condition of the system at a particular point in time. A guard sets the transition that needs to be true for a state transition. The state machine needs an initial state, and the outputs can vary based on the transition. The state machine can be event triggered (moving on events) or time triggered (moving on a clock). The state machine is formally defined by its:

  • States
  • Inputs
  • Outputs
  • Update
  • Initial state

A default transition is enabled if no non-default transition is enabled, and has no guard, or a guard that evaluates to true.

A stuttering transition is a default transition that is enabled when inputs are absent, that does not change state and produces absent outputs. Receptiveness is that a transition is enabled for any input values. Determinism is when exactly one transition is enabled for every input at every state. Nondeterminism can be useful for:

  • Modelling unknown aspects of the environment or system
  • Hiding detail in a specification of a system

A FSM behaviour is a sequence of (non-stuttering) steps. A trace is the record of inputs, states and outputs in a behaviour. A computation tree is a graphical representation of all states. FSMs can be used for formal analysis, e.g. safety analysis may show an unsafe state is unreachable. Nondeterministic FSMs are more compact than deterministic FSMs, although they have the same computational power (albeit with more states). For a fixed input sequence, a deterministic system exhibits a single behaviour, whereas a nondeterministic system exhibits a set of behaviours. Nondeterministic FSMs are not probabilistic FSMs as the transitions are known to be both taken, rather than one path randomly being taken.

Extended state machines augment the FSM with variables that may be read or written. We can explicitly declare the variables to help understand a ESM.

Concurrent composition

Complex state machines can be constructed from multiple simpler state machines. There are two kinds of composition:

  • Spatial How do the components communicate with each other
  • Temporal When do they communicate with each other

We need to expose the inputs and outputs of the state machine to enable concurrent composition.

Spatially State machines can be side by side, cascaded, or feedback composed. How a composed machine reacts is the Model of Computation (MoC). Side by side machines can react:

  • Together Synchronous, concurrent
  • Independently Asynchronous, concurrent

We can also create a hierarchical state machine, where multiple state machines work alongside each other and share inputs and outputs. We need to ensure that the outputs do not conflict with each other to yield invalid output. A history transition is one where a transition on one machine causes a return to a previous position on another machine, rather than the reset position. There can also be simultaneous transitions with all the machines. Flattening the machines into a single state space can be done, but it will explode the size of the state space. Hierarchical machines can be used to model interrupts. A reset transition is the opposite of a history transition, where the lower machine returns from the reset state rather than where it was earlier. A preemptive transition specifies that the guard should be evaluated before the current state reacts, and if false the current state should react.

Synchronous composition has simultaneous and instantaneous reactions. Feedback composition is when the output of the state machine is fed back into the input of the machine. As a result, the input and output need to resolve to a stable set, such that the input results in the same output. $$F(s)=s$$ Such an $s$ is referred to as a fixed point. The fixed point should exist and be unique, and should have a constructive procedure to find it. Here $F(s)$ is the actor function, and in determinate systems has the form: $$F:(\mathbb{N}\to V_x\cup\{absent\})\to(\mathbb{N}\to V_y\cup\{absent\})$$ This function maps spaces to spaces, not points to points. With synchronous composition of determinate systems, we can break down by reaction. At the nth reaction, there is a state-dependent function: $$f(n):V_x\cup\{absent\}\to V_y\cup\{absent\}$$ Such that: $$s(n)=(f(n))(s(n))$$ This is a fixed point. $f(n)$ is known as a firing function. For the nth reaction, we seen $s(n)\in V_y\cup\{absent\}$ such that: $$s(n)=(f(n))(s(n))$$ This point may not exist or may not be unique. If this is true, the system is ill-formed, otherwise it is well formed.

We can determine fixed points simply if the state machine is constructive:

  1. Start with $s(n)$ unknown
  2. Determine as much as possible about $(f(n))(s(n))$
  3. If $s(n)$ becomes known (present and value), then we have a unique fixed point.

For non-constructive machines, we are forced to do exhaustive search over all inputs. This is only practical if the data types are small. For a constructive machine, we can determine that a state may not produce an output, allowing us to conclude that the output is absent even with a known input. Likewise we can determine that a state must produce an output, allowing us to conclude that the output is present.

Multitasking

Processes and threads can be used to achieve multitasking. This sits below a concurrent model of computation and above the physical processor in abstraction. Threads are sequential procedures that share memory. It can be difficult to predict when a thread will run. Without an OS, multi-threading is achieved with interrupts from external events. Generic OSs provide thread libraries (e.g. pthread) and provide no guarantees on when threads will run. Real-time OSs support a variety of ways of controlling when a thread executes. Segmentation faults occur when a process tries to access memory owned by another process, and communication between processes needs to be handled by the OS.

Threads can share variables. When threads run may not be known, they could suspend between two atomic operations. Threads can be given priorities, which may or may not be respected by the scheduler. Threads may block on semaphores or mutexes. A mutex is a mutual exclusion lock that can be used to stop other threads from accessing a part of memory while one thread is changing it. This can be useful if the memory would temporarily be in an invalid state (such as when adding to a linked list). Mutexes come with a deadlock risk, which is where two threads hold locks preventing each other from progressing.

Due to threads unknown execution order, the state of the program can change from under it. This can be done with simple code, but can be dangerous in an embedded context. Due to threads' nondeterminism, the programmer needs to impose constraints on execution order and limit shared data access.

Scheduling

When there are fewer processors than tasks, and/or when tasks must be performed at a particular time, a scheduler intervenes: Assignment, Ordering and Timing. Real time systems have timing constraints:

  • Deadlines
  • Precedence
  • Temporal logic propositions

A scheduler needs to:

  • Set up periodic timer interrupts
  • Create default thread data structures
  • Dispatch a thread (procedure call)
  • Execute a main thread (e.g. idle or power save)

The thread data structure has:

  • Copy of all state (machine registers)
  • Address where to resume execution
  • Status of the thread
  • Priority, WCET (Worst Case Execution Time), and other information for scheduler.

Dispatching a thread consists of;

  • Disabling interrupts
  • Determining which thread should execute (scheduling)
    • If the same thread, re-enable interrupts and return
  • Save state to current thread data structure
  • Save return address from stack for current thread
  • Copy new thread state to machine registers
  • Replace program counter with new thread's stack
  • Re-enable interrupts and return

In determining the scheduler, we need to consider:

  • Preemptive vs non-preemptive scheduling
    • A non-preemptive scheduler can only dispatch a thread once the current has completed
    • A preemptive scheduler can schedule on timer interrupt, on IO interrupt, on thread creation, blocking, OS calls, state changes, etc.
      • At any instant, the thread with the highest priority should be executed, provided it is enabled (not blocked)
  • Periodic vs aperiodic tasks
  • Fixed vs dynamic priority

Rate Monotonic Scheduling (RMS) is a hard constraint scheduler. It assumes n tasks are periodically invoked with:

  • Periods $T_1,...,T_n$
  • WCET $C_1,...,C_n$
    • Assumes no mutexes, semaphores or blocking IO
  • No precedence constraints
  • Fixed priorities
  • Preemptive scheduling

Priorities are ordered by period (smallest period has highest priority). The theorem is that: Under no precedence constraints, if any priority assignment yields a feasible schedule, then RMS also yields a feasible schedule. RMS is optimal with respect to feasibility, as it can return a feasible schedule if one exists for its constraints. This assumes that there is zero time required to switch tasks. The contra-positive of the theorem is that if RMS fails to find a schedule, no other priority assignment can yield a schedule. The proof is this gives optimality with only feasibility, disregarding other optimality criteria. Practically, RMS may have a negative effect on processor utilisation.

Earliest Due Date (aka Jackson's) algorithm states to execute tasks in order of non-decreasing deadlines. It works with n independent, one time tasks with deadlines. It schedules to minimise the maximum lateness: $$L_{\max}=\max_{1\leq i\leq n}\{f_i-d_i\}$$ Where $f_i$ is the finishing time of task $i$ and $d_i$ is its deadline. This is negative if all the deadlines are met, and small positive values may be tolerates in soft real-time systems. This does not require preemption. Under no precedence constrains and non-repeating tasks, EDD yields the smallest maximum lateness compared to all other schedules.

EDF (Horn's Algorithm) extends EDD by allowing tasks to arrive at any time. EDF instantly executes the task with the earliest deadline that has arrived, minimising maximum lateness. EDF can be applied to periodic and aperiodic tasks. Being a dynamic priority scheduling algorithm, the tasks may receive a different priority in each period. RMS is simpler than EDF, due to its fixed priorities compared to EDF's dynamic priorities. EDF is optimal with respect to maximum lateness and feasibility, whereas RMS is only optimal with respect to feasibility. EDF can achieve full utilisation, and needs fewer preemptions in practice (less context switching). EDF also allows deadlines to be different to the period.

In reality, tasks require other tasks to be complete before they can run. These can be shown in Directed Acyclic Graphs (DAGs), showing the precedence in tasks. EDF can fail to deliver a schedule with precedence constraints. Latest Deadline First (LDF) builds the schedule backwards, working backwards through the DAG building a schedule from the latest deadline at the end. LDF is optimal with minimising maximum lateness, not requiring preemption. However it does not support the arrival of tasks. EDF with precedence (EDF*) can be used to allow the arrival of tasks with precedence at arbitrary times. it adjusts deadlines and arrival times according to precedence. EDF modifies the release times ($r_j'$) and deadlines ($d_j'$). $$r_j'=\max(r_j,r_i+C_i)$$ $$d_i'=\min(d_i,d_j'-C_j)$$ Where task $i$ immediately precedes task $j$. After adjustment, a schedule can be prepared with EDF which produces an optimal schedule minimising maximum lateness.

Timing behaviour of all known task scheduling strategies is brittle, where small changes can have big consequences. Local improvement may degrade global performance (non-monotonic property of multi-processor systems).

Invariants and temporal logic

A design is correct when it meets its specification withing its operating environment. Specification is the statement of the design objective. Verification is the checking that the system achieves its objective. Controller Synthesis is the modification of a system to achieve the objective. Being able to precisely specify the specifications allows for automated verification. We can formally create the model as a FSM, but need to formally write the specification with temporal logic. Temporal logic is a formal way of expressing the properties of a system over time.

Atomic formulas are statements about an input, output or state of a state machine (at the current time). For example, $x$ means $x$ is present, $x=1$ means $x$ is present and has value 1 and $s$ means the machine is in state $s$. These propositions are about the machine. Propositional logic formulas are more elaborate statements about the machine. For example: $p_1\land p_2$ (and), $p_1\lor p_2$ (or), $p_1\implies p_2$ (implies) and $\lnot p_1$ (not). Here $p_1$ and $p_2$ are atomic or propositional logic formulas. An execution trace $q_0,q_1,q_2,...;q_j=(x_j,s_j,y_j)$ is the sequence of inputs, outputs and states st different steps. A propositional logic formula $p$ holds for a trace if and only if it holds for $q_0$.

Linear Temporal Logic (LTL) formulas are statements about an execution trace.

Formula Meaning
$p$ $p$ holds in $q_0$
$G_\phi$ $\phi$ holds for every suffix of the trace
$F_\phi$ $\phi$ holds for some suffix of the trace
$X_\phi$ $\phi$ holds for the trace $q_1,q_2,\dots$
$\phi_uU_{\phi_2}$$\phi_1$ holds for all suffixes of the trace until a suffix for which $\phi_2$ holds

$G_p$ holds for a trace $q_0,q_1,q_2,\dots$ if and only if $p$ holds for every suffix of the trace $q_0,q_1,q_2,...;q_1,q_2,...;q_2,\dots$. If $p$ is a propositional logic formula, this means it holds for all $q_i$ and is an invariant of the system. $F_p$ holds for some suffix of the trace $q_2,\dots$. For example, every input is eventually followed by an output becomes: $\mathbf{G}(x\implies \mathbf{F}y)$. An LTL formula is satisfied for a state machine if every trace satisfies the formula.

$\mathbf{GF}p$ states that $p$ occurs infinitely often (infinitely many occurrences). $\mathbf{FG}p$ states that eventually $p$ holds henceforth (steady state property). $\mathbf{G}(p\implies\mathbf{F}q)$ states that every $p$ is eventually followed by a $q$ (request-response property). $\mathbf{F}(p\implies\mathbf{XX}q)$ states that if $p$ occurs, $q$ happens two reactions later.

Equivalence and refinement

It can be important to compare state machines to:

  • Check conformance with a specification
  • Optimise a model by reducing complexity
  • Check if component substitution is ok

We can compare state machines from a equivalence or refinement perspective.

Equivalence can be broken down into:

  • Type equivalence The inputs and outputs are the same type
  • Language equivalence Every input sequence produces the same output sequence
  • Bisimulation Every state in one has a corresponding state in the other that will react to inputs in the same way, and will transition to a similarly corresponding state

For deterministic machines, language equivalence and bisimilarity are the same, although this is not true for nondeterministic machines.

Two state machines that are not equivalent but may be related. The machines may be type compatible such that they can be substituted without causing a type conflict (type refinement). A machine may be a specialisation of another, such that it can produce only output sequences given the input sequences (language containment). One machine may be a specialisation of the other such that at every reaction the first can only output values that the other can produce (simulation). If one machine is valid in a system, then the other is valid where it may be a type/language/simulation refinement or a implementation of the first.

A machine $M_2$ is a type refinement of $M_1$ if:

  • $P_2\subseteq P_1$ The inputs of $M_2$ are a subset of the inputs to $M_1$
  • $Q_1\subseteq Q_2$ The outputs of $M_1$ are a subset of the outputs of $M_2$
  • $\forall p\in P_2, V_p\subseteq V_p'$ The input values of $M_1$ are a superset of $M_2$
  • $\forall p\in Q_2, V_p\subseteq V_p'$ The output values of $M_1$ are a superset of $M_2$

If all these are satisfied, $M_2$ can replace $M_1$ without causing a type error.

A deterministic machine can replace another if it is a language refinement of the other. A nondeterministic machine can replace another if it is a simulation of the other. A behaviour of a state machine is an assignment of such a signal to each port such that the signal on any output port is the output sequence produced for the given input signals. The language of a state machine is the set of all behaviours that the state machine can have. For type equivalent machines $M_1$ and $M_2$, $M_2$ is a language refinement of $M_1$ if $L(M_2)\subseteq L(M_1)$. Simulation is defined as having a set of states which both state machines can have, which can produce the same outputs for the same inputs given one machine moves first. Simulation is single directional, and bisimulation is stronger as it allows either machine to move first. If two machines can simulate each other, they are not necessarily bisimilar. If $M_1$ simulates $M_2$, then $L(M_2)\subseteq L(M_1)$.

Reachability analysis and model checking

Reachability analysis is the process of computing the set of reachable states for a system. Model checking is an algorithmic method for determining whether a system satisfies a formal specification expressed in temporal logic.

To verify $\mathbf{G}p$ on a system, we can enumerate all the reachable states and check that they satisfy $p$. This method lends itself to Depth First Search. Alternately, we can explore sets of reachable states. These sets are represented as boolean functions, and as such set operations can be performed with boolean algebra. We can represent a subset of states $C\subseteq S$ by its characteristic boolean function $f_C:S\to\{0,1\}$, where $f_C(x)=1$ iff $x\in C$. Similarly, the state transition function $\delta$ yields a set of $\delta(s)$ of next states from the current state $s$, and so can be represented with a characteristic function for each $s$. The symbolic method lends itself to Breadth First Search. The state graph can be generated by repeated application of the transition function $\delta$. If the goal state is reached, stop and report success, else continue until all states are seen.

We can abstract a FSM to a more general model to simplify the model checking. This can involve making it nondeterministic, as a more general version of the model. A safety property is checking that nothing bad happens, and has finite length counterexamples. A liveness property is checking that something good eventually happens, and has infinite length counterexamples. Model checking with liveness in more involved than reachability analysis. We can rewrite a liveness property as $\mathbf{F}p=\lnot\mathbf{G}(\lnot p)$. We can then find a counterexample to $\mathbf{G}(\lnot p)$.

Quantitative analysis

Quantitative analysis is about providing quantitative guarantees to embedded systems. There are several timing analysis problems:

  • Worst case execution time (WCET) estimation
  • Estimating the distribution of execution times
  • Threshold property: Can you produce a test case that causes a program to violate its deadline
  • Software in the loop simulation: predict execution time of a particular program path

WCET is the longest time taken by software to execute. It is a function of the input data and environment conditions. The opposite of WCET is BCET (Best Case Execution Time), being the shortest time to execute. These execution times form a distribution, with bounds of the BCET and WCET. The distribution can be characterised by repeated execution. WCET is important for correctness (completion in time) and performance (optimal scheduling). In general, the WCET cannot be found as the problem is undecidable. Additionally, knowing the WCET is not enough as anomalies can occur when tasks end earlier than expected. Traditionally, execution time analysis consists of:

  • Program path (Control flow) analysis
    • Finding the longest path through a program
    • Finding loop bounds
    • Identifying feasible paths through the program
    • Identifying dependencies amongst different code fragments
  • Processor behaviour analysis
    • For small code fragments, generate bounds on run times on the platform
    • Model details of architecture, including cache behaviour, pipeline stalls, branch prediction, etc.

The outputs of both analyses feed into each other. Program path analysis can lead to an explosion of possible paths. Instead, we use a different approach, consisting of:

  1. Manually constructing a processor behaviour model
  2. Using the model to find “worst-case” starting processor states for each basic block and measuring the execution times of the blocks from these states
  3. Using these times as an upper bound on the time for each block
  4. Formulating an integer linear program to find the maximum sum of these bounds along any program path

The run time can be measured with:

  • Instrument code to sample the CPU cycle counter
    • Incurs a small overhead, so measuring a long sequence is needed to compensate
    • Multitasking causes the counter to keep going while the task is inactive
    • Multicores require the locking of a task to a single core
    • Due to power management, the CPU may change speed or the timer could be reset while hibernating
  • A cycle-accurate simulator for the processor
  • A logic analyser

As architectures become more complex, timing becomes more complex. Analysis methods are brittle, so small changes result in complete WCET recomputation. We need more reliable ways to measure execution time.

Security and privacy

Security is being kept from harm, privacy is being kept from observation. Security is important not just for information protection, but for safety. IoT security is behind desktop/mobile security.

Threat models comprise of:

  • Secrecy/Confidentiality
  • Integrity
  • Authenticity
  • Availability

Basic cryptography can be done with one time pads, xoring a message with a known key to produce cyphertext. This is cheap, but the key should only be used once or parts of the message can be known. Also, the message and key size should be the same. A block cipher is more elaborate scheme, where the message and key can be different sizes and the same message can produce the different cipher text.

Asymmetric (public key) cryptography uses two keys, where a message can be encrypted with a public key and only decrypted with a private key. This utilises functions that are easy to calculate in one direction, but difficult to do in the other. This key exchange can still be vulnerable to man in the middle attacks, where a third parts intercepts the communication. Certificates register the public keys to help verify identity, preventing identity spoofing.

An authentication technique in cyber systems is through challenge-response authentication. This is where a party is required to prove their trustworthiness with a challenge and by checking the response. This can be extended to physical challenge response authentication. The challenge is from physical stimulus and checking the desired response.

notes/elen90066.txt · Last modified: 2023/05/30 22:32 by 127.0.0.1