======= ELEN90058 - Signal Processing =======
{{tag>notes unimelb electrical_engineering math}}

====== Introduction ======
Signals are used all over. When signals are read, they tend to be sampled, as digital processing is easier than analogue, so processing treats the signal as a discrete-time function. In sampling, noise can be introduced into the signal, distorting it randomly. The aim of signal processing is to remove the noise and recreate the original signal. This can be done by applying filters, such as a moving average. Filters that are linear time-invariant map sinusoids to sinusoids, and as such can be effective in removing added noise. A sinusoid in continuous time is of the form:
\[x(t)=A\cos(2\pi ft)+B\sin(2\pi ft)=\alpha\cos(2\pi ft+\theta)\]
The constants \(A,B,f,\alpha,\theta\) are unique. When sampled into discrete time, we get:
\[x[k]=A\cos(2\pi fk)+B\sin(2\pi fk)\]
This is not unique, as the frequency can always be shifted by an integer amount to produce the same sinusoid. This means we can always find a frequency between 0 and 1 for a given sinusoid.

===== Signals and systems =====
A system is a thing that takes a signal as an input and produces a signal at the output. A signal is a doubly infinite sequence of numbers. We write a system as:
\[y=L\{x\}\]
where L is some mathematical rule expressing each \(y[k]\) as a function of \(\dotsb,x[-2],x[-1],x[0],x[1],x[2],\dotsb\). We tend to define systems in terms of mathematical rules, as it makes them simpler to work with and means we don't need to enumerate all the values of the transformation. The definition can be finite or infinite. A linear system is one where:
\[L\{ax+by\}=aL\{x\}+bL\{y\}\]
for all possible inputs \(x\) and \(y\) and scalars \(a\) and \(b\). For systems with infinite impulse responses, we need to be careful not to provide input for which the output is undefined. To prove a system is linear, you need to prove it satisfies additivity and homogeneity (scaling). To prove a system is nonlinear, you need to provide signals for which linearity fails.

A system is time-invariant if the only effect of shifting the input in time is that the output is shifted by the same amount of time. First we define a system \(y=L\{x\}\), then we define \(v=L\{u\}\) where \(u[k]=x[k-d]\). The system is time-invariant if \(v[k]=y[k-d]\) for every delay \(d\).

While we work with real-valued signals, it can be easier to express them as complex exponentials:
\[y[k]=Ae^{j2\pi fk}\]

===== Signal processing hardware =====
In digital signal processing, we can use an FPGA, DSP, CPU or ASIC, and they all have their own strengths and weaknesses. The choice of which to use depends on the project requirements, costs, speeds, etc. Traditionally FPGAs are more energy efficient but have higher development costs, although better tools now make development faster. A DSP is a specialised CPU designed for signal processing; like a general-purpose CPU, it has a fetch-execute cycle and interrupts. The execution time of a DSP can be determined beforehand, and DSPs are designed for high throughput. As a result, DSPs suit real-time, high-throughput signal processing applications, but are not as general purpose.

===== LTI frequency responses =====
If a complex exponential (sinusoid) is fed into an LTI system, a complex exponential of the same frequency comes out. It can, however, be scaled. The coefficient of the exponential is the frequency response, \(H(\omega)\), where \(\omega\) is the input frequency.
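As an illustrative sketch (not from the notes; the filter length and test frequency are arbitrary choices), we can verify this numerically for a 3-point moving average by feeding \(e^{j\omega k}\) through the filter and reading off the complex scale factor:
<code c>
#include <stdio.h>
#include <complex.h>
#include <math.h>

/* Hypothetical example: 3-point moving average
 * y[k] = (x[k] + x[k-1] + x[k-2]) / 3.
 * For x[k] = e^{jwk} the output is H(w) e^{jwk} with
 * H(w) = (1 + e^{-jw} + e^{-j2w}) / 3. */
double complex freq_response(double w) {
    return (1.0 + cexp(-I * w) + cexp(-I * 2.0 * w)) / 3.0;
}

int main(void) {
    double w = 2.0 * M_PI / 16.0;  /* arbitrary test frequency */
    double complex H = freq_response(w);

    /* Apply the filter directly to e^{jwk} at an arbitrary k and
     * compare the ratio y[k]/x[k] against H(w). */
    int k = 10;
    double complex y = (cexp(I * w * k) + cexp(I * w * (k - 1))
                        + cexp(I * w * (k - 2))) / 3.0;
    double complex ratio = y / cexp(I * w * k);
    printf("H(w):      mag %.4f, phase %.4f\n", cabs(H), carg(H));
    printf("y[k]/x[k]: mag %.4f, phase %.4f\n", cabs(ratio), carg(ratio));
    return 0;
}
</code>
The two printed lines agree, illustrating the sinusoid-in, sinusoid-out property.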
As the frequency response is complex, it can be useful to plot its magnitude and phase. The frequency response is periodic with period \(2\pi\): the filter alters the magnitude and phase of \(e^{j(\omega+2\pi)k}\) the same as \(e^{j\omega k}\). This is because they are the same signal; sampled frequencies that differ by \(2\pi\) are indistinguishable, which is aliasing. We can use the frequency response to characterise the system and determine what will happen to a given signal passed through. A spectral null is where the frequency response equals 0, meaning a sinusoid at that frequency produces no output.

===== Fourier transform =====
Because it is easy to work with complex exponentials, we want to represent an input signal as a sum of complex exponentials. If the input is already such a sum, the work is already done; however, we cannot represent every signal as a finite sum of complex exponentials. So we use an infinite number of complex exponentials.
\[x[k]=\frac{1}{2\pi}\int_{-\pi}^\pi X(\omega)e^{j\omega k}d\omega\]
The \(X(\omega)\) term is found using the discrete-time Fourier transform (DTFT):
\[X(\omega)=\sum_{k=-\infty}^{\infty}x[k]e^{-j\omega k}\]
We note that \(X(\omega)\) is periodic with period \(2\pi\). With the transform of the input, we can get the transform of the output from the product of the input transform and the frequency response.
\[y[k]=\frac{1}{2\pi}\int_{-\pi}^\pi X(\omega)H(\omega)e^{j\omega k}d\omega\]
\[Y(\omega)=X(\omega)H(\omega)\]
We can recover \(x[k]\) with the inverse DTFT:
\[x[k]=\frac{1}{2\pi}\int_{-\pi}^\pi X(\omega)e^{j\omega k}d\omega\]
The inverse transform of the frequency response gives the impulse response:
\[h=\text{InvDTFT}\{H\}\]
This is the output when \(x=\delta\). We can find the output signal from the impulse response with convolution:
\[y[k]=(x\star h)[k]=\sum_{k'=-\infty}^\infty x[k']h[k-k']\]
The Fourier transform turns convolution into multiplication. We define the transform of the complex exponential as a shifted delta function (scaled so the inverse DTFT recovers the exponential):
\[\text{DTFT}\{e^{j\omega'k}\}=2\pi\delta(\omega-\omega'),\quad|\omega|\leq\pi\]

====== Filter design ======
In designing a filter, we choose a frequency response \(H(\omega)\), and design hardware to implement it. An ideal low-pass filter allows low frequencies to pass through unattenuated and blocks all high frequencies. The transition point is called the cut-off frequency.
\[H_{LP}(\omega)=\begin{cases}1,&|\omega|<\omega_c\\\frac{1}{2},&|\omega|=\omega_c\\0,&\omega_c<|\omega|\leq\pi\end{cases}\]
\[h_{LP}[k]=\begin{cases}\frac{\sin(\omega_ck)}{\pi k},&k\neq 0\\\frac{\omega_c}{\pi},&k=0\end{cases}\]
Trouble in designing a filter comes from incomplete access to the signal (part of the signal is still arriving in the future). Filter design is about approximating the ideal characteristics so the implementation becomes practical. A system whose output at time k depends only on inputs at or before time k is causal. If the output depends only on inputs after time k, the system is anti-causal. A system that is not causal is non-causal or acausal. All natural systems are causal, as we are not able to see into the future. Real filters must truncate the ideal impulse response, as it extends infinitely in time. The truncation causes ripples near the cut-off, called the Gibbs phenomenon.

====== Sample rate ======
Choosing a sample rate can have knock-on effects, so it is a non-trivial task. If we sample a 3 kHz tone at 48 kHz, we get a sampled sinusoid with a frequency of 1/16 cycles per sample. If we sample at 96 kHz, the sampled frequency is 1/32. This sampled frequency depends on the sample rate and is called the normalised frequency.
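As a small sketch of this relationship (reusing the 3 kHz / 48 kHz example; everything else is an arbitrary choice), the code below computes the normalised frequency and shows that normalised frequencies \(f\) and \(f+1\) produce identical samples:
<code c>
#include <stdio.h>
#include <math.h>

int main(void) {
    /* Example values from the text: 3 kHz tone sampled at 48 kHz. */
    double f_tone = 3000.0, f_s = 48000.0;
    double f = f_tone / f_s;  /* normalised frequency = 1/16 */
    printf("normalised frequency = %.4f\n", f);

    /* Samples at normalised frequencies f and f + 1 coincide. */
    for (int k = 0; k < 4; k++)
        printf("k=%d: %.4f vs %.4f\n", k,
               cos(2.0 * M_PI * f * k),
               cos(2.0 * M_PI * (f + 1.0) * k));
    return 0;
}
</code>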
Every normalised frequency lies between 0 and 1/2, with the angular frequency between 0 and \(\pi\). We choose complex exponential frequencies to lie between -1/2 and 1/2; equivalent signals can be found with trigonometric identities. If the normalised frequency is outside the range 0 to 1/2, we can subtract integers (and negate, since \(\cos(-\theta)=\cos\theta\)) until it lies in the range. As an alternative, we can treat the time index as corresponding to an amount of time and work with the real frequency, but it turns out to be easier to use the normalised frequency. When processing real-world signals, real-valued signals are the norm. It can sometimes be useful to use complex-valued signals though, such as in wireless communication networks. Because the sample rate determines the normalised-to-real frequency relationship, we need to store it in order to decode the signal properly. When processing sampled signals, we need to adjust the cut-off frequencies to work with the normalised frequencies.

The ideal ADC implements \(x[k]=x(kT)\), but a DAC is a bit more complex. Because the ADC implements a many-to-one mapping, the DAC is unable to perfectly invert it. This is due to high-frequency signals being aliased to lower frequencies, e.g. a 2 kHz and a 10 kHz tone produce the same samples at an 8 kHz sample rate. As such, an ideal DAC will interpolate \(y[k]\) by the unique sinusoid \(y(t)\) with a frequency between 0 Hz and half the sample rate. For this reason, the sample rate should be chosen to be above double the highest frequency. The ideal DAC is linear, and produces \(y(t)=e^{j2\pi ft}\) when given \(y[k]=e^{j2\pi fk}\) with \(|f|\leq\frac{1}{2}\). That is, the DAC produces the lowest-frequency output consistent with the given data points. The ideal DAC must observe the whole input sequence, so we can never implement an ideal DAC. Mathematically, a DAC with an arbitrary sample rate can be constructed from a DAC with a sample rate of 1 Hz by altering the time base.

The Nyquist rate is the minimum sampling rate needed to preserve the signal after sampling and reconstruction. It is equal to double the highest frequency present in the signal, although we tend to sample higher to leave a margin for error. In reality, a large number of sampling frequencies are used depending on the use case. Lower sample rates can be useful when dealing with data limits, and higher sample rates can be useful when dealing with jitter and technical faults in the ADC.

Changing the sample rate produces a new signal. Increasing the sample rate is upsampling and decreasing the sample rate is downsampling. Combining integer upsampling and downsampling lets us change the rate by any rational factor of the original rate. Downsampling by a factor of N means we set \(x_2[k]=x_1[kN]\). This causes the DTFT to become:
\[X_2(\omega)=\frac{1}{N}\sum_{n=0}^{N-1}X_1\left(\frac{\omega-2\pi n}{N}\right)\]
If we apply this to a signal with high frequencies, we can see aliasing corrupt our signal. In upsampling, the Nyquist criterion is automatically fulfilled. Given a low-pass filter \(H_{LP}(\omega)\) with an angular cut-off of \(\pi/N\), the new signal is:
\[X_2(\omega)=NH_{LP}(\omega)X_1(N\omega)\]
While downsampling can be performed online, an ideal upsampler cannot, due to the low-pass filter. Upsampling and digital-to-analogue conversion are also similar operations. To upsample by an integer amount, we add zeros between the existing values. We then apply a low-pass filter to the modified signal. The stage of inserting the 0s is called expansion; the application of the LPF and the scaling is called interpolation.
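A minimal sketch of these two stages for N=2 (the 4-sample input and the 3-tap triangular low-pass, which already includes the scaling by N, are arbitrary toy choices far from the ideal filter):
<code c>
#include <stdio.h>

#define IN_LEN  4
#define N       2               /* upsampling factor */
#define OUT_LEN (IN_LEN * N)

int main(void) {
    double x[IN_LEN] = {1.0, 2.0, 3.0, 4.0};
    double e[OUT_LEN] = {0};    /* expanded signal */
    double y[OUT_LEN] = {0};    /* interpolated signal */

    /* Expansion: insert N-1 zeros between samples. */
    for (int k = 0; k < IN_LEN; k++)
        e[k * N] = x[k];

    /* Interpolation: toy 3-tap triangular low-pass, scaled by N.
     * This performs linear interpolation (with a one-sample delay). */
    double h[3] = {0.5, 1.0, 0.5};
    for (int k = 0; k < OUT_LEN; k++)
        for (int m = 0; m < 3; m++)
            if (k - m >= 0)
                y[k] += h[m] * e[k - m];

    for (int k = 0; k < OUT_LEN; k++)
        printf("y[%d] = %.2f\n", k, y[k]);
    return 0;
}
</code>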
As such, the LPF used can be called an interpolation filter. We can easily use a finite impulse response (FIR) filter. We can also use a recursive filter (infinite impulse response, IIR):
\[y[k]=\sum_{m=1}^M\alpha_my[k-m]+x[k]\]
We can also chain an IIR filter with a FIR filter. As an ideal LPF can't be implemented, we need to trade off speed and accuracy in our filter design.

The DTFT of the product of two signals is the convolution of their DTFTs.
\[\text{DTFT}\{w[k]\sin(2\pi k/16)\}=\text{DTFT}\{w[k]\}\star \text{DTFT}\{\sin(2\pi k/16)\}\]
\[(W\star X)(\omega)=\frac{1}{2\pi}\int_{-\pi}^\pi W(\nu)X(\omega-\nu)d\nu\]
The convolution is restricted, becoming a circular convolution. The convolution has a smoothing effect on the spectrum. When sampling over a finite region of time, we are multiplying by a rectangular pulse, causing a sinc function to appear in the DTFT. Taking more cycles uses a longer rectangular pulse, causing a narrower sinc function and thus less distortion in the DTFT. Changing the windowing function from a rectangular pulse to something else can make the resulting DTFT nicer, reducing the distortion from the convolved sinc function.

====== Block based processing ======
Filters can need past information about the input signal, so we need to store parts of the signal. The input signal can be arbitrarily large, making it impractical to store all of it. As a result, we store blocks of the signal and continuously shift what is stored so that we only keep the parts we need. In C this is done with global or static variables, or structs. In object-oriented languages like C++ and MATLAB, we do this with objects. The objects store a buffer of the past signal, and have a step function to apply to each block. This step-and-store repeats while there is an input signal, with the incoming signal being moved into the buffer, replacing the old stored signal.

====== ARMA filter ======
An ARMA filter is an Auto-Regressive Moving Average filter. The AR part is an IIR filter; the MA part is a FIR filter. Moving average filters can have spectral nulls, but need many terms to have sharp spectral peaks. Auto-regressive filters are of the form:
\[y[k]=\alpha_1y[k-1]+...+\alpha_Ny[k-N]+x[k]\]
This is a difference equation and can be studied with a generating function.

===== Fibonacci Sequence =====
The Fibonacci numbers are defined by the difference equation:
\[F_n=F_{n-1}+F_{n-2}\]
We initialise it with \(F_0=0\) and \(F_1=1\). As computing it recursively is expensive, we want a function that finds the nth number non-recursively. We associate the sequence with the formal power series (generating function):
\[g(x)=\sum_{n=0}^\infty F_nx^n=F_0+F_1x+F_2x^2+F_3x^3+...\]
If we can find another expression for \(g(x)\), we may be able to use that expression to find the coefficients of \(g(x)\), which are the values of \(F_n\). We can write:
\[g(x)=x+\sum_{n=2}^\infty F_nx^n=x+\sum_{n=2}^\infty F_{n-1}x^n+\sum_{n=2}^\infty F_{n-2}x^n=x+xg(x)+x^2g(x)\]
\[g(x)=\frac{x}{1-x-x^2}\]
If we can write \((1-x-x^2)^{-1}\) as a power series in x, we are set. Note that \((1-x)^{-1}=1+x+x^2+x^3+\ldots\), suggesting we use a partial fraction expansion.
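To set up the expansion, factor the denominator by finding the roots of \(1-x-x^2=0\) with the quadratic formula (a step skipped above):
\[x^2+x-1=0\implies x=\frac{-1\pm\sqrt{5}}{2}\]
\[1-x-x^2=-(x+\lambda_-)(x+\lambda_+),\quad\lambda_\pm=\frac{1\pm\sqrt{5}}{2}\]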
\[g(x)=\frac{-x}{(x+\lambda_-)(x+\lambda_+)}=\frac{A}{x+\lambda_-}+\frac{B}{x+\lambda_+},\quad\lambda_\pm=\frac{1\pm\sqrt{5}}{2}\]
\[g(x)=\frac{1}{\sqrt{5}}\left(\frac{\lambda_-}{x+\lambda_-}-\frac{\lambda_+}{x+\lambda_+}\right)\]
We use the identity \(\frac{-\alpha^{-1}}{x-\alpha^{-1}}=\frac{1}{1-\alpha x}=\sum_{n=0}^\infty\alpha^nx^n\) to find:
\[\frac{\lambda_-}{x+\lambda_-}=\sum_{n=0}^\infty\lambda_+^nx^n,\quad\frac{\lambda_+}{x+\lambda_+}=\sum_{n=0}^\infty\lambda_-^nx^n\]
So we get:
\[g(x)=\sum_{n=0}^\infty\frac{1}{\sqrt{5}}(\lambda_+^n-\lambda_-^n)x^n\]
Giving the final answer as:
\[F_n=\frac{1}{\sqrt{5}}(\lambda_+^n-\lambda_-^n)\]

==== Faster method ====
We guess that solutions are of the form \(F_n=\lambda^n\).
\[\lambda^n=\lambda^{n-1}+\lambda^{n-2}\]
\[\lambda^2=\lambda+1\]
Solving this gives two solutions, \(\lambda_\pm=\frac{1\pm\sqrt{5}}{2}\), which direct substitution shows are solutions. Linearity implies \(F_n=\alpha\lambda_-^n+\beta\lambda_+^n\) is also a solution. Given initial values, we can fix \(\alpha\) and \(\beta\).

===== AR(1) systems =====
A recursive system with a single feedback term is an AR(1) system, e.g.:
\[y[k]=\alpha y[k-1]+x[k]\]
It can be applied recursively given an initial condition. A homogeneous solution is one where \(x[k]=0,\forall k\). Solutions to the equation are the sum of a particular solution and a solution to the homogeneous equation (due to linearity). These can be calculated forward and backward in time, which can cause problems in the math. We create some generating functions:
\[X(z)=\sum_{k=-\infty}^\infty x[k]z^{-k}\]
This is a double-sided power series, which can be problematic. \(z^{-1}\) is called the unit delay operator, because:
\[Y(z)=z^{-1}X(z)\implies y[k]=x[k-1]\]
These generating functions are the z-transform. When we take the z-transform, we can lose some solutions. This is because a formal double-sided power series doesn't exist, so we use a formal Laurent series (which can only have a finite number of non-zero \(x[k],k<0\)). Because we want to work with doubly infinite sequences, we must treat \(\sum_{k=-\infty}^\infty x[k]z^{-k}\) as a genuine function. This means we define a region of convergence (ROC) for z, wherein the sequence is absolutely summable. The sequence is absolutely summable when:
\[\sum_{k=-\infty}^\infty |x[k]z^{-k}|<\infty\]
This depends only on \(|z|\). The ROC is an annular region of the form \(\{z\in\mathbb{C}:r_-<|z|<r_+\}\). For a causal system, the ROC is the exterior of a circle, \(|z|>r\), where \(r\) is the magnitude of the largest pole. The DTFT exists, and the system is stable, when the ROC includes the unit circle. Hence stability of a causal system requires all poles to be inside the unit circle.

==== DTFT vs Z-transform ====
The Z-transform has more features (poles, zeros, ROC) than the DTFT. As such, if the DTFT exists, the Z-transform is an extension of it, but the converse does not hold. The locations of the poles and zeros describe the frequency response nicely.

===== Phase response =====
The phase response is \(\angle H(\omega)\). This can matter when filtering in parallel, as the summed outputs could interfere destructively, cancelling the output signal. When filtering a signal, we need to decide whether the time or frequency domain is the primary concern. In communications, the receiver must recognise the pulse shape, which can be significantly altered by the phase response. A phase shift that grows linearly with frequency corresponds to a pure delay of the signal. A delay also has ambiguity due to the \(2\pi\) periodicity of phase. At frequency \(\omega'\), a sinusoid is delayed by \(-\frac{\angle H(\omega')}{\omega'}\) samples.
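As a quick check of this formula (a standard example, not from the notes), consider the pure delay \(y[k]=x[k-d]\), which has frequency response \(H(\omega)=e^{-j\omega d}\):
\[-\frac{\angle H(\omega)}{\omega}=-\frac{-\omega d}{\omega}=d\]
Every frequency component is delayed by the same \(d\) samples.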
If all frequency components are delayed by the same amount, the signal shape is not altered. An all-pass filter has a magnitude response of 1 but a phase delay, and results in delaying the overall signal. A single system can be decomposed into subsystems that multiply to produce the overall system. By analysing the subsystems, we can build a better overall understanding of the system. An example is decomposing a system into a phase (all-pass) part and a magnitude part. Ideally a filter would have a linear phase response; however, this results in a more complex filter, so some applications prefer a shorter filter without a linear phase response.

==== Narrow-band signals ====
A narrow-band signal is one whose frequency content is (approximately) contained in a narrow band of frequencies. Applying a signal to a much higher-frequency carrier produces a narrow-band signal. Putting this through a filter produces a group delay \(\tau_g=-\left(\frac{d\angle H(\omega)}{d\omega}\right)_{\omega=\omega'}\) and a phase delay \(\tau_p=-\frac{\theta}{\omega'}\). The envelope is delayed by the group delay and the carrier wave is delayed by the phase delay. The phase delay is defined on a continuous interval created by "unwrapping" \(\angle H(\omega)\) to \(\Theta(\omega)\). The output of a narrow-band signal \(w[k]\cos(\omega'k)\) through a filter is:
\[y[k]=|H(\omega')|w[k-\tau_g]\cos(\omega'(k-\tau_p))\]
\[\tau_p=-\frac{\Theta(\omega')}{\omega'}\]
\[\tau_g=-\left(\frac{d\Theta(\omega)}{d\omega}\right)_{\omega=\omega'}\]

===== All pass filters =====
The transfer function of an ARMA filter is rational, and if its magnitude for a complex exponential input is 1, then \(H(z)H(z^{-1})=1\) for \(|z|=1\). For an all-pass filter, the rational transfer function must therefore satisfy:
\[H(z)H(z^{-1})=\frac{B(z)B(z^{-1})}{A(z)A(z^{-1})}=1,\quad|z|=1\]
The denominator has zeros at \(a_1,...,a_n\) and at \(a_1^{-1},...,a_n^{-1}\), which are the poles of the product. For the product to equal 1, the zeros of \(B(z)B(z^{-1})\) must be in the same locations as the poles of \(A(z)A(z^{-1})\). This means \(b_i\) is either \(a_i\) or \(a_i^{-1}\). If \(A(z)\) and \(B(z)\) have a common zero, it cancels out, leaving the trivial solution \(H(z)=1\). Hence for all interesting transfer functions, the poles and zeros will be reciprocals, so \(B(z)\) must be a delayed version of \(A(z^{-1})\). The overall filter becomes:
\[H(z)=z^{-N}\frac{A(z^{-1})}{A(z)}\]
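As a sketch of the simplest case (the pole location is an arbitrary choice), taking \(N=1\) and \(A(z)=1-az^{-1}\) gives the first-order all-pass \(H(z)=\frac{-a+z^{-1}}{1-az^{-1}}\), implemented below as a difference equation:
<code c>
#include <stdio.h>

/* First-order all-pass sketch: H(z) = (-a + z^-1) / (1 - a z^-1),
 * i.e. y[k] = a*y[k-1] - a*x[k] + x[k-1].
 * The pole location a = 0.5 is an arbitrary choice with |a| < 1. */
#define LEN 8

int main(void) {
    double a = 0.5;
    double x[LEN] = {1, 0, 0, 0, 0, 0, 0, 0};  /* unit impulse */
    double y[LEN] = {0};

    for (int k = 0; k < LEN; k++) {
        double y1 = (k > 0) ? y[k - 1] : 0.0;
        double x1 = (k > 0) ? x[k - 1] : 0.0;
        y[k] = a * y1 - a * x[k] + x1;
        printf("h[%d] = %+.4f\n", k, y[k]);    /* impulse response */
    }
    return 0;
}
</code>
By Parseval's theorem, the squared impulse response sums to 1, consistent with a unit magnitude response.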
==== Minimum, maximum and mixed phase ====
Given a system \(H(z)\) with a zero at \(b\), take an all-pass filter \(H_2(z)\) with a pole at \(b\) and a zero at \(b^{-1}\). We construct a new system \(\tilde{H}(z)=H(z)H_2(z)\), which has magnitude response equal to that of \(H(z)\), meaning we can replace a zero by its reciprocal without affecting the magnitude response of the filter. A minimum phase filter is one with all its zeros inside the unit circle. A maximum phase filter is one with all its zeros outside the unit circle. A mixed phase filter has zeros inside and outside the unit circle. The minimum phase filter has the smallest group delay, so it is often the default choice. Sometimes a filter is needed that distorts the least in a given region, so a more carefully chosen filter is needed. As the poles determine the ROC, and the ROC must include the unit circle for stability, the poles (unlike the zeros) cannot be reflected outside the unit circle.

==== Stability ====
Systems with poles can resonate. A system is bounded-input bounded-output (BIBO) stable if for all \(c\) there exists \(C\) such that:
\[|x[k]|\leq c,\forall k\implies|y[k]|\leq C,\forall k\]
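To illustrate (a toy sketch; the pole values are arbitrary), drive the AR(1) system \(y[k]=\alpha y[k-1]+x[k]\) with the bounded input \(x[k]=1\): a pole inside the unit circle settles to a bounded output, while a pole outside grows without bound.
<code c>
#include <stdio.h>

/* Run y[k] = a*y[k-1] + x[k] with the bounded input x[k] = 1. */
static double ar1_run(double a, int steps) {
    double y = 0.0;
    for (int k = 0; k < steps; k++)
        y = a * y + 1.0;
    return y;
}

int main(void) {
    /* Pole inside the unit circle: converges to 1/(1-a) = 10. */
    printf("a = 0.9: y after 100 steps = %.2f\n", ar1_run(0.9, 100));
    /* Pole outside the unit circle: grows without bound. */
    printf("a = 1.1: y after 100 steps = %.2e\n", ar1_run(1.1, 100));
    return 0;
}
</code>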