STAT5040: Fourier Analysis of Time Series Ch8

Fall 2020

The Spectrum

Introduction

We focus on the mathematical aspect in previous chapters, e.g.,
- fitting sinusoids of known/unknown frequency, and
- the fast Fourier transform (FFT) algorithms.
We now discuss the statistical aspect with some real data sets.
- How to do spectral analysis of time series?
- How to deal with some practical issues such as change in scale?
- How to interpret the results from spectral analysis?
The data is available here.
- The link is not up-to-date in the book.
My code is available here.
- Modern libraries like xts are used.

Periodogram Analysis

Definitions recall

Discrete Fourier transform (Section 5.1): \[ d(f_j) = \frac{1}{n} \sum_{t=0}^{n-1} x_t \exp(-2 \pi i f_j t), \quad j=0,1,\ldots,n-1. \]
- Equivalent to fitting sinusoids (Section 4.1)
Periodogram (Section 6.1): \[ I(f) = n|d(f)|^2. \]
- An estimate of the spectrum/spectral density

Problem: change in scale

No sinusoids can match oscillations that grow in amplitude
- Possible cause: inflation?

Beveridge: “index of fluctuation”

Each value is divided by the average of 31 centered adjacent values
- Idea: smoothing

Motivation: unobserved components model

Proposed by Harvey (1989)
Bloomfield only considered the trend \(T_t\) and irregular component \(I_t\)
- Multiplicative model: \(x_t = T_t I_t\)
- Additive model: \(x_t = T_t +I_t\)
Interpretation for wheat price
- \(T_t\): driven by long-run economic forces such as inflation
- \(I_t\): caused by short-run effects such as changes in supply from year to year
- \(T_t\) is an unwanted complication in the analysis and approximated by 31-year moving average
- \(I_t\) is then estimated by the index of fluctuation
- Closely related to filtering in Section 7.2

Problem: spurious periodicities

Beveridge (1921) gave periodogram ordinates based on index of fluctuation
- It solved the change in scale problem in analyzing periodicity
- However, Slutsky pointed out that operations involving linear filtering might lead to spurious periodicities in 1927
- Intuitively, this means that transformation may distort the original periodicity
  - I try to replicate his estimates but it seems that additional cleanings are necessary (p.139)
  - As the original paper is not found online, I choose to discuss the idea only
Under multiplicative model, we can correct for the transfer function to mitigate spurious periodicities
- As also shown by Granger and Hughes (1971)
- For index of fluctuation, the transfer function is \(D_{31}(f)\), a Dirichlet kernel (Section 2.2)
- After correction, the peak changes as compared with Beveridge (1921)

(Part of) Beveridge’s periodogram

Logarithmic transformation

Alternatively, we can do logarithmic transformation under multiplicative model \[ \ln x_t = \ln T_t +\ln I_t \]
Interpretation
- \(\ln T_t\): the typical value of \(x_t\) in the neighborhood of \(t\)
- \(I_t\): a dimensionless quantity close to 1 so that \(\ln I_t\) fluctuates around 0
Vs index of fluctuation
- Nonsinusoidal behavior of a series introduce structure into its Fourier transform
- However, they are not revealed by the periodogram (Section 6.5)
- Logarithms can reduce this behavior such as spikiness of the peaks
  - Reasonable as we are interested in \(I_t\) but not the extreme behaviors

Logarithmic transformation

Data cleaning

Beveridge (1921) argued that the early part of the series was unreliable due to fewer sources
The later part was of a different nature due to economic changes in the 19th century
- I have not looked into wheat market history but it sounds reasonable
- We should also be aware of the underlying structure/data quality when we perform statistical analysis
After clearning, the periodogram of logarithms and index of fluctuation are similar

Data cleaning

Periodogram of logarithms (dots) and index of fluctuation (squares)

Analysis of segments: idea

Idea: if there is periodicity in a series, it should hold for segments of the same series
Beveridge (1921) also gave some terms from the periodogram of two halves of the series

Periodogram of logarithms of two halves (1545-1694=squares, 1695-1844=dots)

Analysis of segments: wheat price

The two periodograms have the same general shape
- Large at lowest frequencies and show a broad peak between 0.06 and 0.10 cycles per year
- Then show a gentle decline over the rest of the periodogram with small fluctuations
However, the fine structures are quite unrelated
- A local peak in one is just as likely to be matched by a local trough as a local peak in the other
These shows that the fine strucuture is not repeated from one segment to the next
- But the broad features show a statistical regularity or consistency across segments
  - I think these terms can be confusing in the literature
Thus the fine structure of these two periodograms is not characteristic of the series as a whole
- Broad features do not appear to vary in this way and may be characteristic of the whole series
- In contrast, the periodograms of the variable star data have similar fine structures

Analysis of segments: variable stars

Periodogram of two halves of the variable star series

Analysis of segments: conclusion

Bloomfield is trying to motivate the concept of spectrum here
- A raw periodogram may not be useful
  - For variable star, it is directly useful
  - For wheat price, the fine structure should be suppressed to focus on the broad behavior
- This suggests a smoothed periodogram
  - So that periodograms of segments may be regarded as the same underlying smooth curve
  - Such smooth curve is called the spectrum or spectral density of the series
  - Spectrum exists for many time series models (Chapter 9)

Spectrum Estimation

Bartlett’s method

Suggested by Bartlett (1948)

Split the series into \(k\) non-overlapping segments of similar length
For each segment, compute its periodogram
Average the result of the periodograms above for the \(k\) segments

Welch’s method: allow overlapping
- It can reduce noise in exchange for reducing the frequency resolution
- Often desirable in finite sample
Average of logarithms or logarithms of averages can both be used
- The latter is preferred since it puts more weight on larger values
- Small values are the most susceptible to perturbations such as leakage from other frequencies

Bartlett’s method

Average segment periodograms for the logarithms of the wheat prices, 1545-1844

Bartlett spectrum estimate

To understand the effect of Bartlett’s method, we try to find the “periodogram” that it is producing
For simplicity, Bloomfield assumed \(x_t\) are deviations around 0
- If they are not, we can do demean and arrive at a similar result
Now note that \[ \begin{aligned} I(f) &= n |d(f)|^2 = n \cdot d(f) \cdot \overline{d(f)} \\ &= \frac{1}{n} \sum_t \sum_{t'} x_t x_{t'} \exp\big\{ -2\pi i f(t-t') \big\}. \end{aligned} \]
As \(t-t' \in [-n+1, n-1] \cap \mathbb{Z}\), we can rewrite the above as \[ \begin{aligned} I(f) &= \frac{1}{n} \sum_{r=-n+1}^{n-1} \sum_{t-t'=r} x_t x_{t'} \exp(-2\pi i f r) \\ &= \frac{1}{n} \sum_{r=-n+1}^{n-1} \exp(-2\pi i f r) \sum_{t-t'=r} x_t x_{t'}. \end{aligned} \]

Bartlett spectrum estimate

We can see that the periodogram is itself a Fourier series with \(n^{-1} \sum_{t-t'=r} x_t x_{t'}\) as the coefficients
Some manipulation yields \[ I(f) = \sum_{|r|<n} c_r \exp(-2\pi i f r) \quad \textrm{with} \quad c_r = \left\{ \begin{array}{ll} n^{-1} \sum_{t=r}^{n-1} x_t x_{t-r}, & r \ge 0; \\ c_{-r}, & r < 0. \\ \end{array} \right. \]
- If you are familiar with time series, \(c_r\) is the sample autocovariance of \(\{x_t\}\) at lag \(r\)
  - Estimating the autocovariance structure is equivalent to estimating the spectrum
  - This also explains long run variance is the normalized spectrum at frequency 0
In light of the symmetry of the autocovariances, we can also write \[ I(f) = c_0 + \sum_{r=1}^{n-1} c_r \cos(2\pi f r). \]

Bartlett spectrum estimate

Suppose the data are divided into \(k\) non-overlapping segments of length \(m = n/k\)
- \(I_j(f)\): the periodogram of the \(j\)-th segment
- \(c_{j,r}\): the autocovariance of the \(j\)-th segment at lag \(r\)
The average of these periodogram is \[ \begin{aligned} \hat{s}(f) &= \frac{1}{k} \sum_{j=1}^k I_j(f) = \sum_{|r|<m} \left( \frac{1}{k} \sum_{j=1}^k c_{j,r} \right) \exp(-2\pi i f r) \\ &= \sum_{|r|<m} \left( 1 -\frac{|r|}{m} \right) \frac{1}{k(m-|r|)} \sum_{j=1}^k m c_{j,r} \exp(-2\pi i f r). \end{aligned} \]
Now \(\sum_{j=1}^k m c_{j,r}\) is like \(nc_r\), a sum of products of the form \(x_t x_{t+r}\)
- Except that the term is included only if \(x_t\) and \(x_{t+r}\) fall into the same segment
- So \(\{k(m-|r|)\}^{-1} \sum_{j=1}^k m c_{j,r}\) is the average of these products
- It makes sense to replace with \(nc_r/(n-|r|)\) (whole series version)

Bartlett spectrum estimate

Replacing \(\{k(m-|r|)\}^{-1} \sum_{j=1}^k m c_{j,r}\) with \(nc_r/(n-|r|)\), the modified version is \[ \hat{s}_{B}(f) = \sum_{|r|<m} \frac{1-|r|/m}{1-|r|/n} c_r \exp(-2\pi i f r) = \sum_{|r|<m} w_r c_r \exp(-2\pi i f r), \]
- where \(w_r = (1-|r|/m)/(1-|r|/n)\).
- \(\hat{s}_{B}(f)\) is known as the Bartlett spectrum estimate
- \(w_r\) is the weight at lag \(r\)
Vs periodogram
- All terms with \(|r| \ge m\) will be omitted
- The remaining terms are progressively reduced in magnitude by the weight \(w_r\)
- Both effects make \(\hat{s}_{B}(f)\) smoother than the periodogram
- Varying the truncation point \(m\) provides control over the degree of smoothness
  - \(m\) is also called the bandwidth
- However, \(\hat{s}_{B}(f)\) is not guaranteed to be positive unlike \(I(f)\)

Bartlett spectrum estimate

Bartlett spectrum estimates for the logarithms of the wheat prices, 1545-1844

Smoothing the periodogram

The Bartlett spectrum estimate motivates smoothing the periodogram with other data windows \(w_r\)
- A significant property is that \(w_r\) decays from 1 when \(r=0\) to 0 when \(r=\pm m\)
  - This is common but not necessary, at least for long run variance estimation
  - See, e.g., Andrews (1991)
- Anderson (1994) lists the most commonly used windows and their properties
- Bartlett window is the only window that decays linearly and rarely used in practice
  - I think Bartlett window is quite popular, at least for long run variance estimation as well
  - It is also computationally efficient for spectrum estimation. See Xiao and Wu (2011)
The smoothing strategy is attractive when the truncation point \(m\) is small
- Only \(m\) autocovariances need to be computed
- If \(m\) is large, the autocovariances can be computed efficiently using FFT and its inverse

Computing the autocovariances

From previous derivation (p.144), we know that the periodogram is itself a Fourier series
This raises the possibility of using FFT to obtain autocovariance estimates
When the periodogram is evaluated at a Fourier frequency \(f_j = j/n\), \[ I(f_j) = \sum_{r=0}^{n-1} (c_r +c_{n-r}) \exp(-2\pi i f_j r), \]
- provided that \(c_r\) is defined to be 0 for \(|r| \ge n\).
Therefore, we have \[ c_r +c_{n-r} = \frac{1}{n} \sum_{j=0}^{n-1} I(f_j) \exp(2\pi i f_j r), \]
- which may be computed using inverse FFT.
- However, \(c_{r}\) is not symmetric with \(c_{n-r}\)
- This computation is only fine for small \(r\) as giving approximations of \(c_r\)

Computing the autocovariances

The problem of \(c_r +c_{n-r}\) can be solved by using a finer grid \(f'_j=j/n'\) for some \(n' > n\)
We can pad the data with a block of \(n'-n\) zeros. Then \[ I(f'_j) = \sum_{r=0}^{n-1} (c_r +c_{n'-r}) \exp(-2\pi i f'_j r). \]
Inversion now gives \[ c_r +c_{n'-r} = \frac{1}{n'} \sum_{j=0}^{n'-1} I(f'_j) \exp(2\pi i f'_j r), \]
- which yields \(c_0, c_1, \ldots, c_{n'-n}\) exactly.
- If \(n'=2n-1\), all autocovariance estimates are obtained.
  - Note that \(I(f'_j)\) only sum to \(n-1\)
  - Probably why the zeros may not contaminate the estimates
- FFT costs \(O(n \log n)\) times, which is faster than brute force \(O(n^2)\)

Representation of a spectrum estiamte

Suppose a spectrum estimate is given by \(\hat{s}(f) = \sum_{|r|<n} w_r c_r \exp(-2\pi i f r)\),
- where a truncation point \(m\) is no longer assumed.
By the integral inversion formula (p.40), we have \(c_r = \int_0^1 I(f) \exp(2\pi i f r) df\).
- Then \(\hat{s}(f) = \int_0^1 W_n(f-f') I(f') df'\),
- where \(W_n(f) = \sum_{|r| < n} w_r \exp(-2 \pi i f r)\).
Therefore, any \(\hat{s}(f)\) of the above form may be written as an integral average of \(I(f)\)
- The \(W_n(f)\) is called the spectral window associated with the spectral estimate
  - There is one-to-one relationship between \(W_n(f)\) and \(w_r\)

Spectral window

The spectral window of the Bartlett estimate has no closed form
- The window is \(w_r = (1-|r|/m)/(1-|r|/n)\)
Modified Bartlett estimate
- The window is \(w_r = (1-|r|/m)\mathbb{I}_{|r|<m}\)
- The corresponding spectral window is \[ W_n(f) = \sum_{|r|<m} \left( 1 -\frac{|r|}{m} \right) \exp(-2\pi i f r) = m D_m(f)^2, \]
- where \(D_m(f)\) is the Dirichlet kernel (Section 2.2).
If the spectral window and periodogram are both nonnegative,
- then the spectrum estimate is guaranteed to be nonnegative.
  - Important in long run variance estimation

Spectral window

Spectral windows for the modified Bartlett spectrum estimates

Modified Bartlett spectrum estimate

The central peak in the spectral window of the modified Bartlett estimate is of height \(m\)
The first zeros on either side are at \(f=\pm 1/m\) cycles per unit time
However, a sizable proportion of the mass is contained in the sidelobes which decay slowly
The periodogram values at some distance from \(f\) may also contribute substantially to the integral
The estimated spectrum in one frequency may be swamped by leakage from another with high power
- Even when these bands are not adjacent
- Such leakage is different than the leakage in the periodogram itself
- The major source is the sidelobes in the smoothing spectral window
The sidelobes of the modified Bartlett window are larger and decay more slowly
- As compared with Anderson (1994)
- Thus it is rarely used
Note that sidelobes are bound to exist for any spectrum estimates with truncation point \(m\)

Another representation of a spectrum estiamte

From the representation \(\hat{s}(f) = \int_0^1 W_n(f-f') I(f') df'\), we have \[ \hat{s}(f) = \int_0^1 W(f') I(f-f') df', \]
- for suitable function \(W(f)\).
Furthermore, we can write \(w_r = \int_0^1 W(f) \exp(2\pi i f r) df\).
- Mathematically, \(w_r\) are the Fourier coefficients of \(W(f)\)
- \(W_n(f)\) is a partial sum of the Fourier series for \(W(f)\)
  - Recall \(W_n(f)\) is the spectral window
  - I understand this as constructing \(w_r\) reversely from the spectral window
Daniell (1946) suggested \(W(f) = (2\delta)^{-1} \mathbb{I}_{|f| < \delta}\).
- The resulting estimate is the integral analog of simple moving average filer (Section 7.2)
- It may be applied successively to build up more complex filter (p.157)
I will skip the remaining of Section 8.5 which discuss the computation of discrete \(\hat{s}(f)\)

Choice of a spectral window

Four factors need to be considered when choosing a spectral window:
- Resolution or bandwidth
- Stability
- Leakage
- Smoothness
Resolution: the ability of a spectrum estimate to represent fine structure in the frequency
- Such as narrow peaks in the spectrum
- A narrow peak in the periodogram is usually spread out into a broader peak in the spectrum
- This peak is roughly an image of the spectral window and its width is the bandwidth
- If the spectrum contains two close narrow peaks, they may overlap and form a single peak
- In this case, the estimate has failed to resolve the peaks
Stability: the extent to which estimates from different segments agree
- In other words, it is the ability to remove irrelevant fine structure
- Resolution and stability are conflicting requirements
  - Easy to see by definition
- Section 9.5 gives a statistical treatment of the stability of spectrum estimates

Choice of a spectral window

Leakage
- Caused by sidelobes in the spectral window
- Always exists if there is a nontrivial truncation point \(m < n\)
- The computationally simpler discrete spectral averages (Section 8.5) can avoid leakage entriely
  - The part that I skip
- First, use a data window to control leakage in the periodogram
- Second, use a spectral window of a desired compact form
Smoothness
- Less tangible but important in visualization
- The need for smoothness can introduce further conflict in choosing a window
- Bloomfield gave an example of Daniell estimate here
- Repeated smoothing is possible but yields a less stable estimate

Example: wheat price

Smoothed periodogram of logarithms of wheat price index, with spectral window inset (Modified Daniell filer, m=6.)

Example: wheat price

Smoothed periodogram of logarithms of wheat price index, with spectral window inset (Modified Daniell filer, m=6,12.)

Example: sunspot

For sunspot data, the spectral weights have the same span as the wheat price
- Which means they cover the same number of periodogram ordinates
- However, they are smoother and have a narrower peak
- The spectral estimates correspondingly show more rounded but slightly larger fluctuations
- This argument is made more precise in Chapter 9
For the square root transformation, refer to Section 6.7 for the idea
Further discussion of the choice of a spectral window available in Jenkins (1961) and Parzen (1961)
- I try to find a more recent survey but have not yet found one

Example: sunspot

Smoothed periodogram of yearly sunspot numbers (solid line) and their square roots (broken line), with spectral window inset (Modified Daniell filer, m=6,6,6.)

Reroughing the spectrum

Recall repeated smoothing may yield a less stable estimate
- This seems to be the case for the wheat price spectrum
- The trough between the two peaks at \(f=0\) and \(f=0.07\) is not as clear as it is before second smoothing
- Such loss of resolution suggests the possibility of oversmoothing
- Similar probelm appears in the sunspot spectra
- This motivates the idea of reroughing or twicing (Tukey, 1977)
In the context of linear filters, we have rough = input - output
Since spectra are nonnegative, we can instead define rough = input/output
To be specific, we can define the rough in spectrum estimation as \[ r(f) = \frac{I(f)}{\hat{s}(f)}. \]

Reroughing the spectrum

If \(\hat{s}(f)\) suffers from oversmoothing,
- there are narrow-band features in the periodogram that were not fully transferred to \(\hat{s}(f)\).
- They will appear partially in \(r(f)\), which can be extracted with another round of smoothing:

\[ \tilde{r}(f) = \sum_u g_u r(f-f_u). \] - The reroughted spectrum estimate is \(\hat{s}_r(f) = \tilde{r}(f) \hat{s}(f)\). - If the same filter is used in the second round as in the first, the process is called twicing

Example: wheat price

Twiced spectrum estiamte of wheat price index (solid line) and original esimtae (broken line) (Modified Daniell filer, m=6,12.)

Example: sunspot

Twiced spectrum estiamte of yearly sunspot numbers (solid line) and original esimtae (broken line) (Modified Daniell filer, m=6,6,6.)

Prewhitening

Reroughing is closely related to prewhitening (Blackman and Tukey, 1959)
Oversmoothing leads to leakage from frequency bands with high power to adjacent bands
- From this perspective, oversmoothing is caused by a large dynamic range in the spectrum
Prewhitening is a technique for reducing dynamic range prior to forming the periodogram
- It reduces the leakage and allows the use of a more stable estimate with lower resolution
- The simplest form of prewhitening is replacing the data by their first differences
  - I think differencing is also used as stationary transformation in practice
Vs reroughing
- Reroughing is an enhancement to spectrum estimation
- Prewhitening is a form of preprocessing

Concluding Remarks

Comments

Periodogram analysis
- Some practical problems and their solutions are covered
- The statistical regularity over different segments remains a concern
Spectrum estimation
- Smoothing the periodogram to focus on the broad behavior
- From Fourier transform form to autocovariance form
- Data windows and their corresponding spectral window
- Choice of spectral window
- Reroughing and prewhitening