Differentiation

Why differentiation is hard

Network identification by deconvolution (NID) comprises three numerically delicate stages: differentiation, deconvolution, and the Foster-to-Cauer transform. The first two are extremely sensitive to noise; if the derivative is biased or noisy, all subsequent steps degrade.

A recorded step response \(a(t)\) must be converted into its impulse response \(h(t)=\partial a/\partial t\). Finite-difference formulas amplify every morsel of measurement noise, while wide averaging windows suppress noise and small, rapid transients. This compromise is the bias–variance trade-off. In practice, measuring clean \(Z_\mathrm{th}\) curves greatly reduces the noise problem, but it is still necessary to balance the two extremes.

Logarithmic timeline effects

After converting to the logarithmic axis

(13)\[z = \ln t ,\]

samples are far apart at early times and crammed together later. Sparse regions suffer most from differentiation errors.

Simple differentiation schemes

Straight-line fit over a short window.
Second-order Savitzky–Golay filter over a longer window.

Both still require an informed choice of window length.

Optimal regression filtering

The implementation adopted here replaces ad-hoc windows by an adaptive LOWESS/SURE filter that simultaneously smooths the logarithmic step response and provides its derivative.

Basic idea

Model the samples

(14)\[x_i \;=\; s_i \;+\; w_i,\]

where \(s_i\) is the unknown true signal at \(z_i\) and \(w_i\) is white noise with variance \(\sigma^2\).
Local regression Around each target position \(z_i\) take the subset \(\mathbf{x}\) that falls inside a window of logarithmic length \(L\). Fit a low-order polynomial (first order performs best in practice) using tricubic weights

(15)\[\begin{split}w(\Delta z) \;=\; \begin{cases} \bigl(1 - |\Delta z|^3/L^3\bigr)^3, & |\Delta z| < L;\\[4pt] 0, & \text{otherwise}. \end{cases}\end{split}\]

Additional weights compensate for local sample density so that sparse early data are not undervalued.
Derivative The slope of the fitted line at \(z_i\) furnishes \(h(z_i)=\mathrm{d}a/\mathrm{d}z\).

Choosing the window length

For each location the statistical risk

(16)\[\mathcal{R}_i \;=\; \mathbb{E}\Bigl[(f_i(\mathbf{x})-s_i)^2\Bigr]\]

measures the mean-square error of the estimator \(f_i\). Because \(s_i\) is unknown, the risk is estimated with Stein’s unbiased risk estimate (SURE):

(17)\[\widehat{\mathcal{R}}_i \;=\; f_i^2 - 2f_i x_i + 2\sigma^2 \frac{\partial f_i}{\partial x_i} \;+\; \text{constant}.\]

The additive constant does not depend on the window, so minimising \(\widehat{\mathcal{R}}_i\) over \(L\) yields the optimal trade-off. A practical algorithm:

Define a reasonable interval \(L\in[L_{-},L_{+}]\).
Pick an initial \(L_0\) at the first grid point \(z'_0\), or find an optimal \(L_0\) by minimising \(\widehat{\mathcal{R}}_0\) over \(L\).
At each subsequent regularised position \(z'_i\)
- evaluate the risk for \(L-\Delta L,\;L,\;L+\Delta L\),
- move \(L\) towards the lowest-risk candidate.

This greedy update avoids erratic jumps and accelerates computation. Because a uniform \(z'\)-grid is created on the fly, the data are resampled, smoothed, and differentiated in a single pass—perfectly suited for the fast Fourier and Bayesian steps that follow.