fitpack
Modern Fortran library for curve and surface fitting with splines
Curve Fitting Theory

This page describes the mathematical foundations of the curve fitting algorithms in FITPACK: given data points \( (x_i, y_i) \) for \( i = 1, \ldots, m \), find a spline \( s(x) \) in the B-spline space \( S(k, \mathbf{t}) \) that represents the data well. The library supports several formulations of this problem, from simple least-squares with user-supplied knots to fully automatic smoothing with adaptive knot placement.

The Approximation Problem

Given \( m \) data points \( (x_i, y_i) \) with strictly increasing abscissae

\[ x_1 < x_2 < \cdots < x_m \]

and optional positive weights \( w_i \), the goal is to find a spline \( s \in S(k, \mathbf{t}) \) of degree \( k \) on a knot vector \( \mathbf{t} \) that approximates the data. Three common formulations arise:

  • Interpolation. Require \( s(x_i) = y_i \) for every data point. The interpolant is uniquely determined when \( n = m \) (the number of B-spline coefficients, i.e. knots minus degree minus one, equals the number of data points) and the knots satisfy the Schoenberg–Whitney conditions. No smoothing is applied.
  • Least-squares with fixed knots. Given a prescribed knot vector, find the B-spline coefficients \( c_j \) that minimize the weighted residual sum of squares. The number and placement of knots are chosen by the user.
  • Smoothing. Let the algorithm choose the knot vector automatically, balancing closeness-of-fit against smoothness of the spline. This is FITPACK's primary mode of operation.
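The three formulations can be tried side by side through scipy.interpolate, whose splrep wraps the original Fortran curfit routine (a Python illustration only; the Modern Fortran API of this library differs):

```python
import numpy as np
from scipy.interpolate import splrep, splev

x = np.linspace(0.0, 4.0, 25)
y = np.exp(-x) * np.sin(2.0 * x)

# 1. Interpolation: s = 0 forces s(x_i) = y_i at every data point.
tck_interp = splrep(x, y, k=3, s=0)

# 2. Least-squares with fixed knots: task=-1 with user-chosen interior knots.
t_interior = np.array([1.0, 2.0, 3.0])
tck_lsq = splrep(x, y, k=3, task=-1, t=t_interior)

# 3. Smoothing: s > 0 lets the routine place knots adaptively.
tck_smooth = splrep(x, y, k=3, s=1e-3)

# The interpolant reproduces the data to rounding error.
print(np.max(np.abs(splev(x, tck_interp) - y)))
```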
See also
Dierckx, Ch. 4–5 (pp. 49–103)

Least-Squares with Fixed Knots

Suppose a knot vector \( \mathbf{t} = (t_1, \ldots, t_{n+k+1}) \) is given. The spline has the B-spline representation

\[ s(x) = \sum_{j=1}^{n} c_j \, B_j(x; k, \mathbf{t}) \]

where \( n \) is the number of B-spline basis functions and \( c_j \) are the unknown coefficients. The least-squares problem is to minimize

\[ \sum_{i=1}^{m} w_i^2 \left( y_i - \sum_{j=1}^{n} c_j \, B_j(x_i) \right)^2 \]

over all coefficient vectors \( \mathbf{c} \in \mathbb{R}^n \).

Normal Equations and Banded Structure

Setting the gradient to zero yields the normal equations \( A^T W^2 A \, \mathbf{c} = A^T W^2 \mathbf{y} \), where \( A_{ij} = B_j(x_i) \) is the \( m \times n \) collocation matrix and \( W = \mathrm{diag}(w_i) \). Because each B-spline \( B_j \) is nonzero on at most \( k+1 \) knot spans, the matrix \( A \) has at most \( k+1 \) nonzero entries per row. The resulting normal equations matrix \( A^T W^2 A \) is symmetric, positive semi-definite, and banded with bandwidth \( k+1 \).
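This sparsity is easy to verify numerically. The sketch below uses scipy's BSpline.design_matrix (an illustration with scipy, not this library's API) to build the collocation matrix for a clamped cubic knot vector:

```python
import numpy as np
from scipy.interpolate import BSpline

k = 3                                    # cubic B-splines
t = np.concatenate(([0.0] * (k + 1),     # clamped knot vector
                    [0.25, 0.5, 0.75],
                    [1.0] * (k + 1)))
x = np.linspace(0.0, 1.0, 50)

# Collocation matrix A with A[i, j] = B_j(x_i), returned in sparse CSR form.
A = BSpline.design_matrix(x, t, k)

# At any x at most k+1 basis functions are nonzero, so each row of A has
# at most k+1 stored entries, and A^T A vanishes for |i - j| > k.
nnz_per_row = np.diff(A.indptr)
print(nnz_per_row.max())
```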

QR Factorization via Givens Rotations

Rather than forming and solving the normal equations directly (which squares the condition number), FITPACK solves the least-squares problem by computing a QR factorization of the weighted matrix \( W A \) using Givens rotations. Each data point \( (x_i, y_i) \) contributes one row to the system; Givens rotations annihilate entries below the diagonal one at a time, maintaining the banded structure throughout.

This approach has several advantages:

  • Numerical stability. The condition number of the triangular factor \( R \) equals \( \kappa(WA) \), not \( \kappa(WA)^2 \).
  • Efficiency. Each rotation touches only \( \mathcal{O}(k) \) elements, giving \( \mathcal{O}(m k^2) \) total work.
  • Incremental updates. Adding or removing knots requires only local updates to the factorization.

The back-substitution step solves the upper triangular system \( R \mathbf{c} = Q^T W \mathbf{y} \) for the coefficients.
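The row-wise process can be sketched in a few lines. This is a schematic dense version of what FITPACK's fpgivs/fprota pair does while exploiting the band structure, not the actual implementation:

```python
import numpy as np

def givens_lstsq(A, b):
    """Solve min ||A c - b||_2 by folding rows into R via Givens rotations."""
    m, n = A.shape
    R = np.zeros((n, n))      # upper-triangular factor, built incrementally
    z = np.zeros(n)           # transformed right-hand side Q^T b (first n rows)
    for i in range(m):
        row, rhs = A[i].astype(float).copy(), float(b[i])
        for j in range(n):
            if row[j] == 0.0:
                continue                       # banded rows skip zero entries
            r = np.hypot(R[j, j], row[j])      # rotation annihilating row[j]
            cs, sn = R[j, j] / r, row[j] / r
            R[j, j:], row[j:] = (cs * R[j, j:] + sn * row[j:],
                                 -sn * R[j, j:] + cs * row[j:])
            z[j], rhs = cs * z[j] + sn * rhs, -sn * z[j] + cs * rhs
    return np.linalg.solve(R, z)               # back-substitution R c = Q^T b

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)
c = givens_lstsq(A, b)
```

Weights enter by passing \( WA \) and \( W\mathbf{y} \); in the banded setting each inner loop touches only \( k+1 \) columns instead of all \( n \).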

See also
Dierckx, Ch. 4, §4.2 (pp. 53–62)

The Smoothing Problem

Instead of fixing the knot vector a priori, the smoothing formulation treats the number and positions of knots as unknowns. The problem is to find a spline \( s \in S(k, \mathbf{t}) \) that minimizes a roughness measure subject to a constraint on the residual:

\[ \min_{s \in S(k, \mathbf{t})} \int_{x_1}^{x_m} \left[ s''(x) \right]^2 \, dx \quad \text{subject to} \quad \sum_{i=1}^{m} w_i^2 \left( y_i - s(x_i) \right)^2 \leq S \]

Here \( S \geq 0 \) is the smoothing parameter that controls the trade-off:

  • \( S = 0 \). The constraint forces interpolation. The spline passes through every data point, and the roughness integral is minimized among all interpolating splines.
  • Small \( S \). The spline stays close to the data but may oscillate between data points, requiring many knots.
  • Large \( S \). The spline is very smooth (few knots) but may miss important features in the data.

For a given \( S \), the theoretical solution is a natural spline of degree \( 2k + 1 \) with knots at the data sites. FITPACK approximates this solution using a spline of degree \( k \) on a much smaller knot set, determined adaptively.

See also
Dierckx, Ch. 5, §5.2 (pp. 67–84)

Adaptive Knot Placement

FITPACK's curfit routine implements an iterative strategy to determine both the number and positions of knots:

  1. Initialize. Start with the minimal knot vector containing \( 2(k+1) \) knots: \( k+1 \) knots stacked at each end of the data range.
  2. Solve. Compute the least-squares spline for the current knot vector using Givens rotations. Let \( f_p = \sum w_i^2 (y_i - s(x_i))^2 \) be the weighted residual sum of squares.
  3. Test convergence. If \( f_p \leq S \), the smoothing constraint is satisfied and the algorithm terminates.
  4. Add knots. Identify the knot interval \( [t_j, t_{j+1}) \) with the largest contribution to the residual. Insert a new knot at the data abscissa in that interval where the local residual is greatest.
  5. Iterate. Return to step 2 with the augmented knot vector.
  6. Smoothing parameter update. Once enough knots are available, the algorithm uses rational interpolation (Dierckx's fprati) to refine the Lagrange multiplier \( p \) that connects the unconstrained minimization

    \[ \eta(p) = p \sum_{i=1}^{m} w_i^2 (y_i - s_p(x_i))^2 + (1 - p) \int [s_p''(x)]^2 \, dx \]

    to the constrained formulation. This avoids solving many full least-squares problems and typically converges in a few iterations.

The procedure is designed to produce a spline with as few knots as possible while satisfying the smoothing constraint \( f_p \leq S \).
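The effect of the loop can be observed through scipy's splrep, which wraps curfit (used here purely for illustration): as \( S \) shrinks, more knots are inserted, and the returned fit satisfies \( f_p \leq S \) up to curfit's relative tolerance.

```python
import numpy as np
from scipy.interpolate import splrep, splev

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = np.sin(x) + 0.05 * rng.standard_normal(x.size)

results = {}
for s in (50.0, 5.0, 0.5, 0.05):
    t, c, k = splrep(x, y, k=3, s=s)
    # Weighted residual sum of squares (unit weights here).
    fp = float(np.sum((y - splev(x, (t, c, k))) ** 2))
    results[s] = (len(t), fp)
    print(f"S = {s:5.2f}: {len(t):2d} knots, fp = {fp:.4f}")
```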

See also
Dierckx, Ch. 5, §5.2.4 (pp. 78–84)

Choosing the Smoothing Factor

Selecting an appropriate value of \( S \) is the most important practical decision in spline smoothing. The following guidelines apply:

  • Known noise variance. If the data satisfy \( y_i = g(x_i) + \varepsilon_i \) where \( \varepsilon_i \) are independent errors with variance \( \sigma^2 \) and the weights are \( w_i = 1 \), the expected value of the residual sum of squares for the true function is \( m \sigma^2 \). A natural choice is

    \[ S \approx m \, \sigma^2 \]

    More generally, for non-uniform weights, \( S \approx \sum w_i^2 \sigma_i^2 \).

  • Unit weights, well-scaled data. When \( w_i = 1 \) and no noise estimate is available, \( S = m \) is a reasonable starting point.
  • Over-fitting (\( S \) too small). The spline chases noise in the data, introducing spurious oscillations and using too many knots. The number of knots grows until the maximum is reached, and the fit may become numerically ill-conditioned.
  • Under-fitting (\( S \) too large). The spline is too smooth and fails to capture genuine features. The knot count stays near the minimum \( 2(k+1) \), and the residuals show systematic patterns.
  • Diagnostic. After fitting, inspect the mean squared error via curve%mse(). If the residual is much larger than the expected noise level, decrease \( S \); if the spline oscillates visibly, increase \( S \).

A practical workflow is to try a sequence of \( S \) values (e.g., on a logarithmic scale) and select the one that gives the best balance between smoothness and fidelity, or use cross-validation techniques.
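A sketch of the \( S \approx m\sigma^2 \) rule on synthetic data with known noise level (scipy's splrep, which wraps curfit, stands in for the Fortran API here):

```python
import numpy as np
from scipy.interpolate import splrep, splev

rng = np.random.default_rng(42)
m, sigma = 200, 0.1
x = np.linspace(0.0, 2.0 * np.pi, m)
g = np.sin(3.0 * x)                       # "true" underlying function
y = g + sigma * rng.standard_normal(m)    # noisy samples, unit weights

# Smoothing factor from the rule of thumb S ~ m * sigma^2.
tck = splrep(x, y, k=3, s=m * sigma**2)

fit = splev(x, tck)
print(f"RMS(fit  - truth) = {np.sqrt(np.mean((fit - g) ** 2)):.4f}")
print(f"RMS(data - truth) = {np.sqrt(np.mean((y - g) ** 2)):.4f}")
```

With this choice the smoothed spline should recover the underlying function noticeably better than the raw data does.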

See also
Dierckx, Ch. 5, §5.2.4 (pp. 78–84)

Periodic Splines

When the data represent a function that is periodic on \( [x_1, x_m] \) (for example, angular measurements as a function of time), the fitted spline should satisfy periodicity constraints:

\[ s^{(j)}(x_1) = s^{(j)}(x_m), \quad j = 0, 1, \ldots, k-1 \]

These \( k \) conditions ensure that the spline and its first \( k-1 \) derivatives match at the endpoints, producing a globally smooth periodic function.

In terms of the B-spline representation, periodicity is enforced by wrapping the knot vector and tying the first and last \( k \) coefficients:

\[ c_j = c_{n - k + j}, \quad j = 1, \ldots, k \]

This reduces the number of free coefficients from \( n \) to \( n - k \). The modified least-squares problem is solved with the same Givens rotation approach.

FITPACK's percur routine implements automatic-knot periodic curve fitting. The corresponding type is fitpack_periodic_curve.
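The endpoint matching can be verified numerically; in this illustration scipy's splrep(..., per=1), which calls percur underneath, stands in for the Fortran interface:

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Periodic data on [0, 2*pi]; the last abscissa closes the period.
x = np.linspace(0.0, 2.0 * np.pi, 33)
y = np.sin(x) + 0.3 * np.cos(2.0 * x)

tck = splrep(x, y, k=3, s=0, per=1)   # periodic interpolating cubic spline

# The spline and its first k-1 = 2 derivatives match at the period ends.
for der in range(3):
    print(der, float(splev(x[0], tck, der=der)),
               float(splev(x[-1], tck, der=der)))
```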

See also
Dierckx, Ch. 6, §6.1 (pp. 105–114)

Parametric Curves

A parametric curve in \( \mathbb{R}^d \) is defined by \( d \) coordinate functions of a common parameter \( u \):

\[ \mathbf{x}(u) = \bigl( s_1(u), \, s_2(u), \, \ldots, \, s_d(u) \bigr) \]

Given data points \( \mathbf{x}_i \in \mathbb{R}^d \) for \( i = 1, \ldots, m \), the first step is to assign parameter values \( u_i \).

Chord-Length Parameterization

The default choice is cumulative chord length:

\[ u_1 = 0, \quad u_{i+1} = u_i + \| \mathbf{x}_{i+1} - \mathbf{x}_i \|, \quad i = 1, \ldots, m-1 \]

This makes parameter increments proportional to the chord lengths between successive data points, so the parameterization approximates arc length and avoids the speed distortion that uniform parameter spacing produces when the points are unevenly spaced.
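In code, the cumulative chord lengths amount to one prefix sum (plain NumPy, independent of the library):

```python
import numpy as np

# Data points in R^2, one point per row of an (m, d) array.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [3.0, 1.0]])

# u_1 = 0, u_{i+1} = u_i + ||x_{i+1} - x_i||.
chords = np.linalg.norm(np.diff(pts, axis=0), axis=1)
u = np.concatenate(([0.0], np.cumsum(chords)))
print(u)  # [0. 1. 2. 4.]
```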

Shared Knot Vector

All \( d \) component splines \( s_j(u) \) share the same knot vector \( \mathbf{t} \) and degree \( k \). The smoothing problem becomes: minimize

\[ \sum_{j=1}^{d} \int \left[ s_j''(u) \right]^2 \, du \quad \text{subject to} \quad \sum_{i=1}^{m} \sum_{j=1}^{d} w_i^2 \left( x_{ij} - s_j(u_i) \right)^2 \leq S \]

FITPACK's parcur routine handles this by solving \( d \) coupled least-squares problems with a common knot vector.
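scipy's splprep, a wrapper around parcur, shows the shared-knot structure directly (a Python illustration, not this library's interface): it returns a single knot vector together with one coefficient array per component.

```python
import numpy as np
from scipy.interpolate import splprep, splev

theta = np.linspace(0.0, np.pi, 40)
x, y = np.cos(theta), 0.5 * np.sin(theta)    # half an ellipse in R^2

# One knot vector t, degree k, and d = 2 coefficient arrays in c.
(t, c, k), u = splprep([x, y], k=3, s=1e-4)
print(len(t), len(c), len(c[0]), len(c[1]))

# Total squared residual over both components stays within the constraint.
xe, ye = splev(u, (t, c, k))
res = float(np.sum((xe - x) ** 2 + (ye - y) ** 2))
print(res)
```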

Closed Curves

For closed parametric curves, the spline must satisfy periodic boundary conditions in every component:

\[ s_j^{(l)}(u_1) = s_j^{(l)}(u_m), \quad l = 0, 1, \ldots, k-1, \quad j = 1, \ldots, d \]

This is implemented by clocur (type fitpack_closed_curve).

Endpoint Constraints

In some applications, the tangent vector or higher derivatives at the curve endpoints are prescribed. FITPACK's concur routine supports derivative constraints of the form:

\[ s_j^{(l)}(u_1) = \alpha_{jl}, \quad s_j^{(l)}(u_m) = \beta_{jl} \]

for selected orders \( l \) and components \( j \), implemented by fitpack_constrained_curve.

See also
Dierckx, Ch. 9 (pp. 199–228)

Convexity-Constrained Fitting

In many applications the fitted curve must be shape-preserving: convex data should produce a convex spline, concave data a concave spline. FITPACK supports local convexity constraints via the second derivative.

B-Spline Coefficient Conditions

A spline \( s(x) \) of degree \( k \geq 2 \) is convex on an interval if and only if \( s''(x) \geq 0 \) there. Since \( s'' \) is itself a spline of degree \( k - 2 \) whose B-spline coefficients are second-order divided differences of the \( c_j \), convexity translates into linear inequality conditions on the coefficients. For uniformly spaced knots these reduce to plain second differences:

\[ \Delta^2 c_j = c_{j+2} - 2 c_{j+1} + c_j \geq 0 \]

for all \( j \) whose corresponding B-splines overlap the interval; for general knots each difference is weighted by the local knot spacings. Concavity requires \( \Delta^2 c_j \leq 0 \).
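A small numerical check of the coefficient condition (a scipy illustration on a uniform knot vector, not FITPACK's concon): a coefficient sequence with positive second differences yields \( s'' \geq 0 \) wherever the full basis is active.

```python
import numpy as np
from scipy.interpolate import BSpline

k, n = 3, 8                               # cubic spline with n coefficients
t = np.arange(n + k + 1, dtype=float)     # uniform knot vector
c = (np.arange(n) - 4.0) ** 2             # convex sequence: Delta^2 c_j = 2 > 0

spl = BSpline(t, c, k)
xx = np.linspace(t[k], t[n], 400)         # interval spanned by the full basis
d2 = spl.derivative(2)(xx)
print(d2.min())                           # nonnegative: the spline is convex here
```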

Per-Interval Convexity Flags

FITPACK allows the user to specify convexity constraints independently for each data interval via a flag vector \( v_i \):

| Flag value | Meaning |
|---|---|
| \( +1 \) | The spline must be concave on \( [x_i, x_{i+1}] \) |
| \( -1 \) | The spline must be convex on \( [x_i, x_{i+1}] \) |
| \( 0 \) | No constraint (the spline is free) |

Quadratic Programming Formulation

The constrained fitting problem becomes a quadratic program (QP): minimize the roughness integral (or weighted residual) subject to the linear inequality constraints on \( \Delta^2 c_j \) and the smoothing constraint \( f_p \leq S \). FITPACK solves this QP iteratively, combining the adaptive knot strategy with active-set management for the convexity constraints.

Two routines are provided:

  • concon (type fitpack_convex_curve): automatic knot placement with convexity constraints. The algorithm adds knots as in curfit but enforces the shape constraints at each step.
  • cocosp: fixed knot vector with convexity constraints. The user supplies the knots and the routine solves the constrained least-squares problem.
See also
Dierckx, Ch. 8, §8.3–8.4 (pp. 173–196)

Summary

| Formulation | Routine | Type | Key parameter |
|---|---|---|---|
| Automatic smoothing | curfit | fitpack_curve | Smoothing factor \( S \) |
| Interpolation | curfit | fitpack_curve | \( S = 0 \) |
| Fixed-knot least-squares | curfit | fitpack_curve | User knot vector |
| Periodic smoothing | percur | fitpack_periodic_curve | Smoothing factor \( S \) |
| Open parametric | parcur | fitpack_parametric_curve | Smoothing factor \( S \) |
| Closed parametric | clocur | fitpack_closed_curve | Smoothing factor \( S \) |
| Endpoint-constrained | concur | fitpack_constrained_curve | Derivative values |
| Convex (auto knots) | concon | fitpack_convex_curve | Convexity flags \( v_i \) |
| Convex (fixed knots) | cocosp | | Convexity flags \( v_i \) |
See also
B-Spline Foundations
Univariate Curve Fitting with fitpack_curve
Book Reference Index