DPLL — Digital Phase-Locked Loop

Overview

A fully synchronous, parameterizable Digital Phase-Locked Loop implemented in synthesizable Verilog. The design closes the carrier-recovery loop entirely in the digital domain using a fixed-point CORDIC engine; no floating-point arithmetic appears anywhere in the datapath. All angular quantities are in Q2.30 format throughout.

The DPLL handles carrier frequency offsets up to the acquisition limit together with static phase offsets, for any initial phase from 0° to 180°. Lock is declared only when the frequency and phase error criteria are satisfied at the same time. This dual-lock detector eliminates the class of false-lock bugs in which the integrator stabilises at the wrong frequency. Steady-state freq_adj error is on the order of 10⁻⁷ rad/sample across all tested conditions (at most 1.60×10⁻⁷ in the directed tests), which is attributed to Q2.30 register quantisation. The acquisition range extends to 0.040 rad/sample, bounded by the ±0.1 rad/sample anti-windup clamp.

Architecture

Signal Flow

ref_i / ref_q  (Q1.30, unit amplitude complex phasor)
        │
        ▼
  [ref_i_hold / ref_q_hold]   latched at valid_in
        │                     stable during CORDIC compute (ITER+3 cycles)
        │
        ▼                        ┌──────────────────────────────────────┐
  [cordic_nco] ◄── phase_inc     │  phase_acc  +=  phase_inc            │
        │       ◄── freq_adj     │               +  freq_adj            │
        │       ◄── phase_adj    │               +  phase_adj (one-shot) │
        │            (delayed)   │  CORDIC rotation  →  cos / sin        │
        │                        └──────────────────────────────────────┘
        │  nco_i / nco_q  (Q1.14)
        ▼
  [dpll_phase_det]  ◄── ref_i_hold / ref_q_hold
        │
        │  cross  =  Im{ conj(nco) × ref }  (Q2.30)
        │          =  (nco_i × ref_q  −  nco_q × ref_i)  >>  14
        ▼
  [dpll_loop_filter]   PI filter
        │
        │  integrator  +=  KI × cross        →  freq_adj   (persistent)
        │  phase_adj    =  KP × cross                       (one-shot)
        │
        │  freq_locked  :  |Δfreq_adj|  <  0.001 rad/samp  for 64 samples
        │  phase_locked :  |phase_err|  <  sin(5°) × 2³⁰   for 64 samples
        │  locked       :  freq_locked  AND  phase_locked
        ▼
  [adj_valid delayed 4 cycles]  ──►  phase_adj_en  in  cordic_nco

Numerically Controlled Oscillator — `cordic_nco`

The NCO maintains a 64-bit signed phase accumulator. A 33-bit accumulator would technically hold ±π in Q2.30, but TWO_PI = 6.75×10⁹ exceeds the signed 33-bit range (4.3×10⁹), making wrap arithmetic unsafe at the extremes. The 64-bit accumulator eliminates all overflow risk at negligible synthesis cost on modern FPGA targets.
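The range argument above can be checked with plain integer arithmetic. The following is a small Python sketch, not part of the RTL; `FRAC_BITS`, `PI_Q` and the helper `fits_signed` are names introduced here for illustration.

```python
import math

FRAC_BITS = 30                                      # fractional bits of the Q2.30 angle format
PI_Q = round(math.pi * (1 << FRAC_BITS))            # pi as a Q2.30 integer
TWO_PI = round(2 * math.pi * (1 << FRAC_BITS))      # 2*pi as a Q2.30 integer

def fits_signed(value: int, bits: int) -> bool:
    """True if value is representable as a two's-complement integer of `bits` bits."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1

print("TWO_PI =", TWO_PI)
print("pi fits in 33 bits:   ", fits_signed(PI_Q, 33))
print("2*pi fits in 33 bits: ", fits_signed(TWO_PI, 33))
print("2*pi fits in 64 bits: ", fits_signed(TWO_PI, 64))
```

Any wrap step that adds or subtracts TWO_PI therefore needs more than 33 signed bits for its constant and intermediate result, which is the motivation for the wide accumulator.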

Phase wrapping to (−π, π] is performed combinationally before each CORDIC invocation. The NCO then folds the wrapped phase further into (−π/2, π/2] to stay within the CORDIC convergence domain. A quadrant-negation flag is registered and applied to the output to recover correct cosine and sine for all four quadrants.
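The wrap-then-fold sequence can be sketched in a few lines of Python. This is a floating-point illustration under stated assumptions, not the RTL: `math.cos` and `math.sin` stand in for the CORDIC rotation, and the function names are hypothetical.

```python
import math

def wrap_pi(phase: float) -> float:
    """Wrap an angle in radians into (-pi, pi]."""
    wrapped = math.fmod(phase, 2 * math.pi)    # result lies in (-2*pi, 2*pi)
    if wrapped > math.pi:
        wrapped -= 2 * math.pi
    elif wrapped <= -math.pi:
        wrapped += 2 * math.pi
    return wrapped

def fold_half_pi(phase: float):
    """Fold a wrapped angle into (-pi/2, pi/2]; return (folded, negate_flag)."""
    if phase > math.pi / 2:
        return phase - math.pi, True           # second quadrant
    if phase <= -math.pi / 2:
        return phase + math.pi, True           # third quadrant
    return phase, False

def nco_cos_sin(phase: float):
    """cos/sin of any angle using only a rotation valid on (-pi/2, pi/2]."""
    folded, negate = fold_half_pi(wrap_pi(phase))
    c, s = math.cos(folded), math.sin(folded)  # stand-in for the CORDIC rotation
    return (-c, -s) if negate else (c, s)

print(nco_cos_sin(2.5), (math.cos(2.5), math.sin(2.5)))
```

Shifting the angle by π negates both cosine and sine, so a single flag is enough to restore the correct outputs in all four quadrants.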

A design-critical choice: the CORDIC is launched on the next phase accumulator value, computed combinationally, rather than the registered previous value. Without this look-ahead, the NCO introduces a one-sample lag of phase_inc per clock. Over a long sequence under a nonzero frequency offset, this lag accumulates in the loop integrator as a constant residual error, driving the phase trajectory toward a fold boundary and producing false equilibria.

Phase adjustments from the loop filter are latched separately and consumed atomically on the next NCO enable pulse, preventing a race between the feedback write and the accumulator read.

The NCO outputs cos/sin in Q1.14 (right-shifted by 16 from the full Q1.30 CORDIC output). This truncation keeps all phase detector partial products within 64 bits without additional scaling.

Phase Detector — `dpll_phase_det`

Computes the imaginary part of conj(nco) × ref:

cross  =  (ref_q × nco_i  −  ref_i × nco_q)  >>  14

A positive cross product means ref leads nco — the loop speeds up the NCO, closing the feedback.

The shift of 14 (not 16) is deliberate. ref is Q1.30 (amplitude = 2³⁰) and nco is Q1.14 (amplitude = 2¹⁴). Their product is Q2.44. Shifting right by 14 produces Q2.30, normalized so that cross = 2³⁰ when sin(error) = 1. Shifting by 16 instead gives Q2.28 — output 4× too small — which causes the integrator to settle at a false nonzero freq_adj rather than converging to zero. This was the hardest bug to diagnose: the loop appeared locked, locked asserted, but freq_adj was offset from the correct value by a constant proportional to frequency offset.
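The scaling argument can be reproduced with a short integer model. The sketch below is Python rather than RTL; `to_fixed` and `cross_product` are illustrative names, and the operands are ideal unit-amplitude phasors rather than CORDIC outputs.

```python
import math

REF_FRAC = 30    # ref_i / ref_q are Q1.30
NCO_FRAC = 14    # nco_i / nco_q are Q1.14

def to_fixed(x: float, frac_bits: int) -> int:
    """Quantise a real value to an integer with `frac_bits` fractional bits."""
    return int(round(x * (1 << frac_bits)))

def cross_product(ref_phase: float, nco_phase: float, mul_sh: int) -> int:
    """Im{conj(nco) * ref}, scaled down by a right shift of mul_sh bits."""
    ref_i = to_fixed(math.cos(ref_phase), REF_FRAC)
    ref_q = to_fixed(math.sin(ref_phase), REF_FRAC)
    nco_i = to_fixed(math.cos(nco_phase), NCO_FRAC)
    nco_q = to_fixed(math.sin(nco_phase), NCO_FRAC)
    return (ref_q * nco_i - ref_i * nco_q) >> mul_sh

# 90 degree phase error, so sin(error) = 1
full = cross_product(math.pi / 2, 0.0, mul_sh=14)
small = cross_product(math.pi / 2, 0.0, mul_sh=16)
print(full == 1 << 30, small == 1 << 28)
```

With a shift of 14 the detector output at sin(error) = 1 equals 2³⁰, the Q2.30 representation of 1.0. With a shift of 16 it equals 2²⁸, so every downstream gain is effectively divided by four.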

PD_MODE=0 (default): direct cross-product, 1-cycle latency. Correct sign for all |error| < 180°. Used for the free-running DPLL.

PD_MODE=1: full atan2 via CORDIC vectoring. Not recommended for free-running DPLL — CORDIC preconditioning negates both x and y when x < 0, making 0° and 180° indistinguishable and causing false lock at ±180°. Suitable for decision-directed loops (Costas) where symbol decisions resolve the ambiguity.

Loop Filter — `dpll_loop_filter`

Standard second-order PI structure:

integrator  +=  KI × phase_err      →  freq_adj   (eliminates steady-state freq error)
phase_adj    =  KP × phase_err                     (fast one-shot phase correction)

Both multiplications sign-extend their 32-bit operands to 64 bits before multiplying. Without this, Verilog sizes a 32×32 product to 32 bits, truncating the result before the right-shift — ki_term and kp_term are always approximately zero, and the integrator never moves. This was the first major bug encountered: the loop filter silently did nothing.
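The effect of the truncated product can be emulated in Python, whose integers are unbounded, by masking the product to a chosen width. This is a sketch of the arithmetic only: the right-shift of 30 bits is an assumption (it returns a Q2.30 × Q2.30 product to Q2.30), and the helper names are hypothetical.

```python
FRAC_BITS = 30   # assumed post-multiply shift

def to_signed(value: int, bits: int) -> int:
    """Keep the low `bits` bits of value and interpret them as two's complement."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def mul_truncated_32(a: int, b: int) -> int:
    """Product sized to 32 bits before the shift (the buggy behaviour)."""
    return to_signed(a * b, 32) >> FRAC_BITS

def mul_extended_64(a: int, b: int) -> int:
    """Operands widened so the full product survives until the shift."""
    return to_signed(a * b, 64) >> FRAC_BITS

KI = 107374                              # 0.0001 in Q2.30
phase_err = round(0.1 * (1 << 30))       # sin(error) = 0.1 in Q2.30

print("32-bit product:", mul_truncated_32(KI, phase_err))
print("64-bit product:", mul_extended_64(KI, phase_err))
```

After a 30-bit shift, a 32-bit product has only two bits left, so the truncated result can only take values from −2 to 1. The widened product yields about 0.1 × KI, as intended.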

Anti-windup. The integrator is clamped to ±0.1 rad/sample (±107374182 in Q2.30). This limits maximum frequency correction to the expected acquisition range and prevents wild integrator excursions during cold start. The original ±π clamp allowed the integrator to wander far past the correct value and settle in a false equilibrium where the oscillating phase error averaged to zero over full phase rotations.

Dual lock detection. Two independent counters run simultaneously:

Signal	Condition	Threshold
`freq_locked`	integrator has stabilised	`|Δfreq_adj| < 0.001 rad/sample` for 64 consecutive samples
`phase_locked`	phase error is small	`|phase_err| < sin(5°) × 2³⁰` for 64 consecutive samples
`locked`	true synchronisation	`freq_locked AND phase_locked`

This eliminates the false lock class where freq_locked fires because the integrator stops changing, while the NCO is actually running at the wrong frequency and the phase error oscillates through full cycles.
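The two-counter scheme can be modelled as follows. This is a behavioural Python sketch, not the RTL: the class and attribute names are hypothetical, and resetting a counter to zero on any out-of-threshold sample is an assumption based on the "consecutive samples" wording.

```python
import math

FRAC_BITS = 30
LOCK_COUNT = 64
FREQ_THRESH = round(0.001 * (1 << FRAC_BITS))                        # |delta freq_adj| limit
PHASE_THRESH = round(math.sin(math.radians(5.0)) * (1 << FRAC_BITS))  # |phase_err| limit

class DualLockDetector:
    """Two independent consecutive-sample counters; locked is their AND."""

    def __init__(self):
        self.prev_freq_adj = 0
        self.freq_count = 0
        self.phase_count = 0

    def update(self, freq_adj: int, phase_err: int):
        freq_ok = abs(freq_adj - self.prev_freq_adj) < FREQ_THRESH
        phase_ok = abs(phase_err) < PHASE_THRESH
        self.prev_freq_adj = freq_adj
        # Saturating counters that restart whenever a criterion is violated.
        self.freq_count = min(self.freq_count + 1, LOCK_COUNT) if freq_ok else 0
        self.phase_count = min(self.phase_count + 1, LOCK_COUNT) if phase_ok else 0
        freq_locked = self.freq_count >= LOCK_COUNT
        phase_locked = self.phase_count >= LOCK_COUNT
        return freq_locked, phase_locked, freq_locked and phase_locked

# Stalled integrator with a phase error that swings widely: frequency lock only.
det = DualLockDetector()
for n in range(100):
    state = det.update(1000, (1 << 29) if n % 2 else -(1 << 29))
print(state)
```

In this scenario `freq_locked` asserts because the integrator is not moving, while `locked` stays low because the phase error never remains inside the 5° window.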

CORDIC Core — `cordic_core`

Iterative shift-add CORDIC with configurable width and iteration count. Supports rotation mode (compute cos/sin of a given angle) and vectoring mode (compute atan2 of a vector). Operates with ITER clock cycles per result, controlled by a start/busy/valid handshake.

Vectoring mode pre-conditions the input: if x_init is negative, both x and y are negated before iteration to fold the input into the right half-plane (CORDIC convergence domain). The atan lookup table holds atan(2⁻ⁱ) for i = 0..ITER−1 in Q2.30, auto-generated alongside cordic_params.vh and cordic_consts.vh.
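Rotation mode can be illustrated with an integer model of the shift-add iteration. The Python sketch below makes several assumptions that are not confirmed by the source: the CORDIC gain is compensated by preloading x with 1/K, the table is built at import time rather than read from the generated headers, and all names are illustrative.

```python
import math

ITER = 16
FRAC_BITS = 30
ONE = 1 << FRAC_BITS

# atan(2^-i) for i = 0..ITER-1, stored with 30 fractional bits
ATAN_LUT = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITER)]

# Accumulated CORDIC gain K; x starts at 1/K so the outputs have unit amplitude (assumed scheme)
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITER))
X_INIT = round(ONE / GAIN)

def cordic_rotate(angle_q: int):
    """Return (cos, sin) with 30 fractional bits for an angle inside the convergence domain."""
    x, y, z = X_INIT, 0, angle_q
    for i in range(ITER):
        if z >= 0:   # rotate counter-clockwise, drive the residual angle z toward zero
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_LUT[i]
        else:        # rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_LUT[i]
    return x, y

cos_q, sin_q = cordic_rotate(round(0.3 * ONE))
print(cos_q / ONE, math.cos(0.3))
print(sin_q / ONE, math.sin(0.3))
```

With 16 iterations the residual angle is bounded by roughly atan(2⁻¹⁵) ≈ 3×10⁻⁵ rad, which sets the accuracy of the outputs in this model.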

Top Level — `dpll`

Wires all submodules together and handles two timing concerns:

ref_i_hold / ref_q_hold. The reference phasor is latched at valid_in and held stable. The NCO takes ITER + 3 cycles to produce its output; without the hold registers, the phase detector would compare the NCO output against the wrong reference sample.

adj_valid 4-cycle delay. adj_valid fires 2 cycles after nco_valid. In back-to-back operation, the next valid_in arrives immediately after nco_valid. Without the delay, adj_valid and en arrive at the NCO simultaneously — the phase_adj latch guard discards the correction silently every single sample. Delaying adj_valid by 4 cycles (negligible relative to ITER=16) ensures phase_adj is safely latched before the next NCO computation begins.

freq_adj is not delayed — it is a persistent wire applied every NCO cycle unconditionally. Only phase_adj (one-shot proportional correction) needs the delayed enable.

Fixed-Point Format Summary

Signal	Format	Max Value	Notes
`ref_i`, `ref_q`	Q1.30	±1.0	Unit amplitude input phasor
`phase_inc`, `freq_adj`, `phase_adj`	Q2.30	±π	Angular quantities, rad/sample
`nco_i`, `nco_q`	Q1.14	±1.0	After `>> 16` from CORDIC Q1.30 output
`phase_err`	Q2.30	±1.0	`sin(error)`, normalized by `MUL_SH=14`
Integrator	64-bit	±0.1 × 2³⁰	Anti-windup clamped
CORDIC internal	Q1.30	—	Matches WIDTH=32, ITER=16

Parameters

Parameter	Default	Float Equiv.	Description
`WIDTH`	32	—	Datapath width, set via `cordic_params.vh`
`ITER`	16	—	CORDIC iterations (precision vs. latency)
`PD_MODE`	0	—	0 = cross-product PD, 1 = atan2 PD
`KP`	15182709	0.01414	Proportional gain: `2ζωn × 2³⁰`
`KI`	107374	0.0001	Integral gain: `ωn² × 2³⁰`
`LOCK_COUNT`	64	—	Consecutive samples required for lock

Gains are derived from standard second-order PLL design with natural frequency ωn = 0.01 rad/sample and damping ratio ζ = 0.707:

KP  =  2 × ζ × ωn  =  0.01414  →  15182709  in Q2.30
KI  =  ωn²          =  0.0001   →  107374    in Q2.30
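The conversion from loop parameters to register constants can be reproduced directly. The Python sketch below uses the formulas stated above; the function names are illustrative.

```python
FRAC_BITS = 30

def loop_gains(omega_n: float, zeta: float):
    """Second-order PI loop gains: KP = 2*zeta*omega_n, KI = omega_n^2."""
    return 2.0 * zeta * omega_n, omega_n ** 2

def to_q30(x: float) -> int:
    """Quantise a real gain to an integer with 30 fractional bits."""
    return round(x * (1 << FRAC_BITS))

kp, ki = loop_gains(omega_n=0.01, zeta=0.707)
print(f"KP = {kp:.5f} -> {to_q30(kp)}")
print(f"KI = {ki:.5f} -> {to_q30(ki)}")
```

Note that the integer 15182709 corresponds to KP = 0.01414; the value 0.014 printed in the testbench banner is the same gain rounded to three decimals.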

Directed Tests

Five test vectors, each running 2000 samples. Pass criterion: locked == true AND |freq_adj − (ref_freq − nom_freq)| < 0.0001 rad/sample.

Test	Nom Freq	Ref Freq	Init Phase	Expected `freq_adj`	Lock Sample	`freq_adj` Error
Ideal	0.2	0.2	0.0 rad	0.000000	149	8.48×10⁻⁸
Phase +0.5 rad	0.2	0.2	+0.5 rad	0.000000	78	4.38×10⁻⁸
Freq +0.005	0.2	0.205	0.0 rad	+0.005000	437	1.03×10⁻⁷
Freq +0.015	0.2	0.215	0.0 rad	+0.015000	478	1.60×10⁻⁷
Combined +0.003/+0.3	0.2	0.203	+0.3 rad	+0.003000	350	4.23×10⁻⁸

============================================================
  DPLL Testbench  WIDTH=32  ITER=16
  KP=0.014  KI=0.0001  LOCK_COUNT=64
  Pass: locked && |freq_adj_err| < 0.0001 rad/s
============================================================

  [Ideal                    ]  locked=True  fl=True  pl=True  lock_samp=  149  freq_adj=-0.000000  exp=+0.000000  err=8.48e-08  PASS
  [Phase +0.5 rad           ]  locked=True  fl=True  pl=True  lock_samp=   78  freq_adj=-0.000000  exp=+0.000000  err=4.38e-08  PASS
  [Freq +0.005              ]  locked=True  fl=True  pl=True  lock_samp=  437  freq_adj=+0.005000  exp=+0.005000  err=1.03e-07  PASS
  [Freq +0.015              ]  locked=True  fl=True  pl=True  lock_samp=  478  freq_adj=+0.015000  exp=+0.015000  err=1.60e-07  PASS
  [Combined +0.003/0.3      ]  locked=True  fl=True  pl=True  lock_samp=  350  freq_adj=+0.003000  exp=+0.003000  err=4.23e-08  PASS

  Directed: 5 PASS  0 FAIL

The freq_adj errors, all below 2×10⁻⁷ rad/sample, are Q2.30 register quantisation residuals: the loop dithers around the exact value within roughly 45 to 170 LSBs (1 LSB = 2⁻³⁰ ≈ 9.3×10⁻¹⁰ rad/sample). They do not indicate a real tracking error.
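To put the residuals in register terms, the tabulated errors can be converted to LSB counts. The Python sketch below uses only the error values from the directed-test table; the dictionary name is illustrative.

```python
LSB = 2.0 ** -30    # one Q2.30 LSB in rad/sample

# freq_adj errors from the directed-test table, in rad/sample
directed_errors = {
    "Ideal": 8.48e-8,
    "Phase +0.5 rad": 4.38e-8,
    "Freq +0.005": 1.03e-7,
    "Freq +0.015": 1.60e-7,
    "Combined +0.003/+0.3": 4.23e-8,
}

errors_in_lsb = {name: round(err / LSB) for name, err in directed_errors.items()}
for name, lsbs in errors_in_lsb.items():
    print(f"{name:24s} {lsbs:4d} LSB")
```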

The Phase +0.5 rad test locks faster than the Ideal test (78 vs 149 samples) because a large initial phase error drives stronger proportional correction during acquisition, pulling freq_adj to zero faster despite the zero frequency offset.

Convergence Transients

[Figure: Convergence]

All five directed tests are shown together. Each freq_adj trace converges to its target (dotted line). The Ideal and Phase-offset traces converge to zero, since the frequency error is zero and the integrator settles at zero. The three frequency-offset traces converge to their respective offsets: +0.005, +0.015, +0.003 rad/sample.

The lock status panel shows locked asserting after the integrator has settled. The Phase +0.5 rad test locks earliest (sample 78) despite a large initial phase excursion, because the proportional path drives immediate frequency correction. The Freq +0.015 case locks latest (sample 478) as the integrator needs the most time to accumulate the large frequency correction.

[Figure: Step Response]

Single test at foff = +0.010 rad/sample, high-resolution transient. freq_adj starts at zero and integrates upward, overshooting slightly before settling at the target of 0.010. The gray dashed line marks sample 724 where freq_adj enters and stays within ±0.0001 of target for 50 consecutive samples.

The small overshoot followed by rapid settling, with no sustained ringing, is characteristic of a slightly underdamped second-order loop (ζ = 0.707; critical damping corresponds to ζ = 1). A lower damping ratio would show a faster rise with more ringing; a higher ratio would show a slower, smoother approach. The lock signal asserts once the dual-lock thresholds are satisfied.
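The shape of the step response can be approximated with an idealised floating-point model of the loop. The Python sketch below is a behavioural approximation under stated assumptions: it ignores CORDIC latency, Q2.30 quantisation and the adj_valid delay, models the phase detector as sin(phase error), and uses hypothetical names throughout. It is not expected to reproduce the RTL sample counts exactly.

```python
import math

KP = 2 * 0.707 * 0.01     # proportional gain (float equivalent of the Q2.30 constant)
KI = 0.01 ** 2            # integral gain
INT_MAX = 0.1             # anti-windup clamp, rad/sample

def simulate(nom_freq: float, foff: float, init_phase: float = 0.0, samples: int = 2000):
    """Return the freq_adj trajectory of an idealised floating-point loop."""
    ref_phase, nco_phase, integ = init_phase, 0.0, 0.0
    trace = []
    for _ in range(samples):
        err = math.sin(ref_phase - nco_phase)                   # cross-product PD output
        integ = max(-INT_MAX, min(INT_MAX, integ + KI * err))   # clamped integrator
        nco_phase += nom_freq + integ + KP * err                # phase_inc + freq_adj + phase_adj
        ref_phase += nom_freq + foff
        trace.append(integ)
    return trace

trace = simulate(nom_freq=0.2, foff=0.010)
print(f"final freq_adj = {trace[-1]:.6f}")
print(f"peak  freq_adj = {max(trace):.6f}")
```

Comparing the peak of the trace with the target shows how far this model overshoots, and the final value shows whether the integrator has absorbed the full frequency offset.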

Frequency Acquisition Range

[Figure: Acquisition Range]

Acquisition range sweep...
  foff=+0.000 → +0.040 rad/sample:  locked=True, err < 1e-4  (all 21 points OK)
  foff=+0.042:  err=1.95e-4  locked=True   FAIL  (just above threshold)
  foff=+0.044 → +0.100:  locked=False  FAIL
  → Max acquisition range: 0.040 rad/sample

The left panel shows clean acquisition through foff = 0.040 rad/sample (green bars). Failure begins at 0.042 where the residual error crosses the 1×10⁻⁴ pass threshold. The right panel shows the steady-state freq_adj error in log scale — note the small spikes at 0.028 and 0.034–0.038 which are quantisation noise near the anti-windup boundary, not real failures.

The pull-in boundary at 0.040 rad/sample is consistent with the anti-windup clamp: INT_MAX = 0.1 rad/sample, and with the chosen gains the practical acquisition range observed here is approximately 0.4 × INT_MAX. Increasing INT_MAX should widen the acquisition range roughly in proportion, at the cost of allowing larger integrator excursions during cold start.

Lock Time

Lock time vs freq offset:
  foff=+0.000  lock_sample=149
  foff=+0.002  lock_sample=409
  foff=+0.004  lock_sample=433
  foff=+0.006  lock_sample=447
  foff=+0.008  lock_sample=457
  foff=+0.010  lock_sample=460
  foff=+0.012  lock_sample=470
  foff=+0.014  lock_sample=476
  foff=+0.016  lock_sample=480
  foff=+0.018  lock_sample=485
  foff=+0.020  lock_sample=487
  foff=+0.022  lock_sample=495
  foff=+0.024  lock_sample=781

Lock time distribution (50 seeds, foff=0.005):
  mean=506.7  min=233  max=810 samples

The left panel shows lock time growing monotonically with frequency offset — the integrator must accumulate more correction before settling. The jump from 495 to 781 samples at foff=0.024 occurs because this offset is close to the anti-windup boundary (0.040/2 ≈ 0.020 practical saturation for the integrator path), causing a longer transient before the integrator settles.

The right panel shows the 50-seed distribution at foff=0.005. The wide spread (min=233, max=810) is caused by initial phase — a phase that creates a large initial proportional correction can accelerate or retard convergence significantly. Unlike the QAM Costas loop where lock time is deterministic at exactly 32 symbols, the DPLL lock time is variable because initial phase is a free parameter. The mean of 507 samples is consistent with the directed-test result of 437 samples at the same offset (the seeds span all initial phases, some of which are unfavorable).

Phase Offset Tolerance

[Figure: Phase Sweep]

Phase offset tolerance sweep (0 to 180°):
  25 / 25 points pass  (all initial phases, zero freq offset)
  Max freq_adj residual: 8.48e-8 rad/sample  (quantisation only)

The left panel shows that the DPLL acquires correctly for every tested initial phase from 0° to 180°, including exactly 180°, the ambiguous point that causes false lock in PD_MODE=1 (atan2). Within this 25-point sweep, the cross-product phase detector shows no false lock.

The right panel shows the residual freq_adj error after convergence. All values are in the range 10⁻⁸–10⁻⁷ rad/sample, which is pure Q2.30 quantisation noise. There is no systematic trend with phase offset — the residual is truly a quantisation floor, not a phase-dependent bias.

Steady-State Frequency Accuracy

[Figure: Freq Accuracy]

Steady-state accuracy sweep (0 to 0.025, step 0.001):
  Max error  : 9.93e-8 rad/sample
  Mean error : 6.30e-8 rad/sample
  All 26 points within 1e-4: True

freq_adj converges to within 10⁻⁷ rad/sample of the expected value across the swept range. This error floor is set by the Q2.30 register resolution: 1 LSB = 2⁻³⁰ ≈ 9.3×10⁻¹⁰ rad/sample, so the mean residual of about 70 LSBs (about 107 LSBs at the maximum) represents the loop dithering around the nearest representable value. The green shaded region shows all points well inside the 1×10⁻⁴ pass threshold, with 3 orders of magnitude of margin.

Known Bugs Found During Development

These are documented because each represents a recurring class of fixed-point RTL pitfall.

#	Module	Bug	Symptom	Fix
1	`dpll_phase_det`	Cross product sign inverted: `p_iq − p_qi` instead of `p_qi − p_iq`	Loop diverged on startup	Flip operands: `cross = p_qi − p_iq`
2	`dpll_loop_filter`	`32×32` multiply result sized to 32 bits by Verilog before shift	`ki_term` always ≈ 0; integrator never moved; loop appeared stuck	Sign-extend both operands to 64 bits before multiplying
3	`dpll_phase_det`	`MUL_SH=16` gave Q2.28 output instead of Q2.30	Integrator settled at constant nonzero `freq_adj`; `locked` asserted but wrong frequency	Change `MUL_SH = NCO_SH − 2 = 14`
4	`dpll`	`adj_valid` and `en` arrived at NCO simultaneously	Every `phase_adj` correction silently dropped by latch guard	Delay `adj_valid` by 4 cycles before `phase_adj_en`
5	`dpll` / gains	KP=0.1, KI=0.005: roughly 7× and 50× too large, respectively	Loop overshot correct `freq_adj`, settled in false equilibrium; `locked` asserted at wrong frequency	Reduce to KP=0.014, KI=0.0001 (from `ωn=0.01`, `ζ=0.707`)
6	`dpll_loop_filter`	Anti-windup clamped at ±π	Integrator wandered into false equilibria during large offsets	Tighten clamp to ±0.1 rad/sample

Bug #3 was the most insidious: the symptom (lock at wrong frequency) was identical to the symptom of Bug #5, and both occurred simultaneously in the same run. Distinguishing them required isolating the phase detector output scaling from the gain values — confirmed by computing that the cross-product at sin(err)=1 should equal 2³⁰ for a Q2.30-normalized detector, and verifying it was actually 2²⁸ with MUL_SH=16.

Verification Summary

============================================================
SUMMARY
============================================================

Directed Tests:
  Test                       Locked   fl     pl   LkSmp    freq_adj    Expected        Err  Result
  Ideal                        True  True   True     149  -0.000000  +0.000000   8.48e-08  PASS
  Phase +0.5 rad               True  True   True      78  -0.000000  +0.000000   4.38e-08  PASS
  Freq +0.005                  True  True   True     437  +0.005000  +0.005000   1.03e-07  PASS
  Freq +0.015                  True  True   True     478  +0.015000  +0.015000   1.60e-07  PASS
  Combined +0.003/0.3          True  True   True     350  +0.003000  +0.003000   4.23e-08  PASS

Acquisition range       : 0.040 rad/sample
Step response settled   : sample 724  (foff = +0.010)
Phase tolerance         : 0°–180° (25/25 points pass)
Lock time (50 seeds)    : mean=507  min=233  max=810 samples  (foff=0.005)

Steady-state accuracy (0 to 0.025 rad/sample):
  Max error  : 9.93e-08 rad/sample
  Mean error : 6.30e-08 rad/sample
  All within 1e-4: True

Metric	Value
Directed tests	5 / 5 PASS
Acquisition range	0.040 rad/sample
Step settling time	724 samples (foff=+0.010)
Lock time — ideal	149 samples
Lock time — foff=0.005, 50 seeds	mean=507, min=233, max=810
Phase acquisition range	0°–180° (no false lock anywhere)
Steady-state `freq_adj` error	≤ 1.6×10⁻⁷ rad/sample (quantisation floor)
Max `freq_adj` error, 0 to 0.025 sweep	9.93×10⁻⁸ rad/sample
Lock criterion	`freq_locked AND phase_locked` (dual)
Anti-windup range	±0.1 rad/sample