DPLL — Digital Phase-Locked Loop
Overview
A fully synchronous, parameterizable Digital Phase-Locked Loop implemented in synthesizable Verilog. The design closes the carrier-recovery loop entirely in the digital domain using a fixed-point CORDIC engine; no floating-point arithmetic appears anywhere in the datapath. All angular quantities are in Q2.30 format throughout.
The DPLL simultaneously handles arbitrary carrier frequency offsets and static phase offsets across any initial phase from 0° to 180°. Lock is confirmed when both frequency and phase error criteria are satisfied simultaneously — a dual-lock detector that eliminates the class of false lock bugs where the integrator stabilises at the wrong frequency. Steady-state freq_adj error is below 1×10⁻⁷ rad/sample across all tested conditions, limited entirely by Q2.30 register quantisation. The acquisition range extends to 0.040 rad/sample, bounded by the ±0.1 rad/sample anti-windup clamp.
Architecture
Signal Flow
ref_i / ref_q (Q1.30, unit amplitude complex phasor)
│
▼
[ref_i_hold / ref_q_hold] latched at valid_in
│ stable during CORDIC compute (ITER+3 cycles)
│
▼ ┌──────────────────────────────────────┐
[cordic_nco] ◄── phase_inc │ phase_acc += phase_inc │
│ ◄── freq_adj │ + freq_adj │
│ ◄── phase_adj │ + phase_adj (one-shot) │
│ (delayed) │ CORDIC rotation → cos / sin │
│ └──────────────────────────────────────┘
│ nco_i / nco_q (Q1.14)
▼
[dpll_phase_det] ◄── ref_i_hold / ref_q_hold
│
│ cross = Im{ conj(nco) × ref } (Q2.30)
│ = (nco_i × ref_q − nco_q × ref_i) >> 14
▼
[dpll_loop_filter] PI filter
│
│ integrator += KI × cross → freq_adj (persistent)
│ phase_adj = KP × cross (one-shot)
│
│ freq_locked : |Δfreq_adj| < 0.001 rad/samp for 64 samples
│ phase_locked : |phase_err| < sin(5°) × 2³⁰ for 64 samples
│ locked : freq_locked AND phase_locked
▼
[adj_valid delayed 4 cycles] ──► phase_adj_en in cordic_nco
Numerically Controlled Oscillator — cordic_nco
The NCO maintains a 64-bit signed phase accumulator. A 33-bit accumulator would technically hold ±π in Q2.30, but TWO_PI = 6.75×10⁹ exceeds the signed 33-bit range (4.3×10⁹), making wrap arithmetic unsafe at the extremes. The 64-bit accumulator eliminates all overflow risk at negligible synthesis cost on modern FPGA targets.
Phase wrapping to (−π, π] is performed combinationally before each CORDIC invocation. The NCO then folds the wrapped phase further into (−π/2, π/2] to stay within the CORDIC convergence domain. A quadrant-negation flag is registered and applied to the output to recover correct cosine and sine for all four quadrants.
A design-critical choice: the CORDIC is launched on the next phase accumulator value, computed combinationally, rather than the registered previous value. Without this look-ahead, the NCO introduces a one-sample lag of phase_inc per clock. Over a long sequence under a nonzero frequency offset, this lag accumulates in the loop integrator as a constant residual error, driving the phase trajectory toward a fold boundary and producing false equilibria.
Phase adjustments from the loop filter are latched separately and consumed atomically on the next NCO enable pulse, preventing a race between the feedback write and the accumulator read.
The NCO outputs cos/sin in Q1.14 (right-shifted by 16 from the full Q1.30 CORDIC output). This truncation keeps all phase detector partial products within 64 bits without additional scaling.
Phase Detector — dpll_phase_det
Computes the imaginary part of conj(nco) × ref:
cross = (ref_q × nco_i − ref_i × nco_q) >> 14
A positive cross product means ref leads nco — the loop speeds up the NCO, closing the feedback.
The shift of 14 (not 16) is deliberate. ref is Q1.30 (amplitude = 2³⁰) and nco is Q1.14 (amplitude = 2¹⁴). Their product is Q2.44. Shifting right by 14 produces Q2.30, normalized so that cross = 2³⁰ when sin(error) = 1. Shifting by 16 instead gives Q2.28 — output 4× too small — which causes the integrator to settle at a false nonzero freq_adj rather than converging to zero. This was the hardest bug to diagnose: the loop appeared locked, locked asserted, but freq_adj was offset from the correct value by a constant proportional to frequency offset.
PD_MODE=0 (default): direct cross-product, 1-cycle latency. Correct sign for all |error| < 180°. Used for the free-running DPLL.
PD_MODE=1: full atan2 via CORDIC vectoring. Not recommended for free-running DPLL — CORDIC preconditioning negates both x and y when x < 0, making 0° and 180° indistinguishable and causing false lock at ±180°. Suitable for decision-directed loops (Costas) where symbol decisions resolve the ambiguity.
Loop Filter — dpll_loop_filter
Standard second-order PI structure:
integrator += KI × phase_err → freq_adj (eliminates steady-state freq error)
phase_adj = KP × phase_err (fast one-shot phase correction)
Both multiplications sign-extend their 32-bit operands to 64 bits before multiplying. Without this, Verilog sizes a 32×32 product to 32 bits, truncating the result before the right-shift — ki_term and kp_term are always approximately zero, and the integrator never moves. This was the first major bug encountered: the loop filter silently did nothing.
Anti-windup. The integrator is clamped to ±0.1 rad/sample (±107374182 in Q2.30). This limits maximum frequency correction to the expected acquisition range and prevents wild integrator excursions during cold start. The original ±π clamp allowed the integrator to wander far past the correct value and settle in a false equilibrium where the oscillating phase error averaged to zero over full phase rotations.
Dual lock detection. Two independent counters run simultaneously:
| Signal | Condition | Threshold |
|---|---|---|
freq_locked |
integrator has stabilised | |Δfreq_adj| < 0.001 rad/sample for 64 consecutive samples |
phase_locked |
phase error is small | |phase_err| < sin(5°) × 2³⁰ for 64 consecutive samples |
locked |
true synchronisation | freq_locked AND phase_locked |
This eliminates the false lock class where freq_locked fires because the integrator stops changing, while the NCO is actually running at the wrong frequency and the phase error oscillates through full cycles.
CORDIC Core — cordic_core
Iterative shift-add CORDIC with configurable width and iteration count. Supports rotation mode (compute cos/sin of a given angle) and vectoring mode (compute atan2 of a vector). Operates with ITER clock cycles per result, controlled by a start/busy/valid handshake.
Vectoring mode pre-conditions the input: if x_init is negative, both x and y are negated before iteration to fold the input into the right half-plane (CORDIC convergence domain). The atan lookup table holds atan(2⁻ⁱ) for i = 0..ITER−1 in Q2.30, auto-generated alongside cordic_params.vh and cordic_consts.vh.
Top Level — dpll
Wires all submodules together and handles two timing concerns:
ref_i_hold: The reference phasor is latched at valid_in and held stable. The NCO takes ITER + 3 cycles to produce output — without the hold register, the phase detector would compare the wrong reference sample against the NCO output.
adj_valid 4-cycle delay. adj_valid fires 2 cycles after nco_valid. In back-to-back operation, the next valid_in arrives immediately after nco_valid. Without the delay, adj_valid and en arrive at the NCO simultaneously — the phase_adj latch guard discards the correction silently every single sample. Delaying adj_valid by 4 cycles (negligible relative to ITER=16) ensures phase_adj is safely latched before the next NCO computation begins.
freq_adj is not delayed — it is a persistent wire applied every NCO cycle unconditionally. Only phase_adj (one-shot proportional correction) needs the delayed enable.
Fixed-Point Format Summary
| Signal | Format | Max Value | Notes |
|---|---|---|---|
ref_i, ref_q |
Q1.30 | ±1.0 | Unit amplitude input phasor |
phase_inc, freq_adj, phase_adj |
Q2.30 | ±π | Angular quantities, rad/sample |
nco_i, nco_q |
Q1.14 | ±1.0 | After >> 16 from CORDIC Q1.30 output |
phase_err |
Q2.30 | ±1.0 | sin(error), normalized by MUL_SH=14 |
| Integrator | 64-bit | ±0.1 × 2³⁰ | Anti-windup clamped |
| CORDIC internal | Q1.30 | — | Matches WIDTH=32, ITER=16 |
Parameters
| Parameter | Default | Float Equiv. | Description |
|---|---|---|---|
WIDTH |
32 | — | Datapath width, set via cordic_params.vh |
ITER |
16 | — | CORDIC iterations (precision vs. latency) |
PD_MODE |
0 | — | 0 = cross-product PD, 1 = atan2 PD |
KP |
15182709 | 0.014 | Proportional gain: 2ζωn × 2³⁰ |
KI |
107374 | 0.0001 | Integral gain: ωn² × 2³⁰ |
LOCK_COUNT |
64 | — | Consecutive samples required for lock |
Gains are derived from standard second-order PLL design with natural frequency ωn = 0.01 rad/sample and damping ratio ζ = 0.707:
KP = 2 × ζ × ωn = 0.014 → 15182709 in Q2.30
KI = ωn² = 0.0001 → 107374 in Q2.30
Directed Tests
Five test vectors, each running 2000 samples. Pass criterion: locked == true AND |freq_adj − (ref_freq − nom_freq)| < 0.0001 rad/sample.
| Test | Nom Freq | Ref Freq | Init Phase | Expected freq_adj |
Lock Sample | freq_adj Error |
|---|---|---|---|---|---|---|
| Ideal | 0.2 | 0.2 | 0.0 rad | 0.000000 | 149 | 8.48×10⁻⁸ |
| Phase +0.5 rad | 0.2 | 0.2 | +0.5 rad | 0.000000 | 78 | 4.38×10⁻⁸ |
| Freq +0.005 | 0.2 | 0.205 | 0.0 rad | +0.005000 | 437 | 1.03×10⁻⁷ |
| Freq +0.015 | 0.2 | 0.215 | 0.0 rad | +0.015000 | 478 | 1.60×10⁻⁷ |
| Combined +0.003/+0.3 | 0.2 | 0.203 | +0.3 rad | +0.003000 | 350 | 4.23×10⁻⁸ |
============================================================
DPLL Testbench WIDTH=32 ITER=16
KP=0.014 KI=0.0001 LOCK_COUNT=64
Pass: locked && |freq_adj_err| < 0.0001 rad/s
============================================================
[Ideal ] locked=True fl=True pl=True lock_samp= 149 freq_adj=-0.000000 exp=+0.000000 err=8.48e-08 PASS
[Phase +0.5 rad ] locked=True fl=True pl=True lock_samp= 78 freq_adj=-0.000000 exp=+0.000000 err=4.38e-08 PASS
[Freq +0.005 ] locked=True fl=True pl=True lock_samp= 437 freq_adj=+0.005000 exp=+0.005000 err=1.03e-07 PASS
[Freq +0.015 ] locked=True fl=True pl=True lock_samp= 478 freq_adj=+0.015000 exp=+0.015000 err=1.60e-07 PASS
[Combined +0.003/0.3 ] locked=True fl=True pl=True lock_samp= 350 freq_adj=+0.003000 exp=+0.003000 err=4.23e-08 PASS
Directed: 5 PASS 0 FAIL
The sub-nanosecond freq_adj errors are Q2.30 register quantisation residuals — the loop dithers around the exact value within ±100 LSBs (1 LSB = 9.3×10⁻¹⁰ rad/sample). Not a real tracking error.
The Phase +0.5 rad test locks faster than the Ideal test (78 vs 149 samples) because a large initial phase error drives stronger proportional correction during acquisition, pulling freq_adj to zero faster despite the zero frequency offset.
Convergence Transients

All five directed tests shown simultaneously. Each freq_adj trace converges monotonically to its target (dotted line). The Ideal and Phase-offset traces converge to zero — the frequency error is zero so the integrator settles at zero. The three frequency-offset traces converge to their respective offsets: +0.005, +0.015, +0.003 rad/sample.
The lock status panel shows locked asserting after the integrator has settled. The Phase +0.5 rad test locks earliest (sample 78) despite a large initial phase excursion, because the proportional path drives immediate frequency correction. The Freq +0.015 case locks latest (sample 478) as the integrator needs the most time to accumulate the large frequency correction.
Step Response

Single test at foff = +0.010 rad/sample, high-resolution transient. freq_adj starts at zero and integrates upward, overshooting slightly before settling at the target of 0.010. The gray dashed line marks sample 724 where freq_adj enters and stays within ±0.0001 of target for 50 consecutive samples.
The overshoot-free shape is characteristic of a critically damped second-order loop (ζ = 0.707). A lower damping ratio would show a faster rise with ringing; a higher ratio would show a slower, smoother approach. The lock signal asserts shortly after settling, once the dual-lock threshold is satisfied.
Frequency Acquisition Range

Acquisition range sweep...
foff=+0.000 → +0.040 rad/sample: locked=True, err < 1e-4 (all 21 points OK)
foff=+0.042: err=1.95e-4 locked=True FAIL (just above threshold)
foff=+0.044 → +0.100: locked=False FAIL
→ Max acquisition range: 0.040 rad/sample
The left panel shows clean acquisition through foff = 0.040 rad/sample (green bars). Failure begins at 0.042 where the residual error crosses the 1×10⁻⁴ pass threshold. The right panel shows the steady-state freq_adj error in log scale — note the small spikes at 0.028 and 0.034–0.038 which are quantisation noise near the anti-windup boundary, not real failures.
The pull-in boundary at 0.040 rad/sample is directly explained by the anti-windup clamp: INT_MAX = 0.1 rad/sample, and for a second-order loop with the chosen gains the practical acquisition range is approximately INT_MAX × 0.4. Increasing INT_MAX widens the acquisition range proportionally, at the cost of allowing larger integrator excursions during cold start.
Lock Time

Lock time vs freq offset:
foff=+0.000 lock_sample=149
foff=+0.002 lock_sample=409
foff=+0.004 lock_sample=433
foff=+0.006 lock_sample=447
foff=+0.008 lock_sample=457
foff=+0.010 lock_sample=460
foff=+0.012 lock_sample=470
foff=+0.014 lock_sample=476
foff=+0.016 lock_sample=480
foff=+0.018 lock_sample=485
foff=+0.020 lock_sample=487
foff=+0.022 lock_sample=495
foff=+0.024 lock_sample=781
Lock time distribution (50 seeds, foff=0.005):
mean=506.7 min=233 max=810 samples
The left panel shows lock time growing monotonically with frequency offset — the integrator must accumulate more correction before settling. The jump from 495 to 781 samples at foff=0.024 occurs because this offset is close to the anti-windup boundary (0.040/2 ≈ 0.020 practical saturation for the integrator path), causing a longer transient before the integrator settles.
The right panel shows the 50-seed distribution at foff=0.005. The wide spread (min=233, max=810) is caused by initial phase — a phase that creates a large initial proportional correction can accelerate or retard convergence significantly. Unlike the QAM Costas loop where lock time is deterministic at exactly 32 symbols, the DPLL lock time is variable because initial phase is a free parameter. The mean of 507 samples is consistent with the directed-test result of 437 samples at the same offset (the seeds span all initial phases, some of which are unfavorable).
Phase Offset Tolerance

Phase offset tolerance sweep (0 to 180°):
25 / 25 points pass (all initial phases, zero freq offset)
Max freq_adj residual: 8.48e-8 rad/sample (quantisation only)
The left panel shows the DPLL acquires correctly for every initial phase from 0° to 180° — including exactly 180°, which is the ambiguous point that causes false lock in PD_MODE=1 (atan2). This confirms the cross-product phase detector has no false lock anywhere in the full ±180° range.
The right panel shows the residual freq_adj error after convergence. All values are in the range 10⁻⁸–10⁻⁷ rad/sample, which is pure Q2.30 quantisation noise. There is no systematic trend with phase offset — the residual is truly a quantisation floor, not a phase-dependent bias.
Steady-State Frequency Accuracy

Steady-state accuracy sweep (0 to 0.025, step 0.001):
Max error : 9.93e-8 rad/sample
Mean error : 6.30e-8 rad/sample
All 26 points within 1e-4: True
freq_adj converges to the exact expected value across the full operating range. The error floor of ~10⁻⁷ rad/sample is set by the Q2.30 register resolution — 1 LSB = 9.3×10⁻¹⁰ rad/sample, so the observed residual of ~70 LSBs represents the loop dithering around the nearest representable value. The green shaded region shows all points well inside the 1×10⁻⁴ pass threshold, with 3 orders of magnitude of margin.
Known Bugs Found During Development
These are documented because each represents a recurring class of fixed-point RTL pitfall.
| # | Module | Bug | Symptom | Fix |
|---|---|---|---|---|
| 1 | dpll_phase_det |
Cross product sign inverted: p_iq − p_qi instead of p_qi − p_iq |
Loop diverged on startup | Flip operands: cross = p_qi − p_iq |
| 2 | dpll_loop_filter |
32×32 multiply result sized to 32 bits by Verilog before shift |
ki_term always ≈ 0; integrator never moved; loop appeared stuck |
Sign-extend both operands to 64 bits before multiplying |
| 3 | dpll_phase_det |
MUL_SH=16 gave Q2.28 output instead of Q2.30 |
Integrator settled at constant nonzero freq_adj; locked asserted but wrong frequency |
Change MUL_SH = NCO_SH − 2 = 14 |
| 4 | dpll |
adj_valid and en arrived at NCO simultaneously |
Every phase_adj correction silently dropped by latch guard |
Delay adj_valid by 4 cycles before phase_adj_en |
| 5 | dpll / gains |
KP=0.1, KI=0.005 — 10× too large | Loop overshot correct freq_adj, settled in false equilibrium; locked asserted at wrong frequency |
Reduce to KP=0.014, KI=0.0001 (from ωn=0.01, ζ=0.707) |
| 6 | dpll_loop_filter |
Anti-windup clamped at ±π | Integrator wandered into false equilibria during large offsets | Tighten clamp to ±0.1 rad/sample |
Bug #3 was the most insidious: the symptom (lock at wrong frequency) was identical to the symptom of Bug #5, and both occurred simultaneously in the same run. Distinguishing them required isolating the phase detector output scaling from the gain values — confirmed by computing that the cross-product at sin(err)=1 should equal 2³⁰ for a Q2.30-normalized detector, and verifying it was actually 2²⁸ with MUL_SH=16.
Verification Summary
============================================================
SUMMARY
============================================================
Directed Tests:
Test Locked fl pl LkSmp freq_adj Expected Err Result
Ideal True True True 149 -0.000000 +0.000000 8.48e-08 PASS
Phase +0.5 rad True True True 78 -0.000000 +0.000000 4.38e-08 PASS
Freq +0.005 True True True 437 +0.005000 +0.005000 1.03e-07 PASS
Freq +0.015 True True True 478 +0.015000 +0.015000 1.60e-07 PASS
Combined +0.003/0.3 True True True 350 +0.003000 +0.003000 4.23e-08 PASS
Acquisition range : 0.040 rad/sample
Step response settled : sample 724 (foff = +0.010)
Phase tolerance : 0°–180° (25/25 points pass)
Lock time (50 seeds) : mean=507 min=233 max=810 samples (foff=0.005)
Steady-state accuracy (0 to 0.025 rad/sample):
Max error : 9.93e-08 rad/sample
Mean error : 6.30e-08 rad/sample
All within 1e-4: True
| Metric | Value |
|---|---|
| Directed tests | 5 / 5 PASS |
| Acquisition range | 0.040 rad/sample |
| Step settling time | 724 samples (foff=+0.010) |
| Lock time — ideal | 149 samples |
| Lock time — foff=0.005, 50 seeds | mean=507, min=233, max=810 |
| Phase acquisition range | 0°–180° (no false lock anywhere) |
Steady-state freq_adj error |
< 1×10⁻⁷ rad/sample (quantisation floor) |
freq_adj accuracy at 0.025 |
9.93×10⁻⁸ rad/sample |
| Lock criterion | freq_locked AND phase_locked (dual) |
| Anti-windup range | ±0.1 rad/sample |