
Design & Formal Verification of Parameterizable Fixed-Point CORDIC IP






| Repository | Mummanajagadeesh/cordic-algorithm-verilog |
|---|---|
| Start Date | Dec 2024 |
Summary
| Item | Description |
|---|---|
| Type | Fully synthesizable fixed-point CORDIC Soft IP |
| Modes | 6 modes (Rotation/Vectoring × Linear/Circular/Hyperbolic) |
| Focus | Parameterization, numerical characterization, formal verification |
| Precision | ~\(10^{-5}\) RMS (@ 32-bit, 16 iterations) |
| Verification | SVA + SymbiYosys (Yices2 backend) |
Overview
The CORDIC (COordinate Rotation DIgital Computer) algorithm performs trigonometric, hyperbolic, and linear computations using only additions and bit-shifts — no multipliers anywhere in the datapath. This implementation is a fully parameterized, synthesizable Verilog soft IP covering all six canonical modes across three coordinate systems. It has been formally verified for structural and control invariants using SymbiYosys, numerically characterized across the full parameter space, and demonstrated as the computational engine inside two production-grade RTL integrations: a Digital PLL, 16-QAM demodulator and a Radix-2 DIT FFT/IFFT.
Core Features
- Shift–add iterative datapath (no multipliers)
- 6 operating modes across 3 coordinate systems
- Deterministic latency: \( \text{ITER} + 1 \) cycles
- Clean
start / busy / validhandshake - Fully parameterized width, iterations, angle precision, scaling
- Python auto-generation of LUTs and parameter headers
- FuseSoC packaged IP
- Formal verification of control and structural invariants
- Drop-in architectural variants (iterative, pipelined, SIMD, multi-issue)
- Demonstrated integration in QAM16 demodulator, DPLL and FFT/IFFT
Operating Modes
| Architecture ↓ / Algorithm → | Rotation Mode (Forward) | Vectoring Mode (Inverse / Reduction) |
|---|---|---|
| Linear | mult | div |
| Circular | sin, cos, tan | atan2, magnitude |
| Hyperbolic | sinh, cosh, tanh, exp | ln |
Algorithm: Iterative Update Equations
| Circular Rotation | Circular Vectoring | |
|---|---|---|
| Function | sin, cos, tan | magnitude, atan2 |
| \(x_{i+1}\) | \(x_i - d_i(y_i \gg i)\) | \(x_i + d_i(y_i \gg i)\) |
| \(y_{i+1}\) | \(y_i + d_i(x_i \gg i)\) | \(y_i - d_i(x_i \gg i)\) |
| \(z_{i+1}\) | \(z_i - d_i \tan^{-1}(2^{-i})\) | \(z_i + d_i \tan^{-1}(2^{-i})\) |
| \(d_i\) | \(\text{sgn}(z_i)\) | \(\text{sgn}(y_i)\) |
| Init | \(x_0 = K_\text{SCALE},\ y_0 = 0,\ z_0 = \theta\) | \(x_0 = x,\ y_0 = y,\ z_0 = 0\) |
| Converges to | \(x_N = \cos\theta,\ y_N = \sin\theta\) | \(x_N = K\sqrt{x^2+y^2},\ z_N = \text{atan2}(y,x)\) |
| Notes | \(\tan = y_N/x_N\). Input must satisfy \(|\theta| < \pi/2\); fold wider angles via quadrant-negation flags before entry. | Pre-condition: negate both \(x_0, y_0\) if \(x < 0\) to fold into right half-plane. \((x,y)=(0,0)\) is a singularity — must be guarded in system logic. |
| Linear Rotation | Linear Vectoring | |
|---|---|---|
| Function | mult | div |
| \(x_{i+1}\) | \(x_i\) (unchanged) | \(x_i\) (unchanged) |
| \(y_{i+1}\) | \(y_i + d_i(x_i \gg i)\) | \(y_i + d_i(x_i \gg i)\) |
| \(z_{i+1}\) | \(z_i - d_i \cdot 2^{-i}\) | \(z_i + d_i \cdot 2^{-i}\) |
| \(d_i\) | \(\text{sgn}(z_i)\) | \(-\text{sgn}(y_i)\) |
| Init | \(x_0 = a,\ y_0 = 0,\ z_0 = b\) | \(x_0 = b,\ y_0 = a,\ z_0 = 0\) |
| Converges to | \(y_N = a \cdot b\) | \(z_N = a / b\) |
| Notes | No gain compensation required. x is a constant multiplier throughout all iterations. | Output saturates at ±2.0 in Q2.30. Errors up to ~4.0 occur when the true quotient exceeds ±2.0. Requires prescaling or format extension for high-dynamic-range division. |
| Hyperbolic Rotation | Hyperbolic Vectoring | |
|---|---|---|
| Function | sinh, cosh, tanh, exp | ln |
| \(x_{i+1}\) | \(x_i + d_i(y_i \gg i)\) | \(x_i + d_i(y_i \gg i)\) |
| \(y_{i+1}\) | \(y_i + d_i(x_i \gg i)\) | \(y_i + d_i(x_i \gg i)\) |
| \(z_{i+1}\) | \(z_i - d_i \tanh^{-1}(2^{-i})\) | \(z_i + d_i \tanh^{-1}(2^{-i})\) |
| \(d_i\) | \(\text{sgn}(z_i)\) | \(-\text{sgn}(y_i)\) |
| Init | \(x_0 = K_\text{H},\ y_0 = 0,\ z_0 = \theta\) | \(x_0 = s+1,\ y_0 = s-1,\ z_0 = 0\) |
| Converges to | \(x_N = \cosh\theta,\ y_N = \sinh\theta\) | \(z_N = \tfrac{1}{2}\ln\tfrac{x_0+y_0}{x_0-y_0} = \ln(s)\) |
| Notes | \(\tanh = y_N/x_N\). \(\exp(\theta) = x_N + y_N\) (no extra hardware). Both require the repeat schedule below. | \(s > 0\) required; valid tested range 0.2–1.9. Apply factor-of-2 correction in wrapper. Same repeat schedule as rotation is mandatory. |
Hyperbolic repeat schedule — certain iterations must execute twice for guaranteed convergence. Indices follow \(k_{n+1} = 3k_n + 1\):
ITER=16: [1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 13, 14]
LUT Generation
The angle lookup tables must be regenerated whenever WIDTH, ITER, or ANG_FRAC change. The Python generator gen/gen_cordic_tables.py produces three auto-generated files that all downstream RTL \include`s:
cordic_atan_rom.v — LUT of \( \tan^{-1}(2^{-i}) \) for circular mode and \( \tanh^{-1}(2^{-i}) \) for hyperbolic mode, each entry in Q2.30:
atan_table[i] = int(round(math.atan(2**-i) * 2**FRAC_BITS))
atanh_table[i] = int(round(math.atanh(2**-i) * 2**FRAC_BITS))
cordic_params.vh — Defines WIDTH, ITER, CORDIC_K_SCALE, CORDIC_K_H_SCALE, and all bit-width macros. The gain compensation constants are:
K = 1.0
for i in range(ITER):
K *= math.sqrt(1.0 + 2**(-2*i))
K_SCALE = int(round((1.0 / K) * 2**FRAC_BITS)) # circular
K_H = 1.0
for i in repeat_schedule: # with repeat indices
K_H *= math.sqrt(1.0 - 2**(-2*i))
K_H_SCALE = int(round((1.0 / K_H) * 2**FRAC_BITS)) # hyperbolic
cordic_consts.vh — Defines HALF_PI, PI, TWO_PI, and SIN_COS_OUT_SHIFT in Q2.30. Never hardcode angle constants in RTL — always use this file.
Parameterization
| Parameter | Description |
|---|---|
WIDTH | Internal fixed-point datapath width |
ITER | Iteration depth |
ANG_FRAC | Angle fractional precision |
OUT_WIDTH | Output width |
SHIFT | Deterministic scaling/truncation control |
Trade-offs explored between accuracy, latency, and area.
Accuracy — Baseline Results (WIDTH=32, ITER=16, FRAC=30)
All figures below are at the recommended production configuration. Full parameter sweep results are in the linked subproject reports.
Circular Rotation — sin / cos
| Metric | sin | cos |
|---|---|---|
| Max Error | 6.9×10⁻⁵ | 7.2×10⁻⁵ |
| RMS Error | 3.9×10⁻⁵ | 3.9×10⁻⁵ |
OUT_WIDTH=16, OUT_SHIFT=16. Errors are symmetric and dominated by output quantization. Increasing ITER beyond 16 yields no improvement at 30-bit angle precision.
Circular Rotation — tan
| Metric | Value |
|---|---|
| Max Error | ~12.1 (near ±π/2) |
| RMS Error | ~2.46 |
Inherently ill-conditioned near ±π/2. Stable within \(|z| < 1.4\) rad; treat as best-effort outside that range.
Circular Vectoring — magnitude
| Metric | Value |
|---|---|
| Max Error | 5.6×10⁻⁵ |
| RMS Error | 2.6×10⁻⁵ |
MAG_OUT_WIDTH=16, MAG_OUT_SHIFT=16. Achieves lower error than rotation because no output gain correction is needed beyond K_SCALE.
Circular Vectoring — atan2
| Metric | Value |
|---|---|
| Max Error | ~3.31 rad (near origin) |
| RMS Error | ~0.49 rad |
Error is dominated entirely by the \((x,y) \to (0,0)\) singularity. All non-zero inputs produce correct phase. Zero-magnitude inputs must be explicitly guarded in system logic.
Linear Rotation — mult
| Metric | Value |
|---|---|
| Max Error | 2.7×10⁻⁵ |
| RMS Error | 9.2×10⁻⁶ |
Q2.30 × Q2.30 → Q2.30, well-conditioned across ±0.9 input range. Production-ready.
Linear Vectoring — div
| Metric | Value |
|---|---|
| Max Error | ~4.0 (saturated) |
| RMS Error | ~0.83 |
Saturates at ±2.0 due to Q2.30 representable range. Stable within range; requires prescaling for quotients outside ±2.0.
Hyperbolic Rotation — sinh / cosh
| Metric | sinh | cosh |
|---|---|---|
| Max Error | 1.1×10⁻⁴ | 7.9×10⁻⁵ |
| RMS Error | 5.7×10⁻⁵ | 4.3×10⁻⁵ |
OUT_WIDTH=16, OUT_SHIFT=16. Stable and monotonic across ±1.0 input range with correct repeat schedule.
Hyperbolic Rotation — tanh
| Metric | Value |
|---|---|
| Max Error | 0.762 |
| RMS Error | 0.183 |
Structurally limited — error does not improve with ITER or FRAC. Treat as best-effort.
Hyperbolic Rotation — exp
| Metric | Value |
|---|---|
| Max Error | 2.5×10⁻⁴ |
| RMS Error | 1.1×10⁻⁴ |
OUT_SHIFT=17. Stable across ±1.0 after correct repeat scheduling.
Hyperbolic Vectoring — ln
| Metric | Value |
|---|---|
| Max Error | 1.5×10⁻⁴ |
| RMS Error | 8.1×10⁻⁵ |
OUT_SHIFT=16. Valid input range 0.2–1.9. Repeat schedule mandatory — without it the series fails to converge.
Known Numerical Sensitivities
| Issue | Cause |
|---|---|
| Amplitude collapse | Underscaled outputs |
| Instability | Overscaled internal widths |
| Systematic bias | Incorrect gain compensation |
| Clipping / sign inversion | Mismatched output shifts |
tan(x) instability | Near \( \pm \pi/2 \) |
atan2 sensitivity | Near \( (0,0) \) |
| Hyperbolic divergence | Missing repeat schedule |
| WIDTH=40 failure | Mismatched scaling at extreme widths |
Wrapper Semantics
Wrappers provide gain compensation, deterministic scaling, format conversion, and mode-specific preconditioning. Correct scaling is mandatory — underscaling causes amplitude collapse; overscaling causes sign inversion and catastrophic distortion. RMS error saturates near ~0.7–1.2 in all failure regions (deterministic, not random).
| Function | Output Mapping | Gain Correction |
|---|---|---|
sin | final \( y \) | K_SCALE on \( x_0 \) |
cos | final \( x \) | K_SCALE on \( x_0 \) |
tan | ratio \( y/x \) | K_SCALE cancels |
magnitude | final \( x \) | K_SCALE on \( x_0 \) |
atan2 | accumulated \( z \) | none needed |
sinh | final \( y \) | K_H_SCALE on \( x_0 \) |
cosh | final \( x \) | K_H_SCALE on \( x_0 \) |
tanh | ratio \( y/x \) | K_H_SCALE cancels |
exp | \( x_N + y_N \) | K_H_SCALE on \( x_0 \) |
ln | accumulated \( z \) | factor-of-2 in wrapper |
Formal Verification
Scope: Core iteration engine and control FSM
Tools: SymbiYosys + Yices2
Assertions: SystemVerilog Assertions (parametric)
Proven Structural Properties
- Single-cycle
start - Non-overlapping transactions
- Bounded progress \( \le ITER + 1 \)
- Deadlock-free execution
- Input stability while
busy - Output range safety
Rotation Mode Properties
- Odd/even symmetry
\( \sin(-x) = -\sin(x) \)
\( \cos(-x) = \cos(x) \) - Local monotonicity near zero
- Sign consistency
Vectoring Mode Properties
- Magnitude symmetry across quadrants
- Correct zero-vector handling
- \( \text{atan2} \) odd symmetry
- \( \pi \)-rotation consistency
Note: Numerical precision is validated through simulation, not formal proof.
Parameterization Sweep Summary
Full parameter sensitivity reports (iteration sweeps, width sweeps, fraction-bit sweeps, output-scaling failure regions) are in the linked subproject docs for each of the six modes. Key cross-mode findings:
ITER=16is sufficient for all modes at 30-bit precision; beyond 16 yields no measurable improvementWIDTH≥32required for stable convergence;WIDTH=40with mismatched scaling causes catastrophic failure across all modesFRAC≥28required for sin/cos/magnitude below 1×10⁻⁴ RMS- Output scaling mismatches (wrong
OUT_SHIFT) produce deterministic catastrophic failure — RMS error saturates near ~0.7–1.2 — not graceful degradation
Drop-In CORDIC Core Families
Family A — Iterative / Control-Oriented
| Core ID | Description | Purpose |
|---|---|---|
| A1 | Iterative rolled | Minimum-area baseline |
| A2 | Iterative unrolled | Reduced latency |
| A3 | 2-stage micro-pipeline | Shorter critical path |
| A4 | 3-stage micro-pipeline | Higher Fmax |
Family B — Throughput / Feed-Forward
| Core ID | Description | Purpose |
|---|---|---|
| B1 | Fully unrolled | 1 result/cycle |
| B2 | 2-stage pipeline | Balanced throughput |
| B3 | Balanced 2-stage | Even stage distribution |
| B4 | 3-stage deep pipeline | High-frequency design |
Family C — SIMD / Spatial
| Core ID | Description | Purpose |
|---|---|---|
| CS | Scalar unrolled | Reference |
| CR2 | Replicated ×2 | Dual lane |
| CR4 | Replicated ×4 | Quad lane |
| CSR4 | Shared ROM quad | Area optimized |
| CVEC4 | Packed SIMD | Vector datapath |
| CVEC5 | Refined SIMD | Shared resource optimized |
Family D — Multi-Issue / Time-Interleaved
| Core ID | Description | Purpose |
|---|---|---|
| D2X2S | 2 issues × 2 lanes | Dual issue |
| D4X2S | 2 issues × 4 lanes | Wider SIMD |
| D4X2P | 2 issues × 4 lanes + pipeline | Improved Fmax |
| D4X4S | 4 issues × 4 lanes | High concurrency |
Implementations
The CORDIC core has been integrated into two complete RTL subsystems, each independently verified with dedicated Python verification suites. Both treat cordic_core as a black-box computational primitive — no CORDIC internals leak into the integration logic.
FFT / IFFT — Radix-2 DIT, N=8
A fully synchronous N=8 FFT implemented using cordic_core in rotation mode to compute twiddle factors entirely in hardware, requiring no ROM. The IFFT is a zero-logic wrapper using the identity:
IFFT(X) = conj( FFT( conj(X) ) ) / N
Conjugation is two wire negations — din_im negated on input, dout_im negated on output. No extra logic.
N/2 parallel CORDIC instances compute twiddle factors \( W_N^k = e^{-j2\pi k/N} \) on start. Angles outside \((-\pi/2, \pi/2)\) are folded at elaboration time via localparam. The butterfly datapath uses Q1.30 with >>1 guard shift per stage to prevent overflow. Full architecture, directed tests, and sweep results are in the linked docs.
Verified metrics (WIDTH=32, ITER=16, 200 Monte-Carlo trials):
| Metric | FFT | IFFT |
|---|---|---|
| SNR mean | 78.8 dB | 78.4 dB |
| SNR min | 74.3 dB | 75.4 dB |
| Max error | 8.3×10⁻⁵ | 1.0×10⁻⁴ |
| Mean error | 2.1×10⁻⁵ | 3.1×10⁻⁵ |
| SFDR (pure-tone) | 87.1 dBc | — |
| Phase error mean | 0.008° | 0.006° |
| Roundtrip SNR (FFT→IFFT) | — | 78.3 dB |
QAM16 Demodulator
A fully synchronous 16-QAM demodulator using two cordic_core instances — one in rotation mode driving the NCO for carrier synthesis, one in vectoring mode computing the exact atan2 phase error inside the Costas loop. No floating-point anywhere in the datapath. All angular quantities are Q2.30; sample-level IQ data is Q1.30 throughout. Full architecture description and verification results are in the linked docs.
Verified metrics:
| Metric | Value |
|---|---|
| Lock time | 32 symbols (deterministic, all conditions) |
| Acquisition range | ≥ 0.020 rad/sample |
| Pipeline delay | 1 symbol (constant) |
| Ideal EVM (CORDIC noise floor) | 4.50% |
| RTL BER floor (≥15 dB SNR) | ~2.6×10⁻⁴ |
| BER deviation from theory (9–13 dB) | < 1 dB |
Directed test results:
| Test | Freq Offset | Phase Offset | Noise σ | BER |
|---|---|---|---|---|
| Ideal | 0 | 0 | 0 | 0.00e+00 |
| Phase_Offset_0p3 | 0 | +0.3 rad | 0 | 1.53e−02 |
| Freq_Offset_0p005 | +0.005 rad/samp | 0 | 0 | 2.34e−02 |
| Combined | +0.003 rad/samp | +0.2 rad | 0.01 | 1.44e−02 |
DPLL — Digital Phase-Locked Loop
A fully synchronous second-order Digital Phase-Locked Loop using one cordic_core instance in rotation mode driving the NCO for carrier synthesis. The phase detector uses a direct cross-product (Im{ conj(nco) × ref }) rather than CORDIC vectoring — this avoids the ±180° false-lock ambiguity that CORDIC preconditioning introduces in free-running loops. All angular quantities are Q2.30; the phase accumulator is 64-bit to safely hold TWO_PI without overflow. Full architecture description, bug log, and verification results are in the linked docs.
Verified metrics (WIDTH=32, ITER=16, KP=0.014, KI=0.0001):
| Metric | Value |
|---|---|
| Acquisition range | 0.040 rad/sample |
| Step settling time | 724 samples (foff=+0.010) |
| Lock time — ideal | 149 samples |
| Lock time — foff=0.005, 50 seeds | mean=507, min=233, max=810 |
| Phase acquisition range | 0°–180° (no false lock anywhere) |
Steady-state freq_adj error | < 1×10⁻⁷ rad/sample (quantisation floor) |
| Lock criterion | freq_locked AND phase_locked (dual) |
Directed test results:
| Test | Nom Freq | Freq Offset | Phase Offset | freq_adj Error | Result |
|---|---|---|---|---|---|
| Ideal | 0.2 | 0 | 0 | 8.48×10⁻⁸ | PASS |
| Phase +0.5 rad | 0.2 | 0 | +0.5 rad | 4.38×10⁻⁸ | PASS |
| Freq +0.005 | 0.2 | +0.005 | 0 | 1.03×10⁻⁷ | PASS |
| Freq +0.015 | 0.2 | +0.015 | 0 | 1.60×10⁻⁷ | PASS |
| Combined +0.003/+0.3 | 0.2 | +0.003 | +0.3 rad | 4.23×10⁻⁸ | PASS |
Sigma-Delta ADC — Bandpass with CORDIC Downconversion
A fully synchronous bandpass oversampling ADC pipeline using cordic_core in rotation mode (via cordic_nco) and cordic_mixer for digital downconversion of the decimated IF output to baseband IQ. The modulator and decimator are standalone fixed-point blocks; the CORDIC chain is identical to the carrier recovery path in qam16_demod. No floating-point anywhere. All angular quantities are Q2.30; sample data is Q1.30 throughout.
The pipeline has three stages. The 2nd-order error-feedback modulator converts the IF input to a 1-bit PDM bitstream at the oversampled rate using two 64-bit integrators — the same sign-extended accumulator pattern as dpll_loop_filter. A 3rd-order CIC decimator (56-bit accumulators, supports OSR up to 256) filters and downsamples by OSR, producing a Q1.30 IF PCM sample. The cordic_nco and cordic_mixer then shift the IF PCM to baseband IQ — freq_adj can be driven directly from a DPLL or Costas loop output for closed-loop carrier recovery, making this the natural front-end for the QAM16 demodulator in an SDR chain.
IF input (Q1.30, OSR×fs)
│
▼
[sd_adc] 2nd-order SD modulator, LFSR dither, ±2³¹ anti-windup
│ 1-bit PDM at OSR×fs
▼
[sd_decimator] 3rd-order CIC, decimate by OSR, normalize >>>3×log₂(OSR)
│ IF PCM (Q1.30 at fs)
▼
[cordic_nco] ◄── phase_inc (carrier at PCM rate, Q2.30)
[cordic_mixer] complex multiply: pcm × conj(nco)
│ bb_i / bb_q (Q1.30 baseband IQ at fs)
▼
Output ──► qam16_demod / dpll (freq_adj feedback)
Full architecture description, bug log, and verification results are in the linked docs.
Verified metrics (WIDTH=32, ITER=16, OSR=64, DITHER_EN=1):
| Metric | Value |
|---|---|
| Bitstream density accuracy | < 1% error across −0.95 to +0.95 input range |
| PCM SNR (f = fs_pcm/8) | ~61 dB measured (DFT, short record) |
| Theoretical peak SNR (2nd order, OSR=64) | 85.2 dB |
| SNR slope vs OSR | ~15 dB/octave (2nd-order confirms) |
| Dither idle-tone suppression | > 20 dB |
| Overload recovery | < 128 PDM samples to density 0.5±0.08 |
| Bandpass combined IQ RMS (pure carrier) | ~0.258 (exp ~0.327 with CIC droop at fs_pcm/4) |
| CIC group delay | ~3.5 PCM samples (non-integer, phase-matched SNR impractical) |
| Anti-windup range | ±2³¹ (modulator integrators) |
Directed test results (tb_sd_adc.v, 8 tests):
| Test | Input | Metric | Expected | Result |
|---|---|---|---|---|
| DC sweep × 5 | −0.9 to +0.9 | density error | < 2% | PASS × 5 |
| Sine RMS | 0.5×sin, f=fs_pdm/512 | sum_sq/sample | 114939386 ±20% | PASS |
| Overload recovery | ±FS for 256 samples | density after recovery | 0.5 ±0.08 | PASS |
| Dither at DC=0 | x_in=0 | bitstream density | 0.3–0.7 | PASS |
Bandpass test results (tb_sd_bandpass.v, 3 tests):
| Test | Input | Combined IQ RMS | Window | Result |
|---|---|---|---|---|
| Pure carrier | 0.5×sin at fs_pdm/256 | 0.258 | 0.20–0.42 | PASS |
| Sideband tone | carrier + fs_pdm/1024 offset | 0.216 | 0.20–0.42 | PASS |
| Zero input | x_in=0 | 0.000 | < 0.02 | PASS |
Toolchain
| Tool | Role |
|---|---|
| Verilog | RTL design |
| SVA | Formal property specification |
| SymbiYosys | Formal verification engine |
| Yices2 | SMT backend |
| Icarus Verilog | Simulation |
| Python | LUT generation & parameter sweeps |
| FuseSoC | Packaging & integration |
