Design & Formal Verification of Parameterizable Fixed-Point CORDIC IP

SystemVerilog Verilog Mircoarchitecture Pipelining Formal Verification IP Packaging

Full reports and documentation can be found here

Formal verification setup and parameter sensitivity reports for all 6 modes

QAM16 Demodulator

DPLL

FFT/IFFT

Sigma-Delta ADC - Bandpass IF Sampling with Downconversion

View Project

My First Tiny Tapeout: Designing and Hardening a CORDIC based DPLL

View Project

Repository	Mummanajagadeesh/cordic-algorithm-verilog
Start Date	Dec 2024

Summary

Item	Description
Type	Fully synthesizable fixed-point CORDIC Soft IP
Modes	6 modes (Rotation/Vectoring × Linear/Circular/Hyperbolic)
Focus	Parameterization, numerical characterization, formal verification
Precision	~\(10^{-5}\) RMS (@ 32-bit, 16 iterations)
Verification	SVA + SymbiYosys (Yices2 backend)

Overview

The CORDIC (COordinate Rotation DIgital Computer) algorithm performs trigonometric, hyperbolic, and linear computations using only additions and bit-shifts — no multipliers anywhere in the datapath. This implementation is a fully parameterized, synthesizable Verilog soft IP covering all six canonical modes across three coordinate systems. It has been formally verified for structural and control invariants using SymbiYosys, numerically characterized across the full parameter space, and demonstrated as the computational engine inside two production-grade RTL integrations: a Digital PLL, 16-QAM demodulator and a Radix-2 DIT FFT/IFFT.

Core Features

Shift–add iterative datapath (no multipliers)
6 operating modes across 3 coordinate systems
Deterministic latency: \( \text{ITER} + 1 \) cycles
Clean start / busy / valid handshake
Fully parameterized width, iterations, angle precision, scaling
Python auto-generation of LUTs and parameter headers
FuseSoC packaged IP
Formal verification of control and structural invariants
Drop-in architectural variants (iterative, pipelined, SIMD, multi-issue)
Demonstrated integration in QAM16 demodulator, DPLL and FFT/IFFT

Operating Modes

Architecture ↓ / Algorithm →	Rotation Mode (Forward)	Vectoring Mode (Inverse / Reduction)
Linear	`mult`	`div`
Circular	`sin`, `cos`, `tan`	`atan2`, `magnitude`
Hyperbolic	`sinh`, `cosh`, `tanh`, `exp`	`ln`

Algorithm: Iterative Update Equations

	Circular Rotation	Circular Vectoring
Function	`sin`, `cos`, `tan`	`magnitude`, `atan2`
\(x_{i+1}\)	\(x_i - d_i(y_i \gg i)\)	\(x_i + d_i(y_i \gg i)\)
\(y_{i+1}\)	\(y_i + d_i(x_i \gg i)\)	\(y_i - d_i(x_i \gg i)\)
\(z_{i+1}\)	\(z_i - d_i \tan^{-1}(2^{-i})\)	\(z_i + d_i \tan^{-1}(2^{-i})\)
\(d_i\)	\(\text{sgn}(z_i)\)	\(\text{sgn}(y_i)\)
Init	\(x_0 = K_\text{SCALE},\ y_0 = 0,\ z_0 = \theta\)	\(x_0 = x,\ y_0 = y,\ z_0 = 0\)
Converges to	\(x_N = \cos\theta,\ y_N = \sin\theta\)	\(x_N = K\sqrt{x^2+y^2},\ z_N = \text{atan2}(y,x)\)
Notes	\(\tan = y_N/x_N\). Input must satisfy \(\|\theta\| < \pi/2\); fold wider angles via quadrant-negation flags before entry.	Pre-condition: negate both \(x_0, y_0\) if \(x < 0\) to fold into right half-plane. \((x,y)=(0,0)\) is a singularity — must be guarded in system logic.

	Linear Rotation	Linear Vectoring
Function	`mult`	`div`
\(x_{i+1}\)	\(x_i\) (unchanged)	\(x_i\) (unchanged)
\(y_{i+1}\)	\(y_i + d_i(x_i \gg i)\)	\(y_i + d_i(x_i \gg i)\)
\(z_{i+1}\)	\(z_i - d_i \cdot 2^{-i}\)	\(z_i + d_i \cdot 2^{-i}\)
\(d_i\)	\(\text{sgn}(z_i)\)	\(-\text{sgn}(y_i)\)
Init	\(x_0 = a,\ y_0 = 0,\ z_0 = b\)	\(x_0 = b,\ y_0 = a,\ z_0 = 0\)
Converges to	\(y_N = a \cdot b\)	\(z_N = a / b\)
Notes	No gain compensation required. x is a constant multiplier throughout all iterations.	Output saturates at ±2.0 in Q2.30. Errors up to ~4.0 occur when the true quotient exceeds ±2.0. Requires prescaling or format extension for high-dynamic-range division.

	Hyperbolic Rotation	Hyperbolic Vectoring
Function	`sinh`, `cosh`, `tanh`, `exp`	`ln`
\(x_{i+1}\)	\(x_i + d_i(y_i \gg i)\)	\(x_i + d_i(y_i \gg i)\)
\(y_{i+1}\)	\(y_i + d_i(x_i \gg i)\)	\(y_i + d_i(x_i \gg i)\)
\(z_{i+1}\)	\(z_i - d_i \tanh^{-1}(2^{-i})\)	\(z_i + d_i \tanh^{-1}(2^{-i})\)
\(d_i\)	\(\text{sgn}(z_i)\)	\(-\text{sgn}(y_i)\)
Init	\(x_0 = K_\text{H},\ y_0 = 0,\ z_0 = \theta\)	\(x_0 = s+1,\ y_0 = s-1,\ z_0 = 0\)
Converges to	\(x_N = \cosh\theta,\ y_N = \sinh\theta\)	\(z_N = \tfrac{1}{2}\ln\tfrac{x_0+y_0}{x_0-y_0} = \ln(s)\)
Notes	\(\tanh = y_N/x_N\). \(\exp(\theta) = x_N + y_N\) (no extra hardware). Both require the repeat schedule below.	\(s > 0\) required; valid tested range 0.2–1.9. Apply factor-of-2 correction in wrapper. Same repeat schedule as rotation is mandatory.

Hyperbolic repeat schedule — certain iterations must execute twice for guaranteed convergence. Indices follow \(k_{n+1} = 3k_n + 1\):

ITER=16:  [1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 13, 14]

LUT Generation

The angle lookup tables must be regenerated whenever WIDTH, ITER, or ANG_FRAC change. The Python generator gen/gen_cordic_tables.py produces three auto-generated files that all downstream RTL \include`s:

cordic_atan_rom.v — LUT of \( \tan^{-1}(2^{-i}) \) for circular mode and \( \tanh^{-1}(2^{-i}) \) for hyperbolic mode, each entry in Q2.30:

atan_table[i]  = int(round(math.atan(2**-i)  * 2**FRAC_BITS))
atanh_table[i] = int(round(math.atanh(2**-i) * 2**FRAC_BITS))

cordic_params.vh — Defines WIDTH, ITER, CORDIC_K_SCALE, CORDIC_K_H_SCALE, and all bit-width macros. The gain compensation constants are:

K = 1.0
for i in range(ITER):
    K *= math.sqrt(1.0 + 2**(-2*i))
K_SCALE   = int(round((1.0 / K)   * 2**FRAC_BITS))   # circular

K_H = 1.0
for i in repeat_schedule:                              # with repeat indices
    K_H *= math.sqrt(1.0 - 2**(-2*i))
K_H_SCALE = int(round((1.0 / K_H) * 2**FRAC_BITS))   # hyperbolic

cordic_consts.vh — Defines HALF_PI, PI, TWO_PI, and SIN_COS_OUT_SHIFT in Q2.30. Never hardcode angle constants in RTL — always use this file.

Parameterization

Parameter	Description
`WIDTH`	Internal fixed-point datapath width
`ITER`	Iteration depth
`ANG_FRAC`	Angle fractional precision
`OUT_WIDTH`	Output width
`SHIFT`	Deterministic scaling/truncation control

Trade-offs explored between accuracy, latency, and area.

Accuracy — Baseline Results (WIDTH=32, ITER=16, FRAC=30)

All figures below are at the recommended production configuration. Full parameter sweep results are in the linked subproject reports.

Circular Rotation — sin / cos

Metric	sin	cos
Max Error	6.9×10⁻⁵	7.2×10⁻⁵
RMS Error	3.9×10⁻⁵	3.9×10⁻⁵

OUT_WIDTH=16, OUT_SHIFT=16. Errors are symmetric and dominated by output quantization. Increasing ITER beyond 16 yields no improvement at 30-bit angle precision.

Circular Rotation — tan

Metric	Value
Max Error	~12.1 (near ±π/2)
RMS Error	~2.46

Inherently ill-conditioned near ±π/2. Stable within \(|z| < 1.4\) rad; treat as best-effort outside that range.

Circular Vectoring — magnitude

Metric	Value
Max Error	5.6×10⁻⁵
RMS Error	2.6×10⁻⁵

MAG_OUT_WIDTH=16, MAG_OUT_SHIFT=16. Achieves lower error than rotation because no output gain correction is needed beyond K_SCALE.

Circular Vectoring — atan2

Metric	Value
Max Error	~3.31 rad (near origin)
RMS Error	~0.49 rad

Error is dominated entirely by the \((x,y) \to (0,0)\) singularity. All non-zero inputs produce correct phase. Zero-magnitude inputs must be explicitly guarded in system logic.

Linear Rotation — mult

Metric	Value
Max Error	2.7×10⁻⁵
RMS Error	9.2×10⁻⁶

Q2.30 × Q2.30 → Q2.30, well-conditioned across ±0.9 input range. Production-ready.

Linear Vectoring — div

Metric	Value
Max Error	~4.0 (saturated)
RMS Error	~0.83

Saturates at ±2.0 due to Q2.30 representable range. Stable within range; requires prescaling for quotients outside ±2.0.

Hyperbolic Rotation — sinh / cosh

Metric	sinh	cosh
Max Error	1.1×10⁻⁴	7.9×10⁻⁵
RMS Error	5.7×10⁻⁵	4.3×10⁻⁵

OUT_WIDTH=16, OUT_SHIFT=16. Stable and monotonic across ±1.0 input range with correct repeat schedule.

Hyperbolic Rotation — tanh

Metric	Value
Max Error	0.762
RMS Error	0.183

Structurally limited — error does not improve with ITER or FRAC. Treat as best-effort.

Hyperbolic Rotation — exp

Metric	Value
Max Error	2.5×10⁻⁴
RMS Error	1.1×10⁻⁴

OUT_SHIFT=17. Stable across ±1.0 after correct repeat scheduling.

Hyperbolic Vectoring — ln

Metric	Value
Max Error	1.5×10⁻⁴
RMS Error	8.1×10⁻⁵

OUT_SHIFT=16. Valid input range 0.2–1.9. Repeat schedule mandatory — without it the series fails to converge.

Known Numerical Sensitivities

Issue	Cause
Amplitude collapse	Underscaled outputs
Instability	Overscaled internal widths
Systematic bias	Incorrect gain compensation
Clipping / sign inversion	Mismatched output shifts
`tan(x)` instability	Near \( \pm \pi/2 \)
`atan2` sensitivity	Near \( (0,0) \)
Hyperbolic divergence	Missing repeat schedule
WIDTH=40 failure	Mismatched scaling at extreme widths

Wrapper Semantics

Wrappers provide gain compensation, deterministic scaling, format conversion, and mode-specific preconditioning. Correct scaling is mandatory — underscaling causes amplitude collapse; overscaling causes sign inversion and catastrophic distortion. RMS error saturates near ~0.7–1.2 in all failure regions (deterministic, not random).

Function	Output Mapping	Gain Correction
`sin`	final \( y \)	K_SCALE on \( x_0 \)
`cos`	final \( x \)	K_SCALE on \( x_0 \)
`tan`	ratio \( y/x \)	K_SCALE cancels
`magnitude`	final \( x \)	K_SCALE on \( x_0 \)
`atan2`	accumulated \( z \)	none needed
`sinh`	final \( y \)	K_H_SCALE on \( x_0 \)
`cosh`	final \( x \)	K_H_SCALE on \( x_0 \)
`tanh`	ratio \( y/x \)	K_H_SCALE cancels
`exp`	\( x_N + y_N \)	K_H_SCALE on \( x_0 \)
`ln`	accumulated \( z \)	factor-of-2 in wrapper

Formal Verification

Scope: Core iteration engine and control FSM
Tools: SymbiYosys + Yices2
Assertions: SystemVerilog Assertions (parametric)

Proven Structural Properties

Single-cycle start
Non-overlapping transactions
Bounded progress \( \le ITER + 1 \)
Deadlock-free execution
Input stability while busy
Output range safety

Rotation Mode Properties

Odd/even symmetry
\( \sin(-x) = -\sin(x) \)
\( \cos(-x) = \cos(x) \)
Local monotonicity near zero
Sign consistency

Vectoring Mode Properties

Magnitude symmetry across quadrants
Correct zero-vector handling
\( \text{atan2} \) odd symmetry
\( \pi \)-rotation consistency

Note: Numerical precision is validated through simulation, not formal proof.

Parameterization Sweep Summary

Full parameter sensitivity reports (iteration sweeps, width sweeps, fraction-bit sweeps, output-scaling failure regions) are in the linked subproject docs for each of the six modes. Key cross-mode findings:

ITER=16 is sufficient for all modes at 30-bit precision; beyond 16 yields no measurable improvement
WIDTH≥32 required for stable convergence; WIDTH=40 with mismatched scaling causes catastrophic failure across all modes
FRAC≥28 required for sin/cos/magnitude below 1×10⁻⁴ RMS
Output scaling mismatches (wrong OUT_SHIFT) produce deterministic catastrophic failure — RMS error saturates near ~0.7–1.2 — not graceful degradation

Drop-In CORDIC Core Families

Family A — Iterative / Control-Oriented

Core ID	Description	Purpose
A1	Iterative rolled	Minimum-area baseline
A2	Iterative unrolled	Reduced latency
A3	2-stage micro-pipeline	Shorter critical path
A4	3-stage micro-pipeline	Higher Fmax

Family B — Throughput / Feed-Forward

Core ID	Description	Purpose
B1	Fully unrolled	1 result/cycle
B2	2-stage pipeline	Balanced throughput
B3	Balanced 2-stage	Even stage distribution
B4	3-stage deep pipeline	High-frequency design

Family C — SIMD / Spatial

Core ID	Description	Purpose
CS	Scalar unrolled	Reference
CR2	Replicated ×2	Dual lane
CR4	Replicated ×4	Quad lane
CSR4	Shared ROM quad	Area optimized
CVEC4	Packed SIMD	Vector datapath
CVEC5	Refined SIMD	Shared resource optimized

Family D — Multi-Issue / Time-Interleaved

Core ID	Description	Purpose
D2X2S	2 issues × 2 lanes	Dual issue
D4X2S	2 issues × 4 lanes	Wider SIMD
D4X2P	2 issues × 4 lanes + pipeline	Improved Fmax
D4X4S	4 issues × 4 lanes	High concurrency

Implementations

The CORDIC core has been integrated into two complete RTL subsystems, each independently verified with dedicated Python verification suites. Both treat cordic_core as a black-box computational primitive — no CORDIC internals leak into the integration logic.

FFT / IFFT — Radix-2 DIT, N=8

A fully synchronous N=8 FFT implemented using cordic_core in rotation mode to compute twiddle factors entirely in hardware, requiring no ROM. The IFFT is a zero-logic wrapper using the identity:

IFFT(X) = conj( FFT( conj(X) ) ) / N

Conjugation is two wire negations — din_im negated on input, dout_im negated on output. No extra logic.

N/2 parallel CORDIC instances compute twiddle factors \( W_N^k = e^{-j2\pi k/N} \) on start. Angles outside \((-\pi/2, \pi/2)\) are folded at elaboration time via localparam. The butterfly datapath uses Q1.30 with >>1 guard shift per stage to prevent overflow. Full architecture, directed tests, and sweep results are in the linked docs.

Verified metrics (WIDTH=32, ITER=16, 200 Monte-Carlo trials):

Metric	FFT	IFFT
SNR mean	78.8 dB	78.4 dB
SNR min	74.3 dB	75.4 dB
Max error	8.3×10⁻⁵	1.0×10⁻⁴
Mean error	2.1×10⁻⁵	3.1×10⁻⁵
SFDR (pure-tone)	87.1 dBc	—
Phase error mean	0.008°	0.006°
Roundtrip SNR (FFT→IFFT)	—	78.3 dB

QAM16 Demodulator

A fully synchronous 16-QAM demodulator using two cordic_core instances — one in rotation mode driving the NCO for carrier synthesis, one in vectoring mode computing the exact atan2 phase error inside the Costas loop. No floating-point anywhere in the datapath. All angular quantities are Q2.30; sample-level IQ data is Q1.30 throughout. Full architecture description and verification results are in the linked docs.

Verified metrics:

Metric	Value
Lock time	32 symbols (deterministic, all conditions)
Acquisition range	≥ 0.020 rad/sample
Pipeline delay	1 symbol (constant)
Ideal EVM (CORDIC noise floor)	4.50%
RTL BER floor (≥15 dB SNR)	~2.6×10⁻⁴
BER deviation from theory (9–13 dB)	< 1 dB

Directed test results:

Test	Freq Offset	Phase Offset	Noise σ	BER
Ideal	0	0	0	0.00e+00
Phase_Offset_0p3	0	+0.3 rad	0	1.53e−02
Freq_Offset_0p005	+0.005 rad/samp	0	0	2.34e−02
Combined	+0.003 rad/samp	+0.2 rad	0.01	1.44e−02

DPLL — Digital Phase-Locked Loop

A fully synchronous second-order Digital Phase-Locked Loop using one cordic_core instance in rotation mode driving the NCO for carrier synthesis. The phase detector uses a direct cross-product (Im{ conj(nco) × ref }) rather than CORDIC vectoring — this avoids the ±180° false-lock ambiguity that CORDIC preconditioning introduces in free-running loops. All angular quantities are Q2.30; the phase accumulator is 64-bit to safely hold TWO_PI without overflow. Full architecture description, bug log, and verification results are in the linked docs.

Verified metrics (WIDTH=32, ITER=16, KP=0.014, KI=0.0001):

Metric	Value
Acquisition range	0.040 rad/sample
Step settling time	724 samples (foff=+0.010)
Lock time — ideal	149 samples
Lock time — foff=0.005, 50 seeds	mean=507, min=233, max=810
Phase acquisition range	0°–180° (no false lock anywhere)
Steady-state `freq_adj` error	< 1×10⁻⁷ rad/sample (quantisation floor)
Lock criterion	`freq_locked AND phase_locked` (dual)

Directed test results:

Test	Nom Freq	Freq Offset	Phase Offset	`freq_adj` Error	Result
Ideal	0.2	0	0	8.48×10⁻⁸	PASS
Phase +0.5 rad	0.2	0	+0.5 rad	4.38×10⁻⁸	PASS
Freq +0.005	0.2	+0.005	0	1.03×10⁻⁷	PASS
Freq +0.015	0.2	+0.015	0	1.60×10⁻⁷	PASS
Combined +0.003/+0.3	0.2	+0.003	+0.3 rad	4.23×10⁻⁸	PASS

Sigma-Delta ADC — Bandpass with CORDIC Downconversion

A fully synchronous bandpass oversampling ADC pipeline using cordic_core in rotation mode (via cordic_nco) and cordic_mixer for digital downconversion of the decimated IF output to baseband IQ. The modulator and decimator are standalone fixed-point blocks; the CORDIC chain is identical to the carrier recovery path in qam16_demod. No floating-point anywhere. All angular quantities are Q2.30; sample data is Q1.30 throughout.

The pipeline has three stages. The 2nd-order error-feedback modulator converts the IF input to a 1-bit PDM bitstream at the oversampled rate using two 64-bit integrators — the same sign-extended accumulator pattern as dpll_loop_filter. A 3rd-order CIC decimator (56-bit accumulators, supports OSR up to 256) filters and downsamples by OSR, producing a Q1.30 IF PCM sample. The cordic_nco and cordic_mixer then shift the IF PCM to baseband IQ — freq_adj can be driven directly from a DPLL or Costas loop output for closed-loop carrier recovery, making this the natural front-end for the QAM16 demodulator in an SDR chain.

IF input (Q1.30, OSR×fs)
    │
    ▼
[sd_adc]          2nd-order SD modulator, LFSR dither, ±2³¹ anti-windup
    │  1-bit PDM at OSR×fs
    ▼
[sd_decimator]    3rd-order CIC, decimate by OSR, normalize >>>3×log₂(OSR)
    │  IF PCM (Q1.30 at fs)
    ▼
[cordic_nco]  ◄── phase_inc (carrier at PCM rate, Q2.30)
[cordic_mixer]    complex multiply: pcm × conj(nco)
    │  bb_i / bb_q (Q1.30 baseband IQ at fs)
    ▼
  Output  ──►  qam16_demod  /  dpll  (freq_adj feedback)

Full architecture description, bug log, and verification results are in the linked docs.

Verified metrics (WIDTH=32, ITER=16, OSR=64, DITHER_EN=1):

Metric	Value
Bitstream density accuracy	< 1% error across −0.95 to +0.95 input range
PCM SNR (f = fs_pcm/8)	~61 dB measured (DFT, short record)
Theoretical peak SNR (2nd order, OSR=64)	85.2 dB
SNR slope vs OSR	~15 dB/octave (2nd-order confirms)
Dither idle-tone suppression	> 20 dB
Overload recovery	< 128 PDM samples to density 0.5±0.08
Bandpass combined IQ RMS (pure carrier)	~0.258 (exp ~0.327 with CIC droop at fs_pcm/4)
CIC group delay	~3.5 PCM samples (non-integer, phase-matched SNR impractical)
Anti-windup range	±2³¹ (modulator integrators)

Directed test results (tb_sd_adc.v, 8 tests):

Test	Input	Metric	Expected	Result
DC sweep × 5	−0.9 to +0.9	density error	< 2%	PASS × 5
Sine RMS	0.5×sin, f=fs_pdm/512	sum_sq/sample	114939386 ±20%	PASS
Overload recovery	±FS for 256 samples	density after recovery	0.5 ±0.08	PASS
Dither at DC=0	x_in=0	bitstream density	0.3–0.7	PASS

Bandpass test results (tb_sd_bandpass.v, 3 tests):

Test	Input	Combined IQ RMS	Window	Result
Pure carrier	0.5×sin at fs_pdm/256	0.258	0.20–0.42	PASS
Sideband tone	carrier + fs_pdm/1024 offset	0.216	0.20–0.42	PASS
Zero input	x_in=0	0.000	< 0.02	PASS

Toolchain

Tool	Role
Verilog	RTL design
SVA	Formal property specification
SymbiYosys	Formal verification engine
Yices2	SMT backend
Icarus Verilog	Simulation
Python	LUT generation & parameter sweeps
FuseSoC	Packaging & integration