Loading…
Design & Formal Verification of Parameterizable Fixed-Point CORDIC IP

Design & Formal Verification of Parameterizable Fixed-Point CORDIC IP

SystemVerilog Verilog Mircoarchitecture Pipelining Formal Verification IP Packaging
Design & Formal Verification of Parameterizable Fixed-Point CORDIC IP
Full reports and documentation can be found here
Formal verification setup and parameter sensitivity reports for all 6 modes
View Project
Design & Formal Verification of Parameterizable Fixed-Point CORDIC IP
QAM16 Demodulator
View Project
Design & Formal Verification of Parameterizable Fixed-Point CORDIC IP
FFT/IFFT
View Project
Design & Formal Verification of Parameterizable Fixed-Point CORDIC IP
Sigma-Delta ADC - Bandpass IF Sampling with Downconversion
View Project
Design & Formal Verification of Parameterizable Fixed-Point CORDIC IP
My First Tiny Tapeout: Designing and Hardening a CORDIC based DPLL
View Project
RepositoryMummanajagadeesh/cordic-algorithm-verilog
Start DateDec 2024

Summary

ItemDescription
TypeFully synthesizable fixed-point CORDIC Soft IP
Modes6 modes (Rotation/Vectoring × Linear/Circular/Hyperbolic)
FocusParameterization, numerical characterization, formal verification
Precision~\(10^{-5}\) RMS (@ 32-bit, 16 iterations)
VerificationSVA + SymbiYosys (Yices2 backend)

Overview

The CORDIC (COordinate Rotation DIgital Computer) algorithm performs trigonometric, hyperbolic, and linear computations using only additions and bit-shifts — no multipliers anywhere in the datapath. This implementation is a fully parameterized, synthesizable Verilog soft IP covering all six canonical modes across three coordinate systems. It has been formally verified for structural and control invariants using SymbiYosys, numerically characterized across the full parameter space, and demonstrated as the computational engine inside two production-grade RTL integrations: a Digital PLL, 16-QAM demodulator and a Radix-2 DIT FFT/IFFT.


Core Features

  • Shift–add iterative datapath (no multipliers)
  • 6 operating modes across 3 coordinate systems
  • Deterministic latency: \( \text{ITER} + 1 \) cycles
  • Clean start / busy / valid handshake
  • Fully parameterized width, iterations, angle precision, scaling
  • Python auto-generation of LUTs and parameter headers
  • FuseSoC packaged IP
  • Formal verification of control and structural invariants
  • Drop-in architectural variants (iterative, pipelined, SIMD, multi-issue)
  • Demonstrated integration in QAM16 demodulator, DPLL and FFT/IFFT

Operating Modes

Architecture ↓ / Algorithm →Rotation Mode (Forward)Vectoring Mode (Inverse / Reduction)
Linearmultdiv
Circularsin, cos, tanatan2, magnitude
Hyperbolicsinh, cosh, tanh, expln

Algorithm: Iterative Update Equations

Circular RotationCircular Vectoring
Functionsin, cos, tanmagnitude, atan2
\(x_{i+1}\)\(x_i - d_i(y_i \gg i)\)\(x_i + d_i(y_i \gg i)\)
\(y_{i+1}\)\(y_i + d_i(x_i \gg i)\)\(y_i - d_i(x_i \gg i)\)
\(z_{i+1}\)\(z_i - d_i \tan^{-1}(2^{-i})\)\(z_i + d_i \tan^{-1}(2^{-i})\)
\(d_i\)\(\text{sgn}(z_i)\)\(\text{sgn}(y_i)\)
Init\(x_0 = K_\text{SCALE},\ y_0 = 0,\ z_0 = \theta\)\(x_0 = x,\ y_0 = y,\ z_0 = 0\)
Converges to\(x_N = \cos\theta,\ y_N = \sin\theta\)\(x_N = K\sqrt{x^2+y^2},\ z_N = \text{atan2}(y,x)\)
Notes\(\tan = y_N/x_N\). Input must satisfy \(|\theta| < \pi/2\); fold wider angles via quadrant-negation flags before entry.Pre-condition: negate both \(x_0, y_0\) if \(x < 0\) to fold into right half-plane. \((x,y)=(0,0)\) is a singularity — must be guarded in system logic.
Linear RotationLinear Vectoring
Functionmultdiv
\(x_{i+1}\)\(x_i\) (unchanged)\(x_i\) (unchanged)
\(y_{i+1}\)\(y_i + d_i(x_i \gg i)\)\(y_i + d_i(x_i \gg i)\)
\(z_{i+1}\)\(z_i - d_i \cdot 2^{-i}\)\(z_i + d_i \cdot 2^{-i}\)
\(d_i\)\(\text{sgn}(z_i)\)\(-\text{sgn}(y_i)\)
Init\(x_0 = a,\ y_0 = 0,\ z_0 = b\)\(x_0 = b,\ y_0 = a,\ z_0 = 0\)
Converges to\(y_N = a \cdot b\)\(z_N = a / b\)
NotesNo gain compensation required. x is a constant multiplier throughout all iterations.Output saturates at ±2.0 in Q2.30. Errors up to ~4.0 occur when the true quotient exceeds ±2.0. Requires prescaling or format extension for high-dynamic-range division.
Hyperbolic RotationHyperbolic Vectoring
Functionsinh, cosh, tanh, expln
\(x_{i+1}\)\(x_i + d_i(y_i \gg i)\)\(x_i + d_i(y_i \gg i)\)
\(y_{i+1}\)\(y_i + d_i(x_i \gg i)\)\(y_i + d_i(x_i \gg i)\)
\(z_{i+1}\)\(z_i - d_i \tanh^{-1}(2^{-i})\)\(z_i + d_i \tanh^{-1}(2^{-i})\)
\(d_i\)\(\text{sgn}(z_i)\)\(-\text{sgn}(y_i)\)
Init\(x_0 = K_\text{H},\ y_0 = 0,\ z_0 = \theta\)\(x_0 = s+1,\ y_0 = s-1,\ z_0 = 0\)
Converges to\(x_N = \cosh\theta,\ y_N = \sinh\theta\)\(z_N = \tfrac{1}{2}\ln\tfrac{x_0+y_0}{x_0-y_0} = \ln(s)\)
Notes\(\tanh = y_N/x_N\). \(\exp(\theta) = x_N + y_N\) (no extra hardware). Both require the repeat schedule below.\(s > 0\) required; valid tested range 0.2–1.9. Apply factor-of-2 correction in wrapper. Same repeat schedule as rotation is mandatory.

Hyperbolic repeat schedule — certain iterations must execute twice for guaranteed convergence. Indices follow \(k_{n+1} = 3k_n + 1\):

ITER=16:  [1, 2, 3, 4, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 13, 14]

LUT Generation

The angle lookup tables must be regenerated whenever WIDTH, ITER, or ANG_FRAC change. The Python generator gen/gen_cordic_tables.py produces three auto-generated files that all downstream RTL \include`s:

cordic_atan_rom.v — LUT of \( \tan^{-1}(2^{-i}) \) for circular mode and \( \tanh^{-1}(2^{-i}) \) for hyperbolic mode, each entry in Q2.30:

atan_table[i]  = int(round(math.atan(2**-i)  * 2**FRAC_BITS))
atanh_table[i] = int(round(math.atanh(2**-i) * 2**FRAC_BITS))

cordic_params.vh — Defines WIDTH, ITER, CORDIC_K_SCALE, CORDIC_K_H_SCALE, and all bit-width macros. The gain compensation constants are:

K = 1.0
for i in range(ITER):
    K *= math.sqrt(1.0 + 2**(-2*i))
K_SCALE   = int(round((1.0 / K)   * 2**FRAC_BITS))   # circular

K_H = 1.0
for i in repeat_schedule:                              # with repeat indices
    K_H *= math.sqrt(1.0 - 2**(-2*i))
K_H_SCALE = int(round((1.0 / K_H) * 2**FRAC_BITS))   # hyperbolic

cordic_consts.vh — Defines HALF_PI, PI, TWO_PI, and SIN_COS_OUT_SHIFT in Q2.30. Never hardcode angle constants in RTL — always use this file.


Parameterization

ParameterDescription
WIDTHInternal fixed-point datapath width
ITERIteration depth
ANG_FRACAngle fractional precision
OUT_WIDTHOutput width
SHIFTDeterministic scaling/truncation control

Trade-offs explored between accuracy, latency, and area.


Accuracy — Baseline Results (WIDTH=32, ITER=16, FRAC=30)

All figures below are at the recommended production configuration. Full parameter sweep results are in the linked subproject reports.

Circular Rotation — sin / cos

Metricsincos
Max Error6.9×10⁻⁵7.2×10⁻⁵
RMS Error3.9×10⁻⁵3.9×10⁻⁵

OUT_WIDTH=16, OUT_SHIFT=16. Errors are symmetric and dominated by output quantization. Increasing ITER beyond 16 yields no improvement at 30-bit angle precision.

Circular Rotation — tan

MetricValue
Max Error~12.1 (near ±π/2)
RMS Error~2.46

Inherently ill-conditioned near ±π/2. Stable within \(|z| < 1.4\) rad; treat as best-effort outside that range.

Circular Vectoring — magnitude

MetricValue
Max Error5.6×10⁻⁵
RMS Error2.6×10⁻⁵

MAG_OUT_WIDTH=16, MAG_OUT_SHIFT=16. Achieves lower error than rotation because no output gain correction is needed beyond K_SCALE.

Circular Vectoring — atan2

MetricValue
Max Error~3.31 rad (near origin)
RMS Error~0.49 rad

Error is dominated entirely by the \((x,y) \to (0,0)\) singularity. All non-zero inputs produce correct phase. Zero-magnitude inputs must be explicitly guarded in system logic.

Linear Rotation — mult

MetricValue
Max Error2.7×10⁻⁵
RMS Error9.2×10⁻⁶

Q2.30 × Q2.30 → Q2.30, well-conditioned across ±0.9 input range. Production-ready.

Linear Vectoring — div

MetricValue
Max Error~4.0 (saturated)
RMS Error~0.83

Saturates at ±2.0 due to Q2.30 representable range. Stable within range; requires prescaling for quotients outside ±2.0.

Hyperbolic Rotation — sinh / cosh

Metricsinhcosh
Max Error1.1×10⁻⁴7.9×10⁻⁵
RMS Error5.7×10⁻⁵4.3×10⁻⁵

OUT_WIDTH=16, OUT_SHIFT=16. Stable and monotonic across ±1.0 input range with correct repeat schedule.

Hyperbolic Rotation — tanh

MetricValue
Max Error0.762
RMS Error0.183

Structurally limited — error does not improve with ITER or FRAC. Treat as best-effort.

Hyperbolic Rotation — exp

MetricValue
Max Error2.5×10⁻⁴
RMS Error1.1×10⁻⁴

OUT_SHIFT=17. Stable across ±1.0 after correct repeat scheduling.

Hyperbolic Vectoring — ln

MetricValue
Max Error1.5×10⁻⁴
RMS Error8.1×10⁻⁵

OUT_SHIFT=16. Valid input range 0.2–1.9. Repeat schedule mandatory — without it the series fails to converge.


Known Numerical Sensitivities

IssueCause
Amplitude collapseUnderscaled outputs
InstabilityOverscaled internal widths
Systematic biasIncorrect gain compensation
Clipping / sign inversionMismatched output shifts
tan(x) instabilityNear \( \pm \pi/2 \)
atan2 sensitivityNear \( (0,0) \)
Hyperbolic divergenceMissing repeat schedule
WIDTH=40 failureMismatched scaling at extreme widths

Wrapper Semantics

Wrappers provide gain compensation, deterministic scaling, format conversion, and mode-specific preconditioning. Correct scaling is mandatory — underscaling causes amplitude collapse; overscaling causes sign inversion and catastrophic distortion. RMS error saturates near ~0.7–1.2 in all failure regions (deterministic, not random).

FunctionOutput MappingGain Correction
sinfinal \( y \)K_SCALE on \( x_0 \)
cosfinal \( x \)K_SCALE on \( x_0 \)
tanratio \( y/x \)K_SCALE cancels
magnitudefinal \( x \)K_SCALE on \( x_0 \)
atan2accumulated \( z \)none needed
sinhfinal \( y \)K_H_SCALE on \( x_0 \)
coshfinal \( x \)K_H_SCALE on \( x_0 \)
tanhratio \( y/x \)K_H_SCALE cancels
exp\( x_N + y_N \)K_H_SCALE on \( x_0 \)
lnaccumulated \( z \)factor-of-2 in wrapper

Formal Verification

Scope: Core iteration engine and control FSM
Tools: SymbiYosys + Yices2
Assertions: SystemVerilog Assertions (parametric)

Proven Structural Properties

  • Single-cycle start
  • Non-overlapping transactions
  • Bounded progress \( \le ITER + 1 \)
  • Deadlock-free execution
  • Input stability while busy
  • Output range safety

Rotation Mode Properties

  • Odd/even symmetry
    \( \sin(-x) = -\sin(x) \)
    \( \cos(-x) = \cos(x) \)
  • Local monotonicity near zero
  • Sign consistency

Vectoring Mode Properties

  • Magnitude symmetry across quadrants
  • Correct zero-vector handling
  • \( \text{atan2} \) odd symmetry
  • \( \pi \)-rotation consistency

Note: Numerical precision is validated through simulation, not formal proof.


Parameterization Sweep Summary

Full parameter sensitivity reports (iteration sweeps, width sweeps, fraction-bit sweeps, output-scaling failure regions) are in the linked subproject docs for each of the six modes. Key cross-mode findings:

  • ITER=16 is sufficient for all modes at 30-bit precision; beyond 16 yields no measurable improvement
  • WIDTH≥32 required for stable convergence; WIDTH=40 with mismatched scaling causes catastrophic failure across all modes
  • FRAC≥28 required for sin/cos/magnitude below 1×10⁻⁴ RMS
  • Output scaling mismatches (wrong OUT_SHIFT) produce deterministic catastrophic failure — RMS error saturates near ~0.7–1.2 — not graceful degradation

Drop-In CORDIC Core Families


Family A — Iterative / Control-Oriented

Core IDDescriptionPurpose
A1Iterative rolledMinimum-area baseline
A2Iterative unrolledReduced latency
A32-stage micro-pipelineShorter critical path
A43-stage micro-pipelineHigher Fmax

Family B — Throughput / Feed-Forward

Core IDDescriptionPurpose
B1Fully unrolled1 result/cycle
B22-stage pipelineBalanced throughput
B3Balanced 2-stageEven stage distribution
B43-stage deep pipelineHigh-frequency design

Family C — SIMD / Spatial

Core IDDescriptionPurpose
CSScalar unrolledReference
CR2Replicated ×2Dual lane
CR4Replicated ×4Quad lane
CSR4Shared ROM quadArea optimized
CVEC4Packed SIMDVector datapath
CVEC5Refined SIMDShared resource optimized

Family D — Multi-Issue / Time-Interleaved

Core IDDescriptionPurpose
D2X2S2 issues × 2 lanesDual issue
D4X2S2 issues × 4 lanesWider SIMD
D4X2P2 issues × 4 lanes + pipelineImproved Fmax
D4X4S4 issues × 4 lanesHigh concurrency

Implementations

The CORDIC core has been integrated into two complete RTL subsystems, each independently verified with dedicated Python verification suites. Both treat cordic_core as a black-box computational primitive — no CORDIC internals leak into the integration logic.


FFT / IFFT — Radix-2 DIT, N=8

A fully synchronous N=8 FFT implemented using cordic_core in rotation mode to compute twiddle factors entirely in hardware, requiring no ROM. The IFFT is a zero-logic wrapper using the identity:

IFFT(X) = conj( FFT( conj(X) ) ) / N

Conjugation is two wire negations — din_im negated on input, dout_im negated on output. No extra logic.

N/2 parallel CORDIC instances compute twiddle factors \( W_N^k = e^{-j2\pi k/N} \) on start. Angles outside \((-\pi/2, \pi/2)\) are folded at elaboration time via localparam. The butterfly datapath uses Q1.30 with >>1 guard shift per stage to prevent overflow. Full architecture, directed tests, and sweep results are in the linked docs.

Verified metrics (WIDTH=32, ITER=16, 200 Monte-Carlo trials):

MetricFFTIFFT
SNR mean78.8 dB78.4 dB
SNR min74.3 dB75.4 dB
Max error8.3×10⁻⁵1.0×10⁻⁴
Mean error2.1×10⁻⁵3.1×10⁻⁵
SFDR (pure-tone)87.1 dBc
Phase error mean0.008°0.006°
Roundtrip SNR (FFT→IFFT)78.3 dB

QAM16 Demodulator

A fully synchronous 16-QAM demodulator using two cordic_core instances — one in rotation mode driving the NCO for carrier synthesis, one in vectoring mode computing the exact atan2 phase error inside the Costas loop. No floating-point anywhere in the datapath. All angular quantities are Q2.30; sample-level IQ data is Q1.30 throughout. Full architecture description and verification results are in the linked docs.

Verified metrics:

MetricValue
Lock time32 symbols (deterministic, all conditions)
Acquisition range≥ 0.020 rad/sample
Pipeline delay1 symbol (constant)
Ideal EVM (CORDIC noise floor)4.50%
RTL BER floor (≥15 dB SNR)~2.6×10⁻⁴
BER deviation from theory (9–13 dB)< 1 dB

Directed test results:

TestFreq OffsetPhase OffsetNoise σBER
Ideal0000.00e+00
Phase_Offset_0p30+0.3 rad01.53e−02
Freq_Offset_0p005+0.005 rad/samp002.34e−02
Combined+0.003 rad/samp+0.2 rad0.011.44e−02

DPLL — Digital Phase-Locked Loop

A fully synchronous second-order Digital Phase-Locked Loop using one cordic_core instance in rotation mode driving the NCO for carrier synthesis. The phase detector uses a direct cross-product (Im{ conj(nco) × ref }) rather than CORDIC vectoring — this avoids the ±180° false-lock ambiguity that CORDIC preconditioning introduces in free-running loops. All angular quantities are Q2.30; the phase accumulator is 64-bit to safely hold TWO_PI without overflow. Full architecture description, bug log, and verification results are in the linked docs.

Verified metrics (WIDTH=32, ITER=16, KP=0.014, KI=0.0001):

MetricValue
Acquisition range0.040 rad/sample
Step settling time724 samples (foff=+0.010)
Lock time — ideal149 samples
Lock time — foff=0.005, 50 seedsmean=507, min=233, max=810
Phase acquisition range0°–180° (no false lock anywhere)
Steady-state freq_adj error< 1×10⁻⁷ rad/sample (quantisation floor)
Lock criterionfreq_locked AND phase_locked (dual)

Directed test results:

TestNom FreqFreq OffsetPhase Offsetfreq_adj ErrorResult
Ideal0.2008.48×10⁻⁸PASS
Phase +0.5 rad0.20+0.5 rad4.38×10⁻⁸PASS
Freq +0.0050.2+0.00501.03×10⁻⁷PASS
Freq +0.0150.2+0.01501.60×10⁻⁷PASS
Combined +0.003/+0.30.2+0.003+0.3 rad4.23×10⁻⁸PASS

Sigma-Delta ADC — Bandpass with CORDIC Downconversion

A fully synchronous bandpass oversampling ADC pipeline using cordic_core in rotation mode (via cordic_nco) and cordic_mixer for digital downconversion of the decimated IF output to baseband IQ. The modulator and decimator are standalone fixed-point blocks; the CORDIC chain is identical to the carrier recovery path in qam16_demod. No floating-point anywhere. All angular quantities are Q2.30; sample data is Q1.30 throughout.

The pipeline has three stages. The 2nd-order error-feedback modulator converts the IF input to a 1-bit PDM bitstream at the oversampled rate using two 64-bit integrators — the same sign-extended accumulator pattern as dpll_loop_filter. A 3rd-order CIC decimator (56-bit accumulators, supports OSR up to 256) filters and downsamples by OSR, producing a Q1.30 IF PCM sample. The cordic_nco and cordic_mixer then shift the IF PCM to baseband IQ — freq_adj can be driven directly from a DPLL or Costas loop output for closed-loop carrier recovery, making this the natural front-end for the QAM16 demodulator in an SDR chain.

IF input (Q1.30, OSR×fs)
    │
    ▼
[sd_adc]          2nd-order SD modulator, LFSR dither, ±2³¹ anti-windup
    │  1-bit PDM at OSR×fs
    ▼
[sd_decimator]    3rd-order CIC, decimate by OSR, normalize >>>3×log₂(OSR)
    │  IF PCM (Q1.30 at fs)
    ▼
[cordic_nco]  ◄── phase_inc (carrier at PCM rate, Q2.30)
[cordic_mixer]    complex multiply: pcm × conj(nco)
    │  bb_i / bb_q (Q1.30 baseband IQ at fs)
    ▼
  Output  ──►  qam16_demod  /  dpll  (freq_adj feedback)

Full architecture description, bug log, and verification results are in the linked docs.

Verified metrics (WIDTH=32, ITER=16, OSR=64, DITHER_EN=1):

MetricValue
Bitstream density accuracy< 1% error across −0.95 to +0.95 input range
PCM SNR (f = fs_pcm/8)~61 dB measured (DFT, short record)
Theoretical peak SNR (2nd order, OSR=64)85.2 dB
SNR slope vs OSR~15 dB/octave (2nd-order confirms)
Dither idle-tone suppression> 20 dB
Overload recovery< 128 PDM samples to density 0.5±0.08
Bandpass combined IQ RMS (pure carrier)~0.258 (exp ~0.327 with CIC droop at fs_pcm/4)
CIC group delay~3.5 PCM samples (non-integer, phase-matched SNR impractical)
Anti-windup range±2³¹ (modulator integrators)

Directed test results (tb_sd_adc.v, 8 tests):

TestInputMetricExpectedResult
DC sweep × 5−0.9 to +0.9density error< 2%PASS × 5
Sine RMS0.5×sin, f=fs_pdm/512sum_sq/sample114939386 ±20%PASS
Overload recovery±FS for 256 samplesdensity after recovery0.5 ±0.08PASS
Dither at DC=0x_in=0bitstream density0.3–0.7PASS

Bandpass test results (tb_sd_bandpass.v, 3 tests):

TestInputCombined IQ RMSWindowResult
Pure carrier0.5×sin at fs_pdm/2560.2580.20–0.42PASS
Sideband tonecarrier + fs_pdm/1024 offset0.2160.20–0.42PASS
Zero inputx_in=00.000< 0.02PASS

Toolchain

ToolRole
VerilogRTL design
SVAFormal property specification
SymbiYosysFormal verification engine
Yices2SMT backend
Icarus VerilogSimulation
PythonLUT generation & parameter sweeps
FuseSoCPackaging & integration