RISCOF: Compliance and Coverage Testing

RISCOF Tests

RISCOF (RISC-V Compatibility Framework) is the official framework for certifying RISC-V implementations against the RISC-V Architectural Compatibility Tests (ACT). It works by running the same test programs on both the DUT and a trusted reference model (Spike), extracting memory signatures from both, and comparing them word by word. A test passes when the DUT signature matches the reference exactly.
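The word-by-word comparison can be sketched as follows. This is a minimal illustration, assuming each signature is a text dump with one hexadecimal word per line; the function names are hypothetical and are not part of RISCOF's API.

```python
def parse_signature(text):
    """Return the signature as a list of lowercase hex words (one per non-empty line)."""
    return [line.strip().lower() for line in text.splitlines() if line.strip()]


def first_mismatch(dut_text, ref_text):
    """Return None when both signatures match, else (word_index, dut_word, ref_word).

    A missing word (one signature shorter than the other) is reported as None.
    """
    dut = parse_signature(dut_text)
    ref = parse_signature(ref_text)
    for i in range(max(len(dut), len(ref))):
        d = dut[i] if i < len(dut) else None
        r = ref[i] if i < len(ref) else None
        if d != r:
            return (i, d, r)
    return None
```

Reporting the first differing index, rather than only pass/fail, makes it easier to map a failure back to the signature word a specific test case wrote.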

The framework has two separate modes. The compliance run (riscof run) checks whether the DUT correctly implements the ISA by verifying functional correctness across the full test suite. The coverage run (riscof coverage) measures how well the test suite itself exercises the architectural specification, using covergroups defined in .cgf files to count how many cross-product bins across operand values, alignments, and edge cases were actually hit.

Compliance Testing

The compliance run targets rv32i-pipe against the official RISC-V architectural test suite. RISCOF builds and runs each test on the DUT and on Spike, extracts the signature region from data memory at the end of simulation, and diffs the two outputs.

The test suite used here covers:

37 rv32ui tests (base integer ISA, user-level)
4 rv32im tests excluded from this run (M extension not targeted here)

Final result: 41/41 tests passed.

All tests produce a matching signature. No instruction produces a wrong result or fails to write its output to the expected memory location.

Coverage Testing

After the compliance run, riscof coverage is run with two .cgf files:

coverage/dataset.cgf (shared cross-instruction dataset bins)
coverage/i/rv32i.cgf (per-instruction RV32I covergroups)

These files define, for each instruction, what operand value combinations, alignment patterns, and edge cases must be hit to consider the instruction thoroughly tested. Coverage is reported per instruction as bins hit over total bins.
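The "bins hit over total bins" figure is plain arithmetic. A small sketch follows; the helper name is illustrative and not taken from the RISCOF tooling.

```python
def format_coverage(name, hit, total):
    """Format one covergroup row as 'name<TAB>hit/total (pct%)'."""
    if total <= 0:
        raise ValueError("total bin count must be positive")
    if not 0 <= hit <= total:
        raise ValueError("hit count must lie between 0 and total")
    pct = 100.0 * hit / total
    return f"{name}\t{hit}/{total} ({pct:.2f}%)"
```

For example, `format_coverage("add", 720, 730)` yields the same row shape used in the results table.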

Results

Covergroup	Coverage
add	720/730 (98.63%)
addi	650/655 (99.24%)
and	720/730 (98.63%)
andi	650/655 (99.24%)
auipc	103/103 (100.00%)
beq	684/693 (98.70%)
bge	683/693 (98.56%)
bgeu	825/831 (99.28%)
blt	685/693 (98.85%)
bltu	822/831 (98.92%)
bne	685/693 (98.85%)
fence	1/1 (100.00%)
jal	37/37 (100.00%)
jalr	94/94 (100.00%)
lb-align	84/85 (98.82%)
lbu-align	85/85 (100.00%)
lh-align	76/77 (98.70%)
lhu-align	76/77 (98.70%)
lui	103/103 (100.00%)
lw-align	72/73 (98.63%)
or	721/730 (98.77%)
ori	654/654 (100.00%)
sb-align	150/153 (98.04%)
sh-align	145/145 (100.00%)
sll	204/211 (96.68%)
slli	178/178 (100.00%)
slt	723/730 (99.04%)
slti	651/655 (99.39%)
sltiu	785/790 (99.37%)
sltu	856/866 (98.85%)
sra	211/211 (100.00%)
srai	172/178 (96.63%)
srl	206/211 (97.63%)
srli	173/178 (97.19%)
sub	723/730 (99.04%)
sw-align	141/141 (100.00%)
xor	722/730 (98.90%)
xori	652/655 (99.54%)

Reading the Numbers

Several instructions reach 100%: auipc, fence, jal, jalr, lbu-align, lui, ori, sh-align, slli, sra, sw-align. These covergroups were either small enough that the standard test suite covered everything, or they required directed vector additions to close the remaining bins (described in the fix history below).

The remaining gaps are small and uniform. Instructions like add, and, or, xor, and sub sit at 98.6% to 99.0%, with 7 to 10 bins uncovered out of 730. These are typically extreme operand combinations in the walking-ones/zeros dataset that the standard tests do not exercise. Similarly, the branch instructions (beq, bge, bgeu, blt, bltu, bne) are at 98.6% to 99.3%, with the uncovered bins in edge-case signed/unsigned boundary combinations.

The shift instructions are the weakest group. srai (96.63%), sll (96.68%), srli (97.19%), and srl (97.63%) still have zero-hit bins around specific shift amounts combined with boundary register values, which the directed additions did not fully cover. These remain open.

Load alignment (lb-align, lh-align, lhu-align, lw-align) and store alignment (sb-align) each have one to three bins uncovered, typically a specific alignment offset combined with a particular data pattern.

Setup and Commands

The RISCOF flow is isolated under rv32i-pipe/riscof/. It does not touch the existing Makefile, ISA test scripts, or coremark flows.

Directory structure:

rv32i-pipe/riscof/
  config.ini
  setup_env.sh
  run_certification.sh
  dut/
    riscof_rv32i_pipe.py
    rv32i_pipe_runner.py
    rv32i_pipe_isa.yaml
    rv32i_pipe_platform.yaml
    env/
      link.ld
      model_test.h
  reference-spike/
    riscof_spike_ref.py
    env/
      link.ld
      model_test.h

Quick start:

cd rv32i-pipe/riscof
./setup_env.sh
source .venv/bin/activate
./run_certification.sh

Report at: riscof_work/report.html

Manual commands:

cd rv32i-pipe/riscof
source .venv/bin/activate

# clone architectural tests (once)
riscof arch-test --clone --dir ./riscv-arch-test

# validate YAML
riscof validateyaml --config ./config.ini --work-dir ./riscof_work

# compliance run
riscof run \
  --config ./config.ini \
  --suite ./riscv-arch-test/riscv-test-suite \
  --env ./riscv-arch-test/riscv-test-suite/env \
  --work-dir ./riscof_work

# coverage run
riscof coverage \
  --config ./config.ini \
  --suite ./riscv-arch-test/riscv-test-suite \
  --env ./riscv-arch-test/riscv-test-suite/env \
  --work-dir ./riscof_work \
  -c ./riscv-arch-test/coverage/dataset.cgf \
  -c ./riscv-arch-test/coverage/i/rv32i.cgf

Coverage report at: riscof_work/coverage.html

Notes:

DUT ISA spec targets RV32I in dut/rv32i_pipe_isa.yaml.
config.ini sets rtl_root=.., resolving to rv32i-pipe/.
sim_backend=verilator is used for DUT simulation.
Coverage extraction uses Spike trace logging (-l --log-commits) in the reference plugin.
Some tests have labels not present in selected CGFs (for example, privilege misalign1-jalr against coverage/i/rv32i.cgf). These emit an empty ref.cgf and are safely ignored during merge.
.venv/, riscv-arch-test/, and riscof_work/ are local-only and not committed. They are recreated by setup_env.sh and run_certification.sh.

Bring-Up and Fix History

Getting from a fresh RISCOF setup to 41/41 passing and closing coverage outliers required fixing problems at multiple layers: YAML schema, simulator integration, memory mapping, signature extraction, and finally directed test vectors for coverage closure.

Stage 1: YAML Validation Failure

The first riscof validateyaml call failed immediately. The ISA YAML (dut/rv32i_pipe_isa.yaml) included a PMP block with keys not recognized by the installed riscv_config schema version.

Fix: removed the unsupported PMP fields entirely from the ISA YAML.

Stage 2: Icarus Compile Failure

After YAML was fixed, the DUT runner invoked iverilog to build the testbench, which failed with:

Include file defines.v not found

The generated testbench had `include "defines.v" but the compile command did not include the RTL root as a search path.

Fix: added -I <rtl_root> to the Icarus compile command in the runner.
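A minimal sketch of how the runner might assemble the Icarus command with the include path. The function name and argument layout are assumptions for illustration; only the `-I` and `-o` flags are standard iverilog options.

```python
def iverilog_command(rtl_root, testbench, output):
    """Build an iverilog argument list that searches rtl_root for `include files."""
    return [
        "iverilog",
        "-I", str(rtl_root),   # include search path, so `include "defines.v" resolves
        "-o", str(output),     # compiled simulation image
        str(testbench),
    ]
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` without shell quoting concerns.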

Stage 3: Simulation Timeout / Halted Incorrectly

Many tests timed out (halted=0) and never wrote signatures. The root cause was that the linker base address, halt mechanism, and RISCOF environment headers were not aligned with the architectural test execution model.

Fixes applied together:

Set linker start address to 0x80000000 in both DUT and reference link.ld.
Set DUT BASE_ADDR to 0x80000000 in config and runner.
Updated dut/env/model_test.h to include explicit halt behavior matching the arch-test model.

Stage 4: Spike Reference Issues

Reference runs produced errors about invalid payload addresses, and some tests had no reference signature at all. A secondary issue was that the local GCC did not support the exact zicsr ISA string coming from test metadata.

Fixes:

Updated reference-spike/env/model_test.h to include correct .tohost/.fromhost sections and a halt loop writing to tohost.
Forced compile ISA to rv32i (compile_isa=rv32i) in both plugins.
Set Spike run ISA to rv32i (ref_isa=rv32i) explicitly.

Stage 5: Partial Pass on Icarus (34/41)

After the above fixes, the Icarus-based run reached 34 pass, 7 fail. The 7 failing tests were all branch- and jump-heavy: beq, bge, bgeu, blt, bltu, bne, jal. Both DUT and reference halted correctly, but the DUT signature contained large deadbeef regions where the reference had updated values.

Root cause: those tests had large .text sections followed by .data/signature regions at high VMAs. The memory arrays were fixed-size constants, and the extraction logic used simplistic indexing that did not account for the real section layout.

Fixes:

Added dynamic memory sizing in the runner by parsing ELF section headers and computing the actual required INST_MEM_WORDS and DATA_MEM_WORDS.
Generated a per-test local defines.v with the computed values instead of using hardcoded constants.
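The sizing step can be sketched as below. This assumes the ELF section headers have already been parsed into simple records, and that both memories are indexed from the same base address; the `Section` type and function names are hypothetical, while INST_MEM_WORDS and DATA_MEM_WORDS are the macro names mentioned above.

```python
from typing import NamedTuple


class Section(NamedTuple):
    name: str
    vma: int          # virtual memory address of the section start
    size: int         # size in bytes
    executable: bool  # True for code sections, False for data/signature


def required_words(sections, base_addr, executable):
    """Number of 32-bit words needed to reach the end of the highest matching section."""
    end = base_addr
    for s in sections:
        if s.executable == executable and s.size > 0:
            if s.vma < base_addr:
                raise ValueError(f"{s.name} lies below the base address")
            end = max(end, s.vma + s.size)
    return (end - base_addr + 3) // 4  # round up to whole words


def render_defines(sections, base_addr=0x80000000):
    """Render the per-test defines.v contents with the computed memory depths."""
    inst = required_words(sections, base_addr, True)
    data = required_words(sections, base_addr, False)
    return f"`define INST_MEM_WORDS {inst}\n`define DATA_MEM_WORDS {data}\n"
```

Sizing from the section end rather than the section size is what covers the failing case: a signature region placed at a high VMA needs a deep array even when the section itself is small.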

Stage 6: Switch to Verilator

Icarus was too slow for the larger jump-heavy tests. Switched the DUT simulation backend to Verilator.

Changes:

Added sim_backend dispatch in the DUT plugin.
Added the Verilator build and run path in the runner.
Updated config.ini: sim_backend=verilator, verilator=verilator, jobs=1.
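One way to read the backend selection from config.ini is sketched here. The section name `rv32i_pipe`, the backend name `icarus`, and the default value are assumptions for illustration; only the `sim_backend=verilator` setting comes from the configuration described above.

```python
import configparser

# Hypothetical set of backend names the plugin would dispatch on.
SUPPORTED_BACKENDS = ("icarus", "verilator")


def select_backend(config_text, section="rv32i_pipe"):
    """Return the simulator backend named by sim_backend in the given section."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Fall back to Icarus when the key (or section) is absent; this default is an assumption.
    backend = parser.get(section, "sim_backend", fallback="icarus").strip().lower()
    if backend not in SUPPORTED_BACKENDS:
        raise ValueError(f"unsupported sim_backend: {backend!r}")
    return backend
```

Rejecting unknown values up front avoids silently falling through to the wrong build path.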

Stage 7: Full Regression After Verilator Switch (All Tests Failing)

After the Verilator switch, every test failed. Signatures started with the canary value followed by deadbeef throughout, indicating the signature region was never written or was read from the wrong location.

Root cause: the runner was mixing absolute addresses with array-relative indices in two critical places:

When placing ELF sections into memory arrays, the code used raw VMAs instead of vma - base_addr as the array offset.
When extracting the signature from the memory dump, indices were computed without subtracting the base address first.

With VMAs starting at 0x80000000, this produced completely wrong offsets, even though the simulation itself halted correctly.

Fixes:

ELF section placement: section_off = section_vma - base_addr
Signature extraction: idx = (addr - base_addr) >> 2 with bounds checking
Kept base_addr = 0x80000000 consistent across compile, simulation, and extraction
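The two translations can be sketched as follows. The function names are illustrative and the memory dump is modelled as a plain list of words, which may differ from the runner's actual data structures.

```python
BASE_ADDR = 0x80000000


def section_offset(section_vma, base_addr=BASE_ADDR):
    """Byte offset of an ELF section inside the memory array."""
    if section_vma < base_addr:
        raise ValueError("section VMA lies below the base address")
    return section_vma - base_addr


def signature_index(addr, mem_words, base_addr=BASE_ADDR):
    """Word index of an absolute address, with alignment and bounds checks."""
    if addr < base_addr or addr % 4 != 0:
        raise ValueError("address is below base or not word-aligned")
    idx = (addr - base_addr) >> 2
    if idx >= mem_words:
        raise IndexError("address is beyond the end of the memory array")
    return idx


def extract_signature(mem, sig_begin, sig_end, base_addr=BASE_ADDR):
    """Return the words in [sig_begin, sig_end) from a word-indexed memory dump."""
    if sig_end <= sig_begin:
        return []
    first = signature_index(sig_begin, len(mem), base_addr)
    last = signature_index(sig_end - 4, len(mem), base_addr)
    return mem[first:last + 1]
```

Without the subtraction, an address such as 0x80000010 shifted right by two gives an index above 0x20000000, far outside any realistic array, which is consistent with the all-deadbeef signatures seen here.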

After these fixes: 41/41 tests passed.

Stage 8: Coverage Bring-Up

After the compliance run was stable, riscof coverage was enabled. The main artifacts confirmed working:

riscof_work/coverage.md
riscof_work/coverage.html
riscof_work/suite_coverage.rpt

One benign warning appeared: the privilege test misalign1-jalr had no matching covergroup in the selected CGFs, producing an empty ref.cgf. This is not a failure for RV32I functional coverage and was treated as non-blocking.

The bin-level truth source used for all targeted closure work was suite_coverage.rpt, not the summary HTML. The HTML percentages are useful for orientation; the .rpt file shows exactly which bin is zero-hit.
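Listing zero-hit bins can be sketched as a recursive walk. This assumes the report has already been loaded into a nested mapping whose leaves are integer hit counts; the real layout of suite_coverage.rpt should be checked before relying on this, and the `val_comb` key in the usage below is only an example.

```python
def zero_hit_bins(node, path=()):
    """Return the key paths of all leaves whose hit count is exactly zero."""
    found = []
    for key, value in node.items():
        here = path + (str(key),)
        if isinstance(value, dict):
            found.extend(zero_hit_bins(value, here))
        elif isinstance(value, int) and not isinstance(value, bool) and value == 0:
            found.append(here)
        # Non-integer leaves (for example summary strings like "534/654") are skipped.
    return found
```

Each returned path names the covergroup, the bin category, and the bin expression, which is the information needed to write a directed vector.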

Stage 9: First Directed Coverage Pass

Baseline weak areas before directed fixes:

ori: ~74% initially, rose to ~81.65% after initial edits, still an outlier
lb-align: ~92.94%
sb-align: ~92.81%
Shift family (sll, slli, srl, srli, sra, srai): mostly mid-90s

For each instruction, zero-hit bins were identified from suite_coverage.rpt, and targeted directed vectors were added directly to the relevant .S files in rv32i_m/I/src/. Signature regions were expanded with .fill directives to accommodate the added output writes.

Files modified in this pass:

ori-01.S
lb-align-01.S
sb-align-01.S
sll-01.S, slli-01.S, sra-01.S, srai-01.S, srl-01.S, srli-01.S

Results after this pass:

lb-align: 84/85 (98.82%)
sb-align: 150/153 (98.04%)
Shift instructions: ~96 to 97% range
ori: 534/654 (81.65%), still the main outlier

Stage 10: Deep ori Closure

ori remained at 81.65% with many uncovered bins across walking-ones, walking-zeros, and specific dataset combinations involving immediate values.

The zero-hit bin list from suite_coverage.rpt was parsed systematically. A large targeted vector set was added to ori-01.S covering each missing combination. The final remaining zero bin required a specific case: rs1_val==46341 and imm_val==3. After adding that case and updating the signature capacity, ori reached 654/654 (100.00%).
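When writing a directed vector, the expected result can be worked out with a small reference model of ORI on RV32. This is an illustrative sketch, not the code used to generate the test.

```python
def sign_extend_12(imm):
    """Interpret the low 12 bits of imm as a signed two's-complement value."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def ori(rs1_val, imm12):
    """RV32 ORI: bitwise OR of rs1 with the sign-extended 12-bit immediate."""
    return (rs1_val | sign_extend_12(imm12)) & 0xFFFFFFFF
```

For the last missing bin, `ori(46341, 3)` evaluates to 46343 (0xB505 | 0x3 = 0xB507).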

Stage 11: Final Remaining Gaps

After ori was closed, the remaining weak areas were:

sh-align: 93.79%
sw-align: 93.62%
lbu-align: 97.65%
slli: 96.07%
sra: 96.21%

For each, zero-hit bins were read from suite_coverage.rpt and targeted directed vectors were added, using exact alignment offset combinations, walking register values, and edge shift relations.

Files modified:

sh-align-01.S
sw-align-01.S
lbu-align-01.S
slli-01.S
sra-01.S

Results after this pass:

sh-align: 145/145 (100.00%)
sw-align: 141/141 (100.00%)
lbu-align: 85/85 (100.00%)
slli: 178/178 (100.00%)
sra: 211/211 (100.00%)
ori held at 654/654 (100.00%)

Key Points from the Fix Process

Bin-level debugging via suite_coverage.rpt is necessary for stubborn gaps. Summary percentages do not tell you which specific operand combination is uncovered.
Generated arch tests can be safely extended with directed vectors when the goal is coverage closure, provided signature memory capacity is updated to match.
Every added vector must write to a valid, non-overlapping region of the signature area.
Some warnings from RISCOF (misalign1-jalr empty CGF) are benign and should not be confused with actual test failures.
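The signature-capacity rule can be checked mechanically before rerunning the suite. The sketch below assumes each directed vector writes one 32-bit word at a known byte offset from the start of the signature region; the function name is hypothetical.

```python
def check_signature_writes(offsets, capacity_words):
    """Validate byte offsets of signature writes; return the number of distinct slots used."""
    seen = set()
    for off in offsets:
        if off % 4 != 0:
            raise ValueError(f"offset {off} is not word-aligned")
        if not 0 <= off < capacity_words * 4:
            raise ValueError(f"offset {off} is outside the signature region")
        if off in seen:
            raise ValueError(f"offset {off} is written more than once")
        seen.add(off)
    return len(seen)
```

A failure here points to a signature region that was not enlarged after vectors were added, or to two vectors sharing one slot.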

The core DUT behavior was functionally correct throughout. The problems encountered were adapter-level: address translation, memory sizing, reference model plumbing, and signature extraction, not instruction execution errors.
