RISCOF: Compliance and Coverage Testing

RISCOF (RISC-V Compatibility Framework) is the official framework for testing RISC-V implementations against the RISC-V Architectural Compatibility Tests (ACT). It runs the same test programs on both the DUT and a trusted reference model (Spike here), extracts a memory signature from each, and compares the two word by word. A test passes when the DUT signature matches the reference exactly.
The framework has two separate modes. The compliance run (riscof run) checks whether the DUT correctly implements the ISA by verifying functional correctness across the full test suite. The coverage run (riscof coverage) measures how well the test suite itself exercises the architectural specification, using covergroups defined in .cgf files to count how many cross-product bins across operand values, alignments, and edge cases were actually hit.
Compliance Testing
The compliance run targets rv32i-pipe against the official RISC-V architectural test suite. RISCOF builds and runs each test on the DUT and on Spike, extracts the signature region from data memory at the end of simulation, and diffs the two outputs.
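The word-by-word comparison can be pictured with a minimal sketch. It assumes each signature is a text dump with one hexadecimal word per line; the function name `compare_signatures` is hypothetical, and this is not RISCOF's own comparison code.

```python
def compare_signatures(dut_lines, ref_lines):
    """Return (word_index, dut_word, ref_word) for every mismatching word.

    Assumes one hex word per line; blank lines are ignored and the
    comparison is case-insensitive. A missing word is reported as None.
    """
    dut = [w.strip().lower() for w in dut_lines if w.strip()]
    ref = [w.strip().lower() for w in ref_lines if w.strip()]
    mismatches = []
    for i in range(max(len(dut), len(ref))):
        d = dut[i] if i < len(dut) else None
        r = ref[i] if i < len(ref) else None
        if d != r:
            mismatches.append((i, d, r))
    return mismatches


# A test passes only when the mismatch list is empty.
passed = not compare_signatures(["6f5ca309", "00000004"],
                                ["6f5ca309", "00000004"])
```

Reporting the word index along with both values makes it easy to map a mismatch back to the signature offset that a specific test case writes.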
The test suite used here covers:
- 37 rv32ui tests (base integer ISA, user-level)
- 4 rv32im tests excluded from this run (M extension not targeted here)
Final result: 41/41 tests passed.
All tests produce a matching signature. No instruction produces a wrong result or fails to write its output to the expected memory location.
Coverage Testing
After the compliance run, riscof coverage is run with two .cgf files:
- coverage/dataset.cgf (shared cross-instruction dataset bins)
- coverage/i/rv32i.cgf (per-instruction RV32I covergroups)
These files define, for each instruction, what operand value combinations, alignment patterns, and edge cases must be hit to consider the instruction thoroughly tested. Coverage is reported per instruction as bins hit over total bins.
Results
| Covergroup | Coverage |
|---|---|
| add | 720/730 (98.63%) |
| addi | 650/655 (99.24%) |
| and | 720/730 (98.63%) |
| andi | 650/655 (99.24%) |
| auipc | 103/103 (100.00%) |
| beq | 684/693 (98.70%) |
| bge | 683/693 (98.56%) |
| bgeu | 825/831 (99.28%) |
| blt | 685/693 (98.85%) |
| bltu | 822/831 (98.92%) |
| bne | 685/693 (98.85%) |
| fence | 1/1 (100.00%) |
| jal | 37/37 (100.00%) |
| jalr | 94/94 (100.00%) |
| lb-align | 84/85 (98.82%) |
| lbu-align | 85/85 (100.00%) |
| lh-align | 76/77 (98.70%) |
| lhu-align | 76/77 (98.70%) |
| lui | 103/103 (100.00%) |
| lw-align | 72/73 (98.63%) |
| or | 721/730 (98.77%) |
| ori | 654/654 (100.00%) |
| sb-align | 150/153 (98.04%) |
| sh-align | 145/145 (100.00%) |
| sll | 204/211 (96.68%) |
| slli | 178/178 (100.00%) |
| slt | 723/730 (99.04%) |
| slti | 651/655 (99.39%) |
| sltiu | 785/790 (99.37%) |
| sltu | 856/866 (98.85%) |
| sra | 211/211 (100.00%) |
| srai | 172/178 (96.63%) |
| srl | 206/211 (97.63%) |
| srli | 173/178 (97.19%) |
| sub | 723/730 (99.04%) |
| sw-align | 141/141 (100.00%) |
| xor | 722/730 (98.90%) |
| xori | 652/655 (99.54%) |
Reading the Numbers
Several instructions reach 100%: auipc, fence, jal, jalr, lbu-align, lui, ori, sh-align, slli, sra, sw-align. These covergroups were either small enough that the standard test suite covered everything, or they required directed vector additions to close the remaining bins (described in the fix history below).
The remaining gaps are small and uniform. Instructions like add, and, or, xor, and sub sit at 98.6% to 99.0%, with 7 to 10 bins uncovered out of 730. These are typically extreme operand combinations in the walking-ones/zeros dataset that the standard tests do not exercise. Similarly, the branch instructions (beq, bge, bgeu, blt, bltu, bne) are at 98.6% to 99.3%, with the uncovered bins in edge-case signed/unsigned boundary combinations.
The shift instructions are the weakest group. srai (96.63%), sll (96.68%), srli (97.19%), and srl (97.63%) have bins around specific shift amounts combined with boundary register values that were not fully covered even after directed additions. These remain open.
The load alignment covergroups (lb-align, lh-align, lhu-align, lw-align) each have one bin uncovered, and the store alignment covergroup sb-align has three. These are typically a specific alignment offset combined with a particular data pattern.
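To rank covergroups by remaining gap, the table cells can be parsed directly. The sketch below is illustrative only: the helper names are hypothetical, and the rows are a small sample copied from the results table.

```python
def parse_coverage(cell):
    """Parse a table cell such as '720/730 (98.63%)' into (hit, total)."""
    fraction = cell.split()[0]
    hit, total = fraction.split("/")
    return int(hit), int(total)


def rank_by_gap(rows):
    """Return (name, uncovered_bins, percent) tuples, lowest coverage first."""
    ranked = []
    for name, cell in rows:
        hit, total = parse_coverage(cell)
        ranked.append((name, total - hit, round(100.0 * hit / total, 2)))
    return sorted(ranked, key=lambda entry: entry[2])


# Sample rows taken from the results table.
rows = [
    ("ori", "654/654 (100.00%)"),
    ("srl", "206/211 (97.63%)"),
    ("sll", "204/211 (96.68%)"),
    ("srai", "172/178 (96.63%)"),
]
for name, missing, percent in rank_by_gap(rows):
    print(f"{name}: {missing} bins uncovered ({percent:.2f}%)")
```

Sorting by percentage rather than by raw bin count keeps small covergroups such as the alignment groups from being hidden behind the larger arithmetic ones.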
Setup and Commands
The RISCOF flow is isolated under rv32i-pipe/riscof/. It does not touch the existing Makefile, ISA test scripts, or coremark flows.
Directory structure:
rv32i-pipe/riscof/
    config.ini
    setup_env.sh
    run_certification.sh
    dut/
        riscof_rv32i_pipe.py
        rv32i_pipe_runner.py
        rv32i_pipe_isa.yaml
        rv32i_pipe_platform.yaml
        env/
            link.ld
            model_test.h
    reference-spike/
        riscof_spike_ref.py
        env/
            link.ld
            model_test.h
Quick start:
cd rv32i-pipe/riscof
./setup_env.sh
source .venv/bin/activate
./run_certification.sh
Report at: riscof_work/report.html
Manual commands:
cd rv32i-pipe/riscof
source .venv/bin/activate
# clone architectural tests (once)
riscof arch-test --clone --dir ./riscv-arch-test
# validate YAML
riscof validateyaml --config ./config.ini --work-dir ./riscof_work
# compliance run
riscof run \
--config ./config.ini \
--suite ./riscv-arch-test/riscv-test-suite \
--env ./riscv-arch-test/riscv-test-suite/env \
--work-dir ./riscof_work
# coverage run
riscof coverage \
--config ./config.ini \
--suite ./riscv-arch-test/riscv-test-suite \
--env ./riscv-arch-test/riscv-test-suite/env \
--work-dir ./riscof_work \
-c ./riscv-arch-test/coverage/dataset.cgf \
-c ./riscv-arch-test/coverage/i/rv32i.cgf
Coverage report at: riscof_work/coverage.html
Notes:
- DUT ISA spec targets RV32I in dut/rv32i_pipe_isa.yaml.
- config.ini sets rtl_root=.., resolving to rv32i-pipe/.
- sim_backend=verilator is used for DUT simulation.
- Coverage extraction uses Spike trace logging (-l --log-commits) in the reference plugin.
- Some tests have labels not present in selected CGFs (for example, privilege misalign1-jalr against coverage/i/rv32i.cgf). These emit an empty ref.cgf and are safely ignored during merge.
- .venv/, riscv-arch-test/, and riscof_work/ are local-only and not committed. They are recreated by setup_env.sh and run_certification.sh.
Bring-Up and Fix History
Getting from a fresh RISCOF setup to 41/41 passing and closing coverage outliers required fixing problems at multiple layers: YAML schema, simulator integration, memory mapping, signature extraction, and finally directed test vectors for coverage closure.
Stage 1: YAML Validation Failure
The first riscof validateyaml call failed immediately. The ISA YAML (dut/rv32i_pipe_isa.yaml) included a PMP block with keys not recognized by the installed riscv_config schema version.
Fix: removed the unsupported PMP fields entirely from the ISA YAML.
Stage 2: Icarus Compile Failure
After YAML was fixed, the DUT runner invoked iverilog to build the testbench, which failed with:
Include file defines.v not found
The generated testbench had `include "defines.v" but the compile command did not include the RTL root as a search path.
Fix: added -I <rtl_root> to the Icarus compile command in the runner.
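A minimal sketch of how a runner might assemble such a compile command is shown below. The function name and the file names in the usage line are hypothetical; only the `-I` and `-o` flags are standard iverilog options.

```python
def iverilog_command(rtl_root, sources, output):
    """Build an iverilog argument list with rtl_root on the include path.

    -I adds a directory to the `include search path, so a testbench that
    says `include "defines.v" resolves against rtl_root.
    """
    return ["iverilog", "-I", str(rtl_root), "-o", str(output),
            *[str(src) for src in sources]]


# Hypothetical file names, for illustration only.
cmd = iverilog_command("..", ["tb_riscof.v"], "sim.vvp")
print(" ".join(cmd))
```

Passing the command as a list (for example to `subprocess.run`) avoids shell quoting problems when the RTL root contains spaces.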
Stage 3: Simulation Timeout / Halted Incorrectly
Many tests timed out (halted=0) and never wrote signatures. The root cause was that the linker base address, halt mechanism, and RISCOF environment headers were not aligned with the architectural test execution model.
Fixes applied together:
- Set linker start address to 0x80000000 in both DUT and reference link.ld.
- Set DUT BASE_ADDR to 0x80000000 in config and runner.
- Updated dut/env/model_test.h to include explicit halt behavior matching the arch-test model.
Stage 4: Spike Reference Issues
Reference runs produced errors about invalid payload addresses, and some tests had no reference signature at all. A secondary issue was that the local GCC did not support the exact zicsr ISA string coming from test metadata.
Fixes:
- Updated reference-spike/env/model_test.h to include correct .tohost/.fromhost sections and a halt loop writing to tohost.
- Forced compile ISA to rv32i (compile_isa=rv32i) in both plugins.
- Set Spike run ISA to rv32i (ref_isa=rv32i) explicitly.
Stage 5: Partial Pass on Icarus (34/41)
After the above fixes, the Icarus-based run reached 34 pass, 7 fail. The 7 failing tests were all branch and jump heavy: beq, bge, bgeu, blt, bltu, bne, jal. Both DUT and reference halted correctly, but the DUT signature contained large deadbeef regions where the reference had updated values.
Root cause: those tests had large .text sections followed by .data/signature regions at high VMAs. The memory arrays were fixed-size constants, and the extraction logic used simplistic indexing that did not account for the real section layout.
Fixes:
- Added dynamic memory sizing in the runner by parsing ELF section headers and computing the actual required INST_MEM_WORDS and DATA_MEM_WORDS.
- Generated a per-test local defines.v with the computed values instead of using hardcoded constants.
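The sizing step can be sketched as follows. This is a hedged illustration, not the runner's actual code: it assumes the section headers have already been parsed into `(vma, size_in_bytes)` pairs, that both memory arrays are indexed from the base address, and that the per-test header only needs the two macros named above. The function names are hypothetical.

```python
BASE_ADDR = 0x80000000  # linker start address used throughout this flow


def words_needed(sections, base_addr=BASE_ADDR):
    """Number of 32-bit words from base_addr up to the highest section end.

    sections: iterable of (vma, size_in_bytes) pairs (assumed input shape).
    """
    end = max((vma + size for vma, size in sections), default=base_addr)
    span = max(end - base_addr, 0)
    return (span + 3) // 4  # round up to whole words


def render_defines(inst_sections, data_sections):
    """Render the text of a per-test defines.v with the computed sizes."""
    return (
        f"`define INST_MEM_WORDS {words_needed(inst_sections)}\n"
        f"`define DATA_MEM_WORDS {words_needed(data_sections)}\n"
    )


# Example: a 0x1004-byte .text at the base and a 6-byte data region at +0x2000.
print(render_defines([(0x80000000, 0x1004)], [(0x80002000, 6)]))
```

Sizing from the highest section end, rather than from the sum of section sizes, covers gaps between sections that sit at high VMAs.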
Stage 6: Switch to Verilator
Icarus was too slow for the larger jump-heavy tests. Switched the DUT simulation backend to Verilator.
Changes:
- Added sim_backend dispatch in the DUT plugin.
- Added the Verilator build and run path in the runner.
- Updated config.ini: sim_backend=verilator, verilator=verilator, jobs=1.
Stage 7: Full Regression After Verilator Switch (All Tests Failing)
After the Verilator switch, every test failed. Signatures started with the canary value then deadbeef throughout, indicating the signature region was never written or was read from the wrong location.
Root cause: the runner was mixing absolute addresses with array-relative indices in two critical places:
- When placing ELF sections into memory arrays, the code used raw VMAs instead of vma - base_addr as the array offset.
- When extracting the signature from the memory dump, indices were computed without subtracting the base address first.
With VMAs starting at 0x80000000, this produced completely wrong offsets, even though the simulation itself halted correctly.
Fixes:
- ELF section placement: section_off = section_vma - base_addr
- Signature extraction: idx = (addr - base_addr) >> 2 with bounds checking
- Kept base_addr = 0x80000000 consistent across compile, simulation, and extraction
After these fixes: 41/41 tests passed.
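The two address translations from this stage can be sketched as below. This is illustrative rather than the runner's code: it assumes the memory dump is a list of 32-bit words indexed from the base address and that the signature end address is exclusive. The function names are hypothetical.

```python
BASE_ADDR = 0x80000000


def section_offset(section_vma, base_addr=BASE_ADDR):
    """Byte offset of a section inside a memory array that starts at base_addr."""
    off = section_vma - base_addr
    if off < 0:
        raise ValueError(f"section VMA {section_vma:#x} is below base {base_addr:#x}")
    return off


def signature_words(mem_words, begin_addr, end_addr, base_addr=BASE_ADDR):
    """Slice the signature out of a word-indexed dump, with bounds checking."""
    first = (begin_addr - base_addr) >> 2
    last = (end_addr - base_addr) >> 2  # exclusive end (assumption)
    if not (0 <= first <= last <= len(mem_words)):
        raise IndexError(
            f"signature [{begin_addr:#x}, {end_addr:#x}) is outside the dumped memory"
        )
    return mem_words[first:last]


# With a 16-word dump, addresses 0x80000010..0x80000020 map to words 4..7.
dump = list(range(16))
print(signature_words(dump, 0x80000010, 0x80000020))
```

Using a raw VMA such as 0x80000010 as an index would land far outside any realistic array, which is why failing loudly on the bounds check is preferable to silently returning untouched deadbeef words.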
Stage 8: Coverage Bring-Up
After the compliance run was stable, riscof coverage was enabled. The main artifacts confirmed working:
- riscof_work/coverage.md
- riscof_work/coverage.html
- riscof_work/suite_coverage.rpt
One benign warning appeared: the privilege test misalign1-jalr had no matching covergroup in the selected CGFs, producing an empty ref.cgf. This is not a failure for RV32I functional coverage and was treated as non-blocking.
The bin-level truth source used for all targeted closure work was suite_coverage.rpt, not the summary HTML. The HTML percentages are useful for orientation; the .rpt file shows exactly which bin is zero-hit.
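Finding zero-hit bins amounts to walking the report and collecting every bin whose count is 0. The sketch below assumes the report has already been loaded into a nested mapping whose leaves are integer hit counts; that layout, the sample data, and the function name are assumptions for illustration, not a verified description of the suite_coverage.rpt format.

```python
def zero_hit_bins(node, path=()):
    """Yield the key path of every leaf whose hit count is exactly 0."""
    for key, value in node.items():
        if isinstance(value, dict):
            yield from zero_hit_bins(value, path + (key,))
        elif isinstance(value, int) and not isinstance(value, bool) and value == 0:
            yield path + (key,)


# Hypothetical, hand-written sample in the assumed shape.
report = {
    "ori": {
        "rs1": {"x1": 3, "x2": 1},
        "val_comb": {
            "rs1_val==46341 and imm_val==3": 0,
            "rs1_val==0 and imm_val==0": 2,
        },
    },
}
for bin_path in zero_hit_bins(report):
    print(" / ".join(bin_path))
```

Keeping the full key path identifies both the covergroup and the exact operand combination, which is the information a directed vector needs.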
Stage 9: First Directed Coverage Pass
Baseline weak areas before directed fixes:
- ori: ~74% initially, rose to ~81.65% after initial edits, still an outlier
- lb-align: ~92.94%
- sb-align: ~92.81%
- Shift family (sll, slli, srl, srli, sra, srai): mostly mid-90s
For each instruction, zero-hit bins were identified from suite_coverage.rpt, and targeted directed vectors were added directly to the relevant .S files in rv32i_m/I/src/. Signature regions were expanded with .fill directives to accommodate the added output writes.
Files modified in this pass:
- ori-01.S
- lb-align-01.S
- sb-align-01.S
- sll-01.S, slli-01.S, sra-01.S, srai-01.S, srl-01.S, srli-01.S
Results after this pass:
- lb-align: 84/85 (98.82%)
- sb-align: 150/153 (98.04%)
- Shift instructions: ~96 to 97% range
- ori: 534/654 (81.65%), still the main outlier
Stage 10: Deep ori Closure
ori remained at 81.65% with many uncovered bins across walking-ones, walking-zeros, and specific dataset combinations involving immediate values.
The zero-hit bin list from suite_coverage.rpt was parsed systematically. A large targeted vector set was added to ori-01.S covering each missing combination. The final remaining zero bin required a specific case: rs1_val==46341 and imm_val==3. After adding that case and updating the signature capacity, ori reached 654/654 (100.00%).
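A directed vector also needs the correct expected result for its signature entry. As a worked check for the last bin, the sketch below computes the ORI result on RV32 with the 12-bit immediate sign-extended. The helper names are hypothetical, and this is a standalone calculation rather than part of the test generator.

```python
def sign_extend_12(imm):
    """Interpret the low 12 bits of imm as a signed I-type immediate."""
    imm &= 0xFFF
    return imm - 0x1000 if imm & 0x800 else imm


def ori_expected(rs1_val, imm):
    """Expected rd value for `ori rd, rs1, imm` on RV32."""
    return (rs1_val | sign_extend_12(imm)) & 0xFFFFFFFF


# The final bin: rs1_val == 46341 (0xB505) and imm_val == 3.
print(hex(ori_expected(46341, 3)))  # 0xb507
```

For this case the immediate only sets bit 1 of 0xB505, so the stored value is 0xB507 (46343).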
Stage 11: Final Remaining Gaps
After ori was closed, the remaining weak areas were:
- sh-align: 93.79%
- sw-align: 93.62%
- lbu-align: 97.65%
- slli: 96.07%
- sra: 96.21%
For each, zero-hit bins were read from suite_coverage.rpt and targeted directed vectors were added, using exact alignment offset combinations, walking register values, and edge shift relations.
Files modified:
- sh-align-01.S
- sw-align-01.S
- lbu-align-01.S
- slli-01.S
- sra-01.S
Results after this pass:
- sh-align: 145/145 (100.00%)
- sw-align: 141/141 (100.00%)
- lbu-align: 85/85 (100.00%)
- slli: 178/178 (100.00%)
- sra: 211/211 (100.00%)
- ori held at 654/654 (100.00%)
Key Points from the Fix Process
- Bin-level debugging via suite_coverage.rpt is necessary for stubborn gaps. Summary percentages do not tell you which specific operand combination is uncovered.
- Generated arch tests can be safely extended with directed vectors when the goal is coverage closure, provided signature memory capacity is updated to match. Every added vector must write to a valid, non-overlapping region of the signature area.
- Some warnings from RISCOF (misalign1-jalr empty CGF) are benign and should not be confused with actual test failures.
The core DUT behavior was functionally correct throughout. The problems encountered were adapter-level: address translation, memory sizing, reference model plumbing, and signature extraction, not instruction execution errors.