RV32I RISC-V CPU Core (TL-Verilog)

Topics: RISC-V Architecture, TL-Verilog, ISA Implementation, Instruction Decode, ALU Design, Control Flow Logic, Makerchip Toolchain

ISA           RV32I (Base Integer)
Architecture  Single-stage
Language      TL-Verilog
Target        Makerchip / SandPiper
Output        Synthesizable SystemVerilog

RV32I CPU Core

A complete implementation of a 32-bit RV32I RISC-V processor core written in TL-Verilog. The core executes the full base integer instruction set and compiles to synthesizable SystemVerilog through the Makerchip → SandPiper toolchain.

The reference program verifies correct arithmetic, branching, and register behavior by computing:

\[ 1 + 2 + \dots + 9 = 45 \]


Architectural Summary

Component           Description
Datapath            Single-stage
Register File       32 × 32-bit (dual-read, single-write)
Instruction Memory  Read-only
Data Memory         Load/store capable
PC Update           Branch/jump priority mux
Immediate Support   I, S, B, U, J types
Test Result         x30 = pass, x31 = fail

Core Architecture

The design integrates:

  • Program counter ($pc)
  • Instruction memory ($instr)
  • Register file (m4+rf)
  • ALU with full RV32I arithmetic/logic support
  • Immediate extraction logic
  • Branch/jump control logic
  • Data memory interface

Fetch, decode, execute, memory access, and write-back all complete within a single logical stage, which simplifies control at the expense of throughput.


Program Behavior

The embedded m4_asm program:

  • Initializes x12 = 10 (loop bound)
  • Increments x13 from 1 to 9
  • Accumulates sum into x14
  • Sets x30 = 1 if sum = 45
  • Sets x31 = 1 if incorrect

Expected result:

\[ x14 = 45 \]
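
The loop described above can be sketched as a small behavioral model in Python. This is not the m4_asm source: the exact instruction order and the use of a "less than" branch to close the loop are assumptions made for illustration, and `run_sum_program` is a hypothetical helper name.

```python
def run_sum_program():
    """Behavioral model of the summation loop (register roles follow the text)."""
    x = [0] * 32              # x0..x31, all cleared at start
    x[12] = 10                # loop bound
    x[13] = 1                 # loop counter starts at 1
    x[14] = 0                 # running sum
    while True:
        x[14] = (x[14] + x[13]) & 0xFFFFFFFF   # accumulate (ADD)
        x[13] = (x[13] + 1) & 0xFFFFFFFF       # increment counter (ADDI)
        if not x[13] < x[12]:                  # assumed loop branch: repeat while x13 < x12
            break
    # Pass/fail flags as described in the text
    if x[14] == 45:
        x[30] = 1
    else:
        x[31] = 1
    return x


regs = run_sum_program()
print(regs[14], regs[30], regs[31])  # prints: 45 1 0
```

In this model the counter leaves the loop holding 10, so the values actually added are 1 through 9.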

This validates:

  • Register arithmetic
  • Immediate handling
  • Branch comparison
  • Loop control
  • Correct PC update

Instruction Decode

Fields extracted from $instr:

Field   Bits
opcode  [6:2] (bits [1:0] are always 2'b11 for RV32I instructions)
rd      [11:7]
funct3  [14:12]
rs1     [19:15]
rs2     [24:20]
funct7  [31:25]
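
As a cross-check of the bit ranges in the table, here is a minimal Python sketch that slices the same fields out of a 32-bit instruction word. The function name `decode_fields` is hypothetical; the bit positions are the ones listed above.

```python
def decode_fields(instr):
    """Slice the fixed-position fields out of a 32-bit RV32I instruction word."""
    return {
        "opcode": (instr >> 2) & 0x1F,   # instr[6:2]
        "rd":     (instr >> 7) & 0x1F,   # instr[11:7]
        "funct3": (instr >> 12) & 0x7,   # instr[14:12]
        "rs1":    (instr >> 15) & 0x1F,  # instr[19:15]
        "rs2":    (instr >> 20) & 0x1F,  # instr[24:20]
        "funct7": (instr >> 25) & 0x7F,  # instr[31:25]
    }


# 0x00168693 encodes "addi x13, x13, 1"
fields = decode_fields(0x00168693)
print(fields["rd"], fields["rs1"], fields["funct3"])  # prints: 13 13 0
```

Not every field is meaningful for every instruction format; for the ADDI above, the bits at the rs2 position are part of the immediate.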

Decoded control flags include:

  • $is_add, $is_sub, $is_addi
  • $is_and, $is_or, $is_xor
  • $is_sll, $is_srl, $is_sra
  • $is_slt, $is_sltu
  • $is_beq, $is_bne, $is_blt, $is_bge
  • $is_jal, $is_jalr

These flags drive the ALU, register write-back, and PC control.
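
To illustrate how such flags follow from the instruction fields, the sketch below derives them in Python from the standard RV32I encodings. It is a simplified model, not the core's TL-Verilog: `decode_flags` is a hypothetical name, and only bit 30 of funct7 is examined to tell ADD from SUB and SRL from SRA.

```python
def decode_flags(instr):
    """Derive one-hot style decode flags from a 32-bit RV32I instruction word."""
    opcode = instr & 0x7F
    funct3 = (instr >> 12) & 0x7
    funct7_5 = (instr >> 30) & 0x1      # distinguishes ADD/SUB and SRL/SRA
    is_r = opcode == 0b0110011          # register-register ALU
    is_i_alu = opcode == 0b0010011      # register-immediate ALU
    is_b = opcode == 0b1100011          # conditional branch
    return {
        "is_add":  is_r and funct3 == 0b000 and funct7_5 == 0,
        "is_sub":  is_r and funct3 == 0b000 and funct7_5 == 1,
        "is_addi": is_i_alu and funct3 == 0b000,
        "is_and":  is_r and funct3 == 0b111,
        "is_or":   is_r and funct3 == 0b110,
        "is_xor":  is_r and funct3 == 0b100,
        "is_sll":  is_r and funct3 == 0b001,
        "is_srl":  is_r and funct3 == 0b101 and funct7_5 == 0,
        "is_sra":  is_r and funct3 == 0b101 and funct7_5 == 1,
        "is_slt":  is_r and funct3 == 0b010,
        "is_sltu": is_r and funct3 == 0b011,
        "is_beq":  is_b and funct3 == 0b000,
        "is_bne":  is_b and funct3 == 0b001,
        "is_blt":  is_b and funct3 == 0b100,
        "is_bge":  is_b and funct3 == 0b101,
        "is_jal":  opcode == 0b1101111,
        "is_jalr": opcode == 0b1100111 and funct3 == 0b000,
    }


def active_flags(instr):
    """Return the names of all flags that are set for this instruction."""
    return [name for name, value in decode_flags(instr).items() if value]


print(active_flags(0x00E68733))  # add x14, x13, x14 -> ['is_add']
```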


ALU Implementation

Operands:

  • $src1_value
  • $src2_value (register or immediate)

Supported operations:

Arithmetic

  • ADD / ADDI
  • SUB

Logical

  • AND / ANDI
  • OR / ORI
  • XOR / XORI

Shifts

  • SLL / SLLI
  • SRL / SRLI
  • SRA / SRAI

Comparison

  • SLT / SLTU
  • SLTI / SLTIU

Result stored in:

$alu_result
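
The operations listed above can be modelled in a few lines of Python. This is a behavioral sketch under the usual RV32I semantics (32-bit wraparound, shift amount taken from the low 5 bits of the second operand); the `alu` function and its string operation names are hypothetical and do not mirror the core's internal signals.

```python
MASK = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


def alu(op, src1, src2):
    """Compute one RV32I ALU operation on 32-bit operands."""
    src1 &= MASK
    src2 &= MASK
    shamt = src2 & 0x1F                       # shifts use only the low 5 bits
    if op == "add":
        return (src1 + src2) & MASK
    if op == "sub":
        return (src1 - src2) & MASK
    if op == "and":
        return src1 & src2
    if op == "or":
        return src1 | src2
    if op == "xor":
        return src1 ^ src2
    if op == "sll":
        return (src1 << shamt) & MASK
    if op == "srl":
        return src1 >> shamt                  # logical: zero fill
    if op == "sra":
        return (to_signed(src1) >> shamt) & MASK   # arithmetic: sign fill
    if op == "slt":
        return int(to_signed(src1) < to_signed(src2))
    if op == "sltu":
        return int(src1 < src2)
    raise ValueError(f"unsupported operation: {op}")


print(hex(alu("sra", 0x80000000, 4)))  # prints: 0xf8000000
```

The immediate variants (ADDI, ANDI, SLLI, and so on) reuse the same operations with the immediate supplied as `src2`.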

Immediate Extraction

All RV32I formats supported:

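Type    Construction (bit ranges of $instr)
I-type  { sign-extension of [31], [30:20] }
S-type  { sign-extension of [31], [30:25], [11:7] }
B-type  { sign-extension of [31], [7], [30:25], [11:8], 1'b0 }
U-type  { [31:12], 12'b0 } (upper 20 bits << 12)
J-type  { sign-extension of [31], [19:12], [20], [30:21], 1'b0 }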

Immediate stored in:

$imm

Used for:

  • ALU operand
  • Branch offset
  • Jump target
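
The reordering for each format is easiest to check against concrete encodings. The Python sketch below reconstructs the immediate per the RV32I format definitions; `decode_imm` and its `fmt` argument are hypothetical names, and the caller is assumed to already know the instruction format.

```python
def sign_extend(value, bits):
    """Sign-extend a `bits`-wide field to a Python integer."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_imm(instr, fmt):
    """Reconstruct the immediate of a 32-bit instruction for format I, S, B, U, or J."""
    if fmt == "I":
        return sign_extend((instr >> 20) & 0xFFF, 12)
    if fmt == "S":
        value = (((instr >> 25) & 0x7F) << 5) | ((instr >> 7) & 0x1F)
        return sign_extend(value, 12)
    if fmt == "B":
        value = (((instr >> 31) & 0x1) << 12) | (((instr >> 7) & 0x1) << 11) \
            | (((instr >> 25) & 0x3F) << 5) | (((instr >> 8) & 0xF) << 1)
        return sign_extend(value, 13)          # bit 0 is always zero
    if fmt == "U":
        return sign_extend(instr & 0xFFFFF000, 32)
    if fmt == "J":
        value = (((instr >> 31) & 0x1) << 20) | (((instr >> 12) & 0xFF) << 12) \
            | (((instr >> 20) & 0x1) << 11) | (((instr >> 21) & 0x3FF) << 1)
        return sign_extend(value, 21)          # bit 0 is always zero
    raise ValueError(f"unknown format: {fmt}")


print(decode_imm(0xFE000EE3, "B"))  # beq x0, x0, -4 -> prints: -4
```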

Register File

Implemented using:

m4+rf(32, 32, ...)

Characteristics:

  • 32 registers (x0–x31)
  • Dual read ports
  • Single write port
  • x0 hardwired to zero
  • $wr_en asserted only when rd != 0, so writes to x0 are discarded

Read values:

$src1_value
$src2_value

Write-back:

$wr_data = $alu_result
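
The register file behavior listed above (two reads, one write, x0 pinned to zero) can be modelled as follows. This is a Python sketch for illustration only: the `RegFile` class is hypothetical and does not reflect the argument list or timing of the m4+rf macro.

```python
class RegFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs1, rs2):
        """Return the values of both source registers (dual read)."""
        return self.regs[rs1], self.regs[rs2]

    def write(self, rd, wr_data, wr_en=True):
        """Write one register; writes to x0 are dropped so it always reads 0."""
        if wr_en and rd != 0:
            self.regs[rd] = wr_data & 0xFFFFFFFF


rf = RegFile()
rf.write(14, 45)
rf.write(0, 123)          # ignored: x0 is hardwired to zero
print(rf.read(14, 0))     # prints: (45, 0)
```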

Program Counter Logic

PC update rules:

Condition     Next PC
Reset         0
Taken branch  $pc + $imm
JAL           $pc + $imm
JALR          $src1_value + $imm
Default       $pc + 4

Branch resolution is purely combinational.

Control priority:

taken_br > jal > jalr > sequential
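
The priority order can be read as a chain of conditions. Here is a minimal Python sketch of that selection, assuming reset is checked first as in the table; `next_pc` and its keyword arguments are hypothetical names that echo the signals in the text.

```python
MASK = 0xFFFFFFFF


def next_pc(pc, imm, src1_value, reset=False, taken_br=False,
            is_jal=False, is_jalr=False):
    """Select the next PC using reset > taken_br > jal > jalr > sequential."""
    if reset:
        return 0
    if taken_br:
        return (pc + imm) & MASK
    if is_jal:
        return (pc + imm) & MASK
    if is_jalr:
        return (src1_value + imm) & MASK
    return (pc + 4) & MASK


print(next_pc(0x10, -8, 0, taken_br=True))  # prints: 8
```

Because a single instruction is only ever one of branch, JAL, or JALR, the relative order of those three conditions does not change the result; the chain mainly documents how the mux is written.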

Branch Decision Logic

Conditions implemented:

  • BEQ
  • BNE
  • BLT / BGE (signed)
  • BLTU / BGEU (unsigned)

Decision signal:

$taken_br

Drives $next_pc.
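
The six conditions differ only in the comparison and in whether the operands are treated as signed. A Python sketch of the decision, using the standard RV32I branch semantics (the function name `taken_br` echoes the signal, but the string-keyed interface is hypothetical):

```python
def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def taken_br(kind, src1, src2):
    """Return True when the given branch type is taken for these operands."""
    u1, u2 = src1 & 0xFFFFFFFF, src2 & 0xFFFFFFFF   # unsigned view
    s1, s2 = to_signed(u1), to_signed(u2)           # signed view
    if kind == "beq":
        return u1 == u2
    if kind == "bne":
        return u1 != u2
    if kind == "blt":
        return s1 < s2
    if kind == "bge":
        return s1 >= s2
    if kind == "bltu":
        return u1 < u2
    if kind == "bgeu":
        return u1 >= u2
    raise ValueError(f"unknown branch type: {kind}")


# 0xFFFFFFFF is -1 when signed but the largest value when unsigned
print(taken_br("blt", 0xFFFFFFFF, 1), taken_br("bltu", 0xFFFFFFFF, 1))  # prints: True False
```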


Data Memory

Instantiated via:

m4+dmem(32, 32, ...)

Signals:

  • $ld_en
  • $st_en
  • $dmem_addr
  • $dmem_wr_data
  • $dmem_rd_data

Supports:

  • LW
  • SW
  • Extensible to full load/store family

Not exercised in the summation test but fully integrated.
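
Since the summation test never touches data memory, a small model helps show the intended LW/SW behavior. The Python sketch below assumes a 32-entry, word-wide memory indexed by address bits [6:2]; that indexing, the `DataMem` class, and its method signature are assumptions for illustration, not details taken from the m4+dmem macro.

```python
class DataMem:
    """Word-addressed data memory model: 32 entries of 32 bits each."""

    def __init__(self, entries=32):
        self.words = [0] * entries

    def access(self, dmem_addr, ld_en=False, st_en=False, dmem_wr_data=0):
        """Perform a store (SW) and/or load (LW) at a byte address."""
        index = (dmem_addr >> 2) % len(self.words)   # assumed: word index from addr[6:2]
        if st_en:
            self.words[index] = dmem_wr_data & 0xFFFFFFFF
        return self.words[index] if ld_en else None


mem = DataMem()
mem.access(0x10, st_en=True, dmem_wr_data=45)   # SW: store 45 at byte address 0x10
print(mem.access(0x10, ld_en=True))             # LW: prints 45
```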


Simulation & Verification

Simulation bounded by:

M4_MAX_CYC = 50

Pass/Fail flags:

  • x30 = 1 → PASS
  • x31 = 1 → FAIL

Correct execution requires:

\[ x14 = 45 \]


Makerchip Toolchain Flow

Processing Stages

  1. top.tlv: original TL-Verilog source.
  2. top.m4.pre: macro-expanded preprocessed file.
  3. top.m4: fully macro-expanded TL-Verilog.
  4. SandPiper output:
    • top.sv
    • top_gen.sv
  5. Verilator simulation:
    • Generates vlt_dump.vcd
  6. Waveform viewing:
    • VCD opened in an external viewer (e.g., Surfer)

This pipeline converts TL-Verilog into synthesizable SystemVerilog and enables cycle-accurate simulation.