RV32I CPU CORE

This project implements a basic RV32I RISC-V processor core using TL-Verilog. The processor is capable of executing the base integer instruction set, with an example program that calculates the sum of integers from 1 to 9. The code is written for Makerchip and is fully synthesizable and testable in simulation.


Overview

The design uses a single-stage architecture with instruction fetch, decode, execution, and memory access logic implemented in TL-Verilog. The core uses a program counter $pc, a read-only instruction memory that supplies the current instruction $instr, and a 32-register file. It supports all immediate types (I, S, B, U, J), arithmetic and logic operations, and basic branching and jumping.


Program

The initial program is defined with m4_asm macro calls. It is a simple loop that:

  • Sets register x12 to 10 as the loop end
  • Increments register x13 from 1 to 9
  • Accumulates the sum in register x14
  • Sets x30 if the sum is correct
  • Sets x31 if the test fails

The purpose of this program is to verify that the core can perform register arithmetic and branching correctly.
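
As a cross-check of the expected result, the loop can be sketched as a small software model. This is not the m4_asm program itself: the exact instruction sequence is not listed here, so the loop shape and the function name run_sum_loop are assumptions. Only the register roles (x12, x13, x14, x30, x31) come from the description above.

```python
def run_sum_loop(loop_end=10):
    """Software model of the test loop; variable names mirror the registers."""
    x12 = loop_end   # loop end value
    x13 = 1          # loop counter, starts at 1
    x14 = 0          # running sum
    while x13 != x12:        # assumed loop shape: stop when the counter reaches x12
        x14 += x13           # accumulate the sum
        x13 += 1             # increment the counter
    x30 = 1 if x14 == 45 else 0   # pass flag
    x31 = 0 if x14 == 45 else 1   # fail flag
    return x14, x30, x31


print(run_sum_loop())
```

With the default loop end of 10, the model adds 1 through 9, so the value to look for in x14 is 45.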


Instruction Memory

The instruction memory is read-only and provides 32-bit instructions from the address held in $pc. It is instantiated using the Makerchip macro READONLY_MEM($pc, $$instr). The instructions are stored in little-endian order and aligned to 4-byte boundaries.


Instruction Decode

Instruction decode logic uses:

  • opcode field at bits [6:2] to classify instruction type (bits [1:0] are always 11 for RV32I instructions)
  • funct3 and funct7 fields to identify specific ALU operations
  • rd, rs1, and rs2 fields to access register addresses

Decoded flags such as $is_add, $is_sub, $is_beq, $is_jal, and $is_jalr are derived from instruction fields to simplify control logic.
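
The field positions above can be illustrated with a short Python reference model. It is a sketch for checking encodings by hand, not the TL-Verilog decode logic. The function name decode and the dictionary layout are assumptions, while the bit positions and opcode values follow the RV32I base encoding.

```python
def decode(instr):
    """Split a 32-bit RV32I instruction into fields and a few decode flags."""
    f = {
        "opcode": (instr >> 2) & 0x1F,   # instr[6:2]
        "rd":     (instr >> 7) & 0x1F,   # instr[11:7]
        "funct3": (instr >> 12) & 0x7,   # instr[14:12]
        "rs1":    (instr >> 15) & 0x1F,  # instr[19:15]
        "rs2":    (instr >> 20) & 0x1F,  # instr[24:20]
        "funct7": (instr >> 25) & 0x7F,  # instr[31:25]
    }
    r_type = f["opcode"] == 0b01100
    f["is_add"] = r_type and f["funct3"] == 0b000 and f["funct7"] == 0b0000000
    f["is_sub"] = r_type and f["funct3"] == 0b000 and f["funct7"] == 0b0100000
    f["is_beq"] = f["opcode"] == 0b11000 and f["funct3"] == 0b000
    f["is_jal"] = f["opcode"] == 0b11011
    f["is_jalr"] = f["opcode"] == 0b11001 and f["funct3"] == 0b000
    return f


# 0x00E68733 encodes ADD x14, x13, x14
d = decode(0x00E68733)
print(d["is_add"], d["rd"], d["rs1"], d["rs2"])
```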


ALU Logic

ALU operations are determined based on the decoded instruction. The source operand values are read from $src1_value and $src2_value. These are selected from the register file and immediate values depending on the instruction type.

The ALU supports:

  • Arithmetic: $is_add, $is_sub, $is_addi
  • Logical: $is_and, $is_or, $is_xor, $is_andi, $is_ori, $is_xori
  • Shifting: $is_sll, $is_srl, $is_sra, including immediate shift types
  • Comparison: $is_slt, $is_sltu, $is_slti, $is_sltiu

Results from the ALU are stored in $alu_result.
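
A minimal Python sketch of the same operations can serve as a golden model when checking $alu_result in simulation. The function name alu and the string operation names are assumptions. Immediate variants such as addi or slti are modeled by passing the immediate as the second operand.

```python
MASK = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


def alu(op, src1_value, src2_value):
    """Return the 32-bit ALU result for one operation."""
    a, b = src1_value & MASK, src2_value & MASK
    shamt = b & 0x1F                      # shifts use the low 5 bits only
    if op == "add":
        result = a + b
    elif op == "sub":
        result = a - b
    elif op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    elif op == "xor":
        result = a ^ b
    elif op == "sll":
        result = a << shamt
    elif op == "srl":
        result = a >> shamt               # logical: zero fill
    elif op == "sra":
        result = to_signed(a) >> shamt    # arithmetic: sign fill
    elif op == "slt":
        result = int(to_signed(a) < to_signed(b))
    elif op == "sltu":
        result = int(a < b)
    else:
        raise ValueError("unknown op: " + op)
    return result & MASK


print(hex(alu("sra", 0x80000000, 4)))
```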


Immediate Value Extraction

The design supports all RISC-V immediate formats. Immediate values are extracted into $imm based on instruction type flags:

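  • For I-type, instruction bits [31:20] form the 12-bit immediate, sign-extended from bit 31
  • For S-type, bits [31:25] and [11:7] are concatenated (upper and lower parts) and sign-extended
  • For B-type, bits [31], [7], [30:25], and [11:8] are reordered into imm[12:1], bit 0 is set to zero, and the result is sign-extended
  • For U-type, bits [31:12] form the upper 20 bits, with the lower 12 bits set to zero
  • For J-type, bits [31], [19:12], [20], and [30:21] are reordered into imm[20:1], bit 0 is set to zero, and the result is sign-extended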

These are used as the second operand in ALU calculations or as offsets in control flow instructions.
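
The bit shuffling is easy to get wrong, so a small Python reference can help when checking $imm against known encodings. The helper names (bits, sign_extend, imm_i and so on) are assumptions. The field positions follow the RV32I instruction formats: I uses instr[31:20], S uses instr[31:25] and instr[11:7], B uses instr[31], [7], [30:25], [11:8], and J uses instr[31], [19:12], [20], [30:21].

```python
def bits(value, hi, lo):
    """Return value[hi:lo] as an unsigned integer."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def sign_extend(value, width):
    """Sign-extend a width-bit value to a Python int."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def imm_i(instr):
    return sign_extend(bits(instr, 31, 20), 12)


def imm_s(instr):
    return sign_extend((bits(instr, 31, 25) << 5) | bits(instr, 11, 7), 12)


def imm_b(instr):
    raw = (bits(instr, 31, 31) << 12) | (bits(instr, 7, 7) << 11) \
        | (bits(instr, 30, 25) << 5) | (bits(instr, 11, 8) << 1)
    return sign_extend(raw, 13)           # bit 0 is always zero


def imm_u(instr):
    return instr & 0xFFFFF000             # upper 20 bits, low 12 bits zero


def imm_j(instr):
    raw = (bits(instr, 31, 31) << 20) | (bits(instr, 19, 12) << 12) \
        | (bits(instr, 20, 20) << 11) | (bits(instr, 30, 21) << 1)
    return sign_extend(raw, 21)           # bit 0 is always zero


# 0xFEC6CCE3 encodes BLT x13, x12, -8
print(imm_b(0xFEC6CCE3))
```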


Register File

The core uses a 32-register file (x0 to x31) with dual-read and single-write capabilities. Register x0 is hard-wired to zero using a write-enable mask:

  • $wr_en is true only when $rd_valid is set and $rd is not zero
  • $rd, $rs1, and $rs2 are extracted from the instruction
  • Values are read from $src1_value and $src2_value
  • Results are written back to the register file using $wr_data

The register file is instantiated using m4+rf(32, 32, ...).
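
The write-enable rule can be sketched as a small Python model. It illustrates the behavior described above and is not the m4+rf macro. The class name RegisterFile and its method signatures are assumptions.

```python
class RegisterFile:
    """32 x 32-bit register file with two read ports and one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, rs1, rs2):
        """Return (src1_value, src2_value) for the two source indices."""
        return self.regs[rs1], self.regs[rs2]

    def write(self, rd_valid, rd, wr_data):
        """Write wr_data to rd unless rd is x0; return the effective wr_en."""
        wr_en = bool(rd_valid) and rd != 0   # x0 stays zero
        if wr_en:
            self.regs[rd] = wr_data & 0xFFFFFFFF
        return wr_en


rf = RegisterFile()
rf.write(True, 14, 45)
rf.write(True, 0, 123)       # ignored: x0 is hard-wired to zero
print(rf.read(14, 0))
```

Masking writes to x0 keeps the read path simple: no special case is needed when an instruction reads x0.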


Program Counter and Control Flow

The program counter $pc determines the instruction to fetch each cycle. It is updated based on instruction type:

  • On reset, $pc is set to zero
  • For jumps and branches, $br_tgt_pc or $jalr_tgt_pc is selected
  • If no branch is taken, $pc increments by 4

Branch and jump target addresses are calculated using:

  • $br_tgt_pc = $pc + $imm
  • $jalr_tgt_pc = $src1_value + $imm

The next PC is selected using a priority multiplexer with $taken_br, $is_jal, and $is_jalr flags.
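
The selection order can be sketched in Python as follows. The function name next_pc and the exact priority among the flags are assumptions based on the description above. The JALR target follows this document's formula ($src1_value + $imm), although the RISC-V specification additionally clears bit 0 of that sum.

```python
MASK = 0xFFFFFFFF


def next_pc(pc, imm, src1_value, reset=False,
            taken_br=False, is_jal=False, is_jalr=False):
    """Priority selection of the next program counter value."""
    br_tgt_pc = (pc + imm) & MASK            # branch and JAL target
    jalr_tgt_pc = (src1_value + imm) & MASK  # JALR target (bit 0 not cleared here)
    if reset:
        return 0
    if taken_br or is_jal:
        return br_tgt_pc
    if is_jalr:
        return jalr_tgt_pc
    return (pc + 4) & MASK                   # sequential fetch


print(next_pc(20, -8, 0, taken_br=True))
```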


Branch and Jump Control

Branch decisions are made using a combinational block that evaluates conditions:

  • $is_beq: branch if equal
  • $is_bne: branch if not equal
  • $is_blt: branch if less than (signed)
  • $is_bge: branch if greater or equal (signed)
  • $is_bltu: branch if less than (unsigned)
  • $is_bgeu: branch if greater or equal (unsigned)

The condition results are used to set $taken_br, which influences the $next_pc logic.
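
The six conditions differ mainly in whether the operands are treated as signed or unsigned, which a short Python model makes explicit. The function name taken_br and the string branch names are assumptions. Operands are passed as 32-bit patterns.

```python
MASK = 0xFFFFFFFF


def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    value &= MASK
    return value - (1 << 32) if value & 0x80000000 else value


def taken_br(kind, src1_value, src2_value):
    """Return True when the given branch type is taken."""
    a, b = src1_value & MASK, src2_value & MASK
    if kind == "beq":
        return a == b
    if kind == "bne":
        return a != b
    if kind == "blt":
        return to_signed(a) < to_signed(b)
    if kind == "bge":
        return to_signed(a) >= to_signed(b)
    if kind == "bltu":
        return a < b
    if kind == "bgeu":
        return a >= b
    return False                             # not a branch instruction


# 0xFFFFFFFF is -1 when signed, but the largest value when unsigned
print(taken_br("blt", 0xFFFFFFFF, 1), taken_br("bltu", 0xFFFFFFFF, 1))
```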


Data Memory (Load/Store)

Data memory is declared using m4+dmem(32, 32, ...), which is ready to handle load and store instructions. Though not used in the example program, the memory supports:

  • $ld_en: load enable signal
  • $st_en: store enable signal
  • $dmem_addr: effective address
  • $dmem_wr_data: store data
  • $dmem_rd_data: read result

Instruction decoding for LW, SW, and other memory operations is included for extensibility.
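
For illustration, the load/store interface can be modeled in Python as a small word-addressed memory. This sketch assumes the two numeric macro arguments are the entry count and the word width, and that a byte address is converted to a word index by dropping the low two bits. Neither detail is stated above, and the class name DataMemory is hypothetical.

```python
class DataMemory:
    """Word-addressed data memory model: 32 entries of 32 bits."""

    def __init__(self, entries=32):
        self.words = [0] * entries

    def access(self, dmem_addr, ld_en=False, st_en=False, dmem_wr_data=0):
        """Perform a store and/or load; return dmem_rd_data or None."""
        index = (dmem_addr >> 2) % len(self.words)   # assumed byte-to-word mapping
        if st_en:
            self.words[index] = dmem_wr_data & 0xFFFFFFFF
        return self.words[index] if ld_en else None


mem = DataMemory()
mem.access(8, st_en=True, dmem_wr_data=45)   # store 45 at byte address 8
print(mem.access(8, ld_en=True))             # load it back
```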


Test and Debug Support

Simulation is capped by M4_MAX_CYC, defined as 50 cycles. If the program does not finish by then, the core sets *failed.

Two registers indicate test results:

  • x30 is set to 1 when the final sum is correct (i.e., 45)
  • x31 is set to 1 if the sum is incorrect

The test program loops using beq and ends by jumping to its own address, so the core spins in place once the result has been written. This makes loop control and ALU functionality easy to validate.


Makerchip Integration

The code includes m4+cpu_viz() for simulation and visualization in Makerchip. This enables waveform inspection and debugging without extra setup.

To run the simulation:

  • Paste the code into Makerchip
  • Click run to simulate
  • View registers and waveform to ensure correct behavior

Check if register x14 equals 45 to confirm successful execution.


Results Flow in Makerchip

  • top.tlv: Your original TL-Verilog source code file. A Perl script preprocesses this file to generate an intermediate file:

  • top.m4.pre: The preprocessed TL-Verilog file, ready for the M4 macro processor. The M4 macro processor then transforms this into:

  • top.m4: The fully macro-expanded TL-Verilog file, prepared for further processing by SandPiper. SandPiper takes this file and generates synthesizable SystemVerilog code:

  • top.sv and top_gen.sv: Generated SystemVerilog files, suitable for simulation and synthesis. These files can then be passed to Verilator for cycle-accurate simulation.

  • vlt_dump.vcd: The waveform dump file generated by Verilator during simulation. This VCD (Value Change Dump) file can be opened in third-party waveform viewers such as drom.io Surfer to visualize signal activity and debug the design.