RV32I OOO Core: Tomasulo-Style Algorithm Notes

This core uses a hybrid algorithm: Tomasulo-style tag-based dynamic scheduling for wakeup/select, combined with reorder-buffer commit for precise architectural state.

Big Picture

The execution policy is:

Fetch and decode in program order.
Rename destination registers to ROB tags.
Dispatch renamed instructions into an issue queue.
Issue when operands are ready, not strictly by program order.
Write results back using tag broadcasts.
Commit from ROB head in order.

Instructions may therefore execute out of order, but architectural state is updated strictly in program order.

Core State and Registers

Architectural Register File

32 integer registers (x0 to x31), 32-bit each.
x0 remains hardwired to zero.
Register writes happen only at commit.

RAT (Register Alias Table)

For each architectural register:

alias_valid[reg]: whether the register currently maps to an in-flight producer.
alias_tag[reg]: ROB tag for the latest producer.

Behavior:

At dispatch of a writing instruction: RAT entry for rd is updated to the new ROB tag.
At commit: RAT entry is cleared only if the committing tag still matches the current mapping.
This avoids clearing a mapping that has already been overwritten by a younger writer.
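
The rename and conditional-clear behavior above can be sketched as a small behavioral model. This is a minimal illustration, not the core's RTL: the class name RAT, the method names, and the choice to skip renaming for x0 are assumptions made for the example. Only alias_valid and alias_tag come from these notes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

NUM_REGS = 32


@dataclass
class RAT:
    # Field names follow the notes; everything else here is hypothetical.
    alias_valid: List[bool] = field(default_factory=lambda: [False] * NUM_REGS)
    alias_tag: List[int] = field(default_factory=lambda: [0] * NUM_REGS)

    def rename_dest(self, rd: int, rob_tag: int) -> None:
        """Dispatch: point rd at its newest in-flight producer."""
        if rd == 0:  # assumption: x0 is never renamed because it is hardwired to zero
            return
        self.alias_valid[rd] = True
        self.alias_tag[rd] = rob_tag

    def lookup(self, rs: int) -> Optional[int]:
        """Return the producer tag for rs, or None if the register file is current."""
        return self.alias_tag[rs] if self.alias_valid[rs] else None

    def commit(self, rd: int, rob_tag: int) -> None:
        """Commit: clear the alias only if it still names the committing tag."""
        if self.alias_valid[rd] and self.alias_tag[rd] == rob_tag:
            self.alias_valid[rd] = False


rat = RAT()
rat.rename_dest(10, 0)   # older writer of x10 gets tag 0
rat.rename_dest(10, 1)   # younger writer of x10 gets tag 1
rat.commit(10, 0)        # older writer commits: the mapping to tag 1 must survive
after_old_commit = rat.lookup(10)
rat.commit(10, 1)        # younger writer commits: the mapping is cleared
after_young_commit = rat.lookup(10)
```

The two commits at the end exercise the case the tag comparison exists for: the older writer retires while a younger writer of the same register is still in flight.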

ROB (Reorder Buffer)

Circular buffer with head, tail, and count.

Each entry tracks at least:

valid, ready
rd, reg_write
value
memory intent (mem_read, mem_write, mem_to_reg, mem_funct3)
memory payload (mem_addr, mem_wdata, load_data)
ecall, halt

Commit policy:

Only the head entry can commit.
Commit moves head forward by one entry.
This gives precise retirement order.
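
A rough behavioral sketch of the circular buffer and head-only commit follows. It assumes that a ROB tag is simply the entry index and models only a subset of the fields listed above. The class and method names are illustrative, not taken from the RTL.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RobEntry:
    valid: bool = False
    ready: bool = False
    rd: int = 0
    reg_write: bool = False
    value: int = 0


class ROB:
    def __init__(self, size: int = 8) -> None:
        self.entries: List[RobEntry] = [RobEntry() for _ in range(size)]
        self.head = 0
        self.tail = 0
        self.count = 0

    def can_allocate(self) -> bool:
        return self.count < len(self.entries)

    def allocate(self, rd: int, reg_write: bool) -> int:
        """Claim the tail entry; the returned tag is the entry index (assumption)."""
        assert self.can_allocate()
        tag = self.tail
        self.entries[tag] = RobEntry(valid=True, rd=rd, reg_write=reg_write)
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return tag

    def writeback(self, tag: int, value: int) -> None:
        entry = self.entries[tag]
        entry.value = value & 0xFFFFFFFF  # 32-bit result
        entry.ready = True

    def commit(self) -> Optional[Tuple[int, int, bool, int]]:
        """Retire the head entry if it is ready; otherwise retire nothing."""
        entry = self.entries[self.head]
        if self.count == 0 or not (entry.valid and entry.ready):
            return None
        retired = (self.head, entry.rd, entry.reg_write, entry.value)
        entry.valid = False
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1
        return retired


rob = ROB(size=4)
t0 = rob.allocate(rd=5, reg_write=True)
t1 = rob.allocate(rd=6, reg_write=True)
rob.writeback(t1, 22)    # the younger result arrives first
blocked = rob.commit()   # head (t0) is not ready, so nothing retires
rob.writeback(t0, 11)
first = rob.commit()
second = rob.commit()
```

Even though the younger entry finishes first, it cannot retire until the head entry is ready, which is what gives precise retirement order.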

Issue Queue

Each entry contains:

source operand state: rs1_ready/val/tag, rs2_ready/val/tag
decoded control fields
assigned ROB tag
age stamp (iq_age), used for oldest-ready selection

Selection policy:

An entry is ready when both operands are ready, either already latched or woken in the current cycle by a CDB broadcast.
Among ready entries, the oldest (by iq_age) is selected.
Control instructions can be given priority when so configured.
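
The selection policy might look like the following in a behavioral model. It assumes that a smaller iq_age means an older entry and ignores age-counter wraparound. The prefer_control flag and the is_control field are hypothetical names for the configurable control preference.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IQEntry:
    rob_tag: int
    iq_age: int          # assumption: smaller value means older
    rs1_ready: bool
    rs2_ready: bool
    is_control: bool = False


def select_oldest_ready(iq: List[IQEntry],
                        prefer_control: bool = False) -> Optional[IQEntry]:
    """Pick the oldest entry whose operands are both ready, or None."""
    ready = [e for e in iq if e.rs1_ready and e.rs2_ready]
    if not ready:
        return None
    if prefer_control:
        control = [e for e in ready if e.is_control]
        if control:          # fall back to plain oldest-ready if none are ready
            ready = control
    return min(ready, key=lambda e: e.iq_age)


iq = [
    IQEntry(rob_tag=2, iq_age=12, rs1_ready=True, rs2_ready=True),
    IQEntry(rob_tag=0, iq_age=10, rs1_ready=True, rs2_ready=False),  # oldest, still waiting
    IQEntry(rob_tag=1, iq_age=11, rs1_ready=True, rs2_ready=True),
    IQEntry(rob_tag=3, iq_age=13, rs1_ready=True, rs2_ready=True, is_control=True),
]
picked = select_oldest_ready(iq)
picked_ctrl = select_oldest_ready(iq, prefer_control=True)
```

In this usage the oldest entry is skipped because one operand is still waiting, so the next-oldest ready entry issues instead.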

CDB (Tag Broadcast)

Two value/tag broadcast paths feed wakeup logic:

execute writeback broadcast
commit-side broadcast

IQ compares incoming tags against waiting operand tags and marks matches ready.
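
As an illustration of the tag compare, the sketch below wakes waiting operands from a list of (tag, value) broadcasts. The Operand type and the wakeup function are hypothetical. In this core, the list would hold at most the execute and commit-side broadcasts for a given cycle.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Broadcast = Tuple[int, int]  # (ROB tag, value)


@dataclass
class Operand:
    ready: bool
    val: int = 0
    tag: Optional[int] = None  # producer tag while waiting


def wakeup(operands: List[Operand], broadcasts: List[Broadcast]) -> None:
    """Mark waiting operands ready when their tag appears on a broadcast path."""
    for op in operands:
        if op.ready:
            continue                 # already latched, nothing to compare
        for tag, value in broadcasts:
            if op.tag == tag:
                op.val = value       # capture the broadcast value
                op.ready = True
                break


ops = [
    Operand(ready=False, tag=0),     # waits on tag 0
    Operand(ready=False, tag=3),     # waits on tag 3
    Operand(ready=True, val=7),      # already latched
]
wakeup(ops, [(0, 42), (5, 99)])
```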

Fetch/Decode/Dispatch Rules

Dispatch occurs only when all are true:

IF/ID holds a valid instruction
ROB can allocate
IQ has space
no control instruction is currently in flight
no flush or redirect is active

This prevents front-end overrun and keeps rename + ROB + IQ synchronized.
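
The dispatch condition is a plain conjunction of the five rules above. A minimal sketch, with hypothetical signal names:

```python
from dataclasses import dataclass


@dataclass
class DispatchInputs:
    # Signal names are illustrative; each maps to one rule listed above.
    if_id_valid: bool
    rob_can_allocate: bool
    iq_has_space: bool
    control_in_flight: bool
    flush_or_redirect: bool


def dispatch_ok(s: DispatchInputs) -> bool:
    """True only when every dispatch rule holds in this cycle."""
    return (s.if_id_valid
            and s.rob_can_allocate
            and s.iq_has_space
            and not s.control_in_flight
            and not s.flush_or_redirect)


normal = dispatch_ok(DispatchInputs(True, True, True, False, False))
gated = dispatch_ok(DispatchInputs(True, True, True, True, False))
```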

Operand Resolution at Dispatch

When a source register is read:

If RAT says no pending producer: operand comes from architectural register file.
If RAT has a tag:
- first check same-cycle broadcast tags
- if matched, capture value immediately
- else keep tag in IQ and wait for wakeup

This is the key Tomasulo-style part: operands are tracked by producer tags instead of strict pipeline stage timing.
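
The lookup order above can be sketched as a single function. Here the RAT is modeled as a dict that contains only registers with a pending producer, and the function returns a (ready, value, tag) triple. Both choices are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Broadcast = Tuple[int, int]  # (ROB tag, value)


def resolve_operand(rs: int,
                    regfile: List[int],
                    rat: Dict[int, int],
                    broadcasts: List[Broadcast]
                    ) -> Tuple[bool, int, Optional[int]]:
    """Return (ready, value, tag) for source register rs at dispatch."""
    tag = rat.get(rs)
    if tag is None:
        return (True, regfile[rs], None)   # no pending producer: read the RF
    for btag, bval in broadcasts:
        if btag == tag:
            return (True, bval, None)      # same-cycle broadcast: capture now
    return (False, 0, tag)                 # keep the tag and wait for wakeup


regfile = [0] * 32
regfile[1] = 123
rat = {5: 0, 6: 1}          # x5 waits on tag 0, x6 waits on tag 1
cdb = [(0, 99)]             # tag 0 is being broadcast this cycle

from_rf = resolve_operand(1, regfile, rat, cdb)
from_cdb = resolve_operand(5, regfile, rat, cdb)
waiting = resolve_operand(6, regfile, rat, cdb)
```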

Execute and Writeback

Issued instructions execute using resolved operands.

ALU/branch/CSR use computed values directly.
For loads, speculative memory read is allowed only when there is no older-store hazard and no conflicting commit-side memory activity in that cycle.

Writeback sends tag + value into ROB and CDB path.

Memory Ordering Behavior

Stores:

record store metadata (mem_addr, mem_wdata) in their ROB entry when issued
become architecturally visible only at ROB-head commit

Loads:

perform older-store hazard query before speculative memory read
if blocked, the load is serviced through the commit-time memory path once it reaches the ROB head

This design keeps memory side effects ordered by commit while still allowing some load overlap.
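
A possible shape for the older-store hazard query is sketched below. The notes do not state how addresses are compared, so the word-granular compare is an assumption, as are the PendingStore type and the commit_mem_busy flag standing in for conflicting commit-side memory activity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingStore:
    addr_known: bool     # False while the store address is unresolved
    mem_addr: int = 0


def load_may_speculate(load_addr: int,
                       older_stores: List[PendingStore],
                       commit_mem_busy: bool) -> bool:
    """True if a load may read memory speculatively in this cycle."""
    if commit_mem_busy:
        return False                     # commit-side memory activity wins
    for st in older_stores:
        if not st.addr_known:
            return False                 # unresolved older store: be conservative
        if (st.mem_addr >> 2) == (load_addr >> 2):
            return False                 # same word (assumed compare granularity)
    return True


no_stores = load_may_speculate(0x100, [], False)
same_word = load_may_speculate(0x100, [PendingStore(True, 0x100)], False)
unresolved = load_may_speculate(0x100, [PendingStore(False)], False)
other_word = load_may_speculate(0x100, [PendingStore(True, 0x200)], False)
```

A load for which this returns False is not dropped. Per the policy above, it waits and is serviced on the commit path.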

Control Flow Handling

At most one control instruction is tracked as in-flight, from dispatch until issue.
While it is in flight, dispatch of younger instructions is blocked (see the dispatch rules above).
On a taken branch, redirect and backend flush are asserted.
On a jal/jalr redirect, a front-end flush is used together with the in-flight gating model.
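
The gating and recovery signals can be modeled roughly as follows. This is a sketch under assumptions: the names ControlGate and Recovery are hypothetical, a taken branch is assumed to flush the front end as well as the backend, and jal/jalr is assumed not to assert a backend flush, which these notes do not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Recovery:
    redirect: bool
    frontend_flush: bool
    backend_flush: bool


class ControlGate:
    """Tracks the single in-flight control instruction."""

    def __init__(self) -> None:
        self.in_flight = False

    def dispatch(self, is_control: bool) -> bool:
        """Return True if an instruction may dispatch past the gate."""
        if self.in_flight:
            return False             # a control instruction is still in flight
        if is_control:
            self.in_flight = True    # start tracking at dispatch
        return True

    def issue_control(self, kind: str, taken: bool) -> Recovery:
        """Stop tracking at issue and report the recovery signals."""
        self.in_flight = False
        if kind == "branch":
            # Taken branch: redirect plus flush; not taken: no recovery.
            return Recovery(taken, taken, taken)
        # jal/jalr: always redirects; backend_flush=False is an assumption.
        return Recovery(True, True, False)


gate = ControlGate()
branch_dispatched = gate.dispatch(is_control=True)
younger_dispatched = gate.dispatch(is_control=False)   # gated behind the branch
recovery = gate.issue_control("branch", taken=True)
after_issue = gate.dispatch(is_control=False)
```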

Step-by-Step Example 1: RAW Chain

Instruction stream:

I0: add x5, x1, x2
I1: sub x6, x5, x3
I2: xor x7, x6, x4

Rename/dispatch:

I0 allocates tag T0, RAT[x5] = T0.
I1 allocates T1, RAT[x6] = T1, source x5 is tagged T0 (not ready yet).
I2 allocates T2, RAT[x7] = T2, source x6 is tagged T1.

Issue/wakeup:

I0 issues first, writes T0 on CDB.
IQ entry for I1 sees T0, marks rs1_ready and captures value.
I1 issues, writes T1.
IQ entry for I2 wakes on T1.
I2 issues.

Commit:

ROB commits T0, then T1, then T2 in order.
Architectural state changes happen in program order.

Step-by-Step Example 2: WAW Elimination via Rename

Instruction stream:

I0: addi x10, x0, 1
I1: addi x10, x0, 2
I2: add  x11, x10, x12

Rename effect:

I0 gets T0, RAT[x10] = T0.
I1 gets T1, RAT[x10] = T1 (overwrites alias to younger writer).
I2 reads source x10 as tag T1, not T0.

Outcome:

Older write and younger write do not conflict in scheduling.
Consumer is automatically bound to the correct younger producer.

Step-by-Step Example 3: Store-Load Ordering

Instruction stream:

I0: sw  x8, 0(x2)
I1: lw  x9, 0(x2)
I2: add x10, x9, x3

Behavior:

I0 allocates older ROB entry and records store metadata.
Before speculative load read, hazard query checks for older pending stores.
If an unresolved or address-matching older store exists (here I0 and I1 both access 0(x2)), load speculation is blocked.
Load value is then provided through commit-time memory path.
Dependents wake when load value is broadcast from commit path.

This keeps memory ordering correct even with out-of-order issue.

Step-by-Step Example 4: Taken Branch Recovery

Instruction stream:

I0: beq x1, x2, target
I1: add x5, x6, x7
I2: sub x8, x9, x10

Behavior:

I0 is marked in-flight at dispatch, which gates dispatch of younger instructions.
When I0 resolves taken, PC redirects to target.
Front-end state is flushed.
Backend flush path removes speculative queue/buffer state for branch recovery.

Why This Works Well for RV32I OOO

Tag-based wakeup handles true data dependencies naturally.
Rename removes false dependencies (WAR/WAW).
ROB commit gives precise state and clean exception behavior.
Ordered commit for memory side effects keeps correctness simple.

Practical Limits in This Revision

Single-issue width at dispatch/issue/commit limits peak throughput.
Control in-flight gating is conservative by design.
Load/store handling favors correctness over aggressive speculation.
An LSQ store-to-load forwarding interface exists, but the current top-level load path is driven primarily by the ROB hazard policy and the commit-time memory flow.

Compact Pseudocode

every cycle:
  if flush:
    clear front-end valid state
    clear rename/scheduler/ROB state as required

  if dispatch_ok:
    tag <- ROB.allocate()
    RAT[rd] <- tag (if rd writes)
    src operands <- RF value or RAT tag or same-cycle CDB value
    IQ.enqueue(instruction, src descriptors, tag)

  IQ.wakeup_from_CDB()
  issued <- IQ.select_oldest_ready()

  if issued:
    result <- execute(issued)
    ROB.writeback(issued.tag, result, mem metadata)
    CDB.broadcast(issued.tag, result_if_broadcastable)

  if ROB.head.ready:
    commit ROB.head in order
    write RF / memory as needed
    clear RAT mapping only if tag still matches