
RV32I OOO Core: Tomasulo-Style Algorithm Notes
This core uses a hybrid algorithm: Tomasulo-style tag-based dynamic scheduling for wakeup/select, combined with reorder-buffer commit for precise architectural state.
Big Picture
The execution policy is:
- Fetch and decode in program order.
- Rename destination registers to ROB tags.
- Dispatch renamed instructions into an issue queue.
- Issue when operands are ready, not strictly by program order.
- Write results back using tag broadcasts.
- Commit from ROB head in order.
So instruction execution can move out of order, but architectural visibility stays in order.
Core State and Registers
Architectural Register File
- 32 integer registers (x0 to x31), 32 bits each; x0 is hardwired to zero.
- Register writes happen only at commit.
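A minimal behavioral sketch of the architectural register file follows. It assumes values are modeled as Python integers masked to 32 bits. The class and method names are hypothetical and do not come from the RTL.

```python
# Behavioral sketch of the ARF. ArchRegFile and its methods are hypothetical names.
MASK32 = 0xFFFF_FFFF


class ArchRegFile:
    def __init__(self) -> None:
        self.regs = [0] * 32  # x0..x31

    def read(self, idx: int) -> int:
        # x0 always reads as zero.
        return 0 if idx == 0 else self.regs[idx]

    def commit_write(self, idx: int, value: int) -> None:
        # Called only from commit; writes to x0 are discarded.
        if idx != 0:
            self.regs[idx] = value & MASK32


arf = ArchRegFile()
arf.commit_write(0, 123)               # ignored: x0 is hardwired to zero
arf.commit_write(5, -1)                # stored as 0xFFFFFFFF (32-bit wraparound)
print(arf.read(0), hex(arf.read(5)))   # prints: 0 0xffffffff
```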
RAT (Register Alias Table)
For each architectural register:
- alias_valid[reg]: whether the register currently maps to an in-flight producer.
- alias_tag[reg]: ROB tag of the latest producer.
Behavior:
- At dispatch of a writing instruction: the RAT entry for rd is updated to the new ROB tag.
- At commit: the RAT entry is cleared only if the committing tag still matches the current mapping.
- This avoids clearing a mapping that has already been overwritten by a younger writer.
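The conditional clear can be sketched as a small Python model. The class and method names are hypothetical, and tags are plain integers here. The usage shows an older writer committing after a younger writer has already replaced its mapping.

```python
from typing import Optional


class RAT:
    """Hypothetical RAT model with per-register alias_valid/alias_tag."""

    def __init__(self, nregs: int = 32) -> None:
        self.alias_valid = [False] * nregs
        self.alias_tag = [0] * nregs

    def lookup(self, reg: int) -> Optional[int]:
        # ROB tag of the latest in-flight producer, or None if there is none.
        return self.alias_tag[reg] if self.alias_valid[reg] else None

    def on_dispatch(self, rd: int, tag: int) -> None:
        # The dispatching writer becomes the latest producer of rd.
        self.alias_valid[rd] = True
        self.alias_tag[rd] = tag

    def on_commit(self, rd: int, tag: int) -> None:
        # Clear only if no younger writer has replaced this mapping.
        if self.alias_valid[rd] and self.alias_tag[rd] == tag:
            self.alias_valid[rd] = False


rat = RAT()
rat.on_dispatch(10, tag=0)   # older writer of x10
rat.on_dispatch(10, tag=1)   # younger writer overwrites the alias
rat.on_commit(10, tag=0)     # older commit: mapping to tag 1 survives
print(rat.lookup(10))        # prints: 1
rat.on_commit(10, tag=1)     # youngest writer commits: mapping cleared
print(rat.lookup(10))        # prints: None
```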
ROB (Reorder Buffer)
Circular buffer with head, tail, and count.
Each entry tracks at least:
- valid, ready
- rd, reg_write, value
- memory intent (mem_read, mem_write, mem_to_reg, mem_funct3)
- memory payload (mem_addr, mem_wdata, load_data)
- ecall, halt
Commit policy:
- Only the head entry can commit, and only once it is ready.
- Each commit advances head by one entry.
- This gives precise, in-order retirement.
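A sketch of the circular ROB with head, tail, and count follows. It assumes that a ROB tag is simply the slot index. It also keeps only the register-side fields from the list above and omits the memory fields. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RobEntry:
    # Subset of the fields listed above; memory/ecall/halt fields omitted.
    valid: bool = False
    ready: bool = False
    rd: int = 0
    reg_write: bool = False
    value: int = 0


class ROB:
    def __init__(self, size: int = 8) -> None:
        self.entries: List[RobEntry] = [RobEntry() for _ in range(size)]
        self.head = 0
        self.tail = 0
        self.count = 0

    def can_allocate(self) -> bool:
        return self.count < len(self.entries)

    def allocate(self, rd: int, reg_write: bool) -> int:
        assert self.can_allocate(), "caller must check can_allocate() first"
        tag = self.tail                      # assumption: tag == slot index
        self.entries[tag] = RobEntry(valid=True, rd=rd, reg_write=reg_write)
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1
        return tag

    def writeback(self, tag: int, value: int) -> None:
        self.entries[tag].value = value
        self.entries[tag].ready = True

    def try_commit(self) -> Optional[RobEntry]:
        # Only the head may commit, and only once it is ready.
        head = self.entries[self.head]
        if self.count == 0 or not head.ready:
            return None
        self.entries[self.head] = RobEntry()  # free the slot
        self.head = (self.head + 1) % len(self.entries)
        self.count -= 1
        return head


rob = ROB(size=4)
t0 = rob.allocate(rd=5, reg_write=True)
t1 = rob.allocate(rd=6, reg_write=True)
rob.writeback(t1, 7)          # the younger instruction finishes first ...
print(rob.try_commit())       # prints: None (head t0 is not ready yet)
rob.writeback(t0, 3)
first, second = rob.try_commit(), rob.try_commit()
print(first.rd, second.rd)    # prints: 5 6
```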
Issue Queue
Each entry contains:
- source operand state: rs1_ready/val/tag, rs2_ready/val/tag
- decoded control fields
- assigned ROB tag
- age stamp (iq_age), used for oldest-ready selection
Selection policy:
- An entry is ready when both operands are available this cycle, either already latched or woken in this same cycle by a CDB broadcast.
- Among ready entries, the oldest (by iq_age) is selected.
- Control instructions can optionally be given priority, depending on configuration.
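The selection policy can be sketched in Python. This sketch makes three assumptions: a smaller iq_age means an older entry, this cycle's broadcast tags arrive as a set, and the control-priority option is a boolean flag. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class IqEntry:
    rob_tag: int
    iq_age: int            # assumption: smaller value = older entry
    rs1_ready: bool
    rs1_tag: int
    rs2_ready: bool
    rs2_tag: int
    is_control: bool = False


def operand_ready(latched: bool, tag: int, cdb_tags: Set[int]) -> bool:
    # Ready if already latched, or woken by a broadcast in this same cycle.
    return latched or tag in cdb_tags


def select_oldest_ready(iq: List[IqEntry], cdb_tags: Set[int],
                        prefer_control: bool = False) -> Optional[IqEntry]:
    ready = [e for e in iq
             if operand_ready(e.rs1_ready, e.rs1_tag, cdb_tags)
             and operand_ready(e.rs2_ready, e.rs2_tag, cdb_tags)]
    if prefer_control and any(e.is_control for e in ready):
        ready = [e for e in ready if e.is_control]
    return min(ready, key=lambda e: e.iq_age, default=None)


iq = [
    IqEntry(rob_tag=2, iq_age=5, rs1_ready=True, rs1_tag=0, rs2_ready=True, rs2_tag=0),
    IqEntry(rob_tag=1, iq_age=3, rs1_ready=False, rs1_tag=9, rs2_ready=True, rs2_tag=0),
    IqEntry(rob_tag=3, iq_age=7, rs1_ready=True, rs1_tag=0, rs2_ready=True, rs2_tag=0,
            is_control=True),
]
print(select_oldest_ready(iq, set()).rob_tag)                        # prints: 2
print(select_oldest_ready(iq, {9}).rob_tag)                          # prints: 1
print(select_oldest_ready(iq, set(), prefer_control=True).rob_tag)   # prints: 3
```

The second call shows a same-cycle wakeup. Tag 9 is broadcasting, so the oldest entry becomes ready and wins selection in that same cycle.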
CDB (Tag Broadcast)
Two value/tag broadcast paths feed wakeup logic:
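- execute writeback broadcast
- commit-side broadcast (for example, a load value serviced on the commit path)
The IQ compares each incoming tag against the tags of its waiting operands; on a match it captures the broadcast value and marks that operand ready.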
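A sketch of the two-port tag comparison follows. It assumes that each broadcast path carries at most one (tag, value) pair per cycle. The type and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Broadcast = Optional[Tuple[int, int]]   # (tag, value), or None when idle


@dataclass
class Operand:
    ready: bool
    val: int = 0
    tag: int = 0


def wakeup(waiting: List[Operand], exec_bcast: Broadcast,
           commit_bcast: Broadcast) -> None:
    # Compare both broadcast paths against every not-yet-ready operand.
    for op in waiting:
        if op.ready:
            continue
        for bcast in (exec_bcast, commit_bcast):
            if bcast is not None and bcast[0] == op.tag:
                op.val = bcast[1]      # capture the broadcast value
                op.ready = True
                break


ops = [Operand(False, tag=4), Operand(False, tag=6), Operand(True, val=11)]
wakeup(ops, exec_bcast=(4, 100), commit_bcast=(6, 200))
print([(o.ready, o.val) for o in ops])   # prints: [(True, 100), (True, 200), (True, 11)]
```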
Fetch/Decode/Dispatch Rules
Dispatch occurs only when all are true:
- IF/ID holds a valid instruction
- ROB can allocate
- IQ has space
- no control instruction is currently in flight
- no flush or redirect is active
These conditions keep the front end from overrunning the backend and keep RAT, ROB, and IQ allocation synchronized.
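The dispatch gate is a conjunction of the five conditions above. In this sketch, ROB and IQ occupancy are modeled as counts against fixed sizes. The field names are hypothetical stand-ins for the real signals.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DispatchInputs:
    # Hypothetical signal names; one field per dispatch condition above.
    ifid_valid: bool
    rob_count: int
    rob_size: int
    iq_count: int
    iq_size: int
    control_in_flight: bool
    flush: bool
    redirect: bool


def dispatch_ok(s: DispatchInputs) -> bool:
    return (s.ifid_valid                       # IF/ID holds a valid instruction
            and s.rob_count < s.rob_size       # ROB can allocate
            and s.iq_count < s.iq_size         # IQ has space
            and not s.control_in_flight        # no control instruction in flight
            and not (s.flush or s.redirect))   # no flush or redirect active


base = DispatchInputs(ifid_valid=True, rob_count=3, rob_size=8, iq_count=2,
                      iq_size=4, control_in_flight=False, flush=False, redirect=False)
print(dispatch_ok(base))                                   # prints: True
print(dispatch_ok(replace(base, control_in_flight=True)))  # prints: False
print(dispatch_ok(replace(base, iq_count=4)))              # prints: False
```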
Operand Resolution at Dispatch
When a source register is read:
- If the RAT shows no pending producer, the operand is read from the architectural register file.
- If the RAT holds a tag:
  - first check the tags broadcast in the same cycle;
  - on a match, capture the value immediately;
  - otherwise, keep the tag in the IQ entry and wait for wakeup.
This is the key Tomasulo-style part: operands are tracked by producer tags instead of strict pipeline stage timing.
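The three dispatch-time cases map onto a single lookup function. This sketch makes two assumptions: the RAT is a dict containing only registers with a pending producer, and this cycle's broadcasts are a tag-to-value dict. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def resolve_source(reg: int, arf: List[int], rat: Dict[int, int],
                   cdb: Dict[int, int]) -> Tuple[bool, int, int]:
    """Return (ready, value, tag) for one source operand at dispatch."""
    tag = rat.get(reg)
    if tag is None:
        return True, arf[reg], 0          # no pending producer: read the ARF
    if tag in cdb:
        return True, cdb[tag], tag        # producer broadcasting now: capture it
    return False, 0, tag                  # wait in the IQ for wakeup on this tag


arf = list(range(32))        # toy ARF contents: x_i holds i
rat = {5: 0, 6: 1}           # x5 pending on tag 0, x6 pending on tag 1
cdb = {1: 42}                # tag 1 is broadcasting 42 this cycle
print(resolve_source(3, arf, rat, cdb))   # prints: (True, 3, 0)
print(resolve_source(6, arf, rat, cdb))   # prints: (True, 42, 1)
print(resolve_source(5, arf, rat, cdb))   # prints: (False, 0, 0)
```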
Execute and Writeback
Issued instructions execute using their resolved operands.
- ALU, branch, and CSR instructions compute their results directly.
- A load may perform a speculative memory read only when there is no older-store hazard and no conflicting commit-side memory activity in that cycle.
Writeback sends the tag and result to the ROB and, when the result is broadcastable, onto the CDB.
Memory Ordering Behavior
Stores:
- enter store state when issued
- become architecturally visible only at ROB-head commit
Loads:
- perform an older-store hazard query before any speculative memory read
- if blocked, are serviced when they reach the commit path (ROB head)
This design keeps memory side effects ordered by commit while still allowing some load overlap.
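A sketch of the hazard query and the speculation check follows. It rests on several assumptions that are not taken from the RTL. Age is the distance from the ROB head, where 0 is oldest. An unresolved store address is recorded as None. Addresses are compared at 32-bit word granularity, which is conservative for aligned sub-word accesses. "Commit-side memory activity" is reduced to a single busy flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingStore:
    age: int                 # distance from the ROB head (0 = oldest)
    addr: Optional[int]      # None while the store address is unresolved


def older_store_hazard(load_age: int, load_addr: int,
                       stores: List[PendingStore]) -> bool:
    for st in stores:
        if st.age >= load_age:
            continue                          # only older stores matter
        if st.addr is None:
            return True                       # unresolved: assume a conflict
        if st.addr >> 2 == load_addr >> 2:
            return True                       # same 32-bit word
    return False


def may_speculate_load(load_age: int, load_addr: int,
                       stores: List[PendingStore], commit_mem_busy: bool) -> bool:
    # Speculative read only without a hazard and without commit-side memory use.
    return not older_store_hazard(load_age, load_addr, stores) and not commit_mem_busy


older = [PendingStore(age=0, addr=0x100)]
print(may_speculate_load(1, 0x100, older, False))                     # prints: False
print(may_speculate_load(1, 0x200, older, False))                     # prints: True
print(may_speculate_load(1, 0x200, [PendingStore(0, None)], False))   # prints: False
```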
Control Flow Handling
- At most one control instruction is tracked as in flight, from dispatch until issue.
- While it is in flight, dispatch of younger instructions is blocked.
- On a taken branch, a PC redirect and a backend flush are asserted.
- On a jal/jalr redirect, a front-end flush is used; the in-flight gating ensures no younger instruction has dispatched past the jump.
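The gating rules can be modeled as a single in-flight slot. This sketch follows the bullets above: a taken branch raises both front-end and backend flush, and jal/jalr raises a front-end flush only. The class, its methods, and the returned signal dict are hypothetical, and sequential fetch-PC advance is not modeled.

```python
from typing import Dict


class ControlGate:
    """Hypothetical model of the single in-flight control slot."""

    def __init__(self) -> None:
        self.in_flight = False
        self.fetch_pc = 0        # only redirects are modeled, not PC + 4
        self.ifid_valid = False

    def can_dispatch(self) -> bool:
        return not self.in_flight

    def dispatch(self, is_control: bool) -> None:
        assert self.can_dispatch(), "dispatch is stalled while control is in flight"
        if is_control:
            self.in_flight = True            # tracked from dispatch ...

    def control_issued(self, kind: str, taken: bool, target: int) -> Dict[str, bool]:
        self.in_flight = False               # ... until issue
        redirect = kind in ("jal", "jalr") or (kind == "branch" and taken)
        if redirect:
            self.fetch_pc = target           # PC redirect
            self.ifid_valid = False          # front-end flush drops wrong-path fetch
        return {"redirect": redirect,
                "frontend_flush": redirect,
                "backend_flush": kind == "branch" and taken}


gate = ControlGate()
gate.dispatch(is_control=True)               # e.g. a beq enters the backend
print(gate.can_dispatch())                   # prints: False
signals = gate.control_issued("branch", taken=True, target=0x40)
print(signals["backend_flush"], hex(gate.fetch_pc), gate.can_dispatch())  # prints: True 0x40 True
```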
Step-by-Step Example 1: RAW Chain
Instruction stream:
I0: add x5, x1, x2
I1: sub x6, x5, x3
I2: xor x7, x6, x4
Rename/dispatch:
- I0 allocates tag T0; RAT[x5] = T0.
- I1 allocates T1; RAT[x6] = T1; source x5 is tagged T0 (not ready yet).
- I2 allocates T2; RAT[x7] = T2; source x6 is tagged T1.
Issue/wakeup:
- I0 issues first and broadcasts T0 on the CDB.
- The IQ entry for I1 sees T0, marks rs1_ready, and captures the value.
- I1 issues and broadcasts T1.
- The IQ entry for I2 wakes on T1.
- I2 issues.
Commit:
- The ROB commits T0, then T1, then T2, in order.
- Architectural state changes happen in program order.
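A small cycle trace reproduces this chain. It assumes a one-cycle execute latency. A tag broadcast by an instruction issued in cycle c is visible to wakeup/select in cycle c + 1, and x1 to x4 are already ready. The dictionaries are illustrative and do not reflect real IQ structures.

```python
waits_on = {"I0": None, "I1": "T0", "I2": "T1"}   # entry -> tag still awaited
produces = {"I0": "T0", "I1": "T1", "I2": "T2"}
program_order = ["I0", "I1", "I2"]                # also age order in the IQ

issue_log = []
cdb_tag = None
cycle = 0
while waits_on:
    for instr, tag in waits_on.items():           # wakeup from the CDB
        if tag is not None and tag == cdb_tag:
            waits_on[instr] = None
    ready = [i for i in program_order if i in waits_on and waits_on[i] is None]
    cdb_tag = None
    if ready:
        chosen = ready[0]                         # oldest ready entry issues
        del waits_on[chosen]
        cdb_tag = produces[chosen]                # broadcast seen next cycle
        issue_log.append((cycle, chosen, cdb_tag))
    cycle += 1

print(issue_log)      # prints: [(0, 'I0', 'T0'), (1, 'I1', 'T1'), (2, 'I2', 'T2')]
commit_order = [produces[i] for i in program_order]   # ROB retires in program order
print(commit_order)   # prints: ['T0', 'T1', 'T2']
```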
Step-by-Step Example 2: WAW Elimination via Rename
Instruction stream:
I0: addi x10, x0, 1
I1: addi x10, x0, 2
I2: add x11, x10, x12
Rename effect:
- I0 gets T0; RAT[x10] = T0.
- I1 gets T1; RAT[x10] = T1 (the alias is overwritten by the younger writer).
- I2 reads source x10 as tag T1, not T0.
Outcome:
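- The older and younger writes to x10 do not constrain each other's scheduling.
- The consumer I2 is automatically bound to the correct (younger) producer.
- When I0 commits, RAT[x10] is left pointing at T1, because the committing tag T0 no longer matches.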
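The rename step for this stream can be replayed in a few lines. The tuple encoding of instructions and the "ARF" marker are illustrative assumptions. Each instruction's sources are looked up before its own rd is renamed.

```python
# Sources that read x0 are left out because x0 always reads zero.
stream = [("I0", 10, []),         # addi x10, x0, 1
          ("I1", 10, []),         # addi x10, x0, 2
          ("I2", 11, [10, 12])]   # add  x11, x10, x12

rat = {}      # reg -> tag of the latest in-flight producer
bound = {}    # (instr, src reg) -> tag, or "ARF" when no producer is pending
for n, (name, rd, srcs) in enumerate(stream):
    tag = f"T{n}"
    for src in srcs:              # sources are looked up before rd is renamed
        bound[(name, src)] = rat.get(src, "ARF")
    rat[rd] = tag

print(bound)   # prints: {('I2', 10): 'T1', ('I2', 12): 'ARF'}
print(rat)     # prints: {10: 'T1', 11: 'T2'}
```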
Step-by-Step Example 3: Store-Load Ordering
Instruction stream:
I0: sw x8, 0(x2)
I1: lw x9, 0(x2)
I2: add x10, x9, x3
Behavior:
- I0 allocates the older ROB entry and records store metadata.
- Before a speculative load read, the hazard query checks for older pending stores.
- If an older store is unresolved or matches the load address, load speculation is blocked.
- The load value is then provided through the commit-time memory path, after I0's store has committed.
- Dependents (here I2) wake when the load value is broadcast from the commit path.
This keeps memory ordering correct even with out-of-order issue.
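The commit-side half of this example can be modeled in Python. The model assumes I1's speculative read was already blocked, so both memory operations happen at commit in program order. Memory is a dict keyed by address. The dict-based ROB entries and variable names are hypothetical.

```python
memory = {0x100: 0}
regs = {"x2": 0x100, "x3": 5, "x8": 0xABCD}

rob = [  # program order; I1's speculative read was blocked by the older store
    {"name": "I0", "op": "sw", "addr": regs["x2"], "data": regs["x8"]},
    {"name": "I1", "op": "lw", "addr": regs["x2"], "rd": "x9", "tag": "T1"},
]
i2_waits_on = "T1"    # I2 (add x10, x9, x3) waits on the load's tag
i2_rs1 = None

for entry in rob:     # commit strictly in order from the head
    if entry["op"] == "sw":
        memory[entry["addr"]] = entry["data"]        # store becomes visible
    elif entry["op"] == "lw":
        value = memory[entry["addr"]]                # load serviced at commit
        regs[entry["rd"]] = value
        bcast_tag, bcast_val = entry["tag"], value   # commit-side broadcast
        if bcast_tag == i2_waits_on:
            i2_rs1 = bcast_val                       # I2 wakes with the value

i2_result = i2_rs1 + regs["x3"]
print(hex(i2_rs1), hex(i2_result))   # prints: 0xabcd 0xabd2
```

Because the store commits before the load reads memory, the load observes the stored value rather than the stale 0.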
Step-by-Step Example 4: Taken Branch Recovery
Instruction stream:
I0: beq x1, x2, target
I1: add x5, x6, x7
I2: sub x8, x9, x10
Behavior:
- I0 is marked as the in-flight control instruction, so dispatch of I1 and I2 stalls until I0 issues.
- When I0 resolves taken, the PC redirects to target.
- Front-end state is flushed.
- Backend flush path removes speculative queue/buffer state for branch recovery.
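The resolution of I0 can be sketched as follows. The sketch uses RV32I beq semantics: the branch is taken when the sources are equal, and the target is PC plus the sign-extended B-type offset. The returned dict of flush signals and the function name are hypothetical and simply mirror the bullets above.

```python
MASK32 = 0xFFFF_FFFF


def resolve_beq(pc: int, rs1_val: int, rs2_val: int, imm: int) -> dict:
    """Resolve a beq at issue. imm is the sign-extended B-type offset."""
    taken = rs1_val == rs2_val
    next_pc = (pc + imm) & MASK32 if taken else (pc + 4) & MASK32
    return {"taken": taken,
            "next_pc": next_pc,
            "frontend_flush": taken,   # wrong-path fetch is discarded
            "backend_flush": taken}    # speculative queue/buffer state cleared


print(hex(resolve_beq(0x20, 7, 7, 0x10)["next_pc"]))   # prints: 0x30
print(hex(resolve_beq(0x20, 7, 8, 0x10)["next_pc"]))   # prints: 0x24
print(hex(resolve_beq(0x20, 1, 1, -8)["next_pc"]))     # prints: 0x18
```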
Why This Works Well for RV32I OOO
- Tag-based wakeup handles true data dependencies naturally.
- Rename removes false dependencies (WAR/WAW).
- ROB commit gives precise state and clean exception behavior.
- Ordered commit for memory side effects keeps correctness simple.
Practical Limits in This Revision
- Single-instruction width at dispatch, issue, and commit limits peak throughput to one instruction per cycle.
- Control in-flight gating is conservative by design.
- Load/store handling favors correctness over aggressive speculation.
- An LSQ store-to-load forwarding interface exists, but the current top-level load path is driven primarily by the ROB hazard policy and the commit-time memory flow.
Compact Pseudocode
```
every cycle:
    if flush:
        clear front-end valid state
        clear rename/scheduler/ROB state as required

    if dispatch_ok:    (IF/ID valid, ROB and IQ space, no control in flight, no flush/redirect)
        tag <- ROB.allocate()
        src operands <- RF value, RAT tag, or same-cycle CDB value   (read before the RAT update)
        if rd writes:
            RAT[rd] <- tag
        IQ.enqueue(instruction, src descriptors, tag)

    IQ.wakeup_from_CDB()
    issued <- IQ.select_oldest_ready()
    if issued:
        result <- execute(issued)
        ROB.writeback(issued.tag, result, mem metadata)
        if result is broadcastable:
            CDB.broadcast(issued.tag, result)

    if ROB.head.ready:
        commit ROB.head in order
        write RF / memory as needed (commit-side CDB broadcast where needed, e.g. a load serviced at commit)
        clear RAT[head.rd] only if its tag still matches head.tag
```