Superscalar Processor – Top-Level Module Documentation
1. Overview
The top-level module implements a 2-way in-order superscalar pipelined processor based on a RiSC-16–like ISA. The design issues up to two instructions per cycle, propagates them through duplicated pipeline lanes, and employs hazard detection, forwarding, and squash logic.
The top module instantiates:
- A multi-ported memory system (instruction and data ports).
- A dual-issue 5-stage pipeline: IF, ID, EX, MEM, WB for slot 0 and slot 1.
- Pipeline registers (IFID, IDEX, EXMEM, MEMWB) for each slot.
- Arithmetic and logical components.
- Bypass/forwarding network.
- Stall and squash control.
Despite its superscalar nature, execution remains in-order.
2. Architectural Block Diagram (Textual Representation)
                     +------------------+
                     |  Memory System   |
                     |  (3-Port ARAM)   |
                     +------------------+
                       ^              ^
                       |              |
        +--------------+              +-----------------+
        |                                               |
+---------------+       +---------------+       +---------------+
|  IF Stage 0   |       |  IF Stage 1   |       |   PC Logic    |
+---------------+       +---------------+       +---------------+
        |                       |
        v                       v
+---------------+       +---------------+
|  IF/ID Pipe   |       |  IF/ID Pipe   |
|  Register 0   |       |  Register 1   |
+---------------+       +---------------+
        |                       |
        v                       v
+---------------+       +---------------+
|  ID/EX Pipe   |       |  ID/EX Pipe   |
|  Register 0   |       |  Register 1   |
+---------------+       +---------------+
        |                       |
        v                       v
+---------------+       +---------------+
|  EX/MEM Pipe  |       |  EX/MEM Pipe  |
|  Register 0   |       |  Register 1   |
+---------------+       +---------------+
        |                       |
        v                       v
+---------------+       +---------------+
|  MEM/WB Pipe  |       |  MEM/WB Pipe  |
|  Register 0   |       |  Register 1   |
+---------------+       +---------------+
        |                       |
        +-----------+-----------+
                    v
              Register File
3. Pipeline Structure
3.1 Dual-Issue Pipeline Lanes
Each pipeline stage has two independent lanes:
| Stage | Slot 0 Signal Prefix | Slot 1 Signal Prefix |
|---|---|---|
| IF | IFID_*_0 | IFID_*_1 |
| ID | IDEX_*_0 | IDEX_*_1 |
| EX | EXMEM_*_0 | EXMEM_*_1 |
| MEM | MEMWB_*_0 | MEMWB_*_1 |
| WB | direct outputs | direct outputs |
Each lane processes one instruction. Dual issue occurs only when slot 1 is not dependent on slot 0 within the same cycle and no hazards/stalls prevent it.
3.2 Pipeline Stages Summary
| Stage | Primary Functions |
|---|---|
| IF | PC selection, instruction fetch from memory port 1 |
| ID | Instruction decode, operand register read, hazard detection |
| EX | ALU operations, address generation, branch evaluation |
| MEM | Data memory read/write via memory port 2 (slot 0) or port 3 (slot 1) |
| WB | Register file writeback |
4. Control Flow Logic
4.1 Program Counter (PC) and Next-PC Logic
Both slots share a common PC generator, but squashes determine which slot should be nullified.
Pseudocode illustrating top-level PC update:
if (branch_taken) {
    PC_next = branch_target;
} else {
    PC_next = PC + 2; // fetch two instructions
}
When a taken branch occupies slot 0, slot 1 is automatically squashed.
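The update rule above can be modelled in a few lines of Python. This is a minimal sketch, not the module's actual logic: the function name `next_pc`, the `FETCH_WIDTH` constant, and the 16-bit wrap-around via `ADDR_MASK` are assumptions made for illustration, based on word addressing and a RiSC-16-style 16-bit address space.

```python
FETCH_WIDTH = 2       # two instructions fetched per cycle (word-addressed)
ADDR_MASK = 0xFFFF    # assumption: 16-bit address space, as in RiSC-16


def next_pc(pc: int, branch_taken: bool, branch_target: int) -> int:
    """Return the PC of the next fetch group."""
    if branch_taken:
        # A taken branch redirects fetch to its target.
        return branch_target & ADDR_MASK
    # Otherwise fall through to the next pair of instructions.
    return (pc + FETCH_WIDTH) & ADDR_MASK
```

For example, `next_pc(10, False, 0)` advances to the next instruction pair, while `next_pc(10, True, 40)` redirects fetch to address 40.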
4.2 Squash Logic
Squash signals are named:
- Pstomp_0
- Pstomp_1
Rules:
- If the slot 0 instruction is a branch or JALR and is taken: Pstomp_0 = 1 and Pstomp_1 = 1 (slot 1 must also be squashed).
- If the slot 1 instruction is an independent branch: Pstomp_1 = 1.
On squash, the next stage pipeline registers receive a NOP.
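The two rules can be sketched as a small Python function. This is an illustrative model under stated assumptions: `slot0_redirects` stands for "slot 0 holds a taken branch or JALR", `slot1_redirects` stands for "slot 1 holds an independent branch that triggers a squash", and Pstomp_0 is assumed to stay 0 when only slot 1 redirects (the rules above do not say). The names `Squash`, `squash_signals`, and `next_stage_input` are hypothetical.

```python
from typing import NamedTuple

NOP = "NOP"  # placeholder for the bubble inserted on a squash


class Squash(NamedTuple):
    pstomp_0: int
    pstomp_1: int


def squash_signals(slot0_redirects: bool, slot1_redirects: bool) -> Squash:
    """Derive Pstomp_0 / Pstomp_1 from the redirect conditions of each slot."""
    if slot0_redirects:
        # Slot 1 is younger than slot 0, so it must be squashed as well.
        return Squash(pstomp_0=1, pstomp_1=1)
    if slot1_redirects:
        # Assumption: only Pstomp_1 is raised in this case.
        return Squash(pstomp_0=0, pstomp_1=1)
    return Squash(pstomp_0=0, pstomp_1=0)


def next_stage_input(instr: str, stomp: int) -> str:
    """The next-stage pipeline register receives a NOP when squashed."""
    return NOP if stomp else instr
```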
5. Hazard Detection and Stall Mechanisms
5.1 Stall Logic
Stall signals:
- Pstall_0
- Pstall_1
A stall freezes the respective instruction; earlier stages do not advance.
Key hazard types:
| Hazard Type | Behavior |
|---|---|
| Load-use | Stall 1 cycle until memory data is available |
| Structural (writing RF) | Prevents slot 1 from issuing if both instructions write the same register |
| Control | Handled via squashing |
5.2 Load-Use Hazard Detection
Pseudocode summarizing logic:
if (IDEX.op == LW and
    (IDEX.dest == IFID.srcA or IDEX.dest == IFID.srcB)) {
    Pstall = 1;
}
This check exists for both slots.
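The check above can be written as a small executable model. The sketch below is illustrative only: the record types `IdExEntry` and `IfIdEntry`, their field names, and the use of `None` for "no register" are assumptions, not names taken from the design.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdExEntry:
    op: str                # opcode mnemonic, e.g. "LW" or "ADD"
    dest: Optional[int]    # destination register, None if nothing is written


@dataclass
class IfIdEntry:
    src_a: Optional[int]   # first source register, None if unused
    src_b: Optional[int]   # second source register, None if unused


def load_use_stall(idex: IdExEntry, ifid: IfIdEntry) -> bool:
    """Return True when the IF/ID instruction reads a register that a
    load currently in ID/EX has not yet produced."""
    if idex.op != "LW" or idex.dest is None:
        return False
    return idex.dest in (ifid.src_a, ifid.src_b)
```

In a two-slot pipeline the same function would presumably be evaluated for each IF/ID slot against each ID/EX slot; how the results combine into Pstall_0 and Pstall_1 is not specified here.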
5.3 Inter-slot Dependency Handling
Slot 1 is prevented from issuing if it depends on slot 0:
if (IFID_1.src == IFID_0.dest) {
    Pstall_1 = 1;
}
In-order dual issue is enforced.
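As a rough illustration of the inter-slot check, the following Python sketch compares every source register of slot 1 against the destination of slot 0. The function name `slot1_depends_on_slot0` and the `None` convention for "writes no register" are assumptions made for this example.

```python
from typing import Iterable, Optional


def slot1_depends_on_slot0(slot0_dest: Optional[int],
                           slot1_srcs: Iterable[Optional[int]]) -> bool:
    """Return True (Pstall_1 = 1) when slot 1 reads the register
    that slot 0 writes in the same fetch group."""
    if slot0_dest is None:
        # Slot 0 writes nothing, so there is nothing to depend on.
        return False
    return any(src == slot0_dest for src in slot1_srcs)
```

For the pair `LW r1,0(r2)` / `ADD r3,r1,r4`, the call `slot1_depends_on_slot0(1, [1, 4])` returns `True`, so slot 1 waits while slot 0 proceeds.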
6. Forwarding Network
Forwarding MUXes compare source register indices against the destination registers in:
- IDEX stage
- EXMEM stage
- MEMWB stage
A simplified pseudocode model:
if (EXMEM.dest == ID.src) operand = EXMEM.value;
else if (MEMWB.dest == ID.src) operand = MEMWB.value;
else operand = RF_value;
Forwarding matches on architectural register indices directly, not on renamed tags; this is sufficient because the design is in-order and performs no register renaming.
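The priority order of the simplified model (EXMEM first, then MEMWB, then the register file) can be made concrete in Python. This is a sketch for a single operand and a single producer per stage; the type `StageResult` and the function `forward_operand` are hypothetical names, and selecting between the two slots' producers is left out.

```python
from typing import NamedTuple, Optional


class StageResult(NamedTuple):
    dest: Optional[int]   # destination register, None if the stage writes nothing
    value: int            # value that will eventually be written back


def forward_operand(src: int, rf_value: int,
                    exmem: StageResult, memwb: StageResult) -> int:
    """Pick the most recent value for register `src`."""
    if exmem.dest is not None and exmem.dest == src:
        return exmem.value      # youngest producer wins
    if memwb.dest is not None and memwb.dest == src:
        return memwb.value      # older producer, not yet written back
    return rf_value             # no match: use the register file read
```

Checking EXMEM before MEMWB matters when both stages target the same register: the EXMEM instruction is later in program order, so its value is the one a following instruction must see.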
7. Memory Subsystem Integration
The top module connects to a 3-port ARAM memory:
| Port | Purpose |
|---|---|
| Port 1 | Instruction fetch (dual words) |
| Port 2 | Data access for slot 0 |
| Port 3 | Data access for slot 1 |
Instruction fetch uses sequential PC addresses:
addr1_0 = PC
addr1_1 = PC + 1
Data accesses for loads/stores use addresses from ALU results in EXMEM stage.
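The port wiring described above can be summarised in a short Python sketch. Only `addr1_0` and `addr1_1` come from the text; the keys `addr2` and `addr3`, the parameter names, and the function name `port_addresses` are assumptions used for illustration.

```python
from typing import Dict


def port_addresses(pc: int, exmem_alu_0: int, exmem_alu_1: int) -> Dict[str, int]:
    """Map the current PC and the EXMEM ALU results onto the memory ports."""
    return {
        "addr1_0": pc,           # port 1: instruction for slot 0
        "addr1_1": pc + 1,       # port 1: instruction for slot 1
        "addr2": exmem_alu_0,    # port 2: data address for slot 0 (hypothetical name)
        "addr3": exmem_alu_1,    # port 3: data address for slot 1 (hypothetical name)
    }
```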
8. Writeback and Register File Behavior
8.1 Writeback Rules
Each slot writes back:
- ALU results
- Loaded data
- JALR link addresses
Writeback is in-order because WB follows pipeline progression.
8.2 Register File Constraints
Slot 1 may not issue if:
- It writes the same register as slot 0 in the same cycle.
- It reads a register that slot 0 writes in the same cycle and the value cannot be forwarded in time.
Thus RF integrity is preserved.
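To show why the first constraint matters, here is a minimal Python model of a register file with two write ports. It is a sketch under assumptions: the class `RegisterFile`, the helper `slot1_write_conflict`, and the default of 8 registers (as in RiSC-16) are illustrative and not taken from the design.

```python
from typing import Optional, Tuple

Write = Optional[Tuple[int, int]]  # (destination register, value), or None


def slot1_write_conflict(slot0_dest: Optional[int],
                         slot1_dest: Optional[int]) -> bool:
    """Issue-time check: both slots target the same register."""
    return slot0_dest is not None and slot0_dest == slot1_dest


class RegisterFile:
    def __init__(self, num_regs: int = 8) -> None:  # assumption: 8 registers
        self.regs = [0] * num_regs

    def write_back(self, w0: Write, w1: Write) -> None:
        """Apply the writebacks of slot 0 and slot 1 for one cycle."""
        if w0 is not None and w1 is not None and w0[0] == w1[0]:
            # The issue logic is expected to have prevented this case.
            raise ValueError("both slots write the same register")
        for w in (w0, w1):
            if w is not None:
                dest, value = w
                self.regs[dest] = value
```

Because `slot1_write_conflict` blocks the second instruction at issue, `write_back` never has to decide which of two same-cycle writes should win.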
9. Instruction Flow Example
Below is a conceptual pipeline table showing two independent instructions issued together:
Cycle | Slot 0            | Slot 1
------+-------------------+----------------------
1     | IF inst0          | IF inst1
2     | ID inst0          | ID inst1
3     | EX inst0          | EX inst1
4     | MEM inst0         | MEM inst1
5     | WB inst0          | WB inst1
With a load-use hazard:
Cycle | Slot 0               | Slot 1
------+----------------------+---------------------
1     | IF LW r1,0(r2)       | IF ADD r3,r1,r4
2     | ID LW                | ID ADD (stall)
3     | EX LW                | ID ADD
4     | MEM LW               | EX ADD (forwarded)
5     | WB LW                | MEM ADD
6     |                      | WB ADD
10. Summary of Capabilities
| Feature | Supported |
|---|---|
| Dual-issue superscalar | Yes |
| Out-of-order execution | No |
| Pipeline depth | 5 stages per slot |
| Forwarding | Yes |
| Load-use stall | Yes |
| Branch resolution in EX | Yes |
| Squash on taken branch | Yes |
| Hazard detection (RAW) | Yes |
| Register renaming | No |
| Reorder buffer | No |