Superscalar Processor – Top-Level Module Documentation

1. Overview

The top-level module implements a 2-way in-order superscalar pipelined processor based on a RiSC-16–like ISA. The design issues up to two instructions per cycle, propagates them through duplicated pipeline lanes, and employs hazard detection, forwarding, and squash logic.

The top module instantiates:

  • A multi-ported memory system (instruction and data ports).
  • A dual-issue 5-stage pipeline: IF, ID, EX, MEM, WB for slot 0 and slot 1.
  • Pipeline registers (IFID, IDEX, EXMEM, MEMWB) for each slot.
  • Arithmetic and logical components.
  • Bypass/forwarding network.
  • Stall and squash control.

Despite its superscalar nature, execution remains in-order.


2. Architectural Block Diagram (Textual Representation)

                     +------------------+
                     |   Memory System  |
                     |  (3-Port ARAM)   |
                     +------------------+
                        ^        ^
                        |        |
         +--------------+        +-----------------+
         |                                     |
 +---------------+      +---------------+      +---------------+
 | IF Stage 0    |      | IF Stage 1    |      |   PC Logic    |
 +---------------+      +---------------+      +---------------+
         |                       |  
         v                       v
 +---------------+      +---------------+
 | IF/ID  Pipe   |      | IF/ID Pipe    |
 | Register 0    |      | Register 1    |
 +---------------+      +---------------+
         |                       |
         v                       v
 +---------------+      +---------------+
 | ID/EX Pipe    |      | ID/EX Pipe    |
 | Register 0    |      | Register 1    |
 +---------------+      +---------------+
         |                       |
         v                       v
 +---------------+      +---------------+
 | EX/MEM Pipe   |      | EX/MEM Pipe   |
 | Register 0    |      | Register 1    |
 +---------------+      +---------------+
         |                       |
         v                       v
 +---------------+      +---------------+
 | MEM/WB Pipe   |      | MEM/WB Pipe   |
 | Register 0    |      | Register 1    |
 +---------------+      +---------------+
         |                       |
         +-----------+-----------+
                     v
              Register File

3. Pipeline Structure

3.1 Dual-Issue Pipeline Lanes

Each pipeline stage has two independent lanes:

Stage | Slot 0 Signal Prefix | Slot 1 Signal Prefix
------+----------------------+---------------------
IF    | IFID_*_0             | IFID_*_1
ID    | IDEX_*_0             | IDEX_*_1
EX    | EXMEM_*_0            | EXMEM_*_1
MEM   | MEMWB_*_0            | MEMWB_*_1
WB    | direct outputs       | direct outputs

Each lane processes one instruction. Dual issue occurs only when the slot 1 instruction does not depend on the slot 0 instruction fetched in the same cycle and no hazard or stall blocks it.

3.2 Pipeline Stages Summary

Stage | Primary Functions
------+----------------------------------------------------------------------
IF    | PC selection, instruction fetch from memory port 1
ID    | Instruction decode, operand register read, hazard detection
EX    | ALU operations, address generation, branch evaluation
MEM   | Data memory read/write via memory port 2 (slot 0) or port 3 (slot 1)
WB    | Register file writeback

4. Control Flow Logic

4.1 Program Counter (PC) and Next-PC Logic

Both slots share a single PC generator. When control flow changes, the squash logic (Section 4.2) determines which slot is nullified.

Pseudocode illustrating top-level PC update:

if (branch_taken) {
    PC_next = branch_target;
}
else {
    PC_next = PC + 2;   // fetch two instructions
}

When a taken branch or JALR occupies slot 0, slot 1 is automatically squashed.

4.2 Squash Logic

Squash signals are named:

  • Pstomp_0
  • Pstomp_1

Rules:

  1. If the slot 0 instruction is a taken branch or JALR:

    • Pstomp_0 = 1
    • Pstomp_1 = 1 (slot 1 must also be squashed)

  2. If the slot 1 instruction is an independent branch:

    • Pstomp_1 = 1

On a squash, the next-stage pipeline registers are loaded with a NOP.
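
The PC update of Section 4.1 and the two squash rules above can be combined into one small behavioral model. The sketch below is illustrative only: the `SlotCtl` structure and the `control_flow` function are hypothetical names, and it assumes that rule 2 applies only when the slot 1 branch is actually taken and that the PC is then redirected to the slot 1 target. The source does not state either point.

```python
from dataclasses import dataclass


@dataclass
class SlotCtl:
    """Resolved control-flow outcome for one slot (hypothetical structure)."""
    is_branch_or_jalr: bool = False
    taken: bool = False
    target: int = 0


def control_flow(pc, slot0, slot1):
    """Return (pc_next, pstomp_0, pstomp_1) for one cycle."""
    if slot0.is_branch_or_jalr and slot0.taken:
        # Rule 1: slot 0 redirects the PC and both stomp signals are raised.
        return slot0.target, 1, 1
    if slot1.is_branch_or_jalr and slot1.taken:
        # Rule 2 (assumed to require a taken branch): only Pstomp_1 is raised.
        return slot1.target, 0, 1
    # No redirect: fetch the next pair of instructions.
    return pc + 2, 0, 0
```

Slot 0 is checked first because it holds the older instruction, so its redirect takes priority over anything in slot 1.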


5. Hazard Detection and Stall Mechanisms

5.1 Stall Logic

Stall signals:

  • Pstall_0
  • Pstall_1

A stall freezes the respective instruction; earlier stages do not advance.

Key hazard types:

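Hazard Type             | Behavior
------------------------+--------------------------------------------------------------------
Load-use                | Stall 1 cycle until memory data is available
Structural (writing RF) | Prevents slot 1 issue if both instructions write the same register
Control                 | Handled via squashing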

5.2 Load-Use Hazard Detection

Pseudocode summarizing logic:

if (IDEX.op == LW &&
    (IDEX.dest == IFID.srcA || IDEX.dest == IFID.srcB)) {
    Pstall = 1;   // hold the dependent instruction in ID
}

This check exists for both slots.
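
As a rough executable model of this check, the sketch below tests one IFID instruction against the loads currently held in the IDEX registers. The `Instr` tuple and the `load_use_stall` function are hypothetical names introduced for illustration, and checking against the IDEX registers of both lanes is an assumption about how the per-slot checks are wired.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    """Decoded instruction fields used by the check (hypothetical layout)."""
    op: str
    dest: Optional[int] = None
    srcA: Optional[int] = None
    srcB: Optional[int] = None


def load_use_stall(idex_instrs, ifid_instr):
    """Return True if ifid_instr reads a register that a load in IDEX writes.

    idex_instrs holds the IDEX contents of slot 0 and slot 1; an empty
    lane is represented by None.
    """
    for older in idex_instrs:
        if older is None or older.op != "LW" or older.dest is None:
            continue  # only loads with a destination can cause the hazard
        if older.dest in (ifid_instr.srcA, ifid_instr.srcB):
            return True
    return False
```

Calling the function once per IFID slot yields the two stall signals, for example `Pstall_0 = load_use_stall(idex, ifid_0)`.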

5.3 Inter-slot Dependency Handling

Slot 1 is prevented from issuing if it depends on slot 0:

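if (IFID_1.srcA == IFID_0.dest || IFID_1.srcB == IFID_0.dest) {
    Pstall_1 = 1;   // slot 1 waits for slot 0
}

This enforces in-order dual issue: slot 1 never issues ahead of a slot 0 instruction it depends on.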


6. Forwarding Network

The forwarding logic compares source register indices against the destination registers held in the following stages and steers the forwarding MUXes accordingly:

  • IDEX stage
  • EXMEM stage
  • MEMWB stage

A simplified pseudocode model:

if (EXMEM.dest == ID.src)      operand = EXMEM.value;
else if (MEMWB.dest == ID.src) operand = MEMWB.value;
else                           operand = RF_value;

Because the design is in-order and performs no register renaming, forwarding can match on architectural register indices directly rather than on tags.
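
The simplified pseudocode above can be extended to two lanes along the following lines. This is a sketch under stated assumptions, not the actual MUX implementation: `select_operand` is a hypothetical name, each pipeline register is modeled as a `(dest, value)` pair or `None`, and within one stage slot 1 is assumed to take priority over slot 0 because it holds the younger instruction.

```python
def select_operand(src, exmem, memwb, rf):
    """Pick the newest available value for register index src.

    exmem and memwb are two-element lists [slot0, slot1]; each element is
    a (dest, value) tuple or None when the lane holds no register write.
    rf is the register file, indexed by register number.
    """
    for stage in (exmem, memwb):           # EXMEM is younger than MEMWB
        for entry in reversed(stage):      # assumed: slot 1 younger than slot 0
            if entry is not None and entry[0] == src:
                return entry[1]
    return rf[src]                         # no match: use the register file
```

Searching from the youngest producer to the oldest ensures that a consumer sees the most recent write to a register when several in-flight instructions target it.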


7. Memory Subsystem Integration

The top module connects to a 3-port ARAM memory:

Port   | Purpose
-------+-------------------------------
Port 1 | Instruction fetch (dual words)
Port 2 | Data access for slot 0
Port 3 | Data access for slot 1

Instruction fetch uses sequential PC addresses:

addr1_0 = PC
addr1_1 = PC + 1

Data accesses for loads/stores use addresses from ALU results in EXMEM stage.
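
The fetch addressing can be illustrated with a short model. It assumes a word-addressed instruction memory, represented here as a plain Python list; `fetch_pair` is a hypothetical helper name.

```python
def fetch_pair(imem, pc):
    """Return the two instruction words fetched for slot 0 and slot 1."""
    addr1_0 = pc        # slot 0 fetch address
    addr1_1 = pc + 1    # slot 1 fetch address (next sequential word)
    return imem[addr1_0], imem[addr1_1]
```

Because each fetch consumes two consecutive words, a sequential PC update of PC + 2 makes successive fetch pairs adjacent without overlap.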


8. Writeback and Register File Behavior

8.1 Writeback Rules

Each slot writes back:

  • ALU results
  • Loaded data
  • JALR link addresses

Writeback is in order because instructions reach WB in the same order in which they were issued; no stage lets a younger instruction pass an older one.

8.2 Register File Constraints

Slot 1 may not issue if:

  • It writes the same register as slot 0 in the same cycle.
  • It reads a register being written by slot 0 in the same cycle that cannot be forwarded in time.

Thus RF integrity is preserved.
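
The two constraints can be expressed as a single issue check. The sketch below is conservative by assumption: it treats every same-cycle read of slot 0's destination as not forwardable in time, which matches the inter-slot stall of Section 5.3. `Instr` and `slot1_may_issue` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    """Register fields of a decoded instruction (hypothetical layout)."""
    dest: Optional[int] = None
    srcA: Optional[int] = None
    srcB: Optional[int] = None


def slot1_may_issue(slot0, slot1):
    """Return True if slot 1 can issue in the same cycle as slot 0."""
    if slot0.dest is None:
        return True       # slot 0 writes no register, so no conflict
    if slot1.dest == slot0.dest:
        return False      # both slots would write the same register
    if slot0.dest in (slot1.srcA, slot1.srcB):
        return False      # slot 1 reads the register slot 0 is writing
    return True
```

When the function returns False, slot 1 is held back and issues in a later cycle, after slot 0.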


9. Instruction Flow Example

Below is a conceptual pipeline table showing two independent instructions issued together:

Cycle | Slot 0    | Slot 1
------+-----------+----------
  1   | IF inst0  | IF inst1
  2   | ID inst0  | ID inst1
  3   | EX inst0  | EX inst1
  4   | MEM inst0 | MEM inst1
  5   | WB inst0  | WB inst1

With a load-use hazard:

Cycle | Slot 0         | Slot 1
------+----------------+--------------------
  1   | IF LW r1,0(r2) | IF ADD r3,r1,r4
  2   | ID LW          | ID ADD (stall)
  3   | EX LW          | ID ADD
  4   | MEM LW         | EX ADD (forwarded)
  5   | WB LW          | MEM ADD
  6   |                | WB ADD

10. Summary of Capabilities

Feature                 | Supported / Value
------------------------+------------------
Dual-issue superscalar  | Yes
Out-of-order execution  | No
Pipeline depth          | 5 stages per slot
Forwarding              | Yes
Load-use stall          | Yes
Branch resolution in EX | Yes
Squash on taken branch  | Yes
Hazard detection (RAW)  | Yes
Register renaming       | No
Reorder buffer          | No