Execution Unit Components Documentation

The processor’s execute stage uses lightweight combinational blocks to compute arithmetic results, logical operations, memory addresses, and branch conditions. The following two modules implement the core ALU functionality and the equality comparator required for conditional branching.


1. arithmetic_logic_unit

1.1 Purpose

The arithmetic_logic_unit module implements the primary combinational ALU for the superscalar pipeline. It consumes:

  • ALU opcode (op)
  • Two 16-bit operands (alu1, alu2)

and produces a 16-bit result (bus).

1.2 Supported Operations

The ALU implements a minimal RiSC-16–derived instruction set:

Operation   Opcode      Function
ADD         3'd0        alu1 + alu2 (16-bit addition)
NAND        3'd2        Bitwise ~(alu1 & alu2)
Default     otherwise   Pass-through of alu2

The default pass-through is used for operations where the ALU is not required to transform the input (e.g., load/store address pass-through, LUI, register moves).

Limiting the ALU to two computed operations plus a pass-through keeps its logic small and its combinational delay short.

1.3 Behavioral Definition

Pseudocode representation:

switch(op):
    case ADD:
        bus = alu1 + alu2
    case NAND:
        bus = ~(alu1 & alu2)
    default:
        bus = alu2
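The pseudocode above can be mirrored in a small software model, which is handy for checking expected results against a simulation. The following Python sketch is a behavioral model only, not the hardware description. The function name alu, the constants OP_ADD and OP_NAND, and the explicit 16-bit mask are names introduced here for illustration; the mask assumes that the carry out of bit 15 is discarded because bus is 16 bits wide.

```python
MASK16 = 0xFFFF  # models the 16-bit width of bus

# Opcode values taken from the table in section 1.2.
OP_ADD = 0
OP_NAND = 2


def alu(op: int, alu1: int, alu2: int) -> int:
    """Behavioral model of arithmetic_logic_unit (illustrative names)."""
    if op == OP_ADD:
        # Assumption: the carry out of bit 15 is dropped.
        return (alu1 + alu2) & MASK16
    if op == OP_NAND:
        # Python integers are unbounded, so mask after inverting.
        return ~(alu1 & alu2) & MASK16
    # Any other opcode passes alu2 through unchanged.
    return alu2 & MASK16
```

For example, alu(OP_NAND, 0xFFFF, 0x00FF) evaluates to 0xFF00, and any opcode other than 0 or 2 returns alu2.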

1.4 Architectural Role

The ALU output feeds:

  • EX/MEM pipeline register (EXMEM_ALUout__out)
  • Forwarding network for EX→EX and MEM→EX bypass
  • Branch comparison logic (indirectly)

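The ALU is strictly combinational and produces its result within a single cycle. Because the design is in-order and relies on forwarding instead of dynamic scheduling, dependent instructions obtain that result through the bypass paths listed above.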


2. not_equivalent

2.1 Purpose

not_equivalent is the branch comparator used to implement the BNE (branch-if-not-equal) instruction in the EX stage.

It tests whether two 16-bit operands differ at any bit position.

The output is a 1-bit boolean:

  • 1 if operands are not equal
  • 0 if operands are equal

2.2 Implementation Details

The module computes:

out = OR over i = 0..15 of (alu1[i] XOR alu2[i])

The module writes this as a deeply nested expression, but it is functionally equivalent to:

out = (alu1 != alu2)

or explicitly:

out = |(alu1 ^ alu2)

where | is the reduction OR operator and ^ is bitwise XOR.
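The equivalence between the bit-by-bit form and the plain inequality can be checked with a short behavioral model. The Python sketch below is illustrative only: it borrows the module name for the function, and the loop stands in for the reduction OR, which the hardware evaluates in parallel rather than sequentially.

```python
WIDTH = 16
MASK16 = (1 << WIDTH) - 1


def not_equivalent(alu1: int, alu2: int) -> int:
    """Return 1 if the 16-bit operands differ at any bit, else 0."""
    diff = (alu1 ^ alu2) & MASK16  # per-bit XOR: 1 where the bits differ
    out = 0
    for i in range(WIDTH):
        out |= (diff >> i) & 1     # OR-reduce the 16 difference bits
    return out


def not_equal_reference(alu1: int, alu2: int) -> int:
    """Reference form: a direct inequality on the masked operands."""
    return int((alu1 & MASK16) != (alu2 & MASK16))
```

Both functions return the same value for every operand pair, which is the property the nested hardware expression relies on.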

2.3 Architectural Role

This comparator executes in the EX stage to determine branch direction:

if (opcode == BNE):
    branch_taken = (alu1 != alu2)

The branch resolution logic then triggers squash signals for:

  • Current slot (EX stage)
  • Future slots (IF/ID stage)
  • Opposite lane (slot 1 squashed if slot 0 branch is taken)

Thus, not_equivalent is critical in maintaining precise control-flow semantics in a dual-issue pipeline.


3. Integration in EX Stage

Both modules feed into the EX stage data path:

             +----------------------------+
 alu1 -----> |                            |
             |     arithmetic_logic_unit  | ---> ALU result
 alu2 -----> |                            |
             +----------------------------+

             +----------------------------+
 alu1 -----> |                            |
             |       not_equivalent       | ---> branch_ne_flag
 alu2 -----> |                            |
             +----------------------------+

The ALU result is passed to:

  • EXMEM pipeline register
  • Forwarding muxes

The comparator result is passed to:

  • Branch logic
  • Squash logic
  • PC-select module
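The data path in the diagram can be summarized as one function that evaluates both units on the same operand pair. This Python sketch is a behavioral approximation under stated assumptions: ex_stage, ExOutputs, and the is_bne flag are hypothetical names, the helpers restate the definitions from sections 1 and 2, and the squash and PC-select logic downstream of the comparator is not modeled.

```python
from typing import NamedTuple

MASK16 = 0xFFFF
OP_ADD = 0   # opcode values from section 1.2
OP_NAND = 2


class ExOutputs(NamedTuple):
    alu_result: int      # goes to the EXMEM register and forwarding muxes
    branch_ne_flag: int  # raw comparator output
    branch_taken: int    # comparator output qualified by the BNE decode


def alu(op: int, alu1: int, alu2: int) -> int:
    if op == OP_ADD:
        return (alu1 + alu2) & MASK16
    if op == OP_NAND:
        return ~(alu1 & alu2) & MASK16
    return alu2 & MASK16  # default: pass alu2 through


def not_equivalent(alu1: int, alu2: int) -> int:
    return int(((alu1 ^ alu2) & MASK16) != 0)


def ex_stage(op: int, alu1: int, alu2: int, is_bne: bool) -> ExOutputs:
    """Evaluate both combinational units on the same operands."""
    result = alu(op, alu1, alu2)
    ne = not_equivalent(alu1, alu2)
    # Assumption: the branch is taken only when the instruction is BNE.
    taken = ne if is_bne else 0
    return ExOutputs(result, ne, taken)
```

Both units see the same alu1 and alu2 in this model, matching the diagram; how the real pipeline selects those operands (for instance through the forwarding muxes) is outside the scope of the sketch.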

4. Summary

Module                  Function                  Output Width   Pipeline Stage
arithmetic_logic_unit   ADD, NAND, pass-through   16-bit         EX
not_equivalent          Inequality comparator     1-bit          EX

These two small combinational units form the core computational primitives for the superscalar processor’s execution engine.