Pipelined ALU with Scan-Chain Integration

Tags: Scan Chains, Timing Closure, PD Analysis, Open Source ASIC Tools
Repository: Mummanajagadeesh/ALU
Start Date: Jan 2025

Summary

| Item | Description |
|---|---|
| Design | 4-stage ALU (non-pipelined / pipelined / scan-enabled) |
| Purpose | Quantitative study of pipelining, scan insertion, and PD constraints |
| Flow | Yosys → OpenSTA → OpenLane (Sky130) |
| Clock Target | 4.0 ns (post-CTS & signoff closure) |
| Focus | Area, timing, routing, clocking, power trade-offs |

Design Variants

| Variant | Description |
|---|---|
| Non-Pipelined | Fully combinational datapath |
| Pipelined | 4-stage register boundaries |
| Scan-Pipelined | Pipeline FFs replaced with scan-enabled FFs |

All variants implement identical arithmetic/comparison logic. Only register structure and physical constraints differ.


Synthesis Results (Yosys)

| Metric | Non-Pipe | Pipe | Scan-Pipe |
|---|---|---|---|
| Total Cells | 37 | 50 | 89 |
| Flip-Flops | 0 | 13 | 13 |
| Chip Area (µm²) | 538.66 | 891.91 | 1162.04 |

Area Overhead

  • Pipelining: +65.6%
  • Scan (over pipelined): +30.3%
  • Baseline → Scan-Pipelined: +116%

Scan insertion adds ~3 logic gates per flip-flop.
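The overhead figures follow directly from the synthesis table. A minimal sketch of that arithmetic, assuming the table values are read as 37 / 50 / 89 cells and 13 flip-flops (the variable names are illustrative, not taken from the repository):

```python
# Chip area (µm²) and cell counts, copied from the synthesis table above.
area = {"non_pipe": 538.66, "pipe": 891.91, "scan_pipe": 1162.04}
cells = {"non_pipe": 37, "pipe": 50, "scan_pipe": 89}
flip_flops = 13  # identical in the pipelined and scan-pipelined variants


def overhead_pct(new, base):
    """Relative increase of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


pipe_overhead = overhead_pct(area["pipe"], area["non_pipe"])
scan_overhead = overhead_pct(area["scan_pipe"], area["pipe"])
total_overhead = overhead_pct(area["scan_pipe"], area["non_pipe"])

# Extra cells introduced by scan insertion, averaged per flip-flop.
gates_per_ff = (cells["scan_pipe"] - cells["pipe"]) / flip_flops

print(f"Pipelining:             +{pipe_overhead:.1f}%")
print(f"Scan (over pipelined):  +{scan_overhead:.1f}%")
print(f"Baseline -> scan-pipe:  +{total_overhead:.0f}%")
print(f"Extra cells per FF:     {gates_per_ff:.1f}")
```

The per-flip-flop figure is an average over the 39 added cells; it does not say how the scan logic is distributed across individual registers.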


Timing Results (OpenSTA – Baseline Constraints)

| Metric | Non-Pipe | Pipe | Scan-Pipe |
|---|---|---|---|
| Critical Delay (ns) | 1.67 | 0.99 | 1.14 |
| Slack (ns) | 0.33 | 0.88 | 0.74 |
| f_max (MHz) | 598 | 1010 | 877 |

Observations

  • Critical path reduction with pipelining: 40.7%
  • Frequency improvement: ~1.7×
  • Scan overhead: ~150 ps register delay (~13% fmax loss)
  • Hold timing clean in all variants
  • Scan logic adds delay to the shortest register-to-register paths, which improves minimum-delay (hold) margins
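The observations above can be reproduced from the critical delays alone. This is a rough sketch that assumes f_max is approximated as the reciprocal of the critical delay, which is consistent with the table but ignores register setup time:

```python
# Critical-path delays in ns, copied from the OpenSTA baseline table.
delay_ns = {"non_pipe": 1.67, "pipe": 0.99, "scan_pipe": 1.14}

# Assumption: f_max ~ 1 / critical delay (setup time not modelled).
fmax_mhz = {name: 1000.0 / d for name, d in delay_ns.items()}

path_reduction_pct = (1 - delay_ns["pipe"] / delay_ns["non_pipe"]) * 100
speedup = fmax_mhz["pipe"] / fmax_mhz["non_pipe"]
scan_penalty_ps = (delay_ns["scan_pipe"] - delay_ns["pipe"]) * 1000
fmax_loss_pct = (1 - fmax_mhz["scan_pipe"] / fmax_mhz["pipe"]) * 100

for name, f in fmax_mhz.items():
    # Truncated to whole MHz, which matches the values in the table.
    print(f"{name:10s} f_max ~ {int(f)} MHz")
print(f"Critical path reduction: {path_reduction_pct:.1f}%")
print(f"Frequency improvement:   {speedup:.2f}x")
print(f"Scan register penalty:   {scan_penalty_ps:.0f} ps")
print(f"Scan f_max loss:         {fmax_loss_pct:.1f}%")
```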

Pessimistic STA (Interface-Aware Constraints)

Added:

  • 50 ps clock uncertainty
  • 1.0 ns I/O delays
  • Non-zero input slew

| Design | Critical Path | Arrival (ns) | Slack (ns) |
|---|---|---|---|
| Non-Pipe | Input → Output | 2.83 | −1.88 |
| Pipe | Input → Stage-1 FF | 1.54 | +0.27 |
| Scan-Pipe | Input → Scan FF | 1.69 | +0.14 |

Key Results

  • Non-pipelined fails timing under realistic constraints
  • Pipelining absorbs interface delay
  • Scan reduces slack by ~48% (+0.27 ns → +0.14 ns) while logic depth is unchanged
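One way to read the non-pipelined result is as a simple budget on the input-to-output path. The sketch below assumes a 2.0 ns clock period (inferred from delay plus slack in the baseline STA table, not stated explicitly in the source) and assumes the reported arrival time already includes the 1.0 ns input delay. Under those assumptions it reproduces the reported slack:

```python
# Assumption: 2.0 ns period, inferred from 1.67 ns delay + 0.33 ns slack
# in the baseline STA table. The source does not state it directly.
CLOCK_PERIOD_NS = 2.0
CLOCK_UNCERTAINTY_NS = 0.05  # 50 ps clock uncertainty
OUTPUT_DELAY_NS = 1.0        # 1.0 ns I/O delay on the output side


def io_path_slack(arrival_ns):
    """Slack of an input-to-output path against the external output budget."""
    required_ns = CLOCK_PERIOD_NS - CLOCK_UNCERTAINTY_NS - OUTPUT_DELAY_NS
    return required_ns - arrival_ns


# Arrival copied from the table; assumed to include the input delay.
non_pipe_slack = io_path_slack(2.83)

# Relative slack lost when scan FFs replace plain pipeline FFs.
pipe_slack_ns, scan_slack_ns = 0.27, 0.14
slack_loss_pct = (pipe_slack_ns - scan_slack_ns) / pipe_slack_ns * 100

print(f"Non-pipelined I/O slack: {non_pipe_slack:+.2f} ns")
print(f"Slack lost to scan:      {slack_loss_pct:.0f}%")
```

The pipelined variants terminate at a register instead of an output port, so their required time depends on the flip-flop setup time rather than the output delay; that case is not modelled here.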

RTL → GDS Physical Design (OpenLane / Sky130)

The scan-pipelined variant was stressed through 10 controlled PD experiments (E1–E10).
Each experiment changed one dominant physical-design knob and, unless noted otherwise, kept timing closed.


PD Experiment Summary

| Exp | Primary Change | Quantitative Outcome |
|---|---|---|
| E1 | Scan baseline, 60% util, CTS skew 0.1 ns | Post-CTS WNS +1.27 ns, power ~8.99e-04 W |
| E2 | Scan removed (control) | Cells 82 → 52, synth WNS +2.22 ns |
| E3 | CTS skew tightened to 0.05 ns | Timing unchanged, ~2× power |
| E4 | FP_CORE_UTIL = 80% | Wirelength 26601 → 26771, WNS ~1.9 ns |
| E5 | Dual scan chains | Clock latency 0.68 → 0.63 ns, power 1.03e-03 W |
| E6 | PL_TARGET_DENSITY = 0.85 | GPL WNS 1.61 → 1.48 ns, DPL recovered |
| E7 | Channelized floorplan | Clock net ~745 → ~788 µm, −70 ps slack |
| E8 | Worst-case IO pinning | Clock net ~1525 µm (~2×) |
| E9 | Pin-architecture fix | Clock net ~703 µm (−54%) |
| E10 | PDN tightening | Total power +21%, switching +58% |

Routing Geometry Evolution

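| Metric | E6 | E7 | E8 | E9 | E10 |
|---|---|---|---|---|---|
| Clock Net Length (µm) | ~745 | ~788 | ~1525 | ~703 | ~703 |
| Longest Net (µm) | ~1075 | ~875 | ~1235 | ~1168 | ~1168 |
| scan_out Length (µm) | ~128 | ~329 | ~258 | ~205 | ~205 |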
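To see how each experiment moved the clock net relative to the one before it, the clock-net row can be walked step by step. A small sketch using the approximate lengths from the table (names are illustrative):

```python
# Approximate clock-net length in µm per experiment, from the routing table.
clock_net_um = {"E6": 745, "E7": 788, "E8": 1525, "E9": 703, "E10": 703}


def pct_change(new, old):
    """Signed relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


experiments = list(clock_net_um)

# Change of each experiment relative to its predecessor in the sequence.
step_change = {
    cur: pct_change(clock_net_um[cur], clock_net_um[prev])
    for prev, cur in zip(experiments, experiments[1:])
}

for exp, change in step_change.items():
    print(f"{exp}: {change:+.1f}% clock-net length vs previous experiment")

# Inflation caused by worst-case IO pinning, relative to E7.
inflation_e8 = clock_net_um["E8"] / clock_net_um["E7"]
print(f"E8 / E7 clock-net ratio: {inflation_e8:.2f}x")
```

Measured this way, the E8 inflation is about 1.9× relative to E7, and the E9 pin fix brings the clock net slightly below the E6 starting point.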

Timing Closure (Post-Route / Signoff)

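| Metric | E6 | E7 | E8 | E9 | E10 |
|---|---|---|---|---|---|
| Setup WNS (ns) | ~1.45 | ~1.38 | ~1.12 | | Closed |
| Hold WNS (ns) | ~0.22 | ~0.22 | ~0.22 | ~0.36 | Closed |
| TNS / WNS | 0 / 0 | 0 / 0 | 0 / 0 | 0 / 0 | 0 / 0 |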

Power Impact (Signoff, RCX)

| Metric | E9 | E10 | Delta |
|---|---|---|---|
| Total Power (W) | 7.48e-04 | 9.06e-04 | +21% |
| Switching Power (W) | 2.69e-04 | 4.25e-04 | +58% |

CTS-stage power remained stable; PDN tightening manifested only at signoff.
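The signoff numbers also show where the increase comes from. The sketch below splits total power into switching power and everything else; that remainder is simply total minus switching, since the source gives no further breakdown:

```python
# Signoff (RCX) power in watts, copied from the table above.
power_w = {
    "E9": {"total": 7.48e-04, "switching": 2.69e-04},
    "E10": {"total": 9.06e-04, "switching": 4.25e-04},
}


def pct_change(new, old):
    """Signed relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


total_delta_pct = pct_change(power_w["E10"]["total"], power_w["E9"]["total"])
switching_delta_pct = pct_change(
    power_w["E10"]["switching"], power_w["E9"]["switching"]
)

# Remainder after removing switching power (not broken down in the source).
non_switching_w = {
    exp: p["total"] - p["switching"] for exp, p in power_w.items()
}
non_switching_delta_pct = pct_change(non_switching_w["E10"], non_switching_w["E9"])

print(f"Total power:         {total_delta_pct:+.0f}%")
print(f"Switching power:     {switching_delta_pct:+.0f}%")
print(f"Non-switching power: {non_switching_delta_pct:+.1f}%")
```

With these values the non-switching remainder changes by well under 1%, so essentially all of the E9 → E10 increase is switching power.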


Consolidated Takeaways

  • Pipelining trades ~66% area for ~70% frequency improvement
  • Scan insertion adds ~30% area and ~150 ps register delay
  • CTS skew tightening affects power, not timing
  • Utilization and target-density settings have little effect when real placement density is below ~10%
  • Dual scan reduces clock latency but increases routing power
  • Floorplan topology affects routing geometry more than timing
  • Worst-case IO placement can inflate clock routing by ~2× (~788 → ~1525 µm)
  • Geometry-aware pin restructuring recovers >50% clock routing
  • PDN tightening increases signoff switching power by ~58% with timing intact