Pipelined ALU with Scan-Chain Integration

Tags: Scan Chains, Timing Closure, PD Analysis, Open-Source ASIC Tools

Repository	Mummanajagadeesh/ALU
Start Date	Jan 2025

Summary

Item	Description
Design	ALU in three variants (non-pipelined / 4-stage pipelined / scan-enabled pipelined)
Purpose	Quantitative study of pipelining, scan insertion, and PD constraints
Flow	Yosys → OpenSTA → OpenLane (Sky130)
Clock Target	4.0 ns (post-CTS & signoff closure)
Focus	Area, timing, routing, clocking, power trade-offs

Design Variants

Variant	Description
Non-Pipelined	Fully combinational datapath
Pipelined	4-stage register boundaries
Scan-Pipelined	Pipeline FFs replaced with scan-enabled FFs

All variants implement identical arithmetic/comparison logic. Only register structure and physical constraints differ.

Synthesis Results (Yosys)

Metric	Non-Pipe	Pipe	Scan-Pipe
Total Cells	37	50	89
Flip-Flops	0	13	13
Chip Area (µm²)	538.66	891.91	1162.04

Area Overhead

Pipelining: +65.6%
Scan (over pipelined): +30.3%
Baseline → Scan-Pipelined: +116%

Scan insertion adds ~3 logic gates per flip-flop (39 extra cells across 13 flip-flops).
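
The overhead figures above follow directly from the synthesis table. A minimal Python sketch of the arithmetic, using only the tabulated values (the variable and function names are illustrative, not taken from the repository):

```python
# Values copied from the Yosys synthesis table above.
area_um2 = {"non_pipe": 538.66, "pipe": 891.91, "scan_pipe": 1162.04}
cells = {"non_pipe": 37, "pipe": 50, "scan_pipe": 89}
flip_flops = 13  # identical in the pipelined and scan-pipelined variants


def overhead_pct(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


pipelining_pct = overhead_pct(area_um2["pipe"], area_um2["non_pipe"])
scan_pct = overhead_pct(area_um2["scan_pipe"], area_um2["pipe"])
total_pct = overhead_pct(area_um2["scan_pipe"], area_um2["non_pipe"])

# Extra cells introduced by scan insertion, normalised per flip-flop.
extra_cells_per_ff = (cells["scan_pipe"] - cells["pipe"]) / flip_flops

print(f"Pipelining:           +{pipelining_pct:.1f}%")
print(f"Scan over pipelined:  +{scan_pct:.1f}%")
print(f"Baseline to scan:     +{total_pct:.0f}%")
print(f"Extra cells per FF:   {extra_cells_per_ff:.1f}")
```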

Timing Results (OpenSTA – Baseline Constraints)

Metric	Non-Pipe	Pipe	Scan-Pipe
Critical Delay (ns)	1.67	0.99	1.14
Slack (ns)	0.33	0.88	0.74
fmax (MHz)	598	1010	877

Observations

Critical path reduction with pipelining: 40.7%
Frequency improvement: ~1.7×
Scan overhead: ~150 ps register delay (~13% fmax loss)
Hold timing clean in all variants
Scan logic improves hold (minimum-delay) margins, since the scan mux in front of each flip-flop adds data-path delay
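
The observations can be reproduced from the critical delays alone. The sketch below assumes fmax is simply the reciprocal of the reported critical delay, which is how the table values appear to have been derived; names are illustrative.

```python
# Critical delays (ns) from the OpenSTA baseline table above.
critical_delay_ns = {"non_pipe": 1.67, "pipe": 0.99, "scan_pipe": 1.14}

# Assumption: fmax = 1 / critical delay (1000 / ns gives MHz).
fmax_mhz = {name: 1000.0 / d for name, d in critical_delay_ns.items()}

# Critical-path reduction from pipelining, in percent.
path_reduction_pct = (
    (critical_delay_ns["non_pipe"] - critical_delay_ns["pipe"])
    / critical_delay_ns["non_pipe"] * 100.0
)

# Frequency gain of the pipelined variant over the baseline.
speedup = fmax_mhz["pipe"] / fmax_mhz["non_pipe"]

# Scan penalty: added delay in ps and the resulting fmax loss in percent.
scan_penalty_ps = (critical_delay_ns["scan_pipe"] - critical_delay_ns["pipe"]) * 1000.0
fmax_loss_pct = (1.0 - fmax_mhz["scan_pipe"] / fmax_mhz["pipe"]) * 100.0

for name, f in fmax_mhz.items():
    print(f"{name:10s} fmax = {f:7.1f} MHz")
print(f"path reduction = {path_reduction_pct:.1f}%, speedup = {speedup:.2f}x")
print(f"scan penalty = {scan_penalty_ps:.0f} ps, fmax loss = {fmax_loss_pct:.1f}%")
```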

Pessimistic STA (Interface-Aware Constraints)

Added:

50 ps clock uncertainty
1.0 ns I/O delays
Non-zero input slew

Design	Critical Path	Arrival (ns)	Slack (ns)
Non-Pipe	Input → Output	2.83	−1.88
Pipe	Input → Stage-1 FF	1.54	+0.27
Scan-Pipe	Input → Scan FF	1.69	+0.14

Key Results

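The non-pipelined design fails timing under realistic constraints (−1.88 ns slack)
Pipelining absorbs the interface delay: the critical path shortens to Input → Stage-1 FF and still meets timing (+0.27 ns)
Scan reduces slack by ~48% (0.27 → 0.14 ns) with pipeline depth unchanged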
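
The non-pipelined failure can be checked by hand. The sketch below assumes a 2.0 ns clock period for this STA run, inferred from the baseline table (1.67 ns delay + 0.33 ns slack); that period is not stated explicitly in this section. It also assumes the reported arrival time already includes the 1.0 ns input delay, and that the 1.0 ns output delay and 50 ps uncertainty are subtracted from the required time.

```python
# Assumption: STA clock period inferred from baseline delay + slack (1.67 + 0.33).
CLOCK_PERIOD_NS = 2.0
OUTPUT_DELAY_NS = 1.0        # from the added constraints above
CLOCK_UNCERTAINTY_NS = 0.05  # 50 ps, from the added constraints above


def output_path_slack(arrival_ns: float) -> float:
    """Setup slack for an input-to-output path under the assumptions above."""
    required_ns = CLOCK_PERIOD_NS - OUTPUT_DELAY_NS - CLOCK_UNCERTAINTY_NS
    return required_ns - arrival_ns


# Non-pipelined design: the only path is Input -> Output, arrival 2.83 ns.
non_pipe_slack_ns = output_path_slack(2.83)

# Relative slack lost when scan is added to the pipelined design.
pipe_slack_ns, scan_slack_ns = 0.27, 0.14
slack_loss_pct = (pipe_slack_ns - scan_slack_ns) / pipe_slack_ns * 100.0

print(f"non-pipelined slack = {non_pipe_slack_ns:+.2f} ns")
print(f"scan slack reduction = {slack_loss_pct:.0f}%")
```

Under these assumptions the computed slack matches the tabulated −1.88 ns. The register-bounded paths in the other two rows also depend on flip-flop setup time, which is not listed here, so they are not recomputed.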

RTL → GDS Physical Design (OpenLane / Sky130)

The scan-pipelined variant was stressed through 10 controlled PD experiments (E1–E10).
Each experiment changed a single dominant physical-design knob; timing closure was maintained unless noted otherwise.

PD Experiment Summary

Exp	Primary Change	Quantitative Outcome
E1	Scan baseline, 60% util, CTS skew 0.1 ns	Post-CTS WNS +1.27 ns, power ~8.99e-04 W
E2	Scan removed (control)	Cells 82 → 52, synth WNS +2.22 ns
E3	CTS skew tightened to 0.05 ns	Timing unchanged, ~2× power
E4	FP_CORE_UTIL = 80%	Wirelength 26601 → 26771, WNS ~1.9 ns
E5	Dual scan chains	Clock latency 0.68 → 0.63 ns, power 1.03e-03 W
E6	PL_TARGET_DENSITY = 0.85	GPL WNS 1.61 → 1.48 ns, DPL recovered
E7	Channelized floorplan	Clock net ~745 → ~788 µm, −70 ps slack
E8	Worst-case IO pinning	Clock net ~1525 µm (~2×)
E9	Pin-architecture fix	Clock net ~703 µm (−54%)
E10	PDN tightening	Total power +21%, switching +58%
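
The single-knob methodology can be sketched as a baseline configuration plus one override per experiment. This is an illustrative Python sketch, not the repository's actual setup. FP_CORE_UTIL and PL_TARGET_DENSITY appear in the table above; CLOCK_PERIOD and CTS_TARGET_SKEW are OpenLane 1-style variable names, CTS_TARGET_SKEW is assumed to be in picoseconds, and applying each override to the E1 baseline rather than cumulatively is also an assumption. Only experiments whose knob values are stated in the table are included.

```python
import json

# Baseline knobs as reported for E1 (names and units are assumptions, see above).
baseline = {
    "CLOCK_PERIOD": 4.0,     # ns, clock target from the summary
    "FP_CORE_UTIL": 60,      # percent, E1
    "CTS_TARGET_SKEW": 100,  # assumed ps; 0.1 ns in E1
}

# One override per experiment, limited to values given in the table.
experiments = {
    "E3": {"CTS_TARGET_SKEW": 50},     # skew tightened to 0.05 ns
    "E4": {"FP_CORE_UTIL": 80},
    "E6": {"PL_TARGET_DENSITY": 0.85},
}


def make_config(name: str) -> dict:
    """Return the baseline with exactly one experiment's override applied."""
    cfg = dict(baseline)
    cfg.update(experiments[name])
    return cfg


def changed_keys(name: str) -> list:
    """List the knobs that differ from the baseline for an experiment."""
    cfg = make_config(name)
    return sorted(k for k, v in cfg.items() if baseline.get(k) != v)


for name in experiments:
    print(name, changed_keys(name), json.dumps(make_config(name)))
```

Checking that `changed_keys` returns a single entry per experiment is a cheap way to confirm that each run isolates one knob.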

Routing Geometry Evolution

Metric	E6	E7	E8	E9	E10
Clock Net Length (µm)	~745	~788	~1525	~703	~703
Longest Net (µm)	~1075	~875	~1235	~1168	~1168
scan_out Length (µm)	~128	~329	~258	~205	~205
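
The relative changes in clock routing quoted in the experiment summary follow from these lengths. A short Python sketch of the step-to-step comparison, using the approximate values from the table (names are illustrative):

```python
# Approximate clock net lengths (µm) from the routing geometry table.
clock_net_um = {"E6": 745, "E7": 788, "E8": 1525, "E9": 703, "E10": 703}


def change_pct(new: float, old: float) -> float:
    """Signed relative change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# Channelized floorplan (E7) relative to E6.
e7_vs_e6_pct = change_pct(clock_net_um["E7"], clock_net_um["E6"])

# Worst-case IO pinning (E8) relative to E7, as a ratio.
e8_vs_e7_ratio = clock_net_um["E8"] / clock_net_um["E7"]

# Pin-architecture fix (E9) relative to E8.
e9_vs_e8_pct = change_pct(clock_net_um["E9"], clock_net_um["E8"])

print(f"E6 -> E7: {e7_vs_e6_pct:+.1f}%")
print(f"E7 -> E8: {e8_vs_e7_ratio:.2f}x")
print(f"E8 -> E9: {e9_vs_e8_pct:+.1f}%")
```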

Timing Closure (Post-Route / Signoff)

Metric	E6	E7	E8	E9	E10
Setup WNS (ns)	~1.45	~1.38	↓	~1.12	Closed
Hold WNS (ns)	~0.22	~0.22	~0.22	~0.36	Closed
TNS / WNS (violations only)	0 / 0	0 / 0	0 / 0	0 / 0	0 / 0

Power Impact (Signoff, RCX)

Metric	E9	E10
Total Power (W)	7.48e-04	9.06e-04
Switching Power (W)	2.69e-04	4.25e-04
Delta	—	+21% total, +58% switching

CTS-stage power remained stable; the effect of PDN tightening appeared only in the signoff power numbers.
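
The deltas in the table can be recomputed from the two signoff runs. The Python sketch below also derives how much of the total increase is attributable to switching power, a figure not stated above but implied by the tabulated values (names are illustrative):

```python
# Signoff power (W) from the table above.
total_w = {"E9": 7.48e-04, "E10": 9.06e-04}
switching_w = {"E9": 2.69e-04, "E10": 4.25e-04}

# Relative increases from E9 to E10, in percent.
total_increase_pct = (total_w["E10"] / total_w["E9"] - 1.0) * 100.0
switching_increase_pct = (switching_w["E10"] / switching_w["E9"] - 1.0) * 100.0

# Share of the absolute total-power increase that comes from switching power.
switching_share_of_delta = (
    (switching_w["E10"] - switching_w["E9"]) / (total_w["E10"] - total_w["E9"])
)

print(f"total power:     +{total_increase_pct:.0f}%")
print(f"switching power: +{switching_increase_pct:.0f}%")
print(f"switching share of the increase: {switching_share_of_delta:.0%}")
```

With these numbers, almost all of the E9 to E10 increase comes from the switching component.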

Consolidated Takeaways

Pipelining trades ~66% more area for a ~70% higher fmax (598 → 1010 MHz)
Scan insertion adds ~30% area and ~150 ps of register delay
Tightening the CTS skew target (0.1 → 0.05 ns) roughly doubles power without changing timing
Utilization and density knobs have little effect while actual placement density stays below ~10%
Dual scan chains reduce clock latency (0.68 → 0.63 ns) but increase routing power
Floorplan topology affects routing geometry more than timing
Worst-case IO placement can inflate clock routing by ~2×
Geometry-aware pin restructuring recovers >50% of that clock routing
PDN tightening increases signoff switching power by ~58% with timing intact