Part 4: ATPG on Gate-Level Netlists
What ATPG Is Doing Here
Scan chains provide the structural mechanism for controllability and observability: shifting test data into and out of flip-flops. ATPG generates the content of that test data: the specific bit patterns (test vectors) that activate fault sites and propagate their effects to observable outputs.
The tool used here is Fault 0.6.1, an open-source ATPG tool from the American University in Cairo, targeting the stuck-at fault model. Stuck-at faults model permanent hard faults: a net stuck at logic 0 (sa0) regardless of what the driving logic computes, or stuck at logic 1 (sa1) regardless. These represent physical defects such as shorts to supply or ground, broken connections, and oxide defects that hold a node in a fixed state. Stuck-at is the foundational fault model and the one compatible with Fault 0.6.1’s simulation engine.
ATPG operates on gate-level netlists, not RTL. RTL contains behavioral descriptions that do not map to fault sites. Fault sites are gate pins and wire nodes in a technology-mapped netlist. The flow therefore takes the synthesized flat gate netlist, cuts sequential boundaries, and runs fault simulation on the resulting combinational representation.
The cut step is the mechanism that makes a sequential circuit amenable to combinational ATPG. Every flip-flop in the netlist has its D input and Q output disconnected. The D input becomes a pseudo-primary output. The Q output becomes a pseudo-primary input. The result is a netlist with only combinational logic between the original primary ports and the added pseudo-ports at every flip-flop boundary. This is the same assumption as full-scan: if every flip-flop is in a scan chain and can be loaded with an arbitrary value, the sequential circuit reduces to a set of combinational cones, each testable independently.
The ATPG engine in Fault 0.6.1 uses a D-algorithm-derived approach. For each undetected fault, the engine attempts two things: first, set the faulted node to the opposite of its stuck value (fault activation), and second, find a path through the combinational cone from the fault site to a primary or pseudo-primary output where the difference between the fault-free and faulty circuit behaviors is observable (fault propagation). If both can be achieved simultaneously under a consistent input assignment, that assignment is a test vector for the fault. If no such assignment exists because the circuit structure prevents either activation or propagation, the fault is classified as structurally undetectable.
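The activation and propagation conditions can be illustrated on a toy circuit. The sketch below is not the algorithm Fault uses: it brute-forces every input assignment instead of backtracking, and the circuit (n = a AND b, y = n OR c) and the function name find_tests_n_sa0 are hypothetical. It prints each assignment for which the fault-free and faulty outputs differ when net n is stuck at 0.

```sh
#!/bin/sh
# Hypothetical circuit: n = a AND b ; y = n OR c
# Fault under test: internal net n stuck-at-0 (sa0)
find_tests_n_sa0() {
  for a in 0 1; do
    for b in 0 1; do
      for c in 0 1; do
        n_good=$(( a & b ))        # fault-free value of n
        n_bad=0                    # faulty circuit: n forced to 0
        y_good=$(( n_good | c ))
        y_bad=$(( n_bad | c ))
        # A vector is a test only if the difference reaches the output y
        if [ "$y_good" -ne "$y_bad" ]; then
          echo "a=$a b=$b c=$c"
        fi
      done
    done
  done
}

find_tests_n_sa0
```

The only assignment printed is a=1 b=1 c=0: a and b activate the fault by driving n to 1, and c=0 keeps the OR gate from masking the difference on its way to y.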
Parameters used across all four module runs: -m 80 (target 80% minimum fault coverage before stopping), -v 100 (generate up to 100 vectors), --ceiling 500 (attempt up to 500 backtracks per fault before classifying it as ATPG-untestable). The -m and -v flags control termination conditions. The --ceiling flag directly controls the search depth of the D-algorithm backtracking procedure.
Why ATPG Runs on Submodules, Not the Full Top Level
The complete noc_router top level synthesizes to a netlist with 34,906 fault sites across over 10,000 gates. Fault 0.6.1 spawns one Icarus Verilog simulation process per test vector application. Each process elaborates the full design netlist plus the sky130_fd_sc_hd full cell model, which is itself several thousand lines of Verilog. At this scale, the number of parallel simulation processes multiplied by the memory cost of elaborating both files simultaneously exceeds the available RAM on a standard development machine. The run was attempted and the system ran out of memory during parallel simulation.
The solution was to run ATPG on four submodules independently: rr_arbiter, switch_allocator, input_unit, and vc_fifo. The crossbar module was omitted because it is purely combinational passthrough logic with no sequential state; its fault sites are assumed to be exercised by the routing logic that drives and reads the crossbar, but they are not counted in any of the coverage numbers below. The four submodules cover all sequential logic in the design.
Running ATPG at the submodule level gives a fault coverage number for each block independently. It does not give a single fault coverage number for the integrated top level. The trade-off is accepted because the submodule runs complete within available memory, and the results are meaningful for each block in isolation.
Tool Installation and Cell Model Preparation
Fault 0.6.1 is distributed as a self-contained Linux x86_64 AppImage:
curl -L https://github.com/AUCOHL/Fault/releases/download/0.6.1/Fault-0.6.1-x86_64.AppImage -o fault.AppImage
chmod +x fault.AppImage
./fault.AppImage --version
The tool internally uses Yosys for synthesis and Icarus Verilog for fault simulation. Both are bundled inside the AppImage.
The sky130_fd_sc_hd Verilog cell models are split across two files: primitives.v containing UDP (User-Defined Primitive) definitions for the mux and DFF-based primitives, and sky130_fd_sc_hd.v containing full cell definitions that reference those UDP primitives. Fault’s -c flag specifies the cell model file. Passing only sky130_fd_sc_hd.v causes Icarus Verilog to report Unknown module type for every sky130_fd_sc_hd__udp_dff$* and sky130_fd_sc_hd__udp_mux_* primitive instantiation inside the cell definitions, because the UDP definitions are in the file that was not provided. Fix: concatenate both files:
cat primitives.v sky130_fd_sc_hd.v > sky130_fd_sc_hd_full.v
This single concatenated file is passed to every Fault invocation via -c sky130_fd_sc_hd_full.v.
The Liberty .lib file cannot be substituted here. Fault’s -c flag expects Verilog. Passing a Liberty file produces syntax error: I give up from the Icarus parser.
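Before passing the concatenated file to Fault, a rough textual check can confirm that every UDP name referenced in it also has a definition. The sketch below assumes UDP definitions start with the Verilog keyword primitive at the beginning of a line and that UDP names share the sky130_fd_sc_hd__udp_ prefix; the function name missing_udps is hypothetical. It is a plain text match, so a name that only appears in a comment would be reported as a false positive.

```sh
#!/bin/sh
# Print every sky130_fd_sc_hd__udp_* name that is referenced in the file
# but has no matching "primitive <name>" definition in the same file.
missing_udps() {
  # $1 = cell model file to check
  defs=$(grep -Eo '^[[:space:]]*primitive[[:space:]]+[A-Za-z0-9_$]+' "$1" \
         | awk '{ print $2 }' | sort -u)
  uses=$(grep -Eo 'sky130_fd_sc_hd__udp_[A-Za-z0-9_$]+' "$1" | sort -u)
  for u in $uses; do
    # -F: literal match (names contain '$'), -x: whole line, -q: quiet
    printf '%s\n' "$defs" | grep -Fqx -- "$u" || echo "$u"
  done
}
```

Running `missing_udps sky130_fd_sc_hd.v` on the unconcatenated file should list the UDP names that Icarus would reject, while `missing_udps sky130_fd_sc_hd_full.v` should print nothing if the concatenation picked up every definition.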
SRAM Behavioral Model
The sram_sp wrapper conditionally instantiates the OpenRAM-generated macro sram_1rw_16x16 under a SYNTHESIS define. Yosys defines SYNTHESIS implicitly when it reads Verilog, so Fault’s internal Yosys takes the macro branch, and synthesis fails with Module sram_1rw_16x16 referenced in module sram_sp is not part of the design. Fault does not have access to the OpenRAM macro definition and cannot synthesize it.
A behavioral replacement for sram_sp was written with the ifdef guards removed, implementing the register-based simulation model directly:
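module sram_sp #(
    parameter DW = 16,  // data width in bits
    parameter AW = 3    // address width; depth is 2**AW words
)(
    input  wire          clk,
    input  wire          csb,   // chip select, active low
    input  wire          web,   // write enable, active low
    input  wire [AW-1:0] addr,
    input  wire [DW-1:0] din,
    output wire [DW-1:0] dout
);
    reg [DW-1:0] mem [0:(1<<AW)-1];
    reg [DW-1:0] dout_r;
    integer k;

    // Zero the array and the output register at time 0
    initial begin
        for (k = 0; k < (1<<AW); k = k + 1) mem[k] = 0;
        dout_r = 0;
    end

    // Synchronous single port: write when web is low, otherwise registered read
    always @(posedge clk) begin
        if (!csb) begin
            if (!web) mem[addr] <= din;
            else      dout_r <= mem[addr];
        end
    end

    assign dout = dout_r;
endmodule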
This file is passed to synthesis in place of the original sram_sp.v for any module that instantiates SRAM. The behavioral model synthesizes cleanly through Yosys to a register array, giving Fault a complete, simulable gate-level representation of the memory.
DFF Cell Names for the Cut Step
The cut step requires explicit naming of the DFF cell variants in the synthesized netlist so Fault knows which instances are flip-flops and how to sever their sequential boundaries. The cell names present after synthesis were identified with:
grep -o 'sky130_fd_sc_hd__df[a-z_]*' noc_router.netlist.v | sort -u
Three DFF base types appeared: sky130_fd_sc_hd__dfxtp (standard positive-edge triggered DFF), sky130_fd_sc_hd__dfrtp (DFF with reset), and sky130_fd_sc_hd__dfstp (DFF with set). Each comes in drive strength variants 1, 2, and 4. All nine variants were passed to fault cut -d:
sky130_fd_sc_hd__dfxtp_1,sky130_fd_sc_hd__dfxtp_2,sky130_fd_sc_hd__dfxtp_4,
sky130_fd_sc_hd__dfrtp_1,sky130_fd_sc_hd__dfrtp_2,sky130_fd_sc_hd__dfrtp_4,
sky130_fd_sc_hd__dfstp_1,sky130_fd_sc_hd__dfstp_2,sky130_fd_sc_hd__dfstp_4
Missing any variant causes the cut step to leave those flip-flop instances intact as sequential elements in the cut netlist. The ATPG engine then sees a partially sequential circuit and cannot correctly enumerate pseudo-primary inputs and outputs for those instances.
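The comma-separated list for -d can also be derived from the netlist itself, which avoids typing variants by hand. The sketch below assumes every flip-flop cell name has the form sky130_fd_sc_hd__df, a lowercase type suffix, an underscore, and a numeric drive strength; the function name dff_cell_list is hypothetical. Unlike the nine-entry list above, it emits only the variants that actually occur in the given netlist.

```sh
#!/bin/sh
# Emit a comma-separated list of the DFF cell variants present in a netlist.
dff_cell_list() {
  # $1 = synthesized netlist
  grep -Eo 'sky130_fd_sc_hd__df[a-z]+_[0-9]+' "$1" \
    | sort -u \
    | paste -sd, -      # join the unique names with commas on one line
}
```

For example, `DFFS=$(dff_cell_list noc_router.netlist.v)` stores the list in a variable that can then be passed as the argument of -d.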
Three-Step Flow Per Module
Each module goes through three Fault invocations in sequence.
Step 1: Synthesis.
fault synth takes the RTL source files and the Liberty file and produces a flat gate-level netlist mapped to sky130_fd_sc_hd cells. Internally this calls Yosys with a standard synthesis script. Output is a Verilog netlist containing only sky130 standard cell instances and interconnect.
./fault.AppImage synth \
-l <liberty_path> \
-t <top_module> \
-o <module>.netlist.v \
<source_files>
Step 2: Cut.
fault cut takes the synthesized netlist, identifies every flip-flop instance matching the specified cell name list, severs the D-to-Q connections, and writes a combinational netlist. The D inputs become outputs (observable points) and the Q outputs become inputs (controllable points). The resulting netlist has no sequential elements.
./fault.AppImage cut \
-d <dff_cell_list> \
-o <module>.cut.v \
<module>.netlist.v
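Since a variant missing from the -d list leaves flip-flops in place, it is worth checking the cut output before running ATPG. The sketch below assumes that a fully cut netlist no longer contains any instance whose cell name matches the sky130 flip-flop naming pattern; the function name leftover_dffs is hypothetical.

```sh
#!/bin/sh
# List flip-flop cell types (with instance counts) still present in a netlist.
# Empty output means no matching flip-flop instances remain.
leftover_dffs() {
  # $1 = cut netlist
  grep -Eo 'sky130_fd_sc_hd__df[a-z]+_[0-9]+' "$1" | sort | uniq -c
}
```

Running `leftover_dffs <module>.cut.v` should print nothing; any line it does print names a cell variant that needs to be added to the -d list.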
Step 3: ATPG.
The main Fault invocation takes the cut netlist and the cell model, enumerates all fault sites, applies the D-algorithm to generate test vectors, and writes the result to a JSON file. The --clock and -i flags tell the tool which ports are clock and reset signals to exclude from controllable inputs.
./fault.AppImage \
-c sky130_fd_sc_hd_full.v \
--clock clk \
-i clk,rst_n \
-m 80 \
-v 100 \
--ceiling 500 \
-o <module>.tv.json \
<module>.cut.v
The output .tv.json file contains the actual binary test vectors: the input assignments that detect the covered faults. These can be loaded into a simulator or an ATE pattern loader for silicon testing.
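The three invocations can be wrapped in one function so that every module goes through the same sequence. This is a sketch under stated assumptions: the function name fault_flow and the FAULT and RUN variables are hypothetical, and only flags already shown above are used. By default it prints the commands instead of running them.

```sh
#!/bin/sh
FAULT="${FAULT:-./fault.AppImage}"
RUN="${RUN-echo}"   # default: print each command; set RUN to empty to execute

# fault_flow <top_module> <liberty_path> <dff_cell_list> <source_files...>
fault_flow() {
  top="$1"; liberty="$2"; dffs="$3"; shift 3
  # Step 1: synthesis to a flat gate-level netlist
  $RUN "$FAULT" synth -l "$liberty" -t "$top" -o "$top.netlist.v" "$@" &&
  # Step 2: cut the flip-flop boundaries
  $RUN "$FAULT" cut -d "$dffs" -o "$top.cut.v" "$top.netlist.v" &&
  # Step 3: ATPG on the cut netlist, same parameters as all module runs
  $RUN "$FAULT" -c sky130_fd_sc_hd_full.v --clock clk -i clk,rst_n \
       -m 80 -v 100 --ceiling 500 -o "$top.tv.json" "$top.cut.v"
}
```

For example, `fault_flow rr_arbiter <liberty_path> "$DFFS" rr_arbiter.v` prints the three commands for the arbiter, where rr_arbiter.v stands in for the actual source file name and DFFS holds the comma-separated DFF cell list. Starting the script with RUN set to the empty string executes the commands instead, stopping at the first step that fails.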
Module Results
rr_arbiter
The round-robin arbiter is the smallest module. It arbitrates between N requestors in a rotating priority order, maintaining a rotating grant pointer in a register. The cut netlist was 433 lines after synthesis.
| Metric | Value |
|---|---|
| Fault coverage | 96.43% |
| Test vectors generated | 97 |
| Runtime | 5.00s |
The arbiter’s register structure is highly observable: the grant output is directly visible at primary outputs, and the priority rotation register has direct paths to the grant outputs on the next cycle boundary (represented as a pseudo-primary output in the cut netlist). The 3.57% uncovered faults are structurally undetectable: nodes where the fault effect is masked by reconvergent fanout before reaching any observable output.
switch_allocator
The switch allocator contains a 5x5 round-robin arbiter matrix, with one arbiter per output port resolving contention among the 5 input ports requesting that output. It instantiates rr_arbiter internally, so both files are passed to synthesis. The cut netlist was 2,722 lines.
| Metric | Value |
|---|---|
| Fault coverage | 95.46% |
| Runtime | 7.12s |
Lowest coverage of the four modules. The round-robin arbitration state creates fault masking paths through the grant logic: a stuck fault on a priority state bit can be masked when the grant resolves through a different path that reaches the same output regardless of the faulted bit value. This is structural masking from the redundancy inherent in the priority rotation logic.
vc_fifo
The VC FIFO contains the FIFO control logic, read and write pointer management, full and empty flag generation, and the SRAM interface. The behavioral SRAM model is passed for synthesis.
| Metric | Value |
|---|---|
| Fault coverage | 98.09% |
| Runtime | 10.60s |
Highest coverage of the four modules. The FIFO datapath has high controllability through the data input port and high observability through the data output port. The read pointer, write pointer, full flag, and empty flag all have direct combinational paths to the output ports that can be driven to observe their values, which makes the majority of fault sites detectable with straightforward input assignments.
input_unit
The input unit contains the pipeline state machine (IDLE, DECODE, ROUTING, ACTIVE), the two-VC FIFO interface logic, and the switch allocation request and grant interface. It instantiates vc_fifo which instantiates sram_sp. With the behavioral SRAM model substituted in, the entire hierarchy flattens into a single combinational netlist at the cut step. The cut netlist was 11,688 lines, the largest of the four modules.
| Metric | Value |
|---|---|
| Fault coverage | 96.59% |
| Runtime | 17.22s |
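Runtime grows with netlist size, though far less than linearly in these runs: the cut netlist of input_unit is about 27 times longer than that of rr_arbiter (11,688 versus 433 lines), but its run took only about 3.4 times as long (17.22s versus 5.00s). The fault simulation workload grows with the number of fault sites, each requiring at least one Icarus simulation pass to evaluate.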
Summary Table
| Module | Cut Netlist Lines | Fault Coverage | Runtime |
|---|---|---|---|
| rr_arbiter | 433 | 96.43% | 5.00s |
| switch_allocator | 2,722 | 95.46% | 7.12s |
| vc_fifo | not recorded | 98.09% | 10.60s |
| input_unit | 11,688 | 96.59% | 17.22s |
All four modules exceeded 95% stuck-at fault coverage.
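The undetected fraction for each module follows directly from the table as 100 minus the coverage. A minimal sketch of that arithmetic, using the coverage values above (the function name undetected is hypothetical):

```sh
#!/bin/sh
# Read "module coverage" pairs on stdin, print "module undetected_percent".
undetected() {
  awk '{ printf "%s %.2f\n", $1, 100 - $2 }'
}

printf '%s\n' \
  'rr_arbiter 96.43' \
  'switch_allocator 95.46' \
  'vc_fifo 98.09' \
  'input_unit 96.59' | undetected
```

This gives 3.57 for rr_arbiter, 4.54 for switch_allocator, 1.91 for vc_fifo, and 3.41 for input_unit, so the undetected share ranges from 1.91% to 4.54%.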
Structurally Undetectable Faults
The 1.91% to 4.54% of undetected faults across the four modules are not a sign of insufficient ATPG effort. The D-algorithm with --ceiling 500 backtracks up to its configured limit for each fault. If a fault cannot be detected within that limit, the tool classifies it as ATPG-untestable. The underlying reason in these modules is structural.
Reconvergent fanout is the typical mechanism. A signal fans out to two paths that reconverge at an AND or OR gate. When a stuck fault on the fanout point affects both paths in a way that cancels at the reconvergence gate, that gate computes the same output whether the fault is present or absent, and the fault cannot be propagated past the reconvergence point.
Redundant logic in an arbiter is a common source of structural undetectability. If two priority decode terms can both assert the same grant under different but operationally equivalent conditions, a fault in one term may be masked by the other term activating the grant through the alternate path. The grant output does not differ between the fault-free and faulty circuit.
These faults cannot be removed by generating more test vectors. They can only be removed by redesigning the logic to eliminate the structural masking, which typically means removing the redundancy that causes it. In practice, 95%+ stuck-at coverage with a known remainder of structurally undetectable faults is a commonly accepted outcome of ATPG on a nontrivial combinational logic block.
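The masking effect can be reproduced on a three-term function. The sketch below assumes a hypothetical circuit y = (a AND b) OR (NOT a AND c) OR (b AND c), in which the last term is the redundant consensus term, and counts by brute force how many of the eight input assignments expose a stuck-at-0 fault on each product term. It illustrates the mechanism only; the circuit is not taken from the router netlist, and the function name count_detecting is hypothetical.

```sh
#!/bin/sh
# Count input vectors that detect a stuck-at-0 fault on one product term.
count_detecting() {
  # $1 = term forced to 0 in the faulty circuit: t1, t2 or t3
  hits=0
  for a in 0 1; do
    for b in 0 1; do
      for c in 0 1; do
        t1=$(( a & b ))
        t2=$(( (1 - a) & c ))
        t3=$(( b & c ))            # redundant consensus term
        good=$(( t1 | t2 | t3 ))
        case "$1" in
          t1) t1=0 ;;
          t2) t2=0 ;;
          t3) t3=0 ;;
        esac
        bad=$(( t1 | t2 | t3 ))
        if [ "$good" -ne "$bad" ]; then
          hits=$(( hits + 1 ))
        fi
      done
    done
  done
  echo "$hits"
}

for t in t1 t2 t3; do
  echo "$t sa0: $(count_detecting "$t") detecting vectors"
done
```

The faults on t1 and t2 are each detected by exactly one vector, while the fault on t3 is detected by none: whenever b and c are both 1, either t1 or t2 already drives y to 1, so the output never differs.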
Relationship Between Scan Insertion and ATPG
The DFT scan chain work in Part 3 and the ATPG in Part 4 address different parts of the same problem. Scan insertion creates the physical mechanism for controllability and observability. ATPG determines what patterns to use.
In this project, the two flows are connected but run separately. The scan insertion was done on the post-PnR physical design using OpenROAD’s DFT commands through the custom OpenLane 2 steps. The ATPG was done on the pre-PnR synthesized netlists using Fault 0.6.1. The test vectors from ATPG target the same logical fault sites covered by the scan chains in the physical design, but the vector sets have not been mapped to the specific scan chain ordering that OpenROAD produced.
A complete production DFT flow would close this loop: take the post-scan-replace netlist (where all DFFs have been swapped for SDFFs), reorder the fault simulation to account for the specific scan chain ordering (which flop is at position N in chain K), and generate vectors in a format compatible with ATE scan load sequences. Fault 0.6.1 does not produce ATE-formatted vectors natively, and the scan chain ordering from OpenROAD’s insert_dft was not fed back into the ATPG run.
The ATPG results are valid as submodule stuck-at coverage numbers and as evidence that the logic is highly controllable and observable at each block boundary. They are not a substitute for a scan-aware vector set mapped to the physical scan chain topology. Producing that mapping is the remaining step to reach a production-complete, ATE-ready test program.