This part covers advanced STA topics that are crucial for senior timing closure and physical design roles.
OCV (On-Chip Variation) accounts for process, voltage, and temperature variations across different regions of the same chip. Even within a single die, transistors at different locations may have slightly different characteristics. OCV is modeled using derating factors that multiply the nominal delays:
# Late path derating (pessimistic for setup)
set_timing_derate -late 1.1 [all_cells]
# Early path derating (pessimistic for hold)
set_timing_derate -early 0.9 [all_cells]
For setup analysis, the launch clock path and the data path use a late derate (>1.0) and the capture clock path uses an early derate (<1.0). For hold analysis, it is reversed: the launch clock path and the data path use the early derate and the capture clock path uses the late derate. Common derate values range from 1.05 to 1.15 for late and 0.85 to 0.95 for early, depending on technology node.
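The effect of flat derates on slack can be worked through numerically. The sketch below is an illustration only: the function names, the example delays, and the choice not to derate the setup/hold times themselves are assumptions, not how any particular STA tool computes slack.

```python
def setup_slack_ocv(period, launch_clk, data, capture_clk, t_setup,
                    late=1.10, early=0.90):
    """Setup slack with flat OCV derates (all times in ns, illustrative)."""
    arrival = (launch_clk + data) * late                  # launch clock + data: late
    required = period + capture_clk * early - t_setup     # capture clock: early
    return required - arrival


def hold_slack_ocv(launch_clk, data, capture_clk, t_hold,
                   late=1.10, early=0.90):
    """Hold slack with flat OCV derates (all times in ns, illustrative)."""
    arrival = (launch_clk + data) * early                 # launch clock + data: early
    required = capture_clk * late + t_hold                # capture clock: late
    return arrival - required


# Hypothetical path: 2 ns period, 0.5 ns clock insertion on both sides.
s = setup_slack_ocv(period=2.0, launch_clk=0.5, data=1.0,
                    capture_clk=0.5, t_setup=0.05)
h = hold_slack_ocv(launch_clk=0.5, data=1.0, capture_clk=0.5, t_hold=0.03)
print(f"setup slack = {s:.3f} ns, hold slack = {h:.3f} ns")
```

Without derating, the same path would have 0.95 ns of setup slack; the 10% derates cost about 0.2 ns here, most of it on the launch side.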
CRPR (Clock Reconvergence Pessimism Removal, also known as CPPR, Common Path Pessimism Removal) eliminates the over-pessimism in OCV analysis when the launch and capture clock paths share a common segment. OCV derating pessimistically treats the entire clock path as derated independently for launch and capture, but the shared part of the clock tree is the same physical logic and cannot be both slow and fast at once. CRPR computes a credit that is added back to the slack:
CRPR_credit = late_derated_delay(common_path) - early_derated_delay(common_path)
setup_slack_with_crpr = setup_slack_ocv + CRPR_credit
This credit can be significant (hundreds of ps) for designs with deep clock trees. AOCV (Advanced OCV) refines the flat derate itself by making derate values depend on path depth and distance: deeper paths get smaller derates because random variation averages out over more stages, while cells that are farther apart get larger derates.
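As a quick numeric check of the credit formula above, here is a minimal sketch. The function name and the example numbers are hypothetical, and it assumes the same flat late/early derates apply to the whole common segment.

```python
def crpr_credit(common_path_delay, late=1.10, early=0.90):
    """Pessimism on the shared clock segment: late-derated minus early-derated delay."""
    return common_path_delay * late - common_path_delay * early


# Hypothetical numbers: 0.4 ns of shared clock tree, OCV setup slack of -0.05 ns.
credit = crpr_credit(0.4)
setup_slack_ocv = -0.05
setup_slack_with_crpr = setup_slack_ocv + credit
print(f"credit = {credit:.3f} ns, slack with CRPR = {setup_slack_with_crpr:.3f} ns")
```

In this example the 0.08 ns credit is enough to turn a failing path into a passing one, which is why the credit grows in importance with clock tree depth.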
PVT stands for Process, Voltage, and Temperature — three dimensions along which a chip's performance varies. Standard corners in STA:
| Corner | Process | Voltage | Temperature | Usage |
|---|---|---|---|---|
| SS (Slow-Slow) | Slow | Low (e.g., 0.72V) | High (e.g., 125°C) | Setup check (worst-case delay) |
| TT (Typical-Typical) | Typical | Nominal (0.80V) | 25°C | Power estimation |
| FF (Fast-Fast) | Fast | High (0.88V) | Low (-40°C) | Hold check (fastest delay) |
| SF / FS | Skewed (slow NMOS / fast PMOS, or fast NMOS / slow PMOS) | Varies | Varies | Cross-corner analysis |
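Advanced nodes (7nm, 5nm) require many more corners due to tighter margins and voltage-temperature coupling effects such as temperature inversion, where cells at low voltage become slower at low temperature, so the worst setup corner is not always the hottest one.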
STA splits analysis into two timing scenarios for each path:
- Late path (max analysis) — Uses the slowest possible data arrival and the fastest possible clock arrival. This gives the most pessimistic setup check. The data path is derated with late factors, the clock path with early factors.
- Early path (min analysis) — Uses the fastest possible data arrival and the slowest possible clock arrival. This gives the most pessimistic hold check. The data path is derated with early factors, the clock path with late factors.
By analyzing both extremes, STA ensures the design works across all possible process, voltage, and temperature conditions. A single path is checked at both SS (setup) and FF (hold) corners.
A .lib (Liberty) file contains the timing, power, and noise characterization data for a standard cell library. Key contents:
- NLDM tables (Non-Linear Delay Model) — 2D lookup tables indexed by input slew and output load capacitance, providing cell delay and output slew values
- Power tables — Internal power dissipation as a function of input transition and output load
- Timing arcs — Each input-to-output path with rise/fall delays, setup/hold constraints for sequential cells
- Cell area, leakage power, and pin capacitance values
- Wire load models (optional, for pre-layout estimation)
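To make the NLDM lookup concrete, the sketch below interpolates a delay from a small 2D table. It assumes plain bilinear interpolation between the four surrounding table points, which is a common simplification; the table values and function names are made up for illustration.

```python
from bisect import bisect_right


def _bracket(axis, x):
    """Index i of the segment [axis[i], axis[i+1]] used for x, clamped to the table."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def nldm_lookup(slews, loads, values, slew, load):
    """Bilinear interpolation in a table indexed by input slew (rows) and load (columns).

    Points outside the table range are linearly extrapolated from the edge segment.
    """
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tc = (load - loads[j]) / (loads[j + 1] - loads[j])
    low = values[i][j] * (1 - tc) + values[i][j + 1] * tc
    high = values[i + 1][j] * (1 - tc) + values[i + 1][j + 1] * tc
    return low * (1 - ts) + high * ts


# Hypothetical 3x3 cell_rise table: slew in ns, load in pF, delay in ns.
slews = [0.0, 0.1, 0.2]
loads = [0.0, 0.01, 0.02]
cell_rise = [
    [0.10, 0.20, 0.30],
    [0.15, 0.25, 0.35],
    [0.20, 0.30, 0.40],
]
print(nldm_lookup(slews, loads, cell_rise, slew=0.05, load=0.005))
```

Real libraries use larger tables (7x7 is typical) with non-uniform index spacing, but the lookup principle is the same.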
pin (Q) {
  timing () {
    related_pin : "CK";
    timing_type : "rising_edge";
    cell_rise (delay_template_7x7) { values(...) }
  }
}
Wire Load Models (WLM) are statistical estimates of wire capacitance, resistance, and area based on fanout. They are used during synthesis (pre-layout) when actual wire geometries are unknown. WLMs are inaccurate for deep sub-micron nodes (≤ 65nm).
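A wire load model lookup can be sketched as follows. The structure loosely mirrors a fanout-to-length table with per-unit-length capacitance and resistance and a slope for extrapolating beyond the table; all names and numbers here are hypothetical, and the table is assumed to have an entry for every fanout up to its maximum.

```python
def wlm_estimate(fanout, fanout_length, slope, cap_per_len, res_per_len):
    """Estimate (capacitance, resistance) of a net from its fanout alone."""
    max_fanout = max(fanout_length)
    if fanout <= max_fanout:
        length = fanout_length[fanout]          # direct table lookup
    else:
        # Beyond the table: extend linearly using the slope per extra fanout.
        length = fanout_length[max_fanout] + slope * (fanout - max_fanout)
    return length * cap_per_len, length * res_per_len


# Hypothetical model: estimated length units per fanout, pF and kOhm per length unit.
table = {1: 10.0, 2: 20.0, 3: 35.0}
cap, res = wlm_estimate(5, table, slope=15.0, cap_per_len=0.0002, res_per_len=0.001)
print(f"fanout 5: C = {cap:.4f} pF, R = {res:.3f} kOhm")
```

The estimate depends only on fanout and ignores placement entirely, which is the root of the inaccuracy at smaller nodes where wire delay dominates.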
RC extraction (e.g., StarRC, QRC) computes actual parasitic values from the routed layout using the physical dimensions, metal layer characteristics, and coupling effects. Extracted parasitics (in SPEF or DSPF format) are used for signoff STA. RC extraction accounts for:
- Coupling capacitance between adjacent wires (crosstalk)
- Wire resistance per unit length
- Fringe capacitance to substrate
Signal integrity (SI) in STA refers to the impact of crosstalk noise on timing and functionality. When a wire (victim) has adjacent switching wires (aggressors), capacitive coupling causes:
- Crosstalk delay — Aggressor switching in the same direction as the victim speeds up the victim (decreases delay). Opposite direction switching slows down the victim (increases delay). This can cause setup or hold violations.
- Crosstalk glitch (noise) — If the aggressor switches while the victim is quiet, a voltage glitch appears on the victim net. Large glitches can propagate and cause functional failures.
STA tools (e.g., PrimeTime SI) perform crosstalk-aware timing by computing, for each victim net, the worst-case delay push-out (slow-down, which matters for setup) and pull-in (speed-up, which matters for hold). Mitigation techniques: wire spacing, shielding, up-sizing drivers, and inserting repeaters.
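The direction dependence of crosstalk delay can be illustrated with a first-order Miller-factor model, in which the coupling capacitance is scaled by 0, 1, or 2 depending on aggressor activity. This is a textbook simplification and an assumption here, not the algorithm a signoff tool uses; the driver resistance and capacitance values are made up.

```python
import math

# Miller factor applied to the coupling capacitance seen by the victim driver.
MILLER = {"same": 0.0, "quiet": 1.0, "opposite": 2.0}


def effective_cap(c_ground, c_coupling, aggressor):
    """Effective victim load for a given aggressor switching direction."""
    return c_ground + MILLER[aggressor] * c_coupling


def rc_delay(r_driver, c_total):
    """50% delay of a single-pole RC stage."""
    return math.log(2) * r_driver * c_total


# Hypothetical victim: 1 kOhm driver, 10 fF to ground, 5 fF to one aggressor.
for mode in ("same", "quiet", "opposite"):
    delay = rc_delay(1e3, effective_cap(10e-15, 5e-15, mode))
    print(f"{mode:8s}: {delay * 1e12:.2f} ps")
```

The same-direction case gives the smallest delay (a hold risk) and the opposite-direction case the largest (a setup risk), matching the behavior described above.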
Clock gating checks verify that the enable signal to a clock gating cell (AND, OR, latch-based) transitions safely without corrupting the clock pulse. The key requirements:
- Setup check on enable — The enable signal must be stable before the clock edge to avoid a glitched clock pulse
- Hold check on enable — The enable must remain stable after the clock edge
For an AND-type gate, the enable may change only while the clock is low, so the gated output stays low during the change: the setup check is against the rising clock edge and the hold check against the falling edge. For an OR-type gate, the enable may change only while the clock is high, so the setup check is against the falling edge and the hold check against the rising edge. These checks differ from data path checks because a clock gating check directly protects clock integrity: a violation can cause functional errors (missing, extra, or clipped clock pulses) rather than just timing margin loss.
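For the AND-gate case, the safe window for an enable transition can be sketched as a simple interval test. The clock is assumed to rise at time 0 and fall after `high_time`; the function name and all numbers are hypothetical.

```python
def and_gate_enable_ok(t_change, period, high_time, t_setup, t_hold):
    """True if an enable transition at t_change is safe for AND-type clock gating.

    The enable may only change while the clock is low: at least t_hold after
    the falling edge and at least t_setup before the next rising edge.
    """
    t = t_change % period                  # position within the clock cycle
    window_start = high_time + t_hold      # falling edge plus hold margin
    window_end = period - t_setup          # next rising edge minus setup margin
    return window_start <= t <= window_end


# Hypothetical 2 ns clock with 50% duty cycle and 0.1 ns setup/hold margins.
print(and_gate_enable_ok(1.5, 2.0, 1.0, 0.1, 0.1))   # mid low phase
print(and_gate_enable_ok(0.5, 2.0, 1.0, 0.1, 0.1))   # clock high: pulse gets clipped
print(and_gate_enable_ok(1.95, 2.0, 1.0, 0.1, 0.1))  # too close to rising edge
```

For an OR-type gate the same test would apply with the window moved to the high phase of the clock.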
| Aspect | Synchronous Paths | Asynchronous Paths |
|---|---|---|
| Clock relationship | Same clock or related (known phase) | Unrelated clocks or no clock |
| STA handling | Analyzed with setup/hold equations | Excluded with set_false_path or set_clock_groups -asynchronous, or bounded with set_max_delay |
| Examples | Flop-to-flop in the same clock domain, clock divider output | Cross-clock domain, reset paths, I/O pins |
| Synchronization | None needed | Requires CDC (dual flop, FIFO, handshake) |
| Timing closure | Frequency-driven | Requires structural analysis + CDC verification |
In modern SoCs with multiple clock domains (CPU, GPU, memory, peripherals), identifying and correctly constraining asynchronous paths is essential. STA alone cannot verify asynchronous crossings — dedicated CDC tools (Mentor 0-In, SpyGlass CDC) are needed.
Max delay analysis (setup check) ensures data arrives at the capturing flip-flop before the required time. It uses the longest possible delays (slow corner, late derate) for data paths and the shortest possible delays for clock paths. The constraint is driven by frequency — the slower the data, the lower the maximum frequency.
Min delay analysis (hold check) ensures data does not arrive too early. It uses the shortest possible delays (fast corner, early derate) for data paths and the longest possible delays for clock paths. Hold violations are independent of clock frequency: launch and capture are referenced to the same clock edge, so hold is a race between the data path and the clock path. A path that fails hold at 1 GHz also fails at 100 MHz, and lowering the frequency cannot fix it.
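The frequency dependence of setup and the frequency independence of hold follow directly from the slack equations. The sketch below uses hypothetical delays and ignores derating and clock uncertainty to keep the point visible.

```python
def setup_slack(period, launch_clk, data_max, capture_clk, t_setup):
    """Setup slack: required time (next edge) minus latest data arrival."""
    return (period + capture_clk - t_setup) - (launch_clk + data_max)


def hold_slack(launch_clk, data_min, capture_clk, t_hold):
    """Hold slack: earliest data arrival minus required time (same edge)."""
    return (launch_clk + data_min) - (capture_clk + t_hold)


# Hypothetical path (ns): capture clock arrives 0.15 ns later than launch clock.
path = dict(launch_clk=0.20, capture_clk=0.35)
for period in (1.0, 10.0):   # 1 GHz and 100 MHz
    s = setup_slack(period, data_max=1.2, t_setup=0.05, **path)
    h = hold_slack(data_min=0.05, t_hold=0.03, **path)
    print(f"period {period:5.1f} ns: setup slack {s:+.2f} ns, hold slack {h:+.2f} ns")
```

The period appears only in the setup equation, so relaxing the clock from 1 ns to 10 ns fixes the setup violation but leaves the hold violation unchanged.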
Both must pass at all PVT corners simultaneously. A typical signoff flow runs:
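# Setup (max delay) at the slow corner
PT: SS_0.72V_125C --> check max_delay
# Hold (min delay) at the fast corner
PT: FF_0.88V_-40C --> check min_delay
Note that setup is usually most critical at the slowest corner and hold at the fastest, which is why these two runs are the minimum. Because both checks must pass at every corner, signoff flows typically run setup and hold at all corners; hold violations can also appear at slow corners when clock skew is large.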
Advanced STA concepts separate junior from senior engineers. Master these!