First-In, First-Out (FIFO) buffers are fundamental building blocks in digital design, commonly used for rate matching, data buffering, and Clock Domain Crossing (CDC). This guide covers FIFO fundamentals, pointer synchronization, flag generation, and 10 detailed interview scenarios for calculating minimum FIFO depth.
FIFO Architecture & Fundamentals
The core differences between Synchronous and Asynchronous FIFOs lie in their clocking schemes, pointer synchronization, and overall design complexity:
Synchronous FIFO:
- Clocking: Both read and write ports operate on the exact same clock domain (or phase-locked clocks).
- Pointers: Direct binary comparison of read and write pointers is used because they share a clock edge.
- Complexity: Low. No clock domain crossing (CDC) synchronizers are required.
Asynchronous FIFO:
- Clocking: Read and write ports operate on separate, unsynchronized clocks with arbitrary frequencies and phases.
- Pointers: Pointers must be synchronized using Gray code and 2-flop/multi-flop synchronizers to prevent metastability.
- Complexity: High. Requires domain crossing logic, Gray-to-binary or direct Gray pointer comparison, and CDC-safe reset synchronization.
When transmitting pointers across asynchronous clock boundaries, using binary pointers can cause severe system failures due to multi-bit transitions. Let's look at why Gray code solves this:
Binary Multi-Bit Transition Hazard:
If a binary pointer increments from 7 (4'b0111) to 8 (4'b1000), all 4 bits must flip simultaneously. Due to physical routing skew on silicon, these bits arrive at slightly different times. If sampled during transition, the destination domain can read a corrupted, arbitrary intermediate value.
The Gray Code Solution:
Gray code guarantees that only one bit changes per increment (e.g., 7 is 4'b0100 and 8 is 4'b1100, where only the MSB flips). If the destination clock samples during the transition, the value resolves safely to either 7 or 8.
FIFO Depth Calculations (10 Classic Scenarios)
To size a FIFO, you just need to find out how much data gets left behind when you write faster than you read. Think of it like pouring water into a bucket: if water enters faster than it leaks out, the bucket must be big enough to hold the leftover water. Let's look at 10 common scenarios explained from first principles!
- fwrite / fread: 100 MHz
- Transfer: Continuous
Think of this like water flowing through a pipe. If you pour water in at 100 gallons per minute (write side) and pump it out at the exact same 100 gallons per minute (read side) continuously, the water never builds up! If the clocks are perfectly in sync and aligned, you don't even need a buffer (depth = 0). But in the real world, if the clocks are asynchronous (even if they run at the same speed), jitter and phase differences can cause timing issues. To play it safe and cross that clock domain boundary without data corruption, we use a small FIFO of depth 1 or 2.
- fwrite / fread: 80 MHz (12.5 ns)
- Burst Length: 64 items
- Write Rate: 1 / 3 cycles
- Read Rate: 1 / 4 cycles
Burst Window (Tburst) = 64 × 37.5 ns = 2400 ns
Effective Read Period = 4 × 12.5 ns = 50.0 ns
Reads during Write Window = Tburst / Tread = 2400 ns / 50.0 ns = 48 reads
FIFO Depth = Burst Length - Reads during Write Window = 64 - 48 = 16 items
Let's break this down simply! Both clocks run at 80 MHz. Since we only write once every 3 clock cycles, it takes 3 cycles per write. So, to write our burst of 64 items, we need a window of 64 × 3 = 192 clock cycles. Now, how many items can the read side pull out during those same 192 cycles? Since the read side reads once every 4 cycles, it completes 192 / 4 = 48 reads. Since we wrote 64 items but only managed to read 48, the leftover 64 - 48 = 16 items have to sit in the FIFO. So we need a depth of 16!
- fwrite: 40 MHz
- fread: 160 MHz
- Operation: Continuous
Here's an easy one! The read clock is super fast (160 MHz) compared to the write clock (40 MHz). Since the read port is active continuously, it is sucking out data four times faster than we can write it! Because the reader is so much faster, data never backs up. We only need a depth of 1 or 2 here just to safely pass the data across the asynchronous clock boundary.
- fwrite: 60 MHz (16.67 ns)
- fread: 120 MHz (8.33 ns)
- Burst Length: 80 items
- Write Rate: 1 / 2 cycles
- Read Rate: 1 / 5 cycles
Reads during Write Window = ⌊ 2666.67 ns / (5 × 8.33 ns) ⌋ = 64 reads
FIFO Depth = Burst Length - Reads during Write Window = 80 - 64 = 16 items
Let's calculate the clock cycles needed. Writing 80 items at 1 write every 2 cycles takes 160 write clock cycles. Since the write clock is 60 MHz (16.67 ns period), this write window lasts 2666.67 ns. In this same time window, how many reads can the faster 120 MHz read clock (8.33 ns period) complete? Since it reads once every 5 cycles (every 41.67 ns), it completes 2666.67 ns / 41.67 ns = 64 reads. The remaining 80 - 64 = 16 items that couldn't be read in time must be stored in the FIFO, so we need a depth of 16.
- fwrite: 125 MHz
- fread: 75 MHz
- Burst Length: 120 items
- Operation: Continuous
Reads during Write Window = ⌊ 960 ns / 13.33 ns ⌋ = 72 reads
FIFO Depth = Burst Length - Reads during Write Window = 120 - 72 = 48 items
This is a classic case where we write faster than we read. The write clock is 125 MHz (8 ns) and writes continuously, so the burst of 120 items is poured in over 960 ns. In that same 960 ns, the slower 75 MHz read clock (13.33 ns) reads continuously, completing 960 ns / 13.33 ns = 72 reads. Since 120 items came in but only 72 went out, the FIFO must hold the difference: 120 - 72 = 48 items.
- fwrite: 166.67 MHz (6 ns)
- fread: 83.33 MHz (12 ns)
- Burst Length: 100 items
- Write Rate: 1 / 2 cycles
- Read Rate: 1 / 2 cycles
Reads during Write Window = ⌊ 1200 ns / (2 × 12 ns) ⌋ = 50 reads
FIFO Depth = Burst Length - Reads during Write Window = 100 - 50 = 50 items
Let's look at the timing. We write 100 items at 1 write every 2 cycles, which takes 200 write cycles. At 166.67 MHz (6 ns period), the write window is 1200 ns. During this window, the slower 83.33 MHz read clock (12 ns period) reads at a rate of 1 read every 2 cycles (every 24 ns). This means it completes 1200 ns / 24 ns = 50 reads. The FIFO has to buffer the leftover 100 - 50 = 50 items.
- fwrite: 200 MHz (5 ns)
- fread: 50 MHz (20 ns)
- Burst Length: 40 items
- Write Rate: 1 / 2 cycles
- Read Rate: 1 / 4 cycles
Reads during Write Window = ⌊ 400 ns / (4 × 20 ns) ⌋ = 5 reads
FIFO Depth = Burst Length - Reads during Write Window = 40 - 5 = 35 items
We have a fast write clock (200 MHz) writing every 2 cycles (10 ns per write), so a 40-item burst takes 400 ns to write. The slower read clock (50 MHz) reads every 4 cycles (80 ns per read). In that 400 ns window, the reader only has time to complete 400 ns / 80 ns = 5 reads! Because the reader is so slow, almost the entire burst accumulates. We need to store 40 - 5 = 35 items in the FIFO.
- fwrite: 100 MHz (10 ns)
- fread: 40 MHz (25 ns)
- Burst Length: 120 items
- Write Duty: 50%
- Read Duty: 25%
Reads during Write Window = ⌊ 2400 ns / (25 ns / 0.25) ⌋ = 24 reads
FIFO Depth = Burst Length - Reads during Write Window = 120 - 24 = 96 items
Let's look at the averages. A 100 MHz clock (10 ns) with a 50% write duty cycle means we write an item every 20 ns on average. Writing 120 items takes 2400 ns. The 40 MHz read clock (25 ns) with a 25% read duty cycle means we read an item every 100 ns on average. In that 2400 ns burst window, the reader only completes 2400 ns / 100 ns = 24 reads. The FIFO must hold the rest: 120 - 24 = 96 items.
- fwrite: 50 MHz (20 ns)
- fread: 25 MHz (40 ns)
- Burst Length: 100 items
- Read Active: 40%
- Read Inactive: 60%
Tinactive_read = 2000 ns × 0.60 = 1200 ns
Accumulated Writes = 1200 ns / 20 ns = 60 writes
FIFO Depth = 60 items
Here we are writing 100 items continuously at 50 MHz (20 ns per write), taking 2000 ns. The reader is active for 40% of the time and idle for 60% of the time. In the worst-case scenario, the reader goes completely idle for the first 60% of the write window (which is 2000 ns × 0.60 = 1200 ns). During this 1200 ns idle stretch, data is written continuously but absolutely nothing is read! The FIFO must store all the writes that pile up during this idle period: 1200 ns / 20 ns = 60 items.
- fwrite / fread: 200 MHz (5 ns)
- Write Rate: 40 / 60 cycles
- Read Rate: 8 / 10 cycles
- Burst Length: 80 items
Tburst = 80 × 5 ns = 400 ns
Reads during Write Window = 80 × (8 / 10) = 64 reads
FIFO Depth = Burst Length - Reads during Write Window = 80 - 64 = 16 items
In a random distribution, the absolute worst case is when the writes clump together as close as possible—a back-to-back burst! So, the 80 write operations happen consecutively in 80 clock cycles. At 200 MHz (5 ns per cycle), this burst window lasts 400 ns. The uniform reader reads at a rate of 8 items every 10 clock cycles, which means it completes 80 × 0.8 = 64 reads in that same window. The FIFO must hold the leftover 80 - 64 = 16 items.
FIFO RTL Implementation
1. Synchronous FIFO RTL
Below is a synthesizable Verilog implementation of a basic Synchronous FIFO buffer with full/empty flags.
module sync_fifo #(
parameter DATA_WIDTH = 8,
parameter ADDR_WIDTH = 4,
parameter DEPTH = 16
)(
input wire clk,
input wire rst_n,
input wire wr_en,
input wire rd_en,
input wire [DATA_WIDTH-1:0] din,
output reg [DATA_WIDTH-1:0] dout,
output wire full,
output wire empty
);
reg [DATA_WIDTH-1:0] ram [DEPTH-1:0];
reg [ADDR_WIDTH:0] wptr; // Extra bit to differentiate full/empty
reg [ADDR_WIDTH:0] rptr;
// Flags logic
assign empty = (wptr == rptr);
assign full = (wptr[ADDR_WIDTH] != rptr[ADDR_WIDTH]) &&
(wptr[ADDR_WIDTH-1:0] == rptr[ADDR_WIDTH-1:0]);
// Write Logic
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
wptr <= 0;
end else if (wr_en && !full) begin
ram[wptr[ADDR_WIDTH-1:0]] <= din;
wptr <= wptr + 1'b1;
end
end
// Read Logic
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
rptr <= 0;
dout <= 0;
end else if (rd_en && !empty) begin
dout <= ram[rptr[ADDR_WIDTH-1:0]];
rptr <= rptr + 1'b1;
end
end
endmodule