ImProVe: Image Processing using Verilog

Verilog, SystemVerilog, Image Processing, Computer Vision, Python, OpenCV

Overview

ImProVe (IMage PROcessing using VErilog) is an individual project focused on implementing core image processing algorithms directly in Verilog for hardware-oriented deployment. The primary objective is to accelerate image processing by exploiting the parallelism inherent in hardware architectures.

The project emphasizes the design of modular, reusable processing blocks suitable for FPGA/ASIC realization, while preserving a clear understanding of the mathematical foundations behind each algorithm.

Repository: Mummanajagadeesh/ImProVe
Start Date: 27 Nov 2024

The work began with geometric rotation experiments and evolved into a structured framework covering edge detection, filtering, geometric transformations, stereo vision, and neural-network-related modules.

The hardware implementations use AXI-Stream interfaces for image input/output, while OpenCL-based implementations are used for functional verification and numerical comparison.


Motivation

The project originated from a practical need to rotate and scan handwritten notes during exam preparation. The initial question was whether image rotation could be implemented in Verilog.

It started as RoVer (Rotation using Verilog) and gradually expanded to include:

  • Edge detection
  • Noise reduction
  • Thresholding
  • Geometric transformations
  • Neural network inference
  • OCR

Each algorithm was implemented while learning its mathematical foundation.


Design Approach

  • Image data is converted into text-based pixel representations using Python.
  • Verilog modules operate on pixel arrays.
  • Results are written back to text files.
  • Python is used for visualization and validation.
  • Later versions replace file I/O with synthesizable memory blocks.
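The conversion step in this flow can be sketched in Python. This is a minimal sketch, assuming one two-digit hex value per line (a layout that Verilog's `$readmemh` can read); the function names and the exact file layout are illustrative and are not taken from the repository.

```python
def pixels_to_hex_lines(image):
    """Flatten a 2D list of 8-bit grayscale pixels into one hex value per line."""
    lines = []
    for row in image:
        for pixel in row:
            if not 0 <= pixel <= 255:
                raise ValueError(f"pixel {pixel} is outside the 8-bit range")
            lines.append(f"{pixel:02x}")
    return lines


def hex_lines_to_pixels(lines, width):
    """Rebuild a 2D pixel list from hex lines, given the image width."""
    flat = [int(line, 16) for line in lines if line.strip()]
    if len(flat) % width != 0:
        raise ValueError("pixel count is not a multiple of the width")
    return [flat[i:i + width] for i in range(0, len(flat), width)]


# Round trip on a tiny 2x3 image (hypothetical data).
image = [[0, 128, 255],
         [16, 32, 64]]
hex_lines = pixels_to_hex_lines(image)
print("\n".join(hex_lines))
restored = hex_lines_to_pixels(hex_lines, width=3)
print(restored == image)
```

The same text layout works in both directions, so the Python side can both prepare simulation inputs and read back the results written by the testbench.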

Key design goals:

  • Replace non-synthesizable constructs.
  • Use fixed-point arithmetic.
  • Eliminate $cos, $sin, $sqrt, $exp via CORDIC-based implementations.
  • Move simulation-only constructs to testbenches.
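As an illustration of how CORDIC can stand in for `$sin` and `$cos`, the following is a minimal integer-only model of rotation-mode CORDIC. The Q-format (16 fractional bits), the iteration count, and all names are assumptions made for this sketch; the project's Verilog may use different widths and structure.

```python
import math

FRAC_BITS = 16            # assumed fixed-point format: 16 fractional bits
ITERATIONS = 16           # assumed number of CORDIC iterations
ONE = 1 << FRAC_BITS

# Arctangent constants that a hardware lookup table would hold.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]

# Start from 1/K so the CORDIC gain K is cancelled without a final multiply.
GAIN = 1.0
for i in range(ITERATIONS):
    GAIN *= math.sqrt(1.0 + 2.0 ** (-2 * i))
INV_GAIN = round(ONE / GAIN)


def cordic_sin_cos(angle_fixed):
    """Return (sin, cos) in fixed point for an angle in [-pi/2, pi/2] radians.

    Only shifts, adds, and table lookups are used inside the loop.
    """
    x, y, z = INV_GAIN, 0, angle_fixed
    for i in range(ITERATIONS):
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    return y, x


angle = round(math.radians(30) * ONE)
s, c = cordic_sin_cos(angle)
print(s / ONE, c / ONE)   # close to 0.5 and 0.866
```

Each loop iteration maps to one shift-and-add stage, which is why the method suits synthesizable fixed-point hardware.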

Implemented Functionalities

Edge Detection and Feature Extraction

  • Sobel Operator
  • Prewitt Operator
  • Roberts Cross Operator
  • Robinson Compass Operator
  • Kirsch Compass Operator
  • Laplacian Operator
  • Laplacian of Gaussian (LoG)
  • Canny Edge Detection
  • Emboss Filter
  • Moravec Corner Detection

Noise Reduction and Smoothing

  • Gaussian Blur
  • Median Filter
  • Box Filter
  • Bilateral Filter

Thresholding and Binarization

  • Global Thresholding
  • Adaptive Thresholding
  • Otsu’s Method
  • Color Thresholding

Geometric Transformations

  • Rotation
  • Scaling
  • Translation
  • Shearing
  • Cropping
  • Reflection
  • 3D Homogeneous Perspective Transformation

Color and Intensity Transformations

  • Negative Transformation
  • Inversion
  • Sepia
  • Brightness Adjustment
  • Contrast Adjustment
  • Gamma Correction
  • Saturation Adjustment
  • Sharpness Enhancement

Applications

Label Detection

Process:

  1. Split RGB channels using Python.
  2. Convert to grayscale using NTSC luminance formula.
  3. Apply Gaussian blur if required.
  4. Apply Prewitt operator.
  5. Perform flood-fill to detect largest contour.
  6. Draw bounding box.
  7. Superimpose on original image.

Implementation uses Verilog for processing and Python for visualization.
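Steps 2, 4, 5 and 6 can be modelled in a few lines of Python. This is a sketch under stated assumptions: the integer luminance weights (77, 150, 29, which sum to 256) approximate 0.299, 0.587 and 0.114; the horizontal and vertical Prewitt responses are combined as |Gx| + |Gy|; and the "largest contour" is treated as the largest 4-connected foreground region. None of these choices is confirmed by the source, and all function names are illustrative.

```python
from collections import deque


def to_gray(r, g, b):
    """NTSC luminance 0.299R + 0.587G + 0.114B using 8-bit integer weights."""
    return (77 * r + 150 * g + 29 * b) >> 8


def prewitt(gray):
    """Edge map as |Gx| + |Gy|, clamped to 8 bits; border pixels stay 0."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(gray[y + k][x + 1] - gray[y + k][x - 1] for k in (-1, 0, 1))
            gy = sum(gray[y + 1][x + k] - gray[y - 1][x + k] for k in (-1, 0, 1))
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out


def largest_region_bbox(binary):
    """Flood-fill every foreground region; return (x0, y0, x1, y1) of the largest."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    best_size, best_box = 0, None
    for sy in range(h):
        for sx in range(w):
            if not binary[sy][sx] or seen[sy][sx]:
                continue
            queue = deque([(sy, sx)])
            seen[sy][sx] = True
            size, box = 0, [sx, sy, sx, sy]
            while queue:
                y, x = queue.popleft()
                size += 1
                box = [min(box[0], x), min(box[1], y), max(box[2], x), max(box[3], y)]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if size > best_size:
                best_size, best_box = size, tuple(box)
    return best_box


# Hypothetical 4x4 image with a vertical step edge between columns 1 and 2.
gray = [[0, 0, 100, 100] for _ in range(4)]
edges = prewitt(gray)
binary = [[1 if v > 128 else 0 for v in row] for row in edges]
print(edges[1])
print(largest_region_bbox(binary))
```

The bounding box returned by the last function is what would be drawn and superimposed on the original image in steps 6 and 7.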

Sample Results

[Figure: original image, after vertical Prewitt, after horizontal Prewitt, after full Prewitt]
[Figures (two test images): original image, after full Prewitt, binary box, overlaid image with box]

Document Scanner

Process:

  • Canny edge detection
  • Boundary fill
  • Boolean filtering
  • Moravec corner detection
  • Bresenham line drawing
  • Perspective mapping
  • Shearing and scaling refinement

Current issue: the Bresenham line-drawing implementation still needs refinement.
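For reference, the textbook integer form of Bresenham's line algorithm is sketched below. It is the standard all-octant variant that uses only additions and comparisons, which is what makes it attractive for hardware; it is not the project's Verilog implementation, and the function name is illustrative.

```python
def bresenham(x0, y0, x1, y1):
    """Return the list of integer pixel coordinates on the line (x0,y0)-(x1,y1)."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                      # combined error term for both axes
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:                   # step along x
            err += dy
            x0 += sx
        if e2 <= dx:                   # step along y
            err += dx
            y0 += sy
    return points


print(bresenham(0, 0, 3, 3))
print(bresenham(5, 2, 0, 0))
```

In a document scanner, a routine like this would connect the detected corner points to outline the page before perspective mapping.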


Stereo Vision

This module implements a complete stereo matching pipeline using calibrated stereo image pairs. The workflow includes grayscale conversion, disparity estimation, depth computation using calibration parameters, and 3D reconstruction support.

The depth (Z) is computed from disparity (d) using:

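\[ Z = \frac{\text{baseline} \times f}{d + \text{doffs}} \]

where f is the focal length and baseline is the distance between the two cameras, both obtained from the calibration file, and doffs is an offset added to the disparity.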
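The depth formula translates directly into code. In this sketch the units are an assumption (focal length in pixels, baseline in millimetres, so Z comes out in millimetres), and the function and parameter names are illustrative.

```python
def depth_from_disparity(d, focal_px, baseline_mm, doffs=0.0):
    """Z = baseline * f / (d + doffs); returns infinity when the denominator is not positive."""
    denom = d + doffs
    if denom <= 0:
        return float("inf")
    return baseline_mm * focal_px / denom


# Hypothetical calibration values, for illustration only.
print(depth_from_disparity(50, focal_px=1000.0, baseline_mm=100.0))
print(depth_from_disparity(50, focal_px=1000.0, baseline_mm=100.0, doffs=50.0))
```

Because Z is inversely proportional to disparity, a one-pixel disparity error matters far more for distant points than for near ones.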


Sample Results

[Figure: left image, right image, disparity and depth map]

Overview

  • Stereo image rectification using calibration matrices
  • Disparity map computation along horizontal epipolar lines
  • Depth estimation from disparity and baseline
  • Intermediate result storage for verification
  • Python-based 3D reconstruction (point cloud / mesh generation)
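Disparity search along a horizontal epipolar line can be illustrated with a one-dimensional block match. This sketch assumes a sum-of-absolute-differences (SAD) cost over a small horizontal window; the source does not state which matching cost or window shape the project uses, so treat both, and the function name, as assumptions.

```python
def row_disparity(left_row, right_row, max_disp, half_win=1):
    """For each pixel of a rectified left row, pick the shift d in [0, max_disp]
    that minimizes the SAD against the right row."""
    w = len(left_row)
    disp = [0] * w
    for x in range(half_win, w - half_win):
        best_cost, best_d = None, 0
        # Limit d so the window never leaves the right row.
        for d in range(0, min(max_disp, x - half_win) + 1):
            cost = sum(abs(left_row[x + k] - right_row[x - d + k])
                       for k in range(-half_win, half_win + 1))
            if best_cost is None or cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp


# Hypothetical rows: the left view is the right view shifted by 2 pixels.
right = [0, 0, 10, 50, 90, 20, 0, 0, 0, 0]
left = [0, 0, 0, 0, 10, 50, 90, 20, 0, 0]
print(row_disparity(left, right, max_disp=4))
```

Flat regions give ambiguous matches, so only the textured pixels in this example recover the true shift of 2.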

Current limitation: depth accuracy and sub-pixel disparity estimation are still being improved.


MNIST Digit Recognition (Neural Network)

Dataset: MNIST (28×28, 784 inputs)

Architecture:

  • Input layer: 784 neurons
  • Hidden layer: 128 neurons (ReLU)
  • Output layer: 10 neurons

Training:

  • Implemented in Python using NumPy
  • 500 iterations
  • Learning rate: 0.1
  • 90% accuracy

Hardware Adaptation:

  • Weights scaled by 10,000
  • Stored as integers in text files
  • No softmax in hardware
  • Maximum activation used for classification
  • Fixed-point format (Q24.8 under development)

The text files are converted into synthesizable register modules using Python scripts.

Only the testbench contains $display, $finish, and file operations.
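The hardware adaptation described above can be modelled with integer arithmetic only. This sketch uses a toy 2-2-2 network with made-up weights instead of the real 784-128-10 model. How the biases are scaled is an assumption: here each bias is scaled to match the accumulated scale of its layer, so no division is needed between layers.

```python
SCALE = 10_000  # weight scale factor stated in the hardware adaptation list


def quantize(values, scale):
    """Convert floating-point parameters to scaled integers."""
    return [round(v * scale) for v in values]


def dense(inputs, weights, biases):
    """Integer matrix-vector product: one accumulator per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0, v) for v in values]


def argmax(values):
    """Index of the maximum activation; this replaces softmax for classification."""
    return max(range(len(values)), key=lambda i: values[i])


# Toy parameters (hypothetical, for illustration only).
W1 = [quantize(row, SCALE) for row in [[0.5, -0.25], [-1.0, 0.5]]]
B1 = quantize([0.1, 0.2], SCALE)            # layer-1 sums carry scale SCALE
W2 = [quantize(row, SCALE) for row in [[1.0, 0.5], [-0.5, 1.0]]]
B2 = quantize([0.0, 1.0], SCALE * SCALE)    # layer-2 sums carry scale SCALE**2


def predict(pixels):
    hidden = relu(dense(pixels, W1, B1))
    return argmax(dense(hidden, W2, B2))


print(predict([3, 1]))
```

Softmax is monotonic, so taking the maximum raw activation selects the same class. The accumulated scale grows with each layer, so the hardware accumulators must be wide enough to hold the products.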


OCR (EMNIST – 62 Classes)

Dataset: EMNIST ByClass
Classes: 0–9, A–Z, a–z

Architecture:

  • Input: 784
  • Hidden1: 256
  • Hidden2: 128
  • Output: 62

Training:

  • SGD and Adam optimizers
  • ReLU activations
  • Integer scaling for hardware compatibility

Inference in Verilog:

  • Matrix multiplications
  • ReLU activations
  • Maximum output selection
  • Character mapping to:

    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

Python automation scripts generate:

  • Image memory modules
  • Weight memory modules
  • Bias memory modules

All are instantiated in a top-level module with a dedicated testbench.
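A generator of this kind can be quite small. The sketch below emits Verilog source for a case-based read-only memory holding signed integers. The module layout, port names, and 32-bit width are assumptions made for illustration; they are not the format produced by the repository's scripts.

```python
def make_rom_module(name, values, width=32):
    """Return Verilog source for a combinational ROM holding signed integers."""
    depth = len(values)
    addr_bits = max(1, (depth - 1).bit_length())
    lines = [
        f"module {name} (",
        f"    input  wire [{addr_bits - 1}:0] addr,",
        f"    output reg  signed [{width - 1}:0] data",
        ");",
        "    always @(*) begin",
        "        case (addr)",
    ]
    for i, v in enumerate(values):
        # Negative values are written as a unary minus on a signed literal.
        literal = f"-{width}'sd{-v}" if v < 0 else f"{width}'sd{v}"
        lines.append(f"            {addr_bits}'d{i}: data = {literal};")
    lines += [
        f"            default: data = {width}'sd0;",
        "        endcase",
        "    end",
        "endmodule",
    ]
    return "\n".join(lines)


# Hypothetical scaled weights.
print(make_rom_module("weight_rom", [5000, -2500, 0]))
```

Generating the modules from the trained parameters keeps file I/O out of the synthesizable design, leaving it only in the testbench.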

A coarse-grained pipelined fully connected network, controlled by an FSM and using a Taylor-series-based softmax approximation, was also implemented for improved throughput.
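One way a Taylor-series softmax can be built is sketched below in floating point. The polynomial order (4), the subtraction of the maximum logit, and the halve-then-square range reduction are all assumptions chosen to keep the truncated series accurate; the source does not describe how the hardware version handles these details.

```python
import math


def exp_taylor(x):
    """Approximate exp(x) with a 4th-order Taylor polynomial.

    x is halved until |x| <= 0.5, where the truncated series is accurate,
    and the result is squared once per halving to undo the reduction.
    """
    halvings = 0
    while abs(x) > 0.5:
        x /= 2.0
        halvings += 1
    # Horner form of 1 + x + x^2/2 + x^3/6 + x^4/24.
    result = 1.0 + x * (1.0 + x * (0.5 + x * (1.0 / 6.0 + x / 24.0)))
    for _ in range(halvings):
        result *= result
    return result


def softmax_taylor(logits):
    """Softmax using exp_taylor, with the maximum subtracted for stability."""
    m = max(logits)
    exps = [exp_taylor(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
approx = softmax_taylor(logits)
exact = [math.exp(v) / sum(math.exp(u) for u in logits) for v in logits]
print(approx)
print(exact)
```

Without range reduction, a low-order Taylor polynomial diverges quickly for large negative arguments, so some form of input scaling is usually needed alongside the series.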


Tools

  • Verilog
  • SystemVerilog
  • Icarus Verilog 12.0
  • Xilinx Vivado
  • Python 3.12
  • OpenCV

AXI Streaming Acceleration

This section restructures the grayscale hardware model into a fully AXI-Stream–compliant streaming architecture. While grayscale is used as the reference operation, the same infrastructure applies to any pixel-wise or neighborhood-based accelerator (e.g., filtering, edge detection, thresholding).


Transition to Streaming Architecture

The initial RTL grayscale design implemented direct combinational arithmetic with a simple pipeline register. Although functionally correct, it assumed continuous data availability and did not model realistic flow control.

The revised architecture introduces:

  • AXI-Stream input and output interfaces
  • Input and output FIFOs
  • Skid buffering for timing isolation
  • Valid/ready handshake compliance
  • Configurable parallel processing lanes (LANES)

This transforms the design into a throughput-aware, backpressure-safe streaming accelerator.


AXI-Stream Dataflow

Input AXI Stream → Input FIFO → Pipeline Register → Parallel Processing Lanes → Output FIFO → Output AXI Stream

Key characteristics:

  • Proper tvalid/tready handshake behavior
  • Elastic buffering for stall tolerance
  • Deterministic cycle-level throughput modeling
  • Linear scalability with lane count
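The handshake and elastic-buffering behavior can be illustrated with a small cycle-level model. This is a sketch and not a description of the RTL: it assumes a single FIFO between a source and a sink, moves a beat only in a cycle where valid and ready are both high, and uses made-up names and a made-up FIFO depth.

```python
from collections import deque


def simulate_stream(pixels, sink_ready_pattern, fifo_depth=4):
    """Push pixels through a FIFO to a sink whose tready follows a repeating pattern.

    Returns the received pixels and the number of cycles taken.
    """
    if not any(sink_ready_pattern):
        raise ValueError("the sink must be ready in at least one cycle")
    fifo = deque()
    pending = deque(pixels)
    received = []
    cycles = 0
    while pending or fifo:
        sink_ready = sink_ready_pattern[cycles % len(sink_ready_pattern)]
        # Sample handshake signals from the state at the start of the cycle.
        out_valid = len(fifo) > 0
        in_ready = len(fifo) < fifo_depth
        in_valid = len(pending) > 0
        if out_valid and sink_ready:      # output beat: tvalid && tready
            received.append(fifo.popleft())
        if in_valid and in_ready:         # input beat: tvalid && tready
            fifo.append(pending.popleft())
        cycles += 1
    return received, cycles


pixels = list(range(10))
print(simulate_stream(pixels, [True]))
print(simulate_stream(pixels, [True, False, False]))
```

With an always-ready sink the model sustains one pixel per cycle after a one-cycle fill. With a sink that stalls two cycles out of three, the FIFO fills, the input side is held off, and every pixel still arrives in order.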

Architectural Comparison

Static Combinational | AXI-Stream Architecture
Assumes ideal data flow | Handshake-driven flow control
No stall modeling | Backpressure-safe
Limited integration capability | SoC-ready streaming block
Minimal structural realism | Hardware-accurate architecture

Performance Characteristics

  • Latency: 1 pipeline cycle (core) + FIFO buffering
  • Throughput:
    • 1 pixel/cycle for LANES = 1
    • N pixels/cycle for LANES = N
  • Scales with clock frequency and lane parallelism


OpenCL vs RTL Output Comparison

Numerical comparison between floating-point OpenCL output and fixed-point RTL streaming output:

Metric | Value
MAE | 8694.96 (0.132677)
RMSE | 12930.8 (0.197311)
PSNR | 14.097 dB

Deviations between the two outputs are attributed to fixed-point coefficient approximation in RTL versus floating-point arithmetic in OpenCL.
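The three metrics can be computed as follows. The reported figures are numerically consistent with a 16-bit full scale (8694.96 / 65535 ≈ 0.132677, and 20·log10(65535 / 12930.8) ≈ 14.097 dB), so this sketch assumes a peak value of 65535; the function name and sample data are illustrative.

```python
import math


def compare(reference, test, max_value=65535):
    """Return (MAE, RMSE, PSNR in dB) between two equal-length pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    diffs = [r - t for r, t in zip(reference, test)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    psnr = float("inf") if rmse == 0 else 20.0 * math.log10(max_value / rmse)
    return mae, rmse, psnr


# Hypothetical pixel values, for illustration only.
mae, rmse, psnr = compare([100, 200, 300], [110, 180, 300])
print(mae, rmse, psnr)
print(mae / 65535, rmse / 65535)   # normalized to the assumed full scale
```

Dividing MAE and RMSE by the peak value gives the normalized figures shown in parentheses in the table.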


Verification and Status

  • AXI-Stream compliant accelerator
  • Functional equivalence validated against OpenCL reference
  • Parameterized multi-lane parallelism
  • FIFO-based elastic buffering for realistic simulation

This implementation uses AXI-Stream for image input/output handling and OpenCL for functional verification and numerical benchmarking.

Selected Image Processing Results

Below are some of the best results from my image processing work. While there are many more images, including all of them here without relevant explanations would not be meaningful. For a detailed breakdown of the implementation and the mathematical concepts behind each operation, refer to the repository.


Edge Detection – Prewitt Operator

[Figure: edge detection using the Prewitt operator]

Corner Detection – Moravec

[Figure: corner detection using the Moravec operator]

Noise Reduction – Gaussian Blur

[Figure: Gaussian blur for noise reduction]

Thresholding – Otsu’s Method

[Figures: Otsu thresholding result; Otsu thresholding histogram]

Geometric Transformations

[Figures: rotation with same dimensions; rotation with diagonal dimensions; scaling; translation; shearing; cropping; reflection across both axes; 3D homogeneous perspective transformation]


Color and Intensity Transformations

[Figures: gamma correction; image inversion; sepia effect; negative transformation; grayscale conversion; contrast adjustment; brightness adjustment; saturation adjustment; sharpness enhancement]


For more insight into the implementation, visit the repository, which explains the mathematical foundations behind each operation.

Contributors

Feel free to contribute by submitting pull requests or feature suggestions!

If interested in working together, do drop a DM or mail 🙂