
Jagadeesh Mummana

Visakhapatnam, India

Electronics undergrad exploring chip design, robotics, and real-world ML systems. This is my little corner of the internet, where you’ll find my work, ideas, and experiences.

Education

Bachelor of Technology in Electronics and Communication Engineering

2023 - 2027 | Calicut, Kerala, India

Grade: 8.66/10 (CGPA)

Lab Involvement

Technical Member

Nov 2024 - Present

Kerala, India

Working on several real-world interdisciplinary projects with robotics enthusiast teams while representing the institute in competitions.

Featured Projects

INT8 Fixed-Point CNN Hardware Accelerator and Image-Processing Suite

Two-Stage CMOS Op-Amp with Miller Compensation

Autonomous Drone for GNSS-Denied Environments (ISRO IRoC-U 2025)

Blogs

Optimising a Pipelined RISC-V Core: From Naive Pipeline to Near-Superscalar Performance

April 5, 2026

This post walks through the complete optimization journey of a single-issue pipelined RV32I core, from a plain five-stage implementation to a version that runs within 2.3% of a superscalar design on CoreMark. Every number here comes from actual simulation runs on my own implementation. This is not a theoretical survey; it is what happened, step by step, with real cycle counts.

Before getting into the details, a disclaimer worth repeating throughout: modern high-performance processors like those from ARM, Intel, or Apple are not just superscalar. They are wide out-of-order superscalar designs with speculative execution, branch prediction trained on billions of cycles of silicon, register renaming, reorder buffers, and execution units that would make what is described here look like a toy. That context matters, and so does what this comparison intentionally excludes. All CoreMark/MHz figures here are frequency-normalized, because superscalar and superpipelined designs face harder timing closure: wider forwarding muxes, more register file ports, and longer critical paths. Folding achievable clock frequency into a single headline number would obscure the microarchitectural comparison. The goal here was never to match a Cortex-A or an M-series core.
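To make the frequency normalization concrete, here is a minimal sketch of how a CoreMark/MHz figure can be derived from a simulation's iteration and cycle counts, and how a percentage gap between two designs can then be computed. The function names and the input numbers are hypothetical placeholders chosen for illustration; they are not results from the runs described in this post.

```python
def coremark_per_mhz(iterations: int, total_cycles: int) -> float:
    """Frequency-normalized score.

    CoreMark is iterations per second, so dividing by the clock in MHz
    cancels the clock entirely: iterations * 1e6 / total_cycles.
    """
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return iterations * 1_000_000 / total_cycles


def gap_percent(candidate: float, reference: float) -> float:
    """How far `candidate` falls below `reference`, as a percentage of `reference`."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return (reference - candidate) / reference * 100.0


# Hypothetical placeholder inputs, NOT measurements from this post.
single_issue = coremark_per_mhz(iterations=10, total_cycles=4_000_000)
superscalar = coremark_per_mhz(iterations=10, total_cycles=3_900_000)

print(f"single-issue: {single_issue:.3f} CoreMark/MHz")
print(f"superscalar:  {superscalar:.3f} CoreMark/MHz")
print(f"gap:          {gap_percent(single_issue, superscalar):.2f}%")
```

Because only iteration and cycle counts enter the calculation, the score reflects work done per cycle and says nothing about the clock frequency each design could actually close timing at.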