Jagadeesh Mummana

Visakhapatnam, India

Electronics undergrad exploring chip design, robotics, and real-world ML systems. This is my little corner of the internet, where you’ll find my work, ideas, and experiences.


Education

Bachelor of Technology in Electronics and Communication Engineering

2023 - 2027 | Calicut, Kerala, India

Grade: 8.66/10 (CGPA)

Lab Involvement

Technical Member

Nov 2024 - Present

Kerala, India

Working on several real-world, interdisciplinary projects as part of robotics teams, and representing the institute in competitions.

Featured Projects

INT8 Fixed-Point CNN Hardware Accelerator and Image-Processing Suite

Two-Stage CMOS Op-Amp with Miller Compensation

Autonomous Drone for GNSS-Denied Environments (ISRO IRoC-U 2025)

Blogs

Optimising a Pipelined RISC-V Core: From Naive Pipeline to Near-Superscalar Performance

April 5, 2026

This post walks through the complete optimisation journey of a single-issue pipelined RV32I core, from a plain five-stage implementation all the way to a version that runs within 2.3% of a superscalar design on CoreMark. Every number here comes from actual simulation runs on my own implementation. This is not a theoretical survey; it is what happened, step by step, with real cycle counts.

Before getting into the details, a disclaimer worth repeating throughout: modern high-performance processors like those from ARM, Intel, or Apple are not just superscalar. They are wide, out-of-order superscalar designs with speculative execution, heavily tuned branch prediction, register renaming, reorder buffers, and execution units that would make what is described here look like a toy. That context matters. The goal here was never to match a Cortex-A or an M-series core. The goal was to understand what headroom exists in a single-issue pipeline, and how far it can be pushed before the fundamental architectural limit of issuing one instruction per cycle becomes the ceiling.
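To make that ceiling concrete, here is a minimal sketch of the arithmetic behind it. It assumes a simplified model in which a single-issue core spends one cycle per instruction plus whatever cycles are lost to stalls and flushes, and it ignores pipeline fill and drain. The function names and the instruction and stall counts are placeholders chosen for illustration; they are not taken from the simulation runs in this post.

```python
def single_issue_cycles(instructions: int, stall_cycles: int) -> int:
    """Cycles for a run under the simplified model: one issue slot per
    cycle, plus cycles lost to stalls and flushes (assumed model)."""
    if instructions < 0 or stall_cycles < 0:
        raise ValueError("counts must be non-negative")
    return instructions + stall_cycles


def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle for a completed run."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


# Hypothetical workload, for illustration only.
INSTRUCTIONS = 1_000_000
STALL_CYCLES = 250_000

# With stalls, IPC falls below 1.
stalled_ipc = ipc(INSTRUCTIONS, single_issue_cycles(INSTRUCTIONS, STALL_CYCLES))

# With every stall removed, IPC reaches the single-issue ceiling of exactly 1.
ideal_ipc = ipc(INSTRUCTIONS, single_issue_cycles(INSTRUCTIONS, 0))

print(f"IPC with stalls:    {stalled_ipc:.2f}")
print(f"IPC with no stalls: {ideal_ipc:.2f}")
```

Under this model, every optimisation on a single-issue core amounts to shrinking the stall term. Once it reaches zero there is nothing left to remove, and only issuing more than one instruction per cycle can push IPC above 1.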