Vision-Guided 6-DoF Robotic Arm
Perception → Planning → Real-Time Control
Overview
An end-to-end manipulation stack: a TensorRT-accelerated YOLOv8n detector feeds a hand-eye calibrated grasp planner, which solves inverse kinematics on-device and streams joint targets over a binary UART protocol to an STM32F4 running a 1 kHz PID loop. The system is built to expose every layer (perception, planning, kinematics, firmware) for inspection and retraining.
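For concreteness, here is one way the length-prefixed, CRC16-framed UART link could be laid out. The sync bytes, field order, and CRC variant (CRC16-CCITT) are illustrative assumptions, not the project's documented wire format:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Assumed frame layout (not the project's documented format):
 * [0xAA 0x55][len:u8][cmd:u8][payload: len-1 bytes][crc16, little-endian]
 * len counts cmd + payload; the CRC covers len, cmd, and payload.
 */
static uint16_t crc16_ccitt(const uint8_t *data, size_t n)
{
    uint16_t crc = 0xFFFF;                 /* CRC-16/CCITT-FALSE      */
    while (n--) {
        crc ^= (uint16_t)(*data++) << 8;
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Serialize cmd + payload into out (must hold n + 6 bytes);
 * returns the total number of bytes to put on the wire. */
size_t frame_encode(uint8_t *out, uint8_t cmd,
                    const uint8_t *payload, uint8_t n)
{
    out[0] = 0xAA;              /* preamble lets the receiver resync   */
    out[1] = 0x55;
    out[2] = (uint8_t)(n + 1);  /* length prefix: cmd + payload bytes  */
    out[3] = cmd;
    memcpy(&out[4], payload, n);

    uint16_t crc = crc16_ccitt(&out[2], (size_t)n + 2); /* len..payload */
    out[4 + n] = (uint8_t)(crc & 0xFF);
    out[5 + n] = (uint8_t)(crc >> 8);
    return (size_t)n + 6u;
}
```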
The Problem
Most off-the-shelf arms ship as black boxes — fixed firmware, no vision, no path to closed-loop perception. Building a research-grade manipulation platform usually costs five figures and still hides the control layer. The goal here was a transparent, low-cost arm where every layer (mechanics, firmware, kinematics, perception) is auditable and modifiable, and where a new ML model can be deployed without rewriting the stack.
The Approach
Six MG996R/DS3225 servos are driven by an STM32F4 running a 1 kHz PID loop with anti-windup and feed-forward compensation. A Jetson Nano hosts the perception stack: a fine-tuned YOLOv8n model exported to TensorRT, a hand-eye calibrated grasp planner, and a Jacobian-based numerical IK solver with an analytical fast path. Firmware and host communicate over a length-prefixed binary UART protocol with CRC16 framing, keeping control jitter under 200 µs even when vision saturates the link.
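A minimal sketch of what the 1 kHz update could look like, with conditional anti-windup and a velocity feed-forward term. Gains, limits, and the exact anti-windup scheme are assumptions for illustration, not the firmware's actual implementation:

```c
/* One PID position-loop step with conditional anti-windup and
 * velocity feed-forward; called at 1 kHz (dt = 0.001 s). */
typedef struct {
    float kp, ki, kd;        /* PID gains                               */
    float kff;               /* velocity feed-forward gain              */
    float integ;             /* integral accumulator                    */
    float prev_err;          /* previous error, for the derivative term */
    float out_min, out_max;  /* actuator limits (e.g. servo PWM range)  */
} servo_pid_t;

float servo_pid_step(servo_pid_t *p, float target, float measured,
                     float target_vel, float dt)
{
    float err   = target - measured;
    float deriv = (err - p->prev_err) / dt;
    p->prev_err = err;

    /* Provisional output including the feed-forward term. */
    float u = p->kp * err + p->ki * p->integ + p->kd * deriv
            + p->kff * target_vel;

    /* Conditional anti-windup: stop integrating while the output is
     * saturated in the same direction as the error. */
    int saturating = (u > p->out_max && err > 0.0f) ||
                     (u < p->out_min && err < 0.0f);
    if (!saturating)
        p->integ += err * dt;

    /* Clamp to the actuator range. */
    if (u > p->out_max) u = p->out_max;
    if (u < p->out_min) u = p->out_min;
    return u;
}
```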
Results
Sub-50 ms perception-to-actuation latency, 94.7% grasp success across 12 object classes in cluttered scenes, and a total BOM under $250. The full stack — firmware, kinematics, training scripts, calibration tools — is open-source and reproducible from a single makefile.
Process & Timeline
- Phase 1
Mechanical design
Fusion 360 link-length optimization for reachable workspace under servo torque limits; 3D-printed structural parts with metal-geared joints.
- Phase 2
Real-time firmware
Bare-metal STM32F4 PID loop at 1 kHz, anti-windup, feed-forward, and a CRC16-framed UART protocol.
- Phase 3
Kinematics
Analytical IK for the 6-DoF chain with a Jacobian-based numerical fallback near singularities; a simplified fallback step is sketched below the timeline.
- Phase 4
Perception
4k-image dataset, YOLOv8n fine-tune, TensorRT export, and hand-eye calibration into a unified grasp planner; applying the calibrated transform is sketched below the timeline.
- Phase 5
Closed-loop integration
End-to-end latency budgeting, jitter measurement, and a reproducible benchmark across object classes and lighting; a cycle-counter jitter probe is sketched below the timeline.
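Phase 3, sketched: a Jacobian-transpose iteration is the simplest form a numerical IK fallback can take. The project's solver may use damped least squares instead; this stand-in only shows the shape of one update step:

```c
#define NJ 6  /* joints in the 6-DoF chain */

/* One Jacobian-transpose update: dq = alpha * J^T * e, a gradient
 * step that shrinks the task-space error. J is the 6x6 geometric
 * Jacobian at the current q, e the position+orientation error,
 * alpha a small step size. The caller recomputes J and e from
 * forward kinematics after each step and stops when |e| is small. */
void ik_jt_step(const double J[NJ][NJ], const double e[NJ],
                double alpha, double q[NJ])
{
    for (int i = 0; i < NJ; i++) {
        double jte = 0.0;
        for (int k = 0; k < NJ; k++)
            jte += J[k][i] * e[k];      /* (J^T e)_i */
        q[i] += alpha * jte;
    }
}
```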
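Phase 4, sketched: once hand-eye calibration has produced a camera-to-base transform, moving a detected grasp point into the robot base frame is a single homogeneous multiply. The type names are assumptions; the matrix itself is the output of the calibration procedure:

```c
/* Apply T_base_cam (rotation + translation as a 4x4 homogeneous
 * transform) to a point detected in the camera frame. */
typedef struct { double m[4][4]; } mat4_t;
typedef struct { double x, y, z; } vec3_t;

vec3_t cam_to_base(const mat4_t *T_base_cam, vec3_t p_cam)
{
    double in[4]  = { p_cam.x, p_cam.y, p_cam.z, 1.0 };  /* homogeneous */
    double out[4] = { 0.0, 0.0, 0.0, 0.0 };

    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            out[r] += T_base_cam->m[r][c] * in[c];

    return (vec3_t){ out[0], out[1], out[2] };
}
```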
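Phase 5, sketched: one way to measure control-loop jitter on a Cortex-M4 is the DWT cycle counter. The register addresses follow the ARMv7-M spec; the 168 MHz core clock and the reporting scheme are assumptions about this particular build:

```c
#include <stdint.h>

/* ARMv7-M debug registers for cycle counting. */
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004)
#define DEMCR      (*(volatile uint32_t *)0xE000EDFC)

#define CPU_HZ        168000000u        /* STM32F4 core clock (assumed) */
#define PERIOD_CYCLES (CPU_HZ / 1000u)  /* ideal 1 kHz tick period      */

static uint32_t last_cycles;
static uint32_t max_jitter_cycles;      /* 200 us budget = 33,600 cycles */

void jitter_init(void)
{
    DEMCR    |= (1u << 24);             /* TRCENA: enable the DWT unit  */
    DWT_CYCCNT = 0;
    DWT_CTRL |= 1u;                     /* start the cycle counter      */
    last_cycles = DWT_CYCCNT;
}

/* Call once per control tick, e.g. at the top of the 1 kHz ISR. */
void jitter_sample(void)
{
    uint32_t now    = DWT_CYCCNT;
    uint32_t period = now - last_cycles;  /* wraps correctly mod 2^32   */
    last_cycles     = now;

    uint32_t err = (period > PERIOD_CYCLES) ? period - PERIOD_CYCLES
                                            : PERIOD_CYCLES - period;
    if (err > max_jitter_cycles)
        max_jitter_cycles = err;          /* worst-case deviation seen  */
}
```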
Like what you see?
I'm always open to collaborations on AI, robotics, edge computing, or embedded systems.