Vision-Guided 6-DoF Robotic Arm
Perception → Planning → Real-Time Control
Overview
An end-to-end manipulation stack: a TensorRT-accelerated YOLOv8n detector feeds a hand-eye calibrated grasp planner, which solves inverse kinematics on-device and streams joint targets over a binary UART protocol to an STM32F4 running a 1 kHz PID loop. The system is built to expose every layer (perception, planning, kinematics, firmware) for inspection and retraining.
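For concreteness, here is one way the length-prefixed, CRC16-framed UART link could be laid out. The sync bytes, field order, and CRC variant (CRC16-CCITT) are illustrative assumptions, not the project's documented wire format:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Assumed frame layout (not the project's documented format):
 * [0xAA 0x55][len:u8][cmd:u8][payload: len-1 bytes][crc16, little-endian]
 * len counts cmd + payload; the CRC covers len, cmd, and payload.
 */
static uint16_t crc16_ccitt(const uint8_t *data, size_t n)
{
    uint16_t crc = 0xFFFF;                 /* CRC-16/CCITT-FALSE      */
    while (n--) {
        crc ^= (uint16_t)(*data++) << 8;
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Serialize cmd + payload into out (must hold n + 6 bytes);
 * returns the total number of bytes to put on the wire. */
size_t frame_encode(uint8_t *out, uint8_t cmd,
                    const uint8_t *payload, uint8_t n)
{
    out[0] = 0xAA;              /* preamble lets the receiver resync   */
    out[1] = 0x55;
    out[2] = (uint8_t)(n + 1);  /* length prefix: cmd + payload bytes  */
    out[3] = cmd;
    memcpy(&out[4], payload, n);

    uint16_t crc = crc16_ccitt(&out[2], (size_t)n + 2); /* len..payload */
    out[4 + n] = (uint8_t)(crc & 0xFF);
    out[5 + n] = (uint8_t)(crc >> 8);
    return (size_t)n + 6u;
}
```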
The Problem
Most off-the-shelf arms ship as black boxes — fixed firmware, no vision, no path to closed-loop perception. Building a research-grade manipulation platform usually costs five figures and still hides the control layer. The goal here was a transparent, low-cost arm where every layer (mechanics, firmware, kinematics, perception) is auditable and modifiable, and where a new ML model can be deployed without rewriting the stack.
The Approach
Six MG996R/DS3225 servos are driven by an STM32F4 running a 1 kHz PID loop with anti-windup and feed-forward compensation. A Jetson Nano hosts the perception stack: a fine-tuned YOLOv8n model exported to TensorRT, a hand-eye calibrated grasp planner, and a Jacobian-based numerical IK solver with an analytical fast path. Firmware and host communicate over a length-prefixed binary UART protocol with CRC16 framing, keeping control jitter under 200 µs even when vision saturates the link.
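A minimal sketch of what the 1 kHz update could look like, with conditional anti-windup and a velocity feed-forward term. Gains, limits, and the exact anti-windup scheme are assumptions for illustration, not the firmware's actual implementation:

```c
/* One PID position-loop step with conditional anti-windup and
 * velocity feed-forward; called at 1 kHz (dt = 0.001 s). */
typedef struct {
    float kp, ki, kd;        /* PID gains                               */
    float kff;               /* velocity feed-forward gain              */
    float integ;             /* integral accumulator                    */
    float prev_err;          /* previous error, for the derivative term */
    float out_min, out_max;  /* actuator limits (e.g. servo PWM range)  */
} servo_pid_t;

float servo_pid_step(servo_pid_t *p, float target, float measured,
                     float target_vel, float dt)
{
    float err   = target - measured;
    float deriv = (err - p->prev_err) / dt;
    p->prev_err = err;

    /* Provisional output including the feed-forward term. */
    float u = p->kp * err + p->ki * p->integ + p->kd * deriv
            + p->kff * target_vel;

    /* Conditional anti-windup: stop integrating while the output is
     * saturated in the same direction as the error. */
    int saturating = (u > p->out_max && err > 0.0f) ||
                     (u < p->out_min && err < 0.0f);
    if (!saturating)
        p->integ += err * dt;

    /* Clamp to the actuator range. */
    if (u > p->out_max) u = p->out_max;
    if (u < p->out_min) u = p->out_min;
    return u;
}
```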
Results
Sub-50 ms perception-to-actuation latency, 94.7% grasp success across 12 object classes in cluttered scenes, and a total BOM under $250. The full stack — firmware, kinematics, training scripts, calibration tools — is open-source and reproducible from a single makefile.
Process & Timeline
- Phase 1
Mechanical design
Fusion 360 link-length optimization for reachable workspace under servo torque limits; 3D-printed structural parts with metal-geared joints.
- Phase 2
Real-time firmware
Bare-metal STM32F4 PID loop at 1 kHz, anti-windup, feed-forward, and a CRC16-framed UART protocol.
- Phase 3
Kinematics
Analytical IK for the 6-DoF chain with a Jacobian-based numerical fallback near singularities; a simplified fallback step is sketched below the timeline.
- Phase 4
Perception
4k-image dataset, YOLOv8n fine-tune, TensorRT export, and hand-eye calibration into a unified grasp planner; applying the calibrated transform is sketched below the timeline.
- Phase 5
Closed-loop integration
End-to-end latency budgeting, jitter measurement, and a reproducible benchmark across object classes and lighting; a cycle-counter jitter probe is sketched below the timeline.
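Phase 3, sketched: a Jacobian-transpose iteration is the simplest form a numerical IK fallback can take. The project's solver may use damped least squares instead; this stand-in only shows the shape of one update step:

```c
#define NJ 6  /* joints in the 6-DoF chain */

/* One Jacobian-transpose update: dq = alpha * J^T * e, a gradient
 * step that shrinks the task-space error. J is the 6x6 geometric
 * Jacobian at the current q, e the position+orientation error,
 * alpha a small step size. The caller recomputes J and e from
 * forward kinematics after each step and stops when |e| is small. */
void ik_jt_step(const double J[NJ][NJ], const double e[NJ],
                double alpha, double q[NJ])
{
    for (int i = 0; i < NJ; i++) {
        double jte = 0.0;
        for (int k = 0; k < NJ; k++)
            jte += J[k][i] * e[k];      /* (J^T e)_i */
        q[i] += alpha * jte;
    }
}
```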
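Phase 4, sketched: once hand-eye calibration has produced a camera-to-base transform, moving a detected grasp point into the robot base frame is a single homogeneous multiply. The type names are assumptions; the matrix itself is the output of the calibration procedure:

```c
/* Apply T_base_cam (rotation + translation as a 4x4 homogeneous
 * transform) to a point detected in the camera frame. */
typedef struct { double m[4][4]; } mat4_t;
typedef struct { double x, y, z; } vec3_t;

vec3_t cam_to_base(const mat4_t *T_base_cam, vec3_t p_cam)
{
    double in[4]  = { p_cam.x, p_cam.y, p_cam.z, 1.0 };  /* homogeneous */
    double out[4] = { 0.0, 0.0, 0.0, 0.0 };

    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            out[r] += T_base_cam->m[r][c] * in[c];

    return (vec3_t){ out[0], out[1], out[2] };
}
```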
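Phase 5, sketched: one way to measure control-loop jitter on a Cortex-M4 is the DWT cycle counter. The register addresses follow the ARMv7-M spec; the 168 MHz core clock and the reporting scheme are assumptions about this particular build:

```c
#include <stdint.h>

/* ARMv7-M debug registers for cycle counting. */
#define DWT_CTRL   (*(volatile uint32_t *)0xE0001000)
#define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004)
#define DEMCR      (*(volatile uint32_t *)0xE000EDFC)

#define CPU_HZ        168000000u        /* STM32F4 core clock (assumed) */
#define PERIOD_CYCLES (CPU_HZ / 1000u)  /* ideal 1 kHz tick period      */

static uint32_t last_cycles;
static uint32_t max_jitter_cycles;      /* 200 us budget = 33,600 cycles */

void jitter_init(void)
{
    DEMCR    |= (1u << 24);             /* TRCENA: enable the DWT unit  */
    DWT_CYCCNT = 0;
    DWT_CTRL |= 1u;                     /* start the cycle counter      */
    last_cycles = DWT_CYCCNT;
}

/* Call once per control tick, e.g. at the top of the 1 kHz ISR. */
void jitter_sample(void)
{
    uint32_t now    = DWT_CYCCNT;
    uint32_t period = now - last_cycles;  /* wraps correctly mod 2^32   */
    last_cycles     = now;

    uint32_t err = (period > PERIOD_CYCLES) ? period - PERIOD_CYCLES
                                            : PERIOD_CYCLES - period;
    if (err > max_jitter_cycles)
        max_jitter_cycles = err;          /* worst-case deviation seen  */
}
```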
Like what you see?
I'm always open to collaborations on AI, robotics, edge computing, or embedded systems.