Robotics & Vision

Vision-Guided 6-DoF Robotic Arm

Perception → Planning → Real-Time Control

Status: ongoing.

- 6 degrees of freedom
- 30+ FPS detection
- 94.7% grasp accuracy
- <50 ms end-to-end latency

Overview

An end-to-end manipulation stack: a TensorRT-accelerated YOLOv8n detector feeds a hand-eye calibrated grasp planner, which solves inverse kinematics on-device and streams joint targets to an STM32F4 running a 1 kHz PID loop over a binary UART protocol. The system is built to expose every layer — perception, planning, kinematics, firmware — for inspection and retraining.

The Problem

Most off-the-shelf arms ship as black boxes — fixed firmware, no vision, no path to closed-loop perception. Building a research-grade manipulation platform usually costs five figures and still hides the control layer. The goal here was a transparent, low-cost arm where every layer (mechanics, firmware, kinematics, perception) is auditable and modifiable, and where a new ML model can be deployed without rewriting the stack.

The Approach

Six MG996R/DS3225 servos are driven by an STM32F4 running a 1 kHz PID loop with anti-windup and feed-forward compensation. A Jetson Nano hosts the perception stack: a fine-tuned YOLOv8n model exported to TensorRT, a hand-eye calibrated grasp planner, and an analytical IK solver with a Jacobian-based numerical fallback near singularities. Firmware and host communicate over a length-prefixed binary UART protocol with CRC16 framing, keeping control jitter under 200 µs even when vision saturates the link.
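The exact frame layout is not spelled out above, so the sketch below assumes a minimal one: a sync byte, a one-byte payload length, six little-endian int16 joint targets, and a trailing CRC-16/CCITT-FALSE. The `SYNC` marker, the `encode_joint_frame` name, and the centidegree units are illustrative assumptions, not the actual protocol.

```python
import struct

def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF);
    small enough to port directly to the MCU side."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc

SYNC = 0xA5  # hypothetical start-of-frame marker

def encode_joint_frame(targets_centideg: list[int]) -> bytes:
    """Pack six signed 16-bit joint targets into [SYNC][len][payload][crc16]."""
    payload = struct.pack("<6h", *targets_centideg)
    frame = bytes([SYNC, len(payload)]) + payload
    return frame + struct.pack("<H", crc16_ccitt(frame))
```

The CRC covers the sync byte and length as well as the payload, so a corrupted length field is rejected instead of desynchronizing the parser.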

Results

Sub-50 ms perception-to-actuation latency, 94.7% grasp success across 12 object classes in cluttered scenes, and a total BOM under $250. The full stack — firmware, kinematics, training scripts, calibration tools — is open-source and reproducible from a single makefile.

Vision Pipeline Demo

Object Recognition Pipeline: drag & drop or browse to upload an image (PNG or JPG, up to 10 MB) and run the detector. Example run: 2 detections (object_01 at 97.2%, object_02 at 91.8%), 94.5% average confidence, 23 ms inference latency.

Process & Timeline

  1. Phase 1: Mechanical design. Fusion 360 link-length optimization for reachable workspace under servo torque limits; 3D-printed structural parts with metal-geared joints.
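A quick static check behind the "servo torque limits" constraint can be sketched as below. The payload, reach, and link-mass numbers are hypothetical, and the DS3225 stall torque (about 25 kg·cm) is a nominal datasheet figure rather than a measured value.

```python
G = 9.81  # gravitational acceleration, m/s^2

def shoulder_torque_nm(payload_kg: float, reach_m: float,
                       arm_mass_kg: float, com_m: float) -> float:
    """Worst-case static torque at the shoulder with the arm fully extended:
    payload at the tip plus the arm's own mass acting at its centre of mass."""
    return G * (payload_kg * reach_m + arm_mass_kg * com_m)

# Hypothetical numbers: 200 g payload at 0.35 m reach, 0.5 kg of links with COM at 0.18 m.
required = shoulder_torque_nm(0.2, 0.35, 0.5, 0.18)
ds3225_stall = 25 * 0.0981  # nominal 25 kg·cm stall torque converted to N·m
margin = ds3225_stall / required  # ≈ 1.56x safety margin for these assumed numbers
```

Shortening links trades workspace for margin, which is exactly the trade the link-length optimization navigates.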

  2. Phase 2: Real-time firmware. Bare-metal STM32F4 PID loop at 1 kHz, anti-windup, feed-forward, and a CRC16-framed UART protocol.
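A host-side Python model of the control law described above (the real loop is bare-metal code on the STM32F4); the gains and the integral-clamp style of anti-windup are illustrative assumptions, not the firmware's actual tuning.

```python
class PID:
    """Discrete PID with a clamped integrator (anti-windup) and a velocity
    feed-forward term, stepped at the same 1 ms period as the firmware loop."""

    def __init__(self, kp, ki, kd, kff, dt=0.001, out_limit=1.0):
        self.kp, self.ki, self.kd, self.kff = kp, ki, kd, kff
        self.dt, self.out_limit = dt, out_limit
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, target, measured, target_vel=0.0):
        err = target - measured
        self.integral += err * self.dt
        # Anti-windup: clamp so the integral term alone cannot exceed the output limit.
        if self.ki:
            bound = self.out_limit / self.ki
            self.integral = max(-bound, min(bound, self.integral))
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        out = (self.kp * err + self.ki * self.integral
               + self.kd * deriv + self.kff * target_vel)
        return max(-self.out_limit, min(self.out_limit, out))
```

The feed-forward term injects the commanded velocity directly, so the PID terms only have to correct the residual error rather than generate the whole motion.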

  3. Phase 3: Kinematics. Analytical IK for the 6-DoF chain with a Jacobian-based numerical fallback near singularities.
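The fallback solver's exact formulation isn't specified; one common Jacobian-based choice that stays well-behaved near singularities is damped least squares, sketched here for a 2-link planar arm where the Jacobian is easy to write out by hand. `dls_ik_step` and `solve_ik` are hypothetical names, and the link lengths are placeholders.

```python
import numpy as np

def dls_ik_step(theta, target, link_lengths, damping=0.05):
    """One damped-least-squares IK update for a 2-link planar arm.
    The damping term keeps (J J^T + lambda^2 I) invertible near singular poses."""
    l1, l2 = link_lengths
    t1, t12 = theta[0], theta[0] + theta[1]
    pos = np.array([l1 * np.cos(t1) + l2 * np.cos(t12),
                    l1 * np.sin(t1) + l2 * np.sin(t12)])
    J = np.array([[-l1 * np.sin(t1) - l2 * np.sin(t12), -l2 * np.sin(t12)],
                  [ l1 * np.cos(t1) + l2 * np.cos(t12),  l2 * np.cos(t12)]])
    err = target - pos
    dtheta = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), err)
    return theta + dtheta, np.linalg.norm(err)

def solve_ik(target, link_lengths=(0.2, 0.15), iters=200):
    theta = np.array([0.3, 0.3])  # seed away from the straight-arm singularity
    for _ in range(iters):
        theta, err = dls_ik_step(theta, np.asarray(target, dtype=float), link_lengths)
        if err < 1e-6:
            break
    return theta
```

Near a singularity the damped update simply takes smaller steps instead of blowing up, which is why it pairs well with a fast analytical solver that handles the generic poses.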

  4. Phase 4: Perception. A 4k-image dataset, YOLOv8n fine-tune, TensorRT export, and hand-eye calibration into a unified grasp planner.
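Once hand-eye calibration has produced a camera-to-base transform, a detection becomes a grasp target by back-projecting its pixel centroid at an estimated depth and applying the extrinsics. The intrinsics `K` and the transform `T_base_cam` below are hypothetical placeholders, not the calibrated values.

```python
import numpy as np

# Hypothetical calibration results: pinhole intrinsics and the camera-to-base
# rigid transform recovered by hand-eye calibration.
K = np.array([[615.0,   0.0, 320.0],
              [  0.0, 615.0, 240.0],
              [  0.0,   0.0,   1.0]])
T_base_cam = np.eye(4)
T_base_cam[:3, 3] = [0.30, 0.0, 0.45]  # camera 30 cm forward, 45 cm above the base

def pixel_to_base(u, v, depth_m):
    """Back-project a pixel at known depth into the camera frame, then apply
    the hand-eye extrinsics to express the grasp point in the arm's base frame."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    p_cam = np.append(ray * depth_m, 1.0)  # homogeneous camera-frame point
    return (T_base_cam @ p_cam)[:3]
```

This is the step where calibration error matters most: a few pixels of reprojection error at 0.5 m depth translates directly into millimetres of grasp offset.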

  5. Phase 5: Closed-loop integration. End-to-end latency budgeting, jitter measurement, and a reproducible benchmark across object classes and lighting conditions.
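A sketch of the kind of latency accounting this phase implies: per-stage timing samples reduced to per-stage means plus a p99 end-to-end total checked against the 50 ms budget. The function name, stage names, and percentile method are illustrative assumptions.

```python
import statistics

def latency_report(stage_samples_ms: dict[str, list[float]], budget_ms: float = 50.0):
    """Summarise per-stage latency samples and check the end-to-end budget.
    Samples are aligned per cycle, so the p99 is taken over whole-pipeline totals
    rather than summing each stage's p99 (which would overstate the tail)."""
    totals = sorted(sum(cycle) for cycle in zip(*stage_samples_ms.values()))
    p99 = totals[min(len(totals) - 1, int(0.99 * len(totals)))]
    report = {name: statistics.mean(s) for name, s in stage_samples_ms.items()}
    report["total_p99"] = p99
    report["within_budget"] = p99 <= budget_ms
    return report
```

Summing per-cycle totals before taking the percentile is the key design choice: stage tails rarely coincide, so per-stage p99s added together would fail budgets the real pipeline actually meets.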

Like what you see?

I'm always open to collaborations on AI, robotics, edge computing, or embedded systems.