Enikk is an open-source GUI Agent framework for desktop automation. It needs no application-specific APIs, only visual understanding of the screen.
In this demo, Enikk autonomously controls NetEase Music (网易云音乐), demonstrating how the framework can interact with any desktop application through YOLO-based UI element detection, OCR, and LLM-driven decision making.
🔗 GitHub: https://github.com/gtt116/enikk
Key Features:
• YOLO icon detection for precise UI element localization
• RapidOCR for text recognition
• LLM-powered semantic decision making
• FastAPI server + Web Dashboard with real-time monitoring
• Works with any OpenAI-compatible LLM API (e.g., Qwen, Claude, GPT)
Tech Stack: FastAPI + Hermes Agent + Vue 3 + YOLO + ONNX Runtime
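For readers curious how detection, OCR, and LLM decision making might fit together, here is a minimal Python sketch of one perceive-and-decide step. It is not taken from the Enikk codebase: `UIElement`, `describe`, `decide`, and `fake_llm` are hypothetical names invented for illustration, the element list is hard-coded where real YOLO and OCR output would go, and the model call is replaced by a stub.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class UIElement:
    """One on-screen element (hypothetical structure, not Enikk's own)."""
    kind: str                        # "icon" (detector) or "text" (OCR)
    label: str                       # detector class name or recognized text
    box: Tuple[int, int, int, int]   # (x1, y1, x2, y2) in screen pixels

    def center(self) -> Tuple[int, int]:
        x1, y1, x2, y2 = self.box
        return ((x1 + x2) // 2, (y1 + y2) // 2)


def describe(elements: List[UIElement]) -> str:
    """Turn detections into a numbered text list that an LLM could read."""
    return "\n".join(
        f"[{i}] {e.kind} '{e.label}' at {e.center()}"
        for i, e in enumerate(elements)
    )


def decide(goal: str, elements: List[UIElement],
           ask_llm: Callable[[str], str]) -> Optional[Tuple[int, int]]:
    """Ask the model for an element index; return click coordinates or None."""
    prompt = (f"Goal: {goal}\nElements:\n{describe(elements)}\n"
              "Reply with one index.")
    reply = ask_llm(prompt).strip()
    if not reply.isdecimal() or int(reply) >= len(elements):
        return None  # unusable answer: the caller can retry or stop
    return elements[int(reply)].center()


# Hard-coded stand-ins for real detector and OCR output (assumed values).
elements = [
    UIElement("icon", "play_button", (100, 500, 140, 540)),
    UIElement("text", "Daily Recommendations", (200, 80, 420, 110)),
]


def fake_llm(prompt: str) -> str:
    """Stub that always picks element 0; a real setup would call a model."""
    return "0"


print(decide("Start playback", elements, fake_llm))  # (120, 520)
```

In a real agent, the returned coordinates would be passed to a mouse-control library, and the loop would repeat with a fresh screenshot after each action. Passing the model as a plain callable keeps the decision step testable without network access.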
#AI #GUIAgent #DesktopAutomation #OpenSource #LLM