# LLM Agent Service
AI-powered trading copilot with local GPU support for OrbiQuant IA Trading Platform.
## Overview
This service provides an intelligent trading agent that runs locally on GPU using Ollama, with support for:
- Local LLM Inference: Run Llama 3, Mistral, and other models on your GPU
- Trading Analysis: Technical analysis, market sentiment, AMD phase identification
- Function Calling (Tools): Get real-time data, ML signals, portfolio info
- Educational Support: Explain concepts, recommend learning paths
- Risk Management: Position sizing, stop loss recommendations
- Streaming Responses: Real-time chat with Server-Sent Events
## Why Local LLM?
- Privacy: Your trading conversations stay on your hardware
- Cost: No API costs after initial setup
- Speed: Low latency with local GPU
- Customization: Full control over model and prompts
- Always Available: No internet dependency for inference
## Technology Stack
- Python: 3.11+
- LLM Provider: Ollama (local GPU) + Claude/OpenAI fallback
- Models: Llama 3 8B (recommended), Mistral 7B, Mixtral 8x7b
- API Framework: FastAPI with SSE streaming
- Tools System: Custom function calling for trading operations
- Database: PostgreSQL (asyncpg)
- Cache: Redis
- Testing: pytest, pytest-asyncio
## Setup

### Prerequisites
- GPU: NVIDIA RTX 5060 Ti (16GB VRAM) or better
- RAM: 16GB minimum, 32GB recommended
- OS: Linux (Ubuntu 20.04+) or WSL2
- Software:
  - Docker with NVIDIA Container Toolkit
  - Python 3.11+
  - Miniconda or Anaconda
### Quick Start

1. Start Ollama:

   ```bash
   cd /home/isem/workspace/projects/trading-platform/apps/llm-agent

   # Start Ollama with GPU support
   docker-compose -f docker-compose.ollama.yml up -d

   # Pull Llama 3 8B model (recommended for 16GB VRAM)
   docker exec orbiquant-ollama ollama pull llama3:8b
   ```

2. Configure the service:

   ```bash
   # Copy environment file
   cp .env.example .env

   # Edit configuration (LLM_PROVIDER=ollama is the default)
   nano .env
   ```

3. Install dependencies:

   ```bash
   # Create conda environment
   conda env create -f environment.yml
   conda activate orbiquant-llm-agent

   # Or install with pip
   pip install -r requirements.txt
   ```

4. Start the service:

   ```bash
   # Development mode with hot-reload
   uvicorn src.main:app --reload --host 0.0.0.0 --port 8003
   ```

5. Test it:

   Open http://localhost:8003/docs in a browser, or use curl:

   ```bash
   curl http://localhost:8003/api/v1/health
   ```
### Detailed Setup
See DEPLOYMENT.md for:
- Complete installation guide
- GPU configuration
- Model selection
- Performance tuning
- Troubleshooting
## Project Structure

```text
llm-agent/
├── src/
│   ├── main.py                  # FastAPI application
│   ├── config.py                # Configuration management
│   ├── core/                    # Core LLM functionality
│   │   ├── llm_client.py        # Ollama/Claude client
│   │   ├── prompt_manager.py    # System prompts
│   │   └── context_manager.py   # Conversation context
│   ├── tools/                   # Trading tools (function calling)
│   │   ├── base.py              # Tool base classes
│   │   ├── signals.py           # ML signals & analysis
│   │   ├── portfolio.py         # Portfolio management
│   │   ├── trading.py           # Trading execution
│   │   └── education.py         # Educational tools
│   ├── prompts/                 # System prompts
│   │   ├── system.txt           # Main trading copilot prompt
│   │   ├── analysis.txt         # Analysis template
│   │   └── strategy.txt         # Strategy template
│   ├── api/                     # API routes
│   │   └── routes.py            # All endpoints
│   ├── models/                  # Pydantic models
│   ├── services/                # Business logic
│   └── repositories/            # Data access layer
├── tests/
├── docker-compose.ollama.yml    # Ollama GPU setup
├── DEPLOYMENT.md                # Detailed deployment guide
├── requirements.txt
├── environment.yml
└── .env.example
```
## API Endpoints

### Core Endpoints

- `GET /` - Service info and health
- `GET /api/v1/health` - Detailed health check with LLM status
- `GET /api/v1/models` - List available LLM models

### Chat & Analysis

- `POST /api/v1/chat` - Interactive chat (supports streaming)
- `POST /api/v1/analyze` - Comprehensive symbol analysis
- `POST /api/v1/strategy` - Generate trading strategy
- `POST /api/v1/explain` - Explain trading concepts

### Tools & Context

- `GET /api/v1/tools` - List available tools for the user's plan
- `DELETE /api/v1/context/{user_id}/{conversation_id}` - Clear conversation
See interactive docs at: http://localhost:8003/docs
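As an illustration, the streaming chat endpoint can be consumed from Python with nothing but the standard library. Note that the request fields (`message`, `user_id`, `stream`) and the SSE chunk format (`data: {"content": ...}` with a `[DONE]` sentinel) are assumptions for this sketch, not the documented contract; check the interactive docs for the real schema.

```python
import json
import urllib.request

def collect_sse_content(lines):
    """Assemble the streamed reply from SSE lines.

    Assumes chunks arrive as `data: {"content": "..."}` and the stream
    ends with `data: [DONE]` -- an illustrative format, not the documented one.
    """
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunks.append(json.loads(payload).get("content", ""))
    return "".join(chunks)

def chat(message, user_id="demo", base_url="http://localhost:8003"):
    """POST to /api/v1/chat and return the assembled streamed answer."""
    body = json.dumps({"message": message, "user_id": user_id, "stream": True})
    req = urllib.request.Request(
        f"{base_url}/api/v1/chat",
        data=body.encode(),
        headers={"Content-Type": "application/json",
                 "Accept": "text/event-stream"},
    )
    with urllib.request.urlopen(req) as resp:
        return collect_sse_content(line.decode("utf-8") for line in resp)

# Example (requires the service to be running):
# answer = chat("Explain the AMD accumulation phase")
```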
## Development

### Code Quality

```bash
# Format code
black src/
isort src/

# Lint
flake8 src/

# Type checking
mypy src/
```

### Testing

```bash
# Run all tests
pytest

# With coverage
pytest --cov=src --cov-report=html

# Specific tests
pytest tests/unit/
```
## Trading Tools (Function Calling)

The agent has access to these tools:

### Market Data (Free)

- `get_analysis` - Current price, volume, 24h change
- `get_news` - Recent news with sentiment analysis
- `calculate_position_size` - Risk-based position sizing
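To illustrate what `calculate_position_size` computes, here is the standard fixed-fractional risk formula: risk a fixed fraction of the account per trade, sized by the distance between entry and stop. This is a sketch of the textbook calculation, not the tool's actual signature or logic.

```python
def calculate_position_size(account_balance, risk_pct, entry_price, stop_price):
    """Fixed-fractional position sizing (long trade assumed: stop below entry)."""
    if entry_price <= stop_price:
        raise ValueError("stop_price must be below entry_price for a long position")
    risk_amount = account_balance * risk_pct   # capital at risk, e.g. 1%
    risk_per_unit = entry_price - stop_price   # loss per unit if stopped out
    return risk_amount / risk_per_unit         # units to buy

# Risking 1% of a $10,000 account, entry $100, stop $95 -> 20 units
size = calculate_position_size(10_000, 0.01, 100.0, 95.0)
```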
### ML & Signals (Pro/Premium)

- `get_signal` - ML predictions with entry/exit levels
- `check_portfolio` - Portfolio overview and P&L
- `get_positions` - Detailed position information
- `get_trade_history` - Historical trades with metrics

### Trading (Pro/Premium)

- `execute_trade` - Execute paper trading orders
- `set_alert` - Create price alerts

### Education (Free)

- `explain_concept` - Explain trading terms (RSI, MACD, AMD, etc.)
- `get_course_info` - Recommend learning resources
Tools are automatically filtered based on user subscription plan (Free/Pro/Premium).
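A minimal sketch of how plan-based filtering might look. The tool names come from the lists above, but the class and field names (`Tool`, `ToolRegistry`, `min_plan`) are illustrative, not the actual API in `src/tools/base.py`:

```python
from dataclasses import dataclass, field

PLAN_LEVELS = {"free": 0, "pro": 1, "premium": 2}

@dataclass
class Tool:
    name: str
    description: str
    min_plan: str = "free"  # lowest subscription plan that may call this tool

@dataclass
class ToolRegistry:
    tools: dict = field(default_factory=dict)

    def register(self, tool: Tool):
        self.tools[tool.name] = tool

    def tools_for_plan(self, plan: str):
        """Return the names of tools the given plan is allowed to use."""
        level = PLAN_LEVELS[plan]
        return [t.name for t in self.tools.values()
                if PLAN_LEVELS[t.min_plan] <= level]

registry = ToolRegistry()
registry.register(Tool("get_analysis", "Price and volume snapshot", "free"))
registry.register(Tool("get_signal", "ML predictions", "pro"))
registry.register(Tool("execute_trade", "Paper trading orders", "pro"))

free_tools = registry.tools_for_plan("free")  # only get_analysis
```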
## System Prompt & Trading Philosophy
The agent is trained with a comprehensive system prompt that includes:
- AMD Framework: Identifies Accumulation, Manipulation, and Distribution phases
- Risk Management: Always prioritizes proper position sizing and stop losses
- Educational Approach: Explains the "why" behind every recommendation
- Multi-timeframe Analysis: Considers multiple timeframes for context
- Data-Driven: Uses tools to fetch real data, never invents prices
The agent will:
- ✅ Provide educational analysis with clear risk management
- ✅ Explain concepts in simple terms for all levels
- ✅ Use real market data via tools
- ✅ Warn against risky behavior
- ❌ NEVER give financial advice or guarantee returns
- ❌ NEVER invent market data or prices
## Architecture
Built following SOLID principles:
- LLM Client Abstraction: Unified interface for Ollama/Claude/OpenAI
- Tool Registry: Dynamic tool loading with permission checking
- Context Manager: Maintains conversation history efficiently
- Prompt Manager: Centralized prompt templates
- Streaming Support: Real-time responses with SSE
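The client abstraction can be pictured as a small abstract base class with a factory keyed on the provider. This is a sketch with illustrative names (`LLMClient`, `EchoClient`, `make_client`), not the actual interface in `src/core/llm_client.py`:

```python
from abc import ABC, abstractmethod
from typing import AsyncIterator

class LLMClient(ABC):
    """Unified interface that Ollama/Claude/OpenAI backends would implement."""

    @abstractmethod
    async def complete(self, messages: list[dict]) -> str:
        """Return a full completion for a list of chat messages."""

    @abstractmethod
    def stream(self, messages: list[dict]) -> AsyncIterator[str]:
        """Yield completion chunks, suitable for SSE streaming."""

class EchoClient(LLMClient):
    """Trivial stand-in backend, useful for tests."""

    async def complete(self, messages):
        return messages[-1]["content"]

    async def stream(self, messages):
        for word in messages[-1]["content"].split():
            yield word

def make_client(provider: str) -> LLMClient:
    """Factory keyed on the configured provider; real backends would go here."""
    backends = {"echo": EchoClient}
    return backends[provider]()
```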
## Configuration

Key environment variables:

```bash
# LLM Provider
LLM_PROVIDER=ollama   # ollama, claude, or openai
OLLAMA_BASE_URL=http://localhost:11434
LLM_MODEL=llama3:8b

# Service URLs
BACKEND_URL=http://localhost:8000
DATA_SERVICE_URL=http://localhost:8001
ML_ENGINE_URL=http://localhost:8002

# Optional: Claude fallback
# ANTHROPIC_API_KEY=sk-ant-xxx

# Database & Cache
DATABASE_URL=postgresql://...
REDIS_URL=redis://localhost:6379/0
```
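These variables might be read roughly as follows. This is a stdlib `os.getenv` sketch using the defaults above; the actual `src/config.py` may well use Pydantic settings instead:

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    """Illustrative settings object; field names mirror .env.example."""
    llm_provider: str = os.getenv("LLM_PROVIDER", "ollama")
    ollama_base_url: str = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
    llm_model: str = os.getenv("LLM_MODEL", "llama3:8b")
    backend_url: str = os.getenv("BACKEND_URL", "http://localhost:8000")
    redis_url: str = os.getenv("REDIS_URL", "redis://localhost:6379/0")

settings = Settings()
```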
## Model Recommendations

For an RTX 5060 Ti (16GB VRAM):

| Model | Size | Speed | Quality | Best For |
|---|---|---|---|---|
| `llama3:8b` | 4.7GB | ⚡⚡⚡ | ⭐⭐⭐⭐ | Recommended - best balance |
| `mistral:7b` | 4.1GB | ⚡⚡⚡⚡ | ⭐⭐⭐ | Fast responses, good quality |
| `llama3:70b` | 40GB+ | ⚡ | ⭐⭐⭐⭐⭐ | Requires 40GB+ VRAM |
| `mixtral:8x7b` | 26GB | ⚡⚡ | ⭐⭐⭐⭐ | Requires 32GB+ VRAM |

Recommendation: Start with `llama3:8b` for your hardware.
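A quick rule of thumb behind the table: the quantized model file plus an allowance for the KV cache and runtime must fit in VRAM. The 2GB default overhead below is a rough assumption, not a measured figure:

```python
def fits_in_vram(model_size_gb: float, vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """Rule-of-thumb check: model weights plus KV-cache/runtime overhead
    must fit in GPU memory. overhead_gb is an assumed allowance."""
    return model_size_gb + overhead_gb <= vram_gb

ok = fits_in_vram(4.7, 16.0)        # llama3:8b on a 16GB card: fits
too_big = fits_in_vram(26.0, 16.0)  # mixtral:8x7b on 16GB: does not fit
```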
## Documentation
- DEPLOYMENT.md - Complete deployment guide
- API Documentation - Interactive API docs
- Specification Docs - Technical specifications
## License
Proprietary - OrbiQuant IA Trading Platform