# LLM Agent Service
AI-powered trading copilot with local GPU support for OrbiQuant IA Trading Platform.
## Overview
This service provides an intelligent trading agent that runs locally on GPU using **Ollama**, with support for:
- **Local LLM Inference:** Run Llama 3, Mistral, and other models on your GPU
- **Trading Analysis:** Technical analysis, market sentiment, AMD phase identification
- **Function Calling (Tools):** Get real-time data, ML signals, portfolio info
- **Educational Support:** Explain concepts, recommend learning paths
- **Risk Management:** Position sizing, stop loss recommendations
- **Streaming Responses:** Real-time chat with Server-Sent Events
## Why Local LLM?
- **Privacy:** Your trading conversations stay on your hardware
- **Cost:** No API costs after initial setup
- **Speed:** Low latency with local GPU
- **Customization:** Full control over model and prompts
- **Always Available:** No internet dependency for inference
## Technology Stack
- **Python**: 3.11+
- **LLM Provider**: Ollama (local GPU) + Claude/OpenAI fallback
- **Models**: Llama 3 8B (recommended), Mistral 7B, Mixtral 8x7B
- **API Framework**: FastAPI with SSE streaming
- **Tools System**: Custom function calling for trading operations
- **Database**: PostgreSQL (asyncpg)
- **Cache**: Redis
- **Testing**: pytest, pytest-asyncio
## Setup
### Prerequisites
- **GPU:** NVIDIA RTX 5060 Ti (16GB VRAM) or better
- **RAM:** 16GB minimum, 32GB recommended
- **OS:** Linux (Ubuntu 20.04+) or WSL2
- **Software:**
  - Docker with NVIDIA Container Toolkit
  - Python 3.11+
  - Miniconda or Anaconda
### Quick Start
1. **Start Ollama:**
```bash
cd trading-platform/apps/llm-agent
# Start Ollama with GPU support
docker-compose -f docker-compose.ollama.yml up -d
# Pull Llama 3 8B model (recommended for 16GB VRAM)
docker exec orbiquant-ollama ollama pull llama3:8b
```
2. **Configure Service:**
```bash
# Copy environment file
cp .env.example .env
# Edit configuration (LLM_PROVIDER=ollama is default)
nano .env
```
3. **Install Dependencies:**
```bash
# Create conda environment
conda env create -f environment.yml
conda activate orbiquant-llm-agent
# Or install with pip
pip install -r requirements.txt
```
4. **Start the Service:**
```bash
# Development mode with hot-reload
uvicorn src.main:app --reload --host 0.0.0.0 --port 8003
```
5. **Test it:**
Open the interactive docs in a browser: http://localhost:8003/docs
```bash
# Or via curl
curl http://localhost:8003/api/v1/health
```
### Detailed Setup
See [DEPLOYMENT.md](./DEPLOYMENT.md) for:
- Complete installation guide
- GPU configuration
- Model selection
- Performance tuning
- Troubleshooting
## Project Structure
```
llm-agent/
├── src/
│   ├── main.py                   # FastAPI application
│   ├── config.py                 # Configuration management
│   ├── core/                     # Core LLM functionality
│   │   ├── llm_client.py         # Ollama/Claude client
│   │   ├── prompt_manager.py     # System prompts
│   │   └── context_manager.py    # Conversation context
│   ├── tools/                    # Trading tools (function calling)
│   │   ├── base.py               # Tool base classes
│   │   ├── signals.py            # ML signals & analysis
│   │   ├── portfolio.py          # Portfolio management
│   │   ├── trading.py            # Trading execution
│   │   └── education.py          # Educational tools
│   ├── prompts/                  # System prompts
│   │   ├── system.txt            # Main trading copilot prompt
│   │   ├── analysis.txt          # Analysis template
│   │   └── strategy.txt          # Strategy template
│   ├── api/                      # API routes
│   │   └── routes.py             # All endpoints
│   ├── models/                   # Pydantic models
│   ├── services/                 # Business logic
│   └── repositories/             # Data access layer
├── tests/
├── docker-compose.ollama.yml     # Ollama GPU setup
├── DEPLOYMENT.md                 # Detailed deployment guide
├── requirements.txt
├── environment.yml
└── .env.example
```
## API Endpoints
### Core Endpoints
- `GET /` - Service info and health
- `GET /api/v1/health` - Detailed health check with LLM status
- `GET /api/v1/models` - List available LLM models
### Chat & Analysis
- `POST /api/v1/chat` - Interactive chat (supports streaming)
- `POST /api/v1/analyze` - Comprehensive symbol analysis
- `POST /api/v1/strategy` - Generate trading strategy
- `POST /api/v1/explain` - Explain trading concepts
### Tools & Context
- `GET /api/v1/tools` - List available tools for user plan
- `DELETE /api/v1/context/{user_id}/{conversation_id}` - Clear conversation
See interactive docs at: http://localhost:8003/docs
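Since `/api/v1/chat` streams with Server-Sent Events, clients need to reassemble tokens from `data:` lines. A minimal parsing sketch, assuming each event payload is JSON with a `content` field and the stream ends with a `[DONE]` sentinel (both wire-format details are assumptions, not confirmed specifics of this service):

```python
import json

def iter_sse_tokens(lines):
    """Yield token text from Server-Sent-Events 'data:' lines.

    Assumes a JSON payload with a 'content' field and a literal
    '[DONE]' end-of-stream sentinel (illustrative wire format).
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives, comments, event names
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        yield json.loads(payload).get("content", "")

# Example: replay a captured stream fragment
sample = [
    'data: {"content": "BTC is "}',
    'data: {"content": "consolidating."}',
    "data: [DONE]",
]
print("".join(iter_sse_tokens(sample)))  # prints the joined response
```

In a real client, `lines` would come from iterating over the HTTP response body of the `POST /api/v1/chat` request.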
## Development
### Code Quality
```bash
# Format code
black src/
isort src/
# Lint
flake8 src/
# Type checking
mypy src/
```
### Testing
```bash
# Run all tests
pytest
# With coverage
pytest --cov=src --cov-report=html
# Specific tests
pytest tests/unit/
```
## Trading Tools (Function Calling)
The agent has access to these tools:
### Market Data (Free)
- `get_analysis` - Current price, volume, 24h change
- `get_news` - Recent news with sentiment analysis
- `calculate_position_size` - Risk-based position sizing
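For illustration, risk-based sizing of the kind `calculate_position_size` performs can be sketched as follows (the signature and parameter names here are hypothetical; the real tool's interface may differ):

```python
def calculate_position_size(equity: float, risk_pct: float,
                            entry: float, stop: float) -> float:
    """Units to buy so a stop-out loses at most risk_pct of equity.

    Hypothetical signature for illustration only.
    """
    if entry <= stop:
        raise ValueError("long setup requires entry above stop")
    risk_amount = equity * risk_pct   # capital at risk, e.g. 1%
    per_unit_risk = entry - stop      # loss per unit if stopped out
    return risk_amount / per_unit_risk

# 10,000 account risking 1% across a 5-point stop distance
print(calculate_position_size(10_000, 0.01, 100.0, 95.0))
```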
### ML & Signals (Pro/Premium)
- `get_signal` - ML predictions with entry/exit levels
- `check_portfolio` - Portfolio overview and P&L
- `get_positions` - Detailed position information
- `get_trade_history` - Historical trades with metrics
### Trading (Pro/Premium)
- `execute_trade` - Execute paper trading orders
- `set_alert` - Create price alerts
### Education (Free)
- `explain_concept` - Explain trading terms (RSI, MACD, AMD, etc.)
- `get_course_info` - Recommend learning resources
Tools are automatically filtered based on user subscription plan (Free/Pro/Premium).
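A minimal sketch of how plan-based filtering might work, assuming a simple rank ordering of plans (tool names come from the lists above; the registry structure itself is illustrative, not the service's actual implementation):

```python
from dataclasses import dataclass

PLAN_RANK = {"free": 0, "pro": 1, "premium": 2}

@dataclass(frozen=True)
class Tool:
    name: str
    min_plan: str  # lowest plan allowed to call this tool

REGISTRY = [
    Tool("get_analysis", "free"),
    Tool("explain_concept", "free"),
    Tool("get_signal", "pro"),
    Tool("execute_trade", "pro"),
]

def tools_for_plan(plan: str) -> list[str]:
    """Return names of tools visible to a subscription plan."""
    rank = PLAN_RANK[plan]
    return [t.name for t in REGISTRY if PLAN_RANK[t.min_plan] <= rank]

print(tools_for_plan("free"))  # only the free-tier tools
```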
## System Prompt & Trading Philosophy
The agent is guided by a comprehensive system prompt that includes:
- **AMD Framework:** Identifies Accumulation, Manipulation, and Distribution phases
- **Risk Management:** Always prioritizes proper position sizing and stop losses
- **Educational Approach:** Explains the "why" behind every recommendation
- **Multi-timeframe Analysis:** Considers multiple timeframes for context
- **Data-Driven:** Uses tools to fetch real data, never invents prices
The agent will:
- ✅ Provide educational analysis with clear risk management
- ✅ Explain concepts in simple terms for all levels
- ✅ Use real market data via tools
- ✅ Warn against risky behavior
- ❌ NEVER give financial advice or guarantee returns
- ❌ NEVER invent market data or prices
## Architecture
Built following SOLID principles:
- **LLM Client Abstraction:** Unified interface for Ollama/Claude/OpenAI
- **Tool Registry:** Dynamic tool loading with permission checking
- **Context Manager:** Maintains conversation history efficiently
- **Prompt Manager:** Centralized prompt templates
- **Streaming Support:** Real-time responses with SSE
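The LLM client abstraction can be sketched as an abstract base class with a streaming method. The names below are illustrative (the real `llm_client.py` may differ), and `EchoClient` is a stand-in backend so the example runs without a GPU:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator

class LLMClient(ABC):
    """Provider-agnostic chat interface (names are illustrative)."""

    @abstractmethod
    def stream_chat(self, messages: list[dict]) -> AsyncIterator[str]:
        """Yield response tokens for an OpenAI-style message list."""

class EchoClient(LLMClient):
    """Stand-in backend: echoes the last user message word by word."""

    async def stream_chat(self, messages):
        for word in messages[-1]["content"].split():
            yield word

async def demo() -> str:
    client: LLMClient = EchoClient()
    tokens = [t async for t in client.stream_chat(
        [{"role": "user", "content": "hello local llm"}])]
    return " ".join(tokens)

print(asyncio.run(demo()))
```

Concrete Ollama, Claude, or OpenAI clients would each implement `stream_chat`, letting the rest of the service stay provider-agnostic.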
## Configuration
Key environment variables:
```bash
# LLM Provider
LLM_PROVIDER=ollama # ollama, claude, or openai
OLLAMA_BASE_URL=http://localhost:11434
LLM_MODEL=llama3:8b
# Service URLs
BACKEND_URL=http://localhost:8000
DATA_SERVICE_URL=http://localhost:8001
ML_ENGINE_URL=http://localhost:8002
# Optional: Claude fallback
# ANTHROPIC_API_KEY=sk-ant-xxx
# Database & Cache
DATABASE_URL=postgresql://...
REDIS_URL=redis://localhost:6379/0
```
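For illustration, these variables could be loaded into a typed settings object like this (a stdlib-only sketch; the service's actual `config.py` may use Pydantic or another mechanism):

```python
import os
from dataclasses import dataclass, field

@dataclass
class Settings:
    """Environment-backed settings with the defaults listed above."""
    llm_provider: str = field(
        default_factory=lambda: os.getenv("LLM_PROVIDER", "ollama"))
    ollama_base_url: str = field(
        default_factory=lambda: os.getenv(
            "OLLAMA_BASE_URL", "http://localhost:11434"))
    llm_model: str = field(
        default_factory=lambda: os.getenv("LLM_MODEL", "llama3:8b"))

settings = Settings()
print(settings.llm_provider)  # "ollama" unless overridden in .env
```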
## Model Recommendations
For **RTX 5060 Ti (16GB VRAM):**
| Model | Size | Speed | Quality | Best For |
|-------|------|-------|---------|----------|
| `llama3:8b` | 4.7GB | ⚡⚡⚡ | ⭐⭐⭐⭐ | **Recommended** - Best balance |
| `mistral:7b` | 4.1GB | ⚡⚡⚡⚡ | ⭐⭐⭐ | Fast responses, good quality |
| `llama3:70b` | 40GB+ | ⚡ | ⭐⭐⭐⭐⭐ | Requires 40GB+ VRAM |
| `mixtral:8x7b` | 26GB | ⚡⚡ | ⭐⭐⭐⭐ | Requires 32GB+ VRAM |
**Recommendation:** Start with `llama3:8b` for your hardware.
## Documentation
- [DEPLOYMENT.md](./DEPLOYMENT.md) - Complete deployment guide
- [API Documentation](http://localhost:8003/docs) - Interactive API docs
- [Specification Docs](/docs/02-definicion-modulos/OQI-007-llm-agent/) - Technical specifications
## License
Proprietary - OrbiQuant IA Trading Platform