# Getting Started

Set up Krawl locally and make your first research request.
## Prerequisites

- Python 3.12+
- PostgreSQL (for result persistence, memory, lookouts)
- API keys for search providers
## Local Development

```bash
# Clone and install
git clone https://github.com/phdowling/krawl.git
cd krawl
pip install -e ".[dev]"
```

## Environment Variables
Create a `.env` file in the project root. Krawl uses `pydantic-settings` and loads from `.env` automatically.
### Required

| Variable | Description |
|---|---|
| EXA_API_KEY | Exa search API key — primary web search provider |
| GITHUB_TOKEN | GitHub personal access token — repo/code search |
| COINGECKO_API_KEY | CoinGecko Pro API key — token data, market cap, charts |
### Required for LLM (at least one)

| Variable | Description |
|---|---|
| AWS_BEDROCK_ACCESS_KEY_ID | AWS access key for Bedrock (primary LLM provider) |
| AWS_BEDROCK_SECRET_ACCESS_KEY | AWS secret key for Bedrock |
| AWS_BEDROCK_REGION | AWS region (default: us-east-1) |
| ANTHROPIC_API_KEY | Anthropic direct API key (fallback provider, or primary if no Bedrock) |
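Putting the required pieces together, a minimal `.env` might look like the following sketch. All values are placeholders; substitute your own keys, and swap the Anthropic line for Bedrock credentials if that is your LLM provider:

```shell
# Search providers (required)
EXA_API_KEY=exa-xxxxxxxx
GITHUB_TOKEN=ghp_xxxxxxxx
COINGECKO_API_KEY=CG-xxxxxxxx

# LLM provider (at least one of Bedrock or Anthropic)
ANTHROPIC_API_KEY=sk-ant-xxxxxxxx
```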
### Optional

| Variable | Default | Description |
|---|---|---|
| MOGRA_API_KEY | "" | API key for endpoint auth. If empty, auth is disabled |
| XAI_API_KEY | "" | xAI API key for X/Twitter search via Grok |
| XAI_BASE_URL | https://api.x.ai/v1 | xAI API base URL |
| X_BEARER_TOKEN | "" | X API v2 bearer token (direct, pay-per-use) |
| FIRECRAWL_API_KEY | "" | Firecrawl for JS-heavy site scraping |
| TAVILY_API_KEY | "" | Tavily search |
| SERPER_API_KEY | "" | Serper (Google search) |
| BRAVE_API_KEY | "" | Brave search |
| NANSEN_API_KEY | "" | Nansen on-chain analytics |
| DUNE_API_KEY | "" | Dune Analytics queries |
| LUNARCRUSH_API_KEY | "" | LunarCrush social metrics |
| COINGLASS_API_KEY | "" | Coinglass derivatives data |
| MESSARI_API_KEY | "" | Messari research data |
| DATABASE_URL | "" | PostgreSQL connection string |
| OPENAI_API_KEY | "" | OpenAI API key (unused by default) |
### Tuning Parameters

| Variable | Default | Description |
|---|---|---|
| MAX_STEPS | 40 | Maximum agent tool-calling steps |
| DEFAULT_BREADTH | 4 | Parallel queries per depth level |
| DEFAULT_DEPTH | 3 | Recursion depth levels |
| MAX_BREADTH | 10 | Maximum breadth per level |
| MAX_DEPTH | 5 | Maximum recursion depth |
| RATE_LIMIT | 5/minute | API rate limit (slowapi format) |
| SYNTHESIS_TIMEOUT | 120.0 | Synthesis LLM call timeout (seconds) |
| SYNTHESIS_MAX_SOURCES | 25 | Max sources included in synthesis |
| VERIFY_CITATIONS | true | Enable retrieve-then-cite verification |
| AUDIT_TRAIL | true | Enable audit trail logging |
| LOOKOUT_MAX_PER_USER | 10 | Max active lookouts per API key |
| LOOKOUT_MIN_INTERVAL_MINUTES | 60 | Minimum time between lookout runs |
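These settings resolve through `pydantic-settings`; as a rough stdlib-only sketch of how the env-var/default precedence works (the `setting` helper below is illustrative, not Krawl's actual code):

```python
import os

def setting(name, default, cast=int):
    """Illustrative helper: env var wins if set, else the documented default."""
    raw = os.environ.get(name)
    return cast(raw) if raw is not None else default

# A few of the tuning parameters from the table above
MAX_STEPS = setting("MAX_STEPS", 40)
DEFAULT_BREADTH = setting("DEFAULT_BREADTH", 4)
SYNTHESIS_TIMEOUT = setting("SYNTHESIS_TIMEOUT", 120.0, cast=float)
VERIFY_CITATIONS = setting(
    "VERIFY_CITATIONS", True,
    cast=lambda s: s.lower() in ("1", "true", "yes"),
)
```

So `export MAX_STEPS=60` would raise the step budget, while unset variables keep the documented defaults.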
### Model Overrides

You can override the default models via env vars:

| Variable | Default |
|---|---|
| MODEL_PLANNING | bedrock/us.anthropic.claude-sonnet-4-6 |
| MODEL_RESEARCH | bedrock/us.anthropic.claude-sonnet-4-6 |
| MODEL_SYNTHESIS | bedrock/us.anthropic.claude-opus-4-7 |
| MODEL_QUERY_GEN | bedrock/us.anthropic.claude-haiku-4-5-20251001-v1:0 |
| MODEL_X_SEARCH | grok-4.20-0309-non-reasoning |
| LEARNING_EXTRACTION_MODEL | bedrock/us.anthropic.claude-sonnet-4-6 |
| GAP_ANALYSIS_MODEL | bedrock/us.anthropic.claude-sonnet-4-6 |
## Run the Server

```bash
uvicorn app.main:app --host 0.0.0.0 --port 8080 --reload
```

## Health Check

```bash
curl http://localhost:8080/health
```

```json
{"status": "ok", "version": "0.2.0"}
```

## First Research Request
```bash
curl -N -X POST http://localhost:8080/research \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the latest developments in AI agents?",
    "mode": "deep",
    "breadth": 4,
    "depth_levels": 3
  }'
```

The `-N` flag disables output buffering so you see SSE events as they arrive.
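If you are consuming the stream from code rather than curl, a small parser for the SSE wire format can help. The event name and payload fields below are illustrative; check the actual stream for the real schema:

```python
import json

def parse_sse(lines):
    """Parse an iterable of Server-Sent Events text lines into
    (event, data) pairs, decoding each data payload as JSON.

    `lines` could come from, e.g., an HTTP client's line iterator
    while streaming the /research endpoint.
    """
    event = "message"  # SSE default event type
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line.split(":", 1)[1].strip()
        elif line.startswith("data:"):
            payload = line.split(":", 1)[1].strip()
            yield event, json.loads(payload)
            event = "message"  # reset after dispatch, per the SSE spec
```

For example, a stream containing `event: progress` followed by `data: {"step": 1}` yields `("progress", {"step": 1})`.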
## First Search Request

For a simple single-source search without the full research pipeline:

```bash
curl -X POST http://localhost:8080/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "AI agents 2025",
    "source": "exa"
  }'
```

## Authentication
When MOGRA_API_KEY is set, all endpoints require the X-API-Key header:

```bash
curl -H "X-API-Key: your-key" http://localhost:8080/health
```

When MOGRA_API_KEY is empty or unset, authentication is disabled and all endpoints are publicly accessible. The server logs a warning on startup when auth is disabled.
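In client code, the same logic can be wrapped in a small helper that mirrors the server's behavior; `auth_headers` below is a hypothetical convenience function, not part of Krawl:

```python
import os

def auth_headers(extra=None):
    """Build request headers, attaching X-API-Key only when
    MOGRA_API_KEY is configured (auth is disabled when it is empty,
    mirroring the server)."""
    headers = dict(extra or {})
    api_key = os.environ.get("MOGRA_API_KEY", "")
    if api_key:
        headers["X-API-Key"] = api_key
    return headers
```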
## CORS

Default allowed origins:

- https://krawl.sh
- http://localhost:5173
- http://localhost:3000

Configure via the CORS_ORIGINS env var (JSON array format).
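For reference, the default origin list expressed in the JSON-array format that CORS_ORIGINS expects, so you can copy and adapt the value directly:

```python
import json

# The string on the right is a valid CORS_ORIGINS value
cors_origins = json.loads(
    '["https://krawl.sh", "http://localhost:5173", "http://localhost:3000"]'
)
```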