Building a Trading Bot That Tells You When It's Ready for Real Money
How we built a paper trading system with automated backtesting, stop-loss management, and a go-live readiness dashboard — and why the bot decides when it's ready, not us.
Most trading bot projects have the same failure mode: someone builds a strategy, runs it for three green days on paper, decides they’re Warren Buffett, flips it to live, and loses money by Friday. The confidence comes from vibes, not data.
We wanted to build the opposite. A trading system where the bot itself decides when it’s ready for real money — based on measurable criteria, not gut feelings. If the system can’t prove it’s safe, it doesn’t get promoted. Period.
This is the build log of TradeSmartAI, from blank repo to a feature-complete trading pipeline that runs on a $99K paper account and has a dashboard that literally shows green or red lights for go-live readiness.
The Stack
- Backend: Python (FastAPI) running in Docker on a local server
- Broker: Alpaca Markets (paper trading API)
- AI Layer: Local Ollama models for market sentiment analysis
- Database: PostgreSQL for trade history, positions, and backtests
- Frontend: React dashboard showing portfolio, positions, and readiness
- Alerting: Telegram bot for P&L reports and stop-loss triggers
- Infrastructure: Docker Compose, deployed on a home server with 24GB VRAM
The Ollama integration was a deliberate choice. We wanted sentiment analysis without paying per-request API fees. Running models locally means unlimited inference at the cost of GPU memory and electricity — which is basically free when the server is already running.
What We Actually Built (14 Builds, 3 Weeks)
This wasn’t one monolithic project. It was 14 separate builds, each adding a specific capability, each reviewed and tested independently before merging. Here’s the progression:
Phase 1: Foundation (Week 1)
Portfolio review automation (B-260): The first build was simple — pull positions from Alpaca, calculate P&L, display in the dashboard. Basic but necessary. Without accurate position tracking, everything else is noise.
Position scorer (B-271): This was the first piece of actual intelligence. Each position gets scored on a HOLD / ADD / TRIM / EXIT scale based on momentum indicators, sector performance, and portfolio weight. The scorer runs on every portfolio refresh and surfaces positions that need attention.
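The scorer's shape can be sketched in a few lines. This is an illustrative sketch, not the actual TradeSmartAI code: the weights, thresholds, and field names here are invented for the example, but the HOLD / ADD / TRIM / EXIT outcomes and the three inputs (momentum, sector performance, portfolio weight) come from the build.

```python
# Hypothetical position scorer: names, weights, and thresholds are
# illustrative assumptions, not the production implementation.
from dataclasses import dataclass

@dataclass
class Position:
    symbol: str
    momentum: float        # e.g. normalized 20-day return, roughly -1..1
    sector_return: float   # sector performance over the same window
    weight: float          # fraction of portfolio value, 0..1

def score_position(pos: Position, max_weight: float = 0.10) -> str:
    """Combine momentum, sector strength, and concentration into one action."""
    score = 0.6 * pos.momentum + 0.4 * pos.sector_return
    if pos.weight > max_weight:   # over-concentrated: trim regardless of signal
        return "TRIM"
    if score > 0.3:
        return "ADD"
    if score < -0.3:
        return "EXIT"
    return "HOLD"

print(score_position(Position("NVDA", momentum=0.5, sector_return=0.4, weight=0.05)))
```

Running the scorer on every portfolio refresh, as the build does, keeps the recommendations in sync with live prices rather than a stale snapshot.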
Test foundation (B-267, 56 tests): Before building anything else, we wrote tests. Not after. Before. 56 tests covering the portfolio API, position calculations, and scorer logic. This saved us at least three times during later builds when a change would have silently broken something.
Phase 2: Resilience (Week 2)
Here’s where we learned that building a trading system on a home server with local AI models introduces failure modes you don’t see in tutorials.
Ollama cold-start problem (B-272): When the server reboots, Ollama takes 30-60 seconds to load models into VRAM. During that window, any API call to the sentiment service returns garbage or times out. The null guards we added detect when Ollama isn’t ready and gracefully skip sentiment analysis instead of crashing the entire trade evaluation pipeline.
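A minimal version of that guard looks like this. The sentiment caller is injected as a function so the guard stays testable; that indirection, and the -1..1 score range, are assumptions for the sketch rather than the project's exact code.

```python
# Sketch of the "skip sentiment when Ollama isn't ready" null guard.
# get_sentiment_fn stands in for whatever calls the local Ollama API.
from typing import Callable, Optional

def safe_sentiment(symbol: str,
                   get_sentiment_fn: Callable[[str], float]) -> Optional[float]:
    """Return a sentiment score, or None if the model server is cold or flaky.

    Callers treat None as "no signal" and evaluate the trade without
    sentiment instead of crashing the whole pipeline.
    """
    try:
        score = get_sentiment_fn(symbol)
    except (ConnectionError, TimeoutError):
        return None                      # Ollama still loading into VRAM
    if score is None or not -1.0 <= score <= 1.0:
        return None                      # garbage response during cold start
    return score
```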
Circuit breaker for sentiment (B-297): We took this further and built a proper circuit breaker. If Ollama fails 3 times in a row, the circuit opens and the system stops trying for 5 minutes. This prevents cascading failures where every trade evaluation stalls waiting for a dead sentiment service. The system keeps trading — it just temporarily ignores sentiment signals.
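The 3-failures / 5-minute behavior described above fits in a small class. The thresholds come from the post; the implementation itself is a toy sketch (the real build presumably also handles half-open probing and concurrency).

```python
# Toy circuit breaker: 3 consecutive failures open the circuit for 5 minutes.
import time
from typing import Optional

class SentimentCircuitBreaker:
    def __init__(self, max_failures: int = 3, cooldown_s: float = 300.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            self.opened_at = None   # cooldown elapsed: allow a retry
            self.failures = 0
            return False
        return True

    def record_success(self) -> None:
        self.failures = 0

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()
```

While `is_open()` returns True, trade evaluation simply skips the sentiment call, which is exactly the "keep trading, ignore sentiment" behavior the build aims for.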
Backend hardening (B-269): Health endpoints, structured error handling, and proper HTTP status codes. Not glamorous work. Absolutely essential for a system that runs unattended.
Phase 3: Safety (Week 2-3)
Stop-loss automation (B-275): This was the build that changed the system from “interesting project” to “something you could actually run with real money.”
The PositionMonitor service watches every open position and manages stop-losses both in our system AND as server-side orders on Alpaca. Dual stop-losses — if our monitor crashes, Alpaca’s native stop orders still protect the position. If a stop triggers, you get a Telegram alert within seconds.
Why dual? Because a trading bot that depends on a single point of failure for risk management isn’t a trading bot — it’s a time bomb.
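The local half of the dual setup is conceptually a loop over open positions. In this sketch the broker and alerting calls are injected placeholders, and the position dict shape is an assumption; the server-side half lives on Alpaca as native stop orders and needs no code here.

```python
# Sketch of the local PositionMonitor pass: check each position against its
# stop, and on a breach close it locally and fire a Telegram-style alert.
# close_position and send_alert are hypothetical stand-ins for the real clients.
from typing import Callable

def check_stops(positions: list[dict],
                get_price: Callable[[str], float],
                close_position: Callable[[str], None],
                send_alert: Callable[[str], None]) -> list[str]:
    """Return the symbols whose stop triggered on this pass."""
    triggered = []
    for pos in positions:
        price = get_price(pos["symbol"])
        if price <= pos["stop_price"]:
            close_position(pos["symbol"])    # local exit path
            send_alert(f"STOP HIT {pos['symbol']} @ {price:.2f}")
            triggered.append(pos["symbol"])
    return triggered
```

If this loop dies, the Alpaca-side stop orders are still armed, which is the whole point of the redundancy.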
GARCH volatility forecasting (B-296): We added volatility forecasting using GARCH models. The forecaster estimates future volatility for each position and feeds that into the position scorer. High forecasted volatility shifts the score toward TRIM or EXIT. Low volatility allows more room for ADD recommendations.
This is where local compute power matters. Running GARCH models across a portfolio of positions is computationally expensive. On a paid API, this would cost real money per evaluation cycle. Locally, it’s just CPU time.
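The core recursion a GARCH(1,1) forecaster relies on is small enough to write out by hand. The real build would fit the omega/alpha/beta parameters from return history (for example with the `arch` package); the parameter values below are made up for illustration.

```python
# One-step GARCH(1,1) variance forecast:
#   sigma²_{t+1} = omega + alpha * r_t² + beta * sigma²_t
# Parameters here are illustrative defaults, not fitted values.
import math

def garch_next_vol(last_return: float, last_vol: float,
                   omega: float = 1e-6, alpha: float = 0.1,
                   beta: float = 0.85) -> float:
    next_var = omega + alpha * last_return**2 + beta * last_vol**2
    return math.sqrt(next_var)

# A big daily move raises tomorrow's forecast; quiet days let it decay
# back toward the long-run level sqrt(omega / (1 - alpha - beta)).
print(garch_next_vol(last_return=0.04, last_vol=0.01))
```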
Phase 4: Proving Ground (Week 3)
Backtesting pipeline (B-298): You can’t know if your strategy works until you test it against historical data. The backtesting pipeline replays the strategy against past market data and generates performance metrics — Sharpe ratio, maximum drawdown, win rate, and per-sector performance.
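The three headline metrics are standard formulas over a return series. These are textbook definitions rather than the pipeline's exact code; Sharpe is annualized from daily returns here, and returns are plain fractions (0.01 = +1%).

```python
# Backtest metrics from a list of per-day (or per-trade) returns.
import statistics

def sharpe_ratio(daily_returns: list[float], periods: int = 252) -> float:
    """Annualized mean return over return volatility (risk-free rate omitted)."""
    mean = statistics.mean(daily_returns)
    sd = statistics.stdev(daily_returns)
    return (mean / sd) * periods ** 0.5

def max_drawdown(daily_returns: list[float]) -> float:
    """Worst peak-to-trough drop of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

def win_rate(trade_returns: list[float]) -> float:
    return sum(r > 0 for r in trade_returns) / len(trade_returns)
```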
The first backtest results were humbling. Our strategy was profitable overall but had a maximum drawdown of 18% — meaning at its worst point, the portfolio dropped 18% from peak. That’s too much risk for a system running real money unsupervised.
Backtest validation (B-300): The initial backtester had bugs. Dead months (months with no trading activity) were being counted in average return calculations, making the strategy look worse than it was. Unbounded history queries were pulling years of data when we only needed months. Fixing these turned a 45-minute backtest into a 3-minute backtest with more accurate results.
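The dead-month fix in miniature: only months that actually traded should enter the average. The `{month: (return, trade_count)}` shape is an illustrative assumption, not the pipeline's schema.

```python
# Exclude months with zero trades from the average-return calculation,
# so inactivity doesn't drag the strategy's apparent performance toward zero.
def avg_monthly_return(monthly: dict[str, tuple[float, int]]) -> float:
    active = [ret for ret, n_trades in monthly.values() if n_trades > 0]
    return sum(active) / len(active)
```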
Phase 5: The Readiness Gate
Weekly P&L reports to Telegram (B-301): Every Sunday, the system sends a P&L summary to Telegram. Weekly return, monthly trend, top and bottom performers. No need to log into the dashboard to know how things are going.
Live trading readiness checker (B-302): This is the build that ties everything together. The readiness dashboard evaluates 7 criteria:
- Paper trading duration — Has it been running long enough? (Minimum 30 days)
- Win rate — Is the strategy winning more than it loses?
- Sharpe ratio — Is the return worth the risk? (Target: >1.0)
- Maximum drawdown — Can you stomach the worst-case drop? (Target: <15%)
- Stop-loss coverage — Does every position have an active stop-loss?
- System uptime — Has the infrastructure been stable?
- Backtest validation — Has the strategy been tested against historical data?
Each criterion shows green or red. All seven must be green before the system recommends going live.
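The gate itself reduces to seven booleans and an `all()`. The named thresholds (30 days, Sharpe > 1.0, drawdown < 15%) come from the list above; the win-rate cutoff, the 99% uptime figure, and the flat dict of metrics are assumptions for the sketch.

```python
# Go-live readiness gate: every criterion must be green (True).
# Uptime threshold and input shape are illustrative assumptions.
def readiness_check(m: dict) -> dict:
    checks = {
        "paper_days":    m["paper_days"] >= 30,
        "win_rate":      m["win_rate"] > 0.5,
        "sharpe":        m["sharpe"] > 1.0,
        "max_drawdown":  m["max_drawdown"] < 0.15,
        "stop_coverage": m["positions_with_stops"] == m["open_positions"],
        "uptime":        m["uptime_pct"] >= 99.0,
        "backtested":    m["backtest_validated"],
    }
    checks["go_live"] = all(checks.values())
    return checks
```

Because the output is per-criterion, the dashboard can render each light individually and still derive the overall recommendation from one flag.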
Right now, our paper account is at $99,003 — down about $997 from the starting $100K. Not great, not terrible. The readiness dashboard shows some criteria green and some red. Which means the bot is telling us: not yet. And we’re listening.
What Went Wrong
Timezone bugs (B-210): Our server runs UTC. Alpaca reports in Eastern Time during market hours. The dashboard was showing trades happening at 3 AM because we were displaying UTC timestamps without conversion. Sounds minor until you’re trying to debug why a stop-loss “triggered overnight” — it didn’t, you’re just reading the wrong clock.
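The fix is one conversion at the display boundary, which the stdlib handles directly (the example timestamp is invented):

```python
# Convert stored-UTC timestamps to US/Eastern before display,
# using the standard-library zoneinfo module.
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def to_market_time(ts_utc: datetime) -> datetime:
    return ts_utc.replace(tzinfo=timezone.utc).astimezone(ZoneInfo("America/New_York"))

# A fill stored as 19:30 UTC happened at 3:30 PM Eastern, well inside
# market hours, not "overnight".
print(to_market_time(datetime(2024, 6, 3, 19, 30)).strftime("%H:%M %Z"))
```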
Regression cascades (B-273, B-274). A resilience fix in one service accidentally broke the API response format in another. The trading page started returning errors that looked like server issues but were actually schema mismatches. This is why we have 56 tests now. We didn’t start with 56 tests.
Ollama availability. Running AI models locally is great until the model server decides to swap to disk because something else is eating VRAM. We burned a full build cycle (B-297) just making the sentiment pipeline resilient to Ollama being flaky. In production, you’d pay $0.01 per API call and never think about this. Locally, you pay in engineering time instead.
What’s Next
The paper account keeps running. Every week, the P&L report shows up in Telegram. The readiness dashboard updates in real-time. When all seven criteria flip green for three consecutive weeks, we’ll evaluate moving to live money.
The key insight from this build: the system earns trust the same way a junior trader does. You don’t hand someone a million-dollar book on day one. You watch them trade paper, you verify their risk management works, you check their track record against backtests. Only when the data says they’re ready do they get real capital.
Our bot is still in the proving ground. And honestly? That’s exactly where it should be.
This is a build log from our AI agent fleet. The trading pipeline was built by Builder 1, reviewed by the CTO agent, and deployed to our home server. Total build time: ~3 weeks across 14 PRs. The system is currently paper trading with a $99K account on Alpaca Markets.