Token Machine: Local Podcast Agents for Financial Research

There are too many good finance podcasts. That sounds like a small problem until you care about markets. Then it becomes a research infrastructure problem.

Useful ideas rarely arrive as clean database rows. They show up as a guest riffing on a company, a host pushing back on consensus, or an analyst casually naming the bottleneck that makes a whole theme work. Some of it is noise. Some of it is salesmanship. Some of it is wrong. But some of it is early signal, and no human can keep up with all of it.

So I built Token Machine: a local agent pipeline that listens to finance shows, transcribes them, identifies speakers, extracts financial claims, separates broad themes from explicit instruments, rejects dirty rows, and stores everything in a local database. The goal is not an agent that says "buy this." The goal is a research machine that leaves receipts.

This is a technical project writeup, not investment advice. The performance figures below are historical paper backtests and diagnostics, not live brokerage performance or a prediction of future returns.

The Shape Of The System

The pipeline is local-first because the valuable part is not just the model output. It is the audit trail: transcript text, speaker metadata, raw model payloads, ticker-resolution reasons, scoreable flags, outcome rows, backtests, and portfolio decisions all joined together in one local memory system.
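The audit trail is easiest to picture as a few joined tables. This is a minimal sketch. It uses Python's built-in sqlite3 as a stand-in for the PGLite/Postgres store. The table and column names (segment, claim, outcome, resolution_reason) are hypothetical, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for the PGLite/Postgres memory store.
SCHEMA = """
CREATE TABLE segment (
    id INTEGER PRIMARY KEY,
    episode_id TEXT NOT NULL,
    start_s REAL NOT NULL,
    end_s REAL NOT NULL,
    speaker TEXT,              -- NULL until attribution resolves one human
    text TEXT NOT NULL
);
CREATE TABLE claim (
    id INTEGER PRIMARY KEY,
    segment_id INTEGER NOT NULL REFERENCES segment(id),
    raw_payload TEXT NOT NULL, -- model output kept verbatim for audit
    asset_text TEXT NOT NULL,  -- what the speaker actually said
    ticker TEXT,               -- set only by the deterministic resolver
    resolution_reason TEXT NOT NULL,
    scoreable INTEGER NOT NULL CHECK (scoreable IN (0, 1))
);
CREATE TABLE outcome (
    id INTEGER PRIMARY KEY,
    claim_id INTEGER NOT NULL REFERENCES claim(id),
    horizon_days INTEGER NOT NULL,
    excess_return REAL NOT NULL
);
"""


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store with foreign keys enforced and the schema created."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def audit_trail(conn: sqlite3.Connection, claim_id: int) -> list[tuple]:
    """Join one claim back to its transcript text, speaker, and outcomes."""
    return conn.execute(
        """
        SELECT s.speaker, s.text, c.asset_text, c.ticker,
               c.resolution_reason, o.horizon_days, o.excess_return
        FROM claim c
        JOIN segment s ON s.id = c.segment_id
        LEFT JOIN outcome o ON o.claim_id = c.id
        WHERE c.id = ?
        """,
        (claim_id,),
    ).fetchall()
```

The raw model payload sits next to the resolved ticker and its reason. That placement lets any scored row be traced back to the words that produced it.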

1. Ingest: pull podcast and video sources into a resumable local pipeline.
2. Transcribe: use local MLX Whisper to create timestamped transcript segments.
3. Attribute: diarize and polish speakers so claims map to real humans.
4. Extract: use local models to pull claims, notes, themes, and market calls.
5. Resolve: let deterministic code decide whether an asset maps to a ticker.
6. Score: compare only strict calls against SPY-relative forward outcomes.
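The scoring step compares a call's forward move with SPY's move over the same window. This is a minimal sketch under a few assumptions: simple (not log) returns, a {date: close} price map, and a +1/-1 sign for bullish and bearish calls. The source does not say how horizons or directions are encoded, so those choices are illustrative.

```python
from datetime import date


def forward_return(closes: dict[date, float], start: date, end: date) -> float:
    """Simple return from the close on `start` to the close on `end`."""
    return closes[end] / closes[start] - 1.0


def spy_relative_outcome(
    direction: int,
    asset: dict[date, float],
    spy: dict[date, float],
    start: date,
    end: date,
) -> float:
    """Signed excess return of a call versus SPY over the same window.

    direction is +1 for a bullish call and -1 for a bearish one (assumed encoding).
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    excess = forward_return(asset, start, end) - forward_return(spy, start, end)
    return direction * excess
```

Measuring against SPY rather than raw price change keeps a call from looking smart just because the whole market rose.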

The Leaderboard Was The Trap

The first tempting product was a leaderboard: which finance shows make the best market calls? That is fun, but it quickly exposed the harder problem. A leaderboard only matters if the data underneath it refuses to lie.

Some early rows attributed calls to host groups instead of one human. Some broad themes were forced into proxy tickers. Some private companies were treated as if they had public baseline prices. Some transcript chunks contained cross-talk that had been collapsed into one block. The system got more useful when it started rejecting its own outputs.
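One way to keep cross-talk from collapsing into a single speaker is to check time overlap. Each transcript segment is matched against the diarization turns, and attribution is refused when no one speaker clearly dominates. This is a minimal sketch: the Turn type, the attribute function, and the 0.8 dominance threshold are hypothetical choices, not the project's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float    # seconds


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def attribute(
    seg_start: float, seg_end: float, turns: list[Turn], min_share: float = 0.8
) -> tuple[Optional[str], Optional[str]]:
    """Return (speaker, None) when one speaker dominates, else (None, reason)."""
    totals: dict[str, float] = {}
    for t in turns:
        o = overlap(seg_start, seg_end, t.start, t.end)
        if o > 0:
            totals[t.speaker] = totals.get(t.speaker, 0.0) + o
    if not totals:
        return None, "no_diarized_speaker"
    speaker, best = max(totals.items(), key=lambda kv: kv[1])
    # Refuse to attribute when the top speaker holds too little of the overlap.
    if best / sum(totals.values()) < min_share:
        return None, "cross_talk"
    return speaker, None
```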

Metric	Count
Extracted market calls	2,056
Strict scoreable calls	232
Scoreable speakers	
Scored outcomes	1,067

That drop from 2,056 extracted calls to 232 strict scoreable calls, roughly 11% of the total, is the story. In finance, an agent that extracts more is not automatically better. An agent that knows when to reject its own output is far more useful.
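The strict filter can be read as an ordered set of gates. Each gate either passes a call or records why it failed, mirroring the failure modes described earlier: no single speaker, no tradeable ticker, no public baseline price. This is a minimal sketch over plain dicts; the field names and reason strings are hypothetical.

```python
from collections import Counter
from typing import Optional


def rejection_reason(call: dict) -> Optional[str]:
    """Return None if the call passes every strict gate, else the first failing gate."""
    if call.get("lane") != "explicit_instrument":
        return "macro_theme"          # macro themes are stored but not scored
    if call.get("speaker_entity") is None:
        return "no_single_speaker"    # e.g. attributed to a host group
    if call.get("ticker") is None:
        return "unresolved_ticker"    # resolver found no reviewed mapping
    if call.get("baseline_price") is None:
        return "no_public_baseline"   # e.g. a private company
    return None


def split_scoreable(calls: list[dict]) -> tuple[list[dict], Counter]:
    """Keep strictly scoreable calls and count rejections by reason."""
    kept: list[dict] = []
    reasons: Counter = Counter()
    for call in calls:
        reason = rejection_reason(call)
        if reason is None:
            kept.append(call)
        else:
            reasons[reason] += 1
    return kept, reasons
```

Counting reasons, not just discarding rows, is what makes the rejection rate explainable.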

What Makes A Call Scoreable?

Token Machine has two lanes. If a speaker explicitly names a public company, ETF, index, crypto, commodity fund, or raw ticker, the extraction can enter the explicit_instrument lane. If the speaker is talking about a sector, country, style, private company, or broad macro theme, it goes into the macro_theme lane.
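The lane decision can be sketched as a lookup on what kind of asset the speaker named. In this minimal sketch, the Lane enum mirrors the two lane names above, while EXPLICIT_KINDS and assign_lane are hypothetical. Entering the explicit lane only makes a call eligible; the resolver still decides whether it maps to a ticker.

```python
from enum import Enum


class Lane(Enum):
    EXPLICIT_INSTRUMENT = "explicit_instrument"
    MACRO_THEME = "macro_theme"


# Hypothetical labels for asset kinds that may enter the explicit lane.
EXPLICIT_KINDS = frozenset(
    {"public_company", "etf", "index", "crypto", "commodity_fund", "raw_ticker"}
)


def assign_lane(asset_kind: str) -> Lane:
    """Sectors, countries, styles, private companies, and macro themes fall through."""
    return Lane.EXPLICIT_INSTRUMENT if asset_kind in EXPLICIT_KINDS else Lane.MACRO_THEME
```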

The model is not allowed to guess proxy tickers. If someone says "semiconductors," the system cannot silently turn that into NVDA or SOXX. If someone says "Nvidia," a reviewed deterministic alias can map that to NVDA. The local model extracts what was said. The resolver decides whether it is tradeable.
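This split of responsibilities can be sketched as a resolver that only consults a reviewed alias table. It returns a reason with every answer, including rejections. This is a minimal sketch: REVIEWED_ALIASES and resolve are hypothetical names, and the two-entry table is illustrative, not the project's alias list.

```python
from typing import Optional

# Hypothetical reviewed alias table: every entry was approved by a human.
REVIEWED_ALIASES = {
    "nvidia": "NVDA",
    "nvda": "NVDA",
}


def resolve(lane: str, asset_text: str) -> tuple[Optional[str], str]:
    """Map what was said to a ticker, or explain why not. Never guesses a proxy."""
    if lane != "explicit_instrument":
        return None, "macro_theme_no_proxy"  # "semiconductors" never becomes SOXX
    ticker = REVIEWED_ALIASES.get(asset_text.strip().lower())
    if ticker is None:
        return None, "no_reviewed_alias"     # unknown names are rejected, not guessed
    return ticker, "reviewed_alias"
```

The reason string is stored with the row, so a rejected call stays auditable. It is clear whether the call failed because it was a theme or because nobody had reviewed the alias.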

Publication gate	Offending rows
Strict resolver rejects current scoreable rows	0
Strict resolver would change ticker	0
Scoreable rows missing speaker entity	0
Outcomes attached to unscoreable parents	0
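The gates above work as an all-or-nothing check: any nonzero count blocks publication. In this minimal sketch, the gate keys mirror the table rows, while publication_gate itself is a hypothetical helper. It returns the failing gates rather than a bare boolean, so a blocked run says why.

```python
def publication_gate(gate_counts: dict[str, int]) -> tuple[bool, dict[str, int]]:
    """Allow publication only when every gate reports zero offending rows."""
    failing = {gate: n for gate, n in gate_counts.items() if n != 0}
    return not failing, failing


counts = {
    "strict_resolver_rejects_scoreable_rows": 0,
    "strict_resolver_would_change_ticker": 0,
    "scoreable_rows_missing_speaker_entity": 0,
    "outcomes_on_unscoreable_parents": 0,
}
ok, failing = publication_gate(counts)
```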

Where The Trading Proof Comes In

The podcast agent is upstream research infrastructure. It does not place trades. But the project includes a historical proof harness that asks a stricter question: can pipeline-derived evidence become a replayable trading rule set, with a ledger, drawdowns, benchmark comparison, and position-level winners and losers?

Historical Proof Pack

Strategy: pipeline_barbell_max_deployment
Window: 2020-01-01 through 2026-05-06
Starting equity: $10,000
Benchmark: SPY ETF proxy for the S&P 500

Headline Result

The cached proof report ended at $48,743.88, compared with $24,653.06 for SPY over the same window. That is +387.4% for the strategy versus +146.5% for SPY.
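The headline percentages follow directly from the ending balances. Here is a quick arithmetic check in Python. The excess figure is a simple difference in percentage points, not a ratio of growth multiples.

```python
def total_return(start: float, end: float) -> float:
    """Total simple return over the whole window."""
    return end / start - 1.0


start = 10_000.00
strategy = total_return(start, 48_743.88)
spy = total_return(start, 24_653.06)
excess_pts = (strategy - spy) * 100  # percentage-point gap

print(f"strategy {strategy:+.1%}  SPY {spy:+.1%}  excess {excess_pts:+.1f} pts")
# prints: strategy +387.4%  SPY +146.5%  excess +240.9 pts
```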

Strategy return: +387.4%
SPY return: +146.5%
Excess vs SPY: +240.9 pts
Max drawdown: -24.4%
Simulated orders: 124
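Max drawdown is the largest peak-to-trough fall in the equity curve. This is a minimal sketch of the standard calculation on a list of account values. The source does not say whether the report uses daily or weekly marks, so the sampling frequency is left to the caller.

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline as a negative fraction (0.0 if none)."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                 # running high-water mark
        worst = min(worst, value / peak - 1.0)  # decline from that mark
    return worst
```

Read this way, -24.4% means the account was at some point 24.4% below its previous high.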

Growth of $10,000 (chart): Pipeline Barbell ends at $48,744; SPY ends at $24,653.

Best And Worst Ideas

A useful backtest should expose the uncomfortable parts, not just the top number. The proof report includes position lifecycle tables, a trade ledger, allocation snapshots, drawdown charts, and contribution by ticker.

Best idea: CVNA, +$12,344. Held 2023-11-06 to 2024-02-05; roughly +637.9% on bought notional.

Worst idea: F, -$933. Opened 2025-11-03 and still open in the packaged report; roughly -13.5% on bought notional.

Best realized sell: META, +$1,897. Sold 9.1228 shares at $314.10 on 2023-08-07.

Worst realized sell: DHI, -$207. Sold 16.5947 shares at $149.34 on 2026-02-02.
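Realized P&L on a sell depends on which buy lots the sale is matched against. This minimal sketch uses first-in, first-out lot matching. FIFO is an assumption here; the report could use average cost or another method. fifo_realized is a hypothetical helper that accepts fractional shares like the ones in the ledger entries above.

```python
from collections import deque

EPS = 1e-12  # tolerance for fractional-share rounding


def fifo_realized(trades: list[tuple[str, float, float]]) -> list[float]:
    """trades: (side, shares, price) in time order. Returns realized P&L per sell."""
    lots: deque = deque()  # open lots as [shares, price], oldest first
    realized: list[float] = []
    for side, shares, price in trades:
        if side == "buy":
            lots.append([shares, price])
            continue
        remaining, pnl = shares, 0.0
        while remaining > EPS:
            if not lots:
                raise ValueError("sell exceeds open position")
            lot = lots[0]
            take = min(remaining, lot[0])
            pnl += take * (price - lot[1])  # gain or loss against this lot's cost
            lot[0] -= take
            remaining -= take
            if lot[0] <= EPS:
                lots.popleft()  # lot fully consumed
        realized.append(pnl)
    return realized
```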

The Strategy Stack

The historical proof is one layer of a larger research system. The current stack separates discovery, scoring, portfolio construction, and allocation review.

Layer	What It Does
Market-call scoring	Measures strict podcast calls against SPY-relative forward outcomes.
Portfolio signal modes	Convert eligible evidence into weekly portfolio targets and ablations.
Capital queue	Ranks add, trim, hedge, research, hold, and blocked decisions.
Pipeline barbell	Selects single-stock leader and venture names from pipeline evidence.
Alpha Committee	Uses local or higher-reasoning model review only after deterministic selection.

In a fresh no-store diagnostic run through 2026-05-12, the SPY-core modes carried most of the return while pure podcast/alpha modes were much more muted. I like that result because it keeps the project honest. The podcast system is not magic. It improves research intake. Portfolio construction, sizing, taxes, liquidity, and execution still matter.

What I Built

Local podcast ingestion, transcription, diarization, and speaker polishing pipeline.
Structured market-call extraction with strict explicit-instrument and macro-theme lanes.
Deterministic ticker resolver that rejects proxy guesses and stores rejection reasons.
Local PGLite/Postgres memory for transcripts, claims, outcomes, forecasts, and portfolio state.
Publication audit that blocks public leaderboards unless strict resolver and speaker gates are clean.
Historical trading proof pack with equity curve, drawdown, trade ledger, and position lifecycle.
Capital queue and Alpha Committee review layer for bounded paper-trading decisions.

Why This Belongs In My Portfolio

My core work is data systems and geospatial software, but this project hits the same engineering muscles: messy source data, entity resolution, spatial-like attribution problems, quality gates, local databases, dashboards, reproducible analysis, and user-facing storytelling.

The project started with a fun question: can I rank finance shows by what happened after they made calls? The better question became: can a local agent pipeline make my research intake harder to fool?

That is the part I am proud of. The output is not just a clever summary. It is a system that can say no.

Final caveat: this is historical paper research. It uses cached market data, simplified fills, and diagnostic assumptions. The point is process: local agents extract, deterministic systems gate, backtests decide, and humans remain accountable.


I Built Local Agents to Listen to Finance Podcasts for Me