A working journal of applied engineering Working paper · in preparation

jcousins/2026.04.004

Earnings-call Q&A alignment as an equity signal

A research replication of Chiang et al. (2025), with a contrastive PyTorch classifier over FinBERT embeddings.

J. Cousins

Independent · London, UK

Submitted: 2026-02 · Revised: 2026-04 · Status: Working paper, in preparation
Abstract. Most equity signals derive from price action or fundamental accounting data. Earnings-call transcripts contain a rich linguistic surface that is largely under-exploited. Chiang et al. (2025) argue that the semantic alignment between an analyst's question and management's answer correlates with subsequent equity returns: misaligned answers (evasive, off-topic, or partial) are informative beyond the narrative the call presents. We describe a working pipeline to test the claim on US equities: transcripts are ingested from Financial Modeling Prep and SEC EDGAR, Q&A pairs are embedded using ProsusAI/FinBERT (768d), and a PyTorch contrastive classifier with a combined cross-entropy / MSE / SimCSE-inspired loss is trained to score alignment. Architecture, dashboard, and Kubernetes deployment are complete. The remaining critical-path blocker is a hand-labelled training set.

Keywords NLP · FinBERT · contrastive learning · PyTorch · equity research · replication study

1 Introduction

The information content of an earnings call extends well beyond the headline numbers and the prepared remarks. The Q&A section in particular reveals which questions management is willing or able to answer cleanly and which it evades. If a measurable signal exists in that evasion, it is the kind of edge that compounds: it recurs every quarter for every covered name, in language that is technically already public.

2 Method

2.1 Embeddings

We use ProsusAI/FinBERT (Araci, 2019), a BERT model further pre-trained on financial text, to embed questions and answers. Its 768-dimensional hidden states capture domain-specific vocabulary (guidance, margins, capital allocation, tail risk) that a general-domain BERT is likely to represent less well. We considered fine-tuning FinBERT itself. With limited labelled data, however, the marginal gain is unlikely to justify the added complexity, so the encoder is used as-is.
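The text does not say how FinBERT's per-token hidden states are reduced to one 768-d vector per question or answer. Masked mean pooling is a common choice, and the sketch below shows it in plain Python under that assumption. The function name `masked_mean_pool` is hypothetical. Taking the [CLS] vector instead would be equally defensible.

```python
from typing import List


def masked_mean_pool(token_vecs: List[List[float]], mask: List[int]) -> List[float]:
    """Average the token vectors whose attention-mask entry is 1, ignoring padding.

    Assumption: this pooling rule is illustrative; the paper does not specify one.
    """
    if len(token_vecs) != len(mask):
        raise ValueError("token_vecs and mask must have the same length")
    kept = [vec for vec, m in zip(token_vecs, mask) if m == 1]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy 3-d vectors stand in for FinBERT's 768-d hidden states;
# the third token is padding (mask 0) and is left out of the average.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
print(masked_mean_pool(tokens, [1, 1, 0]))  # [2.0, 3.0, 4.0]
```

With Hugging Face Transformers, the same reduction would apply to the model output's `last_hidden_state`, using the tokenizer's `attention_mask`. In PyTorch it is a masked sum over the token axis divided by the mask count.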

2.2 Contrastive architecture

The classifier learns from two kinds of example:

- Positive pairs (Q, A) pair a real question with management's corresponding answer.
- Negative pairs (Q, A′) pair the same question with an answer sampled at random from a different call.

Training minimises the embedding distance between aligned pairs and maximises it for misaligned ones. This is simpler than a multi-class formulation. It also yields a continuous alignment score that downstream backtesting can threshold flexibly.
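The pair construction can be sketched in plain Python. In this sketch each answer in the input is kept as a positive (label 1). Each question is also given a negative (label 0) whose answer is drawn from a different call. Cosine similarity serves as one possible continuous alignment score. The record layout `(call_id, question_vec, answer_vec)` and the helper names are assumptions made for illustration.

```python
import math
import random
from typing import Dict, List, Tuple

Vec = List[float]


def cosine(u: Vec, v: Vec) -> float:
    """Cosine similarity in [-1, 1], usable as a continuous alignment score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def build_pairs(qa: List[Tuple[str, Vec, Vec]], rng: random.Random) -> List[Tuple[Vec, Vec, int]]:
    """qa holds (call_id, question_vec, answer_vec) records (hypothetical layout).

    Returns (question, answer, label) triples: label 1 for the real answer, and
    label 0 for an answer drawn at random from a different call.
    """
    answers_by_call: Dict[str, List[Vec]] = {}
    for call_id, _, answer in qa:
        answers_by_call.setdefault(call_id, []).append(answer)

    pairs: List[Tuple[Vec, Vec, int]] = []
    for call_id, question, answer in qa:
        pairs.append((question, answer, 1))
        other_calls = [c for c in answers_by_call if c != call_id]
        if other_calls:  # a negative needs at least one other call
            donor = rng.choice(other_calls)
            pairs.append((question, rng.choice(answers_by_call[donor]), 0))
    return pairs


# Two toy calls with 2-d embeddings; the call ids are hypothetical.
qa = [("call-A", [1.0, 0.0], [0.9, 0.1]), ("call-B", [0.0, 1.0], [0.1, 0.9])]
for q, a, label in build_pairs(qa, random.Random(0)):
    print(label, round(cosine(q, a), 3))
```

Negatives drawn from other calls are likely to be off-topic and therefore easy to separate. The subtler evasive-but-on-topic answers probably only enter training through hand-labelled examples.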

2.3 Loss

$$\mathcal{L} = \alpha\,\mathcal{L}_{\mathrm{CE}} + \beta\,\mathcal{L}_{\mathrm{MSE}} + \gamma\,\mathcal{L}_{\mathrm{SimCSE}} \tag{1}$$

The combined loss has three terms:

- cross-entropy for binary aligned/misaligned classification;
- mean-squared error against the hand-labelled per-pair alignment scores;
- a SimCSE-inspired contrastive term (Gao et al., 2021) that keeps representations well spread in embedding space.

The weights α, β and γ set the balance between the terms. The cost is three more hyperparameters to tune. The expected payoff is a richer signal that is less brittle than a single-objective formulation.
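A plain-Python sketch of equation (1) makes each term's arithmetic visible. It rests on three assumptions:

- binary cross-entropy is computed on a per-pair logit;
- mean-squared error is taken against alignment scores in [0, 1];
- the contrastive term is InfoNCE over in-batch cosine similarities with temperature τ = 0.05, the SimCSE default.

The unit weights α = β = γ = 1 are placeholders, not tuned values, and all function names are hypothetical.

```python
import math
from typing import List


def bce_with_logits(logits: List[float], labels: List[int]) -> float:
    """Mean binary cross-entropy for aligned (1) / misaligned (0) labels.

    Uses the numerically stable form max(z, 0) - z*y + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error against per-pair alignment scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def info_nce(sim: List[List[float]], tau: float = 0.05) -> float:
    """SimCSE-style contrastive term (an assumed form of L_SimCSE).

    sim[i][j] is the cosine similarity of question i and answer j in a batch;
    the diagonal holds the true pairs and every other column is an in-batch negative.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / tau for s in row]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # -log softmax probability of the true pair
    return total / len(sim)


def combined_loss(logits, labels, scores, targets, sim,
                  alpha=1.0, beta=1.0, gamma=1.0, tau=0.05) -> float:
    """Equation (1): alpha * L_CE + beta * L_MSE + gamma * L_SimCSE (placeholder weights)."""
    return (alpha * bce_with_logits(logits, labels)
            + beta * mse(scores, targets)
            + gamma * info_nce(sim, tau))
```

In PyTorch these terms map onto `torch.nn.functional.binary_cross_entropy_with_logits` and `torch.nn.functional.mse_loss`. The contrastive term becomes `torch.nn.functional.cross_entropy` applied to the similarity matrix divided by τ, with targets `torch.arange(batch_size)`.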

2.4 Backtesting

Historical alignment scores are evaluated against forward 20-day and 60-day equity returns. Performance is reported as Sharpe ratio, information ratio and drawdown, with attribution by question topic (margins, guidance, competition, macro, capital structure).
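The reporting metrics can be pinned down with a short sketch. It makes four assumptions:

- returns are simple, not log, returns;
- the risk-free rate defaults to zero;
- a year has 252 trading days;
- holding windows do not overlap, so 20-day returns annualise with a factor of √(252/20).

The helper names are hypothetical.

```python
import math
import statistics
from typing import List


def forward_return(prices: List[float], t: int, horizon: int) -> float:
    """Simple return from the close on day t to the close on day t + horizon (e.g. 20 or 60)."""
    return prices[t + horizon] / prices[t] - 1.0


def sharpe(returns: List[float], periods_per_year: float, rf: float = 0.0) -> float:
    """Annualised Sharpe ratio of per-period returns; rf is the per-period risk-free rate."""
    excess = [r - rf for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)


def information_ratio(strategy: List[float], benchmark: List[float], periods_per_year: float) -> float:
    """Annualised mean active return divided by tracking error (stdev of active returns)."""
    active = [s - b for s, b in zip(strategy, benchmark)]
    return statistics.mean(active) / statistics.stdev(active) * math.sqrt(periods_per_year)


def max_drawdown(returns: List[float]) -> float:
    """Largest peak-to-trough fall of the compounded equity curve, as a positive fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


# Non-overlapping 20-day holding periods: roughly 252 / 20 = 12.6 periods a year.
PERIODS_20D = 252 / 20
```

If 20-day windows were instead sampled daily and allowed to overlap, consecutive returns would be autocorrelated. The plain stdev-based annualisation above would then likely overstate the precision of the estimate.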

3 Implementation

| Layer | Choice | Why |
|---|---|---|
| ML framework | PyTorch 2.x | Research flexibility, debugging tooling, academic standard. |
| Embeddings | ProsusAI/FinBERT | Domain-specific pre-training; ready for fine-tuning. |
| NLP plumbing | HuggingFace Transformers | Standard FinBERT integration. |
| Backend | FastAPI | Async endpoints for analysis, comparison and backtesting. |
| Storage | PostgreSQL | Transcript and embedding persistence; SQL ergonomics. |
| Dashboard | Streamlit | Quick interactive exploration; Plotly for timelines and heatmaps. |
| Deployment | Docker · Kubernetes (Minikube) | Reproducibility now; production scaling later. |
| Data | Financial Modeling Prep · SEC EDGAR · yfinance | Transcripts, returns, fundamentals. |

4 Results (planned)

  • Alignment score correlation with forward returns: target > 0.15.
  • Backtest Sharpe ratio: target > 0.8.
  • Question-categorisation accuracy: target > 85%.
  • Training corpus: 500+ earnings calls covering S&P 500 constituents.
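The source does not say whether the 0.15 correlation target refers to Pearson or rank (Spearman) correlation. The sketch below assumes rank correlation, the usual form of a rank information coefficient, because forward returns tend to be heavy-tailed. Calling `pearson` directly tests the other reading. All names are hypothetical.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x: List[float], y: List[float]) -> float:
    """Rank correlation: Pearson on the ranks, less sensitive to outlier returns."""
    return pearson(ranks(x), ranks(y))


def meets_target(scores: List[float], fwd_returns: List[float], target: float = 0.15) -> bool:
    """Check the planned correlation target using rank correlation (an assumption)."""
    return spearman(scores, fwd_returns) > target
```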

5 Critical path

  1. Labelled training data. Hand-score 100–200 Q&A pairs with alignment labels. The classifier cannot be fit without this.
  2. Train the alignment classifier on the labelled set, tuning the combined loss.
  3. Backtest against the 2020–2025 earnings seasons over forward 20-day and 60-day windows.
  4. Attribute the signal by question topic to understand what is actually driving it.
  5. Expand to international equities if the US signal validates.
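The topic attribution in step 4 could start as a per-topic spread. Within each question topic, it compares the average forward return after aligned answers with the average after misaligned ones. The record layout and the 0.5 threshold are hypothetical. A positive spread would be consistent with the claim that misaligned answers carry information, but on its own it says nothing about statistical significance.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical record layout: (topic, alignment_score, forward_return) per Q&A pair.
Row = Tuple[str, float, float]


def topic_spread(rows: List[Row], threshold: float = 0.5) -> Dict[str, float]:
    """Per topic: mean forward return of aligned answers (score >= threshold)
    minus mean forward return of misaligned ones.

    Topics with no observations on one side are skipped.
    """
    groups: Dict[str, Tuple[List[float], List[float]]] = defaultdict(lambda: ([], []))
    for topic, score, fwd in rows:
        aligned, misaligned = groups[topic]
        (aligned if score >= threshold else misaligned).append(fwd)
    return {t: mean(a) - mean(m) for t, (a, m) in groups.items() if a and m}


rows = [
    ("margins", 0.9, 0.04),
    ("margins", 0.2, -0.02),
    ("guidance", 0.8, 0.01),  # no misaligned guidance answers, so it is skipped
]
print(topic_spread(rows))
```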

6 Discussion

If the signal validates, this becomes a personal trading strategy. If it does not, the architecture still generalises to other financial NLP problems where the question is whether language adds information beyond the numbers. That architecture is FinBERT plus contrastive learning plus a transparent backtest. Either outcome is useful. In the negative case only the labelling effort is lost, and even that builds domain expertise.

7 References

  1. Chiang, R., et al. (2025), "Q&A Alignment in Earnings Calls and the Cross-Section of Returns", working paper.
  2. Araci, D. (2019), "FinBERT: Financial Sentiment Analysis with Pre-trained Language Models", arXiv:1908.10063.
  3. Gao, T., Yao, X., Chen, D. (2021), "SimCSE: Simple Contrastive Learning of Sentence Embeddings", arXiv:2104.08821.