About — FilingDrift

What this is

FilingDrift is a language-change scoring tool for SEC 10-K annual and 10-Q quarterly filings. It measures how much a company's filing language changes year over year — the directed increase in distress vocabulary — normalized against the whole corpus.

The headline is a factor. Sorted into quintiles, the companies whose language stays most stable have historically outperformed — a quality factor that survives a five-factor + quality (FF5+QMJ) adjustment (Q1 alpha 93.2 bps/month, t=9.16, full period), and even survives removing all the corpus normalization.

Read from the other end, the same score flags distress early. Corporate distress has a pre-crisis signature in language: CFOs don't suddenly say "we're in trouble" — they gradually introduce hedging language, new risk-factor categories, and liquidity disclosures that weren't there before. SVB's 2022 10-K scored 57.5; the 95th-percentile control ceiling is 51.5. The FDIC arrived 14 days after filing.

What this is not

Not investment advice. We score language. We don't predict price movements, recommend positions, or guarantee outcomes.
Not a crystal ball. 34 of 43 tracked crisis companies with sufficient filing history exceeded the control ceiling before their collapse. The other 9 did not; the misses include Party City, Revlon, and SI (Party City's distress vocabulary is common enough across the corpus that the corpus-wide weighting discounts it). FilingDrift is a screening tool, not a prediction tool.
Not a financial advisory service. Latent Systems SAS is a software company. We are not registered investment advisors, broker-dealers, or credit rating agencies.
Not affiliated with the SEC. We use publicly available data from EDGAR. We have no relationship with the SEC or any regulatory body.
Not proprietary or insider information. Every score is derived entirely from public SEC filings available on EDGAR. We have no access to non-public information about any company, and no score reflects anything that was not already in the public record at the time of filing.

How the score works

The score measures two things independently:

What's new or escalating. Phrases that appeared for the first time, or dramatically increased, weighted by how rarely other companies use them. A phrase that only SVB was saying in 2022, that it hadn't mentioned before, scores higher than boilerplate that every bank uses.
Where the language is heading. We measure how the current year's language has shifted from last year, in the sections most predictive of distress — risk factors, liquidity disclosures, MD&A. This catches the drift that keyword lists miss.
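
To make the first component concrete, here is a minimal sketch of a rarity-weighted, directed phrase-increase score. It is an illustration under stated assumptions, not FilingDrift's actual algorithm: the function names, the smoothed inverse-document-frequency weight, and the use of phrase rates are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def rarity_weight(phrase, doc_freq, n_companies):
    """Smoothed inverse document frequency (an assumed weighting scheme).

    doc_freq maps a phrase to the number of companies that use it.
    Phrases used by few companies get a larger weight.
    """
    return math.log((1 + n_companies) / (1 + doc_freq.get(phrase, 0))) + 1.0


def directed_increase_score(prev_counts, curr_counts, doc_freq, n_companies):
    """Sum of rarity-weighted increases in phrase rate from last year to this year.

    Only increases contribute; phrases whose rate fell or stayed flat are ignored,
    which is what makes the measure "directed".
    """
    prev_total = sum(prev_counts.values()) or 1
    curr_total = sum(curr_counts.values()) or 1
    score = 0.0
    for phrase, count in curr_counts.items():
        prev_rate = prev_counts.get(phrase, 0) / prev_total
        curr_rate = count / curr_total
        increase = curr_rate - prev_rate
        if increase > 0:
            score += increase * rarity_weight(phrase, doc_freq, n_companies)
    return score
```

With this weighting, a newly introduced phrase that almost no other company uses contributes more than a newly introduced phrase that is common boilerplate, and two identical filings score zero.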

Both are calibrated against the healthy companies in our corpus — currently 4930+ tracked. The 95th percentile of their filing-pair scores is the control ceiling (51.5). Scores above it are flagged.
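
The calibration step can be sketched as follows. This is a simplified illustration: the `percentile` helper uses linear interpolation, which is one of several common percentile conventions and is an assumption here, and both function names are hypothetical.

```python
def percentile(values, q):
    """Percentile with linear interpolation between order statistics, q in [0, 100]."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac


def is_flagged(score, ceiling):
    """A score is flagged only when it is strictly above the control ceiling."""
    return score > ceiling


# The ceiling would be computed once from healthy-company filing-pair scores:
#   ceiling = percentile(healthy_pair_scores, 95)
# Using the published ceiling of 51.5 and two scores quoted on this page:
#   is_flagged(57.5, 51.5)  -> True   (SVB's 2022 10-K)
#   is_flagged(46.3, 51.5)  -> False  (Party City's peak score)
```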

The algorithm is deterministic: no AI generation, no prompting, no summarization. The same filing always produces the same score.

See the full FAQ →

The factor: stable language outperforms — FF5 + quality adjusted

Sorted into monthly quintiles by year-over-year change in distress vocabulary (1-month signal lag, full 2000–2026 period including the 2007–2011 crisis), the most-stable-language quintile (Q1) earns a large, persistent alpha:

Factor model	Q1 (stable) monthly alpha	Q1–Q5 long/short
Fama-French 3-factor	80.0 bps (t=7.77)	59.5 bps (t=4.90)
FF5 + momentum + quality (QMJ)	93.2 bps (t=9.16)	38.9 bps (t=3.47)

Q1 alpha rises when the quality factor is added, which suggests the signal is not simply a quality proxy, and it survives removing all the corpus normalization (~99 bps in the fully-raw version). The signal is a directed cousin of "Lazy Prices"; the full write-up is in the signal validation.
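
The quintile sort itself is simple to sketch. The code below assigns quintiles by language change for one month and computes a raw Q1 minus Q5 return spread. It is an illustration only: the function names and input layout are assumptions, and the spread is a raw return difference, not the factor-adjusted alpha in the table, which would additionally require regressing portfolio returns on factor returns.

```python
from statistics import mean


def assign_quintiles(change_by_ticker):
    """Map each ticker to a quintile 1..5; Q1 holds the smallest language change."""
    ranked = sorted(change_by_ticker, key=change_by_ticker.get)
    n = len(ranked)
    return {ticker: 1 + (5 * i) // n for i, ticker in enumerate(ranked)}


def q1_minus_q5(next_month_return, quintile_by_ticker):
    """Equal-weighted return of the most stable quintile minus the least stable."""
    q1 = [next_month_return[t] for t, q in quintile_by_ticker.items() if q == 1]
    q5 = [next_month_return[t] for t, q in quintile_by_ticker.items() if q == 5]
    return mean(q1) - mean(q5)
```

To respect the 1-month signal lag, the change scores passed to `assign_quintiles` would come from filings available at least one month before the return month.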

Survivorship caveat: the quintile backtest runs on filers that still report, so companies that have since delisted are absent, which inflates the long side. Correcting for it (rebuilding the universe with 4,716 delisted names, each modeled as a total loss, which is an upper bound on the delisting losses) shrinks the alpha and concentrates the surviving edge in small-caps (micro-cap Q1–Q5 ≈ 164 bps/mo); above ~$300M market cap the spread inverts. Treat this as an in-sample research signal, not a tradeable all-cap return. Detail →

The other end: distress early-warning — forward returns, full corpus

Read from the high end, the same score is a distress signal. For every flag event we measured stock returns at 12, 24, and 36 months vs. SPY. Treat the raw vs-SPY figures below as size-effect-dominated, not as the distress signal: the below-ceiling bucket shows a similar number, so the real distress evidence is the lift and recall (below), not these absolute returns.

After flag	N events	Median alpha vs. S&P 500	IQR (alpha)	% underperforming market
1 year	7069	−8.6%	−36.3% to +18.4%	58%
2 years	6597	−14.8%	−54.5% to +23.6%	61%
3 years	6059	−22.4%	−68.6% to +27.1%	63%

Alpha = company return minus SPY return over the same period. At the 1-year horizon, 58% of flagged events have negative alpha, vs. ~50% expected by chance. IQR shows the middle 50% of outcomes (25th–75th percentile); the distribution is wide, as expected for a prioritization signal rather than a trading rule. N decreases at longer horizons because events flagged after 2023–2024 do not yet have complete forward windows.
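
The columns of the table can be computed from per-event returns roughly as follows. This is a sketch: `alpha_summary` is a hypothetical name, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from whatever quartile convention produced the published figures.

```python
from statistics import median, quantiles


def alpha_summary(company_returns, spy_returns):
    """Summarize alpha (company return minus SPY return) across flag events.

    Both inputs are sequences of returns over the same horizon, aligned by event.
    """
    alphas = [c - s for c, s in zip(company_returns, spy_returns)]
    q1, _, q3 = quantiles(alphas, n=4)  # 25th, 50th, 75th percentile cut points
    share_under = sum(a < 0 for a in alphas) / len(alphas)
    return {
        "median_alpha": median(alphas),
        "iqr": (q1, q3),
        "share_underperforming": share_under,
    }
```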

These figures are size-effect-dominated and are not themselves the distress signal. The below-ceiling bucket shows a similar raw vs-SPY number, so the absolute figures are not the evidence. The distress evidence is the lift (moderate-flagged companies reach a distress outcome about 1.2× the base rate) and the 79% labeled recall (34 of 43). The factor-adjusted long-side result is in the signal validation.

Caveats: delisted tickers use last available price as terminal value (understates losses for bankruptcies); full 2000–2026 period; no adjustment for market-period clustering.

Crisis & distress detection — labeled set: ~43 hand-selected crisis companies

The table below shows how the score performed against every labeled crisis company in our corpus. Events include bankruptcies, bank failures, FDIC seizures, and Chapter 11 filings (some companies subsequently emerged).

79%

Recall — 34/43 crisis companies detected
(≥2 pre-event filing pairs)

81%

Precision — 34 crisis / 42 flagged
(labeled set: 30 control + crisis companies)

Company	Event	Peak score	Lead time	Result
PRTY (Party City)	Bankruptcy 2023	46.3	—	Missed
NKLA (Nikola)	Bankruptcy 2025	85.3	3.7 years	Detected
BBBY (Bed Bath)	Bankruptcy 2023	138.5	2.0 years	Detected
RITEAID	Bankruptcy 2023	79.2	167 days	Detected
SVB Financial	Bank collapse 2023	57.5	14 days	Detected
SI (Silvergate)	Liquidation 2023	15.1	—	Missed
REVLON, CHKAQ	Various	<2	—	No data †

† REVLON and CHKAQ (Chesapeake) have a single, sparsely-parsed filing pair in our corpus — insufficient history to compute a meaningful change score. We count them as misses to avoid cherry-picking. The score requires at least two consecutive filings to measure change. (Party City, by contrast, has full history but its distress vocabulary is common enough across the corpus that the corpus-wide weighting discounts it — a genuine miss, not a data gap.)

False positives: 8 of 30 stable reference companies generated above-ceiling scores at some point, mostly large-cap names (e.g. JPM, RTX). Some occurred during the COVID disruption years (2020–2021), when corpus-wide language shifts reduced the discriminating power of the period normalization. Others (e.g. RTX) followed a major corporate merger that produced large language changes for structural reasons.
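
The recall and precision figures in this section follow directly from the counts it reports: 34 detected out of 43 crisis companies, and 8 false positives among the 30 controls. A small worked check, with hypothetical helper names:

```python
def recall(detected, total_crisis):
    """Share of labeled crisis companies that exceeded the ceiling before the event."""
    return detected / total_crisis


def precision(true_flags, false_flags):
    """Share of all flagged companies in the labeled set that were crisis companies."""
    return true_flags / (true_flags + false_flags)


labeled_recall = recall(34, 43)        # 34 / 43, about 0.791
labeled_precision = precision(34, 8)   # 34 / 42, about 0.810
```

Both numbers describe the hand-selected labeled set only, so they should not be read as population-level rates for the full corpus.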

Corpus is actively expanding

Currently tracking 4930+ companies. We ingest new filings continuously as they appear on EDGAR. Coverage, recall, and statistical power improve with each new filing cycle. Control ceiling: 51.5.

Known limitations

M&A distortions. When a company acquires another and consolidates filings, the combined entity may show language change that reflects the target's pre-existing disclosure style, not a genuine deterioration.
Sector shocks. In years with corpus-wide stress (2008–2009, 2020), nearly all companies spike together. The per-period normalization is less informative when distress language is the baseline across the whole corpus.
Regulatory language. Banks under formal regulatory agreements (cease-and-desist, memoranda of understanding) are required to use specific disclosure language. This language reads as distressed but may not indicate approaching collapse.
Binomial false-flag rate. With 4930+ companies tracked over multiple years, some will exceed the ceiling by random variation. Companies with 10+ years of history have more opportunities to spike. We are working on potential solutions, e.g. adaptive thresholds.
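
The binomial point can be made concrete with a back-of-the-envelope calculation. Because the ceiling is the 95th percentile of healthy filing-pair scores, a single healthy filing pair exceeds it about 5% of the time. The sketch below assumes filing pairs are independent, which is a simplification: scores for the same company are likely correlated from year to year, so the true figure will differ.

```python
def prob_at_least_one_flag(n_pairs, per_pair_rate=0.05):
    """Chance a healthy company exceeds the ceiling at least once in n_pairs filing pairs.

    Assumes independent filing pairs, each exceeding the ceiling with probability
    per_pair_rate (0.05 by construction of a 95th-percentile ceiling).
    """
    return 1.0 - (1.0 - per_pair_rate) ** n_pairs


# Under these assumptions, ten filing pairs give roughly a 40% chance
# of at least one above-ceiling score for a perfectly healthy company.
ten_pair_rate = prob_at_least_one_flag(10)
```

This is the motivation for adaptive thresholds: a fixed ceiling applied to long histories flags a large share of healthy companies at least once.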

Who we are

FilingDrift is built by Latent Systems, a small team of ML researchers based in Paris. We all have PhDs in machine learning. Our research focuses on training embedding models and studying the geometry of the spaces they produce: how meaning is encoded in high-dimensional representations, and what structural properties of those spaces can be exploited for detection, classification, and anomaly scoring.

FilingDrift grew out of that work. The question was whether financial distress leaves a detectable signature in how a company's filing language changes over time, and whether that signature appears before prices move. The core signal is a directed phrase-frequency change normalized across the whole corpus (with a secondary sentence-embedding component drawn directly from our research on representation geometry).

We are not a hedge fund, a financial services firm, or a consultancy. FilingDrift is a research product of an independent research company.

Questions, feedback, and enterprise inquiries: hello@filingdrift.com
