The financial world likes to present itself as rational. On a chart, there is mathematics. In a report, restrained wording. In the news, “neutral” analysis. And then Reddit produces “unbelievable opportunity,” “everyone is buying,” and the market forgets what caution and rationality mean. In this project, I did not look at prices but at words. At how the brain’s cognitive tricks live inside texts: reports, news, and social media.

I collected three layers of financial reality from 2020–2025: reports from U.S. public companies (10‑K and 10‑Q filings from SEC EDGAR), financial news, and large sets of Twitter and Reddit posts focused on the market. Reports are the official stage, where every word passes through lawyers and compliance. News acts like a dramatist, turning the same numbers into either “record-breaking performance” or “worst decline since 2008.” Social media is more like a bar where everyone speaks loudly, emotionally, and often with much more confidence than the evidence justifies. All three describe the same market, but in the language of different cognitive biases.

What matters here is not the words alone, but the difference between genres: compliance language, media framing, and crowd rhetoric may describe the same market, yet they produce different behavioral signals.

To make these biases measurable rather than merely intuitive, I selected ten key phenomena from behavioral finance and translated them into specific lexical patterns. For example, overconfidence on social media often appears through markers such as “I’m 100% sure,” “guaranteed profit,” “can’t lose,” “no way this goes down,” and “this is a sure thing.” Herd behavior surfaces in texts with phrases like “everyone is buying,” “we all know,” “join us,” “don’t miss out,” and “the whole market is in,” where “we” suddenly sounds wiser than any individual. Framing in news coverage lives in formulations such as “only minor correction” instead of “double-digit drop,” “growth opportunity” instead of “high risk,” or in emphasis on “stability” and “resilience” even when the story is actually about decline.

The list also included anchoring — constructions like “from 52‑week high,” “compared to the peak,” and “since all-time high,” which tie perception to a reference point as if it were normal. For loss aversion, relevant phrases include “protect your capital,” “avoid drawdown,” “not willing to lose a single dollar,” and “can’t afford any loss.” Information overload appears in texts that stress “too much data,” “endless news flow,” “overwhelmed with information,” and “no time to process all reports,” all of which directly connect to the way the brain starts economizing on depth of analysis. A separate category is excessive trust in algorithms, expressed through phrases like “the bot knows better,” “the model is always right,” “just follow the algo,” and “AI already figured it out.”
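To make the idea of "lexical patterns" concrete, here is a minimal sketch of a dictionary-based matcher. It is an illustration under stated assumptions, not the lexicon or code used in the project: the names `BIAS_LEXICON`, `normalize`, and `count_markers` are hypothetical, and the phrase lists are a small subset of the examples quoted above.

```python
import re

# Illustrative subset of the phrases quoted in the text. The dictionary
# layout and all names here are assumptions, not the project's real lexicon.
BIAS_LEXICON = {
    "overconfidence": ["i'm 100% sure", "guaranteed profit", "can't lose", "sure thing"],
    "herding": ["everyone is buying", "we all know", "join us", "don't miss out"],
    "anchoring": ["52-week high", "compared to the peak", "all-time high"],
    "loss_aversion": ["protect your capital", "avoid drawdown", "can't afford any loss"],
    "algo_trust": ["the bot knows better", "just follow the algo", "the model is always right"],
}


def normalize(text):
    """Lowercase, straighten curly apostrophes and non-breaking hyphens, squeeze whitespace."""
    text = text.lower().replace("\u2019", "'").replace("\u2011", "-")
    return re.sub(r"\s+", " ", text)


def compile_patterns(lexicon):
    """Build one regex per bias that matches any of its phrases as whole phrases."""
    patterns = {}
    for bias, phrases in lexicon.items():
        alternatives = "|".join(re.escape(p) for p in phrases)
        patterns[bias] = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")
    return patterns


PATTERNS = compile_patterns(BIAS_LEXICON)


def count_markers(text):
    """Return the number of marker hits per bias for a single text."""
    clean = normalize(text)
    return {bias: len(pattern.findall(clean)) for bias, pattern in PATTERNS.items()}


post = "Everyone is buying and you can\u2019t lose. Guaranteed profit!"
print(count_markers(post))
```

A matcher like this only sees surface phrases. It cannot tell irony from conviction, which is why the counts are treated as raw material for further measurement rather than as a verdict.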

The next step was to place a ruler over all of this. Counting separate words tells us very little if we do not distinguish between a bias that appears rarely but strikes at full intensity and one that exists everywhere as low-level noise. For each bias in each channel, I first looked at the share of texts containing characteristic markers, and then at how densely those markers appeared where they were already present. One report with a single mention of “risk” is one thing; dozens of Reddit posts saying “we all know this will go to the moon” and “just buy, don’t overthink” are something completely different, even if a raw marker count looks similar.
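The two measurements described above can be sketched as follows. This is a hedged illustration: the function name `prevalence_and_density`, the choice to normalize density per 1,000 tokens, and the sample numbers are all assumptions made for the example, not values from the study.

```python
def prevalence_and_density(docs):
    """Compute two numbers for one bias in one channel.

    docs: list of (marker_count, token_count) pairs, one pair per text.
    prevalence: share of texts that contain at least one marker.
    density: markers per 1,000 tokens, measured only over texts that
             contain at least one marker (an assumed normalization).
    """
    if not docs:
        return 0.0, 0.0
    hits = [(m, t) for m, t in docs if m > 0]
    prevalence = len(hits) / len(docs)
    tokens = sum(t for _, t in hits)
    markers = sum(m for m, _ in hits)
    density = 1000 * markers / tokens if tokens else 0.0
    return prevalence, density


# Made-up numbers, for illustration only.
reports = [(3, 40000), (2, 35000), (4, 50000)]   # long filings, a few markers each
posts = [(0, 30), (2, 25), (0, 40), (3, 20)]     # short posts, some with no markers

print(prevalence_and_density(reports))
print(prevalence_and_density(posts))
```

With these invented inputs, every report contains a marker but the density is a fraction of one marker per 1,000 tokens, while only half of the posts contain a marker but the density in those posts is far higher. That is the distinction a raw count hides.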

After running NLP across the full corpus and manually reviewing part of the material, several interesting patterns appeared. Statistically, reports turned out to be “full” of bias markers: almost every filing contained at least one. But when density is examined, the layer is thin. Very thin. There is a lot of “risk,” “uncertainty,” “volatility,” and “potential,” but these words sit inside constructions like “we are subject to market risk” or “there is potential impact of volatility.” This is the language of protection, not panic. That is how an effect emerged which I came to think of as the “report anomaly”: almost every text is tagged by markers, yet the real emotional intensity remains low.

On social media, the picture is different. Complex formulas are rare there, but categorical judgments are much more common. When the analysis detects clusters of texts where “guaranteed,” “you can’t lose here,” “everyone is all-in,” “if you miss this, you’ll regret forever,” and “the bot already backtested this strategy” appear together, the density of overconfidence, herd behavior, and algorithmic trust shoots upward. This is not background noise but bursts: on ordinary days the feed may look relatively calm, yet in moments of hype or fear the language changes sharply, and that shift itself becomes a behavioral signal.
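One simple way to turn "bursts" into something detectable is to compare each day's marker density with its own recent baseline. The sketch below uses a trailing-window z-score. It is a swapped-in, generic technique chosen for illustration, not a method the project is known to have used, and the name `find_bursts`, the 7-day window, the threshold of 3, and the `min_sigma` floor are all assumptions.

```python
from statistics import mean, stdev


def find_bursts(daily_density, window=7, threshold=3.0, min_sigma=0.1):
    """Return indices of days whose density is more than `threshold` standard
    deviations above the mean of the preceding `window` days.

    min_sigma is an assumed floor on the standard deviation so that a
    perfectly flat baseline does not flag every tiny increase.
    """
    bursts = []
    for i in range(window, len(daily_density)):
        history = daily_density[i - window:i]
        mu = mean(history)
        sigma = max(stdev(history), min_sigma)
        if (daily_density[i] - mu) / sigma > threshold:
            bursts.append(i)
    return bursts


# Invented daily densities: a calm feed with one sharp spike on day 8.
series = [2.0, 2.5, 1.8, 2.2, 2.1, 1.9, 2.3, 2.0, 9.5, 2.1]
print(find_bursts(series))
```

Because the baseline is each channel's own recent history, a feed that is always loud does not trigger constantly. Only a sharp change relative to its usual level counts as a signal.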

In news coverage, the language is more controlled, but it is not neutral either. Here framing does much of the work: “only temporary setback” instead of “significant decline,” “investors take profits” where mass exits are actually occurring, “strong fundamentals despite short-term volatility” even in cases where the underlying figures look much less impressive. In the thesis, several subtypes of framing stood out: goal-oriented framing (“on track to achieve long-term targets”), avoidance framing (“measures to avoid further losses”), positive framing (“solid growth,” “resilient performance”), and marketing framing (“exclusive opportunity,” “unique market position”). Taken together, this creates a stable informational environment in which facts never arrive alone: they arrive already dressed in a desired mood.

When these three layers are combined, it becomes clear how easily a simple model can be misled. It sees “risk” and “uncertainty” in a report and counts them as bias or fear. It sees a few meme-like phrases in a thread and may simply add them to the same tally. Without context, a report can look as “emotional” as a wave of FOMO, even though in one case we are reading legal armor, and in the other, collective conviction that “it is impossible to be wrong here.”

A question for you: do you also use AI models to “read the market” faster — to scan reports, pull signals from news, and summarize social media? That is normal; it is 2026 after all. But there is a catch. An algorithm that cannot distinguish genre, hear irony, or recognize behavioral language patterns becomes one more participant in herd behavior — just without the habit of doubt. It catches a word and instantly concludes something about “market sentiment” without asking whether this is really a signal that “everyone is running,” or just another paragraph in the Risk Factors section of a 10‑K.

These models process texts, build “fear,” “greed,” and “optimism” indices, highlight things in red and green, and suggest where to click. If their view of language is flat, they repeat the same cognitive mistakes as humans, only faster. The algorithm does not get tired or anxious, but it also does not ask itself whether this is truly a change in sentiment or simply the style of the document. This is exactly where an additional layer becomes necessary: one that distinguishes genre, context, and behavioral patterns, rather than just counting words.

How can this be used?

For FinTech teams and banks, this is a matter of system architecture. Models need to be trained to distinguish the language of compliance in reports from the language of behavioral spikes in social media and news. Dictionary-based indicators are most useful where language is “alive” — in news texts and public discussion — as early detectors of herd waves and overconfidence. For reports, they are better combined with contextual models such as FinBERT or specially trained LLMs, plus mandatory qualitative review. That makes it possible to build separate profiles of herd behavior, framing, and algorithmic trust across channels, without penalizing a company simply for complying honestly with regulatory requirements.
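As a rough sketch of what "genre-aware" could mean in code, the example below routes each document to a different chain of analysis steps and keeps bias profiles separate per channel. Everything here is hypothetical: the `Document` class, the `ROUTES` table, the step names, and both functions are invented for illustration, and the contextual-model step is only a label, not a call into FinBERT or any real library.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Document:
    genre: str  # assumed labels: "report", "news", or "social"
    text: str


# Hypothetical routing table: lexicon indicators alone for "live" language,
# lexicon plus a contextual model plus human review for filings.
ROUTES = {
    "social": ["lexicon"],
    "news": ["lexicon"],
    "report": ["lexicon", "contextual_model", "manual_review"],
}


def plan_analysis(doc):
    """Return the ordered list of analysis steps for a document's genre."""
    if doc.genre not in ROUTES:
        raise ValueError(f"unknown genre: {doc.genre!r}")
    return list(ROUTES[doc.genre])


def build_profiles(records):
    """Average scores per (genre, bias) so channels are never pooled together.

    records: iterable of (genre, bias, score) tuples.
    Returns {genre: {bias: mean score}}.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for genre, bias, score in records:
        grouped[genre][bias].append(score)
    return {
        genre: {bias: sum(scores) / len(scores) for bias, scores in biases.items()}
        for genre, biases in grouped.items()
    }


print(plan_analysis(Document("report", "We are subject to market risk.")))
print(build_profiles([("social", "herding", 4.0), ("social", "herding", 2.0),
                      ("report", "herding", 0.1)]))
```

The design choice worth noting is that scores from reports and from social media are never summed into one index, so a filing's compliance language cannot be mistaken for a herd wave.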

Figure 1: to be added from the thesis
Figure 2: to be added from the thesis
Figure 3: to be added from the thesis
Figure 4: to be added from the thesis

In the end, this case is about language. And about the mind. And about how both interact with money. It is about the cognitive patterns embedded in texts, patterns that move markets no less than macro data. And it is about the fact that in 2026, the advantage belongs not to the person with more information, but to the one who better understands the lenses through which that information reaches both them and their models.