Is ChatGPT good at day trading? Probably not

Here’s another one of those papers.

Can ChatGPT Generate Stock Tickers to Buy and Sell for Day Trading?

The answer is yes, obviously, though a better question is whether it should.

That’s what interests Sangheum Cho, of the World Bank’s Development Research Group, who’s been using ChatGPT to pick buy-sell portfolios after feeding it with fintwit. His headline results look impressive — “The trading strategy . . . earns significant long-short returns in open-to-close intraday trading,” Cho writes — though the detail needs some consideration.

Testing generative AI’s ability to predict equity market returns is already a busy area of academic research. Previous papers such as this one have shown that the machine can generate stock-specific buy and sell signals from news headlines with moderate success.

Cho seeks instead to find out whether generative AI can build a long-short portfolio based on the prevailing mood. Tweets from Bloomberg and the Wall Street Journal provide the feedstock, all from a period too recent to be included in ChatGPT’s training. Only approximately 7 per cent of the tweets mention specific shares, Cho says, so the AI is mostly responding to vibes.

It did fairly well:

Using prompts specifying ChatGPT to pretend to be a day trader and to generate stock tickers to buy and sell for day trading, I show that the trading strategy of buying stocks in the buy list and selling stocks in the sell list earns 0.546% per month after adjusting the market factor. Some stocks can be both in buy and sell lists. By strengthening the signals by netting the buy and sell signals, the strategy based on the improved signals earns 3.717% per month.
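For scale, the monthly figures can be put on a per-day footing. This is a rough sketch, assuming about 21 trading days a month (our assumption, not a figure from the paper) and simple compounding; `daily_equivalent` is a hypothetical helper, not anything Cho uses.

```python
def daily_equivalent(monthly_return, trading_days=21):
    """Daily return that compounds to monthly_return over trading_days.

    trading_days=21 is an assumed round number, not taken from the paper.
    """
    return (1 + monthly_return) ** (1 / trading_days) - 1


# The two monthly figures quoted from the paper, as decimals.
for label, monthly in [("raw lists", 0.00546), ("netted signals", 0.03717)]:
    daily = daily_equivalent(monthly)
    print(f"{label}: {monthly:.3%} a month is about {daily * 1e4:.1f} basis points a day")
```

Because the strategy opens and closes every position each day, the per-day number is also roughly the daily round-trip cost at which the edge would disappear.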

A peer reviewer might consider it a red flag that the computer wanted to buy and sell the same stock. Duplications make the whole exercise appear a bit random and fortune-teller-ish.

It’s an impression not put to bed by a detailed look at the portfolios. Here’s what ChatGPT suggested trading one day in mid-January, 2023.

BUY: [AAPL, GOOGL, AMZN, MSFT, TSLA, FB, NVDA, JNJ, BRK-B, JPM, V, PG, UNH, MA, INTC, ADBE, HD, VZ, DIS, NFLX, PYPL, CRM, XOM, CMCSA, CSCO, PEP, ABBV, ABT, BAC, KO, T, MRK, NKE, NVAX, TMO, WMT, CVX, PFE, IBM, ACN, COST, BMY, AMD, GOOG, TMUS, NSRGY, DHR, LOW, NEE, MMM, QCOM, ORCL, UPS, AMGN, DE, SO, LIN, BA, GS, COP, BHP, DHR, CAT, EL, NOW, TT, SPOT, DOCU, ZM, CSCO, WDAY, MU, PTON, COIN]

SELL: [AMC, GME, BB, ACB, PLUG, VXRT, RIOT, MVIS, SPCE, RBLX, COIN, TIGR, WKHS, SPAQ, CCIV, ZEV, JAGX, XPEV, FUBO, OCGN, NIO, TSLA, ARA, APH, MAXR, CGC, FUV, QS, RMO, ABNB, NKLA, TLRY, CBAT, DM, FSLY, SMFR, IPOF, ALPP, LCID, GOEV, FCEL, RIDE, FSLR, CLII, CAN, CPNG, DADA, DIDI, NNDM, LMND, ROOT, NFLX, TWTR, SQ, PTON, NET, HOOD, CLOV, MTTR, ROKU, DKNG, CRWD, BILL, DDOG, OPEN, UPST]

Day-trading all 140 positions would be a challenge the paper does not test. The broad view nevertheless gives a sense of what ChatGPT leans towards: there’s an overrepresentation in the “sell” basket of publicity-friendly small caps and joke stocks (AMC, GameStop, BlackBerry, Virgin Galactic), while all seven magnificents are in a tech-heavy “buy” basket. A disinterested person who skimmed headlines from the mainstream US financial press in early 2023 would probably be making similar choices.

A disinterested person is unlikely to sell Twitter, however, since it had delisted three months earlier. Also present in both ChatGPT portfolios are obscure ETFs, ADRs and foreign-quoted stocks it was specifically told to exclude. Several tickers, including CLII and SMFR, appear not to exist at all.

Hallucinations, a recurring feature of generative AI, are unhelpful here since the argument being made is about applied intelligence. Is ChatGPT picking Dada Nexus (DADA) and Didi Global (DIDI) because of some hunch about China or because its training includes Thai-pop?

The garbage-out problem means the portfolios need a lot of sanitising. For each day Cho repeats his “pretend to be a trader” request 30 times and selects only the most common recommendations. Duplicates in the baskets are netted off and performance is subdivided into “overnight news” and “intraday news”, for reasons we’ll come back to.
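The clean-up can be sketched in a few lines. This is a minimal illustration rather than Cho’s actual procedure: it assumes each of the repeated requests returns one buy list and one sell list, scores every ticker as buy mentions minus sell mentions, and keeps the top `k` on each side. The function names, the scoring rule and `k` are our assumptions.

```python
from collections import Counter


def net_scores(buy_runs, sell_runs):
    """Score each ticker as (# runs listing it as a buy) - (# runs listing it as a sell)."""
    score = Counter()
    for run in buy_runs:
        score.update(set(run))      # set() so a ticker repeated within one run counts once
    for run in sell_runs:
        score.subtract(set(run))
    return score


def pick(score, k):
    """Return the k strongest net buys and the k strongest net sells."""
    # Ties are broken alphabetically so the output is deterministic.
    by_buy = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    by_sell = sorted(score.items(), key=lambda kv: (kv[1], kv[0]))
    longs = [ticker for ticker, s in by_buy if s > 0][:k]
    shorts = [ticker for ticker, s in by_sell if s < 0][:k]
    return longs, shorts


# Three hypothetical runs instead of 30, purely for illustration.
buy_runs = [["MSFT", "TSLA"], ["MSFT", "COIN"], ["MSFT", "TSLA"]]
sell_runs = [["TSLA", "GME"], ["TSLA", "GME"], ["TSLA", "COIN"]]
longs, shorts = pick(net_scores(buy_runs, sell_runs), k=2)
print(longs, shorts)
```

In this toy input, a ticker that shows up equally often on both sides nets to zero and drops out, which is the point of the netting step.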

Honing turns ChatGPT into a machine that has the confidence to go long Microsoft after seeing this tweet . . . 

https://twitter.com/business/status/1615707566268375041

. . . and will short Bed Bath & Beyond after seeing this tweet:

https://twitter.com/markets/status/1622967091887476736

Which is nice, and may even be useful in the (worryingly active) field of using AI to not read newspapers. It doesn’t say much about the paper’s main claim, however, which is that AI can trade profitably on mood alone.

For that, Cho uses as his example a slightly lower-than-expected US CPI reading. ChatGPT was keenest that day on Microsoft, Nvidia, Alphabet, Amazon and Paypal. Its sells were Halliburton, Norwegian Cruise Line, Royal Caribbean, United Airlines and Macy’s.

How does any of that work? Dunno. There’s no way of asking. Plausible reasons for those picks can be back-engineered — airfares were down that month, for example — though it’s all just freeform interpretation.

Does it work? Yes-ish:

According to Cho, outperformance “is mostly attributable to the short lag of the strategy, implying that ChatGPT [ . . . ] can process the bulk of seemingly non-firm-specific news to generate firm-specific mispricing signals.” In other words, the algo is making short-term connections between events and affected stocks more efficiently than the broader market.

Is it though?

As mentioned previously, the experiment splits everything by whether it’s during regular hours or aftermarket. Each new buy-sell list is generated at 9.30am, when trading begins, using the previous 24 hours of tweets. Anything tweeted after the market open has to wait until the next day.
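The timing rule is simple enough to write down. This sketch assumes timestamps are naive datetimes already expressed in exchange local time, and that a tweet stamped exactly 9.30am falls into the following day’s window; both are our assumptions, as are the function names.

```python
from datetime import date, datetime, time, timedelta

MARKET_OPEN = time(9, 30)  # regular-hours open, exchange local time


def signal_window(trading_day):
    """Return (start, end) of the 24 hours of tweets feeding that day's list."""
    end = datetime.combine(trading_day, MARKET_OPEN)
    return end - timedelta(hours=24), end


def tweets_for(trading_day, tweets):
    """Keep the text of tweets stamped inside [start, end) for the given day."""
    start, end = signal_window(trading_day)
    return [text for stamp, text in tweets if start <= stamp < end]


# Hypothetical tweets around one open, to show where the cut-off falls.
tweets = [
    (datetime(2023, 1, 17, 9, 29), "too old"),
    (datetime(2023, 1, 17, 9, 30), "overnight news"),
    (datetime(2023, 1, 18, 9, 29), "just before the open"),
    (datetime(2023, 1, 18, 9, 30), "waits until tomorrow"),
]
print(tweets_for(date(2023, 1, 18), tweets))
```

The sketch ignores weekends and holidays, where a strict 24-hour look-back would leave some tweets unused.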

Based on this setup ChatGPT was very good at finding stuff to short at the open, reasonable during regular market hours, and absolutely hopeless at overnight trading. The buys were consistently fine but unremarkable. Sells were dead money by 11am. Selling into after-hours over-optimism was where it made hay, much of which it gave back every evening.

There’s a recognised trend at work here. US retail traders overreact to news released after hours, a bunch of studies have found. Thin liquidity exaggerates overnight price moves that reverse during the subsequent trading day. The paper notes that mispricing was particularly pronounced among small-caps, which it calls “difficult-to-arbitrage stocks”.

What to conclude? Here are three possibilities.

Sam Altman’s spirit trumpet can whisper short-term stock recommendations by detecting thematic mispricings via means inexplicable.

Experiments that presume cost-free trading in illiquid markets are usually just identifying inefficiencies that remain only because, in real-world trading conditions, they’re not worth the cost or bother of pricing out.

ChatGPT is biased. It expresses positive views about large caps and negative ones about small caps. That’s a consequence of the media it’s trained on, which is biased to express positive views about large caps and negative ones about small caps. Between December 2022 and December 2023, the study’s sample period, these just happened to be good biases because US markets did this:

Day-trading by AI is an attractive idea. There’s plenty of evidence that humans are crap at it, so believing that computers will be better seems reasonable. Algo-driven investment is often sold as a way to mitigate human biases like overconfidence, impulsiveness, flawed risk perception and susceptibility to random reinforcement.

Stock picking by generative AI, however, replaces one trader’s biases with a murky crowdsourced bias soup.

Sometimes, under narrow conditions, it could work. AI’s biases form around a consensus, and in financial markets the middle ground is a relatively safe place to be. But remember also that to outperform most day traders in model conditions the algo’s output only needs to be random, for all the same reasons that blindfolded monkeys and shitting cows outperform most fund managers.

Are ChatGPT’s stock picks truly any better than random? We’re not convinced. But we’ve put CLII, SMFR, DIDI, DADA and TWTR on a watchlist in the hope of finding out.