
What if we get better results from our AI tools by being rude?

"Rude kangaroo!" by Tambako the Jaguar is licensed under CC BY-ND 2.0.

I just read a paper, “Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy”, by Om Dobariya and Akhil Kumar, which reports that very rude prompts can noticeably increase accuracy on multiple‑choice questions. Deeper cross-cultural studies, such as Yin et al. (2024), support this thesis in some circumstances and cultural settings (they compare English, Chinese, and Japanese)*. Both studies have clear methodological constraints and limited generalizability, but the results are still unsettling enough to take seriously.

(*) Note on languages and LLMs: most LLMs work directly in whatever language you write, but internally much of their processing is biased toward English‑like representations. Because training data and internal representations are often English‑centric, quality is usually highest when both prompt and content are in English, and slightly lower for many other languages.

  • Models like GPT or Claude are trained on many languages and can consume and generate text directly in multiple languages, without an explicit external translation step. That said, because training data is often dominated by English, internal reasoning often resembles “thinking with an English accent”: intermediate representations tend to be closer to English even when the surface language is different (https://www.emergentmind.com/topics/multilingual-large-language-models-mllms-f4b6b848-43ec-4c05-a186-02be70797581)
  • Recent analyses show that many multilingual LLMs make key decisions in an internal space that is closest to English, even when the input and output are in other languages; that is, they tend to form English‑centric intermediate representations before producing the final answer. Using tools like the logit lens (sketched just after this list), researchers show that when the model generates a sentence in another language (e.g. French), the internal representations first pass through English words for the key content terms (“boat”, “water”, “sun”, etc.) and only later get translated into the target language. Non‑lexical words (articles, prepositions, conjunctions) are usually handled directly in the prompt language, not routed via English (https://arxiv.org/html/2502.15603v1)
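For the curious, here is a minimal sketch of what a logit lens actually does: it projects each layer’s hidden state through the model’s own output head, so you can see which token each intermediate layer is “leaning toward”. I use GPT‑2 below purely because it is small and public; the cited work analyses much larger multilingual models, so this only illustrates the mechanics of the lens, not the multilingual finding itself.

```python
# Hypothetical logit-lens sketch; not the cited paper's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

inputs = tokenizer("The boat floats on the", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.hidden_states holds one tensor per layer: (batch, seq_len, d_model).
# The "lens": apply the final layer norm and the unembedding head to each
# layer's last-position state, and print the token that layer favours.
for layer, h in enumerate(out.hidden_states):
    logits = model.lm_head(model.transformer.ln_f(h[0, -1]))
    token = tokenizer.decode(logits.argmax().item())
    print(f"layer {layer:2d} -> {token!r}")
```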

Table: politeness levels in the Dobariya and Kumar study, with example prompt prefixes.

The setup is narrow: multiple‑choice questions, a specific model snapshot, and strong cultural and linguistic biases baked into both prompts and data. Models evolve quickly, and Yin et al. already show notably different behaviours between GPT‑3.5 and GPT‑4. We still do not understand the mechanism behind the ‘rudeness boost’: it could be that emotional tone is largely ignored as just another sequence of tokens, or that certain rude formulations happen to have lower perplexity and therefore look more “familiar” to the model… There are many open questions, yes. But where there is smoke, there may be at least a small fire.

Note: perplexity, roughly, measures how predictable a prompt is for the model: prompts with lower perplexity are those whose wording the model finds more familiar and easier to continue. If certain rude phrasings happen to have lower perplexity than their polite counterparts, that alone could partially explain why they perform better in some experiments. I wonder how these poor models were trained, if perplexity is the reason 🤔
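To make the idea concrete, here is a minimal sketch of how you could compare prompt perplexities yourself, assuming a Hugging Face causal LM. The model and the two example prompts are my own illustrative choices, not materials from either study.

```python
# Minimal perplexity-comparison sketch; gpt2 chosen only for size.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels == input_ids, the model returns the mean
        # cross-entropy loss over the (shifted) sequence.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

polite = "Could you please tell me the capital of France?"
rude = "Just answer already: what is the capital of France?"
print(f"polite: {perplexity(polite):.1f}   rude: {perplexity(rude):.1f}")
```

A lower number means the model finds that phrasing more “expected”; whether this is actually what drives the rudeness effect is, again, an open question.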

What if we get better results from our AI tools by being rude?

Well, I’d say that it is exactly the opposite of the communication culture I personally advocate for: encouraging, patient, understanding, friendly, sympathetic…

If rudeness gives us a few extra percentage points of accuracy or productivity, it may come at a cost. Using toxic, negative language shapes you. To speak with toxicity, sarcasm, or aggression, you first have to generate those emotions inside, and that sets your inner narrative: the world starts to look more irritating, more blame‑worthy, and less worth listening to. You might get slightly better answers from a model, but at the price of training yourself to be a more reactive, less generous version of you.

Here is the uncomfortable tension: our tools might sometimes respond better (keeping the study limitations in mind) to language we would never accept between humans. And we can be tempted to optimise for accuracy at any cost. Besides… just in case… let’s be kind to AI.

