I’ve been testing a new internal data assistant concept built around a simple but powerful idea: instead of asking an AI model to know everything, we use curated workspaces with access to the database and the right (limited) context. The assistant I tested was developed entirely by technicians iterating Radically Rapidly within the framework of Automattic’s RSM, and the UI was (rara avis) nice, clean, and stylish.
I think it’s brilliant. Still, you could say the idea of curated workspaces sounds obvious. But when you work with AI, the obvious is exactly where the headaches come from.
What this curated workspaces tool is, conceptually
At a high level, the tool is a conversational interface for curated knowledge. You can think of it as a system in which an assistant does not rely solely on its base-model knowledge but on a defined body of context tailored to a given domain.

That domain-specific context lives in what’s called a workspace.
A workspace describes the universe: which concepts matter, which documents or structured knowledge should be used, and how to interpret the user’s requests, including examples of requests and how to handle them. It is conceptually clean because it separates three things that often get mixed up:
- the interface where the conversation happens,
- the model that generates responses, and
- the context that makes those responses useful.
And this matters because we are collectively discovering, once again, that the model is not the product. The problem framing, the context, the structure, the assumptions, and the usability of the answer matter just as much, and often more.
If you’ve been following the evolution of AI tools, this should feel familiar. Agents can infer a great deal from artifacts, prototypes, code, examples, and tests. What they still struggle to infer reliably is tacit knowledge and, above all, intent: the why, and the unstated rules humans rely on when they say “you know what I mean.” That last phrase, by the way, has probably financed half the hallucinations on the internet.
The workspace contract
The workspace description is built on the “card” building block. A card is a single Markdown document that represents one discrete unit of knowledge: a metric definition, a source-table reference, a concept explanation, or a query pattern. Each card lives inside a category subfolder in the workspace.
There are different categories of cards, with different rules in the manifest.yaml. Some are eager categories, whose cards are loaded automatically, which is useful for key vocabulary, scope rules, or mandatory checks. Others are on-demand categories, where most of the content lives, and whose cards are discovered as they are needed.
Other rules:
- When the model lists cards, it sees the frontmatter (title, description, tables, and every metadata key) but not the Markdown body. So make your frontmatter crystal clear and accurate.
- The unique key for a card is (workspace, category, card_name).
- There is a parser that you can run locally to catch errors early.
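To make the contract above concrete, here is a minimal sketch in Python. It is not the tool's actual parser. The `---` delimiters, the flat `key: value` frontmatter, and every name in the code (`CardKey`, `split_frontmatter`, `list_cards`, `lint_card`, `initial_context`) are assumptions made for illustration. Only the frontmatter fields and the (workspace, category, card_name) key come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardKey:
    """Unique key for a card: (workspace, category, card_name)."""
    workspace: str
    category: str
    card_name: str


@dataclass
class Card:
    key: CardKey
    frontmatter: dict
    body: str


def split_frontmatter(text: str) -> tuple[dict, str]:
    """Split a card into (frontmatter, body).

    Assumption: frontmatter is a block of flat `key: value` lines between
    two `---` lines. Real YAML would need a real YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("card has no frontmatter block")
    end = None
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            end = i
            break
    if end is None:
        raise ValueError("frontmatter block is not closed")
    meta = {}
    for line in lines[1:end]:
        if not line.strip():
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        meta[name.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


def list_cards(cards: list[Card]) -> list[dict]:
    """What a listing exposes: key parts and frontmatter, never the body."""
    return [
        {
            "category": c.key.category,
            "card_name": c.key.card_name,
            "frontmatter": dict(c.frontmatter),
        }
        for c in cards
    ]


def lint_card(card: Card, required=("title", "description")) -> list[str]:
    """Hypothetical local check: flag missing or empty frontmatter fields."""
    return [
        f"{card.key.card_name}: missing or empty '{name}'"
        for name in required
        if not card.frontmatter.get(name)
    ]


def initial_context(cards: list[Card], eager_categories: set[str]) -> list[str]:
    """Bodies of cards in eager categories; everything else stays on demand."""
    return [c.body for c in cards if c.key.category in eager_categories]
```

The point of the sketch is the asymmetry: `list_cards` never returns a body, so whatever the model needs in order to decide whether to open a card has to be in the frontmatter.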
My test
I approached the test from a deliberately simple domain. Or, to be accurate, from a domain that looks simple until you start trying to explain it. I wanted to find out whether the simplicity is real or a disguise.
Spoiler: my first conclusion was maybe a truism, but I appreciated the reminder: you get familiar with something when you acquire the tacit knowledge implicit in it! Or, in other words, tacit knowledge makes things familiar.
The assistant was sometimes impressive, often promising but not fully accurate (in both cases far better than a model without this layer), and occasionally wrong in super revealing ways. Not random wrong. Not “the model is stupid” wrong. More like: “Ah, I can’t believe it! I didn’t tell the model about this crucial thing?”
What I found is that even in a relatively simple domain, there is a surprising amount of tacit knowledge: rules, exclusions, category boundaries, interpretive conventions, and little “obvious” assumptions that are not obvious at all unless you already live inside the system. In other words, the challenge was not feeding facts to compose the workspace. The real challenge was realizing the extent of implicit expert knowledge you have and use, even in the simplest reasoning arena.
Making the user aware of assumptions
I also noticed a second pattern. In several cases, the tool made a reasonable assumption, produced an answer, and then mentioned the assumption in a note or footnote. Flawless. Flawless for agents, but not for humans.
Why? Because users scan.
User scanning is nothing new. Jakob Nielsen’s classic research on web reading (How Users Read on the Web) found that most users do not read pages word by word; they scan for salient information instead (in that study, 16% read word by word while 79% scanned). This was already true in 1997, and the situation has only gotten worse as the attention-scarcity economy we live in has grown more and more pervasive.
So when an assistant answers first and clarifies assumptions later, there is a very real chance that the user will consume the answer and miss the caveat. The system can be transparent and still misleading in practice. Which leads to a very simple conclusion: in cases where assumptions materially affect the answer, the assistant should confirm the assumption before answering, not after. Yes, that makes the interaction a bit slower. But slower and more trustworthy beats fast and subtly/invisibly wrong.
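The “confirm before answering” rule can be sketched as a small gate in front of the answer. This is only an illustration under stated assumptions: the `Assumption` type, the `material` flag, and the `respond` function are hypothetical names, and deciding which assumptions are material is the hard part that this sketch takes as given.

```python
from dataclasses import dataclass


@dataclass
class Assumption:
    description: str
    material: bool  # True when a different choice would change the answer


def respond(answer: str, assumptions: list[Assumption], confirmed: bool = False) -> str:
    """Ask first when a material assumption is unconfirmed; otherwise answer."""
    material = [a for a in assumptions if a.material]
    if material and not confirmed:
        # The answer is withheld, so the user cannot skim past the caveat.
        questions = "\n".join(f"- {a.description}" for a in material)
        return "Before I answer, please confirm:\n" + questions
    # Minor assumptions can still travel as a trailing note.
    notes = [a.description for a in assumptions if not a.material]
    if notes:
        return answer + "\n\nNote: " + "; ".join(notes)
    return answer
```

The cost is one extra turn whenever a material assumption is in play. The footnote survives, but only for assumptions that would not change the conclusion.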

Text usability matters
Related to this, I kept thinking about reading usability. If the goal is not merely to generate text but to land information in the user’s head, answer formulation matters. Structure matters. Salience matters. Ordering matters. We should care not only about whether the assistant knows something, but also about whether it presents that something in a way humans can actually absorb.
That’s why this second Nielsen Norman Group piece also feels relevant here: Measuring the Usability of Reading on the Web.
So the test did not make me think “the model failed.” It shifted my thinking from “the workspace is the product” to “the product is the workspace plus the conventions, the prompts, the explicit assumptions, the evaluation questions, and the release process and format.”
Some -already known- conclusions
1. The workspace concept is solid.
I genuinely like this direction. A lot. It is clean, useful, and potentially powerful not only for internal expert workflows but also, eventually, for broader users. Especially in domains where people need to explore structured outputs, compare categories, and navigate a body of knowledge in a conversational way.
It’s interesting. This is how the AI market originally started: with tools built on top of models and specialized for a given use (adding a lot of tacit knowledge). This was the idea behind theresanaiforthat and some tools I used for a while, like copy.ai. But I feel this path is being abandoned, and instead we’re using general models directly, adding tailored layers on top.
2. Tacit knowledge is the monster under the bed.
This was the main learning for me. Even the simplest-looking domain contains a large amount of tacit knowledge that is hard to capture because experts do not always realize they are using it. You often only discover it when the assistant gets something “wrong” in a way that reveals a missing rule, a missing distinction, or a missing purpose.
3. AI without curated context is token-costly.
The interface and infrastructure can be excellent, but without a curated, well-structured context, the result will not be trustworthy enough. The token bill will grow without yielding better results.
4. Assumptions should be surfaced before the answer.
If a conclusion depends on a choice at a fork, the system should ask. Footnotes are noble. Footnotes are civilized. Footnotes are also frequently ignored.
5. We need a staging process before workspaces are treated as mature.
This is probably the most operational conclusion. If workspaces become queryable before they are sufficiently tested, we risk creating confident but immature assistants. I think these systems need something like a staging environment where workspaces can be iterated on, tested by domain experts, challenged with canonical questions, and only then promoted.
Not every workspace should go straight to production just because it technically works.
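One way to picture such a staging gate is a set of canonical questions, each paired with a check written by a domain expert, that a workspace must pass before promotion. The sketch below is hypothetical throughout: `CanonicalQuestion`, `failed_questions`, `can_promote`, the `ask` callable standing in for the assistant, and the pass-rate threshold are illustrative names and choices, not part of the tool I tested.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CanonicalQuestion:
    question: str
    check: Callable[[str], bool]  # expert-written predicate on the answer


def failed_questions(ask: Callable[[str], str],
                     questions: list[CanonicalQuestion]) -> list[str]:
    """Run every canonical question and return the ones whose check fails."""
    return [q.question for q in questions if not q.check(ask(q.question))]


def can_promote(ask: Callable[[str], str],
                questions: list[CanonicalQuestion],
                owner_approved: bool,
                min_pass_rate: float = 1.0) -> bool:
    """Promote only with tests, a sufficient pass rate, and owner sign-off."""
    if not questions:
        return False  # an untested workspace is never considered mature
    failures = failed_questions(ask, questions)
    pass_rate = 1 - len(failures) / len(questions)
    return bool(owner_approved and pass_rate >= min_pass_rate)
```

The failing questions are as useful as the verdict: each one points at a rule, distinction, or purpose that still has to be written down in a card.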
6. Ownership matters.
A workspace should have an owner, and that owner should be accountable for its maturity, its test coverage, its boundaries, and the final “good enough to release” decision. This reminds me a bit of the classic idea of the domain expert in expert systems: not just someone who knows facts, but someone who knows the rules of good guessing.
7. Once again: focus on the problem, not the solution.
The temptation with AI is to become overly laconic and assume the model will infer our intent. Sometimes it will. Sometimes it will infer a different intent with great confidence and lovely formatting. The real work is still the same old work: clarify the problem, clarify the purpose, clarify the context.
That is not a limitation of AI. That is a reminder about AI thinking.
So yes, I came out of the test enthusiastic. Not because the system is magic, but because it is not pretending to be magic. It points in a direction I find much more mature: better context, clearer boundaries, better questions, better release discipline.
And honestly, I love that.

