PEOPLE DATA | AI | REMOTE LEADERSHIP & LEARNING

AI Data Querying: The Holy Grail That Can Build or Break Data Trust

The AI Data Querying Problem (Or: How to Lose Faith in Data in 3 Easy Steps)

Right now, the biggest mess with AI tools that query data isn't that they're stupid; it's that they're inconsistently stupid. The main problem with the current state of the art in AI tools that query data is consistency expectations. In other words: what happens when two people ask for the same thing and get different answers? Or when they ask for something different and get the same result? (Bearing in mind that, depending on the context of the question and the person, two apparently different questions can be asking for the same thing, and vice versa.)

There are a good number of big companies and startups working on the holy grail of an "AI asking data" app or plugin. I've had the chance to chat with some of them, and they're all chasing the perfect guardrails to avoid the perfect catastrophe scenario: if the AI answers in an unsatisfactory way, chances are high that the stakeholder will lose faith not in the AI, but in your data. And chances are equally high that if they act on those wrong answers, the impacts will be terrible, especially when dealing with People Data (legal implications included).

After trying AI generators for a while, I went to Openverse and I found "Original Database" by shindoverse is licensed under CC BY-NC-SA 2.0.

Guardrails (AKA GIVE YOUR GUN TO A MONKEY AND ASK THEM NOT TO SHOOT YOU)

So what kind of protective measures have I heard about from these startups? Off the top of my head:

  • Data diet restrictions (LMAO at this title): only feeding the AI absolutely well-documented (what does that mean? Is AI documentation the same as human documentation?), AI-vetted, clean, non-problematic datasets (basically putting it on a data cleanse).
  • Elephant memory for past questions: keeping a history of questions and answers to avoid contradicting previous responses. I wonder what happens if I must contradict a previous response because that one was wrong and the new one is right 🤔
  • Ensure we all talk the same language, even the same jargon: data people, stakeholders and the AI tool MUST understand the same thing when reading a question… and an answer. This means creating shared dictionaries and rules (and ensuring we all read, understand and use them), and rewriting user questions into less ambiguous terms (with explicit user approval, for sure). These common-language dictionaries tend to be forgotten or left unmaintained once the first stage of the project is validated… and that is a huge risk.
  • Expectation management: working with users beforehand to set realistic expectations about what they'll get.
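To make the "elephant memory" and "same language" ideas concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `GLOSSARY` entries, the `AnswerCache` class and its `invalidate` escape hatch are illustrative names, not any vendor's actual API. The point is that two differently-worded questions can normalize to the same canonical form, and the cache then returns the same answer for both, with an explicit way to retract an answer that turned out to be wrong.

```python
import hashlib
import re

# Hypothetical shared dictionary: maps jargon to canonical terms.
# In a real deployment this lives next to the data documentation
# and must be actively maintained (the "huge risk" above).
GLOSSARY = {
    "headcount": "employee_count",
    "leavers": "turnover_rate",
    "attrition": "turnover_rate",
}

def normalize_question(question: str) -> str:
    """Rewrite a user question into canonical, less ambiguous terms."""
    words = re.findall(r"[a-z']+", question.lower())
    return " ".join(GLOSSARY.get(w, w) for w in words)

class AnswerCache:
    """Elephant memory: the same normalized question always gets the
    same answer, until someone explicitly invalidates it as wrong."""

    def __init__(self):
        self._store = {}

    def _key(self, question: str) -> str:
        return hashlib.sha256(normalize_question(question).encode()).hexdigest()

    def get(self, question):
        return self._store.get(self._key(question))

    def put(self, question, answer):
        self._store[self._key(question)] = answer

    def invalidate(self, question):
        # Escape hatch for "the previous answer was wrong".
        self._store.pop(self._key(question), None)

cache = AnswerCache()
cache.put("What is our headcount?", 1200)
# A differently-worded question normalizes to the same key:
print(cache.get("what is our  HEADCOUNT"))  # -> 1200
```

Real systems would normalize with an LLM rewrite step (and user approval of the rewrite) rather than a word lookup, but the consistency contract is the same: identical intent in, identical answer out.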

And the most critical safety measure of all, IMO: if we don't have these guardrails in place, well tested and fine-tuned, we must let users know that using AI to interrogate data is basically a recipe for disaster. It doesn't matter whether they're using an app built in-house, an AI layer over Looker, or whatever shiny tool is trending this week.

Don't get me wrong: I'm not anti-AI-data-querying. It's just that, at this moment, the maturity of this particular field makes me think that every week we wait before users/stakeholders start interrogating data puts us in a better scenario. It's a matter of just a bit of time, but patience is, ATM, a truly scarce resource.

Epilogue. What does the AI tool say about AI tools querying databases?

I asked the tool:

Yes, it’s feasible and increasingly common, but requires careful implementation.

Key considerations:

✅ Potential Benefits:

  • Democratizes data access
  • Reduces bottleneck on data teams
  • Faster insights for business users

⚠️ Critical Guardrails Needed:

  1. Data Access Controls
    • Role-based permissions (users only query what they’re authorized to see)
    • Row-level security enforcement
  2. Query Validation
    • Review AI-generated SQL before execution
    • Set query cost/timeout limits to prevent expensive queries
    • Prevent writes/deletes (read-only access)
  3. Accuracy Safeguards
    • AI can hallucinate table names, columns, or relationships
    • Need robust schema documentation in prompts
    • Show users the actual SQL generated for transparency
  4. Data Quality
    • Clean, well-documented schema is essential
    • Clear business glossary/data dictionary
    • Standardized naming conventions
  5. User Education
    • Train users to validate results
    • Set expectations about AI limitations
    • Provide feedback mechanism for errors

Bottom line: Start with a pilot group, limited scope, and strong validation layers. Monitor usage and iterate. Don’t give unrestricted access without these guardrails—that’s a recipe for disaster.
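The tool's points 2 and 3 (query validation, read-only access, cost limits, showing the generated SQL) can be sketched as a thin validation layer. This is a naive illustration, not a production guard: real systems enforce read-only access with database permissions and a proper SQL parser, not regexes. The function name `validate_sql` and the default `max_limit` are my own assumptions.

```python
import re

# Hypothetical guard over AI-generated SQL: single statement, read-only,
# and a hard LIMIT so a runaway query cannot scan the whole warehouse.
# NOTE: regex filtering is easy to bypass; use DB-level read-only roles too.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|create|grant)\b", re.I
)

def validate_sql(sql: str, max_limit: int = 10000) -> str:
    statement = sql.strip().rstrip(";")
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if FORBIDDEN.search(statement):
        raise ValueError("only read-only SELECT queries are allowed")
    if not statement.lower().startswith("select"):
        raise ValueError("query must be a SELECT")
    # Cost control: force a row limit if the AI did not add one.
    if not re.search(r"\blimit\s+\d+\b", statement, re.I):
        statement += f" LIMIT {max_limit}"
    return statement  # show this final SQL to the user for transparency

print(validate_sql("SELECT name FROM employees"))
# -> SELECT name FROM employees LIMIT 10000
try:
    validate_sql("DELETE FROM employees")
except ValueError as e:
    print(e)  # -> only read-only SELECT queries are allowed
```

Echoing the validated SQL back to the user covers the transparency point: if the AI hallucinated a table or column, the stakeholder (or the database) can catch it before trusting the answer.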

