Snowflake wants to give your AI agents a GPA. The company is not just grading the accuracy of an agent's answers. It is evaluating goals, plans and actions (GPA) with an open source framework that approaches human-level error detection and error localization accuracy.

The framework, called Agent GPA, was outlined at Snowflake's Build conference. For enterprises deploying agentic AI, the effort is worth a look.

In a blog post, Snowflake's AI Research team said evaluating AI agents comes down to trust. The team wrote:

"An agent’s answer may appear successful, but the path it took to get there may not be. Was the goal achieved efficiently? Did the plan make sense? Were the right tools used? Did the agent follow through? Without visibility into these steps, teams risk deploying agents that look reliable but create hidden costs in production. Inaccuracies can waste compute, inflate latency and lead to the wrong business decisions, all of which erode trust at scale."

Snowflake argued that current evaluation frameworks fall short because they focus on the final answer, not the process that produced it. Agent GPA, outlined in a paper, is available in TruLens.
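
To make the difference between grading an answer and grading the process concrete, here is a minimal Python sketch. It is not Snowflake's method and does not use the TruLens API. The `AgentTrace` type, the tool names and the string-matching checks are all hypothetical stand-ins chosen for illustration. It shows how a run can pass a final-answer check while still skipping part of its plan and repeating work.

```python
from dataclasses import dataclass


@dataclass
class AgentTrace:
    """Hypothetical record of one agent run (not a TruLens type)."""
    goal: str
    plan: list[str]      # steps the agent said it would take
    actions: list[str]   # tool calls the agent actually made
    final_answer: str


def plan_adherence(trace: AgentTrace) -> float:
    """Fraction of planned steps that appear among the executed actions."""
    if not trace.plan:
        return 0.0
    done = sum(1 for step in trace.plan if step in trace.actions)
    return done / len(trace.plan)


def redundant_actions(trace: AgentTrace) -> int:
    """Number of repeated actions, a rough proxy for wasted work."""
    return len(trace.actions) - len(set(trace.actions))


def evaluate(trace: AgentTrace, expected_answer: str) -> dict:
    """Report the final-answer check alongside two process-level checks."""
    return {
        "answer_correct": trace.final_answer == expected_answer,
        "plan_adherence": plan_adherence(trace),
        "redundant_actions": redundant_actions(trace),
    }


# Illustrative run: the answer matches, but the process has problems.
trace = AgentTrace(
    goal="Report Q3 revenue",
    plan=["query_sales_table", "sum_revenue"],
    actions=["query_sales_table", "query_sales_table", "web_search"],
    final_answer="4.2M",
)
report = evaluate(trace, expected_answer="4.2M")
print(report)
```

An answer-only evaluation would mark this run as a success. The process-level fields show that half the plan was never executed and one tool call was repeated, which is the kind of hidden cost the Snowflake team describes.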

Snowflake's Agent GPA was the headliner among a set of items released by the company's research team.

Other items include:

  • Text-to-SQL V1.5, a specialized model that fuels Snowflake Intelligence, Snowflake's enterprise agent, by tackling the slowness, cost, and dialect issues of general LLMs. The specialized model makes text-to-SQL queries up to 3 times faster while maintaining accuracy.
  • Upcoming optimizations for Cortex AISQL, a tool that integrates AI directly into SQL queries, enabling teams to analyze all data types and build flexible AI pipelines using familiar SQL syntax.
  • The Cortex AISQL enhancements improve AI operator efficiency and cost, featuring 2-8x more performant execution plans, 2-6x faster inference (at 90-95% accuracy), and a 15-70x reduction in execution costs and time through techniques like cost-aware optimization, adaptive model cascading, and query enhancements.
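
Of the techniques named in the last item, model cascading is the easiest to illustrate. The Python sketch below shows the general idea only and is not Cortex AISQL's implementation: a cheap model answers first, and the request is escalated to a more expensive model when the cheap model's confidence falls below a threshold. The two toy models, the `cascade` function and the 0.8 threshold are assumptions made for this example.

```python
from typing import Callable

# A model maps input text to (label, confidence). Hypothetical interface.
Model = Callable[[str], tuple[str, float]]


def cascade(text: str, small: Model, large: Model,
            threshold: float = 0.8) -> tuple[str, str]:
    """Return (label, model_used), escalating only on low confidence."""
    label, confidence = small(text)
    if confidence >= threshold:
        return label, "small"
    # The small model was unsure, so pay for the large model.
    label, _ = large(text)
    return label, "large"


def small_model(text: str) -> tuple[str, float]:
    """Toy cheap model: confident only when it sees the word 'refund'."""
    if "refund" in text:
        return "billing", 0.95
    return "other", 0.4


def large_model(text: str) -> tuple[str, float]:
    """Toy expensive model: also recognizes delivery complaints."""
    if "delivery" in text:
        return "shipping", 0.99
    return "other", 0.99


tickets = ["refund please", "late delivery"]
results = [cascade(t, small_model, large_model) for t in tickets]
print(results)
```

In this toy run only the second ticket reaches the large model. The savings in a real system depend on how many inputs the cheap model can handle confidently, and the threshold is the knob that trades cost against accuracy.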