Have you ever wondered how AI agents understand tabular data, such as that in CSV or Excel files? Or how a file loaded into a platform like ChatGPT can be instantly understood and processed? This insight explores the creation of a custom AI agent capable of achieving these tasks from scratch. AI Agents and Tabular Data.

Context

Jen, an AI Engineer at AI United, leads a team developing an AI agent within a 30-day timeline. This agent will generate tailored interactive charts based on uploaded data files, enabling users to better visualize and interpret the data. To achieve this, the team needed to ensure the AI agent could analyze the file’s data context and autonomously recommend the most appropriate chart types.

The agent development was broken down into four main steps:

Here’s a look at how the team developed the AI system to ingest CSV data and aggregate it into an actionable format.


Setup

The development began by configuring package installations and defining the language model to be used:

pythonCopy code%pip install langchain_openai
%pip install langchain_core
%pip install langchain_community
%pip install langchain_experimental
from langchain_openai.chat_models import ChatOpenAI

openai_key = os.environ.get("OPENAI_API")
gpt4o = ChatOpenAI(temperature=0.0, model="gpt-4o", openai_api_key=openai_key)

Step 1: Context Creation

Before generating the code to process raw data, the team created context around the dataset to enhance the AI’s response accuracy. Metadata extraction included:

  • Number of columns
  • Schema (column names)
  • Data types for each column
  • A sample row

For demonstration, a wine reviews dataset was used, and metadata was extracted as follows:

pythonCopy codeimport pandas as pd
def extract_metadata(df):
    metadata = {
        'Number of Columns': df.shape[1],
        'Schema': df.columns.tolist(),
        'Data Types': str(df.dtypes),
        'Sample': df.head(1).to_dict(orient="records")
    }
    return metadata

df = pd.read_csv("wine_reviews.csv")
metadata = extract_metadata(df)

Step 2: Prompt Augmentation

To help the AI model interpret the dataset, prompts were augmented with extracted metadata using a template:

pythonCopy codeprompt_template = '''
Assistant is an AI model that suggests charts to visualize data based on the following metadata.

SCHEMA:
--------
{schema}

DATA TYPES: 
--------
{data_types}

SAMPLE: 
--------
{sample}
'''.format(schema=metadata["Schema"], data_types=metadata["Data Types"], sample=metadata["Sample"])

gpt40.invoke(prompt_template)

Step 3: Simple Agent Code Generation & Execution

With the prompt augmented, the model was able to suggest suitable charts. For data transformation, an agentic workflow with a Python REPL tool was used, where the AI generated code for aggregating data and then executed it to provide the necessary structure for plotting. A REPL instance was created to pass data into Python functions, enabling the AI to perform aggregation.

pythonCopy codefrom langchain_experimental.utilities import PythonREPL

repl = PythonREPL()
repl.globals['df'] = df

from langchain_core.tools import tool

@tool
def python_repl(code: str):
    try:
        result = repl.run(code)
    except BaseException as e:
        return f"Failed to execute. Error: {repr(e)}"
    result_str = f"Successfully executed:n```pythonn{code}n```nStdout: {result}"
    return result_str

tools = [python_repl]

Step 4: Final Data Aggregation and Charting

Finally, the AI suggested the Bar Chart type for plotting the top 10 wineries by average points, and the REPL instance executed the code to transform the data for the chart:

pythonCopy code# Code to aggregate and convert data into dictionary format
top_wineries = df.groupby('winery')['points'].mean().sort_values(ascending=False).head(10)
top_wineries_dict = top_wineries.to_dict()
print(top_wineries_dict)

The aggregated data was output as:

jsonCopy code{'Macauley': 96.0, 'Heitz': 95.5, 'Bodega Carmen Rodríguez': 95.5, 'Maurodos': 95.0, 'Blue Farm': 95.0, 'Numanthia': 95.0, 'Château Lagrézette': 95.0, 'Patricia Green Cellars': 95.0, 'Ponzi': 95.0, 'Muga': 95.0}

With this approach, the AI agent was capable of not only understanding the data in the uploaded file but also generating interactive visualizations, making complex datasets more accessible and insightful.

Related Posts
Who is Salesforce?
Salesforce

Who is Salesforce? Here is their story in their own words. From our inception, we've proudly embraced the identity of Read more

Salesforce Unites Einstein Analytics with Financial CRM
Financial Services Sector

Salesforce has unveiled a comprehensive analytics solution tailored for wealth managers, home office professionals, and retail bankers, merging its Financial Read more

AI-Driven Propensity Scores
AI-driven propensity scores

AI plays a crucial role in propensity score estimation as it can discern underlying patterns between treatments and confounding variables Read more

Tectonic’s Successful Salesforce Track Record
Tectonic-Ensuring Salesforce Customer Satisfaction

Salesforce Technology Services Integrator - Tectonic has successfully delivered Salesforce in a variety of industries including Public Sector, Hospitality, Manufacturing, Read more