Have you ever wondered how AI agents understand tabular data, such as that in CSV or Excel files? Or how a file loaded into a platform like ChatGPT can be instantly understood and processed? This insight explores the creation of a custom AI agent capable of achieving these tasks from scratch.

Thank you for reading this post, don't forget to subscribe!

Context

Jen, an AI Engineer at AI United, leads a team developing an AI agent within a 30-day timeline. This agent will generate tailored interactive charts based on uploaded data files, enabling users to better visualize and interpret the data. To achieve this, the team needed to ensure the AI agent could analyze the file’s data context and autonomously recommend the most appropriate chart types.

The agent development was broken down into four main steps:

Here’s a look at how the team developed the AI system to ingest CSV data and aggregate it into an actionable format.


Setup

The development began by configuring package installations and defining the language model to be used:

pythonCopy code%pip install langchain_openai
%pip install langchain_core
%pip install langchain_community
%pip install langchain_experimental
from langchain_openai.chat_models import ChatOpenAI

openai_key = os.environ.get("OPENAI_API")
gpt4o = ChatOpenAI(temperature=0.0, model="gpt-4o", openai_api_key=openai_key)

Step 1: Context Creation

Before generating the code to process raw data, the team created context around the dataset to enhance the AI’s response accuracy. Metadata extraction included:

  • Number of columns
  • Schema (column names)
  • Data types for each column
  • A sample row

For demonstration, a wine reviews dataset was used, and metadata was extracted as follows:

pythonCopy codeimport pandas as pd
def extract_metadata(df):
    metadata = {
        'Number of Columns': df.shape[1],
        'Schema': df.columns.tolist(),
        'Data Types': str(df.dtypes),
        'Sample': df.head(1).to_dict(orient="records")
    }
    return metadata

df = pd.read_csv("wine_reviews.csv")
metadata = extract_metadata(df)

Step 2: Prompt Augmentation

To help the AI model interpret the dataset, prompts were augmented with extracted metadata using a template:

pythonCopy codeprompt_template = '''
Assistant is an AI model that suggests charts to visualize data based on the following metadata.

SCHEMA:
--------
{schema}

DATA TYPES: 
--------
{data_types}

SAMPLE: 
--------
{sample}
'''.format(schema=metadata["Schema"], data_types=metadata["Data Types"], sample=metadata["Sample"])

gpt40.invoke(prompt_template)

Step 3: Simple Agent Code Generation & Execution

With the prompt augmented, the model was able to suggest suitable charts. For data transformation, an agentic workflow with a Python REPL tool was used, where the AI generated code for aggregating data and then executed it to provide the necessary structure for plotting. A REPL instance was created to pass data into Python functions, enabling the AI to perform aggregation.

pythonCopy codefrom langchain_experimental.utilities import PythonREPL

repl = PythonREPL()
repl.globals['df'] = df

from langchain_core.tools import tool

@tool
def python_repl(code: str):
    try:
        result = repl.run(code)
    except BaseException as e:
        return f"Failed to execute. Error: {repr(e)}"
    result_str = f"Successfully executed:\n```python\n{code}\n```\nStdout: {result}"
    return result_str

tools = [python_repl]

Step 4: Final Data Aggregation and Charting

Finally, the AI suggested the Bar Chart type for plotting the top 10 wineries by average points, and the REPL instance executed the code to transform the data for the chart:

pythonCopy code# Code to aggregate and convert data into dictionary format
top_wineries = df.groupby('winery')['points'].mean().sort_values(ascending=False).head(10)
top_wineries_dict = top_wineries.to_dict()
print(top_wineries_dict)

The aggregated data was output as:

jsonCopy code{'Macauley': 96.0, 'Heitz': 95.5, 'Bodega Carmen Rodríguez': 95.5, 'Maurodos': 95.0, 'Blue Farm': 95.0, 'Numanthia': 95.0, 'Château Lagrézette': 95.0, 'Patricia Green Cellars': 95.0, 'Ponzi': 95.0, 'Muga': 95.0}

With this approach, the AI agent was capable of not only understanding the data in the uploaded file but also generating interactive visualizations, making complex datasets more accessible and insightful.

Related Posts
Salesforce OEM AppExchange
Salesforce OEM AppExchange

Expanding its reach beyond CRM, Salesforce.com has launched a new service called AppExchange OEM Edition, aimed at non-CRM service providers. Read more

Salesforce Jigsaw
Salesforce Jigsaw

Salesforce.com, a prominent figure in cloud computing, has finalized a deal to acquire Jigsaw, a wiki-style business contact database, for Read more

Health Cloud Brings Healthcare Transformation
Health Cloud Brings Healthcare Transformation

Following swiftly after last week's successful launch of Financial Services Cloud, Salesforce has announced the second installment in its series Read more

Salesforce’s Quest for AI for the Masses
Roles in AI

The software engine, Optimus Prime (not to be confused with the Autobot leader), originated in a basement beneath a West Read more