← Back to Tutorials
Analytics intermediate Elena Torres ·

Building an AI Marketing Analytics Dashboard 2026 — LangChain + Streamlit

Building an AI Marketing Analytics Dashboard 2026 — LangChain + Streamlit

Overview

Marketing teams drown in data — Google Analytics, ad platforms, CRM, email tools, social media — all producing disconnected reports that require manual analysis to connect. This tutorial shows you how to build an AI-powered marketing analytics dashboard that unifies all your data sources and lets you ask questions in plain English. The system uses LangChain to create an agent that queries your marketing database, generates visualizations on the fly, produces automated narratives explaining performance changes, and predicts future trends using time-series models. By the end, you’ll have a dashboard where your CMO can type “Why did our conversion rate drop last week?” and get an answer with supporting charts, root cause analysis, and recommended actions — in under 30 seconds.

Prerequisites

  • Python 3.10+
  • Streamlit: pip install streamlit
  • LangChain ecosystem: pip install langchain langchain-openai langchain-community langchain-experimental
  • Data stack: pip install pandas sqlalchemy psycopg2-binary plotly prophet scikit-learn
  • OpenAI API key with GPT-4o access
  • PostgreSQL database with marketing data (Google Analytics exports, ad spend, email stats)
  • Sample dataset: we’ll use a generated dataset if you don’t have real data
  • Basic familiarity with Pandas, SQL, and Streamlit

Step 1: Set Up the Marketing Data Warehouse

First, create a unified marketing data schema in PostgreSQL:

-- Campaign performance
CREATE TABLE campaign_metrics (
    id SERIAL PRIMARY KEY,
    date DATE NOT NULL,
    campaign_name VARCHAR(255),
    channel VARCHAR(50),  -- google_ads, meta, email, organic, direct
    spend DECIMAL(10,2),
    impressions INTEGER,
    clicks INTEGER,
    conversions INTEGER,
    revenue DECIMAL(12,2),
    cost_per_click DECIMAL(6,4),
    conversion_rate DECIMAL(5,4),
    roas DECIMAL(8,2),  -- Return on Ad Spend
    created_at TIMESTAMP DEFAULT NOW()
);

-- Website analytics
CREATE TABLE web_analytics (
    id SERIAL PRIMARY KEY,
    date DATE NOT NULL,
    page_path VARCHAR(500),
    source VARCHAR(100),
    sessions INTEGER,
    pageviews INTEGER,
    bounce_rate DECIMAL(5,4),
    avg_session_duration DECIMAL(6,2),
    goal_completions INTEGER
);

-- Email marketing
CREATE TABLE email_metrics (
    id SERIAL PRIMARY KEY,
    date DATE NOT NULL,
    campaign_id VARCHAR(100),
    subject_line VARCHAR(500),
    sent INTEGER,
    opens INTEGER,
    clicks INTEGER,
    unsubscribes INTEGER,
    open_rate DECIMAL(5,4),
    click_rate DECIMAL(5,4)
);

-- Customer data
CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    first_seen DATE,
    channel VARCHAR(50),
    lifetime_value DECIMAL(12,2),
    orders INTEGER,
    last_order_date DATE
);

Generate sample data if you don’t have real data:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta
from sqlalchemy import create_engine

engine = create_engine("postgresql://postgres:password@localhost:5432/marketing_db")

def generate_sample_data(days=365):
    """Generate realistic marketing data for dashboard development."""
    dates = [datetime.now() - timedelta(days=i) for i in range(days)]
    
    # Campaign data
    channels = ['google_ads', 'meta', 'email', 'organic', 'direct']
    campaigns = ['Brand_Awareness_Q1', 'Product_Launch', 'Retargeting', 
                 'Newsletter_Weekly', 'Holiday_Promo', 'Evergreen']
    
    data = []
    for date in dates:
        for channel in channels:
            base_spend = np.random.lognormal(mean=6, sigma=0.5) if channel in ['google_ads', 'meta'] else np.random.lognormal(mean=4, sigma=0.3)
            cpc = np.random.uniform(0.5, 5.0) if channel == 'google_ads' else np.random.uniform(0.2, 2.0)
            impressions = int(base_spend / cpc * np.random.uniform(50, 200))
            clicks = int(impressions * np.random.uniform(0.01, 0.05))
            conversions = int(clicks * np.random.uniform(0.02, 0.08))
            revenue = conversions * np.random.lognormal(mean=4, sigma=0.8)
            
            data.append({
                'date': date,
                'campaign_name': np.random.choice(campaigns),
                'channel': channel,
                'spend': round(base_spend, 2),
                'impressions': impressions,
                'clicks': clicks,
                'conversions': conversions,
                'revenue': round(revenue, 2),
                'cost_per_click': round(cpc, 4),
                'conversion_rate': round(conversions / max(clicks, 1), 4),
                'roas': round(revenue / max(base_spend, 1), 2)
            })
    
    df = pd.DataFrame(data)
    df.to_sql('campaign_metrics', engine, if_exists='replace', index=False)
    print(f"Inserted {len(df)} campaign records")

generate_sample_data()

Step 2: Build LangChain SQL Agent for Natural Language Queries

This is the core AI capability — letting users ask questions in plain English:

import streamlit as st
from langchain_openai import ChatOpenAI
from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits import create_sql_agent
from langchain.agents import AgentType
from langchain_core.messages import SystemMessage

# Connect to the database
db = SQLDatabase.from_uri("postgresql://postgres:password@localhost:5432/marketing_db")

# Verify connection
print(f"Database dialect: {db.dialect}")
print(f"Usable tables: {db.get_usable_table_names()}")

# Create the agent
llm = ChatOpenAI(model="gpt-4o", temperature=0)

agent_executor = create_sql_agent(
    llm=llm,
    db=db,
    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    handle_parsing_errors=True,
    max_iterations=10,
    agent_executor_kwargs={
        "extra_prompt_messages": [
            SystemMessage(content="""
            You are a marketing analytics expert. When answering questions:
            1. Always write valid PostgreSQL queries
            2. Use proper date filtering — today is 2026-06-01
            3. For "trend" questions, group by date and order chronologically
            4. For comparison questions, use week-over-week or month-over-month
            5. When asked "why", look for correlations across channels
            6. Format numbers: currency with $, percentages with %, large numbers with commas
            7. If you can't find enough data, explain what's missing
            
            Table schemas:
            - campaign_metrics: date, campaign_name, channel, spend, impressions, clicks, conversions, revenue, cost_per_click, conversion_rate, roas
            - web_analytics: date, page_path, source, sessions, pageviews, bounce_rate, avg_session_duration, goal_completions
            - email_metrics: date, campaign_id, subject_line, sent, opens, clicks, unsubscribes, open_rate, click_rate
            """)
        ]
    }
)

# Test: ask a question
result = agent_executor.invoke({
    "input": "What was our total ad spend last month compared to the previous month?"
})
print(result["output"])
# Expected: Shows spend comparison with dollar amounts and percentage change

Step 3: Build Automated Insight Generation

Beyond Q&A, the system should proactively surface insights:

def generate_weekly_insights():
    """AI-powered weekly performance review."""
    
    query = """
    Get me all campaign_metrics from the last complete week (Monday-Sunday).
    Compare to the previous week. Group by channel.
    Show: channel, current_week_spend, prev_week_spend, current_conv_rate, 
          prev_conv_rate, current_roas, prev_roas
    """
    
    data_result = agent_executor.invoke({"input": query})
    
    # Now generate narrative insights
    insight_prompt = f"""
    Based on this marketing data from the last two weeks:
    
    {data_result['output']}
    
    Generate a concise weekly marketing performance report with:
    1. **Executive Summary** — one paragraph
    2. **Winners** — top 2 performing channels with data
    3. **Decliners** — channels that dropped with possible reasons
    4. **Anomalies** — any unusual patterns
    5. **Recommendations** — 3 specific actions for next week
    
    Use a professional, data-driven tone.
    """
    
    response = llm.invoke(insight_prompt)
    return response.content

insights = generate_weekly_insights()
print(insights)

Custom insight detectors — use LangChain’s structured output:

from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_core.output_parsers import PydanticOutputParser

class Insight(BaseModel):
    metric: str = Field(description="The metric being analyzed")
    change_pct: float = Field(description="Percentage change")
    direction: str = Field(description="up or down")
    significance: str = Field(description="low, medium, high")
    narrative: str = Field(description="Human-readable explanation")
    recommended_action: str = Field(description="What to do about it")

class WeeklyReport(BaseModel):
    executive_summary: str
    insights: list[Insight]
    recommendations: list[str]

parser = PydanticOutputParser(pydantic_object=WeeklyReport)

def structured_insights(data: str) -> WeeklyReport:
    prompt = f"""
    Analyze this marketing data and extract structured insights:
    
    {data}
    
    {parser.get_format_instructions()}
    """
    
    response = llm.invoke(prompt)
    return parser.parse(response.content)

report = structured_insights(data_result['output'])
for insight in report.insights:
    icon = "📈" if insight.direction == "up" else "📉"
    print(f"{icon} {insight.metric}: {insight.change_pct:+.1f}% ({insight.significance})")
    print(f"   {insight.narrative}")

Step 4: Build Predictive Analytics with Prophet

Forecast future performance using Meta’s Prophet:

from prophet import Prophet
import plotly.graph_objects as go

def forecast_metric(df, metric_col='revenue', periods=30, metric_name='Revenue'):
    """Forecast a marketing metric using Prophet + LLM context."""
    
    # Prepare data for Prophet
    prophet_df = df[['date', metric_col]].rename(
        columns={'date': 'ds', metric_col: 'y'}
    )
    
    # Fit model with seasonality
    model = Prophet(
        yearly_seasonality=True,
        weekly_seasonality=True,
        daily_seasonality=False,
        changepoint_prior_scale=0.05
    )
    model.add_seasonality(name='monthly', period=30.5, fourier_order=5)
    model.fit(prophet_df)
    
    # Forecast
    future = model.make_future_dataframe(periods=periods)
    forecast = model.predict(future)
    
    # Generate AI narrative about the forecast
    last_month = df[df['date'] >= df['date'].max() - timedelta(days=30)][metric_col].mean()
    forecast_mean = forecast.tail(periods)['yhat'].mean()
    change_pct = ((forecast_mean - last_month) / last_month) * 100
    
    narrative = llm.invoke(f"""
    Marketing forecast summary:
    - Metric: {metric_name}
    - Last 30 days average: ${last_month:,.0f}
    - Next 30 days forecast: ${forecast_mean:,.0f}
    - Projected change: {change_pct:+.1f}%
    - Confidence interval: ${forecast.tail(periods)['yhat_lower'].mean():,.0f} - ${forecast.tail(periods)['yhat_upper'].mean():,.0f}
    
    Write a 2-sentence narrative explaining this forecast and its implications for marketing strategy.
    """)
    
    return {
        'forecast': forecast,
        'model': model,
        'narrative': narrative.content,
        'change_pct': change_pct
    }

# Example: forecast revenue
df = pd.read_sql("SELECT date, SUM(revenue) as revenue FROM campaign_metrics GROUP BY date ORDER BY date", engine)
forecast_result = forecast_metric(df)
print(forecast_result['narrative'])

Step 5: Build the Main Streamlit Dashboard

Combine everything into a single, powerful dashboard:

import streamlit as st
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime, timedelta

st.set_page_config(page_title="AI Marketing Analytics", layout="wide")
st.title("📊 AI Marketing Analytics Dashboard")

# Sidebar configuration
with st.sidebar:
    st.header("Controls")
    date_range = st.date_input(
        "Date Range",
        value=(datetime.now() - timedelta(days=90), datetime.now()),
        max_value=datetime.now()
    )
    channels = st.multiselect(
        "Channels",
        ['google_ads', 'meta', 'email', 'organic', 'direct'],
        default=['google_ads', 'meta', 'email']
    )
    
    st.divider()
    st.subheader("🤖 AI Query")
    use_ai = st.toggle("Enable AI Analysis", value=True)
    auto_insights = st.toggle("Auto-generate Insights", value=True)

# Load data
@st.cache_data(ttl=300)
def load_data(start_date, end_date):
    query = f"""
    SELECT * FROM campaign_metrics 
    WHERE date >= '{start_date}' AND date <= '{end_date}'
    """
    return pd.read_sql(query, engine)

df = load_data(date_range[0], date_range[1])
df_filtered = df[df['channel'].isin(channels)] if channels else df

# Top-level KPIs
col1, col2, col3, col4, col5 = st.columns(5)
total_spend = df_filtered['spend'].sum()
total_revenue = df_filtered['revenue'].sum()
total_conv = df_filtered['conversions'].sum()
avg_roas = total_revenue / total_spend if total_spend > 0 else 0
avg_cvr = df_filtered['conversion_rate'].mean()

col1.metric("Total Spend", f"${total_spend:,.0f}")
col2.metric("Revenue", f"${total_revenue:,.0f}")
col3.metric("Conversions", f"{total_conv:,}")
col4.metric("ROAS", f"{avg_roas:.2f}x")
col5.metric("Avg CVR", f"{avg_cvr:.2%}")

# AI Query Section
with st.expander("🔍 Ask AI About Your Marketing Data", expanded=True):
    st.caption("Ask questions in natural language. Examples: 'Why did conversion rate drop last week?', 'Which channel has the best ROAS?', 'Show me campaign performance by day for the last 30 days'")
    
    user_query = st.text_input("Your question:", placeholder="e.g., What was our best performing campaign last month?")
    
    if user_query and use_ai:
        with st.spinner("AI is analyzing your data..."):
            result = agent_executor.invoke({"input": f"""
                Context: Today is 2026-06-01. Data range: {date_range[0]} to {date_range[1]}.
                Channels: {', '.join(channels)}.
                
                Question: {user_query}
                
                Return: the SQL query used, then the data, then your analysis.
            """})
            
            st.markdown("### 🤖 AI Analysis")
            st.markdown(result["output"])

# Main dashboard tabs
tab1, tab2, tab3, tab4 = st.tabs(["📈 Performance", "🔍 Channel Analysis", "📊 Forecast", "💡 Insights"])

with tab1:
    col_left, col_right = st.columns([2, 1])
    
    with col_left:
        # Daily revenue + spend chart
        daily = df_filtered.groupby('date').agg({
            'revenue': 'sum', 'spend': 'sum'
        }).reset_index()
        
        fig = go.Figure()
        fig.add_trace(go.Scatter(x=daily['date'], y=daily['revenue'],
                                 mode='lines', name='Revenue', line=dict(color='green', width=2)))
        fig.add_trace(go.Scatter(x=daily['date'], y=daily['spend'],
                                 mode='lines', name='Spend', line=dict(color='red', width=2)))
        fig.update_layout(title='Daily Revenue vs Spend', height=400)
        st.plotly_chart(fig, use_container_width=True)
    
    with col_right:
        # ROAS by channel
        channel_roas = df_filtered.groupby('channel').apply(
            lambda x: x['revenue'].sum() / x['spend'].sum() if x['spend'].sum() > 0 else 0
        ).reset_index(name='ROAS')
        
        fig = px.bar(channel_roas, x='channel', y='ROAS', 
                     title='ROAS by Channel', color='ROAS',
                     color_continuous_scale='RdYlGn')
        fig.update_layout(height=400)
        st.plotly_chart(fig, use_container_width=True)

with tab2:
    # Detailed channel breakdown
    channel_summary = df_filtered.groupby('channel').agg({
        'spend': 'sum', 'impressions': 'sum', 'clicks': 'sum',
        'conversions': 'sum', 'revenue': 'sum'
    }).reset_index()
    channel_summary['CPC'] = channel_summary['spend'] / channel_summary['clicks'].replace(0, 1)
    channel_summary['CVR'] = channel_summary['conversions'] / channel_summary['clicks'].replace(0, 1)
    channel_summary['ROAS'] = channel_summary['revenue'] / channel_summary['spend'].replace(0, 1)
    
    st.dataframe(channel_summary.style.format({
        'spend': '${:,.0f}', 'revenue': '${:,.0f}',
        'CPC': '${:.2f}', 'CVR': '{:.2%}', 'ROAS': '{:.2f}x'
    }), use_container_width=True)
    
    # Campaign comparison
    st.subheader("Campaign Performance Comparison")
    campaign_df = df_filtered.groupby('campaign_name').agg({
        'spend': 'sum', 'conversions': 'sum', 'revenue': 'sum'
    }).reset_index()
    campaign_df['ROAS'] = campaign_df['revenue'] / campaign_df['spend'].replace(0, 1)
    
    fig = px.scatter(campaign_df, x='spend', y='revenue', 
                     size='conversions', color='campaign_name',
                     hover_name='campaign_name', text='campaign_name',
                     title='Campaign ROI Quadrant (Spend vs Revenue)')
    fig.add_hline(y=campaign_df['revenue'].mean(), line_dash="dash", line_color="gray")
    fig.add_vline(x=campaign_df['spend'].mean(), line_dash="dash", line_color="gray")
    st.plotly_chart(fig, use_container_width=True)

with tab3:
    st.subheader("📊 Revenue Forecast (Next 30 Days)")
    
    daily_rev = df_filtered.groupby('date')['revenue'].sum().reset_index()
    forecast = forecast_metric(daily_rev)
    
    fig = go.Figure()
    fig.add_trace(go.Scatter(
        x=forecast['forecast']['ds'],
        y=forecast['forecast']['yhat'],
        mode='lines', name='Forecast', line=dict(color='blue')
    ))
    fig.add_trace(go.Scatter(
        x=forecast['forecast']['ds'],
        y=forecast['forecast']['yhat_upper'],
        mode='lines', name='Upper Bound',
        line=dict(color='lightblue', dash='dot')
    ))
    fig.add_trace(go.Scatter(
        x=forecast['forecast']['ds'],
        y=forecast['forecast']['yhat_lower'],
        mode='lines', name='Lower Bound',
        line=dict(color='lightblue', dash='dot'),
        fill='tonexty'
    ))
    # Add actual data
    fig.add_trace(go.Scatter(
        x=daily_rev['date'], y=daily_rev['revenue'],
        mode='markers', name='Actual', marker=dict(color='black', size=4)
    ))
    fig.update_layout(title='Revenue Forecast with Confidence Intervals', height=500)
    st.plotly_chart(fig, use_container_width=True)
    
    st.info(f"🤖 **AI Forecast Narrative:** {forecast['narrative']}")

with tab4:
    if auto_insights:
        with st.spinner("Generating AI insights..."):
            insights = generate_weekly_insights()
            st.markdown(insights)
    
    st.divider()
    
    # Anomaly detection
    st.subheader("🚨 Anomaly Detection")
    daily_stats = df_filtered.groupby('date')['conversions'].sum().reset_index()
    mean = daily_stats['conversions'].mean()
    std = daily_stats['conversions'].std()
    
    anomalies = daily_stats[
        (daily_stats['conversions'] > mean + 2*std) | 
        (daily_stats['conversions'] < mean - 2*std)
    ]
    
    if not anomalies.empty:
        st.warning(f"Found {len(anomalies)} anomalous days in conversion data")
        for _, row in anomalies.iterrows():
            direction = "⬆️ Spike" if row['conversions'] > mean else "⬇️ Drop"
            st.write(f"{direction} on {row['date']}: {row['conversions']} conversions "
                    f"(avg: {mean:.0f} ± {std:.0f})")
    else:
        st.success("✅ No significant anomalies detected in the selected period")

Step 6: Deploy and Schedule Updates

# Run the dashboard
streamlit run marketing_dashboard.py

# For production: deploy with Docker
cat > Dockerfile << 'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "marketing_dashboard.py", "--server.port=8501", "--server.address=0.0.0.0"]
EOF

# Build and run
docker build -t marketing-ai-dashboard .
docker run -d -p 8501:8501 marketing-ai-dashboard

What You’ve Built

A complete AI-powered marketing analytics dashboard:

  • Natural language query interface to your marketing database (LangChain SQL Agent)
  • Automated weekly insights with structured output
  • Predictive revenue forecasting using Prophet + AI narrative
  • Interactive visualizations with anomaly detection
  • Channel performance comparison and campaign ROI analysis
  • Docker-ready deployment

The dashboard saves marketing teams 5-10 hours per week on reporting and analysis.

Troubleshooting

LangChain agent generates invalid SQL: Enable handle_parsing_errors=True and reduce max_iterations to prevent infinite retry loops. For complex queries, set agent_type=AgentType.OPENAI_FUNCTIONS instead of ZERO_SHOT_REACT_DESCRIPTION. Add explicit column name hints in the system prompt matching your actual table schema.

Prophet forecast shows declining trend when data is seasonal: Add explicit seasonality: model.add_seasonality(name='monthly', period=30.5, fourier_order=5). For e-commerce data with holiday spikes, add holiday effects: model.add_country_holidays(country_name='US').

Streamlit dashboard loads slowly on large datasets: Add @st.cache_data(ttl=300) decorator and consider pre-aggregating data in PostgreSQL views:

CREATE VIEW daily_summary AS
SELECT date, channel, SUM(spend) as spend, SUM(revenue) as revenue,
       SUM(conversions) as conversions
FROM campaign_metrics GROUP BY date, channel;

OpenAI API costs for frequent queries: Use gpt-4o-mini for routine insight generation and gpt-4o only for complex analytical queries. Implement a rate limiter: time.sleep(1) between API calls. Cache common queries by hashing the input.

Next Steps

  • Connect live API sources (Google Analytics 4, Meta Ads, Google Ads) via their API clients
  • Add anomaly alerting via Slack/Email webhooks when metrics exceed thresholds
  • Implement multi-user authentication with Streamlit’s built-in authentication or Auth0
  • Build custom report exports (PDF with embedded charts using weasyprint)
  • Add A/B test result analysis module using Bayesian statistics
  • Integrate with Google Looker Studio as a data source connector