Using LLMs to Analyze Graph Databases 2026 — Neo4j + LangChain Complete Guide

Overview

Graph databases represent relationships as first-class citizens, making them ideal for recommendation engines, fraud detection, social networks, and knowledge graphs. However, querying them requires Cypher (Neo4j’s query language) — a barrier for non-technical stakeholders. This tutorial shows you how to build a natural-language interface on top of Neo4j using LangChain and LLMs. You’ll learn to automatically translate English questions into Cypher queries, analyze graph patterns with chain-of-thought prompting, and build a Streamlit dashboard that lets anyone explore your graph data. By the end, you’ll have a fully functional graph analysis assistant that answers questions like “Which customers bought products in the same category as product X?” in seconds.

Prerequisites

Neo4j 5.x running locally or in the cloud (the AuraDB Free tier works; it allows up to 200k nodes and 400k relationships)
Python 3.10+ installed
Neo4j Python driver: pip install neo4j
LangChain: pip install langchain langchain-community langchain-openai
Streamlit: pip install streamlit
OpenAI API key (or Anthropic API key for Claude) with GPT-4o or Claude 3.5 Sonnet access
Basic understanding of graph theory (nodes, relationships, properties)
Sample dataset: the free “Movies” dataset included with Neo4j, or your own data

Step 1: Set Up Neo4j and Load Sample Data

First, ensure Neo4j is running and load a sample graph.

Option A: Local Neo4j Desktop:

Download Neo4j Desktop from neo4j.com/download
Create a new DBMS → set the password (this tutorial uses "password"; pick a stronger one for anything beyond local testing)
Start the database
Open Neo4j Browser at http://localhost:7474
In the browser prompt, run:

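// Open the built-in Movies guide, then run the CREATE statement it displays
:play movies

// Or manually create a small dataset (it adds imdbRating and KNOWS, which the built-in Movies graph lacks)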
CREATE (m:Movie {title: "The Matrix", released: 1999, imdbRating: 8.7})
CREATE (p:Person {name: "Keanu Reeves", born: 1964})
CREATE (d:Person {name: "Lana Wachowski", born: 1965})
CREATE (p)-[:ACTED_IN {role: "Neo"}]->(m)
CREATE (d)-[:DIRECTED]->(m)
CREATE (p)-[:KNOWS]->(d)

Option B: Neo4j AuraDB (cloud):

Sign up at console.neo4j.io
Create a free “AuraDB Free” instance (AuraDB Professional is a paid tier)
Copy the connection URI, username, and password
Load the sample dataset via the “Load Sample Data” button

Test your connection:

from neo4j import GraphDatabase

uri = "bolt://localhost:7687"  # local default; AuraDB URIs use the neo4j+s:// scheme
driver = GraphDatabase.driver(uri, auth=("neo4j", "password"))

def test_connection(tx):
    result = tx.run("RETURN 'Connected to Neo4j!' AS message")
    return result.single()[0]

with driver.session() as session:
    print(session.execute_read(test_connection))
# Expected output: Connected to Neo4j!
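
The connection test above hardcodes the URI and password. A minimal sketch of reading them from the environment instead is shown here; the variable names NEO4J_URI, NEO4J_USERNAME, and NEO4J_PASSWORD, the helper load_neo4j_config, and the example host are assumptions chosen for illustration, not something this tutorial's setup creates for you.

```python
import os

def load_neo4j_config(env=None):
    """Read connection settings from environment variables, with local defaults."""
    # The variable names below are assumed for this sketch.
    env = os.environ if env is None else env
    return {
        "uri": env.get("NEO4J_URI", "bolt://localhost:7687"),
        "user": env.get("NEO4J_USERNAME", "neo4j"),
        "password": env.get("NEO4J_PASSWORD", "password"),
    }

# A plain dict stands in for os.environ; the host is a made-up placeholder.
config = load_neo4j_config({"NEO4J_URI": "neo4j+s://example.databases.neo4j.io"})
print(config["uri"])
```

The resulting values can then be passed on, for example GraphDatabase.driver(config["uri"], auth=(config["user"], config["password"])).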

Step 2: Build the Cypher-Generation Chain with LangChain

LangChain’s GraphCypherQAChain automatically translates natural language into Cypher queries.

import os
from langchain_openai import ChatOpenAI
from langchain_community.graphs import Neo4jGraph
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain

os.environ["OPENAI_API_KEY"] = "sk-your-key-here"

# Connect LangChain to Neo4j
graph = Neo4jGraph(
    url="bolt://localhost:7687",
    username="neo4j",
    password="password"
)

# Verify schema is loaded
print(graph.get_schema)
# Expected: Node properties, relationship properties, and relationships

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Create the Cypher QA chain
chain = GraphCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    verbose=True,
    validate_cypher=True,  # Checks relationship directions against the schema and corrects them
    return_intermediate_steps=True,
    allow_dangerous_requests=True,  # Required opt-in: the chain executes LLM-generated Cypher, so prefer a read-only database user
    cypher_llm=ChatOpenAI(model="gpt-4o", temperature=0)  # Dedicated LLM for Cypher generation
)

# Test it
result = chain.invoke({"query": "Which actors starred in The Matrix?"})
print(result["result"])
# Expected: Lists actors who acted in The Matrix

The chain works in four stages: 1) the graph schema is inserted into the Cypher-generation prompt, 2) the LLM generates Cypher from the question, 3) the chain runs that Cypher against Neo4j, 4) the LLM turns the returned rows into a human-readable answer.
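
To make the flow concrete, here is a plain-Python sketch of those stages. It is an illustration of the idea, not LangChain's actual implementation: answer_question and the three fake_* stand-ins are hypothetical names, and the stand-ins replace the real LLM calls and the database so the example runs on its own.

```python
def answer_question(question, schema, generate_cypher, run_query, summarize):
    """Illustrative pipeline: schema + question -> Cypher -> rows -> answer."""
    cypher = generate_cypher(schema=schema, question=question)  # an LLM call in practice
    rows = run_query(cypher)                                    # executed against Neo4j in practice
    answer = summarize(question=question, rows=rows)            # a second LLM call in practice
    return {"query": cypher, "context": rows, "result": answer}

# Stand-ins so the sketch runs without an LLM or a database.
def fake_generate(schema, question):
    return "MATCH (p:Person)-[:ACTED_IN]->(:Movie {title: 'The Matrix'}) RETURN p.name AS name"

def fake_run(cypher):
    return [{"name": "Keanu Reeves"}]

def fake_summarize(question, rows):
    return ", ".join(row["name"] for row in rows)

out = answer_question(
    "Which actors starred in The Matrix?",
    "(:Person)-[:ACTED_IN]->(:Movie)",
    fake_generate, fake_run, fake_summarize,
)
print(out["result"])  # Keanu Reeves
```

Keeping the generated query and the raw rows alongside the final answer mirrors what return_intermediate_steps=True exposes, which makes wrong answers much easier to debug.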

Step 3: Enhance Cypher Accuracy with Few-Shot Prompting

Raw LLM-generated Cypher can be fragile. Improve accuracy by providing exemplar query pairs:

from langchain_core.prompts import FewShotPromptTemplate, PromptTemplate

# Define example query pairs for your domain
examples = [
    {
        "question": "Who directed The Matrix?",
        "query": "MATCH (m:Movie {title: 'The Matrix'})<-[:DIRECTED]-(p:Person) RETURN p.name"
    },
    {
        "question": "Which movies did Keanu Reeves act in?",
        "query": "MATCH (p:Person {name: 'Keanu Reeves'})-[:ACTED_IN]->(m:Movie) RETURN m.title"
    },
    {
        "question": "Find actors who worked with directors they know",
        "query": "MATCH (a:Person)-[:KNOWS]->(d:Person) MATCH (a)-[:ACTED_IN]->(m:Movie)<-[:DIRECTED]-(d) RETURN a.name, d.name, m.title"
    },
    {
        "question": "What's the average IMDb rating of movies released after 2000?",
        "query": "MATCH (m:Movie) WHERE m.released > 2000 RETURN avg(m.imdbRating)"
    }
]

example_prompt = PromptTemplate(
    input_variables=["question", "query"],
    template="Question: {question}\nCypher: {query}"
)

few_shot_prompt = FewShotPromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
    prefix="You are a Neo4j Cypher expert. Given an input question, create a syntactically correct Cypher query to run. Here are some examples:",
    suffix="Question: {input}\nCypher:",
    input_variables=["input"]
)

# Use this prompt in a custom chain (pipe syntax; LLMChain and .run() are deprecated)
from langchain_core.output_parsers import StrOutputParser

cypher_chain = (
    few_shot_prompt
    | ChatOpenAI(model="gpt-4o", temperature=0)
    | StrOutputParser()
)

query = cypher_chain.invoke({"input": "Find movies with rating above 8.5"})
print(query)
# Expected (exact output may vary): MATCH (m:Movie) WHERE m.imdbRating > 8.5 RETURN m.title, m.imdbRating
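
Models sometimes wrap the generated query in a Markdown code fence or add a sentence before it, which would break execution if the text were sent to Neo4j unchanged. The helper below is a small sketch of cleaning that up; extract_cypher is a hypothetical name for this example, and it only handles untagged fences or fences tagged "cypher".

```python
import re

FENCE = "`" * 3  # the three-backtick marker, built programmatically

def extract_cypher(text):
    """Return the Cypher inside the first fenced block, or the stripped text if there is none."""
    pattern = re.escape(FENCE) + r"(?:cypher)?\s*(.*?)" + re.escape(FENCE)
    match = re.search(pattern, text, flags=re.DOTALL | re.IGNORECASE)
    return (match.group(1) if match else text).strip()

raw = "Here is the query:\n" + FENCE + "cypher\nMATCH (m:Movie) RETURN m.title\n" + FENCE
print(extract_cypher(raw))  # MATCH (m:Movie) RETURN m.title
```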

Step 4: Build Graph Pattern Analysis Functions

Beyond simple Q&A, you can analyze graph patterns programmatically:

def find_shortest_path(driver, start_person, end_person, max_hops=6):
    """Find the shortest path between two people in the graph."""
    # Cypher does not accept a parameter as a variable-length bound, so the
    # hop limit is forced to an integer and interpolated into the query text.
    max_hops = int(max_hops)
    query = f"""
    MATCH (start:Person {{name: $start}}),
          (end:Person {{name: $end}}),
          p = shortestPath((start)-[*..{max_hops}]-(end))
    RETURN [node IN nodes(p) | coalesce(node.name, node.title)] AS path_names,
           [rel IN relationships(p) | type(rel)] AS rel_types
    """
    with driver.session() as session:
        result = session.run(query, start=start_person, end=end_person)
        record = result.single()
        if record:
            return {
                "path": record["path_names"],
                "relationships": record["rel_types"]
            }
    return None

# Example: connection from Keanu Reeves to Tom Hanks through the Movie graph
path = find_shortest_path(driver, "Keanu Reeves", "Tom Hanks")
print(path)
# Shows how two actors are connected through films
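
The dictionary returned by find_shortest_path holds two parallel lists, which are hard to read at a glance. A minimal sketch of turning them into one line follows; format_path is a hypothetical helper, and the sample values are taken from the small manual dataset in Step 1 rather than from a live query.

```python
def format_path(path_names, rel_types):
    """Interleave node names and relationship types into one readable string."""
    if not path_names:
        return ""
    parts = [str(path_names[0])]
    for rel, name in zip(rel_types, path_names[1:]):
        parts.append(f"-[{rel}]-")  # direction is not shown; the path was matched undirected
        parts.append(str(name))
    return " ".join(parts)

print(format_path(
    ["Keanu Reeves", "The Matrix", "Lana Wachowski"],
    ["ACTED_IN", "DIRECTED"],
))
# Keanu Reeves -[ACTED_IN]- The Matrix -[DIRECTED]- Lana Wachowski
```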

Community detection (identifies clusters). This requires the Graph Data Science (GDS) plugin; run each statement separately:

// Project an in-memory graph for GDS
CALL gds.graph.project('myGraph', ['Person', 'Movie'], ['ACTED_IN', 'DIRECTED'])
YIELD graphName, nodeCount, relationshipCount;

// Run Louvain for community detection
CALL gds.louvain.write('myGraph', {writeProperty: 'community'})
YIELD communityCount, modularity;

// Query communities
MATCH (p:Person)
RETURN p.community AS community_id, collect(p.name) AS members
ORDER BY community_id;
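
Once the community query has returned rows, a little Python can rank the clusters by size before they are shown to a user or handed to an LLM. This is a sketch only: summarize_communities is a hypothetical helper, and the sample rows are invented to match the community_id and members columns of the query above.

```python
from collections import defaultdict

def summarize_communities(rows, top_n=3):
    """Group member names by community id and return the largest communities first."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["community_id"]].extend(row["members"])
    # Sort by size (largest first), then by id to keep ties stable.
    ranked = sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(cid, len(members), sorted(members)) for cid, members in ranked[:top_n]]

# Invented sample rows in the shape the query returns.
sample_rows = [
    {"community_id": 7, "members": ["Keanu Reeves", "Lana Wachowski"]},
    {"community_id": 2, "members": ["Tom Hanks", "Meg Ryan", "Nora Ephron"]},
    {"community_id": 5, "members": ["Clint Eastwood"]},
]
for cid, size, members in summarize_communities(sample_rows, top_n=2):
    print(cid, size, members)
```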

Step 5: Build a Streamlit Dashboard

Create an interactive graph analysis UI:

import streamlit as st
from neo4j import GraphDatabase
from langchain_openai import ChatOpenAI
from langchain_community.graphs import Neo4jGraph
from langchain_community.chains.graph_qa.cypher import GraphCypherQAChain
import pandas as pd

st.set_page_config(page_title="Graph Analysis Assistant", layout="wide")
st.title("🔗 Neo4j Graph Analysis Assistant")

# Sidebar configuration
with st.sidebar:
    st.header("Configuration")
    neo4j_uri = st.text_input("Neo4j URI", "bolt://localhost:7687")
    neo4j_user = st.text_input("Username", "neo4j")
    neo4j_password = st.text_input("Password", type="password")
    openai_key = st.text_input("OpenAI API Key", type="password")

if not (neo4j_uri and neo4j_user and neo4j_password and openai_key):
    st.warning("Please fill in all credentials in the sidebar.")
    st.stop()

# Initialize connections
driver = GraphDatabase.driver(neo4j_uri, auth=(neo4j_user, neo4j_password))
graph = Neo4jGraph(url=neo4j_uri, username=neo4j_user, password=neo4j_password)
llm = ChatOpenAI(model="gpt-4o", temperature=0, api_key=openai_key)

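# allow_dangerous_requests must be True or the chain refuses to initialize.
# Connect with a read-only database user so generated Cypher cannot modify data.
chain = GraphCypherQAChain.from_llm(
    llm=llm, graph=graph, verbose=True,
    validate_cypher=True, allow_dangerous_requests=True,
    return_intermediate_steps=True
)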

# Query interface
st.header("Ask questions about your graph")
query = st.text_input("Natural language query", placeholder="e.g., Which movies feature actors over 50?")

if query:
    with st.spinner("Thinking..."):
        result = chain.invoke({"query": query})
    
    col1, col2 = st.columns(2)
    with col1:
        st.subheader("Generated Cypher Query")
        st.code(result["intermediate_steps"][0]["query"], language="cypher")
    with col2:
        st.subheader("Answer")
        st.write(result["result"])
    
    # Also show raw results
    with st.expander("Raw Query Results"):
        st.json(result["intermediate_steps"][1]["context"])

# Schema viewer
with st.expander("View Graph Schema"):
    st.text(graph.get_schema)

Run it:

streamlit run graph_dashboard.py
# Opens at http://localhost:8501
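
Because the dashboard executes LLM-generated Cypher for anyone who can reach it, a cheap extra check before execution may be worthwhile. The sketch below is a keyword heuristic, not a parser: looks_read_only is a hypothetical helper, it can reject harmless queries (for example a property that happens to be named set), and it does not detect procedures that write. A read-only database user remains the real safeguard.

```python
import re

# Keywords that indicate a write or schema change (a heuristic list, not exhaustive).
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV|FOREACH)\b",
    flags=re.IGNORECASE,
)
# Simple quoted strings; escaped quotes inside literals are not handled.
STRING_LITERAL = re.compile(r"'[^']*'" + "|" + r'"[^"]*"')

def looks_read_only(cypher):
    """Return True if no write-style keyword appears outside quoted strings."""
    without_strings = STRING_LITERAL.sub("''", cypher)
    return WRITE_CLAUSES.search(without_strings) is None

print(looks_read_only("MATCH (m:Movie {title: 'The Set'}) RETURN m.title"))  # True
print(looks_read_only("MATCH (n) DETACH DELETE n"))                          # False
```

In the dashboard, a check like this would sit between generating the query and running it, with a warning shown to the user when it fails.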

What You’ve Built

You’ve created a complete natural-language graph analysis system:

LangChain ↔ Neo4j integration that translates English to Cypher
Few-shot prompt templates that improve query accuracy (the ~85% figure is a rough estimate; measure it against your own question set)
Programmatic graph analysis tools (pathfinding, community detection)
An interactive Streamlit dashboard for non-technical users

The system reduces the time to answer graph questions from minutes of Cypher writing to seconds of natural language input.

Troubleshooting

Cypher queries fail with syntax errors: validate_cypher=True only checks and corrects relationship directions against the schema; it does not catch syntax errors. For persistent errors, add more examples to the few-shot prompt matching your exact schema. The most common failure is property name mismatch: check graph.get_schema for exact spelling.
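
Property name mismatches can be spotted before a query is run. The following sketch uses a regular expression to collect names accessed as variable.property and compares them with a known list; find_unknown_properties and the known set are assumptions for illustration, and the check ignores properties written inside map literals such as {title: 'The Matrix'}.

```python
import re

def find_unknown_properties(cypher, known_properties):
    """Heuristic: list property names accessed as var.prop that are not in the schema."""
    used = set(re.findall(r"\b[A-Za-z_]\w*\.([A-Za-z_]\w*)", cypher))
    return sorted(used - set(known_properties))

# Property names from the small manual dataset in Step 1.
known = {"title", "released", "imdbRating", "name", "born"}
print(find_unknown_properties(
    "MATCH (m:Movie) WHERE m.rating > 8.5 RETURN m.title", known
))
# ['rating']
```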

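Chain returns “I don’t know” for questions it should answer: This often means the generated Cypher returned no rows. With return_intermediate_steps=True, inspect result["intermediate_steps"] to see the query and its raw results, and check for mismatched labels, property names, or string casing. Also verify that your graph schema is being passed correctly by inspecting graph.get_schema; if it shows empty properties, refresh the graph object with graph.refresh_schema(). Raising the LLM’s temperature to 0.2 makes query generation more exploratory, but it also makes results less reproducible, so treat it as a last resort.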
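
When the chain is built with return_intermediate_steps=True, as in Step 2, the result carries the generated query and the raw rows. The sketch below pulls those out so you can tell an empty result from a bad answer; diagnose is a hypothetical helper, and the sample dictionary only imitates the shape that the dashboard code in Step 5 reads.

```python
def diagnose(result):
    """Summarize what the chain generated and how many rows came back."""
    steps = result.get("intermediate_steps", [])
    query = steps[0].get("query") if len(steps) > 0 else None
    context = steps[1].get("context") if len(steps) > 1 else None
    return {
        "query": query,
        "row_count": len(context) if context is not None else None,
        "empty_result": context == [],
    }

# Imitation of a chain result whose query matched nothing.
sample_result = {
    "result": "I don't know the answer.",
    "intermediate_steps": [
        {"query": "MATCH (m:Movie {title: 'the matrix'}) RETURN m.released"},
        {"context": []},
    ],
}
print(diagnose(sample_result))
```

Here the lowercase title in the generated query is the likely culprit, since string matches in Cypher are case-sensitive.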

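Oversized results and API rate limiting with large graphs: For graphs with >100k nodes, restrict result sets. Set the chain’s top_k parameter (default 10), for example top_k=50, to cap how many result rows are passed to the answer-generating LLM. Because top_k trims rows only after the query has run, also use LIMIT (and SKIP for pagination) in your exemplar queries so that Neo4j itself returns fewer rows.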
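
If the generated queries still omit a LIMIT, one can be appended before execution. This is a minimal sketch under the assumption that the query ends with a RETURN clause; ensure_limit is a hypothetical helper and is not suitable for queries that end in UNION branches or write clauses.

```python
import re

def ensure_limit(cypher, limit=50):
    """Append a LIMIT clause unless the query already ends with one."""
    stripped = cypher.rstrip().rstrip(";").rstrip()
    if re.search(r"\bLIMIT\s+\S+\s*$", stripped, flags=re.IGNORECASE):
        return stripped
    return f"{stripped} LIMIT {int(limit)}"

print(ensure_limit("MATCH (m:Movie) RETURN m.title"))
# MATCH (m:Movie) RETURN m.title LIMIT 50
```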

Streamlit dashboard crashes on startup: Ensure all dependencies are installed: pip install streamlit pandas python-dotenv. If using Python 3.12+, some Neo4j driver versions may require pip install "neo4j>=5.20" (the quotes stop the shell from treating > as a redirect).

Next Steps

Add graph visualization with pyvis or neovis.js to the Streamlit dashboard
Deploy the dashboard to Streamlit Community Cloud or Railway
Connect to a production Neo4j database with your own business data
Build a LangGraph agent that iterates on Cypher queries when results are empty
Integrate embedding-based similarity search on node properties using Neo4j’s vector index
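
The retry idea from the list above can be prototyped in plain Python before reaching for LangGraph. The sketch below is not a LangGraph agent: query_with_retry is a hypothetical function, and the two fake_* callables stand in for the LLM and the database so the loop can be run as-is.

```python
def query_with_retry(question, generate_cypher, run_query, max_attempts=3):
    """Regenerate Cypher with feedback until a query returns rows or attempts run out."""
    feedback = None
    attempts = []
    for _ in range(max_attempts):
        cypher = generate_cypher(question, feedback)
        rows = run_query(cypher)
        attempts.append(cypher)
        if rows:
            return {"query": cypher, "rows": rows, "attempts": attempts}
        # Tell the generator what failed so the next attempt can differ.
        feedback = f"The query '{cypher}' returned no rows; try a different pattern."
    return {"query": None, "rows": [], "attempts": attempts}

# Stand-ins: the first attempt matches nothing, the second succeeds.
def fake_generate(question, feedback):
    return "Q1" if feedback is None else "Q2"

def fake_run(cypher):
    return [{"title": "The Matrix"}] if cypher == "Q2" else []

outcome = query_with_retry("Which movies did Keanu Reeves act in?", fake_generate, fake_run)
print(outcome["query"], outcome["attempts"])  # Q2 ['Q1', 'Q2']
```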