10 - AI Agents and Orchestration
1. What is an AI Agent?
What: An AI agent is an LLM-powered system that can autonomously plan and execute multi-step tasks by using tools, observing results, and deciding next actions. Unlike simple tool-calling (one-shot), agents loop until the task is complete.
```
Simple tool use: User → LLM → Tool → Response (one step)
Agent:           User → LLM → Tool → Observe → Think → Tool → ... → Response
                        └───────── autonomous loop ─────────┘
```
Key properties of agents:
- Autonomy: Decide which tools to use and when
- Planning: Break complex tasks into steps
- Memory: Track progress and context across steps
- Observation: Interpret tool results and adapt
2. ReAct (Reasoning + Acting)
What: The foundational agent pattern. The model alternates between reasoning about what to do and taking actions.
Thought: I need to find the user's order status. Let me search by email.
Action: search_orders(email="user@example.com")
Observation: Found order #1234, status: shipped, tracking: XYZ123
Thought: I have the order. Now I need to get tracking details.
Action: get_tracking(tracking_id="XYZ123")
Observation: Package in transit, expected delivery: March 7
Thought: I have all the information needed to respond.
Answer: Your order #1234 has been shipped and is expected to arrive March 7.

Implementation:
```python
REACT_SYSTEM_PROMPT = """You are a helpful assistant with access to tools.
For each step:
1. Think about what you need to do next
2. Use a tool if needed
3. Observe the result
4. Repeat until you can give a final answer
Always think before acting."""

def react_agent(query, tools, max_steps=10):
    # `llm` and `execute_tools` are assumed helpers: an LLM client and a
    # function that runs the requested tool calls and formats the results.
    messages = [{"role": "user", "content": query}]
    for step in range(max_steps):
        response = llm.create(
            system=REACT_SYSTEM_PROMPT,
            messages=messages,
            tools=tools,
        )
        if response.stop_reason == "end_turn":
            return response.content[0].text  # Final answer
        # Execute tools and feed the results back for the next iteration
        messages.append({"role": "assistant", "content": response.content})
        tool_results = execute_tools(response.content)
        messages.append({"role": "user", "content": tool_results})
    return "Max steps reached without completing task."
```
3. Plan-and-Execute
What: Instead of interleaving thinking and acting, first create a full plan, then execute steps sequentially. Better for complex multi-step tasks.
Planning phase:
"To deploy the application, I need to:
1. Run tests
2. Build the Docker image
3. Push to registry
4. Update Kubernetes deployment
5. Verify health check"
Execution phase:
Step 1: run_tests() → All 42 tests passed
Step 2: build_docker(tag="v1.2.3") → Image built
Step 3: push_image("v1.2.3") → Pushed to registry
Step 4: update_k8s(image="v1.2.3") → Deployment updated
Step 5: health_check() → 200 OK
✓ Plan complete.

When to use each:
| Pattern | Best For |
|---|---|
| ReAct | Exploratory tasks, unclear goals, simple workflows |
| Plan-and-Execute | Well-defined multi-step tasks, complex workflows |
| Hybrid | Plan first, then ReAct within each step |
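The two phases above can be sketched as a single function. This is a minimal sketch, not a production implementation: `llm_call` is a hypothetical stand-in for a real LLM client, and `tools` is assumed to be a dict mapping tool names to Python functions.

```python
import json

def plan_and_execute(goal, tools, llm_call):
    # Planning phase: ask the model once, up front, for an ordered step list.
    plan = json.loads(llm_call(
        f"Goal: {goal}. Available tools: {list(tools)}. "
        'Reply with a JSON list of steps, each {"tool": ..., "args": {...}}.'
    ))
    # Execution phase: run each step sequentially, recording results.
    results = []
    for i, step in enumerate(plan, 1):
        output = tools[step["tool"]](**step["args"])
        results.append(f"Step {i}: {step['tool']} -> {output}")
    return results
```

Unlike ReAct, the model is consulted once before any action; a real implementation would re-plan when a step fails rather than continue blindly.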
4. LangChain
What: The most popular framework for building LLM applications and agents. Provides abstractions for chains, agents, tools, memory, and retrieval.
```python
from langchain_openai import ChatOpenAI
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_core.tools import tool
from langchain_core.prompts import ChatPromptTemplate

# Define tools
@tool
def search_web(query: str) -> str:
    """Search the web for information."""
    return web_search(query)  # assumes a web_search helper

@tool
def calculator(expression: str) -> str:
    """Evaluate a math expression."""
    return str(eval(expression))  # simplified; never eval untrusted input

# Create agent
llm = ChatOpenAI(model="gpt-4")
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, [search_web, calculator], prompt)
executor = AgentExecutor(agent=agent, tools=[search_web, calculator])
result = executor.invoke({"input": "What's the population of France times 2?"})
```
LangChain components:
| Component | Purpose |
|---|---|
| LLMs/ChatModels | Model wrappers (OpenAI, Anthropic, etc.) |
| Prompts | Template management |
| Chains | Sequential LLM calls |
| Agents | Autonomous tool-using loops |
| Tools | Function interfaces |
| Memory | Conversation/session state |
| Retrievers | RAG integration |
LangGraph (newer): Graph-based agent framework from LangChain for more complex, stateful workflows with cycles and conditional branching.
5. LlamaIndex
What: Framework focused on connecting LLMs with data. Best for RAG pipelines, knowledge bases, and data-augmented applications (where LangChain is more general-purpose).
```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load and index documents
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is the refund policy?")
print(response)
```
Key abstractions:
| Concept | What |
|---|---|
| Documents | Raw data (PDFs, web pages, databases) |
| Nodes | Chunks of documents with metadata |
| Indices | Data structures for retrieval (vector, list, tree, keyword) |
| Query Engines | End-to-end query → retrieve → synthesize pipeline |
| Agents | LlamaIndex's agent framework for tool use |
LangChain vs LlamaIndex:
- LangChain: General-purpose, agent-focused, more flexibility
- LlamaIndex: Data/RAG-focused, more opinionated, easier for data pipelines
- Many projects use both together
6. Memory Patterns
What: Agents need to remember context across interactions. Different memory strategies for different needs.
Conversation memory:
- Short-term: full conversation history kept in the context window. Pro: perfect recall. Con: bounded by the context window.
- Sliding window: keep only the last N messages. Pro: bounded size. Con: loses old context.
- Summary memory: an LLM summarizes older messages. Pro: compressed history. Con: lossy; the summary may miss details.
- Token buffer: keep messages up to a token limit, dropping the oldest first. Pro: predictable size. Con: arbitrary cutoff.

Long-term memory:
```python
# Vector-based memory: store and retrieve past interactions.
# `embed` is an assumed embedding helper.
class VectorMemory:
    def __init__(self, vector_db):
        self.db = vector_db

    def store(self, interaction: str, metadata: dict):
        embedding = embed(interaction)
        self.db.upsert(embedding, metadata)

    def recall(self, query: str, top_k=5):
        query_embedding = embed(query)
        return self.db.query(query_embedding, top_k=top_k)
```
Working memory (scratchpad):
- Agent maintains a scratchpad of intermediate results
- Updated after each tool call
- Helps track multi-step task progress
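The bounded conversation-memory strategies above can be sketched in a few lines. These are illustrative helpers, not from any framework, and token counts are approximated here by word count; a real system would use the model's tokenizer.

```python
def sliding_window(messages, n=4):
    """Keep only the last n messages."""
    return messages[-n:]

def token_buffer(messages, max_tokens=50):
    """Keep the newest messages that fit within the token budget,
    dropping the oldest first."""
    kept, total = [], 0
    for msg in reversed(messages):  # walk newest-to-oldest
        size = len(msg["content"].split())  # crude stand-in for token count
        if total + size > max_tokens:
            break
        kept.append(msg)
        total += size
    return list(reversed(kept))  # restore chronological order
```

Both keep memory bounded; the window is simpler, while the token buffer gives a predictable prompt size regardless of message length.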
7. Multi-Agent Systems
What: Multiple specialized agents collaborating on complex tasks, each with their own tools and expertise.
```
┌─────────────────────────────────────────┐
│          ORCHESTRATOR AGENT             │
│    (Routes tasks, manages workflow)     │
│                                         │
│  ┌──────────────┐   ┌──────────────┐    │
│  │   Research   │   │    Coding    │    │
│  │    Agent     │   │    Agent     │    │
│  │ (web search, │   │ (file edit,  │    │
│  │   reading)   │   │  run code)   │    │
│  └──────────────┘   └──────────────┘    │
│                                         │
│  ┌──────────────┐   ┌──────────────┐    │
│  │    Review    │   │    Deploy    │    │
│  │    Agent     │   │    Agent     │    │
│  │ (code review,│   │   (CI/CD,    │    │
│  │   testing)   │   │    infra)    │    │
│  └──────────────┘   └──────────────┘    │
└─────────────────────────────────────────┘
```
Patterns:
| Pattern | How | Example |
|---|---|---|
| Orchestrator | Central agent delegates to specialists | Project manager routing tasks |
| Pipeline | Agents pass work sequentially | Research → Write → Review → Publish |
| Debate | Agents argue for/against, third decides | Red team / blue team security |
| Consensus | Multiple agents propose, vote on best | Ensemble code solutions |
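As an illustration of the pipeline pattern from the table, the sketch below passes work through a sequence of agents, each reduced to a plain function from text to text; real agents would be LLM-backed. The per-agent trace is one small answer to the debugging challenge noted below.

```python
def run_pipeline(agents, task):
    """Run each named agent on the previous agent's output, in order.
    Returns the final result plus a per-agent trace for debugging."""
    result, trace = task, []
    for name, agent in agents:
        result = agent(result)
        trace.append((name, result))  # record each hand-off
    return result, trace
```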
Challenges:
- Communication overhead between agents
- Error propagation across agent boundaries
- Debugging multi-agent interactions
- Cost (multiple LLM calls per step per agent)
- Coordination and deadlock prevention