PATH 04

Agentic AI &
Production Deployment

Multi-agent architectures, coordination patterns, Azure AI services, and deploying agents on Kubernetes at scale.

01

Multi-Agent Coordination Patterns

Pattern 1: Hierarchical (Orchestrator + Specialists)

An Orchestrator agent receives the high-level goal, decomposes it, and delegates to specialized sub-agents. Most suitable for business workflows.

flowchart TD U[User Goal:\nрезник Analyze Q3 sales and prepare report] --> O[Orchestrator Agent\nPlans and delegates] O --> R[Researcher Agent\nPulls data from Snowflake] O --> A[Analyst Agent\nRuns Python analysis] O --> W[Writer Agent\nFormats executive summary] O --> V[Reviewer Agent\nFact-checks + quality gate] R --> O2[Aggregated Results] A --> O2 W --> O2 V --> O2 O2 --> F[Final Report\nDelivered to User] style O fill:#ccfbf1,stroke:#0d9488 style V fill:#fef3c7,stroke:#d97706

Pattern 2: Pipeline (Sequential Agents)

Output of one agent flows directly into the input of the next. Clear, auditable, good for document processing.

flowchart LR A[Ingestion Agent\nLoad raw docs] --> B[Extraction Agent\nParse + structure] B --> C[Enrichment Agent\nAdd metadata + classify] C --> D[Indexing Agent\nEmbed + store in vector DB] D --> E[Validation Agent\nQuality check] E --> F[(Knowledge Base\nReady for RAG)] style F fill:#f0fdf4,stroke:#16a34a

Pattern 3: Peer-to-Peer (Collaborative Debate)

Agents communicate directly, challenge each other's outputs, reach consensus. Used in AutoGen multi-agent conversation.

flowchart LR A[Agent: Programmer\nwrites code] <-->|Code review| B[Agent: Code Reviewer\ncritiques + suggests] B <-->|Revised code| A A <-->|Tests pass?| C[Agent: QA Tester\nruns test suite] C -->|All green| D[Merged Output] style D fill:#ccfbf1,stroke:#0d9488
02

Agent Roles Reference

RoleResponsibilityTools Typically Used
PlannerDecompose goal into sub-tasksNo tools — pure reasoning
ResearcherGather informationweb_search, retriever, file_reader
ExecutorTake actions in systemscode_exec, api_call, db_write
ReviewerValidate quality of outputcode_exec (tests), assertion checks
CriticChallenge assumptions, find flawsNo tools — adversarial reasoning
SummarizerCondense + format resultsfile_writer, email_sender
RouterClassify + redirect requestsClassifier tool or pure LLM
03

Framework Deep Dives

CrewAI
Role-based multi-agent crew
from crewai import Crew, Agent, Task

researcher = Agent(
 role="Researcher",
 goal="Find accurate data",
 tools=[search_tool]
)
writer = Agent(
 role="Writer",
 goal="Write clear reports"
)
crew = Crew(
 agents=[researcher, writer],
 tasks=[research_task, write_task]
)
result = crew.kickoff()
Best: Business workflows, content pipelines
LangGraph
State machine agent graphs
from langgraph.graph import StateGraph

workflow = StateGraph(AgentState)
workflow.add_node("research", research_fn)
workflow.add_node("analyze", analyze_fn)
workflow.add_node("write", write_fn)

workflow.add_conditional_edges(
 "research",
 should_continue,
 {"continue": "analyze",
  "end": END}
)
app = workflow.compile()
Best: Conditional flows, HITL, loops
AutoGen
Conversational multi-agent
from autogen import AssistantAgent, UserProxyAgent

coder = AssistantAgent(
 name="Coder",
 llm_config={"model": "gpt-4o"}
)
reviewer = AssistantAgent(
 name="Reviewer",
 system_message="Critique code"
)
user = UserProxyAgent(
 name="User",
 code_execution_config={...}
)
user.initiate_chat(coder, message="Solve FizzBuzz")
Best: Code generation, peer review
04

Azure AI Agent Service

Azure AI Agent Service (Azure AI Foundry) provides a fully managed runtime for agents — handling threads, file storage, tool execution, auto-scaling, and observability out of the box.

flowchart LR subgraph SDK ["Your Code (Python/C#)"] A[Create Agent\nGPT-4o + tools + instructions] B[Create Thread\nconversation context] C[Add Message\nuser input] D[Run Thread\ntrigger execution] E[Poll + Stream\nget results] end subgraph AZURE ["Azure AI Foundry (Managed)"] F[Thread Store\nAzure Cosmos DB] G[File Store\nAzure Blob] H[Tool Execution\nCode Interpreter, Bing, Functions] I[LLM Inference\nGPT-4o / o1] end A --> AZURE B --> F C --> F D --> H D --> I E --> F style AZURE fill:#cffafe,stroke:#0891b2
# Create and run an Azure AI Agent — minimal skeleton
from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

client = AIProjectClient.from_connection_string("<FOUNDRY_CONNECTION_STRING>", DefaultAzureCredential())

agent = client.agents.create_agent(
  model="gpt-4o", name="MyAgent",
  instructions="You are a helpful data analyst.",
  tools=client.agents.get_file_search_tool()
)

thread = client.agents.create_thread()
client.agents.create_message(thread.id, role="user", content="Summarize Q3 report")
run = client.agents.create_and_process_run(thread.id, agent.id)

messages = client.agents.list_messages(thread.id)
print(messages.get_last_text_message_by_role("assistant").text.value)

Built-in Tools

  • ✅ Code Interpreter (Python sandbox)
  • ✅ File Search (auto-RAG on uploaded files)
  • ✅ Bing Search grounding
  • ✅ Custom Function Calling
  • ✅ Azure Functions integration

Managed Features

  • ✅ Thread & message persistence (Cosmos)
  • ✅ File upload & chunking (auto-RAG)
  • ✅ Auto-scaling LLM calls
  • ✅ Cost metering per agent/thread
  • ✅ Azure Monitor integration
05

Deploying Agents on Kubernetes

Production agents on AKS follow a microservices pattern — each capability isolated, scaled independently, secured via Workload Identity.

flowchart TB subgraph AKS ["AKS Cluster"] subgraph NS_APP ["namespace: ai-app"] FE[Frontend Pod\nnginx static] API[Backend API\nFastAPI :8000] end subgraph NS_AGENT ["namespace: ai-agents"] ORCH[Orchestrator\nLangGraph :8001] SEARCH[Search Agent\nRAG :8002] CODE[Code Agent\nPython executor :8003] end subgraph NS_OBS ["namespace: observability"] PROM[Prometheus] GRAF[Grafana] end end subgraph AZURE ["Azure Services"] ACR[Container Registry] KV[Key Vault\nAPI keys, secrets] OPENAI[Azure OpenAI\nGPT-4o Endpoint] AIS[Azure AI Search\nVector Store] end FE --> API API --> ORCH ORCH --> SEARCH ORCH --> CODE SEARCH --> AIS CODE --> OPENAI ACR --> AKS KV -->|Workload Identity| AKS style NS_AGENT fill:#ccfbf1,stroke:#0d9488

Resource Recommendations for Agent Pods

resources:
  requests:
    memory: "512Mi"
    cpu: "250m"
  limits:
    memory: "2Gi"   # agents need headroom
    cpu: "1000m" # bursted on LLM calls

Workload Identity for LLM Keys

# Agent pod gets OIDC token → Azure AD
# No secrets in K8s — pulls from Key Vault
serviceAccountName: ai-agent-sa
# SA annotated with Azure managed identity
env:
- name: AZURE_CLIENT_ID
  valueFrom:
    configMapKeyRef:
      name: agent-config
      key: client_id
06

Python Backend Features You Can Add

Core API Features

  • Chat API with streaming responses via SSE or WebSocket
  • Document upload API for PDF, DOCX, TXT, HTML ingestion
  • RAG query endpoint with citations, confidence, and source chunks
  • Agent run endpoint with status polling and execution history
  • Authentication, per-user sessions, tenant isolation

Operational Features

  • Usage tracking: token counts, latency, model used, per-user quotas
  • Prompt/version registry for experimentation and rollback
  • Conversation memory store in Postgres or Redis
  • Background jobs for indexing, re-embedding, scheduled evaluations
  • Audit logs and guardrails for prompt injection or unsafe tool calls

Recommended Architecture

flowchart LR UI[Static Frontend] --> API[FastAPI Backend] API --> LC[LangChain / LangGraph Layer] API --> AUTH[Auth + Sessions] API --> JOBS[Background Workers] LC --> OPENAI[Azure OpenAI] LC --> SEARCH[Azure AI Search / Vector DB] AUTH --> PG[(Postgres)] JOBS --> REDIS[(Redis / Queue)] JOBS --> N8N[n8n for notifications\nand business workflows] style API fill:#dbeafe,stroke:#2563eb style LC fill:#dcfce7,stroke:#16a34a style N8N fill:#ffe4e6,stroke:#e11d48
# Example backend endpoints
POST /api/chat
POST /api/documents/upload
POST /api/rag/query
POST /api/agents/run
GET /api/agents/{id}/status
GET /api/conversations
GET /api/metrics/usage
POST /api/workflows/trigger
07

Governance & Safety Layer

Content Filtering

  • Azure OpenAI content filters (hate, violence, self-harm, sexual)
  • Prompt injection detection middleware
  • Output validation before user delivery

Rate Limiting & Budget

  • Per-user token budgets (e.g., 100K tokens/day)
  • Max iterations per agent run (prevent infinite loops)
  • Cost alarms via Azure Cost Management

Audit Logging

  • Every tool call logged with input/output
  • LangSmith / Azure Monitor tracing
  • Immutable audit trail for compliance

Human-in-the-Loop

  • Checkpoint before irreversible actions (email, delete, deploy)
  • LangGraph interrupt() for async approvals
  • Confidence threshold → escalate to human
← AI Agents Next: Orchestration →