The Weather Outfit Advisor
A Multi-Agent System Leveraging ADK, A2A Protocol, and Vertex AI Agent Engine
Capstone Project: Prototype to Production
A Multi-Agent System Leveraging ADK, A2A Protocol, and Vertex AI Agent Engine
Capstone Project: Prototype to Production
Executive Summary ........................... 3
Project Overview .............................. 4
Development Journey ........................ 5
Day 1: Introduction to Agents
Agent Fundamentals ........................ 6
LLM Architecture ............................ 7
System Prompt ............................... 8
Design Patterns ............................. 9
Multi-Agent Systems ....................... 10
Day 2: Agent Tools & Interoperability
Tool Integration ............................. 11
Model Context Protocol .................... 12
Weather Tools Implementation ........... 13
API Integration .............................. 14
Day 3: Context Engineering
Session Management ....................... 15
Memory Systems ............................ 16
RAG Implementation ........................ 17
Context Strategies .......................... 18
Day 4: Agent Quality
Quality Framework .......................... 19
Safety & Security ........................... 20
Monitoring & Observability ................ 21
Evaluation Metrics .......................... 22
Day 5: Prototype to Production
A2A Protocol ................................. 23
Multi-Agent Architecture ................... 24
Agent Engine Deployment ................. 25
Production Considerations ................. 26
Technical Implementation
System Architecture ........................ 27
Agent Specifications ....................... 28
Code Examples .............................. 29
Best Practices .............................. 30
Results & Conclusions
Key Achievements .......................... 72
References ................................... 73
Thank You .................................... 74
Building a single, monolithic agent is inefficient for combining real-time data, complex logic, and personalization. The system becomes hard to maintain, scale, and test.
A microservices architecture where specialized agents collaborate via the Agent-to-Agent (A2A) Protocol. This allows for decoupling, independent scaling, and clear separation of concerns.
| Pillar | Achievement | Course Day |
|---|---|---|
| Architecture | Decoupled 5 agents for independent scaling via A2A | Day 5 |
| Agent Quality | Integrated dedicated Safety Agent and Monitoring Alerts | Day 4 |
| Context | Implemented persistent Memory for user personalization | Day 3 |
| Performance | Achieved optimization via Smart Caching and Observability | Day 2 & 4 |
Part I
(Day 1)
| Agent | Model | Core Responsibility | Communication |
|---|---|---|---|
| Coach Agent | Gemini 2.0 Flash Exp | User I/O, Orchestration, Memory, Final Response | User / A2A Client |
| Weather Agent | Gemini 2.0 Flash Exp | Fetches & Caches Weather Data (structured WeatherData) | A2A Service |
| Stylist Agent | Gemini 2.0 Flash Exp | Generates OutfitPlan based on logic & preferences | A2A Service |
| Activity Agent | Gemini 2.0 Flash Exp | Classifies user intent (work, sports, formal, casual) | A2A Service |
| Safety Agent | Gemini 2.0 Flash Exp | Monitors extreme cold/heat/wind/storm risks | A2A Service |
Key Concept (Day 1): Level 3 Collaborative Multi-Agent System - "Team of Specialists"
Agentic Problem-Solving Process: Think → Act → Observe → Iterate (managed by Coach orchestration layer)
Role: The Central Nervous System - Main user interface, coordination, personalization
gemini-2.0-flash-expget_user_preferencesupdate_user_preferencesget_weather_smartclassify_activityplan_outfitcheck_safetyYou are the Weather Outfit Coach - the main AI assistant. Help users decide what to wear, provide personalized recommendations, and consider their activities with safety warnings.
In monolithic systems, a privileged agent can be tricked into performing unauthorized actions. This risk is amplified in agents because tool execution is a trusted action.
The A2A architecture assigns restricted Agent Identities to every service, following the Principle of Least Privilege.
| Agent | Permissions |
|---|---|
| Coach Agent | Only orchestration and memory permissions |
| Weather Agent | Only external API key access (least privilege) |
| Stylist Agent | Only local outfit logic access |
Role: Data Provider & Caching - Isolates external API calls and manages performance
get_current_weatherget_hourly_forecastget_weather_smartget_weather_smart for efficiency (caching)Role: Recommendation Engine
plan_outfit toolRole: Classifying User Intent
Role: Responsible AI Implementation - Ensures safety guidelines
| Condition | Threshold | Warning Action |
|---|---|---|
| Extreme Cold | Below 20°F | Requires warm layers, protection |
| Freezing | 32°F and below | Ice warning, careful movement |
| Extreme Heat | Above 95°F | Stay hydrated, seek shade |
| Strong Winds | Above 25 mph | Avoid large umbrellas |
| Heavy Rain/Storms | Above 70% chance | Suggests seeking shelter |
Key Part of Day 4 Agent Quality Principles: Safety as a deterministic, programmatic guardrail
Part II
(Day 2 & 3)
Tools embody discrete, testable business logic and integrate core optimizations like caching.
| Tool | Functionality & Benefit | Day 2 Principle |
|---|---|---|
get_weather_smart |
Checks in-memory cache (15-min TTL) before calling external API | Cost & Latency Reduction |
plan_outfit |
Implements complex layering logic based on Persona and Activity | Tool-Tool Composition |
check_safety |
Flags specific risks based on hard thresholds | Safety as a Tool (Enforcement) |
classify_activity |
Translates free-form text into structured constraints | Structured Output Design |
The get_weather_smart tool uses an in-memory dictionary cache keyed by city and time_window. Data is only considered valid for 15 minutes (900 seconds).
weather_tools.py), transparent to LLMProduction Evidence: Trace logs show Weather Agent calls completing in 125ms (cache hit) vs 800ms+ (cache miss)
The success of the Stylist Agent is entirely dependent on receiving clean, unambiguous Structured Content from the Weather Agent. If data is unreliable or malformed, the Stylist will hallucinate or fail.
Enforce reliability by making the Weather Agent deterministic with strict schema adherence.
temperature, rain_chance, wind_speedDesign Principle: "Design for Concise Output" (Day 2: Page 19)
The Activity Agent translates free-form text into structured constraints used by the Stylist Agent's rule engine.
| Constraint | Value | Usage by Stylist Agent |
|---|---|---|
category |
sports |
Triggers athletic apparel suggestions |
formality_level |
casual |
Avoids suggesting formal wear/shoes |
movement_level |
high |
Prioritizes flexible, moisture-wicking layers |
Short-Term Workbench
Example Data:
Permanent Preferences
UserMemory classExample Data:
practicalruns_coldSeattleThe Coach Agent reads the Persona from memory and injects persona-specific instructions into the Stylist Agent's prompt or final output formatting.
| Persona | Coach Instruction | Stylist Instruction |
|---|---|---|
| Practical | Focus on function and essentials | Focus on function, comfort, and simplicity |
| Fashion | Add style tips and coordination advice | Add style tips, color coordination, and trends |
| Kid-Friendly | Use fun, simple language | Use fun language and prioritize safety |
Implementation: UserMemory class abstracts storage (in-memory dictionary) with clean API for Coach Agent tools
Part III
(Day 4)
| Pillar | Description | Implementation |
|---|---|---|
| 1. Logs (The Diary) | Structured records of every tool call, decision, error | loging_confi.py → Google Cloud Logging |
| 2. Metrics (Health Report) | Quantifiable data on latency, error rates, throughput | metrics.py → Google Cloud Monitoring |
| 3. Traces (The Narrative) | End-to-end flow of single request across all 5 A2A agents | OpenTelemetry → Google Cloud Trace |
| 4. Alerts (The Act Phase) | Programmatic alerts on metric thresholds | alert.py → Google Cloud Monitoring |
Custom logging configuration (loging_confi.py) outputs structured JSON logs, making them easily searchable in Google Cloud Logging.
The MetricsCollector instruments key components using Python decorators for automatic measurement.
agent_call_latency - Duration of agent executionsagent_calls (by status: success/error)tool_execution_latency - Duration of tool callstool_calls (by status: success/error)Alerts provide automated reflexes, maintaining stability in real-time. Programmatically defined using alert.py.
| Alert Type | Condition | Purpose |
|---|---|---|
| High Error Rate | > 5 errors/min for 5 min | Detect service degradation early |
| High Latency | P95 latency > 2000ms for 3 min | Catch slow API calls, resource constraints |
| Low Success Rate | < 10 successes/min for 10 min | Triggers on significant degradation or total failure |
Why P95 Latency? Monitors outlier events that degrade user experience, not just averages
Scenario: "What should I wear for hiking in Seattle today? I run cold."
| Span Name | Agent/Tool | Status | Duration | Insight |
|---|---|---|---|---|
| coach_agent.run_query | Coach | ✅ Success | 412ms | Orchestrates entire request |
| └ classify_activity | Activity Tool | ✅ Success | 45ms | "hiking" → sports/high_movement |
| └ get_user_preferences | Memory Tool | ✅ Success | 12ms | Loaded comfort: runs_cold |
| └ call_weather_agent | Weather (A2A) | ✅ Success | 125ms | CACHE HIT - Fetched forecast |
| └ call_stylist_agent | Stylist (A2A) | ✅ Success | 180ms | Generated OutfitPlan with all context |
| └ check_safety | Safety Tool | ✅ Success | 30ms | No warnings required |
The call_weather_agent span completed in only 125ms.
This low duration confirms the Smart Caching optimization was effective, resulting in a cache hit. This directly reduces dependency on the external Meteostat API and improves user-facing latency.
| Metric | Observed Value | Target vs. Actual |
|---|---|---|
| Requests/Second (Traffic) | 0.4/s Peak | Validating system load under concurrent requests |
| Median Latency (p50) | 80 ms | Exceeds target - Proves caching efficiency |
| 95th Percentile (p95) | 126 ms | Shows worst-case UX is still fast |
Part IV
(Day 5)
Goal: Move from local development to scalable, production-ready microservices on Google Cloud.
weather-agent-abc.a.run.app)While Cloud Run offers general flexibility for stateless containers, Agent Engine was chosen for built-in enterprise features required for stateful, complex agents.
| Advantage | Benefit |
|---|---|
| Managed State | Durable, highly available Session Storage crucial for persistence |
| Built-in Observability | Automatically integrates with Cloud Trace/Logging for audit trail |
| Simplified Deployment | Offloads complex plumbing, focus on core agent logic |
The system runs as a fully integrated ADK Multi-Agent System with 6 services:
| Service | Port | Description |
|---|---|---|
| Flask Frontend | 5000 | User interface with Tailwind CSS |
| Coach Agent | 8000 | Main orchestrator using A2A protocol |
| Weather Agent | 8001 | Weather data fetching and caching |
| Stylist Agent | 8002 | Outfit recommendation engine |
| Activity Agent | 8003 | Activity classification |
| Safety Agent | 8004 | Extreme weather alerts |
Local Testing: Docker Compose for multi-service testing
Verified all 5 system components (Tools, Agents, App, Schemas, Memory) function independently before deployment.
Result: 5/5 tests passed - System fully functional and ready for A2A integration
User logs show spike in error calls to external weather API
High Error Rate Alert triggers → MLOps team scales down to reduce load on failing API
Team creates test cases, strengthens caching/retry logic, deploys via CI/CD with Safe Rollout (Canary)
Production Mindset: Continuous monitoring enables proactive issue resolution before user impact
Beautiful, modern chat interface for the Weather Outfit Assistant
POST /api/chat - Sends message to Coach agent (proxied to ADK Agent Engine)GET /health - Health check endpointQuick action buttons and outfit suggestions change based on which city you search for.
| Location | Quick Actions | Context |
|---|---|---|
| Seattle | "Good for hiking?", "Rain gear needed?" | Pacific Northwest - Hiking & Rain |
| Denver | "Mountain hiking?", "Cold weather gear?" | Mountains & Snow |
| Miami | "Beach ready?", "Pool party?" | Beach & Hot Weather |
Deploy All Agents (MVP)
Deploy A2A Services
Deploy to Cloud Run
Post-Deployment: Test Coach Agent endpoint using standard curl command
Default alerts (e.g., > 5 errors/min) are rate-based. They don't distinguish between 5 errors in 10 requests (50% error rate) and 5 errors in 10,000 requests (0.05% error rate).
For true production monitoring, use MQL (Monitoring Query Language) to calculate ratios. This provides accurate percentage-based alerting regardless of traffic volume.
| Status | Component | Confirmation |
|---|---|---|
| ✅ | A2A Architecture | 5 specialized agents communicating via A2A protocol |
| ✅ | Observability (Day 4) | Logs, Metrics, Tracing, and Alerts fully implemented |
| ✅ | Context/Memory (Day 3) | Personalization (Persona, Comfort) loaded from memory |
| ✅ | Performance (Day 2) | Smart Caching operational (proven by low trace latency) |
| ✅ | Testing (Day 5) | All 5/5 component tests pass. Full orchestration verified |
| ✅ | Deployment Ready | Designed for and deployable to Vertex AI Agent Engine |
Production URL: https://agentengine-689252953158.us-central1.run.app/
Part V
(Appendices A-P)
| Level | Description | Example |
|---|---|---|
| Level 1 | Connected Problem Solver | Single agent with weather API tool |
| Level 2 | Agent with Specialized Tools | Agent with multiple domain tools |
| Level 3 | Collaborative Multi-Agent System | This Project: 5 specialized agents |
Core Metaphor (Day 1):
weather_outfit_adk/
├── app.py # Main ADK app entry point
├── agents/
│ ├── coach.py # Coach orchestrator agent
│ ├── weather.py # Weather specialist agent
│ ├── stylist.py # Stylist recommendation agent
│ ├── activity.py # Activity classification agent
│ └── safety.py # Safety warning agent
├── tools/
│ ├── weather_tools.py # Weather API + caching
│ ├── outfit_tools.py # Outfit planning logic
│ ├── activity_tools.py # Activity classification
│ ├── safety_tools.py # Safety threshold rules
│ └── memory_tools.py # User preference tools
├── schemas/
│ ├── memory.py # UserPreferences, Persona types
│ ├── weather.py # WeatherData schema
│ ├── outfit.py # OutfitPlan schema
│ └── activity.py # Activity constraints schema
├── memory/
│ └── user_memory.py # UserMemory class (storage abstraction)
└── config/
├── loging_confi.py # Structured logging setup
├── metrics.py # MetricsCollector class
└── alert.py # AlertPolicyManager class
# coach.py
from google.adk.agents import Agent
from ..tools.weather_tools import get_weather_smart
from ..tools.activity_tools import classify_activity
from ..tools.outfit_tools import plan_outfit
from ..tools.safety_tools import check_safety
from ..tools.memory_tools import get_user_preferences, update_user_preferences
coach_agent = Agent(
name="coach_agent",
model="gemini-2.0-flash-exp",
instruction="""You are the Weather Outfit Coach...
Workflow:
1. Get user preferences to personalize response
2. Extract city from query
3. If activity mentioned, classify using classify_activity
4. Get weather using get_weather_smart
5. Plan outfit using plan_outfit
6. Check safety using check_safety
7. Combine into friendly, personalized response
""",
tools=[
get_user_preferences, update_user_preferences,
get_weather_smart, classify_activity,
plan_outfit, check_safety
]
)
# weather.py
from google.adk.agents import Agent
from ..tools.weather_tools import (
get_current_weather,
get_hourly_forecast,
get_weather_smart
)
weather_agent = Agent(
name="weather_agent",
model="gemini-2.0-flash-exp",
instruction="""You are a weather specialist agent.
Your role:
- Always call weather tools before providing info
- Never guess or use training data
- Return clean, structured forecast data
- Focus only on weather, not clothing
Rules:
- Use get_weather_smart for efficiency (caches results)
""",
tools=[get_current_weather, get_hourly_forecast, get_weather_smart]
)
# stylist.py
from google.adk.agents import Agent
from ..tools.outfit_tools import plan_outfit
stylist_agent = Agent(
name="stylist_agent",
model="gemini-2.0-flash-exp",
instruction="""You are a clothing and style advisor agent.
Your role:
- Take structured weather data and user preferences
- Use plan_outfit tool to generate recommendations
- Provide clear, practical advice with layers/accessories
- Never call weather APIs yourself
Style approaches based on persona:
- Practical: Focus on function, comfort, simplicity
- Fashion: Add style tips, color coordination, trends
- Kid-friendly: Use fun language, prioritize safety
""",
tools=[plan_outfit]
)
# activity.py
activity_agent = Agent(
name="activity_agent",
model="gemini-2.0-flash-exp",
instruction="""Classify user activity into:
- Work: Office, meetings (business_casual, low movement)
- Sports: Hiking, biking (casual, high movement)
- Formal: Dates, dinners (formal, low movement)
- Casual: Walking, shopping (casual, medium movement)
""",
tools=[classify_activity]
)
# safety.py
safety_agent = Agent(
name="safety_agent",
model="gemini-2.0-flash-exp",
instruction="""Review weather for risks.
Safety thresholds:
- Extreme cold: Below 20°F
- Extreme heat: Above 95°F
- Strong winds: Above 25 mph
- Heavy rain/storms: Above 70% chance
""",
tools=[check_safety]
)
# user_memory.py
from typing import Dict, Optional
from ..schemas.memory import UserPreferences, PersonaType, ComfortProfile
class UserMemory:
"""Manages long-term user preferences and profile data."""
def __init__(self):
self._memory_store: Dict[str, UserPreferences] = {}
def get_preferences(self, user_id: str) -> UserPreferences:
"""Retrieve user preferences from memory."""
if user_id not in self._memory_store:
self._memory_store[user_id] = UserPreferences()
return self._memory_store[user_id]
def update_preferences(
self, user_id: str,
persona: Optional[PersonaType] = None,
comfort_profile: Optional[ComfortProfile] = None,
default_city: Optional[str] = None
) -> UserPreferences:
current_prefs = self.get_preferences(user_id)
if persona: current_prefs.persona = persona
if comfort_profile: current_prefs.comfort_profile = comfort_profile
if default_city: current_prefs.default_city = default_city
return current_prefs
# metrics.py
from contextlib import contextmanager
import time
class MetricsCollector:
@contextmanager
def measure_time(self, metric_name: str, labels=None):
"""Context manager to measure execution time"""
start_time = time.time()
try:
yield
finally:
duration_ms = (time.time() - start_time) * 1000
self.record_latency(metric_name, duration_ms, labels)
def increment_counter(self, metric_name: str, value=1, labels=None):
# Increments in-memory counter
if self.enabled:
self._write_custom_metric(...)
def record_latency(self, metric_name: str, duration_ms: float, labels=None):
# Records in-memory timer
if self.enabled:
self._write_custom_metric(...)
# Global instance
agent_metrics = MetricsCollector()
# loging_confi.py
import logging, sys
from google.cloud import logging as cloud_logging
def setup_logging(service_name: str, level="INFO",
enable_cloud_logging=True) -> logging.Logger:
logger = logging.getLogger(service_name)
logger.setLevel(getattr(logging, level.upper()))
# Structured JSON format
formatter = logging.Formatter(
'{"time": "%(asctime)s", "level": "%(levelname)s", '
'"service": "' + service_name + '", "message": "%(message)s"}',
datefmt='%Y-%m-%dT%H:%M:%S'
)
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setFormatter(formatter)
logger.addHandler(console_handler)
# Cloud Logging integration
if enable_cloud_logging:
client = cloud_logging.Client(project=project_id)
cloud_handler = client.get_default_handler()
logger.addHandler(cloud_handler)
return logger
✅ All tools imported successfully [WARNING] No RAPIDAPI_KEY found - using mock weather data ✅ Weather: 65.0°F, partly cloudy ✅ Activity: sports, formality=casual ✅ Outfit: long-sleeve shirt or light sweater, no jacket ✅ Safety: high risk ✅ Memory: persona=practical, comfort=neutral ✅ Updated: persona=fashion, city=Seattle ✅ ADK Agent class imported ✅ All agents imported successfully - Coach has 6 tools ✅ Main app.py imported successfully ✅ ADK app name: app ✅ All schemas imported ✅ WeatherData: 65.0°F ✅ UserPreferences: practical ✅ UserMemory instance created ✅ Preferences stored ✅ Preferences retrieved: Portland ✅ Multiple users supported ✅ All 5/5 tests passed ✅ All tools functional ✅ All agents operational ✅ Main app ready ✅ Schemas validated ✅ Memory system integrated
| Feature | Implementation |
|---|---|
| Clean Modern UI | Material Design with Tailwind CSS |
| Real-time Chat | WebSocket connection to Coach Agent |
| Quick Prompts | Location-aware quick action buttons |
| Session Management | Persistent conversations with Agent Engine |
| Outfit Icons | Visual icons for each clothing item |
API Endpoints:
POST /api/chat - Send message to Coach agent
GET /health - Health check endpoint
| Test | Action | Expected Result |
|---|---|---|
| Seattle | Type "Seattle" | Buttons: "Good for hiking?", "Rain gear needed?" |
| Denver | Type "Denver" | Buttons: "Mountain hiking?", "Cold weather gear?" |
| Miami | Type "Miami" | Buttons: "Beach ready?", "Pool party?" |
| Method | Description | Use Case |
|---|---|---|
| Method 1: All Agents (MVP) | Deploy entire system as single ADK app | Simplest deployment, all agents scale together |
| Method 2: A2A Services | Deploy each agent as independent service | True microservices, independent scaling |
| Method 3: Cloud Run | Deploy to Cloud Run containers | More infrastructure control, custom configs |
Post-Deployment Test:
curl -X POST https://YOUR-ENDPOINT/api/chat \
-H "Content-Type: application/json" \
-d '{"message": "What should I wear in Seattle today?"}'
Default alerts (e.g., > 5 errors/min) are rate-based. They don't distinguish between 5 errors in 10 requests (50% error rate) vs. 5 errors in 10,000 requests (0.05%).
-- MQL for Error Rate > 5%
fetch global
| { metric 'custom.googleapis.com/agent/agent_calls'
| filter metric.status == 'error'
| align rate(1m)
| group_by [], [value_error: sum(value.agent_calls)]
;
metric 'custom.googleapis.com/agent/agent_calls'
| filter metric.status == 'success'
| align rate(1m)
| group_by [], [value_success: sum(value.agent_calls)]
}
| join
| value [error_rate: cast_double(val(0)) /
(cast_double(val(0)) + cast_double(val(1)))]
| condition error_rate > 0.05
Benefit: Accurate percentage-based alerting regardless of traffic volume
| Service | Port | Role |
|---|---|---|
| Flask Frontend | 5000 | User interface with Tailwind CSS |
| Coach Agent | 8000 | Main orchestrator using A2A protocol |
| Weather Agent | 8001 | Weather data fetching and caching |
| Stylist Agent | 8002 | Outfit recommendation engine |
| Activity Agent | 8003 | Activity classification |
| Safety Agent | 8004 | Extreme weather alerts |
Deployment: Docker Compose for local testing, Vertex AI Agent Engine for production
| Principle | Implementation | Example |
|---|---|---|
| Clear Names | Descriptive function names, not generic | `get_day_summary` not `fetch_data` |
| Represent Tasks | Tools embody business logic, not raw APIs | `plan_outfit` encapsulates outfit rules |
| Concise Output | Return only essential fields | `temp`, `rain_chance`, `wind_speed` only |
| Natural Language Docs | Clear docstrings for LLM understanding | """Get weather forecast for city...""" |
| Short Parameters | Minimal, focused parameter lists | `city, datetime` not 10+ parameters |
Optional Built-in Tools: Code execution (numerical conversions), URL context (severe weather alerts)
| Phase | Action | Example |
|---|---|---|
| 1. Fetch | Retrieve relevant memories | `get_user_preferences(user_id)` → persona, comfort |
| 2. Prepare | Mix memories into context | Inject persona into Stylist instructions |
| 3. Invoke | Execute agent with enriched context | Coach calls Stylist with persona-aware prompt |
| 4. Upload | Update memory if user provides new info | `update_user_preferences` if user says "I run cold" |
Privacy Note: Redact sensitive details (exact location, schedule) in stored sessions. Keep long-term memory as high-level facts only.
| Aspect | Session (Short-Term) | Memory (Long-Term) |
|---|---|---|
| Scope | Single conversation | Across all conversations |
| Lifetime | Minutes to hours | Days, weeks, months |
| Storage | Agent Engine Sessions | UserMemory class / Database |
| Content | City, time window, last forecast | Persona, comfort profile, default city |
| Update Frequency | Every turn in conversation | Only when user shares new preference |
| Multi-Agent | Separate per agent (A2A pattern) | Shared across all agents |
Key Design Choice: Separate A2A histories prevent memory pollution and unexpected side effects
Step 1: User → Coach Agent
Input: "What should I wear for a walk at 7pm in Redmond?"
User ID: user_123 (no preferences stored yet)
Step 2: Coach → Memory Tool
Tool Call: get_preferences_for_user(user_123)
Response: {} (empty - no preferences)
Step 3: Coach → User (Followup)
"To personalize: Do you prefer practical, fashion, or kid-friendly style?"
Step 4: User → Coach
"Practical please"
Step 5: Coach → Memory Tool
Tool Call: set_preferences_for_user(user_123, "practical", "neutral")
Response: {persona: "practical", comfort_profile: "neutral"}
Step 6: Coach → Activity Agent (A2A)
Message: {activity_text: "walk"}
Response: {category: "casual", formality: "casual", movement: "medium"}
Step 7: Coach → Weather Agent (A2A)
Message: {city: "Redmond", time_window: "evening"}
Response: {temp_c: 12.0, rain_chance: 0.6, wind_speed: 15.0, ...}
Source: cache (CACHE HIT - 125ms)
Step 8: Coach → Stylist Agent (A2A)
Message: {weather_context, activity, persona: "practical"}
Response: {top_outer: "light jacket", bottom: "jeans", ...}
Step 9: Coach → Safety Agent (A2A)
Message: {weather_context}
Response: {risk_level: "medium", safety_note: "High chance of rain..."}
Step 10: Coach → User
"For your evening walk in Redmond: Light jacket, jeans, waterproof
shoes recommended. 60% chance of rain - bring an umbrella!"
| Pillar | Weather Outfit Implementation | Metric/Test |
|---|---|---|
| Effectiveness | Suggestion matches actual forecast | User satisfaction, accuracy checks |
| Efficiency | Smart caching, minimal tool calls | Tokens/interaction, latency (80ms p50) |
| Robustness | Graceful degradation on API failure | Error rate tracking, fallback behavior |
| Safety | Dedicated Safety Agent, no medical advice | Warning coverage, content filters |
Trajectory is Truth: Observability captures the entire decision path, not just final output
❌ App import failed: 'Flask' object has no attribute 'root_agent'
Test script `test_full_system_with_metrics.py` imported the Flask frontend `app.py` instead of the ADK backend `app.py`. Flask app has no `.root_agent` attribute.
Solution: Renamed entry points and corrected import path to point to ADK `App` object
Final Status: All 5/5 tests passed after fixing import path
| Agent | Current Framework | Could Be Replaced With |
|---|---|---|
| Weather Agent | Python ADK | Java legacy service, external vendor API |
| Stylist Agent | Python ADK + Gemini | LangChain + OpenAI, custom ML model |
| Safety Agent | Python ADK | Rule engine service (no LLM needed) |
Key Requirement: Each agent must adhere to A2A Protocol Contract (The Agent Card)
Benefit: Swap implementations without touching other agents
| Status | Component | Confirmation |
|---|---|---|
| ✅ | A2A Architecture (Day 5) | 5 specialized agents communicating via A2A protocol |
| ✅ | Observability (Day 4) | Logs, Metrics, Traces, Alerts fully operational |
| ✅ | Context/Memory (Day 3) | Persona & Comfort personalization from memory |
| ✅ | Performance (Day 2) | Smart Caching (15min TTL, cache hits @ 125ms) |
| ✅ | Agent Fundamentals (Day 1) | Level 3 Multi-Agent System implemented |
| ✅ | Testing | All 5/5 component tests pass, full orchestration verified |
| ✅ | Deployment | Deployed to Vertex AI Agent Engine |
| ✅ | Frontend | Modern chat UI with location-aware features |
Production URL: https://agentengine-689252953158.us-central1.run.app/
| Step | Action | Example (Customer Support) |
|---|---|---|
| 1. Get the Mission | Receive high-level goal from user or trigger | "Where is my order #12345?" |
| 2. Scan the Scene | Gather context from memory, tools, user input | Check what tools are available, user history |
| 3. Think It Through | Devise multi-step plan using reasoning model | "Find order → Get tracking → Report status" |
| 4. Take Action | Execute first step by invoking appropriate tool | Call `find_order("12345")` → Get tracking # |
| 5. Observe & Iterate | Observe result, add to context, return to Step 3 | Got "Out for Delivery" → Synthesize response |
Key Insight: This "Think → Act → Observe" cycle continues until the Mission is achieved
Level 4 agents can identify gaps in their capabilities and dynamically create new tools or agents to fill them.
Scenario: Solaris Headphones Launch Project Manager Agent realizes it needs social media sentiment tracking, but no such tool exists. Step 1: Meta-Reasoning Think: "I must track social media buzz for 'Solaris,' but I lack the capability." Step 2: Autonomous Creation Act: Invoke AgentCreator tool with mission: "Build agent that monitors social media for 'Solaris headphones', performs sentiment analysis, and reports daily summary." Step 3: Observe & Deploy Observe: New SentimentAnalysisAgent created, tested, and added to team. Result: Agent dynamically expanded its own capabilities mid-task.
Evolution: From using fixed resources to actively expanding them
| Type | Purpose | Examples |
|---|---|---|
| Information Retrieval | Fetch data from various sources | Web search, database queries, RAG, NL2SQL |
| Action / Execution | Perform real-world operations | Send emails, post messages, code execution, device control |
| System / API Integration | Connect with existing systems | Google Workspace, enterprise APIs, third-party services |
| Human-in-the-Loop | Facilitate human collaboration | Ask clarification, seek approval, hand off for judgment |
| Concept | Session (Workbench) | Memory (Filing Cabinet) |
|---|---|---|
| Purpose | Temporary workspace for current task | Organized long-term knowledge storage |
| Contents | Tools, notes, rough drafts, in-progress work | Only critical, finalized documents in labeled folders |
| Accessibility | Everything immediately accessible | Clean, efficient retrieval system |
| After Task | Review, discard redundant items | File only key information for future use |
| State | Temporary, specific to one project | Persistent, available across all projects |
Key Insight: Session = messy but necessary workspace. Memory = curated knowledge base.
| Level | Question | What We Measure |
|---|---|---|
| 1. Black Box (End-to-End) | Did agent achieve user's goal? | Task success rate, user satisfaction, overall quality |
| 2. Glass Box (Trajectory) | Why did it succeed/fail? | LLM planning quality, tool usage, parameter correctness |
| 3. Component-Level | Which component failed? | Individual tool performance, API reliability, context quality |
| Phase | Action | Output |
|---|---|---|
| 1. Capture | Collect logs, traces, metrics from production | Complete trajectory data (Think → Act → Observe) |
| 2. Evaluate | Run automated judges (LLM-as-Judge, metrics) | Quality scores across 4 pillars (Effectiveness, Efficiency, Robustness, Safety) |
| 3. Review | Human-in-the-loop for edge cases | Validated test cases, corrected labels |
| 4. Improve | Update prompts, tools, context engineering | New agent version deployed |
| 5. Monitor | A/B test new version vs baseline | Quantified improvement metrics → Return to Capture |
Result: Each iteration improves agent quality systematically, using data not guesswork
How do you measure the "accuracy" of a generated paragraph? Traditional ML metrics (precision, recall, F1) don't apply to open-ended agent outputs.
| Evaluator Type | Pros | Cons |
|---|---|---|
| Human Reviewers | Gold standard, nuanced judgment | Expensive, slow, doesn't scale |
| Automated Metrics | Fast, cheap, scales infinitely | Can't judge quality, only format/presence |
| LLM-as-a-Judge | Scales well, understands context, nuanced | Requires careful prompt engineering, can have bias |
Best Practice: Use hybrid approach:
| Pattern | How It Works | Best For |
|---|---|---|
| Shared, Unified History | All agents read/write to same conversation log | Tightly coupled collaborative tasks (problem-solving pipeline) |
| Separate, Individual Histories | Each agent maintains private log, communicates via messages | Loosely coupled systems, microservices architecture |
| Agent-as-a-Tool | One agent invokes another as tool, receives final output only | Delegation with black-box sub-agents |
| A2A Protocol | Structured messaging between agents using standard protocol | Cross-framework interoperability, enterprise systems |
Weather Outfit Advisor: Uses separate histories + A2A Protocol for clean agent boundaries and framework agnosticism
| Requirement | Solution | Implementation |
|---|---|---|
| Security & Privacy | Strict user isolation, PII redaction | ACLs per session, authenticated access only |
| Data Integrity | Persistent storage, backup/recovery | Managed database (Agent Engine Sessions), not in-memory |
| Performance | Fast retrieval, context window management | Indexed queries, conversation summarization |
| Context Rot Prevention | Dynamic history compaction | Summarization, selective pruning, token management |
Context Rot: As context grows, cost/latency increase AND model's attention to critical info diminishes
Solution: Dynamically mutate history via summarization or pruning
Blount, A., Gulli, A., Saboo, S., Zimmermann, M., & Vuskovic, V. (2025, November). Introduction to agents [Whitepaper]. Kaggle. https://www.kaggle.com/whitepaper-introduction-to-agents
Kartakis, S., Hernandez Larios, G., Li, R., Secchi, E., & Xia, H. (2025, November). Prototype to production [Whitepaper]. Kaggle. https://www.kaggle.com/whitepaper-prototype-to-production
Milam, K., & Gulli, A. (2025, November). Context engineering sessions and memory [Whitepaper]. Kaggle. https://www.kaggle.com/whitepaper-context-engineering-sessions-and-memory
Styer, M., Patlolla, K., Mohan, M., & Diaz, S. (2025, November). Agent tools and interoperability with MCP [Whitepaper]. Kaggle. https://www.kaggle.com/whitepaper-agent-tools-and-interoperability-with-mcp
Subasioglu, M., Bulmus, T., & Bakkali, W. (2025, November). Agent quality [Whitepaper]. Kaggle. https://www.kaggle.com/whitepaper-agent-quality
Note: All references are from the Kaggle "5 Days of AI - Agents" course materials, which provided the foundational knowledge and practical implementation guidance for this capstone project.
The Weather Outfit Advisor
A Multi-Agent System Leveraging ADK, A2A Protocol,
and Vertex AI Agent Engine
Questions & Discussion
GitHub: https://github.com/tabitha-dev
LinkedIn: https://www.linkedin.com/in/tabitha-dev/
Production URL:
https://agentengine-frontend-689252953158.us-central1.run.app/