In this tutorial, we build a human-in-the-loop travel booking agent that treats the user as a teammate rather than a passive observer. We design the system so the agent first reasons openly by drafting a structured travel plan, then deliberately pauses before taking any action. We expose this proposed plan in a live interface where we can inspect, edit, or reject it, and only after explicit approval do we allow the agent to execute tools. By combining LangGraph interrupts with a Streamlit frontend, we create a workflow that makes agent reasoning visible, controllable, and trustworthy instead of opaque and autonomous.
!pip -q install -U langgraph openai streamlit pydantic
!npm -q install -g localtunnel

import os, getpass, textwrap, json, uuid, time

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OPENAI_API_KEY (hidden input): ")
os.environ.setdefault("OPENAI_MODEL", "gpt-4.1-mini")
We set up the execution environment by installing the libraries needed for agent orchestration and UI exposure: LangGraph, OpenAI, Streamlit, Pydantic, and the localtunnel CLI. We collect the OpenAI API key securely at runtime so it is never hardcoded or leaked in the notebook, and we set the model name upfront to keep the rest of the pipeline clean and reproducible.
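If you prefer not to retype the key every session, a small alternative sketch is shown below. It assumes the notebook runs in Google Colab and that a secret named OPENAI_API_KEY has been saved in Colab's Secrets panel.

# Optional alternative, only valid inside Google Colab: read the key from the
# Secrets panel instead of prompting. Assumes a secret named "OPENAI_API_KEY".
from google.colab import userdata

os.environ["OPENAI_API_KEY"] = userdata.get("OPENAI_API_KEY")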
app_code = r'''
import os, json, uuid
import streamlit as st
from typing import TypedDict, List, Dict, Any, Optional
from pydantic import BaseModel, Field
from openai import OpenAI
from langgraph.graph import StateGraph, START, END
from langgraph.types import Command, interrupt
from langgraph.checkpoint.memory import InMemorySaver

def tool_search_flights(origin: str, destination: str, depart_date: str, return_date: str, budget_usd: int) -> Dict[str, Any]:
    options = [
        {"airline": "SkyJet", "route": f"{origin}->{destination}", "depart": depart_date, "return": return_date, "price_usd": int(budget_usd*0.55)},
        {"airline": "AeroBlue", "route": f"{origin}->{destination}", "depart": depart_date, "return": return_date, "price_usd": int(budget_usd*0.70)},
        {"airline": "Nimbus Air", "route": f"{origin}->{destination}", "depart": depart_date, "return": return_date, "price_usd": int(budget_usd*0.62)},
    ]
    options = sorted(options, key=lambda x: x["price_usd"])
    return {"tool": "search_flights", "top_options": options[:2]}

def tool_search_hotels(city: str, nights: int, budget_usd: int, preferences: List[str]) -> Dict[str, Any]:
    base = max(60, int(budget_usd / max(nights, 1)))
    picks = [
        {"name": "Central Boutique", "city": city, "nightly_usd": int(base*0.95), "notes": ["walkable", "great reviews"]},
        {"name": "Riverside Stay", "city": city, "nightly_usd": int(base*0.80), "notes": ["quiet", "good value"]},
        {"name": "Modern Loft Hotel", "city": city, "nightly_usd": int(base*1.10), "notes": ["new", "gym"]},
    ]
    if "luxury" in [p.lower() for p in preferences]:
        picks = sorted(picks, key=lambda x: -x["nightly_usd"])
    else:
        picks = sorted(picks, key=lambda x: x["nightly_usd"])
    return {"tool": "search_hotels", "top_options": picks[:2]}

def tool_build_day_by_day(city: str, days: int, vibe: str) -> Dict[str, Any]:
    blocks = []
    for d in range(1, days+1):
        blocks.append({
            "day": d,
            "morning": f"{city}: coffee + a must-see landmark",
            "afternoon": f"{city}: {vibe} activity + local lunch",
            "evening": f"{city}: sunset spot + dinner + optional night walk"
        })
    return {"tool": "draft_itinerary", "days": blocks}
'''
We define the core of the Streamlit application and implement safe, deterministic tool functions that simulate flight search, hotel search, and itinerary drafting. We design these tools to behave like real-world APIs while still running fully in a Colab environment, and we keep every tool output structured so the results remain easy to inspect and audit.
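Before wiring these tools into the graph, we can sanity-check one of them directly in the notebook. The snippet below is a quick sketch, not part of the app: it assumes the app_code cell above has already run, executes the tool definitions in a scratch namespace, and prints one structured result for illustrative arguments.

# Quick sanity check (sketch only): run the tool definitions from app_code in a
# scratch namespace and inspect one structured, auditable output.
scratch = {}
exec(app_code, scratch)  # defines tool_search_flights, tool_search_hotels, tool_build_day_by_day

sample = scratch["tool_search_flights"](
    origin="Dubai", destination="Istanbul",
    depart_date="2026-04-10", return_date="2026-04-15",
    budget_usd=1800,
)
print(json.dumps(sample, indent=2))  # e.g. {"tool": "search_flights", "top_options": [...]}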
app_code += r'''
class TravelPlan(BaseModel):
    trip_title: str = Field(..., description="Short human-friendly title")
    origin: str
    destination: str
    depart_date: str
    return_date: str
    travelers: int = 1
    budget_usd: int = 1500
    preferences: List[str] = Field(default_factory=list)
    vibe: str = "balanced"
    lodging_nights: int = 4
    daily_outline: List[Dict[str, Any]] = Field(default_factory=list)
    tool_calls: List[Dict[str, Any]] = Field(default_factory=list)

class State(TypedDict):
    user_request: str
    plan: Dict[str, Any]
    approval: Dict[str, Any]
    execution: Dict[str, Any]

def make_llm_plan(state: State) -> Dict[str, Any]:
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    model = os.environ.get("OPENAI_MODEL", "gpt-4.1-mini")
    sys = (
        "You are a travel planning agent. "
        "Return a JSON travel plan that matches the provided schema. "
        "Be realistic, concise, and include a tool_calls list describing what you want executed "
        "(e.g., search_flights, search_hotels, draft_itinerary)."
    )
    schema = TravelPlan.model_json_schema()
    resp = client.responses.create(
        model=model,
        input=[
            {"role": "system", "content": sys},
            {"role": "user", "content": state["user_request"]},
            {"role": "user", "content": f"Schema (JSON): {json.dumps(schema)}"}
        ],
    )
    text = resp.output_text.strip()
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("Model did not return JSON. Try again or change model.")
    raw = text[start:end+1]
    plan_obj = json.loads(raw)
    plan = TravelPlan(**plan_obj).model_dump()
    if not plan.get("tool_calls"):
        plan["tool_calls"] = [
            {"name": "search_flights", "args": {"origin": plan["origin"], "destination": plan["destination"], "depart_date": plan["depart_date"], "return_date": plan["return_date"], "budget_usd": plan["budget_usd"]}},
            {"name": "search_hotels", "args": {"city": plan["destination"], "nights": plan["lodging_nights"], "budget_usd": int(plan["budget_usd"]*0.35), "preferences": plan["preferences"]}},
            {"name": "draft_itinerary", "args": {"city": plan["destination"], "days": max(2, plan["lodging_nights"]+1), "vibe": plan["vibe"]}},
        ]
    return {"plan": plan}

def wait_for_approval(state: State) -> Dict[str, Any]:
    payload = {
        "kind": "approval",
        "message": "Review/edit the plan. Approve to execute tools.",
        "plan": state["plan"],
    }
    decision = interrupt(payload)
    return {"approval": decision}

def execute_tools(state: State) -> Dict[str, Any]:
    approval = state.get("approval") or {}
    if not approval.get("approved"):
        return {"execution": {"status": "not_executed", "reason": "User rejected or did not approve."}}
    plan = approval.get("edited_plan") or state["plan"]
    tool_calls = plan.get("tool_calls", [])
    results = []
    for call in tool_calls:
        name = call.get("name")
        args = call.get("args", {})
        if name == "search_flights":
            results.append(tool_search_flights(**args))
        elif name == "search_hotels":
            results.append(tool_search_hotels(**args))
        elif name == "draft_itinerary":
            results.append(tool_build_day_by_day(**args))
        else:
            results.append({"tool": name, "error": "Unknown tool (blocked for safety).", "args": args})
    return {"execution": {"status": "executed", "tool_results": results, "final_plan": plan}}
'''
We formalize the agent’s reasoning using a strict schema that requires the model to output an explicit travel plan rather than free-form text. We generate the plan with the OpenAI Responses API and validate it against the Pydantic schema before allowing it into the workflow. We also auto-inject tool calls if the model omits them to guarantee a complete execution path.
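To see why schema validation matters here, consider the minimal, hypothetical model below (not the app's TravelPlan, just an illustration of the same Pydantic mechanism): a malformed plan raises a ValidationError and never reaches the approval or execution nodes.

# Illustration only: a hypothetical mini-schema showing how Pydantic rejects a
# malformed plan before it can enter the workflow.
from typing import List
from pydantic import BaseModel, Field, ValidationError

class MiniPlan(BaseModel):
    destination: str
    budget_usd: int = 1500
    preferences: List[str] = Field(default_factory=list)

try:
    MiniPlan(destination="Istanbul", budget_usd="not-a-number")
except ValidationError as err:
    print(err)  # the invalid plan is caught here, long before any tool runs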
app_code += r'''
def build_graph():
    builder = StateGraph(State)
    builder.add_node("plan", make_llm_plan)
    builder.add_node("approve", wait_for_approval)
    builder.add_node("execute", execute_tools)
    builder.add_edge(START, "plan")
    builder.add_edge("plan", "approve")
    builder.add_edge("approve", "execute")
    builder.add_edge("execute", END)
    memory = InMemorySaver()
    graph = builder.compile(checkpointer=memory)
    return graph

st.set_page_config(page_title="Plan → Approve → Execute Travel Agent", layout="wide")
st.title("Human-in-the-Loop Travel Booking Agent (Plan → Approve/Edit → Execute)")

with st.sidebar:
    st.header("Runtime")
    if st.button("New Session / Thread"):
        st.session_state.thread_id = str(uuid.uuid4())
        st.session_state.ran_once = False
        st.session_state.interrupt_payload = None
        st.session_state.last_execution = None

thread_id = st.session_state.get("thread_id") or str(uuid.uuid4())
st.session_state.thread_id = thread_id
graph = build_graph()
config = {"configurable": {"thread_id": thread_id}}
st.caption(f"Thread ID: {thread_id}")

req = st.text_area(
    "Describe your trip request",
    value=st.session_state.get("user_request", "Plan a 5-day trip from Dubai to Istanbul in April. Budget $1800. Prefer museums, street food, and a relaxed pace."),
    height=120
)
st.session_state.user_request = req

colA, colB = st.columns([1, 1])
run_plan = colA.button("1) Generate Plan (LLM)")
resume_btn = colB.button("2) Resume After Approval")

if run_plan:
    st.session_state.ran_once = True
    st.session_state.interrupt_payload = None
    st.session_state.last_execution = None
    initial = {"user_request": req, "plan": {}, "approval": {}, "execution": {}}
    out = graph.invoke(initial, config=config)
    if "__interrupt__" in out and out["__interrupt__"]:
        st.session_state.interrupt_payload = out["__interrupt__"][0].value
    else:
        st.session_state.last_execution = out.get("execution")

payload = st.session_state.get("interrupt_payload")
if payload:
    st.subheader("Plan proposed by agent (editable)")
    plan = payload.get("plan", {})
    left, right = st.columns([1, 1])
    with left:
        st.write("**Edit JSON (advanced):**")
        edited_text = st.text_area("Plan JSON", value=json.dumps(plan, indent=2), height=420)
    with right:
        st.write("**Quick actions:**")
        approved = st.radio("Decision", options=["Approve", "Reject"], index=0)
        st.write("Tip: If you edit JSON, keep it valid. You can also reject and re-run planning.")
    try:
        edited_plan = json.loads(edited_text)
        json_ok = True
    except Exception as e:
        json_ok = False
        st.error(f"Invalid JSON: {e}")
    if resume_btn:
        if not json_ok:
            st.stop()
        decision = {
            "approved": (approved == "Approve"),
            "edited_plan": edited_plan
        }
        out2 = graph.invoke(Command(resume=decision), config=config)
        st.session_state.interrupt_payload = None
        st.session_state.last_execution = out2.get("execution")

exec_result = st.session_state.get("last_execution")
if exec_result:
    st.subheader("Execution result")
    st.json(exec_result)
    if exec_result.get("status") == "executed":
        st.success("Tools executed only AFTER approval ✅")
    else:
        st.warning("Not executed (rejected or not approved).")
'''
We construct the LangGraph workflow by separating planning, approval, and execution into distinct nodes. We deliberately interrupt the graph after planning so we can review and control the agent’s intent. We only allow tool execution to proceed when explicit human approval is provided.
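The interrupt-and-resume mechanic is easiest to see in isolation. The following minimal sketch (independent of the travel app, using the same langgraph APIs installed above) pauses a one-node graph at interrupt() and resumes it with Command(resume=...); exact return shapes may vary slightly across langgraph versions.

# Minimal interrupt/resume sketch, separate from the travel app.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.types import Command, interrupt
from langgraph.checkpoint.memory import InMemorySaver

class DemoState(TypedDict):
    answer: str

def ask_human(state: DemoState):
    decision = interrupt({"question": "Proceed?"})  # graph pauses here until resumed
    return {"answer": decision}

b = StateGraph(DemoState)
b.add_node("ask", ask_human)
b.add_edge(START, "ask")
b.add_edge("ask", END)
demo = b.compile(checkpointer=InMemorySaver())  # a checkpointer is required for interrupts

cfg = {"configurable": {"thread_id": "demo-1"}}
print(demo.invoke({"answer": ""}, config=cfg))         # returns with an "__interrupt__" entry
print(demo.invoke(Command(resume="yes"), config=cfg))  # resumes; answer becomes "yes"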
import pathlib
pathlib.Path("app.py").write_text(app_code)

!streamlit run app.py --server.port 8501 --server.address 0.0.0.0 & sleep 2
!lt --port 8501
We connect the agent workflow to a live Streamlit interface that supports editing, approving, and rejecting plans. We persist state across runs using a thread identifier so the agent behaves consistently across interactions. Finally, we launch the app and expose it publicly through localtunnel, enabling real human-in-the-loop collaboration.
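One practical note, stated as an assumption about how localtunnel behaves at the time of writing: the public URL usually shows an interstitial asking for a "tunnel password", which is simply the public IP address of the Colab VM. You can print it before opening the link:

# Print the Colab VM's public IP, which localtunnel typically requests as the
# tunnel password on its interstitial page.
!curl -s https://ipv4.icanhazip.com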
In conclusion, we demonstrated how plan-and-execute agents become significantly more reliable when humans remain in the loop at the right moment. We showed that interrupts are not just a technical feature but a design primitive for building trust, accountability, and collaboration into agent systems. By separating planning from execution and inserting a clear approval boundary, we ensured that tools run only with human consent and context. This pattern scales beyond travel planning to any high-stakes automation, giving us agents that think with us rather than act for us.


