How to Build Memory-Driven AI Agents with Short-Term, Long-Term, and Episodic Memory


In this tutorial, we build a memory-engineering layer for an AI agent that separates short-term working context from long-term vector memory and episodic traces. We implement semantic storage using embeddings and FAISS for fast similarity search, and we add episodic memory that captures what worked, what failed, and why, so the agent can reuse successful patterns rather than reinvent them. We also define practical policies for what gets stored (salience + novelty + pinned constraints), how retrieval is ranked (hybrid semantic + episodic with usage decay), and how short-term messages are consolidated into durable memories.

import os, re, json, time, math, uuid
from dataclasses import dataclass, asdict
from typing import List, Dict, Any, Optional, Tuple
from datetime import datetime

import sys, subprocess

def pip_install(pkgs: List[str]):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q"] + pkgs)

pip_install([
    "sentence-transformers>=2.6.0",
    "faiss-cpu>=1.8.0",
    "numpy",
    "pandas",
    "scikit-learn"
])

import numpy as np
import pandas as pd
import faiss
from sentence_transformers import SentenceTransformer
from sklearn.preprocessing import minmax_scale

USE_OPENAI = False
OPENAI_MODEL = os.environ.get("OPENAI_MODEL", "gpt-4o-mini")

try:
    from getpass import getpass
    if not os.getenv("OPENAI_API_KEY"):
        k = getpass("Optional: Enter OPENAI_API_KEY for better LLM responses (press Enter to skip): ").strip()
        if k:
            os.environ["OPENAI_API_KEY"] = k

    if os.getenv("OPENAI_API_KEY"):
        pip_install(["openai>=1.40.0"])
        from openai import OpenAI
        client = OpenAI()
        USE_OPENAI = True
except Exception:
    USE_OPENAI = False

We set up the execution environment and ensure all required libraries are available. We handle optional OpenAI integration while keeping the notebook fully runnable without any API keys. We establish the base imports and configuration that the rest of the memory system builds upon.

@dataclass
class ShortTermItem:
    ts: str
    role: str
    content: str
    meta: Dict[str, Any]

@dataclass
class LongTermItem:
    mem_id: str
    ts: str
    kind: str
    text: str
    tags: List[str]
    salience: float
    usage: int
    meta: Dict[str, Any]

@dataclass
class Episode:
    ep_id: str
    ts: str
    task: str
    constraints: Dict[str, Any]
    plan: List[str]
    actions: List[Dict[str, Any]]
    result: str
    outcome_score: float
    lessons: List[str]
    failure_modes: List[str]
    tags: List[str]
    meta: Dict[str, Any]

class VectorIndex:
    def __init__(self, dim: int):
        self.dim = dim
        self.index = faiss.IndexFlatIP(dim)
        self.id_map: List[str] = []
        self._vectors = None

    def add(self, ids: List[str], vectors: np.ndarray):
        assert vectors.ndim == 2 and vectors.shape[1] == self.dim
        self.index.add(vectors.astype(np.float32))
        self.id_map.extend(ids)
        if self._vectors is None:
            self._vectors = vectors.astype(np.float32)
        else:
            self._vectors = np.vstack([self._vectors, vectors.astype(np.float32)])

    def search(self, query_vec: np.ndarray, k: int = 6) -> List[Tuple[str, float]]:
        if self.index.ntotal == 0:
            return []
        if query_vec.ndim == 1:
            query_vec = query_vec[None, :]
        D, I = self.index.search(query_vec.astype(np.float32), k)
        hits = []
        for idx, score in zip(I[0].tolist(), D[0].tolist()):
            if idx == -1:
                continue
            hits.append((self.id_map[idx], float(score)))
        return hits

We define clear data structures for short-term, long-term, and episodic memory using typed schemas, and we implement a vector index backed by FAISS to enable fast semantic similarity search over stored memories. Together, these components lay the foundation for efficiently storing, indexing, and retrieving memories.
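As a quick sanity check, the snippet below (our own illustration, not part of the original notebook) exercises the VectorIndex directly with the same MiniLM embedder the engine uses later; the _demo_ names are ours, and the exact similarity scores will vary.

# Minimal sketch: exercise VectorIndex on its own (illustrative; _demo_ names are ours).
_demo_embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
_demo_index = VectorIndex(_demo_embedder.get_sentence_embedding_dimension())

_texts = ["The user prefers concise answers.", "FAISS enables fast similarity search."]
_vecs = _demo_embedder.encode(_texts, normalize_embeddings=True)
_demo_index.add(["m1", "m2"], np.array(_vecs, dtype=np.float32))

# With normalized vectors, inner product equals cosine similarity, so the top hit
# for a style-related query should be the preference sentence ("m1").
_q = _demo_embedder.encode(["What answer style does the user like?"], normalize_embeddings=True)[0]
print(_demo_index.search(np.array(_q, dtype=np.float32), k=1))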

class MemoryPolicy:
    def __init__(self,
                 st_max_items: int = 18,
                 ltm_max_items: int = 2000,
                 min_salience_to_store: float = 0.35,
                 novelty_threshold: float = 0.82,
                 topk_semantic: int = 6,
                 topk_episodic: int = 3):
        self.st_max_items = st_max_items
        self.ltm_max_items = ltm_max_items
        self.min_salience_to_store = min_salience_to_store
        self.novelty_threshold = novelty_threshold
        self.topk_semantic = topk_semantic
        self.topk_episodic = topk_episodic

    def salience_score(self, text: str, meta: Dict[str, Any]) -> float:
        t = text.strip()
        if not t:
            return 0.0

        length = min(len(t) / 420.0, 1.0)
        has_numbers = 1.0 if re.search(r"\b\d+(\.\d+)?\b", t) else 0.0
        has_capitalized = 1.0 if re.search(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b", t) else 0.0

        kind = (meta.get("kind") or "").lower()
        kind_boost = 0.0
        if kind in {"preference", "procedure", "constraint", "definition"}:
            kind_boost = 0.20
        if meta.get("pinned"):
            kind_boost += 0.20

        generic_penalty = 0.15 if len(t.split()) < 6 and kind not in {"preference"} else 0.0

        score = 0.45*length + 0.20*has_numbers + 0.15*has_capitalized + kind_boost - generic_penalty
        return float(np.clip(score, 0.0, 1.0))

    def should_store_ltm(self, salience: float, novelty: float, meta: Dict[str, Any]) -> bool:
        if meta.get("pinned"):
            return True
        if salience >= self.min_salience_to_store and novelty >= self.novelty_threshold:
            return True
        return False

    def episodic_value(self, outcome_score: float, task: str) -> float:
        task_len = min(len(task) / 240.0, 1.0)
        val = 0.55*(1 - abs(0.65 - outcome_score)) + 0.25*task_len
        return float(np.clip(val, 0.0, 1.0))

    def rank_retrieved(self,
                       semantic_hits: List[Tuple[str, float]],
                       episodic_hits: List[Tuple[str, float]],
                       ltm_items: Dict[str, LongTermItem],
                       episodes: Dict[str, Episode]) -> Dict[str, Any]:
        sem = []
        for mid, sim in semantic_hits:
            it = ltm_items.get(mid)
            if not it:
                continue
            freshness = 1.0
            usage_penalty = 1.0 / (1.0 + 0.15*it.usage)
            score = sim * (0.55 + 0.45*it.salience) * usage_penalty * freshness
            sem.append((mid, float(score)))

        ep = []
        for eid, sim in episodic_hits:
            e = episodes.get(eid)
            if not e:
                continue
            score = sim * (0.6 + 0.4*e.outcome_score)
            ep.append((eid, float(score)))

        sem.sort(key=lambda x: x[1], reverse=True)
        ep.sort(key=lambda x: x[1], reverse=True)

        return {
            "semantic_ids": [m for m, _ in sem[:self.topk_semantic]],
            "episodic_ids": [e for e, _ in ep[:self.topk_episodic]],
            "semantic_scored": sem[:self.topk_semantic],
            "episodic_scored": ep[:self.topk_episodic],
        }

We encode the rules that decide what is worth remembering and how retrieval should be ranked. We formalize salience, novelty, usage decay, and outcome-based scoring to avoid noisy or repetitive memory recall. This policy layer keeps memory growth controlled and useful rather than bloated.
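To make the policy concrete, here is a small illustration (ours, not from the original code) of how the salience heuristic and the storage gate behave; exact scores depend on the heuristics above and will vary with the text.

# Minimal sketch: how the storage policy scores a candidate memory (illustrative).
_demo_policy = MemoryPolicy()

_text = "Must run entirely in Google Colab with Python 3.10 and no external API keys."
_sal = _demo_policy.salience_score(_text, {"kind": "constraint"})   # kind boost applies
print(f"salience ~ {_sal:.2f}")

# An item is stored only if it is both salient enough and sufficiently novel...
print(_demo_policy.should_store_ltm(_sal, novelty=0.95, meta={}))   # likely True
print(_demo_policy.should_store_ltm(_sal, novelty=0.10, meta={}))   # False: near-duplicate
# ...unless it is pinned, which bypasses both thresholds.
print(_demo_policy.should_store_ltm(0.0, novelty=0.0, meta={"pinned": True}))  # True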

class MemoryEngine:
    def __init__(self,
                 embed_model: str = "sentence-transformers/all-MiniLM-L6-v2",
                 policy: Optional[MemoryPolicy] = None):
        self.policy = policy or MemoryPolicy()

        self.embedder = SentenceTransformer(embed_model)
        self.dim = self.embedder.get_sentence_embedding_dimension()

        self.short_term: List[ShortTermItem] = []
        self.ltm: Dict[str, LongTermItem] = {}
        self.episodes: Dict[str, Episode] = {}

        self.ltm_index = VectorIndex(self.dim)
        self.episode_index = VectorIndex(self.dim)

    def _now(self) -> str:
        return datetime.utcnow().isoformat() + "Z"

    def _embed(self, texts: List[str]) -> np.ndarray:
        v = self.embedder.encode(texts, normalize_embeddings=True, show_progress_bar=False)
        return np.array(v, dtype=np.float32)

    def st_add(self, role: str, content: str, **meta):
        self.short_term.append(ShortTermItem(ts=self._now(), role=role, content=content, meta=dict(meta)))
        if len(self.short_term) > self.policy.st_max_items:
            self.short_term = self.short_term[-self.policy.st_max_items:]

    def ltm_add(self, kind: str, text: str, tags: Optional[List[str]] = None, **meta) -> Optional[str]:
        tags = tags or []
        meta = dict(meta)
        meta["kind"] = kind

        sal = self.policy.salience_score(text, meta)

        novelty = 1.0
        if len(self.ltm) > 0:
            q = self._embed([text])[0]
            hits = self.ltm_index.search(q, k=min(8, self.ltm_index.index.ntotal))
            if hits:
                max_sim = max(s for _, s in hits)
                novelty = 1.0 - float(max_sim)
                novelty = float(np.clip(novelty, 0.0, 1.0))

        if not self.policy.should_store_ltm(sal, novelty, meta):
            return None

        mem_id = "mem_" + uuid.uuid4().hex[:12]
        item = LongTermItem(
            mem_id=mem_id,
            ts=self._now(),
            kind=kind,
            text=text.strip(),
            tags=tags,
            salience=float(sal),
            usage=0,
            meta=meta
        )
        self.ltm[mem_id] = item

        vec = self._embed([item.text])
        self.ltm_index.add([mem_id], vec)

        if len(self.ltm) > self.policy.ltm_max_items:
            self._ltm_prune()

        return mem_id

    def _ltm_prune(self):
        items = list(self.ltm.values())
        candidates = [it for it in items if not it.meta.get("pinned")]
        if not candidates:
            return
        candidates.sort(key=lambda x: (x.salience, x.usage))
        drop_n = max(1, len(self.ltm) - self.policy.ltm_max_items)
        to_drop = set([it.mem_id for it in candidates[:drop_n]])
        for mid in to_drop:
            self.ltm.pop(mid, None)
        self._rebuild_ltm_index()

    def _rebuild_ltm_index(self):
        self.ltm_index = VectorIndex(self.dim)
        if not self.ltm:
            return
        ids = list(self.ltm.keys())
        vecs = self._embed([self.ltm[i].text for i in ids])
        self.ltm_index.add(ids, vecs)

    def episode_add(self,
                    task: str,
                    constraints: Dict[str, Any],
                    plan: List[str],
                    actions: List[Dict[str, Any]],
                    result: str,
                    outcome_score: float,
                    lessons: List[str],
                    failure_modes: List[str],
                    tags: Optional[List[str]] = None,
                    **meta) -> Optional[str]:

        tags = tags or []
        ep_id = "ep_" + uuid.uuid4().hex[:12]
        ep = Episode(
            ep_id=ep_id,
            ts=self._now(),
            task=task,
            constraints=constraints,
            plan=plan,
            actions=actions,
            result=result,
            outcome_score=float(np.clip(outcome_score, 0.0, 1.0)),
            lessons=lessons,
            failure_modes=failure_modes,
            tags=tags,
            meta=dict(meta),
        )

        keep = self.policy.episodic_value(ep.outcome_score, ep.task)
        if keep < 0.18 and not ep.meta.get("pinned"):
            return None

        self.episodes[ep_id] = ep

        card = self._episode_card(ep)
        vec = self._embed([card])
        self.episode_index.add([ep_id], vec)
        return ep_id

    def _episode_card(self, ep: Episode) -> str:
        lessons = "; ".join(ep.lessons[:8])
        fails = "; ".join(ep.failure_modes[:6])
        plan = " | ".join(ep.plan[:10])
        return (
            f"Task: {ep.task}\n"
            f"Constraints: {json.dumps(ep.constraints, ensure_ascii=False)}\n"
            f"Plan: {plan}\n"
            f"OutcomeScore: {ep.outcome_score:.2f}\n"
            f"Lessons: {lessons}\n"
            f"FailureModes: {fails}\n"
            f"Result: {ep.result[:400]}"
        ).strip()

We implement the core MemoryEngine, which integrates embeddings, storage, pruning, and indexing into a single system. It manages short-term buffers, long-term vector memory, and episodic traces while enforcing size limits and pruning strategies.

    def consolidate(self):
        recent = self.short_term[-min(len(self.short_term), 10):]
        texts = [f"{it.role}: {it.content}".strip() for it in recent]
        blob = "\n".join(texts).strip()
        if not blob:
            return {"stored": []}

        extracted = []

        for m in re.findall(r"\b(?:prefer|likes?|avoid|don['’]t want)\b[: ]+(.*)", blob, flags=re.I):
            if m.strip():
                extracted.append(("preference", m.strip(), ["preference"]))

        for m in re.findall(r"\b(?:must|should|need to|constraint)\b[: ]+(.*)", blob, flags=re.I):
            if m.strip():
                extracted.append(("constraint", m.strip(), ["constraint"]))

        proc_candidates = []
        for line in blob.splitlines():
            if re.search(r"\b(step|first|then|finally)\b", line, flags=re.I) or "->" in line or "⇒" in line:
                proc_candidates.append(line.strip())
        if proc_candidates:
            extracted.append(("procedure", " | ".join(proc_candidates[:8]), ["procedure"]))

        if not extracted:
            extracted.append(("note", blob[-900:], ["note"]))

        stored_ids = []
        for kind, text, tags in extracted:
            mid = self.ltm_add(kind=kind, text=text, tags=tags)
            if mid:
                stored_ids.append(mid)

        return {"stored": stored_ids}

    def retrieve(self, query: str, filters: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        filters = filters or {}
        qv = self._embed([query])[0]

        sem_hits = self.ltm_index.search(qv, k=max(self.policy.topk_semantic, 8))
        ep_hits = self.episode_index.search(qv, k=max(self.policy.topk_episodic, 6))

        pack = self.policy.rank_retrieved(sem_hits, ep_hits, self.ltm, self.episodes)

        for mid in pack["semantic_ids"]:
            if mid in self.ltm:
                self.ltm[mid].usage += 1

        return pack

    def build_context(self, query: str, pack: Dict[str, Any]) -> str:
        st = self.short_term[-min(len(self.short_term), 8):]
        st_block = "\n".join([f"[ST] {it.role}: {it.content}" for it in st])

        sem_block = ""
        if pack["semantic_ids"]:
            sem_lines = []
            for mid in pack["semantic_ids"]:
                it = self.ltm[mid]
                sem_lines.append(f"[LTM:{it.kind}] {it.text} (salience={it.salience:.2f}, usage={it.usage})")
            sem_block = "\n".join(sem_lines)

        ep_block = ""
        if pack["episodic_ids"]:
            ep_lines = []
            for eid in pack["episodic_ids"]:
                e = self.episodes[eid]
                lessons = "; ".join(e.lessons[:8]) if e.lessons else "(none)"
                fails = "; ".join(e.failure_modes[:6]) if e.failure_modes else "(none)"
                ep_lines.append(
                    f"[EP] Task={e.task} | score={e.outcome_score:.2f}\n"
                    f"     Lessons={lessons}\n"
                    f"     Avoid={fails}"
                )
            ep_block = "\n".join(ep_lines)

        return (
            "=== AGENT MEMORY CONTEXT ===\n"
            f"Query: {query}\n\n"
            "---- Short-Term (working) ----\n"
            f"{st_block or '(empty)'}\n\n"
            "---- Long-Term (vector) ----\n"
            f"{sem_block or '(none)'}\n\n"
            "---- Episodic (what worked last time) ----\n"
            f"{ep_block or '(none)'}\n"
            "=============================\n"
        )

    def ltm_df(self) -> pd.DataFrame:
        if not self.ltm:
            return pd.DataFrame(columns=["mem_id","ts","kind","text","tags","salience","usage"])
        rows = []
        for it in self.ltm.values():
            rows.append({
                "mem_id": it.mem_id,
                "ts": it.ts,
                "kind": it.kind,
                "text": it.text,
                "tags": ",".join(it.tags),
                "salience": it.salience,
                "usage": it.usage
            })
        df = pd.DataFrame(rows).sort_values(["salience","usage"], ascending=[False, True])
        return df

    def episodes_df(self) -> pd.DataFrame:
        if not self.episodes:
            return pd.DataFrame(columns=["ep_id","ts","task","outcome_score","lessons","failure_modes","tags"])
        rows = []
        for e in self.episodes.values():
            rows.append({
                "ep_id": e.ep_id,
                "ts": e.ts,
                "task": e.task[:120],
                "outcome_score": e.outcome_score,
                "lessons": " | ".join(e.lessons[:6]),
                "failure_modes": " | ".join(e.failure_modes[:6]),
                "tags": ",".join(e.tags),
            })
        df = pd.DataFrame(rows).sort_values(["outcome_score","ts"], ascending=[False, False])
        return df

We show how recent interactions are consolidated from short-term memory into durable long-term entries, and we implement hybrid retrieval that combines semantic recall with episodic lessons learned from past tasks. This allows the agent to answer new queries using both factual memory and prior experience.
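Before the full demo below, this short usage sketch (our own; the _demo_ names are not in the original) shows the call pattern for the consolidate, retrieve, and build_context loop on a fresh engine.

# Minimal sketch of the consolidate -> retrieve -> build_context loop (illustrative).
# Note: instantiating MemoryEngine() reloads the MiniLM embedder, so this is for demonstration only.
_demo_mem = MemoryEngine()
_demo_mem.st_add("user", "I prefer short bullet answers and I must stay inside Colab.")
_demo_mem.st_add("assistant", "Noted. First install deps, then build the index, finally test retrieval.")

print(_demo_mem.consolidate())   # distills preferences/constraints/procedures into LTM
_demo_pack = _demo_mem.retrieve("What answer style should I use?")
print(_demo_mem.build_context("What answer style should I use?", _demo_pack)[:500])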

def openai_chat(system: str, user: str) -> str:
    resp = client.chat.completions.create(
        model=OPENAI_MODEL,
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        temperature=0.3
    )
    return resp.choices[0].message.content

def heuristic_responder(context: str, question: str) -> str:
    lessons = re.findall(r"Lessons=(.*)", context)
    avoid = re.findall(r"Avoid=(.*)", context)
    ltm_lines = [ln for ln in context.splitlines() if ln.startswith("[LTM:")]

    steps = []
    if lessons:
        for chunk in lessons[:2]:
            for s in [x.strip() for x in chunk.split(";") if x.strip()]:
                steps.append(s)
    for ln in ltm_lines:
        # compare a lowercased literal against the lowercased line so the match can succeed
        if "[ltm:procedure]" in ln.lower():
            proc = re.sub(r"^\[LTM:procedure\]\s*", "", ln, flags=re.I)
            proc = proc.split("(salience=")[0].strip()
            for part in [p.strip() for p in proc.split("|") if p.strip()]:
                steps.append(part)

    steps = steps[:8] if steps else ["Clarify the target outcome and constraints.", "Use semantic recall + episodic lessons to propose a plan.", "Execute, then store lessons learned."]

    pitfalls = []
    if avoid:
        for chunk in avoid[:2]:
            for s in [x.strip() for x in chunk.split(";") if x.strip()]:
                pitfalls.append(s)
    pitfalls = pitfalls[:6]

    prefs = [ln for ln in ltm_lines if "[ltm:preference]" in ln.lower()]
    facts = [ln for ln in ltm_lines if "[ltm:fact]" in ln.lower() or "[ltm:constraint]" in ln.lower()]

    out = []
    out.append("Answer (memory-informed, offline fallback)\n")
    if prefs:
        out.append("Relevant preferences/constraints remembered:")
        for ln in (prefs + facts)[:6]:
            out.append(" - " + ln.split("] ", 1)[1].split(" (salience=")[0].strip())
        out.append("")
    out.append("Recommended approach:")
    for i, s in enumerate(steps, 1):
        out.append(f" {i}. {s}")
    if pitfalls:
        out.append("\nPitfalls to avoid (from episodic traces):")
        for p in pitfalls:
            out.append(" - " + p)
    out.append("\n(If you add an API key, the same memory context will feed a stronger LLM for higher-quality responses.)")
    return "\n".join(out).strip()

class MemoryAugmentedAgent:
    def __init__(self, mem: MemoryEngine):
        self.mem = mem

    def answer(self, question: str) -> Dict[str, Any]:
        pack = self.mem.retrieve(question)
        context = self.mem.build_context(question, pack)

        system = (
            "You are a memory-augmented agent. Use the provided memory context.\n"
            "Prioritize:\n"
            "1) Episodic lessons (what worked before)\n"
            "2) Long-term facts/preferences/procedures\n"
            "3) Short-term conversation state\n"
            "Be concrete and stepwise. If memory conflicts, state the uncertainty."
        )

        if USE_OPENAI:
            reply = openai_chat(system=system, user=context + "\n\nUser question:\n" + question)
        else:
            reply = heuristic_responder(context=context, question=question)

        self.mem.st_add("user", question, kind="message")
        self.mem.st_add("assistant", reply, kind="message")

        return {"reply": reply, "pack": pack, "context": context}

mem = MemoryEngine()
agent = MemoryAugmentedAgent(mem)

mem.ltm_add(kind="preference", text="Prefer concise, structured answers with steps and bullet points when helpful.", tags=["style"], pinned=True)
mem.ltm_add(kind="preference", text="Prefer solutions that run on Google Colab without extra setup.", tags=["environment"], pinned=True)
mem.ltm_add(kind="procedure", text="When building agent memory: embed items, store with salience/novelty policy, retrieve with hybrid semantic+episodic, and decay overuse to avoid repetition.", tags=["agent-memory"])
mem.ltm_add(kind="constraint", text="If no API key is available, provide a runnable offline fallback instead of failing.", tags=["robustness"], pinned=True)

mem.episode_add(
    task="Build an agent memory layer for troubleshooting Python errors in Colab",
    constraints={"offline_ok": True, "single_notebook": True},
    plan=[
        "Capture short-term chat context",
        "Store durable constraints/preferences in long-term vector memory",
        "After solving, extract lessons into episodic traces",
        "On new tasks, retrieve top episodic lessons + semantic facts"
    ],
    actions=[
        {"type":"analysis", "detail":"Identified recurring failure: missing installs and version mismatches."},
        {"type":"action", "detail":"Added pip install block + minimal fallbacks."},
        {"type":"action", "detail":"Added memory policy: pin constraints, drop low-salience items."}
    ],
    result="Notebook became robust: runs with or without external keys; troubleshooting quality improved with episodic lessons.",
    outcome_score=0.90,
    lessons=[
        "Always include a pip install cell for non-standard deps.",
        "Pin hard constraints (e.g., offline fallback) into long-term memory.",
        "Store a post-task 'lesson list' as an episodic trace for reuse."
    ],
    failure_modes=[
        "Assuming an API key exists and crashing when absent.",
        "Storing too much noise into long-term memory causing irrelevant recall context."
    ],
    tags=["colab","robustness","memory"]
)

print("✅ Memory engine initialized.")
print(f"   LTM items: {len(mem.ltm)} | Episodes: {len(mem.episodes)} | ST items: {len(mem.short_term)}")

q1 = "I want to build memory for an agent in Colab. What should I store and how do I retrieve it?"
out1 = agent.answer(q1)
print("\n" + "="*90)
print("Q1 REPLY\n")
print(out1["reply"][:1800])

q2 = "How do I avoid my agent repeating the same memory over and over?"
out2 = agent.answer(q2)
print("\n" + "="*90)
print("Q2 REPLY\n")
print(out2["reply"][:1800])

def simple_outcome_eval(text: str) -> float:
    hits = 0
    for kw in ["decay", "usage", "penalty", "novelty", "prune", "retrieve", "episodic", "semantic"]:
        if kw in text.lower():
            hits += 1
    return float(np.clip(hits/8.0, 0.0, 1.0))

score2 = simple_outcome_eval(out2["reply"])
mem.episode_add(
    task="Prevent repetitive recall in a memory-augmented agent",
    constraints={"must_be_simple": True, "runs_in_colab": True},
    plan=[
        "Track usage counts per memory item",
        "Apply usage-based penalty during ranking",
        "Boost novelty during storage to reduce duplicates",
        "Optionally prune low-salience memories"
    ],
    actions=[
        {"type":"design", "detail":"Added usage-based penalty 1/(1+alpha*usage)."},
        {"type":"design", "detail":"Used novelty = 1 - max_similarity at store time."}
    ],
    result=out2["reply"][:600],
    outcome_score=score2,
    lessons=[
        "Penalize overused memories during ranking (usage decay).",
        "Enforce novelty threshold at storage time to prevent duplicates.",
        "Keep episodic lessons distilled to avoid bloated recall context."
    ],
    failure_modes=[
        "No usage tracking, causing one high-similarity memory to dominate forever.",
        "Storing raw chat logs as LTM instead of distilled summaries."
    ],
    tags=["ranking","decay","policy"]
)

cons = mem.consolidate()
print("\n" + "="*90)
print("CONSOLIDATION RESULT:", cons)

print("\n" + "="*90)
print("LTM (top rows):")
display(mem.ltm_df().head(12))

print("\n" + "="*90)
print("EPISODES (top rows):")
display(mem.episodes_df().head(12))

def debug_retrieval(query: str):
    pack = mem.retrieve(query)
    ctx = mem.build_context(query, pack)
    sem = []
    for mid, sc in pack["semantic_scored"]:
        it = mem.ltm[mid]
        sem.append({"mem_id": mid, "score": sc, "kind": it.kind, "salience": it.salience, "usage": it.usage, "text": it.text[:160]})
    ep = []
    for eid, sc in pack["episodic_scored"]:
        e = mem.episodes[eid]
        ep.append({"ep_id": eid, "score": sc, "outcome": e.outcome_score, "task": e.task[:140], "lessons": " | ".join(e.lessons[:4])})
    return ctx, pd.DataFrame(sem), pd.DataFrame(ep)

print("\n" + "="*90)
ctx, sem_df, ep_df = debug_retrieval("How do I design an agent memory policy for storage and retrieval?")
print(ctx[:1600])
print("\nTop semantic hits:")
display(sem_df)
print("\nTop episodic hits:")
display(ep_df)

print("\n✅ Done. You now have working short-term, long-term vector, and episodic memory with storage/retrieval policies in one Colab snippet.")

We wrap the memory engine inside a simple memory-augmented agent and run end-to-end queries. We demonstrate how episodic memory influences responses, how outcomes are evaluated, and how new episodes are written back into memory. This closes the loop and shows how the agent continuously learns from its own behavior.

In conclusion, we have a complete memory stack that lets our agent remember facts and preferences in long-term vector memory, retain distilled “lessons learned” as episodic traces, and keep only the most relevant recent context in short-term memory. We demonstrated how hybrid retrieval improves responses, how usage-based penalties reduce repetition, and how consolidation turns noisy interaction logs into compact, reusable knowledge. With this foundation, we can extend the system toward production-grade agent behavior by adding stricter budgets, richer extraction, better evaluators, and task-specific memory schemas while keeping the same core idea: we store less, store smarter, and retrieve what actually helps.
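As one concrete example of such an extension, the sketch below (our own addition, not part of the tutorial code) replaces the constant freshness = 1.0 used in MemoryPolicy.rank_retrieved with a recency-based decay; the half-life value is an assumption you would tune per application.

# Minimal sketch (hypothetical extension): recency-based freshness for ranking.
from datetime import datetime, timezone

def freshness_factor(ts_iso: str, half_life_days: float = 14.0) -> float:
    """Exponential decay: a memory written half_life_days ago scores 0.5."""
    ts = datetime.fromisoformat(ts_iso.replace("Z", "+00:00"))   # timestamps are stored as UTC ISO strings ending in "Z"
    age_days = (datetime.now(timezone.utc) - ts).total_seconds() / 86400.0
    return 0.5 ** (max(age_days, 0.0) / half_life_days)

# Inside MemoryPolicy.rank_retrieved you would then compute, per item:
#     freshness = freshness_factor(it.ts)
#     score = sim * (0.55 + 0.45*it.salience) * usage_penalty * freshness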


