How to Build a Fully Searchable AI Knowledge Base with OpenKB, OpenRouter, and Llama


In this tutorial, we explore how to build and query a local knowledge base with OpenKB using a free, open model served via OpenRouter. We securely retrieve the API key with getpass, set up the environment without hardcoding secrets, and initialize a structured, wiki-style knowledge base from scratch. As we move through the workflow, we add source documents, generate summaries and concept pages, inspect the resulting wiki structure, run queries, save explorations, and perform programmatic analysis of cross-links and page relationships. Along the way, we demonstrate how to turn raw Markdown documents into a navigable, synthesized knowledge system that supports both interactive querying and incremental updates.

import subprocess, sys

def run(cmd, capture=False, cwd=None):
    result = subprocess.run(
        cmd, shell=True, text=True,
        capture_output=capture, cwd=cwd
    )
    if capture:
        return result.stdout.strip(), result.stderr.strip()
    return result.returncode

print("πŸ“¦ Installing OpenKB…")
run("pip install openkb --quiet")
print("βœ… OpenKB installed.\n")

import getpass, os

print("━" * 60)
print("  πŸ”‘  Secure API Key Setup")
print("━" * 60)
print("  Provider : OpenRouter  (https://openrouter.ai)")
print("  Model    : meta-llama/llama-3.3-70b-instruct:free")
print("  Sign-up  : free, no credit card required")
print("━" * 60)

OPENROUTER_API_KEY = getpass.getpass("\nPaste your OpenRouter API key (hidden): ").strip()

if not OPENROUTER_API_KEY:
    raise ValueError("❌ No API key provided. Please re-run and enter a valid key.")

os.environ["OPENROUTER_API_KEY"] = OPENROUTER_API_KEY
os.environ["LLM_API_KEY"]        = OPENROUTER_API_KEY

LLM_MODEL = "openrouter/meta-llama/llama-3.3-70b-instruct:free"

print("βœ… API key set (not printed). Model:", LLM_MODEL, "\n")

import json, textwrap, time, re, shutil
from pathlib import Path
from collections import Counter

KB_DIR   = Path("/content/my_knowledge_base")
wiki_dir = KB_DIR / "wiki"
raw_dir  = KB_DIR / "raw"

def kb_cmd(command: str) -> str:
    stdout, stderr = run(f"openkb {command}", capture=True, cwd=str(KB_DIR))
    return stdout or stderr

def section(title: str):
    bar = "─" * (len(title) + 4)
    print(f"\nβ”Œ{bar}┐")
    print(f"β”‚  {title}  β”‚")
    print(f"β””{bar}β”˜")

def show_tree(root: Path, indent=0, max_depth=3):
    if indent > max_depth:
        return
    prefix = "  " * indent + ("└─ " if indent else "")
    print(prefix + root.name + ("/" if root.is_dir() else ""))
    if root.is_dir():
        for child in sorted(root.iterdir()):
            show_tree(child, indent + 1, max_depth)

def show_md(path: Path, max_lines=35):
    lines = path.read_text().splitlines()
    for line in lines[:max_lines]:
        print(line)
    if len(lines) > max_lines:
        print(f"  … ({len(lines) - max_lines} more lines)")

def print_wrapped(text: str, width=90):
    for line in text.splitlines():
        print(textwrap.fill(line, width=width, subsequent_indent="   ") if line else "")

We install OpenKB and prepare the Colab environment to run the full workflow smoothly. We securely collect the OpenRouter API key using getpass, store it in environment variables, and configure the free Llama 3.3 70B model without hardcoding any secrets. We also import all required libraries, define the core paths, and create helper functions we use throughout the tutorial to run commands, print sections, and inspect generated files.
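Before running the rest of the notebook, it can be worth confirming that the key actually works. The optional sketch below is not part of the OpenKB workflow itself; it assumes the requests library is available (it ships with Colab) and calls OpenRouter's OpenAI-compatible chat endpoint directly, so the model ID is used without the openrouter/ prefix that the LiteLLM-style LLM_MODEL string carries.

```python
# Optional sanity check: verify the OpenRouter key with a tiny request.
# Assumption: OpenRouter's OpenAI-compatible endpoint at /api/v1/chat/completions.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        # No "openrouter/" prefix when calling the API directly.
        "model": "meta-llama/llama-3.3-70b-instruct:free",
        "messages": [{"role": "user", "content": "Reply with the single word: ready"}],
        "max_tokens": 5,
    },
    timeout=60,
)
resp.raise_for_status()
print("βœ… Key accepted:", resp.json()["choices"][0]["message"]["content"])
```

If the call returns an HTTP 401, the key was pasted incorrectly; re-run the getpass cell and try again.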

DOCS = {
    "transformer_architecture.md": textwrap.dedent("""
        # Transformer Architecture

        ## Overview
        The Transformer is a deep learning architecture introduced in "Attention Is All
        You Need" (Vaswani et al., 2017). It replaced recurrent networks with a
        self-attention mechanism, enabling parallel training and better long-range
        dependency modelling.

        ## Key Components
        - **Multi-Head Self-Attention**: Computes attention in h parallel heads, each
          with its own learned Q/K/V projections, then concatenates and projects.
        - **Feed-Forward Network (FFN)**: Two linear layers with a ReLU activation,
          applied position-wise.
        - **Positional Encoding**: Sinusoidal or learned embeddings that inject
          sequence-order information, since attention is permutation-invariant.
        - **Layer Normalisation**: Applied before (Pre-LN) or after (Post-LN) each
          sub-layer, stabilising gradients.
        - **Residual Connections**: Added around each sub-layer to ease gradient flow.

        ## Encoder vs Decoder
        The encoder stack processes input tokens bidirectionally (e.g. BERT).
        The decoder stack uses causal (masked) attention over previous outputs plus
        cross-attention over encoder outputs (e.g. GPT, T5).

        ## Scaling Laws
        Kaplan et al. (2020) showed that model loss decreases predictably as a power
        law with compute, data, and parameter count. This motivated GPT-3 (175B) and
        subsequent large language models.

        ## Limitations
        - Quadratic complexity in sequence length: O(n^2)
        - No inherent recurrence -> long-context challenges
        - High memory footprint during training

        ## References
        Vaswani et al. (2017). Attention Is All You Need. NeurIPS.
        Kaplan et al. (2020). Scaling Laws for Neural Language Models. arXiv:2001.08361.
    """),

    "rag_systems.md": textwrap.dedent("""
        # Retrieval-Augmented Generation (RAG)

        ## Definition
        RAG augments a generative LLM with a retrieval step: given a query, relevant
        documents are fetched from a corpus and prepended to the prompt, giving the
        model grounded context beyond its training data.

        ## Architecture
        1. **Indexing Phase** β€” Documents are chunked, embedded via a bi-encoder
           (e.g. text-embedding-3-large), and stored in a vector database (e.g.
           Faiss, Pinecone, Weaviate).
        2. **Retrieval Phase** β€” The user query is embedded; approximate nearest-
           neighbour (ANN) search returns the top-k chunks.
        3. **Generation Phase** β€” Retrieved chunks + query are passed to the LLM
           which synthesises a final answer.

        ## Variants
        - **Dense Retrieval**: DPR, Contriever β€” queries and docs in the same space.
        - **Sparse Retrieval**: BM25 β€” term frequency-based, no embeddings needed.
        - **Hybrid Retrieval**: Reciprocal Rank Fusion (RRF) combines dense + sparse.
        - **Re-ranking**: A cross-encoder re-scores the top-k before the LLM sees them.

        ## Challenges
        - Context window limits: long retrieved passages may not fit.
        - Retrieval quality is a hard ceiling on generation quality.
        - Chunking strategy significantly affects recall.
        - Multi-hop questions require iterative retrieval (IRCoT, ReAct).

        ## Relationship to Transformers
        RAG systems rely on transformer-based encoders for embedding and decoder
        models for generation. The quality of the embedding model directly determines
        retrieval precision and recall.

        ## References
        Lewis et al. (2020). RAG for Knowledge-Intensive NLP Tasks. NeurIPS.
        Gao et al. (2023). RAG for Large Language Models. arXiv:2312.10997.
    """),

    "knowledge_graph_integration.md": textwrap.dedent("""
        # Knowledge Graphs and LLM Integration

        ## What is a Knowledge Graph?
        A knowledge graph (KG) is a directed labelled graph of entities (nodes) and
        relations (edges): (subject, predicate, object) triples, e.g.
        (Vaswani, authored, "Attention Is All You Need").

        ## Why Combine KGs with LLMs?
        LLMs hallucinate facts; KGs provide structured, verifiable ground truth.
        KGs are hard to query in natural language; LLMs provide the interface.
        Together they enable faithful, grounded, explainable question answering.

        ## Integration Strategies
        ### KG-Augmented Generation (KGAG)
        Retrieve triples or sub-graphs instead of text chunks, serialise into text,
        then feed to the LLM prompt.

        ### LLM-Assisted KG Construction
        LLMs extract (subject, relation, object) triples from unstructured text,
        reducing manual curation effort significantly.

        ### GraphRAG (Microsoft Research, 2024)
        GraphRAG clusters document communities, generates community summaries, and
        stores them in a KG. Queries answered by map-reduce over community summaries
        outperform flat-vector RAG on sensemaking tasks.

        ## Challenges
        - KG construction quality depends on extraction LLM accuracy.
        - Graph databases add infrastructure complexity.
        - Ontology design requires domain expertise.
        - KGs go stale without continuous update pipelines.

        ## Relation to RAG and Transformers
        KG integration addresses two key RAG limitations: lack of structured reasoning
        and inability to follow multi-hop relations.

        ## References
        Pan et al. (2023). Unifying LLMs and KGs. IEEE Intelligent Systems.
    """),
}

We define the sample source documents that we want to load into the knowledge base. We prepare rich Markdown content on transformer architecture, retrieval-augmented generation, and knowledge graph integration so that OpenKB has meaningful material to summarise and connect. We essentially build the raw knowledge corpus here, which serves as the foundation for all subsequent indexing, synthesis, and querying steps.
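To run the same pipeline over your own notes instead of these inline samples, one simple variation (not shown in the original walkthrough) is to fill DOCS from a folder of Markdown files. The my_notes path below is purely illustrative; point it at any directory of .md files before Step 1 runs.

```python
# Hypothetical variation: build DOCS from your own Markdown files on disk.
# "my_notes" is an example path -- replace it with your own folder.
from pathlib import Path

notes_dir = Path("my_notes")
if notes_dir.exists():
    DOCS = {p.name: p.read_text() for p in sorted(notes_dir.glob("*.md"))}
    print(f"Loaded {len(DOCS)} documents: {list(DOCS)}")
```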

section("Step 1 β€” Initialise Knowledge Base")   if KB_DIR.exists():    shutil.rmtree(KB_DIR) KB_DIR.mkdir(parents=True)   config_dir = KB_DIR / ".openkb" config_dir.mkdir() (config_dir / "config.yaml").write_text(    f"model: {LLM_MODEL}nlanguage: ennpageindex_threshold: 20n" ) (KB_DIR / ".env").write_text(    f"OPENROUTER_API_KEY={OPENROUTER_API_KEY}n"    f"LLM_API_KEY={OPENROUTER_API_KEY}n" )   for sub in ["sources", "summaries", "concepts", "explorations", "reports"]:    (wiki_dir / sub).mkdir(parents=True)   (wiki_dir / "AGENTS.md").write_text(textwrap.dedent("""    # Wiki Schema      ## Conventions    - All pages use Markdown with [[wikilinks]] for cross-references.    - `summaries/` -- one page per source document.    - `concepts/`  -- cross-document synthesis pages.    - `index.md`   -- knowledge base overview.    - `log.md`     -- operations timeline.      ## Concept page template    #     ## Overview    ## Key Points    ## Related Concepts    ## Sources """)) (wiki_dir / "index.md").write_text("# Knowledge Base IndexnnNo documents indexed yet.n") (wiki_dir / "log.md").write_text("# Operations Lognn")   raw_dir.mkdir() for fname, content in DOCS.items():    (raw_dir / fname).write_text(content)   print(f"βœ… Knowledge base initialised at: {KB_DIR}") print(f"   Model  : {LLM_MODEL}") print(f"   Docs   : {list(DOCS.keys())}")   section("Step 2 β€” Compile Documents into the Wiki")   print("Each document is read by the LLM, which writes summaries + concept pages.n")   for fname in DOCS:    doc_path = raw_dir / fname    print(f"  βž• Adding: {fname}")    out = kb_cmd(f"add {doc_path}")    print(textwrap.indent(out[:600], "     "))    print()    time.sleep(1)   print("βœ… All documents compiled.")   section("Step 3 β€” Explore the Generated Wiki")   print("nπŸ“‚ Directory tree (wiki/):n") show_tree(wiki_dir, max_depth=3)   print("nnπŸ“„ wiki/index.md:") print("─" * 50) show_md(wiki_dir / "index.md")   print("nnπŸ“„ wiki/log.md:") print("─" * 50) show_md(wiki_dir / "log.md")   concepts = list((wiki_dir / "concepts").glob("*.md")) print(f"nnπŸ’‘ Generated concept pages ({len(concepts)}):") for cp in sorted(concepts):    print(f"  β€’ {cp.name}")   if concepts:    print(f"nnπŸ“„ Sample concept β€” {concepts[0].name}:")    print("─" * 50)    show_md(concepts[0])

We initialize the OpenKB knowledge base, create the required directory structure, and write the configuration and environment files needed by the tool. We then save the sample documents into the raw folder and compile them into the wiki so that OpenKB can generate summaries, concepts, and cross-linked knowledge pages. After that, we inspect the generated wiki structure, preview important files such as the index and log, and review the concept pages generated from our input documents.
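If you would like to experiment with a different free model before querying the knowledge base, a small optional tweak (not part of the original steps) is to overwrite the config.yaml that Step 1 wrote and update LLM_MODEL. The model ID below is one of the free alternatives listed at the end of the tutorial.

```python
# Optional tweak: point the knowledge base at another free OpenRouter model.
# Rewrites the same config.yaml format that Step 1 created.
LLM_MODEL = "openrouter/mistralai/mistral-7b-instruct:free"
(KB_DIR / ".openkb" / "config.yaml").write_text(
    f"model: {LLM_MODEL}\nlanguage: en\npageindex_threshold: 20\n"
)
print("Model switched to:", LLM_MODEL)
```

Documents added after this change should be processed with the new model, while the pages generated earlier remain on disk as they are.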

section("Step 4 β€” List Indexed Content & Status")   print("── openkb list ──") print(kb_cmd("list"))   print("n── openkb status ──") print(kb_cmd("status"))   section("Step 5 β€” Query the Knowledge Base")   QUERIES = [    "What is the Transformer architecture and what problem did it solve?",    "How does RAG differ from a traditional knowledge base like OpenKB?",    "What are the connections between knowledge graphs, RAG, and transformers?",    "What are the shared limitations across all three AI topics covered?", ]   for i, query in enumerate(QUERIES, 1):    print(f"n❓ Query {i}: {query}")    print("─" * 60)    print_wrapped(kb_cmd(f'query "{query}"'))   section("Step 6 β€” Save a Deep Synthesis Query")   deep_query = (    "Synthesise the key architectural themes across transformers, RAG, and "    "knowledge graphs into a unified mental model." ) print(f"❓ Query: {deep_query}n") out = kb_cmd(f'query "{deep_query}" --save') print_wrapped(out[:800])   explorations = list((wiki_dir / "explorations").glob("*.md")) if explorations:    print(f"nπŸ“„ Saved β†’ {explorations[-1].name}")    print("─" * 50)    show_md(explorations[-1])   section("Step 7 β€” Lint: Wiki Health Checks")   print(kb_cmd("lint"))   reports = list((wiki_dir / "reports").glob("*.md")) if reports:    print(f"nπŸ“„ Report β€” {reports[-1].name}:")    print("─" * 50)    show_md(reports[-1])

We examine the indexed content using the built-in list and status commands to understand what OpenKB has created so far. We then query the knowledge base with several increasingly complex questions and observe how the system synthesizes answers from the stored information. Finally, we save a deeper exploration query into the wiki and run lint checks to evaluate the health, consistency, and completeness of the generated knowledge base.
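If you want to ask your own questions beyond the fixed QUERIES list, the sketch below (which assumes the kb_cmd helper and KB_DIR defined earlier are still in scope) batches a few extra queries and collects the answers into a single Markdown file. The questions and the output filename are only examples.

```python
# Ask your own follow-up questions and collect the answers in one file.
# Assumes kb_cmd() and KB_DIR from the earlier cells are already defined;
# the questions and output filename below are illustrative.
my_questions = [
    "Which retrieval variant is cheapest to run, and why?",
    "How does GraphRAG differ from flat-vector RAG?",
]

answers_path = KB_DIR / "my_answers.md"
with answers_path.open("w") as f:
    for q in my_questions:
        answer = kb_cmd(f'query "{q}"')
        f.write(f"## {q}\n\n{answer}\n\n")

print(f"Saved {len(my_questions)} answers to {answers_path}")
```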

section("Step 8 β€” Programmatic Wiki Analysis (Python)")   wiki_pages = {} for md_file in wiki_dir.rglob("*.md"):    rel     = str(md_file.relative_to(wiki_dir))    content = md_file.read_text()    links   = re.findall(r'[[([^]]+)]]', content)    wiki_pages[rel] = {"lines": len(content.splitlines()), "wikilinks": links}   print(f"Total wiki pages : {len(wiki_pages)}n") print(f"{'Page':<45} {'Lines':>6}  {'Links':>5}") print("─" * 60) for page, m in sorted(wiki_pages.items()):    print(f"  {page:<43} {m['lines']:>6}  {len(m['wikilinks']):>5}")   link_targets = Counter(    link for m in wiki_pages.values() for link in m["wikilinks"] ) if link_targets:    print("nπŸ† Most-referenced wiki pages (hub concepts):")    for page, count in link_targets.most_common(8):        print(f"  {count:>3}x  [[{page}]]")   print("nπŸ”— Cross-reference graph:") for page, m in sorted(wiki_pages.items()):    if m["wikilinks"]:        shown = ", ".join(f"[[{l}]]" for l in m["wikilinks"][:4])        more  = f"  +{len(m['wikilinks'])-4} more" if len(m["wikilinks"]) > 4 else ""        print(f"  {page}")        print(f"    -> {shown}{more}")   section("Step 9 β€” Incremental Update: Add a 4th Document")   new_doc = raw_dir / "sparse_attention.md" new_doc.write_text(textwrap.dedent("""    # Sparse Attention Mechanisms      ## Motivation    Standard transformer attention is O(n^2) in sequence length, limiting context    windows. Sparse attention patterns reduce this to O(n log n) or O(n*sqrt(n)).      ## Key Approaches    - **Longformer** (Beltagy et al., 2020): local sliding-window + global tokens.    - **BigBird** (Zaheer et al., 2020): random + window + global; Turing-complete.    - **Flash Attention** (Dao et al., 2022): exact attention, hardware-aware CUDA      tiling. Not sparse but dramatically faster in practice.      ## Impact on RAG    Larger context windows reduce the need for chunking and retrieval. However,    retrieval still helps for corpora larger than any single context window.      ## References    Beltagy et al. (2020). Longformer. arXiv:2004.05150.    Zaheer et al. (2020). Big Bird. NeurIPS.    Dao et al. (2022). FlashAttention. NeurIPS. """))   concepts_before = len(list((wiki_dir / "concepts").glob("*.md"))) print(f"Adding: {new_doc.name}") print_wrapped(kb_cmd(f"add {new_doc}")[:500])   concepts_after = list((wiki_dir / "concepts").glob("*.md")) print(f"nπŸ’‘ Concept pages: {concepts_before} -> {len(concepts_after)}") for c in sorted(concepts_after, key=lambda p: p.stat().st_mtime, reverse=True)[:3]:    print(f"  β€’ {c.name}")   section("Tutorial Complete πŸŽ‰")   print(textwrap.dedent(f"""  What we covered  ───────────────  1.  Installed OpenKB  2.  Entered API key securely via getpass (never printed/stored in code)  3.  Used FREE open model: meta-llama/llama-3.3-70b-instruct via OpenRouter  4.  Initialised KB at {KB_DIR}  5.  Created 3 AI research docs and compiled them into a wiki  6.  Explored auto-generated summaries, concept pages, and index  7.  Listed content (openkb list) and checked stats (openkb status)  8.  Ran 4 queries of increasing complexity  9.  Saved a deep synthesis query to wiki/explorations/  10. Linted the wiki for health issues (contradictions, orphans, gaps)  11. Analysed the wiki graph programmatically (hub pages, cross-refs)  12. 
Added a 4th document -- demonstrated incremental live updates    Other free OpenRouter models to try (change LLM_MODEL):  ────────────────────────────────────────────────────────    openrouter/mistralai/mistral-7b-instruct:free    openrouter/google/gemma-3-27b-it:free    openrouter/qwen/qwen3-14b:free    openrouter/microsoft/phi-4-reasoning:free    Docs: https://github.com/VectifyAI/OpenKB """))

We analyze the generated wiki programmatically by reading Markdown pages, counting lines, extracting wikilinks, and identifying the most referenced concepts. We also visualize the internal cross-reference structure so that we can better understand how the knowledge base is connected and which pages act as hubs. In the final part, we add a new document incrementally, observe how the wiki updates, and conclude the tutorial with a complete summary of everything we built and tested.
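As an optional extension that is not in the original notebook, you can also export the same wikilink graph to Graphviz DOT format and render it as a picture. The sketch below assumes wiki_dir from the earlier cells, re-scans the pages with the regex from Step 8, and writes the file into the existing reports/ folder.

```python
# Optional: export the wikilink graph to Graphviz DOT for visual inspection.
# Assumes wiki_dir from the earlier cells; re-scans the pages so it runs standalone.
import re

edges = []
for md_file in wiki_dir.rglob("*.md"):
    src = md_file.stem
    for target in re.findall(r'\[\[([^\]]+)\]\]', md_file.read_text()):
        edges.append((src, target))

dot_path = wiki_dir / "reports" / "wikilink_graph.dot"
lines = ["digraph wiki {", "  rankdir=LR;"]
lines += [f'  "{s}" -> "{t}";' for s, t in edges]
lines.append("}")
dot_path.write_text("\n".join(lines) + "\n")
print(f"Wrote {len(edges)} edges to {dot_path}")
```

If Graphviz is available, something like `dot -Tpng wikilink_graph.dot -o graph.png` turns the export into an image.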

In conclusion, we created a complete OpenKB workflow using a Llama model served through OpenRouter, while keeping API access secure and easy to manage. We initialized the knowledge base, ingested multiple research documents, generated linked wiki artifacts, queried the compiled knowledge, and validated the structure through linting and Python-based inspection. We also extended the knowledge base incrementally by adding new material without rebuilding everything from scratch. The result is a practical, reproducible foundation for using OpenKB as a lightweight, LLM-powered system for knowledge organization, synthesis, and exploration.

