<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[The Glass Box]]></title><description><![CDATA[A transparent look at building production AI. No hype, no black boxes. Just code, architecture, and the journey of an AI Architect.]]></description><link>https://www.nudurupati.co</link><image><url>https://substackcdn.com/image/fetch/$s_!ysUW!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff412fd5b-35d5-42f6-803d-d66d82d2fa8b_1280x1280.png</url><title>The Glass Box</title><link>https://www.nudurupati.co</link></image><generator>Substack</generator><lastBuildDate>Sat, 09 May 2026 11:32:51 GMT</lastBuildDate><atom:link href="https://www.nudurupati.co/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Sreeram Nudurupati]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[nudurupati@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[nudurupati@substack.com]]></itunes:email><itunes:name><![CDATA[Sreeram Nudurupati]]></itunes:name></itunes:owner><itunes:author><![CDATA[Sreeram Nudurupati]]></itunes:author><googleplay:owner><![CDATA[nudurupati@substack.com]]></googleplay:owner><googleplay:email><![CDATA[nudurupati@substack.com]]></googleplay:email><googleplay:author><![CDATA[Sreeram Nudurupati]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Context Graphs]]></title><description><![CDATA[My journey in context graphs started to solve the AI hallucination problem by grounding it in context. 
Here I am now, experiencing an agent-centric paradigm shift first-hand.]]></description><link>https://www.nudurupati.co/p/context-graphs</link><guid isPermaLink="false">https://www.nudurupati.co/p/context-graphs</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Mon, 04 May 2026 19:33:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ysUW!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff412fd5b-35d5-42f6-803d-d66d82d2fa8b_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h4>Have you ever seen a database centered on AI agents?</h4><p>A few weeks ago, I wrote about the &#8220;dumb storage, smart index&#8221; pattern, which I traced across seemingly independent technologies like Delta Lake for data lakes, MCAP for Physical AI, and turbopuffer for object-storage-based vector search.</p><p>Searching for this pattern in graph databases, I encountered the old guard like Neo4j, which just slapped a vector database and a text-search engine onto a graph database: a disjointed system lacking a unified query planner, where results are orchestrated at the application layer. I&#8217;ve also encountered neo-modern graph databases like ArangoDB, which is natively multi-model and has a unified query planner and DSL, but is very human-centered and treats AI like an afterthought, leaving AI agents to learn a query language designed for humans.</p><p>I came to the conclusion that traditional databases have been designed around humans in the loop. People design the schema, people write the queries, people manage state, and people decide what to persist. The human-in-the-loop assumption is baked into every layer of the database, and that is the real limitation we are imposing on AI agents.
Thus, I gave up my search and set out to build my own reference architecture for an &#8220;autonomous-context-fabric&#8221; with Memgraph as my graph substrate, until I was introduced to Omnigraph recently.</p><h4>Enter Omnigraph</h4><p>Omnigraph is a versioned property graph database built on Lance, designed to be read from and written to by AI agents rather than humans. It treats typed graph data like code, with branch, commit, and merge semantics. So I set out to replace Memgraph in my reference architecture with Omnigraph, and the first thing I noticed was the complete absence of a Python SDK or fully documented APIs. What I found instead in its GitHub repo was a project-structure recommendation with a CLAUDE.md file and two agentic skills to get started with. This is an architectural shift away from people-centric SDKs, toward AI agents synthesizing their own SDK based on the existing schema.</p><p>This is a fundamental paradigm shift: Omnigraph is designed from the ground up with an &#8220;agent-centric&#8221; approach. It assumes an AI in the loop and is designed for AI agents to read the schema, understand the graph topology, and synthesize their own query language dynamically.</p><p>Swapping Omnigraph in for Memgraph was fairly easy once I enabled the published skills in my AI agent (Gemini CLI) and let it design the schema and the queries by itself. Once the swap was complete and I ran the latency numbers, they couldn&#8217;t match Memgraph&#8217;s purely in-memory engine, but they were completely acceptable given the trade-offs of S3-native persistence and an agent-centered paradigm. I am told by a source very close to Omnigraph that tiered storage is in the works to improve the latency numbers.</p><p>However, given my years of experience in the enterprise data space, what I really wanted to understand was whether this architecture can scale and find enterprise adoption.
So I looked deeper into Omnigraph&#8217;s design philosophy, especially two of the core principles on which it was built.</p><h4>The Central Planning Problem [1]</h4><p>SaaS sprawl means tools that don&#8217;t talk to each other; each one fragments enterprise knowledge and locks in its own version of the schema. Enterprises have been grappling with this problem for decades, and the one solution that really tried to solve it was Master Data Management (MDM). MDM tried to create a single version of enterprise truth, a &#8220;golden record&#8221; for core business data, but it grew bloated and failed miserably.</p><p>MDM failed for a few reasons:</p><ul><li><p>MDM required humans to agree on a schema before anything could be written. Omnigraph&#8217;s schema evolution doesn&#8217;t require committee approval because there&#8217;s no human-written client code that breaks when the schema changes. With Omnigraph, agents synthesize their own queries against whatever schema exists.</p></li><li><p>MDM had no way to handle conflicting writes gracefully except to block everything until the conflict was resolved. This is where Omnigraph&#8217;s Git-style branching comes to the rescue, by sending any conflicting writes to a new branch and merging them when a confidence threshold is met.</p></li><li><p>MDM hubs were on-prem servers with their own operational burden. Omnigraph is S3-native and headless. The &#8220;dumb storage, smart index&#8221; pattern means the complexity lives in the query layer and not in an always-on process.</p></li><li><p>The MDM schema was determined by people who decided what gets persisted via ETL pipelines. Omnigraph can overcome this trap if its agents are allowed to evolve the schema and are the ones hydrating and maintaining the context graph.</p></li></ul><h4>Of Stigmergy and Ontologies</h4><p>Stigmergy is the indirect coordination of AI agents through environment modification.
Agents writing to the context graph modify the ontology that other agents read from.</p><p>Oh, the dreaded ontologies. They were popular circa 2015, during the last knowledge graph wave, and went into oblivion yet again, not because the graph engines of the era weren&#8217;t good, nor because the DSL wasn&#8217;t expressive enough, nor because of the formats. What killed them was the ontology design process being a strict prerequisite to deriving any value, and the ontology engineers it demanded being a scarce and expensive resource.</p><p><em>&#8220;Ontologies define the fitness landscape over which AI agents optimize. When the ontology is clear, agents make clear decisions, when unclear they provide garbage.&#8221;[1]</em></p><p>But who makes the ontology clear? Based on what I know so far, I can formulate a thesis.</p><p>Since Omnigraph is agent-centric, AI agents infer and evolve the schema from the data itself. Ontology drift is handled by schema branching, where an agent can propose a change to the ontology on a new branch, validate it against real queries, and merge only when a certain confidence threshold is met.</p><p>I put the first part of this thesis into practice in my &#8220;<a href="https://github.com/snudurupati/autonomous-knowledge-fabric">autonomous-context-fabric</a>&#8221; reference architecture, where my AI agent (Gemini CLI), given its existing knowledge of my problem domain, was successfully able to synthesize the relevant schema. In future iterations, I fully intend to test ontology drift using my planned agent swarm.</p><h4>Conclusion</h4><p>We have seen compute become ephemeral over the last decade owing to durable cloud storage. The team at Omnigraph envisions a future where enterprise software becomes thinner and the context becomes thicker [1].</p><p>My journey into context graphs started with trying to solve the AI hallucination problem by grounding it in context.
But here I am now, experiencing an agent-centric paradigm shift first-hand and completely onboard with Omnigraph&#8217;s vision: <strong>&#8220;The beginning of Infinity&#8221;</strong> [1], where the durable system of record of an AI-native organization becomes a governed context graph, and enterprise applications become increasingly transient.</p><h4>Next Steps</h4><p>While the team at Omnigraph is busy executing their vision, my mission is to build the application layer, the &#8220;<a href="https://github.com/snudurupati/autonomous-knowledge-fabric">autonomous-context-fabric</a>&#8221;, and watch it grow ever so thin as Omnigraph matures.</p><h5>References: <a href="https://modernrelay.com/manifesto">The Knowledge-Coordination Manifesto [1]</a></h5><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nudurupati.co/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Glass Box! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The Ghost Node Pattern]]></title><description><![CDATA[Why I Built a "Waiting Room" for Enterprise AI Data]]></description><link>https://www.nudurupati.co/p/the-ghost-node-pattern</link><guid isPermaLink="false">https://www.nudurupati.co/p/the-ghost-node-pattern</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Tue, 21 Apr 2026 14:14:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Slgd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30f93dc5-1887-40ca-ba4a-7e2cb254f919_3331x5539.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In my <a href="https://www.nudurupati.co/p/i-measured-my-ai-pipeline-the-number">last post</a>, I broke down the latency metrics of my streaming graph pipeline. Hitting a sub-15ms P95 ingestion time felt like a massive win for the architecture, until I actually looked at the topology of what I was ingesting. I had successfully built a high-speed pipe, but I quickly realized I was using it to build a data swamp.</p><p>When building standard data pipelines over the last 18 years, the primary goal has almost always been throughput: get the data from Source A to Destination B as fast as possible. But when building an AI Context Engine, a hyper-eager pipeline is actively dangerous.</p><p>If you connect a firehose of enterprise events directly to Memgraph, the pipeline&#8217;s eagerness becomes its biggest liability. 
Imagine a sales rep fat-fingering a Salesforce update for &#8220;Microsft&#8221; instead of &#8220;Microsoft.&#8221; A naive streaming pipeline instantly writes a brand new, isolated node into the graph for that typo. Your AI agent now has fractured context and is reasoning over hallucinated entities.</p><p>It&#8217;s the <strong>Bootstrap Problem</strong>: how do you start building a graph from a firehose of noisy events without instantly turning it into a hallucination factory?</p><h3>The Architecture: Building the GhostNodeManager</h3><p>To solve this, I had to fundamentally change how data moves in Sprint 14.</p><p>Before this sprint, the ingestion scripts (<code>sec_ingestion.py</code> and <code>synthetic_crm.py</code>) were taking events and throwing them straight over the wall into the resolution engine, and then directly into Memgraph. I needed a bouncer at the door.</p><p>I built the <code>GhostNodeManager</code> into the routing layer (<code>pipelines/routing.py</code>). Instead of acting as a pure passthrough, it uses Pathway&#8217;s stateful stream processing capabilities to hold events in a temporary memory buffer. 
These buffered events are called <strong>&#8220;Ghost Nodes&#8221;</strong> because they exist in the stream&#8217;s state, but they are strictly forbidden from materializing in the graph database yet.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Slgd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30f93dc5-1887-40ca-ba4a-7e2cb254f919_3331x5539.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Slgd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30f93dc5-1887-40ca-ba4a-7e2cb254f919_3331x5539.png 424w, https://substackcdn.com/image/fetch/$s_!Slgd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30f93dc5-1887-40ca-ba4a-7e2cb254f919_3331x5539.png 848w, https://substackcdn.com/image/fetch/$s_!Slgd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30f93dc5-1887-40ca-ba4a-7e2cb254f919_3331x5539.png 1272w, https://substackcdn.com/image/fetch/$s_!Slgd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30f93dc5-1887-40ca-ba4a-7e2cb254f919_3331x5539.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Slgd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30f93dc5-1887-40ca-ba4a-7e2cb254f919_3331x5539.png" width="1456" height="2421" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/30f93dc5-1887-40ca-ba4a-7e2cb254f919_3331x5539.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2421,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:779153,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/194918420?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30f93dc5-1887-40ca-ba4a-7e2cb254f919_3331x5539.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Slgd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30f93dc5-1887-40ca-ba4a-7e2cb254f919_3331x5539.png 424w, https://substackcdn.com/image/fetch/$s_!Slgd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30f93dc5-1887-40ca-ba4a-7e2cb254f919_3331x5539.png 848w, https://substackcdn.com/image/fetch/$s_!Slgd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30f93dc5-1887-40ca-ba4a-7e2cb254f919_3331x5539.png 1272w, https://substackcdn.com/image/fetch/$s_!Slgd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F30f93dc5-1887-40ca-ba4a-7e2cb254f919_3331x5539.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>The Code Logic: Eagerness vs. Skepticism</h3><p>When a new <code>AccountEvent</code> arrives, the <code>GhostNodeManager</code> runs it through an evidence checklist before allowing it to touch the graph. Here is exactly what the logic is doing under the hood:</p><ul><li><p><strong>The &#8220;Strong Signal&#8221; Fast-Track:</strong> The code checks the event for definitive identifiers. Does this event have a verified <code>cik_number</code>, an exact <code>company_domain</code>, or a hard <code>account_id</code>? <em>If Yes:</em> The event skips the waiting room. 
It is instantly promoted, sent through the resolvers, and written to Memgraph.</p></li><li><p><strong>The &#8220;Corroboration&#8221; Waiting Room:</strong> If the event only has a fuzzy text string (like <code>company_name = "Globel Corp"</code>), the code parks it in the stateful buffer and starts a time window. It waits to see if <em>another</em> distinct event arrives with that same fuzzy name. <em>If 2+ events accumulate:</em> The code decides this isn&#8217;t a one-off typo; it&#8217;s a real (if poorly named) entity. The threshold is met, and the batch of events is flushed out of the buffer and promoted to Memgraph. <em>If no other events arrive:</em> The ghost node eventually expires and is dropped entirely, saving the graph from permanent pollution.</p></li></ul><h3>Skepticism at the Ingestion Layer</h3><p>The biggest lesson from this architectural pivot is shifting the mindset from standard data engineering to context engineering.</p><p>In traditional pipelines, the goal is pure throughput. But when we are feeding an LLM, real-time context is completely useless if the underlying graph is flooded with noise. 
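</p><p><em>The promotion rules in the checklist above can be sketched in a few lines. This is my hypothetical simplification (the names <code>GhostBuffer</code> and <code>ingest</code> and the thresholds are mine, not the actual <code>GhostNodeManager</code> code, which runs inside Pathway&#8217;s stateful operators):</em></p>

```python
import time

# Definitive identifiers that let an event skip the waiting room entirely.
STRONG_KEYS = {"cik_number", "company_domain", "account_id"}

class GhostBuffer:
    """Holds fuzzy events as 'ghost nodes' until corroborated or expired."""

    def __init__(self, corroboration_threshold=2, ttl_seconds=3600):
        self.threshold = corroboration_threshold
        self.ttl = ttl_seconds
        self.pending = {}  # normalized fuzzy name -> list of (timestamp, event)

    def ingest(self, event, now=None):
        """Return the list of events promoted to the graph (possibly empty)."""
        now = time.time() if now is None else now
        # Strong-signal fast-track: a hard ID means instant promotion.
        if any(event.get(k) for k in STRONG_KEYS):
            return [event]
        # Otherwise park the event in the waiting room, keyed by fuzzy name.
        key = event.get("company_name", "").strip().lower()
        bucket = self.pending.setdefault(key, [])
        bucket.append((now, event))
        # Ghosts older than the TTL expire silently: a lone typo never lands.
        bucket[:] = [(t, e) for t, e in bucket if now - t < self.ttl]
        # Corroboration: 2+ events sharing the same fuzzy name look real.
        if len(bucket) >= self.threshold:
            promoted = [e for _, e in bucket]
            del self.pending[key]
            return promoted
        return []
```

<p>A lone &#8220;Microsft&#8221; event sits in the buffer and eventually expires; two independent &#8220;Globel Corp&#8221; events inside the window get flushed to the graph together.</p><p>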
The only way an LLM agent can reason accurately is if the graph it queries has been aggressively defended against bad data at the ingestion layer.</p><p>We need to teach our context engineering pipelines to be skeptical rather than mere dumb pipes.</p><p><em>See you next week.</em></p><div><hr></div><p><strong>The repository:</strong> <a href="https://github.com/snudurupati/autonomous-knowledge-fabric">github.com/snudurupati/autonomous-knowledge-fabric</a></p><div><hr></div><p><strong>Sreeram Nudurupati</strong> | <a href="https://www.linkedin.com/in/snudurupati">LinkedIn</a></p><p><em>AI Architect | Building the Autonomous Knowledge Fabric in public: 90 days, no hand-waving.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nudurupati.co/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Glass Box! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Why Object Storage is Eating the Database World]]></title><description><![CDATA[When you scrub a YouTube video to 2:16, your browser doesn&#8217;t download the entire video file from the beginning. Instead, it sends an HTTP request with a Range header telling the server, &#8220;give me bytes 10, 230,000 to 10, 432, 000. 
The server only returns that slice.]]></description><link>https://www.nudurupati.co/p/why-object-storage-is-eating-the</link><guid isPermaLink="false">https://www.nudurupati.co/p/why-object-storage-is-eating-the</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Mon, 13 Apr 2026 06:15:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ysUW!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff412fd5b-35d5-42f6-803d-d66d82d2fa8b_1280x1280.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>The Pattern</h2><p>Dumb storage coupled with a smart index and a ranged GET request is what made the decoupling of compute and storage possible in the modern cloud era. The pattern rests on the <em><strong>HTTP Range header</strong></em>: requests formatted as Range: bytes=X-Y, a standard HTTP feature since 1999.</p><p>When you scrub a YouTube video to 2:16, your browser doesn&#8217;t download the entire video file from the beginning. Instead, it sends an HTTP request with a Range header telling the server, <em>&#8220;give me bytes 10,230,000 to 10,432,000&#8221;</em>. The server only returns that slice. Any modern object store, whether it be AWS S3, Google Cloud Storage, or Azure ADLS, supports the same HTTP ranged request natively. So instead of downloading 10 GB worth of Parquet files to read 6 row groups out of them, you can ask S3 for exactly the byte ranges of those row groups, provided you have a smart index that can tell you which byte ranges those 6 row groups are in.
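</p><p><em>To make the mechanics concrete, here is a toy sketch. The index layout is invented for illustration; the point is that the &#8220;smart index&#8221; is just a table of byte offsets, and the request is just a header string:</em></p>

```python
# A toy "smart index": row-group id -> (byte offset, length) inside one file.
# These numbers are invented; a real index comes from the file's footer metadata.
row_group_index = {
    0: (4, 512_000),
    1: (512_004, 498_000),
    5: (2_981_764, 505_112),
}

def range_header(offset: int, length: int) -> str:
    """Format an HTTP Range header value for a byte slice (inclusive end)."""
    return f"bytes={offset}-{offset + length - 1}"

def ranges_for(groups):
    """The exact byte ranges to request instead of downloading the whole file."""
    return [range_header(*row_group_index[g]) for g in groups]

print(ranges_for([0, 5]))  # pass each value as the Range header of a GET
```

<p>An object store then serves exactly those slices; with boto3, for example, this is the <code>Range</code> parameter of <code>get_object</code>.</p><p>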
S3 doesn&#8217;t care what&#8217;s in those bytes; it just serves them up, as long as the intelligence about which bytes to ask for lives in your index.</p><p>That&#8217;s the exact <strong>&#8220;dumb storage, smart index&#8221;</strong> pattern I explore in this post, across three distinct domains: Data Lakes, Physical AI and Robotics, and now Vector Search at scale.</p><p>You might ask: if the technology has existed since 1999, why has the primitive only now caught on? The answer lies in a few notable advancements over the last decade that have made the idea practical for data architectures.</p><h3>The Latency Layer: NVMe in Cloud (2017)</h3><p>NVMe killed the SATA bottleneck. For high-performance systems like Databricks, Snowflake, or turbopuffer, this is the essential &#8220;hot/cold&#8221; divide: <strong>hot data stays on local NVMe</strong>, while cold data sits on S3. This local SSD caching is exactly what makes sub-10ms latency possible; without it, S3 round-trips would destroy performance.</p><h3>S3 Strong Consistency (2020)</h3><p>Before 2020, S3 was &#8220;eventually consistent,&#8221; making it a nightmare for databases to perform immediate reads after writes. The shift to <strong>strong consistency</strong> was the &#8220;floodgate&#8221; moment. It transformed object storage from a simple dumping ground into a viable transactional substrate, ending the era of complex workarounds.</p><h3>The Concurrency Fix: S3 Compare-and-Swap (2024)</h3><p>Compare-and-swap (CAS) was the final piece of the puzzle. Until late 2024, S3 lacked this fundamental concurrency primitive, without which two concurrent writers could corrupt shared state, forcing architects to use external coordinators like DynamoDB to build a metadata layer where multiple concurrent writers could safely update the same state.
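</p><p><em>The shape of a CAS update is worth seeing once. Below is a minimal sketch of the optimistic read-modify-write loop, against an in-memory stand-in for the object store (real S3 exposes the same primitive through conditional writes; the names here are mine):</em></p>

```python
class TinyStore:
    """In-memory stand-in for an object store with conditional (CAS) writes."""

    def __init__(self):
        self.value, self.etag = None, 0

    def get(self):
        return self.value, self.etag

    def put_if_match(self, new_value, expected_etag):
        # The write lands only if nobody else wrote since our read.
        if expected_etag != self.etag:
            return False  # lost the race; caller must re-read and retry
        self.value, self.etag = new_value, self.etag + 1
        return True

def append_log_entry(store, entry, max_retries=10):
    """Optimistic concurrency: read, modify, conditionally write, retry."""
    for _ in range(max_retries):
        log, etag = store.get()
        if store.put_if_match((log or []) + [entry], etag):
            return store.get()[0]
    raise RuntimeError("too much write contention")
```

<p>Two writers racing on the same transaction log can no longer clobber each other: the loser&#8217;s conditional write fails, and it retries against the fresh state, which is exactly the coordination DynamoDB used to provide.</p><p>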
Now that S3 supports atomic updates, the entire metadata layer can live on object storage.</p><p>Now that these advancements have brought us to this point, let me show you how the <strong>&#8220;dumb storage, smart index&#8221;</strong> pattern is already being implemented across three distinct domains.</p><h2>Why Now?</h2><p>This week, AWS announced S3 Files: S3 buckets accessible as POSIX file systems. Legacy file-based applications can now read and write S3 directly without code changes. Object storage isn&#8217;t just eating databases anymore. It&#8217;s set to eat file systems too.</p><h2>Part 1 - Delta Lake (or your favorite kind of lettuce)</h2><p>Delta Lake (or Iceberg, or Hudi) implements a transaction log that acts as a metadata-based index, speeding up query performance through file-level statistics, partition pruning, and file pruning. The transaction log helps a reader or writer get to the exact files required for the current operation on an object store like S3. This is the same &#8220;dumb storage, smart index&#8221; pattern, which Databricks developed over the last decade, and it has significantly eaten into legacy data warehouse revenues (RIP Teradata and Netezza) and forced proprietary data warehouses like Snowflake to reluctantly adopt open formats like Iceberg.</p><h2>Part 2 - Physical AI and MCAP</h2><p>Most engineers don&#8217;t know this one. MCAP is an open standard much like Apache Parquet, designed by Foxglove to efficiently store and retrieve robotics telemetry data. It follows a similar pattern: each MCAP file contains a Summary section (its transaction log) and chunk offsets (its Parquet row groups).
A reader can thus fetch exactly the data required using three ranged GETs instead of one full download.</p><p>The Summary section at the end of every MCAP file is a chunk index: it records the byte offset and length of every chunk in the file, along with which topics each chunk contains.</p><p>Three ranged GETs to retrieve only what you need:</p><ol><li><p><strong>Footer</strong> (28 bytes) - where does the Summary section start?</p></li><li><p><strong>Summary section</strong> - which chunks contain <code>/robot/status</code>, at what offsets?</p></li><li><p><strong>Only those chunks</strong> - everything else is never touched.</p></li></ol><p>In production, MCAP files span hours of multi-robot telemetry, and this pattern skips gigabytes of unrelated sensor data.</p><h2>Part 3 - Vector Search (turbopuffer and LanceDB)</h2><p>Vector search and vector databases are the backbone of every production AI application. However, we seem to be repeating the same pattern we adopted in the legacy data warehouse world, i.e., reaching for proprietary databases (Pinecone, Weaviate). With the cost of LLM token consumption spiralling, it doesn&#8217;t make any economic sense to add yet another proprietary black-box database to the mix. So how can we implement the same &#8220;dumb storage, smart index&#8221; pattern in the vector search world? That&#8217;s exactly what I have tried to implement in the demo codebase I built for this post.</p><p>My approach was to store the embeddings in an object store, which solves the storage cost problem, then download all the vectors into system memory and perform a local cosine-similarity search. This works well with my demo dataset of around a thousand vector embeddings. However, I have scaled enough data pipelines in my career to know that this approach is pretty naive and would break as soon as I hit just GBs of vector embeddings.</p><p>The scalable, production solution is the same 3-step GET sequence as MCAP.
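</p><p><em>The access pattern is easy to mimic end-to-end. Here is a toy version in which a bytes object stands in for S3 and slicing stands in for a ranged GET (the layout is invented and far simpler than real MCAP):</em></p>

```python
import json
import struct

# Build a toy "MCAP-like" object: [chunk A][chunk B][summary][8-byte footer].
chunk_a = b"odometry-data" * 10
chunk_b = b"status-data" * 10
summary = json.dumps({
    "/robot/odom":   {"offset": 0,            "length": len(chunk_a)},
    "/robot/status": {"offset": len(chunk_a), "length": len(chunk_b)},
}).encode()
footer = struct.pack("<Q", len(chunk_a) + len(chunk_b))  # summary start offset
blob = chunk_a + chunk_b + summary + footer

def ranged_get(start, length):
    """Stand-in for one HTTP ranged GET against the object."""
    return blob[start:start + length]

# GET 1: the footer tells us where the summary starts.
summary_start = struct.unpack("<Q", ranged_get(len(blob) - 8, 8))[0]
# GET 2: the summary tells us which bytes hold /robot/status.
index = json.loads(ranged_get(summary_start, len(blob) - 8 - summary_start))
loc = index["/robot/status"]
# GET 3: only that chunk; chunk A is never touched.
status = ranged_get(loc["offset"], loc["length"])
assert status == chunk_b
```

<p>Three small reads, and the odometry chunk never crosses the wire.</p><p>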
A small index on S3 first (HNSW/IVF), ranged GET to find which embedding clusters match your query vector, and then ranged GET only those clusters.</p><p>I realized during building this demo that implementing a smart vector index is not a trivial problem to solve over a weekend. Thankfully, we have the likes of turbopuffer and LanceDB already solving for this.</p><h2>Why Should I Care?</h2><p>The <strong>&#8220;Dumb Storage, Smart Index&#8221;</strong> architecture is the strategic pivot that separates AI POCs from production-grade solutions.</p><p>Cursor pioneered this playbook. They were storing billions of vectors across millions of codebases, every index kept in memory, leading to astronomical costs. Moving to turbopuffer&#8217;s object-storage-native architecture cut their costs by 95%. But here&#8217;s the more interesting part: they didn&#8217;t just save money, they immediately started creating more vectors per user than before.</p><p>When infrastructure costs shrink, product ambition expands. Features that were shelved come back. That&#8217;s the real &#8220;so what&#8221; of this pattern. If you&#8217;re building on Pinecone, Weaviate, or Elasticsearch today, the question isn&#8217;t whether to move; it&#8217;s when. The architectural shift has already happened. 
The teams that adopt this pattern first build features that their competition can&#8217;t afford to build.</p><p>If you want to see the full pattern implemented across all three domains, Delta Lake, MCAP, and Vector Search, running against a local MinIO object store, the code is here: <a href="https://github.com/snudurupati/vectors-at-rest">github.com/snudurupati/vectors-at-rest</a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nudurupati.co/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Glass Box! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[98% Less Physical AI Data. 
Zero loss of operational context.]]></title><description><![CDATA[How I applied the 'delta-keyframe-heartbeat' pattern to robotics data.]]></description><link>https://www.nudurupati.co/p/98-less-physical-ai-data-zero-loss</link><guid isPermaLink="false">https://www.nudurupati.co/p/98-less-physical-ai-data-zero-loss</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Mon, 30 Mar 2026 18:50:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!E-th!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc6fcfa8-0bd1-475f-a1ed-d3914c4bf804_1724x1506.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I was simulating a robot fleet to generate meaningful data for my &#8220;Physical Context Fabric,&#8221; the ChatGPT moment for Physical AI, and I realized that a simulated Turtlebot3 running at 10 Hz, even if I collected only every 10<sup>th</sup> event, generates approximately 1 event/second. That&#8217;s 28K events in 8 hours for one robot, just for odometry and command-velocity data.
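</p><p><em>Before going further, here is my generic reading of the pattern named in the title, as a sketch (the thresholds and cadence are invented; this illustrates the idea, not the pipeline&#8217;s actual code):</em></p>

```python
def compress_stream(events, keyframe_every=60, delta_threshold=0.05):
    """Reduce a dense odometry stream to keyframes, deltas, and heartbeats.

    events: iterable of (tick, x, y) samples, roughly 1 event/second.
    """
    out, last = [], None
    for tick, x, y in events:
        if last is None or tick % keyframe_every == 0:
            out.append(("keyframe", tick, x, y))   # full state, periodically
            last = (x, y)
        elif abs(x - last[0]) + abs(y - last[1]) > delta_threshold:
            out.append(("delta", tick, x - last[0], y - last[1]))  # change only
            last = (x, y)
        else:
            out.append(("heartbeat", tick))  # "still alive, nothing new"
    return out

# A robot parked for 2 minutes yields 2 keyframes and 118 cheap heartbeats,
# which downstream can aggregate or drop instead of persisting 120 full events.
parked = compress_stream((t, 1.0, 2.0) for t in range(120))
```

<p>The keyframes guarantee absolute state can always be reconstructed, the deltas carry the motion, and the heartbeats preserve liveness without payload.</p><p>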
This is already unmanageable for my Knowledge Graph because the signal-to-noise ratio is far too low.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E-th!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc6fcfa8-0bd1-475f-a1ed-d3914c4bf804_1724x1506.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E-th!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc6fcfa8-0bd1-475f-a1ed-d3914c4bf804_1724x1506.png 424w, https://substackcdn.com/image/fetch/$s_!E-th!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc6fcfa8-0bd1-475f-a1ed-d3914c4bf804_1724x1506.png 848w, https://substackcdn.com/image/fetch/$s_!E-th!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc6fcfa8-0bd1-475f-a1ed-d3914c4bf804_1724x1506.png 1272w, https://substackcdn.com/image/fetch/$s_!E-th!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc6fcfa8-0bd1-475f-a1ed-d3914c4bf804_1724x1506.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E-th!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc6fcfa8-0bd1-475f-a1ed-d3914c4bf804_1724x1506.png" width="1456" height="1272" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cc6fcfa8-0bd1-475f-a1ed-d3914c4bf804_1724x1506.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1272,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3779589,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/192643345?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc6fcfa8-0bd1-475f-a1ed-d3914c4bf804_1724x1506.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E-th!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc6fcfa8-0bd1-475f-a1ed-d3914c4bf804_1724x1506.png 424w, https://substackcdn.com/image/fetch/$s_!E-th!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc6fcfa8-0bd1-475f-a1ed-d3914c4bf804_1724x1506.png 848w, https://substackcdn.com/image/fetch/$s_!E-th!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc6fcfa8-0bd1-475f-a1ed-d3914c4bf804_1724x1506.png 1272w, https://substackcdn.com/image/fetch/$s_!E-th!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc6fcfa8-0bd1-475f-a1ed-d3914c4bf804_1724x1506.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: center;"><em>Turtlebot3 simulation after 20 minutes, 20,000+ Event nodes, signal-to-noise ratio unusable.</em></p><p>Imagine the fate of some large-scale Robot fleet operators like Serve Robotics or Waymo who operate hundreds of autonomous robots per day and have to store and process not just odometry data but also LIDAR point cloud, IMU, image/ video, RADAR/SONAR, etc. data. That&#8217;s terabytes of data stored on the robot SSDs, waiting to be bulk uploaded when the robot is back at the dock. 
And by the time the data scientist gets a CSV dump 8 hours later, the trail is already cold, making it difficult, if not impossible, to reconstruct what happened.</p><p>So the problems I see with bulk Physical AI data uploads are:</p><ul><li><p><strong>Staleness</strong> &#8211; with batch uploads you are always looking backward.</p></li><li><p><strong>Upload costs</strong> &#8211; TB-scale dumps over spotty connections fail, corrupt, or arrive out of order.</p></li><li><p><strong>Context loss</strong> &#8211; even when the full data arrives, raw sensor replay answers the &#8220;what&#8221;, not the &#8220;why&#8221;.</p></li></ul><p>However, this is not a novel problem. It is a decades-old data engineering problem that I have seen and solved over my 18-year data career. So I set out to solve the Physical AI data problem the same way I solved the Industrial IoT data problem: with the tried-and-tested &#8220;Delta-Keyframe-Heartbeat&#8221; pattern. In this pattern, we make the edge device, in this case the robot gateway, intelligent by introducing logic to send only deltas (changes in velocity or direction), periodic keyframes that contain the full data state, and infrequent heartbeats that report robot health. The deltas conserve bandwidth, the periodic keyframes ensure we don&#8217;t lose the robot state if the connection drops, and the heartbeats keep us informed that the robot is active and operational.</p><p>To be specific about what each frame type in the &#8220;delta-keyframe-heartbeat&#8221; pattern does:</p><ul><li><p><strong>Delta</strong> &#8211; Any meaningful change in telemetry data, such as a significant shift in position or velocity, or a state change from moving to stopped. 
This delta is detected and calculated by the edge gateway and written only when something meaningful changes.</p></li><li><p><strong>Keyframe</strong> &#8211; A keyframe is a full state snapshot written periodically (say every 30 seconds), which serves as a ground truth of the robot state.</p></li><li><p><strong>Heartbeat</strong> &#8211; An alive ping every 60 seconds or so when there is no delta being generated to communicate the robot&#8217;s health, without sharing any real data.</p></li></ul><p>I implemented the same pattern in my own simulation, running a fleet of 3 Turtlebot3 robots over 24 hours. The gateway received <strong>233,353 raw telemetry events</strong> and <strong>wrote only 522</strong> to the knowledge graph, <strong>204 anomalies</strong>, <strong>114 keyframe snapshots</strong>, and <strong>3 robot identity nodes</strong>. That is a <strong>99.8% reduction</strong>. Every single anomaly was captured.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vn1H!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ec076d5-7cbb-4385-8505-aa781ea59fd1_940x1590.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vn1H!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ec076d5-7cbb-4385-8505-aa781ea59fd1_940x1590.png 424w, https://substackcdn.com/image/fetch/$s_!Vn1H!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ec076d5-7cbb-4385-8505-aa781ea59fd1_940x1590.png 848w, 
https://substackcdn.com/image/fetch/$s_!Vn1H!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ec076d5-7cbb-4385-8505-aa781ea59fd1_940x1590.png 1272w, https://substackcdn.com/image/fetch/$s_!Vn1H!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ec076d5-7cbb-4385-8505-aa781ea59fd1_940x1590.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Vn1H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ec076d5-7cbb-4385-8505-aa781ea59fd1_940x1590.png" width="406" height="686.7446808510638" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0ec076d5-7cbb-4385-8505-aa781ea59fd1_940x1590.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1590,&quot;width&quot;:940,&quot;resizeWidth&quot;:406,&quot;bytes&quot;:711704,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/192643345?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ec076d5-7cbb-4385-8505-aa781ea59fd1_940x1590.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Vn1H!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ec076d5-7cbb-4385-8505-aa781ea59fd1_940x1590.png 424w, 
https://substackcdn.com/image/fetch/$s_!Vn1H!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ec076d5-7cbb-4385-8505-aa781ea59fd1_940x1590.png 848w, https://substackcdn.com/image/fetch/$s_!Vn1H!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ec076d5-7cbb-4385-8505-aa781ea59fd1_940x1590.png 1272w, https://substackcdn.com/image/fetch/$s_!Vn1H!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0ec076d5-7cbb-4385-8505-aa781ea59fd1_940x1590.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: center;"><em>Every meaningful event is captured in the Redis Event Stream.</em></p><p>This is what the knowledge graph looks like after 24 hours of fleet operation. Three robots, each with its own anomaly history and position context, queryable in real time. Still compressed and manageable, and with zero context loss.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AI-z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fb01d99-662b-455a-850c-790d320f8403_2734x2422.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AI-z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fb01d99-662b-455a-850c-790d320f8403_2734x2422.png 424w, https://substackcdn.com/image/fetch/$s_!AI-z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fb01d99-662b-455a-850c-790d320f8403_2734x2422.png 848w, https://substackcdn.com/image/fetch/$s_!AI-z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fb01d99-662b-455a-850c-790d320f8403_2734x2422.png 1272w, https://substackcdn.com/image/fetch/$s_!AI-z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fb01d99-662b-455a-850c-790d320f8403_2734x2422.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!AI-z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fb01d99-662b-455a-850c-790d320f8403_2734x2422.png" width="1456" height="1290" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1fb01d99-662b-455a-850c-790d320f8403_2734x2422.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1290,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:938661,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/192643345?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fb01d99-662b-455a-850c-790d320f8403_2734x2422.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AI-z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fb01d99-662b-455a-850c-790d320f8403_2734x2422.png 424w, https://substackcdn.com/image/fetch/$s_!AI-z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fb01d99-662b-455a-850c-790d320f8403_2734x2422.png 848w, https://substackcdn.com/image/fetch/$s_!AI-z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fb01d99-662b-455a-850c-790d320f8403_2734x2422.png 1272w, https://substackcdn.com/image/fetch/$s_!AI-z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1fb01d99-662b-455a-850c-790d320f8403_2734x2422.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: center;"><em>24 hours of fleet telemetry: 3 robots, 204 anomalies, geographic clusters visible.</em></p><p>And because every anomaly carries a position, you can ask questions that raw replay can never answer. robot_001 kept stopping in two specific map regions. robot_002 stopped less frequently but in a tighter cluster. 
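</p><p>Because every anomaly record carries a position, the hotspot question reduces to a simple spatial group-by. Here is a minimal sketch in plain Python; the records and field names are hypothetical stand-ins for the graph&#8217;s anomaly nodes:</p>

```python
import math
from collections import Counter

# Hypothetical anomaly records, shaped like the ones the gateway writes:
# every anomaly carries the robot id and the position where it happened.
anomalies = [
    {"robot": "robot_001", "x": -1.8, "y": 1.6},
    {"robot": "robot_001", "x": -1.9, "y": 1.5},
    {"robot": "robot_001", "x": 1.9, "y": -0.4},
    {"robot": "robot_002", "x": 0.2, "y": 0.2},
    {"robot": "robot_002", "x": 0.1, "y": 0.1},
]

def hotspots(records, cell=0.5):
    """Bucket anomalies into a coarse map grid and count per (robot, cell)."""
    counts = Counter()
    for r in records:
        key = (r["robot"], math.floor(r["x"] / cell), math.floor(r["y"] / cell))
        counts[key] += 1
    return counts

for (robot, cx, cy), n in hotspots(anomalies).most_common():
    print(robot, "cell", (cx, cy), "->", n)
```

<p>A coarse grid like this is enough to turn hundreds of anomaly nodes into a handful of hotspot cells per robot.</p><p>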
That is not a replay result; that is an operational context.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!CIhV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f5dd68-3f10-41b0-ac6a-5938c3bf298f_936x358.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!CIhV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f5dd68-3f10-41b0-ac6a-5938c3bf298f_936x358.png 424w, https://substackcdn.com/image/fetch/$s_!CIhV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f5dd68-3f10-41b0-ac6a-5938c3bf298f_936x358.png 848w, https://substackcdn.com/image/fetch/$s_!CIhV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f5dd68-3f10-41b0-ac6a-5938c3bf298f_936x358.png 1272w, https://substackcdn.com/image/fetch/$s_!CIhV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f5dd68-3f10-41b0-ac6a-5938c3bf298f_936x358.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!CIhV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f5dd68-3f10-41b0-ac6a-5938c3bf298f_936x358.png" width="936" height="358" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7f5dd68-3f10-41b0-ac6a-5938c3bf298f_936x358.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:358,&quot;width&quot;:936,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:41328,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/192643345?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f5dd68-3f10-41b0-ac6a-5938c3bf298f_936x358.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!CIhV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f5dd68-3f10-41b0-ac6a-5938c3bf298f_936x358.png 424w, https://substackcdn.com/image/fetch/$s_!CIhV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f5dd68-3f10-41b0-ac6a-5938c3bf298f_936x358.png 848w, https://substackcdn.com/image/fetch/$s_!CIhV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f5dd68-3f10-41b0-ac6a-5938c3bf298f_936x358.png 1272w, https://substackcdn.com/image/fetch/$s_!CIhV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7f5dd68-3f10-41b0-ac6a-5938c3bf298f_936x358.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 
0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: center;"><em>Geographic anomaly clustering across 3 robots; robot_001 has two distinct hotspot regions.</em></p><p>The hardware problem in physical AI is largely solved. Tools like Foxglove let you replay exactly what the robot saw. But a replay only answers the &#8220;what&#8221;. Nobody is answering the &#8220;why&#8221;. The gap between raw telemetry and operational understanding is where I think the real infrastructure work needs to happen, and frankly, where I am spending my time.</p><p>The delta-keyframe-heartbeat pattern itself is not novel. I have seen and solved this exact problem across 18 years of data engineering in Industrial IoT. 
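</p><p>The gateway&#8217;s per-event decision can be sketched roughly like this. It is a minimal sketch: the thresholds, field names, and the idle-robot behavior (heartbeats instead of keyframes while nothing changes) are my illustrative choices, not the project&#8217;s actual implementation:</p>

```python
import math

# Illustrative thresholds; not the values used in the actual project.
DELTA_POS = 0.05        # metres of movement that counts as meaningful
DELTA_VEL = 0.02        # m/s of velocity shift that counts as meaningful
KEYFRAME_EVERY = 30.0   # seconds between full-state snapshots while active
HEARTBEAT_EVERY = 60.0  # seconds between alive pings while nothing changes

class EdgeGateway:
    """Per-event decision: write a keyframe, a delta, a heartbeat, or drop."""

    def __init__(self):
        self.last_state = None     # last state written upstream
        self.last_keyframe = None  # time of the last full snapshot
        self.last_emit = None      # time of the last write of any kind

    def _meaningful_change(self, e):
        s = self.last_state
        dx = math.hypot(e["x"] - s["x"], e["y"] - s["y"])
        dv = abs(e["v"] - s["v"])
        return dx > DELTA_POS or dv > DELTA_VEL or e["status"] != s["status"]

    def classify(self, event, now):
        # First event: write a keyframe so downstream has ground truth.
        if self.last_state is None:
            self.last_state, self.last_keyframe, self.last_emit = event, now, now
            return "keyframe"
        if self._meaningful_change(event):
            self.last_state, self.last_emit = event, now
            # While the robot is active, refresh the full snapshot periodically.
            if now - self.last_keyframe >= KEYFRAME_EVERY:
                self.last_keyframe = now
                return "keyframe"
            return "delta"
        # Quiet robot: occasionally prove it is still alive.
        if now - self.last_emit >= HEARTBEAT_EVERY:
            self.last_emit = now
            return "heartbeat"
        return None  # drop: nothing meaningful changed
```

<p>Every event that classifies as <code>None</code> never leaves the robot, which is where the bulk of the reduction comes from.</p><p>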
What is new is bringing this thinking to Physical AI, combining it with a streaming pipeline and a knowledge graph, and making the robot&#8217;s edge compute node intelligent enough to decide what matters before it ever hits the network.</p><p>This is Series 2 of my Physical Context Fabric open-source project. Series 1 covered standing up ROS2 natively on a Raspberry Pi 5 and connecting it to Foxglove during GTC week. The next post covers the part I am most excited about, a natural language Q&amp;A interface on top of the knowledge graph. Ask it why robot_001 kept stopping in the top left quadrant. Get an answer grounded entirely in what the robot actually did. No hallucination. No speculation. Purely context-driven.</p><p>If you are building a physical AI data infrastructure or just curious about the stack, the repo is fully open source and runs on a Raspberry Pi 5 and a MacBook Air. Come build with me.</p><p><a href="https://github.com/snudurupati/physical-context-fabric">Physical Context Fabric on GitHub</a></p><p>[<a href="https://www.nudurupati.co/p/the-chatgpt-moment-of-robotics-has">Series 1 &#8212; The ChatGPT Moment of Robotics Has Arrived</a>]</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nudurupati.co/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Glass Box! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[The ChatGPT Moment of Robotics Has Arrived]]></title><description><![CDATA[Spatial replay tells you what a robot did. Here is how to build the operational context to know why it did it.]]></description><link>https://www.nudurupati.co/p/the-chatgpt-moment-of-robotics-has</link><guid isPermaLink="false">https://www.nudurupati.co/p/the-chatgpt-moment-of-robotics-has</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Wed, 18 Mar 2026 04:46:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RBQO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F132f11dc-3123-46b8-8de9-c0a8b270887a_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RBQO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F132f11dc-3123-46b8-8de9-c0a8b270887a_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RBQO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F132f11dc-3123-46b8-8de9-c0a8b270887a_2816x1536.png 424w, 
https://substackcdn.com/image/fetch/$s_!RBQO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F132f11dc-3123-46b8-8de9-c0a8b270887a_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!RBQO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F132f11dc-3123-46b8-8de9-c0a8b270887a_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!RBQO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F132f11dc-3123-46b8-8de9-c0a8b270887a_2816x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RBQO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F132f11dc-3123-46b8-8de9-c0a8b270887a_2816x1536.png" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/132f11dc-3123-46b8-8de9-c0a8b270887a_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7887113,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/191332352?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F132f11dc-3123-46b8-8de9-c0a8b270887a_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!RBQO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F132f11dc-3123-46b8-8de9-c0a8b270887a_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!RBQO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F132f11dc-3123-46b8-8de9-c0a8b270887a_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!RBQO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F132f11dc-3123-46b8-8de9-c0a8b270887a_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!RBQO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F132f11dc-3123-46b8-8de9-c0a8b270887a_2816x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>Jensen Huang just announced the &#8220;ChatGPT moment&#8221; for robotics. But nobody is talking about what happens to the telemetry after the demo.</h3><div><hr></div><p>Yesterday, Jensen Huang walked onto the stage at the SAP Center and declared that the ChatGPT moment for autonomous physical AI has arrived.</p><p>There were 110 robots on the show floor. Humanoids, robo-taxis, autonomous factory arms. The hardware story for Physical AI is no longer a theoretical question&#8212;it is a shipping product roadmap.</p><p>But the slide that stopped me wasn&#8217;t the Vera Rubin architecture announcement. It was quieter than that. Huang put up what he called his most important slide of the keynote: <em>&#8220;Structured data is the foundation of trustworthy AI.&#8221;</em> And then: <em>&#8220;Unstructured data is the context of AI.&#8221;</em></p><p>He was talking about enterprise software agents. I kept thinking about fleets of robots.</p><div><hr></div><h3>What Physical AI Actually Generates</h3><p>A ROS2 robot (ROS2 being the software standard that runs on everything from a student&#8217;s TurtleBot3 to Boston Dynamics&#8217; Spot) publishes a continuous, relentless stream of structured telemetry. 
Every movement, every sensor reading, every velocity command.</p><p>Here is what raw odometry from a simulated TurtleBot3 looks like at rest:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_e72!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe925784f-d127-4bc1-bd11-e2876393b279_600x792.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_e72!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe925784f-d127-4bc1-bd11-e2876393b279_600x792.png 424w, https://substackcdn.com/image/fetch/$s_!_e72!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe925784f-d127-4bc1-bd11-e2876393b279_600x792.png 848w, https://substackcdn.com/image/fetch/$s_!_e72!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe925784f-d127-4bc1-bd11-e2876393b279_600x792.png 1272w, https://substackcdn.com/image/fetch/$s_!_e72!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe925784f-d127-4bc1-bd11-e2876393b279_600x792.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_e72!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe925784f-d127-4bc1-bd11-e2876393b279_600x792.png" width="364" height="480.48" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e925784f-d127-4bc1-bd11-e2876393b279_600x792.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:792,&quot;width&quot;:600,&quot;resizeWidth&quot;:364,&quot;bytes&quot;:150503,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/191332352?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe925784f-d127-4bc1-bd11-e2876393b279_600x792.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_e72!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe925784f-d127-4bc1-bd11-e2876393b279_600x792.png 424w, https://substackcdn.com/image/fetch/$s_!_e72!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe925784f-d127-4bc1-bd11-e2876393b279_600x792.png 848w, https://substackcdn.com/image/fetch/$s_!_e72!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe925784f-d127-4bc1-bd11-e2876393b279_600x792.png 1272w, https://substackcdn.com/image/fetch/$s_!_e72!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe925784f-d127-4bc1-bd11-e2876393b279_600x792.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: center;"><em>ROS2 topic echo /odom &#8212; stationary robot, all zeros</em></p><p>And here is that same robot the moment you publish a velocity command (linear 0.2, angular 0.5), sending it into a circle:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!zlsT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489093b8-1b20-40cc-bb64-95419fff2b40_638x800.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp"
srcset="https://substackcdn.com/image/fetch/$s_!zlsT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489093b8-1b20-40cc-bb64-95419fff2b40_638x800.png 424w, https://substackcdn.com/image/fetch/$s_!zlsT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489093b8-1b20-40cc-bb64-95419fff2b40_638x800.png 848w, https://substackcdn.com/image/fetch/$s_!zlsT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489093b8-1b20-40cc-bb64-95419fff2b40_638x800.png 1272w, https://substackcdn.com/image/fetch/$s_!zlsT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489093b8-1b20-40cc-bb64-95419fff2b40_638x800.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!zlsT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489093b8-1b20-40cc-bb64-95419fff2b40_638x800.png" width="378" height="473.98119122257054" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/489093b8-1b20-40cc-bb64-95419fff2b40_638x800.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:800,&quot;width&quot;:638,&quot;resizeWidth&quot;:378,&quot;bytes&quot;:91588,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/191332352?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489093b8-1b20-40cc-bb64-95419fff2b40_638x800.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!zlsT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489093b8-1b20-40cc-bb64-95419fff2b40_638x800.png 424w, https://substackcdn.com/image/fetch/$s_!zlsT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489093b8-1b20-40cc-bb64-95419fff2b40_638x800.png 848w, https://substackcdn.com/image/fetch/$s_!zlsT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489093b8-1b20-40cc-bb64-95419fff2b40_638x800.png 1272w, https://substackcdn.com/image/fetch/$s_!zlsT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F489093b8-1b20-40cc-bb64-95419fff2b40_638x800.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: center;"><em>ROS2 topic echo /odom &#8212; moving robot, x/y/orientation updating</em></p><p>The position updates 10 times per second. The orientation quaternion spins. This is kinematic state, continuously broadcast. Now realize that a robot in a real deployment generates this, plus LiDAR point clouds, camera feeds, IMU data, battery states, and nav stack outputs, all simultaneously.</p><div><hr></div><h3>The King of the Replay Buffer: Foxglove</h3><p>If you want to visualize this firehose of data, <strong>Foxglove Studio</strong> is the best tool in the physical AI ecosystem.</p><p>It is genuinely excellent. You connect it to a live robot or a recorded MCAP file, and you get a beautiful, real-time visualization of every single topic.
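</p><p>One small piece of math sits behind those orientation readouts: for a planar robot, the quaternion in <code>/odom</code> reduces to a single yaw angle. A minimal sketch of the standard conversion, not tied to any particular library:</p>

```python
import math

def quaternion_to_yaw(x, y, z, w):
    """Yaw (rotation about z) of a unit quaternion, in radians.

    Standard ZYX Euler extraction; for a planar robot the x and y
    components are ~0 and the quaternion encodes pure heading.
    """
    return math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))

# A quarter-turn about z is (0, 0, sin(45 deg), cos(45 deg)) -> yaw of pi/2.
yaw = quaternion_to_yaw(0.0, 0.0, math.sin(math.pi / 4), math.cos(math.pi / 4))
```

<p>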
Here is what that looks like when connected to my TurtleBot3 sim, tracking position <code>x</code> plotted over time as the robot drives in a circle:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pf8F!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b098109-b43b-474d-ab11-859f4f17fcbb_1800x1416.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pf8F!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b098109-b43b-474d-ab11-859f4f17fcbb_1800x1416.png 424w, https://substackcdn.com/image/fetch/$s_!Pf8F!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b098109-b43b-474d-ab11-859f4f17fcbb_1800x1416.png 848w, https://substackcdn.com/image/fetch/$s_!Pf8F!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b098109-b43b-474d-ab11-859f4f17fcbb_1800x1416.png 1272w, https://substackcdn.com/image/fetch/$s_!Pf8F!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b098109-b43b-474d-ab11-859f4f17fcbb_1800x1416.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pf8F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b098109-b43b-474d-ab11-859f4f17fcbb_1800x1416.png" width="575" height="452.18063186813185" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6b098109-b43b-474d-ab11-859f4f17fcbb_1800x1416.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1145,&quot;width&quot;:1456,&quot;resizeWidth&quot;:575,&quot;bytes&quot;:260052,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/191332352?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b098109-b43b-474d-ab11-859f4f17fcbb_1800x1416.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pf8F!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b098109-b43b-474d-ab11-859f4f17fcbb_1800x1416.png 424w, https://substackcdn.com/image/fetch/$s_!Pf8F!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b098109-b43b-474d-ab11-859f4f17fcbb_1800x1416.png 848w, https://substackcdn.com/image/fetch/$s_!Pf8F!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b098109-b43b-474d-ab11-859f4f17fcbb_1800x1416.png 1272w, https://substackcdn.com/image/fetch/$s_!Pf8F!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6b098109-b43b-474d-ab11-859f4f17fcbb_1800x1416.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: center;"><em>Foxglove Studio sine wave plot - /odom.pose.pose.position.x</em></p><p>That sinusoidal curve is the robot&#8217;s x-coordinate oscillating as it traces its path.
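</p><p>That sine wave is predictable from the command itself: linear 0.2 m/s with angular 0.5 rad/s traces a circle of radius v/&#969; = 0.4 m, so x oscillates with amplitude 0.4 and angular frequency 0.5. A quick sketch of that kinematics (pure math, no ROS required):</p>

```python
import math

def circle_x(t, v=0.2, omega=0.5, x0=0.0):
    """x-position at time t for a robot driving a constant-curvature circle.

    A commanded (v, omega) pair traces a circle of radius v/omega, so the
    x-coordinate is a sinusoid with amplitude v/omega and frequency omega.
    """
    return x0 + (v / omega) * math.sin(omega * t)

# Peak displacement occurs a quarter period in, at t = pi / (2 * omega).
peak = circle_x(math.pi / (2 * 0.5))  # amplitude v/omega = 0.4 m
```

<p>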
Meanwhile, the 3D panel shows the robot&#8217;s coordinate frame updating in real-time against the world:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pthi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb71e49e-466f-4c53-a21c-6c69a73a85f6_3750x2608.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pthi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb71e49e-466f-4c53-a21c-6c69a73a85f6_3750x2608.png 424w, https://substackcdn.com/image/fetch/$s_!Pthi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb71e49e-466f-4c53-a21c-6c69a73a85f6_3750x2608.png 848w, https://substackcdn.com/image/fetch/$s_!Pthi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb71e49e-466f-4c53-a21c-6c69a73a85f6_3750x2608.png 1272w, https://substackcdn.com/image/fetch/$s_!Pthi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb71e49e-466f-4c53-a21c-6c69a73a85f6_3750x2608.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pthi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb71e49e-466f-4c53-a21c-6c69a73a85f6_3750x2608.png" width="606" height="421.6195054945055" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/db71e49e-466f-4c53-a21c-6c69a73a85f6_3750x2608.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1013,&quot;width&quot;:1456,&quot;resizeWidth&quot;:606,&quot;bytes&quot;:507583,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/191332352?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb71e49e-466f-4c53-a21c-6c69a73a85f6_3750x2608.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pthi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb71e49e-466f-4c53-a21c-6c69a73a85f6_3750x2608.png 424w, https://substackcdn.com/image/fetch/$s_!Pthi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb71e49e-466f-4c53-a21c-6c69a73a85f6_3750x2608.png 848w, https://substackcdn.com/image/fetch/$s_!Pthi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb71e49e-466f-4c53-a21c-6c69a73a85f6_3750x2608.png 1272w, https://substackcdn.com/image/fetch/$s_!Pthi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdb71e49e-466f-4c53-a21c-6c69a73a85f6_3750x2608.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p style="text-align: center;"><em>Foxglove Studio 3D panel - robot frame in sim</em></p><p>If you need to inspect a system or replay an event, Foxglove is the undisputed answer. It handles the spatial and temporal reality of the robot perfectly.</p><p><strong>It answers the critical question: </strong><em><strong>What did the robot do?</strong></em></p><div><hr></div><h3>The Question Nobody is Answering</h3><p>But Foxglove and the visualization ecosystem at large are essentially a world-class replay buffer.</p><p>They do not answer: <em>Why did the robot stop unexpectedly at 14:32:07? What was it doing in the 30 seconds before that stop? Has it stopped in this specific map region before?
Does this failure pattern correlate with a particular task type or a degrading battery level?</em></p><p>These are operational questions, and answering them requires more than just a sensor stream. It requires <strong>context</strong>. It requires understanding the relationship between discrete events, the history of a robot&#8217;s behavior in a specific environment, and the anomaly patterns across 50 deployments, not just one recording.</p><p>This is the operational context problem. And right now, nobody has built the data infrastructure layer to solve it.</p><div><hr></div><h3>Building the Physical Context Fabric</h3><p>I am open-sourcing a project to build that exact layer. I call it the <strong>Physical Context Fabric</strong>.</p><p>The architecture is a real-time streaming pipeline sitting directly above ROS2, feeding into a live knowledge graph.</p><blockquote><p><strong>The Pipeline:</strong> ROS2 topics &#8594; Pathway (Stream Processor) &#8594; Memgraph (Knowledge Graph)</p></blockquote><p>The schema is designed to connect the relational dots that raw telemetry leaves behind: <code>Robot &#8594; Task &#8594; Event &#8594; Anomaly &#8594; Environment_State</code></p><p>The Pathway layer watches the telemetry stream in real-time and detects anomalies. For example: a velocity drop below a specific threshold, held for <em>N</em> seconds, becomes an <code>unexpected_stop</code> event. That event is enriched with its full context window and written directly to Memgraph.</p><p>Suddenly, the queries that become possible aren&#8217;t just plotting lines on a graph. 
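</p><p>The stop rule just described is simple enough to sketch in plain Python. This is a hedged illustration with made-up thresholds; the actual pipeline implements it as a streaming Pathway transform:</p>

```python
# Hedged sketch of the rule above: a speed below `threshold` held for at
# least `hold_s` seconds becomes a single `unexpected_stop` event.
# The threshold and hold window here are illustrative, not from the project.
def detect_unexpected_stops(samples, threshold=0.05, hold_s=3.0):
    """samples: iterable of (timestamp_s, speed_m_s); returns event dicts."""
    events, stop_start = [], None
    for t, speed in samples:
        if speed < threshold:
            if stop_start is None:
                stop_start = t
            elif t - stop_start >= hold_s:
                events.append({"event_type": "unexpected_stop",
                               "t_start": stop_start, "t_detect": t})
                stop_start = float("inf")  # fire once per stop
        else:
            stop_start = None  # robot is moving again; reset the window
    return events

# 10 Hz stream: one second of motion, then the robot stalls at t = 1.0 s.
stream = [(i / 10, 0.2) for i in range(10)] + \
         [(1.0 + i / 10, 0.0) for i in range(50)]
events = detect_unexpected_stops(stream)
```

<p>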
They look like this:</p><ul><li><p><em>&#8220;What specific events preceded the last 5 unexpected stops across the fleet?&#8221;</em></p></li><li><p><em>&#8220;Which map region has the highest historical anomaly rate?&#8221;</em></p></li><li><p><em>&#8220;How long do tasks of type X typically take before a hardware failure occurs?&#8221;</em></p></li></ul><p>Here is the first version of the context bridge in action, a Python node that reads <code>/odom</code> and <code>/cmd_vel</code> and emits structured JSON events in real-time:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rzE9!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a189bae-e852-4153-815d-3d8dd4e3493c_2256x948.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rzE9!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a189bae-e852-4153-815d-3d8dd4e3493c_2256x948.png 424w, https://substackcdn.com/image/fetch/$s_!rzE9!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a189bae-e852-4153-815d-3d8dd4e3493c_2256x948.png 848w, https://substackcdn.com/image/fetch/$s_!rzE9!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a189bae-e852-4153-815d-3d8dd4e3493c_2256x948.png 1272w, https://substackcdn.com/image/fetch/$s_!rzE9!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a189bae-e852-4153-815d-3d8dd4e3493c_2256x948.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!rzE9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a189bae-e852-4153-815d-3d8dd4e3493c_2256x948.png" width="728" height="306" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0a189bae-e852-4153-815d-3d8dd4e3493c_2256x948.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:612,&quot;width&quot;:1456,&quot;resizeWidth&quot;:728,&quot;bytes&quot;:1328769,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/191332352?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a189bae-e852-4153-815d-3d8dd4e3493c_2256x948.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rzE9!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a189bae-e852-4153-815d-3d8dd4e3493c_2256x948.png 424w, https://substackcdn.com/image/fetch/$s_!rzE9!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a189bae-e852-4153-815d-3d8dd4e3493c_2256x948.png 848w, https://substackcdn.com/image/fetch/$s_!rzE9!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a189bae-e852-4153-815d-3d8dd4e3493c_2256x948.png 1272w, https://substackcdn.com/image/fetch/$s_!rzE9!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0a189bae-e852-4153-815d-3d8dd4e3493c_2256x948.png 1456w" sizes="100vw" 
loading="lazy"></picture></div></a></figure></div><p style="text-align: center;"><em>odom_subscriber.py JSON output streaming - timestamp, position, velocity, event_type: moving_and_turning</em></p><p>Each event carries a timestamp, position, velocity, commanded velocity, and an inferred event type. This is the raw material the knowledge graph ingests. <strong>This is what turns a sensor replay into operational context.</strong></p><div><hr></div><h3>Why This Matters Right Now</h3><p>Jensen Huang said it plainly on that stage: structured data is the ground truth of the AI era.</p><p>The Physical AI world is about to experience a massive scale-up.
We are going to see a lot more robots, a lot more deployments, and a lot more fleet operators asking <em>why</em> their robots behaved the way they did.</p><p>The hardware roadmap is clear. The visualization tooling (thanks to teams like Foxglove) is elite. The glaring gap is the data infrastructure layer between raw telemetry and operational understanding.</p><p>That gap is what the <strong><a href="https://github.com/snudurupati/physical-context-fabric">Physical Context Fabric</a></strong> is built to fill.</p><div><hr></div><p><strong>The repository is live here:</strong> <a href="https://github.com/snudurupati/physical-context-fabric">github.com/snudurupati/physical-context-fabric</a></p><p>This is Post 1 of a series documenting the build in public. The full stack, including the Pathway anomaly detection engine and the Memgraph knowledge graph integration, is coming in the posts ahead.</p><p>If you are building in physical AI infrastructure or dealing with fleet-scale telemetry, I&#8217;d love to hear from you.</p><div><hr></div><p><strong>Sreeram Nudurupati</strong> | <a href="https://www.linkedin.com/in/snudurupati">LinkedIn</a> <em>Sreeram writes about context engineering and AI data infrastructure at <a href="https://www.nudurupati.co/">nudurupati.co</a>. He is also building <a href="https://texon.ai/">TexonAI</a>: context engineering for enterprise B2B.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nudurupati.co/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Glass Box! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[I Measured My AI Pipeline. The Number Changed Everything.]]></title><description><![CDATA[How real-time instrumentation exposed a 12.8ms result and what it means for enterprise AI context.]]></description><link>https://www.nudurupati.co/p/i-measured-my-ai-pipeline-the-number</link><guid isPermaLink="false">https://www.nudurupati.co/p/i-measured-my-ai-pipeline-the-number</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Mon, 16 Mar 2026 14:32:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nt6C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb54f1b1-1c89-45da-820e-2363cb3e0df8_2694x1442.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I set out to prove my AI pipeline could deliver context in under 60 seconds. The instrumentation told a different story. The real number was 12.8 milliseconds.</p><p>But getting to that number required three runs, two wrong measurements, and one architectural insight that changes how I think about real-time AI systems entirely.</p><div><hr></div><blockquote><p><strong>New here?</strong> This is Week 2 of a 90-day build-in-public project called the <strong>Autonomous Knowledge Fabric</strong> &#8212; a reference architecture that replaces stale batch RAG pipelines with a live knowledge graph fed by real-time event streams. The core problem: enterprise AI agents are failing in production not because the models are bad, but because the context they reason over is hours old. 
I&#8217;m building the fix in public, one sprint at a time.</p><p><strong>Week 1 recap:</strong> I introduced <em>Context Debt</em> &#8212; the gap between what your agent believes and what is actually true &#8212; and shipped a live SEC EDGAR ingestion pipeline using Pathway and Memgraph. <a href="https://www.nudurupati.co/p/context-debt-why-enterprise-rag-is">Read Week 1 here.</a></p></blockquote><div><hr></div><p>Last week, I introduced <strong>Context Debt</strong>, the growing gap between what your AI agent believes and what is actually true. I made a claim: the Autonomous Knowledge Fabric would deliver sub-60-second context freshness from live SEC filings to a queryable knowledge graph.</p><p>This week, I built the instrumentation to prove it.</p><p>The number I got back wasn&#8217;t what I expected. It was significantly better. And understanding <em>why</em> it was better taught me more about real-time AI architecture than any benchmark paper I&#8217;ve read.</p><div><hr></div><h2>The Graph Is Alive</h2><p>Before we get to the latency numbers, let me show you what two weeks of streaming data looks like.</p><p>This is the Autonomous Knowledge Fabric&#8217;s knowledge graph after 48 hours of ingesting live SEC 8-K filings:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nt6C!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb54f1b1-1c89-45da-820e-2363cb3e0df8_2694x1442.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!nt6C!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb54f1b1-1c89-45da-820e-2363cb3e0df8_2694x1442.png 424w, 
https://substackcdn.com/image/fetch/$s_!nt6C!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb54f1b1-1c89-45da-820e-2363cb3e0df8_2694x1442.png 848w, https://substackcdn.com/image/fetch/$s_!nt6C!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb54f1b1-1c89-45da-820e-2363cb3e0df8_2694x1442.png 1272w, https://substackcdn.com/image/fetch/$s_!nt6C!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb54f1b1-1c89-45da-820e-2363cb3e0df8_2694x1442.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nt6C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb54f1b1-1c89-45da-820e-2363cb3e0df8_2694x1442.png" width="1456" height="779" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eb54f1b1-1c89-45da-820e-2363cb3e0df8_2694x1442.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:779,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:461838,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/191128313?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb54f1b1-1c89-45da-820e-2363cb3e0df8_2694x1442.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!nt6C!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb54f1b1-1c89-45da-820e-2363cb3e0df8_2694x1442.png 424w, https://substackcdn.com/image/fetch/$s_!nt6C!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb54f1b1-1c89-45da-820e-2363cb3e0df8_2694x1442.png 848w, https://substackcdn.com/image/fetch/$s_!nt6C!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb54f1b1-1c89-45da-820e-2363cb3e0df8_2694x1442.png 1272w, https://substackcdn.com/image/fetch/$s_!nt6C!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb54f1b1-1c89-45da-820e-2363cb3e0df8_2694x1442.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Account nodes in red, Event nodes in orange, hub-and-spoke relationships</em></p><p><strong>186 nodes. 129 relationships.</strong> Zero manual curation. Every node a real company. Every edge a real SEC filing event.</p><p>The hub-and-spoke pattern tells the story immediately: Account nodes at the center, Event nodes radiating outward. The larger the hub, the more actively a company is filing. The graph isn&#8217;t just storing data. It&#8217;s already revealing structure.</p><p>The first company to emerge as the most active filer? <strong>Carbonite</strong>, with 14 events, all from 2016. Which brings me to the first lesson of Week 2.</p><div><hr></div><h2>When the Pipeline Is Right But the Data Is Wrong</h2><p>Carbonite was acquired by OpenText in 2019. It hasn&#8217;t filed independently since. Our pipeline found it because the initial SEC feed URL was returning historical filings, not current ones.</p><p>The pipeline was technically correct. It found the most active filer in its dataset. The dataset was wrong.</p><p>This is <strong>Context Debt in reverse</strong>, garbage-in, confident-wrong-out. And here&#8217;s the uncomfortable truth: a naive RAG system would have served those 2016 filings to an agent with full confidence, because the agent has no concept of &#8220;this data is from a different era.&#8221; The vector similarity score would be fine. The context would be a decade stale.</p><p>The fix was a feed URL update and a User-Agent header. 
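</p><p>A note on that fix: SEC EDGAR rejects anonymous clients, so every fetch needs a descriptive User-Agent that identifies the tool and gives a contact address. A minimal sketch of the corrected fetch; the feed URL is EDGAR's current-events Atom feed, and the function names and contact string are placeholders of mine, not the repository's code:</p>

```python
from urllib.request import Request, urlopen

# EDGAR's "current events" Atom feed, filtered to 8-K filings.
EDGAR_8K_FEED = (
    "https://www.sec.gov/cgi-bin/browse-edgar"
    "?action=getcurrent&type=8-K&output=atom"
)

def build_feed_request(url: str = EDGAR_8K_FEED) -> Request:
    # EDGAR rejects anonymous clients: always send a descriptive
    # User-Agent with a contact address (placeholder below).
    return Request(url, headers={"User-Agent": "akf-pipeline admin@example.com"})

def fetch_feed(timeout: float = 10.0) -> bytes:
    # Network I/O to EDGAR's servers happens here; it is outside our
    # control and excluded from the pipeline's processing latency.
    with urlopen(build_feed_request(), timeout=timeout) as resp:
        return resp.read()
```
<p>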
The lesson was more durable: <strong>observability surfaces problems you didn&#8217;t know to look for.</strong> We only caught the 2016 dates because we were instrumenting every event with a timestamp. Without that, Carbonite would have quietly sat in our graph as a &#8220;high-risk account&#8221; based on decade-old filings.</p><p>After the fix, the feed returned companies like <strong>Applied Digital Corp</strong> (Item 5.02 - executive departure signal), <strong>MultiSensor AI Holdings</strong> (Item 1.01 - material agreement), and <strong>Azitra Inc</strong> (Item 3.01 - delisting notice). All 2026. All real. All actionable.</p><div><hr></div><h2>The Latency Investigation</h2><p>This is the part I want to linger on, because the journey from &#8220;810ms&#8221; to &#8220;12.8ms&#8221; is a microcosm of how production AI systems fail in ways that look like performance problems but are actually measurement problems.</p><p><strong>Run 1: 810ms P50</strong></p><p>The first latency reading showed 810ms end-to-end. Plausible. Disappointing. I&#8217;d hoped for faster. But something didn&#8217;t add up. The Bolt write to Memgraph was clocking at <strong>1ms warm</strong>. Where were the other 809ms going?</p><p><strong>Run 2: 2,434ms P50</strong></p><p>After a longer run, the number got <em>worse</em>; P50 jumped to 2.4 seconds. That&#8217;s when the measurement was clearly wrong, not the pipeline. </p><p>The diagnosis: <code>record_event_received()</code> was being called at <em>poll-cycle start</em>, the moment the RSS fetch began, rather than <em>per-entry-detected</em>. 
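</p><p>In sketch form (the helper names here are mine, mirroring the description, not the repository's code), the bug versus the fix:</p>

```python
import time

class LatencyStats:
    """Collects per-event processing latencies in milliseconds."""
    def __init__(self):
        self.samples_ms = []

    def record(self, started: float) -> None:
        self.samples_ms.append((time.perf_counter() - started) * 1000)

def poll_cycle_buggy(fetch_feed, process, stats):
    # BUG: one timestamp for the whole poll cycle. Every entry inherits
    # the fetch time plus the processing time of the entries before it.
    started = time.perf_counter()
    for raw_entry in fetch_feed():
        process(raw_entry)
        stats.record(started)

def poll_cycle_fixed(fetch_feed, process, stats):
    # FIX: the clock starts per entry, once the entry is in hand, so a
    # sample covers only the parse-and-write cost of that one event.
    for raw_entry in fetch_feed():
        started = time.perf_counter()
        process(raw_entry)
        stats.record(started)
```
<p>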
This meant we were measuring &#8220;time since we started fetching the feed,&#8221; not &#8220;time to process and write this specific event.&#8221; A 40-entry batch with a 30-second poll interval meant average measured latency included up to 30 seconds of queue wait time that had nothing to do with our processing speed.</p><p><strong>Run 3: 12.8ms P50</strong></p><p>After fixing the instrumentation and moving <code>record_event_received()</code> to fire per-entry, after title parsing, the real numbers emerged:</p><pre><code><code>&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;
AUTONOMOUS KNOWLEDGE FABRIC &#8212; LIVE STATS
&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;
Timestamp:       2026-03-15 19:04:31 UTC
Events tracked:  39
P50 latency:     12.8ms
P95 latency:     14.1ms
P99 latency:     157.8ms
Min latency:     7.1ms
Max latency:     245.8ms
Mean latency:    17.8ms
Context freshness: RECENT
&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;&#9552;
</code></code></pre><p><strong>P50 of 12.8ms. P95 of 14.1ms.</strong> Not sub-60 seconds. Sub-15 milliseconds for 95% of events.</p><div><hr></div><h2>What 12.8ms Actually Means</h2><p>For CXOs and architects who&#8217;ve been sold &#8220;real-time&#8221; by vendors who mean &#8220;hourly micro-batch&#8221;, let me put this in context:</p><pre><code><code>&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474; System                        &#9474; Latency from event to queryable context        &#9474;
&#9500;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9508;
&#9474; Nightly batch RAG             &#9474; 8&#8211;24 hours                                     &#9474;
&#9474; Hourly micro-batch            &#9474; 30&#8211;60 minutes                                  &#9474;
&#9474; "Real-time" vector sync       &#9474; 5&#8211;15 minutes                                   &#9474;
&#9474; AUTONOMOUS KNOWLEDGE FABRIC   &#9474; 12.8ms (processing) + ~15s (avg poll wait)     &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;</code></code></pre><p>The poll interval, how often we check the SEC feed, is 30 seconds and fully configurable. In a production deployment with webhook-based ingestion (Salesforce, Zendesk firing events directly), that 30-second floor disappears entirely. You&#8217;re left with 12.8ms.</p><p>For the QBR scenario: a hostile takeover filing hits the SEC wire. Your agent knows about it in under a second. The Sales Director walking into that meeting has context that&#8217;s 14 milliseconds old, not 8 hours.</p><p>That&#8217;s not an incremental improvement. That&#8217;s a different category of system.</p><div><hr></div><h2>The Breakdown That Explains Everything</h2><p>Here&#8217;s the latency decomposition for three real events that ran through the pipeline this week:</p><pre><code><code>&#9484;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9516;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9488;
&#9474; Event &#9474;   Company    &#9474; fetch_ms &#9474; parse_ms &#9474; write_ms &#9474; total_ms &#9474;
&#9500;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9532;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9508;
&#9474;  #1   &#9474; ispecimen    &#9474; 233.4ms  &#9474;  0.1ms   &#9474; 217.3ms  &#9474; 217.4ms  &#9474;
&#9474;  #2   &#9474; evolus       &#9474; 451.4ms  &#9474;  0.1ms   &#9474;   2.2ms  &#9474;   2.2ms  &#9474;
&#9474;  #3   &#9474; (subsequent) &#9474;    &#8212;     &#9474;  0.1ms   &#9474;   1.6ms  &#9474;   1.7ms  &#9474;
&#9492;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9524;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9472;&#9496;
</code></code></pre><p>Three things jump out immediately:</p><ul><li><p><strong>Parse time is 0.1ms.</strong> The Pydantic schema validation, SEC item code extraction, and risk signal classification all run in a tenth of a millisecond. This is the Rust core of Pathway doing what it was designed to do.</p></li><li><p><strong>Event #1 Bolt write is 217ms.</strong> This is the cold connection establishment, Memgraph&#8217;s first handshake. Every subsequent write is 1&#8211;2ms. The cold-start cost is a one-time tax, not a recurring one.</p></li><li><p><strong>The RSS fetch (233&#8211;451ms) is intentionally excluded</strong> from our latency measurement. That&#8217;s network time to SEC EDGAR&#8217;s servers, outside our control, not part of our processing cost.</p></li></ul><p>This decomposition is what I mean when I say observability is not optional. Without it, we&#8217;d have reported 800ms and moved on. With it, we know exactly where every millisecond goes.</p><div><hr></div><h2>What the Agent Actually Sees</h2><p>After all the instrumentation, here&#8217;s what your Sales Director&#8217;s agent receives when it calls <code>get_agent_context("carbonite")</code> on a live account:</p><pre><code><code>ACCOUNT INTELLIGENCE REPORT
Company: carbonite
Last Updated: 2026-03-15T17:21:17 (14 seconds ago)
Risk Signals: executive_departure, takeover_bid
Recent Events (13 total):
- Carbonite Inc filed 8-K on 2026-03-15. Item 5.02: 
  Departure of Directors or Certain Officers
- Carbonite Inc filed 8-K on 2026-03-15. Item 2.01: 
  Completion of Acquisition or Disposition of Assets
- Carbonite Inc filed 8-K on 2026-03-15. Item 1.01: 
  Entry into a Material Definitive Agreement
Context Freshness: LIVE
</code></code></pre><p>That string, formatted, structured, timestamped, and freshness-labeled, is injected directly into the LLM prompt as context. No vector search. No embedding lookup. No stale index. Just the current state of the account, 14 seconds after the last filing.</p><p>Compare that to what a batch RAG system injects: the same paragraph from a document that was last embedded at 2 am.</p><p>The model is identical in both cases. The context is not.</p><div><hr></div><h2>The Architecture Decision That Made This Possible</h2><p>I want to address something I&#8217;ve been asked several times since Week 1: <em>&#8220;Why Pathway? Why not just use Spark Structured Streaming, especially with the new Real-Time Mode (RTM)?&#8221;</em></p><p>I have five years of production experience with Kafka and Spark. I know the incumbent stack well enough to know exactly what it would cost us here. Yes, Databricks recently dropped micro-batching with Spark RTM, enabling concurrent processing stages to hit sub-100ms latencies. It&#8217;s an incredible engineering feat. But for <em>this specific architecture</em>, Pathway still wins. Let me be direct:</p><ul><li><p><strong>The JVM GC Penalty:</strong> Spark RTM still runs on the JVM. The JVM has a garbage collector. Garbage collection introduces unpredictable &#8220;stop-the-world&#8221; pauses. When your baseline parse time is 0.1ms, and your database write is 1ms, a 200ms GC pause isn&#8217;t a footnote&#8212;it&#8217;s a 100x latency spike that ruins your P99 determinism. Pathway&#8217;s engine is written in Rust. No GC. No pauses.</p></li><li><p><strong>The Stateful Complexity Trap:</strong> Building a live knowledge graph requires complex, stateful entity resolution. To minimize latency in Spark RTM, you have to jump through configuration hoops (like forcing Arrow batch sizes to 1, which severely bottlenecks Python UDF performance). 
Pathway&#8217;s incremental dataflow model was built natively to propagate complex delta updates. When a new SEC entry arrives, it processes exactly that entry and its graph relationships immediately, without forcing you to contort your logic to fit a rigid streaming API.</p></li><li><p><strong>The Operational Overhead:</strong> Running Spark RTM requires provisioning a dedicated JVM cluster. You have to disable autoscaling, disable Photon acceleration, and manage executor nodes. Pathway runs as a single Python process backed by a Rust core. One container. No cluster. For a reference architecture designed to be adopted by enterprise teams without a dedicated platform engineering group, that difference is not cosmetic.</p></li></ul><p>To be fair, if your enterprise already runs the latest Databricks runtime and has a team managing high-throughput pipelines (like AdTech attribution), Spark RTM is a massive leap forward. </p><p>But for this specific problem, continuous incremental updates to a live knowledge graph with deterministic sub-15ms latency, Pathway&#8217;s architecture is the right fit. Spark would require fighting the framework to get the behavior that Pathway gives you by default.</p><div><hr></div><h2>What&#8217;s Coming in Week 3</h2><p>The numbers are real. The graph is alive. Next week, I will build the system that makes the graph <em>trustworthy</em>, the <strong>3-Tier Entity Resolver</strong>.</p><p>Right now, &#8220;iSpecimen Inc.&#8221; and &#8220;ispecimen&#8221; are two different nodes in our graph. &#8220;Applied Digital Corp.&#8221; and &#8220;Applied Digital&#8221; could be two different accounts. Every duplicate node is a split relationship, a piece of risk context that never reaches the agent because it&#8217;s attached to the wrong entity.</p><p>The resolver is where most real-time knowledge graph projects fail. 
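</p><p>To make the duplicate problem concrete: a deterministic first pass can collapse pairs like these without touching an LLM, just by normalizing names before hashing. A sketch under my own naming (the suffix list is illustrative, not the repository's code):</p>

```python
import hashlib
import re

# Legal-form suffixes to strip before hashing (illustrative, not exhaustive).
_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co",
             "company", "ltd", "llc", "holdings"}

def entity_key(name: str) -> str:
    # Lowercase, keep alphanumeric tokens, drop trailing legal suffixes,
    # then hash. Identical keys mean the nodes can be merged outright.
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in _SUFFIXES:
        tokens.pop()
    return hashlib.sha1(" ".join(tokens).encode()).hexdigest()
```
<p>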
We&#8217;re going to solve it in three tiers: deterministic hashing (zero LLM cost, catches 60% of duplicates), graph-contextual neighbor matching (uses the graph&#8217;s own relationships to resolve identity), and LLM-as-judge for true ambiguity (batched, structured, minimal cost).</p><p>The Ghost Node pattern, how you resolve entities when the graph is too sparse to trust, is the architectural insight at the heart of it.</p><p><em>See you next week.</em></p><div><hr></div><p><strong>The repository:</strong> <a href="https://github.com/snudurupati/autonomous-knowledge-fabric">github.com/snudurupati/autonomous-knowledge-fabric</a></p><p><em>All latency numbers are measured from live pipeline runs. OpenTelemetry instrumentation code is in </em><code>/observability</code><em>. Reproduce it yourself.</em></p><div><hr></div><p><strong>Sreeram Nudurupati</strong> | <a href="https://www.linkedin.com/in/snudurupati">LinkedIn</a></p><p><em>AI Architect | Building the Autonomous Knowledge Fabric in public: 90 days, no hand-waving.</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nudurupati.co/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Glass Box! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Context Debt: Why Enterprise RAG is Failing in Production]]></title><description><![CDATA[A thesis for Autonomous Knowledge Fabric]]></description><link>https://www.nudurupati.co/p/context-debt-why-enterprise-rag-is</link><guid isPermaLink="false">https://www.nudurupati.co/p/context-debt-why-enterprise-rag-is</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Tue, 10 Mar 2026 15:13:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!NH28!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febeec336-dbee-491e-90f5-a8702c437456_2752x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NH28!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febeec336-dbee-491e-90f5-a8702c437456_2752x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NH28!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febeec336-dbee-491e-90f5-a8702c437456_2752x1536.png 424w, 
https://substackcdn.com/image/fetch/$s_!NH28!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febeec336-dbee-491e-90f5-a8702c437456_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!NH28!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febeec336-dbee-491e-90f5-a8702c437456_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!NH28!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febeec336-dbee-491e-90f5-a8702c437456_2752x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NH28!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febeec336-dbee-491e-90f5-a8702c437456_2752x1536.png" width="1456" height="813" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ebeec336-dbee-491e-90f5-a8702c437456_2752x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:813,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8067058,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/190204868?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febeec336-dbee-491e-90f5-a8702c437456_2752x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!NH28!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febeec336-dbee-491e-90f5-a8702c437456_2752x1536.png 424w, https://substackcdn.com/image/fetch/$s_!NH28!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febeec336-dbee-491e-90f5-a8702c437456_2752x1536.png 848w, https://substackcdn.com/image/fetch/$s_!NH28!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febeec336-dbee-491e-90f5-a8702c437456_2752x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!NH28!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febeec336-dbee-491e-90f5-a8702c437456_2752x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A Tier-1 enterprise sales team walks into a high-stakes QBR. Their AI-driven Account Intelligence agent, a system built on the industry-standard RAG stack, classified the account as &#8220;Stable.&#8221; Confident. Well-reasoned. </p><p>What the agent didn&#8217;t know and couldn&#8217;t know was that 40 minutes earlier, an SEC 8-K filing had hit the wire signaling a hostile takeover bid on that exact account&#8217;s parent company.</p><p>The agent wasn&#8217;t wrong because the model was bad. It was wrong because the context was stale.</p><p>This is the problem I&#8217;m spending the next 90 days solving in public. <strong>Context Debt.</strong></p><h3>The Debt Nobody Is Talking About</h3><p>In software engineering, we understand <strong>Technical Debt</strong>. In the age of Agentic AI, we must now account for <strong>Context Debt</strong>.</p><p>We are currently witnessing a massive divergence in the Enterprise AI stack. We are feeding 2026-speed reasoning models with 1996-speed batch processing pipelines.</p><blockquote><p><strong>Context Debt</strong> is the delta between an agent&#8217;s internal world-model and the ground truth of a high-velocity business environment.</p></blockquote><p>When your pipeline relies on nightly batches or even hourly micro-batches to refresh a vector store, you aren&#8217;t just lagging; you are accumulating Context Debt. 
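</p><p>That definition suggests a measurement: context freshness is simply the age of the newest fact the agent can see, and it can be stamped on every context payload. A sketch with illustrative labels and cutoffs (my own, not a standard):</p>

```python
from datetime import datetime, timedelta, timezone

def freshness_label(last_event, now=None):
    # Context Debt made visible: how old is the newest fact the agent sees?
    # Thresholds below are illustrative, not a standard.
    now = now or datetime.now(timezone.utc)
    age = now - last_event
    if age <= timedelta(minutes=1):
        return "LIVE"
    if age <= timedelta(hours=1):
        return "RECENT"
    return "STALE"
```
<p>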
In relationship-intensive workflows like M&amp;A, supply chain, and clinical trials, that debt doesn&#8217;t just accrue interest; it causes catastrophic hallucinations that no amount of prompt engineering can fix.</p><p><strong>Most of your agent's hallucinations aren't model failures. They're Context Debt coming due.</strong></p><h3>The Missing Middle: Beyond Vector Search</h3><p>The &#8220;Modern AI Stack&#8221; of 2024-2025 was built on a simple premise: <em>Retrieve a document, augment the prompt.</em></p><p>But enterprise reality isn&#8217;t a collection of static documents; it is a dynamic web of relationships. If a key stakeholder leaves a company, that isn&#8217;t just a new data point. It&#8217;s a structural change to the entire account&#8217;s risk profile. Standard RAG treats this as a keyword update.</p><p>If you map the modern enterprise AI stack, there are two well-solved layers: the model layer at the top (GPT-4o, Claude, Gemini &#8212; mature, capable, improving fast) and the data layer at the bottom (your CRM, your ERP, your CDW platform, messy but present).</p><p>What&#8217;s missing is the middle: a stateful, real-time layer that continuously resolves business entities, tracks relationships, and delivers live context to the agent at query time.</p><p>I&#8217;m calling this the <strong>Autonomous Knowledge Fabric</strong>, a pipeline that transforms high-velocity business events into a continuously updated Knowledge Graph, so your agents reason over live reality rather than a historical snapshot.</p><p>To solve <strong>Context Debt</strong>, we have to move away from the &#8220;Search-and-Retrieve&#8221; paradigm and toward <strong>Stateful Stream-Graph-RAG</strong>.</p><h3>The Autonomous Knowledge Fabric Thesis</h3><p>Over the next 90 days, I am building and open-sourcing a reference architecture designed to solve the &#8220;Missing Middle&#8221; for the enterprise. 
The thesis is straightforward: <strong>Real-time context is a streaming problem, not a database problem.</strong></p><p>The stack is intentionally lean and operationally resilient:</p><ul><li><p><strong>Pathway:</strong> For high-consistency, stateful stream processing (Differential Dataflow).</p></li><li><p><strong>Memgraph:</strong> As a high-performance &#8220;Hot State&#8221; cache for real-time relationship traversal.</p></li><li><p><strong>Ghost Node Resolution:</strong> A multi-tier identity engine that resolves entities (e.g., &#8220;Acme&#8221; vs. &#8220;Acme Corp&#8221;) at stream-speed before they ever reach the agent.</p></li></ul><h3>Proving the Alpha</h3><p>Enterprise Architects don&#8217;t need more &#8220;Hello World&#8221; demos. They need defensible ROI and operational clarity. By the end of this sprint, I will demonstrate the <strong>Alpha of Real-Time Context</strong> through:</p><ol><li><p><strong>Context Freshness Metrics:</strong> Moving from an 8-hour &#8220;batch lag&#8221; to sub-60-second &#8220;context freshness.&#8221;</p></li><li><p><strong>The CFO Math:</strong> A transparent cost-benefit analysis comparing the compute overhead of streaming versus the risk-adjusted cost of &#8220;Context Debt&#8221; in $500k+ enterprise workflows.</p></li><li><p><strong>The Immutable Baseline:</strong> A side-by-side comparison against a professionally tuned Pinecone/LlamaIndex batch baseline. No strawmen, just a clinical look at where batch fails and where streams win.</p></li></ol><h3>Join the Architecture Review</h3><p>This isn&#8217;t just a repository; it&#8217;s a 90-day stress test of how we handle truth in production AI. 
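<p>(A concrete flavor of that stress test: here is a deliberately tiny multi-tier resolver in the spirit of Ghost Node Resolution. The tiers, entity names, and 0.85 cutoff are illustrative only, not the repo&#8217;s actual engine.)</p>

```python
import difflib
import re

KNOWN_ENTITIES = {"Acme Corp", "Globex Corporation"}

def _normalize(name: str) -> str:
    """Tier-2 helper: strip punctuation and common legal suffixes."""
    name = re.sub(r"[^\w\s]", "", name).lower()
    return re.sub(r"\b(corp|corporation|inc|llc|ltd)\b", "", name).strip()

def resolve(mention: str, threshold: float = 0.85):
    # Tier 1: exact match against the known-entity set.
    if mention in KNOWN_ENTITIES:
        return mention
    # Tier 2: normalized match ("Acme" vs. "Acme Corp").
    for entity in KNOWN_ENTITIES:
        if _normalize(mention) == _normalize(entity):
            return entity
    # Tier 3: fuzzy match as a last resort.
    best = difflib.get_close_matches(mention, KNOWN_ENTITIES, n=1, cutoff=threshold)
    return best[0] if best else None  # None => park it as a "ghost node"

print(resolve("Acme"))  # Acme Corp
```

The real engine has to do this at stream-speed, per event, before the agent ever sees the data; the unresolved (`None`) case is what the Ghost Node pattern exists to hold open.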
I am building <a href="https://github.com/snudurupati/autonomous-knowledge-fabric">autonomous-knowledge-fabric</a> entirely in public because the &#8220;Missing Middle&#8221; is too critical a problem to solve behind closed doors.</p><p>If your team is hitting the &#8220;Stale Context&#8221; wall, or if you&#8217;re an architect tired of babysitting batch jobs that can&#8217;t keep up with your agents, let&#8217;s talk.</p><p><strong>The repository is live:</strong> <a href="https://github.com/snudurupati/autonomous-knowledge-fabric">github.com/snudurupati/autonomous-knowledge-fabric</a></p><div><hr></div><p><strong>Next Week:</strong> <em>The Ghost Node Pattern: Solving Entity Resolution in Sparse Knowledge Graphs.</em></p><div><hr></div><p><strong>Sreeram Nudurupati</strong> <em>AI Architect | Building the Autonomous Knowledge Fabric</em></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://www.nudurupati.co/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading The Glass Box! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[Weights vs. 
Maps: The Myth of the ‘Magic’ AI]]></title><description><![CDATA[Why Your Fine-Tuning Failed the Multi-Hop Test (and How to Fix It)]]></description><link>https://www.nudurupati.co/p/weights-vs-maps-the-myth-of-the-magic</link><guid isPermaLink="false">https://www.nudurupati.co/p/weights-vs-maps-the-myth-of-the-magic</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Mon, 02 Mar 2026 16:17:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Q-Ci!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fda09a7-d508-4684-ab5a-f706c6aec906_2816x1504.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is the conclusion of an 8-week series: &#8220;From Data Engineer to AI Architect.&#8221; We&#8217;ve explored the shift from traditional data engineering to designing agentic systems built on stochastic LLM compute, where tokens, latency, and probabilistic behavior are first-class constraints. Along the way, we progressed from enforcing structure and determinism to building increasingly sophisticated RAG pipelines, orchestration layers, and multi-hop graph reasoning systems with explainability built in. We also examined Continued Pre-Training (CPT) and the practical realities of building knowledge graphs on local infrastructure. 
</p><p>Today, we declare the winner of our capstone: <strong>GraphRAG.</strong></p><div><hr></div><p><strong>Executive Summary:</strong></p><p>For the last couple of weeks, we have been running a high-stakes contextual &#8220;Bake-Off.&#8221; We pitted the reigning champion, a model fine-tuned using <strong>Continuous Pre-training (CPT)</strong> on specialized clinical data (the <strong>&#8220;Parametric Contender&#8221;</strong>), against a baseline model augmented by a structured <strong>Graph Substrate</strong> (the <strong>&#8220;Semantic Contender&#8221;</strong>).</p><p>The goal was to solve the <strong>&#8220;Missing Middle&#8221;</strong> in enterprise AI: the point where an LLM is fluent but factually untraceable. While CPT provided fluency, <strong>GraphRAG provided fidelity</strong>. The moment of truth arrived in our final &#8220;Multi-Hop&#8221; query.</p><p><strong>The Test:</strong> <em>&#8220;In our clinical records, what role does the &#8216;forearm&#8217; play in measuring glucose uptake for hypertensive subjects?&#8221;</em></p><p><strong>The Results:</strong></p><ul><li><p><strong>Contender A (Week 7 CPT):</strong> <code>"Data not found in substrate."</code></p></li><li><p><strong>Contender B (Week 8 GraphRAG):</strong> <code>"The forearm is the site where glucose uptake was measured..."</code></p></li></ul><div><hr></div><h2>&#128373;&#65039; Post-Mortem: Why Weights Forget What Maps Remember</h2><p>At first glance, this &#8220;failure&#8221; of fine-tuning is confusing. The model <em>was</em> trained on that exact data in Week 7. As an AI Architect, you must understand the <strong>plumbing</strong> of memory.</p><h3>1. Technical Deep Dive: The Parametric Fog vs. Explicit Edges</h3><p>When you train a model like Llama-3 (CPT), you are using <strong>Gradient Descent</strong> on massive amounts of text. 
Backpropagation smooths out information, prioritizing frequent patterns.</p><p>The relationship between &#8220;Forearm&#8221; and &#8220;Glucose Uptake&#8221; is extremely rare (low-frequency) across 3,500 abstracts. It exists as a fragile statistical ghost in the 8 billion parameters. When queried, that detail gets &#8220;drowned out&#8221; by more common patterns. <strong>Weights forget the rare fact.</strong></p><p>In contrast, our <strong>Semantic Substrate</strong> (Memgraph) materialized that rare fact as an explicit edge: <code>(Forearm)-[MENTIONS]-(Abstract_123)-[MENTIONS]-(Glucose_Uptake)</code>. When the query hits the graph, we walk that path with 100% fidelity. <strong>Maps remember the exact path.</strong></p><h3>2. Knowledge Graph Advantage: Explainability</h3><p>This is the largest business win for GraphRAG.</p><ul><li><p><strong>CPT (The Black Box):</strong> When it answers, it gives you a fluent prediction based on a memory it cannot cite. If a clinician asks, <em>&#8220;Which abstract justified this claim?&#8221;</em> the CPT model is a dead end. <strong>In an enterprise setting, this is a liability.</strong></p></li><li><p><strong>GraphRAG (The Glass Box):</strong> The response is a synthesis of extracted documents. It explicitly provides <strong>Source IDs</strong> (e.g., <code>[Abstract ID: abs_001]</code>). The model has provided an auditable decision path. <strong>This is explainable AI.</strong></p></li></ul><div><hr></div><h2>&#127959;&#65039; The AI Architect&#8217;s Comparison Matrix</h2><p>We have visualized the core metrics of our two contenders below. 
Since Substack doesn&#8217;t handle tables, we have generated an infographic performance matrix.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Q-Ci!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fda09a7-d508-4684-ab5a-f706c6aec906_2816x1504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Q-Ci!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fda09a7-d508-4684-ab5a-f706c6aec906_2816x1504.png 424w, https://substackcdn.com/image/fetch/$s_!Q-Ci!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fda09a7-d508-4684-ab5a-f706c6aec906_2816x1504.png 848w, https://substackcdn.com/image/fetch/$s_!Q-Ci!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fda09a7-d508-4684-ab5a-f706c6aec906_2816x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!Q-Ci!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fda09a7-d508-4684-ab5a-f706c6aec906_2816x1504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Q-Ci!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fda09a7-d508-4684-ab5a-f706c6aec906_2816x1504.png" width="1456" height="778" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7fda09a7-d508-4684-ab5a-f706c6aec906_2816x1504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:778,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7980383,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/189654313?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fda09a7-d508-4684-ab5a-f706c6aec906_2816x1504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Q-Ci!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fda09a7-d508-4684-ab5a-f706c6aec906_2816x1504.png 424w, https://substackcdn.com/image/fetch/$s_!Q-Ci!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fda09a7-d508-4684-ab5a-f706c6aec906_2816x1504.png 848w, https://substackcdn.com/image/fetch/$s_!Q-Ci!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fda09a7-d508-4684-ab5a-f706c6aec906_2816x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!Q-Ci!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7fda09a7-d508-4684-ab5a-f706c6aec906_2816x1504.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><strong>3. Business and Economic Impact</strong></h3><ul><li><p><strong>CPT (The Week 7 Model):</strong> The economic story is high upfront compute to train (even with LoRA). Updates require retuning, another costly compute job.</p></li><li><p><strong>GraphRAG (The Week 8 Substrate):</strong> The economic story is low update cost. Adding new data is computationally cheap (adding nodes to Memgraph). The primary cost is in the <strong>initial ingestion pipeline</strong> and <strong>Entity Resolution</strong>.</p></li></ul><h2>When to Use Continuous Pre-training (CPT)</h2><p>Despite the clear win in fact fidelity, CPT is not dead. 
As an AI Architect, you must understand its unique value:</p><ol><li><p><strong>Specialized Vocabulary:</strong> When your data uses dense <strong>acronyms, proprietary codes</strong>, or unique clinical jargon, CPT &#8220;seeds&#8221; the base model with this dialect.</p></li><li><p><strong>Domain Tone &amp; Style:</strong> CPT can adapt Llama-3 to a specific format. A good example is transforming complex medical research into concise patient-facing summaries.</p></li><li><p><strong>Low-Resource Languages:</strong> CPT is the standard for introducing entirely new languages to a base LLM.</p></li></ol><h2><strong>Final Verdict: Architecture over Weights</strong></h2><p>The "Missing Middle" isn't solved by making models bigger or training them longer. It's solved by <strong>Context Engineering</strong>, and <strong>Knowledge Graphs</strong> are the key. By building a Graph Substrate, we moved from an AI that "remembers" to an AI that "researches." CPT gives you <strong>fluency</strong>, but GraphRAG gives you <strong>fidelity</strong>. </p><p>We have completed the transformation from Data Engineer to AI Architect. We have moved from raw data ingestion to fine-tuning Llama-3, and finally to building a queryable contextual nervous system. The substrate is built. 
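<p>As a last taste of what querying that substrate looks like, here is a minimal in-memory sketch of the bake-off&#8217;s winning move: plain Python standing in for Memgraph, with node names and MENTIONS edges following the <code>(Forearm)-[MENTIONS]-(Abstract_123)-[MENTIONS]-(Glucose_Uptake)</code> notation from the post-mortem (the data is illustrative):</p>

```python
# A toy graph substrate: an adjacency map of undirected MENTIONS edges.
from collections import defaultdict

edges = [
    ("Forearm", "Abstract_123"),
    ("Abstract_123", "Glucose_Uptake"),
    ("Abstract_123", "Hypertension"),
    ("Heart_Rate", "Abstract_456"),
]

adj = defaultdict(set)
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

def two_hop_bridges(src: str, dst: str) -> set:
    """Return the intermediate nodes (abstracts) linking src and dst in two hops."""
    return adj[src] & adj[dst]

# The multi-hop question: which abstracts connect Forearm to Glucose_Uptake?
print(two_hop_bridges("Forearm", "Glucose_Uptake"))  # {'Abstract_123'}
```

The returned abstract IDs are exactly the Source IDs that make the GraphRAG answer auditable; in Memgraph proper the same walk is a single Cypher <code>MATCH</code> over two MENTIONS relationships.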
Now, it's time to let the agents run.</p><p>This concludes this 8-week series, but stay tuned for my next series, where I combine two of my favorite topics, streaming and context engineering, to build a real-time context engine.</p><p>Get the full code here in the repo: <a href="https://github.com/snudurupati/agentic-ai-architect">https://github.com/snudurupati/agentic-ai-architect</a></p>]]></content:encoded></item><item><title><![CDATA[Curing AI Hallucinations]]></title><description><![CDATA[Addressing AI&#8217;s Missing Middle via Continued Pre-training]]></description><link>https://www.nudurupati.co/p/curing-ai-hallucinations</link><guid isPermaLink="false">https://www.nudurupati.co/p/curing-ai-hallucinations</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Fri, 27 Feb 2026 14:54:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xfRC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa94d00-fb06-4bba-9c87-4ef901a596fd_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!xfRC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa94d00-fb06-4bba-9c87-4ef901a596fd_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xfRC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa94d00-fb06-4bba-9c87-4ef901a596fd_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!xfRC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa94d00-fb06-4bba-9c87-4ef901a596fd_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!xfRC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa94d00-fb06-4bba-9c87-4ef901a596fd_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!xfRC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa94d00-fb06-4bba-9c87-4ef901a596fd_2816x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xfRC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa94d00-fb06-4bba-9c87-4ef901a596fd_2816x1536.png" width="1456" height="794" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bfa94d00-fb06-4bba-9c87-4ef901a596fd_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8913872,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/189337289?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa94d00-fb06-4bba-9c87-4ef901a596fd_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xfRC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa94d00-fb06-4bba-9c87-4ef901a596fd_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!xfRC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa94d00-fb06-4bba-9c87-4ef901a596fd_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!xfRC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa94d00-fb06-4bba-9c87-4ef901a596fd_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!xfRC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbfa94d00-fb06-4bba-9c87-4ef901a596fd_2816x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In the enterprise, the biggest barrier to deploying reliable AI agents isn&#8217;t a lack of compute; it&#8217;s the <strong>&#8220;Missing Middle.&#8221;</strong></p><p>General-purpose LLMs are brilliant at reasoning, but they often lack the specific, high-fidelity context of a company&#8217;s internal world. When an agent is asked a question that requires deep domain knowledge it hasn&#8217;t &#8220;seen,&#8221; it fills the gap with plausible-sounding nonsense. 
This is the root of hallucinations.</p><p>To solve this, enterprises generally have two architectural options:</p><ol><li><p><strong>Continued Pre-training (CPT):</strong> Modifying the model&#8217;s <strong>Parametric Layer</strong> (its &#8220;brain&#8221;) so that specialized knowledge is baked directly into the weights.</p></li><li><p><strong>Knowledge Graphs (GraphRAG):</strong> Creating a <strong>Semantic Layer</strong> (a &#8220;library&#8221;) where relationships and facts are stored externally and retrieved as needed.</p></li></ol><p>This week, as part of a multi-week &#8220;Bake-Off,&#8221; I am putting <strong>Continued Pre-training</strong> to the test. I&#8217;ve built a specialized <strong>&#8220;Clinical Contender&#8221;</strong> to see if altering the model&#8217;s internal weights can bridge this context gap. Next week, we will implement a Knowledge Graph solution, concluding the series with a final side-by-side comparison.</p><div><hr></div><h2>Phase 1: What is Continued Pre-training?</h2><p>Continued Pre-training is the process of taking a &#8220;commodity&#8221; model (like Llama-3) and extending its self-supervised learning phase on a specialized, private corpus. 
Unlike fine-tuning, which teaches a model <em>how</em> to act, CPT teaches a model a <strong>new language</strong>.</p><h3>Why Enterprises Consider this Path:</h3><ul><li><p><strong>Vocabulary Priming:</strong> Teaching the model technical acronyms and jargon that don&#8217;t exist in general web crawls.</p></li><li><p><strong>Conceptual Density:</strong> Increasing the model&#8217;s internal &#8220;intuition&#8221; for domain-specific relationships.</p></li><li><p><strong>Data Sovereignty:</strong> Building a model that is natively &#8220;aware&#8221; of your proprietary data distribution.</p></li></ul><div><hr></div><h2>The Experiment: 500 Iterations on an M4 Mac</h2><p>To &#8220;walk the walk,&#8221; I performed a <strong>QLoRA-based</strong> run on my <strong>MacBook Air M4 (24GB RAM)</strong> using the <strong>MLX</strong> framework.</p><h3>The Methodology:</h3><p>To maintain architectural stability and prevent <strong>Catastrophic Forgetting</strong>, I engineered a &#8220;Substrate Mix&#8221; for the training data:</p><ul><li><p><strong>85% Domain-Specific Technical Data:</strong> High-fidelity abstracts and technical documentation.</p></li><li><p><strong>15% General Replay Data:</strong> A slice of WikiText-2 to act as a linguistic anchor.</p></li></ul><h3>The Performance Metrics:</h3><ul><li><p><strong>Training Iterations:</strong> 500</p></li><li><p><strong>Peak Memory Usage:</strong> <strong>9.456 GB</strong> (Successfully leveraging the M4&#8217;s Unified Memory).</p></li><li><p><strong>Throughput:</strong> ~68 Tokens/sec (Optimized for fanless thermal profiles).</p></li></ul><div><hr></div><h2>&#9989; Findings: The Impact on Parametric Memory</h2><p>The experiment yielded a specialized <code>adapters.safetensors</code> file. 
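<p>Before the numbers, the 85/15 &#8220;Substrate Mix&#8221; from the methodology is worth making concrete. A minimal sketch (file handling elided; only the 85/15 split and the replay-anchor idea are taken from the experiment):</p>

```python
import random

def substrate_mix(domain_docs, general_docs, replay_ratio=0.15, seed=42):
    """Blend domain data with a general 'replay' slice that anchors the
    model linguistically and curbs Catastrophic Forgetting."""
    rng = random.Random(seed)
    # Size the replay slice so it makes up replay_ratio of the final mix.
    n_general = round(len(domain_docs) * replay_ratio / (1 - replay_ratio))
    mix = list(domain_docs) + rng.sample(general_docs, min(n_general, len(general_docs)))
    rng.shuffle(mix)
    return mix

# 850 clinical abstracts + a WikiText-2 slice -> a 1,000-doc training corpus.
corpus = substrate_mix(["abstract"] * 850, ["wikitext"] * 1000)
print(len(corpus), corpus.count("wikitext"))  # 1000 150
```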
After 500 iterations, the <strong>Train Loss</strong> dropped by approximately <strong>19%</strong>, resulting in several notable shifts:</p><ol><li><p><strong>Increased Technical Precision:</strong> The model moved from generalities to technical standards. It refined definitions of complex acronyms and processes from vague &#8220;web-search&#8221; results to precise, industry-standard definitions.</p></li><li><p><strong>Structural Awareness:</strong> It demonstrated a more detailed understanding of internal mechanics and structural relationships within the specialized dataset.</p></li><li><p><strong>Linguistic Trade-offs:</strong> The model became highly technical, but the process highlighted a drift in the &#8220;Assistant&#8221; persona, resulting in a loss of conversational formatting (repetitive <code>&lt;|eot_id|&gt;</code> tokens) as the weights favored the raw technical substrate over instruction-following.</p></li></ol><div><hr></div><h2>The Business Impact Matrix: CPT vs. The Enterprise Reality</h2><p>For VPs and CXOs, the choice to pursue Continued Pre-training isn&#8217;t just a technical one; it&#8217;s a question of <strong>Total Cost of Ownership</strong> and <strong>Intellectual Property Portability</strong>.</p><ul><li><p><strong>&#127959;&#65039; Infrastructure: High-Density GPU Clusters (A100/H100) or High-Memory Apple Silicon</strong></p><ul><li><p><strong>Business Impact:</strong> High CapEx/OpEx. Requires significant upfront investment or aggressive cloud-compute burn.</p></li></ul></li><li><p><strong>&#9201;&#65039; Time-to-Value: 4&#8211;12 Weeks</strong></p><ul><li><p><strong>Business Impact:</strong> Agility Risk. The process involves extensive data prep, training, and safety testing. 
If the base model (e.g., Llama 3) is updated, you may be forced to re-train from scratch.</p></li></ul></li><li><p><strong>&#128101; Staffing: Requires ML Scientists &amp; Specialized Engineers</strong></p><ul><li><p><strong>Business Impact:</strong> Hiring Gap. High &#8220;specialist&#8221; dependency. These roles carry $250k+ price tags and represent a significant &#8220;flight risk&#8221; in a competitive market.</p></li></ul></li><li><p><strong>&#128269; Auditability: &#8220;Black Box&#8221; Weights</strong></p><ul><li><p><strong>Business Impact:</strong> Compliance Risk. It is difficult to trace the origin of a specific output back to a specific data point, creating barriers for regulated industries like Health or Finance.</p></li></ul></li></ul><h3>1. The Human Capital Factor: Upskilling vs. Specialist Hiring</h3><p>A CPT strategy requires a &#8220;Pit Crew&#8221; of PhDs and ML Scientists to manage weight decay and prevent the model from &#8220;losing its mind&#8221; (Catastrophic Forgetting). Conversely, leveraging existing <strong>Data Engineering</strong> talent to build Knowledge Graphs allows for a faster upskilling path. Data Engineers are already experts in schemas; transitioning them to &#8220;AI Architects&#8221; is often more sustainable than competing for scarce ML Research talent.</p><h3>2. Operational Stability: The &#8220;Llama-4&#8221; Problem</h3><p>The most significant business risk for CPT is <strong>Sunk Cost</strong>. If you spend $200k in compute to bake your proprietary data into Llama-3, and a more efficient Llama-4 is released next month, your investment is <strong>non-transferable</strong>. You are back at the starting line. 
A Knowledge Graph approach ensures <strong>State Sovereignty</strong>; your data lives in your database, not in the model weights, making it LLM-agnostic.</p><div><hr></div><h2>Architectural State of Play</h2><p>We now have a fully realized <strong>Clinical Contender</strong>, a model whose weights have been specialized for a technical domain. This represents our first successful approach to solving the &#8220;Missing Middle.&#8221;</p><p>Get the full code here in the repo: <a href="https://github.com/snudurupati/agentic-ai-architect/tree/main/07_continued_pre_training">https://github.com/snudurupati/agentic-ai-architect/tree/main/07_continued_pre_training</a></p><p><strong>Next Week: The Showdown.</strong> We implement the second option, a <strong>Knowledge Graph (GraphRAG)</strong> architecture. Once both are built, we will perform a final &#8220;Bake-Off&#8221; to determine which approach provides the best balance of accuracy, cost, and operational flexibility.</p>]]></content:encoded></item><item><title><![CDATA[The Corporate Brain: Eliminating Contextual Drift through Knowledge Graphs]]></title><description><![CDATA[Stop building chatbots and start architecting a nervous system: How Knowledge Graphs bridge the "Semantic Gap" to eliminate contextual drift in the modern enterprise.]]></description><link>https://www.nudurupati.co/p/the-corporate-brain-eliminating-contextual</link><guid isPermaLink="false">https://www.nudurupati.co/p/the-corporate-brain-eliminating-contextual</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Mon, 16 Feb 2026 16:15:32 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!yWtO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbb55fc9-a79f-47ca-8f52-a6801c396371_2816x1536.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!yWtO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbb55fc9-a79f-47ca-8f52-a6801c396371_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!yWtO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbb55fc9-a79f-47ca-8f52-a6801c396371_2816x1536.png 424w, 
https://substackcdn.com/image/fetch/$s_!yWtO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbb55fc9-a79f-47ca-8f52-a6801c396371_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!yWtO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbb55fc9-a79f-47ca-8f52-a6801c396371_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!yWtO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbb55fc9-a79f-47ca-8f52-a6801c396371_2816x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!yWtO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbb55fc9-a79f-47ca-8f52-a6801c396371_2816x1536.png" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/bbb55fc9-a79f-47ca-8f52-a6801c396371_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8680979,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/188109733?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbb55fc9-a79f-47ca-8f52-a6801c396371_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!yWtO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbb55fc9-a79f-47ca-8f52-a6801c396371_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!yWtO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbb55fc9-a79f-47ca-8f52-a6801c396371_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!yWtO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbb55fc9-a79f-47ca-8f52-a6801c396371_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!yWtO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fbbb55fc9-a79f-47ca-8f52-a6801c396371_2816x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The industry is currently obsessed with &#8220;Chat with your PDF&#8221; bots. But for an AI Architect, these are toys. In a high-precision enterprise environment, employee knowledge is a living, breathing network of dependencies.</p><p>Most internal AI initiatives eventually hit a plateau characterized by <strong>Contextual Blindness</strong>, the inability of a model to see the logical connective tissue between disparate data silos. To solve the <strong>Digital Employee Experience (DEX)</strong> crisis, we don&#8217;t just need bigger models; we need a more sophisticated &#8220;memory.&#8221; We need the <strong>Knowledge Graph</strong>.</p><div><hr></div><h2>The DEX Crisis: The &#8220;Missing Middle&#8221; and Contextual Drift</h2><p><strong>What is DEX?</strong> DEX is the sum of every digital interaction, from the first line of code in GitHub to the final status update in Jira. To truly optimize this experience, we must bridge the <strong>'Semantic Gap'</strong> between Slack, Jira, and GitHub once and for all. In high-stakes technical environments, DEX isn&#8217;t a luxury; it&#8217;s a mission-critical moat.</p><p><strong>Why Current AI Fails DEX:</strong> Most organizations are currently distracted by &#8220;Linear RAG.&#8221; They take a Vector Database and treat knowledge as isolated points in space, rather than a sequence of events.</p><p>This creates the <strong>&#8220;Missing Middle&#8221;</strong>, the layer between your raw data silos and the LLM&#8217;s reasoning. This is where <strong>Contextual Drift</strong> sets in. 
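</p><p>To make the fragility concrete, here is a toy sketch. The chunks, the word-overlap stand-in for embedding similarity, and the <code>ADDRESSES</code> edge are all hypothetical: the point is that a stale but wordy page can outrank the logically linked record, because similarity alone never consults the connective tissue between items.</p>

```python
# Toy contrast between "semantic similarity" and logical linkage.
# Chunks, scores, and the edge set are hypothetical stand-ins.

chunks = {
    "slack_msg": "GPU error crashing the training job on Project Orion",
    "old_wiki":  "Troubleshooting GPU errors in training jobs (2023 guide)",
    "jira_fix":  "ORION-42: patch GPU driver crash, assigned to snudurupati",
}

def similarity(a: str, b: str) -> float:
    # Stand-in for cosine similarity: Jaccard overlap of words.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

query = "GPU error on the training job"
ranked = sorted(chunks, key=lambda k: -similarity(query, chunks[k]))
# On pure similarity the stale wiki page outranks the Jira fix,
# even though only the ticket is logically tied to the Slack thread.

EDGES = {("slack_msg", "jira_fix")}  # the missing connective tissue

def logically_linked(a: str, b: str) -> bool:
    return (a, b) in EDGES or (b, a) in EDGES
```

<p>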
As data evolves across Slack, Jira, and GitHub, a vector-only system loses the &#8220;thread&#8221; of truth:</p><ul><li><p><strong>Relational Fragility:</strong> Vector search finds fragments that <em>look</em> similar but cannot verify if they are <em>logically</em> related.</p></li><li><p><strong>Contextual Blindness:</strong> The model might see a &#8220;GPU error&#8221; in a document but remains blind to the fact that a specific developer is currently fixing that exact error in a private branch.</p></li><li><p><strong>The Provenance Gap:</strong> Without a graph, an LLM cannot provide an audit trail of how it connected a Slack complaint to a technical resolution.</p></li></ul><div><hr></div><h2>The Solution: GraphRAG &amp; The Semantic Bridge</h2><p>To solve <strong>Contextual Blindness</strong>, we must decouple the <strong>Logic</strong> (the LLM) from the <strong>State</strong> (the Knowledge Graph). By using <strong>ArangoDB</strong> as our reasoning substrate, we solve the two most difficult technical hurdles in DEX:</p><h3>1. The Multi-Hop Problem: Bridging Slack to Jira</h3><p>Most internal questions are &#8220;multi-hop&#8221; by nature.</p><ul><li><p><strong>The Query:</strong> <em>&#8220;What was the resolution of the sensor anomaly discussed in Slack last Tuesday by the lead on Project Orion?&#8221;</em></p></li><li><p><strong>The Problem:</strong> Linear RAG might find the Slack thread or the Jira ticket, but it cannot logically traverse the link between them.</p></li><li><p><strong>The Semantic Bridge:</strong> In a Graph database like ArangoDB, we define the &#8220;verbs&#8221; that connect these entities, allowing the agent to &#8220;walk&#8221; from a conversation to a technical fix deterministically.</p></li></ul><h3>2. 
Entity Resolution: The Identity Handshake</h3><p>Corporate data is a &#8220;siloed aliases&#8221; nightmare.</p><ul><li><p><strong>Slack:</strong> <code>@Sreeram</code> | <strong>Jira:</strong> <code>snudurupati</code> | <strong>GitHub:</strong> <code>sreeram-dev</code></p><p>If your system doesn&#8217;t know these are the same physical person, your &#8220;Corporate Brain&#8221; has multiple personalities. We solve this through <strong>Entity Resolution</strong>: mapping disparate aliases to a unique Global Entity ID before the data ever hits the graph.</p></li></ul><div><hr></div><h2>Architecting the Corporate Brain: A 10-Step Blueprint</h2><p>Here is the 10-step path from fragmented data to an autonomous reasoning engine using <strong>ArangoDB</strong> and <strong>LangGraph</strong>.</p><h3>1. The Secure Substrate</h3><p>Establishing a secure, authenticated connection to our <strong>ArangoDB</strong> instance. Governance is the first priority; the &#8220;Brain&#8221; must only access authorized nodes.</p><h3>2. Topological Definition</h3><p>Defining the <strong>Topology of Work</strong>. We create specific collections for <code>Employees</code>, <code>JiraTickets</code>, and <code>Commits</code>, turning &#8220;static rows&#8221; into &#8220;active entities.&#8221;</p><h3>3. Graph Logic Initialization</h3><p>Initializing the schema and defining the <strong>Semantic Bridges</strong>, the &#8220;Edge Collections&#8221; (like <code>HAS_IDENTITY</code> or <code>ADDRESSES</code>) that serve as neural pathways.</p><h3>4. Seeding the &#8220;Project Orion&#8221; Loop</h3><p>Creating a cross-domain trace: a Slack message references a project, which links to a Jira ticket, which links to a GitHub commit.</p><h3>5. Entity Resolution (ER)</h3><p>Implementing the &#8220;Identity Handshake&#8221; to normalize siloed aliases into a unified graph node. This is the foundation of cross-domain intelligence.</p><h3>6. 
Multi-Hop Pathfinder (AQL)</h3><p>Executing <strong>Deterministic Reasoning</strong> using ArangoDB Query Language (AQL). We ask the graph to &#8220;walk&#8221; the path from a handle to a project, then to a task, and finally to the code.</p><h3>7. Orchestration via LangGraph</h3><p>Defining a stateful agent that records every step in a <strong>&#8220;Glass Box&#8221;</strong> trace, making the AI&#8217;s reasoning fully auditable and transparent.</p><h3>8. Contextual Resolution</h3><p>Upgrading the agent to handle <strong>Discovery Queries</strong>. If a user asks about &#8220;GPU issues,&#8221; the agent resolves the Project Context and navigates the graph to find the expert assigned to the relevant task.</p><h3>9. Wiring the Agentic Workflow</h3><p>Compiling the nodes to ensure the agent <em>must</em> resolve an identity and traverse the semantic bridges before drawing conclusions.</p><h3>10. The Gap Detector (The Proactive Architect)</h3><p>The final polish: a &#8220;Critic&#8221; node that proactively surfaces <strong>Stalled Work</strong>, flagging Jira tickets that have no linked GitHub commits.</p><div><hr></div><h2>The Business Case for Graph-Powered AI</h2><p>By leveraging a <strong>Knowledge Graph</strong> substrate like <strong>ArangoDB</strong>, we provide the enterprise with three massive business value propositions:</p><ul><li><p><strong>Explainable AI (XAI):</strong> We move away from &#8220;black box&#8221; hallucinations. 
Every answer includes a provenance trace, showing the exact AQL traversal that connected the intent to the action.</p></li><li><p><strong>Operational Auditability:</strong> The ability to &#8220;detect gaps&#8221; between management tools and technical reality provides leadership with a real-time risk-assessment engine.</p></li><li><p><strong>Organizational Velocity:</strong> By bridging silos through <strong>Entity Resolution</strong>, we eliminate the &#8220;Context Tax&#8221; that costs engineers hours of productivity every week.</p></li></ul><p>For organizations where data is a strategic asset and precision is a requirement, the <strong>Knowledge Graph</strong> isn&#8217;t just an option; it&#8217;s the engine of the modern Digital Employee Experience.</p><p>Get the full code here in the repo: <a href="https://github.com/snudurupati/agentic-ai-architect/tree/main/06_multi_hop_reasoning_GraphRAG">snudurupati/agentic-ai-architect</a></p><p><strong>Next Week:</strong> We move from architecture to a high-stakes competitive analysis. We are setting up a definitive "Bake-Off" between two of the most discussed strategies in the enterprise AI space: <strong>Continued Pre-Training vs. Knowledge Graphs (GraphRAG)</strong>. </p><p>While some argue that baking internal knowledge directly into model weights is the path to "true" intelligence, we will examine the cold realities of cost, latency, and the inevitable "Drift" that occurs when static weights meet a dynamic codebase. 
We&#8217;ll weigh the trade-offs of both paradigms to see where they might coexist, and where one holds the edge in a mission-critical production environment while the other proves an expensive dead end.</p>]]></content:encoded></item><item><title><![CDATA[Building Self-Correcting Agents]]></title><description><![CDATA[CRAG: Moving from Linear Pipelines to Corrective Reasoning Loops]]></description><link>https://www.nudurupati.co/p/building-self-correcting-agents</link><guid isPermaLink="false">https://www.nudurupati.co/p/building-self-correcting-agents</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Mon, 02 Feb 2026 04:24:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!21Vt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32036f84-5a1c-4faa-a215-91111d300157_2816x1504.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!21Vt!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32036f84-5a1c-4faa-a215-91111d300157_2816x1504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!21Vt!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32036f84-5a1c-4faa-a215-91111d300157_2816x1504.png 424w, https://substackcdn.com/image/fetch/$s_!21Vt!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32036f84-5a1c-4faa-a215-91111d300157_2816x1504.png 848w, https://substackcdn.com/image/fetch/$s_!21Vt!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32036f84-5a1c-4faa-a215-91111d300157_2816x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!21Vt!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32036f84-5a1c-4faa-a215-91111d300157_2816x1504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!21Vt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32036f84-5a1c-4faa-a215-91111d300157_2816x1504.png" width="1456" height="778" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/32036f84-5a1c-4faa-a215-91111d300157_2816x1504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:778,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8124506,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/186572030?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32036f84-5a1c-4faa-a215-91111d300157_2816x1504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!21Vt!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32036f84-5a1c-4faa-a215-91111d300157_2816x1504.png 424w, https://substackcdn.com/image/fetch/$s_!21Vt!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32036f84-5a1c-4faa-a215-91111d300157_2816x1504.png 848w, https://substackcdn.com/image/fetch/$s_!21Vt!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32036f84-5a1c-4faa-a215-91111d300157_2816x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!21Vt!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F32036f84-5a1c-4faa-a215-91111d300157_2816x1504.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In Weeks 1-4 of the &#8220;<a href="https://github.com/snudurupati/agentic-ai-architect">Agentic AI architect</a>&#8221; course, we built pipelines. We moved data from A to B, vectorized it, and shoved it into a prompt. It worked, but it was a <strong>Black Box</strong>. If the retrieval failed, the system failed silently, often hallucinating an answer to cover its tracks.</p><p>As a Data Engineer, my instinct was to &#8220;fix the pipeline.&#8221; But as an <strong>AI Architect</strong>, I realized I needed to fix the <em>process</em>.</p><p>For <strong>Week 5</strong>, we are moving from &#8220;data in a graph&#8221; to &#8220;graph as the engine of thought.&#8221; The big shift here is moving from deterministic pipelines to reasoning loops that can self-correct. 
We implement <strong>Corrective RAG (CRAG)</strong> - an agentic workflow that doesn&#8217;t just retrieve data; it judges it, rejects it, and tries to fix its own mistakes before answering.</p><h3>Phase 1: The Mental Model (The &#8220;Loop&#8221;)</h3><p>As a Data Engineer, you&#8217;re used to <strong>Linear Logic</strong>: <em>Input -&gt; Query -&gt; Transformation -&gt; Output</em>. In AI, that linear path breaks because LLMs are &#8220;probabilistic&#8221;; they might get it right, or they might hallucinate. <strong>LangGraph</strong> is how we build a &#8220;Correction Loop&#8221; around that uncertainty.</p><ul><li><p><strong>The Linear Way (Standard RAG):</strong> You ask a question. The system grabs the first folder it sees, reads one page, and gives you an answer. If that page was wrong, the answer is wrong.</p></li><li><p><strong>The Self-Correcting Agent Way (LangGraph):</strong> You ask a question. The agent grabs a folder and stops to think: <em>&#8220;Wait, does this actually answer the question?&#8221;</em> If the answer is &#8220;No,&#8221; it goes back to the filing cabinet, looks for a different folder, and tries again.</p></li></ul><h3>Phase 2: What is &#8220;State&#8221;? (The Shared Notepad)</h3><p>In a SQL procedure, variables exist within the scope of the run. In LangGraph, we use a <strong>State Machine</strong>. Think of the <strong>State</strong> as a &#8220;Shared Notepad&#8221; that every part of your program can see and write to. As the agent moves through the process, it keeps track of vital information on this notepad:</p><ul><li><p>What was the question?</p></li><li><p>What have I found so far?</p></li><li><p>How many times have I tried to fix this? (The <code>loop_count</code>).</p></li></ul><h3>The Problem: The &#8220;Silent Failure&#8221;</h3><p>We start with a simple query: <em>&#8220;What is the security protocol for Project Alpha?&#8221;</em> In a standard RAG setup, the retriever grabs the top 3 documents. 
The first one is about generic ISO27001 standards: high vector similarity, but irrelevant content. A standard chain would blindly feed this to the LLM, resulting in a generic (and wrong) answer. We need a system that can say, <em>&#8220;Wait, this isn&#8217;t right.&#8221;</em></p><p>This is the &#8220;Moltbot&#8221; trap: letting agents loose without any grounding. If the agents popping up on moltbook.com right now just stopped and said, &#8220;Wait, this isn&#8217;t right,&#8221; the internet would be a much saner place. Instead, like a standard RAG chain, they hallucinate with absolute certainty. To move from a Data Engineer to an AI Architect, we need a system that has the <em>&#8220;ego&#8221;</em> to pause, evaluate, and pivot when the data doesn&#8217;t match the intent.</p><h3>The Solution: The &#8220;CRAG&#8221; Architecture</h3><p>We use LangGraph to build a state machine with four distinct &#8220;workers&#8221; (nodes):</p><ul><li><p><strong>The Retriever:</strong> The muscle. It pulls data from our ArangoDB vector index.</p></li><li><p><strong>The Grader:</strong> The brain. A specialized prompt that scans retrieved documents for specific evidence (e.g., &#8220;Project Alpha&#8221;). If it sees junk, it grades the retrieval as &#8220;NO.&#8221;</p></li><li><p><strong>The Pivot (Transformer):</strong> The strategist. If the Grader says &#8220;NO,&#8221; this node rewrites the user&#8217;s query to be more specific (e.g., adding &#8220;technical protocols&#8221; or &#8220;encryption&#8221;) and loops back to the Retriever.</p></li><li><p><strong>The Generator:</strong> The finisher. It only runs when the Grader gives a &#8220;YES.&#8221;</p></li></ul><h3>The &#8220;Aha!&#8221; Moment: The Scan &amp; Filter Pattern</h3><p>The hardest part wasn&#8217;t the code; it was the logic. 
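</p><p>The four workers above reduce to a small control loop. Here is a minimal, dependency-free sketch of that loop; the two-document corpus, the keyword grader, and the query rewriter are hypothetical stand-ins for the real LangGraph nodes running over the ArangoDB vector index:</p>

```python
# Toy CRAG loop: retrieve -> grade (scan & filter) -> generate or pivot.
# Corpus, grader, and rewriter are illustrative stand-ins, not the repo code.

CORPUS = [
    "Generic ISO27001 standards overview.",               # distractor: similar, irrelevant
    "Project Alpha security protocol: AES-256 and MFA.",  # the golden record
]

def retrieve(query: str, k: int = 3) -> list[str]:
    # Stand-in for vector search: rank by naive word overlap.
    terms = set(query.lower().split())
    return sorted(CORPUS, key=lambda d: -len(terms & set(d.lower().split())))[:k]

def grade(query: str, doc: str) -> bool:
    # Stand-in for the LLM grader, which sees both query and doc;
    # here we simply demand the query's key entity as evidence.
    return "project alpha" in query.lower() and "project alpha" in doc.lower()

def pivot(query: str) -> str:
    # Stand-in for the query rewriter node.
    return query + " technical protocols encryption"

def crag(query: str, max_loops: int = 3) -> str:
    for _ in range(max_loops):
        docs = retrieve(query)
        relevant = [d for d in docs if grade(query, d)]  # scan ALL docs, drop distractors
        if relevant:
            return f"Grounded answer from: {relevant[0]}"  # the Generator only runs on YES
        query = pivot(query)  # Grader said NO: rewrite the query and loop back
    return "I don't know."  # fail loudly instead of hallucinating
```

<p>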
During testing, my agent kept failing even after patching the data with the &#8220;Golden Record.&#8221; The debug trace revealed the <strong>&#8220;Distractor Problem&#8221;</strong>:</p><ul><li><p><strong>Doc #1:</strong> ISO27001 (Irrelevant, but high similarity score).</p></li><li><p><strong>Doc #2:</strong> Project Alpha Security (The Golden Record).</p></li></ul><p>Our original Grader looked at Doc #1, said &#8220;garbage,&#8221; and gave up. We had to implement a <strong>Scan and Filter pattern</strong>, teaching the Grader to iterate through all retrieved results, discard the distractors, and lock onto the Golden Record.</p><h3>The Results</h3><p>Once we deployed the &#8220;CRAG&#8221; logic, the trace was beautiful to watch:</p><ol><li><p><strong>Iteration 1:</strong> Retrieved generic data. Grader said <strong>NO</strong>.</p></li><li><p><strong>Pivot:</strong> Agent rewrote the query.</p></li><li><p><strong>Iteration 2:</strong> Retrieved mixed data.</p></li><li><p><strong>Scan:</strong> Grader ignored Doc #1, found the Golden Record in Doc #2. 
Grader said <strong>YES</strong>.</p></li><li><p><strong>Generation:</strong> Produced a grounded, accurate response citing AES-256 and MFA.</p></li></ol><h3>Key Takeaway for Data Engineers</h3><p>Moving from Data Engineering to AI Architecture isn&#8217;t about writing more complex Python; it&#8217;s about designing <strong>Control Flow</strong>.</p><ul><li><p><strong>Data Engineering</strong> is about ensuring data gets to its destination.</p></li><li><p><strong>AI Architecture</strong> is about ensuring the system knows what to do when the data is wrong.</p></li></ul><p>You have now successfully moved from a script that crashes on empty data to an autonomous system that detects missing context, pivots its search strategy, filters results, and generates a grounded response.</p><p><strong>You are no longer just a Data Engineer; you are an AI Architect.</strong></p><p>Get the full code here in the repo: <a href="https://github.com/snudurupati/agentic-ai-architect/tree/main/05_agent_orchestration">snudurupati/agentic-ai-architect</a></p><p><strong>Next Week:</strong> We tackle the final boss, <strong>Multi-Hop Reasoning</strong>. What happens when the answer isn&#8217;t in one document, but split across three? We dive into Graph Traversal with AQL.</p>]]></content:encoded></item><item><title><![CDATA[AI is great at finding words. It&#8217;s time to help it reason. ]]></title><description><![CDATA[From Semantic Similarity to Structural Reasoning: A Data Engineer&#8217;s Guide to the Multi-Model Future of AI.]]></description><link>https://www.nudurupati.co/p/ai-is-great-at-finding-words-its</link><guid isPermaLink="false">https://www.nudurupati.co/p/ai-is-great-at-finding-words-its</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Tue, 27 Jan 2026 05:33:05 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Hxas!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0ffce7-7e7b-411a-8c41-ab3b09d17cb5_1408x736.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Week 4: Data Engineer to AI Architect Series</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hxas!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0ffce7-7e7b-411a-8c41-ab3b09d17cb5_1408x736.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hxas!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0ffce7-7e7b-411a-8c41-ab3b09d17cb5_1408x736.png 424w, 
https://substackcdn.com/image/fetch/$s_!Hxas!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0ffce7-7e7b-411a-8c41-ab3b09d17cb5_1408x736.png 848w, https://substackcdn.com/image/fetch/$s_!Hxas!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0ffce7-7e7b-411a-8c41-ab3b09d17cb5_1408x736.png 1272w, https://substackcdn.com/image/fetch/$s_!Hxas!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0ffce7-7e7b-411a-8c41-ab3b09d17cb5_1408x736.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hxas!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0ffce7-7e7b-411a-8c41-ab3b09d17cb5_1408x736.png" width="1408" height="736" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/da0ffce7-7e7b-411a-8c41-ab3b09d17cb5_1408x736.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:736,&quot;width&quot;:1408,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1914291,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/185897424?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0ffce7-7e7b-411a-8c41-ab3b09d17cb5_1408x736.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Hxas!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0ffce7-7e7b-411a-8c41-ab3b09d17cb5_1408x736.png 424w, https://substackcdn.com/image/fetch/$s_!Hxas!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0ffce7-7e7b-411a-8c41-ab3b09d17cb5_1408x736.png 848w, https://substackcdn.com/image/fetch/$s_!Hxas!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0ffce7-7e7b-411a-8c41-ab3b09d17cb5_1408x736.png 1272w, https://substackcdn.com/image/fetch/$s_!Hxas!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fda0ffce7-7e7b-411a-8c41-ab3b09d17cb5_1408x736.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>For 18 years, SQL has been my bread and butter. I&#8217;ve lived through the rise of NoSQL, the Hadoop era, and the Cloud Data Warehouse boom. But as I transition into an AI Solutions Engineer role, I&#8217;ve realized a hard truth: <strong>The tools that helped us store the world&#8217;s data are failing to help us think with it.</strong> I&#8217;ve noticed a sobering reality: despite the hype, the success rate for AI projects is surprisingly low. <strong>MIT research</strong> shows nearly <strong>40% of companies</strong> with high AI investment see <strong>no significant business gains</strong>.</p><p>The primary culprit? <strong>The Context Problem.</strong></p><h3>Beyond Semantic Similarity: The Need for Context</h3><p>We&#8217;ve been told that Vector Databases are the &#8220;brain&#8221; of GenAI. In reality, they are more like a sophisticated index of &#8220;semantic vibes.&#8221; They measure the mathematical distance between strings of text, but they don&#8217;t understand the logical relationship between facts.</p><p>Simply vectorizing PDFs and storing chunks in a vector database doesn&#8217;t solve the underlying problem. A vector database might find two separate documents that mention &#8220;Project Alpha&#8221;, but it lacks the inherent logic to know that the <em>Project Manager</em> listed in a 2024 HR directory is the same person responsible for the <em>Security Breach</em> mentioned in a 2026 server log. 
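</p><p>The missing piece is traversal. As a toy illustration, a few lines of breadth-first search capture the logical hops that similarity alone cannot make. The node names echo the bake-off network described later in this post, but this edge list and the graph name in the AQL comment are hypothetical; the real experiment walks the graph inside ArangoDB:</p>

```python
from collections import deque

# Hypothetical mini-graph; in ArangoDB the same walk is one AQL query, e.g.:
#   FOR v IN 1..3 OUTBOUND @start GRAPH "corp" RETURN v
EDGES = {
    "Sreeram":         ["Project Alpha"],
    "Project Alpha":   ["Payment Service"],
    "Payment Service": ["ISO27001"],
}

def traverse(start: str, max_depth: int = 3) -> list[str]:
    """Breadth-first walk: follows explicit edges instead of word overlap."""
    seen, found, queue = {start}, [], deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # stop expanding beyond the hop limit
        for target in EDGES.get(node, []):
            if target not in seen:
                seen.add(target)
                found.append(target)
                queue.append((target, depth + 1))
    return found

# Person -> Project -> Service -> Protocol, three hops from the start node.
print(traverse("Sreeram"))
```

<p>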
Without the capability to traverse these connections, your retrieval system fails to capture the full picture, leading to the &#8220;I don&#8217;t know&#8221; loop or, worse, confident hallucinations.</p><h3>The &#8220;Structuring Crisis&#8221;: Why Multi-Model is the Solution</h3><p>The hardest part of AI isn&#8217;t retrieval; it&#8217;s the mess that comes before it. This is what I call the <strong>Structuring Crisis</strong>. Teams spend months manually writing scripts to parse 200-page PDF case files, breaking them into chunks, and attempting to extract metadata. Even after that labor-intensive work, they face a dilemma: Should the data live in a vector store for search? A relational system for filtering? Or a graph for relationships?</p><p>This &#8220;architectural tax&#8221;, the need to move data between three specialized databases, is where most AI projects struggle. This is precisely why <strong>Multi-Model Databases</strong> are the solution to the structuring crisis.</p><p>A Multi-Model approach eliminates the need to choose:</p><ul><li><p><strong>Flexible Ingestion:</strong> You can ingest the &#8220;mess&#8221; as raw <strong>Documents</strong> (JSON) without a rigid schema, preserving every scrap of metadata from emails to log files.</p></li><li><p><strong>Contextual Linking:</strong> You can then define <strong>Edges</strong> (Relationships) between those documents as you discover them, turning your document store into a Knowledge Graph on the fly.</p></li><li><p><strong>Integrated Retrieval:</strong> When it&#8217;s time for the AI to &#8220;think,&#8221; you don&#8217;t have to hop between systems. 
You can perform a <strong>Vector Search</strong> to find the starting point and immediately execute a <strong>Graph Traversal</strong> to find the context, all in a single query.</p></li></ul><p>The symbiosis is simple: AI helps clean and connect the unstructured data, and the Multi-Model database provides a single, unified home for the resulting &#8220;Reasoning Backbone.&#8221; You aren&#8217;t just storing data anymore; you are building a system where the document, the vector, and the relationship live together.</p><h3>The Bakeoff: Vector Search vs. GraphRAG</h3><p>This week, I ran a &#8220;Glass Box&#8221; experiment in my lab. I loaded a small network into ArangoDB, representing a lead architect (me), a project (Project Alpha), and a security protocol (ISO 27001).</p><p>&#128073; <strong><a href="https://github.com/snudurupati/agentic-ai-architect/tree/main/04_advanced_rag">GitHub Repo: Data Engineer to AI Architect - Week 4</a></strong></p><p>Then I asked: <em>&#8220;What security constraints are linked to Sreeram&#8217;s work?&#8221;</em></p><ul><li><p><strong>Vector RAG Answer:</strong> <em>&#8220;I don&#8217;t know.&#8221;</em></p><ul><li><p><strong>The Failure:</strong> The vector engine found the chunk about &#8220;Sreeram&#8221; and the chunk about &#8220;Project Alpha,&#8221; but it couldn&#8217;t &#8220;see&#8221; the relationship to the &#8220;ISO27001&#8221; document three degrees away. They were semantically similar but logically disconnected.</p></li></ul></li><li><p><strong>GraphRAG Answer:</strong> <em>&#8220;Sreeram&#8217;s work is governed by ISO27001, which mandates hardware-level encryption for the payment services used in Project Alpha.&#8221;</em></p><ul><li><p><strong>The Success:</strong> The system performed a <strong>Deep Traversal</strong>. It followed the logical edges from Person &#8594; Project &#8594; Service &#8594; Protocol.</p></li></ul></li></ul><h3>Why ArangoDB? 
(A Small Disclosure)</h3><p>Now, full transparency: I recently joined the team at <strong>Arango</strong>. But my choice of tool here isn&#8217;t just a matter of professional alignment; it&#8217;s about solving the &#8220;Architectural Tax.&#8221;</p><p>In a typical AI stack, you&#8217;d have to sync data between a Document store, a Vector DB, and a Graph DB. That is a data engineering nightmare. I chose ArangoDB because it is <strong>Multi-Model</strong>.</p><p>It handles document-style lookups, deep graph-style traversals, and native vector search with <strong>one query language</strong>. In the GenAI era, the baseline requirement is a database that lets you retrieve everything at once without moving data between three different systems.</p><h3>The Future: Context Engineering</h3><p>If you&#8217;ve spent the last few years mastering prompt engineering and vector indexing, those skills are your foundation, but they aren&#8217;t the ceiling. The next evolution of AI will be built on <strong>Context Engineering</strong>.</p><p>Graphs will be the backbone of this movement. They are the way to provide AI with the structural reasoning it needs to stop hallucinating and start performing. Context engineering isn&#8217;t just about the words; it&#8217;s about the connections between them.</p><p>&#8220;<em>A relational database can store your knowledge. A graph database lets you think with it.</em>&#8221;</p><p><strong>Next Week:</strong> We&#8217;ll move into <strong>Agentic Workflows</strong>. 
I&#8217;ll show how an AI agent can &#8220;self-correct&#8221; by querying the graph when it hits a dead end.</p><div><hr></div><p><strong>References:</strong></p><ul><li><p><a href="https://sloanreview.mit.edu/projects/expanding-ais-impact-with-organizational-learning/">MIT Sloan: Expanding AI&#8217;s Impact With Organizational Learning</a></p></li><li><p><a href="https://www.youtube.com/watch?v=DHOxNkytuBI">Knowledge Graph Development for Effective AI Systems</a></p></li><li><p><a href="https://arango.ai/resources/turbocharge-genai-with-graphs/">Turbocharge GenAI with Graphs</a></p></li></ul>]]></content:encoded></item><item><title><![CDATA[Code, Media, and Leverage: How I Navigated the 2025 Tech Market]]></title><description><![CDATA[The messy reality of building leverage, surviving 2025, and transforming for the AI era.]]></description><link>https://www.nudurupati.co/p/code-media-and-leverage-how-i-navigated</link><guid isPermaLink="false">https://www.nudurupati.co/p/code-media-and-leverage-how-i-navigated</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Fri, 23 Jan 2026 14:15:21 GMT</pubDate><enclosure 
url="https://substackcdn.com/image/fetch/$s_!AJ7n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ae42d-72db-4593-87f1-c5c1145d966d_2816x1504.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AJ7n!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ae42d-72db-4593-87f1-c5c1145d966d_2816x1504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AJ7n!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ae42d-72db-4593-87f1-c5c1145d966d_2816x1504.png 424w, https://substackcdn.com/image/fetch/$s_!AJ7n!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ae42d-72db-4593-87f1-c5c1145d966d_2816x1504.png 848w, https://substackcdn.com/image/fetch/$s_!AJ7n!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ae42d-72db-4593-87f1-c5c1145d966d_2816x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!AJ7n!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ae42d-72db-4593-87f1-c5c1145d966d_2816x1504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AJ7n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ae42d-72db-4593-87f1-c5c1145d966d_2816x1504.png" width="1456" height="778" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5c4ae42d-72db-4593-87f1-c5c1145d966d_2816x1504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:778,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7720959,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/185509865?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ae42d-72db-4593-87f1-c5c1145d966d_2816x1504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AJ7n!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ae42d-72db-4593-87f1-c5c1145d966d_2816x1504.png 424w, https://substackcdn.com/image/fetch/$s_!AJ7n!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ae42d-72db-4593-87f1-c5c1145d966d_2816x1504.png 848w, https://substackcdn.com/image/fetch/$s_!AJ7n!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ae42d-72db-4593-87f1-c5c1145d966d_2816x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!AJ7n!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5c4ae42d-72db-4593-87f1-c5c1145d966d_2816x1504.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We&#8217;re a few weeks into 2026 now, and with some distance, it&#8217;s easier to see 2025 for what it really was: a correction year.</p><p>Post-COVID excess finally unwound. Hiring slowed, budgets tightened, and the bar for technical roles jumped, not just on depth, but on relevance. Being &#8220;good at coding&#8221; stopped being enough. Companies wanted people who could ship, explain <em>why</em> it mattered, and connect technical decisions to business outcomes from day one.</p><p>I don&#8217;t claim to have cracked some secret code. I made plenty of wrong turns, second-guessed myself often, and benefited from timing and luck as much as effort. 
But I did change how I approached the market, and that shift helped me land a Staff-level role going into 2026.</p><p>What follows isn&#8217;t a blueprint, just the lessons that made the biggest difference for me.</p><div><hr></div><h2>1. Advertise Yourself (Build Your Signal)</h2><p>There&#8217;s a comforting myth in engineering: <em>&#8220;Good work speaks for itself.&#8221;</em></p><p>In a noisy market, that&#8217;s simply false. Good work whispers. Noise wins.</p><p>For a long time, I assumed that if I just kept my head down and built solid systems, the right people would eventually notice. In 2025, that assumption stopped holding. If people didn&#8217;t know what I was working on, what I was curious about, or how I thought about problems, I effectively didn&#8217;t exist.</p><p>So I stopped treating my work like it was in permanent stealth mode. I started sharing learnings, partial ideas, architectural tradeoffs, and even things I hadn&#8217;t fully figured out yet. I leaned heavily on a framework popularized by Naval Ravikant: <strong>Code and Media as permissionless leverage.</strong></p><p>Not everything landed. Some posts went nowhere. Some ideas aged poorly. That was uncomfortable but necessary.</p><p><strong>The lesson:</strong> You might be capable, but value is invisible until it&#8217;s communicated. Being visible isn&#8217;t arrogance; it&#8217;s clarity. When you increase your surface area of luck, the market has something to react to.</p><div><hr></div><h2>2. Learn a New Skill (But Don&#8217;t Chase Hype)</h2><p>&#8220;Learn whatever&#8217;s hot&#8221; is lazy advice. If you chase every trend, you end up broad, shallow, and forgettable.</p><p>Instead, I tried to be deliberate about adjacency. I didn&#8217;t abandon my background in data engineering; I extended it. 
I focused on the bridge between where I already had depth and where the market was clearly heading: AI systems, RAG, GraphRAG, real-time architectures, and the unglamorous infrastructure problems underneath them.</p><p>Retold in hindsight, it might sound like a clean pivot from &#8220;data plumber&#8221; to &#8220;AI architect.&#8221; In reality, it was messier. I built things that didn&#8217;t scale. I over-engineered. I underestimated operational complexity more than once. But I was learning in the direction of leverage.</p><p><strong>The lesson:</strong> Don&#8217;t learn tools for their own sake. Learn skills that compound your existing experience in a new context. Be the bridge, not the tourist.</p><div><hr></div><h2>3. Code + Media = Leverage</h2><p>Naval Ravikant&#8217;s idea that code and media are permissionless leverage sounds abstract until you live it.</p><p>Code was still the foundation. Building real systems, especially ones that touched retrieval, orchestration, streaming, and AI infrastructure, gave me credibility. Shipping mattered far more than talking.</p><p>But code alone didn&#8217;t close the loop.</p><p>What surprised me was how often <em><strong>media</strong></em> made the difference, not podcasts or YouTube, just clear communication.</p><ul><li><p>Your resume is media.</p></li><li><p>Your interview answers are media.</p></li><li><p>Your blog posts, diagrams, and &#8220;About Me&#8221; blurbs are media.</p></li></ul><p>I noticed a pattern: my code got me in the room, but my ability to explain <em>why</em> the system was designed a certain way and what tradeoffs I accepted was what moved conversations forward.</p><p>That didn&#8217;t come naturally. I rewrote explanations repeatedly. I stumbled in interviews. I learned (slowly) how to connect technical decisions to business risk, cost, and reliability.</p><p><strong>The lesson:</strong> Code builds the value. Media translates it. 
If you neglect either, you leave leverage on the table.</p><div><hr></div><h2>4. The Recruiter Is an Ally (If You Let Them Be)</h2><p>Earlier in my career, I treated recruiters like hurdles, something to clear before talking to the &#8220;real&#8221; team. That mindset cost me opportunities.</p><p>In 2025, I changed tactics. I started treating good recruiters as partners. I asked direct questions. I asked for feedback. I listened for what <em>wasn&#8217;t</em> written in the job description.</p><p>Recruiters often know:</p><ul><li><p>What the hiring manager is worried about</p></li><li><p>Why the last candidate failed</p></li><li><p>Which skills are negotiable and which aren&#8217;t</p></li></ul><p>They don&#8217;t always volunteer that information, but they will if you treat them like collaborators instead of obstacles.</p><p><strong>The lesson:</strong> Prep <em>with</em> your recruiter, not just <em>for</em> them. An informed ally on the inside is a real advantage.</p><div><hr></div><h2>5. Using AI as a Force Multiplier (Not a Crutch)</h2><p>One thing I learned quickly: if you&#8217;re applying for AI roles and not using AI in your job search, you&#8217;re leaving efficiency on the table.</p><p>I treated LLMs like a Chief of Staff, not a resume-spamming machine.</p><ul><li><p>I used them to <em>tailor</em> resumes, not fabricate experience.</p></li><li><p>I used them to synthesize company strategy and competitive landscapes.</p></li><li><p>I used chat-based mock interviews to stress-test my explanations.</p></li></ul><p>The outputs weren&#8217;t perfect. I had to sanity-check everything. 
But the time savings and clarity were real.</p><p><strong>The lesson:</strong> AI won&#8217;t replace judgment, but it dramatically amplifies it if you already know what &#8220;good&#8221; looks like.</p><div><hr></div><h2>Looking Ahead: 2026 and the Shift to AI Production</h2><p>If 2025 was the year of the AI demo, 2026 is shaping up to be the year of AI production.</p><p>The novelty is gone. A chatbot that works 80% of the time is no longer impressive. What matters now are the hard, messy problems: Governance, Security, Accuracy, Cost, and operational reliability. We&#8217;re moving from the magical phase of AI to the industrial one. The opportunities won&#8217;t go to people who can use tools, but to those who can make systems dependable.</p><p>I&#8217;m still learning. I&#8217;m still wrong more often than I&#8217;d like. But the direction is clear.</p><p>Keep building. Keep shipping. And don&#8217;t be afraid to tell your story - clearly, honestly, and without pretending you have it all figured out. The opportunity is out there.</p>]]></content:encoded></item><item><title><![CDATA[RAG: ETL for Intelligence]]></title><description><![CDATA[Beyond the Prompt: Giving Your AI Infinite Memory]]></description><link>https://www.nudurupati.co/p/rag-etl-for-intelligence</link><guid isPermaLink="false">https://www.nudurupati.co/p/rag-etl-for-intelligence</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Mon, 12 Jan 2026 16:59:42 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c4ba9f21-38af-4857-935b-3a50664a7ff6_299x163.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>Beyond the Prompt: Giving Your AI Infinite Memory</h3><p><strong>Week 3: Scaling from Simple Chats to Enterprise Knowledge with RAG</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!X0_K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45647b1d-8a5e-4b98-8707-3ded67dde6c1_2816x1536.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!X0_K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45647b1d-8a5e-4b98-8707-3ded67dde6c1_2816x1536.png 424w, 
https://substackcdn.com/image/fetch/$s_!X0_K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45647b1d-8a5e-4b98-8707-3ded67dde6c1_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!X0_K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45647b1d-8a5e-4b98-8707-3ded67dde6c1_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!X0_K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45647b1d-8a5e-4b98-8707-3ded67dde6c1_2816x1536.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!X0_K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45647b1d-8a5e-4b98-8707-3ded67dde6c1_2816x1536.png" width="1456" height="794" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45647b1d-8a5e-4b98-8707-3ded67dde6c1_2816x1536.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:794,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8596914,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/184624300?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45647b1d-8a5e-4b98-8707-3ded67dde6c1_2816x1536.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!X0_K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45647b1d-8a5e-4b98-8707-3ded67dde6c1_2816x1536.png 424w, https://substackcdn.com/image/fetch/$s_!X0_K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45647b1d-8a5e-4b98-8707-3ded67dde6c1_2816x1536.png 848w, https://substackcdn.com/image/fetch/$s_!X0_K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45647b1d-8a5e-4b98-8707-3ded67dde6c1_2816x1536.png 1272w, https://substackcdn.com/image/fetch/$s_!X0_K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45647b1d-8a5e-4b98-8707-3ded67dde6c1_2816x1536.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In <a href="https://nudurupati.co/2026/01/05/stop-texting-your-ai/">Week 2</a>, we explored the power of structure, transforming open-ended conversations into reliable and actionable data. It was a huge step forward in building systems we can trust.</p><p>This week, we get to tackle one of the most exciting challenges in modern AI: <strong>Knowledge</strong>.</p><p>We all know the feeling of chatting with a powerful model like GPT-5. It is incredibly smart, knowledgeable about the world, and helpful. But there is one thing it doesn&#8217;t know: <em>Us.</em></p><p>It doesn&#8217;t know the specific error codes in your legacy logs, your unique network topology, or the details of that new internal project you launched yesterday.</p><p>The good news? We have the tools to bridge that gap.</p><h3>The Opportunity: Why RAG is a Game Changer</h3><p>You might be thinking, <em>&#8220;Context windows are getting huge! With 128k tokens, can&#8217;t I just paste my entire document into the chat?&#8221;</em></p><p>For a quick prototype, absolutely! It&#8217;s a great way to test an idea. 
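</p><p>A quick back-of-envelope comparison makes the trade-off visible. Every number below is a loose assumption (about 500 words per page, roughly 1.33 tokens per word), not a pricing or billing fact:</p>

```python
# Back-of-envelope sketch: why re-sending a whole document with every
# question is wasteful next to retrieving a few relevant chunks.
# All constants are stated assumptions, purely for illustration.

WORDS_PER_PAGE = 500     # assumption: a dense report page
TOKENS_PER_WORD = 1.33   # assumption: common rule of thumb for English prose

def approx_tokens(pages: float) -> int:
    """Rough token estimate for a given number of pages."""
    return round(pages * WORDS_PER_PAGE * TOKENS_PER_WORD)

stuffed_prompt = approx_tokens(100)   # paste the whole 100-page report, every question
rag_prompt = approx_tokens(3 * 0.5)   # retrieve ~3 half-page chunks instead

print(f"stuffing: ~{stuffed_prompt} tokens per question")
print(f"RAG:      ~{rag_prompt} tokens per question")
```

<p>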
But as Architects, we have an opportunity to build something more scalable, efficient, and cost-effective.</p><p>Think of <strong>RAG (Retrieval Augmented Generation)</strong> not just as a fix, but as an upgrade to your AI&#8217;s operating system.</p><ol><li><p><strong>Efficiency:</strong> Instead of asking the model to read a &#8220;100-page book&#8221; for every single question (which takes time and money), RAG lets the model instantly flip to the exact page it needs.</p></li><li><p><strong>Focus:</strong> By providing only the most relevant information, we help the model give sharper, more accurate answers, avoiding the distraction of unrelated data.</p></li><li><p><strong>Scale:</strong> RAG allows your AI to access vast libraries of information, far more than could ever fit in a single prompt.</p></li></ol><h3>The Business Win: Solving Real Problems</h3><p>If you are explaining this to stakeholders, like a VP of Market Intelligence at a large telecom company, the value proposition is robust.</p><p><strong>The Challenge:</strong> Their team spends hundreds of hours every quarter manually sifting through competitor earnings transcripts, industry analyst PDF reports, and news releases just to answer one question: <em>&#8220;What are our top 3 competitors doing about 5G pricing in APAC?&#8221;</em></p><p><strong>The Opportunity:</strong> We can build a <strong>&#8220;Market Intelligence Engine&#8221;</strong> that indexes all those PDF reports and transcripts.</p><ul><li><p><strong>For the VP of Market Intelligence:</strong> It means instant synthesis. Instead of waiting three days for an analyst to summarize the data, they can ask, <em>&#8220;Compare our churn rate against Competitor X based on their Q4 earnings call,&#8221;</em> and get a citation-backed answer in seconds.</p></li><li><p><strong>For the Data Engineer:</strong> It means you are no longer just maintaining pipelines for dashboards. 
You are unlocking the value trapped in the &#8220;dark data&#8221; (unstructured text) that the strategy team is desperate to access.</p></li></ul><h3>RAG is Just ETL (with a little Math)</h3><p>For those of us coming from a Data Engineering background, RAG can sound intimidating with terms like &#8220;Embeddings&#8221; and &#8220;Vector Spaces.&#8221; But here is the secret: <strong>RAG is just an ETL pipeline.</strong> You already have the skills to build this.</p><p>In this week&#8217;s notebook, we build a &#8220;Glass Box&#8221; system to see this pipeline in action:</p><h4>1. Extract (The Source)</h4><p>We start by loading our raw data, in our case, a PDF that the model has never seen before. We use tools like <code>PyMuPDFLoader</code> to bring that text into our environment.</p><h4>2. Transform (The Art of Chunking)</h4><p>This is where we add our engineering touch. Just as we wouldn&#8217;t load a massive CSV into a database without cleaning it, we don&#8217;t load a whole book into a vector store as one block.</p><p>We &#8220;chunk&#8221; the text into smaller, meaningful windows. We use a technique called <code>RecursiveCharacterTextSplitter</code> with <strong>overlap</strong>. This ensures that we capture complete ideas, even if a sentence sits on the boundary between two chunks. It&#8217;s about preserving the <em>context</em> of the data.</p><h4>3. Load (Embeddings &amp; Vectors)</h4><p>This is the magical part where text becomes numbers. We pass our chunks through an <strong>Embedding Model</strong>.</p><p>The model turns a sentence like <em>&#8220;Competitor X plans to aggressively discount 5G plans in Q4 to capture market share&#8221;</em> into a vector, a list of numbers that represents the <em>concept</em> of that sentence.</p><p>We load these into a <strong>Vector Database</strong> (like ChromaDB). Now, instead of searching for keywords, we can search for <em>concepts</em>. 
If a user asks, <em>&#8220;What is the pricing strategy for next quarter?&#8221;</em>, the math will point them straight to this vector, even though the word &#8220;strategy&#8221; wasn&#8217;t in the original sentence.</p><h3>The &#8220;Glass Box&#8221; Inspector</h3><p>One of the best ways to build confidence in this new technology is to peek under the hood.</p><p>In our code, we build an <strong>inspection loop</strong>. Before the AI answers a user&#8217;s question, we make it show us exactly which chunks of text it found. It&#8217;s a great way to verify that our &#8220;ETL pipeline&#8221; is working as expected and to understand <em>why</em> the model gave a certain answer. It turns the &#8220;magic&#8221; into engineering.</p><h3>The Architect&#8217;s Quiz (Homework)</h3><p>One of the best ways to solidify new knowledge is to test it against real-world scenarios. Here are three questions to ask yourself as you play with the <a href="https://github.com/snudurupati/agentic-ai-architect/tree/main/03_rag_fundamentals">code</a> this week:</p><ol><li><p><strong>The &#8220;Granularity&#8221; Trade-off:</strong> In our notebook, we used a chunk size of <code>500</code> characters. What do you think happens to the retrieval quality if you change that to <code>50</code>? What if you change it to <code>5000</code>? <em>(Hint: Think about &#8220;Context&#8221; vs. &#8220;Noise&#8221;. Data engineering is all about finding the right balance!)</em></p></li><li><p><strong>The Staleness Factor:</strong> If you update the source PDF on your local drive tomorrow to fix a typo, does your Vector Database automatically know about the change? <em>(Hint: Think about how ETL pipelines work. Do vectors update themselves, or do we need to trigger a new &#8220;Load&#8221; job?)</em></p></li><li><p><strong>The Production Challenge:</strong> We set our retriever to look for the top 3 results (<code>k=3</code>). 
If a user asks a complex question that requires information scattered across <strong>10 different pages</strong>, what will happen to the answer? How might we solve this without just setting <code>k=100</code>?</p></li></ol><h3>The Next Frontier: Reasoning Across Documents</h3><p>We wrap up this week&#8217;s exploration with an interesting experiment. We ask our new RAG system, which is configured to find the top 3 most relevant facts, to do something broad:</p><blockquote><p><strong>User:</strong> &#8220;Summarize the whole document in 1 paragraph.&#8221;</p></blockquote><p>The result is usually a summary of&#8230; just the first few pages.</p><p>This isn&#8217;t a failure; it&#8217;s a discovery! We learn that basic retrieval is perfect for finding specific facts (&#8220;What is the error code?&#8221;). But it needs help with broad summarization (&#8220;What is this book about?&#8221;).</p><p>This sets the stage perfectly for <strong>Week 4</strong>. Next week, we will explore Advanced RAG patterns, such as re-ranking and contextual chunking, to help our AI understand the &#8220;big picture&#8221; just as well as the small details.</p><p>We are building something powerful, one block at a time.</p><p>See you in the&nbsp;<a href="https://github.com/snudurupati/agentic-ai-architect/tree/main/03_rag_fundamentals">repo</a>.</p><p><em><strong>The Robot Brain Diaries</strong></em></p>]]></content:encoded></item><item><title><![CDATA[Stop Texting Your AI]]></title><description><![CDATA[How to Enforce JSON Output with Pydantic]]></description><link>https://www.nudurupati.co/p/stop-texting-your-ai</link><guid isPermaLink="false">https://www.nudurupati.co/p/stop-texting-your-ai</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Mon, 05 Jan 2026 16:02:01 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/3fc8d9d7-f931-42e9-86fc-25fd11c8cf2b_300x163.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>How to 
Enforce JSON Output with Pydantic</h3><h5><strong>Week 2: Moving from &#8220;Vibes&#8221; to &#8220;Validation&#8221; with the Agentic AI Architect</strong></h5><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!QkJ3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa314999a-af08-4783-babf-6758d38fd1dd_2816x1536.png" width="1456" height="794" alt=""></figure></div><p>In <a href="https://nudurupati.co/2025/12/29/the-stochastic-cpu/">Week 1</a>, we established that LLMs are not magic brains; they are <strong>Stochastic CPUs</strong>. They are non-deterministic engines that predict the next token.</p><p>If you treat them like a chat buddy, sending polite texts like <em>&#8220;Please analyze this email and give me a summary, thanks!&#8221;</em>, you are building fragile software. You are hoping the Stochastic CPU is in a good mood.</p><p>But we are Architects. We are in the business of engineering certainty.</p><p>To build production-grade agents, we need to stop &#8220;texting&#8221; our AI and start treating it like a function call.
We need inputs that are typed and outputs that are guaranteed.</p><p>Welcome to <strong>Week 2: Structured Output &amp; Prompt Engineering as Code</strong>.</p><div><hr></div><h3>The &#8220;Hello World&#8221; of Enterprise AI: Classification</h3><p>Before we look at the code, let&#8217;s look at the landscape.</p><p>While LinkedIn is busy debating the &#8220;Promise vs. Reality&#8221; of AI or doom-scrolling about &#8220;AI Slop,&#8221; most engineers are stuck in analysis paralysis, waiting to see who &#8220;wins the race&#8221; between OpenAI and DeepSeek.</p><p>We aren&#8217;t waiting. We are building.</p><p>The backbone of Enterprise AI isn&#8217;t image generation; it&#8217;s <strong>Classification</strong>.</p><ul><li><p>Routing customer support tickets (Billing vs. Technical).</p></li><li><p>Scoring sales leads (High Intent vs. Tire-Kicker).</p></li><li><p>Detecting severity (P1 vs. P3).</p></li></ul><p><strong>The Traditional ML Path:</strong> If you have a dedicated Data Science team, a year of lead time, and 100,000 rows of clean, labeled data, by all means, train a BERT model or an XGBoost classifier. It is efficient, specialized, and cheap at scale.</p><p><strong>The Architect Way:</strong> But what if you need a production-ready classifier <em>today</em> and you don&#8217;t have labeled data? You write a schema. You use a zero-shot LLM. You have a working system by lunch.</p><p>This week, we are building exactly that: <strong>A SaaS Email Router</strong> that takes messy, angry customer emails and converts them into rigid, actionable JSON.</p><p><em>(Note: This pattern is the foundational building block for the Data Contract Validator we will build in our Capstone project.)</em></p><div><hr></div><h3>The Architecture: Chaos In, Structure Out</h3><p>The goal is to turn unstructured noise into structured signal.
To do this, we are using <strong>LangChain</strong> and <strong>Pydantic</strong>.</p><p>Here is the difference between a &#8220;Script&#8221; and an &#8220;Architecture&#8221;:</p><h4>1. The Contract (Pydantic)</h4><p>We don&#8217;t ask the LLM for &#8220;a JSON.&#8221; We define exactly what that JSON looks like using a Pydantic model. This is our interface contract.</p><p>Python</p><pre><code>from pydantic import BaseModel, Field
from typing import Literal

class EmailAnalysis(BaseModel):
    category: Literal["billing", "technical_support", "sales", "complaint", "spam"] = Field(
        description="The primary intent of the email."
    )
    priority: Literal["high", "medium", "low"]
    summary: str = Field(description="A concise 1-sentence summary.")
    confidence: float = Field(ge=0, le=1, description="Confidence score 0.0 to 1.0.")

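# Hypothetical usage sketch (illustrative, not from the repo's code).
# Pydantic v2: parse a raw LLM reply against the contract above; a payload
# with a wrong type or an unknown category raises a ValidationError.
raw = '{"category": "billing", "priority": "high", "summary": "Customer was double-charged.", "confidence": 0.92}'
analysis = EmailAnalysis.model_validate_json(raw)
assert analysis.category == "billing"   # validated, typed access
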
</code></pre><p><strong>Why this is powerful:</strong></p><ul><li><p><strong>Enforced Enums:</strong> We use <code>Literal</code> to restrict the category. The LLM <em>cannot</em> invent a category like &#8220;Unsure&#8221; or &#8220;Maybe Billing.&#8221; It must pick from our list.</p></li><li><p><strong>Constraints:</strong> We enforce that <code>confidence</code> must be a float between 0.0 and 1.0 using <code>ge</code> (greater or equal) and <code>le</code> (less or equal).</p></li></ul><p>If the LLM misses a field or returns a string instead of a float, Pydantic throws a validation error. In our architecture, the <code>PydanticOutputParser</code> catches this error and <em>automatically</em> asks the LLM to fix it, making the system self-healing.</p><h4>2. The Logic Layer (Why we need &#8220;Confidence&#8221;)</h4><p>You noticed the <code>confidence</code> field in the schema above. This isn&#8217;t just for show.</p><p>In a production system, getting the data structure right is only half the battle. You also need to know <strong>how much to trust it</strong>.
By forcing the LLM to self-evaluate (<code>0.0</code> to <code>1.0</code>), we can build a Logic Layer on top of the prediction:</p><ul><li><p><strong>High Confidence (&gt; 0.9):</strong> Auto-process the refund.</p></li><li><p><strong>Medium Confidence (0.5 &#8211; 0.9):</strong> Route to a human queue with a &#8220;suggested&#8221; tag.</p></li><li><p><strong>Low Confidence (&lt; 0.5):</strong> Trigger a retry loop or flag for manual audit.</p></li></ul><p>Without this field, your application is flying blind.</p><p><strong>&#128161; The Architect&#8217;s Note: Temperature vs.
Confidence</strong></p><p>A common point of confusion: <em>&#8220;If I set Temperature=0, isn&#8217;t the model already confident?&#8221;</em></p><p><strong>No.</strong> Think of it like a <strong>Weather Forecaster</strong>.</p><ul><li><p><strong>Temperature (0.0)</strong> is telling the forecaster: <em>&#8220;Always give me the most statistically likely prediction. No creativity. No &#8216;maybe it will snow in July&#8217;.&#8221;</em></p></li><li><p><strong>Confidence</strong> is asking the forecaster: <em>&#8220;Okay, you predicted rain. On a scale of 0 to 100, how sure are you?&#8221;</em></p></li></ul><p>You use <strong>Temperature</strong> to make the forecast consistent (deterministic), but you use the <strong>Confidence Score</strong> to decide if you should actually carry an umbrella (business logic).</p><h4>3. Prompts as Code (The YAML Strategy)</h4><p>Finally, stop hardcoding your prompt strings inside your Python functions. It is messy, hard to read, and harder to version control.</p><p>In this <a href="https://github.com/snudurupati/agentic-ai-architect/tree/main/02_structured_output">repo</a>, we treat <strong>Prompts as Code</strong>. We extract the logic into <code>classifier_prompt.yaml</code>.</p><p><strong>Why this matters:</strong></p><ul><li><p><strong>Version Control:</strong> You can see diffs in your prompt logic over time.</p></li><li><p><strong>Testability:</strong> You can swap out prompts without touching the deployment code.</p></li><li><p><strong>Determinism:</strong> It forces you to think of the prompt as a config file, not a magic spell.</p></li></ul><div><hr></div><h3>Trust, but Verify (LLM-as-Judge)</h3><p>You have your JSON. Great. But is it <em>correct</em>?</p><p>Most tutorials stop here. We don&#8217;t. 
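</p><p>A minimal version of that check fits in a few lines. The sketch below is hypothetical, not the repo&#8217;s actual code: <code>classify()</code> stands in for the LLM-backed chain, and the golden examples are illustrative.</p><pre><code>GOLDEN = [
    ("My invoice is wrong, I was charged twice.", "billing"),
    ("The API returns 500 on every request.", "technical_support"),
]

def classify(email: str) -> str:
    # Stand-in: the real system would invoke the LLM chain here.
    return "billing" if "invoice" in email or "charged" in email else "technical_support"

def accuracy(dataset) -> float:
    hits = sum(1 for email, expected in dataset if classify(email) == expected)
    return hits / len(dataset)

print(accuracy(GOLDEN))  # 1.0 on this toy set
</code></pre><p>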
In the <a href="https://github.com/snudurupati/agentic-ai-architect/tree/main/02_structured_output">repo</a>, <code>main.py</code> includes a <strong>Golden Dataset</strong>, a list of tricky emails with known &#8220;correct&#8221; answers. We use <strong>GPT-4o as a Judge</strong> to grade our cloud/local model&#8217;s output.</p><p>We don&#8217;t just &#8220;eyeball&#8221; it. We run a test suite.</p><div><hr></div><h3>Prompt Engineering Isn&#8217;t Dead, It Just Graduated</h3><p>Some people claim &#8220;Prompt Engineering is dead.&#8221; They are wrong. It just evolved from &#8220;guessing magic words&#8221; to <strong>System Design</strong>.</p><p>When building your YAML prompts, keep these four pillars in mind:</p><ol><li><p><strong>Context (Persona):</strong> Don&#8217;t just say &#8220;Classify this.&#8221; Say &#8220;You are a Tier 3 Support Manager.&#8221;</p></li><li><p><strong>Clarity:</strong> Be verbose. Ambiguity in the prompt leads to hallucinations in the output.</p></li><li><p><strong>Structure:</strong> Explicitly define the output format (like we did with Pydantic).</p></li><li><p><strong>Iteration:</strong> Don&#8217;t change the prompt based on one example. 
Run it against your Golden Dataset to ensure you didn&#8217;t fix one edge case only to break three others.</p></li></ol><p>When you lock these principles into a YAML file, your prompt becomes <strong>code</strong>: version-controlled, testable, and deterministic.</p><div><hr></div><h3>Conclusion: The Architect&#8217;s Checklist</h3><p>If you are building an Agentic System, ensure you can check these boxes:</p><ul><li><p><strong>No &#8220;Vibes&#8221;:</strong> Are your inputs and outputs strongly typed (Pydantic/JSON schemas)?</p></li><li><p><strong>Config, not Code:</strong> Is your prompt logic separated from your application logic (YAML)?</p></li><li><p><strong>Deterministic Base:</strong> Is your temperature set to 0 for logic tasks?</p></li><li><p><strong>Self-Awareness:</strong> Does your agent return a &#8220;Confidence&#8221; score for logic flow?</p></li><li><p><strong>Graded:</strong> Do you have a &#8220;Golden Dataset&#8221; to verify the agent&#8217;s accuracy?</p></li></ul><div><hr></div><h3>The Architect&#8217;s Quiz (Homework)</h3><ol><li><p><strong>The Assignment:</strong> Clone the repo: <a href="https://github.com/snudurupati/agentic-ai-architect">https://github.com/snudurupati/agentic-ai-architect</a></p></li><li><p>Run <code>python 02_structured_output/main.py</code>. Watch as the system processes the Golden Dataset and grades itself.</p></li><li><p><strong>The Twist:</strong> What happens to your system when the definition of &#8220;Priority: High&#8221; changes? Do you update the Python code, or just the YAML prompt?</p></li><li><p><strong>The Edge Case:</strong> If the LLM is 90% confident but technically wrong, how do you catch that in production?</p></li></ol><div><hr></div><h3>The Road Ahead: Why Structure Matters for RAG</h3><p>You might be asking, &#8220;Why do I need JSON if I just want to build a Chatbot?&#8221;</p><p>Next week, we tackle <strong>RAG (Retrieval Augmented Generation)</strong>. 
To build a RAG system that doesn&#8217;t hallucinate, you cannot just shovel documents into a context window. You need to structure your <em>queries</em> and your <em>retrieved data</em>.</p><p>If you can&#8217;t control the output shape of a simple email classifier, you have no chance of controlling a multi-document retrieval agent.</p><p>Stop guessing. Start Architecting.</p><p>See you in the <a href="https://github.com/snudurupati/agentic-ai-architect/tree/main/02_structured_output">repo</a>.</p><p><em><strong>The Robot Brain Diaries</strong></em></p>]]></content:encoded></item><item><title><![CDATA[The Stochastic CPU]]></title><description><![CDATA[The Agentic AI Architect: Week 1]]></description><link>https://www.nudurupati.co/p/the-stochastic-cpu</link><guid isPermaLink="false">https://www.nudurupati.co/p/the-stochastic-cpu</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Mon, 29 Dec 2025 16:16:31 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e13e39bc-0fc3-49aa-8112-cb6d4a477501_300x163.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3>The Agentic AI Architect: Week 1</h3><div><hr></div><p><strong>Why LLMs are just a new type of processor, and how to benchmark them.</strong></p><div class="captioned-image-container"><figure><img src="https://substackcdn.com/image/fetch/$s_!r3hT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F630a1975-3f95-40a1-be31-e90f6417e2d7_2816x1536.png" width="1456" height="794" alt=""></figure></div><p>Last week, I wrote about <em><a href="https://nudurupati.co/2025/12/16/the-agentic-data-engineer/">The Agentic Data Engineer</a></em>, and the response was overwhelming. 40,000 impressions and hundreds of new connections later, one thing is abundantly clear: <strong>Data Engineers are ready to evolve.</strong></p><p>We are tired of being &#8220;plumbers.&#8221; We want to be architects.</p><p>So, I&#8217;m doubling down. I am turning that single blog post into an open-source, 8-week curriculum called <strong>The Agentic AI Architect</strong>. I&#8217;m building it in public, and you can follow along.</p><p>I have renamed and restructured the repo to support this journey.</p><p><strong>&#128073; Star the Repo:</strong> <a href="https://github.com/snudurupati/agentic-ai-architect.git">github.com/snudurupati/agentic-ai-architect</a></p><p>We start today with <strong>Week 1: The Stochastic CPU.</strong></p><div><hr></div><h3>The Mindset Shift: It&#8217;s Not Magic, It&#8217;s Compute</h3><p>The biggest mistake I see developers make is treating LLMs like magic genies. You rub the lamp (send a prompt), and you get a wish (an answer).</p><p>To build production systems, you need to stop thinking of them as magic and start treating them as a <strong>Stochastic (Probabilistic) CPU.</strong></p><ul><li><p><strong>Your Laptop CPU:</strong> Deterministic. <em>2+2</em> always equals <em>4</em>.</p></li><li><p><strong>The LLM CPU:</strong> Probabilistic. <em>2+2</em> is <em>probably</em> <em>4</em>, but depending on the temperature, it might be <em>&#8220;Four&#8221;</em> or <em>&#8220;Math is a social construct.&#8221;</em></p></li></ul><p>As an architect, your job isn&#8217;t to marvel at the intelligence; it&#8217;s to manage the constraints of this new processor.</p><div><hr></div><h4>1. Context Window is the New RAM (And it has Amnesia)</h4><p>If an LLM is a CPU, the Context Window is its RAM. But unlike your laptop, this RAM gets wiped clean after every single operation.</p><p><strong>The &#8220;Hidden&#8221; Cost of Chat:</strong></p><p>When you use ChatGPT, it feels like the model remembers you.
It doesn&#8217;t. The application secretly copies your entire conversation history and pastes it back into the prompt for every new message. This means every time you ask a follow-up question, you aren&#8217;t just sending 5 words; you are re-sending the previous 5,000 words.</p><p><strong>Why &#8220;Quadratic Scaling&#8221; Matters (The Data Engineer Analogy):</strong></p><p>You might hear that Attention scales &#8220;quadratically.&#8221; What does that actually mean?</p><p>Think of it like a SQL Join. When an LLM processes text, it doesn&#8217;t just read linearly. It compares <em>every</em> token to <em>every other</em> token to understand the relationships (Attention).</p><ul><li><p><strong>In SQL terms:</strong> It is a <strong>Self-Cross Join</strong> (Cartesian Product) of the input data.</p><ul><li><p>Input: 10 tokens &#8594; Join checks 100 relationships.</p></li><li><p>Input: 100 tokens &#8594; Join checks 10,000 relationships.</p></li><li><p>Input: 100k tokens &#8594; Join checks 10,000,000,000 relationships.</p></li></ul></li></ul><p>This is why a 128k context window isn&#8217;t just &#8220;more storage&#8221;; it&#8217;s a quadratically harder math problem that slows everything down.</p><p><strong>Cloud (128k) vs. Local (32k):</strong></p><ul><li><p><strong>Cloud (GPT-4o):</strong> With a 128k window, you can &#8220;Join&#8221; an entire book or a massive codebase in one go. The cloud provider has the massive H100 clusters required to handle that explosion of compute.</p></li><li><p><strong>Local (Ollama/GPT-OSS:20B):</strong> If your local model has a 32k limit, that is your hard RAM ceiling. If you try to feed it a 50-page document, it&#8217;s like trying to load a 1TB CSV into a 16GB laptop.
It will either crash (OOM) or you will have to truncate (delete) data, causing the model to &#8220;forget&#8221; the beginning of the document.</p></li></ul><div><hr></div><h3>The Data Pipeline of Thought: From Words to Vectors</h3><p>Before we write code, we need to understand the data pipeline. You are used to ETL (Extract, Transform, Load). LLMs have their own pipeline: <strong>Tokenize, Embed, Infer.</strong></p><h4>Step 1: Words &#8594; Tokens</h4><p>LLMs do not read &#8220;English.&#8221; They read numbers. The <strong>Tokenizer</strong> is a pre-processor that chops text into chunks.</p><ul><li><p><strong>Text to Token (The Chop):</strong> The tokenizer splits text into chunks.</p><ul><li><p>Input: &#8220;Applesauce&#8221;</p></li><li><p>Tokens: <code>"Apple"</code> and <code>"sauce"</code> (Two distinct chunks).</p></li></ul></li><li><p><strong>Token to ID (The Lookup):</strong> The system looks up these chunks in its fixed vocabulary dictionary.</p><ul><li><p><code>"Apple"</code> &#8594; <code>ID 452</code></p></li><li><p><code>"sauce"</code> &#8594; <code>ID 8812</code></p></li></ul></li><li><p><strong>ID to Embedding (The Meaning):</strong> The GPU looks up <code>ID 452</code> in its learned memory and retrieves a vector.</p><ul><li><p><code>ID 452</code> &#8594; <code>[0.02, -0.44, 0.91, ...]</code></p></li></ul></li></ul><p><strong>Why this matters:</strong> We don&#8217;t pay API providers for words; we pay for those integer chunks (IDs). And the conversion from ID to Vector? That is the model&#8217;s &#8220;brain&#8221; that it learned during training.</p><p><strong>The &#8220;Vocabulary&#8221; Hook:</strong></p><p>Every model comes with a fixed Vocabulary, a specific list of words it knows. If you swap the tokenizer (e.g., use Llama&#8217;s tokenizer with GPT-4&#8217;s model), the model will receive the wrong IDs. 
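</p><p>The <code>Text &#8594; Token &#8594; ID &#8594; Embedding</code> pipeline can be sketched with a toy vocabulary. The IDs 452 and 8812 come from the example above; the vectors and the greedy &#8220;chop&#8221; below are made up for illustration and are not how production tokenizers actually work.</p><pre><code>VOCAB = {"Apple": 452, "sauce": 8812}   # fixed vocabulary: chunk to ID
EMBEDDINGS = {452: [0.02, -0.44, 0.91], 8812: [0.13, 0.07, -0.25]}  # made-up vectors

def tokenize(text):
    # Toy "chop": greedy longest-prefix match against the vocabulary.
    chunks, rest = [], text
    while rest:
        match = next((w for w in sorted(VOCAB, key=len, reverse=True)
                      if rest.startswith(w)), None)
        if match is None:
            raise ValueError(f"no vocabulary entry matches {rest!r}")
        chunks.append(match)
        rest = rest[len(match):]
    return chunks

tokens = tokenize("Applesauce")          # ["Apple", "sauce"]
ids = [VOCAB[t] for t in tokens]         # [452, 8812]: what the API bills for
vectors = [EMBEDDINGS[i] for i in ids]   # the learned "meaning" lookup
</code></pre><p>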
It&#8217;s like sending a French dictionary code to an English speaker.</p><ul><li><p><strong>Architect&#8217;s Note:</strong> This <code>Token -&gt; Embedding</code> translation is <strong>learned</strong> during training. It is the foundation of the model&#8217;s intelligence. This will become critical in <strong>Week 3 (RAG)</strong>, because if you use an embedding model that speaks a different &#8220;vector language&#8221; than your LLM, your search results will be garbage.</p></li></ul><div><hr></div><h3>Conclusion: The Architect&#8217;s Trade-Off</h3><p>As an architect, you don&#8217;t pick the &#8220;best&#8221; model. You choose the right one for the <strong>Constraint</strong>.</p><p>Here is how you decide:</p><ul><li><p><strong>The Context Constraint:</strong> Do you need to paste a 50-page PDF into the prompt?</p><ul><li><p><strong>Yes:</strong> You need <strong>Cloud</strong> (GPT-4o, Claude 3.7). Most local servers (Ollama) default to 8k or 32k context windows to prevent your RAM from exploding.</p></li><li><p><strong>No:</strong> If you are just summarizing emails, <strong>Local</strong> is fine.</p></li></ul></li><li><p><strong>The Latency Constraint:</strong></p><ul><li><p><strong>Cloud:</strong> Generally consistent (~1-2s).</p></li><li><p><strong>Local:</strong> Depends entirely on your hardware. On an M4 Mac, a small model (Llama 3.2 3B) is lightning fast (&lt;0.5s). A large model (GPT-OSS:20B) might take 30 seconds per reply. 
If your app requires sub-second responses, stick to small local models or optimized cloud endpoints.</p></li></ul></li><li><p><strong>The &#8220;Classic ML&#8221; Reality Check:</strong> Before you spin up a GPU cluster, ask yourself: <em>Do I actually need Generative AI?</em></p><ul><li><p>If you are classifying transactions as &#8220;Fraud&#8221; or &#8220;Not Fraud,&#8221; and you have 100,000 labeled examples, <strong>do not use an LLM.</strong> Use XGBoost or Logistic Regression.</p></li><li><p>Classic ML is <strong>reproducible</strong>, 1000x cheaper, and 1000x faster. Use LLMs for reasoning and handling ambiguity, rather than for simple pattern matching.</p></li></ul></li></ul><div><hr></div><h3>The Build: <code>llm_benchmark.py</code></h3><blockquote><p>&#8220;An ounce of action is worth a ton of theory.&#8221;</p></blockquote><p>For Week 1, we aren&#8217;t building a chatbot. We are building a <strong>Latency &amp; Cost Profiler</strong>. If you are going to put AI in production, you need to know exactly how much &#8220;thought&#8221; costs.</p><p>I wrote a script that races <strong>OpenAI (GPT-4o-mini)</strong> against a local model (<strong>GPT-OSS:20B</strong>) running on my M4 MacBook Air with 24GB RAM. The results were surprising.</p><p><strong><a href="https://github.com/snudurupati/agentic-ai-architect/tree/main/01_stochastic_cpu">Link to Code: Week 1 &#8211; The Stochastic CPU</a></strong></p><p><strong>The Findings (M4, 24GB RAM):</strong></p><ul><li><p><strong>Cloud (GPT-4o-mini):</strong> <strong>1.6s</strong> latency. (Fast, but costs money).</p></li><li><p><strong>Local (GPT-OSS:20B):</strong> <strong>15.7s</strong> latency. (Free, but slow).</p></li></ul><p><strong>The Takeaway:</strong></p><p>My local machine struggled with the 20B model (15s is too slow for a chatbot).
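</p><p>The heart of that script is nothing more than a timing harness. A minimal sketch (the <code>call_model</code> argument is a hypothetical stand-in for your OpenAI or Ollama client call):</p>

```python
import statistics
import time

def profile(call_model, prompt: str, runs: int = 3) -> dict:
    """Time repeated calls to a model and summarize the latency."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call_model(prompt)  # stand-in: swap in your real client call
        latencies.append(time.perf_counter() - start)
    return {
        "min_s": min(latencies),
        "median_s": statistics.median(latencies),
        "max_s": max(latencies),
    }

# Demo with a fake "model" so the harness runs anywhere:
stats = profile(lambda p: p.upper(), "Explain attention briefly.")
```

<p>Point the same harness at a cloud endpoint and a local model, and you get apples-to-apples numbers like the ones above.</p><p>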
To make local viable on this hardware, I would need to scale down to a smaller model (like Llama 3.2 3B) or accept the cost of the Cloud.</p><div><hr></div><h3>The Architect&#8217;s Quiz (Homework)</h3><p>I&#8217;m digging deep into the internals of <em>why</em> these models work the way they do. If you really want to master this stack, try to answer these five questions before next week.</p><ul><li><p><em>Tip: Don&#8217;t just guess. Fire up your favorite model: ChatGPT, Gemini, or Claude and ask it to explain these concepts to you &#8220;like a Data Engineer.&#8221;</em></p></li></ul><ol><li><p><strong>The &#8220;Pre-Processing&#8221; Step:</strong> In classic ML, we manually create features (feature engineering). LLMs have a &#8220;Tokenizer&#8221; built-in. Why can&#8217;t we just feed them raw text or use our tokenizer?</p></li><li><p><strong>The Dictionary:</strong> Where does the model store its vocabulary?</p></li><li><p><strong>The Billing Question:</strong> Since &#8220;compute is compute,&#8221; why do OpenAI and Anthropic charge us by the <strong>Token</strong> and not simply by the <strong>Word</strong>?</p></li><li><p><strong>Vector Math:</strong> Does GPT use &#8220;One-Hot Encoding&#8221; to represent words, or something else?</p></li><li><p><strong>Positioning:</strong> When you send a sentence to an LLM, does it automatically know that &#8220;Select&#8221; came <em>before</em> &#8220;From&#8221;? Or do we have to explicitly tell it?</p></li></ol><div><hr></div><h3>What&#8217;s Coming Next?</h3><p>We have defined our compute. Next, we need to write the software.</p><p><strong>Week 2: Prompt Engineering as Code</strong></p><p>We are ditching the &#8220;prompt poetry.&#8221; No more &#8220;Please act as a helpful assistant.&#8221; Next week, we treat prompts as Software Contracts. 
We will force the LLM to output structured, valid JSON, handle errors, and build a deterministic classification system.</p><p><strong>Your Action Items:</strong></p><ol><li><p><strong>Fork the Repo:</strong> <a href="https://github.com/snudurupati/agentic-ai-architect.git">agentic-ai-architect</a></p></li><li><p><strong>Run the Benchmark:</strong> See how your machine handles local models.</p></li><li><p><strong>Do the Research:</strong> Ask your favorite AI to explain &#8220;Positional Embeddings&#8221; and see what you find.</p></li></ol><p>See you in the repo.</p><p><em>The Robot Brain Diaries</em></p>]]></content:encoded></item><item><title><![CDATA[The Agentic Data Engineer]]></title><description><![CDATA[Why You Already Have the Skills to Build with AI.]]></description><link>https://www.nudurupati.co/p/the-agentic-data-engineer</link><guid isPermaLink="false">https://www.nudurupati.co/p/the-agentic-data-engineer</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Tue, 16 Dec 2025 15:35:28 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/c18d3c71-6e57-4aac-9767-618ab60a3bdd_300x158.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Why You Already Have the Skills to Build with AI</strong>.</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BlRA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F440c5d66-ed30-4dc7-ac9b-796ea3ce8c9f_2848x1504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BlRA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F440c5d66-ed30-4dc7-ac9b-796ea3ce8c9f_2848x1504.png 424w, 
https://substackcdn.com/image/fetch/$s_!BlRA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F440c5d66-ed30-4dc7-ac9b-796ea3ce8c9f_2848x1504.png 848w, https://substackcdn.com/image/fetch/$s_!BlRA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F440c5d66-ed30-4dc7-ac9b-796ea3ce8c9f_2848x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!BlRA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F440c5d66-ed30-4dc7-ac9b-796ea3ce8c9f_2848x1504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BlRA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F440c5d66-ed30-4dc7-ac9b-796ea3ce8c9f_2848x1504.png" width="1456" height="769" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/440c5d66-ed30-4dc7-ac9b-796ea3ce8c9f_2848x1504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:769,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5559101,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/184624304?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F440c5d66-ed30-4dc7-ac9b-796ea3ce8c9f_2848x1504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!BlRA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F440c5d66-ed30-4dc7-ac9b-796ea3ce8c9f_2848x1504.png 424w, https://substackcdn.com/image/fetch/$s_!BlRA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F440c5d66-ed30-4dc7-ac9b-796ea3ce8c9f_2848x1504.png 848w, https://substackcdn.com/image/fetch/$s_!BlRA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F440c5d66-ed30-4dc7-ac9b-796ea3ce8c9f_2848x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!BlRA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F440c5d66-ed30-4dc7-ac9b-796ea3ce8c9f_2848x1504.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>&#8220;We spent the last decade building pipelines that move data. The next decade will be about building Agents that act on it.&#8221;</strong></p><p>If you work in data, the sudden rise of AI Agents might feel like an alien invasion. The jargon alone, RAG, MCP, Embeddings, Probabilistic Inference, sounds like a different language. It&#8217;s easy to feel like our hard-earned skills in SQL, ETL, and Python scripting are about to become legacy tech.</p><p>But after diving deep into building a few end-to-end AI Agents, I realized something surprising: <strong>This isn&#8217;t alien territory. It&#8217;s familiar ground.</strong></p><p>The gap between a <strong>Data Engineer</strong> and an <strong>AI Systems Engineer</strong> is smaller than you think. You don&#8217;t need to build the model; you just need to engineer the infrastructure that makes it useful. You don&#8217;t need to change who you are; you just need to elevate how you think.</p><p>The journey from writing scripts to architecting intelligence happens in three stages.</p><div><hr></div><h2><strong>Level 1: The Engineer (Mastering the Tools)</strong></h2><p><em>The Tactical Shift: From Scripting to Contracts</em>.</p><p>I used to treat Python primarily as a scripting tool, a flexible way to glue API responses into pandas DataFrames or Parquet files. In the world of AI, however, that flexibility is a bug.</p><p>When you connect a Large Language Model (LLM) to a business system, ambiguity is dangerous. If you send a loose JSON blob to an LLM, it might hallucinate. It guesses. This is where your existing data engineering rigor becomes a superpower. 
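</p><p>A minimal, library-free sketch of that rigor: validate a model&#8217;s JSON reply against an explicit schema and fail loudly on anything off-contract. (The field names here are hypothetical; in practice you would express the same contract with Pydantic.)</p>

```python
import json

# Hypothetical contract for a refund-decision reply from an LLM.
SCHEMA = {"decision": str, "amount": float, "reason": str}

def parse_reply(raw: str) -> dict:
    """Parse an LLM reply and enforce the contract, or fail loudly."""
    data = json.loads(raw)  # raises ValueError if the reply isn't JSON
    extra, missing = set(data) - set(SCHEMA), set(SCHEMA) - set(data)
    if extra or missing:
        raise ValueError(f"off-contract fields: extra={extra}, missing={missing}")
    for field, expected in SCHEMA.items():
        if not isinstance(data[field], expected):
            raise TypeError(f"{field!r} must be {expected.__name__}")
    return data

ok = parse_reply('{"decision": "approve", "amount": 12.5, "reason": "within policy"}')
```

<p>An off-contract reply, a missing field, an invented one, a string where a number belongs, is rejected before it can touch a downstream system.</p><p>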
We aren&#8217;t just validating data anymore; we are defining the &#8220;contracts&#8221; that allow AI to safely interact with the world.</p><ul><li><p><strong>Pydantic:</strong> You use this today for data validation. In the Agentic world, it defines the <strong>reality</strong> for the AI. It tells the model exactly what structure the output must have, preventing hallucinations.</p></li><li><p><strong>FastAPI:</strong> You use this to serve data. For an agent, these endpoints become its &#8220;hands.&#8221; It ensures the AI cannot try to pull a parameter that doesn&#8217;t exist.</p></li></ul><p>We aren&#8217;t learning magic; we are just applying <strong>strict typing</strong> to <strong>synthetic intelligence</strong>. The code looks almost identical; the purpose just shifted from &#8220;storage&#8221; to &#8220;action.&#8221;</p><p>This shift turns us from &#8220;plumbers&#8221; who fix leaks into &#8220;architects&#8221; of logic. When you build an Agent, you are effectively cloning yourself. You write the logic for a task <em>once</em> (e.g., &#8220;Check Inventory&#8221;), and the AI executes it endlessly, handling the edge cases and natural language parsing. This is how you become a &#8220;Force Multiplier.&#8221; You aren&#8217;t typing faster; you are building systems that scale your own output.</p><div><hr></div><h2><strong>Level 2: The Manager (Optimizing Operations)</strong></h2><p><em>The Operational Shift: From Integration to Protocol</em></p><p>As you mature in your role, you stop looking at code as just syntax and start seeing it as <strong>operational capability</strong>.</p><p>One of the biggest headaches in Data Engineering is integration fatigue. But for a Manager, the headache is <strong>Operational Expense (OpEx)</strong>. 
We hire brilliant humans, but then we bog them down with Tier-1 support tickets, manual sanity checks, and writing custom connectors for yet another API.</p><p>The AI world offers a solution via the <strong>Model Context Protocol (MCP)</strong>. Think of MCP as &#8220;ODBC for AI.&#8221;</p><ul><li><p><strong>ODBC</strong> lets us connect any BI tool to any database without rewriting the driver.</p></li><li><p><strong>MCP</strong> lets us connect any AI model (Claude, GPT-4) to any tool (your database, your internal APIs) without rewriting the integration code.</p></li></ul><p>From an architecture perspective, this is pure efficiency. We separate the &#8220;Brain&#8221; (the AI) from the &#8220;Tools&#8221; (the execution). As data engineers, we are the ones who build the Tools. We define the <em><strong>get_customer_data</strong></em> function, and the AI just calls it.</p><p>But the real value here is <strong>Deflection</strong>. In my recent build, I created a &#8220;Bureaucratic Agent.&#8221; It wasn&#8217;t allowed to process a refund until it read the policy PDF and verified the claim.</p><ul><li><p><strong>The Old Way:</strong> A human support agent reads the ticket, searches the wiki, checks the date, and clicks &#8220;Refund.&#8221; Cost: High.</p></li><li><p><strong>The Agentic Way:</strong> The system does the &#8220;reading&#8221; and &#8220;checking&#8221; automatically. It only escalates the complex cases.</p></li></ul><p>You are no longer just &#8220;automating a task&#8221;; you are structurally reducing the operational cost of the business.</p><div><hr></div><h2><strong>The Context Layer: RAG is Just a Join</strong></h2><p><em>Bridging Operations and Strategy</em>.</p><p>Before we look at strategy, we need to demystify one last piece of jargon: <strong>RAG (Retrieval-Augmented Generation)</strong>.
It sounds complex, but let&#8217;s look at it through a Data Engineering lens.</p><p>RAG is simply a <strong>Join</strong>.</p><ul><li><p><strong>Left Table:</strong> The User&#8217;s Query.</p></li><li><p><strong>Right Table:</strong> Your Knowledge Base (Vector DB).</p></li><li><p><strong>Join Condition:</strong> Semantic Similarity.</p></li></ul><p>When we build a RAG pipeline, we are engineering a &#8220;Just-In-Time&#8221; context join. We retrieve the relevant policy PDF, &#8220;join&#8221; it to the user&#8217;s prompt, and <em>then</em> let the model generate an answer. This transforms your static documents into active decision-making tools.</p><div><hr></div><h2><strong>Level 3: The Strategist (Building the Moat)</strong></h2><p><em>The Strategic Shift: From Passive Reports to Active Systems</em>.</p><p>When you zoom out to the executive level, this shift is about <strong>Reinvention</strong>. You stop asking &#8220;How do I move this data?&#8221; and start asking &#8220;How does this data create value?&#8221;</p><p>For years, our output has been <strong>passive</strong>. We built dashboards. We waited for a human to look at the dashboard, interpret the line chart, and make a decision. Now, we are building <strong>active</strong> systems.</p><ul><li><p><strong>Old Way:</strong> The pipeline moves data to a table so a human can decide to approve a refund.</p></li><li><p><strong>Agentic Way:</strong> The pipeline moves data to a context window so the system can propose the refund itself.</p></li></ul><p>For years, companies have hoarded data, hoping it would be valuable someday. That &#8220;someday&#8221; is today.</p><ul><li><p><strong>Building a Data Moat:</strong> Competitors can copy your features, but they cannot copy your <strong>context</strong>. 
By grounding AI agents in your proprietary data (your logs, your policies, your customer history), you create a service that no generic model (like ChatGPT) can replicate.</p></li><li><p><strong>New Revenue Streams:</strong> We aren&#8217;t just optimizing existing processes; we are creating new products. A &#8220;Consulting Agent&#8221; that sells your internal expertise to customers 24/7 is a new line of business.</p></li><li><p><strong>Valuation:</strong> Companies that successfully infuse AI into their core operations are valued fundamentally differently. They are seen as scalable technology platforms rather than operation-heavy service providers.</p></li></ul><p>We are moving from <strong>Deterministic Systems</strong> (rigid rules) to <strong>Probabilistic Systems</strong> (reasoning engines). The data engineer&#8217;s role is to build the deterministic guardrails, the schemas and access controls that allow the business to safely ride this wave.</p><div><hr></div><h2><strong>The Missing Criticals: Security &amp; Observability</strong></h2><p>A word of caution as you embark on this journey: What I&#8217;ve described is a prototype. Bringing this to production requires two things that data engineers are uniquely positioned to solve:</p><ol><li><p><strong>Security:</strong> An agent needs <strong>Authentication</strong> (Who are you?) and <strong>Authorization</strong> (Are you allowed to see this table?). We cannot give an AI &#8220;admin&#8221; access. It needs scoped, least-privilege tokens.</p></li><li><p><strong>Observability:</strong> When an ETL job fails, we check logs. When an Agent fails, we need to trace its <em>thoughts</em>. We need to know <em>why</em> it decided to skip the refund.</p></li></ol><p>I will be exploring these critical pillars in future posts.</p><div><hr></div><p><strong>Conclusion</strong></p><p>If you can build a data pipeline, you can build an AI Agent. 
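</p><p>To see how small that leap is, here is the &#8220;RAG is just a Join&#8221; idea from earlier in a few lines of plain Python. The bag-of-words &#8220;embedding&#8221; is a stand-in for a real embedding model; the join mechanics are the same.</p>

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts. A real pipeline
    would call an embedding model here."""
    return Counter(text.lower().replace("?", " ").replace(".", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Right table: the knowledge base.
docs = [
    "Refunds are allowed within 30 days of purchase.",
    "Shipping takes 5 to 7 business days.",
]

# Left table: the user's query. Join condition: semantic similarity.
query = "Can I get a refund after 30 days?"
best = max(docs, key=lambda d: cosine(embed(query), embed(d)))

# The "join" output: retrieved context stitched onto the prompt.
prompt = f"Context: {best}\n\nQuestion: {query}"
```

<p>Retrieve, join, generate: the same shape as any lookup-then-enrich pipeline you have already built.</p><p>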
The components (Python, APIs, Structured Data, SQL/Vector queries) are already in your toolkit.</p><p>The models themselves (GPT-4, Claude, Llama) are becoming commodities. They are powerful, but they are generic. The true competitive advantage for any company isn&#8217;t the AI model; it is the <strong>context</strong> that the model can access.</p><p>As a data engineer, you control that context. You manage the pipelines, the schemas, and the quality. You hold the keys to the only thing that creates a moat. You don&#8217;t need to reinvent yourself; you just need to realize that you are now the most dangerous person in the room.</p><div><hr></div><p><strong>Code &amp; Next Steps</strong> I am building this out in public to demystify the process. You can find the full working code for the Agent, the MCP Server, and the RAG implementation in my GitHub repository here: <a href="https://github.com/snudurupati/agent-fde-demo">snudurupati/agent-fde-demo</a></p><p>Keep watching this space. Next, I&#8217;ll try to tackle the &#8220;dark matter&#8221; of AI engineering: securing the agent with OAuth2/JWTs and tracing its thoughts with OpenTelemetry.</p>]]></content:encoded></item><item><title><![CDATA[𝗬𝗼𝘂𝗿 𝗔𝘁𝘁𝗲𝗻𝘁𝗶𝗼𝗻 𝗜𝘀 𝗬𝗼𝘂𝗿 𝗢𝗦]]></title><description><![CDATA[Design your permissions, don&#8217;t live on defaults.]]></description><link>https://www.nudurupati.co/p/25-11-16</link><guid isPermaLink="false">https://www.nudurupati.co/p/25-11-16</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Sun, 16 Nov 2025 15:47:23 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/11ded174-418e-4435-9a9d-c5f757bdc9d8_300x200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Design your permissions, don&#8217;t live on defaults.</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank"
href="https://substackcdn.com/image/fetch/$s_!7PTY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3454fa8-2b0b-4c23-a09a-4e1c3c9400e3_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7PTY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3454fa8-2b0b-4c23-a09a-4e1c3c9400e3_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!7PTY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3454fa8-2b0b-4c23-a09a-4e1c3c9400e3_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!7PTY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3454fa8-2b0b-4c23-a09a-4e1c3c9400e3_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!7PTY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3454fa8-2b0b-4c23-a09a-4e1c3c9400e3_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7PTY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3454fa8-2b0b-4c23-a09a-4e1c3c9400e3_1536x1024.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3454fa8-2b0b-4c23-a09a-4e1c3c9400e3_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1893937,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/184624305?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3454fa8-2b0b-4c23-a09a-4e1c3c9400e3_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7PTY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3454fa8-2b0b-4c23-a09a-4e1c3c9400e3_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!7PTY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3454fa8-2b0b-4c23-a09a-4e1c3c9400e3_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!7PTY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3454fa8-2b0b-4c23-a09a-4e1c3c9400e3_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!7PTY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff3454fa8-2b0b-4c23-a09a-4e1c3c9400e3_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>Thesis</strong></h2><p>&#8220;The ability to choose how I spend my time and mental bandwidth is my operating system&#8221;<br>When I run processes outside that design, like taking on consistent late evening work obligations, fragmented evenings, constant context switches, everything else malfunctions.</p><h2><strong>Personal Defaults</strong></h2><p>I&#8217;m a morning person. Always have been. My best rest and recovery are 10 pm-4 am, and my sharpest work is before dawn. When I ignore that and let after-hours obligations preempt sleep and spill into family time, the costs show up fast: forgetful in meetings, short with my 4-year-old&#8217;s bedtime routine, reading less, caring less. 
That&#8217;s on me for leaving my scheduler open.</p><p>&#8220;The ability to choose how I spend my time and my mental bandwidth is not a luxury for me; it&#8217;s my operating system. When stripped away, everything else malfunctions.&#8221;</p><h2><strong>What an OS Does</strong></h2><p>Autonomy over my attention and energy is my fundamental architecture, not a feature; it&#8217;s the underlying platform on which everything else runs. This metaphor highlights sovereignty. An operating system controls resource allocation, determines which processes get priority, and manages what runs in the background. If that&#8217;s truly not mine to control, then I am not really running my own OS but running someone else&#8217;s software on my hardware.</p><p>This perspective also clarifies the environment we live in. We are surrounded by entities (platforms, employers, cultural expectations) that want to install themselves as our OS, or at least run a persistent background process. Treating our attentional autonomy as foundational rather than negotiable is a necessary defense in this age of massive distraction.</p><h2><strong>Interdependence, By Consent</strong></h2><p>The tension I see is that we are interdependent beings. Some of the most meaningful things in life (relationships, kids, commitments, care) involve voluntarily granting access to our attention and letting them make legitimate claims on our time and mental bandwidth. That&#8217;s not a drain; it&#8217;s wholesome and dopamine-replenishing. The question is: are you choosing which permissions to grant, or has your system been rooted without your consent?</p><p>Here&#8217;s the uncomfortable bit: if something can schedule over my sleep, preempt my focus, and seize my attention on demand, that&#8217;s not &#8220;hustle&#8221;; that&#8217;s root-level access I failed to manage. It snuck in as &#8220;one quick late call,&#8221; then installed updates, then rewrote my defaults.
If my mind&#8217;s scheduler isn&#8217;t mine, I&#8217;m not free; I&#8217;m just highly managed by defaults I didn&#8217;t design.</p><p>Interdependence doesn&#8217;t contradict sovereignty. My kid&#8217;s bedtime gets whitelisted access. Reading &#8220;just one more book&#8221; at 9 pm isn&#8217;t a drain; it&#8217;s a trusted process that returns energy. Same with relationships, real care, real commitments: they get explicit permissions, not a blank SSH key to my brain. The difference is consent, clarity, and kill-switches.</p><h2><strong>Why This Matters Beyond Calendar Ethics</strong></h2><p>Multitasking is the human version of &#8220;just add more context.&#8221; It feels powerful, but it isn&#8217;t. My head isn&#8217;t a data lake; it&#8217;s closer to an L2 cache. Push too much through it and you pay in cache misses and heat. Every late meeting was an interrupt that blew away useful state. I&#8217;d show up foggier the next morning, then compensate by stacking more tasks, then miss details, then open more tabs. That loop looks productive from the outside but is a leak from the inside.</p><p>If an LLM analogy helps: models &#8220;hallucinate&#8221; because they must keep predicting even when the prompt is noisy or the facts aren&#8217;t there. Humans &#8220;hallucinate&#8221; under multitasking because we keep acting while our working memory is shredded and half of our attention is stuck on the last thing. Different mechanisms, same shape of failure: too many tokens, too little signal.</p><h2>Running My Own OS</h2><p>I&#8217;m not optimizing for output anymore; I&#8217;m optimizing for <strong>throughput with integrity</strong>. That means less context stuffed into my day, fewer parallel threads, and more deliberate retrieval of what I need right now. I&#8217;m keeping the early mornings and the bedtime stories. And I&#8217;m uninstalling anything that thinks it owns my scheduler. Control over my attention isn&#8217;t a perk; it is the platform.
This isn&#8217;t a perfect playbook; it&#8217;s me debugging my life in production. <strong>What&#8217;s your operating system and are you the one running it?</strong></p>]]></content:encoded></item><item><title><![CDATA[𝗛𝘂𝗺𝗮𝗻 𝘃s AI 𝗧𝗵𝗶𝗻𝗸𝗶𝗻𝗴: 𝗣𝗿𝗼𝗯𝗮𝗯𝗶𝗹𝗶𝘀𝘁𝗶𝗰 𝗼𝗿 𝗗𝗲𝘁𝗲𝗿𝗺𝗶𝗻𝗶𝘀𝘁𝗶𝗰?]]></title><description><![CDATA[The Debate I Keep Seeing.]]></description><link>https://www.nudurupati.co/p/ai</link><guid isPermaLink="false">https://www.nudurupati.co/p/ai</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Mon, 27 Oct 2025 12:31:25 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/d6a424db-24e0-49d6-8a0d-a520365a73cc_300x200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div><hr></div><h2><strong>The Debate I Keep Seeing</strong>.</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Flka!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed980e8b-98c0-4c89-8d80-74271acf74ef_1536x1024.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Flka!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed980e8b-98c0-4c89-8d80-74271acf74ef_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Flka!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed980e8b-98c0-4c89-8d80-74271acf74ef_1536x1024.png 848w, https://substackcdn.com/image/fetch/$s_!Flka!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed980e8b-98c0-4c89-8d80-74271acf74ef_1536x1024.png 1272w, 
https://substackcdn.com/image/fetch/$s_!Flka!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed980e8b-98c0-4c89-8d80-74271acf74ef_1536x1024.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Flka!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed980e8b-98c0-4c89-8d80-74271acf74ef_1536x1024.png" width="1456" height="971" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ed980e8b-98c0-4c89-8d80-74271acf74ef_1536x1024.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2203775,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/184624306?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed980e8b-98c0-4c89-8d80-74271acf74ef_1536x1024.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Flka!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed980e8b-98c0-4c89-8d80-74271acf74ef_1536x1024.png 424w, https://substackcdn.com/image/fetch/$s_!Flka!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed980e8b-98c0-4c89-8d80-74271acf74ef_1536x1024.png 848w, 
https://substackcdn.com/image/fetch/$s_!Flka!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed980e8b-98c0-4c89-8d80-74271acf74ef_1536x1024.png 1272w, https://substackcdn.com/image/fetch/$s_!Flka!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fed980e8b-98c0-4c89-8d80-74271acf74ef_1536x1024.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>These days, all I see in my LinkedIn feed are intense arguments surrounding AI and LLMs. 
It&#8217;s either over-the-top salesy optimism like &#8220;AI agents are no longer experiments; your next 5 hires could cost $0,&#8221; a lot of naysaying like &#8220;LLMs are at or near their peak&#8230; the &#8216;game-changer&#8217; crowd is in for disappointment,&#8221; or downright fearmongering: &#8220;Your AI will confidently lie with your logo on it,&#8221; or &#8220;AI will wipe out millions of jobs in the next 24 months.&#8221; There is also a middle-ground crowd, though few and far between, who say, &#8220;LLMs cannot be AGI by definition. An LLM is a static model; once trained, it does not change, unlike our brains.&#8221;</p><p>I have been contemplating a similar question since I was first introduced to Large Language Models. I am not a data scientist, but during my initial foray into data science, one cardinal rule I learned is: &#8220;Don&#8217;t use machine learning for problems that have a deterministic way of solving them.&#8221;</p><div><hr></div><h2><strong>Should We Use LLMs for Deterministic Tasks?</strong></h2><p>Large Language Models are fundamentally probabilistic in their core function, so the question is: should we even be using LLMs for more deterministic tasks like spelling &#8220;strawberry&#8221; or adding &#8220;2 + 2,&#8221; or for more serious functions like reliably and consistently answering questions such as &#8220;What were my sales last quarter?&#8221; Maybe not, given the cardinal rule of not using a probabilistic system to solve a deterministic problem. Problem solved, right? So stop wasting your money on all these AI agents and stick to the tried-and-true systems.</p><p><strong>Humans Are Probabilistic Too</strong></p><p>Not so fast. We humans, so adept at spelling and solving basic arithmetic as well as complex and abstract problems, have probabilistic hardware too. Neural activity has inherent noise and randomness, and neurotransmitter release is probabilistic by nature. 
At the computational level, human thinking shows clear probabilistic patterns. We make decisions under uncertainty and update our beliefs based on evidence (at least the more rational among us do), very Bayesian-style. In fact, we would be paralyzed if our thinking and reasoning systems were fully deterministic when faced with uncertainty. So, if human thinking and reasoning are indeed probabilistic, then how come we are so good at deterministic tasks? It turns out our &#8220;probabilistic determinism&#8221; is a result of our brain&#8217;s <strong>hierarchical, modular architecture with redundancy</strong> and <strong>grounded, multi-modal training</strong>.</p><div><hr></div><h2><strong>Where Does Human &#8220;Probabilistic Determinism&#8221; Stem From?</strong></h2><p><strong>Hierarchical, Modular Architecture with Redundancy</strong></p><p>When we try to spell &#8220;strawberry,&#8221; we don&#8217;t rely on generic probabilistic pattern matching; we have dedicated neural circuits for phonology, orthography, motor control, etc. These specialized subsystems have been reinforced millions of times with human feedback and are redundant: the same information is represented in overlapping ways across many neurons. These specialized neural circuits are so well trained that they are practically deterministic under normal conditions. In summary, human brains compile overlearned skills into near-deterministic routines. LLMs, however, lack this modularity: they&#8217;re one large probabilistic blob, primarily predicting the next token. I believe agentic AI with access to external tools is already addressing this shortcoming of LLMs.</p><p><strong>Grounded, Multi-Modal Learning in Humans</strong></p><p>When we learned to add &#8220;2 + 2,&#8221; we didn&#8217;t learn this by just reading a book, but also by manipulating physical objects, seeing quantities, and hearing it across multiple contexts. 
Thus, our learning is multi-modal, grounded in physical reality, and encoded redundantly across many of our neural subsystems. LLMs, by contrast, learn from text alone, so for an LLM &#8220;2 + 2&#8221; is just a token pattern without any grounded understanding of quantity. While vision-language models can process text and images together, the learning is often centered on correlations between image features and text descriptions rather than truly <em>grounding</em> concepts in sensory experience.</p><h2><strong>How to Introduce Grounded, Multi-modal Learning in AI</strong></h2><p><strong>Embodied Learning and Robotics</strong></p><p>Embodied AI and robotics are much closer to how humans learn. The idea is to give AI systems physical bodies to manipulate objects, sensors to experience cause and effect, and training through interaction rather than just observation. To draw an analogy: you can read every book about golf, understand the physics of ball flight, memorize the biomechanics of the perfect swing, and know all the theory about weight transfer and club path, but the first time you swing a club, you might miss the ball altogether (at least the less athletic among us).</p><p><strong>Does Embodied Learning Really Matter? (Stephen Hawking as a Counterpoint)</strong></p><p>Here you might ask: how important is the body for human learning? Consider Dr. Stephen Hawking: he did his best work after his body was severely disabled, and all he was left with was a brilliant and active mind. The rebuttal: Dr. Hawking had a normal childhood with about 20 years of sensorimotor experiences. He learned to walk and manipulate physical objects, and experienced physics directly as a child.</p><p><strong>Simulated Embodiment at Scale</strong></p><p>LLMs have little to no grounding, primarily learning from text and pattern matching without an underlying world model. AI models may need grounded, multimodal learning to establish basic concepts and build from there. 
Projects like Tesla&#8217;s Optimus, along with various robotics labs, are working on this. However, learning via robotics is slow and expensive. This is where simulated embodiment comes into play. Physics simulators like <strong>NVIDIA Isaac Sim</strong> let AI practice orders of magnitude faster than through physical robotic embodiment.</p><div><hr></div><h2><strong>A Story Analogy (and a Spoiler)</strong></h2><p>To end on a lighter note: spoiler alert. <strong>Do not read further</strong> if you (for whatever reason) are yet to watch the movie <em>Good Will Hunting</em>.</p><p>I was rewatching the movie. Every decade or so it presents me with a brand-new interpretation, and it suddenly occurred to me: Will possesses extraordinary intellect; he&#8217;s read nearly every book there is to read, absorbed every fact, every theory, and yet he can&#8217;t use his genius to build a meaningful life.</p><p>In many ways, an LLM is like Will Hunting: it has ingested statistical patterns in human language but remains ungrounded, fluent without understanding, articulate without perception, intelligent without empathy. &#8220;There&#8217;s a difference between knowing the path and walking the path.&#8221;</p><p>Secondly, the dynamic between Will and Sean (Robin Williams) represents the kind of mentorship AI needs: human guidance that is emotional, not just technical. 
As I embark on this journey of embodied-AI learning, I intend to be a Sean for AI&#8217;s Will Hunting, and I&#8217;ll keep posting my learnings, philosophical musings, and realizations about this topic in this space.</p>]]></content:encoded></item><item><title><![CDATA[AI Workloads as First-Class Apps]]></title><description><![CDATA[A Data Engineer&#8217;s perspective on how to make sense of all the AI Noise.]]></description><link>https://www.nudurupati.co/p/ai-workloads-as-first-class-apps</link><guid isPermaLink="false">https://www.nudurupati.co/p/ai-workloads-as-first-class-apps</guid><dc:creator><![CDATA[Sreeram Nudurupati]]></dc:creator><pubDate>Mon, 20 Oct 2025 12:38:21 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/fa48842d-01b3-495e-af4e-4f47ff7ecd04_300x200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A Data Engineer&#8217;s perspective on how to make sense of all the AI Noise.</p><div><hr></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tOiq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59f22a92-97ad-49ac-b4be-d8f025e635c9_2816x1504.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tOiq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59f22a92-97ad-49ac-b4be-d8f025e635c9_2816x1504.png 424w, https://substackcdn.com/image/fetch/$s_!tOiq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59f22a92-97ad-49ac-b4be-d8f025e635c9_2816x1504.png 848w, 
https://substackcdn.com/image/fetch/$s_!tOiq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59f22a92-97ad-49ac-b4be-d8f025e635c9_2816x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!tOiq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59f22a92-97ad-49ac-b4be-d8f025e635c9_2816x1504.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tOiq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59f22a92-97ad-49ac-b4be-d8f025e635c9_2816x1504.png" width="1456" height="778" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59f22a92-97ad-49ac-b4be-d8f025e635c9_2816x1504.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:778,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:10019349,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://www.nudurupati.co/i/184624307?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59f22a92-97ad-49ac-b4be-d8f025e635c9_2816x1504.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tOiq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59f22a92-97ad-49ac-b4be-d8f025e635c9_2816x1504.png 424w, 
https://substackcdn.com/image/fetch/$s_!tOiq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59f22a92-97ad-49ac-b4be-d8f025e635c9_2816x1504.png 848w, https://substackcdn.com/image/fetch/$s_!tOiq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59f22a92-97ad-49ac-b4be-d8f025e635c9_2816x1504.png 1272w, https://substackcdn.com/image/fetch/$s_!tOiq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59f22a92-97ad-49ac-b4be-d8f025e635c9_2816x1504.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Last week, I attended the dbt Coalesce summit, where I worked booth duty and spoke with attendees, explaining how AI agents can help data engineers increase productivity, reduce their cloud data warehouse spend, and enhance data quality throughout the data stack. There was a recurring theme, a recurring question: my CDW, my DE tool, and my orchestrator all individually have built-in copilots, so what&#8217;s the need for yet another AI layer? While I am still pondering an answer to that question, a different but related question popped into my mind: What&#8217;s up with all these acquisitions by all these data and analytics companies lately?</p><div><hr></div><h2>Lots of Noise, One Real Signal</h2><p>In the past few months, I have been closely following Databricks&#8217; acquisitions and partnerships alone. First, they agreed to acquire Neon, the company behind serverless PostgreSQL, and the reason given was to enhance AI agent-building capabilities. Like most data engineers who haven&#8217;t fully bought into the Agentic AI hype yet, I wrote it off as Databricks just trying to eliminate data ingestion middleware like Fivetran and Matillion and get the freshest data straight from the source; this idea was reinforced by their partnership with SAP that came later. 
However, the acquisition of Mooncake Labs, and now the partnerships with Cognite and OpenAI, all seem to play directly into their Lakebase + Agent Bricks strategy.<br>So when I try to isolate a signal from all the AI noise, it is clear that Databricks expects AI to go beyond just copilots and analytics; they are heavily betting on AI Apps as first-class citizens of the data analytics ecosystem.</p><p>Now let&#8217;s try to unpack what first-class AI Apps look and feel like from a Data Engineer&#8217;s perspective.</p><div><hr></div><h2>AI Apps: The Art of the Possible</h2><p>AI Apps as first-class analytics citizens means treating AI apps as core data products and not as side priorities. In my limited data engineering world-view and based on what I have researched so far, first-class AI apps could mean multimodal applications that can simultaneously process multiple types of data like voice, text, medical diagnostics, images, and radar data. Some applications of multimodal AI apps could be in the healthcare field, where AI analyzes information from diverse sources like clinical notes, medical images, and other documents and generates a personalized patient treatment plan.</p><p>AI Apps could also be AI agents that can autonomously build and maintain workflows. As data engineers, the latter is of more interest to us, and some possibilities of such first-class AI Apps could be:</p><ul><li><p><strong>Explainable KPI Copilot</strong> &#8211; A natural language interface that answers questions like &#8220;why did profit dip last quarter despite an increase in total revenue,&#8221; proves the answer with lineage and SQL, and provides row-level drilldowns. 
It always backs up every generated metric with traceable data.</p></li><li><p><strong>Closed-loop metrics improvement</strong> &#8211; A natural language analyst that proposes experiments for a specific metric, ships a feature flag, observes that metric, and proposes the next iteration with guardrails.</p></li><li><p><strong>Enterprise RAG with existing Data Warehouse</strong> &#8211; A governed retrieval and generation service that answers business questions using your data warehouse as the system of record, with citations, lineage, write-backs, and SLOs like any other app. This lets business stakeholders ask natural language questions and get answers they can trust and verify.</p></li></ul><p>While AI apps as first-class citizens are absolutely possible, there are risks that can bring enterprise adoption to a standstill.</p><div><hr></div><h2>Why AI Adoption Stalls</h2><p>The fastest way to stall AI adoption is to ship a shiny agent with no audit trail. Messy stuff starts showing up: the AI agent reads PII data that it isn&#8217;t supposed to, a KPI number can&#8217;t be reproduced or traced back to the source, or a friendly agent updates a record or opens a ticket without the right permissions. Decision-making becomes a black box, with no transparency into what query ran, what docs were read, or who approved the scope. That&#8217;s when trust in the agent fades, audit teams step in, leadership loses confidence in AI, and we&#8217;re back to manual reviews.</p><div><hr></div><h2>What good looks like</h2><p>Good means every agentic answer and action is traceable. Every KPI links straight to the underlying table, the owner, and the version. Agentic AI apps run only with purpose-bound credentials, narrow tool scopes, and short-lived sessions. Reproducing a result needs to be boring: replaying the same prompts, tool calls, and SQL with the same inputs. 
Safety checks aren&#8217;t bolted on; they run before execution, and anything risky gets routed to a human.</p><p>For executives, there&#8217;s a single page that shows where agents are deployed, what data they can touch, what broke last week, and what business results they delivered.<br>We need to build apps that both think and write, on governed data, with real SLOs. The teams that make low-latency state, evaluation, and guardrails boringly reliable will cut through the hype, and then AI starts paying dividends.</p>]]></content:encoded></item></channel></rss>