Blog

Anthropic SKILLs – Prime Example of Red Riding Hood Principle

In the book Albert and I wrote, Albert introduced the "Red Riding Hood Principle". You remember the story, right? A young, naive girl strays off the well-trodden path and ends up in a lot of trouble.

The same is true when you build AI applications. If you provide context that is familiar to the agent – similar to its training data – then the agent will be able to navigate the terrain more easily.

Anthropic SKILLs is a great example of this. Anthropic realized that in Claude Code, it had trained a model and constructed an agent to be exceptionally good at navigating file systems, reading files, and managing context. Further, the filesystem metaphor provides natural navigational affordances. The agent can look at the directory structure to get a big picture of what exists, and it can grep around for details – much like a developer would.

You should consider all of this when building your own agents! SKILLs benefits from the filesystem metaphor, so it stands to reason that your domain could benefit as well – imagine presenting graph-based knowledge or filter-based search as if it were a file structure.
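To make the idea concrete, here is a minimal sketch of rendering graph-based knowledge as a directory-style tree an agent could "browse". The graph contents are made-up illustration data, not anything from SKILLs itself:

```python
# Hypothetical knowledge graph: node -> child nodes.
graph = {
    "products": ["shoes", "houses"],
    "shoes": ["sizing-guide", "brands"],
    "houses": ["square-footage", "listings"],
}

def render_tree(node: str, depth: int = 0) -> list[str]:
    """Render a node and its descendants as indented 'directory' lines."""
    lines = [("  " * depth) + node + "/"]
    for child in graph.get(node, []):
        lines.extend(render_tree(child, depth + 1))
    return lines

print("\n".join(render_tree("products")))
```

An agent shown this listing can orient itself at a glance and then "open" a node for detail, just as it would with real files.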

Incremental AI Adoption for E-commerce

When you think of e-commerce, your mind is probably drawn to Amazon.com as "the definitive" example. But it's actually the exception. The internet is filled with tons of small- and medium-sized e-commerce sites. These sites typically follow the same pattern – a search page with a search box at the top, selectable filters along the left side, and results filling the remainder of the screen. And the whole goal is to quickly usher customers to the products they seek.

For most of these sites, the implementation is quite simple. Product metadata is indexed into a search engine such as Elasticsearch or Algolia. This includes fields like the title of the product, its description, its price, and other relevant features (sizes for shoes, square feet for houses, etc.). And the application is typically quite simple – the user submits a search, the backend issues a query that hopefully captures the customer's intent, and then it collects the responses and sends them to the frontend for display in the search results.

Unfortunately, "right-out-of-the-box" search results are often not that great, and fixing the problem often requires hiring a team of search experts – something that smaller shops are unable to afford. Fortunately, modern AI is coming to the rescue! In this post we'll demonstrate how e-commerce shops can incrementally adopt AI and explore improvements in search that would have been unbelievable just 5 years ago.

Search Architecture Evolution

Context Engineering Requires AI Empathy

A big part of context engineering comes down to empathy. ... Does this sound surprising?

Consider this: LLMs have been trained to act like humans. So when you are building an AI agent, it's a useful exercise to put yourself in their shoes and walk around a bit. For me, I like to think of the AI agent as an AI intern showing up for its first day of work. How would you feel if you were coming in for your first day of work and the boss gave you 50 pages to read? What if you only learned what you were supposed to do with this information after you had already read the 50 pages? And what if the instructions were poorly written, ambiguous, and impossible to achieve with the tools provided?!

In this post I'll go over several places where I've learned to empathize with the AI Intern. By understanding the world from their unique vantage point, you can build better context for your agents and drastically improve the quality of your AI application.

Why Spec-Driven Development Breaks at Scale (And How to Fix It)

When GitHub Copilot launched in 2021, AI code completion took the development world by storm. But after a mere year or two, code completion was completely eclipsed by vibe-coding, allowing much larger tasks to be accomplished with much less effort. Vibe-coding is great, but it has some problems that limit its utility. Agents tend to work with the code as if they were over-ambitious interns; they often do more damage than good if you're not guiding them at every step.

The most recent trend is spec-driven development. This term is still ill-defined, but the basic idea is that prior to tackling a meaningful code change, you first create a specification document for that change and then use the specification as a guide for the AI to make changes. This helps the agent to better understand the big picture. Once the implementation is complete, you throw away the spec because it has served its purpose.

This form of spec-driven development is a good idea! But I want more! In this post I'll talk about a bigger notion of spec-driven development. I'm talking about an ideal world where we keep track of the global product specification, and then let the agent build code based on it.

Spec-Driven Development

Recipes – A Pattern for Common Code Transformations

I did a thing. A very silly, very meta thing. I vibe-coded a CLI tool that summarizes YouTube videos, recorded myself making the tool, and then used the tool to summarize the video of me making the tool. And now, dear reader, you are reading a blog post that was largely generated from that summary.

But the real star of the show isn't the tool, or the video, it's the Recipe Pattern – a way to encapsulate repetitive coding work into a reusable doc that you write once.

Recipe Bot

Supercharging LLM Classifications with Logprobs

I was just reading the classification chapter of Jay Alammar and Maarten Grootendorst's excellent book Hands-On Large Language Models. I felt inspired to extend their work and show yet another cool trick you can do with LLM-based text classification. In their work they demonstrated how an LLM can be used as a "hard classifier" to determine the sentiment of movie reviews. By "hard" I mean that it gives a concrete answer, "positive" or "negative". However, we can do one better! Using "this one simple trick"™ we can make a "soft" classifier that returns the probabilities of each class rather than a concrete single choice. This makes it possible to tune the classifier – you can set a threshold in the probabilities so that classifications are optimally aligned with a training set.
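The trick boils down to reading the log-probabilities of the candidate label tokens and renormalizing them. Here's a minimal sketch; the logprob values are made-up stand-ins for what an LLM API would return for one movie review:

```python
import math

def soft_classify(label_logprobs: dict[str, float]) -> dict[str, float]:
    """Turn per-label token logprobs into class probabilities (softmax).

    Subtracting the max logprob first keeps the exponentials
    numerically stable.
    """
    m = max(label_logprobs.values())
    exps = {label: math.exp(lp - m) for label, lp in label_logprobs.items()}
    total = sum(exps.values())
    return {label: v / total for label, v in exps.items()}

# Hypothetical logprobs for the tokens "positive" and "negative":
probs = soft_classify({"positive": -0.3, "negative": -1.5})

# Instead of always taking the argmax, tune a threshold against a
# labeled set so classifications align with your training data:
is_positive = probs["positive"] >= 0.6
```

The threshold (0.6 here) is the tunable knob: sweep it over a validation set and pick the value that optimizes whatever metric you care about.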

Soft Classification

Fire Yourself First: The E-Myth Approach to Iterative AI App Development

I've always been interested in entrepreneurship, so, early on in my career, I asked my financial advisor for book recommendations about startups. He handed me "The E-Myth" by Michael Gerber – a book about... building food service franchises? In the heat of the dot-com explosion, this wasn't exactly the startup guide I was hoping for, but its core message stuck with me and turned out to be surprisingly relevant to the problems I hear about regularly when talking to people about building reliable LLM applications.

Fire Yourself

Roaming RAG – RAG without the Vector Database

Let's face it, RAG can be a big pain to set up, and even more of a pain to get right.

There are a lot of moving parts. First, you have to set up retrieval infrastructure. This typically means setting up a vector database and building a pipeline to ingest the documents, chunk them, convert them to vectors, and index them. In the LLM application, you have to pull in the appropriate snippets from the documentation and present them in the prompt so that they make sense to the model. And things can go wrong. If the assistant isn't providing sensible answers, you've got to figure out if it's the fault of the prompt, the chunking, or the embedding model.
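Even the "simple" chunking step hides choices that affect answer quality. Here's a minimal sketch of one common approach, fixed-size chunks with overlap; the sizes are illustrative, not recommendations:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    The overlap keeps sentences that straddle a boundary visible
    in both neighboring chunks.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

chunks = chunk_text("word " * 200)
```

Each of these chunks would then be embedded and indexed – and every parameter here (chunk size, overlap, character vs. token counting) is another dial you might have to debug later.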

If your RAG application is serving documentation, then there might be an easy alternative. Rather than setting up a traditional RAG pipeline, put the LLM assistant to work. Let it navigate through the documentation and find the answers. I call this "Roaming" RAG, and in this post I'll show you how it's done.
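The core move is giving the assistant a navigable overview instead of pre-chunked snippets. A hypothetical sketch of that first step, extracting a heading outline from markdown docs the agent can then drill into:

```python
def extract_outline(markdown: str) -> list[str]:
    """List markdown headings so an agent can see the big picture
    before requesting the full text of any one section."""
    return [line.strip() for line in markdown.splitlines()
            if line.lstrip().startswith("#")]

# Toy documentation file:
doc = "# Setup\nInstall...\n## Requirements\nPython 3.10\n# Usage\nRun..."
outline = extract_outline(doc)
```

The assistant sees only the outline at first, then asks for a section by heading – no vector database, no embeddings, no chunking pipeline.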

Roaming RAG