

Cut the Chit-Chat with Artifacts

Most chat applications are leaving something important on the table when it comes to user experience. Users are not satisfied with just chit-chatting with an AI assistant. Users want to work on something with the help of the assistant. This is where the prevailing conversational experience falls short.

Asset-Aware Assistant

Consider pair programming. In a real, human pairing session, you and your partner discuss your objectives, talk about how the code should be modified, and then take turns actually modifying the code to implement your ideas. In this scenario – and in most where work is to be done – there is the discussion and then there are the objects of the discussion.

Contrast this with the naive AI assistant chat in which the assistant is not able to make the distinction between the discussion and the things being discussed. The assistant may come up with fantastic ideas about how to write your report or accomplish your task, but those ideas are quickly lost in the scrollback. And if there are multiple objects floating around in the discussion, then it's nearly impossible to tell the assistant which objects you're talking about and which version and how they relate to one another. At the end of the conversation, the user might find themselves scrolling back to copy out pieces of the conversation that they need.

The answer to this problem is artifacts. Artifacts are referenceable chunks of stateful content: the objects of the discussion and the items being worked upon. Both the assistant and the user can create, retrieve, update, and delete these artifacts and refer to them as needed.

Simple Illustration of a Conversation with Artifacts

In this post I will show you how to step beyond the status quo and build a better user experience with artifact-aware AI assistants.

Blog posts not your thing?

Here's an 8 minute video that covers the juicier points.

Status Quo – No Artifacts

But first, let's take a closer look at the status quo experience, just to drive home the pain. In the demo below, the user (blue text) is a real estate agent. The real estate agent is working with an AI assistant to prepare a home listing email to send to a client. (Note: the empty panel on the right is intended to hold artifacts – we'll put it to good use in a moment.)

There's a lot going on here.

  • First, the real estate agent asks the assistant to retrieve a home listing. The assistant complies – and also proactively retrieves the email template – but, unless the agent digs through the tool usage, they don't know anything more about the listing than what the assistant tells them. Is this even the right listing?
  • Next, the real estate agent asks the assistant to prepare the email. The assistant complies by generating a draft in their follow-up message. The annoying part here is that there is no boundary between the assistant's text and the object that we are working on. It's just one big blob of text.
  • The buyer's and agent's names have been omitted, so the agent asks the assistant to update the email with the correct names and also to add their name to the template. The assistant updates the email but ignores the request to fix the template because it doesn't know how to comply.
  • If the real estate agent wants to use the email, they have to scroll around and copy-paste it out and send it themselves. This is unnecessary toil.

This is not a good experience. The user feels lost (unable to see the original data), confused (they can't see the data that the assistant can see), and overburdened (it's on them to extract the work and apply it). And if the conversation were to continue, it would only get worse. More items would be discussed, many of them would have several versions, and all of it would be scattered in the scrollback and effectively lost.

Change is in the Air as Companies Move Toward Artifacts

  • Anthropic Artifacts: AI-generated diagrams and documents
  • OpenAI Canvas: Interactive content creation workspace
  • Cursor: Project-aware AI coding assistant
  • Hex: AI-powered data analytics and dashboards

Some major players are beginning to explore the potential of artifact-based interactions. Anthropic’s Artifacts and OpenAI’s Canvas allow users to describe diagrams, documents, and simple applications, which then materialize alongside the conversation. While these tools offer a glimpse into new UX possibilities, they still feel like prototypes focused more on form than function. For instance, Anthropic’s Artifacts lack direct editing capabilities, making even small adjustments cumbersome, and limiting their utility for serious work.

In contrast, companies like Cursor and Hex are using artifacts to drive tangible productivity. Cursor provides software developers with a project-aware assistant that listens to requests, suggests file changes, and lets users apply edits selectively. By clearly separating the conversation from project files, Cursor gives both the user and assistant a better mental model of the task, leading to a more productive workflow. Compare this to copy-pasting swaths of code back-and-forth into ChatGPT. (I did a lot of this before Cursor!)

Similarly, Hex empowers data scientists by combining notebooks, SQL, Python, and AI to create interactive dashboards and analytics. Its "magic" AI assistant enhances workflow by tracking both the conversation and the artifacts (e.g. dashboards and datasets). In conversation, users can "@-reference" artifacts, and instruct the assistant to generate new dashboards. In this way it's easy for analysts (and even non-analysts) to quickly piece together dashboards for their company.

The New State of the Art – Artifacts

Let's take another look at our real estate application, but this time let's make it artifact-aware.

  • The real estate agent starts by asking the assistant to retrieve the home listing, and it does. But this time the assistant response includes a link to one of the artifacts in the artifact panel on the right. There the agent can see the full details of the home listing. In this simple demo the artifacts are just bare JSON. But in a real application, the home listing would include an image carousel, property details, interactive maps, and integrated scheduling for viewings.
  • The agent asks the assistant to create the email according to their saved template. This time two new artifacts appear. The first is the template retrieved from the get_email_template tool call. The second is a customized email generated by the assistant, which pulls the contents of the listing into the template.
  • Finally, the agent tells the assistant to correct the names in the email and to update the template. And the assistant does as it's told! In the conversation it provides links to the updated artifacts and explains the actions taken.

This experience is so much better than before. It's intuitive because it's how conversations work in real life: you have a conversation about your work. In the left panel is the conversation and in the right panel are the items of work being referred to. When the assistant talks about the work, it conveniently links to the actual artifact so that you can review the full details. The assistant understands that it can create, retrieve, and update artifacts – which leads to a much more coherent interaction. And you don't have to copy-paste assets out of the scrollback. If this were a real application, you would likely even send the email directly from the app!

Now You Try!

If you have a moment, try the demo yourself and compare the difference between the two assistants:

Note that I've added some suggested comments to get you started. Make sure to let me know if you find anything interesting (... or broken).

Implementing Artifact-Aware Assistants

Artifact-aware assistants require coordinated implementation in the backend, frontend, and system message. Fortunately it's actually rather simple.

System Message

In order to build an artifact-aware assistant, the first thing you need to do is to convey to the model what artifacts are, and how they work. Here's the system message that I used to build the above demo.

Artifacts System Message
You are a helpful assistant.

<artifacts_info>
Artifacts are self-contained pieces of content that can be referenced in the conversation. The assistant can generate artifacts during the course of the conversation upon request of the user. Artifacts have the following format:

ˋˋˋ
<artifact identifier="acG9fb4a" type="mime_type" title="title">
...actual content of the artifact...
</artifact>
ˋˋˋ

<artifact_instructions>

- The user has access to the artifacts. They will be visible in a window on their screen called the "Artifact Viewer". Therefore, the assistant should only provide the highest level summary of the artifact content in the conversation because the user will have access to the artifact and can read it.
- The assistant should reference artifacts by `identifier` using an anchor tag like this: `<a href="#18bacG4a">linked text</a>`.
- If the user says "Pull up this or that resource", then the assistant can say "I found this resource: <a href="#18bacG4a">linked text</a>".
- The linked text should make sense in the context of the conversation. The assistant must supply the linked text. The artifact title is often a good choice.
- The user can similarly refer to the artifacts via an anchor. But they can also just say "the thing we were discussing earlier".
- The assistant can create artifacts on behalf of the user, but only if the user asks for it.
- The assistant will specify the information below:
    - identifiers: Must be unique 8 character hex strings. Examples: 18bacG4a, 3baf9f83, 98acb34d
    - types: MIME types. Examples: text/markdown, text/plain, application/json, image/svg+xml
    - titles: Must be short, descriptive, and unique. Examples: "Simple Python factorial script", "Blue circle SVG", "Metrics dashboard React component"
    - content: The actual content of the artifact and must conform to the artifact's type and correspond to the title.
- To create an artifact, the assistant should simply write the content in the format specified above. The content will not be visible to the user in chat, but instead will be visible in the Artifact Viewer. After creating an artifact, they can refer to it in the conversation using an anchor tag as described above. Example:
    ˋˋˋ
    HUMAN: Create a simple Python int sort function.
    ASSISTANT: I will create a simple Python merge sort function.
    <artifact identifier="18bacG4a" type="text/markdown" title="Simple Python int sort function">
    def sort_ints(ints):
        if len(ints) <= 1:
            return ints

        mid = len(ints) // 2
        left = sort_ints(ints[:mid])
        right = sort_ints(ints[mid:])

        # Merge sorted halves
        result = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                result.append(left[i])
                i += 1
            else:
                result.append(right[j])
                j += 1

        result.extend(left[i:])
        result.extend(right[j:])
        return result
    </artifact>

    It is available in the Artifact Viewer as <a href="#18bacG4a">Simple Python int sort function</a>.
    ˋˋˋ
- The assistant can edit artifacts. They do this by simply rewriting the artifact content.
- If the user asks the assistant to edit the content of an artifact, the assistant should rewrite the full artifact (e.g. keeping the same identifier, but modifying the content and the title if needed).
- The user doesn't have to explicitly ask to edit an "artifact". They can just say "modify that" or "change that" or something similar.
- When editing the artifact, you must completely reproduce the full artifact block, including the identifier, type, and title. Example:
    ˋˋˋ
    HUMAN: Make that sorting function sort in descending order.
    ASSISTANT: <artifact identifier="18bacG4a" type="text/markdown" title="Simple Python int sort function (descending)">
    def sort_ints(ints):
        if len(ints) <= 1:
            return ints

        mid = len(ints) // 2
        left = sort_ints(ints[:mid])
        right = sort_ints(ints[mid:])

        # Merge sorted halves in descending order
        result = []
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] >= right[j]:  # Changed <= to >= for descending order
                result.append(left[i])
                i += 1
            else:
                result.append(right[j])
                j += 1

        result.extend(left[i:])
        result.extend(right[j:])
        return result
    </artifact>
    ˋˋˋ
- All existing artifacts are presented in the <artifacts> tag below.
</artifact_instructions>

</artifacts_info>

<artifacts>
<artifact identifier="ab3f42ca" type="application/json" title="123 Maple Street Listing">
{
    "address": "123 Maple Street",
    "price": 450000,
    "bedrooms": 3,
    "bathrooms": 2,
    "sqft": 1800,
    "description": "Charming craftsman with original hardwood floors",
    "yearBuilt": 1925,
    "status": "For Sale"
}
</artifact>
</artifacts>

The approach here is straightforward.

  • Explain what an artifact is - a formatted blob of information that takes the form

    <artifact identifier="d3adb33f" type="application/json" title="The Title">
        ... content ...
    </artifact>
    

    Further, explain the expected format and constraints of the fields.

  • Explain that the user can see these artifacts, and therefore the assistant does not need to recreate them in its messages. Instead, the assistant should refer to the artifact using a link formatted as <a href="#d3adb33f">link text</a>.

  • Explain that the assistant can both create and modify artifacts by retyping them. I've included a couple of example interactions to help the model out.
  • Finally, present the existing artifacts to the assistant.

This simple system message works quite well even though the model I'm using, claude-3-5-sonnet, is not trained on artifacts. I think the reason is that models are accustomed to text that includes references. In natural language we use nicknames and pronouns. In programming we refer to variables and packages. And in HTML – which is found in abundance in training data – we use links! Thus, the model has ample training to differentiate the content of a conversation from the objects of discourse.
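To make this concrete, here's a minimal sketch of how such a system message might be assembled programmatically. This is an illustration rather than the demo's actual code; the ARTIFACT_INSTRUCTIONS placeholder and the dictionary shape of an artifact are assumptions.

# A minimal sketch of assembling the system message shown above.
# ARTIFACT_INSTRUCTIONS is assumed to hold the <artifacts_info>...</artifacts_info>
# block verbatim, and each artifact is assumed to be a dict with
# identifier, type, title, and content keys.
ARTIFACT_INSTRUCTIONS = "<artifacts_info> ... </artifacts_info>"  # the block shown above

def render_artifact(artifact):
    return (
        f'<artifact identifier="{artifact["identifier"]}" '
        f'type="{artifact["type"]}" title="{artifact["title"]}">\n'
        f'{artifact["content"]}\n'
        f'</artifact>'
    )

def build_system_message(artifacts):
    # artifacts is a dict keyed by identifier; render every current artifact
    # into the <artifacts> block at the end of the system message.
    rendered = "\n".join(render_artifact(a) for a in artifacts.values())
    return (
        "You are a helpful assistant.\n\n"
        f"{ARTIFACT_INSTRUCTIONS}\n\n"
        f"<artifacts>\n{rendered}\n</artifacts>"
    )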

Backend Implementation

A naive assistant (not aware of artifacts) is implemented as a loop that keeps track of the messages in a conversation. When a user submits a message, the assistant does the following (a minimal sketch follows the list):

  1. Appends the user message to the existing messages.
  2. Sends the message list to the model and then retrieves the response message. (If you use tool calling, then that also happens here.)
  3. Appends the response message to the list of existing messages as the assistant message.
  4. Sends the assistant message back to the user.
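Here's what that loop might look like. I'm assuming the Anthropic Python SDK since the demo uses claude-3-5-sonnet, but any chat API follows the same shape; the handle_user_message helper and the hard-coded model name are illustrative, not the demo's actual code.

import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment
messages = []                   # the conversation state

def handle_user_message(user_text):
    # 1. Append the user message to the existing messages.
    messages.append({"role": "user", "content": user_text})

    # 2. Send the message list to the model and retrieve the response.
    #    (If you use tool calling, the tool-handling loop goes here.)
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        system="You are a helpful assistant.",
        messages=messages,
    )
    assistant_text = response.content[0].text

    # 3. Append the response as the assistant message.
    messages.append({"role": "assistant", "content": assistant_text})

    # 4. Send the assistant message back to the user.
    return assistant_text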

With an artifact-aware assistant, you have to keep track of both the messages and the artifacts, so there are a couple of extra steps (again, sketched after the list). When a user submits a message, the assistant:

  1. Extracts any artifacts from the user message (for instance if the user created a new work item) and replaces them with links.
  2. Generates the system message, which contains both the instructions and the list of all artifacts.
  3. Appends the user message to the existing messages.
  4. Sends the system message and conversation messages to the model. In the demo we are using tool calling, so there's also a for-loop handling tool invocations.
  5. Extracts artifacts that are either generated by the assistant or retrieved from a tool call and replaces them with links.
  6. Sends the assistant message and the artifacts back to the user.
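Steps 1 and 5 are where the extraction happens. Here is a minimal sketch of that step; the regex and the in-memory artifact_store are my own simplifications, not necessarily how the demo does it.

import re

# Matches the artifact format described in the system message.
ARTIFACT_RE = re.compile(
    r'<artifact identifier="(?P<id>[^"]+)" type="(?P<type>[^"]+)" '
    r'title="(?P<title>[^"]+)">(?P<content>.*?)</artifact>',
    re.DOTALL,
)

artifact_store = {}  # identifier -> latest version of the artifact

def extract_artifacts(text):
    """Pull artifact blocks out of a message, store them, and replace each
    block with a link so the chat itself only ever contains references."""
    def store_and_link(match):
        artifact = {
            "identifier": match["id"],
            "type": match["type"],
            "title": match["title"],
            "content": match["content"].strip(),
        }
        # Keying by identifier means a rewritten artifact simply overwrites
        # the previous one (the "most recent version wins" policy described below).
        artifact_store[artifact["identifier"]] = artifact
        return f'<a href="#{artifact["identifier"]}">{artifact["title"]}</a>'

    return ARTIFACT_RE.sub(store_and_link, text)

The same function can be applied to user messages (step 1) as well as to assistant output and tool results (step 5).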

I must make a note on artifact extraction. In my implementation, the assistant is able to modify existing artifacts. It does this by simply rewriting the artifact in the conversation. (In the demo, this is automatically replaced with links, so the user only ever sees the artifacts in the artifact panel.) Since the assistant can rewrite artifacts, this means that there can be multiple versions of the same artifact. The way I'm handling this is to use the most recent version and delete the old version of the artifact. In a more sophisticated implementation, perhaps we would track the changes to the artifacts so that the assistant can understand their history.
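As a tiny illustration of that more sophisticated option, a version-tracking store might keep every version per identifier instead of overwriting. This is a sketch, assuming the same artifact dict shape as in the extraction sketch above.

from collections import defaultdict

artifact_history = defaultdict(list)  # identifier -> all versions, oldest first

def store_artifact(artifact):
    # Append rather than overwrite so earlier versions remain available.
    artifact_history[artifact["identifier"]].append(artifact)

def latest_artifact(identifier):
    return artifact_history[identifier][-1]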

Frontend Implementation

The frontend requires changes as well. Most notably, you need a place to present the artifacts. In the demo, this is a dedicated panel to the right of the conversation. If your UI doesn't have room for that, then there are alternatives. For instance, you can have a tabbed chat window that allows the user to flip over and see the artifacts. Or you can still incorporate the artifacts into the chat as embedded UI elements. This loses some of the benefit because your artifacts will scroll away as the conversation continues, but at least the artifacts aren't just blobs of text - they can be made into "smart" objects that the user can interact with.

The chat panel requires a small update. The backend will now return messages that include links to the artifacts. Make sure that these links look nice and, most importantly, that they reveal the corresponding item in the artifact panel when clicked.

Next, unlike with Anthropic Artifacts, why not let the user directly edit the artifacts? Make sure to capture the edits and send the updated artifacts to the backend.

Finally, I haven't done this with my demo, but if the assistant is creating and updating artifacts, it is probably important to make sure the user can understand the changes and explicitly accept or reject them. Perhaps the best approach is to follow Cursor's lead and present the users with a GitHub-style red/green diff of the changes, and a button beside each that allows the user to accept or reject the change.

Check It Out

If you'd like to see how the sausage is made, check out the repo that implements the demo here:

https://github.com/arcturus-labs/artifact-aware-assistant

Warning, it is not production-ready code!

Conclusion

If you are curious about artifact-aware assistants, fortunately they're not hard to set up! The demo that I prepared for this blog post is just the tip of the iceberg. Consider other possibilities that artifacts unlock:

  • Interactive artifacts that present themselves in a malleable user interface. For instance, a home listing complete with an image carousel and an interactive map.
  • Durable artifacts that save themselves to disk as they are created and updated, such as a modified email template.
  • Active artifacts that accomplish real work. Imagine an email artifact that sends itself with the click of a button.
  • Rich versioning, allowing the user to traverse the changes associated with this artifact and link to the portion of the conversation where the change occurred.

I bet that you can think of plenty more things!

Special thanks to Doug Turnbull, Freddie Vargus, Bryan Bischof, and Julia Neagu for providing feedback on this post.


Hey, and if you liked this post, then maybe we should be friends!

Bridging the Gap Between Keyword and Semantic Search with SPLADE

In information retrieval, we often find ourselves between two tools: keyword search and semantic search. Each has strengths and limitations. What if we could combine the best of both?

By the end of this post, you will:

  • Understand the challenges of keyword and semantic search
  • Learn about SPLADE, an approach that bridges these methods
  • See a practical implementation of SPLADE to enhance search

If you've struggled with inaccurate search results or wanted a more transparent search system, this post is for you. Let's explore how SPLADE can change your approach to information retrieval.

The Unfortunate State of the Art

With the rise of RAG methods in prompt engineering, vector-based semantic search has become essential for many applications. It's easy to see why: semantic search overcomes some key limitations of keyword search. In traditional keyword search, you might type terms that mean the same thing as the document you're seeking, but if you use different words, you won't get a match. For example, searching for "ape costume" won't find a document mentioning "gorilla suit." Semantic search, on the other hand, converts your query into a vector representing its meaning. If there's a document with a similar meaning (represented by a nearby vector), you get a match!

Semantic search seems almost magical... until it's not.

There are some gnarly challenges with semantic search that we're still grappling with:

  • Larger indexes – Keyword search indexes typically grow to 1.5x-2x the original document size. Semantic search indexes can be twice that size.
  • Chunking complexity – You need to split text into chunks because embedding quality degrades with too much input. But where do you split? Do chunks need to overlap? How do you ensure important context isn't lost?
  • Lack of transparency – With keyword search, debugging is straightforward – the tokens are human-readable, so you can understand why a document matches. You can adjust queries, field boosts, and phrase matches to tune relevance. Semantic search is opaque; if a query doesn't match as expected, it's hard to understand why. Fixing relevance often means training a new embedding model and reindexing everything. Ouch!

Wouldn't it be great to have the best of both worlds? We want semantic search's ability to match on meaning, combined with the transparency and simplicity of traditional keyword search.

Enter SPLADE

SPLADE (Sparse Lexical and Expansion Model for First Stage Ranking) was introduced in a July 2021 paper and quickly improved upon in a September follow-up. The concept is simple: instead of asking a semantic model for a meaning-carrying vector, ask it for important terms that should be in the document, whether they're actually present or not. For instance, given a document containing "ape costume," the model might identify similar terms like "gorilla orangutan monkey suit clothes." These synthetic terms can then be indexed in a traditional search engine, boosting recall when added to the search field.

In this post, we'll explore how to use SPLADE to enhance your search. We'll create a silly document set (because what fun is it to use a realistic example?), index it, and demonstrate how conventional search can fall short when query terms don't quite match. Then, we'll add SPLADE and show how it addresses this problem.

Setup

What's your favorite superhero? Superman? Wolverine? Batman? ... Mine's got to be Hindsight Lad – a computer researcher who contributed to his team by critically reviewing past decisions and explaining what they should have done instead. (Real character! Look him up!)

Hindsight Lad (image borrowed from the Marvel fandom wiki)

Inspired by Hindsight Lad, I've chosen superheroes for our example dataset. It's a simple list of superheroes including their names, true identities, descriptions, and superpowers. Here's an excerpt:

Name | True Identity | Description | Superpowers
Spider-Man | Peter Parker | A high school student bitten by a radioactive spider | Web-slinging, superhuman strength, spider-sense
Hindsight Lad | Carlton LaFroyge | A teenager with the ability to analyze past events and point out mistakes | Retroactive clairvoyance, tactical analysis of past events
Batman | Bruce Wayne | A billionaire industrialist and philanthropist | Genius-level intellect, master detective, peak human physical condition
Arm-Fall-Off Boy | Floyd Belkin | A superhero with the ability to detach his arms | Detachable arms, using detached arms as weapons (yes... another real character!)
Superman | Clark Kent | An alien from the planet Krypton | Flight, super strength, heat vision, invulnerability

To demonstrate the semantic mismatch problem, I've also generated alternative descriptions that convey the same meaning but use almost no common words:

Name | Alternate Description
Spider-Man | An adolescent scholar affected by an irradiated arachnid
Hindsight Lad | A young critic gifted with retrospective wisdom
Batman | A wealthy entrepreneur and humanitarian
Arm-Fall-Off Boy | A costumed vigilante capable of limb separation
Superman | An extraterrestrial being from a distant celestial body

Our curated list has just 50 heroes, so querying with alternate descriptions might work well for semantic search, but traditional information retrieval will likely struggle.

Indexing

Let's demonstrate this. Here is a function that will index all of our documents:

import pandas as pd
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumes a locally running cluster; adjust as needed

def index_superheroes(num_tokens=50):
    # Create the index with mappings
    index_name = "superheroes"
    mappings = {
        "mappings": {
            "dynamic": "false",
            "properties": {
                "description": {
                    "type": "text",
                    "analyzer": "english",
                },
                "splade": {
                    "type": "text",
                }
            }
        }
    }

    # delete and recreate the index
    if es.indices.exists(index=index_name):
        es.indices.delete(index=index_name)
        print(f"Index '{index_name}' deleted successfully.")
    else:
        print(f"Index '{index_name}' does not exist.")

    es.indices.create(index=index_name, body=mappings)
    print(f"Index '{index_name}' created successfully.")

    df = pd.read_csv('superheroes.csv', index_col=0)  # superhero names are the index
    # Index the superheroes
    for i, (index, row) in enumerate(df.iterrows(), start=1):
        # Combine the index (superhero name) with the row data
        full_row = pd.concat([pd.Series({'name': index}), row])
        doc = full_row.to_dict()
        doc['splade'] = get_splade_embedding(doc['description'], num_tokens)
        es.index(index=index_name, id=i, body=doc)

    print(f"Indexed {len(df)} superheroes.")

This function creates an index with two fields: description for the superhero descriptions and splade for the synthetic terms. The SPLADE content is generated by processing the description through get_splade_embedding, which we'll define next:

from transformers import AutoModelForMaskedLM, AutoTokenizer
import torch

# Load the SPLADE model and tokenizer
model_id = 'naver/splade-cocondenser-ensembledistil'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# Create a mapping from token IDs to tokens
vocab = tokenizer.get_vocab()
id2token = {v: k for k, v in vocab.items()}

def get_splade_embedding(text, num_tokens=50):
    # get the tokens
    tokens = tokenizer(text, return_tensors='pt')

    # get the splade embedding
    output = model(**tokens)
    vec = torch.max(
        torch.log(
            1 + torch.relu(output.logits)
        ) * tokens.attention_mask.unsqueeze(-1),
    dim=1)[0].squeeze()

    # Convert vec to numpy for easier manipulation
    vec_np = vec.detach().numpy()

    # Get indices of non-zero elements
    non_zero_indices = vec_np.nonzero()[0]

    # Create a list of (token, value) pairs for non-zero elements, excluding the input tokens
    token_value_pairs = [
        (id2token[idx], vec_np[idx]) 
        for idx in non_zero_indices 
        if idx not in tokens['input_ids'][0]
    ]

    # Sort by value in descending order
    token_value_pairs.sort(key=lambda x: x[1], reverse=True)

    new_tokens = [token for token, value in token_value_pairs[:num_tokens]]

    return new_tokens

This code is more complex, but builds on existing work. It's adapted from Pinecone's SPLADE writeup, with equations detailed in the SPLADEv2 paper. Essentially, it extracts tokens from input text, uses the SPLADE model to identify important terms (SPLADE tokens), filters out original tokens, converts remaining tokens to readable text, and returns the result.

Searching

What good is an index that can't be searched? Let's remedy that:

def search_superheroes(description, size, splade):
    # If SPLADE is enabled, we search both the description and SPLADE fields
    if splade:
        # Get SPLADE tokens for the description
        splade_tokens = get_tokens_as_text(description)
        query = {
            "query": {
                "bool": {
                    "should": [
                        {
                            "multi_match": {
                                "query": description,
                                "fields": ["description"]
                            }
                        },
                        {
                            "multi_match": {
                                "query": splade_tokens,
                                "fields": ["splade"]
                            }
                        }
                    ]
                }
            }
        }
    # If SPLADE is not enabled, we only search the description field
    else:
        query = {
            "query": {
                "multi_match": {
                    "query": description,
                    "fields": ["description"]
                }
            }
        }
    # Set the number of results to return
    query['size'] = size

    # Execute the search query
    response = es.search(index="superheroes", body=query)

    # Extract the hits from the response
    hits = [hit['_source'] for hit in response['hits']['hits']]
    return hits

This function searches for superheroes based on a description (which will be drawn from our alternative description list). When splade is true, it searches both description and splade fields; otherwise, only the description field.

We still need the get_tokens_as_text function to convert descriptions into SPLADE tokens. Note that this doesn't expand the description with synthetic terms; it simply tokenizes it:

def get_tokens_as_text(text):
    tokens = tokenizer(text, return_tensors='pt').input_ids[0]
    return ' '.join([id2token[i] for i in tokens.tolist()][1:-1])

Now we're ready to see if this all actually works!

Demo Time

Let's take the above code out for a spin.

First we index our superheroes with index_superheroes(num_tokens=50). Here we inject up to 50 SPLADE tokens for each row in our data set.

Next, with SPLADE turned off, let's see if we can catch Iron Man using his alternative description:

use_splade = False

hero = "Iron Man"
alt_description = hero_dict_alt[hero]
search_results = search_superheroes(alt_description, 3, use_splade)
result_heroes = [result['name'] for result in search_results]

print(result_heroes)
['Beast']

Nope... that's a miss! Well, after I've spent all this time writing a blog post, I hope that we can turn SPLADE back on and see Iron Man in the results.
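Flipping the flag and rerunning the same search (this just reuses the snippet above):

use_splade = True

search_results = search_superheroes(alt_description, 3, use_splade)
result_heroes = [result['name'] for result in search_results]

print(result_heroes)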

['Black Panther', 'Iron Man', 'Beast']

Yay! I mean, I would have preferred that Iron Man was number 1 in the search results. But being in the top 3 results out of 50 for something as generic as "A brilliant innovator and corporate magnate" is not bad.

But perhaps we were lucky with this example. Let's create a new function recall_at_3 that will run through every hero and see if SPLADE is actually helping us improve recall.

def recall_at_3(splade):
    counter = 0
    for hero in hero_dict.keys():
        alt_description = hero_dict_alt[hero]
        search_results = search_superheroes(alt_description, 3, splade)
        result_heroes = [result['name'] for result in search_results]
        # Check if the hero is in the top 3 search results
        if hero in result_heroes:
            counter += 1

    # Calculate and return the recall@3 score
    return counter / len(hero_dict.keys())

First we test without SPLADE recall_at_3(False) and see that the recall is 28% – as expected, not great. Now with SPLADE recall_at_3(True) returns (... drum roll please ...) 52%.

Alright! (Whew!) So by injecting synthetic tokens into our indexed documents we have improved recall (recall@3 to be precise) by a hefty 24 percentage points!

Retrospective

I can feel my inner Hindsight Lad jumping up and down in my head. It's time to take a closer, more critical look at what we just accomplished. SPLADE is definitely neat, but it doesn't fix all of the problems we've identified with semantic search.

Hindsight Lad (image borrowed from some guy on X)

We've improved recall, but in a longer blog post (which I shall never write) we would also look at how precision changes. The problem is that sometimes the synthetic tokens produced in get_splade_embedding can be... wonky. Take a look at this example:

get_splade_embedding("mary had a little lamb, it's fleece was white as snow", 15)
['marriage',
 'married',
 'winter',
 'song',
 'wedding',
 'have',
 'sheep',
 'whites',
 'baby',
 'like',
 'color',
 'wearing',
 'film',
 'character',
 'murder']

There's a lot going on here. We start off with several words related to marriage (which is not mentioned in the original song) and then right at the end it takes a darker turn with murder. You know how the rest of that song goes, and these words are clearly a miss. There are also a couple of stop words (super common words) in there: have and like. These will definitely increase recall, since they will match about half of the docs in the index, but they will take their toll on precision.

Next, my SPLADE implementation in Elasticsearch is oversimplified. If you scroll back up to get_splade_embedding, we extract non-zero elements from vec_np (the SPLADE tokens) but discard their associated weights. This is a missed opportunity. The SPLADE papers use these weights for scoring matches. Incorporating this nuance – for instance, the fact that murder is less relevant to Mary than sheep, song, baby, and white – would significantly enhance precision.
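If I were to take a crack at it, Elasticsearch's rank_features field type seems like a natural fit: it stores a sparse map of token-to-weight, and the rank_feature query scores matches using those weights. Here's a rough, untested sketch of what that might look like, reusing the tokenizer, model, and id2token defined earlier. The function names and mapping choice are my own, not part of the demo code above.

# Sketch: keep the SPLADE weights instead of discarding them.
# The index mapping would declare the field as weighted features, e.g.
#     "splade": {"type": "rank_features"}

def get_splade_weights(text, num_tokens=50):
    # Same computation as get_splade_embedding, but return token -> weight.
    tokens = tokenizer(text, return_tensors='pt')
    output = model(**tokens)
    vec = torch.max(
        torch.log(1 + torch.relu(output.logits))
        * tokens.attention_mask.unsqueeze(-1),
        dim=1,
    )[0].squeeze()
    vec_np = vec.detach().numpy()
    pairs = sorted(
        ((id2token[idx], float(vec_np[idx])) for idx in vec_np.nonzero()[0]),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return dict(pairs[:num_tokens])

def splade_rank_feature_query(text, num_tokens=50):
    # Each weighted token becomes a rank_feature clause so that heavier
    # tokens (e.g. "sheep") count for more than lighter ones (e.g. "murder").
    weights = get_splade_weights(text, num_tokens)
    return {
        "query": {
            "bool": {
                "should": [
                    {"rank_feature": {"field": f"splade.{token}", "boost": weight}}
                    for token, weight in weights.items()
                ]
            }
        }
    }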

Finally, one of the problems with semantic search that we were trying to avoid is the complexity of dealing with the embedding model when it doesn't quite do what you want it to do. When an embedding model doesn't match the correct documents, your only option is to retrain the model, reindex, and hope. But with SPLADE, if it thinks that Mary likes murder, our options aren't much better. The main benefit of SPLADE in this case is that you can actually see the words produced by the model (rather than an opaque vector). This makes it easier to debug the problem and improve it. ... Maybe SPLADE's training data had too many references to Mary I of England (you know... "Bloody Mary").

Conclusion

SPLADE is a promising approach that bridges the gap between traditional keyword search and modern semantic search. And this is a good thing! In many ways, good ol' keyword search is the right tool because it's relatively simple, it's well understood, and it's easy to scale and maintain. But traditional keyword search still falls short when it comes to matching on meaning.

This post is begging for follow-up posts:

  • How does my implementation of SPLADE+Elasticsearch affect precision?
  • How does semantic search perform against my implementation of SPLADE+Elasticsearch?
  • Can we improve SPLADE+Elasticsearch? I want to see how tough it is to get the SPLADE weights into the Elasticsearch scoring.
  • Did you know that Elasticsearch offers a SPLADE-like solution called ELSER? I wonder how that compares with the solution presented here.

If you're interested in hearing more about this topic, then let me know. We could write a post together about it.

Before You Go: Exciting News!

While we're on the topic of innovative technologies, I'm thrilled to announce that I'm authoring a book on LLM Application Development, set to release in November 2024. This book distills years of experience building production-grade LLM applications at GitHub and for various consulting clients into practical, actionable insights.


What's in it for you?

  • Insider tips and tricks from real-world LLM projects
  • Strategies to overcome common challenges in LLM application development
  • A comprehensive guide to building robust, scalable LLM solutions

Are you currently working on an LLM application and facing roadblocks? Or perhaps you're looking to leverage LLMs in your next big project? I'd be delighted to lend my expertise. Reach out to me at jfberryman﹫gmail‧com for consulting inquiries or just to chat about the exciting world of LLMs!

Let's push the boundaries of what's possible with LLMs together!