David•6mo ago

KB Chunk Score matches technically, but semantically it misses the point

Hey! I’m running into a challenge with the Knowledge Base chunk scoring. Technically it works as expected, but practically, it's not returning what I actually need.

Example: I asked: “Do you have a snowmobile?” Voiceflow returned the top 10 chunks, all related to snowmobile parts (e.g. suspension kits, windshields, etc.). These chunks mention the word snowmobile several times, so I get why the score is high. But none of them answer the actual question: whether we sell complete snowmobiles. So technically the ranking is correct, but semantically it misses the intent.

🔍 What I’m looking for
- Is there a way to bias chunk ranking toward intent match, not just keyword overlap?
- Any way to boost certain metadata (like type: product vs type: spare-part) during scoring?

2 Replies

Tico | Voiceflow Assistant•6mo ago

Here’s how you can get more “intent-aware” results from the Knowledge Base.

1️⃣ Add & use metadata
• When you upload a doc (file, URL or table) you can attach any key/value metadata, e.g. {"type":"product"} vs {"type":"spare-part"}.
• At run-time you can filter the search with that metadata so only the desired chunks are considered:

{
  "question": "Do you sell snowmobiles?",
  "chunkLimit": 5,
  "filters": {
    "type": { "$eq": "product" }
  }
}
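
The request body above can also be assembled in code. This is only a minimal sketch: `buildQueryBody` is a hypothetical helper (not part of any Voiceflow SDK), the metadata key "type" is taken from the example, and the assumption that `$in` takes an array of allowed values should be checked against the metadata filter docs.

```javascript
// Hypothetical helper: builds a KB query body restricted to certain metadata types.
// Assumptions: the metadata key is "type"; $eq takes one value, $in takes an array.
function buildQueryBody(question, types, chunkLimit = 5) {
  // One allowed type -> exact match; several -> membership test.
  const typeFilter = types.length === 1 ? { $eq: types[0] } : { $in: types };
  return {
    question,
    chunkLimit,
    filters: { type: typeFilter },
  };
}

const body = buildQueryBody("Do you sell snowmobiles?", ["product"]);
console.log(JSON.stringify(body, null, 2));
```

Called with a single type, this produces the same JSON as the example above.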


The filter is part of the KB Query API (and is also available in the Knowledge Base > Advanced > Query API step). Logical operators $and, $or, $in, etc. let you combine multiple signals (price, availability, tags…). See docs: https://developer.voiceflow.com/docs/filter-with-metadata

2️⃣ Call the KB Query API directly (optional)
If you need even finer control (e.g. to do your own re-ranking), you can bypass the built-in retrieval, call POST /knowledge-base/query, inspect the chunks and decide what to surface. Docs: https://developer.voiceflow.com/reference/post_knowledge-base-query

3️⃣ Structure the content to help the embedder
Similarity search leans on word overlap, so adding a short sentence that directly answers the “do you sell whole snowmobiles?” question, or splitting products and parts into separate docs, increases the embedding similarity and improves the default ranking without filters.

➤ Key takeaway
You can’t currently change the raw similarity-score formula, but you can steer which chunks are even eligible via metadata filters or by pre-structuring your docs. In most commerce KBs, a type/category metadata field combined with a filter like the JSON above is the simplest way to make the KB return complete-unit products instead of parts.

Hope that helps! More details in the docs above or at https://docs.voiceflow.com
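
For the "do your own re-ranking" route, a rough sketch of boosting by metadata might look like the following. Everything here is an assumption for illustration: the chunk shape ({ score, content, metadata: { type } }), the `rerankChunks` helper, and the boost weights are all hypothetical, and the real response from the query endpoint may be structured differently.

```javascript
// Sketch: client-side re-ranking of KB chunks with a per-type score boost.
// Assumption: each chunk looks like { score, content, metadata: { type } }.
const TYPE_BOOST = { product: 0.2, "spare-part": -0.1 }; // hypothetical weights

function rerankChunks(chunks, boosts = TYPE_BOOST) {
  return chunks
    .map((chunk) => {
      const type = chunk.metadata?.type;
      const boost = boosts[type] ?? 0; // unknown or missing type: no adjustment
      return { ...chunk, adjustedScore: chunk.score + boost };
    })
    .sort((a, b) => b.adjustedScore - a.adjustedScore); // highest adjusted score first
}

// Made-up sample data, only to show the effect of the boost.
const sample = [
  { content: "Windshield for snowmobile", score: 0.82, metadata: { type: "spare-part" } },
  { content: "Snowmobile, complete unit", score: 0.75, metadata: { type: "product" } },
];
const ranked = rerankChunks(sample);
console.log(ranked.map((c) => c.content));
```

The original similarity score is left untouched and the boost is stored separately as `adjustedScore`, so the weights can be tuned without losing what the KB actually returned.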

DavidOP•6mo ago

export default async function main(args) {
  const { question, VFapiKey } = args.inputVars;
  const url = 'https://general-runtime.voiceflow.com/knowledge-base/query';

  // Request body: return raw chunks (no synthesis), limited to item_type "Vehicle"
  const data = {
    chunkLimit: 10,
    synthesis: false,
    settings: {
      model: "claude-3.5-sonnet",
      temperature: 0.1,
      system: "You are an AI Product Finder. Your task is to retrieve all product data matching the users query and only choose the product chunks"
    },
    question: question,
    filters: {
      item_type: { $eq: "Vehicle" }
    }
  };

  try {
    const response = await fetch(url, {
      method: 'POST',
      headers: {
        'accept': 'application/json',
        'content-type': 'application/json',
        'Authorization': VFapiKey,
        'OpenAI-Beta': 'assistants=v1'
      },
      body: JSON.stringify(data)
    });

    if (!response.ok) {
      throw new Error(`HTTP error! status: ${response.status}`);
    }

    const responseBody = await response.json; // ← important!
    let chunks = responseBody.chunks || [];

    // Drop very low-scoring chunks, then strip fields that are not needed downstream
    let filteredChunks = chunks.filter(chunk => chunk.score > 0.1);
    let cleanedChunks = filteredChunks.map(({ score, chunkID, documentID, tags, ...rest }) => rest);

    return {
      outputVars: { cleaned_chunks: JSON.stringify(cleanedChunks) },
      next: { path: 'success' },
      trace: [
        { type: 'debug', payload: { message: `Chunks received: ${JSON.stringify(cleanedChunks)}` } }
      ],
    };
  } catch (error) {
    return {
      outputVars: { error: `An error occurred: ${error.message}` },
      next: { path: 'error' },
      trace: [
        { type: 'text', payload: { message: `An error occurred: ${error.message}` } }
      ],
    };
  }
}

It doesn't find any chunks, even though the metadata has an "item_type" field with the values "Vehicle" and "Parts".
