KB sync
Hello everyone. I have set my Knowledge Base to update daily. Whenever it refreshes, it deletes the metadata that was attached to that source. Is there a way to keep the metadata? Thanks
11 Replies
At the moment the automatic “Daily Sync” for URL-based Knowledge-Base sources works by completely replacing the document each time it crawls the page. Because the document is recreated, any metadata you had previously written to the document / its chunks is also replaced.
What you can do today
1. Keep the daily sync turned on, but re-apply the metadata right after each refresh with the Management API:
- PATCH /v1/knowledge-base/docs/{documentID} to update the whole document’s metadata, or
- PATCH /v1/knowledge-base/docs/{documentID}/chunk/{chunkID} if you only need to tag specific chunks.
2. Turn off the automatic sync and instead run your own scheduled script that either
a) calls PUT /v1/knowledge-base/docs/{documentID}/upload (re-uploading the URL) with the same metadata in the request body, or
b) uploads a fresh document and deletes the old one, again supplying the metadata in the same call.
Until metadata-preserving sync is added to the product, one of the two work-arounds above is required.
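As a rough illustration of work-around 1, here is a minimal Python sketch that builds the PATCH request for a document or a single chunk. Only the two paths come from this thread. The base URL `https://api.voiceflow.com`, the `Authorization` header format, and the `{"data": {"metadata": ...}}` body shape are assumptions, and the helper names are made up for the example, so check everything against the reference docs before relying on it.

```python
import json
import urllib.request

# Assumption: the API host is not stated in this thread.
API_BASE = "https://api.voiceflow.com"


def build_metadata_patch(document_id, metadata, api_key, chunk_id=None):
    """Build (but do not send) a PATCH request for a document or one chunk."""
    path = f"/v1/knowledge-base/docs/{document_id}"
    if chunk_id is not None:
        path += f"/chunk/{chunk_id}"
    # Assumption: the request body shape is a guess; verify it in the API reference.
    body = json.dumps({"data": {"metadata": metadata}}).encode("utf-8")
    return urllib.request.Request(
        API_BASE + path,
        data=body,
        method="PATCH",
        headers={"Authorization": api_key, "Content-Type": "application/json"},
    )


def reapply_metadata(document_id, metadata, api_key, chunk_id=None):
    """Send the PATCH and return the HTTP status code."""
    req = build_metadata_patch(document_id, metadata, api_key, chunk_id)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A scheduled job could call `reapply_metadata(...)` for each document shortly after the daily sync has run. Keeping the request builder separate from the network call makes it easy to inspect the URL and body before anything is sent.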
Docs for reference
• Update Document Metadata – https://developer.voiceflow.com/reference/patch_v1-knowledge-base-docs-documentid
• Update Chunk Metadata – https://developer.voiceflow.com/reference/patch_v1-knowledge-base-docs-documentid-chunk-chunkid
• Replace Document (URL) – https://developer.voiceflow.com/reference/put_v1-knowledge-base-docs-documentid-upload
Feel free to add a feature request in Discord → #feature-requests so the team can track interest in persisting metadata during auto-sync.
There is a known bug with the daily update. I saw somewhere VF is looking into it.
Here: https://discord.com/channels/1079548823610871889/1384548571822686329/1392783381120680057
@W. Williams (SFT) that is a different kind of issue.
My issue is that the refresh also deletes my metadata. Is it part of the bug, or is there a way to fix it on my side?
sorry, I missed that. @NiKo | Voiceflow @Braden (Voiceflow CEO)
Shared with the team so they can investigate 👍🏻
@hrubasek Could you give more details on how you process the docs?
Are you uploading URLs and setting a refresh rate from the UI, then adding metadata to the docs using the API? Or do you add URLs + metadata using the API and then set the refresh rate in the UI?
Update: a fix will be released on Monday
@NiKo | Voiceflow thank you!
Fix has been pushed, can you give it another try on your end when you get the chance @hrubasek?
@NiKo | Voiceflow works perfectly, thanks! Is there a way to restore my metadata? I manually added around 1500 metadata entries and now they are all gone due to this bug. Thanks for the additional help
Sadly not, I'm afraid. Were you not using a custom automation to update the URLs + metadata?
If you have a source for the metadata, you could maybe write a quick script to update the docs with it, based on the doc URL?
@NiKo | Voiceflow I was not using any automation or scripts. I am not really experienced with this kind of thing just yet. If someone could help me do this I would really appreciate it, since I really don't want to do all that manually again due to this bug...
Do you have a list of URLs along with their associated metadata?
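If such a list exists, the matching step of the "quick script" could look roughly like the Python sketch below. It assumes a CSV with `url` and `metadata` columns (the metadata stored as a JSON string) and a list of docs that each carry a `documentID` and a `url`. Those field names, the CSV layout, and the function names are all hypothetical and are not taken from the Voiceflow API. The sketch only works out which document gets which metadata; the actual update would still go through the PATCH endpoint mentioned earlier in the thread.

```python
import csv
import io
import json


def load_metadata_by_url(csv_text):
    """Parse rows of url,metadata (metadata as a JSON string) into {url: dict}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["url"].strip(): json.loads(row["metadata"]) for row in reader}


def normalize(url):
    """Ignore surrounding whitespace and a trailing slash when comparing URLs."""
    return url.strip().rstrip("/")


def plan_updates(docs, metadata_by_url):
    """Match docs to metadata by URL.

    docs is a list of {"documentID": ..., "url": ...} dicts (hypothetical shape).
    Returns (updates, unmatched): updates is a list of (documentID, metadata)
    pairs, unmatched is the list of doc URLs with no metadata in the CSV.
    """
    lookup = {normalize(u): m for u, m in metadata_by_url.items()}
    updates, unmatched = [], []
    for doc in docs:
        meta = lookup.get(normalize(doc["url"]))
        if meta is None:
            unmatched.append(doc["url"])
        else:
            updates.append((doc["documentID"], meta))
    return updates, unmatched
```

Reviewing the `unmatched` list before sending any updates is a cheap way to catch URLs that changed or were mistyped in the spreadsheet.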