Knowledge Synchronization
Automate the synchronization of knowledge from your systems of record into Sharely.ai, enabling AI agents to deliver unified insights across all your organizational content.
What is Knowledge Synchronization?
Knowledge Synchronization is the practice of programmatically keeping Sharely.ai's knowledge base in sync with your source systems through automated workflows using the Sharely.ai API.
Key capabilities:
- Multi-source aggregation - Unify knowledge from CMSs, file storage, databases, and external systems
- Automated reconciliation - Add, update, and remove knowledge based on your source of truth
- Idempotent operations - Run sync workflows repeatedly without side effects
- Temporal-powered reliability - Built on Temporal workflows for resilient, restartable processes
- YAML-driven configuration - Declarative source of truth that's version-controlled and auditable
- Role-based synchronization - Automatically apply RBAC rules during sync
When to Use Knowledge Sync
Use Knowledge Synchronization when you want to:
- ✅ Aggregate knowledge from multiple siloed systems into a unified AI-accessible knowledge base
- ✅ Automate content updates from your CMS, file storage, or databases
- ✅ Maintain Sharely.ai knowledge in sync with your system of record
- ✅ Apply consistent metadata and role-based access control at scale
- ✅ Version control your knowledge configuration with Git
- ✅ Eliminate manual upload workflows and human error
Common use cases:
- Professional associations - Sync research libraries, videos, podcasts, and member resources from multiple platforms
- Enterprise documentation - Keep internal knowledge base synchronized with Confluence, SharePoint, or Notion
- Content publishers - Automatically sync CMS content (WordPress, Contentful, Strapi) to power AI assistants
- Educational institutions - Aggregate course materials, recordings, and resources from learning management systems
- Healthcare organizations - Synchronize medical literature, training videos, and clinical guidelines from diverse sources
How It Works
Knowledge Synchronization follows a simple reconciliation pattern:
1. Define Source of Truth
Create a YAML configuration file listing all knowledge that should exist in Sharely.ai:
# knowledge-config.yaml
workspace_id: "your-workspace-uuid"
organization_id: "your-org-id"
knowledge:
  - source_path: "azure://research/diabetes-guidelines-2024.pdf"
    title: "Diabetes Treatment Guidelines 2024"
    type: "FILE"
    language: "en"
    roles: ["medical-professionals"]
  - source_path: "wordpress://blog/ai-in-healthcare"
    title: "AI Applications in Modern Healthcare"
    type: "LINK"
    url: "https://nmea.org/blog/ai-in-healthcare"
    roles: ["all-members"]
  - source_path: "vimeo://videos/cme-cardiology-101"
    title: "CME: Cardiology Fundamentals"
    type: "LINK"
    url: "https://vimeo.com/nmea/cardiology-101"
    roles: ["medical-professionals", "students"]
2. Fetch Current State
Query Sharely.ai to get all existing knowledge in your workspace:
const response = await fetch(
`https://api.sharely.ai/v1/workspaces/${workspaceId}/knowledge?limit=100`,
{
headers: {
'Authorization': `Bearer ${apiToken}`,
'Content-Type': 'application/json'
}
}
);
const existingKnowledge = await response.json();
3. Reconcile Differences
Compare your YAML configuration (desired state) with existing knowledge (current state):
// Build index of existing knowledge by source_path
const existingMap = {};
existingKnowledge.items.forEach(item => {
if (item.metadata?.source_path) {
existingMap[item.metadata.source_path] = item;
}
});
// Determine what to add, update, or delete
const toAdd = [];
const toDelete = [];
yamlConfig.knowledge.forEach(item => {
if (!existingMap[item.source_path]) {
toAdd.push(item); // Not in Sharely, needs to be added
}
});
Object.keys(existingMap).forEach(sourcePath => {
if (!yamlConfig.knowledge.find(k => k.source_path === sourcePath)) {
toDelete.push(existingMap[sourcePath]); // In Sharely but not in YAML
}
});
4. Apply Changes
Create new knowledge items and remove orphaned ones:
// Add missing knowledge
for (const item of toAdd) {
await createKnowledge(item);
}
// Remove orphaned knowledge
for (const item of toDelete) {
await deleteKnowledge(item.knowledgeId);
}
5. Run Idempotently
The sync script can be run multiple times safely:
- Existing items aren't duplicated (matched by source_path)
- Deletes only happen for items truly missing from YAML
- Temporal workflows ensure operations can be restarted without corruption
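The add/delete logic above can be factored into a pure helper, which makes the idempotence claim easy to check: feeding the function the state it just produced yields an empty delta. A minimal sketch (computeDelta is an illustrative helper, not part of the Sharely.ai API):

```javascript
// Compute the delta between desired state (YAML) and current state (Sharely.ai).
// Items are matched solely by source_path, so repeated runs are idempotent.
function computeDelta(desired, current) {
  const currentPaths = new Set(
    current
      .filter(item => item.metadata?.source_path)
      .map(item => item.metadata.source_path)
  );
  const desiredPaths = new Set(desired.map(item => item.source_path));

  return {
    toAdd: desired.filter(item => !currentPaths.has(item.source_path)),
    toDelete: current.filter(
      item =>
        item.metadata?.source_path &&
        !desiredPaths.has(item.metadata.source_path)
    )
  };
}

// Example: one item to add, one orphan to delete.
const desired = [{ source_path: 'wordpress://post-1', title: 'A' }];
const current = [{ knowledgeId: 'k2', metadata: { source_path: 'azure://old.pdf' } }];
const delta = computeDelta(desired, current);
// delta.toAdd holds the WordPress post; delta.toDelete holds the orphaned Azure item
```

Because the function has no side effects, it is easy to unit-test before wiring it to real API calls.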
Real-World Example: National Medical Education Association
The Challenge
The National Medical Education Association (NMEA) serves 50,000+ healthcare professionals with educational content spread across multiple systems:
- Azure Blob Storage - 500GB of medical research PDFs, clinical guidelines, and studies
- WordPress - News, blog posts, and member announcements
- Vimeo - 1,000+ hours of continuing medical education (CME) videos
- Podcast Platform - Audio interviews with medical experts and case studies
- External Links - PubMed articles, journal references, clinical trial databases
Problem: Members couldn't find relevant information across these siloed systems. Search was fragmented, and AI assistants couldn't provide unified insights.
The Solution
NMEA implemented Knowledge Synchronization to create a unified knowledge base in Sharely.ai:
- Created YAML source of truth defining all content across systems
- Built sync script running hourly via cron job
- Applied role-based access - students see basic content, professionals see everything
- Enabled AI agents to deliver insights from the entire knowledge corpus
Implementation
knowledge-config.yaml:
workspace_id: "nmea-workspace-uuid"
organization_id: "nmea-org-id"
knowledge:
  # Azure Blob - Research PDFs
  - source_path: "azure://research/diabetes-guidelines-2024.pdf"
    title: "Diabetes Treatment Guidelines 2024"
    type: "FILE"
    azure_blob_url: "https://nmeastorage.blob.core.windows.net/research/diabetes-2024.pdf"
    language: "en"
    roles: ["medical-professionals"]
  - source_path: "azure://research/cardiology-best-practices.pdf"
    title: "Cardiology Best Practices Compendium"
    type: "FILE"
    azure_blob_url: "https://nmeastorage.blob.core.windows.net/research/cardiology.pdf"
    language: "en"
    roles: ["medical-professionals", "cardiologists"]
  # WordPress - Blog content
  - source_path: "wordpress://blog/ai-healthcare-2024"
    title: "The Future of AI in Healthcare"
    type: "LINK"
    url: "https://nmea.org/blog/ai-healthcare-2024"
    roles: ["all-members"]
  # Vimeo - CME Videos
  - source_path: "vimeo://cme/cardiology-fundamentals"
    title: "CME: Cardiology Fundamentals (12 credits)"
    type: "LINK"
    url: "https://vimeo.com/nmea/cardiology-fundamentals"
    roles: ["medical-professionals", "students"]
  # Podcast - Audio content
  - source_path: "podcast://expert-interviews/ep-42-immunology"
    title: "Expert Interview: Advances in Immunology"
    type: "LINK"
    url: "https://nmea-podcasts.com/episodes/42"
    roles: ["all-members"]
  # External links - Journal articles
  - source_path: "pubmed://article-12345678"
    title: "Novel Approaches to Cancer Immunotherapy"
    type: "LINK"
    url: "https://pubmed.ncbi.nlm.nih.gov/12345678/"
    roles: ["medical-professionals", "researchers"]
sync-script.js:
const yaml = require('js-yaml');
const fs = require('fs');
const fetch = require('node-fetch');
const WORKSPACE_ID = process.env.SHARELY_WORKSPACE_ID;
const ORGANIZATION_ID = process.env.SHARELY_ORGANIZATION_ID;
const API_KEY = process.env.SHARELY_API_KEY;
async function syncKnowledge() {
console.log('Starting knowledge synchronization...');
// 1. Load YAML configuration
const config = yaml.load(fs.readFileSync('./knowledge-config.yaml', 'utf8'));
// 2. Get API token
const apiToken = await generateAPIToken();
// 3. Fetch existing knowledge
const existingKnowledge = await fetchAllKnowledge(apiToken);
// 4. Build index by source_path
const existingMap = {};
existingKnowledge.forEach(item => {
if (item.metadata?.source_path) {
existingMap[item.metadata.source_path] = item;
}
});
// 5. Reconcile: determine what to add and delete
const toAdd = [];
const toDelete = [];
config.knowledge.forEach(item => {
if (!existingMap[item.source_path]) {
toAdd.push(item);
}
});
Object.keys(existingMap).forEach(sourcePath => {
const inConfig = config.knowledge.find(k => k.source_path === sourcePath);
if (!inConfig) {
toDelete.push(existingMap[sourcePath]);
}
});
// 6. Apply changes
console.log(`Adding ${toAdd.length} new knowledge items...`);
for (const item of toAdd) {
await createKnowledgeItem(apiToken, item);
}
console.log(`Removing ${toDelete.length} orphaned knowledge items...`);
for (const item of toDelete) {
await deleteKnowledgeItem(apiToken, item.knowledgeId);
}
console.log('Synchronization complete!');
}
async function generateAPIToken() {
const response = await fetch(
`https://api.sharely.ai/workspaces/${WORKSPACE_ID}/generate-access-key-token`,
{
method: 'POST',
headers: {
'x-api-key': API_KEY,
'Content-Type': 'application/json'
}
}
);
const data = await response.json();
return data.token;
}
async function fetchAllKnowledge(apiToken) {
const allKnowledge = [];
let offset = 0;
const limit = 100;
while (true) {
const response = await fetch(
`https://api.sharely.ai/v1/workspaces/${WORKSPACE_ID}/knowledge?limit=${limit}&offset=${offset}`,
{
headers: {
'Authorization': `Bearer ${apiToken}`,
'Content-Type': 'application/json'
}
}
);
const data = await response.json();
allKnowledge.push(...data.items);
if (data.items.length < limit) break;
offset += limit;
}
return allKnowledge;
}
async function createKnowledgeItem(apiToken, item) {
const payload = {
type: item.type,
title: item.title,
language: item.language,
metadata: {
source_path: item.source_path
}
};
// Add URL for LINK types
if (item.type === 'LINK' && item.url) {
payload.url = item.url;
}
// Add file URL for FILE types (if syncing from Azure Blob or similar)
if (item.type === 'FILE' && item.azure_blob_url) {
payload.url = item.azure_blob_url;
}
const response = await fetch(
`https://api.sharely.ai/v1/workspaces/${WORKSPACE_ID}/knowledge`,
{
method: 'POST',
headers: {
'Authorization': `Bearer ${apiToken}`,
'organizationId': ORGANIZATION_ID,
'Content-Type': 'application/json'
},
body: JSON.stringify(payload)
}
);
const result = await response.json();
console.log(`Created: ${item.title} (${result.knowledgeId})`);
// Apply roles if specified
if (item.roles && item.roles.length > 0) {
await assignRoles(apiToken, result.knowledgeId, item.roles);
}
return result;
}
async function assignRoles(apiToken, knowledgeId, roleNames) {
// Note: This assumes roles already exist in workspace
// In production, you'd resolve role names to role IDs
const roleIds = await resolveRoleIds(apiToken, roleNames);
await fetch(
`https://api.sharely.ai/v1/workspaces/${WORKSPACE_ID}/knowledge/${knowledgeId}/role`,
{
method: 'POST',
headers: {
'Authorization': `Bearer ${apiToken}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({ roleIds })
}
);
}
async function resolveRoleIds(apiToken, roleNames) {
// Simplified: In production, query roles API to map names to IDs
return roleNames; // Placeholder
}
async function deleteKnowledgeItem(apiToken, knowledgeId) {
await fetch(
`https://api.sharely.ai/v1/workspaces/${WORKSPACE_ID}/knowledge/${knowledgeId}`,
{
method: 'DELETE',
headers: {
'Authorization': `Bearer ${apiToken}`
}
}
);
console.log(`Deleted: ${knowledgeId}`);
}
// Run sync
syncKnowledge().catch(console.error);
Result:
- ✅ All 5 knowledge sources unified in Sharely.ai
- ✅ AI agents now provide insights across entire corpus
- ✅ Members find information instantly, regardless of original source
- ✅ Role-based access ensures appropriate content visibility
- ✅ Hourly sync keeps knowledge current
- ✅ Idempotent operations mean restarts are safe
Key Concepts
Temporal Workflows & Idempotency
Why it matters: Knowledge synchronization involves multiple API calls, each of which could fail due to network issues, rate limits, or service disruptions.
Temporal workflows power many Sharely.ai API operations (especially role assignment and file processing), providing:
- Automatic retries - Failed operations retry automatically without manual intervention
- Durable execution - Workflows survive service restarts and crashes
- Idempotent operations - Running the same operation multiple times has no side effects
- Eventual consistency - Operations complete reliably, even if they take time
For sync scripts, this means:
- ✅ You can restart your sync script at any time without corrupting data
- ✅ Duplicate knowledge items won't be created (matched by source_path in metadata)
- ✅ Role assignments eventually succeed even if they're queued
- ✅ Your sync job is bulletproof against transient failures
Best practice: Always include a unique identifier (like source_path) in knowledge metadata to enable idempotent reconciliation.
Source of Truth Pattern
The source of truth is a single, authoritative configuration that defines what should exist in Sharely.ai.
Why YAML?
- Declarative - Describes desired state, not imperative steps
- Version controlled - Track changes in Git, enable rollbacks
- Human-readable - Easy to audit and review
- Diff-friendly - See exactly what changed between versions
Pattern:
# This file IS the truth
# Sharely.ai should match this exactly
knowledge:
  - source_path: "system-a/doc-1"
    title: "Document 1"
  - source_path: "system-b/doc-2"
    title: "Document 2"
Sync script responsibility:
- Read the YAML (source of truth)
- Read Sharely.ai (current state)
- Make Sharely.ai match the YAML
Benefits:
- ✅ Single source of truth for all knowledge
- ✅ Audit trail via Git history
- ✅ Rollback capability (revert YAML, re-sync)
- ✅ Clear separation: YAML = what, script = how
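Because a bad source-of-truth file can trigger mass deletions on the next sync, it's worth validating the parsed config before reconciling. A minimal sketch over a plain parsed object (the checks mirror the fields used in this guide; adjust to your schema):

```javascript
// Validate a parsed knowledge-config object before reconciling.
// Returns a list of human-readable problems; an empty list means valid.
function validateConfig(config) {
  const problems = [];
  if (!config.workspace_id) problems.push('missing workspace_id');
  if (!Array.isArray(config.knowledge)) {
    problems.push('knowledge must be a list');
    return problems;
  }
  const seen = new Set();
  config.knowledge.forEach((item, i) => {
    if (!item.source_path) {
      problems.push(`item ${i}: missing source_path`);
    } else if (seen.has(item.source_path)) {
      problems.push(`item ${i}: duplicate source_path ${item.source_path}`);
    } else {
      seen.add(item.source_path);
    }
    if (!item.title) problems.push(`item ${i}: missing title`);
    if (item.type === 'LINK' && !item.url) problems.push(`item ${i}: LINK requires url`);
  });
  return problems;
}
```

A sync script can abort (and alert) when the list is non-empty instead of reconciling against a broken desired state.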
Reconciliation Cycle
Reconciliation is the process of making the current state (Sharely.ai) match the desired state (YAML configuration). The process compares what should exist according to your source of truth against what actually exists in Sharely.ai, then applies the necessary changes to bring them into alignment.
Three-step reconciliation:
- Add missing - Items in YAML but not in Sharely.ai → CREATE
- Remove orphaned - Items in Sharely.ai but not in YAML → DELETE
- Update changed - Items in both but with different metadata → UPDATE (optional)
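The third step is optional because it requires deciding what counts as "changed". One hedged approach is a field-by-field comparison over the attributes you sync (the helper and field list below are illustrative):

```javascript
// Fields that, when different, mean the Sharely.ai item should be updated.
const COMPARED_FIELDS = ['title', 'url', 'language'];

// Return items present in both states whose compared fields differ.
// desiredByPath / currentByPath map source_path -> item.
function findChanged(desiredByPath, currentByPath) {
  const changed = [];
  for (const path of Object.keys(desiredByPath)) {
    const current = currentByPath[path];
    if (!current) continue; // missing items are handled by the ADD step
    const differs = COMPARED_FIELDS.some(
      field =>
        desiredByPath[path][field] !== undefined &&
        desiredByPath[path][field] !== current[field]
    );
    if (differs) changed.push({ path, desired: desiredByPath[path] });
  }
  return changed;
}
```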
Implementation:
// Build maps
const yamlMap = {};
config.knowledge.forEach(item => {
yamlMap[item.source_path] = item;
});
const sharelyMap = {};
existingKnowledge.forEach(item => {
if (item.metadata?.source_path) {
sharelyMap[item.metadata.source_path] = item;
}
});
// Reconcile
for (const sourcePath in yamlMap) {
if (!sharelyMap[sourcePath]) {
// ADD: In YAML but not Sharely
await createKnowledge(yamlMap[sourcePath]);
}
}
for (const sourcePath in sharelyMap) {
if (!yamlMap[sourcePath]) {
// DELETE: In Sharely but not YAML
await deleteKnowledge(sharelyMap[sourcePath].knowledgeId);
}
}
CMS Integration Patterns
WordPress
Sync blog posts, pages, and media from WordPress:
knowledge:
  - source_path: "wordpress://post-123"
    title: "10 Healthcare Trends to Watch in 2024"
    type: "LINK"
    url: "https://nmea.org/blog/healthcare-trends-2024"
    wordpress_post_id: 123
    roles: ["all-members"]
WordPress API integration:
// Fetch posts from WordPress REST API
const wpPosts = await fetch('https://nmea.org/wp-json/wp/v2/posts').then(r => r.json());
// Transform to YAML format
const knowledgeItems = wpPosts.map(post => ({
source_path: `wordpress://post-${post.id}`,
title: post.title.rendered,
type: 'LINK',
url: post.link,
wordpress_post_id: post.id,
roles: ['all-members']
}));
Contentful
Sync structured content from Contentful headless CMS:
knowledge:
  - source_path: "contentful://entry-abc123"
    title: "Understanding Chronic Kidney Disease"
    type: "LINK"
    url: "https://nmea.org/conditions/chronic-kidney-disease"
    contentful_entry_id: "abc123"
    contentful_content_type: "medical-article"
    roles: ["medical-professionals"]
Contentful API integration:
const contentful = require('contentful');
const client = contentful.createClient({
space: process.env.CONTENTFUL_SPACE_ID,
accessToken: process.env.CONTENTFUL_ACCESS_TOKEN
});
// Fetch entries of specific content type
const entries = await client.getEntries({
content_type: 'medical-article'
});
// Transform to YAML format
const knowledgeItems = entries.items.map(entry => ({
source_path: `contentful://entry-${entry.sys.id}`,
title: entry.fields.title,
type: 'LINK',
url: `https://nmea.org/articles/${entry.fields.slug}`,
contentful_entry_id: entry.sys.id,
contentful_content_type: 'medical-article',
roles: ['medical-professionals']
}));
Strapi
Sync content from Strapi open-source CMS:
knowledge:
  - source_path: "strapi://research-papers/42"
    title: "Advances in Cardiac Surgery Techniques"
    type: "FILE"
    url: "https://api.nmea.org/uploads/cardiac-surgery-2024.pdf"
    strapi_content_type: "research-papers"
    strapi_id: 42
    roles: ["medical-professionals", "surgeons"]
Strapi API integration:
// Fetch content from Strapi REST API
const strapiData = await fetch(
'https://api.nmea.org/api/research-papers?populate=*',
{
headers: {
'Authorization': `Bearer ${process.env.STRAPI_API_TOKEN}`
}
}
).then(r => r.json());
// Transform to YAML format
const knowledgeItems = strapiData.data.map(item => ({
source_path: `strapi://research-papers/${item.id}`,
title: item.attributes.title,
type: 'FILE',
url: `https://api.nmea.org${item.attributes.pdf.data.attributes.url}`,
strapi_content_type: 'research-papers',
strapi_id: item.id,
roles: item.attributes.roles || ['medical-professionals']
}));
API Reference
Knowledge Management APIs
All APIs use the /v1/ prefix and require authentication via Bearer token.
Create Knowledge
POST /v1/workspaces/{workspaceId}/knowledge
Create a new knowledge item (file, link, or text).
Headers:
Authorization: Bearer {token}
organizationId: {organizationId}
Content-Type: application/json
Request body:
{
"type": "LINK",
"title": "Medical Research Article",
"url": "https://example.com/article",
"language": "en",
"metadata": {
"source_path": "wordpress://post-123",
"custom_field": "custom_value"
}
}
Response:
{
"knowledgeId": "uuid-here",
"status": "BACKGROUND_START"
}
Note: Large files process asynchronously. Check status via the Knowledge API if needed.
Search/List Knowledge
GET /v1/workspaces/{workspaceId}/knowledge
List or search knowledge items with pagination.
Headers:
Authorization: Bearer {token}
Content-Type: application/json
Query parameters:
- limit - Number of items per page (default: 20, max: 100)
- offset - Pagination offset (default: 0)
- q - Semantic search query (optional)
- title - Title search (optional)
Response:
{
"items": [
{
"knowledgeId": "uuid-1",
"title": "Document Title",
"type": "LINK",
"metadata": {
"source_path": "wordpress://post-123"
}
}
],
"total": 150,
"limit": 100,
"offset": 0
}
Delete Knowledge
DELETE /v1/workspaces/{workspaceId}/knowledge/{knowledgeId}
Delete a knowledge item.
Headers:
Authorization: Bearer {token}
Response: 204 No Content
Role Management APIs
Assign Roles to Knowledge
POST /v1/workspaces/{workspaceId}/knowledge/{knowledgeId}/role
Assign roles to a knowledge item for RBAC.
Headers:
Authorization: Bearer {token}
Content-Type: application/json
Request body:
{
"roleIds": ["role-uuid-1", "role-uuid-2"]
}
Note: This operation uses Temporal workflows and is eventually consistent.
List Roles on Knowledge
GET /v1/workspaces/{workspaceId}/knowledge/{knowledgeId}/role
Get all roles assigned to a knowledge item.
Headers:
Authorization: Bearer {token}
Response:
{
"roles": [
{
"roleId": "role-uuid-1",
"name": "medical-professionals"
}
]
}
Remove Roles from Knowledge
DELETE /v1/workspaces/{workspaceId}/knowledge/{knowledgeId}/role
Remove role assignments from a knowledge item.
Headers:
Authorization: Bearer {token}
Content-Type: application/json
Request body:
{
"roleIds": ["role-uuid-1"]
}
Authentication
All API calls require a Bearer token generated via:
POST /workspaces/{workspaceId}/generate-access-key-token
Headers:
x-api-key: sk-sharely-your-api-key
Content-Type: application/json
Response:
{
"token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
"expiresIn": 86400
}
Use this token in subsequent API calls as Authorization: Bearer {token}.
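Because the token expires (expiresIn is in seconds), long-running jobs should cache it and regenerate shortly before expiry instead of hitting the endpoint on every request. A sketch of an expiry-aware wrapper (the helper and safety margin are illustrative):

```javascript
// Wrap any token generator with expiry-aware caching.
// generate() must resolve to { token, expiresIn } (seconds), matching
// the shape returned by the generate-access-key-token endpoint.
function makeTokenCache(generate, marginSeconds = 60) {
  let cached = null;
  let expiresAt = 0; // unix seconds
  return async function getToken() {
    const now = Date.now() / 1000;
    if (!cached || now >= expiresAt - marginSeconds) {
      const { token, expiresIn } = await generate();
      cached = token;
      expiresAt = now + expiresIn;
    }
    return cached;
  };
}

// Usage: const getToken = makeTokenCache(generateAPIToken);
// then `await getToken()` wherever a Bearer token is needed.
```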
Best Practices
1. Use Pagination for Large Workspaces
Always paginate when fetching knowledge to avoid timeouts:
async function fetchAllKnowledge(apiToken) {
const allItems = [];
let offset = 0;
const limit = 100;
while (true) {
const response = await fetch(
`https://api.sharely.ai/v1/workspaces/${WORKSPACE_ID}/knowledge?limit=${limit}&offset=${offset}`,
{ headers: { 'Authorization': `Bearer ${apiToken}` } }
);
const data = await response.json();
allItems.push(...data.items);
if (data.items.length < limit) break; // No more items
offset += limit;
}
return allItems;
}
2. Store Source Identifiers in Metadata
Always include a unique identifier in metadata to enable idempotent reconciliation:
{
"type": "LINK",
"title": "Article Title",
"metadata": {
"source_path": "wordpress://post-123", // Unique identifier
"source_system": "wordpress",
"last_synced": "2024-01-15T10:30:00Z"
}
}
This prevents duplicate creation and enables safe deletion of orphaned items.
3. Handle Async Operations
Some operations (file uploads, role assignments) use Temporal workflows and complete asynchronously:
async function createKnowledge(item) {
const response = await fetch(/* ... */);
const result = await response.json();
if (result.status === 'BACKGROUND_START') {
console.log(`Processing started for ${item.title}, continuing...`);
// Don't wait - Temporal ensures eventual completion
}
return result;
}
Key point: You don't need to poll for completion. Temporal workflows ensure the operation completes eventually, even if your script exits.
4. Version Control Your YAML Configuration
Store your knowledge configuration in Git:
git add knowledge-config.yaml
git commit -m "Add new research papers to knowledge sync"
git push
Benefits:
- Track changes over time
- Collaborate with team on knowledge structure
- Rollback if needed
- Audit trail of all modifications
5. Run Sync on Schedule
Use cron or a scheduler to keep knowledge current:
# Run every hour
0 * * * * cd /path/to/sync && node sync-script.js >> sync.log 2>&1
Or use a workflow orchestration tool like Temporal, Airflow, or GitHub Actions.
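For teams already on GitHub, a scheduled Actions workflow is a common alternative to cron. The sketch below assumes the script and its package.json live in the repository root and that credentials are stored as repository secrets (file name and secret names are illustrative):

```yaml
# .github/workflows/knowledge-sync.yml (illustrative)
name: knowledge-sync
on:
  schedule:
    - cron: '0 * * * *'   # hourly, mirroring the cron example above
  workflow_dispatch:       # also allow manual runs
jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      - run: node sync-script.js
        env:
          SHARELY_API_KEY: ${{ secrets.SHARELY_API_KEY }}
          SHARELY_WORKSPACE_ID: ${{ secrets.SHARELY_WORKSPACE_ID }}
          SHARELY_ORGANIZATION_ID: ${{ secrets.SHARELY_ORGANIZATION_ID }}
```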
6. Implement Error Handling
Always handle API errors gracefully:
async function createKnowledgeItem(apiToken, item) {
try {
const response = await fetch(/* ... */);
if (!response.ok) {
const error = await response.json();
console.error(`Failed to create ${item.title}:`, error);
return null; // Continue with other items
}
return await response.json();
} catch (err) {
console.error(`Network error creating ${item.title}:`, err);
return null;
}
}
7. Apply Roles Consistently
If using RBAC, ensure roles are applied during sync:
knowledge:
  - source_path: "azure://sensitive-data.pdf"
    title: "Confidential Research"
    roles: ["senior-researchers", "administrators"]
if (item.roles && item.roles.length > 0) {
await assignRoles(apiToken, knowledgeId, item.roles);
}
Scaling to a Synchronization Service
While the examples above demonstrate sync scripts suitable for cron jobs or scheduled tasks, production environments often require more sophisticated synchronization services that can handle updates at scale efficiently.
From Script to Service
Evolution path:
- Basic Script - Cron-based full reconciliation (good for < 10,000 items, hourly sync)
- Incremental Sync - Track last sync timestamp, only process changes (good for < 100,000 items)
- Event-Driven Service - Webhook-triggered updates, queue-based processing (production scale)
Architecture Considerations
When building a production-ready synchronization service, consider these architectural patterns:
1. Change Detection at Source
Instead of fetching all knowledge on every sync, detect what changed:
// Track last sync timestamp
const lastSync = await getLastSyncTimestamp();
// Query only changed items from source
const changedItems = await sourceSystem.getUpdatedSince(lastSync);
// Reconcile only the delta
await reconcileChanges(changedItems);
Benefits:
- Reduces API calls to both source and Sharely.ai
- Faster sync cycles
- Lower infrastructure costs
Implementation approaches:
- Source system provides updated_at or modified_since filtering
- Maintain a sync-state database with per-item timestamps
- Use ETags or version numbers for change detection
2. Event-Driven Updates
Replace polling with webhooks from source systems:
// Webhook receiver
app.post('/webhooks/wordpress', async (req, res) => {
const { post_id, action } = req.body; // created, updated, deleted
// Queue job for processing
await queue.enqueue({
type: 'sync_item',
source: 'wordpress',
item_id: post_id,
action: action
});
res.status(200).send('Queued');
});
Benefits:
- Real-time synchronization (seconds instead of hours)
- No unnecessary polling
- Lower latency for end users
Implementation requirements:
- Webhook endpoints for each source system
- Authentication and verification of webhook sources
- Queue for buffering high-volume updates
- Retry logic for failed webhook processing
3. Queue-Based Processing
Use message queues to handle large volumes:
// Producer: Add jobs to queue
await queue.enqueue({
type: 'sync_knowledge',
source_path: 'azure://research/doc.pdf',
action: 'create'
});
// Consumer: Worker processes jobs
queue.process('sync_knowledge', async (job) => {
const { source_path, action } = job.data;
if (action === 'create') {
await createKnowledgeItem(source_path);
} else if (action === 'delete') {
await deleteKnowledgeItem(source_path);
}
});
Queue technologies:
- AWS SQS - Managed, serverless
- RabbitMQ - Self-hosted, feature-rich
- Apache Kafka - High-throughput, event streaming
- Redis Queue (Bull/BullMQ) - Simple, Node.js-friendly
Benefits:
- Parallel processing with multiple workers
- Automatic retries on failure
- Rate limiting and backpressure handling
- Visibility into processing status
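The queue object in these snippets is hypothetical. For local development before adopting one of the technologies above, a minimal in-memory queue with bounded retries and a dead-letter list can stand in:

```javascript
// Tiny in-memory job queue with bounded retries (development only;
// swap in SQS, RabbitMQ, or BullMQ for production).
class InMemoryQueue {
  constructor(maxRetries = 3) {
    this.jobs = [];
    this.maxRetries = maxRetries;
  }
  enqueue(job) {
    this.jobs.push({ data: job, attempts: 0 });
  }
  // Drain the queue, retrying failed jobs up to maxRetries times.
  // Returns jobs that exhausted their retries (the "dead letters").
  async process(handler) {
    const failed = [];
    while (this.jobs.length > 0) {
      const job = this.jobs.shift();
      try {
        await handler(job.data);
      } catch (err) {
        job.attempts += 1;
        if (job.attempts <= this.maxRetries) {
          this.jobs.push(job); // retry at the back of the queue
        } else {
          failed.push({ job: job.data, error: err.message });
        }
      }
    }
    return failed;
  }
}
```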
4. Parallel Processing with Workers
Scale horizontally by running multiple sync workers:
// Main orchestrator
const chunks = chunkArray(allKnowledge, 100); // Process 100 items per worker
await Promise.all(
chunks.map(chunk =>
processChunk(chunk, workerId)
)
);
// Worker function
async function processChunk(items, workerId) {
console.log(`Worker ${workerId} processing ${items.length} items`);
for (const item of items) {
await syncKnowledgeItem(item);
}
}
Scaling strategies:
- Partition knowledge by source system (WordPress worker, Azure worker, etc.)
- Partition by content type (videos, PDFs, links)
- Use worker pools with configurable concurrency
- Deploy workers as separate containers/pods for horizontal scaling
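The orchestrator sketch above calls chunkArray without defining it; here is one way to implement it, along with a bounded-concurrency mapper for cases where running an entire chunk through Promise.all at once is too bursty (both helpers are illustrative):

```javascript
// Split an array into fixed-size chunks.
function chunkArray(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Run an async task over items with at most `concurrency` in flight,
// preserving input order in the results array.
async function mapWithConcurrency(items, concurrency, task) {
  const results = new Array(items.length);
  let next = 0;
  async function worker() {
    while (next < items.length) {
      const i = next++; // claim the next index (single-threaded, so safe)
      results[i] = await task(items[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(concurrency, items.length) },
    worker
  );
  await Promise.all(workers);
  return results;
}
```

A bounded pool also makes it easy to respect API rate limits: concurrency becomes your tuning knob.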
5. State Management
Track sync progress and health in a database:
CREATE TABLE sync_state (
id SERIAL PRIMARY KEY,
source_path VARCHAR(500) UNIQUE,
source_system VARCHAR(100),
last_synced_at TIMESTAMP,
sharely_knowledge_id UUID,
sync_status VARCHAR(50), -- 'synced', 'pending', 'failed'
error_message TEXT,
retry_count INTEGER DEFAULT 0
);
Use cases:
- Resume failed syncs from where they stopped
- Monitor which items are out of sync
- Detect items that consistently fail
- Report on sync lag and health
Queries:
- Find items not synced in last 24 hours
- Identify items with repeated failures
- Calculate sync coverage percentage
6. Monitoring and Observability
Instrument your sync service for production:
// Metrics
metrics.increment('knowledge.synced', { source: 'wordpress' });
metrics.timing('sync.duration', duration, { source: 'wordpress' });
metrics.gauge('sync.lag_seconds', lagInSeconds);
// Logging
logger.info('Sync started', { source: 'wordpress', item_count: 150 });
logger.error('Sync failed', { source_path, error: err.message });
// Alerts
if (failureRate > 0.1) {
alerting.trigger('high_sync_failure_rate', { rate: failureRate });
}
Key metrics to track:
- Sync lag (time between source update and Sharely.ai update)
- Success/failure rates
- Processing throughput (items/second)
- Queue depth
- API error rates
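The failureRate in the alerting snippet has to come from somewhere; a sliding-window counter is one simple way to derive it (class name and window size are illustrative):

```javascript
// Track sync outcomes over a sliding window and expose the failure rate.
class FailureRateTracker {
  constructor(windowSize = 100) {
    this.windowSize = windowSize;
    this.outcomes = []; // true = success, false = failure
  }
  record(success) {
    this.outcomes.push(success);
    if (this.outcomes.length > this.windowSize) this.outcomes.shift();
  }
  failureRate() {
    if (this.outcomes.length === 0) return 0;
    const failures = this.outcomes.filter(ok => !ok).length;
    return failures / this.outcomes.length;
  }
}

// Usage: tracker.record(response.ok) after each API call,
// then alert when tracker.failureRate() > 0.1.
```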
Implementation Patterns
Pattern 1: Temporal Workflow Orchestration
Leverage Temporal for durable, long-running sync workflows:
// Temporal workflow for full reconciliation
async function fullSyncWorkflow(workspaceId) {
// Fetch all source data
const sourceItems = await activities.fetchAllSourceData();
// Fetch all Sharely knowledge
const sharelyItems = await activities.fetchSharelyKnowledge(workspaceId);
// Reconcile (durable, survives restarts)
const toAdd = await activities.calculateDelta(sourceItems, sharelyItems);
// Process in batches (parallel activities)
for (const batch of chunk(toAdd, 50)) {
await activities.syncBatch(batch);
}
return { synced: toAdd.length };
}
Benefits:
- Workflow state persisted automatically
- Survives service restarts
- Built-in retries and error handling
- Activity versioning for safe deployments
Pattern 2: Incremental Sync with Timestamps
Only sync what changed since last run:
async function incrementalSync() {
const lastRun = await db.getLastSyncTimestamp('wordpress');
// Fetch only items modified since last sync
const updatedPosts = await wordpress.getPostsModifiedSince(lastRun);
const deletedPosts = await wordpress.getDeletedPostsSince(lastRun);
// Sync updates
for (const post of updatedPosts) {
await syncWordPressPost(post);
}
// Remove deleted items
for (const post of deletedPosts) {
await removeKnowledgeBySourcePath(`wordpress://post-${post.id}`);
}
// Update last sync timestamp
await db.setLastSyncTimestamp('wordpress', Date.now());
}
Pattern 3: Webhook + Queue Hybrid
Combine webhooks for real-time updates with scheduled full reconciliation:
// Webhook for real-time updates
app.post('/webhook/contentful', async (req, res) => {
await queue.add('sync-item', {
source: 'contentful',
entry_id: req.body.sys.id,
action: req.body.sys.type // created, updated, deleted
});
res.sendStatus(200);
});
// Daily full reconciliation (catch any missed webhooks)
cron.schedule('0 2 * * *', async () => {
await fullReconciliation('contentful');
});
Best of both worlds:
- Real-time updates via webhooks (seconds latency)
- Full reconciliation catches missed events (eventual consistency)
- Resilient to webhook delivery failures
Coming Soon: Sample Sync Scripts
We're developing official sync scripts for popular platforms:
- WordPress - Sync posts, pages, and media
- Contentful - Sync structured content from headless CMS
- Strapi - Sync content from open-source CMS
- SharePoint - Sync documents and lists
- Notion - Sync pages and databases
Early access: Contact support@sharely.ai to join our beta program.
Related Documentation
APIs
- Knowledge API - Create, search, and manage knowledge
- Roles API - Create and manage RBAC roles
- Knowledge Roles API - Assign knowledge to roles
- Authentication - API authentication guide
Integration & Distribution
- Web Control - Embed AI agents in your applications
- Bring Your Own Agent - Deploy custom agents
- Platform Overview - Understand Sharely.ai architecture
Concepts
- Knowledge - How knowledge management works
- Roles - Understanding RBAC in Sharely.ai
- Workspace - Workspace management
Support
- Email: support@sharely.ai
- Documentation: https://docs.sharely.ai
- API Reference: https://docs.sharely.ai/api-reference
Next Steps
- Define your source systems - Identify where your knowledge currently lives
- Create YAML configuration - Define your desired knowledge state
- Build sync script - Use the examples above as templates
- Test reconciliation - Run in test workspace first
- Schedule regular sync - Keep knowledge current automatically
- Monitor and maintain - Track sync logs and adjust as needed