Vector Operations

Comprehensive guide to performing Create, Read, Update, and Delete (CRUD) operations on vectors in Antarys using TypeScript/Node.js.

Getting Started: Make sure you have a collection created and a client initialized before performing vector operations.
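A minimal setup might look like the sketch below. The createCollection call and its option names (name, dimensions) are assumptions for illustration; see the collections guide for the exact API your client version exposes.

import { createClient } from 'antarys';

const client = createClient('http://localhost:8080');

// Assumed collection-creation call; option names are illustrative
await client.createCollection({
    name: 'my_collection',
    dimensions: 512  // Must match the vectors you plan to upsert
});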

Overview

The VectorOperations class provides high-performance interfaces for all vector operations with built-in optimizations including:

  • Automatic dimension validation
  • Worker thread parallelization for batch processing
  • Client-side caching for queries
  • HTTP/2 connection pooling and retry logic
  • Memory-efficient processing with buffer pools
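Several of these can typically be tuned when the client is created. A hedged sketch, assuming the client factory accepts an options object; the option names here (connectionPoolSize, cacheSize, threadPoolSize) are illustrative assumptions, not a definitive API:

import { createClient } from 'antarys';

// Illustrative tuning knobs; names are assumptions, check your client version
const client = createClient('http://localhost:8080', {
    connectionPoolSize: 10,  // HTTP/2 connection pooling
    cacheSize: 1000,         // Client-side query cache entries
    threadPoolSize: 8        // Worker threads for batch parallelization
});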

Getting Vector Operations Interface

import { createClient, VectorOperations } from 'antarys';

// Initialize client and get vector operations interface
const client = createClient('http://localhost:8080');
const vectors = client.vectorOperations('my_collection');

// Or bind with an explicit type annotation
const typedVectors: VectorOperations = client.vectorOperations('my_collection');

Upsert Operations

Single Vector Upsert

// Insert or update a single vector
await vectors.upsert([
    {
        id: 'vector_1',
        values: [0.1, 0.2, 0.3, 0.4],  // Your vector data
        metadata: { category: 'example' }
    }
]);

// Upsert with rich metadata
await vectors.upsert([
    {
        id: 'doc_123',
        values: Array(1536).fill(0.1),  // OpenAI embedding dimensions
        metadata: {
            title: 'Machine Learning Basics',
            author: 'Data Scientist',
            category: 'education',
            tags: ['ml', 'ai', 'tutorial'],
            createdAt: '2025-01-15',
            sourceUrl: 'https://example.com/ml-basics'
        }
    }
]);

// Upsert with automatic dimension validation
const result = await vectors.upsert(
    [
        {
            id: 'validated_vector',
            values: Array(512).fill(0.1),  // Must match collection dimensions
            metadata: { validated: true }
        }
    ],
    {
        validateDimensions: true  // Enable validation
    }
);

console.log(`Upserted ${result.upserted_count} vectors`);

Batch Upsert Operations

Performance Tip: Use batch operations for inserting large amounts of data to maximize throughput and minimize network overhead.

Prepare Batch Data

// Prepare large batch of vectors
const batchVectors = [];
for (let i = 0; i < 10000; i++) {
    batchVectors.push({
        id: `batch_vector_${i}`,
        values: Array.from({ length: 512 }, () => Math.random()),
        metadata: {
            batchId: Math.floor(i / 1000),  // Group by thousands
            createdAt: Date.now(),
            category: `category_${i % 5}`
        }
    });
}

Optimized Batch Upsert

// High-performance batch upsert
const result = await vectors.upsert(batchVectors, {
    batchSize: 1000,          // Optimal batch size
    showProgress: true,       // Show progress bar
    parallelWorkers: 8,       // Parallel processing
    validateDimensions: true  // Ensure data quality
});

console.log(`Successfully upserted ${result.upserted_count} vectors`);

Memory Management: For very large datasets (100k+ vectors), consider processing in chunks to manage memory usage effectively.
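A simple chunking loop along the lines of the sketch below keeps memory bounded; allVectors stands in for your in-memory dataset, and the chunk size is just a starting point to tune (see also Streaming Upsert under Advanced Patterns):

// Chunked upsert sketch; CHUNK_SIZE and allVectors are illustrative
const CHUNK_SIZE = 10000;

for (let i = 0; i < allVectors.length; i += CHUNK_SIZE) {
    const chunk = allVectors.slice(i, i + CHUNK_SIZE);
    const result = await vectors.upsert(chunk, {
        batchSize: 1000,
        validateDimensions: true
    });
    console.log(`Chunk ${Math.floor(i / CHUNK_SIZE) + 1}: upserted ${result.upserted_count} vectors`);
}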

Advanced Batch Configuration

// Fine-tuned batch upsert for maximum performance
const startTime = performance.now();

const result = await vectors.upsert(largeDataset, {
    batchSize: 5000,          // Larger batches for network efficiency
    parallelWorkers: 16,      // More workers for CPU-bound tasks
    validateDimensions: true,
    showProgress: true
});

// Monitor performance
const elapsedSeconds = (performance.now() - startTime) / 1000;
console.log(`Upsert rate: ${(result.upserted_count / elapsedSeconds).toFixed(2)} vectors/sec`);

Vector Format Requirements

Data Format: Vectors can be provided with either values or vector field names for compatibility.

// Both formats are supported
const vectorWithValues = {
    id: 'vec1',
    values: [0.1, 0.2, 0.3],  // Standard format
    metadata: { type: 'standard' }
};

const vectorWithVector = {
    id: 'vec2', 
    vector: [0.1, 0.2, 0.3],  // Alternative format
    metadata: { type: 'alternative' }
};

// Both work with the same upsert call
await vectors.upsert([vectorWithValues, vectorWithVector]);

Query Operations

// Basic similarity search
const results = await vectors.query({
    vector: [0.1, 0.2, 0.3, 0.4],  // Query vector
    topK: 5,                       // Return top 5 matches
    includeMetadata: true          // Include metadata in results
});

// Process results
for (const match of results.matches) {
    console.log(`ID: ${match.id}, Score: ${match.score.toFixed(4)}`);
    if (match.metadata) {
        console.log(`  Metadata:`, match.metadata);
    }
}

// Query with metadata filtering
const results = await vectors.query({
    vector: queryVector,
    topK: 10,
    includeMetadata: true,
    filter: {
        category: 'education',  // Filter by category
        author: 'Data Scientist'  // Multiple filters
    },
    threshold: 0.7  // Only results above 70% similarity
});

console.log(`Found ${results.matches.length} filtered matches`);

// Advanced query with HNSW parameters
const results = await vectors.query({
    vector: queryVector,
    topK: 20,
    includeValues: false,     // Exclude vectors for faster response
    includeMetadata: true,
    useAnn: true,            // Use approximate nearest neighbors
    efSearch: 200,           // Higher accuracy (vs speed)
    threshold: 0.5,          // Similarity threshold
    skipCache: false         // Use cache if available
});

// Results include similarity scores
for (const match of results.matches) {
    const similarity = match.score;
    console.log(`Vector ${match.id}: ${similarity.toFixed(3)} similarity`);
}

Batch Query Operations

Batch Queries: Process multiple query vectors in parallel for maximum efficiency.

// Prepare multiple query vectors
const queryVectors = [
    Array(512).fill(0.1),  // Query 1
    Array(512).fill(0.2),  // Query 2
    Array(512).fill(0.3),  // Query 3
];

// Batch query for parallel processing
const batchResults = await vectors.batchQuery(
    queryVectors.map(vector => ({
        vector,
        topK: 5,
        includeMetadata: true
    }))
);

// Process batch results
for (let i = 0; i < batchResults.results.length; i++) {
    console.log(`\nQuery ${i + 1} results:`);
    for (const match of batchResults.results[i].matches) {
        console.log(`  ${match.id}: ${match.score.toFixed(3)}`);
    }
}

Query Performance Optimization

Approximate Nearest Neighbors

const results = await vectors.query({
    vector: queryVector,
    useAnn: true,        // Enable HNSW
    efSearch: 200        // Quality vs speed
});

Client-side Result Caching

// First query - cache miss
const results1 = await vectors.query({
    vector: queryVector,
    topK: 5
});

// Second identical query - cache hit
const results2 = await vectors.query({
    vector: queryVector,
    topK: 5
});

// Check cache performance
const stats = vectors.getCacheStats();
console.log(`Cache hit rate: ${(stats.hitRate * 100).toFixed(2)}%`);

Metadata Filtering

// Efficient pre-filtering
const results = await vectors.query({
    vector: queryVector,
    filter: { category: 'active' },
    topK: 10
});

Query Result Format

// Query results structure
interface QueryResults {
    matches: Array<{
        id: string;
        score: number;                    // Similarity score
        values?: number[];                // Optional: vector values
        metadata?: Record<string, any>;   // Optional: metadata
    }>;
}

Delete Operations

Delete by IDs

// Delete a single vector
const result = await vectors.deleteVectors(['vector_123']);

console.log(`Deleted: ${result.deleted?.length || 0}`);
console.log(`Failed: ${result.failed?.length || 0}`);

// Delete multiple vectors
const vectorIds = Array.from({ length: 100 }, (_, i) => `vector_${i + 100}`);

const result = await vectors.deleteVectors(vectorIds);

console.log(`Successfully deleted ${result.deleted?.length || 0} vectors`);
if (result.failed?.length) {
    console.log(`Failed to delete ${result.failed.length} vectors`);
}

// Delete vectors based on query results
// First, find vectors to delete
const results = await vectors.query({
    vector: referenceVector,
    filter: { status: 'deprecated' },
    topK: 1000  // Get up to 1000 deprecated vectors
});

// Extract IDs and delete
const idsToDelete = results.matches.map(match => match.id);
if (idsToDelete.length > 0) {
    const deleteResult = await vectors.deleteVectors(idsToDelete);
    console.log(`Deleted ${deleteResult.deleted?.length || 0} deprecated vectors`);
}

Delete Performance

Cache Invalidation: Deleting vectors automatically invalidates relevant cache entries to maintain consistency.

// Efficient bulk deletion
const largeDeleteBatch = Array.from({ length: 10000 }, (_, i) => `temp_vector_${i}`);

// Monitor deletion performance
const startTime = performance.now();
const result = await vectors.deleteVectors(largeDeleteBatch);
const endTime = performance.now();

const deleteRate = (result.deleted?.length || 0) / ((endTime - startTime) / 1000);
console.log(`Deletion rate: ${deleteRate.toFixed(2)} vectors/sec`);
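Because identical queries are cached client-side, a repeated query after a delete should miss the cache rather than serve stale matches. A quick sketch to observe this, assuming queryVector is defined:

// Observe cache behavior around a delete
await vectors.query({ vector: queryVector, topK: 5 });  // Cache miss, populates cache
await vectors.query({ vector: queryVector, topK: 5 });  // Cache hit

await vectors.deleteVectors(['vector_123']);            // Invalidates relevant entries

await vectors.query({ vector: queryVector, topK: 5 });  // Fresh results, not a stale hit
console.log(vectors.getCacheStats());                   // Hit rate reflects the invalidation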

Utility Operations

Vector Retrieval

// Get specific vector by ID
const vectorData = await vectors.getVector('vector_123');

if (vectorData) {
    console.log(`Vector ID: ${vectorData.id}`);
    console.log(`Vector values: ${vectorData.values?.slice(0, 5)}...`);  // First 5 values
    console.log(`Metadata:`, vectorData.metadata || {});
} else {
    console.log('Vector not found');
}

Collection Statistics

// Get vector count
const totalVectors = await vectors.countVectors();
console.log(`Total vectors in collection: ${totalVectors}`);

// Get collection dimensions
const dimensions = await vectors.getCollectionDimensions();
console.log(`Collection dimensions: ${dimensions}`);

Dimension Validation

Validate Single Vector

// Validate vector dimensions
const testVector = Array(512).fill(0.1);
const isValid = await vectors.validateVectorDimensions(testVector);

if (isValid) {
    console.log('Vector dimensions are correct');
} else {
    const expected = await vectors.getCollectionDimensions();
    console.log(`Invalid dimensions. Expected: ${expected}, Got: ${testVector.length}`);
}

Batch Validation

// Validate batch of vectors before upsert
const vectorsToValidate = [
    { id: 'v1', values: Array(512).fill(0.1) },
    { id: 'v2', values: Array(512).fill(0.2) },
    { id: 'v3', values: Array(256).fill(0.3) },  // Wrong dimensions
];

try {
    await vectors.upsert(vectorsToValidate, {
        validateDimensions: true  // Will catch dimension errors
    });
} catch (error) {
    console.log(`Validation error: ${error.message}`);
}

Cache Management

// Get cache performance statistics
const cacheStats = vectors.getCacheStats();

if (cacheStats.enabled) {
    console.log(`Cache hit rate: ${(cacheStats.hitRate * 100).toFixed(2)}%`);
    console.log(`Cache size: ${cacheStats.cacheSize} entries`);
    console.log(`Total hits: ${cacheStats.hits}`);
    console.log(`Total misses: ${cacheStats.misses}`);
} else {
    console.log('Caching is disabled');
}

// Clear cache if needed
await vectors.clearCache();
console.log('Cache cleared');

Advanced Patterns

Streaming Upsert

For very large datasets, implement streaming upsert:

async function streamUpsert(
    vectors: VectorOperations, 
    dataGenerator: AsyncGenerator<VectorRecord>, 
    batchSize = 1000
): Promise<number> {
    let batch: VectorRecord[] = [];
    let totalProcessed = 0;
    
    for await (const vectorData of dataGenerator) {
        batch.push(vectorData);
        
        if (batch.length >= batchSize) {
            const result = await vectors.upsert(batch, {
                validateDimensions: true,
                showProgress: true
            });
            totalProcessed += result.upserted_count;
            batch = [];  // Reset batch
            
            // Optional: yield control to prevent blocking
            await new Promise(resolve => setTimeout(resolve, 10));
        }
    }
    
    // Process final batch
    if (batch.length > 0) {
        const result = await vectors.upsert(batch, {
            validateDimensions: true
        });
        totalProcessed += result.upserted_count;
    }
    
    return totalProcessed;
}

// Usage
async function* dataGenerator(): AsyncGenerator<VectorRecord> {
    for (let i = 0; i < 100000; i++) {
        yield {
            id: `stream_vector_${i}`,
            values: Array.from({ length: 512 }, () => Math.random()),
            metadata: { batch: Math.floor(i / 1000) }
        };
    }
}

const total = await streamUpsert(vectors, dataGenerator());
console.log(`Streamed ${total} vectors`);

Similarity Search with Re-ranking

interface HybridSearchOptions {
    vectorWeight: number;
    textWeight: number;
    topK: number;
}

async function hybridSearch(
    vectors: VectorOperations, 
    queryVector: number[], 
    textQuery: string, 
    options: HybridSearchOptions = { vectorWeight: 0.7, textWeight: 0.3, topK: 10 }
): Promise<any> {
    const { vectorWeight, textWeight, topK } = options;
    
    // First stage: Vector similarity search
    const initialResults = await vectors.query({
        vector: queryVector,
        topK: topK * 3,  // Get more candidates
        includeMetadata: true,
        useAnn: true
    });
    
    // Second stage: Re-rank based on text similarity
    const candidates = [];
    for (const match of initialResults.matches) {
        const textScore = calculateTextSimilarity(
            textQuery, 
            match.metadata?.text || ''
        );
        
        // Combine scores
        const combinedScore = vectorWeight * match.score + textWeight * textScore;
        candidates.push({
            ...match,
            combinedScore
        });
    }
    
    // Sort by combined score and return topK
    candidates.sort((a, b) => b.combinedScore - a.combinedScore);
    return { matches: candidates.slice(0, topK) };
}

function calculateTextSimilarity(query: string, text: string): number {
    // Simple token-overlap (Jaccard) similarity as a stand-in;
    // replace with your own text relevance scoring for production use
    const tokenize = (s: string) => new Set(s.toLowerCase().split(/\W+/).filter(Boolean));
    const a = tokenize(query), b = tokenize(text);
    if (a.size === 0 || b.size === 0) return 0;
    const overlap = [...a].filter(t => b.has(t)).length;
    return overlap / (a.size + b.size - overlap);
}

Error Handling and Retry Logic

interface RetryOptions {
    maxRetries: number;
    backoffFactor: number;
}

async function robustUpsert(
    vectors: VectorOperations, 
    vectorData: VectorRecord[], 
    options: RetryOptions = { maxRetries: 3, backoffFactor: 2 }
): Promise<any> {
    const { maxRetries, backoffFactor } = options;
    
    for (let attempt = 0; attempt < maxRetries; attempt++) {
        try {
            const result = await vectors.upsert(vectorData, {
                validateDimensions: true,
                showProgress: true
            });
            return result;
            
        } catch (error: any) {
            if (error.message.includes('dimension')) {
                // Dimension validation errors - don't retry
                console.log(`Validation error: ${error.message}`);
                throw error;
            }
            
            if (attempt < maxRetries - 1) {
                const waitTime = Math.pow(backoffFactor, attempt) * 1000;
                console.log(`Attempt ${attempt + 1} failed: ${error.message}`);
                console.log(`Retrying in ${waitTime}ms...`);
                await new Promise(resolve => setTimeout(resolve, waitTime));
            } else {
                console.log(`All ${maxRetries} attempts failed`);
                throw error;
            }
        }
    }
}

// Usage
try {
    const result = await robustUpsert(vectors, myVectors);
    console.log(`Successfully upserted ${result.upserted_count} vectors`);
} catch (error) {
    console.log(`Upsert failed permanently: ${error.message}`);
}

Resource Cleanup

Always properly clean up resources:

// Clean up vector operations
await vectors.clearCache();

// Clean up client
await client.close();

Best Practice: Use try/finally blocks or proper async cleanup to ensure resources are freed even if errors occur.

async function safeVectorOperations(data: VectorRecord[], queryVector: number[]) {
    const client = createClient('http://localhost:8080');
    const vectors = client.vectorOperations('my_collection');
    
    try {
        // Your vector operations here
        await vectors.upsert(data);
        const results = await vectors.query({ vector: queryVector, topK: 10 });
        
    } finally {
        // Always cleanup
        await vectors.clearCache();
        await client.close();
    }
}