AI SDK (@qnsp/ai-sdk)
TypeScript client for ai-orchestrator. Model artifacts and inference data are encrypted with post-quantum cryptography (PQC) algorithms selected per tenant by its crypto policy.
Install
pnpm add @qnsp/ai-sdk
Create a client
import { AiOrchestratorClient } from "@qnsp/ai-sdk";
const ai = new AiOrchestratorClient({
  baseUrl: "http://localhost:8094",
  token: "<access_token>",
  tier: "enterprise-pro", // Optional tier check
});
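Hard-coding the token is rarely desirable outside local testing. The following is a minimal sketch of building the constructor options from environment variables. The variable names (`QNSP_AI_BASE_URL`, `QNSP_AI_TOKEN`, `QNSP_AI_TIER`) and the `AiClientConfig` interface are hypothetical and are not defined by the SDK; only the three option names come from the example above.

```typescript
// Hypothetical shape mirroring the constructor options shown above.
interface AiClientConfig {
  baseUrl: string;
  token: string;
  tier?: string;
}

// Builds client options from an environment-like map.
// The variable names are illustrative assumptions, not part of the SDK.
function buildClientConfig(env: Record<string, string | undefined>): AiClientConfig {
  const token = env["QNSP_AI_TOKEN"];
  if (!token) {
    throw new Error("QNSP_AI_TOKEN is not set");
  }
  const config: AiClientConfig = {
    // Fall back to the local default used in the example above.
    baseUrl: env["QNSP_AI_BASE_URL"] ?? "http://localhost:8094",
    token,
  };
  const tier = env["QNSP_AI_TIER"];
  if (tier) {
    config.tier = tier; // tier stays absent when the variable is unset
  }
  return config;
}
```

In a Node.js process this could be called as `buildClientConfig(process.env)` and the result passed to `new AiOrchestratorClient(...)`.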
Register Artifacts
const artifact = await ai.registerArtifact({
  tenantId: "<tenant_uuid>",
  name: "llama-7b-fine-tuned",
  type: "model",
  sizeBytes: 14000000000,
  checksumSha3: "<sha3_hash>",
  metadata: {
    framework: "pytorch",
    version: "2.0",
  },
});
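The `checksumSha3` value has to be computed before registration. The sketch below uses Node.js's built-in `node:crypto` module and assumes the service expects a lowercase hex SHA3-256 digest; the source does not state the digest length or encoding, so check that against your deployment. The helper names are hypothetical.

```typescript
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

// Assumption: checksumSha3 is the lowercase hex SHA3-256 digest of the artifact bytes.
function sha3Hex(data: Uint8Array | string): string {
  return createHash("sha3-256").update(data).digest("hex");
}

// Streams the file through the hash so multi-gigabyte artifacts
// are never loaded into memory in one piece.
async function sha3HexOfFile(path: string): Promise<string> {
  const hash = createHash("sha3-256");
  for await (const chunk of createReadStream(path)) {
    hash.update(chunk as Buffer);
  }
  return hash.digest("hex");
}
```

The result of `sha3HexOfFile` could then be passed as `checksumSha3`, with `sizeBytes` taken from the same file.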
Submit Workloads
const workload = await ai.submitWorkload({
  tenantId: "<tenant_uuid>",
  name: "inference-job",
  priority: "high",
  schedulingPolicy: "on-demand",
  containerImage: "qnsp/inference-runtime:latest",
  command: ["python", "-m", "inference"],
  env: {
    MODEL_PATH: "/models/llama-7b",
  },
  resources: {
    cpu: 4,
    memoryMb: 16384,
    gpu: 1,
  },
  artifacts: [
    {
      artifactId: "<artifact_uuid>",
      mountPath: "/models",
      accessMode: "read",
    },
  ],
  idempotencyKey: "<unique_key>", // Optional
});
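One way to produce an `idempotencyKey` is to derive it from the request itself, so that a retried submission carries the same key. This is only a sketch: the source does not say how the service treats repeated keys or how long it remembers them, and `stableStringify` and `idempotencyKeyFor` are hypothetical helpers, not SDK functions.

```typescript
import { createHash } from "node:crypto";

// Serializes a value with object keys sorted, so logically equal
// requests produce identical strings regardless of property order.
function stableStringify(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => stableStringify(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .filter((key) => obj[key] !== undefined) // mirror JSON.stringify, which drops undefined
      .sort()
      .map((key) => `${JSON.stringify(key)}:${stableStringify(obj[key])}`)
      .join(",");
    return `{${body}}`;
  }
  return JSON.stringify(value) ?? "null";
}

// Derives a deterministic key (hex SHA-256) from the request contents.
function idempotencyKeyFor(request: unknown): string {
  return createHash("sha256").update(stableStringify(request)).digest("hex");
}
```

A content-derived key means an intentional resubmission of an identical job would also share the key. Where that is not wanted, a random value generated once per logical submission and reused across its retries is the simpler choice.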
Deploy Models
const deployment = await ai.deployModel({
  tenantId: "<tenant_uuid>",
  modelName: "llama-7b-fine-tuned",
  artifactId: "<artifact_uuid>",
  runtimeImage: "qnsp/inference-runtime:latest",
  manifest: {
    version: "1.0.0",
    framework: "pytorch",
    inputSchema: { /* ... */ },
    outputSchema: { /* ... */ },
  },
  resources: {
    cpu: 4,
    memoryMb: 16384,
    gpu: 1,
  },
  priority: "normal",
  schedulingPolicy: "on-demand",
});
Invoke Inference
const result = await ai.invokeInference({
  tenantId: "<tenant_uuid>",
  modelDeploymentId: "<deployment_uuid>",
  input: {
    prompt: "Explain quantum computing",
    maxTokens: 500,
  },
  priority: "normal",
});
console.log(result.output);
Stream Inference Events
for await (const event of ai.streamInferenceEvents("<workload_uuid>")) {
  switch (event.type) {
    case "token":
      process.stdout.write(event.token);
      break;
    case "complete":
      console.log("\nDone:", event.usage);
      break;
    case "error":
      console.error("Error:", event.message);
      break;
  }
}
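When the full text is needed rather than incremental output, the stream can be folded into a single string. The sketch below works against any async iterable; the `StreamEvent` union is inferred from the three cases in the loop above and is an assumption, since the SDK's `InferenceStreamEvent` type may carry more variants or fields.

```typescript
// Event shapes inferred from the switch statement above (assumption).
type StreamEvent =
  | { type: "token"; token: string }
  | { type: "complete"; usage?: unknown }
  | { type: "error"; message: string };

// Concatenates token events until "complete"; rejects on an "error" event.
async function collectOutput(events: AsyncIterable<StreamEvent>): Promise<string> {
  let text = "";
  for await (const event of events) {
    if (event.type === "token") {
      text += event.token;
    } else if (event.type === "error") {
      throw new Error(event.message);
    } else {
      break; // "complete": stop reading
    }
  }
  return text;
}
```

If the SDK's event type is structurally compatible, the return value of `ai.streamInferenceEvents(...)` could be passed in directly.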
Manage Workloads
// Get workload status
const workload = await ai.getWorkload("<workload_uuid>");
console.log(workload.status); // "pending" | "running" | "completed" | "failed"
// List workloads
const { items, nextCursor } = await ai.listWorkloads({
  tenantId: "<tenant_uuid>",
  status: "running",
  limit: 20,
});
// Cancel workload
await ai.cancelWorkload({
  workloadId: "<workload_uuid>",
  reason: "No longer needed",
});
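`listWorkloads` returns one page plus a `nextCursor`. The source does not show how the cursor is sent back, so the sketch below keeps that detail in a caller-supplied callback and only handles the looping. `Page` and `paginate` are hypothetical names, and the loop assumes a missing or empty `nextCursor` marks the last page.

```typescript
// Minimal page shape matching the fields destructured above.
interface Page<T> {
  items: T[];
  nextCursor?: string | null;
}

// Yields every item across pages, requesting the next page only
// after the current one has been consumed.
async function* paginate<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
): AsyncGenerator<T> {
  let cursor: string | undefined;
  do {
    const page = await fetchPage(cursor);
    for (const item of page.items) {
      yield item;
    }
    cursor = page.nextCursor ?? undefined;
  } while (cursor);
}

// Possible usage, assuming listWorkloads accepts a `cursor` parameter
// (that parameter name is a guess, not confirmed by the source):
//
//   for await (const w of paginate((cursor) =>
//     ai.listWorkloads({ tenantId, status: "running", limit: 20, cursor }))) {
//     console.log(w);
//   }
```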
Tier Access
The SDK validates tier access for premium features:
import { AiOrchestratorClient, TierError } from "@qnsp/ai-sdk";
try {
  const ai = new AiOrchestratorClient({
    baseUrl: "http://localhost:8094",
    token: "<token>",
    tier: "dev-starter",
  });
  // Training requires enterprise-pro tier
  await ai.submitWorkload({
    name: "fine-tuning-job",
    // ...
  });
} catch (error) {
  if (error instanceof TierError) {
    console.log("Training requires enterprise-pro tier");
  }
}
Model Registry
Register, version, and manage AI models throughout their lifecycle:
// Register a new model
const model = await ai.registerModel({
  tenantId: "<tenant_uuid>",
  name: "customer-churn-predictor",
  version: "1.0.0",
  provider: "pytorch",
  modelType: "classification",
  servingConfig: {
    framework: "pytorch",
    runtimeImage: "qnsp/pytorch-serve:2.0",
    minInstances: 1,
    maxInstances: 10,
  },
  tags: ["production", "ml-ops"],
});
// List models with filters
const { models, total } = await ai.listModels({
  tenantId: "<tenant_uuid>",
  status: "active",
  provider: "pytorch",
  limit: 50,
});
// Get model details
const modelDetails = await ai.getModel("<model_uuid>");
// Activate a model for production use
await ai.activateModel("<model_uuid>", {
  activationNote: "Passed all validation tests",
});
// Deprecate a model
await ai.deprecateModel("<model_uuid>", {
  deprecationReason: "Superseded by v2.0",
  replacementModelId: "<new_model_uuid>",
  sunsetDate: "2026-06-01",
});
Model Deployments
Deploy models to specific environments:
// Create a deployment
const deployment = await ai.createDeployment({
  tenantId: "<tenant_uuid>",
  modelId: "<model_uuid>",
  name: "churn-predictor-prod",
  environment: "production",
  resources: {
    cpu: 4,
    memoryMb: 16384,
    gpu: 1,
  },
  autoscaling: {
    minReplicas: 2,
    maxReplicas: 20,
    targetCpuUtilization: 70,
    scaleDownDelaySeconds: 300,
  },
});
// List deployments
const { deployments } = await ai.listDeployments({
  tenantId: "<tenant_uuid>",
  environment: "production",
  status: "running",
});
// Get deployment details
const deploymentDetails = await ai.getDeployment("<deployment_uuid>");
// Stop a deployment
await ai.stopDeployment("<deployment_uuid>", {
  reason: "Scheduled maintenance",
});
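To get a feel for how the `autoscaling` values interact, the sketch below applies a target-tracking rule of the kind documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current × observed CPU / target CPU), clamped to the configured bounds. The source does not describe the orchestrator's actual scaling algorithm, so this is an assumption for reasoning about settings, and `scaleDownDelaySeconds` is not modeled.

```typescript
interface AutoscalingConfig {
  minReplicas: number;
  maxReplicas: number;
  targetCpuUtilization: number; // percent, e.g. 70
}

// Target-tracking estimate (assumed HPA-style rule, not confirmed for this service).
function desiredReplicas(
  currentReplicas: number,
  observedCpuPercent: number,
  cfg: AutoscalingConfig,
): number {
  // Multiply before dividing so exact multiples are not pushed up by rounding error.
  const raw = Math.ceil((currentReplicas * observedCpuPercent) / cfg.targetCpuUtilization);
  return Math.min(cfg.maxReplicas, Math.max(cfg.minReplicas, raw));
}
```

With the configuration above (2 to 20 replicas, 70% target), 4 replicas averaging 90% CPU would be scaled to 6 under this rule, and 2 replicas averaging 10% would stay at the minimum of 2.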
Cost Optimization
Monitor and optimize AI inference costs:
// Get cost summary
const summary = await ai.getCostSummary({ tenantId: "<tenant_uuid>" });
console.log(summary);
// {
//   currentPeriodCost: 1250.50,
//   previousPeriodCost: 1180.25,
//   projectedMonthlyCost: 2500.00,
//   costTrend: "increasing",
//   topCostDrivers: [...]
// }
// Get detailed cost analytics
const analytics = await ai.getCostAnalytics({
  tenantId: "<tenant_uuid>",
  startDate: "2026-03-01",
  endDate: "2026-03-20",
  groupBy: "model",
});
// Create a budget
const budget = await ai.createBudget({
  tenantId: "<tenant_uuid>",
  name: "Q1 AI Inference Budget",
  budgetType: "monthly",
  amount: 5000,
  alertThresholds: [50, 75, 90, 100],
  notificationChannels: ["email", "slack"],
});
// List budgets
const { budgets } = await ai.listBudgets({ tenantId: "<tenant_uuid>" });
// Get cost alerts
const { alerts } = await ai.getCostAlerts({
  tenantId: "<tenant_uuid>",
  severity: "high",
  acknowledged: false,
});
// Acknowledge an alert
await ai.acknowledgeCostAlert("<alert_uuid>");
// Get optimization recommendations
const { recommendations } = await ai.getOptimizationRecommendations({
  tenantId: "<tenant_uuid>",
  status: "pending",
});
// Accept a recommendation
await ai.acceptRecommendation("<recommendation_uuid>", {
  implementationNote: "Applying during next maintenance window",
});
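The `alertThresholds` values read most naturally as percentages of the budget `amount`, although the source does not say so explicitly. Under that assumption, the sketch below shows which thresholds a given spend would have crossed; `crossedThresholds` is a hypothetical local helper, not an SDK method.

```typescript
// Returns the thresholds (in ascending order) that the spend has reached,
// assuming each threshold is a percentage of the budget amount.
function crossedThresholds(
  spend: number,
  budgetAmount: number,
  thresholds: number[],
): number[] {
  if (budgetAmount <= 0) {
    throw new Error("budgetAmount must be positive");
  }
  // Multiply before dividing to keep round percentages exact.
  const percentUsed = (spend * 100) / budgetAmount;
  return thresholds.filter((t) => percentUsed >= t).sort((a, b) => a - b);
}
```

For the budget above (5000 with thresholds 50, 75, 90, 100), the `projectedMonthlyCost` of 2500 from the sample summary would reach only the 50 threshold, while a spend of 3800 would reach 50 and 75.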
Bias Monitoring
Detect and track bias in AI model outputs:
// Create a bias evaluation
const evaluation = await ai.createBiasEvaluation({
  tenantId: "<tenant_uuid>",
  modelId: "<model_uuid>",
  name: "Q1 Fairness Audit",
  evaluationType: "demographic_parity",
  protectedAttributes: ["gender", "age_group", "ethnicity"],
  targetMetrics: ["accuracy", "false_positive_rate", "false_negative_rate"],
});
// Start the evaluation
await ai.startEvaluation("<evaluation_uuid>");
// List evaluations
const { evaluations } = await ai.listEvaluations({
  tenantId: "<tenant_uuid>",
  status: "completed",
});
// Get evaluation details
const evaluationDetails = await ai.getEvaluation("<evaluation_uuid>");
// Get bias incidents
const { incidents } = await ai.getBiasIncidents({
  tenantId: "<tenant_uuid>",
  severity: "high",
  status: "open",
});
// Record a bias incident manually
await ai.recordBiasIncident({
  tenantId: "<tenant_uuid>",
  modelId: "<model_uuid>",
  incidentType: "disparate_impact",
  severity: "medium",
  description: "Higher rejection rate for age group 50+",
  affectedAttribute: "age_group",
  affectedGroup: "50+",
});
// Get fairness metrics
const { metrics } = await ai.getFairnessMetrics({
  tenantId: "<tenant_uuid>",
  modelId: "<model_uuid>",
});
// Get bias monitoring summary
const biasSummary = await ai.getBiasSummary({ tenantId: "<tenant_uuid>" });
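For readers unfamiliar with the `demographic_parity` and `disparate_impact` terms used above, the sketch below computes both from per-group counts using their standard textbook definitions: the gap between the highest and lowest favourable-outcome rates, and the ratio of the lowest to the highest. How the service itself computes or thresholds these metrics is not described in the source, and the type and function names are hypothetical.

```typescript
interface GroupOutcome {
  group: string;
  positives: number; // count of favourable predictions for the group
  total: number; // number of predictions for the group
}

interface ParityResult {
  difference: number; // demographic parity difference: max rate - min rate
  ratio: number; // disparate impact ratio: min rate / max rate
}

function demographicParity(outcomes: GroupOutcome[]): ParityResult {
  if (outcomes.length === 0) {
    throw new Error("at least one group is required");
  }
  const rates = outcomes.map((o) => {
    if (o.total <= 0) {
      throw new Error(`group ${o.group} has no predictions`);
    }
    return o.positives / o.total;
  });
  const max = Math.max(...rates);
  const min = Math.min(...rates);
  // When no group receives any favourable outcome, treat the groups as equal.
  return { difference: max - min, ratio: max === 0 ? 1 : min / max };
}
```

For example, approval rates of 40 out of 100 for one age group and 25 out of 100 for another give a difference of about 0.15 and a ratio of about 0.625.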
Prompt Injection Detection
Detect and prevent prompt injection attacks:
// Create a detection pattern
const pattern = await ai.createDetectionPattern({
  tenantId: "<tenant_uuid>",
  name: "System Prompt Override",
  patternType: "regex",
  pattern: "ignore\\s+(previous|above|all)\\s+instructions",
  attackCategory: "instruction_override",
  severity: "critical",
  enabled: true,
});
// List patterns
const { patterns } = await ai.listPatterns({
  tenantId: "<tenant_uuid>",
  enabled: true,
});
// Get pattern details
const patternDetails = await ai.getPattern("<pattern_uuid>");
// Delete a pattern
await ai.deletePattern("<pattern_uuid>");
// Get injection incidents
const { incidents: injectionIncidents } = await ai.getInjectionIncidents({
  tenantId: "<tenant_uuid>",
  severity: "critical",
  since: "2026-03-01",
});
// Get injection statistics
const stats = await ai.getInjectionStats({
  tenantId: "<tenant_uuid>",
  startDate: "2026-03-01",
  endDate: "2026-03-20",
  groupBy: "attack_category",
});
// Get injection summary
const injectionSummary = await ai.getInjectionSummary({ tenantId: "<tenant_uuid>" });
// Configure detection settings
await ai.configureDetection({
  tenantId: "<tenant_uuid>",
  detectionMode: "enforce",
  sensitivityLevel: "high",
  blockOnDetection: true,
  logAllAttempts: true,
});
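Before registering a regex pattern it can help to try it against sample inputs locally. The sketch below runs the pattern string from the example above through JavaScript's `RegExp`. Two assumptions apply: the service's regex dialect may differ from JavaScript's, and the case-insensitive `i` flag is added here for illustration since the source does not say whether matching ignores case.

```typescript
// Pattern string copied from the createDetectionPattern example above.
// The "i" flag is an assumption; the service may match case-sensitively.
const overridePattern = new RegExp(
  "ignore\\s+(previous|above|all)\\s+instructions",
  "i",
);

// No "g" flag is used, so test() keeps no state between calls.
function matchesOverride(input: string): boolean {
  return overridePattern.test(input);
}
```

Under these assumptions, `matchesOverride("Please IGNORE all   instructions")` matches, while `matchesOverride("ignore the previous instructions")` does not, because the pattern allows only whitespace between "ignore" and the listed words. Gaps like that are worth catching before relying on a single pattern.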
Key APIs
Artifacts
AiOrchestratorClient.registerArtifact(input) - Register model artifact
Workloads
AiOrchestratorClient.submitWorkload(input) - Submit workload
AiOrchestratorClient.deployModel(request) - Deploy model (legacy)
AiOrchestratorClient.getWorkload(workloadId) - Get status
AiOrchestratorClient.listWorkloads(params?) - List workloads
AiOrchestratorClient.cancelWorkload(input) - Cancel workload
Inference
AiOrchestratorClient.invokeInference(request) - Invoke inference
AiOrchestratorClient.streamInferenceEvents(workloadId) - Stream events
Model Registry
AiOrchestratorClient.registerModel(request) - Register a new model
AiOrchestratorClient.listModels(params?) - List models with filters
AiOrchestratorClient.getModel(modelId) - Get model details
AiOrchestratorClient.updateModel(modelId, request) - Update model
AiOrchestratorClient.activateModel(modelId, request) - Activate model
AiOrchestratorClient.deprecateModel(modelId, request) - Deprecate model
AiOrchestratorClient.createDeployment(request) - Create deployment
AiOrchestratorClient.listDeployments(params?) - List deployments
AiOrchestratorClient.getDeployment(deploymentId) - Get deployment
AiOrchestratorClient.stopDeployment(deploymentId, request) - Stop deployment
Cost Optimization
AiOrchestratorClient.getCostSummary(params) - Get cost summary
AiOrchestratorClient.getCostAnalytics(params) - Get cost analytics
AiOrchestratorClient.createBudget(request) - Create budget
AiOrchestratorClient.listBudgets(params?) - List budgets
AiOrchestratorClient.getBudget(budgetId) - Get budget
AiOrchestratorClient.deleteBudget(budgetId) - Delete budget
AiOrchestratorClient.getCostAlerts(params?) - Get cost alerts
AiOrchestratorClient.acknowledgeCostAlert(alertId) - Acknowledge alert
AiOrchestratorClient.getOptimizationRecommendations(params?) - Get recommendations
AiOrchestratorClient.acceptRecommendation(recommendationId, request) - Accept recommendation
Bias Monitoring
AiOrchestratorClient.createBiasEvaluation(request) - Create evaluation
AiOrchestratorClient.startEvaluation(evaluationId) - Start evaluation
AiOrchestratorClient.listEvaluations(params?) - List evaluations
AiOrchestratorClient.getEvaluation(evaluationId) - Get evaluation
AiOrchestratorClient.getBiasIncidents(params?) - Get incidents
AiOrchestratorClient.recordBiasIncident(request) - Record incident
AiOrchestratorClient.getFairnessMetrics(params) - Get fairness metrics
AiOrchestratorClient.getBiasSummary(params) - Get bias summary
Prompt Injection
AiOrchestratorClient.createDetectionPattern(request) - Create pattern
AiOrchestratorClient.listPatterns(params?) - List patterns
AiOrchestratorClient.getPattern(patternId) - Get pattern
AiOrchestratorClient.deletePattern(patternId) - Delete pattern
AiOrchestratorClient.getInjectionIncidents(params?) - Get incidents
AiOrchestratorClient.getInjectionStats(params) - Get statistics
AiOrchestratorClient.getInjectionSummary(params) - Get summary
AiOrchestratorClient.configureDetection(request) - Configure detection
Types
SubmitWorkloadRequest - Workload configuration
ModelDeploymentRequest - Model deployment config
InferenceRequest - Inference input
InferenceStreamEvent - Streaming event
TierError - Tier access error
Model - Model entity
Deployment - Deployment entity
CostSummary - Cost summary data
Budget - Budget configuration
BiasEvaluation - Bias evaluation
BiasIncident - Bias incident
DetectionPattern - Injection detection pattern
InjectionIncident - Injection incident