FAQ

How do updates get pushed to customers without a centralized config manager?

In the Instalily ecosystem, we rely on weekly SDK releases rather than a single global configuration service. Each customer environment pins a specific SDK version—think of it like choosing a particular “snapshot” of the platform. When we publish a new release, customers can update on their own schedule by pulling the new SDK version into their local or containerized setup. To handle the details of different clouds or tenant-specific settings, we use “environment overlays.” An environment overlay is a lightweight file or module (often YAML or Python-based) that injects the correct endpoints, credentials, and runtime configuration for a given cloud or tenant. For example, if one tenant runs on AWS (using Amazon OpenSearch and ElastiCache) and another on Azure (using Cognitive Search and Redis), the same Instalily SDK is used in both cases; each environment overlay simply tells the SDK which resources to connect to. This approach keeps the codebase uniform while allowing you to easily push updates and custom settings—without relying on a large, centralized config manager.
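As a rough sketch, an overlay for an AWS-hosted tenant might look something like the following; the file name, keys, and values here are purely illustrative and not our exact overlay format:

```python
# overlay_aws.py -- hypothetical environment overlay for an AWS-hosted tenant.
# Key names and values are illustrative only.

OVERLAY = {
    "tenant_id": "acme-industrial",
    "search": {
        "provider": "opensearch",                      # Amazon OpenSearch
        "endpoint": "https://search.example-aws.internal:443",
    },
    "cache": {
        "provider": "elasticache",                     # Redis-compatible cache
        "endpoint": "redis://cache.example-aws.internal:6379",
    },
    "credentials_source": "aws_secrets_manager",       # where secrets are resolved at runtime
}
```

At startup, the SDK reads the overlay for the active environment and wires up its clients accordingly; the same application code runs unchanged against a different overlay on Azure.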

How do you handle different cloud providers (search, load balancing, caching) without a config manager?

Rather than a universal config manager, Instalily uses a pluggable architecture and environment-specific overlays. For instance, if one tenant needs to run on AWS (using Amazon OpenSearch, ELB, and ElastiCache) and another is on Azure (using Cognitive Search, Azure Load Balancer, and Redis), each environment references the necessary cloud resources via a small YAML or Python-based overlay. The codebase remains the same, while each overlay injects the correct service endpoints and credentials at runtime.
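To illustrate the pluggable pattern, a small factory can turn the overlay's provider setting into the right client. The factory and config class below are hypothetical (the client libraries shown are the standard open-source ones, not an Instalily API):

```python
# Hypothetical provider factory: the application asks for "a search client"
# and the overlay decides whether that means Amazon OpenSearch or Azure
# Cognitive Search. Class and field names are illustrative.
from dataclasses import dataclass


@dataclass
class SearchConfig:
    provider: str   # e.g. "opensearch" or "azure_cognitive_search"
    endpoint: str
    api_key: str


def make_search_client(cfg: SearchConfig):
    if cfg.provider == "opensearch":
        from opensearchpy import OpenSearch                    # AWS path
        return OpenSearch(hosts=[cfg.endpoint], http_auth=("svc-user", cfg.api_key))
    if cfg.provider == "azure_cognitive_search":
        from azure.search.documents import SearchClient        # Azure path
        from azure.core.credentials import AzureKeyCredential
        return SearchClient(endpoint=cfg.endpoint,
                            index_name="default",
                            credential=AzureKeyCredential(cfg.api_key))
    raise ValueError(f"Unknown search provider: {cfg.provider}")
```

The same pattern applies to load balancers and caches: the application code stays identical, and only the overlay decides which concrete service gets wired in.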

You mention weekly releases of the SDK. Is it internal-only or documented for customers?

Although the SDK originated as an internal set of libraries for our own teams, we are in the process of completing documentation and release notes for customers who need self-hosted or more hands-on integrations. The documentation details how to install, configure, and customize each module, whether for multi-agent orchestration, data connectors, or advanced policy checks.

How do you handle large-scale concurrency and potential latency issues?

Instalily leverages Cloud Run to dynamically scale our API layer: when traffic surges to hundreds or thousands of concurrent requests, additional container instances are spun up automatically. We integrate Kafka for asynchronous, queue-based workflows, distributing tasks among multiple workers so that no single node is overwhelmed. In parallel, we rely on Datadog for centralized logging and real-time metrics—this ensures prompt alerts if latency, throughput, or resource usage deviates from the norm. To further reduce load, we employ caching layers (Redis) and carefully tuned autoscaling thresholds. Overall, this blend of Cloud Run’s autoscaling, Kafka’s event-driven processing, Datadog’s monitoring, and strategic caching provides robust support for high-concurrency scenarios while keeping latency in check.
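A simplified sketch of the queue-plus-cache part of this setup is shown below, using the open-source kafka-python and redis-py clients; the topic, key, and handler names are assumptions for the example, not our production code:

```python
# Illustrative worker: tasks arrive on a Kafka topic, and results are cached
# in Redis so repeated requests skip recomputation. Multiple workers sharing
# the same consumer group split the load automatically.
import json

import redis
from kafka import KafkaConsumer

cache = redis.Redis(host="localhost", port=6379, db=0)
consumer = KafkaConsumer(
    "agent-tasks",                        # hypothetical topic name
    bootstrap_servers=["localhost:9092"],
    group_id="agent-workers",             # workers in this group share partitions
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)


def handle_task(task: dict) -> str:
    # Placeholder for the real (expensive) agent work.
    return f"processed:{task['id']}"


for message in consumer:
    task = message.value
    cache_key = f"result:{task['id']}"
    if cache.get(cache_key) is None:           # serve repeats from cache
        result = handle_task(task)
        cache.setex(cache_key, 3600, result)   # expire cached result after an hour
```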

How is knowledge stored today? Does each customer have a dedicated knowledge base?

Yes. Each customer has logically (and often physically) separated data indexes to ensure privacy and compliance. Typically, we store embeddings or domain documents in a vector database (e.g., AlloyDB or Azure Cognitive Search) and store structured data in a relational database. By default, no cross-tenant data sharing occurs, which preserves strong data isolation.
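Conceptually, tenant routing works like the sketch below; the mapping and helper are illustrative, and a real deployment may isolate tenants at the database or cluster level rather than by index name:

```python
# Minimal sketch of per-tenant isolation: each tenant maps to its own vector
# index and relational schema, and lookups never cross that boundary.

TENANT_RESOURCES = {
    "acme":   {"vector_index": "acme-docs",   "sql_schema": "acme"},
    "globex": {"vector_index": "globex-docs", "sql_schema": "globex"},
}


def resources_for(tenant_id: str) -> dict:
    # Only that tenant's resources are returned; there is no shared index,
    # so cross-tenant reads cannot happen by accident.
    try:
        return TENANT_RESOURCES[tenant_id]
    except KeyError:
        raise PermissionError(f"Unknown tenant: {tenant_id}")
```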

Is there a way to determine relevancy of data for user queries, or do you do RAG on the entire dataset each time?

Our retrieval-augmented generation (RAG) approach typically filters within a single customer's dataset based on metadata, user roles, or domain constraints. Retrieval is guided by the Task Manager Agent, which infers the correct data source from the user's query. We don't blindly search the entire dataset; for more deterministic or structured questions, we apply filter-based methods (e.g., restricting by domain, date range, or user role). Once the relevant subset of documents or records is identified, we embed that subset and feed it to the LLM, reducing unnecessary overhead and minimizing irrelevant or unauthorized data. For especially large knowledge sets, we also employ indexing or partitioning to keep queries efficient and the user experience fluid.
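In rough pseudocode, the flow looks something like this; the vector_store and llm objects and their methods are placeholders for the example, not a specific Instalily API:

```python
# Filter-first retrieval sketch: constrain the search space with metadata,
# retrieve only from that subset, then pass the subset to the LLM.

def answer(query: str, tenant_id: str, user_role: str, vector_store, llm) -> str:
    # 1. Restrict the search space before any semantic matching.
    metadata_filter = {
        "tenant_id": tenant_id,         # never leave the customer's dataset
        "allowed_role": user_role,      # enforce role-based visibility
        "domain": infer_domain(query),  # e.g. "pricing" vs. "installation"
    }
    # 2. Retrieve only from the filtered subset, ranked by similarity.
    passages = vector_store.search(query=query, filters=metadata_filter, top_k=5)
    # 3. Feed just that subset to the LLM.
    context = "\n\n".join(p["text"] for p in passages)
    return llm.complete(f"Answer using only this context:\n{context}\n\nQuestion: {query}")


def infer_domain(query: str) -> str:
    # Stand-in for the Task Manager Agent's routing decision.
    return "pricing" if "price" in query.lower() else "general"
```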

Is there any shared reinforcement learning across customers, or are agents improved only per tenant?

Currently, each tenant’s agent system is isolated to respect data privacy and compliance. While we’re exploring global improvements to the agent framework (e.g., better prompts, generic heuristics), we do not mix or pool data across different customers. Any reinforcement or feedback loop is per-customer, tuned to their domain data, brand constraints, and usage patterns. This avoids knowledge leaks and ensures each client’s sensitive information stays siloed. Future roadmap items may consider some form of anonymized, aggregated learning, but we are heavily guided by data protection requirements when exploring such options.
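Conceptually, the feedback loop is equivalent to a store keyed by tenant, where tuning data is only ever read back for the same tenant that produced it. The layout below is an illustrative sketch, not our actual storage schema:

```python
# Per-tenant feedback sketch: writes and reads are both scoped to one tenant,
# and there is no call path that aggregates feedback across tenants.
from collections import defaultdict

_feedback_by_tenant: dict[str, list[dict]] = defaultdict(list)


def record_feedback(tenant_id: str, interaction_id: str, rating: int) -> None:
    _feedback_by_tenant[tenant_id].append(
        {"interaction": interaction_id, "rating": rating}
    )


def training_examples(tenant_id: str) -> list[dict]:
    # Tuning data is drawn from this tenant alone.
    return list(_feedback_by_tenant[tenant_id])
```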