FAQ

Quick answers about intelligent data pipelines, data warehouse optimization, and distributed processing.

Value & ROI

How does Expanso reduce my Snowflake/Splunk/Datadog costs?

Expanso filters, transforms, and governs data at the source, before it reaches your expensive downstream platforms. By cutting 50-70% of data volume upstream, you dramatically reduce ingestion, storage, and compute costs in Snowflake, Splunk, Datadog, and similar platforms. Instead of paying premium rates to process raw, noisy data in centralized systems, Expanso does the heavy lifting at the edge for pennies on the dollar, then sends only valuable, policy-compliant data downstream.

What's the typical ROI and payback period?

Most enterprises see a 50-70% reduction in downstream platform costs within 60-90 days. Typical outcomes: $500K-$2M+ annual savings on Snowflake/Splunk/Datadog spend, 50% faster data onboarding (weeks vs. months), and a 60% reduction in pipeline engineering toil. We offer a free cost assessment that identifies 5-10 immediate savings opportunities in your first consultation. Deployment costs typically pay for themselves within 3-6 months.

Does Expanso replace my existing data platforms?

No: we make them cheaper, faster, and more reliable. Expanso sits upstream of Snowflake, Databricks, Splunk, Datadog, Elastic, and other platforms. We're the 'data control layer' that filters noise, enforces governance policies, and optimizes data before it hits your downstream systems. You keep your existing analytics, observability, and security tools; they just work better and cost 50-70% less.

How quickly can we deploy and see results?

Start with our free tier (5 nodes) or pilot a subset of your infrastructure in 1-2 weeks. Most enterprises deploy to 50-100 nodes in the first month and validate cost savings before scaling to thousands of endpoints. Transparent per-node pricing means no surprises: you control the rollout pace. Typical path: pilot (weeks 1-4) → validate savings (weeks 5-8) → scale (months 3-6).

Platform & Architecture

What platforms and environments does Expanso support?

Expanso runs everywhere your data originates: cloud (AWS, Azure, GCP), on-prem data centers, edge locations, hybrid environments, and IoT/OT devices. Native support for Linux (x86_64, ARM64), Windows Server, containers (Docker, Kubernetes), and bare metal. Lightweight agents (<512 MB RAM, <5% CPU) run on everything from Raspberry Pi to industrial servers. Perfect for distributed enterprises with complex, multi-cloud infrastructure.

How does Expanso handle network instability and disconnections?

Built for unreliable networks. Edge nodes buffer locally during disconnections (configurable, default 10 GB with optional persistent disk buffering). Automatic backpressure management, guaranteed delivery, and no data loss when configured properly. Critical for retail locations, manufacturing plants, remote sites, and distributed infrastructure where connectivity isn't guaranteed.
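The buffer-then-flush behavior described above follows a standard bounded-buffer pattern. A minimal Python sketch of that pattern, for illustration only (the `EdgeBuffer` class and its API are hypothetical, not Expanso's actual implementation):

```python
from collections import deque


class EdgeBuffer:
    """Bounded local buffer: holds events while the uplink is down,
    applies backpressure once full, and flushes in order on reconnect."""

    def __init__(self, max_bytes=10 * 1024**3):  # 10 GB default, as above
        self.max_bytes = max_bytes
        self.used = 0
        self.queue = deque()

    def offer(self, event: bytes) -> bool:
        # Backpressure: refuse new events at the cap so the producer
        # slows down instead of losing data silently.
        if self.used + len(event) > self.max_bytes:
            return False
        self.queue.append(event)
        self.used += len(event)
        return True

    def flush(self, send) -> int:
        # Drain in arrival order; stop (keeping the rest) if a send fails,
        # so nothing is dropped on a flaky reconnect.
        sent = 0
        while self.queue:
            if not send(self.queue[0]):
                break
            self.used -= len(self.queue.popleft())
            sent += 1
        return sent
```

The key property is that `offer` returning `False` is a signal, not an error: upstream readers pause until `flush` frees space.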

What's the performance overhead and resource footprint?

Minimal. Edge agents use <5% CPU and <512 MB RAM under typical enterprise load (100 GB/hour throughput). Processing adds microseconds per event. Binary size is ~50 MB with no runtime dependencies. Designed for efficiency: agents run on constrained edge hardware while processing 10-50 GB/hour per node. Scale horizontally across thousands of endpoints without performance degradation.

Can we customize processing logic for our specific use cases?

Yes. Expanso provides 40+ built-in processors (filter, parse, transform, enrich, aggregate, mask, redact) that handle most enterprise use cases. For custom logic, use our JavaScript sandbox or request custom processors; Enterprise customers typically receive custom processors within 48 hours. A full SDK is on the roadmap for Q1 2025. Our team works with you to build policy-driven pipelines that match your exact requirements.

Compliance & Governance

How does Expanso help with compliance (GDPR, HIPAA, industry standards)?

Expanso enforces compliance at the source, before sensitive data reaches downstream platforms. Built-in processors redact PII (SSNs, credit cards, API keys, passwords), hash identifiable data while preserving searchability, and enforce data residency policies. Built to meet enterprise security and compliance requirements, including HIPAA-compliant deployment options and GDPR compliance with DPAs. Data governance becomes policy-driven and automated, not manual and error-prone.
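Redacting PII while preserving searchability can be sketched in a few lines. This assumes a salted-hash pseudonymization scheme; the regex patterns and the `pseudonymize` helper are illustrative, not Expanso's built-in processors:

```python
import hashlib
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13-16 digits, optional space/dash separators, starting and ending on a digit.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def pseudonymize(value: str, salt: str = "per-tenant-salt") -> str:
    # Deterministic hash: the same input always maps to the same token,
    # so downstream searches and joins still work without exposing the value.
    return "tok_" + hashlib.sha256((salt + value).encode()).hexdigest()[:12]


def redact(line: str) -> str:
    line = SSN.sub(lambda m: pseudonymize(m.group()), line)  # hashed, searchable
    line = CARD.sub("[CARD REDACTED]", line)                 # dropped entirely
    return line
```

The design choice to illustrate: hashing keeps identifiers correlatable across events, while hard redaction removes values you never want downstream at all.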

Where does our data go? Can we keep it on-prem?

Your data never leaves your infrastructure unless you explicitly configure forwarding. Expanso processes data locally at the edge, then sends only policy-compliant, filtered data to your chosen destinations. Full data sovereignty: you own and control your data completely. Control plane available in US, EU, and APAC regions. Private control plane deployment available for Enterprise customers with strict data residency requirements.

How is authentication and access control managed?

Enterprise-grade security: mTLS between agents and control plane with automatic certificate rotation, SAML/OIDC for user authentication (Okta, Auth0, Azure AD), API keys with RBAC and automatic expiration, and optional hardware token support (YubiKey). All control plane communication uses mTLS with certificate pinning. No passwords stored anywhere. Full audit logs for compliance reporting.

Can Expanso run in air-gapped or highly restricted environments?

Yes. Edge agents operate fully offline once configured. Configuration deployed via local network, USB, or manual distribution. Many financial services and defense customers run completely air-gapped deployments. Offline license activation available for Enterprise tier. Perfect for highly regulated industries with strict network isolation requirements.

Integrations & Ecosystem

Does Expanso integrate with Snowflake, Databricks, and data warehouses?

Yes, with first-class support. Direct integration with Snowflake (Snowpipe, external tables), Databricks (Delta Lake, Unity Catalog), AWS S3/Athena, Azure Blob/Synapse, and GCP GCS/BigQuery. Automatic partitioning, compression (gzip, snappy, zstd), and multiple formats (JSON, Parquet, ORC, Avro). Smart batching reduces API calls by 95%. Most customers see 50-70% warehouse cost reduction due to upstream filtering and optimization.
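The batching-and-compression idea behind those numbers is easy to sketch. Assuming events are staged as gzip-compressed newline-delimited JSON (the function names are illustrative, not Expanso's API):

```python
import gzip
import json


def make_batches(events, batch_size=500):
    """Group events into large batches: one upload per batch instead of
    one API call per event, which is where the API-call reduction comes from."""
    for i in range(0, len(events), batch_size):
        yield events[i:i + batch_size]


def compress_batch(batch) -> bytes:
    # Newline-delimited JSON, gzip-compressed, ready to stage for a bulk load.
    ndjson = "\n".join(json.dumps(e) for e in batch).encode()
    return gzip.compress(ndjson)
```

With a batch size of 500, a thousand events become two staged uploads instead of a thousand calls, and repetitive log fields compress heavily.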

What about Splunk, Datadog, Elastic, and observability platforms?

Full native support. Splunk HEC with automatic batching and compression, Datadog Logs API with proper tagging, New Relic with attribute enrichment, Elasticsearch/OpenSearch with bulk indexing and ILM support. We compress, batch, and optimize for efficient ingestion, typically reducing API calls by 90% and ingestion costs by 60-80%. All with automatic retry, circuit breaking, and backpressure management.
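The circuit-breaking pattern mentioned above can be sketched generically. This minimal breaker is a hypothetical illustration, not Expanso's implementation: it opens after a run of consecutive failures and lets a probe request through once a cooldown elapses:

```python
import time


class CircuitBreaker:
    """Stop hammering a failing destination: after `threshold` consecutive
    failures the circuit opens and sends are skipped until `cooldown` passes."""

    def __init__(self, threshold=3, cooldown=30.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock          # injectable clock, handy for testing
        self.failures = 0
        self.opened_at = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            self.opened_at = None   # half-open: let one probe attempt through
            self.failures = 0
            return True
        return False

    def record(self, ok: bool):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()
```

While the circuit is open, events stay in the local buffer rather than being retried against a destination that is already struggling.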

Can we keep using Kafka, Kinesis, and existing streaming infrastructure?

Absolutely. Native Kafka producer with exactly-once semantics, automatic topic creation, and schema registry support. Kinesis output with automatic sharding and PutRecords batching. Expanso sits upstream, filtering and enriching data before it hits your streaming platforms. Most customers see 70-80% reduction in Kafka/Kinesis costs due to edge filtering and reduced message volume.

What connectors and data sources does Expanso support?

Plug-and-play connectors for 100+ data sources: application logs, system metrics, cloud services (AWS CloudWatch, Azure Monitor, GCP Logging), databases (PostgreSQL, MySQL, MongoDB), message queues, APIs, and custom sources. Pre-built integrations mean faster onboarding: weeks instead of months spent building brittle custom scripts. If we don't have a connector you need, we'll build it (typically 48-hour turnaround for Enterprise customers).

Enterprise Scale & Operations

What scale can Expanso handle? How many nodes/endpoints?

Production-proven at enterprise scale: 10,000+ nodes for global customers, tested to 100,000+ nodes. Largest customer processes 5 PB/month across 8,000 distributed locations. Configuration updates propagate globally in <30 seconds. Typical enterprise deployment: 500-5,000 nodes across data centers, cloud regions, edge locations, and remote sites. Linear scaling with no performance degradation.

How much data throughput can a single node handle?

Typical throughput: 100 GB/hour per node (compressed). Peak tested: 500 GB/hour on modern hardware. Sweet spot for cost/performance: 10-50 GB/hour per node. Scale horizontally across endpoints rather than vertically per node. Most enterprises deploy multiple lightweight nodes rather than over-provisioning single nodes, which gives better resilience and easier management.

How does pricing work? Any hidden costs or surprises?

Transparent per-node pricing with no surprises. You pay for data processed at ingestion (before filtering): if you ingest 100 GB and filter down to 30 GB, you pay for 100 GB of processing. Pricing tiers: Free (50 GB/month, 5 nodes), Team (pay-as-you-go), Business (volume discounts with commitment), Enterprise (custom pricing for 500+ nodes). Most enterprises see 60-80% total cost reduction when factoring in downstream platform savings. ROI calculator available on request.
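A worked example of the billing rule above, with hypothetical per-GB rates (the $0.05 and $0.50 figures are illustrative only; actual rates depend on tier and platform):

```python
def monthly_cost(ingested_gb, filter_ratio, expanso_rate, downstream_rate):
    """Expanso bills on ingested (pre-filter) volume; the downstream
    platform bills only on what survives filtering."""
    expanso = ingested_gb * expanso_rate
    downstream = ingested_gb * (1 - filter_ratio) * downstream_rate
    return expanso + downstream


# Hypothetical rates: $0.05/GB edge processing vs. $0.50/GB downstream ingest.
before = monthly_cost(100_000, 0.0, 0.0, 0.50)    # no filtering: $50,000/month
after = monthly_cost(100_000, 0.6, 0.05, 0.50)    # 60% filtered: $25,000/month
```

Even though you pay for the full 100 TB of processing, filtering 60% before the expensive platform halves the total bill in this example, which is where the claimed net savings come from.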

What does Enterprise support include?

Enterprise tier includes: dedicated customer success manager, 24/7/365 support with <1 hour SLA, custom processor development (48-hour turnaround), architecture reviews and optimization, private control plane deployment, professional services for onboarding, and executive business reviews. We partner with you to ensure successful deployment and continuous optimization; your success is our success.

Competitive Positioning

How is Expanso different from Cribl, Edge Delta, or traditional ETL tools?

Expanso is purpose-built for upstream data control at enterprise scale. Unlike point solutions focused on observability (Edge Delta) or log routing (Cribl), Expanso provides policy-driven pipelines for all data types: logs, metrics, events, IoT telemetry, and application data. We're the control layer between data sources and all downstream platforms (Snowflake, Splunk, Datadog, Databricks, etc.). Compliance, governance, and cost optimization are built in, not bolted on.

Why not just use Snowflake/Databricks built-in features?

Snowflake and Databricks are excellent for analytics, but they charge premium rates for data ingestion, storage, and compute. By the time data reaches them, you've already paid for raw, unfiltered volume. Expanso sits upstream, filtering and optimizing before data hits these platforms. You keep using Snowflake/Databricks for what they do best (analytics), but your bills drop 50-70% because you're only sending valuable, processed data.

We have a data engineering team. Why can't we build this ourselves?

You absolutely could, but should your best engineers build distributed systems infrastructure, or focus on business-differentiating analytics and features? Expanso handles the complexity of reliable edge processing (disconnections, retries, security, compliance, monitoring, scaling) so your team focuses on insights, not infrastructure. Customers report a 60%+ reduction in pipeline engineering toil after deploying Expanso, freeing engineers for higher-value work.

What's the migration path from our current setup?

Phased, low-risk approach: start with a pilot (5-50 nodes), validate cost savings (30-60 days), then scale incrementally. Expanso runs alongside your existing pipelines; there's no rip-and-replace. Many customers start with one high-cost data source (e.g., verbose application logs flooding Splunk) to prove ROI, then expand to additional sources. We provide migration playbooks, professional services, and hands-on support to ensure a smooth rollout.

Common Issues & Troubleshooting

Edge agent won't start: what should I check first?

Usually permissions or connectivity. Check: 1) The agent has read access to log files and write access to the buffer directory (/var/lib/expanso by default), 2) At least 1 GB of disk space is available, 3) Port 443 is open for control plane communication, 4) System time is synchronized via NTP (more than 5 minutes of drift causes TLS failures). Run 'expanso doctor' for automatic diagnostics. Contact support if issues persist; Enterprise customers get <1 hour response.
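Two of these checks are easy to script yourself while waiting on 'expanso doctor'. A small diagnostic sketch (the buffer path is the default mentioned above; the control plane host is whatever your deployment uses, passed in as an argument):

```python
import shutil
import socket


def check_disk(path="/var/lib/expanso", minimum=1 * 1024**3) -> bool:
    """Check 2 above: verify the buffer directory has at least 1 GB free."""
    return shutil.disk_usage(path).free >= minimum


def check_outbound(host, port=443, timeout=5.0) -> bool:
    """Check 3 above: verify a TCP connection to the control plane on 443
    succeeds (rules out firewall/routing before digging into TLS)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If `check_outbound` fails but curl to the same host works, a proxy is usually involved; see the TLS question below.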

Seeing 'TLS handshake failed' or connectivity errors?

Usually firewall, proxy, or time sync issues. Check: 1) Port 443 outbound is open, 2) If using a corporate proxy, the HTTPS_PROXY environment variable is set, 3) System time is correct (use NTP), 4) CA certificates are up to date. For debugging, run the agent with the --debug flag for detailed TLS logs. Many enterprise customers require proxy configuration; we provide detailed documentation and support.

Data isn't appearing in downstream destinations-how do I debug?

Check pipeline status: 'expanso pipeline status [name]'. Common issues: 1) Filter policy too restrictive (check metrics for filtered count), 2) Destination credentials incorrect (check error logs), 3) Network path blocked to destination (test with telnet/curl), 4) Backpressure from slow destination (check buffer usage). Enable debug logs for detailed flow tracing. Enterprise support can remote-diagnose and resolve within hours.

High memory usage or performance issues on edge nodes?

Usually buffer configuration or throughput mismatch. Check: 1) Buffer size vs. throughput (you need 10x headroom), 2) Pipeline batch sizes (larger batches mean more memory), 3) Number of concurrent pipelines, 4) A memory leak (rare; run 'expanso debug memory-profile'). Tune with the --max-memory flag. The default in-memory buffer is 512 MB; enable disk buffering for high-volume nodes. Enterprise customers get architecture reviews to optimize performance.
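The 10x headroom rule of thumb in check 1 reduces to simple arithmetic. This sketch assumes "headroom" means provisioning ten times the data that would accumulate during the longest outage you want to ride out (our interpretation for illustration):

```python
def min_buffer_gb(throughput_gb_per_hour, outage_minutes):
    # Data that accumulates while the destination is unreachable.
    return throughput_gb_per_hour * outage_minutes / 60


def has_headroom(buffer_gb, throughput_gb_per_hour, outage_minutes, factor=10):
    """Apply the 10x rule: the configured buffer should be `factor` times
    the expected accumulation, so bursts and slow drains don't overflow it."""
    return buffer_gb >= factor * min_buffer_gb(
        throughput_gb_per_hour, outage_minutes
    )
```

For example, a node at 20 GB/hour riding out a 30-minute outage accumulates 10 GB, so under this rule you would provision roughly 100 GB of disk buffer for it.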