Base 42 — Run Together
Launch day is just Day 1. Base 42 is our managed-ops layer that keeps your product fast, secure, and always shipping. It blends light Site-Reliability Engineering (SRE) with product-ops so adoption and uptime rise in tandem.
1 · Why Base 42?
Production Pain | How Base 42 Solves It |
---|---|
“Who's on pager duty?” | 24 × 7 on-call rotation, first-line triage in < 15 min. |
Bug found → no one can roll back | Git tag + blue/green release flow with instant revert. |
Metrics in ten dashboards, none linked to features | Unified Grafana board tying KPIs, errors, logs, and cost to each epic. |
Roadmap stalls under unplanned maintenance | Monthly optimisation sprint keeps feature velocity ≥ 80 % of dev budget. |
2 · Managed-Hosting Options
Tier | Stack | Ideal For | SLA¹ | Typical Cost² |
---|---|---|---|---|
BYOC (Bring your own cloud) | AWS, Azure, GCP, on-prem Docker/K8s | Enterprises with internal DevOps | Advisory only | 5–10 % of dev spend |
Edge Cloud (default) | Fly.io + Postgres | SaaS & fintech MVPs needing global latency | 99.9 % | 12–16 % |
Hybrid | Fly.io front, AWS data plane, Salesforce or SAP integrations | Regulated workloads | 99.95 % core, 99.9 % edge | 15–20 % |
Full-Service | Edge Cloud + CI/CD + DaaS squad | Scale-ups wanting "no-ops" | 99.95 % + hot-fix < 1 h | 18–22 % |
¹ Monthly uptime.
² Percentage of monthly engineering run-rate (DaaS or Pod).
All tiers include Datadog, Sentry, Statuspage and automated SSL rotation.
3 · Ops Lifecycle
Phase | What Happens | Artefacts |
---|---|---|
Product-Ops Sprint (monthly) | Prioritise bugs, A/B tests, infra chores | Ranked "Run Backlog" |
Blue/Green Release | Traffic shift with auto-rollback if health < 99 % | Release notes |
Telemetry & Alerts | Datadog SLOs, Sentry error triage | Real-time dashboard |
Incident Response | PagerDuty rotation, Slack war-room | RCA doc in 24 h |
Post-Mortem & Tech-Debt Ticket | Blameless review, ticket into Forge backlog | Jira ticket linked to RCA |
4 · Reliability & Performance Budgets
Error Budget
p95 latency target (default 300 ms API, 3 s PWA).
Change Failure Rate
goal ≤ 5 %.
Mean Time to Restore (MTTR)
P1 ≤ 2 h, P2 ≤ 8 h.
Traffic Surge Policy
auto scale 3× in < 60 s on Fly.io; static warm pool for AWS.
5 · Security & Compliance
Control | Implementation |
---|---|
Secrets Management | Doppler or AWS Secrets Manager — no keys in env files. |
Vulnerability Scans | Nightly Snyk + GitHub Dependabot blobs auto-PR. |
Data Encryption | TLS 1.3 in transit; AES-256 at rest. |
Audit Log | Immutable CloudTrail / Fly Log-Drains; 30-day hot, 1-year cold. |
Compliance Support | SOC-2 / ISO 27001 questionnaire pack; HIPAA BAA addendum. |
6 · Optimisation & Cost Control
Usage-to-Cost Dashboard
unit cost per 1 k requests, per tenant.
Weekly Anomaly Alerts
spend spike > 15 % triggers Slack ping.
Quarterly Infra Review
right-size instances, prune dead feature flags.
AI-assisted Index Tuning
pgvector and Postgres auto-suggested indexes applied after tests.
Scale-ups on Full-Service tier cut infra cost/MAU by avg 22 % in year 1.
7 · Exit & Portability
- 14-day data export + Terraform scripts.
- Handoff meeting, doc stack, and credentials.
- Optional "Shadow Month" at 50 % fee for knowledge overlap.
- No vendor lock-in: you own cloud accounts (Edge Cloud uses your Fly.io org).
8 · Mini Case
Retail Flash-Sale Platform — Black-Friday traffic spiked 18×; autoscale kept p95 < 280 ms, zero downtime. Infra spend only +31 % vs baseline due to right-sizing.
9 · FAQs
10 · Call to Action
Want 99.9 % uptime without hiring an SRE team?
Book a Base 42 readiness call—get a tailored hosting plan and cost in 48 h.
Secure My Ops