About Steve
Hi! I'm Steve Rose, a reliability-focused engineering leader specializing in Cloud, Kubernetes, and large-scale messaging and data platforms. Over the last decade and a half, I’ve led cross-region SRE teams, major cloud migrations, and FinOps initiatives that significantly reduced infrastructure spend while improving resilience.
Where it Started
My career started with FreeBSD servers, pager alerts, and a lot of time in the terminal figuring out why services were misbehaving. It was a crash course in systems thinking: logs, metrics, weird edge cases, and the very real cost of downtime.
As the stack moved to Cloud and modern tooling, I moved with it: building out platforms, introducing infrastructure as code, and pushing for SLOs and observability so we could make decisions based on signals, not gut feel.
Today I lead SRE and Platform Engineering teams, balancing reliability, performance, and cost. The tools have changed since those early FreeBSD days, but the goal hasn’t: keep systems boring, predictable, and ready for whatever traffic hits them.
What I Bring to the Table
Engineering Leadership
I build and lead cross-region SRE teams, set clear reliability goals, and align platform roadmaps with product and risk.
Platform Architecture
I design and evolve Cloud and Kubernetes platforms, along with the messaging and data layers behind them, so they can safely absorb traffic spikes and change.
Observability
I turn "the system seems fine" into measurable promises using telemetry, monitoring, and metrics.
AI-Augmented Operations
I safely apply AI copilots and automation to incident response, alert triage, and runbook execution.