On the road to SRE with Cloud Operations Sandbox

At Google Cloud, we strive to bring Site Reliability Engineering (SRE) culture to our customers not only through training on organizational best practices, but also with the tools you need to run successful cloud services. Part and parcel of that is comprehensive observability tooling—logging, monitoring, tracing, profiling and debugging—which can help you troubleshoot production issues faster, increase release velocity and improve service reliability. 

We often hear that implementing observability is hard, especially for complex distributed applications that are implemented in different programming languages, deployed in a variety of environments, that have different operational costs, and many other factors. As a result, when migrating and modernizing workloads onto Google Cloud, observability is often an afterthought. 

Nevertheless, being able to debug the system and gain insights into the system’s behavior is important for running reliable production systems. Customers want to learn how to instrument services for observability and implement SRE best practices using tools Google Cloud has to offer, but without risking production environments. With Cloud Operations Sandbox, you can learn in practice how to kickstart your observability journey and answer the question, “Will it work for my use-case?”

Cloud Operations Sandbox is an open-source tool that helps you learn SRE practices from Google and apply them on cloud services using Google Cloud’s operations suite (formerly Stackdriver). Cloud Operations Sandbox has everything you need to get started in one click:

  • Demo service – an application built using microservices architecture on modern, cloud-native stack (a modified fork of a Online Boutique microservices demo app)

  • One-click deployment – automated script that deploys and configures the service to Google Cloud, including:

    • Service Monitoring configuration

    • Tracing with OpenTelemetry

    • Cloud Profiling, Logging, Error Reporting, Debugging and more

  • Load generator – a component that produces synthetic traffic on the demo service

  • SRE recipes – pre-built tasks that manufacture intentional errors in the demo app so you can use Cloud Operations tools to find the root cause of problems like you would in production

  • An interactive walkthrough to get started with Cloud Operations 

Getting started

Launching the Cloud Operations Sandbox is as easy as can be. Simply:

This creates a new Google Cloud project. Within that project, a Terraform script creates a Google Kubernetes Engine (GKE) cluster and deploys a sample application to it. The microservices that make up the demo app are pre-instrumented with logging, monitoring, tracing, debugging and profiling as appropriate for each microservices language runtime. As such, sending traffic to the demo app generates telemetry that can be useful for diagnosing the cloud service’s operation. In order to generate production-like traffic to the demo app, an automated script deploys a synthetic load generator in a different geo-location than the demo app.

What's your reaction?

In Love
Not Sure

You may also like

More in:GOOGLE

Leave a reply

Your email address will not be published. Required fields are marked *