Uptime monitoring involves checking the availability of websites, APIs, and servers. The monitor probes a given endpoint at a specified interval to determine whether it is available. The goal is to verify that the system achieves the level of availability contracted in its SLA and to quantify the shortfall when it does not.
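To make the SLA arithmetic concrete, here is a minimal sketch that turns a list of probe results into an availability percentage and compares it with a target. The 99.9% target, the 30-day window, and the helper names are assumptions chosen for illustration, not values taken from any particular SLA.

```ruby
# Hypothetical helpers for illustration; names and figures are assumptions.

# Percentage of probes that succeeded (probe_results is an array of booleans).
def availability_percent(probe_results)
  return 0.0 if probe_results.empty?

  successes = probe_results.count(true)
  (successes.to_f / probe_results.size) * 100
end

# Minutes of downtime that a target availability allows over `days` days.
def allowed_downtime_minutes(target_percent, days: 30)
  total_minutes = days * 24 * 60
  total_minutes * (100 - target_percent) / 100.0
end

# Percentage points by which the measurement misses the target (0.0 if met).
def sla_shortfall(measured_percent, target_percent)
  [target_percent - measured_percent, 0.0].max
end

results = Array.new(998, true) + Array.new(2, false) # 1,000 probes, 2 failures
measured = availability_percent(results)
puts format('measured availability: %.2f%%', measured)
puts format('shortfall against 99.9%%: %.2f points', sla_shortfall(measured, 99.9))
puts format('downtime allowed at 99.9%% over 30 days: %.1f minutes', allowed_downtime_minutes(99.9))
```

A 99.9% target over 30 days leaves roughly 43 minutes of allowed downtime, which is why probe intervals measured in seconds matter.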
In this article, we'll build an uptime monitoring system based on Prometheus blackbox_exporter. While it might be trivial to build a custom HTTP monitoring system, building a wrapper around the exporter enables us to access many other probe techniques and quickly monitor other elements of our system.
This article covers the use of several technologies, and I'll describe each component before diving into the details of the uptime system.
What is Google Compute Engine (GCE)?
Compute Engine is Google's cloud computing service, similar to AWS's EC2 compute offering. GCE is secure and customizable enough to fit various workloads ranging from small machines (supporting up to 32 vCPUs and 128 GB memory) to standard machines (supporting up to 224 vCPUs and 896 GB memory) and other high-end machines for intensive workloads. It provides compute on demand, so you can scale to your needs at any given time.
GCE supports different mechanisms for app deployment, including containers, instance templates, and managed instance groups. For the purpose of this article, we'll bundle our Ruby uptime monitor into a Docker container for deployment.
What is Cloud Storage?
Google Cloud Storage is a highly available object-storage service similar to AWS's S3 service. Cloud Storage provides many storage features that enable several use-cases for modern apps. To get started with Cloud Storage in Ruby, we’ll use the google-cloud-storage gem to authenticate, as well as upload and download files from Cloud Storage:
require 'google/cloud/storage'

# Upload a local file to the bucket, optionally under a different object name.
def upload_file(bucket_name:, file_path:, file_name: nil)
  storage = Google::Cloud::Storage.new
  bucket = storage.bucket bucket_name
  bucket.create_file file_path, file_name
end

# Download the object named file_name from the bucket to the local file_path.
def download_file(bucket_name:, file_path:, file_name:)
  storage = Google::Cloud::Storage.new
  bucket = storage.bucket bucket_name
  file = bucket.file file_name
  file.download file_path
end
Note: You need to set up GOOGLE_APPLICATION_CREDENTIALS in your environment to point to the right service account key. All Google client gems search for this environment variable for authorization; otherwise, you’ll need to pass auth-specific parameters to Google::Cloud::Storage.new. If your app is running in a GCE VM, however, the client libraries pick up credentials from the VM's attached service account automatically.
What is Cloud PubSub?
Cloud PubSub is a publish/subscribe messaging service provided by Google Cloud. This form of communication is used to facilitate asynchronous service-to-service communication, similar to AWS's SNS. Building systems with asynchronous communication can help improve our system's performance, scalability, and reliability. To get started with Cloud PubSub in Ruby, we’ll use the google-cloud-pubsub gem to authenticate, publish, and listen for events:
require 'google/cloud/pubsub'

def publish_message(topic_id:, message:)
  pubsub = Google::Cloud::Pubsub.new
  topic = pubsub.topic topic_id
  topic.publish_async message do |result|
    raise "Failed to publish message" unless result.succeeded?
    puts "Message published asynchronously"
  end
  # Flush pending messages and wait for the publisher to stop
  topic.async_publisher.stop.wait!
rescue StandardError => e
  puts "Received error while publishing: #{e.message}"
end

# wait_time is in plain seconds (200.seconds would require ActiveSupport)
def receive_message(subscription_id:, wait_time: 200)
  pubsub = Google::Cloud::Pubsub.new
  subscription = pubsub.subscription subscription_id
  subscriber = subscription.listen do |received_message|
    puts "Received message: #{received_message.data}"
    received_message.acknowledge!
  end
  subscriber.start
  # Let the subscriber's background threads work for wait_time seconds
  sleep wait_time
  subscriber.stop.wait!
end
Note: The authentication described for Cloud Storage also applies here.
Combining Cloud Storage and PubSub lets us build very interesting solutions. Often, we want to upload an object, track its lifecycle (create, update, delete), and take specific actions based on certain events. If this still seems abstract, let's explore two use-cases:
- Image Service. Let’s say that we want to create something similar to Cloudinary that provides image and video storage and performs transformations on this data. While Cloud Storage can help store and version the data, with PubSub, we can listen for events from a bucket and perform certain types of pre-processing on the data, even before the customer requests a pre-processed version.
- Distribute Configuration Files. A common problem in infrastructure engineering is rolling out configurations to several servers and providing easy rollbacks. Imagine that we want a central server responsible for server configurations, so that we can update the configuration once and distribute it to our fleet of servers. By using Cloud Storage and Cloud PubSub, we can build agents on our servers that listen through PubSub to get object notifications and take action based on these events. Furthermore, if a change turns out to be bad (wrong configuration changes are a common reason for downtime 😩 ), we can perform a rollback with object versioning.
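The agent in the second use-case boils down to a small decision: given one bucket notification, should it apply the new file or leave things alone? The sketch below models that decision with plain Ruby hashes so it runs without GCP. The `eventType` attribute values and the `name` field follow Cloud Storage's Pub/Sub notification format; the `decide_action` helper, the `WATCHED_FILES` list, and the action symbols are hypothetical.

```ruby
require 'json'

# Hypothetical list of object names the agent cares about.
WATCHED_FILES = ['blackbox.yml'].freeze

# Decide what an agent should do with one bucket notification.
# `attributes` is the message attribute hash, `data` is the JSON payload string.
def decide_action(attributes, data)
  payload = JSON.parse(data)
  return :ignore unless payload.is_a?(Hash)
  return :ignore unless WATCHED_FILES.include?(payload['name'])

  case attributes['eventType']
  when 'OBJECT_FINALIZE' then :apply        # a new object or new generation was written
  when 'OBJECT_DELETE'   then :keep_current # nothing to download, keep the last config
  else :ignore                              # e.g. metadata-only updates
  end
rescue JSON::ParserError
  :ignore # malformed payloads are skipped rather than crashing the agent
end

payload = JSON.generate('name' => 'blackbox.yml', 'bucket' => 'demo-configurations')
puts decide_action({ 'eventType' => 'OBJECT_FINALIZE' }, payload)
```

Keeping this decision in a pure function makes it easy to test separately from the code that talks to PubSub and Cloud Storage.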
In this article, we'll build a Ruby wrapper for Blackbox Exporter using the second use-case described above. The wrapper will run the exporter in one process and run another process to watch for configuration changes from a bucket in GCP, and then live reload the exporter. Are you ready? Let's have fun!
What is Blackbox Exporter?
Blackbox Exporter is an open-source tool built by the Prometheus team to probe endpoints over HTTP, HTTPS, DNS, TCP, and ICMP. The exporter should be deployed alongside a Grafana and Prometheus deployment. The complete setup looks like the following:
The Blackbox wrapper probes all configured endpoints, and Prometheus scrapes the exporter like any other target. Then, Grafana retrieves data from Prometheus to be graphed. We run the exporter binary like blackbox_exporter --config.file blackbox.yml. Blackbox Exporter also allows us to live reload the exporter with a new configuration without shutting down the binary and restarting it. This can be very useful when scraping endpoints with intervals measured in seconds.
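To see what a signal-driven live reload looks like from the controlling side, here is a self-contained sketch. It does not run blackbox_exporter; a forked Ruby child stands in for any daemon that re-reads its configuration when it receives `SIGHUP`. The helper names and the file contents are made up for illustration.

```ruby
require 'tempfile'

# Fork a stand-in daemon that reports its config on start and after each SIGHUP.
# `out` is the write end of a pipe the child uses to report back. Returns the PID.
def spawn_reloadable_child(config_path, out)
  fork do
    out.sync = true
    reload_requested = false
    Signal.trap('HUP') { reload_requested = true } # keep the handler tiny
    Signal.trap('TERM') { exit!(0) }               # leave without running exit hooks
    out.puts "loaded #{File.read(config_path).strip}"
    loop do
      if reload_requested
        reload_requested = false
        out.puts "reloaded #{File.read(config_path).strip}"
      end
      sleep 0.05
    end
  end
end

# Read one line from the child, failing instead of hanging if it never answers.
def read_line_with_timeout(reader, seconds = 5)
  raise 'child did not respond' unless IO.select([reader], nil, nil, seconds)

  reader.gets.strip
end

# Start the child, swap the config file, send SIGHUP, and collect both reports.
def demo_live_reload
  config = Tempfile.new('demo-config')
  config.write("version: 1\n")
  config.flush
  reader, writer = IO.pipe
  pid = spawn_reloadable_child(config.path, writer)
  first = read_line_with_timeout(reader)  # child is up and its HUP trap is installed
  File.write(config.path, "version: 2\n") # write the new configuration
  Process.kill('HUP', pid)                # ask the child to re-read it
  second = read_line_with_timeout(reader)
  [first, second]
ensure
  if pid
    Process.kill('TERM', pid)
    Process.wait(pid)
  end
  config&.close!
end

p demo_live_reload
```

The child keeps the same PID across the reload, so whatever it was doing before the signal is not interrupted by a restart.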
BlackboxWrapper Service Specs
Before deep-diving into the code, let's highlight the service specs:
- The BlackboxWrapper service will run two processes.
  - The first process runs the blackbox_exporter binary.
  - The second process listens for bucket changes from GCP and live reloads the first process.
- The service will be deployed as a Docker image, which will enable us to package the service alongside the blackbox_exporter binary.
Let's Start Building
First, create an app directory and then enter the directory.
mkdir blackbox-wrapper && cd blackbox-wrapper
As in a standard Ruby application, we'll use bundler to manage our wrapper's dependencies. Create a Gemfile:
source "https://rubygems.org"
git_source(:github) {|repo_name| "https://github.com/#{repo_name}" }
ruby '2.7.2'
gem 'google-cloud-storage'
gem 'google-cloud-pubsub'
gem 'rake'
gem 'pry'
Then run bundle install.
Now we'll create a file to hold our code: app.rb. This file will act as the entry point to our service. Since we will be deploying our app in a container, this file will be specified in the CMD instruction in our Dockerfile later on.
touch app.rb
Creating the Dockerfile
Some items have been omitted from this file on purpose, but the code below highlights the critical components necessary for this article:
FROM ruby:2.7.2
RUN mkdir /app
WORKDIR /app
COPY . .
# Install other dependencies
...
# Download & Install blackbox exporter
RUN curl -SL \
https://github.com/prometheus/blackbox_exporter/releases/download/v0.18.0/blackbox_exporter-0.18.0.linux-386.tar.gz | \
tar xvz -C /tmp && \
mv /tmp/blackbox_exporter-0.18.0.linux-386/blackbox_exporter /usr/local/bin && \
mkdir /etc/blackbox && \
mv /tmp/blackbox_exporter-0.18.0.linux-386/blackbox.yml /etc/blackbox/
# Specify entry point.
CMD ["bundle", "exec", "ruby", "app.rb" ]
From the above, we should note the following:
- We used a Ruby image, ruby:2.7.2, as a base image with Ruby installed.
- We installed the blackbox_exporter binary and moved it to a directory accessible from our PATH.
- We specified the entry point of the container to run app.rb on container start up.
Building The Wrapper Service
This is our Ruby service that glues everything together. In app.rb, place the following:
require 'rubygems'
require 'bundler/setup'
require 'json' # JSON.parse is used to read the notification payload
require "google/cloud/pubsub"
require "google/cloud/storage"
CONFIG_BUCKET = ENV['BUCKET_NAME']
TOPIC = ENV['PUBSUB_TOPIC']
TOPIC_SUBSCRIPTION = ENV['TOPIC_SUBSCRIPTION']
class ProcessNotification
  def initialize(file, attr, blackbox_exporter)
    @file = file
    @attr = attr
    @blackbox_exporter = blackbox_exporter
  end

  def call
    return if @attr['eventType'] == 'OBJECT_DELETE'

    @blackbox_exporter.write @file
    @blackbox_exporter.reload
  end
end
class BlackBoxExporter
  CONFIG_FILE = '/etc/blackbox/blackbox.yml'

  def initialize
    @blackbox_pid = nil
  end

  def start
    return unless @blackbox_pid.nil?

    @blackbox_pid = fork do
      exec('blackbox_exporter', '--config.file', CONFIG_FILE)
    end
  end

  def write(file)
    file.download CONFIG_FILE
  end

  def reload
    # Send SIGHUP signal
    Process.kill('HUP', @blackbox_pid)
  end

  def shutdown
    Process.kill('KILL', @blackbox_pid)
  end
end
class Subscriber
  class NotificationConfigError < StandardError
  end

  SUPPORTED_FILE_TYPES = ['blackbox.yml']

  def initialize(blackbox_exporter)
    @pubsub = Google::Cloud::Pubsub.new
    @storage = Google::Cloud::Storage.new
    @bucket = @storage.bucket CONFIG_BUCKET
    # Retrieve a subscription
    @subscription = @pubsub.subscription TOPIC_SUBSCRIPTION
    @blackbox_exporter = blackbox_exporter
  end

  def listen
    create_notification_config
    puts "Starting subscriber"
    @subscriber = @subscription.listen do |received_message|
      process_notification(received_message)
    end
    @subscriber.on_error do |exception|
      process_exception(exception)
    end
    @subscriber.start
  end

  def process_notification(received_message)
    data = received_message.message.data
    published_at = received_message.message.published_at
    attributes = received_message.message.attributes
    puts "Data: #{data}, published at #{published_at}, Attr: #{attributes}"
    received_message.acknowledge!

    parsed_data = JSON.parse(data)
    file_name = parsed_data['name']
    return unless SUPPORTED_FILE_TYPES.include?(file_name)

    file = @bucket.file file_name
    process_notification = ProcessNotification.new(file, attributes, @blackbox_exporter)
    process_notification.call
  end

  def process_exception(exception)
    puts "Exception: #{exception.class} #{exception.message}"
  end

  def shutdown
    @subscriber.stop!(10)
  end

  def create_notification_config
    notifications = @bucket.notifications
    # Exactly one notification config already exists: nothing to do
    return if notifications.count == 1

    # Otherwise start from a clean slate so that events are sent just once
    notifications.each(&:delete)
    topic = @pubsub.topic TOPIC
    @bucket.create_notification topic.name
  rescue StandardError => e
    raise NotificationConfigError, e.message
  end
end
class BlackboxWrapper
  def initialize
    @blackbox_exporter = BlackBoxExporter.new
    @subscriber = Subscriber.new(@blackbox_exporter)
  end

  def start
    @blackbox_exporter.start
    @subscriber.listen
    at_exit do
      @blackbox_exporter.shutdown
      @subscriber.shutdown
    end
    # Block, letting processing threads continue in the background
    sleep
  end
end

blackbox_wrapper = BlackboxWrapper.new
blackbox_wrapper.start
While the above is a lot of code, let's break it down, starting from the bottom:
- BlackboxWrapper: This class is the entry point to our service. The .start method does the following:
  - Starts the blackbox_exporter binary in a different process to start probing endpoints.
  - Starts the subscriber, which listens for bucket changes on background threads.
  - Calls sleep in the main thread to keep the app running indefinitely.
- How does the BlackBoxExporter work?
  - The .start method forks a child process and uses the exec kernel method to run the blackbox_exporter binary in it.
  - The .reload method sends the SIGHUP signal to live reload the blackbox_exporter binary with the new configuration. As you may have noted from the ProcessNotification class, a new configuration file is written to the configuration file location before the exporter is reloaded.
- How does the Subscriber work?
  - The .listen method starts by creating a NotificationConfiguration. A NotificationConfiguration is a rule that specifies three things:
    - A topic in Pub/Sub to receive notifications.
    - The event that triggers notifications to be sent. The Cloud Storage documentation lists the event types that can trigger notifications.
    - The information contained within notifications.
  - The #create_notification_config method also ensures that there's just one NotificationConfiguration; otherwise, it will delete everything and create one. This ensures that notifications are sent just once.
  - The .listen method also calls @subscription.listen to start listening for notifications from the bucket to which we're subscribed. Note that this keeps running on background threads until the subscriber is stopped.
  - The #process_notification method is called for every notification received. Note that we have SUPPORTED_FILE_TYPES, which we use to identify the files in the bucket that we care about; everything else is ignored.
- ProcessNotification: This is responsible for processing notifications, downloading the updated configuration, writing it to a file, and reloading the blackbox_exporter binary.
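The write step downloads straight over the live configuration file, so a malformed upload replaces the last good copy on disk. One possible hardening, sketched below under assumptions, is to stage the new file, check that it parses as YAML with a top-level `modules` mapping (the key blackbox.yml uses for its probe definitions), and only then rename it into place. The `validate_config!` and `install_config` helpers are hypothetical, and their check is far lighter than the exporter's own validation.

```ruby
require 'yaml'
require 'fileutils'
require 'tmpdir'

class InvalidConfigError < StandardError; end

# Return the parsed config, or raise InvalidConfigError if it looks unusable.
def validate_config!(path)
  parsed = YAML.safe_load(File.read(path))
  unless parsed.is_a?(Hash) && parsed['modules'].is_a?(Hash)
    raise InvalidConfigError, "#{path} has no top-level 'modules' mapping"
  end

  parsed
rescue Psych::SyntaxError => e
  raise InvalidConfigError, "#{path} is not valid YAML: #{e.message}"
end

# Validate `staged_path`, then replace `live_path` with it.
# Returns true if the live file was replaced, false if the old one was kept.
def install_config(staged_path, live_path)
  validate_config!(staged_path)
  File.rename(staged_path, live_path) # atomic when both paths share a filesystem
  true
rescue InvalidConfigError => e
  warn "Keeping current config: #{e.message}"
  FileUtils.rm_f(staged_path)
  false
end

Dir.mktmpdir do |dir|
  live = File.join(dir, 'blackbox.yml')
  File.write(live, "modules:\n  http_2xx:\n    prober: http\n")
  staged = File.join(dir, 'blackbox.yml.new')
  File.write(staged, 'modules: [unclosed') # deliberately broken upload
  puts install_config(staged, live)
end
```

In the wrapper, this would mean downloading to a staging path next to the live file and sending SIGHUP only when `install_config` returns true.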
Running the Service Locally
To run the service locally and test it, run the following in the root of the app directory:
export BUCKET_NAME='{insert-bucket-name}'
export PUBSUB_TOPIC='{insert-pubsub-topic}'
export TOPIC_SUBSCRIPTION='{insert-subscription-name}'
export GOOGLE_APPLICATION_CREDENTIALS='{insert-path-to-service-key-json}'
bundle exec ruby app.rb
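Because the service reads its settings with ENV[...], a variable that was never exported only surfaces later as a nil bucket or subscription name. A small fail-fast check at startup, sketched here, can make that mistake obvious. The `load_settings` helper and its error class are hypothetical; the variable names are the ones exported above. GOOGLE_APPLICATION_CREDENTIALS is left out of the required list because it is not needed on a GCE VM.

```ruby
REQUIRED_ENV_VARS = %w[BUCKET_NAME PUBSUB_TOPIC TOPIC_SUBSCRIPTION].freeze

class MissingSettingsError < StandardError; end

# Return a hash of the required settings, or raise listing every missing one.
# `env` defaults to the process environment; pass a Hash to test it.
def load_settings(env = ENV)
  missing = REQUIRED_ENV_VARS.select { |name| env[name].to_s.strip.empty? }
  unless missing.empty?
    raise MissingSettingsError, "Missing environment variables: #{missing.join(', ')}"
  end

  REQUIRED_ENV_VARS.to_h { |name| [name, env[name]] }
end

begin
  settings = load_settings
  puts "Watching bucket #{settings['BUCKET_NAME']}"
rescue MissingSettingsError => e
  warn e.message
end
```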
Deploying our Service to Google Compute Engine
Like many aspects of the cloud, there are many ways to achieve the same result, but modern software engineering encourages CI/CD processes for several good reasons. As such, we will focus on deploying our service from GitHub Actions using setup-gcloud.
Let's set up our deployment file (.github/workflows/deploy.yml).
name: Build and Deploy to Google Compute Engine
on:
  push:
    branches:
      - main
env:
  PROJECT_ID: ${{ secrets.GCE_PROJECT }}
  GCE_INSTANCE: ${{ secrets.GCE_INSTANCE }}
  GCE_INSTANCE_ZONE: us-central1-a
  BUCKET_NAME: demo-configurations
  PUBSUB_TOPIC: demo-configurations-bucket-notifications
  TOPIC_SUBSCRIPTION: demo-bucket-changes-subscription
jobs:
  setup-build-publish-deploy:
    name: Setup, Build, Publish, and Deploy
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
      # Setup gcloud CLI
      - uses: google-github-actions/setup-gcloud@master
        with:
          version: '290.0.1'
          service_account_key: ${{ secrets.GCE_SA_KEY }}
          project_id: ${{ secrets.GCE_PROJECT }}
      # Configure Docker to use the gcloud command-line tool as a credential
      # helper for authentication
      - run: |-
          gcloud --quiet auth configure-docker
      # Build the Docker image
      - name: Build
        run: |-
          docker build --tag "gcr.io/$PROJECT_ID/$GCE_INSTANCE-image:$GITHUB_SHA" .
      # Push the Docker image to Google Container Registry
      - name: Publish
        run: |-
          docker push "gcr.io/$PROJECT_ID/$GCE_INSTANCE-image:$GITHUB_SHA"
      - name: Deploy
        run: |-
          gcloud compute instances update-container "$GCE_INSTANCE" \
            --zone "$GCE_INSTANCE_ZONE" \
            --container-image "gcr.io/$PROJECT_ID/$GCE_INSTANCE-image:$GITHUB_SHA" \
            --container-env "BUCKET_NAME=$BUCKET_NAME,PUBSUB_TOPIC=$PUBSUB_TOPIC,TOPIC_SUBSCRIPTION=$TOPIC_SUBSCRIPTION"
Note that the --container-env flag is set in the deploy step, which passes the necessary environment variables from the GitHub Actions workflow to the container.
Secrets & Environment Variables
Next, we'll set up secrets for GitHub Actions.
We set the environment variables for our container with the --container-env flag. Since we are setting them from GitHub Actions, we can use secrets for sensitive data and plain env variables for non-sensitive data.
Creating GCP Resources
Let's create a bucket in the GCP console.
We'll also create a PubSub topic in the GCP console.
Grant the Cloud Storage service agent the pubsub.publisher IAM role in the console. Each project has an associated Cloud Storage service account responsible for some background actions, such as PubSub notifications. The Cloud Storage documentation explains how to find it.
Finally, we create a subscription in the GCP Console.
Voila! 🎉 Our service has been deployed successfully.
Conclusion
If you’ve made it this far, you deserve a cookie 🍪 . I think this is the first version of a potentially great solution with multiple optimizations to be achieved. For example, we could achieve the following:
- Deploy the blackbox_exporter as a serverless function to support multiple regions, which is ideal for uptime monitoring, and deploy a master server responsible for updating the bucket configuration in Cloud Storage.
- Potentially, from the previous point, we can abstract this into an app that integrates into popular cloud providers to achieve the same functionality, hence making it cloud-agnostic. P.S: Popular cloud providers (GCP, AWS, & Azure) provide the same functionalities across services.
In the next article, we'll build on this solution to provide rollbacks with cloud-storage object versioning, which will enable us to recover from updating the configuration with incorrect updates.
Deploying with Docker simply solves the packaging problem for us, but as you may already know, there are various ways to package services. I chose Docker in this article for the sake of simplicity.
Glossary
- Prometheus is an open-source systems monitoring and alerting toolkit. It includes a server that scrapes and stores time-series data, client libraries for instrumenting application code, and an alert manager to handle alerts.
- Grafana is a visualization system that allows you to query, visualize, alert on, and understand your metrics, regardless of where they are stored.
- Blackbox Exporter is an open-source tool built by the Prometheus team to probe endpoints over HTTP, HTTPS, DNS, TCP, and ICMP.