PHP Slim 3 Circuit Breakers - Failing gracefully

Recently at work I have run into a few small problems with our infrastructure. Occasionally one of my VPSs times out when contacting an external mail server. Since this happens maybe two or three times a month, it has been incredibly hard to trace.

One of the problems I currently have is that the only notification I get when something like this happens is an email, which is actually quite silly. The auto-email notification was put in for a totally separate issue, but it casts a wide enough net that I get to look at a few quirks of our environment. Given that the only error reporting is sent over email, I highly suspect the issue is happening more often than I know, but the only way to tell is to add more error reporting. Doing that for a one-off service didn't seem worth the effort, until I came across a wonderful package implementing a wonderful pattern.

Circuit Breakers

The original repository can be found here: https://github.com/ejsmont-artur/php-circuit-breaker. I have forked it and added to the functionality: https://github.com/geggleto/php-circuit-breaker.

A circuit breaker allows a third-party service in your application to fail, records each failure, and after a threshold of failures trips the breaker, making all subsequent requests fail immediately until a certain amount of time has passed. This design is really good for stopping resource-based attacks, since a tripped breaker fails fast instead of tying up workers that wait on timeouts. The original repository implements the basics of the pattern and provides a few different adapters for storage.
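To make the pattern concrete, here is a minimal in-memory sketch. SimpleBreaker and its internals are hypothetical and written only for illustration; it is not how the package stores state (the package uses storage adapters), and the timestamps are passed in by hand so the behaviour is easy to follow.

```php
<?php
// Hypothetical in-memory circuit breaker, for illustration only.
final class SimpleBreaker
{
    private $state = [];      // service name => ['failures' => int, 'lastFailure' => int]
    private $maxFailures;     // failures allowed before the breaker trips
    private $retryTimeout;    // seconds to wait before letting a request through again

    public function __construct($maxFailures, $retryTimeout)
    {
        $this->maxFailures = $maxFailures;
        $this->retryTimeout = $retryTimeout;
    }

    // Available while under the threshold, or once the retry timeout has elapsed.
    public function isAvailable($service, $now = null)
    {
        $now = $now === null ? time() : $now;
        if (!isset($this->state[$service])) {
            return true;
        }
        $s = $this->state[$service];
        if ($s['failures'] < $this->maxFailures) {
            return true;
        }
        return ($now - $s['lastFailure']) >= $this->retryTimeout;
    }

    public function reportFailure($service, $now = null)
    {
        $now = $now === null ? time() : $now;
        $failures = isset($this->state[$service]) ? $this->state[$service]['failures'] : 0;
        $this->state[$service] = ['failures' => $failures + 1, 'lastFailure' => $now];
    }

    // A success clears the failure history for that service.
    public function reportSuccess($service)
    {
        unset($this->state[$service]);
    }
}

// Usage: two failures trip a breaker with a threshold of 2 and a 30 second timeout.
$simple = new SimpleBreaker(2, 30);
$simple->reportFailure('Mail', 1000);
$simple->reportFailure('Mail', 1001);
$trippedAt1010 = $simple->isAvailable('Mail', 1010); // still inside the timeout
$retryAt1031 = $simple->isAvailable('Mail', 1031);   // timeout elapsed, one retry allowed
```

While the breaker is tripped, callers skip the slow external call entirely, which is the whole point of the pattern.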

What to do when you "Trip" the breaker

In my package I address the problem of what to do when your service "trips" the breaker. For me this is the point in the application where I need to be notified that something is really wrong. Think going from Green to Red. I define an interface called TrippedHandlerInterface. A Handler class implements this interface to run whatever code should deal with the error. In my haste to use this in production, I have tied my implementation to Slim v3; this will be fixed sometime next week.

Here is what an example Handler might look like.

<?php
/**
 * Created by PhpStorm.
 * User: Glenn
 * Date: 2016-02-19
 * Time: 9:09 AM
 */

namespace VMS\Handler;

use Ejsmont\CircuitBreaker\TrippedHandlerInterface;

class EmailHandler implements TrippedHandlerInterface
{
    protected $targetEmail = '';
    
    protected $headers = '';
    
    public function __construct($targetEmail)
    {
        $this->targetEmail = $targetEmail;
        $this->headers = 'From: [email protected]' . "\r\n" .
            'Reply-To: [email protected]' . "\r\n" .
            'X-Mailer: PHP/' . phpversion();
    }

    public function __invoke($serviceName, $count, $message)
    {
        mail($this->targetEmail, "Service Outage: " . $serviceName, $message, $this->headers);
    }
}

In Slim v3 I define the circuit breaker in the application container like so:

    CircuitBreaker::class => function ($c) {
        $factory = new Ejsmont\CircuitBreaker\Factory();
        /** @var  $circuitBreaker CircuitBreaker */
        $circuitBreaker = $factory->getSingleApcInstance(5, 30);
        $circuitBreaker->registerHandler("Database", new \VMS\Handler\EmailHandler("[email protected]"));
        return $circuitBreaker;
    },

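In my case I am using a local APC instance to hold the reporting status of the services. The two arguments to getSingleApcInstance are the failure threshold (5) and the retry timeout in seconds (30).

In this example I will show you how I guard against database server outages: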

<?php
/**
 * Created by PhpStorm.
 * User: Glenn
 * Date: 2016-01-19
 * Time: 2:58 PM
 */
use Ejsmont\CircuitBreaker\Core\CircuitBreaker;

/*
 * The Breaker code only checks on Connect. There is no reliable way to check for any other errors during the life cycle
 * of the application. For the most part this should be fine, as most requests are served under 1 second and the next
 * request that comes into the server will fail if there was a problem in this request
 */

/** @var $breaker CircuitBreaker */
$breaker = $container[CircuitBreaker::class];
if ($breaker->isAvailable('Database')) {
    try {
        ActiveRecord\Config::initialize(function (\ActiveRecord\Config $cfg) {
            $cfg->set_connections(
                array(
                    'development' => 'mysql://[email protected]/database',
                    'production' => 'mysql://[email protected]/database'
                )
            );

            $cfg->set_default_connection("development");
        });

        // This line forces the connection attempt. If there is a timeout, we want to
        // know about it here rather than later on in the execution cycle.
        $conn = ActiveRecord\Connection::instance("development");

        $breaker->reportSuccess("Database");
    } catch (\ActiveRecord\ActiveRecordException $e) {
        $breaker->reportFailure("Database");

        $handle = $container["circuitError"];
        $handle($app, 503, "Database currently unavailable");
    } catch (Exception $e) { // Anything else is not a database failure, so it is not reported to the breaker
        $handle = $container["circuitError"];
        $handle($app, 500, "Application Problem");
    }
} else { //It has tripped if it has hit this code block
    $handle = $container["circuitError"];
    $handle($app, 503, "Database currently unavailable");
}

So this code block requires a lot of explanation.

First off, we check whether the breaker is already tripped by calling isAvailable. If it is tripped, we stop execution right away, which blunts any resource-based attack. If the service is still available, we try to do what we need to, in this case attempt a database connection. If no exceptions are thrown, we report a success to the breaker and continue on...

If a database exception is thrown, we report a failure. At this point, if the threshold has been hit, your handler will run, but only if you have registered a handler for that service.

Lastly, if anything else happens we chalk it up to... wtf bbq. We terminate the app and do not record a failure against the breaker.
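The handler step described above can be sketched roughly like this. NotifyingBreaker is a hypothetical class, not the package's code; registerHandler and the ($serviceName, $count, $message) handler arguments mirror what is used elsewhere in this post, and firing only on the exact failure that reaches the threshold is an assumption made for the sketch.

```php
<?php
// Hypothetical sketch of "run the handler when the threshold is hit".
final class NotifyingBreaker
{
    private $failures = [];   // service name => failure count
    private $handlers = [];   // service name => callable
    private $maxFailures;

    public function __construct($maxFailures)
    {
        $this->maxFailures = $maxFailures;
    }

    public function registerHandler($service, callable $handler)
    {
        $this->handlers[$service] = $handler;
    }

    public function reportFailure($service, $message = '')
    {
        $count = isset($this->failures[$service]) ? $this->failures[$service] + 1 : 1;
        $this->failures[$service] = $count;

        // Assumption: notify once, on the failure that trips the breaker,
        // and only when a handler exists for this service.
        if ($count === $this->maxFailures && isset($this->handlers[$service])) {
            call_user_func($this->handlers[$service], $service, $count, $message);
        }
    }
}

// Usage: the handler runs on the second failure only.
$notifications = [];
$notifying = new NotifyingBreaker(2);
$notifying->registerHandler('Database', function ($service, $count, $message) use (&$notifications) {
    $notifications[] = "$service failed $count times: $message";
});
$notifying->reportFailure('Database', 'timeout');
$notifying->reportFailure('Database', 'timeout');
$notifying->reportFailure('Database', 'timeout');
```

Services without a registered handler still count failures; they just trip silently.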

Dashboard

Since we are using a local APC instance to hold the reporting status, it is rather trivial to set up a dashboard to monitor failure statistics. But since this is highly dependent on your application and use case, I will leave that to you to implement!

Solution to Original Problem

Since my original problem seems to be very intermittent, I can define a circuit breaker that reattempts the email connection by setting a threshold of 1 and having the handler resend the message. Bam, problem solved. I also have it emailing me a copy of every failed message.
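A rough sketch of what such a resend handler could look like follows. Everything here is hypothetical: ResendHandler, the injected $send callable and the $pending array are names invented for this example, and the only thing taken from the post is the ($serviceName, $count, $message) argument list used by EmailHandler. The sender is injected so a fake can stand in for mail().

```php
<?php
// Hypothetical handler: retries delivery once and forwards a copy to an admin.
final class ResendHandler
{
    private $send;        // callable($to, $subject, $body): bool
    private $adminEmail;  // where the copy of the failed message goes
    private $pending;     // the message that failed: keys 'to', 'subject', 'body'

    public function __construct(callable $send, $adminEmail, array $pending)
    {
        $this->send = $send;
        $this->adminEmail = $adminEmail;
        $this->pending = $pending;
    }

    // Same argument list as EmailHandler::__invoke in this post.
    public function __invoke($serviceName, $count, $message)
    {
        $send = $this->send;

        // One more delivery attempt for the original message.
        $resent = $send($this->pending['to'], $this->pending['subject'], $this->pending['body']);

        // Always forward a copy to the admin, noting whether the retry worked.
        $status = $resent ? 'resent OK' : 'resend failed';
        $send(
            $this->adminEmail,
            'Service Outage: ' . $serviceName . ' (' . $status . ')',
            $message . "\n\n" . $this->pending['body']
        );

        return $resent;
    }
}

// Usage with a fake sender that records calls instead of sending mail.
$sent = [];
$fakeSend = function ($to, $subject, $body) use (&$sent) {
    $sent[] = [$to, $subject];
    return true;
};
$resendHandler = new ResendHandler($fakeSend, 'admin@example.com', [
    'to' => 'customer@example.com',
    'subject' => 'Your invoice',
    'body' => 'Invoice attached.',
]);
$resent = $resendHandler('Mail', 1, 'SMTP connection timed out');
```

With a threshold of 1, a handler like this would run on the very first failure, which suits a fault that only shows up a few times a month.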

TGIF Happy Friday!

Follow Up

I had an idea after doing a peer code review with a colleague: the syntax for the breaker check was a little bit verbose. I just pushed a patch that should greatly simplify code reuse with this package. I present to you all the attempt method:

/** @var $breaker CircuitBreaker */
$breaker = $container[CircuitBreaker::class];
$breaker->attempt("Database", function () {
    ActiveRecord\Config::initialize(function (\ActiveRecord\Config $cfg) {
        $cfg->set_connections(
            array(
                'development' => 'mysql://[email protected]/reporting',
                'production' => 'mysql://root:[email protected]/reporting'
            )
        );

        $cfg->set_default_connection("development");
    });

    $conn = ActiveRecord\Connection::instance("development");
}, function () use ($app, $container) {
    $handle = $container["circuitError"];
    $handle($app, 503, "Database currently unavailable");
});
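For the curious, a helper like attempt can be built from the three calls used in the longer example: isAvailable, reportSuccess and reportFailure. The sketch below is my guess at the shape, not the code from the patch. attemptWithBreaker and StubBreaker are hypothetical names, and unlike the longer example it treats every exception as a service failure.

```php
<?php
// Hypothetical stand-in for the real breaker so the sketch runs on its own.
final class StubBreaker
{
    public $available = true;
    public $log = [];

    public function isAvailable($service) { return $this->available; }
    public function reportSuccess($service) { $this->log[] = 'success:' . $service; }
    public function reportFailure($service) { $this->log[] = 'failure:' . $service; }
}

// Runs $action behind the breaker; falls back to $onFailure when the breaker
// is tripped or when $action throws.
function attemptWithBreaker($breaker, $service, callable $action, callable $onFailure)
{
    if (!$breaker->isAvailable($service)) {
        return $onFailure(); // tripped: fail fast without touching the service
    }

    try {
        $result = $action();
        $breaker->reportSuccess($service);
        return $result;
    } catch (Exception $e) {
        // Simplification: any exception counts as a failure of the service.
        $breaker->reportFailure($service);
        return $onFailure();
    }
}

// Usage: a failing action is recorded and the fallback result is returned.
$stub = new StubBreaker();
$outcome = attemptWithBreaker(
    $stub,
    'Database',
    function () { throw new RuntimeException('connect timeout'); },
    function () { return 'fallback'; }
);
```

Passing the failure callback in keeps the Slim-specific error response out of the helper, so the same call works for any service.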

Written by Glenn Eggleton on Friday February 19, 2016