Automated Multi-Master Topology Negotiation

This one is going to be a little less coherent than usual. It’s just some general findings I’ve made while defining an automated multi-master architecture.

So, I’ve been working recently on creating an analytics service for a project I probably can’t talk about. Basically, I have to do a lot of math on a lot of rows very very quickly.

One of the problems I’m running into is that the data needs to be processed so quickly that we can’t have a master node: routing through one would add latency to every calculation - milliseconds, sure, but milliseconds count. Think high-frequency trading in the stock market: you can’t tolerate added latency, because you’re effectively paid to not have latency.

There are a few possible methods to handle multi-master topologies. One of the most widely used and supported ideas is to simply have each node vote for a leader, and when that one goes down, they vote again for a new one.

I’m not really a fan of voting for a master

In my mind, the most fault-tolerant architectures are truly distributed. For example, there’s no “master” internet that takes the whole world down with it when it fails. Granted, some companies act as de facto hubs in some way, but Google going down doesn’t take Amazon down with it.

As far as I’m concerned, voting for a singular leader is an imperfect solution. It creates a single point of failure in a distributed system. And as quickly as the remaining nodes can vote for a new leader, there will always be a window in which there is no leader at all. MongoDB works this way, and I have been bitten by it - the primary goes down, and the cluster has to fail over to a secondary.

But what happens if there is an even number of nodes?

This is one of the most common problems with a voting solution. A vote can yield an inconclusive outcome for simple reasons - a 2-2 split among four nodes leaves no candidate with a majority. This can be handled in implementation, but it can also be handled poorly, or not at all.
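To make the split concrete, here’s a toy Python sketch (the node names and the tally logic are mine, purely for illustration) of a majority election deadlocking on an even split:

```python
from collections import Counter

def elect_leader(votes):
    """Tally votes and return a winner only if one candidate has a
    strict majority; otherwise the election is inconclusive."""
    tally = Counter(votes)
    candidate, count = tally.most_common(1)[0]
    if count > len(votes) / 2:
        return candidate
    return None  # split vote: no majority, no leader

# Four nodes, two candidates: a perfectly plausible 2-2 split.
print(elect_leader(["node-a", "node-b", "node-a", "node-b"]))  # None
print(elect_leader(["node-a", "node-a", "node-a", "node-b"]))  # node-a
```

A real election protocol has to decide what happens in that `None` case - retry, randomized backoff, tiebreaker node - and every one of those choices is another thing to get wrong.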

I like perfect, mathematical solutions

Solving a problem doesn’t just mean slapping a band-aid on it. It also means understanding why the issue exists, where it came from, and how to prevent it in the future.

Math is great

It can produce an optimal algorithm to answer a question, irrespective of technical limitations. For example, math has, in principle, worked out how to break most forms of encryption in use, because “Math Don’t Care” about puny constraints like time and physics.

Stepping down from such a pedestal, it can also design “secure” algorithms that cannot be broken or misused, provided the code is implemented according to the spec (ignoring any possible memory, compute, storage, or other limitations).

For example, state machines

State machines, at their core, are a predefined set of possible “movements,” and a predefined set of possible “locations.” Theoretical mathematicians and computer scientists may disagree with me on that point, but I’m simplifying for the sake of simplicity, m’kay?

In order for a state machine to be “correct,” all possible movements must be defined for all possible locations, and they must be handled appropriately. This means that, for example, if you had a device with two lights and two buttons, you could create the following “locations” (states) and “movements” (transitions):

States

  1. Light #1 off, light #2 off
  2. Light #1 on, light #2 off
  3. Light #1 on, light #2 on
  4. Light #1 off, light #2 on

Transitions

  A. Button #1 pressed
  B. Button #2 pressed

These come together to form a rigidly defined algorithm for all possibilities

We could define that the “device” starts in state 1, and can be moved between states using the two buttons. If button A is pressed, it brings the device to state 2. If button B is pressed, it brings the device to state 4. If either button is pressed afterwards, it brings the device to state 3. Then, if a button is pressed again, it remains in state 3.

That creates a rough diagram like so:

 <state 1> ---A---> <state 2> ----A, B-----|
     |                                     V
     |--------B---> <state 4> --A, B--> <state 3> ----A, B--|
                                            /\              |
                                            |---------------|

You can see in my horrific drawing that state 1 can transition to state 2 or 4, either of which then transitions to state 3, and no matter what you do after that, it remains in state 3. You can test this exhaustively - there are only two possible inputs and four possible states.
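If you want to poke at it yourself, here’s one way the machine could be written down as a transition table - a minimal Python sketch of the device described above:

```python
# States from the list above: 1 = both off, 2 = #1 on, 3 = both on, 4 = #2 on.
# Transitions: "A" = button #1 pressed, "B" = button #2 pressed.
TRANSITIONS = {
    1: {"A": 2, "B": 4},
    2: {"A": 3, "B": 3},
    4: {"A": 3, "B": 3},
    3: {"A": 3, "B": 3},  # absorbing state: both lights stay on
}

def step(state, button):
    # Every (state, button) pair is defined, so this lookup can never fail.
    return TRANSITIONS[state][button]

state = 1
for button in ["A", "B", "A"]:
    state = step(state, button)
print(state)  # 3 - both lights end up on, no matter the press order
```

The whole point is that the table is total: every state has an entry for every button, so there is no input sequence the machine doesn’t have an answer for.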

It creates a theoretical solution as a “specification” for the answer to a problem

If this state machine is implemented in code, it can be rigorously assured that no undefined transition can ever occur in the execution of this code.

Approaching the problem theoretically can grant a different perspective than the “head in the weeds” approach that is usually taken


So, back to our initial issue. Voting on a master is, to me, an implementation solution to a theoretical problem. Theoretical problems should have theoretical solutions, and implementation problems should have implementation (or theoretical) solutions.

Or so my OCD says.

If it is possible to solve the multi-master problem from a theoretical approach, the design itself cannot fail - any remaining failures can only come from implementing it incorrectly

That is what brings me to the method I have come up with: automated negotiation of the parameters each system needs in order to function independently as its own leader.

Determining the number of nodes

It is not possible to reliably trigger an event on the failure of a system - a machine that dies abruptly never gets the chance to announce it. I have spent many an hour belaboring this point and trying to work around it. Much like the Two Generals’ Problem, there is no perfect answer; the practical one is a heartbeat, with a third party notifying the others when the heartbeat stops.

So, if each node performs a heartbeat to each other node, this should be fine as long as there are fewer than 65,535 or so nodes - the ceiling imposed by TCP’s 16-bit port space, and in practice by per-process file-descriptor limits on Linux.
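As a rough sketch of what that full-mesh bookkeeping might look like - the class name, thresholds, and peer addresses here are all invented for illustration, not from any real library:

```python
import time

HEARTBEAT_INTERVAL = 1.0   # seconds between pings
FAILURE_THRESHOLD = 3.0    # silence longer than this marks a peer dead

class PeerTracker:
    """Tracks the last time each peer's heartbeat was heard."""

    def __init__(self, peers):
        now = time.monotonic()
        self.last_seen = {peer: now for peer in peers}

    def record_heartbeat(self, peer):
        # Called whenever a heartbeat arrives from a peer.
        self.last_seen[peer] = time.monotonic()

    def alive_peers(self):
        # A peer is alive if it has been heard from recently enough.
        now = time.monotonic()
        return [p for p, seen in self.last_seen.items()
                if now - seen < FAILURE_THRESHOLD]

tracker = PeerTracker(["10.0.0.2", "10.0.0.3", "10.0.0.4"])
tracker.record_heartbeat("10.0.0.2")
print(tracker.alive_peers())
```

The actual network I/O (sending the pings, receiving them) is omitted; the point is just that failure detection reduces to “who has gone quiet for too long.”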

But let’s say there are 100,000 nodes. Where do we go from there?

Mesh networks

I’m going to detail mesh networks more fully in a future post. Suffice it to say, if each node can guarantee transport to some number of other nodes, in a way that ensures there are no “islands,” it can be reliably determined when a node drops off, and all other nodes can be nodified. See what I did there?
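Here’s a small illustrative check for that “no islands” property - a plain breadth-first search over a hypothetical peering map (the function and node names are mine, for illustration only):

```python
from collections import deque

def has_islands(nodes, links):
    """Return True if the peering graph is disconnected.
    `links` maps each node to the peers it heartbeats directly."""
    if not nodes:
        return False
    # Treat links as bidirectional for reachability purposes.
    neighbours = {n: set() for n in nodes}
    for node, peers in links.items():
        for peer in peers:
            neighbours[node].add(peer)
            neighbours[peer].add(node)
    # BFS from an arbitrary node; anything unreached is an island.
    seen, queue = {nodes[0]}, deque([nodes[0]])
    while queue:
        for peer in neighbours[queue.popleft()]:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return len(seen) != len(nodes)

# n1-n2-n3 form one component; n4-n5 are an island cut off from it.
print(has_islands(
    ["n1", "n2", "n3", "n4", "n5"],
    {"n1": ["n2"], "n2": ["n3"], "n4": ["n5"]},
))  # True
```

As long as that check holds, a failure noticed by any one node can be gossiped along the links until every node has heard about it.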

So, on to my solution

I have roughly solved my multi-master issue by running a watchdog that counts the number of alive nodes and keeps that number in memory, ready for when a calculation starts. A short, lacklustre answer for an annoying issue, after I just spent the last few pages writing a long explanation of the superiority of various architectures. Ultimately, this code comes with a few guarantees, such as that calculations only begin after nodes have finished entering and leaving the cluster, so I don’t have to worry about new machines mid-stream. I can also run the calculation in parallel and in duplicate, which gives me some guarantees as to the reliability of my results.
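For the curious, here’s roughly how such a watchdog could be shaped - reusing the hypothetical PeerTracker sketch from earlier, with a background thread keeping the alive count fresh (again, an illustration, not my production code):

```python
import threading
import time

class NodeCountWatchdog:
    """Periodically counts alive peers and caches the number in memory,
    so a calculation can read a stable figure when it kicks off."""

    def __init__(self, tracker, interval=1.0):
        self.tracker = tracker    # e.g. the PeerTracker sketched above
        self.interval = interval
        self._count = 0
        self._lock = threading.Lock()

    def _run(self):
        while True:
            alive = len(self.tracker.alive_peers())
            with self._lock:
                self._count = alive
            time.sleep(self.interval)

    def start(self):
        # Daemon thread: dies with the process, no explicit shutdown needed.
        threading.Thread(target=self._run, daemon=True).start()

    def alive_count(self):
        with self._lock:
            return self._count
```

A calculation kicking off would call `alive_count()` once and partition its rows across that many peers, trusting the membership-freeze guarantee to keep the number valid for the duration.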

And, last but not least, I don’t need a massively complex answer to this - it’s a small project, with a relatively small impact, and I don’t need to be worried about something too grandiose.

Thanks for reading!

–E

This post is licensed under CC BY 4.0 by the author.