
Handling a Distributed Stack Across Every Isoscribe Server

Pre-note: This is an old article from when I headed Isoscribe. I’m reposting it here because I thought it was somewhat cool.

Note: This article is out of date, now that we have moved to AWS. We’re also completely rewriting our software stack - more info in a future post.

We run one software stack to handle all of Isoscribe traffic.

Admittedly, that’s a partial lie. We run two stacks, serving completely different purposes. The frontend you’re using right now is served by a static server written in Go, hiding behind NGINX. Believe it or not, that server is fewer than three hundred lines of code. Most of that is, in fact, for updating local references to our JavaScript repository.

The Kubernetes Architecture

Isoscribe, aside from its analytical workload, runs entirely on Kubernetes. I’ll get into how our data analytics system works in another post.

The Two Main Deployments

In Kubernetes, if you aren’t familiar with its configuration, a pod wraps one or more containers, a deployment manages a named group of identical pods, and a service exposes a set of pods behind a stable network address.

API Deployment

We run a Python-based API server, with aiohttp as our web library. The server sits behind NGINX, which handles request buffering, sets the correct user IP for error logging, and proxies requests to the Gunicorn socket. It accesses our external MongoDB server and does a ton of business logic for us. To be clear, only API requests are tracked - we use that data for purely analytical purposes, and only collect the IP, request URL (excluding anything sensitive, i.e. POST parameters), date/time, request ID, and response code, along with some internal data like the referer and host headers.
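
To give a purely illustrative sense of that tracking, here’s roughly what an aiohttp middleware collecting those fields might look like. This is a sketch, not our production code - the X-Real-IP header set by NGINX and the exact field names are assumptions.

```python
import logging
import uuid
from datetime import datetime, timezone

from aiohttp import web

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("api.access")


@web.middleware
async def access_log_middleware(request, handler):
    # NGINX is assumed to forward the real client address in X-Real-IP.
    client_ip = request.headers.get("X-Real-IP", request.remote)
    request_id = str(uuid.uuid4())

    response = await handler(request)

    # Log only non-sensitive metadata: no POST bodies or query parameters.
    log.info(
        "%s %s %s %s %s referer=%s host=%s",
        client_ip,
        request.path,                          # URL path only, no parameters
        datetime.now(timezone.utc).isoformat(),
        request_id,
        response.status,
        request.headers.get("Referer", "-"),
        request.headers.get("Host", "-"),
    )
    return response


app = web.Application(middlewares=[access_log_middleware])
```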

The API server runs on a modified Alpine container, built by the amazing CI system, Concourse. Seriously, check these guys out and give them some love. It is a beyond-amazing Go-based continuous integration package.

Frontend Server Deployment

It’s called frontend-server. I know, I’m so creative.

The static server is the one I mentioned before - it takes some HTML files, each basically a <head> tag with metadata and a link to the JavaScript bundle, and hands them to you. It’s honestly that simple. It will likely be changed in the future to a pure NGINX server, with the HTML files generated by a Go daemon. In fact, I’m almost definitely going to do that after I finish writing this post.
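
For a sense of how little code a server like that needs, here’s a minimal sketch in Go. It’s not our actual implementation - the directory and port are made up, and the real server also handles the JavaScript reference updates mentioned earlier.

```go
// A minimal sketch of a static HTML/JS server like the one described above.
// Not Isoscribe's real code: the directory and port are hypothetical.
package main

import (
	"log"
	"net/http"
)

func main() {
	// Serve the pre-built HTML shells and the JavaScript bundle from disk.
	fs := http.FileServer(http.Dir("./public"))

	mux := http.NewServeMux()
	mux.Handle("/", fs)

	// NGINX sits in front of this and proxies requests to it.
	log.Println("static server listening on :8080")
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```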

The Web Infrastructure

This stack runs in a sort of three-phase system. First, we have the place you’re routed to - CloudFlare. They cache and proxy our content to keep us safe from DDoS attacks. Traffic then gets routed by their load balancer to one of multiple datacenters worldwide. Each of those datacenters runs its own load balancer, which checks for healthy instances and routes requests across the pods in our Kubernetes cluster. It’s safe to say that the only thing that will take us down is the complete failure of either CloudFlare or GCP.
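
On the Kubernetes side, per-pod health is typically advertised through a readiness probe, so that traffic only reaches instances that respond. Here’s an illustrative one - the endpoint and port are made up, not our real health check.

```yaml
# Illustrative readiness probe: a pod only receives traffic once this check passes.
readinessProbe:
  httpGet:
    path: /healthz   # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```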

Expansions

On the roadmap are a powerful ElasticSearch cluster for better content search, slightly faster analytics, and the possible phase-out of our Go server.

So yes. There is a single-ish stack running Isoscribe.

I hope you enjoyed this writeup. Next? The data analytics workload.

This post is licensed under CC BY 4.0 by the author.