One of the key principles of a well-designed web infrastructure is high availability: whatever may happen server-side, users should experience minimal to no service disruption.
The usual technique used by most online services to achieve this goal is load balancing: distributing traffic among multiple servers allows the service to scale, while also making sure that the servers we dispatch requests to are actually able to process them.
This latter function is called health checking, and it is now available for use with Elastic IP addresses. In this article we will showcase how you can leverage managed Elastic IP health checking to build a highly available web infrastructure on Exoscale.
Architecture Overview
Let’s consider the following web infrastructure: we’re running a WordPress site on 2 independent compute instances, backed by a pair of database servers (one primary, and one secondary replicating from the primary):
- db1 (private network interface: 10.0.0.1/24)
- db2 (private network interface: 10.0.0.2/24)
- web1 (private network interface: 10.0.0.3/24)
- web2 (private network interface: 10.0.0.4/24)
In case of a database failure on the primary server, a failover to the secondary can be performed by switching the database server address to the secondary server’s address in the WordPress configuration, as sketched below.
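For example, a minimal sketch of this switch, assuming a default WordPress installation with its configuration living in /var/www/html/wp-config.php (the path and the sed invocation are illustrative, not part of this setup), could be run on both web servers:
# Hypothetical failover: point WordPress at the secondary database server (10.0.0.2)
# The wp-config.php location is an assumption; adjust it to your actual installation.
$ sudo sed -i "s/define( *'DB_HOST'.*/define('DB_HOST', '10.0.0.2');/" /var/www/html/wp-config.php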
Demo Infrastructure Deployment
Note
In order to keep this article focused on the topic of load balancing, we will not detail the actual database server configuration and replication setup, nor the web server setup and WordPress installation: you can find plenty of up-to-date resources and tutorials covering those topics online.
We start by creating a Private Network through which all servers communicate with each other:
$ exo privnet create data --zone ch-gva-2
┼──────┼─────────────┼──────────────────────────────────────┼──────┼
│ NAME │ DESCRIPTION │ ID │ DHCP │
┼──────┼─────────────┼──────────────────────────────────────┼──────┼
│ data │ │ ffca3300-319a-41a5-ba6a-0a6cc55b140f │ n/a │
┼──────┼─────────────┼──────────────────────────────────────┼──────┼
The next step is to create an Anti-Affinity Group (AAG) where we’ll place our database server instances. This is an optional, but highly recommended, architecture best practice in cloud-based deployments: instances in the same AAG are explicitly dispatched among different hypervisors, lowering the probability of losing several redundant instances at once in case of a hypervisor failure and thus improving overall reliability.
$ exo affinitygroup create aa-db
┼───────┼─────────────┼──────────────────────────────────────┼
│ NAME │ DESCRIPTION │ ID │
┼───────┼─────────────┼──────────────────────────────────────┼
│ aa-db │ │ f7997f5b-7896-4753-b450-b9649f31b949 │
┼───────┼─────────────┼──────────────────────────────────────┼
We then proceed to create our database server instances in the data Private Network and the aa-db Anti-Affinity Group:
$ exo vm create db1 --zone ch-gva-2 --anti-affinity-group aa-db --privnet data --security-group default
$ exo vm create db2 --zone ch-gva-2 --anti-affinity-group aa-db --privnet data --security-group default
Now that the database tier is up and running, we can focus on the web tier, that is, our web server instances serving the WordPress blog. Similar to what we did for the db* instances, we start by creating a new Anti-Affinity Group for the web* instances:
$ exo affinitygroup create aa-web
┼────────┼─────────────┼──────────────────────────────────────┼
│ NAME │ DESCRIPTION │ ID │
┼────────┼─────────────┼──────────────────────────────────────┼
│ aa-web │ │ f294a902-004c-448f-920e-43d6adfa9140 │
┼────────┼─────────────┼──────────────────────────────────────┼
Since we want to expose our web servers on the Internet, we create a new firewall Security Group (SG) to allow ingress traffic from any source to port 80 (HTTP):
$ exo firewall create web
┼──────┼─────────────┼──────────────────────────────────────┼
│ NAME │ DESCRIPTION │ ID                                   │
┼──────┼─────────────┼──────────────────────────────────────┼
│ web  │             │ 91c8cd04-3f2d-4aa4-ae69-eb9dac527e81 │
┼──────┼─────────────┼──────────────────────────────────────┼
$ exo firewall add web --protocol tcp --port 80-80
┼─────────┼────────────────┼──────────┼───────────┼─────────────┼──────────────────────────────────────┼
│ TYPE │ SOURCE │ PROTOCOL │ PORT │ DESCRIPTION │ ID │
┼─────────┼────────────────┼──────────┼───────────┼─────────────┼──────────────────────────────────────┼
│ INGRESS │ CIDR 0.0.0.0/0 │ tcp │ 80 (http) │ │ b2d9d919-f46c-4083-811e-858b3fca85b7 │
┼─────────┼────────────────┼──────────┼───────────┼─────────────┼──────────────────────────────────────┼
We then create our 2 web server instances with the AAG and SG we created earlier, also including them in the data Private Network so they can reach the database servers securely:
$ exo vm create web1 --zone ch-gva-2 --anti-affinity-group aa-web --privnet data --security-group web
$ exo vm create web2 --zone ch-gva-2 --anti-affinity-group aa-web --privnet data --security-group web
As mentioned before, we’ll skip the WordPress installation and configuration, as well as the Nginx HTTP server setup. However, to help us visualise the practical effect of the load balancing, we’ll tweak the Nginx configuration a little so that each web server returns its hostname in an HTTP response header:
# /etc/nginx/conf.d/add_header.conf
add_header "X-Server" "$hostname";
Note: Do not do this on your production servers as it may expose internal infrastructure information that can be used by attackers.
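A quick sanity check (assuming Nginx is reloaded after adding this snippet) is to query each web server locally and look for the new header, for example on web1:
# On web1: after reloading Nginx, the X-Server header should report the instance hostname
ubuntu@web1:~$ sudo systemctl reload nginx
ubuntu@web1:~$ curl -sI http://localhost | grep X-Server
X-Server: web1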
With the web tier in place, we can now create a managed Elastic IP in the ch-gva-2 zone, configured with an HTTP health check probing the / path on port 80:
$ exo eip create ch-gva-2 --healthcheck-mode http --healthcheck-path / --healthcheck-port 80
┼──────────┼─────────────────┼──────────────────────────────────────┼
│ ZONE │ IP │ ID │
┼──────────┼─────────────────┼──────────────────────────────────────┼
│ ch-gva-2 │ 159.100.241.202 │ c595c5b6-8211-4497-9154-050c696eae1c │
┼──────────┼─────────────────┼──────────────────────────────────────┼
The final step is to associate our managed Elastic IP with our web servers, effectively starting the health checking process:
$ exo eip associate 159.100.241.202 web1 web2
associate "159.100.241.202" EIP
associate "159.100.241.202" EIP
Within a few seconds, sending requests to the associated Elastic IP shows that both the web1 and web2 servers receive the incoming requests (the balancing method is 5-tuple based: a hash of the source IP address, source port, destination IP address, destination port and layer 4 protocol):
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:36:38 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web2
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:36:56 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web1
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:36:57 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web2
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:36:58 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web1
If we simulate a web service outage by stopping the HTTP server on instance web1, we see that only instance web2 now serves incoming requests:
ubuntu@web1:~$ sudo systemctl stop nginx
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:39:39 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web2
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:39:40 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web2
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:39:40 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web2
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:39:41 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web2
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:39:42 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web2
Restarting the HTTP server on web1 resumes traffic distribution to it after a few seconds:
ubuntu@web1:~$ sudo systemctl start nginx
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:44:30 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web1
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:44:30 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web2
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:44:31 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web1
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:44:32 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web1
$ curl -I 159.100.241.202
HTTP/1.1 200 OK
Server: nginx/1.14.0 (Ubuntu)
Date: Tue, 02 Apr 2019 15:44:33 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Link: <http://159.100.241.220/index.php?rest_route=/>; rel="https://api.w.org/"
X-Server: web2
Conclusion
It is worth mentioning that this solution, although battle-tested for years, is not a silver bullet: there is a slight delay before the managed Elastic IP health check detects an unhealthy instance and removes it from the distribution, during which requests may still be sent to it and errors potentially returned to users. However, this is still far better than the minutes or even hours of partially degraded service that a manual intervention could entail.
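That detection window can be narrowed by tuning the health check settings when creating the Elastic IP. As a rough sketch only (the interval and strikes flags below are assumptions on our part; check exo eip create --help for the exact names supported by your CLI version), one could probe more often and evict an instance after fewer failed probes:
# Hypothetical tighter health check: probe every 5 seconds, evict after 2 consecutive failures
# (the --healthcheck-interval and --healthcheck-strikes-fail flag names are assumptions)
$ exo eip create ch-gva-2 --healthcheck-mode http --healthcheck-path / --healthcheck-port 80 \
    --healthcheck-interval 5 --healthcheck-strikes-fail 2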
Need more capacity to handle increasing incoming traffic? Just create more instances and associate them with your managed Elastic IP the same way we did with web1 and web2, and you can virtually scale to infinity and beyond.
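For instance, bringing a hypothetical third web server web3 into rotation only takes the same commands we used earlier:
# Hypothetical scale-out: create a third web server and attach it to the managed Elastic IP
$ exo vm create web3 --zone ch-gva-2 --anti-affinity-group aa-web --privnet data --security-group web
$ exo eip associate 159.100.241.202 web3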