Design considerations for a microservice architecture with Docker Swarm
Recently we went live with our application, which is based on a microservices architecture and hosted on Docker Swarm. Here are some of the key learnings and design considerations that need to be taken into account when architecting a Docker Swarm infrastructure:
- Loosely Coupled Microservices
- Manager nodes availability and their location
- Stateless vs stateful
- Machine configuration
- Number of managers/workers
- Restricting the container’s memory
- Avoiding downtime
Loosely Coupled Microservices
Loose coupling is one of the oldest and most important design principles; it promotes resilience and cleaner implementations. When you have a front-end application that talks to various backends or performs compute-intensive tasks, it is better to move the business logic into separate microservices. The front end should act only as a view layer, and all business logic should live in the business layer.
Deciding when to create a new microservice is important and depends on the functionality and business purpose it serves. This also determines how many microservices you end up with.
For example, imagine a web application where a customer checks whether they are eligible for currency conversion. The application can be broken into 3 microservices:
- View: the application's view layer
- Currency conversion
- Eligibility check
The advantages of this separation are as follows:
- If 90% of customers use the application only to check eligibility, you can scale that service alone across multiple machines based on usage and keep the number of currency conversion instances low.
- If currency conversion is down for some reason, customers can still check their eligibility and use the rest of the application.
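As a rough illustration, the split above could be written as a docker-compose stack file in which each microservice is scaled independently. This is only a sketch: the service names, image names, port mapping and replica counts are hypothetical and not taken from our actual setup.

```yaml
version: "3.3"

services:
  # View layer only: renders pages and calls the two backend services.
  view:
    image: example/view:latest                  # hypothetical image name
    ports:
      - "80:8080"                               # assumed container port
    deploy:
      replicas: 2

  # The most heavily used service in this scenario, so it gets the most replicas.
  eligibility-check:
    image: example/eligibility-check:latest     # hypothetical image name
    deploy:
      replicas: 6

  # Lightly used, so it is kept small. If it fails, the other services keep running.
  currency-conversion:
    image: example/currency-conversion:latest   # hypothetical image name
    deploy:
      replicas: 2
```

The deploy section is only honoured when the file is deployed to a swarm with docker stack deploy.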
Stateless vs Stateful
Docker containers are, by design, supposed to be stateless. However, an application often needs to be stateful, for example for login functionality, where the application needs to know which user is logged in. By default, Docker Swarm uses a round-robin algorithm for traffic routing, which means each incoming request may be sent to a different container, and the session information is lost.
Session persistence might become a feature of Docker Swarm in the future, but it is not available as of now. We had to put Traefik in front of our services as a load balancer to maintain sticky sessions.
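A sticky session with Traefik is typically switched on through labels on the swarm service. The sketch below assumes Traefik v2 label syntax (label names differ between Traefik major versions, so check the version you run), a Traefik instance already deployed in swarm mode and attached to an external overlay network called proxy, and a hypothetical service named web listening on port 8080. None of these names come from our actual configuration.

```yaml
version: "3.3"

services:
  web:
    image: example/web:latest          # hypothetical image name
    networks:
      - proxy
    deploy:
      replicas: 3
      labels:                          # in swarm mode Traefik reads labels from deploy.labels
        - "traefik.enable=true"
        - "traefik.http.routers.web.rule=Host(`app.example.com`)"       # assumed hostname
        - "traefik.http.services.web.loadbalancer.server.port=8080"     # assumed container port
        # Traefik sets a cookie and keeps sending that client to the same container.
        - "traefik.http.services.web.loadbalancer.sticky.cookie=true"

networks:
  proxy:
    external: true                     # assumed to exist and to be shared with Traefik
```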
Manager nodes availability and their location
The manager nodes in Docker Swarm must maintain a quorum, which in simple terms means that a majority of managers must be available: the number of available manager nodes must always be greater than or equal to (n+1)/2, where n is the total number of manager nodes.
So if you have 3 manager nodes, 2 must always be up, and if you have 5 manager nodes, 3 must be up. If the swarm loses the quorum of managers, it cannot perform management tasks. This means you cannot add new nodes or run swarm commands until the quorum is restored.
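The rule can be worked through for a few cluster sizes. The smallest whole number that satisfies the (n+1)/2 condition is floor(n/2) + 1, and the number of managers that can fail is whatever remains:

```latex
\[
  \text{quorum}(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1,
  \qquad
  \text{tolerated failures}(n) = n - \text{quorum}(n)
\]

\[
  \begin{array}{c|c|c}
    n & \text{quorum}(n) & \text{tolerated failures}(n) \\ \hline
    1 & 1 & 0 \\
    3 & 2 & 1 \\
    4 & 3 & 1 \\
    5 & 3 & 2 \\
    7 & 4 & 3
  \end{array}
\]
```

Note that going from 3 to 4 managers does not increase the number of failures the swarm can tolerate, which is why odd manager counts are normally used.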
Another important attribute is the location of the manager nodes. It is advisable to place manager nodes in different geographic regions so that an outage in one region does not affect the quorum of managers. For example, if you have 3 manager nodes, you can choose Asia, Europe and America as their geographic locations with any cloud provider.
On the positive side, even if the quorum is lost, say because 2 out of 3 managers are down, the Docker containers/services will keep working and serving traffic. Once the machines are available again, the quorum is restored automatically.
Machine configuration
The rosy picture painted by containerization is that it is easy to scale using cheap machines. The problem with cheap machines is that they often have a poor configuration.
If a machine has only 1 CPU and the microservice happens to be CPU intensive, running multiple containers on that machine might even make things worse, as the containers will compete for CPU time.
Similarly, if the microservices are memory intensive, make sure the machines have enough RAM.
Number of managers/workers
Autoscaling is not available in Docker Swarm as of version 17.06. To add new machines to the swarm, you have to use the docker swarm join-token command to obtain the token with which additional managers and workers can join. Adding new nodes also does not mean that the swarm rebalances itself. For example, if you have 3 machines each running 2 containers and then add 3 more machines, you might expect 1 container to run on each machine, but the existing containers stay where they are. The swarm is not rebalanced until you run docker stack deploy again. Another trick that works well, and which I tend to use, is docker service scale: scaling a service up and then back down makes the swarm redistribute its containers.
Logging
At some point the services will fail or show defects, and you will need logs to debug them. Having multiple services means having multiple log files, and even docker service logs may not be much help if the service has multiple containers running.
The best way to log in a multiservice environment is to use a log aggregator like fluentd, so that logs are written to one place instead of being scattered all over. Fluentd works well with Elasticsearch and Kibana, where you can search, filter and query the logs. More can be found here.
- How to Collect logs from multiple containers and write to a single file
- Configuring Kibana and ElasticSearch for Log Analysis with Fluentd on Docker Swarm
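One common way to get container output into fluentd is Docker's fluentd logging driver, configured per service in the compose file. The sketch below assumes a fluentd agent is reachable on every node at localhost:24224 (fluentd's default forward port); the service name, image name and tag pattern are hypothetical, and the linked articles may describe a different setup.

```yaml
version: "3.3"

services:
  web:
    image: example/web:latest              # hypothetical image name
    deploy:
      replicas: 3
    logging:
      driver: fluentd
      options:
        # Address of the fluentd agent as seen from the Docker daemon on each node (assumed).
        fluentd-address: "localhost:24224"
        # Tag each record with the container name so replicas can be told apart.
        tag: "docker.web.{{.Name}}"
```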
Avoiding downtime
To avoid downtime there are a couple of things which can be done. The first is to run multiple instances of each container, so every service should have at least 2 instances running. Also make effective use of the update_config attribute in docker-compose, where you can specify the delay between the restarts of 2 containers. For example, a docker-compose file that sets replicas to 3 and the update_config delay to 90s creates 3 replicas of the container, and whenever you update the service the containers restart one after another with a gap of 90 seconds.
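A configuration of that kind might look like the following sketch. The service and image names are hypothetical; replicas, update_config, parallelism and delay are standard keys of the compose file deploy section.

```yaml
version: "3.3"

services:
  web:
    image: example/web:latest   # hypothetical image name
    deploy:
      replicas: 3               # keep 3 containers of this service running
      update_config:
        parallelism: 1          # update one container at a time
        delay: 90s              # wait 90 seconds before updating the next one
```

With parallelism set to 1, at least 2 of the 3 replicas keep serving traffic while the third is being replaced.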
Optimizing the Container limits
To make sure that one Docker container/microservice does not end up fighting with other containers for resources like CPU, RAM and I/O, you can limit how much RAM and CPU each container may use. For example, setting a memory limit of 2G under the service's resource limits in docker-compose restricts the container to 2GB of RAM even if the machine has 8GB or 16GB.
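Such a limit could be declared as in the sketch below. The service name, image name and the CPU value are hypothetical; only the 2GB memory cap comes from the text above.

```yaml
version: "3.3"

services:
  web:
    image: example/web:latest   # hypothetical image name
    deploy:
      resources:
        limits:
          memory: 2G            # the container cannot use more than 2GB of RAM
          cpus: "1.0"           # assumed value: at most one CPU core
```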
Creation of Docker Swarm and Automated Deployments
Docker Cloud seems to be capable of creating a new swarm on Azure/AWS and could potentially also implement a continuous integration pipeline. On the downside, it creates too many resources, at least on Azure.
We found that it was simple to create a swarm within a matter of minutes once Docker is installed. docker swarm join-token makes it easy to bring new machines into the swarm. Automated deployment through Jenkins is easy enough as well.
We use the fabric8.io plugin to create Docker images and push them to Docker Hub. Jenkins then does the deployment by running commands on a manager node using the remote SSH plugin.
- How to Automate Docker Swarm Service deployment using Jenkins
- How To Push Docker Images To Docker Hub Repository Using Docker Maven plugin
Conclusion
Docker Swarm fits well into a microservice architecture. Some of the features which have really caught our eye are:
- Docker Swarm is very easy to create and can be set up in a matter of minutes. Adding capacity is just as easy: any new machine needs only a token to become a worker/manager node.
- Scaling services is very easy: docker service scale <servicename>=10 creates 10 instances of the container in no time.
- It is open source, and the community edition works well in production too, which saves a lot of money for small enterprises.
Some features that would be good improvements if added:
- Session persistence in Docker Swarm could be added as a feature in new releases.
- Autoscaling could be added too. It would be good if the swarm could add new machines from a pool on demand and run more containers of the services that are heavily used or under stress.
- Rebalancing the services when new machines are added to the swarm would be a great addition as well.