How to maintain Session Persistence (Sticky Sessions) in Docker Swarm with multiple containers
Introduction
Session state can be maintained using either:
- Session Replication
- Session Stickiness
or a combination of both.
Maintaining a user session is relatively easy in a typical monolithic architecture, where your application is installed on a couple of servers: you can configure those servers to replicate sessions through some cache mechanism, or enable session stickiness on a load balancer/reverse proxy.
However, in the case of microservices, where the scale can range from 10 to 10,000s of instances, session replication may slow things down, because each and every service needs to look up session information in the centralised cache.
This article looks at the other approach, session stickiness, where each subsequent request keeps going to the same server (Docker container) and the session is thereby preserved.
Why session persistence is hard to maintain with containers
A load balancer typically works at Layer 7 of the OSI model, the application layer (where HTTP lives), and distributes requests across multiple machines. The Docker ingress routing mesh, however, works at Layer 4, the transport layer, so it has no visibility of HTTP cookies or sessions.
Someone on Stack Overflow summarised the solution to this problem as follows: to implement sticky sessions, you need a reverse proxy inside Docker that supports sticky sessions and communicates directly with the containers by their container ID (rather than doing a DNS lookup on the service name, which would again go through the round-robin load balancer). Implementing that load balancer also requires implementing your own service discovery, so that the proxy knows which containers are available.
Possible options explored
Take 1
So I tried implementing the reverse proxy with Nginx. It worked with multiple containers on a single machine, but not when deployed on Docker Swarm, probably because I was doing service discovery by name; as suggested above, the proxy should talk to container IDs, not container names.
Take 2
I then read about the jwilder/nginx-proxy image, which seems to work for everyone. It worked on my local machine, but when deployed on Swarm it would not generate any container IPs inside the upstream { } block.
Take 3
Desperate by this time, I was going through every possible solution people had to offer on the internet (Stack Overflow, the Docker community forums, and so on), and one gentleman mentioned Traefik. My eyes glittered when I read that it works on Swarm, so here we go.
Sticky Session with Traefik in Docker Swarm with multiple containers
Even though I was very comfortable with Nginx, I assumed that learning Traefik would be an overhead. That wasn't the case: Traefik is simple to learn and easy to understand, and the good thing is that you need not fiddle with any conf files.
I have tested this configuration with Docker Compose file version 3 and deployed it using docker stack deploy.
To start off, create a docker-compose.yml (version 3) and add the Traefik image as the load balancer. This is how it looks:
```yaml
loadbalancer:
  image: traefik
  command: --docker \
    --docker.swarmmode \
    --docker.watch \
    --web \
    --loglevel=DEBUG
  ports:
    - 80:80
    - 9090:8080
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  deploy:
    restart_policy:
      condition: any
    mode: replicated
    replicas: 1
    update_config:
      delay: 2s
    placement:
      constraints: [node.role == manager]
  networks:
    - net
```
A few things to note here:
- Traefik listens to the Docker daemon on the manager node and keeps track of new worker nodes, so there is no need to restart it when you scale your services.
- The volumes entry /var/run/docker.sock:/var/run/docker.sock is what gives Traefik access to the Docker daemon's event stream.
- Traefik provides a dashboard for checking worker node health; port 9090 can be kept behind a firewall for monitoring purposes.
- Also note that placement: constraints: [node.role == manager] specifies that Traefik runs only on a manager node.
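Once the stack is up, these points can be sanity-checked from the manager node. This is a sketch that assumes the stack was deployed under the name test, as is done later in this article:

```shell
# Confirm the single Traefik replica was scheduled on a manager node:
docker service ps test_loadbalancer

# The dashboard published on port 9090 should answer over HTTP:
curl -sI http://localhost:9090/ | head -n 1
```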
Adding the image for sticky sessions
To add a Docker image that will hold session stickiness, we need to add something like this:
```yaml
whoami:
  image: tutum/hello-world
  networks:
    - net
  ports:
    - "80"
  deploy:
    restart_policy:
      condition: any
    mode: replicated
    replicas: 5
    placement:
      constraints: [node.role == worker]
    update_config:
      delay: 2s
    labels:
      - "traefik.docker.network=test_net"
      - "traefik.port=80"
      - "traefik.frontend.rule=PathPrefix:/hello;"
      - "traefik.backend.loadbalancer.sticky=true"
```
This is a hello-world image which displays the name of the container it is running on. The file defines 5 replicas of this container. The important section, where Traefik does its magic, is labels:
- "traefik.docker.network=test_net" tells Traefik which network this service runs on. Note that the network name is test_net, where test is the stack name; in the load balancer service we just gave net as the name.
- "traefik.port=80" maps the Traefik backend port to 80, since this hello-world container listens on Docker port 80.
- "traefik.frontend.rule=PathPrefix:/hello" routes all URLs starting with {domainname}/hello/ to this container/application.
- "traefik.backend.loadbalancer.sticky=true" is where the magic happens: this is what makes sessions sticky.
The Complete Picture
Try the file below as it is and see if it works; if it does, fiddle with it and make your changes accordingly.
You will need to create a file called docker-compose.yml on your Docker manager node and run this command:
docker stack deploy -c docker-compose.yml test
where "test" is the namespace (stack name).
```yaml
version: "3"
services:
  whoami:
    image: tutum/hello-world
    networks:
      - net
    ports:
      - "80"
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 5
      placement:
        constraints: [node.role == worker]
      update_config:
        delay: 2s
      labels:
        - "traefik.docker.network=test_net"
        - "traefik.port=80"
        - "traefik.frontend.rule=PathPrefix:/hello;"
        - "traefik.backend.loadbalancer.sticky=true"
  loadbalancer:
    image: traefik
    command: --docker \
      --docker.swarmmode \
      --docker.watch \
      --web \
      --loglevel=DEBUG
    ports:
      - 80:80
      - 9090:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
      update_config:
        delay: 2s
      placement:
        constraints: [node.role == manager]
    networks:
      - net
networks:
  net:
```
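After deploying, a couple of quick checks (again assuming the stack name test) will tell you whether everything converged:

```shell
docker stack deploy -c docker-compose.yml test

# All replicas should reach the desired count (5/5 whoami, 1/1 loadbalancer):
docker service ls

# Per-task view, useful for spotting image-pull or placement failures:
docker service ps test_whoami
```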
Now you can test the service at http://{Your-Domain-name}/hello, and http://{Your-Domain-name}:9090 should show the Traefik dashboard.
Though there are 5 replicas of the whoami service above, refreshing should always display the same container ID. If it does, congratulations, your session persistence is working.
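You can also verify the stickiness from the command line. This is a hedged sketch: Traefik 1.x sets a backend cookie on the first response (the exact cookie name varies between versions), and replaying it should pin all requests to one container. The tutum/hello-world page prints the container hostname, which is what we grep for:

```shell
# First request: let Traefik write its sticky cookie into a cookie jar.
curl -s -c /tmp/jar.txt http://localhost/hello > /dev/null
cat /tmp/jar.txt   # inspect the cookie Traefik issued

# Replaying the jar should return the same container hostname every time.
for i in 1 2 3; do
  curl -s -b /tmp/jar.txt http://localhost/hello | grep -i hostname
done
```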
The Traefik dashboard shows the discovered frontends and backends at a glance.
Testing session stickiness on a local machine
In case you don't have a Swarm node and just want to test on your local machine, you can use the following docker-compose file. For the labels to match, create a directory called test (docker-compose prefixes the network name with the directory name, giving the test_net that "traefik.docker.network=test_net" expects; change the directory name if you use a different network) and run:
docker-compose up -d
```yaml
version: "3"
services:
  whoami:
    image: tutum/hello-world
    networks:
      - net
    ports:
      - "80"
    labels:
      - "traefik.backend.loadbalancer.sticky=true"
      - "traefik.docker.network=test_net"
      - "traefik.port=80"
      - "traefik.frontend.rule=PathPrefix:/hello"
  loadbalancer:
    image: traefik
    command: --docker \
      --docker.watch \
      --web \
      --loglevel=DEBUG
    ports:
      - 80:80
      - 25581:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - net
networks:
  net:
```
docker-compose should create the required services, and the whoami service should be available at http://localhost/hello.
Scale the service to, say, 5 containers with docker-compose scale whoami=5 and test again.
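With --web enabled, Traefik 1.x also exposes a small read-only API on the dashboard port, which is handy for confirming that it picked up the scaled containers (port 25581 matches the local compose file above):

```shell
docker-compose scale whoami=5

# The provider endpoint lists the backend servers Traefik discovered;
# after scaling there should be five server entries for the whoami backend:
curl -s http://localhost:25581/api/providers | python -m json.tool | head -n 40
```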
Follow this Video to see things in action
Comments
Hi! Great article! I’ve tried following your steps, but it doesn’t seem to work for me.
I’m trying to run 3 replicas of GOGS (git repo) and maintain sessions
I used the following:
```yaml
services:
  gogs:
    image: registry:5000/gogs:latest
    deploy:
      replicas: 3
      restart_policy:
        condition: any
      labels:
        - "traefik.docker.network=gogs_net"
        - "traefik.port=3000"
        - "traefik.frontend.rule=PathPrefix:/git;"
        - "traefik.backend.loadbalancer.sticky=true"
    volumes:
      - /nfs/apps/gogs:/gogs:rw
      - /nfs/apps/gogs/data/repos:/repos:rw
      - /nfs/apps/gogs/data/db:/db:rw
      - /nfs/apps/gogs/custom/conf:/data/gogs/conf:rw
    ports:
      - "10022:22"
      - "3000:3000"
    networks:
      - net
  loadbalancer:
    image: registry:5000/traefik
    command: --docker \
      --docker.swarmmode \
      --docker.watch \
      --web \
      --loglevel=DEBUG
    ports:
      - "80:3000"
      - "9090:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
      update_config:
        delay: 2s
      placement:
        constraints: [node.role == manager]
    networks:
      - net
networks:
  net:
```
But no luck…
If I only run a stack with GOGS using the following docker-compose file, I'm able to access the Git repo:
```yaml
version: "3"
services:
  gogs:
    image: registry:5000/gogs:latest  # This specifies the image on our private registry.
    deploy:
      replicas: 3  # This generates three replicas.
    volumes:  # We map various folders from the NFS share on the HOST to the CONTAINER.
      - /nfs/apps/gogs:/gogs:rw
      - /nfs/apps/gogs/data/repos:/repos:rw
      - /nfs/apps/gogs/data/db:/db:rw
      - /nfs/apps/gogs/custom/conf:/data/gogs/conf:rw
    ports:  # We map HOST ports with CONTAINER ports.
      - "10022:22"
      - "80:3000"
```
Any idea/suggestion?
Thanks in advance!
Can you please check a couple of things:
1. docker stack deploy -c docker-compose.yml gogs. Make sure the namespace is gogs.
2. Remove the following from the gogs service only (NOT from the loadbalancer):
networks:
  - net
The network is already defined by the label "traefik.docker.network=gogs_net".
Hello Abhi,
Thank you very much for the response!
Responding to your questions:
– I do use the namespace “gogs” when deploying the stack.
- I've removed the "networks" lines from the gogs service. I had them in there because in your examples the "hello-world" service also includes them…
```yaml
version: "3"
services:
  gogs:
    image: gogs/gogs
    deploy:
      replicas: 3
      restart_policy:
        condition: any
      labels:
        - "traefik.docker.network=gogs_net"
        - "traefik.port=3000"
        - "traefik.frontend.rule=PathPrefix:/git;"
        - "traefik.backend.loadbalancer.sticky=true"
    volumes:
      - /nfs/apps/gogs:/gogs:rw
      - /nfs/apps/gogs/data/repos:/repos:rw
      - /nfs/apps/gogs/data/db:/db:rw
      - /nfs/apps/gogs/custom/conf:/data/gogs/conf:rw
    ports:
      - "10022:22"
      - "3000:3000"
  loadbalancer:
    image: traefik
    command: --docker \
      --docker.swarmmode \
      --docker.watch \
      --web \
      --loglevel=DEBUG
    ports:
      - "80:3000"
      - "9090:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
      update_config:
        delay: 2s
      placement:
        constraints: [node.role == manager]
    networks:
      - net
networks:
  net:
```
Doing this, I'm able to get to the Traefik dashboard, which I wasn't before. But trying to access gogs through "http://host:80" or "http://host:3000" still doesn't work… it looks like I'm getting a "connection refused" error.
- Running the example docker-compose file provided, I can get to the dashboard, but it only half loads and doesn't show the lower part (only the header bar). And going to "http://host/" or "http://host/hello" I only get a 404 error.
Kind Regards,
Kevin
Hi Kevin,
Apologies, I overlooked the networks tag; yes, it should be there, so put it back.
Let's try to fix the example compose on your machine first, and then your gogs service.
Is your tutum/hello-world image getting downloaded? Can you see any errors while running the stack?
Can you check following things for me
1> docker service ls (what's the output of this?)
2> docker service ps test_whoami (output of this; you need to run "docker stack deploy -c docker-compose.yml test")
3> docker node ls (just to check that you are on a swarm with a manager node set up)
4> The Traefik dashboard: does it show any backends?
Hello Abhi,
I'm using a private registry and I think the image was not being downloaded when deploying the stack; the containers were stuck in "pending". I manually pulled the image from the private registry on each node, and then they changed into the "running" state. I also removed the following lines from the "hello-world" service:
```yaml
placement:
  constraints: [node.role == worker]
```
Because at this moment all my nodes are managers.
This way I now have the following docker-compose.yml, which works: Traefik is accessible and shows the frontend and backend, the whoami containers are running fine, and I can navigate to "http://host/hello", which displays the container ID; refreshing shows the same ID:
```yaml
version: "3"
services:
  whoami:
    image: registry:5000/hello-world
    networks:
      - net
    ports:
      - "80"
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 3
      update_config:
        delay: 2s
      labels:
        - "traefik.docker.network=test_net"
        - "traefik.port=80"
        - "traefik.frontend.rule=PathPrefix:/hello;"
        - "traefik.backend.loadbalancer.sticky=true"
  loadbalancer:
    image: traefik
    command: --docker \
      --docker.swarmmode \
      --docker.watch \
      --web \
      --loglevel=DEBUG
    ports:
      - 80:80
      - 9090:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
      update_config:
        delay: 2s
      placement:
        constraints: [node.role == manager]
    networks:
      - net
networks:
  net:
```
- Running docker service ls:
```
ID            NAME                   MODE        REPLICAS  IMAGE                             PORTS
6cp3z55e9ws5  test_whoami            replicated  3/3       registry:5000/hello-world:latest  *:0->80/tcp
e2vd98781itt  test_loadbalancer      replicated  1/1       traefik:latest                    *:80->80/tcp,*:9090->8080/tcp
xrdivswie93l  visualizer_visualizer  replicated  1/1       registry:5000/visualizer:latest   *:8080->8080/tcp
```
- Running docker node ls:
```
ID                          HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
j4m25tolkaamj2jr263qoqu56 * e02dkr01  Ready   Active        Reachable
v8dswzlyj4f499k12rkrivbcv   mars      Ready   Active        Reachable
vtsk3j9jbmg7tdqnhoqnois89   e02dkr02  Ready   Active        Leader
```
So the Whoami example seems to work….
That's great!! Hopefully gogs will also work now. I would suggest keeping the whoami service and adding the gogs service in the same file, just for testing: if session persistence works for whoami, it will work for any other image.
You can read through this blog for using private repos’s in docker swarm.
http://littlebigextra.com/installing-docker-images-private-repositories-docker-swarm/
P.S., as for the private registry, I'm using an internal self-hosted private registry, rather than a private registry on Docker Hub 🙂
Hi Abhi,
So the below is the docker-compose file I currently have.
– Traefik is accessible and shows backend and frontends.
- Hello world is accessible, but the container ID changes on every refresh… It was "sticky" before; why not now, if nothing has changed in this service?
– GOGS is accessible on “http://host1/”, “http://host2/” and “http://host3” (where each host is a node of the swarm). If I use “http://host1/” and login, I can refresh as many times as I want, it won’t log me out, and everything works fine. If I then go to “http://host2/” I have to re-login, same with host3. Once logged in, I’m not logged out and can switch between hosts (I suppose it stores the cookies for each of them). If I remove or add replicas, I have to re-login doesn’t matter which host I use…
So basically right now it doesn’t seem to be sticky for some reason, but GOGS seems to work fine somehow…
```yaml
version: "3"
services:
  whoami:
    image: registry:5000/hello-world
    networks:
      - net
    ports:
      - "80"
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 3
      update_config:
        delay: 2s
      labels:
        - "traefik.docker.network=test_net"
        - "traefik.port=80"
        - "traefik.frontend.rule=PathPrefix:/hello;"
        - "traefik.backend.loadbalancer.sticky=true"
  gogs:
    image: registry:5000/gogs
    deploy:
      replicas: 1
      restart_policy:
        condition: any
      labels:
        - "traefik.docker.network=test_net"
        - "traefik.port=3000"
        - "traefik.frontend.rule=PathPrefix:/;"
        - "traefik.backend.loadbalancer.sticky=true"
    volumes:
      - /nfs/apps/gogs:/gogs:rw
      - /nfs/apps/gogs/data/repos:/repos:rw
      - /nfs/apps/gogs/data/db:/db:rw
      - /nfs/apps/gogs/custom/conf:/data/gogs/conf:rw
    networks:
      - net
    ports:
      - "10022:22"
      - "3000:3000"
  loadbalancer:
    image: registry:5000/traefik
    command: --docker \
      --docker.swarmmode \
      --docker.watch \
      --web \
      --loglevel=DEBUG
    ports:
      - 80:80
      - 25581:3000
      - 9090:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
      update_config:
        delay: 2s
      placement:
        constraints: [node.role == manager]
    networks:
      - net
networks:
  net:
```
I have faced a similar problem where I had 2 images which I wanted to be sticky, but it was giving issues.
Can you add the following rule in labels and try
– “traefik.frontend.priority=2” for whoami and
– “traefik.frontend.priority=1” for gogs
If that doesn't fix the problem, take the "whoami" service out; you don't need it anyway. I think there is some problem when you combine "traefik.frontend.rule=PathPrefix:/" and "traefik.frontend.rule=PathPrefix:/whatever" with sticky sessions.
If you remove the label "traefik.frontend.rule=PathPrefix:/" from the gogs service, I believe your whoami service will again be sticky.
I had asked the Traefik support channel about this, but no one replied; feel free to raise an issue.
Hi Abhi,
I’ve tried with the priority but nothing changed.
I’ve also tried removing the whoami service, but am not able to get gogs running and accessible through http://host/git
The app is configured to listen on port 3000, and its default root is http://host:3000/git
I set it up in the following way below, but am not able to access GOGs at all…
```yaml
version: "3"
services:
  gogs:
    image: registry:5000/gogs
    deploy:
      replicas: 3
      restart_policy:
        condition: any
      labels:
        - "traefik.docker.network=gogs_net"
        - "traefik.port=3000"
        - "traefik.frontend.priority=1"
        - "traefik.frontend.rule=PathPrefix:/;"
        - "traefik.backend.loadbalancer.sticky=true"
    volumes:
      - /nfs/apps/gogs:/gogs:rw
      - /nfs/apps/gogs/data/repos:/repos:rw
      - /nfs/apps/gogs/data/db:/db:rw
      - /nfs/apps/gogs/custom/conf:/data/gogs/conf:rw
    networks:
      - net
    ports:
      - "10022:22"
      - "3000"
  loadbalancer:
    image: registry:5000/traefik
    command: --docker \
      --docker.swarmmode \
      --docker.watch \
      --web \
      --loglevel=DEBUG
    ports:
      - "80:3000"
      - "9090:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
      update_config:
        delay: 2s
      placement:
        constraints: [node.role == manager]
    networks:
      - net
networks:
  net:
```
Any idea on what could I be missing?
Is there any error when you run docker service ps?
What about exposing port 3000 of gogs on the Docker host machine too, like 3000:3000? That way you can test via http://host:3000/app and see if the application itself is fine.
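For what it's worth, with "3000:3000" published you could compare the two paths directly (host and the paths are placeholders from this thread):

```shell
# Straight to the published port, bypassing Traefik:
curl -sI http://host:3000/ | head -n 1

# Through Traefik, matching the PathPrefix:/git rule:
curl -sI http://host/git | head -n 1
```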
Hello Abhi,
No, there is no error whatsoever when running docker service ps; supposedly all services are running without issues.
I've been doing some testing, and the docker-compose file below is what I have.
- As you can see, I've removed "traefik.backend.loadbalancer.sticky=true" from the whoami service. If I leave this label on both services, I can log in to GOGS, but every refresh logs me out and I have to re-login. Without this label on the whoami service, going to http://host1/hello shows a different container ID every time (as expected), but at least going to http://host1/ shows me GOGS, and I can log in without issues and refresh. If I go to http://host2/ or http://host3/ I have to re-login once, but it seems that the sessions are preserved per hostname.
- I don't know why, but if I remove/comment out the whoami service from the docker-compose file and deploy it, GOGS is not accessible. I don't understand how they are related; if I remove one, the other should still work fine, no?
```yaml
version: "3"
services:
  whoami:
    image: registry:5000/hello-world
    networks:
      - net
    ports:
      - "80"
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 3
      update_config:
        delay: 2s
      labels:
        - "traefik.docker.network=gogs_net"
        - "traefik.port=80"
        - "traefik.frontend.priority=2"
        - "traefik.frontend.rule=PathPrefix:/hello;"
  gogs:
    image: registry:5000/gogs
    deploy:
      replicas: 3
      restart_policy:
        condition: any
      labels:
        - "traefik.docker.network=gogs_net"
        - "traefik.port=3000"
        - "traefik.frontend.priority=1"
        - "traefik.frontend.rule=PathPrefix:/;"
        - "traefik.backend.loadbalancer.sticky=true"
    volumes:
      - /nfs/apps/gogs:/gogs:rw
      - /nfs/apps/gogs/data/repos:/repos:rw
      - /nfs/apps/gogs/data/db:/db:rw
      - /nfs/apps/gogs/custom/conf:/data/gogs/conf:rw
    networks:
      - net
    ports:
      - "10022:22"
      - "3000"
  loadbalancer:
    image: registry:5000/traefik
    command: --docker \
      --docker.swarmmode \
      --docker.watch \
      --web \
      --loglevel=DEBUG
    ports:
      - 80:80
      - 3000:3000
      - 9090:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
      update_config:
        delay: 2s
    networks:
      - net
networks:
  net:
```
Hi,
Glad that your GOGS service is working fine with session persistence. The reason it won't work alongside the whoami service is probably that Traefik uses a cookie called "traefik_backend" to store the container IP. I think in the case of 2 services using sticky=true it overwrites the cookie value with the new container.
Maybe the Traefik team needs to consider this scenario, perhaps by writing one cookie per service when sticky=true is set for multiple services.
Yeah, maybe.
But why does GOGS stop working if I remove the whoami service? That doesn't make any sense to me…
True, it doesn't make any sense. Did you try removing the priority labels from gogs and taking down the whoami service?
That worked! Removing the priority labels and the whoami service, GOGS works fine!
So it seems there is some issue with Traefik when combining multiple services and priority labels…
Hi Abhi, I am trying to implement load balancing while maintaining session stickiness for Docker containers deployed in a swarm. I am unable to reach specific containers when I browse to http://myip/containername; in my case it is 10.244.102.243/TestManager
Here's my docker-compose file:
```yaml
version: '3'
services:
  test_manager:
    image: 10.244.102.10:5000/testmanager
    networks:
      - net
    deploy:
      mode: replicated
      replicas: 3
      restart_policy:
        condition: on-failure
        delay: 5s
        max_attempts: 3
        window: 120s
      labels:
        - "traefik.docker.network=autoframework_net"
        - "traefik.port=80"
        - "traefik.frontend.rule=Host:10.244.102.243; PathPrefix:/TestManager"
        - "traefik.backend.loadbalancer.sticky=true"
    ports:
      - "8080"
  loadbalancer:
    image: traefik
    command: --docker \
      --docker.swarmmode \
      --docker.watch \
      --web \
      --loglevel=DEBUG
    ports:
      - 80:80
      - 9090:8080
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      restart_policy:
        condition: any
      mode: replicated
      replicas: 1
      update_config:
        delay: 2s
      placement:
        constraints: [node.role == manager]
    networks:
      - net
networks:
  net:
```
If you do this:
- /var/run/docker.sock:/var/run/docker.sock
you are pretty much granting ROOT access to the underlying HOST machine to any user on your container. In general, this (hopefully obviously) is a really bad idea.
Agreed, but this was the one option at hand; nothing else worked!
The parameter "traefik.backend.loadbalancer.sticky" has been deprecated; it should now be "traefik.backend.loadbalancer.stickiness".
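For reference, with newer Traefik 1.x releases the whoami labels from this article would then look something like this (the cookieName label is optional and, as far as I know, only available in later 1.x versions):

```yaml
labels:
  - "traefik.docker.network=test_net"
  - "traefik.port=80"
  - "traefik.frontend.rule=PathPrefix:/hello"
  # "sticky=true" is deprecated; "stickiness=true" replaces it:
  - "traefik.backend.loadbalancer.stickiness=true"
  # Optional, later 1.x only: choose the sticky cookie's name yourself.
  - "traefik.backend.loadbalancer.stickiness.cookieName=sticky"
```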
Thanks for updating that !!
Hi Abhi,
I am trying to implement a load balancer, and your example worked for me.
But I am facing issues after adding my own service to the compose file: my application runs on HTTPS and a custom port. Could you please guide me in setting up the configuration for allowing Traefik to process HTTPS requests?
Thanks,
Shyam