Recently I read some articles referring to my previous article on creating a Proxmox Cluster using a VPN. There is no doubt that a Proxmox Cluster will run into synchronization issues when the latency between nodes is high. However, the definition of "high latency" is questionable. Therefore, to understand how latency affects the stability of corosync, I ran an experiment: I connected a server on a 4G network in the same city to my Proxmox Cluster. Based on the results, my recommendations are listed below. Always back up your containers/VMs before performing any insane actions. XD
Recommendation 1: Keep the physical distance between your devices as short as possible. Assuming your servers transfer data over fibre optics, you can estimate the propagation delay by dividing the physical distance by the speed of light in fibre. The longer the distance between two server locations, the higher the latency. Simple math. You may use one of your servers to ping the other servers one by one, as in the sketch below. You will likely have synchronization issues if the ping is higher than 100 ms. In my opinion, less than 10 ms is feasible and relatively safe for forming a Proxmox Cluster with a few servers. Currently, I have a Proxmox Cluster with 5 nodes in the same city, including the server on the 4G network. If you want to know the limit of how many servers can be connected to a Proxmox Cluster over a 4G network, you may sponsor me to buy more servers. XD
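As a rough worked example: light in fibre travels at about two-thirds of c, roughly 200,000 km/s, so two sites 1,000 km apart add at least 1000 / 200,000 s = 5 ms one way, i.e. about 10 ms round trip, before counting any routing overhead. To check all your nodes from one box (the IPs are placeholders for your own nodes):

for ip in 10.0.0.2 10.0.0.3 10.0.0.4; do ping -c 5 -q "$ip"; done

The -q flag prints only the min/avg/max summary per host, which is the number you care about here.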
Recommendation 2: Never use up nearly 100% of your Internet bandwidth. My 4G network upload speed is capped at around 3 MB/s. The 4G Proxmox node falls out of synchronization with the other nodes whenever it tries to use all of that bandwidth to back up VMs to my SMB/CIFS pool. Cloudflare has plenty of servers around the world aiming to provide extremely low latency on their DNS service. Imagine a ping to the Cloudflare DNS server taking more than 10 s to get a reply. That's a nightmare. Keep your usage below 100% of your Internet bandwidth; otherwise, even a DNS request may cause trouble for you.
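One way to keep headroom for corosync is to cap the backup bandwidth itself. vzdump accepts a --bwlimit option in KiB/s; a minimal sketch, assuming VM ID 100 and a storage named backup-smb (both placeholders), capping the backup at roughly 2 MB/s to leave room on my ~3 MB/s uplink:

# one-off backup of VM 100, capped at ~2 MB/s
vzdump 100 --storage backup-smb --bwlimit 2000
# or make it the default for all backups on this node
echo "bwlimit: 2000" >> /etc/vzdump.conf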
Recommendation 3: Use two VPNs for your servers, and reserve a whole VPN channel for the Proxmox Cluster. To minimize the corosync latency between your servers, it is better to reserve one VPN entirely for corosync (your Proxmox Cluster) and use another VPN for transferring bulk data, such as backing up your VMs. You may use Headscale/Tailscale to form the Proxmox Cluster and use ZeroTier or another VPN to transfer data, as in the sketch below. However, splitting the VPN channels means you may need to re-add your storage pool to the Proxmox Cluster using the other VPN's IP.
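A minimal sketch, assuming the 100.64.x.x addresses below are your nodes' Tailscale IPs (placeholders) and a Proxmox VE version recent enough for pvecm to support --link0:

# on the first node: create the cluster over the Tailscale IP
pvecm create mycluster --link0 100.64.0.1
# on each additional node: join via the first node, binding corosync to this node's own Tailscale IP
pvecm add 100.64.0.1 --link0 100.64.0.2

Your storage pool (e.g. the SMB/CIFS backup target) would then be addressed by its ZeroTier IP instead, so cluster traffic and bulk transfers never compete on the same tunnel.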
Recommendation 4: If you are using Tailscale, try randomizing its port. By default, Tailscale uses port 41641 for WireGuard. If another process occupies this port on your public IP, you may not be able to connect to other servers directly. Instead, they will connect through the closest Tailscale relay servers. Therefore, the latency between your servers may increase significantly if there is no relay server in your country. Randomizing the port may resolve the issue. By the way, there is no way to refuse the Tailscale relay servers when there is no direct connection between your servers. You may check the connections between your servers by
tailscale status
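If tailscale status shows "relay" next to a peer instead of "direct", your traffic is going through a relay server. On Linux, the listening port belongs to the tailscaled daemon rather than the tailscale CLI; as far as I know, setting it to 0 makes tailscaled pick a port automatically. A sketch assuming a Debian/Ubuntu install, where the port is configured in /etc/default/tailscaled:

# switch the fixed port to an automatically chosen one, then restart the daemon
sed -i 's/^PORT=.*/PORT="0"/' /etc/default/tailscaled
systemctl restart tailscaled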
Someone said that changing the corosync token timeout may also reduce the effect of high latency, but I have never tried changing it myself; the sketch below only shows where the setting lives. Leave a comment below to share your ideas. Feel free to correct me if I misunderstood something.
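Purely for reference, since I have not tested this: the token timeout sits in the totem section of /etc/pve/corosync.conf, and the idea would be to raise it so corosync tolerates longer round trips before declaring a node lost. The 10000 ms below is an illustrative guess, not a recommendation, and remember to increment config_version whenever you edit this file:

totem {
  # keep your existing options (cluster_name, version, interface, ...) unchanged
  config_version: 4
  token: 10000
}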