An OpenStack Crime Story solved by tcpdump, sysdig, and iostat - Episode 1 - ...

:

This is the story of how the tiniest things can sometimes be the biggest culprits. Because first and foremost, this is a detective story. So come and follow me on a little crime scene investigation that uncovered an incredibly counterintuitive and almost criminal default setting in Ubuntu 14.04 crippling the virtual network performance and especially network latency.

And just in case you don’t like detective stories – and this goes out to all you nerds out there –  I promise it will be worth your while, because this is also an introduction to keepalived, tcpdump, and sysdig. So sit back and enjoy!

The problems came out of the blue. We are running an OpenStack installation for the next generation CenterDevice cloud.

CenterDevice is a cloud-based document solution and our clients rely on its 24/7 availability. In order to achieve this availability, all our components are multiply redundant. For example, our load balancers are two separate virtual machines running HAProxy. Both instances manage a highly available IP address via keepalived.

This system worked very well. Until both load balancers became the victims of an evil crime. Both virtual machines started flapping their highly available virtual IP address and we were receiving alerting e-mails by the dozens, but there was no obvious change to the system that could have explained this behavior.

And this is how the story begins.

When I was called to investigate this weird behavior and answer the question what was happening to the poor little load balancers, I started by taking a good close look at the crime scene: keepalived. The following listing shows our master configuration of keepalived on virtual host loadbalancer011.

% sudo cat /etc/keepalived/keepalived.conf
global_defs {
        notification_email {
                some-e-mail-address@veryimportant.people
        }
        notification_email_from loadbalancer01@protecting.the.innocent.local
        smtp_server 10.10.9.8
        smtp_connect_timeout 30
        router_id loadbalancer01
}
 
vrrp_script chk_haproxy {
        script "killall -0 haproxy"
        interval 2
        weight 2
}
 
vrrp_instance loadbalancer {
        state MASTER
        interface eth0
        virtual_router_id 205
        priority 101
        smtp_alert
        advert_int 1
        authentication {
                auth_type PASS
                auth_pass totally_secret
        }
        virtual_ipaddress {
                192.168.205.7
        }
        track_script {
                chk_haproxy
        }
}

% sudo cat /etc/keepalived/keepalived.conf global_defs {         notification_email {                 some-e-mail-address@veryimportant.people         }         notification_email_from loadbalancer01@protecting.the.innocent.local         smtp_server 10.10.9.8         smtp_connect_timeout 30         router_id loadbalancer01 }vrrp_script chk_haproxy {         script "killall -0 haproxy"         interval 2         weight 2 }vrrp_instance loadbalancer {         state MASTER         interface eth0         virtual_router_id 205         priority 101         smtp_alert         advert_int 1         authentication {                 auth_type PASS                 auth_pass totally_secret         }         virtual_ipaddress {                 192.168.205.7         }         track_script {                 chk_haproxy         } }

The same configuration on our second load balancer loadbalancer02 looks exactly the same with notification email and router id changed accordingly as well as a lower priority. This all looked fine to me and it was immediately clear that keepalived.conf was not the one to blame. I needed to figure out a different reason why the two keepalived were continuously flapping the virtual IP address.

Now, it is important to understand how VRRP, the protocol keepalived uses to check the availability of its partners, work. All partners continuously exchange keep alive packets via the multicast address  vrrp.mcast.net which resolves to 224.0.0.18. These packets use IP protocol number 112. Only if the backup does not receive these keep alive packets from the current master, it assumes the partner is dead, murdered, or otherwise gone AWOL and takes over the virtual IP address, now acting as the new master. If the old master decides to check back in, the virtual IP address is exchanged again.

Since we were observing this exchange back and forth, I suspected the virtual network to unlawfully disregard its responsibility. I immediately rushed to the terminal and started to monitor the traffic. This might seem easy, but it is far more complex than you think. The figure below, taken from openstack.redhat.com, shows on overview of how Neutron creates virtual networks.

OpenStack Neutron Architecture

OpenStack Neutron Architecture [https://openstack.redhat.com/Networking_in_too_much_detail]

Since both load balancers were running on different nodes, the VRRP packets traveled from A to J and back to Q. So where to look for the missing packets?

I decided to tail the packets at A, E, and J. Maybe that was where the culprit was hiding. So I started tcpdump on loadbalancer01, the compute node node01, and the network controller control01 looking for missing packets. And there were packets, I tell you. Lots of packets. But I could not see missing packets until I stopped tcpdump:

$ tcpdump -l host vrrp.mcast.net
...
45389 packets captured
699 packets received by filter
127 packets dropped by kernel

$ tcpdump -l host vrrp.mcast.net ... 45389 packets captured 699 packets received by filter 127 packets dropped by kernel

Dropped packets by the kernel? Are these our victims? Unfortunately not.

tcpdump uses a little buffer in the kernel to store captured packets. If too many new packets arrive before the user process tcpdump can decode them, the kernel drops them to make room for freshly arriving packets.

Not very helpful when you are trying to find where packets get, well, dropped. But there is the handy parameter -B for tcpdump that increases this buffer. I tried again:

$ tcpdump -l -B 10000 host vrrp.mcast.net

$ tcpdump -l -B 10000 host vrrp.mcast.net

Much better, no more dropped packets by the kernel. And now, I saw something. While the VRRP packets were dumped on my screen, I noticed that sometimes there was a lag of more than one second between VRRP keep alives. This struck me as odd. This should not happen as the interval for VRRP packets has been set to one second in the configurations. I felt that I was on to something, but just needed to dig deeper.

I let tcpdump show me the time differences between succeeding packets and detect differences bigger than one second.

$ tcpdump -i any -l -B 10000 -ttt host vrrp.mcast.net  | grep -v '^00:00:01'

$ tcpdump -i any -l -B 10000 -ttt host vrrp.mcast.net  | grep -v '^00:00:01'

Screen Shot 2014-09-04 at 14.37.53

Oh my god. Look at the screenshot above2. There is a delay of more than 3,5 seconds. Of course loadbalancer02 assumes his partner went MIA.

But wait, the delay already starts on loadbalancer01? I have been fooled! It is not the virtual network. It had to be the master keepalived host! But why? Why should the virtual machine hold packets back? There must be some evil hiding in this machine and I will find and face it…

Stay tuned for our all new next episode of OpenStack Crime Investigation tomorrow on the codecentric Blog.

Footnotes

1. All personal details like IP, E-Mail addresses etc. have been changed to protect the innocent.
2. Yes, you see correctly. I am running screen inside tmux . This allows me to sustain the screen tabs even during logouts.