Collecting and analyzing Linux kernel crashes - Kdump
Updated: July 18, 2009
Welcome to the second article in the series on high-end Linux system administration. In this article, we will discuss the powerful, robust Kdump kernel crash utility. This article continues the first part, in which an older utility, LKCD, was presented.
Please note that this tutorial is mainly intended for Linux system administrators or power users who want to understand, troubleshoot and solve difficult problems revolving around kernel crashes, a rare if serious issue that may occur in the Linux production environment. Being able to react quickly and accurately to these crashes and find the root cause is of paramount importance. Hence this tutorial.
Once again, completely dissatisfied with the available repertoire of tutorials on the subject, I have decided to create a tutorial as it ought to be - as always, crisp-clear, step-by-step and accompanied by plenty of screenshots. There's also a downloadable PDF version available. However, please note the web article will always be the most up-to-date one.
This tutorial complements LKCD as the Linux kernel crash utility and builds on the experience we gained last time. If you recall the first article, LKCD suffered from a number of limitations that do not scale well in a modern, large environment. Kdump, a more recent and modern utility, copes well with these limitations and presents an efficient, scalable solution.
Kdump is the most current kernel crash utility available and comes included in a number of business-oriented distributions, like RHEL (CentOS) and SLES. By following this article, you will gain valuable knowledge on how to work at the very tip of technological advancement.
In the following articles, we will focus on the analysis of dumped cores using a range of specialized tools, including crash utilities and gdb.
This covers the short preamble. We can now start the serious work. I truly hope you will enjoy this article. I'm convinced that you will not easily find a more thorough or detailed article on the subject anywhere in the world, so stay focused.
- Kdump installation
- Kdump packages & files
- Kdump configuration
- Configuration file
- Configure KDUMP_KERNELVER
- Configure KDUMP_COMMANDLINE
- Configure KDUMP_COMMANDLINE_APPEND
- Configure KEXEC_OPTIONS
- Configure KDUMP_RUNLEVEL
- Configure KDUMP_IMMEDIATE_REBOOT
- Configure KDUMP_TRANSFER
- Configure KDUMP_SAVEDIR
- Configure KDUMP_KEEP_OLD_DUMPS
- Configure KDUMP_FREE_DISK_SIZE
- Configure KDUMP_DUMPDEV
- Configure KDUMP_VERBOSE
- Configure KDUMP_DUMPLEVEL
- Configure KDUMP_DUMPFORMAT
- GRUB menu changes
- Set Kdump to start on boot
- Configuration file
- Test configuration
- Simulate kernel crash
- Kdump network dump functionality
The Linux kernel is a rather robust entity. It is stable and fault-tolerant and usually does not suffer irrecoverable errors that crash the entire system and require a reboot to restore normal production. Nevertheless, these kinds of problems do occur from time to time. They are known as kernel crashes and are of utmost interest and importance to administrators in charge of these systems. Being able to detect, collect and analyze the crashes provides the system expert with a powerful tool for finding the root cause of crashes and possibly solving critical bugs.
In the first tutorial on Linux system debugging, we learned how to set up, configure and use the Linux Kernel Crash Dump (LKCD) utility. However, LKCD, being an older project, exhibited several major limitations in its functionality: LKCD was unable to save memory dumps to local RAID (md) devices, and its network capability was restricted to sending memory cores to dedicated LKCD netdump servers only on the same subnet, provided the cores were under 4GB in size. Memory cores exceeding the 32-bit size barrier were corrupted upon transfer and thus unavailable for analysis. The same-subnet restriction also proved impractical for large-scale operations with thousands of machines.
Kdump is a much more flexible tool, with extended network-aware capabilities. It aims to replace LKCD, while providing better scalability. Indeed, Kdump supports dumping to a range of destinations, including local disks, but also NFS areas, CIFS shares or FTP and SSH servers. This makes it far more attractive for deployment in large environments, without restricting operations to a single server per subnet.
In this tutorial, we will learn how to set up and configure Kdump for memory core dumping to local disk and network shares. We will begin with a short overview of basic Kdump functionality and terminology. Next, we will review the kernel compilation parameters required to use Kdump. After that, we will go through the configuration file and study each directive separately, step by step. We will also edit the GRUB menu as a part of the Kdump setup. Lastly, we will demonstrate the Kdump functionality, including manually triggering kernel crashes and dumping memory cores to local and network devices.
On one hand, this article will examine the Kdump utility in great detail. On the other, a number of Kdump-related topics will be only briefly discussed. It is important that you know what to expect from this tutorial.
I will not explain kernel compilation in this article, although I will explain the parameters required for proper Kdump functionality. Kernel compilation is a delicate, complex process that merits separate attention; it will be presented in a dedicated tutorial.
Kdump can also run on the Itanium (ia64) and PowerPC (ppc64) architectures. However, due to the relative scarcity of these platforms in both home and business use, I will focus on the i386 (and x86-64) platforms. The platform-specific configurations for Itanium and PPC machines can be found in the official Kdump documentation (see References).
Now, let us begin.
To make things easier to understand, here's a brief lexicon of important terms we will use in this document:
- Standard (production) kernel - kernel we normally work with
- Crash (capture) kernel - kernel specially used for collecting crash dumps
I will sometimes use only partial names when referring to these two kernels. In general, if I do not specifically use the words crash or capture to describe the kernel, this means we're talking about the production kernel.
Kdump has two main components: Kdump and Kexec.
Kexec is a fastboot mechanism that allows booting a Linux kernel from the context of an already running kernel without going through the BIOS. The BIOS stage can be very time-consuming, especially on big servers with numerous peripherals. Skipping it can save a lot of time for developers who end up booting a machine numerous times.
Kdump is a new kernel crash dumping mechanism and is very reliable. The crash dump is captured from the context of a freshly booted kernel and not from the context of the crashed kernel. Kdump uses Kexec to boot into a second kernel whenever the system crashes. This second kernel, often called a crash or a capture kernel, boots with very little memory and captures the dump image.
The first kernel reserves a section of memory that the second kernel uses to boot. Kexec enables booting the capture kernel without going through the BIOS; hence the contents of the first kernel's memory are preserved, and these contents are essentially the kernel crash dump.
There are quite a few requirements that must be met in order for Kdump to work.
- The production kernel must be compiled with a certain set of parameters required for kernel crash dumping.
- The production kernel must have the kernel-kdump package installed. The kernel-kdump package contains the crash kernel that is started when the standard kernel crashes, providing an environment in which the standard kernel state during the crash can be captured. The version of the kernel-kdump package has to be identical to the standard kernel.
If the operating system comes with a kernel already compiled to run and use Kdump, you will have saved quite a bit of time. If you do not have a kernel built to support the Kdump functionality, you will have to do quite a bit of work, including a lengthy compilation and configuration procedure of both the standard, production kernel and the crash (capture) kernel.
In this article, we will not go into details on kernel compilation. The compilation is a generic procedure that does not directly relate to Kdump and demands dedicated attention. We will talk about kernel compilation in a separate tutorial. Here, we will take the compilation for granted and focus on the configuration.
Nevertheless, although we won't compile, we will have to go through the list of kernel parameters that have to be configured so that your system can support the Kexec/Kdump functionality and collect crash dumps. These parameters need to be configured prior to kernel compilation.
The simplest way to configure kernel parameters is to invoke a kernel configuration wizard such as menuconfig or xconfig.
The kernel configuration wizard can be text (menuconfig) or GUI driven (xconfig). In both cases, the wizard contains a list of categories, divided into subcategories, which contain different tunable parameters.
Just to give you an impression of what kernel compilation configuration looks like, for those of you who have never seen one:
What you see above is a screenshot of a typical kernel configuration menu, run inside the terminal. The wizard uses the text interface and is invoked by typing make menuconfig. Notice the category names; we will refer to them soon.
We will now go through the list of kernel parameters that need to be defined to enable Kdump/Kexec to function properly. For the sake of simplicity, this document focuses on the x86 architecture. For some details about other platforms and exceptions, please refer to the Appendix and the official documentation.
The standard kernel can be a vanilla kernel downloaded from kernel.org or one shipped by your favorite distribution. Whichever you choose, you will have to configure the kernel with the following parameters:
Enable Kexec system call:
This parameter tells the system to use Kexec to skip BIOS and boot (new) kernels. It is critical for the functionality of Kdump.
Enable kernel crash dumps:
Crash dumps need to be enabled. Without this option, Kdump will be useless.
Optional: Enable high memory support (for 32-bit systems):
You need to configure this parameter in order to support memory allocations beyond the 32-bit (4GB) barrier. This may not be applicable if your system has less than 4GB RAM or if you're using a 64-bit system.
Optional: Disable Symmetric Multi-Processing (SMP) support:
Kdump can only work with a single processor. If you have only a single processor or run your machine with SMP support disabled, you can safely set this parameter to (n).
On the other hand, if your kernel must use SMP for whatever reason, you will want to set this directive to (y). However, you will have to remember this during the Kdump configuration. We will have to set Kdump to use only a single CPU. It is very important that you remember this!
To recap, you can either disable SMP during the compilation - OR - enable SMP but instruct Kdump to use a single CPU. This instruction is done by changing the Kdump configuration file. It is NOT a part of the kernel compilation configuration.
The configuration file change requires that one of the options be configured in a particular manner. Specifically, the KDUMP_COMMANDLINE_APPEND directive needs to be set to maxcpus=1 in the Kdump configuration file under /etc/sysconfig/kdump AFTER the kernel has been compiled and installed.
Enable sysfs file system support:
Modern kernels (2.6 and above) support this setting by default, but it does not hurt to check.
Enable /proc/vmcore support:
This configuration allows the crash kernel to expose the memory of the crashed kernel through /proc/vmcore, from which Kdump saves the dump. We will talk more about this later. Although in your setup you may not use /proc/vmcore as the dump source, for greatest compatibility, it is recommended you set this parameter to (y).
Configure the kernel with debug info:
This parameter means the kernel will be built with debug symbols. While this will increase the size of the kernel image, having the symbols available is very useful for in-depth analysis of kernel crashes, as it allows you to trace the problems not only to the problematic function calls causing the crashes, but also to the specific lines in the relevant sources. We will talk about this in great detail in a set of separate tutorials covering the crash, lcrash and gdb debugging utilities.
Configure the start section for reserved RAM for the crash kernel:
This is a very important setting to pay attention to. To work properly, the crash kernel uses a piece of memory specially reserved for it. The start section for this memory allocation needs to be defined. For instance, if you intend to start the crash kernel RAM at 16MB, then the value needs to be set to 16MB expressed in hexadecimal, that is, 0x1000000.
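If you prefer not to convert by hand, the arithmetic can be checked with a tiny Bash function. This is only a sketch; the helper name mb_to_hex is our own invention and is not part of Kdump or the kernel build system.

```shell
#!/usr/bin/env bash
# Convert a size in megabytes into the hexadecimal byte value used for the
# crash kernel's start address (hypothetical helper, for illustration only).
mb_to_hex() {
    local mb="$1"
    # 1MB = 1024 * 1024 bytes; %X prints upper-case hex digits
    printf '0x%X\n' "$(( mb * 1024 * 1024 ))"
}
```

For example, mb_to_hex 16 prints 0x1000000, the value for a crash kernel area starting at 16MB.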
Configure kdump kernel so it can be identified:
Setting this suffix allows Kdump to select the right kernel for boot, since there may be several kernels under /boot on your system. In general, the rule of thumb calls for the crash kernel to be named the same as your production kernel, save for the -kdump suffix. You can check this by running the uname -r command in a terminal to see the kernel version you run and then checking the files listed in the /boot directory.
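As a rough sketch, assuming a 2.6-era x86 kernel, the parameters listed above correspond to the .config entries below. The option names are taken from the kernel's own kdump documentation rather than from this article, so double-check them against your kernel version before relying on them.

```shell
# Kernel .config fragment (sketch) for Kexec/Kdump on a 2.6-era x86 kernel
# Kexec system call
CONFIG_KEXEC=y
# Kernel crash dumps
CONFIG_CRASH_DUMP=y
# 32-bit systems only: high memory support (PAE, needed for RAM beyond 4GB;
# CONFIG_HIGHMEM4G=y covers machines with up to 4GB)
CONFIG_HIGHMEM64G=y
# Optional: SMP disabled (otherwise boot the crash kernel with maxcpus=1)
# CONFIG_SMP is not set
# sysfs file system support
CONFIG_SYSFS=y
# /proc/vmcore support
CONFIG_PROC_VMCORE=y
# Build with debug info for in-depth crash analysis
CONFIG_DEBUG_INFO=y
# Crash kernel RAM starts at 16MB
CONFIG_PHYSICAL_START=0x1000000
# Crash kernel only: suffix that identifies it under /boot
CONFIG_LOCALVERSION="-kdump"
```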
Please note that the above table is neither a holy bible nor rocket science. As always, it is quite possible that my observations are limited and apply only to a very specific, private setup. Therefore, please exercise discretion when using the above table for reference, taking into consideration the fact that you may not experience the same success as I did. That said, I have thoroughly tested the setup and it works.
Now, your next step is to compile the kernel. I cannot dedicate the resources to cover the kernel compilation procedure at this point. However, if you're using Kdump as a part of your production environment rather than as a household hobby, there is a fair chance you will have dedicated support from vendors, who should provide you with a kernel already compiled for Kdump. I apologize for this evasion, but I must leave kernel compilation for another time.
Modern distributions, especially those forked off enterprise solutions, are configured to use Kdump. openSUSE 11.1 is a good example; you will only have to install the missing RPMs and edit the configuration file to get it to work. We will discuss openSUSE 11.1 some more later in the article.
The crash (capture) kernel needs to be compiled with the same parameters as above, save one exception: Kdump does not support compressed kernel images as crash kernels. Therefore, you should not compress this image. This means that while your production kernels will most likely be named vmlinuz, the Kdump crash kernels need to be uncompressed, hence named vmlinux, or rather vmlinux-kdump.
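One quick way to confirm that a candidate crash kernel is uncompressed is to look at its first bytes. An uncompressed vmlinux is an ELF binary and starts with the ELF magic number, whereas a compressed vmlinuz (bzImage) does not. The function below is a hypothetical helper sketching that check; it assumes the usual head and od utilities are available.

```shell
#!/usr/bin/env bash
# Succeed if IMAGE begins with the ELF magic number 0x7f 'E' 'L' 'F', i.e.
# it looks like an uncompressed vmlinux rather than a compressed vmlinuz.
is_uncompressed_elf() {
    local image="$1" magic
    [ -r "$image" ] || return 2            # missing or unreadable file
    # First 4 bytes as one string of hex digits, e.g. 7f454c46
    magic=$(head -c 4 "$image" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}
```

For example, is_uncompressed_elf /boot/vmlinux-kdump && echo "uncompressed" checks the image name mentioned above; adjust the path to the crash kernel actually installed on your system.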
Kdump packages & files
This is the list of required packages that must be installed on the system for Kdump to work. Please note that your kernel must be compiled properly for these packages to work as expected. You will very likely succeed in installing them anyway; however, this is no guarantee that they will work.
| Package name | Package info |
|---|---|
| kernel-debuginfo * | Crash analysis package (optional) |
* The kernel-debuginfo package needs to match your kernel version - default, smp, etc.
The best way to obtain these packages is from your software repositories. This guarantees you will be using the most compatible version of Kdump and Kexec. For example, on Debian-based systems, you can use the apt-get install command to fetch the necessary packages:
apt-get install <package-name>
Likewise, please note that the production kernel also must have the kernel-kdump package installed. This package contains the crash kernel that is started when the standard kernel crashes, providing an environment in which the standard kernel state during the crash can be captured. The version of this package has to be identical to the production kernel.
For details about how to obtain the kernel-kdump and kexec-tools packages not via the software repositories, please refer to the Appendix.
Here's the list of the most important Kdump-related files:
| File | Description |
|---|---|
| /etc/sysconfig/kdump | Kdump configuration file |
The Kdump installation also includes the GDB Kdump wrapper script (gdb-kdump), which is used to simplify the use of GDB on Kdump images. The use of GDB, as well as other crash analysis utilities, requires the presence of the kernel-debuginfo package.
On SUSE systems, the Kdump installation also includes the YaST module (yast2-kdump). We will talk about this in a while.
Kdump configuration
In the last section, we went through the kernel configuration parameters that need to be set for Kexec/Kdump to work properly. Now, assuming you have a functioning kernel that boots to the login screen and has been compiled with the relevant parameters, whether by a vendor or yourself, we will see what extra steps we need to take to make Kdump actually work and collect crash dumps.
We will configure Kdump twice: once for local dump and once for network dump, similarly to what we did with LKCD. This is a very important step, because LKCD is limited to network dumping only within the specific subnet of the crash machine. Kdump offers a much greater, more flexible network functionality, including FTP, SSH, NFS and CIFS support.
Configuration file
The configuration file for Kdump is /etc/sysconfig/kdump. We will start with the basic, local dump functionality. Later, we will also demonstrate a crash dump over network. You should save a backup before making any changes!
Configure KDUMP_KERNELVER
This setting refers to the CONFIG_LOCALVERSION kernel configuration parameter that we reviewed earlier. We specified the suffix -kdump, which tells our system to use kernels with the -kdump suffix as crash kernels. As the short description paragraph in the configuration file specifies, if no value is used, the most recently installed Kdump kernel will be used. By default, crash kernels are identified by the -kdump suffix.
In general, this setting is meaningful only if non-standard suffixes are used for Kdump kernels. Most users will not need to touch this setting and can leave it at the default value, unless they have very specific needs that require certain kernel versions.
Configure KDUMP_COMMANDLINE
This setting tells Kdump the set of parameters it needs to boot the crash kernel with. In most cases, you will use the same set as your production kernel, so you won't have to change it. To see the current set, run cat /proc/cmdline. When no string is specified, this set of parameters will be used as the default. We will use this setting when we test Kdump (or rather, Kexec) and simulate a crash kernel boot.
Configure KDUMP_COMMANDLINE_APPEND
This is a very important directive. It is crucial if you use, or have to use, an SMP kernel. We have seen earlier, during the configuration of kernel compilation parameters, that Kdump cannot use more than a single core for the crash kernel. Therefore, this parameter, set to maxcpus=1, is a MUST if you're using SMP. If the kernel has been configured with SMP disabled, you can ignore this setting.
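To illustrate how the two command-line directives fit together, here is a sketch that combines a base command line (what /proc/cmdline shows) with an appended string such as maxcpus=1. The function name is hypothetical, and dropping any crashkernel= argument is our own assumption about sensible behavior, since the capture kernel does not need to reserve memory for yet another kernel; Kdump's own scripts may build the line differently.

```shell
#!/usr/bin/env bash
# Combine the production kernel command line with extra options for the
# crash (capture) kernel. Hypothetical helper; dropping crashkernel= is an
# assumption, not documented Kdump behavior.
capture_cmdline() {
    local base="$1" append="$2" word out=""
    local -a words
    # Split on whitespace without glob expansion
    read -r -a words <<< "$base"
    for word in "${words[@]}"; do
        case "$word" in
            crashkernel=*) ;;                 # skip the reservation argument
            *) out="${out:+$out }$word" ;;    # keep everything else
        esac
    done
    if [ -n "$append" ]; then
        out="${out:+$out }$append"
    fi
    printf '%s\n' "$out"
}
```

Usage on a live system would be capture_cmdline "$(cat /proc/cmdline)" "maxcpus=1".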
Configure KEXEC_OPTIONS
As we've mentioned earlier, Kexec is the mechanism that boots the crash kernel from the context of the production kernel. To work properly, Kexec requires a set of arguments. The basic set used is defined by /proc/cmdline. Additional arguments can be specified using this directive. In most cases, the string can be left empty. However, if you receive strange errors when starting Kdump, it is likely that Kdump on your particular kernel version cannot parse the arguments properly. To make Kdump interpret the additional parameters literally, you may need to add the string --args-linux.
You should try both settings and see which one works for you. If you're interested, you can Google for "--args-linux" and see a range of mailing list threads and bug entries revolving around this subject. Nothing decisive, so trial is your best choice here. We'll discuss this some more later on.
Configure KDUMP_RUNLEVEL
This is another important directive. It defines the runlevel into which the crash kernel should boot. If you want Kdump to save crash dumps only to a local device, you can set the runlevel to 1. If you want Kdump to save dumps to a network storage area, like NFS, CIFS or FTP, you need the network functionality, which means the runlevel should be set to 3. You can also use 2, 5 and s. If you opt for runlevel 5 (not recommended), make sure the crash kernel has enough memory to boot into the graphical environment. The default 64MB is most likely insufficient.
Configure KDUMP_IMMEDIATE_REBOOT
This directive tells Kdump whether to reboot out of the crash kernel once the dump is complete. This directive is ignored if the KDUMP_DUMPDEV parameter (see below) is not empty. In other words, if a dump device is used, the crash kernel will not be rebooted until the transfer and possibly additional post-processing of the dump image to the destination directory are complete. You will most likely want to retain the default value.
Configure KDUMP_TRANSFER
This setting tells Kdump what to do with the dumped memory core. For instance, you may want to post-process it instantly.
KDUMP_TRANSFER requires the use of a non-empty KDUMP_DUMPDEV directive. The dump image can be read through either /proc/vmcore or /dev/oldmem. This is similar to what we've seen with the LKCD utility. Normally, the dump device will point to an unused swap partition.
For now, we will use only the default setting, which is just to copy the saved core image to KDUMP_SAVEDIR. We will talk about the KDUMP_DUMPDEV and KDUMP_SAVEDIR directives shortly. However, we will study the more advanced transfer options only when we discuss crash analysis utilities.
Configure KDUMP_SAVEDIR
This is a very important directive. It tells Kdump where the memory core will be saved. Currently, we are talking about local dump, so for now, our destination will point to a directory on the local filesystem. Later on, we will see a network example. By default, the setting points to /var/log/dump.
We will change this to:
Please pay attention to the syntax. You can also use absolute directory paths inside the quotation marks without a prefix, but this use is discouraged. You should specify what kind of protocol is used, with file:// for local directories, nfs:// for NFS storage and so on. Furthermore, you should make sure the destination is writable and that it has sufficient space to accommodate the memory cores.
The KDUMP_SAVEDIR directive can be used in conjunction with KDUMP_DUMPDEV, which we will discuss a little later on.
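Since a missing protocol prefix is easy to overlook, a small check can flag it. This is a hypothetical helper, not something shipped with Kdump. It extracts the prefix (file, nfs and so on) from a KDUMP_SAVEDIR value and warns when a bare absolute path is used. The NFS server name in the test values is made up.

```shell
#!/usr/bin/env bash
# Print the protocol prefix of a KDUMP_SAVEDIR value (file, nfs, ftp, ...).
# Hypothetical helper: bare absolute paths are treated as local ("file")
# but trigger a warning, since the prefix-less form is discouraged.
savedir_scheme() {
    local dir="$1"
    case "$dir" in
        *://*)
            printf '%s\n' "${dir%%://*}" ;;   # text before the first ://
        /*)
            echo "warning: no protocol prefix in '$dir'" >&2
            printf 'file\n' ;;
        *)
            echo "error: unsupported KDUMP_SAVEDIR value '$dir'" >&2
            return 1 ;;
    esac
}
```

For example, savedir_scheme "file:///var/log/dump" prints file.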
Configure KDUMP_KEEP_OLD_DUMPS
This setting defines how many dumps should be kept before rotating. If you're short on space or are collecting numerous dumps, you may want to retain only a small number of dumps. Alternatively, if you require a dump history as long and thorough as possible, increase the number to accommodate your needs.
To keep an infinite number of old dumps, set the number to 0. To delete all existing dumps before writing a new one, set the number to -2. Please note the somewhat strange values, as they are counterintuitive.
| Value | Meaning |
|---|---|
| 0 | all (infinite number) |
| -2 | delete all existing dumps before writing a new one |
The default value is 5:
Configure KDUMP_FREE_DISK_SIZE
This value defines the minimum free space that must remain on the target partition, where the memory core dump destination directory is located, after accounting for the memory core size. If this value cannot be met, the memory core will not be saved, to prevent possible system failure. The default value is 64MB. Please note it has nothing to do with the memory allocation in GRUB. This is an unrelated, purely disk space setting.
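The rule boils down to simple arithmetic: available space minus core size must stay at or above the configured minimum. The sketch below checks that condition. The helper names are hypothetical, sizes are in megabytes, and a df that supports the -P and -m options (GNU or BusyBox) is assumed.

```shell
#!/usr/bin/env bash
# Succeed if saving a core of CORE_MB megabytes still leaves at least
# MIN_FREE_MB free (default 64, mirroring the KDUMP_FREE_DISK_SIZE default).
enough_space_for_core() {
    local avail_mb="$1" core_mb="$2" min_free_mb="${3:-64}"
    (( avail_mb - core_mb >= min_free_mb ))
}

# Print the space available (in MB) on the filesystem that holds DIR.
# -P keeps each filesystem on one line; column 4 is "Available".
available_mb() {
    df -Pm "$1" | awk 'NR == 2 { print $4 }'
}
```

For instance, enough_space_for_core "$(available_mb /var/log/dump)" 2048 asks whether a 2GB core would fit while leaving 64MB free.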
Configure KDUMP_DUMPDEV
This is a very important directive. We have mentioned it several times before. KDUMP_DUMPDEV does not have to be used, but you should carefully consider whether you might need it. Furthermore, please remember that this directive is closely associated with several other settings, so if you do use it, the functionality of Kdump will change.
First, let's see when it might be prudent to use KDUMP_DUMPDEV: Using this directive can be useful if you might be facing filesystem corruption problems. In this case, when a crash occurs, it might not be possible to mount the root filesystem and write to the destination directory (KDUMP_SAVEDIR). Should that happen, the crash dump will fail. Using KDUMP_DUMPDEV allows you to write to a device or a partition in raw mode, without any regard to the underlying filesystem, circumventing any filesystem-related problems.
This also means that there will be no immediate reboot; the KDUMP_IMMEDIATE_REBOOT directive will be ignored, allowing you to use the console to try to fix system problems manually, such as checking the filesystem, because no partition will be mounted and used. Kdump will examine the KDUMP_DUMPDEV directive and, if it is not empty, copy the contents from the dump device to the dump directory (KDUMP_SAVEDIR).
On the other hand, using KDUMP_DUMPDEV increases the risk of disk corruption in the recovery kernel environment. Furthermore, there will be no immediate reboot, which slows down the restoration to production. While such a solution is useful for small-scale operations, it is impractical for large environments. Moreover, take into account that the dump device will always be irrecoverably overwritten when the dump is collected, destroying any data present on it. Finally, you cannot use an active swap partition as the dump device.
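Because an active swap partition cannot serve as the dump device, it is worth checking a candidate device against /proc/swaps. That file lists the active swap areas below a header line. The function below is a hypothetical sketch; its optional second argument lets you point it at a copy of the file.

```shell
#!/usr/bin/env bash
# Succeed if DEVICE is listed as an active swap area in /proc/swaps
# (or in another file with the same format, passed as the second argument).
is_active_swap() {
    local device="$1" swaps="${2:-/proc/swaps}"
    # Skip the header line; the first column holds the swap device name
    awk -v dev="$device" 'NR > 1 && $1 == dev { found = 1 } END { exit !found }' "$swaps"
}
```

For example, is_active_swap /dev/sda5 && echo "in use as swap, do not use as KDUMP_DUMPDEV" guards against the mistake described above.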
Configure KDUMP_VERBOSE
This is a rather simple, administrative directive. It tells how much information is output to the user, using bitmask values in a fashion similar to the chmod command. By default, the Kdump progress is written to the standard output (STDOUT) and the Kdump command line is written into the syslog. If we sum the values, we get command line (1) + STDOUT (2) = 3. See below for all available values:
| Value | Meaning |
|---|---|
| 1 | Kdump command line written to syslog |
| 2 | Kdump progress written to STDOUT |
| 4 | Kdump command line written to STDOUT |
| 8 | Kdump transfer script debugged |
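Since the values are bits, any combination can be decoded by testing each bit in turn. The following hypothetical helper sketches this, using the meanings from the table above.

```shell
#!/usr/bin/env bash
# Expand a KDUMP_VERBOSE bitmask into the meaning of each set bit.
decode_kdump_verbose() {
    local value="$1" bit
    for bit in 1 2 4 8; do
        if (( value & bit )); then
            case "$bit" in
                1) echo "1: Kdump command line written to syslog" ;;
                2) echo "2: Kdump progress written to STDOUT" ;;
                4) echo "4: Kdump command line written to STDOUT" ;;
                8) echo "8: Kdump transfer script debugged" ;;
            esac
        fi
    done
}
```

Running decode_kdump_verbose 3 prints the lines for values 1 and 2, matching the default described above.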
Configure KDUMP_DUMPLEVEL
This directive defines the level of data provided in the memory dump. Values range from 0 to 31. Level 0 means the entire contents of the memory will be dumped, with no detail omitted. Level 31 means the smallest image. The default value is 0.
You should refer to the configuration file for exact details about what each level offers and plan accordingly, based on your available storage and analysis requirements. You are welcome to try them all. I recommend using 0, as it provides the most information, even though it requires hefty space.
Configure KDUMP_DUMPFORMAT
This setting defines the dump format. The default selection is ELF, which allows you to open the dump with gdb and process it. You can also use compressed, but then you can analyze the dump only with the crash utility. We will talk about these two tools in great detail in separate tutorials. The default and recommended choice is ELF, even though the dump file is larger.
This concludes the necessary changes in the configuration file for Kdump to work.
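Putting the directives together, a local-dump configuration might look roughly like the fragment below. It is a sketch rather than a drop-in file. Most values are the defaults described in this section, and the comments mark the settings you may need to change. Keep your distribution's own file and edit only the relevant lines.

```shell
# /etc/sysconfig/kdump - sketch of a local-dump setup (not a drop-in file)
# Empty: use the most recently installed -kdump kernel
KDUMP_KERNELVER=""
# Empty: reuse the production kernel's /proc/cmdline
KDUMP_COMMANDLINE=""
# Needed only with an SMP kernel: boot the crash kernel on a single CPU
KDUMP_COMMANDLINE_APPEND="maxcpus=1"
# Try "--args-linux" here if Kdump fails to parse its arguments
KEXEC_OPTIONS=""
# 1 is enough for local dumps; use 3 for network destinations
KDUMP_RUNLEVEL="1"
# Empty: just copy the dump to KDUMP_SAVEDIR
KDUMP_TRANSFER=""
KDUMP_SAVEDIR="file:///var/log/dump"
KDUMP_KEEP_OLD_DUMPS="5"
# Minimum free space (MB) that must remain after saving the dump
KDUMP_FREE_DISK_SIZE="64"
# Empty: no raw dump device
KDUMP_DUMPDEV=""
# Command line to syslog (1) + progress to STDOUT (2)
KDUMP_VERBOSE="3"
# 0: dump the entire memory
KDUMP_DUMPLEVEL="0"
KDUMP_DUMPFORMAT="ELF"
```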
GRUB menu changes
Because of the way it works, Kdump requires a change to the kernel entry in the GRUB menu. As you already know, Kdump works by booting a crash kernel from the context of the crashed kernel. In order for this to work, the crash kernel must have a section of memory available, even when the production kernel crashes. To this end, memory must be reserved.
In the kernel configurations earlier, we declared the offset point for our memory reservation. Now, we need to declare how much RAM we want to give our crash kernel. The exact figure will depend on several factors, including the size of your RAM and possibly other restrictions. If you read various sources online, you will notice that two figures are mostly used: 64MB and 128MB. The first is the default configuration and should work. However, if it proves unreliable for whatever reason, you may want to try the second value. Test-crashing the kernel a few times should give you a good indication whether your choice is sensible or not.
Now, let us edit the GRUB configuration file. First, make sure you back up the file before making any changes.
cp /boot/grub/menu.lst /boot/grub/menu.lst-backup
Open the file for editing. Locate the production kernel entry and append the following:
crashkernel=XM@YM
YM is the offset point we declared during the kernel compilation (or that has been configured for us by the vendor). In our case, this is 16M. XM is the size of memory allocated to the crash kernel. As I've mentioned earlier, the most typical configuration will be either 64M or 128M. Therefore, the appended entry should look like:
crashkernel=64M@16M
A complete stanza inside the menu.lst file:
title Some Linux
kernel /boot/vmlinuz root=/dev/sda1 resume=/dev/sda5 splash=silent crashkernel=64M@16M
Set Kdump to start on boot
We now need to enable Kdump on startup. This can be done using the chkconfig or sysv-rc-conf utilities on Red Hat- or Debian-based distros, respectively. For a more detailed tutorial about the usage of these tools, please take a look at this tutorial.
For example, using the chkconfig utility:
chkconfig kdump on
Changes to the configuration file require that the Kdump service be restarted. However, the Kdump service cannot run unless the GRUB menu change has been made and the system rebooted. You can easily check this by trying to start the Kdump service:
If you have not allocated the memory or if you have used the wrong offset, you will get an error. Something like this:
Loading kdump failed
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Then try loading kdump kernel
If you receive this error, this means that the GRUB configuration file has not been edited properly. You will have to make the right changes, reboot the system and try again. Once this is done properly, Kdump should start without any errors. We will mention this again when we test our setup.
This concludes the configurations section. Now, let's test it.
Test configuration
Before we start crashing our kernel for real, we need to check that our configuration really works. This means executing a dry run with Kexec. In other words, configure Kexec to load with desired parameters and boot the crash (capture) kernel. If you successfully pass this stage, this means your system is properly configured and you can test the Kdump functionality with a real kernel crash.
Again, if your system comes with the kernel already compiled to use Kdump, you will have saved a lot of time and effort. Basically, the Kdump installation and the configuration test are completely unnecessary. You can proceed straight away to using Kdump.
First, let's quickly check that our kernel has been compiled with relevant parameters:
If everything is as expected, we can proceed on to the next step. Please note that /proc/config.gz is not available for all distributions.
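The check can be scripted. The sketch below is a hypothetical helper that reads /proc/config.gz by default, or a plain .config file you pass in, and reports which of the boolean options discussed earlier are not set to y. It does not check the CONFIG_PHYSICAL_START value; grep for that line separately.

```shell
#!/usr/bin/env bash
# List Kexec/Kdump options that are not set to "y" in a kernel config.
# Defaults to /proc/config.gz (not available on every distribution);
# a plain-text .config path can be given instead.
check_kdump_config() {
    local config="${1:-/proc/config.gz}" reader=cat opt missing=0
    case "$config" in
        *.gz) reader=zcat ;;    # compressed config, as in /proc/config.gz
    esac
    for opt in CONFIG_KEXEC CONFIG_CRASH_DUMP CONFIG_SYSFS \
               CONFIG_PROC_VMCORE CONFIG_DEBUG_INFO; do
        if ! "$reader" "$config" | grep -q "^${opt}=y\$"; then
            echo "missing: $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

Called without arguments, check_kdump_config examines the running kernel. It prints nothing and succeeds when all options are present.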
Next, you need to make sure your production kernel is configured to allocate memory to the crash kernel. This means that the crashkernel=XM@YM string has to be appended to the relevant GRUB kernel entry and that you're using the correct offset, as specified in the kernel parameters. As we've seen earlier, the memory allocation requires a reboot to take effect.
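To confirm the reservation actually reached the running kernel, you can pull the crashkernel=XM@YM argument out of /proc/cmdline and compare the offset with the one used at compile time. The helper below is a hypothetical sketch. It only understands the plain SIZE@OFFSET form used in this article.

```shell
#!/usr/bin/env bash
# Print "SIZE OFFSET" from the crashkernel=SIZE@OFFSET argument of a kernel
# command line (for example the contents of /proc/cmdline).
crashkernel_reservation() {
    local word value
    local -a words
    read -r -a words <<< "$1"
    for word in "${words[@]}"; do
        case "$word" in
            crashkernel=*@*)
                value="${word#crashkernel=}"
                printf '%s %s\n' "${value%@*}" "${value#*@}"   # size, offset
                return 0 ;;
        esac
    done
    echo "no crashkernel=X@Y reservation found" >&2
    return 1
}
```

On the example stanza above, crashkernel_reservation "$(cat /proc/cmdline)" would print 64M 16M.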
Try to start the Kdump service:
If you have not allocated the memory or used the wrong offset, you will get an error. Something like this:
Loading kdump failed
Memory for crashkernel is not reserved
Please reserve memory by passing "crashkernel=X@Y" parameter to the kernel
Then try loading kdump kernel
The error is quite descriptive and rather self-explanatory. You will have to edit the GRUB configuration file, reboot and try again. Once you do it properly, Kdump should start without any errors.
Our first step is to load Kexec with desired parameters into the existing kernel. Usually, you will want Kdump to run with the same parameters your production kernel booted with. So, you will probably use the following configuration to test Kdump:
/usr/local/sbin/kexec -l /boot/vmlinuz-`uname -r` \
    --initrd=/boot/initrd-`uname -r` \
    --command-line="`cat /proc/cmdline`"
Then, execute Kexec (it will load the above parameters):
Your crash kernel should start booting. As mentioned before, it skips the BIOS, so you should see the boot sequence in your console immediately. If this step completes without errors, you are on the right path. I would gladly share a screenshot here, but it would look just like any other boot, so there is little point.
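The two steps of the dry run can be wrapped in a small function. This is a sketch that rests on two assumptions. First, kexec sits in /usr/local/sbin, as in the command above. Second, the image and initrd follow the /boot/vmlinuz-<release> and /boot/initrd-<release> naming, which differs between distributions. The -e option tells Kexec to boot the previously loaded kernel. Setting the DRY_RUN=1 variable, which is our own addition, only prints the commands, so you can review them first.

```shell
#!/bin/sh
# Sketch of the Kexec dry run: load a kernel image with -l, then boot it with -e.
# WARNING: without DRY_RUN=1, the second command reboots the machine at once.
# Assumptions: kexec path and /boot file naming as described in the text.
KEXEC=${KEXEC:-/usr/local/sbin/kexec}

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: kernel release (default: running kernel), $2: command line (default: /proc/cmdline)
kexec_dry_run() {
    release=${1:-$(uname -r)}
    cmdline=${2:-$(cat /proc/cmdline)}
    run "$KEXEC" -l "/boot/vmlinuz-$release" \
        --initrd="/boot/initrd-$release" \
        --command-line="$cmdline" || return 1
    run "$KEXEC" -e
}
```

DRY_RUN=1 kexec_dry_run prints both commands. Run kexec_dry_run as root without DRY_RUN only when you are ready for the machine to boot into the new kernel.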
The next step would be to load the new kernel for use on panic. Reboot and then test:
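Loading a kernel for use on panic is done with Kexec's -p (--load-panic) option. It places the kernel in the reserved memory instead of booting it. The following is a sketch, not the exact command your distribution's Kdump service runs. It reuses the running kernel's image and initrd with the same kexec path and /boot naming as the kexec -l command earlier. It also removes crashkernel= from the command line and appends irqpoll maxcpus=1, a common choice for capture kernels that you may need to adjust. Finally, it reads /sys/kernel/kexec_crash_loaded, which reports 1 once a panic kernel is loaded, on kernels that provide this file.

```shell
#!/bin/sh
# Sketch: load a capture kernel for use on panic (kexec -p), then verify it.
# Assumptions: kexec path, /boot naming and the appended options, as described above.
KEXEC=${KEXEC:-/usr/local/sbin/kexec}

# Build the capture kernel command line: drop crashkernel= and append extras.
capture_cmdline() {
    out=""
    for arg in ${1:-$(cat /proc/cmdline)}; do
        case $arg in
            crashkernel=*) ;;              # the capture kernel needs no reservation of its own
            *) out="${out:+$out }$arg" ;;
        esac
    done
    printf '%s\n' "$out irqpoll maxcpus=1"
}

# Return 0 if a panic kernel is loaded, according to /sys/kernel/kexec_crash_loaded
# (or the file given as $1).
crash_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = 1 ]
}

load_panic_kernel() {
    release=$(uname -r)
    "$KEXEC" -p "/boot/vmlinuz-$release" \
        --initrd="/boot/initrd-$release" \
        --command-line="$(capture_cmdline)" && crash_kernel_loaded
}
```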
At this stage, you may encounter an error like this:
kexec_load failed: Cannot assign requested address
entry = 0x96550 flags = 1
nr_segments = 4
segment.buf = 0x528aa0
segment.bufsz = 2044
segment.mem = 0x93000
segment.memsz = 3000
segment.buf = 0x521880
segment.bufsz = 7100
segment.mem = 0x96000
segment.memsz = 9000
segment.buf = 0x2aaaaaf1f010
segment.bufsz = 169768
segment.mem = 0x100000
segment.memsz = 16a000
segment.buf = 0x2aaaab11e010
segment.bufsz = 2f5a36
segment.mem = 0xdf918000
segment.memsz = 2f6000
If this happens, you have one of the following problems:
- You have not configured the production kernel properly and Kdump will not work. You will have to go through the installation process again, which includes compiling the kernel with relevant parameters.
- The Kexec version you are using does not match the kernel-kdump package. Make sure the right packages are selected. You should check the installed versions of the two packages - kernel-kdump and kexec-tools. Refer to the official website for details.
- You may be missing --args-linux in the configuration file, under KEXEC_OPTIONS.
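The configuration file is a list of shell variable assignments, so you can check the last point by sourcing it in a subshell. Here is a minimal sketch, assuming the /etc/sysconfig/kdump path and the KEXEC_OPTIONS variable named above:

```shell
#!/bin/sh
# Sketch: check that KEXEC_OPTIONS in the Kdump configuration contains --args-linux.
# The file is sourced in a subshell so its variables do not leak into the caller.
kexec_options() {
    ( . "${1:-/etc/sysconfig/kdump}" && printf '%s\n' "$KEXEC_OPTIONS" )
}

has_args_linux() {
    case " $(kexec_options "$1") " in
        *" --args-linux "*) return 0 ;;
        *) echo "KEXEC_OPTIONS lacks --args-linux"; return 1 ;;
    esac
}
```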
Once you successfully solve this issue, you will be able to proceed with testing. If the crash kernel boots without any issues, this means you're good to go and can start using Kdump for real.
We can begin the real work here. As with LKCD, we will simulate a crash and watch the magic happen. To crash the kernel manually, you will have to enable the System Request (SysRq) functionality (A.K.A. magic keys), if it is not already enabled on your system(s), and then trigger a kernel panic. So, first, enable SysRq:
echo 1 > /proc/sys/kernel/sysrq
Then, crash the kernel:
echo c > /proc/sysrq-trigger
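If you script this test, it is worth guarding it, because the second command panics the machine immediately. Below is a sketch with an explicit confirmation flag (the flag name is our own invention) and paths you can override. The sync call flushes file system buffers before the crash. It is a precaution, not a Kdump requirement.

```shell
#!/bin/sh
# Sketch: enable SysRq and deliberately crash the kernel, but only when asked
# explicitly. WARNING: writing "c" to /proc/sysrq-trigger panics the machine.
# SYSRQ_CTL and SYSRQ_TRIGGER can be overridden, e.g. with temporary files.
SYSRQ_CTL=${SYSRQ_CTL:-/proc/sys/kernel/sysrq}
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

crash_now() {
    if [ "$1" != "--yes-crash-this-machine" ]; then
        echo "refusing: pass --yes-crash-this-machine to trigger a kernel panic" >&2
        return 1
    fi
    echo 1 > "$SYSRQ_CTL" || return 1   # enable all SysRq functions
    sync                                # flush file system buffers first
    echo c > "$SYSRQ_TRIGGER"           # 'c' = crash: triggers the panic
}
```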
Now watch the console. The crash kernel should boot up. After a while, you should see Kdump in action.
Let's see what happens in the console. After a while, a small counter should appear, showing you the progress of the dump procedure. This means you have most likely properly configured Kdump and it's working as expected. Wait until the dump completes. The system should reboot into the production kernel when the dump is complete.
Indeed, if you check the destination directory, you should see the vmcore file.
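After several test crashes, it helps to locate the most recent dump quickly. This sketch makes two assumptions: each dump lands in its own subdirectory of the save directory, and the default location is /var/crash. Pass your configured path if it differs.

```shell
#!/bin/sh
# Sketch: print the most recently modified vmcore under a dump directory.
# Assumption: dumps are stored as <dir>/<per-crash subdirectory>/vmcore.
latest_vmcore() {
    dir=${1:-/var/crash}
    # ls -t sorts newest first; the glob matches <dir>/<subdir>/vmcore.
    newest=$(ls -t "$dir"/*/vmcore 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no vmcore found under $dir" >&2
        return 1
    fi
    printf '%s\n' "$newest"
}
```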
This concludes the local disk dump configuration. Now, we will see how Kdump handles network dump.
Being able to send kernel crash dumps to network storage makes Kdump attractive for deployment in large environments. It also allows system administrators to work around local disk space limitations. Compared to LKCD, Kdump is much more network-aware: it is not restricted to dumping on the same subnet, and there is no need for a dedicated server. You can use NFS areas or CIFS shares as the archiving destination. Best of all, the changes only affect the client side. Apart from providing a share that the client can write to, no server-side configuration is needed.
To make Kdump send crash dumps to network storage, only two directives in the configuration file need to be changed for the entire procedure to work. The other settings remain identical to local disk functionality, including starting Kdump on boot, GRUB menu addition, and Kexec testing.
The configuration file is located under /etc/sysconfig/kdump. As always, before making a change, back up the configuration file.
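A timestamped copy is a simple way to do this. Here is a sketch; the naming scheme is an arbitrary choice:

```shell
#!/bin/sh
# Sketch: make a timestamped copy of a configuration file before editing it.
# Defaults to /etc/sysconfig/kdump; prints the name of the backup copy.
backup_config() {
    src=${1:-/etc/sysconfig/kdump}
    dest="$src.$(date +%Y%m%d-%H%M%S)"
    cp -p "$src" "$dest" || return 1   # -p keeps mode and timestamps (ownership when permitted)
    printf '%s\n' "$dest"
}
```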
To use the network functionality, we need to configure Kdump to boot the crash kernel into runlevel 3. By default, runlevel 1 (single-user mode, without networking) is used. To enable networking, change the runlevel directive to 3.
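The directive is a plain shell-style assignment, so a script can also change it. This sketch assumes the directive is called KDUMP_RUNLEVEL, as in SUSE's kdump package of this period. Check the comments in your own configuration file before relying on that name.

```shell
#!/bin/sh
# Sketch: set KEY="value" in a sysconfig-style file (plain shell assignments).
# Assumed example: set_sysconfig_var /etc/sysconfig/kdump KDUMP_RUNLEVEL 3
set_sysconfig_var() {
    file=$1 key=$2 value=$3
    if grep -q "^$key=" "$file"; then
        tmp=$(mktemp) || return 1
        # Rewrite the existing assignment; writing back through "cat >" keeps
        # the original file's permissions and ownership.
        sed "s|^$key=.*|$key=\"$value\"|" "$file" > "$tmp" && cat "$tmp" > "$file"
        status=$?
        rm -f "$tmp"
        return $status
    fi
    # Directive not present yet: append it.
    printf '%s="%s"\n' "$key" "$value" >> "$file"
}
```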
The second step is to configure the network storage destination. We can no longer use a local directory. We need to use an NFS area, a CIFS share, an SSH server, or an FTP server. In this document, we will configure an NFS area, because it seems the most sensible destination for crash dumps. The configuration of the others is very similar, and just as simple.
The one thing you will have to pay attention to is the notation. You need to use the correct syntax:
<server> refers to the NFS server, either by name or IP address. If you're using a name, you need to have some sort of a name resolution mechanism in your environment, like hosts file or DNS. <dir> is the exported NFS directory on the NFS server. The directory has to be writable by the root user. In our example, the directive takes the following form:
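You can assemble and check the destination value before it goes into the configuration file. This sketch assumes the nfs://<server>/<dir> URL form read from KDUMP_SAVEDIR, as in SUSE's kdump package. The names nfsserver and /export/dumps are hypothetical. The optional check uses showmount -e, which lists a server's exports and is usually part of the NFS client utilities.

```shell
#!/bin/sh
# Sketch: build and sanity-check an NFS dump destination of the form nfs://<server>/<dir>.
# Assumption: your Kdump version reads this URL syntax from KDUMP_SAVEDIR.
nfs_savedir() {
    server=$1 dir=$2
    [ -n "$server" ] || { echo "missing NFS server" >&2; return 1; }
    case $dir in
        /*) ;;                                   # export path must be absolute
        *) echo "export path must start with /" >&2; return 1 ;;
    esac
    printf 'KDUMP_SAVEDIR="nfs://%s%s"\n' "$server" "$dir"
}

# Optional: confirm that the server actually exports the directory.
check_export() {
    showmount -e "$1" | grep -q "^$2[[:space:]]"
}
```

For example, nfs_savedir nfsserver /export/dumps prints KDUMP_SAVEDIR="nfs://nfsserver/export/dumps".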
These are the two changes required to make Kdump send memory dumps to a NFS storage area in the case of a kernel crash. Now, we will test the functionality.
Like the last time, we will trigger a kernel crash using the Magic Keys and observe the progress in the console. You should see a progress bar, showing the percentage of the memory core dumped (copied) to the network area. After a while, the process will complete and the crash kernel will reboot the machine. If you see output similar to the two screenshots below, you have most likely configured the Kdump network functionality successfully.
This concludes the long and thorough configuration and testing of Kdump. If you have successfully managed all the stages so far, this means your system is ready to be placed into production and collect memory cores when kernel panic situations occur. Analyzing the cores will provide you with valuable information that should hopefully help you find and resolve the root causes leading to system crashes.
Kdump is a powerful, flexible kernel crash dumping utility. The ability to execute a crash kernel in the context of the running production kernel is a very useful mechanism. Similarly, the crash kernel can run in virtually all runlevels, including those with networking, and it can send cores to network storage using a variety of protocols. Together, these features significantly extend our control over the environment.
Specifically, in comparison to the older LKCD utility, it offers improved functionality on all levels, including a more robust mechanism and better scalability. Kdump can use local RAID (md) devices if needed. Furthermore, it has improved network awareness and can work with a number of protocols, including NFS, CIFS, FTP, and SSH. The memory cores are no longer limited by the 32-bit barrier.
I hope you enjoyed this tutorial and will find it most helpful in configuring Kdump in your environment. We will talk about the post-processing of the memory cores in a separate article.
This section contains a few more details about Kdump. Namely, it provides instructions on how to install the kexec-tools and kernel-kdump packages manually, and on how to use the friendly and simple YaST Kdump module to configure and set up Kdump in SUSE.
The settings listed in this tutorial are only valid for the i386 and x86_64 platforms. Itanium and PPC require some changes. The best place to look for details is the official documentation under /usr/share/doc/packages/kdump. Likewise, please check References further below.
The simplest way of installing the package is via the official distro repositories. However, if this package is missing, your kernel is probably not configured to use Kdump in the first place, so the chance of encountering this situation is slim. Still, if you did have to compile the kernel manually, then you will have to install this package after the kernel has been built and booted into.
It is possible that you will have to manually download and install the kexec-tools package, especially if you do not have a vendor-ready kernel image. The best way to install the package is via the official repositories. However, if the package is not available that way, then to obtain kexec-tools, you will have to do the following:
The kexec-tools package comes as a compressed archive (tarball). You will first need to extract it:
tar zxvf kexec-tools.tar.gz
To compile it, your system needs the compilation tools installed, including make, gcc, kernel-source, and kernel-headers. You can obtain these from the repositories relevant to your distribution. For instance, on Debian-based distros, these tools are obtained very easily by installing the build-essential package (sudo apt-get install build-essential).
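The remaining steps follow the usual configure and make routine. This sketch takes the tarball name from the tar command above. The directory it unpacks into is version-specific, so the sketch takes it as the second argument. With autoconf's usual /usr/local prefix, make install should place kexec under /usr/local/sbin, consistent with the path used earlier in this article.

```shell
#!/bin/sh
# Sketch: check for the build tools, then unpack, build and install kexec-tools.
# Assumptions: tarball name as above; $2 is the (version-specific) source directory.

# Print the names of any missing commands; return 1 if something is missing.
missing_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="${missing:+$missing }$tool"
    done
    [ -z "$missing" ] && return 0
    printf '%s\n' "$missing"
    return 1
}

build_kexec_tools() {
    tarball=${1:-kexec-tools.tar.gz}
    srcdir=$2
    if ! absent=$(missing_tools make gcc tar); then
        echo "missing build tools: $absent" >&2
        return 1
    fi
    # Subshell, so the cd does not change the caller's working directory.
    (
        tar zxvf "$tarball" &&
        cd "$srcdir" &&
        ./configure &&
        make &&
        make install    # needs root; installs under the /usr/local prefix by default
    )
}
```

For example, run build_kexec_tools kexec-tools.tar.gz kexec-tools-<version> as root, substituting the directory name the archive actually creates.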
Please make sure you download the right package, one that matches your Kdump version. Otherwise, when you try to run Kexec, you are likely to see strange errors, similar to the ones we saw earlier during Kdump testing.
openSUSE (but also SLES) comes with a very handy YaST Kdump module (yast-kdump), which allows you to administer the Kdump configuration using YaST. On one hand, this makes the setup much easier. On the other, you will probably not understand the Kdump functionality as thoroughly as when using the command-line and working directly against the configuration file.
Nevertheless, I thought it would be useful to mention this. Indeed, you can see a number of screenshots taken on an openSUSE 11.1 machine, demonstrating the installation and the use of the yast-kdump package.
After the installation, you can find the module in the System sub-menu. It's called Kernel Kdump.
After launching the application, you can start managing the configuration, just like we did before. The main difference is getting used to the layout, as the options are now dispersed across a number of windows. Personally, I find this approach more difficult to understand and manage. However, you should be aware of its existence and use it if needed.
The following sources may be of value to you. Please note that the material presented has been written for experienced system administrators and is not very suitable for people just starting with Kdump.
Debugging Linux kernel using Kdump (PDF, direct link)
This tutorial is a part of my Linux Kernel Crash Book. The book is available for free download, in PDF format. Please check the book article for more details.