[System|Toolbox] Tools for the Art of System Administration

Kernel Internals, Part I

Wednesday January 17, 2001 09:07am PST
Join Marius as he takes us on a tour of /proc and kvm, two methods for accessing in-kernel data structures.

- Accessing [UNIX] Kernel Internals (1); /proc and kvm -

Applications often need data that is primarily available to the kernel. Most of the time they get it through system calls; for example, the uname system call returns the system name, kernel version, architecture and so on. System calls have full access to every kernel data structure they need because they are implemented in the kernel. Linux 2.4.0, for example, has 221 system calls available to userspace programs. For the most part, programs invoke system calls through wrappers in the C library (libc) or in other libraries. Although system calls are sufficient for most applications, they cannot meet the needs of every application. It would be impractical for the kernel to provide a system call for every exported data structure, yet some programs still need to reach the more exotic data structures that reside in the kernel. These are typically specialized applications, such as 'ps' and 'top,' that display detailed information about the processes on the system.

-kvm-

Kernel programmers solved this problem by providing an interface, typically named /dev/kmem. This device, true to UNIX tradition, is simply an interface to the kernel virtual memory of the live system. Writes to the device are reflected in the kernel virtual address space; likewise, reads from it return the contents of the live kernel virtual address space. The data structures we are interested in reside within this address space.

One API that uses /dev/kmem is 'libkvm,' the kernel virtual memory interface. It is typically present on SunOS and BSD systems. KVM can look up symbols as well as read and write kernel virtual memory. A KVM session follows the standard UNIX way of dealing with files, that is:

  • open interface               kvm_open()
  • do something                 kvm_nlist(), others
  • read/write to interface      kvm_read(), kvm_write()
  • close interface              kvm_close()

Like the file API, KVM uses a descriptor to keep track of the session. Suppose, for example, that you are on a system that exports the symbol _page_size, an int holding the page size of the system in bytes. A KVM session to retrieve the page size follows:

#include <fcntl.h>  /* O_RDONLY */
#include <kvm.h>
#include <nlist.h>
#include <stdio.h>

int main(void)
{
     kvm_t *kd;
     int page_size;
     struct nlist nl[] = {
          {"_page_size"}, /* the symbol we're looking for */
          {NULL} };       /* terminator */

     /* open kvm session and get a descriptor; the last argument is
        a prefix that kvm_open prints before its error messages */
     kd = kvm_open(NULL, NULL, NULL, O_RDONLY, "kvm");
     kvm_nlist(kd, nl); /* look the symbol up */
     kvm_read(kd, nl[0].n_value, &page_size, sizeof(page_size)); /* read it */

     printf("page size is %d bytes\n", page_size);

     kvm_close(kd);
     return 0;
}

The above example omits error checking for clarity. The nlist structure describes the symbol table entry format. kvm_nlist looks up every symbol named in the array of nlist structures (which is terminated by an entry with a NULL name) and stores each symbol's address and type in the n_value and n_type members respectively. If a lookup fails, 0 is stored in those members instead. The kvm_read call then reads from kernel virtual memory at the address of the symbol we looked up, copying the value (an int) into page_size. kvm_close() ends the session. Symbol names are system dependent, but the KVM interface itself is portable. Some newer implementations of KVM take the concept a step further and provide dedicated interfaces for commonly needed data.

For example, the OpenBSD KVM interface additionally provides:

  • kvm_getprocs      gets the processes that should be inspected
  • kvm_getargv       gets the argv of a given process
  • kvm_getenvv       gets the environment of a given process

Both kvm_getargv and kvm_getenvv apply to processes selected with kvm_getprocs. kvm_getprocs itself also returns a lot of data about the selected processes. One interesting structure it makes available is struct proc, which contains a great deal of information about the process (in fact, it is the data structure OpenBSD uses internally to represent a process). Other data provided by kvm_getprocs includes information about the process's address space as well as its memory and CPU usage (useful for statistics).

For certain applications, data that is provided through KVM can be vital. However, there is a large security risk associated with the usage of /dev/kmem. Because it gives unrestricted access to memory, /dev/kmem is typically readable only by root and members of the group kmem, and writable only by root. Applications that use KVM therefore typically have to be setuid (or setgid, depending on how they use KVM) to run, which explains why ps on OpenBSD is setgid to kmem.

-/proc-

KVM has quite a few limitations; although it is very flexible, it also has the potential to be insecure. In addition, non-privileged users (without setuid/setgid executables) cannot enjoy the advantages it provides. With this in mind, and because many users need access to certain pieces of data held in the kernel, developers designed the /proc interface. /proc is a filesystem provided by the kernel that acts as an interface to certain in-kernel data structures. It also acts as a sysctl interface, but that is not within the scope of this article. Several kernel variables reside in /proc itself, but perhaps the most useful data comes from examining /proc/pid, where pid is the process ID of the process you want to examine. The files within /proc/pid are owned by the owner of the process, so users can easily examine their own processes. A number of useful files reside within /proc/pid, including:

  • environ      the current environment of the process
  • stat         process statistics
  • statm        process memory statistics
  • mem          the memory belonging to the process

These files can be read and parsed like any other file. They expose, of course, only the variables that the kernel developers felt it necessary and safe for individual users to have access to. /proc is nowhere near as flexible or extensible as access to /dev/kmem, but it provides access to the most necessary kernel data structures in a safer manner than /dev/kmem (and its respective APIs) does. The /proc interface also aims to be a portable way of accessing in-kernel data structures (although the author knows of at least one discrepancy between the Linux and Solaris implementations).

-conclusions-

The two ways of accessing in-kernel data structures discussed in this article both have their advantages and disadvantages; it all boils down to flexibility and extensibility versus security, the latter often being prioritized in today's sensitive world of information. It is often specialized applications that need to access these variables, but many times these applications are essential to the daily running and maintenance of the system. That makes both interfaces important for the system administrator to be aware of.

Next time we'll discuss more specifically how to retrieve in-kernel data structures that are not provided by /proc on a system that does not provide the KVM interface. In many ways, it will be an examination of how KVM works.


Copyright © 2004, The Binary Freedom Project, LLC.