Why does gnome-panel use 290MB?

The Puzzle

I noticed, on my 64-bit Linux system, that gnome-panel was apparently using an awful lot of virtual memory.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 3734 rjk       20   0  290m  16m 6668 S    0  0.8   3:22.05 gnome-panel

Virtual memory’s pretty cheap, so that’s not hugely problematic, but it seemed a huge amount for a glorified toolbar. I looked in /proc/3734/maps and found that about 260MB of that space belonged to shared libraries. Now, gnome-panel does use a lot of libraries, 82 to be precise, but 3MB per library sounded like a lot, and the biggest of them is only 4MB. Looking closer I noticed that an awful lot of the libraries had 2MB (0x200000) non-executable mappings associated with them. As an example here are the mappings for GTK+ (with the size in hex added at the start for convenience):

3c7000 7f967efd6000-7f967f39d000 r-xp 00000000 fe:00 3111259  /usr/lib/libgtk-x11-2.0.so.0.1200.12
200000 7f967f39d000-7f967f59d000 ---p 003c7000 fe:00 3111259  /usr/lib/libgtk-x11-2.0.so.0.1200.12
  a000 7f967f59d000-7f967f5a7000 rw-p 003c7000 fe:00 3111259  /usr/lib/libgtk-x11-2.0.so.0.1200.12

The first line is the code segment and the last the data segment. But what’s the strange 0x200000 (2MB) mapping in the middle? (Skip to the summary at the end if you just want to know the answer.)
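
For anyone who wants to repeat the measurement, here is a minimal Python sketch that totals mapping sizes by permission string. The function name summarize_maps is my own invention for illustration, and the sample input is just the three GTK+ lines above without the size column.

```python
def summarize_maps(lines):
    """Sum mapping sizes per permission string from /proc/<pid>/maps lines."""
    totals = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        perms = fields[1]
        totals[perms] = totals.get(perms, 0) + (end - start)
    return totals


# The three GTK+ mappings quoted above (size prefix removed).
sample = [
    "7f967efd6000-7f967f39d000 r-xp 00000000 fe:00 3111259  /usr/lib/libgtk-x11-2.0.so.0.1200.12",
    "7f967f39d000-7f967f59d000 ---p 003c7000 fe:00 3111259  /usr/lib/libgtk-x11-2.0.so.0.1200.12",
    "7f967f59d000-7f967f5a7000 rw-p 003c7000 fe:00 3111259  /usr/lib/libgtk-x11-2.0.so.0.1200.12",
]

totals = summarize_maps(sample)
for perms, size in sorted(totals.items()):
    print(f"{perms} {size:#x}")
```

On a live system the same function should accept an open /proc/PID/maps file in place of the sample list; the "---p" total is then the address space taken up by inaccessible mappings.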

The Runtime Linker

I traced a process starting up to see what the runtime linker actually requested:

open("/usr/lib/libgtk-x11-2.0.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0pn\6\0\0\0\0\0@"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0644, st_size=4002912, ...}) = 0
mmap(NULL, 6106920, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0989b92000
mprotect(0x7f0989f59000, 2097152, PROT_NONE) = 0
mmap(0x7f098a159000, 40960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3c7000) = 0x7f098a159000

The mysterious extra mapping corresponds to that mprotect() call. 2MB has been marked as PROT_NONE between the code and data segments.

Although the gap remains mysterious at this point, it’s possible to say that this memory will never actually be used in any meaningful sense; it does not waste any RAM or swap space. The only thing being consumed is address space (which is in plentiful supply on 64-bit systems). Arguably it’s a bug (in something) that it’s accounted as part of the virtual memory consumption of the process at all.

The Shared Object

It turns out that the dynamic linker is just following instructions. From the library’s header:

    LOAD off    0x0000000000000000 vaddr 0x0000000000000000 paddr 0x0000000000000000 align 2**21
         filesz 0x00000000003c6614 memsz 0x00000000003c6614 flags r-x
    LOAD off    0x00000000003c7000 vaddr 0x00000000005c7000 paddr 0x00000000005c7000 align 2**21
         filesz 0x0000000000009ca8 memsz 0x000000000000bf28 flags rw-

Note that the data segment is at offset 0x3c7000 in the file, but is to be loaded at offset 0x5c7000. There’s the 2MB difference. The mprotect() fills the gap; the relevant fragment of the runtime linker is:

if (has_holes)
  /* Change protection on the excess portion to disallow all access;
     the portions we do not remap later will be inaccessible as if
     unallocated.  Then jump into the normal segment-mapping loop to
     handle the portion of the segment past the end of the file
     mapping.  */
  __mprotect ((caddr_t) (l->l_addr + c->mapend),
              loadcmds[nloadcmds - 1].mapstart - c->mapend,
              PROT_NONE);

That doesn’t explain two things:

why the library requests this large gap
why the gap needs to be explicitly marked as PROT_NONE rather than simply left unmapped

The first question is discussed below. As for the second, all I can think of is that the idea is to avoid anything being mapped in between the library’s code and data segments, and although that would be rather weird I can’t see how it would do any harm.
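
As a check on the arithmetic in this section, the following Python sketch recomputes the three system call arguments seen in the trace from the two LOAD headers. It is my reconstruction of the sums, not the runtime linker's actual code, and it assumes a 4KB runtime page size; the names page_up, page_down and the dictionary layout are made up for the example.

```python
PAGE = 0x1000  # assumption: 4KB runtime page size


def page_down(x):
    return x & ~(PAGE - 1)


def page_up(x):
    return (x + PAGE - 1) & ~(PAGE - 1)


# Values copied from the two LOAD program headers quoted above.
text = dict(off=0x0, vaddr=0x0, filesz=0x3c6614, memsz=0x3c6614)
data = dict(off=0x3c7000, vaddr=0x5c7000, filesz=0x9ca8, memsz=0xbf28)

# Length of the initial mmap(): start of first segment to end of last.
total_len = data["vaddr"] + data["memsz"] - page_down(text["vaddr"])

# The hole between the page-rounded end of text and the start of data.
text_mapend = page_up(text["vaddr"] + text["filesz"])
data_mapstart = page_down(data["vaddr"])
gap = data_mapstart - text_mapend

# Length of the second mmap(), which maps the file-backed part of data.
data_len = page_up(data["vaddr"] + data["filesz"]) - data_mapstart

print(total_len, gap, data_len)
```

These come out as 6106920, 2097152 and 40960, matching the lengths passed to mmap(), mprotect() and the second mmap() in the trace. The two addresses in the trace are likewise the load base 0x7f0989b92000 plus 0x3c7000 and 0x5c7000.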

The Linker

I created a trivial shared library. I actually started with a small C source file but stripped it down to a few lines of assembler to eliminate any irrelevant details.

$ cat t.s
.globl object
        .data
        .align 4
object:
        .long   1
$ as -o t.o t.s
$ ld --verbose -M -shared -o t.so t.o

This still had the mysterious 2MB gap. That shows that it’s a property of the linker, not something that shared library authors are doing.

    LOAD off    0x00000000000001c8 vaddr 0x00000000002001c8 paddr 0x00000000002001c8 align 2**21
         filesz 0x00000000000000cc memsz 0x00000000000000cc flags rw-

The link map generated by the -M option is unreasonably long but the evidence of the gap can be found:

.gcc_except_table
 *(.gcc_except_table .gcc_except_table.*)
                0x00000000000001c8                . = (ALIGN (0x200000) - ((0x200000 - .) & 0x1fffff))
                0x00000000002001c8                . = (0x200000 DATA_SEGMENT_ALIGN 0x1000)

The --verbose output gives a further clue:

  /* Adjust the address for the data segment.  We want to adjust up to
     the same address within the page on the next page up.  */
  . = ALIGN (CONSTANT (MAXPAGESIZE)) - ((CONSTANT (MAXPAGESIZE) - .) & (CONSTANT (MAXPAGESIZE) - 1));
  . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));

...and that’s enough to find the ultimate source file, binutils/ld/scripttempl/elf.sc (and a bunch of others, but this one has the most plausible name!)

  /* Adjust the address for the data segment.  We want to adjust up to
     the same address within the page on the next page up.  */
  ${CREATE_SHLIB-${CREATE_PIE-${RELOCATING+. = ${DATA_ADDR-${DATA_SEGMENT_ALIGN}};}}}
  ${CREATE_SHLIB+${RELOCATING+. = ${SHLIB_DATA_ADDR-${DATA_SEGMENT_ALIGN}};}}
  ${CREATE_PIE+${RELOCATING+. = ${SHLIB_DATA_ADDR-${DATA_SEGMENT_ALIGN}};}}

Firstly, we can say the linker has done this since at least May 1999, which is the earliest revision of that file in the binutils CVS. So digging through revision history is not likely to be especially enlightening. The changelog goes back a bit further however, with the following entry between release 2.6 and 2.7:

Thu Feb 15 13:58:06 1996  Ian Lance Taylor  <ian@cygnus.com>
[...]
        * scripttempl/elf.sc: Don't skip a page in virtual memory space if
        the text segment ends exactly on a page boundary.

The patch from 2.6 to 2.7 still exists and in the old version some more detail can be found:

@@ -86,22 +87,8 @@
   ${RELOCATING+${OTHER_READONLY_SECTIONS}}
 
   /* Adjust the address for the data segment.  We want to adjust up to
-     the same address within the page on the next page up.  It would
-     be more correct to do this:
-       ${RELOCATING+. = ${DATA_ADDR-ALIGN(${MAXPAGESIZE})
-               + ((ALIGN(8) + ${MAXPAGESIZE} - ALIGN(${MAXPAGESIZE}))
-                  & (${MAXPAGESIZE} - 1)};}
-     The current expression does not correctly handle the case of a
-     text segment ending precisely at the end of a page; it causes the
-     data segment to skip a page.  The above expression does not have
-     this problem, but it will currently (2/95) cause BFD to allocate
-     a single segment, combining both text and data, for this case.
-     This will prevent the text segment from being shared among
-     multiple executions of the program; I think that is more
-     important than losing a page of the virtual address space (note
-     that no actual memory is lost; the page which is skipped can not
-     be referenced).  */
-  ${RELOCATING+. = ${DATA_ADDR- ALIGN(8) + ${MAXPAGESIZE}};}
+     the same address within the page on the next page up.  */
+  ${RELOCATING+. = ${DATA_ADDR-ALIGN(${MAXPAGESIZE}) + (ALIGN(8) & (${MAXPAGESIZE} - 1))};}
 
   .data  ${RELOCATING-0} :
   {

That’s a useful clue. The text segment cannot be modified; the data segment can. So to keep the text segment efficiently sharable between processes (these are supposed to be shared libraries after all!), it’s necessary to ensure that the data segment begins in a separate page from the end of the text segment.

The old version in the diff above rounds the address up to a multiple of 8 and then adds the maximum page size.
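
To see what the two expressions in the diff do, here is a small Python sketch that evaluates both for a given location counter. The function names and the 0x1000 page size are assumptions chosen for illustration; only the two formulas come from the diff.

```python
def align_up(value, alignment):
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def old_data_start(dot, maxpagesize):
    # Old: ALIGN(8) + MAXPAGESIZE
    return align_up(dot, 8) + maxpagesize


def new_data_start(dot, maxpagesize):
    # New: ALIGN(MAXPAGESIZE) + (ALIGN(8) & (MAXPAGESIZE - 1))
    return align_up(dot, maxpagesize) + (align_up(dot, 8) & (maxpagesize - 1))


PAGE = 0x1000  # illustrative page size, not taken from the source

for dot in (0x2000, 0x2345):
    print(f"{dot:#x}: old {old_data_start(dot, PAGE):#x}, "
          f"new {new_data_start(dot, PAGE):#x}")
```

With text ending at 0x2345 both versions put the data at 0x3348, the same offset in the next page. With text ending exactly on the page boundary 0x2000, the old version skips to 0x3000 while the new one stays at 0x2000, which is what the changelog entry describes.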

The new version is hard to follow but as far as I can tell the change log entry is still accurate: if the text segment ends on a page boundary then the data segment follows contiguously, otherwise exactly one page is skipped. That sounds testable, and indeed:

$ cat t.s
        .text
        .fill   0x1ffe38,1,1
.globl object
        .data
        .align 4
object:
        .long 1
$ ld -M -shared -o t.so t.o
[...]
.gcc_except_table
 *(.gcc_except_table .gcc_except_table.*)
                0x0000000000200000                . = (ALIGN (0x200000) - ((0x200000 - .) & 0x1fffff))
                0x0000000000200000                . = (0x200000 DATA_SEGMENT_ALIGN 0x1000)
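
Both link map results can be reproduced by evaluating the two script statements by hand, as in this Python sketch. Two assumptions are mine: that the location counter stood at 0x1c8 and 0x200000 respectively going into the first statement, and that DATA_SEGMENT_ALIGN behaves like the first of the two forms given in the ld manual, ALIGN(maxpagesize) + (. & (maxpagesize - 1)). The alternative form involving the common page size is not modelled.

```python
MAXPAGESIZE = 0x200000


def align_up(value, alignment):
    """Round value up to a multiple of alignment (a power of two)."""
    return (value + alignment - 1) & ~(alignment - 1)


def adjust(dot):
    # . = ALIGN(MAXPAGESIZE) - ((MAXPAGESIZE - .) & (MAXPAGESIZE - 1))
    return align_up(dot, MAXPAGESIZE) - ((MAXPAGESIZE - dot) & (MAXPAGESIZE - 1))


def data_segment_align(dot):
    # Simplified model: first documented form only (an assumption).
    return align_up(dot, MAXPAGESIZE) + (dot & (MAXPAGESIZE - 1))


for start in (0x1c8, 0x200000):
    after_adjust = adjust(start)
    data_start = data_segment_align(after_adjust)
    print(f"{start:#x} -> {after_adjust:#x} -> {data_start:#x}")
```

Starting from 0x1c8 this gives 0x1c8 and then 0x2001c8, as in the first link map; starting from 0x200000 both steps leave the address at 0x200000, as in the second.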

So much for the implementation details. That doesn’t really explain why the desired address is the same offset in the next page, rather than simply the start of the next page. It may be as simple as that having originally been the easy thing to do, and the logic stuck.

BFD

The value 0x200000 for the maximum page size comes from bfd/elf64-x86-64.c. Contrast this with the value of 0x1000 in elf32-i386.c, which explains why the effect isn’t as noticeable on 32-bit systems: the same gaps are still introduced in every shared library, but they’re at least a factor of 512 smaller.
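
For a sense of scale, here is the back-of-envelope arithmetic as a Python sketch. It assumes, as a worst case, that each of gnome-panel's 82 libraries carries one gap of a full maximum page; the variable names are mine.

```python
MAXPAGE_AMD64 = 0x200000  # from bfd/elf64-x86-64.c, per the text
MAXPAGE_I386 = 0x1000     # from elf32-i386.c, per the text
LIBRARIES = 82            # gnome-panel's library count, from the text

ratio = MAXPAGE_AMD64 // MAXPAGE_I386
worst_amd64 = LIBRARIES * MAXPAGE_AMD64  # assumes one full gap per library
worst_i386 = LIBRARIES * MAXPAGE_I386

print(ratio)
print(worst_amd64 // (1024 * 1024), "MB")
print(worst_i386 // 1024, "KB")
```

That is up to 164MB of gaps on amd64 against 328KB on i386, a ratio of 512.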

It looks like the reason for the huge maximum page size on amd64 is improved performance (presumably only if you actually configure your system to use those huge pages).

Platform And Versions

All of the above was done on an amd64 Debian lenny system. ld is from binutils 2.18.1~cvs20080103-7 and the runtime linker from libc6 2.7-18.

Summary

Applications with lots of shared libraries on 64-bit Linux are reported as using 2MB more per shared library than they actually occupy. This extra doesn’t cost you any RAM or swap space, just address space within each process, which is in plentiful supply on 64-bit platforms. The underlying reason is to do with keeping libraries efficiently sharable, but the implementation is a little odd.
