Fixmapping VM pages, Vsyscalls
Published on: 2005-8-15
Faster system calls
The classical Linux system call mechanism is to put the call number in the eax register (on i386) and simply invoke:

    int $0x80

Here is a code fragment which invokes `getpid' 10000000 times, so that the time taken can be measured:

    #include <stdio.h>
    #include <unistd.h>

    #define N 10000000

    int pid;

    int main(void)
    {
        int i;

        for (i = 0; i < N; i++) {
            /* 20 is the i386 system call number of getpid */
            asm("movl $20, %%eax \n"
                "int $0x80 \n"
                "movl %%eax, pid \n"
                : : : "eax", "memory");  /* "memory": the asm writes pid */
        }
        printf("got pid = %d, actual pid = %d\n", pid, getpid());
        return 0;
    }

On my P4 (HT) 2.8GHz system, the program took about 3.9 seconds to execute. Modern Pentium/AMD processors provide dedicated instructions (sysenter on Intel, syscall on AMD) which get into kernel mode faster than a software interrupt does. The problem is that checking which mechanism (i.e., int80/syscall/sysenter) the processor supports as part of every system call invocation would itself incur an unnecessary overhead. What is the solution?
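To make that overhead concrete, here is a rough sketch of the kind of check being talked about: asking the CPU, through the CPUID instruction, whether it advertises sysenter. The names cpu_has_sep and SEP_BIT are made up for this illustration, and the sketch assumes GCC or Clang (for <cpuid.h>); on a non-x86 machine it just reports "unknown". It only reads the feature flag and ignores any model-specific quirks a real kernel would have to consider.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper for the CPUID instruction */
#endif

/* CPUID leaf 1, EDX bit 11: the SEP flag (sysenter/sysexit present). */
#define SEP_BIT (1u << 11)

/* Hypothetical helper: 1 if the CPU advertises sysenter, 0 if it does
 * not, -1 if that cannot be determined here. */
int cpu_has_sep(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;               /* CPUID leaf 1 not available */
    return (edx & SEP_BIT) ? 1 : 0;
#else
    return -1;                   /* not an x86 CPU */
#endif
}
```

Running something like this before each of ten million system calls would clearly be wasteful, which is why the decision is better made once.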
Fixmapping
It's possible to assign hard-coded virtual addresses to physical addresses during system bootup. Note that only the virtual address is hard coded; the physical address is determined dynamically.

The solution which Linus has implemented is: during bootup, get a free page and map it to virtual address 0xffffe000. Determine what kind of syscall mechanism your CPU supports and simply store a few bytes of machine code at that location; machine code which will trap into the kernel using the fastest available mechanism. Now, the user program can execute a system call by simply jumping to this particular virtual address!

Here is a small C program which reads from 0xffffe000 and dumps it to stdout; the output can be redirected to a file and analyzed.

    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char *s, buf[4096];

        s = (char *)0xffffe000;          /* the fixmapped page */
        memcpy(buf, s, sizeof(buf));
        write(1, buf, sizeof(buf));
        return 0;
    }

We run the program:

    ./a.out > dat

and do a `file dat'. Here is the output:

    dat: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), stripped

The fixmapped page contains an ELF shared object. Let's do:

    objdump -d dat

Here is part of the output we get:

    Disassembly of section .text:

    ffffe400 <.text>:
    ffffe400:   51          push   %ecx
    ffffe401:   52          push   %edx
    ffffe402:   55          push   %ebp
    ffffe403:   89 e5       mov    %esp,%ebp
    ffffe405:   0f 34       sysenter
    ffffe407:   90          nop

Note that 0xffffe400 is the start of the code sequence which ultimately traps into the kernel by executing `sysenter'.
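The address 0xffffe000 is what this particular kernel uses. A program does not have to hard-code it: the kernel also hands the location of this ELF image to every process in the auxiliary vector, under the key AT_SYSINFO_EHDR. The sketch below assumes a glibc recent enough to provide getauxval() (2.16 or later); the two helper names are invented for illustration.

```c
#include <elf.h>        /* AT_SYSINFO_EHDR, ELFMAG, SELFMAG */
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval() */

/* Hypothetical helper: base address of the kernel-provided ELF image,
 * or NULL if the kernel did not advertise one. */
const void *kernel_elf_page(void)
{
    return (const void *)getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: non-zero if p points at an ELF header,
 * which is what `file dat' detected in the dump. */
int looks_like_elf(const void *p)
{
    return p != NULL && memcmp(p, ELFMAG, SELFMAG) == 0;
}
```

If kernel_elf_page() returns a non-NULL pointer, dumping a page from there instead of from the fixed address should give the same kind of ELF object.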
Is there a speed up?
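Let's find out. Here is a test program:

    #include <stdio.h>
    #include <unistd.h>

    #define N 10000000

    int pid;

    int main(void)
    {
        int i;

        for (i = 0; i < N; i++) {
            /* same getpid call, but entered through the fixmapped page */
            asm("movl $20, %%eax \n"
                "call 0xffffe400 \n"
                "movl %%eax, pid \n"
                : : : "eax", "memory");  /* "memory": the asm writes pid */
        }
        printf("got pid = %d, actual pid = %d\n", pid, getpid());
        return 0;
    }

I am getting a run time of 1.4 seconds (down from 3.9 for the int80 version)!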
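One way to take such measurements from inside the program, rather than timing the whole process, is sketched below. This is not the code behind the numbers above: it goes through the C library's syscall() wrapper, so which entry mechanism actually gets used is up to the library, and time_getpid_calls is a made-up helper name.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* for syscall() */
#endif
#include <sys/syscall.h>     /* SYS_getpid */
#include <time.h>            /* clock_gettime() */
#include <unistd.h>          /* syscall() */

/* Hypothetical helper: issue n getpid system calls and return the
 * elapsed wall-clock time in seconds. */
double time_getpid_calls(long n)
{
    struct timespec t0, t1;
    volatile long pid = 0;   /* volatile so the loop is not optimised away */
    long i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < n; i++)
        pid = syscall(SYS_getpid);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)pid;

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_getpid_calls(10000000) and printing the result gives a per-run figure; dividing by the count gives a rough per-call cost.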
Fixmap your own Hello,World
Problem: Write a Hello,World printing program which doesn't have the sequence "Hello,World" stored in it.

Solution: Let's fixmap a page containing "Hello,World".

- Edit include/asm/fixmap.h; just add FIX_HELLO_WORLD below FIX_VSYSCALL and change the macro FIXADDR_USER_END to make it look like (FIXADDR_USER_START + 2*PAGE_SIZE)
- Edit arch/i386/kernel/sysenter.c. First, add an initialization:

    unsigned long page2 = get_zeroed_page(GFP_ATOMIC);

  Then add the code:

    __set_fixmap(FIX_HELLO_WORLD, __pa(page2), PAGE_READONLY_EXEC);
    memcpy((void*)page2, "Hello,World", 12);  /* 11 characters + NUL */

That's all. Recompile the kernel (I am using a kernel.org 2.6.12) and write a user program which copies from virtual address 0xffffd000; you should get your Hello,World.
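Why 0xffffd000? Fixmap slots are laid out downwards from the top of the address space, one page per enum entry, so the entry right below FIX_VSYSCALL lands one page below 0xffffe000. The sketch below redoes that arithmetic and adds a small helper of the kind the user program needs. It assumes 4 KB pages and a fixmap top of 0xfffff000 (consistent with the addresses seen earlier, but check your own tree); every SKETCH_/sketch_ name and copy_page_string are invented here and are not kernel identifiers.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_FIXADDR_TOP 0xfffff000UL  /* assumed top of the fixmap area */
#define SKETCH_PAGE_SHIFT  12            /* assumed 4 KB pages */

/* Mirrors the order described in the text: the new entry sits
 * directly below FIX_VSYSCALL. */
enum sketch_fixed_addresses {
    SKETCH_FIX_HOLE,
    SKETCH_FIX_VSYSCALL,
    SKETCH_FIX_HELLO_WORLD
};

/* Each slot is one page lower than the previous one. */
unsigned long sketch_fix_to_virt(unsigned int idx)
{
    return SKETCH_FIXADDR_TOP - ((unsigned long)idx << SKETCH_PAGE_SHIFT);
}

/* Copy up to outsz-1 bytes from src, NUL-terminate, return the string length.
 * src must have at least outsz-1 readable bytes. */
size_t copy_page_string(const void *src, char *out, size_t outsz)
{
    if (outsz == 0)
        return 0;
    memcpy(out, src, outsz - 1);
    out[outsz - 1] = '\0';
    return strlen(out);
}
```

On the patched 32-bit kernel, the user program would call copy_page_string((const void *)sketch_fix_to_virt(SKETCH_FIX_HELLO_WORLD), buf, sizeof(buf)) and print buf. On an unpatched kernel that address is not mapped, so the read will fault.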