%include "./lin-club.mgp"
%default 0 bgrad 0 0 1 0 1 "white"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
%nodefault
%bgrad 0 0 1 0 1 "white"
%center, size 7, font "typewriter", fore "blue", back "white", vgap 20
%center
Haifa Linux Club
%image "/usr/src/linux/Documentation/logo.gif" 0 60 60 1
System Call Tracker
%size 6
Design, Implementation, Goals
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Topics
General Overview - What Is syscalltrack?
Architecture Overview
Development Process Overview
syscalltrack's Kernel Modules
The Hijacker Module
The Filtering Module
Communication With User-Space
Code Auto-Generation
Problems In Kernel Space
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Topics (Cont.)
syscalltrack's Configuration Utility - sct_config
The Configuration File
The Tree Parser
The Filter-Expression Parser
Handling Errors
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Topics (Cont.)
Interesting Bugs and Technical Issues
The Problems With Hijacking System Calls
The gcc Signal 11 "attribute" Problem
Handling Structure Parameters
The egcs "String On Stack" Crash
System Call Multiplexing
The Future
The Authors
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
General Overview - What Is syscalltrack?
Kernel modules and user utilities to allow filtering, logging and \
altering the invocation of system calls.
Currently supports filtering and logging.
Modules work with kernels 2.2.19 and 2.4.x.
Configured using a configuration file and a tool that injects rules \
from this file into the kernel module.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Architecture Overview
Two modules - one hijacks system calls, the other performs the actual \
filtering and communicates with user-mode code.
A communications library (sct_ctrl_lib) allows user processes to \
configure the module, using the 'sysctl' interface.
A user-mode utility parses the configuration file, validates the rules, \
deletes all existing rules in the module, and then injects the new ones.
Modules cannot be unloaded as long as rules are defined in them.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Development Process Overview
All source is kept on a CVS server (on sourceforge).
Code development is usually done using user-mode-linux \
(http://user-mode-linux.sf.net/).
Committed patches are sent to a mailing list, and reviewed by other \
members. Bugs are often quickly found and fixed this way.
When we think of things we want to have, we write them down (in a file \
or as a 'task' on sourceforge), or else we'd forget them. Things then get done \
based on both priority and our hearts' desires.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
syscalltrack's Kernel Modules
syscalltrack contains 2 kernel modules - 'syscall_hijack.o' and \
'sct_rules.o'.
The first handles system call 'hijacking' (that is, replacing a system \
call with another function). The second does the tracking operation itself.
The split into two modules is done to avoid race conditions inherent \
to the use of modules in tracked code. More about this later.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Hijacker Module
This module exports syscall hijacking functions - \
hijack_syscall_before, hijack_syscall_after, release_syscall_before and \
release_syscall_after.
Pointers to hijacking functions are inserted into the kernel's system \
call table, 'sys_call_table', in place of the pointers to the kernel's \
original system call functions.
In order to avoid kernel crashes, an invocation reference count is \
kept for each system call, and the syscall_hijack.o module cannot be unloaded \
while one of the system calls (and thus its functions) is active.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Filtering Module
This module (sct_rules.o) accepts user-control messages via sysctl, to \
add rules to a given syscall, delete a rule, print the rules into the system's \
log file, etc.
When a system call is invoked, a stub function for this syscall is \
executed. This function matches the call against all rules for this syscall, \
and if a match is found, an action is performed.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Filtering Module (Cont.)
For each system call, 2 sets of rules are kept - 'before' rules, \
and 'after' rules.
'before' rules are checked right before invoking the system call. \
Thus, they could be used to disallow the syscall from being executed, \
or even alter parameters sent to the system call.
'after' rules are checked right after the syscall returns, and before \
returning to the user. They allow checking and logging the syscall's return \
value, altering this return value, and so on.
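The before/after flow can be sketched in plain user-space C. This is only an illustration, not the module's code: rule_t, stub_call, log_count and the helper functions are hypothetical names, each "syscall" takes a single long parameter, and a counter stands in for writing to the system log.

```c
#include <stddef.h>

/* Hypothetical rule: a predicate plus a flag saying whether a match
 * should stop the call ('before' rules only). */
typedef struct {
    int (*matches)(long value);   /* non-zero when the rule matches */
    int deny;                     /* refuse to run the syscall on a match */
} rule_t;

typedef long (*syscall_fn)(long param);

int log_count;                    /* stands in for the system log */

/* Toy helpers used to exercise the stub. */
long fake_syscall(long param) { return param * 2; }
int is_forty_two(long value)  { return value == 42; }
int is_negative(long value)   { return value < 0; }

/* Stub wrapping one system call: check 'before' rules against the
 * parameter, invoke the original, then check 'after' rules against
 * the return value on the way back to the caller. */
long stub_call(syscall_fn original,
               const rule_t *before, size_t n_before,
               const rule_t *after, size_t n_after,
               long param)
{
    for (size_t i = 0; i < n_before; i++) {
        if (before[i].matches(param)) {
            log_count++;
            if (before[i].deny)
                return -1;        /* the original is never invoked */
        }
    }

    long ret = original(param);

    for (size_t i = 0; i < n_after; i++) {
        if (after[i].matches(ret))
            log_count++;
    }
    return ret;
}
```

With a 'before' rule { is_forty_two, 1 } and an 'after' rule { is_negative, 0 }, a call with parameter 42 is refused before fake_syscall runs, while a call with parameter -3 goes through and is logged on return.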
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Communication With User-Space
Communication with user-space is currently done by registering a \
"directory" of commands for 'sysctl' system calls. Each supported command \
has its own 'constant', and a single function then accepts these commands \
and handles them.
Because sysctl provides a limited interface, it's hard to notify \
the user-space application about reasons for failures. Replacing the 'sysctl' \
interface with a controlling device file will solve this problem.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Code Auto-Generation
Many functions in both kernel modules look very similar, and vary \
mostly by the name/id of the system call they handle, and the parameters this \
system call receives.
Instead of writing a lot of similar functions using copy/paste, a perl \
script generates the code for these functions. It does that using a combination \
of template files with macros, data-type mappings and hard-coded constructs.
This approach to code generation is not new, and could be spun into \
its own project - writing a general automatic code generator.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Problems In Kernel Space - Module Unload Race
When one of a module's functions is executing, or is on the execution \
stack of a process/interrupt handler, unloading the module could crash the \
system - its code page is still in use, and yet might be re-allocated by the \
kernel for other purposes.
Thus, a module writer must make sure that no active invocations of its \
functions exist when the module is unloaded.
The only way to prevent a module from being unloaded is to increase \
its use count (using the 'MOD_INC_USE_COUNT' macro), and decrease it to zero \
only when there is no way left to access any of its commands.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Problems In Kernel Space - SMP And Re-Entrancy
A process executing a system call might go to sleep, allowing another \
process to execute this system call. In SMP systems, even if a process does \
not go to sleep, the same code might be executed in parallel by another process.
To avoid races (and data structure corruption) inherent to such \
situations, data structures must be carefully protected, using semaphores \
(around sections that might sleep), spin-locks (around non-sleeping sections, \
to handle SMP machines) etc.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Problems In Kernel Space - SMP And Re-Entrancy (Cont.)
However, over-use of locks would slow the kernel: either by forcing \
serial execution in cases where an SMP machine could actually work in parallel, \
or by adding needless overhead (every lock and unlock has a cost).
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
syscalltrack's Configuration Utility - sct_config
'sct_config' is a C++ program that allows configuring the module. It \
is made of a parser (to read and understand a config file) and a \
command generator (to generate data for commands for the kernel module).
Its parser uses a combination of top-down and \
recursive-descent parsing.
'sct_config' may also be used to perform other control operations - deleting \
all active rules, or printing them to the system's log file.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Configuration File
The configuration file contains a list of rules. A rule might look like this:
%size 3, fore "black", back "white", vgap 30, prefix " ", font "typewriter"
rule {
  syscall_name=unlink
  rule_name=second_unlink_rule
  when:before
  filter_expression {PARAMS[1] in ("passwd", "/etc/passwd")}
  action:LOG
}
%size 5, font "standard", fore "blue"
In addition, the logging format may be specified, like this:
%size 2.5, fore "black", back "white", vgap 30, prefix " ", font "typewriter"
log_format {
  before {syscall: %pid[%comm]: %sid_%sname(%params) (r %ruleid)}
  after {syscall: %pid[%comm]: %sid_%sname(%params) = %retval (r %ruleid)}
}
%size 5, font "standard"
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Tree Parser
'sct_config' reads the configuration file into memory and passes it to a parser.
The parser parses the configuration file and for each keyword \
it recognizes, such as 'rule', 'syscall_name' or 'filter_expression', \
it calls a parser for that keyword's value.
The value can be a simple token, such as 'before', or a \
complex token such as the entire rule's contents or a filter \
expression, like 'PID != 1 && PARAMS[1] in ("boo", "bee", "bah")'.
The parser should one day be written using compiler \
construction tools, such as lex && yacc, but for now some of us \
derive an unholy pleasure from writing it by hand.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Filter Parser
The filter expression parser knows how to take an expression \
of the form 'PID != ~777 && PARAMS[1] + 3 > ("foo" in ("boo", "bee", \
"bah"))' and build a filter tree suitable for passing to the \
kernel, while maintaining operator precedence.
The basic algorithm is taken from the dragon book and was \
extended for our grammar over several sleepless nights. It uses one \
function for each precedence level, where each function looks like \
this:
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Filter Parser (Cont.)
%size 3, fore "black", back "white", vgap 30, prefix " ", font "typewriter"
filter_node_t* filter_parser::expr_bitwise_and(bool get)
{
    filter_node_t* left = equality(get);
    filter_node_t* right = NULL;
    for (;;) {
        switch(curr_tok.t) {
        case (BITWISE_AND):
            right = equality(true);
            left = make_bit_and(left, right);
            if (!left) {
                strstream err;
                err << "syntax error at expression ending in "
                    << curr_tok.orig_text << ends;
                throw parse_exception(_p.get_file_name(),
                                      get_line_num(), err.str());
            }
            break;
        default:
            return left;
        }
    }
}
%size 5, font "standard"
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Error Handling
Since 'sct_config' is our user interface, it is imperative \
that it reports errors to the user clearly and concisely and does not \
drown them in useless debugging data.
Each function that encounters an error throws an exception \
with as much context information as it has: which line of the file \
the error was on, which token caused the error, etc.
Once thrown, an exception travels upwards in the call chain \
until a suitable exception handler is found.
An exception handler \
could add information only it has to the exception and re-throw it, or \
it could report it to the user.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Interesting Bugs and Technical Issues
Development of kernel modules (and even user-mode code) tends to \
expose various system bugs and "design obstacles". syscalltrack, being a \
non-standard project, seems to reveal quite a few of those.
We'll illustrate a few of the more interesting/annoying ones here, to \
give an impression.
There were probably quite a few others, which our precious minds \
managed to suppress.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Problems With Hijacking System Calls
There is no mechanism in Linux's kernel to hijack system calls, for \
political reasons. Thus, hijacking them is done using the same methods that \
one used for hijacking interrupts under MS-DOS.
This method means you locate the system call table - a table of \
pointers to all syscalls, indexed by syscall ID - copy the original pointer into your own \
table, and replace the original pointer with a pointer to your function.
Luckily, the system call table (sys_call_table) is exported for \
modules to use, so there was no need to search the kernel's memory \
directly.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Problems With Hijacking System Calls (Cont.)
Of course, you eventually should invoke the original system call, with \
the original parameters, or else the system breaks in most peculiar manners. \
It did...
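The save-and-replace technique can be tried safely in user space. The sketch below uses a toy array in place of the kernel's sys_call_table; call_table, saved, hijack, release and tracked_call_1 are names invented for this illustration and are not syscalltrack's functions.

```c
#include <stddef.h>

#define NR_CALLS 4                      /* toy table size, not the kernel's */

typedef long (*syscall_fn)(long arg);

/* Stand-in for an original system call. */
long real_call(long arg) { return arg + 1; }

/* Toy stand-in for the kernel's table of syscall pointers. */
syscall_fn call_table[NR_CALLS] = { real_call, real_call, real_call, real_call };

/* Our own table of saved original pointers, indexed by syscall ID. */
syscall_fn saved[NR_CALLS];

int hits;                               /* how often the replacement ran */

/* Replacement for entry 1: do our tracking, then invoke the original
 * with the original parameter. */
long tracked_call_1(long arg)
{
    hits++;
    return saved[1](arg);
}

/* Save the original pointer and install the replacement. */
int hijack(int id, syscall_fn replacement)
{
    if (id < 0 || id >= NR_CALLS || saved[id] != NULL)
        return -1;                      /* bad ID or already hijacked */
    saved[id] = call_table[id];
    call_table[id] = replacement;
    return 0;
}

/* Put the original pointer back. */
int release(int id)
{
    if (id < 0 || id >= NR_CALLS || saved[id] == NULL)
        return -1;                      /* bad ID or not hijacked */
    call_table[id] = saved[id];
    saved[id] = NULL;
    return 0;
}
```

After hijack(1, tracked_call_1), callers going through call_table[1] still get the original result, because the replacement forwards to the saved pointer. The sketch leaves out the locking and reference counting that a real module needs.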
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The gcc Signal 11 "attribute" Problem
Certain versions of gcc kept dying with an internal error \
when compiling the module code. So how do you debug such a problem?
After banging your head repeatedly against the wall, you try \
to locate the problem, using the binary search method. You try to \
compile smaller and smaller chunks of code, until you hit one(!) line \
of code which causes the compiler to barf.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The gcc Signal 11 "attribute" Problem (Cont.)
%size 3, fore "black", back "white", vgap 30, prefix " ", font "typewriter"
typedef asmlinkage int (*stub_func_5)(const char* pathname,
                                      int flags, mode_t mode);
%size 5, font "standard", fore "blue"
It turns out that 'asmlinkage', which is a kernel macro for \
one of gcc's "attribute" keywords, is only valid for function \
declarations, and not function pointer declarations.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Handling Structure Parameters
Some syscalls accept pointers to structures as parameters. Since they \
are complex parameters, we needed a method to filter based on a specific field \
in a struct.
The simplest way to do this was translating a struct into a vector, \
with each struct field as an item in that vector. 'sct_config' translates \
struct type + field name into an index.
The kernel module translates the struct into a vector, and uses that \
vector for later matching operations.
This approach makes type-casting impossible, and thus will change \
soon, into using 'class vectors'.
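A small user-space sketch of the struct-to-vector idea follows. The struct demo_timeval and all function names are hypothetical and chosen only for illustration; the field-name lookup plays the role of 'sct_config', and the flattening function plays the role of the kernel module.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical struct parameter, loosely modelled on 'struct timeval'. */
struct demo_timeval {
    long sec;
    long usec;
};

#define DEMO_TIMEVAL_FIELDS 2

/* Config-tool side: translate a field name into a vector index.
 * Returns -1 for an unknown field name. */
int demo_timeval_field_index(const char *field)
{
    static const char *const names[DEMO_TIMEVAL_FIELDS] = { "sec", "usec" };

    for (int i = 0; i < DEMO_TIMEVAL_FIELDS; i++) {
        if (strcmp(names[i], field) == 0)
            return i;
    }
    return -1;
}

/* Module side: flatten the struct into a vector, one item per field. */
void demo_timeval_to_vector(const struct demo_timeval *tv,
                            long out[DEMO_TIMEVAL_FIELDS])
{
    out[0] = tv->sec;
    out[1] = tv->usec;
}

/* A rule of the form "field == value", with the field name already
 * resolved to an index by the config tool. */
int match_field_equals(const long *vec, int index, long value)
{
    return vec[index] == value;
}
```

Every item in the vector has the same type (long here), which also shows the limitation noted above: once a field is flattened, its original type is gone.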
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The egcs "String On Stack" Crash
g++ used to lag behind, which caused 'egcs' to exist. They were merged \
back together in gcc 2.95. Older distributions (e.g. Red Hat 6.2) come with \
an older version of egcs, which has quite obvious bugs.
One such bug was related to templates (std::string and similar) which were \
allocated on the stack. Finding the bug required blindly playing with string \
allocations (using the 'lion in the desert' method) until the culprit was \
found.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
System Call Multiplexing
Some "system calls" are actually multiplexers for several functions. \
For example, 'socketcall' is a multiplexer for all socket-related functions: \
socket, accept, connect, listen, shutdown, recv, send...
Handling those required special care, since we wanted to expose \
the muxed functions (not the muxing syscall) to users.
Our solution was to encode the function ID inside the syscall ID, \
and have the kernel module break the number down, and apply rules to \
sub-functions, rather than to the syscall itself.
For that to work, we needed to make our own copy of the kernel's mux \
syscall, which calls our functions instead of the original system call.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
Implementing Communications Through a Device File
sct_config could potentially communicate with a kernel module \
through one of several mechanisms: /proc, sysctl, and through a \
device file. Only sysctl is currently implemented.
A device file is considered the best design, since it's the \
most flexible and most "unixish" - everything is a file.
syscalltrack \
will have two device files: '/dev/sct_ctrl', \
which will be used to control the module (inject rules, read the \
current rules, etc) and '/dev/sct_log', which will be read-only and \
used to read the filter matches.
Work on the device file is well underway - we expect to have \
it ready Real Soon Now(tm).
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Future...
Syscalltrack could go in quite a few directions in the future, \
as our hearts desire.
For instance, we intend to add support for altering the contents of \
parameters and return values, before/after invoking a syscall. This has \
quite a few uses, e.g. "fixing" programs for which we don't have the source, \
or injecting faults into programs and seeing how they cope with them, etc.
We could write an API that allows other modules to register rules that \
invoke callbacks in their functions - though this would definitely cause \
political wars in the kernel - unless we explicitly state this interface is \
GPLed - so we will.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Future (Cont.)
We might even externalise system call activation into user-space, to \
allow for easier development of various types of features, and debugging them \
in user-space.
mulix is playing with the idea (well, not his - it's "stolen") of writing \
a module that sits on top of syscalltrack, learns patterns of use of the \
system, and later alerts if these patterns change.
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%page
The Authors, in Alphabetical Last-Name Order:
Shlomi Fish
guy keren
mulix
Amir Shalem
Eli Shemer
___________ (this could be YOU!)
%prefix " "
%image "/usr/src/linux/Documentation/logo.gif" 0 20 20 1