Act I - The Linux Kernel's VFS - exposition
-  Is the component in the kernel that handles file-systems, directory and
     file access.
-  It abstracts common tasks of many file-systems.
-  And presents the user with a unified interface, via the file-related
     system calls.
Act II - Relations Of The VFS With The Rest Of The System - the plot thickens...
-  The VFS interacts with file-systems
-  ... which interact with the buffer cache, page-cache and block devices.
-  The VFS also interacts with the user...
-  ... via system calls
-  Finally, the VFS supplies data structures such as the dcache, the inode
     cache and the open-file tables...
VFS And The System - Static Relations
The following figure shows the static relations of the VFS with the rest of
the system:
 
VFS And The System - Dynamic Relations
The following figure shows the dynamic relations of the VFS objects with the
rest of the system:
 
Act III - Internal Components Of The VFS - the naked souls...
The following components comprise the VFS:
-  dcache - cache of "dentry" objects, used to translate paths to inodes.
-  inode cache - cache of "inode" objects, used to represent files and
     directories on the file systems.
-  common code - code that many file-systems need, which was moved into
     functions that are now part of the VFS.
About Caches
-  Generally, the VFS attempts to keep caches of objects, because:
  -  Allocating objects (memory) takes time.
  -  Resolving objects takes time.
-  So for each object type the VFS holds:
  -  A list of used (i.e. may NOT be deleted) objects.
  -  A list of resolved but un-used (i.e. recently resolved but may be
       deleted) objects.
  -  A list of un-assigned and un-used (i.e. completely free) objects - by
       using SLAB caches.
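The three-list idea can be sketched in user-space C. This is only a toy model, not kernel code: all names (toy_obj, toy_get, toy_put) are hypothetical, a fixed array stands in for the SLAB cache, and a state tag stands in for membership in one of the three lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: the state tag stands in for list membership. */
enum toy_state { TOY_FREE, TOY_UNUSED, TOY_IN_USE };

struct toy_obj {
    int key;               /* what this object was resolved for */
    int refcount;          /* > 0 while somebody uses the object */
    enum toy_state state;
};

#define TOY_POOL_SIZE 4
static struct toy_obj toy_pool[TOY_POOL_SIZE];   /* zero-initialised: all TOY_FREE */

/* Return an object for 'key': reuse a resolved one if possible, otherwise
 * take a free object, otherwise recycle an unused one. NULL if all are in use. */
struct toy_obj *toy_get(int key)
{
    struct toy_obj *victim = NULL;

    for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
        struct toy_obj *o = &toy_pool[i];

        if (o->state != TOY_FREE && o->key == key) {
            o->refcount++;               /* cache hit: no allocation, no resolving */
            o->state = TOY_IN_USE;
            return o;
        }
        if (o->state == TOY_FREE && (!victim || victim->state != TOY_FREE))
            victim = o;                  /* prefer a completely free object */
        else if (o->state == TOY_UNUSED && !victim)
            victim = o;                  /* else fall back to an unused one */
    }
    if (!victim)
        return NULL;                     /* everything is in use */
    victim->key = key;
    victim->refcount = 1;
    victim->state = TOY_IN_USE;
    return victim;
}

/* Drop a reference; the object stays resolved but becomes reclaimable. */
void toy_put(struct toy_obj *o)
{
    assert(o != NULL && o->refcount > 0);
    if (--o->refcount == 0)
        o->state = TOY_UNUSED;
}
```

Note that toy_put does not free anything: the object keeps its key, so a later toy_get for the same key is a cheap hit.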
  
 
The Dcache
-  Contains a hash table of "dentry" objects, each representing a translation
     from a path to an inode (including "negative" dentries - representing
     recent lookups to non-existing files).
-  The dentries are also connected in a tree structure, representing the
     structures of files and directories on the mounted file systems.
-  An entry remains in the dcache until its file-system is un-mounted...
-  ... or until it is pruned during a cache shrink, which happens once
     every 300 seconds, by the swap-daemon (kswapd)...
-  ... or when the swap-daemon (kswapd) needs to free space.
-  See file fs/dcache.c, function prune_dcache, for the gory details.
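A minimal user-space sketch of the hash-table idea, including negative dentries. The names (toy_dentry, toy_d_add, toy_d_lookup) and the hash function are assumptions made for illustration; the kernel's own structures carry far more state (reference counts, LRU links, locking).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct toy_inode { unsigned long ino; };

struct toy_dentry {
    struct toy_dentry *parent;      /* tree link towards the root */
    const char *name;               /* one path component */
    struct toy_inode *inode;        /* NULL marks a negative dentry */
    struct toy_dentry *hash_next;   /* chain inside one hash bucket */
};

#define TOY_HASH_SIZE 16
static struct toy_dentry *toy_hash[TOY_HASH_SIZE];

/* Hash on (parent, name), so equal names in different directories differ. */
static unsigned toy_hashfn(const struct toy_dentry *parent, const char *name)
{
    uintptr_t h = (uintptr_t)parent >> 4;

    while (*name)
        h = h * 31 + (unsigned char)*name++;
    return (unsigned)(h % TOY_HASH_SIZE);
}

/* Insert a (positive or negative) dentry into the hash table. */
void toy_d_add(struct toy_dentry *d)
{
    unsigned b;

    assert(d != NULL && d->name != NULL);
    b = toy_hashfn(d->parent, d->name);
    d->hash_next = toy_hash[b];
    toy_hash[b] = d;
}

/* Return the cached dentry for 'name' under 'parent', or NULL if the name
 * has never been looked up. A hit whose inode is NULL means "known missing". */
struct toy_dentry *toy_d_lookup(const struct toy_dentry *parent, const char *name)
{
    struct toy_dentry *d;

    for (d = toy_hash[toy_hashfn(parent, name)]; d; d = d->hash_next)
        if (d->parent == parent && strcmp(d->name, name) == 0)
            return d;
    return NULL;
}
```

The caller thus sees three outcomes: a positive hit, a negative hit (no need to ask the file-system again), and a miss.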
The Inode Cache
-  Contains a hash table of "inode" objects, each representing a
     file/directory on a mounted file system.
-  Each inode is potentially linked with a dentry...
-  ... as well as with the file-system it belongs to.
-  Each inode contains a list of pages and (dirty) buffers belonging to the
     file/directory this inode represents.
The File Object
-  An object representing an open file instance.
-  Points to a dentry, which points to an inode that represents the
     actual file...
-  Contains information such as the "current position", the access mode with
     which the file was opened, the uid and gid under which it was opened, etc.
-  Contains a pointer to 'file operations', which is set by the underlying
     file-system (or device driver - in case we opened a device-special file).
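The chain file -> dentry -> inode, plus the 'file operations' pointer, can be sketched as follows. This is a simplified user-space model with hypothetical names (toy_file, toy_mem_fops, toy_file_init); the inode here just holds an in-memory string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Table of operations, filled in by the "file-system". */
struct toy_file_operations {
    size_t (*read)(struct toy_file *f, char *buf, size_t count);
};

struct toy_inode  { const char *data; size_t size; };   /* the actual file */
struct toy_dentry { struct toy_inode *inode; };

/* One open instance: several of these may share a single dentry/inode. */
struct toy_file {
    struct toy_dentry *dentry;
    size_t pos;                                  /* "current position" */
    unsigned mode;                               /* access mode given at open */
    const struct toy_file_operations *f_op;
};

/* 'read' for the in-memory toy file-system: copy from pos, then advance pos. */
static size_t toy_mem_read(struct toy_file *f, char *buf, size_t count)
{
    const struct toy_inode *ino = f->dentry->inode;
    size_t left = f->pos < ino->size ? ino->size - f->pos : 0;

    if (count > left)
        count = left;
    memcpy(buf, ino->data + f->pos, count);
    f->pos += count;
    return count;
}

const struct toy_file_operations toy_mem_fops = { toy_mem_read };

/* Set up a freshly opened file object on top of an existing dentry. */
void toy_file_init(struct toy_file *f, struct toy_dentry *d, unsigned mode)
{
    assert(f != NULL && d != NULL && d->inode != NULL);
    f->dentry = d;
    f->pos = 0;
    f->mode = mode;
    f->f_op = &toy_mem_fops;
}
```

Opening the same file twice gives two file objects with independent positions, both pointing at one dentry and one inode.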
Act IV - The VFS Sources - there's a birdhouse in your code...
Let's review the more interesting source files of the VFS, all found under
the "fs" directory:
-  super.c - handling of super-blocks and of file-system types.
-  namespace.c - file-system mount and unmount.
-  namei.c - file tree manipulation (lookups, inode creation and deletion,
     permissions checking...).
-  read_write.c - implementation of read, write and lseek system calls for
                    files.
-  dquot.c - handling of disk usage quotas.
-  dcache.c - implementation of the dcache...
-  inode.c - implementation of the inode object and inode cache.
Act V - Example VFS Operations - three paths to your soul...
-  Let us look at a few interesting VFS operations:
  -  Path to inode translation.
  -  File open.
  -  File read.
  
 
VFS Operations - Path To Inode Translation
Given a (full or relative) path to a file, find its inode:
Entry function: user_path_walk, in include/linux/fs.h.
Actual lookup function: link_path_walk, in fs/namei.c.
-  First, get the dentry to "/" (if it's a full path) or to "." (if it's
     a relative path).
-  Start scanning the path, and follow via the dcache.
  -  Handle "." by skipping.
  -  Handle ".." using dentry->d_parent.
  -  Handle others via a cache lookup.
  -  If not found in the cache - ask the underlying filesystem to perform
       a real lookup.
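The walk described above can be sketched in user-space C. Everything here is a hypothetical simplification: toy_path_walk is not link_path_walk, each directory's child array plays the role of the dcache, and the "real lookup" is an optional callback. Symbolic links, mount points and permission checks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_CHILDREN 8

struct toy_dentry {
    const char *name;
    struct toy_dentry *parent;                      /* the root is its own parent */
    struct toy_dentry *children[TOY_MAX_CHILDREN];  /* stands in for the dcache */
    size_t nchildren;
};

/* Callback type for asking the underlying file-system (may be NULL). */
typedef struct toy_dentry *(*toy_real_lookup_fn)(struct toy_dentry *dir,
                                                 const char *name, size_t len);

/* Cache lookup of one component (name is NOT NUL-terminated, hence len). */
static struct toy_dentry *toy_cached_lookup(struct toy_dentry *dir,
                                            const char *name, size_t len)
{
    for (size_t i = 0; i < dir->nchildren; i++) {
        const char *n = dir->children[i]->name;

        if (strlen(n) == len && memcmp(n, name, len) == 0)
            return dir->children[i];
    }
    return NULL;
}

/* Resolve 'path' starting at root (full path) or cwd (relative path). */
struct toy_dentry *toy_path_walk(struct toy_dentry *root, struct toy_dentry *cwd,
                                 const char *path, toy_real_lookup_fn real_lookup)
{
    struct toy_dentry *cur;

    assert(root != NULL && cwd != NULL && path != NULL);
    cur = (*path == '/') ? root : cwd;

    while (*path) {
        size_t len;

        while (*path == '/')                 /* skip separators */
            path++;
        if (!*path)
            break;
        len = strcspn(path, "/");            /* length of this component */

        if (len == 1 && path[0] == '.') {
            /* "." - stay where we are */
        } else if (len == 2 && path[0] == '.' && path[1] == '.') {
            cur = cur->parent;               /* ".." - go up */
        } else {
            struct toy_dentry *next = toy_cached_lookup(cur, path, len);

            if (!next && real_lookup)        /* cache miss: ask the file-system */
                next = real_lookup(cur, path, len);
            if (!next)
                return NULL;                 /* no such entry */
            cur = next;
        }
        path += len;
    }
    return cur;
}
```

A fuller version would also add the result of a real lookup to the cache, so that the next walk over the same component is a hit.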
  
 
VFS Operations - File Open
Given a file path, an open mode and a file permissions mask:
Entry function: sys_open, in fs/open.c.
Underlying open function: filp_open, in fs/open.c.
-  Allocate a free file descriptor.
-  Try to open the file (next slide).
-  On success, put the new 'struct file' in the fd table of the process.
-  On error, free the allocated file descriptor.
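The reserve / open / install-or-release pattern above can be sketched like this. It is a user-space toy with hypothetical names (toy_sys_open, toy_files and the two stand-in openers); the actual opening of the file is passed in as a callback so the error path can be exercised.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NFDS 4

struct toy_file { const char *path; };

/* Per-process table: a slot can be reserved before a file is installed. */
struct toy_files {
    struct toy_file *fd[TOY_NFDS];
    unsigned char reserved[TOY_NFDS];
};

typedef struct toy_file *(*toy_open_fn)(const char *path);

/* Reserve the lowest free descriptor, or return -1 if the table is full. */
static int toy_get_unused_fd(struct toy_files *t)
{
    for (int i = 0; i < TOY_NFDS; i++) {
        if (!t->reserved[i]) {
            t->reserved[i] = 1;
            return i;
        }
    }
    return -1;
}

/* Give back a descriptor that was reserved but never installed. */
static void toy_put_unused_fd(struct toy_files *t, int fd)
{
    t->reserved[fd] = 0;
}

int toy_sys_open(struct toy_files *t, const char *path, toy_open_fn open_fn)
{
    struct toy_file *f;
    int fd;

    assert(t != NULL && path != NULL && open_fn != NULL);
    fd = toy_get_unused_fd(t);           /* 1. allocate a free descriptor */
    if (fd < 0)
        return -1;
    f = open_fn(path);                   /* 2. try to open the file */
    if (!f) {
        toy_put_unused_fd(t, fd);        /* 3b. error: release the descriptor */
        return -1;
    }
    t->fd[fd] = f;                       /* 3a. success: install the file */
    return fd;
}

/* Stand-in openers, for illustration only. */
struct toy_file *toy_open_ok(const char *path)
{
    static struct toy_file f;

    f.path = path;
    return &f;
}

struct toy_file *toy_open_fail(const char *path)
{
    (void)path;
    return NULL;
}
```

Reserving the descriptor first means a failed open never leaves a half-installed entry in the table.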
Actually Opening The File
Entry function: open_namei, in fs/namei.c.
-  If not opening in create mode (no O_CREAT flag given):
     look up the file via the dentry cache (path_lookup -> path_walk ->
     link_path_walk).
-  Otherwise (it is an open in create mode):
  -  Look up the parent directory (path_lookup again). If it does not
       exist, or is not a directory - fail the operation.
  -  Look up the file in the parent directory. If it does not exist, create
       it (see vfs_create later on).
  -  Handle special cases (flags mismatch, file is a directory, file is
       a link...).
-  In both cases (open with create or file already exists):
  -  Perform sanity checks.
  -  Check permissions.
  -  Check various mode limitations (e.g. file is read-only and trying to
       open for write, trying to open a device file on a 'no_dev' file-system
       mount, etc.).
  -  Handle truncation.
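The branching on O_CREAT, followed by the checks common to both cases, might look roughly like the sketch below. The names (toy_node, toy_open_namei) are hypothetical, lookups are replaced by 'exists' flags, and the error codes chosen for each case are assumptions of this sketch, not a statement of what open_namei returns.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>

/* A heavily simplified stand-in for a looked-up directory entry. */
struct toy_node {
    int exists;
    int is_dir;
    int read_only;
    size_t size;
};

/* Returns 0 on success or a negative errno value (assumed codes). */
int toy_open_namei(const struct toy_node *parent, struct toy_node *target, int flags)
{
    int wants_write;

    assert(parent != NULL && target != NULL);
    wants_write = (flags & O_ACCMODE) != O_RDONLY;

    if (!(flags & O_CREAT)) {
        /* Plain open: the file must already exist. */
        if (!target->exists)
            return -ENOENT;
    } else {
        /* Create mode: check the parent first, then create if needed. */
        if (!parent->exists)
            return -ENOENT;
        if (!parent->is_dir)
            return -ENOTDIR;
        if (!target->exists) {
            target->exists = 1;          /* stands in for vfs_create */
            target->is_dir = 0;
            target->read_only = 0;
            target->size = 0;
        } else if (flags & O_EXCL) {
            return -EEXIST;              /* flags mismatch */
        }
    }

    /* Checks common to both cases. */
    if (target->is_dir && wants_write)
        return -EISDIR;
    if (target->read_only && wants_write)
        return -EACCES;
    if ((flags & O_TRUNC) && wants_write)
        target->size = 0;                /* handle truncation */
    return 0;
}
```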
  
 
VFS Create
Entry function: vfs_create, in fs/namei.c.
-  Check if we may create a new entry in the given directory (mostly
     permission checking).
-  Check that the underlying inode has a 'create' inode operation.
-  Invoke the 'create' inode operation (of the file-system).
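The three steps can be sketched as a check, a check and a call through a function pointer. The names (toy_vfs_create, toy_may_create, toy_ramfs_create) are hypothetical, the permission check is reduced to a single 'writable' flag, and the error code returned when no 'create' operation exists is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct toy_inode;

/* Operations a file-system may (or may not) provide for its directories. */
struct toy_inode_operations {
    int (*create)(struct toy_inode *dir, const char *name, int mode);
};

struct toy_inode {
    int writable;                              /* stands in for permissions */
    int nr_created;                            /* bookkeeping for the example */
    const struct toy_inode_operations *i_op;
};

/* Step 1: may we create an entry in this directory? */
static int toy_may_create(const struct toy_inode *dir)
{
    return dir->writable ? 0 : -EACCES;
}

int toy_vfs_create(struct toy_inode *dir, const char *name, int mode)
{
    int err;

    assert(dir != NULL && name != NULL);
    err = toy_may_create(dir);
    if (err)
        return err;
    /* Step 2: does the file-system support 'create' at all? */
    if (!dir->i_op || !dir->i_op->create)
        return -EACCES;                        /* assumed error code */
    /* Step 3: let the file-system do the real work. */
    return dir->i_op->create(dir, name, mode);
}

/* A stand-in file-system 'create' that only counts its invocations. */
static int toy_ramfs_create(struct toy_inode *dir, const char *name, int mode)
{
    (void)name;
    (void)mode;
    dir->nr_created++;
    return 0;
}

const struct toy_inode_operations toy_ramfs_iops = { toy_ramfs_create };
```

The VFS part is generic; only the function behind the 'create' pointer knows how the file-system stores a new file.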
VFS Operations - File Read
Entry function: sys_read, in fs/read_write.c:
-  Using the file descriptor, get the (already opened) file struct.
-  Verify that the file's access mode allows read.
-  Check locks.
-  Invoke the underlying 'read' file operation of the file's inode.
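A user-space sketch of these steps, with hypothetical names (toy_sys_read, toy_files, toy_zero_fops). Lock checking is left out, and the stand-in 'read' operation simply fills the buffer with zero bytes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_NFDS 4
#define TOY_FMODE_READ  1u
#define TOY_FMODE_WRITE 2u

struct toy_file;

struct toy_file_operations {
    long (*read)(struct toy_file *f, char *buf, size_t count);
};

struct toy_file {
    unsigned mode;                               /* access mode given at open */
    const struct toy_file_operations *f_op;
};

struct toy_files { struct toy_file *fd[TOY_NFDS]; };

/* Returns the number of bytes read, or a negative errno value. */
long toy_sys_read(struct toy_files *t, int fd, char *buf, size_t count)
{
    struct toy_file *f;

    assert(t != NULL && buf != NULL);
    /* 1. file descriptor -> (already opened) file struct */
    if (fd < 0 || fd >= TOY_NFDS || !t->fd[fd])
        return -EBADF;
    f = t->fd[fd];
    /* 2. the access mode must allow reading */
    if (!(f->mode & TOY_FMODE_READ))
        return -EBADF;
    /* 3. lock checking would go here (omitted in this sketch) */
    /* 4. dispatch to the underlying 'read' operation */
    if (!f->f_op || !f->f_op->read)
        return -EINVAL;
    return f->f_op->read(f, buf, count);
}

/* Stand-in 'read' operation: every read yields zero bytes of data. */
static long toy_zero_read(struct toy_file *f, char *buf, size_t count)
{
    (void)f;
    memset(buf, 0, count);
    return (long)count;
}

const struct toy_file_operations toy_zero_fops = { toy_zero_read };
```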
Reading Via The Page Cache
Entry function: do_generic_file_read, in mm/filemap.c:
-  Get the address mapping of the file's inode.
-  Translate the read position to a page index (each page contains
     2^PAGE_CACHE_SHIFT bytes).
-  Calculate read-ahead parameters (i.e. reading from a file at position
     X normally causes reading several more pages, assuming the application
     is likely to request those pages next).
-  Make sure we're not reading past EOF.
-  For each page in the range the user asked to read from:
  -  Look for the page in the page cache.
  -  If it's there, and is up-to-date:
    -  If we're not in 'non-blocking' mode (i.e. we're allowed to block),
         invoke read-ahead.
    -  Copy data from the page to the user's buffer.
  -  Otherwise, use the address mapping 'readpage' operation to read the
       page, and start over (unless we're in non-blocking mode).
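The position-to-page arithmetic and the per-page loop can be sketched as follows. This is a user-space toy: the names (toy_mapping, toy_readpage, toy_file_read) are hypothetical, TOY_PAGE_SHIFT is assumed to be 12 (4096-byte pages), a plain memory buffer plays the disk, and read-ahead and non-blocking mode are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SHIFT 12                        /* assumed: 4096-byte pages */
#define TOY_PAGE_SIZE  ((size_t)1 << TOY_PAGE_SHIFT)
#define TOY_NPAGES     4

struct toy_mapping {
    const char *backing;                         /* stands in for the disk */
    size_t size;                                 /* file size (EOF position) */
    char pages[TOY_NPAGES][TOY_PAGE_SIZE];       /* the "page cache" */
    int uptodate[TOY_NPAGES];
    int readpage_calls;                          /* counts cache misses */
};

/* Fill one cache page from the backing store. */
static void toy_readpage(struct toy_mapping *m, size_t index)
{
    size_t start = index << TOY_PAGE_SHIFT;
    size_t n = m->size - start;

    if (n > TOY_PAGE_SIZE)
        n = TOY_PAGE_SIZE;
    memcpy(m->pages[index], m->backing + start, n);
    m->uptodate[index] = 1;
    m->readpage_calls++;
}

/* Copy up to 'count' bytes starting at 'pos'; returns the bytes copied. */
size_t toy_file_read(struct toy_mapping *m, size_t pos, char *buf, size_t count)
{
    size_t done = 0;

    assert(m != NULL && buf != NULL);
    if (pos >= m->size)                          /* do not read past EOF */
        return 0;
    if (count > m->size - pos)
        count = m->size - pos;

    while (done < count) {
        size_t index  = (pos + done) >> TOY_PAGE_SHIFT;       /* which page */
        size_t offset = (pos + done) & (TOY_PAGE_SIZE - 1);   /* where in it */
        size_t n = TOY_PAGE_SIZE - offset;

        if (n > count - done)
            n = count - done;
        if (index >= TOY_NPAGES)
            break;                               /* beyond the toy cache */
        if (!m->uptodate[index])
            toy_readpage(m, index);              /* miss: read the page in */
        memcpy(buf + done, m->pages[index] + offset, n);
        done += n;
    }
    return done;
}
```

For example, a 10-byte read at position 4090 touches page 0 (offset 4090, 6 bytes) and page 1 (offset 0, 4 bytes); repeating the same read is served entirely from the cached pages.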
  
 
References
-  Linux Kernel 2.4 Internals - Virtual Filesystem (VFS)
-  The Linux Virtual File-System Layer
-  Linux Virtual File System (lecture slides from 1998).
-  A Small Trail Through The Linux Kernel (code walk-through for open and read system calls).
Originally written by guy keren