During conversion of the txt file to rst format we added a bunch of lists.
To ease the review of that patch the list contents were not changed. We
do that now as a separate patch. This patch does not change the contents
of the document in any real way; it only fixes whitespace and adds
missing periods to list items.

Clean up lists by:

- Adding missing periods.
- Correcting the column width.
- Correcting the indentation.

Tested-by: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>
Signed-off-by: Tobin C. Harding <tobin@xxxxxxxxxx>
---
 Documentation/filesystems/vfs.rst | 1114 ++++++++++++++---------------
 1 file changed, 557 insertions(+), 557 deletions(-)

diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index 7ab885de9085..bd8f7891f44b 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -127,34 +127,34 @@ members are defined:
}; - ``name``: the name of the filesystem type, such as "ext2", "iso9660", - "msdos" and so on + "msdos" and so on. -- ``fs_flags``: various flags (i.e. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.) +- ``fs_flags``: various flags (i.e. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.). -- ``mount``: the method to call when a new instance of this - filesystem should be mounted +- ``mount``: the method to call when a new instance of this filesystem + should be mounted. - ``kill_sb``: the method to call when an instance of this filesystem - should be shut down + should be shut down. -- ``owner``: for internal VFS use: you should initialize this to THIS_MODULE in - most cases. +- ``owner``: for internal VFS use: you should initialize this to + THIS_MODULE in most cases. -- ``next``: for internal VFS use: you should initialize this to NULL +- ``next``: for internal VFS use: you should initialize this to NULL. -- ``s_lock_key``, ``s_umount_key``: lockdep-specific +- ``s_lock_key``, ``s_umount_key``: lockdep-specific.
The mount() method has the following arguments: -- ``struct file_system_type *fs_type``: describes the filesystem, partly initialized - by the specific filesystem code +- ``struct file_system_type *fs_type``: describes the filesystem, partly + initialized by the specific filesystem code. -- ``int flags``: mount flags +- ``int flags``: mount flags. - ``const char *dev_name``: the device name we are mounting. - ``void *data``: arbitrary mount options, usually comes as an ASCII - string (see "Mount Options" section) + string (see "Mount Options" section) The mount() method must return the root dentry of the tree requested by caller. An active reference to its superblock must be grabbed and the @@ -179,22 +179,22 @@ implementation. Usually, a filesystem uses one of the generic mount() implementations and provides a fill_super() callback instead. The generic variants are: -- ``mount_bdev``: mount a filesystem residing on a block device +- ``mount_bdev``: mount a filesystem residing on a block device. -- ``mount_nodev``: mount a filesystem that is not backed by a device +- ``mount_nodev``: mount a filesystem that is not backed by a device. - ``mount_single``: mount a filesystem which shares the instance between - all mounts + all mounts. A fill_super() callback implementation has the following arguments: - ``struct super_block *sb``: the superblock structure. The callback - must initialize this properly. + must initialize this properly. - ``void *data``: arbitrary mount options, usually comes as an ASCII - string (see "Mount Options" section) + string (see "Mount Options" section). -- ``int silent``: whether or not to be silent on error +- ``int silent``: whether or not to be silent on error. The Superblock Object @@ -240,87 +240,87 @@ noted. This means that most methods can block safely. All methods are only called from a process context (i.e. not from an interrupt handler or bottom half). 
-- ``alloc_inode``: this method is called by alloc_inode() to allocate memory - for struct inode and initialize it. If this function is not - defined, a simple 'struct inode' is allocated. Normally - alloc_inode will be used to allocate a larger structure which - contains a 'struct inode' embedded within it. +- ``alloc_inode``: this method is called by alloc_inode() to allocate + memory for struct inode and initialize it. If this function is not + defined, a simple 'struct inode' is allocated. Normally alloc_inode + will be used to allocate a larger structure which contains a 'struct + inode' embedded within it. - ``destroy_inode``: this method is called by destroy_inode() to release - resources allocated for struct inode. It is only required if - ->alloc_inode was defined and simply undoes anything done by - ->alloc_inode. + resources allocated for struct inode. It is only required if + ->alloc_inode was defined and simply undoes anything done by + ->alloc_inode. -- ``dirty_inode``: this method is called by the VFS to mark an inode dirty. +- ``dirty_inode``: this method is called by the VFS to mark an inode + dirty. - ``write_inode``: this method is called when the VFS needs to write an - inode to disc. The second parameter indicates whether the write - should be synchronous or not, not all filesystems check this flag. + inode to disc. The second parameter indicates whether the write + should be synchronous or not, not all filesystems check this flag. - ``drop_inode``: called when the last access to the inode is dropped, - with the inode->i_lock spinlock held. + with the inode->i_lock spinlock held. 
- This method should be either NULL (normal UNIX filesystem - semantics) or "generic_delete_inode" (for filesystems that do not - want to cache inodes - causing "delete_inode" to always be - called regardless of the value of i_nlink) + This method should be either NULL (normal UNIX filesystem semantics) + or "generic_delete_inode" (for filesystems that do not want to cache + inodes - causing "delete_inode" to always be called regardless of the + value of i_nlink). - The "generic_delete_inode()" behavior is equivalent to the - old practice of using "force_delete" in the put_inode() case, - but does not have the races that the "force_delete()" approach - had. + The "generic_delete_inode()" behavior is equivalent to the old + practice of using "force_delete" in the put_inode() case, but does not + have the races that the "force_delete()" approach had. -- ``delete_inode``: called when the VFS wants to delete an inode +- ``delete_inode``: called when the VFS wants to delete an inode. - ``put_super``: called when the VFS wishes to free the superblock - (i.e. unmount). This is called with the superblock lock held + (i.e. unmount). This is called with the superblock lock held. -- ``sync_fs``: called when VFS is writing out all dirty data associated with - a superblock. The second parameter indicates whether the method - should wait until the write out has been completed. Optional. +- ``sync_fs``: called when VFS is writing out all dirty data associated + with a superblock. The second parameter indicates whether the method + should wait until the write out has been completed. Optional. -- ``freeze_fs``: called when VFS is locking a filesystem and - forcing it into a consistent state. This method is currently - used by the Logical Volume Manager (LVM). +- ``freeze_fs``: called when VFS is locking a filesystem and forcing it + into a consistent state. This method is currently used by the Logical + Volume Manager (LVM). 
-- ``unfreeze_fs``: called when VFS is unlocking a filesystem and making it writable - again. +- ``unfreeze_fs``: called when VFS is unlocking a filesystem and making + it writable again. - ``statfs``: called when the VFS needs to get filesystem statistics. -- ``remount_fs``: called when the filesystem is remounted. This is called - with the kernel lock held +- ``remount_fs``: called when the filesystem is remounted. This is + called with the kernel lock held. -- ``clear_inode``: called then the VFS clears the inode. Optional +- ``clear_inode``: called then the VFS clears the inode. Optional. - ``umount_begin``: called when the VFS is unmounting a filesystem. - ``show_options``: called by the VFS to show mount options for - /proc/<pid>/mounts. (see "Mount Options" section) + /proc/<pid>/mounts (see "Mount Options" section). - ``quota_read``: called by the VFS to read from filesystem quota file. - ``quota_write``: called by the VFS to write to filesystem quota file. -- ``nr_cached_objects``: called by the sb cache shrinking function for the - filesystem to return the number of freeable cached objects it contains. - Optional. +- ``nr_cached_objects``: called by the sb cache shrinking function for + the filesystem to return the number of freeable cached objects it + contains. Optional. -- ``free_cache_objects``: called by the sb cache shrinking function for the - filesystem to scan the number of objects indicated to try to free them. - Optional, but any filesystem implementing this method needs to also - implement ->nr_cached_objects for it to be called correctly. +- ``free_cache_objects``: called by the sb cache shrinking function for + the filesystem to scan the number of objects indicated to try to free + them. Optional, but any filesystem implementing this method needs to + also implement ->nr_cached_objects for it to be called correctly. - We can't do anything with any errors that the filesystem might - encountered, hence the void return type. 
This will never be called if - the VM is trying to reclaim under GFP_NOFS conditions, hence this - method does not need to handle that situation itself. + We can't do anything with any errors that the filesystem might + encountered, hence the void return type. This will never be called if + the VM is trying to reclaim under GFP_NOFS conditions, hence this + method does not need to handle that situation itself. - Implementations must include conditional reschedule calls inside any - scanning loop that is done. This allows the VFS to determine - appropriate scan batch sizes without having to worry about whether - implementations will cause holdoff problems due to large scan batch - sizes. + Implementations must include conditional reschedule calls inside any + scanning loop that is done. This allows the VFS to determine + appropriate scan batch sizes without having to worry about whether + implementations will cause holdoff problems due to large scan batch + sizes. Whoever sets up the inode is responsible for filling in the "i_op" field. This is a pointer to a "struct inode_operations" which describes @@ -334,23 +334,24 @@ On filesystems that support extended attributes (xattrs), the s_xattr superblock field points to a NULL-terminated array of xattr handlers. Extended attributes are name:value pairs. -- ``name``: Indicates that the handler matches attributes with the specified name - (such as "system.posix_acl_access"); the prefix field must be NULL. +- ``name``: Indicates that the handler matches attributes with the + specified name (such as "system.posix_acl_access"); the prefix field + must be NULL. -- ``prefix``: Indicates that the handler matches all attributes with the specified - name prefix (such as "user."); the name field must be NULL. +- ``prefix``: Indicates that the handler matches all attributes with the + specified name prefix (such as "user."); the name field must be NULL. 
-- ``list``: Determine if attributes matching this xattr handler should be listed - for a particular dentry. Used by some listxattr implementations like - generic_listxattr. +- ``list``: Determine if attributes matching this xattr handler should + be listed for a particular dentry. Used by some listxattr + implementations like generic_listxattr. -- ``get``: Called by the VFS to get the value of a particular extended attribute. - This method is called by the getxattr(2) system call. +- ``get``: Called by the VFS to get the value of a particular extended + attribute. This method is called by the getxattr(2) system call. -- ``set``: Called by the VFS to set the value of a particular extended attribute. - When the new value is NULL, called to remove a particular extended - attribute. This method is called by the the setxattr(2) and - removexattr(2) system calls. +- ``set``: Called by the VFS to set the value of a particular extended + attribute. When the new value is NULL, called to remove a particular + extended attribute. This method is called by the the setxattr(2) and + removexattr(2) system calls. When none of the xattr handlers of a filesystem match the specified attribute name or when a filesystem doesn't support extended attributes, @@ -400,120 +401,118 @@ Again, all methods are called without any locks being held, unless otherwise noted. - ``create``: called by the open(2) and creat(2) system calls. Only - required if you want to support regular files. The dentry you - get should not have an inode (i.e. it should be a negative - dentry). Here you will probably call d_instantiate() with the - dentry and the newly created inode + required if you want to support regular files. The dentry you get + should not have an inode (i.e. it should be a negative dentry). Here + you will probably call d_instantiate() with the dentry and the newly + created inode. - ``lookup``: called when the VFS needs to look up an inode in a parent - directory. 
The name to look for is found in the dentry. This - method must call d_add() to insert the found inode into the - dentry. The "i_count" field in the inode structure should be - incremented. If the named inode does not exist a NULL inode - should be inserted into the dentry (this is called a negative - dentry). Returning an error code from this routine must only - be done on a real error, otherwise creating inodes with system - calls like create(2), mknod(2), mkdir(2) and so on will fail. - If you wish to overload the dentry methods then you should - initialise the "d_dop" field in the dentry; this is a pointer - to a struct "dentry_operations". - This method is called with the directory inode semaphore held - -- ``link``: called by the link(2) system call. Only required if you want - to support hard links. You will probably need to call - d_instantiate() just as you would in the create() method + directory. The name to look for is found in the dentry. This method + must call d_add() to insert the found inode into the dentry. The + "i_count" field in the inode structure should be incremented. If the + named inode does not exist a NULL inode should be inserted into the + dentry (this is called a negative dentry). Returning an error code + from this routine must only be done on a real error, otherwise + creating inodes with system calls like create(2), mknod(2), mkdir(2) + and so on will fail. If you wish to overload the dentry methods then + you should initialise the "d_dop" field in the dentry; this is a + pointer to a struct "dentry_operations". This method is called with + the directory inode semaphore held. + +- ``link``: called by the link(2) system call. Only required if you + want to support hard links. You will probably need to call + d_instantiate() just as you would in the create() method. - ``unlink``: called by the unlink(2) system call. Only required if you - want to support deleting inodes - -- ``symlink``: called by the symlink(2) system call. 
Only required if you - want to support symlinks. You will probably need to call - d_instantiate() just as you would in the create() method - -- ``mkdir``: called by the mkdir(2) system call. Only required if you want - to support creating subdirectories. You will probably need to - call d_instantiate() just as you would in the create() method - -- ``rmdir``: called by the rmdir(2) system call. Only required if you want - to support deleting subdirectories - -- ``mknod``: called by the mknod(2) system call to create a device (char, - block) inode or a named pipe (FIFO) or socket. Only required - if you want to support creating these types of inodes. You - will probably need to call d_instantiate() just as you would - in the create() method - -- ``rename``: called by the rename(2) system call to rename the object to - have the parent and name given by the second inode and dentry. - - The filesystem must return -EINVAL for any unsupported or - unknown flags. Currently the following flags are implemented: - (1) RENAME_NOREPLACE: this flag indicates that if the target - of the rename exists the rename should fail with -EEXIST - instead of replacing the target. The VFS already checks for - existence, so for local filesystems the RENAME_NOREPLACE - implementation is equivalent to plain rename. - (2) RENAME_EXCHANGE: exchange source and target. Both must - exist; this is checked by the VFS. Unlike plain rename, - source and target may be of different type. - -- ``get_link``: called by the VFS to follow a symbolic link to the - inode it points to. Only required if you want to support - symbolic links. This method returns the symlink body - to traverse (and possibly resets the current position with - nd_jump_link()). If the body won't go away until the inode - is gone, nothing else is needed; if it needs to be otherwise - pinned, arrange for its release by having get_link(..., ..., done) - do set_delayed_call(done, destructor, argument). 
- In that case destructor(argument) will be called once VFS is - done with the body you've returned. - May be called in RCU mode; that is indicated by NULL dentry - argument. If request can't be handled without leaving RCU mode, - have it return ERR_PTR(-ECHILD). - -- ``readlink``: this is now just an override for use by readlink(2) for the - cases when ->get_link uses nd_jump_link() or object is not in - fact a symlink. Normally filesystems should only implement - ->get_link for symlinks and readlink(2) will automatically use - that. - -- ``permission``: called by the VFS to check for access rights on a POSIX-like - filesystem. - - May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). If in rcu-walk - mode, the filesystem must check the permission without blocking or - storing to the inode. - - If a situation is encountered that rcu-walk cannot handle, return - -ECHILD and it will be called again in ref-walk mode. - -- ``setattr``: called by the VFS to set attributes for a file. This method - is called by chmod(2) and related system calls. - -- ``getattr``: called by the VFS to get attributes of a file. This method - is called by stat(2) and related system calls. + want to support deleting inodes. + +- ``symlink``: called by the symlink(2) system call. Only required if + you want to support symlinks. You will probably need to call + d_instantiate() just as you would in the create() method. + +- ``mkdir``: called by the mkdir(2) system call. Only required if you + want to support creating subdirectories. You will probably need to + call d_instantiate() just as you would in the create() method. + +- ``rmdir``: called by the rmdir(2) system call. Only required if you + want to support deleting subdirectories. + +- ``mknod``: called by the mknod(2) system call to create a device + (char, block) inode or a named pipe (FIFO) or socket. Only required + if you want to support creating these types of inodes. 
You will + probably need to call d_instantiate() just as you would in the + create() method. + +- ``rename``: called by the rename(2) system call to rename the object + to have the parent and name given by the second inode and dentry. + + The filesystem must return -EINVAL for any unsupported or unknown + flags. Currently the following flags are implemented: + + 1. ``RENAME_NOREPLACE``: this flag indicates that if the target of the + rename exists the rename should fail with -EEXIST instead of + replacing the target. The VFS already checks for existence, so for + local filesystems the RENAME_NOREPLACE implementation is equivalent + to plain rename. + 2. ``RENAME_EXCHANGE``: exchange source and target. Both must exist; + this is checked by the VFS. Unlike plain rename, source and target + may be of different type. + +- ``get_link``: called by the VFS to follow a symbolic link to the inode + it points to. Only required if you want to support symbolic links. + This method returns the symlink body to traverse (and possibly resets + the current position with nd_jump_link()). If the body won't go away + until the inode is gone, nothing else is needed; if it needs to be + otherwise pinned, arrange for its release by having get_link(..., ..., + done) do set_delayed_call(done, destructor, argument). In that case + destructor(argument) will be called once VFS is done with the body + you've returned. May be called in RCU mode; that is indicated by NULL + dentry argument. If request can't be handled without leaving RCU + mode, have it return ERR_PTR(-ECHILD). + +- ``readlink``: this is now just an override for use by readlink(2) for + the cases when ->get_link uses nd_jump_link() or object is not in fact + a symlink. Normally filesystems should only implement ->get_link for + symlinks and readlink(2) will automatically use that. + +- ``permission``: called by the VFS to check for access rights on a + POSIX-like filesystem. 
+ + May be called in rcu-walk mode (mask & MAY_NOT_BLOCK). If in rcu-walk + mode, the filesystem must check the permission without blocking or + storing to the inode. + + If a situation is encountered that rcu-walk cannot handle, return + -ECHILD and it will be called again in ref-walk mode. + +- ``setattr``: called by the VFS to set attributes for a file. This + method is called by chmod(2) and related system calls. + +- ``getattr``: called by the VFS to get attributes of a file. This + method is called by stat(2) and related system calls. - ``listxattr``: called by the VFS to list all extended attributes for a - given file. This method is called by the listxattr(2) system call. + given file. This method is called by the listxattr(2) system call. -- ``update_time``: called by the VFS to update a specific time or the i_version of - an inode. If this is not defined the VFS will update the inode itself - and call mark_inode_dirty_sync. +- ``update_time``: called by the VFS to update a specific time or the + i_version of an inode. If this is not defined the VFS will update the + inode itself and call mark_inode_dirty_sync. -- ``atomic_open``: called on the last component of an open. Using this optional - method the filesystem can look up, possibly create and open the file in - one atomic operation. If it wants to leave actual opening to the - caller (e.g. if the file turned out to be a symlink, device, or just - something filesystem won't do atomic open for), it may signal this by - returning finish_no_open(file, dentry). This method is only called if - the last component is negative or needs lookup. Cached positive dentries - are still handled by f_op->open(). If the file was created, - FMODE_CREATED flag should be set in file->f_mode. In case of O_EXCL - the method must only succeed if the file didn't exist and hence FMODE_CREATED - shall always be set on success. +- ``atomic_open``: called on the last component of an open. 
Using this + optional method the filesystem can look up, possibly create and open + the file in one atomic operation. If it wants to leave actual opening + to the caller (e.g. if the file turned out to be a symlink, device, or + just something filesystem won't do atomic open for), it may signal + this by returning finish_no_open(file, dentry). This method is only + called if the last component is negative or needs lookup. Cached + positive dentries are still handled by f_op->open(). If the file was + created, FMODE_CREATED flag should be set in file->f_mode. In case of + O_EXCL the method must only succeed if the file didn't exist and hence + FMODE_CREATED shall always be set on success. -- ``tmpfile``: called in the end of O_TMPFILE open(). Optional, equivalent to - atomically creating, opening and unlinking a file in given directory. +- ``tmpfile``: called in the end of O_TMPFILE open(). Optional, + equivalent to atomically creating, opening and unlinking a file in + given directory. The Address Space Object @@ -666,185 +665,180 @@ cache in your filesystem. The following members are defined: int (*swap_deactivate)(struct file *); }; -- ``writepage``: called by the VM to write a dirty page to backing store. - This may happen for data integrity reasons (i.e. 'sync'), or - to free up memory (flush). The difference can be seen in - wbc->sync_mode. - The PG_Dirty flag has been cleared and PageLocked is true. - writepage should start writeout, should set PG_Writeback, - and should make sure the page is unlocked, either synchronously - or asynchronously when the write operation completes. - - If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to - try too hard if there are problems, and may choose to write out - other pages from the mapping if that is easier (e.g. due to - internal dependencies). If it chooses not to start writeout, it - should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep - calling ->writepage on that page. 
- - See the file "Locking" for more details. - -- ``readpage``: called by the VM to read a page from backing store. - The page will be Locked when readpage is called, and should be - unlocked and marked uptodate once the read completes. - If ->readpage discovers that it needs to unlock the page for - some reason, it can do so, and then return AOP_TRUNCATED_PAGE. - In this case, the page will be relocated, relocked and if - that all succeeds, ->readpage will be called again. - -- ``writepages``: called by the VM to write out pages associated with the - address_space object. If wbc->sync_mode is WBC_SYNC_ALL, then - the writeback_control will specify a range of pages that must be - written out. If it is WBC_SYNC_NONE, then a nr_to_write is given - and that many pages should be written if possible. - If no ->writepages is given, then mpage_writepages is used - instead. This will choose pages from the address space that are - tagged as DIRTY and will pass them to ->writepage. - -- ``set_page_dirty``: called by the VM to set a page dirty. - This is particularly needed if an address space attaches - private data to a page, and that data needs to be updated when - a page is dirtied. This is called, for example, when a memory - mapped page gets modified. - If defined, it should set the PageDirty flag, and the - PAGECACHE_TAG_DIRTY tag in the radix tree. - -- ``readpages``: called by the VM to read pages associated with the address_space - object. This is essentially just a vector version of - readpage. Instead of just one page, several pages are - requested. - readpages is only used for read-ahead, so read errors are - ignored. If anything goes wrong, feel free to give up. - -- ``write_begin``: - Called by the generic buffered write code to ask the filesystem to - prepare to write len bytes at the given offset in the file. 
The - address_space should check that the write will be able to complete, - by allocating space if necessary and doing any other internal - housekeeping. If the write will update parts of any basic-blocks on - storage, then those blocks should be pre-read (if they haven't been - read already) so that the updated blocks can be written out properly. - - The filesystem must return the locked pagecache page for the specified - offset, in *pagep, for the caller to write into. - - It must be able to cope with short writes (where the length passed to - write_begin is greater than the number of bytes copied into the page). - - flags is a field for AOP_FLAG_xxx flags, described in - include/linux/fs.h. - - A void * may be returned in fsdata, which then gets passed into - write_end. - - Returns 0 on success; < 0 on failure (which is the error code), in - which case write_end is not called. - -- ``write_end``: After a successful write_begin, and data copy, write_end must - be called. len is the original len passed to write_begin, and copied - is the amount that was able to be copied. - - The filesystem must take care of unlocking the page and releasing it - refcount, and updating i_size. - - Returns < 0 on failure, otherwise the number of bytes (<= 'copied') - that were able to be copied into pagecache. - -- ``bmap``: called by the VFS to map a logical block offset within object to - physical block number. This method is used by the FIBMAP - ioctl and for working with swap-files. To be able to swap to - a file, the file must have a stable mapping to a block - device. The swap system does not go through the filesystem - but instead uses bmap to find out where the blocks in the file - are and uses those addresses directly. +- ``writepage``: called by the VM to write a dirty page to backing + store. This may happen for data integrity reasons (i.e. 'sync'), or + to free up memory (flush). The difference can be seen in + wbc->sync_mode. 
The PG_Dirty flag has been cleared and PageLocked is + true. writepage should start writeout, should set PG_Writeback, and + should make sure the page is unlocked, either synchronously or + asynchronously when the write operation completes. + + If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to try too + hard if there are problems, and may choose to write out other pages + from the mapping if that is easier (e.g. due to internal + dependencies). If it chooses not to start writeout, it should return + AOP_WRITEPAGE_ACTIVATE so that the VM will not keep calling + ->writepage on that page. + + See the file "Locking" for more details. + +- ``readpage``: called by the VM to read a page from backing store. The + page will be Locked when readpage is called, and should be unlocked + and marked uptodate once the read completes. If ->readpage discovers + that it needs to unlock the page for some reason, it can do so, and + then return AOP_TRUNCATED_PAGE. In this case, the page will be + relocated, relocked and if that all succeeds, ->readpage will be + called again. + +- ``writepages``: called by the VM to write out pages associated with + the address_space object. If wbc->sync_mode is WBC_SYNC_ALL, then the + writeback_control will specify a range of pages that must be written + out. If it is WBC_SYNC_NONE, then a nr_to_write is given and that + many pages should be written if possible. If no ->writepages is + given, then mpage_writepages is used instead. This will choose pages + from the address space that are tagged as DIRTY and will pass them to + ->writepage. + +- ``set_page_dirty``: called by the VM to set a page dirty. This is + particularly needed if an address space attaches private data to a + page, and that data needs to be updated when a page is dirtied. This + is called, for example, when a memory mapped page gets modified. If + defined, it should set the PageDirty flag, and the PAGECACHE_TAG_DIRTY + tag in the radix tree. 
+ +- ``readpages``: called by the VM to read pages associated with the + address_space object. This is essentially just a vector version of + readpage. Instead of just one page, several pages are requested. + readpages is only used for read-ahead, so read errors are ignored. If + anything goes wrong, feel free to give up. + +- ``write_begin``: Called by the generic buffered write code to ask the + filesystem to prepare to write len bytes at the given offset in the + file. The address_space should check that the write will be able to + complete, by allocating space if necessary and doing any other + internal housekeeping. If the write will update parts of any + basic-blocks on storage, then those blocks should be pre-read (if they + haven't been read already) so that the updated blocks can be written + out properly. + + The filesystem must return the locked pagecache page for the specified + offset, in *pagep, for the caller to write into. + + It must be able to cope with short writes (where the length passed to + write_begin is greater than the number of bytes copied into the page). + + flags is a field for AOP_FLAG_xxx flags, described in + include/linux/fs.h. + + A void * may be returned in fsdata, which then gets passed into + write_end. + + Returns 0 on success; < 0 on failure (which is the error code), in + which case write_end is not called. + +- ``write_end``: After a successful write_begin, and data copy, + write_end must be called. len is the original len passed to + write_begin, and copied is the amount that was able to be copied. + + The filesystem must take care of unlocking the page and releasing it + refcount, and updating i_size. + + Returns < 0 on failure, otherwise the number of bytes (<= 'copied') + that were able to be copied into pagecache. + +- ``bmap``: called by the VFS to map a logical block offset within + object to physical block number. This method is used by the FIBMAP + ioctl and for working with swap-files. 
To be able to swap to a file, + the file must have a stable mapping to a block device. The swap + system does not go through the filesystem but instead uses bmap to + find out where the blocks in the file are and uses those addresses + directly. - ``invalidatepage``: If a page has PagePrivate set, then invalidatepage - will be called when part or all of the page is to be removed - from the address space. This generally corresponds to either a - truncation, punch hole or a complete invalidation of the address - space (in the latter case 'offset' will always be 0 and 'length' - will be PAGE_SIZE). Any private data associated with the page - should be updated to reflect this truncation. If offset is 0 and - length is PAGE_SIZE, then the private data should be released, - because the page must be able to be completely discarded. This may - be done by calling the ->releasepage function, but in this case the - release MUST succeed. - -- ``releasepage``: releasepage is called on PagePrivate pages to indicate - that the page should be freed if possible. ->releasepage - should remove any private data from the page and clear the - PagePrivate flag. If releasepage() fails for some reason, it must - indicate failure with a 0 return value. - releasepage() is used in two distinct though related cases. The - first is when the VM finds a clean page with no active users and - wants to make it a free page. If ->releasepage succeeds, the - page will be removed from the address_space and become free. - - The second case is when a request has been made to invalidate - some or all pages in an address_space. This can happen - through the fadvise(POSIX_FADV_DONTNEED) system call or by the - filesystem explicitly requesting it as nfs and 9fs do (when - they believe the cache may be out of date with storage) by - calling invalidate_inode_pages2(). - If the filesystem makes such a call, and needs to be certain - that all pages are invalidated, then its releasepage will - need to ensure this. 
Possibly it can clear the PageUptodate - bit if it cannot free private data yet. + will be called when part or all of the page is to be removed from the + address space. This generally corresponds to either a truncation, + punch hole or a complete invalidation of the address space (in the + latter case 'offset' will always be 0 and 'length' will be PAGE_SIZE). + Any private data associated with the page should be updated to reflect + this truncation. If offset is 0 and length is PAGE_SIZE, then the + private data should be released, because the page must be able to be + completely discarded. This may be done by calling the ->releasepage + function, but in this case the release MUST succeed. + +- ``releasepage``: releasepage is called on PagePrivate pages to + indicate that the page should be freed if possible. ->releasepage + should remove any private data from the page and clear the PagePrivate + flag. If releasepage() fails for some reason, it must indicate + failure with a 0 return value. releasepage() is used in two distinct + though related cases. The first is when the VM finds a clean page + with no active users and wants to make it a free page. If + ->releasepage succeeds, the page will be removed from the + address_space and become free. + + The second case is when a request has been made to invalidate some or + all pages in an address_space. This can happen through the + fadvise(POSIX_FADV_DONTNEED) system call or by the filesystem + explicitly requesting it as nfs and 9fs do (when they believe the + cache may be out of date with storage) by calling + invalidate_inode_pages2(). If the filesystem makes such a call, and + needs to be certain that all pages are invalidated, then its + releasepage will need to ensure this. Possibly it can clear the + PageUptodate bit if it cannot free private data yet. - ``freepage``: freepage is called once the page is no longer visible in - the page cache in order to allow the cleanup of any private - data. 
Since it may be called by the memory reclaimer, it - should not assume that the original address_space mapping still - exists, and it should not block. + the page cache in order to allow the cleanup of any private data. + Since it may be called by the memory reclaimer, it should not assume + that the original address_space mapping still exists, and it should + not block. - ``direct_IO``: called by the generic read/write routines to perform - direct_IO - that is IO requests which bypass the page cache - and transfer data directly between the storage and the - application's address space. - -- ``isolate_page``: Called by the VM when isolating a movable non-lru page. - If page is successfully isolated, VM marks the page as PG_isolated - via __SetPageIsolated. - -- ``migrate_page``: This is used to compact the physical memory usage. - If the VM wants to relocate a page (maybe off a memory card - that is signalling imminent failure) it will pass a new page - and an old page to this function. migrate_page should - transfer any private data across and update any references - that it has to the page. - -- ``putback_page``: Called by the VM when isolated page's migration fails. - -- ``launder_page``: Called before freeing a page - it writes back the dirty page. To - prevent redirtying the page, it is kept locked during the whole - operation. - -- ``is_partially_uptodate``: Called by the VM when reading a file through the - pagecache when the underlying blocksize != pagesize. If the required - block is up to date then the read can complete without needing the IO - to bring the whole page up to date. - -- ``is_dirty_writeback``: Called by the VM when attempting to reclaim a page. - The VM uses dirty and writeback information to determine if it needs - to stall to allow flushers a chance to complete some IO. 
Ordinarily - it can use PageDirty and PageWriteback but some filesystems have - more complex state (unstable pages in NFS prevent reclaim) or - do not set those flags due to locking problems. This callback - allows a filesystem to indicate to the VM if a page should be - treated as dirty or writeback for the purposes of stalling. - -- ``error_remove_page``: normally set to generic_error_remove_page if truncation - is ok for this address space. Used for memory failure handling. - Setting this implies you deal with pages going away under you, - unless you have them locked or reference counts increased. + direct_IO - that is IO requests which bypass the page cache and + transfer data directly between the storage and the application's + address space. + +- ``isolate_page``: Called by the VM when isolating a movable non-lru + page. If page is successfully isolated, VM marks the page as + PG_isolated via __SetPageIsolated. + +- ``migrate_page``: This is used to compact the physical memory usage. + If the VM wants to relocate a page (maybe off a memory card that is + signalling imminent failure) it will pass a new page and an old page + to this function. migrate_page should transfer any private data + across and update any references that it has to the page. + +- ``putback_page``: Called by the VM when isolated page's migration + fails. + +- ``launder_page``: Called before freeing a page - it writes back the + dirty page. To prevent redirtying the page, it is kept locked during + the whole operation. + +- ``is_partially_uptodate``: Called by the VM when reading a file + through the pagecache when the underlying blocksize != pagesize. If + the required block is up to date then the read can complete without + needing the IO to bring the whole page up to date. + +- ``is_dirty_writeback``: Called by the VM when attempting to reclaim a + page. The VM uses dirty and writeback information to determine if it + needs to stall to allow flushers a chance to complete some IO. 
+ Ordinarily it can use PageDirty and PageWriteback but some filesystems + have more complex state (unstable pages in NFS prevent reclaim) or do + not set those flags due to locking problems. This callback allows a + filesystem to indicate to the VM if a page should be treated as dirty + or writeback for the purposes of stalling. + +- ``error_remove_page``: normally set to generic_error_remove_page if + truncation is ok for this address space. Used for memory failure + handling. Setting this implies you deal with pages going away under + you, unless you have them locked or reference counts increased. - ``swap_activate``: Called when swapon is used on a file to allocate - space if necessary and pin the block lookup information in - memory. A return value of zero indicates success, - in which case this file can be used to back swapspace. + space if necessary and pin the block lookup information in memory. A + return value of zero indicates success, in which case this file can be + used to back swapspace. -- ``swap_deactivate``: Called during swapoff on files where swap_activate - was successful. +- ``swap_deactivate``: Called during swapoff on files where + swap_activate was successful. The File Object @@ -908,87 +902,93 @@ This describes how the VFS can manipulate an open file. As of kernel Again, all methods are called without any locks being held, unless otherwise noted. -- ``llseek``: called when the VFS needs to move the file position index +- ``llseek``: called when the VFS needs to move the file position index. -- ``read``: called by read(2) and related system calls +- ``read``: called by read(2) and related system calls. -- ``read_iter``: possibly asynchronous read with iov_iter as destination +- ``read_iter``: possibly asynchronous read with iov_iter as + destination. -- ``write``: called by write(2) and related system calls +- ``write``: called by write(2) and related system calls. 
-- ``write_iter``: possibly asynchronous write with iov_iter as source +- ``write_iter``: possibly asynchronous write with iov_iter as source. -- ``iopoll``: called when aio wants to poll for completions on HIPRI iocbs +- ``iopoll``: called when aio wants to poll for completions on HIPRI + iocbs. -- ``iterate_shared``: called when the VFS needs to read the directory contents - when filesystem supports concurrent dir iterators +- ``iterate``: called when the VFS needs to read the directory contents. + +- ``iterate_shared``: called when the VFS needs to read the directory + contents when filesystem supports concurrent dir iterators. - ``poll``: called by the VFS when a process wants to check if there is - activity on this file and (optionally) go to sleep until there - is activity. Called by the select(2) and poll(2) system calls + activity on this file and (optionally) go to sleep until there is + activity. Called by the select(2) and poll(2) system calls. - ``unlocked_ioctl``: called by the ioctl(2) system call. -- ``compat_ioctl``: called by the ioctl(2) system call when 32 bit system calls - are used on 64 bit kernels. +- ``compat_ioctl``: called by the ioctl(2) system call when 32 bit + system calls are used on 64 bit kernels. -- ``mmap``: called by the mmap(2) system call +- ``mmap``: called by the mmap(2) system call. -- ``open``: called by the VFS when an inode should be opened. When the VFS - opens a file, it creates a new "struct file". It then calls the - open method for the newly allocated file structure. You might - think that the open method really belongs in - "struct inode_operations", and you may be right. I think it's - done the way it is because it makes filesystems simpler to - implement. The open() method is a good place to initialize the - "private_data" member in the file structure if you want to point - to a device structure +- ``open``: called by the VFS when an inode should be opened. 
When the + VFS opens a file, it creates a new "struct file". It then calls the + open method for the newly allocated file structure. You might think + that the open method really belongs in "struct inode_operations", and + you may be right. I think it's done the way it is because it makes + filesystems simpler to implement. The open() method is a good place + to initialize the "private_data" member in the file structure if you + want to point to a device structure. -- ``flush``: called by the close(2) system call to flush a file +- ``flush``: called by the close(2) system call to flush a file. -- ``release``: called when the last reference to an open file is closed +- ``release``: called when the last reference to an open file is closed. -- ``fsync``: called by the fsync(2) system call. Also see the section above - entitled "Handling errors during writeback". +- ``fsync``: called by the fsync(2) system call. Also see the section + above entitled "Handling errors during writeback". - ``fasync``: called by the fcntl(2) system call when asynchronous - (non-blocking) mode is enabled for a file + (non-blocking) mode is enabled for a file. -- ``lock``: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW - commands +- ``lock``: called by the fcntl(2) system call for F_GETLK, F_SETLK, and + F_SETLKW commands. -- ``get_unmapped_area``: called by the mmap(2) system call +- ``get_unmapped_area``: called by the mmap(2) system call. -- ``check_flags``: called by the fcntl(2) system call for F_SETFL command +- ``check_flags``: called by the fcntl(2) system call for F_SETFL + command. -- ``flock``: called by the flock(2) system call +- ``flock``: called by the flock(2) system call. -- ``splice_write``: called by the VFS to splice data from a pipe to a file. This - method is used by the splice(2) system call +- ``splice_write``: called by the VFS to splice data from a pipe to a + file. This method is used by the splice(2) system call. 
-- ``splice_read``: called by the VFS to splice data from file to a pipe. This - method is used by the splice(2) system call +- ``splice_read``: called by the VFS to splice data from file to a pipe. + This method is used by the splice(2) system call -- ``setlease``: called by the VFS to set or release a file lock lease. setlease - implementations should call generic_setlease to record or remove - the lease in the inode after setting it. +- ``setlease``: called by the VFS to set or release a file lock lease. + setlease implementations should call generic_setlease to record or + remove the lease in the inode after setting it. -- ``fallocate``: called by the VFS to preallocate blocks or punch a hole. +- ``fallocate``: called by the VFS to preallocate blocks or punch a + hole. - ``copy_file_range``: called by the copy_file_range(2) system call. -- ``remap_file_range``: called by the ioctl(2) system call for FICLONERANGE and - FICLONE and FIDEDUPERANGE commands to remap file ranges. An - implementation should remap len bytes at pos_in of the source file into - the dest file at pos_out. Implementations must handle callers passing - in len == 0; this means "remap to the end of the source file". The - return value should the number of bytes remapped, or the usual - negative error code if errors occurred before any bytes were remapped. - The remap_flags parameter accepts REMAP_FILE_* flags. If - REMAP_FILE_DEDUP is set then the implementation must only remap if the - requested file ranges have identical contents. If REMAP_CAN_SHORTEN is - set, the caller is ok with the implementation shortening the request - length to satisfy alignment or EOF requirements (or any other reason). +- ``remap_file_range``: called by the ioctl(2) system call for + FICLONERANGE and FICLONE and FIDEDUPERANGE commands to remap file + ranges. An implementation should remap len bytes at pos_in of the + source file into the dest file at pos_out. 
Implementations must + handle callers passing in len == 0; this means "remap to the end of + the source file". The return value should the number of bytes + remapped, or the usual negative error code if errors occurred before + any bytes were remapped. The remap_flags parameter accepts + REMAP_FILE_* flags. If REMAP_FILE_DEDUP is set then the + implementation must only remap if the requested file ranges have + identical contents. If REMAP_CAN_SHORTEN is set, the caller is ok + with the implementation shortening the request length to satisfy + alignment or EOF requirements (or any other reason). - ``fadvise``: possibly called by the fadvise64() system call. @@ -1035,146 +1035,147 @@ defined: struct dentry *(*d_real)(struct dentry *, const struct inode *); }; -- ``d_revalidate``: called when the VFS needs to revalidate a dentry. This - is called whenever a name look-up finds a dentry in the - dcache. Most local filesystems leave this as NULL, because all their - dentries in the dcache are valid. Network filesystems are different - since things can change on the server without the client necessarily - being aware of it. +- ``d_revalidate``: called when the VFS needs to revalidate a dentry. + This is called whenever a name look-up finds a dentry in the dcache. + Most local filesystems leave this as NULL, because all their dentries + in the dcache are valid. Network filesystems are different since + things can change on the server without the client necessarily being + aware of it. - This function should return a positive value if the dentry is still - valid, and zero or a negative error code if it isn't. + This function should return a positive value if the dentry is still + valid, and zero or a negative error code if it isn't. - d_revalidate may be called in rcu-walk mode (flags & LOOKUP_RCU). 
- If in rcu-walk mode, the filesystem must revalidate the dentry without - blocking or storing to the dentry, d_parent and d_inode should not be - used without care (because they can change and, in d_inode case, even - become NULL under us). + d_revalidate may be called in rcu-walk mode (flags & LOOKUP_RCU). If + in rcu-walk mode, the filesystem must revalidate the dentry without + blocking or storing to the dentry, d_parent and d_inode should not be + used without care (because they can change and, in d_inode case, even + become NULL under us). - If a situation is encountered that rcu-walk cannot handle, return - -ECHILD and it will be called again in ref-walk mode. + If a situation is encountered that rcu-walk cannot handle, return + -ECHILD and it will be called again in ref-walk mode. -- ``d_weak_revalidate``: called when the VFS needs to revalidate a "jumped" dentry. - This is called when a path-walk ends at dentry that was not acquired by - doing a lookup in the parent directory. This includes "/", "." and "..", - as well as procfs-style symlinks and mountpoint traversal. +- ``d_weak_revalidate``: called when the VFS needs to revalidate a + "jumped" dentry. This is called when a path-walk ends at dentry that + was not acquired by doing a lookup in the parent directory. This + includes "/", "." and "..", as well as procfs-style symlinks and + mountpoint traversal. - In this case, we are less concerned with whether the dentry is still - fully correct, but rather that the inode is still valid. As with - d_revalidate, most local filesystems will set this to NULL since their - dcache entries are always valid. + In this case, we are less concerned with whether the dentry is still + fully correct, but rather that the inode is still valid. As with + d_revalidate, most local filesystems will set this to NULL since their + dcache entries are always valid. - This function has the same return code semantics as d_revalidate. 
+ This function has the same return code semantics as d_revalidate. - d_weak_revalidate is only called after leaving rcu-walk mode. + d_weak_revalidate is only called after leaving rcu-walk mode. -- ``d_hash``: called when the VFS adds a dentry to the hash table. The first - dentry passed to d_hash is the parent directory that the name is - to be hashed into. +- ``d_hash``: called when the VFS adds a dentry to the hash table. The + first dentry passed to d_hash is the parent directory that the name is + to be hashed into. - Same locking and synchronisation rules as d_compare regarding - what is safe to dereference etc. + Same locking and synchronisation rules as d_compare regarding what is + safe to dereference etc. -- ``d_compare``: called to compare a dentry name with a given name. The first - dentry is the parent of the dentry to be compared, the second is - the child dentry. len and name string are properties of the dentry - to be compared. qstr is the name to compare it with. +- ``d_compare``: called to compare a dentry name with a given name. The + first dentry is the parent of the dentry to be compared, the second is + the child dentry. len and name string are properties of the dentry to + be compared. qstr is the name to compare it with. - Must be constant and idempotent, and should not take locks if - possible, and should not or store into the dentry. - Should not dereference pointers outside the dentry without - lots of care (eg. d_parent, d_inode, d_name should not be used). + Must be constant and idempotent, and should not take locks if + possible, and should not or store into the dentry. Should not + dereference pointers outside the dentry without lots of care (eg. + d_parent, d_inode, d_name should not be used). - However, our vfsmount is pinned, and RCU held, so the dentries and - inodes won't disappear, neither will our sb or filesystem module. - ->d_sb may be used. 
+ However, our vfsmount is pinned, and RCU held, so the dentries and + inodes won't disappear, neither will our sb or filesystem module. + ->d_sb may be used. - It is a tricky calling convention because it needs to be called under - "rcu-walk", ie. without any locks or references on things. + It is a tricky calling convention because it needs to be called under + "rcu-walk", ie. without any locks or references on things. -- ``d_delete``: called when the last reference to a dentry is dropped and the - dcache is deciding whether or not to cache it. Return 1 to delete - immediately, or 0 to cache the dentry. Default is NULL which means to - always cache a reachable dentry. d_delete must be constant and - idempotent. +- ``d_delete``: called when the last reference to a dentry is dropped + and the dcache is deciding whether or not to cache it. Return 1 to + delete immediately, or 0 to cache the dentry. Default is NULL which + means to always cache a reachable dentry. d_delete must be constant + and idempotent. -- ``d_init``: called when a dentry is allocated +- ``d_init``: called when a dentry is allocated. -- ``d_release``: called when a dentry is really deallocated +- ``d_release``: called when a dentry is really deallocated. - ``d_iput``: called when a dentry loses its inode (just prior to its - being deallocated). The default when this is NULL is that the - VFS calls iput(). If you define this method, you must call - iput() yourself + being deallocated). The default when this is NULL is that the VFS + calls iput(). If you define this method, you must call iput() + yourself. - ``d_dname``: called when the pathname of a dentry should be generated. - Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay - pathname generation. (Instead of doing it when dentry is created, - it's done only when the path is needed.). Real filesystems probably - dont want to use it, because their dentries are present in global - dcache hash, so their hash should be an invariant. 
As no lock is - held, d_dname() should not try to modify the dentry itself, unless - appropriate SMP safety is used. CAUTION : d_path() logic is quite - tricky. The correct way to return for example "Hello" is to put it - at the end of the buffer, and returns a pointer to the first char. - dynamic_dname() helper function is provided to take care of this. - - .. code-block:: c - - static char *pipefs_dname(struct dentry *dent, char *buffer, int buflen) - { - return dynamic_dname(dentry, buffer, buflen, "pipe:[%lu]", - dentry->d_inode->i_ino); - } - -- ``d_automount``: called when an automount dentry is to be traversed (optional). - This should create a new VFS mount record and return the record to the - caller. The caller is supplied with a path parameter giving the - automount directory to describe the automount target and the parent - VFS mount record to provide inheritable mount parameters. NULL should - be returned if someone else managed to make the automount first. If - the vfsmount creation failed, then an error code should be returned. - If -EISDIR is returned, then the directory will be treated as an - ordinary directory and returned to pathwalk to continue walking. - - If a vfsmount is returned, the caller will attempt to mount it on the - mountpoint and will remove the vfsmount from its expiration list in - the case of failure. The vfsmount should be returned with 2 refs on - it to prevent automatic expiration - the caller will clean up the - additional ref. - - This function is only used if DCACHE_NEED_AUTOMOUNT is set on the - dentry. This is set by __d_instantiate() if S_AUTOMOUNT is set on the - inode being added. - -- ``d_manage``: called to allow the filesystem to manage the transition from a - dentry (optional). This allows autofs, for example, to hold up clients - waiting to explore behind a 'mountpoint' while letting the daemon go - past and construct the subtree there. 0 should be returned to let the - calling process continue. 
-EISDIR can be returned to tell pathwalk to - use this directory as an ordinary directory and to ignore anything - mounted on it and not to check the automount flag. Any other error - code will abort pathwalk completely. - - If the 'rcu_walk' parameter is true, then the caller is doing a - pathwalk in RCU-walk mode. Sleeping is not permitted in this mode, - and the caller can be asked to leave it and call again by returning - -ECHILD. -EISDIR may also be returned to tell pathwalk to - ignore d_automount or any mounts. - - This function is only used if DCACHE_MANAGE_TRANSIT is set on the - dentry being transited from. - -- ``d_real``: overlay/union type filesystems implement this method to return one of - the underlying dentries hidden by the overlay. It is used in two - different modes: - - Called from file_dentry() it returns the real dentry matching the inode - argument. The real dentry may be from a lower layer already copied up, - but still referenced from the file. This mode is selected with a - non-NULL inode argument. - - With NULL inode the topmost real underlying dentry is returned. + Useful for some pseudo filesystems (sockfs, pipefs, ...) to delay + pathname generation. (Instead of doing it when dentry is created, + it's done only when the path is needed.). Real filesystems probably + dont want to use it, because their dentries are present in global + dcache hash, so their hash should be an invariant. As no lock is + held, d_dname() should not try to modify the dentry itself, unless + appropriate SMP safety is used. CAUTION : d_path() logic is quite + tricky. The correct way to return for example "Hello" is to put it at + the end of the buffer, and returns a pointer to the first char. + dynamic_dname() helper function is provided to take care of this. + + .. 
code-block:: c + + static char *pipefs_dname(struct dentry *dent, char *buffer, int buflen) + { + return dynamic_dname(dentry, buffer, buflen, "pipe:[%lu]", + dentry->d_inode->i_ino); + } + +- ``d_automount``: called when an automount dentry is to be traversed + (optional). This should create a new VFS mount record and return the + record to the caller. The caller is supplied with a path parameter + giving the automount directory to describe the automount target and + the parent VFS mount record to provide inheritable mount parameters. + NULL should be returned if someone else managed to make the automount + first. If the vfsmount creation failed, then an error code should be + returned. If -EISDIR is returned, then the directory will be treated + as an ordinary directory and returned to pathwalk to continue walking. + + If a vfsmount is returned, the caller will attempt to mount it on the + mountpoint and will remove the vfsmount from its expiration list in + the case of failure. The vfsmount should be returned with 2 refs on + it to prevent automatic expiration - the caller will clean up the + additional ref. + + This function is only used if DCACHE_NEED_AUTOMOUNT is set on the + dentry. This is set by __d_instantiate() if S_AUTOMOUNT is set on the + inode being added. + +- ``d_manage``: called to allow the filesystem to manage the transition + from a dentry (optional). This allows autofs, for example, to hold up + clients waiting to explore behind a 'mountpoint' while letting the + daemon go past and construct the subtree there. 0 should be returned + to let the calling process continue. -EISDIR can be returned to tell + pathwalk to use this directory as an ordinary directory and to ignore + anything mounted on it and not to check the automount flag. Any other + error code will abort pathwalk completely. + + If the 'rcu_walk' parameter is true, then the caller is doing a + pathwalk in RCU-walk mode. 
Sleeping is not permitted in this mode, + and the caller can be asked to leave it and call again by returning + -ECHILD. -EISDIR may also be returned to tell pathwalk to ignore + d_automount or any mounts. + + This function is only used if DCACHE_MANAGE_TRANSIT is set on the + dentry being transited from. + +- ``d_real``: overlay/union type filesystems implement this method to + return one of the underlying dentries hidden by the overlay. It is + used in two different modes: + + Called from file_dentry() it returns the real dentry matching the + inode argument. The real dentry may be from a lower layer already + copied up, but still referenced from the file. This mode is selected + with a non-NULL inode argument. + + With NULL inode the topmost real underlying dentry is returned. Each dentry has a pointer to its parent dentry, as well as a hash list of child dentries. Child dentries are basically like files in a @@ -1188,39 +1189,38 @@ There are a number of functions defined which permit a filesystem to manipulate dentries: - ``dget``: open a new handle for an existing dentry (this just increments - the usage count) + the usage count). - ``dput``: close a handle for a dentry (decrements the usage count). If - the usage count drops to 0, and the dentry is still in its - parent's hash, the "d_delete" method is called to check whether - it should be cached. If it should not be cached, or if the dentry - is not hashed, it is deleted. Otherwise cached dentries are put - into an LRU list to be reclaimed on memory shortage. + the usage count drops to 0, and the dentry is still in its parent's hash, + the "d_delete" method is called to check whether it should be cached. If + it should not be cached, or if the dentry is not hashed, it is deleted. + Otherwise cached dentries are put into an LRU list to be reclaimed on + memory shortage. - ``d_drop``: this unhashes a dentry from its parents hash list. 
A - subsequent call to dput() will deallocate the dentry if its - usage count drops to 0 + subsequent call to dput() will deallocate the dentry if its usage count + drops to 0. - ``d_delete``: delete a dentry. If there are no other open references to - the dentry then the dentry is turned into a negative dentry - (the d_iput() method is called). If there are other - references, then d_drop() is called instead + the dentry then the dentry is turned into a negative dentry (the d_iput() + method is called). If there are other references, then d_drop() is + called instead. - ``d_add``: add a dentry to its parents hash list and then calls - d_instantiate() + d_instantiate(). - ``d_instantiate``: add a dentry to the alias hash list for the inode and - updates the "d_inode" member. The "i_count" member in the - inode structure should be set/incremented. If the inode - pointer is NULL, the dentry is called a "negative - dentry". This function is commonly called when an inode is - created for an existing negative dentry + updates the "d_inode" member. The "i_count" member in the inode + structure should be set/incremented. If the inode pointer is NULL, the + dentry is called a "negative dentry". This function is commonly called + when an inode is created for an existing negative dentry. -- ``d_lookup``: look up a dentry given its parent and path name component - It looks up the child of that given name from the dcache - hash table. If it is found, the reference count is incremented - and the dentry is returned. The caller must use dput() - to free the dentry when it finishes using it. +- ``d_lookup``: look up a dentry given its parent and path name component. + It looks up the child of that given name from the dcache hash table. If + it is found, the reference count is incremented and the dentry is + returned. The caller must use dput() to free the dentry when it finishes + using it. 
Mount Options @@ -1234,8 +1234,8 @@ On mount and remount the filesystem is passed a string containing a comma separated list of mount options. The options can have either of these forms: - option - option=value +- option +- option=value The <linux/parser.h> header defines an API that helps parse these options. There are plenty of examples on how to use it in existing @@ -1248,11 +1248,11 @@ Showing options If a filesystem accepts mount options, it must define show_options() to show all the currently active options. The rules are: - - options MUST be shown which are not default or their values differ - from the default +- options MUST be shown which are not default or their values differ from + the default. - - options MAY be shown which are enabled by default or have their - default value +- options MAY be shown which are enabled by default or have their default + value. Options used only internally between a mount helper and the kernel (such as file descriptors), or which only have an effect during the mounting -- 2.21.0
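Reviewer illustration, not part of the patch: the "Mount Options" hunk above lists the two accepted forms, ``option`` and ``option=value``, in a comma separated string. A minimal userspace sketch of splitting such a string is shown below. It deliberately does not use the ``<linux/parser.h>`` API that the document refers to; ``struct mount_opt`` and ``split_mount_opts()`` are hypothetical names introduced only for this example, and silently skipping empty items is an assumption rather than documented kernel behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for one parsed option; not a kernel type. */
struct mount_opt {
	const char *name;	/* option name */
	const char *value;	/* text after '=', or NULL for a bare option */
};

/*
 * Split a comma separated option string in place (the buffer is modified:
 * ',' and the first '=' of each item are overwritten with NUL bytes).
 * Empty items such as the one in "a,,b" are skipped (an assumption).
 * Returns the number of entries stored in opts, at most max.
 */
static int split_mount_opts(char *s, struct mount_opt *opts, int max)
{
	int n = 0;

	assert(opts != NULL && max >= 0);

	while (s && *s && n < max) {
		char *next = strchr(s, ',');
		char *eq;

		if (next)
			*next++ = '\0';		/* terminate this item */

		if (*s) {
			eq = strchr(s, '=');
			if (eq)
				*eq++ = '\0';	/* split "option=value" */
			opts[n].name = s;
			opts[n].value = eq;	/* NULL for a bare "option" */
			n++;
		}
		s = next;
	}
	return n;
}
```

Given a writable buffer holding ``ro,uid=1000,,noatime``, this sketch yields three entries: a bare ``ro``, ``uid`` with value ``1000``, and a bare ``noatime``. A real filesystem would more likely build on the helpers in ``<linux/parser.h>``, as the document suggests.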