[Finally getting back to this, now split into two patches]

Various updates to chapters 1,2,4 and 5 of the XFS User Guide. Fixed various spelling/grammar mistakes, updated outdated and/or incorrect facts, added some new slides for delayed allocation and direct i/o.

Lachlan

diff --git a/XFS_User_Guide/en-US/XFS-Background.xml b/XFS_User_Guide/en-US/XFS-Background.xml
index e20f6e0..b610f60 100644
--- a/XFS_User_Guide/en-US/XFS-Background.xml
+++ b/XFS_User_Guide/en-US/XFS-Background.xml
@@ -195,12 +195,12 @@
 </listitem>
 <listitem>
 <para>
- Large filesystems: one terabyte, 2<superscript>40</superscript>, on 32 bit systems; unlimited on 64 bit systems
+ Large files: up to 9 ExaBytes (16TB on 32-bit systems).
 </para>
 </listitem>
 <listitem>
 <para>
- Large files: one terabyte, 2<superscript>40</superscript>, on 32 bit systems; 2<superscript>63</superscript> on 64 bit systems
+ Large filesystems: up to 18 ExaBytes (16TB on 32-bit systems).
 </para>
 </listitem>
 <listitem>
@@ -220,7 +220,7 @@
 </listitem>
 <listitem>
 <para>
- Parallel access to inodes
+ Parallelized access via allocation groups
 </para>
 </listitem>
 <listitem>
@@ -230,17 +230,32 @@
 </listitem>
 <listitem>
 <para>
- Asynchronous metadata transaction logging for quick recover
+ Asynchronous metadata transaction logging for quick recovery
 </para>
 </listitem>
 <listitem>
 <para>
- Delayed allocation to improve data contiguity
+ Delayed allocation to improve data contiguity and performance
 </para>
 </listitem>
 <listitem>
 <para>
- ACL's --Access Control Lists (see <command>chacl(1)</command>, <command>acl(4)</command>, <command>acl_get_file(3c)</command>, <command>acl_set_file(3c)</command>
+ Extended attributes (such as Access Control Lists)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Extent based allocation (including unwritten extents)
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Variable allocation block sizes
+ </para>
+ </listitem>
+ <listitem>
+ <para>
+ Direct I/O
 </para>
 </listitem>
 </itemizedlist>
@@ -264,9 +279,6 @@
 <para>
 2.4 kernel (2004)
 </para>
- <para>
- SLES9 and beyond
- </para>
 </listitem>
 </itemizedlist>
 <para>
@@ -286,19 +298,30 @@
 <section>
 <title>Who is using XFS</title>
 <para>
- <ulink url="http://xfs.org/index.php/XFS_Companies" />
- <ulink url="http://xfs.org/index.php/Linux_Distributions_shipping_XFS" />
+ XFS is included in a number of distributions:
 </para>
 <itemizedlist>
 <listitem>
 <para>
+ <ulink url="http://xfs.org/index.php/Linux_Distributions_shipping_XFS" />
+ </para>
+ </listitem>
+ <listitem>
+ <para>
 List is not always current, but it gives an indication of the spread of users
 </para>
 </listitem>
 </itemizedlist>
 <para>
- XFS is included in a number of distributions
+ Some of the companies using XFS:
 </para>
+ <itemizedlist>
+ <listitem>
+ <para>
+ <ulink url="http://xfs.org/index.php/XFS_Companies" />
+ </para>
+ </listitem>
+ </itemizedlist>
 </section>
 <section>
 <title>XFS Distributions – kernelspace</title>

diff --git a/XFS_User_Guide/en-US/XFS-Overview.xml b/XFS_User_Guide/en-US/XFS-Overview.xml
index 1762b39..afb84e1 100644
--- a/XFS_User_Guide/en-US/XFS-Overview.xml
+++ b/XFS_User_Guide/en-US/XFS-Overview.xml
@@ -50,9 +50,9 @@
 <title>Filesystem Block Size (FSB)</title>
 <para>Filesystem blocks (FSBs) are the unit of space for a filesystem</para>
 <itemizedlist>
- <listitem><para>Filesystem blocks are comprised of one or more device-level sectors.</para></listitem>
+ <listitem><para>Filesystem blocks are composed of one or more device-level sectors.</para></listitem>
 </itemizedlist>
- <para>The page management implementation in Linux limits the FSB size to the page size</para>
+ <para>The page management implementation in Linux limits the maximum FSB size to the page size</para>
 <itemizedlist>
 <listitem><para>4KB on ia32 and x86_64 architectures</para></listitem>
 <listitem><para>16KB on ia64</para></listitem>
@@ -72,7 +72,13 @@
 <itemizedlist>
 <listitem><para>For very large files, the file’s inode may have thousands of extents, or one very large extent. Usually something in between.</para></listitem>
 </itemizedlist>
- <para>Extents are also used for file and directory metadata when the information exceeds the space reserved for an inode</para>
+ <para>Extents are used for files, directory metadata and extended attributes when the information exceeds the space reserved in the inode</para>
+ <para>Using extents helps to</para>
+ <itemizedlist>
+ <listitem><para>minimize the disk space required to store a file's block map</para></listitem>
+ <listitem><para>reduce the effects of fragmentation</para></listitem>
+ <listitem><para>improve I/O performance by allowing fewer and larger I/O operations</para></listitem>
+ </itemizedlist>
 </section>
 <section>
 <title>Unwritten Extents</title>
@@ -87,11 +93,10 @@
 </itemizedlist>
 </para></listitem>
 <listitem><para><command>fallocate(1)</command> on newer glibc versions</para></listitem>
- <listitem><para>Through direct IOs of specific (un)alignment.</para></listitem>
+ <listitem><para>Through direct IOs of specific alignment (such as stripe boundaries)</para></listitem>
 </itemizedlist>
- <para>They are a security measure, to ensure allocated but not yet initialised space
- ondisk is not visible to arbitrary users</para>
 <para>Unwritten extents apply only to regular files.</para>
+ <para>The unwritten state prevents the uninitialised data in the extent from being exposed to the user.</para>
 <para>Once such an extent is written to, or partially written to, a transaction is issued to convert the written part into a regular written extent, and mark the remaining (up to 2) extents as unwritten.</para>
@@ -99,20 +104,66 @@
 <para><command># xfs_io -f -c 'resvsp 0 10m' -c 'bmap -vp' /tmp/foo</command></para>
 </section>
 <section>
+ <title>Delayed Allocation</title>
+ <para>Delayed allocation splits file block allocation into two stages:</para>
+ <itemizedlist>
+ <listitem><para>Reservation - disk space is reserved (but not allocated) when writing to cache
+ <itemizedlist>
+ <listitem><para>decrements free block count</para></listitem>
+ <listitem><para>creates a virtual 'delalloc' extent</para></listitem>
+ </itemizedlist>
+ </para></listitem>
+ <listitem><para>Allocation - disk blocks are allocated when flushing data from cache to disk
+ <itemizedlist>
+ <listitem><para>converts 'delalloc' extent to real extent</para></listitem>
+ </itemizedlist>
+ </para></listitem>
+ </itemizedlist>
+ <para>Benefits of delayed allocation</para>
+ <itemizedlist>
+ <listitem><para>Fragmentation is reduced by combining writes and allocating extents in large chunks</para></listitem>
+ <listitem><para>Short lived files may never need to be allocated</para></listitem>
+ <listitem><para>Files written randomly (such as those that are memory mapped) can now be allocated contiguously</para></listitem>
+ </itemizedlist>
+ </section>
+ <section>
+ <title>Direct I/O</title>
+ <para>Direct I/O allows an application to transfer data directly to disk from an application buffer and vice versa.</para>
+ <itemizedlist>
+ <listitem><para>Data does not pass through the filesystem cache</para></listitem>
+ <listitem><para>Data is transferred by DMA and does not involve CPU overhead</para></listitem>
+ <listitem><para>Synchronous I/O</para></listitem>
+ <listitem><para>XFS allows for parallel writes to same file</para></listitem>
+ </itemizedlist>
+ <para>Uses of direct I/O</para>
+ <itemizedlist>
+ <listitem><para>Backup programs, so that they can work without polluting the page cache</para></listitem>
+ <listitem><para>Applications that need 'intelligent' caching</para></listitem>
+ <listitem><para>High performance, bandwidth intensive workloads</para></listitem>
+ </itemizedlist>
+ </section>
+ <section>
+ <title>Stripe Alignment</title>
+ <para>Delayed allocations can be aligned to stripe unit/width boundaries if past eof</para>
+ <para>Direct I/O can align block allocations on stripe unit/width boundaries</para>
+ </section>
+ <section>
 <title>Inodes</title>
 <para>XFS has three inode structures</para>
- <para>Ondisk inode</para>
+ <para>XFS inode</para>
 <itemizedlist>
- <listitem><para>Used for storing the metadata for all files, directories and other file types</para></listitem>
- <listitem><para>By default 256 bytes but can be up to 2KiB</para></listitem>
+ <listitem><para>In-memory XFS inode used only by the filesystem</para></listitem>
 </itemizedlist>
- <para> Linux inode</para>
+ <para>Ondisk inode</para>
 <itemizedlist>
- <listitem><para>xfs_inode_t has the Linux inode embedded in it</para></listitem>
+ <listitem><para>Used for storing the metadata for files, directories and other file types</para></listitem>
+ <listitem><para>Default size is 256 bytes and can be up to 2KB</para></listitem>
+ <listitem><para>Embedded within the XFS inode</para></listitem>
 </itemizedlist>
- <para>XFS inode</para>
+ <para> Linux inode</para>
 <itemizedlist>
- <listitem><para>xfs_inode contains the ondisk inode structure in memory</para></listitem>
+ <listitem><para>Generic inode structure used by VFS</para></listitem>
+ <listitem><para>Embedded within the XFS inode</para></listitem>
 </itemizedlist>
 </section>
 <section>
@@ -123,16 +174,19 @@
 </section>
 <section>
 <title>Journal Log</title>
- <para>XFS Journal logs all metadata transactions</para>
+ <para>XFS Journal logs all metadata changes</para>
 <itemizedlist>
- <listitem><para>No record of data, only that the file size had changed</para></listitem>
+ <listitem><para>Only filesystem metadata is logged, not user data</para></listitem>
 </itemizedlist>
- <para>Allows the filesystem to replay and recover the filesystem in seconds</para>
+ <para>Allows the filesystem to replay the log and recover the filesystem quickly after a crash</para>
 <itemizedlist>
 <listitem><para>No requirement to run fsck</para></listitem>
 </itemizedlist>
- <para>Log replay will apply filesystem and metadata changes that had been
- logged but may not have been applied to the filesystem when it went down</para>
+ <para>Log replay will apply filesystem and metadata changes during a mount that had been
+ logged but may not have yet been applied to the filesystem</para>
 <para>The log may be located on a separate device</para>
+ <itemizedlist>
+ <listitem><para>Can improve performance due to reduced disk contention</para></listitem>
+ </itemizedlist>
 </section>
 </chapter>

diff --git a/XFS_User_Guide/en-US/XFS-allocators.xml b/XFS_User_Guide/en-US/XFS-allocators.xml
index ba5bad0..41ba28a 100644
--- a/XFS_User_Guide/en-US/XFS-allocators.xml
+++ b/XFS_User_Guide/en-US/XFS-allocators.xml
@@ -20,7 +20,6 @@
 <para>New directories are placed in different AGs where possible</para>
 <para>Watch the inode numbers as directory inodes are created:</para>
 <para><programlisting>
-> mkdir a b
 > mkdir a b
 > ls -li
 total 0
 131 drwxr-xr-x 2 sjv users 6 2006-10-20 12:12 a

diff --git a/XFS_User_Guide/en-US/XFS-mkfs.xml b/XFS_User_Guide/en-US/XFS-mkfs.xml
index ce26572..17e8d67 100644
--- a/XFS_User_Guide/en-US/XFS-mkfs.xml
+++ b/XFS_User_Guide/en-US/XFS-mkfs.xml
@@ -5,7 +5,7 @@
 <title>mkfs</title>
 <section>
 <title>Creating XFS Filesystems</title>
- <para>mkfs.xfs supports a large number of options for configuration a large number of different XFS filesystems</para>
+ <para>mkfs.xfs supports a large number of options for configuring many different XFS filesystems</para>
 <itemizedlist>
 <listitem><para>See <command>mkfs.xfs(8)</command></para></listitem>
 </itemizedlist>
@@ -14,15 +14,15 @@
 <listitem><para>100s = 100 sectors = 100 x 512 bytes*</para></listitem>
 <listitem><para>100b = 100 blocks = 100 x 4 kilobytes*</para></listitem>
 <listitem><para>100k = 100 * 1024 bytes</para></listitem>
- <listitem><para>Assuming 512 bytes sectors and 4 KB filesyste</para></listitem>
+ <listitem><para>Assuming 512 bytes sectors and 4 KB filesystem block size</para></listitem>
 </itemizedlist>
 <para>-N option can be used to show filesystem parameters without creating a filesystem</para>
 <para>Also provides the capability to pre-initialise the filesystem with directories and inodes, which is useful for testing</para>
 </section>
 <section>
 <title>mkfs - Allocation Block Size</title>
- <para>Specify the fundamental block size of the filesystem.</para>
- <para>The default value is 4096 bytes (4 KB), the minimum is 512, and the maximum is 65536 (64 KB).</para>
+ <para>Specify the fundamental allocation block size of the filesystem.</para>
+ <para>The default value is 4KB, the minimum is 512 bytes, and the maximum is 64KB</para>
 <para>XFS on Linux currently only supports pagesize or smaller blocks.</para>
 <para>To create a filesystem with a block size of 2048 bytes you would use:</para>
 <para><command>mkfs.xfs -b size=2048 device</command></para>
@@ -30,17 +30,17 @@
 </section>
 <section>
 <title>mkfs - Allocation groups</title>
- <para>The data section of an XFS filesystem is divided into allocation groups</para>
- <para>More allocation groups imply more parallelism when allocation blocks and inodes.</para>
- <para>To create filesystem with 16 allocation groups you would use:</para>
+ <para>An XFS filesystem is divided into allocation groups</para>
+ <para>More allocation groups offer more parallelism when allocating blocks and inodes</para>
+ <para>To create filesystem with 16 allocation groups you would use</para>
 <para><command>mkfs.xfs -d acount=16 device</command></para>
- <para>To create a filesystem with fixed size allocation groups</para>
+ <para>To create a filesystem with a specific size for the allocation groups</para>
 <para><command>mkfs.xfs -d agsize=4g device</command></para>
 <para><emphasis>Filesystems with too few or too many allocation groups should be avoided.</emphasis></para>
 </section>
 <section>
 <title>mkfs - Stripe Alignment</title>
- <para>Aligning file data on stripe width boundaries can significantly improve performance on large RAIDs</para>
+ <para>Aligning file data I/Os on stripe width boundaries can significantly improve performance on large RAIDs</para>
 <itemizedlist>
 <listitem><para>A 2MB write to filesystem with a 2MB stripe width and 512KB stripe unit will result in four I/Os, one to each lun. Without alignment this would often require two I/Os to one disk,
@@ -74,28 +74,33 @@
 <section>
 <title>mkfs - Extended Attributes</title>
 <para>Specify the version of extended attribute inline allocation policy to be used.</para>
- <para>Default is zero, when extended attributes are used for the first time the version
- will be set to either one or two.</para>
- <para>Version two uses a more efficient algorithm for managing the available inline inode space
- than version one, however, for backward compatibility, version one is selected by default.</para>
+ <itemizedlist>
+ <listitem><para>Version 1 has fixed regions for attribute and extent data</para></listitem>
+ <listitem><para>Version 2 (default) uses a more efficient algorithm for managing the available inline inode space</para></listitem>
+ <listitem><para>Version 1 inodes are automatically converted to version 2 on the fly</para></listitem>
+ </itemizedlist>
 <para>To force the use of version two extended attributes you would use:</para>
 <para><command>mkfs.xfs -i attr=2 device</command></para>
 </section>
 <section>
 <title>mkfs - Naming options</title>
 <para>Specify the version and size parameters for the naming (directory) area of the filesystem.</para>
- <para>The naming (directory) version is 1 or 2, defaulting to 2 if unspecified.</para>
- <para>XFS on Linux does not support naming (directory) version 1.</para>
+ <itemizedlist>
+ <listitem><para>The naming (directory) version is either 2 (default) or ci (implies version 2).</para></listitem>
+ <listitem><para>Version 2 allows for the directory block size to be a multiple of the allocation block size</para></listitem>
+ <listitem><para>Version ci supports ASCII-only case-insensitive naming</para></listitem>
+ <listitem><para>XFS on Linux does not support naming (directory) version 1.</para></listitem>
+ </itemizedlist>
 <para>To create a filesystem with a directory block size of 16KB you would use:</para>
 <para><command>mkfs.xfs -n size=16384 device</command></para>
- <para>ASCII-only case-insensitive naming is also supported:</para>
+ <para>ASCII-only case-insensitive naming:</para>
 <para><command>mkfs.xfs -n version=ci device</command></para>
 </section>
 <section>
 <title>mkfs - External Log</title>
 <para>The journal log can be on a different device to the rest of the filesystem</para>
 <itemizedlist>
- <listitem><para>At least 512 blocks.</para></listitem>
+ <listitem><para>At least 512 filesystem blocks</para></listitem>
 <listitem><para>No more than 64K blocks or 128MB, whichever is smaller</para></listitem>
 <listitem><para>Defaults to maximum size for >1TB filesystems</para></listitem>
 </itemizedlist>
@@ -105,7 +110,6 @@
 </itemizedlist>
 <para><command>mkfs.xfs -l logdev=log_device device</command></para>
 <para><command>mount -o logdev=log_device device path</command></para>
- <para>XXX Image goes here</para>
 </section>
 <section>
 <title>mkfs - Realtime</title>
@@ -123,7 +127,6 @@
 <para>Receives limited testing and support in Linux</para>
 <note><para>Filesystems created with a realtime subvolume can only be mounted on kernels with CONFIG_XFS_RT enabled</para></note>
- <para>XXX Image goes here</para>
 </section>
 <section>
 <title>mkfs - Filesystem Image</title>

diff --git a/XFS_User_Guide/en-US/XFS-mount.xml b/XFS_User_Guide/en-US/XFS-mount.xml
index e175f95..75b36b4 100644
--- a/XFS_User_Guide/en-US/XFS-mount.xml
+++ b/XFS_User_Guide/en-US/XFS-mount.xml
@@ -30,32 +30,31 @@
 <para><command>mount -o logdev=log_device,rtdev=rt_device device mountpoint</command></para>
 </section>
 <section>
- <title>Mount Options - 64bit Inodes</title>
- <para>By default XFS uses 32bit inodes</para>
- <itemizedlist>
- <listitem><para>The inode’s number roughly equates to its location on disk
+ <title>Mount Options - 32 or 64 bit Inodes</title>
+ <para>The inode's number is derived from its location on disk
 <itemizedlist>
 <listitem><para>Combination of allocation group, cluster and block</para></listitem>
 </itemizedlist>
- </para></listitem>
- <listitem><para>Inode on Linux is 32bit on 32bit machines
- <itemizedlist>
- <listitem><para>May change in future kernels</para></listitem>
- </itemizedlist>
- </para></listitem>
- <listitem><para>Allocator will place 32bit inodes in the first terabyte
+ </para>
+ <para>32 bit inodes (default):</para>
+ <itemizedlist>
+ <listitem><para>Allocator can only place 32bit inodes in the first terabyte
 <itemizedlist>
 <listitem><para>Using a larger inode size means less inodes per cluster allowing 32bit inodes to be located beyond the first terabyte</para></listitem>
 </itemizedlist>
 </para></listitem>
+ <listitem><para>Allocator will rotate data extents across allocations groups to leave room for inodes</para></listitem>
 </itemizedlist>
- <para>inode64 option on 64bit machines allows inodes to span the entire filesystem</para>
+ <para>64 bit inodes:</para>
 <itemizedlist>
+ <listitem><para>Only available on 64bit machines</para></listitem>
+ <listitem><para>Use inode64 mount option to enable</para></listitem>
+ <listitem><para>Allows inodes to span the entire filesystem</para></listitem>
 <listitem><para>Allocator will try to put file extents in same allocation group as inode</para></listitem>
 <listitem><para>Not all backup tools support 64bit inodes
 <itemizedlist>
- <listitem><para>Inode number used to identify file between backups</para></listitem>
+ <listitem><para>Inode number used to uniquely identify files in backups</para></listitem>
 </itemizedlist>
 </para></listitem>
 </itemizedlist>
@@ -66,7 +65,7 @@
 <para>Values must be specified in 512-byte block units.</para>
 <para>For example, to use a stripe unit of 1MB and a stripe width of 8MB:</para>
 <para><command>mount -o sunit=2048,swidth=16384 device mountpoint</command></para>
- <para><command>swalloc</command> option</para>
+ <para><command>swalloc</command> mount option</para>
 <itemizedlist>
 <listitem><para>data allocations will be rounded up to stripe width boundaries when the current end of file is being extended and the file size is larger than the
@@ -75,6 +74,7 @@
 </section>
 <section>
 <title>Mount Options - Large I/O</title>
+ <para>These mount options affect the preferred filesystem I/O size reported by <command>stat(2)</command></para>
 <para><command>largeio</command></para>
 <itemizedlist>
 <listitem><para>A filesystem that has a <command>swidth</command> specified will return the
@@ -82,29 +82,30 @@
 <listitem><para>If the filesystem does not have a <command>swidth</command> specified but does specify an <command>allocsize</command> then <command>allocsize</command> (in bytes) will be returned instead.</para></listitem>
- <listitem><para>If neither of these two options are specified, then filesystem will behave as
- if <command>nolargeio</command> was specified.</para></listitem>
 </itemizedlist>
- <para><command>largeio</command></para>
+ <para><command>nolargeio</command> (default)</para>
 <itemizedlist>
- <listitem><para>The optimal I/O reported in <command>st_blksize</command> by <command>stat(2)</command>
- will be as small as possible to allow user applications to avoid inefficient
- read/modify/write I/O.</para></listitem>
+ <listitem><para>The optimal I/O reported in st_blksize will be as small as possible to
+ allow user applications to avoid inefficient read/modify/write I/O.</para></listitem>
 </itemizedlist>
+ <para>If neither of these two options are specified, then the filesystem will behave as if
+ <command>nolargeio</command> was specified.</para>
 </section>
 <section>
 <title>Mount Options - Performance Tweaks</title>
- <para><command>osyncisdsync (default/deprecated)</command></para>
+ <para><command>osyncisdsync</command> (default/deprecated)</para>
 <itemizedlist>
 <listitem><para>Writes to files opened with the O_SYNC flag set will behave as if the O_DSYNC flag had been used instead.</para></listitem>
 <listitem><para>This can result in better performance without compromising data safety.</para></listitem>
 <listitem><para>However timestamp updates from O_SYNC writes can be lost if the system crashes.
- Use osyncisosync to disable this setting.</para></listitem>
+ Use <command>osyncisosync</command> to disable this setting.</para></listitem>
 </itemizedlist>
 <para><command>ikeep</command></para>
 <itemizedlist>
 <listitem><para>When inode clusters are emptied of inodes, keep them around on the disk.</para></listitem>
+ <listitem><para>Use the <command>noikeep</command> option to force empty inode clusters to be returned to
+ the free space pool.</para></listitem>
 </itemizedlist>
 </section>
 <section>
@@ -145,21 +146,24 @@
 </section>
 <section>
 <title>Mount Options - Barriers</title>
- <para>Write cache on disk can result in filesystem corruption since XFS is told the
+ <para>Write caches on disks can result in filesystem corruption since XFS may be told a
 log transaction has reached the disk when in fact it is still in the disk cache.</para>
 <itemizedlist>
 <listitem><para>A journal log assumes that the log transaction is updated on disk before
- the metadata is updated, but this may not be true with write caching</para></listitem>
+ updating the metadata in the filesystem, but this may not be true with
+ write caching</para></listitem>
 </itemizedlist>
- <para>XFS is able to issue write barriers if the device supports it</para>
+ <para>XFS is able to issue write barriers if the underlying devices support it</para>
 <itemizedlist>
 <listitem><para>Ensures that the log entry is written before any other data</para></listitem>
 </itemizedlist>
 <para>Write barriers are enabled by default in XFS</para>
 <itemizedlist>
- <listitem><para>Filesystem will attempt to determine is barriers are supported and will
+ <listitem><para>Filesystem will attempt to determine if barriers are supported and will
 issue a warning to the syslog if they are not</para></listitem>
 <listitem><para>The <command>nobarrier</command> option disables write barriers</para></listitem>
+ <listitem><para>Barriers should be disabled when using a RAID with battery backed controller
+ cache (but only if the individual disk write caches are disabled)</para></listitem>
 </itemizedlist>
 <para>See</para>
 <itemizedlist>
@@ -175,7 +179,8 @@
 <para><command>mount -o grpquota device mountpoint</command></para>
 <para>Project quota accounting enabled, and limits (optionally) enforced.</para>
 <para><command>mount -o prjquota device mountpoint</command></para>
- <para>Can optionally specify <command>uqnoenforce, gqnoenforce</command> and
- <command>pqnoenforce</command> to use soft limits.</para>
+ <para>Can optionally specify <command>uqnoenforce</command>,
+ <command>gqnoenforce</command> and <command>pqnoenforce</command>
+ to use soft limits.</para>
 </section>
 </chapter>

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
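
As a side note on the stripe alignment mount options touched by the XFS-mount.xml hunks: the guide states that sunit and swidth must be given in 512-byte block units, and uses 1MB/8MB mapping to 2048/16384 as its example. The arithmetic can be sketched in Python as below. This is only an illustration under stated assumptions: the helper names to_512_byte_units and stripe_mount_options are made up for this note and are not part of the patch or of any XFS tool, and "MB" is taken to mean 1024 * 1024 bytes, which is what the guide's numbers imply.

```python
# Illustrative sketch only: these helpers are hypothetical and not part of
# the patch or of any XFS utility.

UNIT_BYTES = 512          # sunit/swidth mount values are in 512-byte units
MIB = 1024 * 1024         # assumption: the guide's "MB" means 2**20 bytes


def to_512_byte_units(size_bytes: int) -> int:
    """Convert a byte count to 512-byte units, rejecting unaligned sizes."""
    if size_bytes <= 0 or size_bytes % UNIT_BYTES != 0:
        raise ValueError(
            f"{size_bytes} is not a positive multiple of {UNIT_BYTES} bytes"
        )
    return size_bytes // UNIT_BYTES


def stripe_mount_options(stripe_unit_bytes: int, stripe_width_bytes: int) -> str:
    """Build the 'sunit=...,swidth=...' string for mount -o."""
    sunit = to_512_byte_units(stripe_unit_bytes)
    swidth = to_512_byte_units(stripe_width_bytes)
    return f"sunit={sunit},swidth={swidth}"


if __name__ == "__main__":
    # Stripe unit of 1MB and stripe width of 8MB, as in the guide's example.
    print(stripe_mount_options(1 * MIB, 8 * MIB))  # sunit=2048,swidth=16384
```

The printed string matches the values in the guide's own mount example, so the sketch can serve as a quick check when choosing other stripe geometries.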