On 4/7/14, 10:39 AM, Brian Foster wrote:
> Hi all,
>
> This is v2 of the speculative preallocation FAQ bits. The initial
> proposal was here:
>
> http://oss.sgi.com/archives/xfs/2014-03/msg00316.html
>
> This version includes some updates based on review from arekm and
> dchinner. Most notably, the content has been broken down into a few more
> questions. Unless there are further major changes required, I'll plan to
> post something along these lines to the wiki when my account is
> approved. Thanks for the feedback!
>
> Brian
>
> ---
>
> Q: Why do files on XFS use more data blocks than expected?
>
> A:
>
> The XFS speculative preallocation algorithm allocates extra blocks
> beyond end of file (EOF) to minimise file fragmentation during buffered

s/minimise/minimize/

> write workloads. Workloads that benefit from this behaviour include
> slowly growing files, concurrent writers and mixed reader/writer
> workloads. It also provides fragmentation resistence in situations where

s/resistence/resistance/

> memory pressure prevents adequate buffering of dirty data to allow
> formation of large contiguous regions of data in memory.
>
> This post-EOF block allocation is accounted identically to blocks within
> EOF. It is visible in 'st_blocks' counts via stat() system calls,
> accounted as globally allocated space and against quotas that apply to
> the associated file. The space is reported by various userspace
> utilities (stat, du, df, ls) and thus provides a common source of
> confusion for administrators. Post-EOF blocks are temporary in most
> situations and are usually reclaimed via several possible mechanisms in
> XFS.

"usually reclaimed" - is it ever "never" reclaimed, then?

> See the FAQ entry on speculative preallocation for details.
>
> Q: What is speculative preallocation?
>
> A:
>
> XFS speculatively preallocates post-EOF blocks on file extending writes
> in anticipation of future extending writes.
> The size of a preallocation
> is dynamic and depends on the runtime state of the file and fs.
> Generally speaking, preallocation is disabled for very small files and
> preallocation sizes grow as files grow larger.
>
> Preallocations are capped to the maximum extent size supported by the
> filesystem. Preallocation size is throttled automatically as the
> filesystem approaches low free space conditions or other allocation
> limits on a file (such as a quota).
>
> In most cases, speculative preallocation is automatically reclaimed when
> a file is closed. Preallocation may also persist beyond the lifecycle of
> the file descriptor. Certain application behaviors that are known to
> cause fragmentation, such as file server workloads, slowly growing
> files, etc., benefit from this and delay the removal of preallocated
> blocks beyond fd close.

this is a little handwavy. "It's reclaimed when it's closed, except
when it's not?" Can we say something more informative here?

> Q: How can I speed up or avoid delayed removal of speculative
> preallocation?
>
> A:
>
> Remove the inode from the VFS cache or unmount the filesystem to remove
> speculative preallocations associated with an inode.

How does a user remove an inode from the VFS cache? ;) So far the
answer to this question sounds like "no." We can't remove a single
inode; drop_caches is way too heavy weight, and unmount isn't really
viable in most cases.

> Linux 3.8 (and later) includes a scanner to perform background trimming
> of files with lingering post-EOF preallocations. The scanner bypasses
> dirty files to avoid interference with ongoing writes. A 5 minute scan
> interval is used by default and can be adjusted via the following file
> (value in seconds):
>
> /proc/sys/fs/xfs/speculative_prealloc_lifetime
>
> Q: Is speculative preallocation permanent?
>
> A:
>
> Although speculative preallocation can lead to reports of excess space
> usage, the preallocated space is not permanent unless explicitly made so
> via fallocate or a similar interface. Preallocated space can also be
> encoded permanently in situations where file size is extended beyond a
> range of post-EOF blocks (i.e., via truncate). Otherwise, preallocated

(maybe "an extending truncate")

> blocks are reclaimed on file close, inode reclaim, unmount or in the
> background once file write activity subsides.
>
> Q: My workload has known characteristics - can I tune speculative
> preallocation to an optimal fixed size?
>
> A:
>
> The 'allocsize=' mount option configures the XFS block allocation
> algorithm to use a fixed allocation size. Speculative preallocation is
> not dynamically resized when the allocsize mount option is set and thus
> the potential for fragmentation is increased. XFS historically set
> allocsize to 64k by default.

Thanks,
-Eric
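
To make the 'st_blocks' accounting from the first FAQ answer concrete, here is a minimal sketch that compares the space allocated to a file with its apparent size. The helper name allocation_report is hypothetical, and the sketch assumes Linux, where st_blocks is counted in 512-byte units. A positive excess does not prove that speculative preallocation is present, since ordinary rounding up to the filesystem block size also shows up here; a large excess on a file that was recently extended by buffered writes is merely consistent with it.

```python
import os


def allocation_report(path):
    """Compare bytes allocated on disk with the apparent file size.

    Hypothetical helper for illustration only; not part of any XFS tool.
    """
    st = os.stat(path)
    # Assumption: st_blocks is in 512-byte units, as on Linux.
    allocated = st.st_blocks * 512
    return {
        "size": st.st_size,
        "allocated": allocated,
        # May be negative for sparse files; a positive value can include
        # post-EOF blocks as well as plain block-size rounding.
        "excess": allocated - st.st_size,
    }
```

Running this on a file shortly after it was written, and again once the file has been closed and idle for a while, is one way to watch the reported allocation change over time.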