On Thu, Jan 20, 2011 at 08:41:19PM +0000, Peter Vajgel wrote:
> We write about 100 100GB files into a single 10TB volume with xfs.
> We are using allocsize=1g to limit the fragmentation with great
> success. We also need to reserve some space (~200GB) on each
> filesystem for processing the files and writing new versions of
> the files. Once we have only 200GB available we stop writing to
> the files. However with allocsize it's not that easy - we see +/-
> 100GB added or taken depending on whether there are still writes
> going on and whether the file was reopened ... Is there a way to
> programmatically disable allocsize speculative preallocation once
> we exceed a certain threshold, and also return the current
> speculative preallocation back to the free space (without closing
> the file)?

No and no.

However, if you take a look at the new dynamic speculative
allocation code in 2.6.38-rc1, it scales back the preallocation as
ENOSPC is approached, but it doesn't do any reclaiming of existing
preallocation. It will also preallocate much larger extents, so it
may not be ideal for you, either. I've appended the commit message
below.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

commit 055388a3188f56676c21e92962fc366ac8b5cb72
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Jan 4 11:35:03 2011 +1100

    xfs: dynamic speculative EOF preallocation

    Currently the size of the speculative preallocation during delayed
    allocation is fixed by either the allocsize mount option or a
    default size. We are seeing a lot of cases where we need to
    recommend using the allocsize mount option to prevent
    fragmentation when buffered writes land in the same AG.

    Rather than using a fixed preallocation size by default (up to
    64k), make it dynamic by basing it on the current inode size. That
    way the EOF preallocation will increase as the file size
    increases. Hence for streaming writes we are much more likely to
    get large preallocations exactly when we need them to reduce
    fragmentation.

    For default settings, the size of the initial extents is
    determined by the number of parallel writers and the amount of
    memory in the machine. For 4GB RAM and 4 concurrent 32GB file
    writes:

    EXT: FILE-OFFSET           BLOCK-RANGE            AG AG-OFFSET                 TOTAL
      0: [0..1048575]:         1048672..2097247        0 (1048672..2097247)      1048576
      1: [1048576..2097151]:   5242976..6291551        0 (5242976..6291551)      1048576
      2: [2097152..4194303]:   12583008..14680159      0 (12583008..14680159)    2097152
      3: [4194304..8388607]:   25165920..29360223      0 (25165920..29360223)    4194304
      4: [8388608..16777215]:  58720352..67108959      0 (58720352..67108959)    8388608
      5: [16777216..33554423]: 117440584..134217791    0 (117440584..134217791) 16777208
      6: [33554424..50331511]: 184549056..201326143    0 (184549056..201326143) 16777088
      7: [50331512..67108599]: 251657408..268434495    0 (251657408..268434495) 16777088

    and for 16 concurrent 16GB file writes:

    EXT: FILE-OFFSET           BLOCK-RANGE            AG AG-OFFSET                 TOTAL
      0: [0..262143]:          2490472..2752615        0 (2490472..2752615)       262144
      1: [262144..524287]:     6291560..6553703        0 (6291560..6553703)       262144
      2: [524288..1048575]:    13631592..14155879      0 (13631592..14155879)     524288
      3: [1048576..2097151]:   30408808..31457383      0 (30408808..31457383)    1048576
      4: [2097152..4194303]:   52428904..54526055      0 (52428904..54526055)    2097152
      5: [4194304..8388607]:   104857704..109052007    0 (104857704..109052007)  4194304
      6: [8388608..16777215]:  209715304..218103911    0 (209715304..218103911)  8388608
      7: [16777216..33554423]: 452984848..469762055    0 (452984848..469762055) 16777208
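(An aside, not part of the commit message: the point of the tables above
is that the preallocation scales with the current file size, so it
roughly doubles each time the file doubles. A minimal userspace sketch
of that sizing idea follows; the helper name and the 64k/8GB bounds are
illustrative assumptions, this is not the kernel code.)

/*
 * Illustrative sketch only: pick an EOF preallocation size from the
 * current file size instead of using a fixed value every time.
 */
#include <stdio.h>
#include <stdint.h>

#define MIN_PREALLOC    (64ULL << 10)   /* old fixed default: 64k */
#define MAX_PREALLOC    (8ULL << 30)    /* "full extent" cap: 8GB */

static uint64_t dynamic_prealloc(uint64_t isize)
{
        uint64_t want = isize;          /* scale with the file size */

        if (want < MIN_PREALLOC)
                want = MIN_PREALLOC;
        if (want > MAX_PREALLOC)
                want = MAX_PREALLOC;
        return want;
}

int main(void)
{
        /* Show how the prealloc grows as a streaming write grows the file. */
        for (uint64_t isize = 1ULL << 20; isize <= 64ULL << 30; isize <<= 1)
                printf("file size %14llu -> prealloc %12llu\n",
                       (unsigned long long)isize,
                       (unsigned long long)dynamic_prealloc(isize));
        return 0;
}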
    Because it is hard to take back speculative preallocation, cases
    where there are large slow growing log files on a nearly full
    filesystem may cause premature ENOSPC. Hence as the filesystem
    nears full, the maximum dynamic prealloc size is reduced
    according to this table (based on 4k block size):

      freespace       max prealloc size
        >5%             full extent (8GB)
        4-5%             2GB (8GB >> 2)
        3-4%             1GB (8GB >> 3)
        2-3%           512MB (8GB >> 4)
        1-2%           256MB (8GB >> 5)
        <1%            128MB (8GB >> 6)

    This should reduce the amount of space held in speculative
    preallocation for such cases.

    The allocsize mount option turns off the dynamic behaviour and
    fixes the prealloc size to whatever the mount option specifies,
    i.e. the behaviour is unchanged.

    Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
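(Also just an aside, not part of the commit: the scale-back table above
amounts to shifting the 8GB full-extent size right by more bits as free
space drops below 5%. A minimal sketch of that mapping, with made-up
names, not the kernel implementation.)

#include <stdio.h>
#include <stdint.h>

#define FULL_EXTENT (8ULL << 30)        /* 8GB "full extent" at 4k block size */

/* Illustrative helper: cap a wanted prealloc size by the % of free space left. */
static uint64_t clamp_prealloc(uint64_t want, int pct_free)
{
        uint64_t max = FULL_EXTENT;

        if (pct_free < 5)
                max >>= (6 - pct_free); /* 4% free -> >>2, ..., <1% free -> >>6 */
        return want < max ? want : max;
}

int main(void)
{
        /* Reproduce the table, from 6% free space down to <1% free. */
        for (int pct = 6; pct >= 0; pct--)
                printf("%d%% free -> max prealloc %llu MB\n", pct,
                       (unsigned long long)(clamp_prealloc(FULL_EXTENT, pct) >> 20));
        return 0;
}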