On Fri, Oct 07, 2011 at 03:49:57PM +0200, Bernhard Schmidt wrote:
> On 07.10.2011 03:37, Dave Chinner wrote:
> 
> Hi,
> 
> >> this is an XFS-related summary of a problem report I sent to the
> >> postfix mailinglist a few minutes ago after a bulkmail test system
> >> blew up during a stress test.
> >>
> >> We have a few MTAs running SLES11.1 amd64 (2.6.32.45-0.3-default),
> >> 10 GB XFS spool directory with default blocksize (4k). It was
> >> bombarded with mails faster than it could send them on, which
> >> eventually led to almost 2 million files of ~1.5kB in one directory.
> >> Suddenly, this started to happen:
> >>
> >> lxmhs45:/var/spool/postfix-bulk/postfix-bulkinhss # touch a
> >> touch: cannot touch `a': No space left on device
> >> lxmhs45:/var/spool/postfix-bulk/postfix-bulkinhss # df .
> >> Filesystem           1K-blocks      Used Available Use% Mounted on
> >> /dev/sdb              10475520   7471160   3004360  72%
> >
> > So you have a 10GB filesystem, with about 3GB of free space.
> >
> >> /var/spool/postfix-bulk
> >> lxmhs45:/var/spool/postfix-bulk/postfix-bulkinhss # df -i .
> >> Filesystem            Inodes   IUsed   IFree IUse% Mounted on
> >> /dev/sdb             10485760 1742528 8743232   17% /var/spool/postfix-bulk
> >
> > And with 1.7 million inodes in it. That's a lot for a tiny
> > filesystem, and not really a use case that XFS is well suited to.
> > XFS will work, but it won't age gracefully under these conditions...
> >
> > As it is, your problem is most likely fragmented free space (an
> > aging problem). Inodes are allocated in chunks of 64, so require an
> > -aligned- contiguous 16k extent for the default 256 byte inode size.
> > If you have no aligned contiguous 16k extents free then inode
> > allocation will fail.
> >
> > Running 'xfs_db -r "-c freesp -s" /dev/sdb' will give you a
> > histogram of free space extents in the filesystem, which will tell
> > us if you are hitting this problem.
> 
> I managed to create the situation again. This time the total usage is a
> bit higher, but it still failed.

No surprise. The way you are using the filesystem is predisposed to
this sort of problem.

> lxmhs45:~ # df /var/spool/postfix-bulk
> Filesystem           1K-blocks      Used Available Use% Mounted on
> /dev/sdb              10475520   8071008   2404512  78%
> /var/spool/postfix-bulk
> lxmhs45:~ # df -i /var/spool/postfix-bulk
> Filesystem            Inodes   IUsed   IFree IUse% Mounted on
> /dev/sdb             11500544 1882496 9618048   17% /var/spool/postfix-bulk
> 
> This is the output requested.
> 
> lxmhs45:~ # xfs_db -r "-c freesp -s" /dev/sdb
>    from      to extents  blocks    pct
>       1       1   32230   32230   5.36
>       2       3    6874   16476   2.74
>       4       7  138151  552604  91.90
> total free extents 177255
> total free blocks 601310
> average free extent size 3.39234

And that shows your free space is indeed badly fragmented, and that
it is the cause of your problem. The majority of the free space is
in 4-7 block extents which, if inode allocation is failing, are all
unaligned even though they are large enough for an inode chunk: a
chunk is 64 x 256 bytes = 16k, i.e. 4 blocks at bsize=4096, so a
4-7 block extent is only usable if 4 of its blocks start on an
aligned chunk boundary.

> lxmhs45:~ # xfs_info /dev/sdb
> meta-data=/dev/sdb           isize=256    agcount=4, agsize=655360 blks
>          =                   sectsz=512   attr=2
> data     =                   bsize=4096   blocks=2621440, imaxpct=50
                                                            ^^^^^^^^^^

And there lies the reason you are getting the filesystem into this
situation - you're allowing a very large number of inodes to be
created in the filesystem. I'd suggest that for your workload you
need to allow at least 10GB of disk space per million inodes.
Because of the number of small files, XFS is going to need a much
larger amount of free space available to prevent aging-related
freespace fragmentation problems. The above ratio results in a
maximum space usage of about 50%, which will avoid such issues. If
you need to hold 2 million files, use a 20GB filesystem...

>          =                   sunit=0      swidth=0 blks
> naming   =version 2          bsize=4096   ascii-ci=0
> log      =internal           bsize=4096   blocks=2560, version=2
                                            ^^^^^^^^^^^^^

And you'll probably get better performance if you use a larger log
as well - blocks=2560 at bsize=4096 is only a 10MB log.
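To make those knobs concrete, this is roughly what I mean at mkfs
time - illustrative numbers only, not a tested recommendation for
your hardware, and mkfs will of course destroy the existing
filesystem:

  # rebuild the spool fs with inode space capped at 5% instead of the
  # default imaxpct=50 (5% of 10GB is ~512MB, i.e. about 2 million
  # 256 byte inodes) and a 64MB internal log instead of the 10MB one
  # (illustrative values - tune them for your workload)
  mkfs.xfs -f -i maxpct=5 -l size=64m /dev/sdb

Keep in mind that maxpct only caps the space inodes can consume - it
doesn't make the allocator any better at finding aligned free
chunks, so the sizing advice above still stands.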
FWIW, if you have anyone with developer time available, finishing
off this work:

http://xfs.org/index.php/Unfinished_work#Inline_data_in_inodes

and using 2kB inodes (which would fit ~1900 bytes of data inline)
would solve your problem entirely and perform much, much better.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx