Hi,
this is an XFS-related summary of a problem report I sent to the Postfix
mailing list a few minutes ago, after a bulk-mail test system blew up
during a stress test.
We have a few MTAs running SLES11.1 amd64 (2.6.32.45-0.3-default) with a
10 GB XFS spool directory using the default block size (4k). The system
was bombarded with mail faster than it could send it on, which
eventually led to almost 2 million files of ~1.5 kB each in one
directory. Suddenly, this started to happen:
lxmhs45:/var/spool/postfix-bulk/postfix-bulkinhss # touch a
touch: cannot touch `a': No space left on device
lxmhs45:/var/spool/postfix-bulk/postfix-bulkinhss # df .
Filesystem     1K-blocks    Used    Available Use% Mounted on
/dev/sdb        10475520    7471160   3004360  72% /var/spool/postfix-bulk
lxmhs45:/var/spool/postfix-bulk/postfix-bulkinhss # df -i .
Filesystem       Inodes   IUsed     IFree IUse% Mounted on
/dev/sdb       10485760   1742528 8743232   17% /var/spool/postfix-bulk
So we could no longer create any files in the spool directory, even
though df claimed there were both free blocks and free inodes. This led
to a pretty spectacular lockup of the mail processing chain.
My theory is that XFS uses a full 4 kB block for each ~1.5 kB file,
which accounts for some of the loss. But even then, 10 GB / 4 kB leaves
room for about 2.5 million files, and we were nowhere near that. Is the
overhead really that high?
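Quick shell arithmetic with the numbers from the df output above,
assuming (as per my theory) one full 4 kB data block per file and
ignoring the inodes themselves, which live in separately allocated
inode chunks:

  echo $(( 10475520 / 4 ))   # 2618880 4 kB data blocks in the filesystem
  echo $(( 1742528 * 4 ))    # 6970112 kB if each of the 1742528 files takes one block

That is roughly the 7471160 kB that df reports as used (the rest
presumably being the inode chunks and the huge directory itself), so
the per-file block overhead alone does not seem to explain the ENOSPC.
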
Why does neither df metric report this problem? Is there any way to get
reasonable readings out of df in this case? The system would have
stopped accepting mail from outside once the free space dropped below
2 GB, so the out-of-space condition hit way too early for that
safeguard to kick in.
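
If more detail from the filesystem would help, I can gather it, for
example with these read-only commands (assuming xfsprogs is installed;
device and mount point as in the df output above):

  stat -f /var/spool/postfix-bulk      # the raw statfs() counters that df reports
  xfs_info /var/spool/postfix-bulk     # filesystem geometry: block size, inode size, AG layout
  xfs_db -r -c "freesp -s" /dev/sdb    # summary of free space extent sizes

Just tell me what you need.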
Thanks for your answers,
Bernhard