Hi,

I am using the XFS filesystem as the backend for OpenStack Swift. My setup is a single server with 8 data disks, each formatted as its own XFS volume. I am running a workload that performs many concurrent writes of 256K files into the XFS volumes. OpenStack Swift takes care of distributing the data evenly across all 8 disks. It also stores metadata in extended attributes on each file it writes, and it explicitly calls fsync() at the end of every file (a minimal sketch of this write path is appended below).

I am seeing behavior where the system stalls almost completely for ~5 seconds after every 30 seconds. During this 5-second window the number of I/O requests goes up, but the actual write bandwidth is very low (see attached images). After a fair bit of investigation, we have narrowed the problem down to XFS's syncd (fs.xfs.xfssyncd_centisecs), which runs at a default interval of 30 seconds.

I have a few questions:

1. If every file write ends with an fsync(), what is xfssyncd doing for several seconds?

2. How does xfssyncd actually work across several disks? Currently, when it runs, it seems to stall the entire system.

3. I see that fs.xfs.xfssyncd_centisecs is the parameter for tuning the interval, but that does not buy us much: increasing the interval simply postpones the work, and xfssyncd then takes even longer when it does run. Are there any other options I can try so that xfssyncd does not stall the system when it runs?

Thanks in advance.

-Shri

P.S. I'm not a member of this list. Direct replies appreciated.
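
For reference, a minimal sketch of the per-object write path as I understand it. The file path, xattr name, and metadata value here are illustrative only, not Swift's actual on-disk names:

    /* Sketch of the write path: write ~256K, set an xattr, fsync, close.
     * Path, xattr key, and payload are placeholders for illustration. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/xattr.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t obj_size = 256 * 1024;          /* ~256K object */
        char *buf = calloc(1, obj_size);
        if (!buf)
            return 1;

        int fd = open("/srv/node/sdb1/objects/example.data",
                      O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        if (write(fd, buf, obj_size) != (ssize_t)obj_size) {
            perror("write");
            return 1;
        }

        /* Object metadata is stored in an extended attribute. */
        const char *meta = "{\"name\": \"example\"}";
        if (fsetxattr(fd, "user.swift.metadata", meta, strlen(meta), 0) < 0)
            perror("fsetxattr");

        /* Explicit fsync() on every object, as described above. */
        if (fsync(fd) < 0)
            perror("fsync");

        close(fd);
        free(buf);
        return 0;
    }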
Attachments:
  write_throughput6.png (PNG image)
  read_write_requests_complete_rate6.png (PNG image)