Hi,

I am running OpenStack Swift on a single server with 8 disks. All 8 disks are formatted with the default XFS parameters. Each disk has a capacity of 3TB. The machine has 64GB of RAM.

Here is what OpenStack Swift does (a minimal Python sketch of this sequence is included at the end of this mail):

1. The filesystem is mounted at /srv/node/r0.
2. Creates a temp file: /srv/node/r0/tmp/tmp_sdfsdf
3. Writes to this file: 4 writes of 64K each, then an fsync and a close. The final size of the file is 256K.
4. Creates the path /srv/node/r0/objects/1004/eef/deadbeef. The directory /srv/node/r0/objects/1004 already existed, so it only needs to create "eef" and "deadbeef". Before creating each directory, it verifies that the directory does not exist.
5. Renames the file /srv/node/r0/tmp/tmp_sdfsdf to /srv/node/r0/objects/1004/eef/deadbeef/foo.data.
6. fsyncs /srv/node/r0/objects/1004/eef/deadbeef/foo.data.
7. Does a directory listing of /srv/node/r0/objects/1004/eef.
8. Opens the file /srv/node/r0/objects/1004/hashes.pkl.
9. Writes to the file /srv/node/r0/objects/1004/hashes.pkl.
10. Closes the file /srv/node/r0/objects/1004/hashes.pkl.

Writes get sharded across ~1024 directories: essentially, there are directories 0000 through 1024 under /srv/node/r0/objects/, and 1004 in the example above is one of them.

This works great when the filesystem is newly formatted and mounted. However, as more and more data gets written to the system, the above sequence of events progressively gets slower.

* We observe that the time for fsync remains pretty much constant throughout.
* What seems to be causing the performance to nosedive is that inode and dentry caching does not seem to be working.
* As an experiment, we set vfs_cache_pressure to 0 so that inode and dentry cache entries would never be reclaimed (see the note at the end of this mail). That does not seem to help.
* We see openat() calls taking close to 1 second.

Any ideas what might be causing this behavior? Are there other parameters, specifically XFS parameters, that can be tuned for this workload? The sequence of events above is the typical workload, at high concurrency.

Here are the answers to the other questions requested by the XFS wiki page:

* kernel version (uname -a)
  3.13.0-39-generic #66-Ubuntu SMP Tue Oct 28 13:30:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
* xfsprogs version
  xfs_repair version 3.1.7
* number of CPUs
  16
* contents of /proc/meminfo
  See attached file mem_info.
* contents of /proc/mounts

/dev/mapper/troll_data_vg_23578621012a_1-troll_data_lv_1 /srv/node/r0 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,inode64,logbufs=8,noquota 0 0
/dev/mapper/troll_data_vg_23578621012a_2-troll_data_lv_2 /srv/node/r1 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,inode64,logbufs=8,noquota 0 0
/dev/mapper/troll_data_vg_23578621012a_3-troll_data_lv_3 /srv/node/r2 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,inode64,logbufs=8,noquota 0 0
/dev/mapper/troll_data_vg_23578621012a_4-troll_data_lv_4 /srv/node/r3 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,inode64,logbufs=8,noquota 0 0
/dev/mapper/troll_data_vg_23578621012a_5-troll_data_lv_5 /srv/node/r4 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,inode64,logbufs=8,noquota 0 0
/dev/mapper/troll_data_vg_23578621012a_6-troll_data_lv_6 /srv/node/r5 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,inode64,logbufs=8,noquota 0 0
/dev/mapper/troll_data_vg_23578621012a_7-troll_data_lv_7 /srv/node/r6 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,inode64,logbufs=8,noquota 0 0
/dev/mapper/troll_data_vg_23578621012a_8-troll_data_lv_8 /srv/node/r7 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,inode64,logbufs=8,noquota 0 0

* contents of /proc/partitions
  See attached file partitions_info.
* RAID layout (hardware and/or software)
  No RAID.
* LVM configuration
  See attached file lvm_info (obtained with lvdisplay).
* type of disks you are using
  sdm    disk  2.7T  ST3000NXCLAR3000
  sdm1   part    1M
  sdm2   part  2.7T
  dm-1   lvm   2.7T
* write cache status of drives
  Drives have no write cache.
* size of BBWC and mode it is running in
  No BBWC.
* xfs_info output on the filesystem in question

meta-data=/dev/mapper/troll_data_vg_23578621012a_8-troll_data_lv_8 isize=256    agcount=4, agsize=183141376 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=732565504, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=357698, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

* dmesg output showing all error messages and stack traces
  No errors.
* iostat and vmstat output
  See the attached files iostat_log and vmstat_log.

-Shri
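P.S. For clarity, here is a minimal sketch of the per-object write sequence above, written in Python since Swift itself is Python. It is not the actual Swift code: the paths, the temp file name, the partition/hash values, and the suffix-directory rule (last three hex characters of the object hash, inferred from the objects/1004/eef/deadbeef example) are all illustrative.

import os

DEVICE_ROOT = "/srv/node/r0"
CHUNK = 64 * 1024  # 4 writes of 64K -> 256K object

def suffix_dir(name_hash):
    # Inferred from the example path objects/1004/eef/deadbeef: the suffix
    # directory is the last three hex characters of the object hash.
    return name_hash[-3:]

def write_object(partition, name_hash, data_chunks, filename="foo.data"):
    obj_dir = os.path.join(DEVICE_ROOT, "objects", str(partition),
                           suffix_dir(name_hash), name_hash)

    # Steps 2-3: write the object into a temp file, fsync, close.
    tmp_path = os.path.join(DEVICE_ROOT, "tmp", "tmp_sdfsdf")
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        for chunk in data_chunks:
            os.write(fd, chunk)
        os.fsync(fd)
    finally:
        os.close(fd)

    # Step 4: create objects/<partition>/<suffix>/<hash>, checking for
    # existence before each mkdir (objects/<partition> already exists).
    path = os.path.join(DEVICE_ROOT, "objects", str(partition))
    for component in (suffix_dir(name_hash), name_hash):
        path = os.path.join(path, component)
        if not os.path.exists(path):
            os.mkdir(path)

    # Steps 5-6: rename the temp file into place and fsync the final file.
    final_path = os.path.join(obj_dir, filename)
    os.rename(tmp_path, final_path)
    fd = os.open(final_path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

    # Step 7: directory listing of the suffix directory.
    os.listdir(os.path.dirname(obj_dir))

    # Steps 8-10: open, write and close the per-partition hashes.pkl.
    hashes_path = os.path.join(DEVICE_ROOT, "objects", str(partition),
                               "hashes.pkl")
    with open(hashes_path, "wb") as f:
        f.write(b"...")  # placeholder for the pickled per-suffix hashes

if __name__ == "__main__":
    write_object(1004, "deadbeef", [b"x" * CHUNK] * 4)

At high concurrency this sequence runs from many workers in parallel, with writes spread across the ~1024 partition directories as described above.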
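P.P.S. The vfs_cache_pressure experiment mentioned above amounted to setting vm.vfs_cache_pressure to 0 for the duration of the test; shown here as a write to the /proc file (equivalent to "sysctl -w vm.vfs_cache_pressure=0"):

# Stop the kernel from reclaiming dentry/inode cache entries
# (done only as an experiment).
with open("/proc/sys/vm/vfs_cache_pressure", "w") as f:
    f.write("0\n")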
Attachments: mem_info, partitions_info, iostat_log, vmstat_log (binary data)