Here you go!

- Kernel version:
Linux my-host 3.16.0-38-generic #52~14.04.1-Ubuntu SMP Fri May 8 09:43:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

- xfsprogs version (xfs_repair -V):
xfs_repair version 3.1.9

- Number of CPUs: 16

- Contents of /proc/meminfo: attached.

- Contents of /proc/mounts:
rootfs / rootfs rw 0 0
sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
udev /dev devtmpfs rw,relatime,size=32965720k,nr_inodes=8241430,mode=755 0 0
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /run tmpfs rw,nosuid,noexec,relatime,size=6595420k,mode=755 0 0
/dev/mapper/troll_root_vg-troll_root_lv / ext4 rw,relatime,data=ordered 0 0
none /sys/fs/cgroup tmpfs rw,relatime,size=4k,mode=755 0 0
none /sys/fs/fuse/connections fusectl rw,relatime 0 0
none /sys/kernel/debug debugfs rw,relatime 0 0
none /sys/kernel/security securityfs rw,relatime 0 0
none /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
none /run/shm tmpfs rw,nosuid,nodev,relatime 0 0
none /run/user tmpfs rw,nosuid,nodev,noexec,relatime,size=102400k,mode=755 0 0
none /sys/fs/pstore pstore rw,relatime 0 0
/dev/mapper/troll_root_vg-troll_iso_lv /mnt/factory_reset ext4 rw,relatime,data=ordered 0 0
/dev/mapper/TrollGroup-TrollVolume /lvm ext4 rw,relatime,data=ordered 0 0
/dev/mapper/troll_root_vg-troll_log_lv /var/log ext4 rw,relatime,data=ordered 0 0
systemd /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,name=systemd 0 0
/dev/mapper/35000c50062e6a12b-part2 /srv/node/r1 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062e6a7eb-part2 /srv/node/r2 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062e6a567-part2 /srv/node/r3 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062ea068f-part2 /srv/node/r4 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062ea066b-part2 /srv/node/r5 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062e69ecf-part2 /srv/node/r6 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062ea067b-part2 /srv/node/r7 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0
/dev/mapper/35000c50062e6a493-part2 /srv/node/r8 xfs rw,nosuid,nodev,noexec,noatime,nodiratime,attr2,nobarrier,inode64,logbufs=8,noquota 0 0

- Contents of /proc/partitions: attached.

- RAID layout (hardware and/or software): No RAID

- LVM configuration: No LVM

- Type of disks: rotational disks

- Write cache status of drives: disabled

- Size of BBWC and mode it is running in: No BBWC

- xfs_info output on the filesystem in question:
The following is the info for one of the disks. The other 7 disks are identical.

meta-data=/dev/mapper/35000c50062e6a7eb-part2 isize=256    agcount=64, agsize=11446344 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=732566016, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=357698, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

- dmesg output showing all error messages and stack traces:
No errors/stack traces.

- Workload causing the problem:
Openstack Swift. This is what it's doing:
1. A path like /srv/node/r1/objects/1024/eef/tmp already exists. /srv/node/r1 is the mount point.
2. Creates a tmp file, say tmpfoo, in the path above. Path: /srv/node/r1/objects/1024/eef/tmp/tmpfoo.
3. Issues a 256KB write into this file.
4. Issues an fsync on the file.
5. Closes this file.
6. Creates another directory named "deadbeef" inside "eef" if it doesn't exist. Path: /srv/node/r1/objects/1024/eef/deadbeef.
7. Moves file tmpfoo into the deadbeef directory using rename(): /srv/node/r1/objects/1024/eef/tmp/tmpfoo --> /srv/node/r1/objects/1024/eef/deadbeef/foo.data
8. Does a readdir on /srv/node/r1/objects/1024/eef/deadbeef/.
9. Iterates over all files obtained in #8 above. Usually #8 gives only one file.

There are 8 mounts for 8 disks: /srv/node/r1 through /srv/node/r8. The above steps happen concurrently for all 8 disks. (A minimal syscall-level sketch of this sequence is included after these answers.)

- iostat and vmstat output: attached.

- trace-cmd report:
Too big to attach. Here's a link: https://www.dropbox.com/s/3xxe2chsv4fsrv8/trace_report.txt.zip?dl=0

- perf top output:
Unfortunately, I couldn't run perf top. I keep getting the following error:

WARNING: perf not found for kernel 3.16.0-38
You may need to install the following packages for this specific kernel:
  linux-tools-3.16.0-38-generic
  linux-cloud-tools-3.16.0-38-generic
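To make the per-object I/O pattern explicit, here is a minimal sketch of steps 1-9 above expressed as the equivalent Python/os calls. The function and variable names (write_object, obj_hash, final_name, etc.) are illustrative only, not the actual Swift code:

```python
import os

def write_object(mount, partition, suffix, obj_hash, data, final_name="foo.data"):
    """Illustrative sketch of the steps above; not the real Swift implementation."""
    base = os.path.join(mount, "objects", partition, suffix)  # e.g. /srv/node/r1/objects/1024/eef
    tmp_dir = os.path.join(base, "tmp")                       # step 1: this path already exists
    obj_dir = os.path.join(base, obj_hash)                    # e.g. .../eef/deadbeef
    tmp_path = os.path.join(tmp_dir, "tmpfoo")

    fd = os.open(tmp_path, os.O_CREAT | os.O_WRONLY, 0o644)   # step 2: create tmp file
    os.write(fd, data)                                        # step 3: 256KB write
    os.fsync(fd)                                              # step 4: fsync the file
    os.close(fd)                                              # step 5: close

    os.makedirs(obj_dir, exist_ok=True)                       # step 6: mkdir if it doesn't exist
    os.rename(tmp_path, os.path.join(obj_dir, final_name))    # step 7: rename into place
    return os.listdir(obj_dir)                                # steps 8-9: readdir and iterate
```

This sequence runs concurrently against all 8 mount points.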
On Tue, Jun 2, 2015 at 8:57 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Tue, Jun 02, 2015 at 11:43:30AM -0700, Shrinand Javadekar wrote:
>> Sorry, I dropped the ball on this one. We found some other problems
>> and I was busy fixing them.
>>
>> So, the xfsaild thread/s that kick in every 30 seconds are hitting us
>> pretty badly. Here's a graph with the latest tests I ran. We get great
>> throughput for ~18 seconds but then the world pretty much stops for
>> the next ~12 seconds or so making the final numbers look pretty bad.
>> This particular graph was plotted when the disk had ~150GB of data
>> (total capacity of 3TB).
>>
>> I am using a 3.16.0-38-generic kernel (upgraded since the time I wrote
>> the first email on this thread).
>>
>> I know fs.xfs.xfssyncd_centisecs controls this interval of 30 seconds.
>> What other options can I tune for making this work better?
>>
>> We have 8 disks. And unfortunately, all 8 disks are brought to a halt
>> every 30 seconds. Does XFS have options to only work on a subset of
>> disks at a time?
>>
>> Also, what does XFS exactly do every 30 seconds? If I understand it
>> right, metadata can be in 3 locations:
>>
>> 1. Memory
>> 2. Log buffer on disk
>> 3. Final location on disk.
>>
>> Every 30 seconds, from where to where is this metadata being copied?
>> Are there ways to just disable this to avoid the stop-of-the-world
>> pauses (at the cost of lower but sustained performance)?
>
> I can't use this information to help you as you haven't presented
> any of the data I've asked for. We need to restart here and base
> everything on data and observation. i.e. first principles.
>
> Can you provide all of the information here:
>
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>
> and most especially the iostat and vmstat outputs while the problem
> is occurring. The workload description is not what is going wrong
> or what you think is happening, but a description of the application
> you are running that causes the problem.
>
> This will give me a baseline of your hardware, the software, the
> behaviour and the application you are running, and hence give me
> something to start with.
>
> I'd also like to see the output from perf top while the problem is
> occurring, so we might be able to see what is generating the IO...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
Contents of /proc/meminfo:

MemTotal:        65954164 kB
MemFree:         13959108 kB
MemAvailable:    32757820 kB
Buffers:           176636 kB
Cached:           6429784 kB
SwapCached:        103432 kB
Active:          27430416 kB
Inactive:         6313768 kB
Active(anon):    24825928 kB
Inactive(anon):   2326792 kB
Active(file):     2604488 kB
Inactive(file):   3986976 kB
Unevictable:        14108 kB
Mlocked:            14108 kB
SwapTotal:       16777212 kB
SwapFree:        16346352 kB
Dirty:               3992 kB
Writeback:              0 kB
AnonPages:       27093116 kB
Mapped:             80260 kB
Shmem:               9484 kB
Slab:            14808144 kB
SReclaimable:    12460664 kB
SUnreclaim:       2347480 kB
KernelStack:        27696 kB
PageTables:         96588 kB
NFS_Unstable:           0 kB
Bounce:                 0 kB
WritebackTmp:           0 kB
CommitLimit:     49754292 kB
Committed_AS:    41952748 kB
VmallocTotal:    34359738367 kB
VmallocUsed:       543104 kB
VmallocChunk:    34359013376 kB
HardwareCorrupted:      0 kB
AnonHugePages:   22728704 kB
HugePages_Total:        0
HugePages_Free:         0
HugePages_Rsvd:         0
HugePages_Surp:         0
Hugepagesize:        2048 kB
DirectMap4k:      1557220 kB
DirectMap2M:     59236352 kB
DirectMap1G:      8388608 kB
Contents of /proc/partitions:

major minor    #blocks  name
   11     0    1048575  sr0
    8    48 2930266584  sdd
    8    49       1024  sdd1
    8    50 2930264064  sdd2
    8    32 2930266584  sdc
    8    33       1024  sdc1
    8    34 2930264064  sdc2
    8    64 2930266584  sde
    8    65       1024  sde1
    8    66 2930264064  sde2
    8    96 2930266584  sdg
    8    97       1024  sdg1
    8    98 2930264064  sdg2
    8    80 2930266584  sdf
    8    81       1024  sdf1
    8    82 2930264064  sdf2
    8   112 2930266584  sdh
    8   113       1024  sdh1
    8   114 2930264064  sdh2
    8   128 2930266584  sdi
    8   129       1024  sdi1
    8   130 2930264064  sdi2
    8   144 2930266584  sdj
    8   145       1024  sdj1
    8   146 2930264064  sdj2
    8   160 2930266584  sdk
    8   161       1024  sdk1
    8   162 2930264064  sdk2
    8   176 2930266584  sdl
    8   177       1024  sdl1
    8   178 2930264064  sdl2
    8   192 2930266584  sdm
    8   193       1024  sdm1
    8   194 2930264064  sdm2
    8   208 2930266584  sdn
    8   209       1024  sdn1
    8   210 2930264064  sdn2
    9   127 2930132800  md127
    9   126 2930132800  md126
  252     0 1465065472  dm-0
  252     1   52428800  dm-1
  252     2    5242880  dm-2
  252     3   16777216  dm-3
  252     4    3145728  dm-4
  252     6 2930266584  dm-6
  252     5 2930266584  dm-5
  252     7 2930266584  dm-7
  252     8 2930266584  dm-8
  252     9       1024  dm-9
  252    10       1024  dm-10
  252    11       1024  dm-11
  252    12 2930264064  dm-12
  252    13       1024  dm-13
  252    14 2930264064  dm-14
  252    15 2930264064  dm-15
  252    16 2930264064  dm-16
  252    17 2930266584  dm-17
  252    18 2930266584  dm-18
  252    19       1024  dm-19
  252    20       1024  dm-20
  252    21 2930264064  dm-21
  252    22 2930266584  dm-22
  252    24       1024  dm-24
  252    25 2930264064  dm-25
  252    23 2930264064  dm-23
  252    26 2930266584  dm-26
  252    27       1024  dm-27
  252    28 2930264064  dm-28
Attachment: vmstat.out (binary data)
Attachment: iostat.out (binary data)