Hi Stan, thank you very much for your comments, I need some time to follow through all of your suggestions. Yes, this is a dovecot IMAP and sendmail SMTP server, and there are 1360 mailboxes. With this email, I'd like to give you a bit more information with strace -T and others commands. Because "# CONFIG_LOCK_STAT is not set", I can not run perf lock record. Since this is a production server, I am reluctant to rebuild the kernel on it. But I hope the info below may give you an idea that it is not "lock" related. As slow "ls -l" and "du" also happen on the two file servers (64TB and 32TB), it does not feel like it is lock related. I run this "(time strace -T ls -l /var/spool/mail > ls-out) >& ls-trace" and got quite a bit data. First "time" output: ===== real 0m17.720s user 0m0.574s sys 0m1.271s ===== Some 14s was spent on 164 "poll" calls on fd=5 that took more than 10ms, total number of "poll" call is 6618! Below is the trace output, red numbers are the three longest time taken, listed at top are those that took more than 100ms, the three sections are for the top three time taken, note the three mailboxes are not special compare to another mailboxes, a run at another time would produce calls on different mailboxes. Also, cache seams to play a role too, after a fresh run that takes long time, subsequent runs give instant results: ===== poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.144634> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104580> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104812> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.181959> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.102520> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.107539> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.142462> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.215745> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.309854> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.166665> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101035> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.113119> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.120745> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.110027> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.137906> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.249039> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104079> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.250003> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.103791> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.102018> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.103551> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.103696> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.102164> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.105843> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104413> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101304> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101869> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101366> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.132958> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.309411> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.100342> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101279> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.144725> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.109867> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.109951> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.290189> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.102449> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.101182> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.111699> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.107351> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.104558> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.115447> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.163904> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.266427> ##### df=5 opened: socket(PF_FILE, SOCK_STREAM, 0) = 5 <0.000016> fcntl(5, F_GETFL) = 0x2 (flags O_RDWR) <0.000010> fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000010> fcntl(5, F_GETFD) = 0 <0.000010> fcntl(5, F_SETFD, FD_CLOEXEC) = 0 <0.000010> connect(5, {sa_family=AF_FILE, path="/var/lib/sss/pipes/nss"}, 110) = 0 <0.001886> fstat(5, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 <0.000010> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012> sendto(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000029> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012> sendto(5, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000027> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011> read(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000013> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011> read(5, "\1\0\0\0", 4) = 4 <0.000025> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011> sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000026> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011> sendto(5, "\225N\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000025> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000515> read(5, "L\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011> read(5, "\1\0\0\0\0\0\0\0\225N\0\0\311\0\0\0qchang\0*\0Qing Ch"..., 60) = 60 <0.000026> open("/etc/group", O_RDONLY|O_CLOEXEC) = 6 <0.000017> fstat(6, {st_mode=S_IFREG|0644, st_size=790, ...}) = 0 <0.000010> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5d53f4000 <0.004539> read(6, "root:x:0:root\nbin:x:1:root,bin,d"..., 4096) = 790 <0.000018> read(6, "", 4096) = 0 <0.000011> close(6) = 0 <0.000011> munmap(0x7fd5d53f4000, 4096) = 0 <0.000016> open("/var/lib/sss/mc/group", O_RDONLY|O_CLOEXEC) = 6 <0.000021> fstat(6, {st_mode=S_IFREG|0644, st_size=5206312, ...}) = 0 <0.000010> mmap(NULL, 5206312, PROT_READ, MAP_SHARED, 6, 0) = 0x7fd5ce3c1000 <0.000015> poll([{fd=5, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011> sendto(5, "\24\0\0\0\"\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000031> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011> sendto(5, "\311\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000025> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000427> read(5, "?\0\0\0\"\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000013> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011> read(5, "\1\0\0\0\0\0\0\0\311\0\0\0\3\0\0\0\0*\0qiu"..., 47) = 47 <0.000017> lstat("/var/spool/mail/root", {st_mode=S_IFREG|0600, st_size=187649403, ...}) = 0 <0.000019> lgetxattr("/var/spool/mail/root", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 <0.000020> lstat("/var/spool/mail/root", {st_mode=S_IFREG|0600, st_size=187649403, ...}) = 0 <0.000014> lgetxattr("/var/spool/mail/root", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data available) <0.000018> lgetxattr("/var/spool/mail/root", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data available) <0.000016> lstat("/var/spool/mail/agutstei", {st_mode=S_IFREG|0600, st_size=3864251, ...}) = 0 <0.000016> lgetxattr("/var/spool/mail/agutstei", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 <0.000018> lstat("/var/spool/mail/agutstei", {st_mode=S_IFREG|0600, st_size=3864251, ...}) = 0 <0.000015> lgetxattr("/var/spool/mail/agutstei", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data available) <0.000016> lgetxattr("/var/spool/mail/agutstei", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data available) <0.000015> open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7 <0.000017> fstat(7, {st_mode=S_IFREG|0644, st_size=1835, ...}) = 0 <0.000011> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5ce3c0000 <0.000014> read(7, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1835 <0.000020> read(7, "", 4096) = 0 <0.000011> close(7) = 0 <0.000012> ###### took 0.309854s: munmap(0x7fd5ce3c0000, 4096) = 0 <0.000015> poll([{fd=5, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010> sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000034> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010> sendto(5, "\253\260\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000335> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.309854> read(5, "N\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000010> read(5, "\1\0\0\0\0\0\0\0\253\260\0\0\253\1\0\0acqu\0*\0Chen Ge A"..., 62) = 62 <0.000016> lstat("/var/spool/mail/araouf", {st_mode=S_IFREG|0600, st_size=3943191, ...}) = 0 <0.000052> lgetxattr("/var/spool/mail/araouf", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 <0.000021> lstat("/var/spool/mail/araouf", {st_mode=S_IFREG|0600, st_size=3943191, ...}) = 0 <0.000014> lgetxattr("/var/spool/mail/araouf", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data available) <0.000017> lgetxattr("/var/spool/mail/araouf", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data available) <0.000015> open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7 <0.000016> fstat(7, {st_mode=S_IFREG|0644, st_size=1835, ...}) = 0 <0.000010> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5ce3c0000 <0.000012> read(7, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1835 <0.000018> read(7, "", 4096) = 0 <0.000010> close(7) = 0 <0.000012> ###### took 0.309411: munmap(0x7fd5ce3c0000, 4096) = 0 <0.000015> poll([{fd=5, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000009> sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000035> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010> sendto(5, "\357\246\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000342> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.309411> read(5, "V\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000010> read(5, "\1\0\0\0\0\0\0\0\357\246\0\0\253\1\0\0pthaya\0*\0Parthip"..., 70) = 70 <0.000018> lstat("/var/spool/mail/ptsao", {st_mode=S_IFREG|0600, st_size=515358, ...}) = 0 <0.000025> lgetxattr("/var/spool/mail/ptsao", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 <0.000021> lstat("/var/spool/mail/ptsao", {st_mode=S_IFREG|0600, st_size=515358, ...}) = 0 <0.000014> lgetxattr("/var/spool/mail/ptsao", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data available) <0.000019> lgetxattr("/var/spool/mail/ptsao", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data available) <0.000014> open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7 <0.000017> fstat(7, {st_mode=S_IFREG|0644, st_size=1835, ...}) = 0 <0.000010> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5ce3c0000 <0.000012> read(7, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1835 <0.000019> read(7, "", 4096) = 0 <0.000009> close(7) = 0 <0.000011> ###### took 0.290189: munmap(0x7fd5ce3c0000, 4096) = 0 <0.000015> poll([{fd=5, events=POLLIN|POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010> sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000034> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000010> sendto(5, "\10\270\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000391> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.290189> read(5, "N\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000010> read(5, "\1\0\0\0\0\0\0\0\10\270\0\0\327\1\0\0taberk\0*\0Thomas "..., 62) = 62 <0.000017> lstat("/var/spool/mail/taha", {st_mode=S_IFREG|0600, st_size=293648143, ...}) = 0 <0.000018> lgetxattr("/var/spool/mail/taha", "security.selinux", "unconfined_u:object_r:file_t:s0", 255) = 32 <0.000022> lstat("/var/spool/mail/taha", {st_mode=S_IFREG|0600, st_size=293648143, ...}) = 0 <0.000013> lgetxattr("/var/spool/mail/taha", "system.posix_acl_access", 0x0, 0) = -1 ENODATA (No data available) <0.000021> lgetxattr("/var/spool/mail/taha", "system.posix_acl_default", 0x0, 0) = -1 ENODATA (No data available) <0.000014> open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 7 <0.000021> fstat(7, {st_mode=S_IFREG|0644, st_size=1835, ...}) = 0 <0.000010> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd5ce3c0000 <0.000015> read(7, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1835 <0.000033> read(7, "", 4096) = 0 <0.000011> close(7) = 0 <0.000013> ===== ====== iostat around the time ls -l was run ===== [root@dovecot2 ~]# iostat -xmz 5 Linux 2.6.32-358.18.1.el6.x86_64 (dovecot2.sri.utoronto.ca) 11/12/2013 _x86_64_ (16 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 0.56 0.00 0.17 0.06 0.00 99.21 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.01 19.88 0.14 26.86 0.00 0.18 14.18 0.02 0.57 0.28 0.75 sdb 0.23 0.08 1.75 1.98 0.04 0.02 30.38 0.01 2.81 1.26 0.47 sdc 0.00 0.03 13.78 0.71 1.34 0.05 196.35 0.02 1.60 0.77 1.11 dm-0 0.00 0.00 0.09 2.42 0.00 0.01 8.49 0.00 1.18 0.52 0.13 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 8.00 0.00 3.09 0.27 0.00 dm-2 0.00 0.00 0.06 44.33 0.00 0.17 8.14 0.04 0.90 0.14 0.63 avg-cpu: %user %nice %system %iowait %steal %idle 1.31 0.00 0.46 0.10 0.00 98.12 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 27.40 1.00 65.80 0.02 0.36 11.78 0.04 0.61 0.43 2.86 sdb 0.00 0.00 1.80 0.00 0.03 0.00 31.11 0.01 6.44 2.00 0.36 sdc 0.00 1.00 0.00 4.80 0.00 0.03 11.79 0.03 6.88 0.46 0.22 dm-0 0.00 0.00 0.80 1.20 0.02 0.00 24.00 0.01 6.10 6.10 1.22 dm-2 0.00 0.00 0.00 92.00 0.00 0.36 8.00 0.06 0.62 0.19 1.76 avg-cpu: %user %nice %system %iowait %steal %idle 5.76 0.00 1.62 0.59 0.00 92.03 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 1045.60 0.80 959.40 0.05 7.83 16.82 0.44 0.46 0.21 19.98 sdb 0.00 0.00 1.40 1.20 0.01 0.01 9.23 0.00 0.62 0.54 0.14 sdc 0.00 0.00 0.00 0.60 0.00 0.01 18.67 0.00 0.33 0.33 0.02 dm-0 0.00 0.00 0.80 8.00 0.05 0.03 20.00 0.02 2.05 1.41 1.24 dm-2 0.00 0.00 0.00 1997.00 0.00 7.80 8.00 1.24 0.62 0.09 18.86 avg-cpu: %user %nice %system %iowait %steal %idle 4.38 0.00 1.29 0.67 0.00 93.66 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 1204.40 0.00 1074.60 0.00 8.90 16.97 0.41 0.38 0.20 21.04 sdb 0.20 0.00 2.00 0.00 0.01 0.00 13.60 0.01 6.80 4.00 0.80 dm-0 0.00 0.00 0.00 9.00 0.00 0.04 8.00 0.01 1.27 0.42 0.38 dm-2 0.00 0.00 0.00 2270.00 0.00 8.87 8.00 1.33 0.59 0.09 20.74 avg-cpu: %user %nice %system %iowait %steal %idle 4.24 0.00 1.16 0.58 0.00 94.02 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 1159.20 0.00 985.80 0.00 8.38 17.41 0.34 0.35 0.18 17.84 sdb 0.00 0.00 0.20 1.40 0.00 0.01 8.62 0.00 0.75 0.75 0.12 sdc 0.00 0.00 0.00 0.20 0.00 0.00 16.00 0.00 1.00 1.00 0.02 dm-0 0.00 0.00 0.00 5.00 0.00 0.02 8.00 0.00 0.24 0.24 0.12 dm-2 0.00 0.00 0.00 2140.00 0.00 8.36 8.00 1.23 0.58 0.08 17.90 avg-cpu: %user %nice %system %iowait %steal %idle 2.77 0.00 1.14 0.54 0.00 95.55 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 768.20 0.00 847.40 0.00 6.31 15.25 0.36 0.43 0.19 16.40 sdb 0.00 0.00 0.00 2.00 0.00 0.01 7.00 0.00 0.10 0.10 0.02 sdc 0.00 0.00 0.00 2.00 0.00 0.04 37.60 0.00 0.70 0.40 0.08 dm-0 0.00 0.00 0.00 18.80 0.00 0.07 8.00 0.09 4.84 0.27 0.50 dm-2 0.00 0.00 0.00 1596.80 0.00 6.24 8.00 0.84 0.52 0.10 16.22 ===== ===== vmstat ===== [root@dovecot2 ~]# vmstat 5 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 2 0 3476 785664 223836 29329692 0 0 88 16 1 1 1 0 99 0 0 1 0 3476 783512 223844 29330448 0 0 84 3383 2210 3985 3 1 96 0 0 0 1 3476 779328 223848 29331208 0 0 20 9254 4530 9894 5 2 92 1 0 2 0 3476 778616 223856 29331956 0 0 1 9104 4160 10909 4 1 94 1 0 1 0 3476 777584 223864 29332932 0 0 0 10056 4428 10263 4 1 94 1 0 0 0 3476 778312 223872 29333044 0 0 2 660 921 1871 1 0 99 0 0 ===== Really appreciate your willingness to help and your time spent on this! Qing On 04/11/2013 1:42 AM, Stan Hoeppner
wrote:
On 11/1/2013 2:11 PM, Qing Chang wrote:I have collected some info following the FAQ guide. It may still not be enough. Please let me know.You say /var/spool/mail contains 1360 "items". That's ambiguous. You mean 1360 user mbox files, yes?# time to run ls -l (1360 items): time ls -l /var/spool/mail real 0m10.271s user 0m0.097s sys 0m0.162sThe combined CPU execution time is apparently 0.259s. Something else is adding 10s+ of delay. mbox is notorious for locking problems. I wonder if stat() is waiting on release of exclusive write locks being held by your MDA during delivery. This would also tend to explain the variability you demonstrated, with one run being 10s and another being 7s--fewer files locked on the second run. Run 'ls' with "strace -T" to find out which system call is adding the delay. Run perf-lock concurrently to see the locks. More comments below.# Oct 29, 2013 # # XFS slowness with ls du etc. kernel version (uname -a) Linux dovecot2.sri.utoronto.ca 2.6.32-358.18.1.el6.x86_64 #1 SMP Fri Aug 2 17:04:38 EDT 2013 x86_64 x86_64 x86_64 GNU/Linux xfsprogs version (xfs_repair -V) xfs_repair version 3.1.1 number of CPUs: 16 virtual CPUs contents of /proc/meminfo MemTotal: 32878632 kB MemFree: 310260 kB Buffers: 220496 kB Cached: 29971428 kB SwapCached: 212 kB Active: 9726912 kB Inactive: 21180124 kB Active(anon): 385040 kB Inactive(anon): 330260 kB Active(file): 9341872 kB Inactive(file): 20849864 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 6160376 kB SwapFree: 6159896 kB Dirty: 772 kB Writeback: 0 kB AnonPages: 714912 kB Mapped: 52696 kB Shmem: 184 kB Slab: 1045988 kB SReclaimable: 776440 kB SUnreclaim: 269548 kB KernelStack: 8736 kB PageTables: 57172 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 22599692 kB Committed_AS: 1148180 kB VmallocTotal: 34359738367 kB VmallocUsed: 335224 kB VmallocChunk: 34333710552 kB HardwareCorrupted: 0 kB AnonHugePages: 180224 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 10240 kB DirectMap2M: 33544192 kB contents of /proc/mounts [root@dovecot2 ~]# cat /proc/mounts rootfs / rootfs rw 0 0 proc /proc proc rw,relatime 0 0 sysfs /sys sysfs rw,seclabel,relatime 0 0 devtmpfs /dev devtmpfs rw,seclabel,relatime,size=16429456k,nr_inodes=4107364,mode=755 0 0 devpts /dev/pts devpts rw,seclabel,relatime,gid=5,mode=620,ptmxmode=000 0 0 tmpfs /dev/shm tmpfs rw,seclabel,relatime 0 0 /dev/mapper/vg_ipa1-lv_root / ext4 rw,seclabel,relatime,barrier=1,data="" 0 0 none /selinux selinuxfs rw,relatime 0 0 devtmpfs /dev devtmpfs rw,seclabel,relatime,size=16429456k,nr_inodes=4107364,mode=755 0 0 /proc/bus/usb /proc/bus/usb usbfs rw,relatime 0 0 /dev/sda1 /boot ext4 rw,seclabel,relatime,barrier=1,data="" 0 0 /dev/mapper/vg_ipa1-lv_home /var ext4 rw,seclabel,relatime,barrier=1,data="" 0 0 /dev/sdb1 /mailhomes xfs rw,seclabel,relatime,attr2,delaylog,nobarrier,inode64,usrquota 0 0 /dev/sdc1 /var/spool/mail xfs rw,seclabel,relatime,attr2,delaylog,nobarrier,inode64,usrquota 0 0 none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0 sunrpc /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0 contents of /proc/partitions major minor #blocks name 8 0 83886080 sda 8 1 512000 sda1 8 2 83373056 sda2 8 16 5368709120 sdb 8 17 5368707072 sdb1 8 32 2147483648 sdc 8 33 2147481600 sdc1 253 0 56729600 dm-0 253 1 6160384 dm-1 253 2 20480000 dm-2 RAID layout (hardware and/or software): Hardware RAID6 IBM Storwize V7000What we're looking for here is the RAID geometry of the local disk device, logical volume, SAN LUN, etc, underlying the XFS filesystem, i.e. number of disks in the array, RAID level, stripe unit and stripe width.LVM configuration: XFS not configured type of disks you are using: SAS dirves write cache status of drives: V7000 has 16 GB of cache size of BBWC and mode it is running in: N/AThe 16GB of BBWC in the V7000 defaults to write-back mode, or has been manually enabled. If not you're playing Russian roulette, because you're mounting with nobarrier. You only disable barriers if you have persistent write cache, either battery or flash based.xfs_info output on the filesystem in question meta-data="" isize=256 agcount=4, agsize=134217600 blks = sectsz=512 attr=2 data = "" blocks=536870400, imaxpct=5 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0 log =internal bsize=4096 blocks=262143, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=02TB seems rather large for a 1360 user mail spool directory, given new mail is apparently moved from INBOX to other folders in /mailhomes. Additionally, given the innate parallelism in a mail workload, 4 AGs is far less than optimal. XFS throughput with a highly parallel workload is limited by the number of allocation groups. This filesystem was apparently created with mkfs.xfs defaults. It should have been created with many more AGs, 16 minimum. Same goes for the /mailhomes filesystem.dmesg output showing all error messages and stack traces: N/A # time to run ls -l (1360 items): time ls -l /var/spool/mail real 0m10.271s user 0m0.097s sys 0m0.162s # iostat when ls -l is running: [root@dovecot2 ~]# date;iostat -xdmz 5 Tue Oct 29 14:44:28 EDT 2013 Linux 2.6.32-358.18.1.el6.x86_64 (dovecot2.sri.utoronto.ca) 10/29/2013 _x86_64_ (16 CPU) Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.02 17.87 0.17 24.49 0.01 0.17 14.16 0.01 0.56 0.28 0.68 sdb 0.19 0.08 1.62 1.94 0.04 0.02 33.47 0.01 2.76 1.23 0.44 sdc 0.00 0.03 11.57 0.69 1.10 0.05 192.23 0.02 1.65 0.78 0.96 dm-0 0.00 0.00 0.11 2.24 0.00 0.01 8.70 0.00 1.14 0.51 0.12 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 8.00 0.00 1.26 0.24 0.00 dm-2 0.00 0.00 0.08 40.11 0.00 0.16 8.18 0.03 0.86 0.14 0.57 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 155.00 0.00 127.20 0.00 1.10 17.75 0.06 0.45 0.25 3.22 sdb 0.00 0.20 0.00 11.60 0.00 0.07 11.81 0.05 4.07 0.17 0.20 sdc 0.00 0.00 0.00 0.40 0.00 0.00 12.00 0.00 0.50 0.50 0.02 dm-0 0.00 0.00 0.00 1.20 0.00 0.00 8.00 0.01 4.50 4.50 0.54 dm-2 0.00 0.00 0.00 281.40 0.00 1.10 8.00 0.19 0.68 0.10 2.74 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 1480.80 0.00 627.20 0.00 8.23 26.89 0.28 0.45 0.20 12.70 sdb 0.00 0.00 0.40 4.00 0.00 0.03 18.27 0.00 1.05 1.05 0.46 sdc 0.00 0.00 0.00 1.40 0.00 0.00 6.71 0.00 0.14 0.14 0.02 dm-0 0.00 0.00 0.00 17.20 0.00 0.07 8.00 0.02 1.09 0.09 0.16 dm-2 0.00 0.00 0.00 2090.20 0.00 8.16 8.00 1.65 0.79 0.06 12.52 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 1331.00 0.00 629.80 0.00 7.66 24.91 0.28 0.44 0.22 13.64 sdb 0.00 0.00 0.00 2.80 0.00 0.02 14.07 0.00 0.79 0.57 0.16 sdc 0.00 0.00 0.00 1.40 0.00 0.02 30.14 0.00 0.57 0.57 0.08 dm-0 0.00 0.00 0.00 1.20 0.00 0.00 8.00 0.00 3.83 3.83 0.46 dm-2 0.00 0.00 0.00 1959.60 0.00 7.65 8.00 1.43 0.73 0.07 13.20 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 6.60 0.00 0.60 0.00 0.03 96.00 0.00 1.33 1.33 0.08 sdb 0.00 0.20 0.00 0.80 0.00 0.00 8.25 0.00 0.75 0.75 0.06 sdc 0.00 0.20 0.00 0.20 0.00 0.00 16.00 0.00 0.00 0.00 0.00 dm-0 0.00 0.00 0.00 6.20 0.00 0.02 8.00 0.01 0.97 0.03 0.02 dm-2 0.00 0.00 0.00 1.00 0.00 0.00 8.00 0.00 3.00 0.60 0.06 # a second run of ls -l: [root@dovecot2 ~]# date;time ls -l /var/spool/mail Fri Nov 1 15:04:43 EDT 2013 real 0m7.814s user 0m0.096s sys 0m0.139s [root@dovecot2 ~]# date;iostat -xmdz 5 Fri Nov 1 15:04:41 EDT 2013 Linux 2.6.32-358.18.1.el6.x86_64 (dovecot2.sri.utoronto.ca) 11/01/2013 _x86_64_ (16 CPU) Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.01 21.20 0.16 28.28 0.00 0.19 14.27 0.02 0.57 0.27 0.78 sdb 0.22 0.09 1.70 2.19 0.04 0.02 31.03 0.01 2.83 1.20 0.47 sdc 0.00 0.04 12.63 0.75 1.21 0.05 192.59 0.02 1.60 0.77 1.04 dm-0 0.00 0.00 0.11 2.46 0.00 0.01 8.57 0.00 1.17 0.51 0.13 dm-1 0.00 0.00 0.00 0.00 0.00 0.00 8.00 0.00 2.03 0.21 0.00 dm-2 0.00 0.00 0.07 47.02 0.00 0.18 8.15 0.04 0.93 0.14 0.65 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 730.20 0.00 1251.60 0.00 7.76 12.69 0.52 0.42 0.21 25.86 sdb 0.00 1.00 0.80 25.40 0.01 0.16 12.62 0.01 0.37 0.18 0.46 sdc 0.00 0.00 0.00 2.60 0.00 0.03 20.62 0.00 0.31 0.23 0.06 dm-0 0.00 0.00 0.00 0.20 0.00 0.00 8.00 0.00 3.00 3.00 0.06 dm-2 0.00 0.00 0.00 1981.00 0.00 7.74 8.00 1.04 0.52 0.13 25.96 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 1544.20 0.00 2844.60 0.00 17.14 12.34 1.10 0.39 0.21 58.34 sdb 0.00 0.00 0.00 1.20 0.00 0.01 9.33 0.00 0.00 0.00 0.00 sdc 0.00 0.00 0.00 0.60 0.00 0.00 8.00 0.00 3.00 1.00 0.06 dm-0 0.00 0.00 0.00 3.00 0.00 0.01 8.00 0.00 0.07 0.07 0.02 dm-2 0.00 0.00 0.00 4386.00 0.00 17.13 8.00 2.19 0.50 0.13 58.46 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 104.80 0.00 212.80 0.00 1.24 11.94 0.09 0.41 0.21 4.44 sdb 0.00 0.00 0.20 3.00 0.00 0.01 9.81 0.00 0.50 0.50 0.16 sdc 0.00 0.20 0.00 2.80 0.00 0.02 16.86 0.01 4.43 0.79 0.22 dm-0 0.00 0.00 0.00 0.40 0.00 0.00 8.00 0.00 5.50 5.50 0.22 dm-2 0.00 0.00 0.00 317.00 0.00 1.24 8.00 0.15 0.47 0.13 4.24 Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util sda 0.00 6.60 0.00 1.20 0.00 0.03 52.00 0.00 0.33 0.33 0.04 sdb 0.00 0.20 0.00 1.00 0.00 0.00 9.60 0.00 0.00 0.00 0.00 sdc 0.00 0.00 0.00 0.20 0.00 0.00 2.00 0.00 1.00 1.00 0.02 dm-0 0.00 0.00 0.00 4.40 0.00 0.02 8.00 0.00 0.95 0.05 0.02 dm-2 0.00 0.00 0.00 3.40 0.00 0.01 8.00 0.00 0.06 0.06 0.02 [root@dovecot2 ~]# date;vmstat 5 Fri Nov 1 15:04:39 EDT 2013 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 0 0 1304 542644 229484 29885592 0 0 80 17 1 1 1 0 99 0 0 0 1 1304 542752 229484 29885696 0 0 0 2211 1518 1281 1 0 98 0 0 0 1 1304 544736 229484 29885696 0 0 6 17216 7535 7631 3 2 94 2 0 0 0 1304 543988 229484 29885696 0 0 0 7403 3782 3889 1 1 97 1 0 0 0 1304 543604 229484 29885712 0 0 1 215 399 277 0 0 100 0 0 |
_______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs