[ ... ]

> Yes, this is a dovecot IMAP and sendmail SMTP server, and
> there are 1360 mailboxes.

That's more or less the worst case for metadata accesses.
Probably XFS is performing very well, and 'ls -l' is as fast as it
can be. But you are only listing 1360 inodes, so even if slow it
should not be *that* slow.

> As slow "ls -l" and "du" also happen on the two file servers
> (64TB and 32TB), [ ... ]

People who "know better" like large partitions like that. Then you
pay the consequences later... But that's another issue :-).

> I ran this "(time strace -T ls -l /var/spool/mail > ls-out) >& ls-trace"
> and got quite a bit of data.
> First, the "time" output:
> =====
> real    0m17.720s
> user    0m0.574s
> sys     0m1.271s
> =====
> Some 14s was spent on 164 "poll" calls on fd=5 that took more than 10ms;
> the total number of "poll" calls is 6618!

Those are network calls, nothing to do with the filesystem:

> connect(5, {sa_family=AF_FILE, path="/var/lib/sss/pipes/nss"}, 110) = 0 <0.001886>
> fstat(5, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 <0.000010>
> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012>
> sendto(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000029>
> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012>
> sendto(5, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000027>
> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
> read(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000013>
> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
> read(5, "\1\0\0\0", 4) = 4 <0.000025>
> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
> sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000026>
> poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
> sendto(5, "\225N\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000025>
> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000515>
> read(5, "L\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012>
> poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
> read(5, "\1\0\0\0\0\0\0\0\225N\0\0\311\0\0\0qchang\0*\0Qing Ch"..., 60) = 60 <0.000026>

This is 'getpwuid': 'ls' is trying to resolve a (numeric) user id
to a (string) user name, you have SSS configured in
'/etc/nsswitch.conf', and it is probably backed by some very slow
network directory service. Whatever it is, it is not a filesystem
matter.

You can easily double-check by running 'ls -l -n', which will
'stat' the 1360 inodes all the same but without resolving uids to
usernames, and that should be about 14s faster based on what you
report above.

This is the busiest IO load section you have provided:

> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            4.24    0.00    1.16    0.58    0.00   94.02
>
> Device: rrqm/s  wrqm/s    r/s     w/s  rMB/s  wMB/s avgrq-sz avgqu-sz  await  svctm  %util
> sda       0.00 1159.20   0.00  985.80   0.00   8.38    17.41     0.34   0.35   0.18  17.84
> sdb       0.00    0.00   0.20    1.40   0.00   0.01     8.62     0.00   0.75   0.75   0.12
> sdc       0.00    0.00   0.00    0.20   0.00   0.00    16.00     0.00   1.00   1.00   0.02
> dm-0      0.00    0.00   0.00    5.00   0.00   0.02     8.00     0.00   0.24   0.24   0.12
> dm-2      0.00    0.00   0.00 2140.00   0.00   8.36     8.00     1.23   0.58   0.08  17.90

It shows only some low-bandwidth writing and very low IOPS
utilization anyhow, so more confirmation that this has nothing to
do with XFS or storage in general.

As to why 'du' is slow: it does not resolve uids to usernames by
default, so the cause is probably that you have lots of small
inodes scattered all over the place. Guessing wildly, your Dovecot
uses the catastrophic Maildir format or a variant; and as to your
64/32TB filetrees, good luck... :-)

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
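P.S. The 'ls -l' vs. 'ls -l -n' comparison above can be sketched like this
(the throwaway test tree is illustrative; on the real server you would run
both commands against /var/spool/mail):

```shell
# Create a small throwaway tree as a stand-in for /var/spool/mail.
tmp=$(mktemp -d)
touch "$tmp/mbox1" "$tmp/mbox2"

# 'ls -l' resolves every uid/gid to a name, one getpwuid()/getgrgid()
# per distinct id; each unknown id goes through NSS (here: the sss
# socket, hence the poll() round-trips in the strace).
time ls -l "$tmp" > /dev/null

# 'ls -ln' prints numeric ids and never touches NSS, so it only pays
# for the stat() calls themselves.
time ls -ln "$tmp" > /dev/null

rm -rf "$tmp"
```

If the second form is ~14s faster on the mail spool, that settles where
the time goes.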
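You can also isolate the NSS lookup path from the filesystem entirely by
timing a single uid-to-name lookup (uid 0 is used here only as an example;
substitute a uid that actually appears in the listing):

```shell
# Which sources serve passwd lookups? 'sss' means unknown uids are
# resolved via /var/lib/sss/pipes/nss, possibly hitting a directory
# server over the network.
grep '^passwd' /etc/nsswitch.conf

# Time one lookup; if a single lookup takes tens of milliseconds,
# 1360 entries with many distinct uids easily adds up to ~14s.
time getent passwd 0
```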
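A quick consistency check on the iostat sample: avgrq-sz is in 512-byte
sectors, so dm-2's request stream multiplies out to exactly the reported
write bandwidth (iostat's wMB/s is in MiB/s):

```shell
# 2140 writes/s * 8 sectors/request * 512 bytes/sector, in MiB/s.
awk 'BEGIN { printf "%.2f MiB/s\n", 2140 * 8 * 512 / 1048576 }'
# prints "8.36 MiB/s" -- matching the wMB/s column for dm-2.
```

So it is a stream of small 4KiB writes at modest queue depth: light,
mostly-random metadata/journal traffic, not a saturated device.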
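The 'du' cost can be illustrated the same way: with a Maildir-style layout
every message is its own inode, and 'du' must stat() each one (the
simulated tree below is an assumption about the layout, not data from the
server):

```shell
# Simulate one Maildir-ish mailbox: many small files, one inode each.
tmp=$(mktemp -d)
for i in $(seq 1 200); do : > "$tmp/msg.$i"; done

# 'du' walks the tree and stats every entry; on a cold cache each
# stat of a scattered inode can cost a random disk read.
find "$tmp" | wc -l     # entries 'du' must examine (200 files + the dir)
du -s "$tmp" > /dev/null

rm -rf "$tmp"
```

Multiply that by 1360 mailboxes, or by a 64TB filetree, and the walk is
seek-bound no matter how fast the filesystem code is.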