On Tue, Nov 12, 2013 at 01:15:00PM -0500, Qing Chang wrote:
Hi Stan,
thank you very much for your comments; I need some time to
work through all of your suggestions.
Yes, this is a dovecot IMAP and sendmail SMTP server, and there
are 1360 mailboxes.
With this email I'd like to give you a bit more information from
strace -T and other commands.
Because "# CONFIG_LOCK_STAT is not set", I cannot run
"perf lock record". Since this is a production server, I am reluctant to
rebuild the kernel on it. But I hope the info below will show
that this is not lock related. Since the slow "ls -l" and "du"
also happen on the two file servers (64TB and 32TB), it does not
feel lock related either.
I ran "(time strace -T ls -l /var/spool/mail > ls-out) >& ls-trace"
and got quite a bit of data.
First "time" output:
=====
real 0m17.720s
user 0m0.574s
sys 0m1.271s
=====
About 14s was spent on 164 "poll" calls on fd=5 that each took more than
10ms; the total number of "poll" calls is 6618!
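For anyone who wants to reproduce this kind of tally, here is a small sketch of how the slow polls can be counted from the ls-trace file. It assumes the strace -T output format shown below, where each syscall line ends with the elapsed time in angle brackets; the 0.010s threshold matches the 10ms cutoff above.

```python
import re

# Match a "poll(...)" line from "strace -T" output and capture the elapsed
# time that strace appends in angle brackets, e.g. "... = 1 <0.012345>".
POLL_LINE = re.compile(r'^poll\(.*<(\d+\.\d+)>\s*$')

def tally_slow_polls(trace_lines, threshold=0.010):
    """Return (total poll calls, slow poll calls, seconds spent in slow ones)."""
    total_calls, slow_calls, slow_time = 0, 0, 0.0
    for line in trace_lines:
        m = POLL_LINE.match(line)
        if not m:
            continue
        total_calls += 1
        elapsed = float(m.group(1))
        if elapsed > threshold:
            slow_calls += 1
            slow_time += elapsed
    return total_calls, slow_calls, slow_time

# Two sample lines in the traced format: one slow poll, one fast one.
sample = [
    'poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.085000>',
    'poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012>',
]
print(tally_slow_polls(sample))  # -> (2, 1, 0.085)
```

Running it over the real ls-trace file (open("ls-trace") instead of the sample list) should reproduce the 164-of-6618 figure.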
So, think about it for a minute. What's that fd?
##### fd=5 opened:
socket(PF_FILE, SOCK_STREAM, 0) = 5 <0.000016>
fcntl(5, F_GETFL) = 0x2 (flags O_RDWR) <0.000010>
fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000010>
fcntl(5, F_GETFD) = 0 <0.000010>
fcntl(5, F_SETFD, FD_CLOEXEC) = 0 <0.000010>
connect(5, {sa_family=AF_FILE, path="/var/lib/sss/pipes/nss"}, 110) = 0 <0.001886>
fstat(5, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 <0.000010>
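That open sequence is just a non-blocking, close-on-exec UNIX stream socket connected to sssd's NSS pipe. A minimal Python sketch of the same dance (the pipe path is taken from the trace; actually connecting to it will only work on a host running sssd):

```python
import fcntl
import socket

SSS_NSS_PIPE = "/var/lib/sss/pipes/nss"   # path seen in the connect() above

def open_nss_pipe(path=SSS_NSS_PIPE):
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.setblocking(False)               # the F_SETFL O_RDWR|O_NONBLOCK step
    flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFD)
    fcntl.fcntl(sock.fileno(), fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
    sock.connect(path)                    # raises if sssd is not listening
    return sock
```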
So, in some cases it's taking 10ms to get some information from the
other end of a pipe. And that pipe has something to do with SSSD.
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012>
sendto(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000029>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012>
sendto(5, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000027>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
read(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000013>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
read(5, "\1\0\0\0", 4) = 4 <0.000025>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000026>
poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011>
sendto(5, "\225N\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000025>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000515>
read(5, "L\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012>
poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011>
read(5, "\1\0\0\0\0\0\0\0\225N\0\0\311\0\0\0qchang\0*\0Qing Ch"..., 60) = 60 <0.000026>
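The bytes on the wire are easy enough to pick apart. Assuming the sss_nss framing of four little-endian uint32s (total length, command, status, reserved) followed by the payload, the second request above decodes as a lookup by uid; command 0x12 appears to be a get-passwd-entry-by-uid request (an assumption, but the reply does carry the username "qchang"):

```python
import struct

# The 16-byte header and 4-byte payload from the sendto() calls above,
# with strace's octal escapes rewritten in hex (\24 = 0x14, \22 = 0x12,
# \225N = 0x95 0x4e).
header = b"\x14\x00\x00\x00\x12\x00\x00\x00" + b"\x00" * 8
payload = b"\x95\x4e\x00\x00"

# Assumed framing: total length, command, status, reserved, all LE uint32.
length, command, status, reserved = struct.unpack("<4I", header)
uid = struct.unpack("<I", payload)[0]
print(length, hex(command), uid)   # -> 20 0x12 20117
```

So each of those poll/sendto/read round trips is one uid (or gid) being resolved to a name through sssd.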
That looks like uid -> username and gid -> group name translations.
i.e. you've got a slow LDAP server and that is what is causing your
problems, not anything to do with XFS....
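For reference, this is roughly what "ls -l" does per directory entry: an lstat() for the metadata, then uid->name and gid->name translations that go through NSS, and on this box through sssd and out to LDAP. "ls -ln" prints numeric ids and skips the translation, so timing "ls -l" against "ls -ln" on /var/spool/mail is a quick way to confirm where the 17s is going. A minimal sketch:

```python
import grp
import os
import pwd

def long_listing(path):
    """Roughly what 'ls -l' does: lstat each entry, then translate ids."""
    for name in sorted(os.listdir(path)):
        st = os.lstat(os.path.join(path, name))
        try:
            owner = pwd.getpwuid(st.st_uid).pw_name   # NSS lookup (sssd here)
        except KeyError:
            owner = str(st.st_uid)                    # what 'ls -ln' prints
        try:
            group = grp.getgrgid(st.st_gid).gr_name   # NSS lookup (sssd here)
        except KeyError:
            group = str(st.st_gid)
        print(f"{owner:8} {group:8} {st.st_size:10} {name}")

long_listing("/tmp")
```

With 1360 mailboxes and 6618 polls, each listing entry is evidently costing several round trips to sssd, which is exactly where a slow backing LDAP server would hurt.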