On Tue, Nov 12, 2013 at 01:15:00PM -0500, Qing Chang wrote: > Hi Stan, > > thank you very much for your comments, I need some time to > follow through all of your suggestions. > > Yes, this is a dovecot IMAP and sendmail SMTP server, and there > are 1360 mailboxes. > > With this email, I'd like to give you a bit more information with > strace -T and others commands. > > Because "# CONFIG_LOCK_STAT is not set", I can not run perf > lock record. Since this is a production server, I am reluctant to > rebuild the kernel on it. But I hope the info below may give > you an idea that it is not "lock" related. As slow "ls -l" and "du" > also happen on the two file servers (64TB and 32TB), it does not > feel like it is lock related. > > I run this "(time strace -T ls -l /var/spool/mail > ls-out) >& ls-trace" > and got quite a bit data. > > First "time" output: > ===== > real 0m17.720s > user 0m0.574s > sys 0m1.271s > ===== > > Some 14s was spent on 164 "poll" calls on fd=5 that took more than 10ms, > total number of "poll" call is 6618! So, think about it for a minute. What's that fd? > ##### df=5 opened: > socket(PF_FILE, SOCK_STREAM, 0) = 5 <0.000016> > fcntl(5, F_GETFL) = 0x2 (flags O_RDWR) <0.000010> > fcntl(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0 <0.000010> > fcntl(5, F_GETFD) = 0 <0.000010> > fcntl(5, F_SETFD, FD_CLOEXEC) = 0 <0.000010> > connect(5, {sa_family=AF_FILE, path="/var/lib/sss/pipes/nss"}, 110) = 0 <0.001886> > fstat(5, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0 <0.000010> So, in some cases it's taking 10ms to get some information from the other end of a pipe. ANd that pipe has something to do with SSSD. > poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012> > sendto(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000029> > poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000012> > sendto(5, "\1\0\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000027> > poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011> > read(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000013> > poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011> > read(5, "\1\0\0\0", 4) = 4 <0.000025> > poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011> > sendto(5, "\24\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16, MSG_NOSIGNAL, NULL, 0) = 16 <0.000026> > poll([{fd=5, events=POLLOUT}], 1, 300000) = 1 ([{fd=5, revents=POLLOUT}]) <0.000011> > sendto(5, "\225N\0\0", 4, MSG_NOSIGNAL, NULL, 0) = 4 <0.000025> > poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000515> > read(5, "L\0\0\0\22\0\0\0\0\0\0\0\0\0\0\0", 16) = 16 <0.000012> > poll([{fd=5, events=POLLIN}], 1, 300000) = 1 ([{fd=5, revents=POLLIN}]) <0.000011> > read(5, "\1\0\0\0\0\0\0\0\225N\0\0\311\0\0\0qchang\0*\0Qing Ch"..., 60) = 60 <0.000026> That looks like uid ->username and gid -> group name translations. i.e. you've got a slow LDAP server and that is what is causing your problems, not anything to do with XFS.... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs