Concurrent `ls` takes out the thrash

I was asked to figure out why listing very large directories is slow. More
specifically, why concurrently listing the same large directory is /very/
slow. It seems that sometimes a user's reaction to waiting for 'ls' to
complete is to start a few more... and then their machine takes a very long
time to finish all that work.

I can reproduce that finding.  As an example:

time ls -fl /dir/with/200000/entries/ >/dev/null

real    0m10.766s
user    0m0.716s
sys     0m0.827s

But..

for i in {1..10}; do time ls -fl /dir/with/200000/entries/ >/dev/null & done

Each of these ^^ 'ls' commands will take 4 to 5 minutes to complete.

The problem is that concurrent 'ls' commands stack up in nfs_readdir(),
both waiting on the next page and taking turns filling the next page with
XDR, but only one of them will have desc->plus set, because setting it
clears the flag on the directory. So if a page is filled by a process that
doesn't have desc->plus, then the next pass through lookup() dumps the
entire page cache with nfs_force_use_readdirplus(). The next readdir then
starts all over filling the pagecache. Forward progress happens, but only
after many steps back re-filling the pagecache.

To me, the most obvious fix would be to serialize nfs_readdir() on the
directory inode, so I'll follow up with a patch that does that with
nfsi->rwsem. With that patch, each 'ls' in the parallel example above takes
12 seconds to complete.

This only works because the concurrent 'ls' processes use a consistent
buffer size, so a waiting nfs_readdir() that started at the same place in
an unmodified directory should always hit the cache after waiting.
Serializing nfs_readdir() will not solve this problem for concurrent
callers with differing buffer sizes, or for callers starting at different
offsets, since there's a good chance the waiting readdir() will not see the
readdirplus flag when it resumes and so will not prime the dcache.

While I think it's an OK fix, it feels bad to serialize.  At the same
time, nfs_readdir() is already serialized on the pagecache when concurrent callers need to go to the server. There might be other problems I haven't
thought about.

Maybe there's another way to fix this, or maybe we can just say "Don't do ls
more than once, you impatient bastards!"

Ben
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html