Re: Concurrent `ls` takes out the thrash

On 7 Dec 2016, at 10:46, Trond Myklebust wrote:

On Dec 7, 2016, at 08:28, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:

I was asked to figure out why the listing of very large directories was slow. More specifically, why concurrently listing the same large directory is /very/ slow. It seems that sometimes a user's reaction to waiting for 'ls' to complete is to start a few more.. and then their machine takes a very long time to complete that work.

I can reproduce that finding.  As an example:

time ls -fl /dir/with/200000/entries/ >/dev/null

real    0m10.766s
user    0m0.716s
sys     0m0.827s

But..

for i in {1..10}; do time ls -fl /dir/with/200000/entries/ >/dev/null & done

Each of these ^^ 'ls' commands will take 4 to 5 minutes to complete.

The problem is that concurrent 'ls' commands stack up in nfs_readdir(), both waiting on the next page and taking turns filling the next page with XDR, but only one of them will have desc->plus set, because setting it clears the flag on the directory. So if a page is filled by a process that doesn't have desc->plus set, then on the next pass through lookup() it dumps the entire page cache with nfs_force_use_readdirplus(). The next readdir then starts all over, refilling the page cache. Forward progress happens, but only after many steps back refilling the page cache.
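
The flag handoff described above can be modelled in user space. This is only a sketch under stated assumptions: `struct shared_dir`, `advise_rdplus_shared()`, and `take_rdplus_hint()` are hypothetical names rather than the kernel's, and a plain bool stands in for whatever the NFS inode actually stores. It is meant to show why only the first of several concurrent readers ends up with the equivalent of desc->plus set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a directory inode carrying one shared hint. */
struct shared_dir {
    bool advise_rdplus;   /* a single flag shared by every reader */
};

/* Lookup side: ask the next readdir to use READDIRPLUS. */
void advise_rdplus_shared(struct shared_dir *dir)
{
    assert(dir != NULL);
    dir->advise_rdplus = true;
}

/* Reader side: reading the hint also clears it, so only the first
 * caller after a hint sees true; every later caller sees false. */
bool take_rdplus_hint(struct shared_dir *dir)
{
    assert(dir != NULL);
    bool plus = dir->advise_rdplus;
    dir->advise_rdplus = false;
    return plus;
}
```

With ten readers and one hint, nine of them would fill pages without the flag, which matches the thrashing described above.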

Yes, the readdir code was written well before Al’s patches to parallelise the VFS operations, and a lot of it did rely on the inode->i_mutex being held on the directory by the VFS layer.

How about the following suggestion: instead of setting a flag on the
inode, we iterate through the entries in &nfsi->open_files, and set a flag
on the struct nfs_open_dir_context that the readdir processes can copy
into desc->plus. Does that help with your workload?
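
A rough user-space sketch of that suggestion follows. It is not the eventual patch: `struct dir_model`, `struct open_ctx`, `struct rd_desc`, and the three functions are hypothetical stand-ins for the NFS inode, struct nfs_open_dir_context, and the readdir descriptor. A simple singly linked list models &nfsi->open_files, and locking, which any real version would need, is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfs_open_dir_context. */
struct open_ctx {
    bool force_plus;          /* per-opener hint (assumed field name) */
    struct open_ctx *next;    /* link in the directory's open list */
};

/* Hypothetical stand-in for the directory's NFS inode. */
struct dir_model {
    struct open_ctx *open_files;   /* models &nfsi->open_files */
};

/* Hypothetical stand-in for the readdir descriptor. */
struct rd_desc {
    bool plus;
};

/* Register an opener on the directory. */
void dir_add_opener(struct dir_model *dir, struct open_ctx *ctx)
{
    assert(dir != NULL && ctx != NULL);
    ctx->force_plus = false;
    ctx->next = dir->open_files;
    dir->open_files = ctx;
}

/* Instead of one flag on the inode, flag every open context. */
void advise_rdplus_all(struct dir_model *dir)
{
    assert(dir != NULL);
    for (struct open_ctx *c = dir->open_files; c != NULL; c = c->next)
        c->force_plus = true;
}

/* Each readdir copies its own context's hint into desc->plus and
 * clears only that private copy, leaving other readers untouched. */
void readdir_begin(struct open_ctx *ctx, struct rd_desc *desc)
{
    assert(ctx != NULL && desc != NULL);
    desc->plus = ctx->force_plus;
    ctx->force_plus = false;
}
```

Because each opener consumes only its own flag, two concurrent readers would both start with plus set after a single hint, rather than racing for one shared bit.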

That should work.. I guess I'll hack it up and present it for dissection.

Thanks!
Ben