On 7 Dec 2016, at 10:46, Trond Myklebust wrote:
On Dec 7, 2016, at 08:28, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
I was asked to figure out why the listing of very large directories was slow; more specifically, why concurrently listing the same large directory is /very/ slow. It seems that sometimes a user's reaction to waiting for 'ls' to complete is to start a few more, and then their machine takes a very long time to complete that work.
I can reproduce that finding. As an example:

    time ls -fl /dir/with/200000/entries/ >/dev/null

    real    0m10.766s
    user    0m0.716s
    sys     0m0.827s

But:

    for i in {1..10}; do time ls -fl /dir/with/200000/entries/ >/dev/null & done

Each of these 'ls' commands will take 4 to 5 minutes to complete.
The problem is that concurrent 'ls' commands stack up in nfs_readdir(), both waiting on the next page and taking turns filling the next page with xdr, but only one of them will have desc->plus set, because setting it clears the flag on the directory. So if a page is filled by a process that doesn't have desc->plus set, the next pass through lookup() dumps the entire page cache via nfs_force_use_readdirplus(), and the next readdir starts filling the pagecache all over again. Forward progress happens, but only after many steps back re-filling the pagecache.
Yes, the readdir code was written well before Al's patches to parallelise the VFS operations, and a lot of it did rely on the inode->i_mutex being set on the directory by the VFS layer.
How about the following suggestion: instead of setting a flag on the inode, we iterate through the entries in &nfsi->open_files and set a flag on the struct nfs_open_dir_context that the readdir processes can copy into desc->plus. Does that help with your workload?
That should work.. I guess I'll hack it up and present it for dissection.
Thanks!
Ben