> On Dec 7, 2016, at 08:28, Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>
> I was asked to figure out why the listing of very large directories is
> slow, and more specifically why concurrently listing the same large
> directory is /very/ slow. It seems that sometimes a user's reaction to
> waiting for 'ls' to complete is to start a few more.. and then their
> machine takes a very long time to complete all of that work.
>
> I can reproduce that finding. As an example:
>
> time ls -fl /dir/with/200000/entries/ >/dev/null
>
> real 0m10.766s
> user 0m0.716s
> sys 0m0.827s
>
> But..
>
> for i in {1..10}; do time ls -fl /dir/with/200000/entries/ >/dev/null & done
>
> Each of these ^^ 'ls' commands will take 4 to 5 minutes to complete.
>
> The problem is that concurrent 'ls' commands stack up in nfs_readdir(),
> both waiting on the next page and taking turns filling the next page with
> xdr, but only one of them will have desc->plus set, because setting it
> clears the flag on the directory. So if a page is filled by a process
> that doesn't have desc->plus set, then on the next pass through lookup()
> we dump the entire page cache with nfs_force_use_readdirplus(). The next
> readdir then starts all over filling the pagecache. Forward progress
> happens, but only after many steps back re-filling the pagecache.

Yes, the readdir code was written well before Al's patches to parallelise
the VFS operations, and a lot of it did rely on the VFS layer holding the
directory's inode->i_mutex.

How about the following suggestion: instead of setting a flag on the inode,
we iterate through the entries in &nfsi->open_files and set a flag on each
struct nfs_open_dir_context that the readdir processes can copy into
desc->plus.

Does that help with your workload?
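For what it's worth, a rough sketch of the kind of thing I mean. This is
untested: it assumes we add an "unsigned long flags" word to struct
nfs_open_dir_context, that each directory context gets linked into
nfsi->open_files when the directory is opened, and the flag name itself is
invented:

#include <linux/bitops.h>
#include <linux/list.h>
#include <linux/nfs_fs.h>	/* struct nfs_inode, nfs_open_dir_context, NFS_I() */

/* Invented flag bit -- assumes an "unsigned long flags" field is added
 * to struct nfs_open_dir_context, and that each dir context is linked
 * into nfsi->open_files (under dir->i_lock) at open time. */
#define NFS_ODC_FORCE_READDIRPLUS	0

static void nfs_advise_use_readdirplus(struct inode *dir)
{
	struct nfs_inode *nfsi = NFS_I(dir);
	struct nfs_open_dir_context *ctx;

	/* Hand the readdirplus hint to every open directory context
	 * instead of racing over a single per-inode bit. */
	spin_lock(&dir->i_lock);
	list_for_each_entry(ctx, &nfsi->open_files, list)
		set_bit(NFS_ODC_FORCE_READDIRPLUS, &ctx->flags);
	spin_unlock(&dir->i_lock);
}

Then in nfs_readdir() each process would pick up the hint from its own
context, e.g. desc->plus = test_and_clear_bit(NFS_ODC_FORCE_READDIRPLUS,
&dir_ctx->flags), so one reader consuming the hint no longer causes the
others to drop back to plain readdir and invalidate the pagecache.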