Hi Jeff,

Yes. If a cluster holds far more files (on the order of 100 million) than
a single MDS can cache, pinning inodes to a specific MDS and preventing
other MDSs from loading those inodes can achieve better results. This
problem is particularly prominent on mds0.

This problem is similar to flushcache: a good ratio of SSD to HDD gets
better results. Removing the duplicate inodes is equivalent to
increasing the total amount of cache, I think.

Thanks,
gaoyu

>
> On Fri, 2019-09-13 at 10:08 +0800, Yan, Zheng wrote:
> > On Thu, Sep 12, 2019 at 6:21 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > On Wed, 2019-09-11 at 11:30 -0700, Gregory Farnum wrote:
> > > > On Tue, Sep 10, 2019 at 3:11 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > > I've no particular objection here, but I'd prefer Greg's ack before we
> > > > > merge it, since he raised earlier concerns.
> > > >
> > > > You have my acked-by in light of Zheng's comments elsewhere and the
> > > > evidence that this actually works in some scenarios.
> > > >
> > > > Might be nice to at least get far enough to generate tickets based on
> > > > your questions in the other thread, though:
> > > >
> > >
> > > I'm not sold yet.
> > >
> > > Why is this something the client should have to worry about at all? Can
> > > we do something on the MDS to better handle this situation? This really
> > > feels like we're exposing an implementation detail via mount option.
> > >
> >
> > I think we can. make mds return empty DirStat::dist in request reply
> >
>
> I guess that'd make the client think that it wasn't replicated?
>
> Under what conditions would you have it return that in the reply? Should
> we be looking to have the MDS favor forwarding over replication more (as
> Greg seems to be suggesting)?
>
> Note too that I'm not opposed to adding some sort of mitigation for this
> problem if needed to help with code that's in the field, but I'd prefer
> to address the root cause if we can so the workaround may not be needed
> in the future.
>
> Mount options are harder to deprecate since they'll be in docs forever,
> and they are necessarily per-vfsmount. If you do need this, would the
> switch be more appropriate as a kernel module parameter instead?
>
> > > At a bare minimum, if we take this, I'd like to see some documentation.
> > > When should a user decide to turn this on or off? There are no
> > > guidelines to the use of this thing so far.
> > >
> > >
> > > > On Wed, Sep 11, 2019 at 9:26 AM Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> > > > > In an ideal world, what should happen in this case? Should we be
> > > > > changing MDS policy to forward the request in this situation?
> > > > >
> > > > > This mount option seems like it's exposing something that is really an
> > > > > internal implementation detail to the admin. That might be justified,
> > > > > but I'm unclear on why we don't expect more saner behavior from the MDS
> > > > > on this?
> > > >
> > > > I think partly it's that early designs underestimated the cost of
> > > > replication and overestimated its utility, but I also thought forwards
> > > > were supposed to happen more often than replication so I'm curious why
> > > > it's apparently not doing that.
> > > > -Greg
> > >
> > > --
> > > Jeff Layton <jlayton@xxxxxxxxxx>
> > >
>
> --
> Jeff Layton <jlayton@xxxxxxxxxx>
>