Re: [PATCH RFC v5 0/2] nfsd: Initial implementation of NFSv4 Courteous Server

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2021-11-30 at 04:47 +0000, Chuck Lever III wrote:
> 
> > On Nov 29, 2021, at 11:08 PM, Trond Myklebust
> > <trondmy@xxxxxxxxxxxxxxx> wrote:
> > 
> > On Tue, 2021-11-30 at 01:42 +0000, Chuck Lever III wrote:
> > > 
> > > > > On Nov 29, 2021, at 7:11 PM, Dai Ngo <dai.ngo@xxxxxxxxxx>
> > > > > wrote:
> > > > 
> > > > 
> > > > > On 11/29/21 1:10 PM, Chuck Lever III wrote:
> > > > > 
> > > > > > > On Nov 29, 2021, at 2:36 PM, Dai Ngo <dai.ngo@xxxxxxxxxx>
> > > > > > > wrote:
> > > > > > 
> > > > > > 
> > > > > > On 11/29/21 11:03 AM, Chuck Lever III wrote:
> > > > > > > Hello Dai!
> > > > > > > 
> > > > > > > 
> > > > > > > > On Nov 29, 2021, at 1:32 PM, Dai Ngo
> > > > > > > > <dai.ngo@xxxxxxxxxx>
> > > > > > > > wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > On 11/29/21 9:30 AM, J. Bruce Fields wrote:
> > > > > > > > > On Mon, Nov 29, 2021 at 09:13:16AM -0800,
> > > > > > > > > dai.ngo@xxxxxxxxxx wrote:
> > > > > > > > > > Hi Bruce,
> > > > > > > > > > 
> > > > > > > > > > On 11/21/21 7:04 PM, dai.ngo@xxxxxxxxxx wrote:
> > > > > > > > > > > On 11/17/21 4:34 PM, J. Bruce Fields wrote:
> > > > > > > > > > > > On Wed, Nov 17, 2021 at 01:46:02PM -0800,
> > > > > > > > > > > > dai.ngo@xxxxxxxxxx wrote:
> > > > > > > > > > > > > On 11/17/21 9:59 AM,
> > > > > > > > > > > > > dai.ngo@xxxxxxxxxx wrote:
> > > > > > > > > > > > > > On 11/17/21 6:14 AM, J. Bruce Fields wrote:
> > > > > > > > > > > > > > > On Tue, Nov 16, 2021 at 03:06:32PM -0800,
> > > > > > > > > > > > > > > dai.ngo@xxxxxxxxxx wrote:
> > > > > > > > > > > > > > > > Just a reminder that this patch is
> > > > > > > > > > > > > > > > still
> > > > > > > > > > > > > > > > waiting for your review.
> > > > > > > > > > > > > > > Yeah, I was procrastinating and hoping
> > > > > > > > > > > > > > > yo'ud
> > > > > > > > > > > > > > > figure out the pynfs
> > > > > > > > > > > > > > > failure for me....
> > > > > > > > > > > > > > Last time I ran 4.0 OPEN18 test by itself
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > it passed. I will run
> > > > > > > > > > > > > > all OPEN tests together with 5.15-rc7 to
> > > > > > > > > > > > > > see if
> > > > > > > > > > > > > > the problem you've
> > > > > > > > > > > > > > seen still there.
> > > > > > > > > > > > > I ran all tests in nfsv4.1 and nfsv4.0 with
> > > > > > > > > > > > > courteous and non-courteous
> > > > > > > > > > > > > 5.15-rc7 server.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Nfs4.1 results are the same for both
> > > > > > > > > > > > > courteous
> > > > > > > > > > > > > and
> > > > > > > > > > > > > non-courteous server:
> > > > > > > > > > > > > > Of those: 0 Skipped, 0 Failed, 0 Warned,
> > > > > > > > > > > > > > 169
> > > > > > > > > > > > > > Passed
> > > > > > > > > > > > > Results of nfs4.0 with non-courteous server:
> > > > > > > > > > > > > > Of those: 8 Skipped, 1 Failed, 0 Warned,
> > > > > > > > > > > > > > 577
> > > > > > > > > > > > > > Passed
> > > > > > > > > > > > > test failed: LOCK24
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Results of nfs4.0 with courteous server:
> > > > > > > > > > > > > > Of those: 8 Skipped, 3 Failed, 0 Warned,
> > > > > > > > > > > > > > 575
> > > > > > > > > > > > > > Passed
> > > > > > > > > > > > > tests failed: LOCK24, OPEN18, OPEN30
> > > > > > > > > > > > > 
> > > > > > > > > > > > > OPEN18 and OPEN30 test pass if each is run by
> > > > > > > > > > > > > itself.
> > > > > > > > > > > > Could well be a bug in the tests, I don't know.
> > > > > > > > > > > The reason OPEN18 failed was because the test
> > > > > > > > > > > timed
> > > > > > > > > > > out waiting for
> > > > > > > > > > > the reply of an OPEN call. The RPC connection
> > > > > > > > > > > used
> > > > > > > > > > > for the test was
> > > > > > > > > > > configured with 15 secs timeout. Note that OPEN18
> > > > > > > > > > > only fails when
> > > > > > > > > > > the tests were run with 'all' option, this test
> > > > > > > > > > > passes if it's run
> > > > > > > > > > > by itself.
> > > > > > > > > > > 
> > > > > > > > > > > With courteous server, by the time OPEN18 runs,
> > > > > > > > > > > there
> > > > > > > > > > > are about 1026
> > > > > > > > > > > courtesy 4.0 clients on the server and all of
> > > > > > > > > > > these
> > > > > > > > > > > clients have opened
> > > > > > > > > > > the same file X with WRITE access. These clients
> > > > > > > > > > > were
> > > > > > > > > > > created by the
> > > > > > > > > > > previous tests. After each test completed, since
> > > > > > > > > > > 4.0
> > > > > > > > > > > does not have
> > > > > > > > > > > session, the client states are not cleaned up
> > > > > > > > > > > immediately on the
> > > > > > > > > > > server and are allowed to become courtesy
> > > > > > > > > > > clients.
> > > > > > > > > > > 
> > > > > > > > > > > When OPEN18 runs (about 20 minutes after the 1st
> > > > > > > > > > > test
> > > > > > > > > > > started), it
> > > > > > > > > > > sends OPEN of file X with OPEN4_SHARE_DENY_WRITE
> > > > > > > > > > > which causes the
> > > > > > > > > > > server to check for conflicts with courtesy
> > > > > > > > > > > clients.
> > > > > > > > > > > The loop that
> > > > > > > > > > > checks 1026 courtesy clients for share/access
> > > > > > > > > > > conflict took less
> > > > > > > > > > > than 1 sec. But it took about 55 secs, on my VM,
> > > > > > > > > > > for
> > > > > > > > > > > the server
> > > > > > > > > > > to expire all 1026 courtesy clients.
> > > > > > > > > > > 
> > > > > > > > > > > I modified pynfs to configure the 4.0 RPC
> > > > > > > > > > > connection
> > > > > > > > > > > with 60 seconds
> > > > > > > > > > > timeout and OPEN18 now consistently passed. The
> > > > > > > > > > > 4.0
> > > > > > > > > > > test results are
> > > > > > > > > > > now the same for courteous and non-courteous
> > > > > > > > > > > server:
> > > > > > > > > > > 
> > > > > > > > > > > 8 Skipped, 1 Failed, 0 Warned, 577 Passed
> > > > > > > > > > > 
> > > > > > > > > > > Note that 4.1 tests do not suffer this timeout
> > > > > > > > > > > problem because the
> > > > > > > > > > > 4.1 clients and sessions are destroyed after each
> > > > > > > > > > > test completes.
> > > > > > > > > > Do you want me to send the patch to increase the
> > > > > > > > > > timeout for pynfs?
> > > > > > > > > > or is there any other things you think we should
> > > > > > > > > > do?
> > > > > > > > > I don't know.
> > > > > > > > > 
> > > > > > > > > 55 seconds to clean up 1026 clients is about 50ms per
> > > > > > > > > client, which is
> > > > > > > > > pretty slow.  I wonder why.  I guess it's probably
> > > > > > > > > updating the stable
> > > > > > > > > storage information.  Is /var/lib/nfs/ on your server
> > > > > > > > > backed by a hard
> > > > > > > > > drive or an SSD or something else?
> > > > > > > > My server is a virtualbox VM that has 1 CPU, 4GB RAM
> > > > > > > > and
> > > > > > > > 64GB of hard
> > > > > > > > disk. I think a production system that supports this
> > > > > > > > many
> > > > > > > > clients should
> > > > > > > > have faster CPUs, faster storage.
> > > > > > > > 
> > > > > > > > > I wonder if that's an argument for limiting the
> > > > > > > > > number of
> > > > > > > > > courtesy
> > > > > > > > > clients.
> > > > > > > > I think we might want to treat 4.0 clients a bit
> > > > > > > > different
> > > > > > > > from 4.1
> > > > > > > > clients. With 4.0, every client will become a courtesy
> > > > > > > > client after
> > > > > > > > the client is done with the export and unmounts it.
> > > > > > > It should be safe for a server to purge a client's lease
> > > > > > > immediately
> > > > > > > if there is no open or lock state associated with it.
> > > > > > In this case, each client has opened files so there are
> > > > > > open
> > > > > > states
> > > > > > associated with them.
> > > > > > 
> > > > > > > When an NFSv4.0 client unmounts, all files should be
> > > > > > > closed
> > > > > > > at that
> > > > > > > point,
> > > > > > I'm not sure pynfs does proper clean up after each subtest,
> > > > > > I
> > > > > > will
> > > > > > check. There must be state associated with the client in
> > > > > > order
> > > > > > for
> > > > > > it to become courtesy client.
> > > > > Makes sense. Then a synthetic client like pynfs can DoS a
> > > > > courteous
> > > > > server.
> > > > > 
> > > > > 
> > > > > > > so the server can wait for the lease to expire and purge
> > > > > > > it
> > > > > > > normally. Or am I missing something?
> > > > > > When 4.0 client lease expires and there are still states
> > > > > > associated
> > > > > > with the client then the server allows this client to
> > > > > > become
> > > > > > courtesy
> > > > > > client.
> > > > > I think the same thing happens if an NFSv4.1 client neglects
> > > > > to
> > > > > send
> > > > > DESTROY_SESSION / DESTROY_CLIENTID. Either such a client is
> > > > > broken
> > > > > or malicious, but the server faces the same issue of
> > > > > protecting
> > > > > itself from a DoS attack.
> > > > > 
> > > > > IMO you should consider limiting the number of courteous
> > > > > clients
> > > > > the server can hold onto. Let's say that number is 1000. When
> > > > > the
> > > > > server wants to turn a 1001st client into a courteous client,
> > > > > it
> > > > > can simply expire and purge the oldest courteous client on
> > > > > its
> > > > > list. Otherwise, over time, the 24-hour expiry will reduce
> > > > > the
> > > > > set of courteous clients back to zero.
> > > > > 
> > > > > What do you think?
> > > > 
> > > > Limiting the number of courteous clients to handle the cases of
> > > > broken/malicious 4.1 clients seems reasonable as the last
> > > > resort.
> > > > 
> > > > I think if a malicious 4.1 clients could mount the server's
> > > > export,
> > > > opens a file (to create state) and repeats the same with a
> > > > different
> > > > client id then it seems like some basic security was already
> > > > broken;
> > > > allowing unauthorized clients to mount server's exports.
> > > 
> > > You can do this today with AUTH_SYS. I consider it a genuine
> > > attack
> > > surface.
> > > 
> > > 
> > > > I think if we have to enforce a limit, then it's only for
> > > > handling
> > > > of seriously buggy 4.1 clients which should not be the norm.
> > > > The
> > > > issue with this is how to pick an optimal number that is
> > > > suitable
> > > > for the running server which can be a very slow or a very fast
> > > > server.
> > > > 
> > > > Note that even if we impose an limit, that does not completely
> > > > solve
> > > > the problem with pynfs 4.0 test since its RPC timeout is
> > > > configured
> > > > with 15 secs which just enough to expire 277 clients based on
> > > > 53ms
> > > > for each client, unless we limit it ~270 clients which I think
> > > > it's
> > > > too low.
> > > > 
> > > > This is what I plan to do:
> > > > 
> > > > 1. do not support 4.0 courteous clients, for sure.
> > > 
> > > Not supporting 4.0 isn’t an option, IMHO. It is a fully supported
> > > protocol at this time, and the same exposure exists for 4.1, it’s
> > > just a little harder to exploit.
> > > 
> > > If you submit the courteous server patch without support for 4.0,
> > > I
> > > think it needs to include a plan for how 4.0 will be added later.
> > > 
> > > > 
> > 
> > Why is there a problem here? The requirements are the same for 4.0
> > and
> > 4.1 (or 4.2). If the lease under which the courtesy lock was
> > established has expired, then that courtesy lock must be released
> > if
> > some other client requests a lock that conflicts with the cached
> > lock
> > (unless the client breaks the courtesy framework by renewing that
> > original lease before the conflict occurs). Otherwise, it is
> > completely
> > up to the server when it decides to actually release the lock.
> > 
> > For NFSv4.1 and NFSv4.2, we have DESTROY_CLIENTID, which tells the
> > server when the client is actually done with the lease, making it
> > easy
> > to determine when it is safe to release all the courtesy locks.
> > However
> > if the client does not send DESTROY_CLIENTID, then we're in the
> > same
> > situation with 4.x (x>0) as we would be with bog standard NFSv4.0.
> > The
> > lease has expired, and so the courtesy locks are liable to being
> > dropped.
> 
> I agree the situation is the same for all minor versions.
> 
> 
> > At Hammerspace we have implemented courtesy locks, and our strategy
> > is
> > that when a conflict occurs, we drop the entire set of courtesy
> > locks
> > so that we don't have to deal with the "some locks were revoked"
> > scenario. The reason is that when we originally implemented
> > courtesy
> > locks, the Linux NFSv4 client support for lock revocation was a lot
> > less sophisticated than today. My suggestion is that you might
> > therefore consider starting along this path, and then refining the
> > support to make revocation more nuanced once you are confident that
> > the
> > coarser strategy is working as expected.
> 
> Dai’s implementation does all that, and takes the coarser approach at
> the moment. There are plans to explore the more nuanced behavior (by
> revoking only the conflicting lock instead of dropping the whole
> lease) after this initial work is merged.
> 
> The issue is there are certain pathological client behaviors (whether
> malicious or accidental) that can run the server out of resources,
> since it is holding onto lease state for a much longer time. We are
> simply trying to design a lease garbage collection scheme to meet
> that challenge.
> 
> I think limiting the number of courteous clients is a simple way to
> do this, but we could also shorten the courtesy lifetime as more
> clients enter that state, to ensure that they don’t overrun the
> server’s memory. Another approach might be to add a shrinker that
> purges the oldest courteous clients when the server comes under
> memory pressure.
> 
> 

We already have a scanner that tries to release all client state after
1 lease period. Just extend that to do it after 10 lease periods. If a
network partition hasn't recovered after 10 minutes, you probably have
bigger problems.

You can limit the number of clients as well, but that leads into a rats
nest of other issues that have nothing to do with courtesy locks and
everything to do with the fact that any client can hold a lot of state.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx






[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux