On Wed, 06 Nov 2024, Olga Kornievskaia wrote:
> On Tue, Nov 5, 2024 at 4:06 PM NeilBrown <neilb@xxxxxxx> wrote:
> >
> > On Wed, 06 Nov 2024, Chuck Lever wrote:
> > >
> > > Having nfsd threads handle this workload again invites a DoS vector.
> >
> > Any more so than having nfsd threads handling a WRITE workload?
>
> Isn't a difference between a COPY and a WRITE the fact that the server
> has the ability to restrict the TCP window of the client sending the
> bytes?  And for simplicity's sake, if we assume the client/server has a
> single TCP stream, then when the window size is limited other WRITEs
> are also prevented from sending more data.  But a COPY operation
> doesn't require much to be sent, and preventing the client from sending
> another COPY can't be achieved through the TCP window size.

I think you are saying that the WRITE requests are naturally throttled.
However, that isn't necessarily the case.  If the network is
significantly faster than the storage path, and if a client (or several
clients) send enough concurrent WRITE requests, then all nfsd threads
could get stuck in writeback throttling code - which could be seen as a
denial of service, as no other requests could be serviced until
congestion eases.

So maybe some threads should be reserved for non-IO requests and so
would avoid dequeuing WRITE and READ and COPY requests - and would not
be involved in async COPY.

So I'm still not convinced that COPY in any way introduced a new DoS
problem - provided the limit on async-copy threads is in place.

I wonder if we could use request deferral to delay IO requests until an
IO thread is available...

> > > The 4MB chunk limit is there precisely to prevent synchronous COPY
> > > operations from tying up nfsd threads for too long. On a slow export,
> > > this is still not going to work, so I'd rather see a timer for this
> > > protection; say, 30ms, rather than a byte count. If more than 4MB
> > > can be handled quickly, that will be good for throughput.
> >
> > That sounds like a good goal.  Ideally we would need a way to negotiate
> > a window with write-back throttling so that we don't bother reading
> > until we know that writing to the page-cache won't block.  Certainly
> > worth exploring.
> >
> > > Note that we still want to limit the number of background copy
> > > operations going on. I don't want a mechanism where a client can
> > > start an unbounded amount of work on the server.
> >
> > This isn't obvious to me.  The server makes no promise concerning the
> > throughput it will provide.  Having a list of COPY requests that add up
> > to many terabytes isn't intrinsically a problem.  Having millions of
> > COPY requests in the list *might* be a little bit of a burden.  Using
> > __GFP_NORETRY might help put a natural limit on that.
> >
> > > > > > This came up because CVE-2024-49974 was created so I had to do
> > > > > > something about the theoretical DoS vector in SLE kernels.  I
> > > > > > didn't like the patch so I backported
> > > > > >
> > > > > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous")
>
> I'm doing the same for RHEL. But the fact that a CVE was created seems
> like a flaw of the CVE creation process in this case. It should have
> never been made a CVE.

I think everyone agrees that the CVE process is flawed.  Fixing it is
not so easy :-(

I'm glad you are doing the same.  I'm a little concerned that this
disabled inter-server copies too.
I guess the answer to that is to help resolve the problems with async
copy so that it can be re-enabled.

> > > > > > instead (and wondered why it hadn't gone to stable).
> > > > >
> > > > > I was conservative about requesting a backport here. However, if a
> > > > > CVE has been filed, and if there is no automation behind that
> > > > > process, you can explicitly request aadc3bbea163 be backported.
> > > > >
> > > > > The problem, to me, was less about server resource depletion and
> > > > > more about client hangs.
> > > >
> > > > And yet the patch that dealt with the less important server resource
> > > > depletion was marked for stable, and the patch that dealt with client
> > > > hangs wasn't??
> > > >
> > > > The CVE was for that less important patch, probably because it contained
> > > > the magic word "DoS".
> > >
> > > Quite likely. I wasn't consulted before the CVE was opened, nor was
> > > I notified that it had been created.
> > >
> > > Note that distributions are encouraged to evaluate whether a CVE is
> > > serious enough to address, rather than simply backporting the fixes
> > > automatically. But I know some customers want every CVE handled, so
> > > that is sometimes difficult.
> >
> > Yes, it needs to be handled.  Declaring it invalid is certainly an
> > option for handling it.  I didn't quite feel I could justify that in
> > this case.
> >
> > > > I think 8d915bbf3926 should go to stable but I would like to understand
> > > > why you felt the need to be conservative.
> > >
> > > First, I'm told that LTS generally avoids taking backports that
> > > overtly change user-visible behavior like disabling server-to-server
> > > copy (which requires async COPY to work). That was the main reason
> > > for my hesitance.
> >
> > Why does server-to-server require async COPY?
> > RFC 7862 section 4.5. Inter-Server Copy says
> >     The destination server may perform the copy synchronously or
> >     asynchronously.
> > but I see that nfsd4_copy() returns nfs4err_notsupp if the inter-server
> > copy_is_sync(), but it isn't immediately obvious to me why.  The patch
> > which landed this functionality doesn't explain the restriction.
>
> The choice (my choice) for not implementing a synchronous inter-server
> copy was from my understanding that such an operation would take too
> much time and tie up a knfsd thread.

I guess either the mount or the read could add indefinite delays...
Thanks.

> > I guess that with multiple 4MB sync COPY requests the server would need
> > to repeatedly mount and unmount the source server, which could be
> > unnecessary work - or would need to cache the mount and know when to
> > unmount it....
>
> The implementation caches the (source server) mount for a period of
> time. But needing to issue multiple COPYs can eventually impact
> performance.

True, but I think that is the goal.  If there are too many async COPYs,
force the rest to be sync so as to slow down the client and limit the
impact on the server.

Thanks,
NeilBrown
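
The kind of limit described in the final paragraph - too many pending
async COPYs forcing the rest to be handled synchronously - might look
roughly like the sketch below.  The names max_async_copies,
nfsd4_do_sync_copy() and nfsd4_start_async_copy() are placeholders for
illustration, not actual nfsd symbols, and the in-tree logic may differ:

    /* Sketch only: cap the number of pending async COPY operations;
     * beyond the cap, service the request synchronously (bounded,
     * e.g. 4MB per COPY) so the client is throttled instead of
     * queueing unbounded background work on the server. */
    static atomic_t pending_async_copies = ATOMIC_INIT(0);
    static int max_async_copies = 8;    /* e.g. scale with nrthreads */

    static __be32 nfsd4_dispatch_copy(struct nfsd4_copy *copy)
    {
            if (atomic_inc_return(&pending_async_copies) > max_async_copies) {
                    atomic_dec(&pending_async_copies);
                    return nfsd4_do_sync_copy(copy);        /* placeholder */
            }
            /* The async path must decrement pending_async_copies when
             * the background copy completes or is cancelled. */
            return nfsd4_start_async_copy(copy);            /* placeholder */
    }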