Re: [PATCH] nfsd: fallback to sync COPY if async not possible

Chuck Lever <chuck.lever@xxxxxxxxxx> · Wed, 6 Nov 2024 09:18:05 -0500

On Wed, Nov 06, 2024 at 03:30:54PM +1100, NeilBrown wrote:
> On Wed, 06 Nov 2024, Olga Kornievskaia wrote:
> > On Tue, Nov 5, 2024 at 4:06 PM NeilBrown <neilb@xxxxxxx> wrote:
> > >
> > > On Wed, 06 Nov 2024, Chuck Lever wrote:
> > > >
> > > > Having nfsd threads handle this workload again invites a DoS vector.
> > >
> > > Any more so than having nfsd threads handling a WRITE workload?
> > 
> > Isn't a difference between a COPY and a WRITE the fact that the server
> > has the ability to restrict the TCP window of the client sending the
> > bytes? And for simplicity's sake, if we assume client/server has a
> > single TCP stream when the window size is limited then other WRITEs
> > are also prevented from sending more data. But a COPY operation
> > doesn't require much to be sent and preventing the client from sending
> > another COPY can't be achieved thru TCP window size.
> 
> I think you are saying that the WRITE requests are naturally throttled. 
> However that isn't necessarily the case.  If the network is
> significantly faster than the storage path, and if a client (or several
> clients) send enough concurrent WRITE requests, then all nfsd threads
> could get stuck in writeback throttling code - which could be seen as a
> denial of service as no other requests could be serviced until
> congestion eases.

I agree that WRITE throttling needs attention too, but limiting
background COPY seems like the more urgent matter. Unlike READ or
WRITE, COPY isn't restricted to a single 1MB chunk.

> So maybe some threads should be reserved for non-IO requests and so
> would avoid dequeuing WRITE and READ and COPY requests - and would not
> be involved in async COPY.

I'm not enthusiastic about "reserving threads for other purposes".
The mechanism that services incoming network requests is simply not
at the right layer to sort I/O and non-I/O NFS requests.

Instead, NFSD will need some kind of two-layer request handling
mechanism in which any request that might consume significant time
or resources is moved out of the layer that handles ingress network
requests. I hesitate to call it deferral, because I suspect it will
be a hot path, unlike the current svc_defer.
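
A rough user-space sketch of the two-layer shape, in Python purely
for brevity. Everything here is hypothetical (the class name, the
HEAVY_OPS set, the "busy" result); none of it is NFSD code. It only
shows an ingress layer that never blocks on I/O and hands heavier
requests to a bounded second layer.

```python
import queue
import threading

# Assumption: which operations count as "heavy" is a policy choice.
HEAVY_OPS = {"READ", "WRITE", "COPY"}


class TwoLayerServer:
    def __init__(self, io_workers=2, backlog=4):
        # Bounded hand-off queue between the ingress layer and I/O layer.
        self.io_queue = queue.Queue(maxsize=backlog)
        self.results = []
        self.lock = threading.Lock()
        for _ in range(io_workers):
            threading.Thread(target=self._io_worker, daemon=True).start()

    def ingress(self, op, arg):
        """Layer 1: handle cheap requests inline, never wait on I/O."""
        if op not in HEAVY_OPS:
            self._complete(op, arg)
            return "done"
        try:
            self.io_queue.put_nowait((op, arg))
            return "queued"
        except queue.Full:
            # Hypothetical policy: push back on the client instead of
            # blocking (in NFS terms, something like NFS4ERR_DELAY).
            return "busy"

    def _io_worker(self):
        """Layer 2: the only place that may take time or resources."""
        while True:
            op, arg = self.io_queue.get()
            self._complete(op, arg)
            self.io_queue.task_done()

    def _complete(self, op, arg):
        with self.lock:
            self.results.append((op, arg))

    def drain(self):
        """Wait until every queued heavy request has been processed."""
        self.io_queue.join()
```

With no I/O workers and a backlog of one, a second COPY gets "busy"
while a GETATTR is still answered immediately, which is the property
the ingress layer needs.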

> So I'm still not convinced that COPY in any way introduced a new DoS
> problem - providing the limit of async-copy threads is in place.

Because the kernel doesn't do background COPY at all right now, I'm
going to defer this change until after the v6.13 merge window. The
patch proposed at the beginning of this thread doesn't seem like a
fix to me, but rather a change in policy.

> > > > > > > This came up because CVE-2024-49974 was created so I had to do something
> > > > > > > about the theoretical DoS vector in SLE kernels.  I didn't like the
> > > > > > > patch so I backported
> > > > > > >
> > > > > > > Commit 8d915bbf3926 ("NFSD: Force all NFSv4.2 COPY requests to be synchronous")
> > 
> > I'm doing the same for RHEL. But the fact that CVE was created seems
> > like a flaw of the CVE creation process in this case. It should have
> > never been made a CVE.
> 
> I think everyone agrees that the CVE process is flawed.  Fixing it is
> not so easy :-(
> I'm glad you are doing the same.  I'm a little concerned that this
> disabled inter-server copies too.  I guess the answer to that is to help
> resolve the problems with async copy so that it can be re-enabled.

We've been working on restoring background COPY all summer. It is a
priority.

One issue is that a callback completion mechanism such as CB_OFFLOAD
is inherently unreliable. The SCSI community has loads of
implementation experience that backs this statement up. I've got an
implementation of OFFLOAD_STATUS ready for the Linux NFS client so
that it can poll for COPY completion if it thinks it has missed the
CB_OFFLOAD callback.
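
The polling fallback can be sketched roughly as follows. This is a
Python illustration only, not the client implementation mentioned
above: CopyWaiter, poll_status, and the timeout values are made-up
names and numbers, and poll_status merely stands in for sending
OFFLOAD_STATUS.

```python
import threading


class CopyWaiter:
    def __init__(self, poll_status, timeout=0.05, max_polls=10):
        # poll_status: hypothetical callable returning (done, result).
        self.poll_status = poll_status
        self.timeout = timeout
        self.max_polls = max_polls
        self.callback_arrived = threading.Event()
        self.result = None

    def on_cb_offload(self, result):
        """Called if the completion callback does arrive."""
        self.result = result
        self.callback_arrived.set()

    def wait(self):
        """Prefer the callback; poll when it appears to be missing."""
        for _ in range(self.max_polls):
            if self.callback_arrived.wait(self.timeout):
                return ("callback", self.result)
            done, result = self.poll_status()
            if done:
                return ("polled", result)
        return ("timeout", None)
```

The point of the design is that a lost callback costs the client one
timeout interval rather than leaving the COPY waiting forever.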

Another issue here is that RFC 7862 Section 4.8 wants each COPY
stateid to remain until the server sees the CB_OFFLOAD response.
NFSD needs to have a mechanism in place to ensure that a rogue
client or poor network conditions don't allow the set of waiting
COPY stateids to grow large over time.
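
One possible shape for such a mechanism is a capped table with a
time limit, sketched below in Python. The cap, the TTL, and the
evict-oldest policy are assumptions for illustration; whether
discarding a stateid early is acceptable under RFC 7862 is exactly
the policy question, and this sketch does not settle it.

```python
from collections import OrderedDict


class PendingCopyTable:
    """Tracks COPY stateids still awaiting a CB_OFFLOAD response."""

    def __init__(self, max_entries=4, ttl=60.0):
        self.max_entries = max_entries  # hypothetical cap
        self.ttl = ttl                  # hypothetical lifetime, seconds
        self.entries = OrderedDict()    # stateid -> time added, oldest first

    def expire(self, now):
        """Drop entries that have waited at least ttl seconds."""
        while self.entries:
            stateid, added = next(iter(self.entries.items()))
            if now - added < self.ttl:
                break
            self.entries.popitem(last=False)

    def add(self, stateid, now):
        self.expire(now)
        self.entries.pop(stateid, None)       # re-adding refreshes position
        if len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)  # evict the oldest waiter
        self.entries[stateid] = now

    def ack(self, stateid):
        """CB_OFFLOAD response seen; returns False if already gone."""
        return self.entries.pop(stateid, None) is not None
```

Callers pass the clock in explicitly (for example time.monotonic()),
which keeps the table easy to exercise without real delays.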

-- 
Chuck Lever