Re: handle_async_copy calling kzalloc under spinlock

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> · Mon, 19 Nov 2018 16:05:23 -0500

On Fri, Nov 16, 2018 at 03:11:58PM -0500, Olga Kornievskaia wrote:
> On Fri, Nov 16, 2018 at 2:58 PM J. Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
> > That race is discussed in
> > https://tools.ietf.org/html/rfc5661#section-2.10.6.3 and is supposed to
> > be dealt with by using referring triples and/or returning DELAY.
> 
> I believe those are suggestions and not mandates? A client can't rely
> on the server implementing referring sequence information. Sending
> "delay" to the server might be an option, but it's an option that most
> likely will interfere with performance as well?

Yes, I suppose the server either needs to implement referring triples or
retry pretty aggressively.

(By the way, I wonder if the server should always do synchronous copies
for copies smaller than a certain threshold.  Might be hard to choose
the threshold, though.)
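
Roughly what I have in mind, as a sketch only.  The function name and
the 4MB cutoff below are made up for illustration; neither comes from
nfsd, and picking the real number is exactly the hard part:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cutoff in bytes; the right value is the open question. */
#define SYNC_COPY_THRESHOLD (4u * 1024 * 1024)

/*
 * Decide whether a copy of 'count' bytes should be done synchronously.
 * Small copies finish quickly, so replying inline avoids the
 * COPY-reply/CB_OFFLOAD race entirely for them.
 */
static bool copy_should_be_sync(uint64_t count)
{
	return count < SYNC_COPY_THRESHOLD;
}
```

Anything at or above the cutoff would still go async and still need
referring triples or DELAY handling.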

> > > > And shouldn't CB_OFFLOAD be returning bad_stateid in the case it doesn't
> > > > recognize the given stateid?
> > >
> > > It could, but what should the server do in this case? I would imagine
> > > it wouldn't do anything. There is nothing it can do. So now we have a
> > > copy that sent the call and is going to wait on the reply, which will
> > > never come, as the 1st one came and we rejected it, and now the copy
> > > will wait forever.
> > >
> > > Please describe what "is wrong" with the current implementation.  I
> > > believe it provides a reasonable solution to the race condition.
> >
> > Looks like a server that sends bad stateids in callbacks could cause you
> > to allocate something that will never get freed.
> 
> I thought the philosophy was that client shouldn't be coded to a
> broken server. If needed, we can later on add a cleanup thread that
> goes thru the list and removes really old entries.

I suppose so.
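
For reference, the kind of cleanup pass you describe might look
something like the following.  This is a userspace sketch under my own
assumptions, not the client code: the struct, the function names, and
the use of an integer id in place of a stateid are all hypothetical,
and it ignores locking (an in-kernel version would also need to
allocate outside the spinlock).

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical record for a CB_OFFLOAD that arrived before its COPY reply. */
struct pending_offload {
	struct pending_offload *next;
	time_t arrived;		/* when the callback was queued */
	unsigned int id;	/* stand-in for the copy stateid */
};

/* Queue an unmatched callback at the head of the list; 0 on success, -1 on ENOMEM. */
static int pending_add(struct pending_offload **head, unsigned int id, time_t now)
{
	struct pending_offload *p = calloc(1, sizeof(*p));

	if (!p)
		return -1;
	p->id = id;
	p->arrived = now;
	p->next = *head;
	*head = p;
	return 0;
}

/* Free every entry older than max_age seconds; returns how many were freed. */
static unsigned int pending_reap(struct pending_offload **head, time_t now,
				 time_t max_age)
{
	struct pending_offload **pp = head;
	unsigned int freed = 0;

	while (*pp) {
		struct pending_offload *p = *pp;

		if (now - p->arrived > max_age) {
			*pp = p->next;	/* unlink, then free */
			free(p);
			freed++;
		} else {
			pp = &p->next;
		}
	}
	return freed;
}
```

The awkward part is max_age: too short and you throw away a callback
whose COPY reply is just slow, too long and a misbehaving server can
still pin a fair amount of memory.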

I don't know, this design still makes me pretty uncomfortable, but I
guess I haven't come up with a strong reason it couldn't work.

--b.