Re: Orangefs ABI documentation

Mike Marshall <hubcap@xxxxxxxxxxxx> · Wed, 10 Feb 2016 22:22:40 -0500

> If there is (or at least supposed to be) something that prevents completions
> of readdir requests (on unrelated directories, by different processes, etc.)
> out of order, PLEASE SAY SO.  I would really prefer not to have to fight
> the readdir side of that mess; cancels are already bad enough ;-/

Hi Al... your ideas sound good to me, I'll try to get you good
answers on stuff like the above sometime tomorrow...

Thanks!

-Mike

On Wed, Feb 10, 2016 at 7:44 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Feb 09, 2016 at 11:13:28PM +0000, Al Viro wrote:
>> On Tue, Feb 09, 2016 at 10:40:50PM +0000, Al Viro wrote:
>>
>> > And the version in orangefs-2.9.3.tar.gz (your Frankenstein module?) is
>> > vulnerable to the same race.  2.8.1 isn't - it ignores signals on the
>> > cancel, but that means waiting for cancel to be processed (or timed out)
>> > on any interrupted read() before we return to userland.  We can return
>> > to that behaviour, of course, but I suspect that offloading it to something
>> > async (along with freeing the slot used by original operation) would be
>> > better from QoI point of view.
>>
>> That breakage had been introduced between 2.8.5 and 2.8.6 (at some point
>> during the spring of 2012).  AFAICS, all versions starting with 2.8.6 are
>> vulnerable...
>
> Matter of fact, older versions are _also_ broken, but it's much harder
> to trigger; you need the daemon to stall long enough for read() to time
> out, then everything as before.  After 2.8.5 all it takes on the read()
> side is a SIGKILL to read() caller just after the daemon has picked your
> request...
>
> TBH, it's tempting to kill the "wait for cancel to be finished" logics
> completely, mark cancel requests with a flag and stash the slot number
> into them.  Then, if flag is present, let set_op_state_purged() and
> set_op_state_serviced() callers release the slot and drop the request.
> And have wait_for_direct_io() treat the "need to cancel" case as "fire
> and forget" - no waiting for anything, no releasing the slots, just free
> the original op and bugger off.
>
> The set_op_state_purged() part is where it gets delicate - we want to remove
> the sucker from list/hash before dropping it.  AFAICS, it's doable - there's
> nothing in the "release slot and drop the request" that would not be allowed
> under the list/hash spinlocks...
>
> In principle, readdir problem could also be handled in a similar way, but
> there we'd need to have the interrupted readdir itself marked with that
> new flag and left in in-progress hash, so that when the response finally
> arrives it would be found and slot freeing would be handled ;-/  Doable,
> but not fun...
>
> If there is (or at least supposed to be) something that prevents completions
> of readdir requests (on unrelated directories, by different processes, etc.)
> out of order, PLEASE SAY SO.  I would really prefer not to have to fight
> the readdir side of that mess; cancels are already bad enough ;-/
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html