Adding linux-cifs to the discussion of Jeff's patch
https://git.samba.org/?p=jlayton/linux.git;a=commit;h=f7009a84dbf0da1ab8805b869fedffa23c605cc4

On Thu, Aug 4, 2016 at 8:57 AM, Stanislav Kinsburskiy
<skinsbursky@xxxxxxxxxxxxx> wrote:
>
> 04.08.2016 15:16, Jeff Layton wrote:
>
>> On Thu, 2016-08-04 at 12:55 +0200, Stanislav Kinsburskiy wrote:
>>>
>>> 03.08.2016 19:36, Jeff Layton wrote:
>>>>
>>>> On Wed, 2016-08-03 at 20:54 +0400, Stanislav Kinsburskiy wrote:
>>>>>
>>>>> Otherwise the freezer cgroup state might never become "FROZEN".
>>>>>
>>>>> Here is the deadlock scheme for 2 processes in one freezer cgroup,
>>>>> which is freezing:
>>>>>
>>>>>          CPU 0                            CPU 1
>>>>>          --------                         --------
>>>>> do_last
>>>>> inode_lock(dir->d_inode)
>>>>> vfs_create
>>>>> nfs_create
>>>>> ...
>>>>> __rpc_execute
>>>>> rpc_wait_bit_killable
>>>>> __refrigerator
>>>>>                                  do_last
>>>>>                                  inode_lock(dir->d_inode)
>>>>>
>>>>> So the problem is that one process takes the directory inode mutex,
>>>>> executes the creation request and goes into the refrigerator.
>>>>> The other waits until the directory lock is released, remains
>>>>> "thawed", and thus the freezer cgroup state never becomes "FROZEN".
>>>>>
>>>>> Notes:
>>>>> 1) Interestingly, this is not a pure deadlock: one can thaw the
>>>>> cgroup and then freeze it again.
>>>>> 2) The issue was introduced by commit
>>>>> d310310cbff18ec385c6ab4d58f33b100192a96a.
>>>>> 3) This patch is not aimed at fixing the issue, but at showing the
>>>>> root of the problem. It looks like the problem might apply to other
>>>>> hunks from the commit mentioned above.
>>>>>
>>>>> Signed-off-by: Stanislav Kinsburskiy <skinsbursky@xxxxxxxxxxxxx>
>>>>> ---
>>>>>  net/sunrpc/sched.c | 1 -
>>>>>  1 file changed, 1 deletion(-)
>>>>>
>>>>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>>>> index 9ae5885..ec7ccc1 100644
>>>>> --- a/net/sunrpc/sched.c
>>>>> +++ b/net/sunrpc/sched.c
>>>>> @@ -253,7 +253,6 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
>>>>>  static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
>>>>>  {
>>>>> -	freezable_schedule_unsafe();
>>>>>  	if (signal_pending_state(mode, current))
>>>>>  		return -ERESTARTSYS;
>>>>>  	return 0;
>>>>
>>>> Ummm... so what actually does the schedule() with this patch?
>>>
>>> schedule() replaces freezable_schedule_unsafe(), of course -- sorry
>>> for that.
>>>
>>>> There was a bit of discussion on this recently -- see the thread with
>>>> this subject line in linux-nfs:
>>>>
>>>>     Re: Hang due to nfs letting tasks freeze with locked inodes
>>>
>>> Thanks, had a look.
>>>
>>>> Basically it comes down to this:
>>>>
>>>> All of the proposals so far to fix this problem just switch out the
>>>> freezable_schedule_unsafe() (and similar) calls for ones that don't
>>>> allow the process to freeze.
>>>>
>>>> The problem there is that we originally added that stuff in response
>>>> to bug reports about machines failing to suspend. What often happens
>>>> is that the network interfaces come down, and then the freezer runs
>>>> over all of the processes, which never return because they're blocked
>>>> waiting on the server to reply.
>>>
>>> I probably don't understand something, but this sounds somewhat wrong
>>> to me: freezing processes _after_ the network is down.
>>>
>>>> ...shrug...
>>>>
>>>> Maybe we should just go ahead and do it (and to CIFS as well). Just
>>>> be prepared for the inevitable complaints about laptops failing to
>>>> suspend once you do.
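[For reference, since the hunk above only deletes the call: per
Stanislav's correction, the intended change leaves a plain schedule()
in its place. A sketch of the amended hunk -- not the patch as actually
posted:

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -253,7 +253,7 @@ EXPORT_SYMBOL_GPL(rpc_destroy_wait_queue);
 static int rpc_wait_bit_killable(struct wait_bit_key *key, int mode)
 {
-	freezable_schedule_unsafe();
+	schedule();	/* sleep, but stay out of the refrigerator */
 	if (signal_pending_state(mode, current))
 		return -ERESTARTSYS;
 	return 0;
 }

With this, the waiter still sleeps and remains killable via the
signal_pending_state() check, but can no longer be frozen while holding
the directory inode lock.]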
>>>
>>> The worst part in all of this, from my POV, is that the current
>>> behavior makes NFS non-freezable in the generic case, even when
>>> freezing a container which has its own net ns and NFS mount.
>>> So I would say that returning to the previous logic would make the
>>> world better.
>>>
>>>> Part of the fix, I think, is to add a return code (similar to
>>>> ERESTARTSYS) that gets interpreted near the kernel-userland boundary
>>>> as: "allow the process to be frozen, and then retry the call once
>>>> it's resumed".
>>>>
>>>> With that, filesystems could return the error code when they want to
>>>> redrive the entire syscall from that level. That won't work for
>>>> non-idempotent requests though. We'd need to do something more
>>>> elaborate there.
>>>
>>> It might be that breaking the rpc request is something that should be
>>> avoided altogether.
>>> With all these locks being held, almost all (any?) of the requests to
>>> the remote server should be considered an atomic operation from the
>>> freezer's point of view.
>>> The process can always be frozen during signal handling.
>>>
>>> IOW, it might be worth considering a scenario where NFS is not
>>> freezable at all, and any problems with suspend on laptops/whatever
>>> have to be solved in the suspend code.
>>>
>> Fair enough. At this point, I don't care much one way or another.
>> Maybe if we make this change and laptops start failing to suspend,
>> we'll be able to use that as leverage to pursue other avenues to make
>> the suspend/resume subsystem work with NFS.
>>
>> That said, the patch you have really isn't sufficient. There are
>> places where the NFS client can sleep while waiting for things other
>> than RPC calls.
>
> Sure. As I said, this patch wasn't aimed at fixing the issue, but
> rather at starting the discussion.
> Thanks for your patch.

--
Thanks,

Steve
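P.S.: to make the return-code idea above concrete, here is a purely
hypothetical sketch. ERESTART_FREEZABLE, its value, and the helper
below are invented for illustration and do not exist in the kernel;
freezing(), try_to_freeze() and the ERESTARTSYS-style restart
machinery do exist. It also sidesteps the non-idempotency problem
Jeff raises:

/* Hypothetical kernel-internal error code, by analogy with
 * ERESTARTSYS & co. in include/linux/errno.h (name and value
 * invented here). */
#define ERESTART_FREEZABLE	530

/* Hypothetical variant of rpc_wait_bit_killable(): instead of
 * freezing with locks held, bail out and ask for the whole syscall
 * to be redriven. */
static int rpc_wait_bit_freezable(struct wait_bit_key *key, int mode)
{
	if (freezing(current))
		return -ERESTART_FREEZABLE;
	schedule();
	if (signal_pending_state(mode, current))
		return -ERESTARTSYS;
	return 0;
}

/* ...and, conceptually, near the kernel-userland boundary: by then
 * all locks have been dropped, so it is safe to park in the
 * refrigerator and then restart the syscall, much as ERESTARTSYS
 * is handled today. */
	if (err == -ERESTART_FREEZABLE) {
		try_to_freeze();
		/* arrange for the syscall to be restarted */
	}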