Re: Is it supposed to be ok to call del_gendisk while userspace is frozen?

Nigel Cunningham <ncunningham@xxxxxxxxxxx> · Tue, 18 May 2010 08:51:23 +1000

Hi.

On 18/05/10 06:35, Rafael J. Wysocki wrote:
> On Monday 17 May 2010, Nigel Cunningham wrote:
>> On 17/05/10 12:22, Alan Stern wrote:
>>> On Mon, 17 May 2010, Nigel Cunningham wrote:
>>>>>> I object to the patch.
>>>>>>
>>>>>> Tell the patch it ought to exit once thawed, by all means.
>>>>>
>>>>> I'm not sure what you mean.  Care to explain?
>>>>
>>>> I mean "Set up some sort of flag that it can look at once thawed at
>>>> resume time, and use that to tell it to exit at that point."
>>>
>>> Doesn't the patch do exactly that?  The "flag" is set by virtue of the
>>> fact that this is part of del_gendisk -- which means the disk is being
>>> unregistered and hence the writeback thread will exit shortly.
>>>
>>>>>> Make the patch unfreezeable to begin with, by all means.
>>>>>
>>>>> That wouldn't work.
>>>>
>>>> Why not?
>>>
>>> It would be nice to know exactly why.  Perhaps the underlying problem
>>> can be fixed.
>>>
>>>>>> If you know a disk is going to be unregistered during resume,
>>>>>
>>>>> How do we check that, exactly?
>>>>
>>>> Well, if you can figure out that you need to go down this path at this
>>>> point in the process, you must be able to apply the same logic to come
>>>> to the same conclusion earlier in the process.
>>>
>>> That's not true.  You don't know that a device is going to be unplugged
>>> until it actually _is_ unplugged.
>>
>> Sorry - I got unregistered during suspend (instead of resume) in my
>> head. That said, I'd argue that we should be...
>>
>> 1) Syncing all the data at the start of the suspend/hibernate, so
>> there's nothing for the workthread to do if we do del_gendisk.
>> 2) Telling things to exit if we do find the device is gone away at
>> resume time, but not relying on the going-away happening until post
>> process thaw, for a couple of reasons:
>> - Potential for races/confusion/mess etc in having $random process
>> thawing other processes. Only the thread doing the suspend/hibernate
>> should be freezing/thawing.
>
> I don't see a problem here, as far as kernel threads are concerned.  In this
> particular case this is a subsystem thawing a thread that belongs to it.  No
> problem.
>
>> - We're dealing with the symptom, not the cause. Almost always a bad idea.
>
> I very much prefer to have a fix for a symptom than no fix at all, which is the
> realistic alternative in this case.
>
> So, I think we should merge the patch and if someone finds the root cause
> at one point in future, then we can just use the *right* approach instead of
> the present one.
>
> The problem is real and people in the field are affected by it, so if you don't
> have a working alternative patch, please just let go.

I'm not denying that the problem is real. What I am concerned about is
finding a real solution, not just putting a sticky plaster over the
wound. It seems to me much wiser to deal with the issue properly now
than to do extra work later diagnosing what might be a
harder-to-reproduce symptom of the same problem. I'd happily put the
time in now myself, but I simply don't have the time this week.

Would it be possible to apply the patch, adding some sort of new tag
that can be used to say "This needs further attention", perhaps
including an enduring reference to this conversation? Later, the 'real'
fix could include another special tag that says "Proper fix for the
symptom addressed in commit 5e94f810".

Regards,

Nigel
_______________________________________________
linux-pm mailing list
linux-pm@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/linux-pm