Re: 2.6.37.1 s2disk regression (TPM)

Stefan Berger <stefanb@xxxxxxxxxxxxxxxxxx> · Tue, 22 Feb 2011 06:57:44 -0500

On 02/22/2011 03:41 AM, Jiri Slaby wrote:
> On 02/22/2011 01:42 AM, Stefan Berger wrote:
>> On 02/21/2011 05:10 PM, Jiri Slaby wrote:
>>> On 02/21/2011 11:07 PM, Rajiv Andrade wrote:
>>>> On 02/21/2011 06:44 PM, Jiri Slaby wrote:
>>>>> On 02/21/2011 10:29 PM, Stefan Berger wrote:
>>>>>> On 02/21/2011 03:39 PM, Jiri Slaby wrote:
>>>>>>> On 02/21/2011 06:12 PM, Rajiv Andrade wrote:
>>>>>>>> On 02/21/2011 01:34 PM, Jiri Slaby wrote:
>>>>>>>>> There has to be another problem which caused my regression. And
>>>>>>>>> since it
>>>>>>>>> reports "Operation Timed out", the former default timeout values
>>>>>>>>> worked
>>>>>>>>> for me, the ones read from TPM do not.
>>>>>>>> Yes, it's most likely due to inconsistent timeout values reported
>>>>>>>> by the TPM, as I mentioned. My working timeouts are:
>>>>>>>> 3020000 4510000 181000000
>>>>>>> 1000000 2000 150000
>>>>>>>
>>>>>>> Actually the first one from HW is 1. This one is HZ after
>>>>>>> correction
>>>>>>> in get_timeout. So perhaps it is in ms, yes.
>>>>>> Following the specs, the timeouts are supposed to be in
>>>>>> microseconds and
>>>>>> ascending order for short, medium and long duration. Of course, if the
>>>>>> device returns wrong timeouts, the command isn't going to succeed,
>>>>>> failing the suspend in this case. Nevertheless, I think we need the
>>>>>> patch I put in but at the same time we'll need a work-around for
>>>>>> devices
>>>>>> like this.
>>>>> Yes, the patch is correct per se. But as it breaks a bunch of machines
>>>>> it cannot go in now. The rule is no regressions.
>>>>>
>>>>> After you have the workaround it should go into the next rc1 after
>>>>> that.
>>>>> Do you plan to add a dmi-based quirk? Or, IOW do you want me to attach
>>>>> dmidecode output? Or are you going to base it solely on TPM
>>>>> manufacturer/version?
>>>> It's more reliable to base the workaround on the values themselves,
>>>> instead of the TPM's ID, since
>>>> we don't know whether other models will behave similarly.
>>> As I wrote, you may base it on dmi data.
>>>
>>>> It should be fine then to extend the existing workaround for short
>>>> timeouts to the medium and long ones.
>>> OK, but how will you guess the values?
>> One way of doing it would be to at least make sure that the timeouts are
>>
>> short < medium < long
>>
>> and if that's not true, as in the case of your TPM, set the timeouts to
>> 0 and have Rajiv's work-around kick in OR we explicitly assign the same
>> high values to the timeouts that Rajiv's work-around is using right
>> now. Of course there could be another type of bad TPM firmware out there
>> where all values are in ascending order but given in ms and cause
>> time-outs -- but I would wait for someone to point that out since I am
>> not aware of such a device.
> Note that it is in ascending order (1 2000 150000). As I wrote the first
> timeout (1) is replaced by one HZ in get_timeouts.
The forthcoming patch will also adjust the other 2 values by multiplying
them by 1000. The suspend fails because TPM_SaveState is a command of
medium duration and therefore runs into the 2nd timeout.
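
To illustrate the idea, here is a rough userspace sketch of the
adjustment, not the actual patch. The function name, the enum and the
10 ms threshold are assumptions made up for this example. It treats a
too-small short value as a sign that the TPM reported milliseconds, sets
the short value to one second (what get_timeouts does today with HZ) and
scales the other two by 1000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Indices for the three duration classes (illustrative names). */
enum { DUR_SHORT = 0, DUR_MEDIUM = 1, DUR_LONG = 2 };

/* Assumed threshold: a short value below 10 ms (in microseconds) is
 * taken to mean the TPM reported its values in milliseconds. */
#define MIN_PLAUSIBLE_SHORT_US 10000UL

/* One second in microseconds, standing in for HZ in this sketch. */
#define ONE_SECOND_US 1000000UL

/* Hypothetical helper: returns true if the values were adjusted. */
bool adjust_durations(unsigned long dur_us[3])
{
    assert(dur_us != NULL);

    /* Plausible values are left untouched. */
    if (dur_us[DUR_SHORT] >= MIN_PLAUSIBLE_SHORT_US)
        return false;

    dur_us[DUR_SHORT] = ONE_SECOND_US; /* existing correction */
    dur_us[DUR_MEDIUM] *= 1000UL;      /* ms -> us */
    dur_us[DUR_LONG] *= 1000UL;        /* ms -> us */
    return true;
}
```

With the values from your TPM (1 2000 150000) this gives
1000000 2000000 150000000, so the medium duration becomes 2 seconds
instead of 2 ms. Values like Rajiv's (3020000 4510000 181000000) are left
unchanged.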

There will be a 2nd patch that re-enables the TPM's interrupts. Upon
resume, the BIOS may (this may be BIOS-dependent) disable them while
sending a command (TPM_Startup) to the TPM in polling mode and then
leave them disabled.

I'd appreciate it if you tested both of them.

    Stefan
