Re: [RFC 08/10] platform/x86/intel/ifs: Add IFS sysfs interface

"Joseph, Jithu" <jithu.joseph@xxxxxxxxx> · Mon, 7 Mar 2022 11:55:26 -0800

On 3/7/2022 11:15 AM, Dan Williams wrote:
> On Mon, Mar 7, 2022 at 11:09 AM Joseph, Jithu <jithu.joseph@xxxxxxxxx> wrote:
>>
>>
>>
>> On 3/7/2022 9:38 AM, Dan Williams wrote:
>>> On Fri, Mar 4, 2022 at 12:42 PM Joseph, Jithu <jithu.joseph@xxxxxxxxx> wrote:
>>>>
>>>>
>>>>
>>
>>>>>> + */
>>>>>> +static ssize_t details_show(struct device *dev,
>>>>>> +                           struct device_attribute *attr,
>>>>>> +                           char *buf)
>>>>>> +{
>>>>>> +       unsigned int cpu = dev->id;
>>>>>> +       int ret;
>>>>>> +
>>>>>> +       if (down_trylock(&ifs_sem))
>>>>>> +               return -EBUSY;
>>>>>
>>>>> What is the ifs_sem protecting? This result is immediately invalid
>>>>> after the lock is dropped anyway, so why hold it over reading the
>>>>> value? You can't prevent 2 threads racing each other here.
>>>>
>>>> percpu thread running scan_test_worker() will update per_cpu(ifs_state, cpu).scan_details. (before signalling this thread to run, this lock would be acquired)
>>>> This is to protect against the scenario where if the percpu thread is running a test and if at the same time a user is querying its status, they would see busy.
>>>
>>> That begs the question why would userspace be polling this file? Is it
>>> because it does not know when a test completes otherwise? How does it
>>> know that the result it is seeing is from the test it ran and not some
>>> other invocation to start a new test?
>>
>> I think I should have explained the need for locking at a higher level .
>> It is to make sure that only one of the below action happens at any time
>>
>> 1. single percpu test
>> 2. all-cpu test (tests all cores sequentially)
>> 3. scan binary reload
>> 4. querying the status
>>
>> (There are h/w reasons for why we restrict to a  single IFS test at any time)
>> If 4 happens while 1 or 2 is in progress , we return busy.  My earlier explanation was trying to explain why we are preventing 4 when 1 or 2   is in progress.
>>
>> As to the question of why would a user do 4 while 1 or 2 is in progress, there is no reason for him to do that, both the run_test (percpu and global) are blocking.
>> But if he issues a test from one shell and uses another shell to query the status  (while it is still in progress) he will see busy.
> 
> ...and what about the race that the semaphore cannot solve of test run
> launch racing collection of previous run results?

pardon me if I am missing something obvious here … but everybody (the 4 scenarios listed above) tries to acquire the same semaphore, or returns -EBUSY if try_lock() fails.
So in case of triggering "run_test" and viewing "status" racing scenario you mention, the below are the 2 interleaving I see

if "echo 1 >  /sys/devices/sysem/cpu/cpu10/ifs/run_test" gets the lock , the parallel  "cat  /sys/devices/sysem/cpu/cpu10/ifs/status" will return busy (and not the previous status)
if  "cat  /sys/devices/sysem/cpu/cpu10/ifs/status", gets  the lock it will display the status of the last test result and  the parallel  "echo 1 >  /sys/devices/sysem/cpu/cpu10/ifs/run_test" will fail with busy

This I believe is consistent behavior.