Re: [RFC rdma-next 0/1] Lock hardware stats

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 19/03/2018 08:39, Leon Romanovsky wrote:
> On Sun, Mar 18, 2018 at 04:38:00PM +0200, Mark Bloch wrote:
>>
>>
>> On 18/03/2018 16:00, Leon Romanovsky wrote:
>>> On Sun, Mar 18, 2018 at 01:33:31PM +0000, Mark Bloch wrote:
>>>> Today the get_hw_stats() API looks like this:
>>>>
>>>> get_hw_stats(struct ib_device *ibdev, struct rdma_hw_stats *stats,
>>>>              u8 port, int index)
>>>>
>>>> The driver gets a rdma_hw_stats structure, a port and an index and is given the
>>>> possibility to:
>>>>
>>>> 1) Fill only the counter at location index
>>>> 2) Fill all the counters
>>>>
>>>> Filling the counters is done like so:
>>>> 	stats->value[i], 0 <= i < stats->num_counters
>>>> The stats structure is shared between all the counters. Today all the drivers
>>>> that implement get_hw_stats() always fill all the counters regardless of the
>>>> index given.
>>>>
>>>> This may lead to the following scenario if we have multiple concurrent reads
>>>> of counters, cpu 0 (tries to read counter at index 0) cpu 1 (tries to read
>>>> counter at index 1)
>>>>
>>>>            CPU 0                                CPU 1
>>>> driver:  stats->value[0] = x;           |       ....
>>>>              ....                       |       ....
>>>>              ....                       |       ....
>>>> ib_core: return stats->value[0] to user | driver: stats->value[0] = y;
>>>>
>>>> We end up with read/write to the same location at the same time.
>>>> Which can lead to providing an incorrect value to the user. This RFC introduces
>>>> a lock to protect against that.
>>>>
>>>
>>> As I said in offline discussion, it solves non-existent problem.
>>> SW (kernel) doesn't guarantee that counters are correct every time you
>>> snapshot them. All CPUs are calling to FW prior to counter update, so
>>> they will write correct data, just a little bit out-of-sync.
>>
>> You assume here the driver/compiler assigns only once to stats->value[i] (which you can't)
>> and that read/write are atomic (which you also can't).
> 
> I'm not assuming any driver/compiler optimizations, but the fact that
> locking is needed to provide concurrent read/write access to shared
> variables. In case of hw_stats, every thread is providing its struct
> to be filled. Your RFC patch supports it very clearly, by calling
> with hw_stats from the stack.

?
hw_stats is allocated and attached to port or (and) to a device.
see: setup_hw_stats().
The callback that is called when trying to read a counter is show_hw_stats()
it takes the hw_stats from the port (or the device) and passes it
into the driver as an argument (see get_hw_stats() callback provided by drivers).

I don't understand the stack reference.

> 
>>
>> You can totally get garbage values today (and not just out-of-sync)
> 
> I don't see how it is possible, but would be glad to be proven wrong.
> 
> Thanks
> 
>>
>>>
>>> And in 64bits systems, such writes to tats->value will be atomic.
>>>
>>> Thanks
>>>
>>>
>>>> Mark Bloch (1):
>>>>   IB/core: Protect against concurrent access to hardware stats
>>>>
>>>>  drivers/infiniband/core/sysfs.c | 10 ++++++++--
>>>>  include/rdma/ib_verbs.h         |  1 +
>>>>  2 files changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> --
>>>> 1.8.4.3
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> Mark
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux