On 19/03/2018 08:39, Leon Romanovsky wrote: > On Sun, Mar 18, 2018 at 04:38:00PM +0200, Mark Bloch wrote: >> >> >> On 18/03/2018 16:00, Leon Romanovsky wrote: >>> On Sun, Mar 18, 2018 at 01:33:31PM +0000, Mark Bloch wrote: >>>> Today the get_hw_stats() API looks like this: >>>> >>>> get_hw_stats(struct ib_device *ibdev, struct rdma_hw_stats *stats, >>>> u8 port, int index) >>>> >>>> The driver gets a rdma_hw_stats structure, a port and an index and is given the >>>> possibility to: >>>> >>>> 1) Fill only the counter at location index >>>> 2) Fill all the counters >>>> >>>> Filling the counters is done like so: >>>> stats->value[i], 0 <= i < stats->num_counters >>>> The stats structure is shared between all the counters. Today all the drivers >>>> that implement get_hw_stats() always fill all the counters regardless of the >>>> index given. >>>> >>>> This may lead to the following scenario if we have multiple concurrent reads >>>> of counters, cpu 0 (tries to read counter at index 0) cpu 1 (tries to read >>>> counter at index 1) >>>> >>>> CPU 0 CPU 1 >>>> driver: stats->value[0] = x; | .... >>>> .... | .... >>>> .... | .... >>>> ib_core: return stats->value[0] to user | driver: stats->value[0] = y; >>>> >>>> We end up with read/write to the same location at the same time. >>>> Which can lead to providing an incorrect value to the user. This RFC introduces >>>> a lock to protect against that. >>>> >>> >>> As I said in offline discussion, it solves non-existent problem. >>> SW (kernel) doesn't guarantee that counters are correct every time you >>> snapshot them. All CPUs are calling to FW prior to counter update, so >>> they will write correct data, just a little bit out-of-sync. >> >> You assume here the driver/compiler assigns only once to stats->value[i] (which you can't) >> and that read/write are atomic (which you also can't). > > I'm not assuming any driver/compiler optimizations, but the fact that > locking is needed to provide concurrent read/write access to shared > variables. In case of hw_stats, every thread is providing its struct > to be filled. Your RFC patch supports it very clearly, by calling > with hw_stats from the stack. ? hw_stats is allocated and attached to port or (and) to a device. see: setup_hw_stats(). The callback that is called when trying to read a counter is show_hw_stats() it takes the hw_stats from the port (or the device) and passes it into the driver as an argument (see get_hw_stats() callback provided by drivers). I don't understand the stack reference. > >> >> You can totally get garbage values today (and not just out-of-sync) > > I don't see how it is possible, but would be glad to be proven wrong. > > Thanks > >> >>> >>> And in 64bits systems, such writes to tats->value will be atomic. >>> >>> Thanks >>> >>> >>>> Mark Bloch (1): >>>> IB/core: Protect against concurrent access to hardware stats >>>> >>>> drivers/infiniband/core/sysfs.c | 10 ++++++++-- >>>> include/rdma/ib_verbs.h | 1 + >>>> 2 files changed, 9 insertions(+), 2 deletions(-) >>>> >>>> -- >>>> 1.8.4.3 >>>> >>>> -- >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >>>> the body of a message to majordomo@xxxxxxxxxxxxxxx >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> Mark >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html Mark -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html