On Sun, Mar 18, 2018 at 04:38:00PM +0200, Mark Bloch wrote: > > > On 18/03/2018 16:00, Leon Romanovsky wrote: > > On Sun, Mar 18, 2018 at 01:33:31PM +0000, Mark Bloch wrote: > >> Today the get_hw_stats() API looks like this: > >> > >> get_hw_stats(struct ib_device *ibdev, struct rdma_hw_stats *stats, > >> u8 port, int index) > >> > >> The driver gets a rdma_hw_stats structure, a port and an index and is given the > >> possibility to: > >> > >> 1) Fill only the counter at location index > >> 2) Fill all the counters > >> > >> Filling the counters is done like so: > >> stats->value[i], 0 <= i < stats->num_counters > >> The stats structure is shared between all the counters. Today all the drivers > >> that implement get_hw_stats() always fill all the counters regardless of the > >> index given. > >> > >> This may lead to the following scenario if we have multiple concurrent reads > >> of counters, cpu 0 (tries to read counter at index 0) cpu 1 (tries to read > >> counter at index 1) > >> > >> CPU 0 CPU 1 > >> driver: stats->value[0] = x; | .... > >> .... | .... > >> .... | .... > >> ib_core: return stats->value[0] to user | driver: stats->value[0] = y; > >> > >> We end up with read/write to the same location at the same time. > >> Which can lead to providing an incorrect value to the user. This RFC introduces > >> a lock to protect against that. > >> > > > > As I said in offline discussion, it solves non-existent problem. > > SW (kernel) doesn't guarantee that counters are correct every time you > > snapshot them. All CPUs are calling to FW prior to counter update, so > > they will write correct data, just a little bit out-of-sync. > > You assume here the driver/compiler assigns only once to stats->value[i] (which you can't) > and that read/write are atomic (which you also can't). I'm not assuming any driver/compiler optimizations, but the fact that locking is needed to provide concurrent read/write access to shared variables. In case of hw_stats, every thread is providing its struct to be filled. Your RFC patch supports it very clearly, by calling with hw_stats from the stack. > > You can totally get garbage values today (and not just out-of-sync) I don't see how it is possible, but would be glad to be proven wrong. Thanks > > > > > And in 64bits systems, such writes to tats->value will be atomic. > > > > Thanks > > > > > >> Mark Bloch (1): > >> IB/core: Protect against concurrent access to hardware stats > >> > >> drivers/infiniband/core/sysfs.c | 10 ++++++++-- > >> include/rdma/ib_verbs.h | 1 + > >> 2 files changed, 9 insertions(+), 2 deletions(-) > >> > >> -- > >> 1.8.4.3 > >> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > >> the body of a message to majordomo@xxxxxxxxxxxxxxx > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > Mark > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html
Attachment:
signature.asc
Description: PGP signature