On Mon, Mar 19, 2018 at 10:06:20AM +0200, Mark Bloch wrote: > > > On 19/03/2018 08:39, Leon Romanovsky wrote: > > On Sun, Mar 18, 2018 at 04:38:00PM +0200, Mark Bloch wrote: > >> > >> > >> On 18/03/2018 16:00, Leon Romanovsky wrote: > >>> On Sun, Mar 18, 2018 at 01:33:31PM +0000, Mark Bloch wrote: > >>>> Today the get_hw_stats() API looks like this: > >>>> > >>>> get_hw_stats(struct ib_device *ibdev, struct rdma_hw_stats *stats, > >>>> u8 port, int index) > >>>> > >>>> The driver gets a rdma_hw_stats structure, a port and an index and is given the > >>>> possibility to: > >>>> > >>>> 1) Fill only the counter at location index > >>>> 2) Fill all the counters > >>>> > >>>> Filling the counters is done like so: > >>>> stats->value[i], 0 <= i < stats->num_counters > >>>> The stats structure is shared between all the counters. Today all the drivers > >>>> that implement get_hw_stats() always fill all the counters regardless of the > >>>> index given. > >>>> > >>>> This may lead to the following scenario if we have multiple concurrent reads > >>>> of counters, cpu 0 (tries to read counter at index 0) cpu 1 (tries to read > >>>> counter at index 1) > >>>> > >>>> CPU 0 CPU 1 > >>>> driver: stats->value[0] = x; | .... > >>>> .... | .... > >>>> .... | .... > >>>> ib_core: return stats->value[0] to user | driver: stats->value[0] = y; > >>>> > >>>> We end up with read/write to the same location at the same time. > >>>> Which can lead to providing an incorrect value to the user. This RFC introduces > >>>> a lock to protect against that. > >>>> > >>> > >>> As I said in offline discussion, it solves non-existent problem. > >>> SW (kernel) doesn't guarantee that counters are correct every time you > >>> snapshot them. All CPUs are calling to FW prior to counter update, so > >>> they will write correct data, just a little bit out-of-sync. > >> > >> You assume here the driver/compiler assigns only once to stats->value[i] (which you can't) > >> and that read/write are atomic (which you also can't). > > > > I'm not assuming any driver/compiler optimizations, but the fact that > > locking is needed to provide concurrent read/write access to shared > > variables. In case of hw_stats, every thread is providing its struct > > to be filled. Your RFC patch supports it very clearly, by calling > > with hw_stats from the stack. > > ? > hw_stats is allocated and attached to port or (and) to a device. > see: setup_hw_stats(). > The callback that is called when trying to read a counter is show_hw_stats() > it takes the hw_stats from the port (or the device) and passes it > into the driver as an argument (see get_hw_stats() callback provided by drivers). > > I don't understand the stack reference. 794 static ssize_t show_hw_stats(struct kobject *kobj, struct attribute *attr, 795 char *buf) 796 { 797 struct ib_device *dev; 798 struct ib_port *port; 799 struct hw_stats_attribute *hsa; 800 struct rdma_hw_stats *stats; 801 int ret; 802 803 hsa = container_of(attr, struct hw_stats_attribute, attr); 804 if (!hsa->port_num) { 805 dev = container_of((struct device *)kobj, 806 struct ib_device, dev); 807 stats = dev->hw_stats; 808 } else { 809 port = container_of(kobj, struct ib_port, kobj); 810 dev = port->ibdev; 811 stats = port->hw_stats; 812 } 813 ret = update_hw_stats(dev, stats, hsa->port_num, hsa->index); ^^^^^ pointer from stack 814 if (ret) 815 return ret; 816 return print_hw_stat(stats, hsa->index, buf); 817 } > > > > >> > >> You can totally get garbage values today (and not just out-of-sync) > > > > I don't see how it is possible, but would be glad to be proven wrong. > > > > Thanks > > > >> > >>> > >>> And in 64bits systems, such writes to tats->value will be atomic. > >>> > >>> Thanks > >>> > >>> > >>>> Mark Bloch (1): > >>>> IB/core: Protect against concurrent access to hardware stats > >>>> > >>>> drivers/infiniband/core/sysfs.c | 10 ++++++++-- > >>>> include/rdma/ib_verbs.h | 1 + > >>>> 2 files changed, 9 insertions(+), 2 deletions(-) > >>>> > >>>> -- > >>>> 1.8.4.3 > >>>> > >>>> -- > >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > >>>> the body of a message to majordomo@xxxxxxxxxxxxxxx > >>>> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> Mark > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > >> the body of a message to majordomo@xxxxxxxxxxxxxxx > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > Mark > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html
Attachment:
signature.asc
Description: PGP signature