On Tue, 17 Dec 2019 09:30:32 +0100 SeongJae Park <sjpark@xxxxxxxxxx> wrote: > On Tue, 17 Dec 2019 09:16:47 +0100 "Jürgen Groß" <jgross@xxxxxxxx> wrote: > > > On 17.12.19 08:59, SeongJae Park wrote: > > > On Tue, 17 Dec 2019 07:23:12 +0100 "Jürgen Groß" <jgross@xxxxxxxx> wrote: > > > > > >> On 16.12.19 20:48, SeongJae Park wrote: > > >>> On on, 16 Dec 2019 17:23:44 +0100, Jürgen Groß wrote: > > >>> > > >>>> On 16.12.19 17:15, SeongJae Park wrote: > > >>>>> On Mon, 16 Dec 2019 15:37:20 +0100 SeongJae Park <sjpark@xxxxxxxxxx> wrote: > > >>>>> > > >>>>>> On Mon, 16 Dec 2019 13:45:25 +0100 SeongJae Park <sjpark@xxxxxxxxxx> wrote: > > >>>>>> > > >>>>>>> From: SeongJae Park <sjpark@xxxxxxxxx> > > >>>>>>> > > >>>>> [...] > > >>>>>>> --- a/drivers/block/xen-blkback/xenbus.c > > >>>>>>> +++ b/drivers/block/xen-blkback/xenbus.c > > >>>>>>> @@ -824,6 +824,24 @@ static void frontend_changed(struct xenbus_device *dev, > > >>>>>>> } > > >>>>>>> > > >>>>>>> > > >>>>>>> +/* Once a memory pressure is detected, squeeze free page pools for a while. */ > > >>>>>>> +static unsigned int buffer_squeeze_duration_ms = 10; > > >>>>>>> +module_param_named(buffer_squeeze_duration_ms, > > >>>>>>> + buffer_squeeze_duration_ms, int, 0644); > > >>>>>>> +MODULE_PARM_DESC(buffer_squeeze_duration_ms, > > >>>>>>> +"Duration in ms to squeeze pages buffer when a memory pressure is detected"); > > >>>>>>> + > > >>>>>>> +/* > > >>>>>>> + * Callback received when the memory pressure is detected. > > >>>>>>> + */ > > >>>>>>> +static void reclaim_memory(struct xenbus_device *dev) > > >>>>>>> +{ > > >>>>>>> + struct backend_info *be = dev_get_drvdata(&dev->dev); > > >>>>>>> + > > >>>>>>> + be->blkif->buffer_squeeze_end = jiffies + > > >>>>>>> + msecs_to_jiffies(buffer_squeeze_duration_ms); > > >>>>>> > > >>>>>> This callback might race with 'xen_blkbk_probe()'. The race could result in > > >>>>>> __NULL dereferencing__, as 'xen_blkbk_probe()' sets '->blkif' after it links > > >>>>>> 'be' to the 'dev'. Please _don't merge_ this patch now! > > >>>>>> > > >>>>>> I will do more test and share results. Meanwhile, if you have any opinion, > > >>>>>> please let me know. > > >>> > > >>> I reduced system memory and attached bunch of devices in short time so that > > >>> memory pressure occurs while device attachments are ongoing. Under this > > >>> circumstance, I was able to see the race. > > >>> > > >>>>> > > >>>>> Not only '->blkif', but 'be' itself also coule be a NULL. As similar > > >>>>> concurrency issues could be in other drivers in their way, I suggest to change > > >>>>> the reclaim callback ('->reclaim_memory') to be called for each driver instead > > >>>>> of each device. Then, each driver could be able to deal with its concurrency > > >>>>> issues by itself. > > >>>> > > >>>> Hmm, I don't like that. This would need to be changed back in case we > > >>>> add per-guest quota. > > >>> > > >>> Extending this callback in that way would be still not too hard. We could use > > >>> the argument to the callback. I would keep the argument of the callback to > > >>> 'struct device *' as is, and will add a comment saying 'NULL' value of the > > >>> argument means every devices. As an example, xenbus would pass NULL-ending > > >>> array of the device pointers that need to free its resources. > > >>> > > >>> After seeing this race, I am now also thinking it could be better to delegate > > >>> detailed control of each device to its driver, as some drivers have some > > >>> complicated and unique relation with its devices. > > >>> > > >>>> > > >>>> Wouldn't a get_device() before calling the callback and a put_device() > > >>>> afterwards avoid that problem? > > >>> > > >>> I didn't used the reference count manipulation operations because other similar > > >>> parts also didn't. But, if there is no implicit reference count guarantee, it > > >>> seems those operations are indeed necessary. > > >>> > > >>> That said, as get/put operations only adjust the reference count, those will > > >>> not make the callback to wait until the linking of the 'backend' and 'blkif' to > > >>> the device (xen_blkbk_probe()) is finished. Thus, the race could still happen. > > >>> Or, am I missing something? > > >> > > >> No, I think we need a xenbus lock per device which will need to be > > >> taken in xen_blkbk_probe(), xenbus_dev_remove() and while calling the > > >> callback. > > > > > > I also agree that locking should be used at last. But, as each driver manages > > > its devices and resources in their way, it could have its unique race > > > conditions. And, each unique race condition might have its unique efficient > > > way to synchronize it. Therefore, I think the synchronization should be done > > > by each driver, not by xenbus and thus we should make the callback to be called > > > per-driver. > > > > xenbus controls creation and removing of devices, so applying locking > > at xenbus level is the right thing to do in order to avoid races with > > device removal. > > > > In case a backend has further synchronization requirements those have to > > be handled at backend level, of course. > > > > In the end you'll need the xenbus level locking anyway in order to avoid > > a race when the last backend specific device is just being removed when > > the callback is about to be called for that device. Or you'd need to > > call try_get_module() before calling into each backend... > > Agreed. Thank you for your kind explanation of your concerns. Just posted the v11 patchset[1], which is based on your idea and passed my test. [1] https://lore.kernel.org/xen-devel/20191217160748.693-1-sjpark@xxxxxxxxxx/ Thanks, SeongJae Park > > > Thanks, > SeongJae Park > > > > > > > Juergen