Re: [PATCH V2 2/4] zram: don't fail to remove zram during unloading module

On Wed, Oct 20, 2021 at 09:55:46AM +0800, Ming Lei wrote:
> When zram module is started to be unloaded, no one should use all zram
> disks at that time. But disk's show() or store() attribute method may be
> running, this way is expected because device_del()(called from
> del_gendisk) will drain all pending show()/store().

How about:

----------
When the zram module is being unloaded, no one should be using the
zram disks. However even while being unloaded the zram module's
sysfs attributes might be poked at to re-configure the zram module.
This is expected, and kernfs ensures that these operations complete
before device_del() completes.
----------
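
For context, the drain happens roughly along this path on module
unload (paraphrasing upstream; exact call sites may differ across
kernel versions):

	destroy_devices()                      /* zram module exit */
	  idr_for_each(..., zram_remove_cb, ...)
	    zram_remove()
	      del_gendisk()
	        device_del()
	          /* kernfs deactivates the sysfs nodes and waits for
	           * any in-flight show()/store() calls to complete */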

> But reset_store() may set ->claim which will fail zram_remove(), when
> this happens, the zram device will be leaked and the warning of 'Error:
> Removing state 63 which has instances left.' is triggered during
> unloading module.

Note: "Removing state 63 which has instances left" does not
necessarily mean the struct zram is leaked. It only means that the per
cpu struct zcomp instances are leaked, given the CPU hotplug removal
callback is in charge of cleaning them up. That this gave us a hint
that we do in fact leak a struct zram device as well is a separate
matter, but I agree the commit log should focus on *that*, because, as
you noted, LTP does not yet try to make spaghetti with hot_add_show().
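
For anyone who trips over that message later: it comes from the CPU
hotplug core when a multi-instance state is removed while instances
are still registered. Roughly, in __cpuhp_remove_state_cpuslocked()
(paraphrasing kernel/cpu.c, modulo version drift):

	if (sp->multi_instance) {
		WARN(!list_empty(&sp->list),
		     "Error: Removing state %d which has instances left.\n",
		     state);
	}

State 63 here is just the numeric value of CPUHP_ZCOMP_PREPARE in this
particular build.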

So, how about:

----------
When the reset sysfs op (reset_store()) gets called, zram->claim will
be set, and this prevents zram_remove() from removing the zram device.
If one is unloading the module and one races to run the reset sysfs
op, we end up leaking the struct zram device. We learned about this
issue through the error "Error: Removing state 63 which has instances
left". While that only means the per cpu struct zcomp instances are
leaked when we remove the CPU hotplug multistate callback, the test
case that lets us reproduce this issue (running LTP's zram02.sh twice
in different terminals) only mucks with resetting the devices, not
with hot_add_show(). Setting zram->claim prevents zram_remove() from
completing successfully, and on module removal zram_remove_cb()
silently ignores that failure, so the full struct zram device is
leaked.
----------
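
To make the race concrete, the two paths look roughly like this
(paraphrased from drivers/block/zram/zram_drv.c, not a verbatim
quote):

	/* reset_store(): the sysfs op takes the claim... */
	mutex_lock(&bdev->bd_disk->open_mutex);
	if (bdev->bd_openers || zram->claim) {
		mutex_unlock(&bdev->bd_disk->open_mutex);
		return -EBUSY;
	}
	zram->claim = true;
	mutex_unlock(&bdev->bd_disk->open_mutex);

	/* ...and zram_remove(), racing on module unload, gives up: */
	mutex_lock(&bdev->bd_disk->open_mutex);
	if (bdev->bd_openers || zram->claim) {
		mutex_unlock(&bdev->bd_disk->open_mutex);
		return -EBUSY;	/* ignored by zram_remove_cb(), zram leaks */
	}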

This raises the question: should we not then also make zram_remove_cb()
chatty on failure?
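
Something as simple as this would do it (untested, just to illustrate;
today's callback ignores zram_remove()'s return value):

	static int zram_remove_cb(int id, void *ptr, void *data)
	{
		if (zram_remove(ptr))
			pr_warn("zram%d: removal failed, device leaked\n", id);
		return 0;
	}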

> Fixes the issue by not failing zram_remove() if ->claim is set, and
> we actually need to do nothing in case that zram_reset() is running
> since del_gendisk() will wait until zram_reset() is done.

Sure, this all looks like a reasonable alternative fix for the issue
without using a generic lock. It does raise the question of whether
this suffices for the other oddball sysfs ops, but we can take that up
with time.
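
IOW, the shape of the change as I read it (a sketch paraphrasing the
patch, not a verbatim quote):

	/* zram_remove(): observe ->claim instead of failing on it */
	claimed = zram->claim;
	if (!claimed)
		zram->claim = true;	/* we own the teardown */
	mutex_unlock(&bdev->bd_disk->open_mutex);

	if (!claimed) {
		/* Make sure all pending I/O is finished */
		sync_blockdev(bdev);
		zram_reset_device(zram);
	}
	/*
	 * If reset_store() set ->claim, del_gendisk() below drains the
	 * pending reset, so there is nothing else to do here.
	 */
	del_gendisk(zram->disk);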

> Reported-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>

In so far as the code is concerned:

Reviewed-by: Luis Chamberlain <mcgrof@xxxxxxxxxx>

  Luis


