Re: [PATCH v3 0/4] brd discard patches

Ming Lei <ming.lei@xxxxxxxxxx> · Tue, 23 Jan 2024 10:49:40 +0800

On Mon, Jan 22, 2024 at 05:30:07PM +0100, Mikulas Patocka wrote:
> Hi
> 
> 
> On Fri, 19 Jan 2024, Ming Lei wrote:
> 
> > Hi Mikulas,
> > 
> > On Thu, Aug 10, 2023 at 12:07:07PM +0200, Mikulas Patocka wrote:
> > > Hi
> > > 
> > > Here I'm submitting the ramdisk discard patches for the next merge window. 
> > > If you want to make some more changes, please let me know.
> > 
> > brd discard was removed in f09a06a193d9 ("brd: remove discard support")
> > in 2017 because it was just a driver-private write_zero, and users can get
> > the same result with fallocate(FALLOC_FL_ZERO_RANGE).
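For reference, the userspace side of that is a single fallocate(2) call. A minimal sketch, assuming Linux with glibc; zero_range() and range_is_zero() are hypothetical helper names, not existing APIs. On a block device the offset and length need to be aligned to the logical block size, and some filesystems (tmpfs, for example) reject FALLOC_FL_ZERO_RANGE with EOPNOTSUPP.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>        /* fallocate() and FALLOC_FL_* under _GNU_SOURCE */
#include <linux/falloc.h> /* FALLOC_FL_ZERO_RANGE, included explicitly */
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: zero [off, off + len) in the file or block device
 * behind fd. Returns 0 on success or -errno (e.g. -EOPNOTSUPP on tmpfs). */
int zero_range(int fd, off_t off, off_t len)
{
    if (fallocate(fd, FALLOC_FL_ZERO_RANGE, off, len) < 0)
        return -errno;
    return 0;
}

/* Hypothetical helper: read back [off, off + len) and check it.
 * Returns 1 if every byte is zero, 0 if not, -1 on error. */
int range_is_zero(int fd, off_t off, size_t len)
{
    unsigned char *buf;
    int ret = -1;

    assert(len > 0);
    buf = malloc(len);
    if (!buf)
        return -1;
    if (pread(fd, buf, len, off) == (ssize_t)len) {
        ret = 1;
        for (size_t i = 0; i < len; i++) {
            if (buf[i] != 0) {
                ret = 0;
                break;
            }
        }
    }
    free(buf);
    return ret;
}
```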
> > 
> > Also, you only mentioned the motivation in the V1 cover letter:
> > 
> > https://lore.kernel.org/linux-block/alpine.LRH.2.02.2209151604410.13231@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> > 
> > ```
> > Zdenek asked me to write it, because we use brd in the lvm2 testsuite and
> > it would be beneficial to run the testsuite with discard enabled in order
> > to test discard handling.
> > ```
> > 
> > But we have lots of test disks with discard support: loop, scsi_debug,
> > null_blk, ublk, ..., so one question is: why is brd discard a must for
> > the lvm2 testsuite to cover (lvm) discard handling?
> 
> We should ask Zdeněk Kabeláč about it; he is an expert on the lvm2
> testsuite.
> 
> > The reason why brd didn't support discard by freeing pages is the risk of
> > writeback deadlock, see:
> > 
> > commit f09a06a193d9 ("brd: remove discard support")
> > 
> > -static void discard_from_brd(struct brd_device *brd,
> > -                       sector_t sector, size_t n)
> > -{
> > -       while (n >= PAGE_SIZE) {
> > -               /*
> > -                * Don't want to actually discard pages here because
> > -                * re-allocating the pages can result in writeback
> > -                * deadlocks under heavy load.
> > -                */
> > -               if (0)
> > -                       brd_free_page(brd, sector);
> > -               else
> > -                       brd_zero_page(brd, sector);
> > -               sector += PAGE_SIZE >> SECTOR_SHIFT;
> > -               n -= PAGE_SIZE;
> > -       }
> > -}
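To spell out the unit arithmetic in that loop: n counts bytes while sector counts 512-byte sectors, so each iteration advances sector by PAGE_SIZE >> SECTOR_SHIFT (8 with 4 KiB pages), and a tail shorter than PAGE_SIZE is never touched. A userspace sketch of the same loop, assuming 4 KiB pages; discard_range() and the callback type are hypothetical stand-ins for discard_from_brd() and brd_zero_page()/brd_free_page().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT 9
#define PAGE_SIZE 4096UL            /* assumption: 4 KiB pages */

typedef uint64_t sector_t;

/* Hypothetical stand-in for brd_zero_page()/brd_free_page(). */
typedef void (*page_op_fn)(sector_t sector, void *ctx);

/* Same loop shape as the removed discard_from_brd(); returns the number
 * of whole pages visited. A trailing n < PAGE_SIZE is silently skipped. */
size_t discard_range(sector_t sector, size_t n, page_op_fn op, void *ctx)
{
    size_t pages = 0;

    while (n >= PAGE_SIZE) {
        op(sector, ctx);
        sector += PAGE_SIZE >> SECTOR_SHIFT;   /* 8 sectors per page */
        n -= PAGE_SIZE;
        pages++;
    }
    return pages;
}

/* Example callback: record which sectors the loop visited. */
struct visit_log {
    sector_t sectors[16];
    size_t count;
};

void record_sector(sector_t sector, void *ctx)
{
    struct visit_log *vl = ctx;

    assert(vl->count < 16);
    vl->sectors[vl->count++] = sector;
}
```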
> > 
> > However, you didn't mention how your patches address this potential
> > risk; care to document it? I can't find anything about this problem
> > in the patches.
> 
> The writeback deadlock can happen even without discard - if the machine 
> runs out of memory while writing data to a ramdisk. But the probability is 
> increased when discard is used, because pages are freed and re-allocated 
> more often.

Yeah, I agree. What I meant is that this needs to be documented, given
that discard is being re-introduced and the original deadlock comment
isn't addressed.

> 
> Generally, the admin should make sure that the machine has enough 
> available memory when creating a ramdisk - then, the deadlock can't 
> happen.
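One way to do that check from userspace, as a rough sketch: compare the planned ramdisk size against MemAvailable from /proc/meminfo. brd's rd_size module parameter is in KiB and pages are allocated lazily on first write, so this only bounds the fully-populated worst case. mem_available_kib(), ramdisk_fits() and the headroom policy are hypothetical, not something brd or lvm2 provide.

```c
#include <assert.h>
#include <stdio.h>

/* Return MemAvailable from /proc/meminfo in KiB, or 0 if it can't be read. */
unsigned long long mem_available_kib(void)
{
    char line[256];
    unsigned long long kib = 0;
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return 0;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "MemAvailable: %llu kB", &kib) == 1)
            break;
    }
    fclose(f);
    return kib;
}

/* Hypothetical policy: would rd_nr ramdisks of rd_size_kib each (the unit
 * of brd's rd_size parameter) fit if fully written, while still leaving
 * headroom_pct percent of MemAvailable for everything else? */
int ramdisk_fits(unsigned long long rd_size_kib, unsigned int rd_nr,
                 unsigned int headroom_pct)
{
    unsigned long long avail = mem_available_kib();

    assert(headroom_pct < 100);
    if (avail == 0)
        return 0;
    return rd_size_kib * rd_nr <= avail / 100 * (100 - headroom_pct);
}
```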
> 
> A ramdisk has no limit on the number of allocated pages, so when the
> machine runs out of memory, the oom killer will try to kill unrelated
> processes and the machine will hang. If there is a risk of overflowing the
> available memory, the admin should use tmpfs instead of a ramdisk - tmpfs
> can be configured with a limit and it can also swap out pages.
> 
> > BTW, your patches look more complicated than the original removed
> > discard implementation. And if the above questions get addressed,
> > I am happy to review the following patches.
> 
> My patches actually free the discarded pages. The original discard 
> implementation just overwrote the pages with zeroes without freeing them.

The original implementation did support discarding by freeing pages; it
was just bypassed unconditionally by:

               if (0)
                       brd_free_page(brd, sector);
               else
                       brd_zero_page(brd, sector);

However, a page could be freed by discard while it is still being used in
brd_do_bvec().

Maybe your patch "brd: extend the rcu regions to cover read and write"
could be simplified a bit, for example:

- grab the rcu read lock in brd_do_bvec()
- release the rcu read lock when allocating a page via alloc_page() in
  brd_insert_page()
- free pages via rcu, i.e. defer the free until after a grace period
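A toy, single-threaded userspace model of the three steps above, only to show the ordering: unpublish the page, then free it only once readers that might still hold it are done. It is not RCU: a global reader counter stands in for rcu_read_lock()/rcu_read_unlock(), and toy_reclaim() frees deferred pages only when no reader is active at all, a much cruder condition than a real grace period. All toy_* names are hypothetical, and the slot array stands in for brd's page index.

```c
#include <assert.h>
#include <stdlib.h>

#define TOY_NR_SLOTS 8                  /* hypothetical tiny page index */

struct toy_page {
    unsigned char data[4096];
    struct toy_page *next_free;         /* link on the deferred-free list */
};

static struct toy_page *toy_slots[TOY_NR_SLOTS];
static struct toy_page *toy_deferred;   /* unpublished, not yet freed */
static int toy_readers;                 /* stand-in for RCU read-side state */
int toy_pages_freed;                    /* lets callers observe reclamation */

void toy_rcu_read_lock(void)   { toy_readers++; }
void toy_rcu_read_unlock(void) { assert(toy_readers > 0); toy_readers--; }

/* Read side (brd_do_bvec()-like): the result stays valid until unlock. */
struct toy_page *toy_lookup(unsigned int idx)
{
    assert(toy_readers > 0 && idx < TOY_NR_SLOTS);
    return toy_slots[idx];
}

/* Insert (brd_insert_page()-like), called inside the read section: the
 * read lock is dropped around the allocation, so the slot is rechecked
 * afterwards in case it was populated meanwhile. */
struct toy_page *toy_insert(unsigned int idx)
{
    struct toy_page *p;

    assert(toy_readers > 0 && idx < TOY_NR_SLOTS);
    toy_rcu_read_unlock();
    p = calloc(1, sizeof(*p));
    toy_rcu_read_lock();
    if (!p || toy_slots[idx]) {
        free(p);                        /* free(NULL) is a no-op */
        return toy_slots[idx];
    }
    toy_slots[idx] = p;
    return p;
}

/* Discard: unpublish first so new readers miss the page, then defer the
 * free (call_rcu()-like) instead of freeing immediately. */
void toy_discard(unsigned int idx)
{
    struct toy_page *p;

    assert(idx < TOY_NR_SLOTS);
    p = toy_slots[idx];
    if (!p)
        return;
    toy_slots[idx] = NULL;
    p->next_free = toy_deferred;
    toy_deferred = p;
}

/* Crude "grace period": reclaim only when no reader is active. */
void toy_reclaim(void)
{
    if (toy_readers > 0)
        return;
    while (toy_deferred) {
        struct toy_page *p = toy_deferred;

        toy_deferred = p->next_free;
        free(p);
        toy_pages_freed++;
    }
}
```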

Or avoid the race by holding a page reference:

- grab a page reference in brd_lookup_page() when it is called from
  copy_to_brd() or copy_from_brd(), and drop it after the page is consumed
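A similar toy model of the page-reference alternative: the index holds one reference, the lookup done for copy_to_brd()/copy_from_brd() takes another, and the page is freed only when the last reference is dropped, so a concurrent discard can no longer pull the page out from under the copy. In the kernel this would presumably map to get_page()/put_page(), with the lookup plus reference grab made atomic against the discard path (for example under the index lock, or under RCU with a try-get such as get_page_unless_zero()). The ref_* names are hypothetical and the model is single-threaded.

```c
#include <assert.h>
#include <stdlib.h>

#define REF_NR_SLOTS 8                  /* hypothetical tiny page index */

struct ref_page {
    int refcount;                       /* stand-in for the page refcount */
    unsigned char data[4096];
};

static struct ref_page *ref_slots[REF_NR_SLOTS];
int ref_pages_freed;                    /* lets callers observe the final free */

/* Insert: the index itself owns the initial reference. */
struct ref_page *ref_insert(unsigned int idx)
{
    struct ref_page *p;

    assert(idx < REF_NR_SLOTS);
    if (ref_slots[idx])
        return ref_slots[idx];
    p = calloc(1, sizeof(*p));
    if (!p)
        return NULL;
    p->refcount = 1;
    ref_slots[idx] = p;
    return p;
}

/* Lookup for the copy paths: returns the page with an extra reference
 * (get_page()-like). In the kernel, lookup + get must be atomic against
 * discard; this model is single-threaded, so that is implicit here. */
struct ref_page *ref_lookup_get(unsigned int idx)
{
    struct ref_page *p;

    assert(idx < REF_NR_SLOTS);
    p = ref_slots[idx];
    if (p)
        p->refcount++;
    return p;
}

/* Drop a reference (put_page()-like); the last one frees the page. */
void ref_put(struct ref_page *p)
{
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        free(p);
        ref_pages_freed++;
    }
}

/* Discard: unpublish, then drop only the index's own reference. */
void ref_discard(unsigned int idx)
{
    struct ref_page *p;

    assert(idx < REF_NR_SLOTS);
    p = ref_slots[idx];
    if (!p)
        return;
    ref_slots[idx] = NULL;
    ref_put(p);
}
```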

Thanks,
Ming