Re: [PATCH] Revert "block: Do not discard buffers under a mounted filesystem"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue 16-02-21 23:05:57, Damien Le Moal wrote:
> On 2021/02/17 2:51, Keith Busch wrote:
> > On Tue, Feb 16, 2021 at 04:36:06PM +0000, Christoph Hellwig wrote:
> >> On Tue, Feb 16, 2021 at 02:38:49PM +0100, Jan Kara wrote:
> >>> Apparently there are several userspace programs that depend on being
> >>> able to call BLKDISCARD ioctl without the ability to grab bdev
> >>> exclusively - namely FUSE filesystems have the device open without
> >>> O_EXCL (the kernel has the bdev open with O_EXCL) so the commit breaks
> >>> fstrim(8) for such filesystems. Also LVM when shrinking LV opens PV and
> >>> discards ranges released from LV but that PV may be already open
> >>> exclusively by someone else (see bugzilla link below for more details).
> >>>
> >>> This reverts commit 384d87ef2c954fc58e6c5fd8253e4a1984f5fe02.
> >>
> >> I think that is a bad idea. We fixed the problem for a reason.
> >> I think the right fix is to just do nothing if the device hasn't been
> >> opened with O_EXCL and can't be reopened with it, just don't do anything
> >> but also don't return an error.  After all discard and thus
> >> BLKDISCARD is purely advisory.
> > 
> > A discard is advisory, but BLKZEROOUT is not, so something different
> > should happen there. We were also planning to send a patch using this
> > same pattern for Zone Reset to fix stale page cache issues after the
> > reset, but we'll wait to see how this settles before sending that.
> 
> There is also another problem: the truncate_bdev & operation following it
> (discard, zeroout or zone reset) are not atomic vs read/write operations to the
> bdev. Without mutual exclusion, that page invalidation is best effort only since
> reads can snick in between the truncate and discard (or zeroout or zone reset).
> With our zone reset stale page problem case, it is reads from udevd that we see
> snicking in between the truncate bdev and zone reset and so we still get stale
> pages after the zone reset is finished. No solution to propose for solving that,
> yet...

Well, at least blkdev_fallocate() does:

	truncate_bdev_range();
	blkdev_issue_zeroout();
	invalidate_inode_pages2_range();

so racing reads should not result in stale page cache contents AFAICT.

									Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR



[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux