On Mon, Mar 1, 2021 at 9:38 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>
> On Mon, Mar 01, 2021 at 07:33:28PM -0800, Dan Williams wrote:
> > On Mon, Mar 1, 2021 at 6:42 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > [..]
> > > We do not need a DAX specific mechanism to tell us "DAX device
> > > gone", we need a generic block device interface that tells us "range
> > > of block device is gone".
> >
> > This is the crux of the disagreement. The block_device is going away
> > *and* the dax_device is going away.
>
> No, that is not the disagreement I have with what you are saying.
> You still haven't understood that it's even more basic and generic
> than devices going away. At the simplest form, all the filesystem
> wants is to be notified when *unrecoverable media errors*
> occur in the persistent storage that underlies the filesystem.
>
> The filesystem does not care what that media is built from - PMEM,
> flash, corroded spinning disks, MRAM, or any other persistent media
> you can think of. It just doesn't matter.
>
> What we care about is that the contents of a *specific LBA range* no
> longer contain *valid data*. IOWs, the data in that range of the
> block device has been lost, cannot be retrieved and/or cannot be
> written to any more.
>
> PMEM taking a MCE because ECC tripped is a media error because data
> is lost and inaccessible until recovery actions are taken.
>
> MD RAID failing a scrub is a media error and data is lost and
> unrecoverable at that layer.
>
> A device disappearing is a media error because the storage media is
> now permanently inaccessible to the higher layers.
>
> This "media error" categorisation is a fundamental property of
> persistent storage and, as such, is a property of the block devices
> used to access said persistent storage.
>
> That's the disagreement here - that you and Christoph are saying
> ->corrupted_range is not a block device property because only a
> pmem/DAX device currently generates it.
>
> You both seem to be NACKing a generic interface because it's only
> implemented for the first subsystem that needs it. AFAICT, you
> either don't understand or are completely ignoring the architectural
> need for it to be provided across the rest of the storage stack that
> *block device based filesystems depend on*.

No, I'm NAKing it because it's the wrong interface. See my 'struct
badblocks' argument in the reply to Darrick. That 'struct badblocks'
infrastructure arose from MD and is shared with PMEM.

>
> Sure, there might be dax device based filesystems around the corner.
> They just require a different pmem device ->corrupted_range callout
> to implement the notification - one that directs to the dax device
> rather than the block device. That's simple and trivial to
> implement, but such functionality for DAX devices does not replace
> the need for the same generic functionality to be provided across a
> *range of different block devices* as required by *block device
> based filesystems*.
>
> And that's fundamentally the problem. XFS is block device based, not
> DAX device based. We require errors to be reported through block
> device mechanisms. fs-dax does not change this - it is based on pmem
> being presented primarily as a block device to the block device
> based filesystems and only secondarily as a dax device. Hence if it
> can be trivially implemented as a block device interface, that's
> where it should go, because then all the other block devices that
> the filesystem runs on can provide the same functionality for similar
> media error events....

Sure, use 'struct badblocks' not struct block_device and
block_device_operations.

>
> > The dax_device removal implies one
> > set of actions (direct accessed pfns invalid) the block device removal
> > implies another (block layer sector access offline).
>
> There you go again, saying DAX requires an action, while the block
> device notification is a -state change- (i.e. goes offline).
There you go reacting to the least generous interpretation of what I
said.

s/pfns invalid/pfns offline/

>
> This is exactly what I said was wrong in my last email.
>
> > corrupted_range
> > is blurring the notification for 2 different failure domains. Look at
> > the nascent idea to mount a filesystem on dax sans a block device.
> > Look at the existing plumbing for DM to map dax_operations through a
> > device stack.
>
> Ummm, it just maps the direct_access call to the underlying device
> and calls its ->direct_access method. All it's doing is LBA
> mapping. That's all it needs to do for ->corrupted_range, too.
> I have no clue why you think this is a problem for error
> notification...
>
> > Look at the pushback Ruan got for adding a new
> > block_device operation for corrupted_range().
>
> one person said "no". That's hardly pushback. Especially as I think
> Christoph's objection about this being dax specific functionality
> is simply wrong, as per above.

It's not wrong when we have a perfectly suitable object for sector
based error notification and when we're trying to disentangle 'struct
block_device' from 'struct dax_device'.
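
For readers following the "LBA mapping" point quoted above, here is a
rough userspace sketch of what translating a bad sector range through a
single linear mapping could look like. It is only an illustration:
struct linear_map, struct sector_range and map_bad_range_up() are
hypothetical names made up for this example, not kernel or
device-mapper APIs, and the sketch assumes one contiguous linear target
and ignores arithmetic overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical linear target: upper sectors [upper_start, upper_start + len)
 * are backed by lower sectors [lower_start, lower_start + len). */
struct linear_map {
	uint64_t upper_start;
	uint64_t lower_start;
	uint64_t len;
};

/* A run of sectors: [start, start + len). */
struct sector_range {
	uint64_t start;
	uint64_t len;
};

/*
 * Translate a bad range reported in lower-device sectors into the
 * upper device's sector space, clipped to what this mapping covers.
 * Returns false if the bad range does not overlap the mapping at all.
 */
static bool map_bad_range_up(const struct linear_map *m,
			     struct sector_range bad,
			     struct sector_range *out)
{
	assert(m != NULL && out != NULL);

	uint64_t map_end = m->lower_start + m->len;
	uint64_t bad_end = bad.start + bad.len;

	/* Intersect the bad range with the mapped lower extent. */
	uint64_t lo = bad.start > m->lower_start ? bad.start : m->lower_start;
	uint64_t hi = bad_end < map_end ? bad_end : map_end;

	if (lo >= hi)
		return false;

	/* Shift the overlap by the offset between the two sector spaces. */
	out->start = m->upper_start + (lo - m->lower_start);
	out->len = hi - lo;
	return true;
}
```

With a mapping of upper sector 0 onto lower sector 1000 for 100
sectors, a bad range of 20 sectors starting at lower sector 1090 comes
out as 10 sectors starting at upper sector 90; the part past the end of
the mapping is clipped. The sketch says nothing about which object
should carry the notification, which is the actual point of contention
in this thread.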