On Mon, Mar 01, 2021 at 07:33:28PM -0800, Dan Williams wrote:
> On Mon, Mar 1, 2021 at 6:42 PM Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> [..]
> > We do not need a DAX specific mechanism to tell us "DAX device
> > gone", we need a generic block device interface that tells us "range
> > of block device is gone".
>
> This is the crux of the disagreement. The block_device is going away
> *and* the dax_device is going away.

No, that is not the disagreement I have with what you are saying.
You still haven't understood that it's even more basic and generic
than devices going away. At its simplest, all the filesystem wants
is to be notified when *unrecoverable media errors* occur in the
persistent storage that underlies the filesystem.

The filesystem does not care what that media is built from - PMEM,
flash, corroded spinning disks, MRAM, or any other persistent media
you can think of. It just doesn't matter.

What we care about is that the contents of a *specific LBA range* no
longer contain *valid data*. IOWs, the data in that range of the
block device has been lost, cannot be retrieved and/or cannot be
written to any more.

PMEM taking a MCE because ECC tripped is a media error because data
is lost and inaccessible until recovery actions are taken.

MD RAID failing a scrub is a media error and data is lost and
unrecoverable at that layer.

A device disappearing is a media error because the storage media is
now permanently inaccessible to the higher layers.

This "media error" categorisation is a fundamental property of
persistent storage and, as such, is a property of the block devices
used to access said persistent storage.

That's the disagreement here - that you and Christoph are saying
->corrupted_range is not a block device property because only a
pmem/DAX device currently generates it.

You both seem to be NACKing a generic interface because it's only
implemented for the first subsystem that needs it.
AFAICT, you either don't understand or are completely ignoring the
architectural need for it to be provided across the rest of the
storage stack that *block device based filesystems depend on*.

Sure, there might be dax device based filesystems around the corner.
They just require a different pmem device ->corrupted_range callout
to implement the notification - one that directs to the dax device
rather than the block device. That's simple and trivial to
implement, but such functionality for DAX devices does not replace
the need for the same generic functionality to be provided across a
*range of different block devices* as required by *block device
based filesystems*.

And that's fundamentally the problem. XFS is block device based, not
DAX device based. We require errors to be reported through block
device mechanisms. fs-dax does not change this - it is based on pmem
being presented primarily as a block device to the block device
based filesystems and only secondarily as a dax device. Hence if it
can be trivially implemented as a block device interface, that's
where it should go, because then all the other block devices that
the filesystem runs on can provide the same functionality for
similar media error events....

> The dax_device removal implies one
> set of actions (direct accessed pfns invalid) the block device removal
> implies another (block layer sector access offline).

There you go again, saying DAX requires an action, while the block
device notification is a -state change- (i.e. goes offline).

This is exactly what I said was wrong in my last email.

> corrupted_range
> is blurring the notification for 2 different failure domains. Look at
> the nascent idea to mount a filesystem on dax sans a block device.
> Look at the existing plumbing for DM to map dax_operations through a
> device stack.

Ummm, it just maps the direct_access call to the underlying device
and calls its ->direct_access method. All it's doing is LBA
mapping.
That's all it needs to do for ->corrupted_range, too. I have no clue
why you think this is a problem for error notification...

> Look at the pushback Ruan got for adding a new
> block_device operation for corrupted_range().

One person said "no". That's hardly pushback. Especially as I think
Christoph's objection about this being dax specific functionality is
simply wrong, as per above.

> > This is why we need to communicate what error occurred, not what
> > action a device driver thinks needs to be taken.
>
> The driver is only an event producer in this model, whatever the
> consumer does at the other end is not its concern. There may be a
> generic consumer and a filesystem specific consumer.

<sigh>

That's why these are all ops functions that can provide multiple
implementations to different device types. So that when we get a new
use case, the ops function structure can be replaced with one that
directs the notification to the new user instead of to the existing
one. It's a design pattern we use all over the kernel code.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
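
To make the ops-function argument concrete, here is a minimal
user-space sketch of the pattern, not kernel code. Every name in it
(struct bdev, struct holder_ops, bdev_notify_corrupted(), struct
linear_map, struct toy_fs) is hypothetical and invented for
illustration; none of them are existing kernel interfaces. It assumes
a driver that only produces "this LBA range is bad" events, a stacked
device that does nothing but remap the LBA range (in the way a linear
DM target might), and a filesystem that consumes the event through
the same callback type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct bdev;

/* One ops table per kind of holder sitting on top of a device. */
struct holder_ops {
    /* [lba, lba + len) on @dev no longer contains valid data. */
    void (*corrupted_range)(struct bdev *dev, uint64_t lba, uint64_t len);
};

struct bdev {
    const char *name;
    const struct holder_ops *ops; /* who consumes events from this device */
    void *holder;                 /* private data of that consumer */
};

/* Driver side: a pure event producer that knows nothing about consumers. */
void bdev_notify_corrupted(struct bdev *dev, uint64_t lba, uint64_t len)
{
    assert(len > 0);
    if (dev->ops && dev->ops->corrupted_range)
        dev->ops->corrupted_range(dev, lba, len);
}

/* Stacked device: exposes a window of a lower device as its own device. */
struct linear_map {
    struct bdev upper; /* device the filesystem is mounted on */
    uint64_t start;    /* first lower-device LBA inside the window */
    uint64_t len;      /* number of LBAs in the window */
};

/* All the stacked device does is LBA mapping, then it passes the event up. */
void linear_corrupted_range(struct bdev *lower, uint64_t lba, uint64_t len)
{
    struct linear_map *map = lower->holder;
    uint64_t end = lba + len;
    uint64_t map_end = map->start + map->len;

    /* Clip the bad range to the part of the lower device we map. */
    if (lba < map->start)
        lba = map->start;
    if (end > map_end)
        end = map_end;
    if (lba >= end)
        return; /* the error is outside our window, nothing to report */

    bdev_notify_corrupted(&map->upper, lba - map->start, end - lba);
}

/* Filesystem side: records the last bad range it was told about. */
struct toy_fs {
    uint64_t bad_lba;
    uint64_t bad_len;
    int events;
};

void fs_corrupted_range(struct bdev *dev, uint64_t lba, uint64_t len)
{
    struct toy_fs *fs = dev->holder;

    fs->bad_lba = lba;
    fs->bad_len = len;
    fs->events++;
}

const struct holder_ops linear_holder_ops = {
    .corrupted_range = linear_corrupted_range,
};

const struct holder_ops fs_holder_ops = {
    .corrupted_range = fs_corrupted_range,
};
```

In this sketch, pointing a device's ops at linear_holder_ops or
fs_holder_ops is the only thing that decides who receives the
notification. A new kind of consumer would supply its own ops table
and the producer side would stay unchanged, which is the point being
made about replacing the ops function structure.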