On Tue, Apr 26, 2016 at 09:25:52AM +1000, Dave Chinner wrote:
> On Mon, Apr 25, 2016 at 05:14:36PM +0000, Verma, Vishal L wrote:
> > On Mon, 2016-04-25 at 01:31 -0700, hch@xxxxxxxxxxxxx wrote:
> > > On Sat, Apr 23, 2016 at 06:08:37PM +0000, Verma, Vishal L wrote:
> > > >
> > > > direct_IO might fail with -EINVAL due to misalignment, or -ENOMEM due
> > > > to some allocation failing, and I thought we should return the original
> > > > -EIO in such cases so that the application doesn't lose the information
> > > > that the bad block is actually causing the error.
> > > EINVAL is a concern here. Not due to the right error reported, but
> > > because it means your current scheme is fundamentally broken - we
> > > need to support I/O at any alignment for DAX I/O, and not fail due to
> > > alignment concerns for a highly specific degraded case.
> > >
> > > I think this whole series needs to go back to the drawing board as I
> > > don't think it can actually rely on using direct I/O as the EIO
> > > fallback.
> > >
> > Agreed that DAX I/O can happen with any size/alignment, but how else do
> > we send an IO through the driver without alignment restrictions? Also,
> > the granularity at which we store badblocks is 512B sectors, so it
> > seems natural that to clear such a sector, you'd expect to send a write
> > to the whole sector.
> >
> > The expected usage flow is:
> >
> > - Application hits EIO doing dax_IO or load/store io
> >
> > - It checks badblocks and discovers its files have lost data
>
> Lots of hand-waving here. How does the application map a bad
> "sector" to a file without scanning the entire filesystem to find
> the owner of the bad sector?

FWIW there was some discussion @ LSF about using (XFS) rmap to figure out
which parts of a file (on XFS) have gone bad. Chris Mason said that he'd
like to collaborate on having a common getfsmap ioctl between btrfs and XFS,
since they have a backref index that could be hooked up to it for them.
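A minimal sketch of the whole-sector rounding discussed above, written in Python for brevity and assuming the 512-byte badblocks granularity Vishal describes. The helper name round_to_sectors is hypothetical, not a kernel or libc interface. It widens an arbitrary byte range (the kind of unaligned range DAX I/O permits) to every sector the range touches, which is the span a clearing write would have to cover.

```python
from typing import Tuple

# Assumption: badblocks are tracked at 512-byte sector granularity.
SECTOR_SIZE = 512


def round_to_sectors(offset: int, length: int) -> Tuple[int, int]:
    """Hypothetical helper: widen [offset, offset + length) to whole sectors.

    Returns (aligned_offset, aligned_length) covering every sector the
    original byte range touches. An empty range stays empty.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    # Round the start down to the containing sector boundary.
    start = offset - offset % SECTOR_SIZE
    if length == 0:
        return start, 0
    # Round the end up: negated floor division gives the ceiling.
    end = -(-(offset + length) // SECTOR_SIZE) * SECTOR_SIZE
    return start, end - start
```

For example, a 100-byte write at offset 700 widens to the single sector at offset 512. The catch is that a clearing write must supply valid data for every byte of each widened sector, including bytes the original request never touched, so the caller needs a good copy of the whole sector from somewhere.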
Obviously the app still has to coordinate stopping file IO and calling
GETFSMAP, since the fs won't do that on its own. There's also the question
of how to handle LBA translation if there's other stuff like dm in the way.
I don't think device-mapper or md do reverse mapping, so things get murky
from here.

Guess I should get on pushing out a getfsmap patch for review. :)

--D

(/me doesn't have answers to any of your other questions.)

> > - It write()s those sectors (possibly converted to file offsets using
> > fiemap)
> >     * This triggers the fallback path, but if the application is doing
> >     this level of recovery, it will know the sector is bad, and write the
> >     entire sector
>
> Where does the application find the data that was lost to be able to
> rewrite it?
>
> > - Or it replaces the entire file from backup also using write() (not
> > mmap+stores)
> >     * This just frees the fs block, and the next time the block is
> >     reallocated by the fs, it will likely be zeroed first, and that will be
> >     done through the driver and will clear errors
>
> There's an implicit assumption that applications will keep redundant
> copies of their data at the /application layer/ and be able to
> automatically repair it? And then there's the implicit assumption
> that it will unlink and free the entire file before writing a new
> copy, and that then assumes the filesystem will zero blocks if
> they get reused to clear errors on that LBA sector mapping before
> they are accessible again to userspace.
>
> It seems to me that there are a number of assumptions being made
> across multiple layers here. Maybe I've missed something - can you
> point me to the design/architecture description so I can see how
> "app does data recovery itself" dance is supposed to work?
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
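A minimal sketch of the "convert sectors to file offsets using fiemap" step from the flow above, again in Python and under stated assumptions: the file's extents are already in hand (for example from the FS_IOC_FIEMAP ioctl), all values are in bytes, sectors are 512 bytes, and no dm or md layer remaps LBAs between the filesystem and the pmem device. Extent and sector_to_file_offset are hypothetical names; Extent's fields only mirror fe_logical, fe_physical and fe_length in struct fiemap_extent.

```python
from typing import List, NamedTuple, Optional

# Assumption: badblocks report 512-byte sectors on the same device the
# filesystem sits on (no device-mapper/md translation in between).
SECTOR_SIZE = 512


class Extent(NamedTuple):
    """Hypothetical extent record; fields mirror struct fiemap_extent."""
    logical: int   # like fe_logical: byte offset within the file
    physical: int  # like fe_physical: byte offset on the device
    length: int    # like fe_length: extent length in bytes


def sector_to_file_offset(sector: int, extents: List[Extent]) -> Optional[int]:
    """Map a bad device sector to a byte offset in one file.

    Returns None when none of this file's extents covers the sector.
    Extent flags are ignored, so inline or encoded extents (whose
    physical address is not a plain device offset) must be filtered
    out by the caller beforehand.
    """
    dev_off = sector * SECTOR_SIZE
    for ext in extents:
        # Half-open range check: [physical, physical + length).
        if ext.physical <= dev_off < ext.physical + ext.length:
            return ext.logical + (dev_off - ext.physical)
    return None
```

This only answers "does this one file own that sector?". Finding the owner starting from the sector means repeating the lookup for every file in the filesystem, which is the full scan being objected to in the thread and what a reverse-mapping interface such as the proposed getfsmap ioctl would avoid.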