---- Original Message ----
Subject: [LSF/MM TOPIC] Badblocks checking/representation in filesystems
Sent: Jan 13, 2017 1:40 PM
From: "Verma, Vishal L" <vishal.l.verma@xxxxxxxxx>
To: lsf-pc@xxxxxxxxxxxxxxxxxxxxxxxxxx
Cc: linux-nvdimm@xxxxxxxxxxxx, linux-block@xxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx

> The current implementation of badblocks, where we consult the badblocks
> list for every IO in the block driver works, and is a last option
> failsafe, but from a user perspective, it isn't the easiest interface to
> work with.

As I remember, the FAT and HFS+ specifications describe a bad blocks (physical sectors) table. I believe this table was used for floppy media. But eventually this table became a completely obsolete artefact, because most storage devices are reliable enough. Why do you need to expose bad blocks at the file system level? Do you expect the next generation of NVM memory to be so unreliable that the file system needs to manage bad blocks? What about erasure coding schemes? Does the file system really need to suffer from the bad block issue?

Usually, we use LBAs, and it is the responsibility of the storage device to map a bad physical block/page/sector to a valid one. Do you mean that we have direct access to physical NVM memory addresses? But it looks like we can have a "bad block" issue even when we access data in a page cache memory page (if we use NVM memory for the page cache, of course). So, what do you mean by the "bad block" issue?

>
> A while back, Dave Chinner had suggested a move towards smarter
> handling, and I posted initial RFC patches [1], but since then the topic
> hasn't really moved forward.
>
> I'd like to propose and have a discussion about the following new
> functionality:
>
> 1. Filesystems develop a native representation of badblocks. For
> example, in xfs, this would (presumably) be linked to the reverse
> mapping btree.
> The filesystem representation has the potential to be
> more efficient than the block driver doing the check, as the fs can
> check the IO happening on a file against just that file's range.

What do you mean by "the file system can check the IO happening on a file"? Do you mean read or write operations? What about metadata?

If we are talking about discovering a bad block during a read operation, then few modern file systems are able to survive it, whether the bad block holds metadata or user data. Let's imagine that we have a really mature file system driver: what does it mean to encounter a bad block? The failure to read a logical block of some metadata (a bad block) means that we are unable to extract some part of a metadata structure. From the file system driver's point of view, the file system looks corrupted: we need to stop file system operations and, finally, check and recover the volume by means of an fsck tool. If we find a bad block in some user file then, again, we have an issue. Some file systems simply return "unrecovered read error". Others, theoretically, are able to survive thanks to snapshots, for example. But, anyway, it will end up looking like a read-only mount, and the user will need to resolve the trouble by hand.

If we are talking about discovering a bad block during a write operation then, again, we are in trouble. Usually, we use an asynchronous model of write/flush operations. First, we prepare a consistent state of all our metadata structures in memory. The flush operations for metadata and user data can happen at different times. So what should be done if we discover a bad block for any piece of metadata or user data? Simple tracking of bad blocks is not enough at all. Let's consider user data first. If we cannot write some file's block successfully, then we have two options: (1) forget about this piece of data; (2) try to change the LBA associated with this piece of data.
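To make option (2) concrete, here is a minimal user-space sketch under a toy model. It is not kernel code and does not reflect any real file system; `struct dev`, `struct file_map`, `dev_write()`, `alloc_lba()`, `file_map_init()` and `write_block_remap()` are all hypothetical. The file's block map (logical block to LBA) stands in for the metadata that tracks where each part of the file lives; a failed write moves the block to a freshly allocated LBA and marks that map dirty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NBLOCKS 8     /* hypothetical file size, in logical blocks */
#define DEV_BLOCKS 64 /* hypothetical device size, in LBAs */

/* Toy device: `bad` models media defects the file system cannot see in
 * advance; `used` is the file system's own allocation bitmap. */
struct dev {
    bool bad[DEV_BLOCKS];
    bool used[DEV_BLOCKS];
    uint64_t data[DEV_BLOCKS];
};

/* Per-file block map (logical block -> LBA): the metadata that tracks
 * the location of each part of the file. */
struct file_map {
    uint64_t lba[NBLOCKS];
    bool dirty; /* changed in memory, not yet flushed */
};

/* Simulated write: fails with -1 on a bad LBA, like a media error. */
static int dev_write(struct dev *d, uint64_t lba, uint64_t val)
{
    assert(lba < DEV_BLOCKS);
    if (d->bad[lba])
        return -1;
    d->data[lba] = val;
    return 0;
}

/* First-fit allocator; it knows which LBAs are in use, not which are bad.
 * Returns DEV_BLOCKS when no free LBA is left. */
static uint64_t alloc_lba(struct dev *d)
{
    for (uint64_t i = 0; i < DEV_BLOCKS; i++) {
        if (!d->used[i]) {
            d->used[i] = true;
            return i;
        }
    }
    return DEV_BLOCKS;
}

/* Give every logical block of the file an initial LBA. */
static int file_map_init(struct dev *d, struct file_map *m)
{
    for (size_t i = 0; i < NBLOCKS; i++) {
        m->lba[i] = alloc_lba(d);
        if (m->lba[i] == DEV_BLOCKS)
            return -1;
    }
    m->dirty = false;
    return 0;
}

/* Option (2): when a write fails, move the logical block to a new LBA and
 * update the file's map. The failed LBA stays marked as used, so it is
 * never handed out again. Returns -1 if the device runs out of LBAs, in
 * which case only option (1), dropping the data, is left. */
static int write_block_remap(struct dev *d, struct file_map *m,
                             size_t lblk, uint64_t val)
{
    assert(lblk < NBLOCKS);
    while (dev_write(d, m->lba[lblk], val) != 0) {
        uint64_t new_lba = alloc_lba(d);
        if (new_lba == DEV_BLOCKS)
            return -1;
        m->lba[lblk] = new_lba;
        m->dirty = true; /* metadata must now be flushed as well */
    }
    return 0;
}
```

For example, with the file mapped to LBAs 0-7 and LBA 3 marked bad, `write_block_remap(&d, &m, 3, 42)` returns 0, moves logical block 3 to LBA 8 and sets `m.dirty`. Even in this toy model the remap only shifts the problem: the dirty map is itself metadata that still has to be flushed, which is the metadata case that remains unsolved.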
Re-allocating the LBA of a discovered bad block (the user data case) sounds like a real pain, because you need to rebuild the metadata that tracks the location of this part of the file. And it sounds like a practically impossible operation in the case of an LFS file system, for example. If we have trouble flushing any part of the metadata, then it sounds like a complete disaster for any file system.

Are you really sure that the file system should handle the bad block issue?

> In contrast, today, the block driver checks against the whole block device
> range for every IO. On encountering badblocks, the filesystem can
> generate a better notification/error message that points the user to
> (file, offset) as opposed to the block driver, which can only provide
> (block-device, sector).
>
> 2. The block layer adds a notifier to badblock addition/removal
> operations, which the filesystem subscribes to, and uses to maintain its
> badblocks accounting. (This part is implemented as a proof of concept in
> the RFC mentioned above [1]).

I am not sure that any bad block notification during or after an IO operation is valuable for a file system. Maybe it could help if the file system simply knew about bad blocks before the logical block allocation operation. But what subsystem will discover bad blocks before any IO operations? How will the file system receive this information, or some bad block table? I am not convinced that the suggested badblocks approach is really feasible. Also, I am not sure that the file system should see bad blocks at all. Why can't the hardware manage this issue for us?

Thanks,
Vyacheslav Dubeyko.