On Wed, 21 Aug 2024, David Chu wrote:

> From: Mikulas Patocka <mpatocka@xxxxxxxxxx>
>
> Hi Mikulas,
>
> On Wed, 21 Aug 2024, Mikulas Patocka wrote:
> > 'D' - do nothing - dm-integrity doesn't do anything to try to maintain
> > data/metadata integrity - if the system crashes, the metadata may be
> > corrupted. It may be useful for things like operating system installation,
> > where you don't recover from a crash at all.
>
> Thanks for the quick and detailed response! I am actually *not interested
> in crashes*, but in what happens during a normal run, when there are two data
> writes to the same sector on disk. Let's say these writes are write A and
> write B, and we are running dm-integrity in 'D' mode (so there is no journal).
>
> dm-integrity makes sure that if the writes' sector ranges intersect, then one
> write will not be sent to disk until the other returns, like so:
>
>               Write A and B
>                     |
>                     v
> -----------------------------------------
> |              dm-integrity             |
> -----------------------------------------
>       |             ^                 |
>       v Write A     | Write A end_io  v Write B
> -----------------------------------------
> |                 disk                  |
> -----------------------------------------
>
> dm-integrity then stores the hash of write B.
>
> This behavior suggests to me that dm-integrity assumes that if write A returns
> before write B is sent to disk, then write A must be written to disk *before*
> write B (or maybe write A is never written, but in any case, write B is the
> final write). Otherwise, if the disk reorders write A and write B, then there
> would be a mismatch between the hash that dm-integrity stores and the actual
> write on disk.
>
> Is this the assumption dm-integrity is making?
> And if so, how does it square with the hardware reordering I/O requests?
>
> Thanks,
> David

Hi

There is a red-black tree of all in-progress I/Os (see ic->in_progress).
When we start an I/O, we add it to the tree with "add_new_range", and when
we end an I/O, we delete it from the tree with "remove_range_unlocked". The
tree makes sure that there are no overlapping I/Os in progress.

Regarding disk reordering: the disk may reorder I/Os, but if there is no
crash, the disk must appear to be coherent. Therefore, if we write A, get
A's endio and then write B, the disk must read B from this location; it
can't read A.

The reordering only becomes a problem if the system crashes (in that case,
it is unknown whether the disk will read A or B after the crash). I think
that the SCSI standard even allows reading garbage after a crash.

Mikulas