Re: dm-integrity and write reordering

On Wed, 21 Aug 2024, David Chu wrote:

> From: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> 
> Hi Mikulas,
> 
> On Wed, 21 Aug 2024, Mikulas Patocka wrote:
> > 'D' - do nothing - dm-integrity doesn't do anything to try to maintain 
> > data/metadata integrity - if the system crashes, the metadata may be 
> > corrupted. It may be useful for things like operating system installation, 
> > where you don't recover from a crash at all.
> 
> Thanks for the quick and detailed response! I am actually *not interested
> in crashes*, but in what happens during a normal run, when there are two data
> writes to the same sector on disk. Let's say these writes are write A and
> write B, and we are running dm-integrity in 'D' mode (so there is no journal).
> 
> dm-integrity makes sure that if the writes' sector ranges intersect, then one
> write will not be sent to disk until the other returns, like so:
> 
>   Write A and B
>     | 
>     v
>   -----------------------------------------
>   | dm-integrity                          |
>   -----------------------------------------
>     |             ^                    |
>     v Write A     | Write A end_io     v Write B
>   -----------------------------------------
>   | disk                                  |
>   -----------------------------------------
> 
> dm-integrity then stores the hash of write B.
> 
> This behavior suggests to me that dm-integrity assumes that if write A returns
> before write B is sent to disk, then write A must be written to disk *before*
> write B (or maybe write A is never written, but in any case, write B is the
> final write). Otherwise, if the disk reorders write A and write B, then there
> would be a mismatch between the hash that dm-integrity stores and the actual
> write on disk.
> 
> Is this the assumption dm-integrity is making?
> And if so, how does it square with the hardware reordering I/O requests?
> 
> Thanks,
> David

Hi

There is a red-black tree of all in-progress I/O (see ic->in_progress). 
When we start an I/O, we add it to the tree with "add_new_range", and when 
we end an I/O, we delete it from the tree with "remove_range_unlocked".

The tree makes sure that there are no overlapping I/Os in progress.
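
As a rough illustration of this overlap check (not the actual dm-integrity 
code), here is a minimal sketch in C. It assumes a small flat table instead 
of a red-black tree, and the names io_range, in_progress_set, 
ranges_overlap, try_add_range and remove_range are hypothetical stand-ins, 
not the kernel's functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_IN_PROGRESS 16

/* Hypothetical stand-in for one in-progress I/O: sectors [start, start + len). */
struct io_range {
	uint64_t start;
	uint64_t len;
	bool in_use;
};

/* A flat table replaces the red-black tree in this sketch. */
struct in_progress_set {
	struct io_range slots[MAX_IN_PROGRESS];
};

/* Two half-open sector ranges overlap if each one starts before the other ends. */
static bool ranges_overlap(const struct io_range *r, uint64_t start, uint64_t len)
{
	return r->start < start + len && start < r->start + r->len;
}

/*
 * Try to register a new I/O. Returns the slot on success, or NULL if the
 * range overlaps an in-progress I/O (or the table is full).
 */
static struct io_range *try_add_range(struct in_progress_set *set,
				      uint64_t start, uint64_t len)
{
	struct io_range *free_slot = NULL;

	for (size_t i = 0; i < MAX_IN_PROGRESS; i++) {
		struct io_range *r = &set->slots[i];

		if (!r->in_use) {
			if (!free_slot)
				free_slot = r;
			continue;
		}
		if (ranges_overlap(r, start, len))
			return NULL;
	}
	if (!free_slot)
		return NULL;

	free_slot->start = start;
	free_slot->len = len;
	free_slot->in_use = true;
	return free_slot;
}

/* Called when the I/O ends; the range stops blocking overlapping I/Os. */
static void remove_range(struct io_range *r)
{
	assert(r && r->in_use);
	r->in_use = false;
}
```

In this sketch, a caller whose range is rejected would hold the I/O back 
and retry after the conflicting range has been removed, which gives the 
"write B is not sent until write A returns" behavior described above.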

Regarding disk reordering: the disk may reorder I/Os, but if there is no 
crash, the disk must appear to be coherent. Therefore, if we write A, get 
A's endio and then write B, a later read of this location must return B; 
it can't return A.

The reordering only becomes a problem if the system crashes (in that 
case, it is unknown whether the disk will return A or B after a crash). I 
think that the SCSI standard even allows reading garbage after a crash.
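
The difference between the two cases can be shown with a toy model in C. 
This is only a sketch under stated assumptions: it models a single sector 
on a disk with a volatile write cache, and the names toy_disk, toy_write, 
toy_read and toy_crash are hypothetical. It does not describe how any real 
drive is implemented, and it does not model the "garbage" case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 4

/* Toy model of one sector on a disk with a volatile write cache. */
struct toy_disk {
	uint8_t media;                /* value on stable storage */
	uint8_t pending[MAX_PENDING]; /* acknowledged writes not yet on media */
	size_t n_pending;
};

/* The write is acknowledged (endio) as soon as it sits in the cache. */
static bool toy_write(struct toy_disk *d, uint8_t value)
{
	if (d->n_pending == MAX_PENDING)
		return false;
	d->pending[d->n_pending++] = value;
	return true;
}

/*
 * Without a crash the disk is coherent: a read returns the most recently
 * acknowledged write, no matter what has physically reached the media.
 */
static uint8_t toy_read(const struct toy_disk *d)
{
	return d->n_pending ? d->pending[d->n_pending - 1] : d->media;
}

/*
 * A crash loses the cache. 'survivor' picks which pending write happened
 * to reach the media last (-1: none did). The caller chooses it here
 * because in reality the outcome is unknown.
 */
static void toy_crash(struct toy_disk *d, int survivor)
{
	assert(survivor < (int)d->n_pending);
	if (survivor >= 0)
		d->media = d->pending[survivor];
	d->n_pending = 0;
}
```

Writing A and then B and reading back always yields B in this model. After 
toy_crash(), the sector may hold the old value, A, or B, depending on the 
survivor argument, so a stored hash of B is only guaranteed to match as 
long as there is no crash.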

Mikulas




