> > But with RAID (since 2.6.13), it can produce corruption because when the
> > buffer is modified while being written, different versions of data can be
> > written to devices in the RAID array. For example:
> >
> > 1. pdflush turns off a dirty bit on Ext2 bitmap buffer and starts writing
> > the buffer to RAID-1
> > 2. the kernel allocates some blocks in that Ext2 bitmap. One of RAID-1
> > devices writes new data, the other one gets old data.
> > 3. The kernel turns on the buffer dirty bit, so this buffer is scheduled for
> > next write.
> > 4. RAID-1 subsystem sees that both writes finished, it thinks that this
> > region is in-sync, turns off its dirty bit in its region bitmap and writes
> > the bitmap to disk.
> >
> Would this help:
> RAID-1 sees that both writes finished. It checks the dirty bits on all
> relevant buffers/pages. If none got re-dirtied, then it is ok to
> turn off the dirty bit in the region bitmap and write that. Otherwise, it is
> not!
>
> Or is such a check too time-consuming?

That is impossible. The page cache can answer questions like "where is page
0x1234 from inode 0x5678 located on disk?" But it can't answer the reverse
question: "which inode and which page is using disk block 0x12345678?"

Furthermore, with device mapper you can stack several mapping tables on top
of each other, and again, device mapper can't solve the reverse problem: it
can't tell you which filesystem is using block X.

Mikulas

> Helge Hafting

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
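
The forward-versus-reverse lookup point in the reply can be illustrated with a small toy model. This is only a sketch: `forward_map`, `dm_linear`, `to_physical`, `forward_lookup` and `reverse_lookup` are hypothetical names made up for illustration, the offsets are arbitrary, and none of it mirrors real kernel or device-mapper data structures.

```python
# Toy model only: these dicts and functions are hypothetical and do not
# mirror real kernel data structures.

# Forward map kept by a filesystem: (inode, page index) -> block on its device.
forward_map = {
    (0x5678, 0x1234): 1000,
    (0x5678, 0x1235): 1001,
    (0x9ABC, 0x0000): 2000,
}


def dm_linear(offset):
    """Return a mapping function that shifts block numbers by a fixed offset."""
    return lambda block: block + offset


# Two stacked device-mapper-like layers between the filesystem and the disk.
stack = [dm_linear(4096), dm_linear(100000)]


def to_physical(block):
    """Push a filesystem block number down through every layer of the stack."""
    for layer in stack:
        block = layer(block)
    return block


def forward_lookup(inode, page):
    """Cheap direction: one dictionary lookup, then a walk down the stack."""
    return to_physical(forward_map[(inode, page)])


def reverse_lookup(physical_block):
    """Expensive direction: there is no index keyed by block, so every
    known page has to be scanned and pushed through the stack."""
    return [key for key, block in forward_map.items()
            if to_physical(block) == physical_block]


physical = forward_lookup(0x5678, 0x1234)
print(physical)
print(reverse_lookup(physical))
```

In this toy the reverse scan works only because the filesystem's map and the whole stack live in one process. The layer at the bottom of a real stack sees just a block number and has neither the map nor the upper tables to scan.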