On 7/13/22 13:56, Christoph Hellwig wrote:
>
> On Fri, Jul 08, 2022 at 12:45:33PM +0200, Sergei Shtepa wrote:
>> 1. Work at the partition or disk level?
>> At the user level, programs operate with block devices.
>> In fact, the "disk" entity makes sense only for the kernel level.
>> When the user chooses which block devices to back up and which not,
>> he operates with mount points, which are converted into block
>> devices, partitions. Therefore, it is better to handle bio before
>> remapping to disk.
>> If the filtering is performed after remapping, then we will be
>> forced to apply a filter to the entire disk, or complicate the
>> filtering algorithm by calculating which range of sectors bio is
>> addressed to. And if bio is addressed to the partition boundary...
>> Filtering at the block device level seems to me a simpler solution.
>> But this is not the biggest problem.
> Note that bi_bdev stays for the partition things came from. So we
> could still do filtering after blk_partition_remap has been called,
> the filter driver just needs to be careful on how to interpret the
> sector numbers.

Thanks. I'll check it out.

>
>> 2. Can the filter sleep or postpone bio processing to the worker thread?
> I think all of the above is fine, just for normal submit_bio based
> drivers.

Good. But I'm starting to think that for request-based block devices,
filtering should be different. I need to check it out.

>> The problem is in the implementation of the COW algorithm.
>> If I send a bio to read a chunk (one bio), and then pass a write bio,
>> then with some probability I am reading partially overwritten data.
>> Writing overtakes reading. And flags REQ_SYNC and REQ_PREFLUSH don't help.
>> Maybe it's a disk driver issue, or a hypervisor, or a NAS, or a RAID,
>> or maybe normal behavior. I don't know. Although, maybe I'm not working
>> correctly with flags. I have seen the comments on patch 11/20, but I am
>> not sure that the fixes will solve this problem.
>> But because of this, I have to postpone the write until the read completes.
> In the I/O stack there really isn't any ordering. While a general
> reordering looks a bit odd to me, it absolutely is always possible.
>

Thank you! So this is normal behavior, and blocking the write is necessary.
When designing the module, I mistakenly thought that it would be enough
to submit the bios in the correct order.

>> 2.1 The easiest way to solve the problem is to block the writer's thread
>> with a semaphore. And for bio with a flag REQ_NOWAIT, complete processing
>> with bio_wouldblock_error(). This is the solution currently being used.
> This sounds ok. The other option would be to put the write on hold and
> only queue it up from the read completion (or rather a workqueue kicked
> off from the read completion). But this is basically the same, just
> without blocking the I/O submitter, so we could do the semaphore first
> and optimize later as needed.
>
>> If I am blocked by the q->q_usage_counter counter, then I will not
>> be able to execute COW in the context of the current thread due to deadlocks.
>> I will have to use a scheme with an additional worker thread.
>> Bio filtering will become much more complicated.
> q_usage_counter itself doesn't really block you from doing anything.
> You can still sleep inside of it, and most drivers do that.
>

Ok. I will try to move the handling point lower, under the protection of
the q_usage_counter. Maybe I'm mistaken about deadlocks.

Thank you so much for the review and for the explanatory answers!
I got a lot of useful recommendations.
I have a lot of work to do to improve the patch.
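
To make the ordering from 2.1 concrete, here is a minimal userspace sketch
in C of the "put the write on hold until the COW read completes" scheme.
It is only a model under stated assumptions: struct chunk, the helpers
chunk_submit_write() and chunk_read_done(), and the integer write ids are
hypothetical names that exist neither in the patch nor in the kernel, and
returning -EAGAIN merely stands in for completing a REQ_NOWAIT bio with
bio_wouldblock_error().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code: one snapshot chunk and its writes. */
#define MAX_HELD 8
#define MAX_LOG  16

enum chunk_state { CHUNK_IDLE, CHUNK_COPYING, CHUNK_COPIED };
enum { WRITE_PASSED = 0, WRITE_HELD = 1 };

struct chunk {
	enum chunk_state state;
	int held[MAX_HELD];	/* ids of writes waiting for the COW read */
	size_t nr_held;
	int passed[MAX_LOG];	/* ids of writes passed down, in order */
	size_t nr_passed;
};

/* Model of passing a write bio on to the original device. */
static int pass_write(struct chunk *c, int id)
{
	if (c->nr_passed == MAX_LOG)
		return -ENOSPC;
	c->passed[c->nr_passed++] = id;
	return WRITE_PASSED;
}

/*
 * A write arrives for the chunk. Once the original data has been copied,
 * the write passes straight through. Otherwise the COW read is started
 * (first write only) and the write is held, or refused with -EAGAIN if
 * the submitter asked not to wait.
 */
static int chunk_submit_write(struct chunk *c, int id, bool nowait)
{
	if (c->state == CHUNK_COPIED)
		return pass_write(c, id);

	if (c->state == CHUNK_IDLE)
		c->state = CHUNK_COPYING;	/* models submitting the read bio */

	if (nowait)
		return -EAGAIN;
	if (c->nr_held == MAX_HELD)
		return -ENOSPC;
	c->held[c->nr_held++] = id;
	return WRITE_HELD;
}

/*
 * Completion of the COW read: only now may the held writes be passed
 * down, in submission order. Returns the number of writes released.
 */
static int chunk_read_done(struct chunk *c)
{
	size_t i, n = c->nr_held;

	assert(c->state == CHUNK_COPYING);
	c->state = CHUNK_COPIED;
	for (i = 0; i < n; i++) {
		int ret = pass_write(c, c->held[i]);

		/* Nothing passes before the copy, so the log cannot be full. */
		assert(ret == WRITE_PASSED);
		(void)ret;
	}
	c->nr_held = 0;
	return (int)n;
}
```

Starting from a zeroed struct chunk, a first chunk_submit_write(&c, 1, false)
is held, a nowait write is refused with -EAGAIN, and chunk_read_done(&c)
then passes write 1 down. In the kernel the release step would run from a
workqueue kicked off by the read completion rather than inline.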