Re: [PATCH 01/20] block, blk_filter: enable block device filters

Sergei Shtepa <sergei.shtepa@xxxxxxxxx> · Wed, 13 Jul 2022 15:47:23 +0200

On 7/13/22 13:56, Christoph Hellwig wrote:
> 
> On Fri, Jul 08, 2022 at 12:45:33PM +0200, Sergei Shtepa wrote:
>> 1. Work at the partition or disk level?
>> At the user level, programs operate on block devices.
>> In fact, the "disk" entity makes sense only at the kernel level.
>> When the user chooses which block devices to back up and which not,
>> they operate on mount points, which are translated into block
>> devices, i.e. partitions. Therefore, it is better to handle the bio
>> before it is remapped to the disk.
>> If the filtering is performed after remapping, then we will be
>> forced either to apply a filter to the entire disk, or to complicate the
>> filtering algorithm by calculating which range of sectors the bio is
>> addressed to. And if the bio straddles a partition boundary...
>> Filtering at the block device level seems to me a simpler solution.
>> But this is not the biggest problem.
> Note that bi_bdev still points to the partition the bio came from.  So we
> could still do filtering after blk_partition_remap has been called,
> the filter driver just needs to be careful on how to interpret the
> sector numbers.

Thanks. I'll check it out.

> 
>> 2. Can the filter sleep or postpone bio processing to a worker thread?
> I think all of the above is fine, at least for normal submit_bio based
> drivers.

Good. But I'm starting to think that for request-based block devices,
filtering should be different. I need to check it out.

>> The problem is in the implementation of the COW algorithm.
>> If I send a bio to read a chunk (one bio) and then pass on a write bio,
>> then with some probability I end up reading partially overwritten data:
>> the write overtakes the read. The REQ_SYNC and REQ_PREFLUSH flags don't help.
>> Maybe it's a disk driver issue, or a hypervisor, or a NAS, or a RAID,
>> or maybe it is normal behavior. I don't know. Then again, maybe I'm not
>> using the flags correctly. I have seen the comments on patch 11/20, but I am
>> not sure that the fixes will solve this problem.
>> Because of this, I have to postpone the write until the read completes.
> In the I/O stack there really isn't any ordering.  While a general
> reordering looks a bit odd to me, it is absolutely always possible.
> 

Thank you!
So this is normal behavior, and holding back the write is necessary.
When designing the module, I mistakenly thought that it would be enough
to submit the bios in the correct order.
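Since completion order is not guaranteed, the rule has to be enforced per chunk: a write may proceed only after the COW read of that chunk has completed, not merely after it was submitted. A minimal sketch of that rule as a state check, assuming a hypothetical per-chunk state field and ignoring locking; the "nowait" branch stands for a REQ_NOWAIT bio, which would be completed with bio_wouldblock_error() instead of sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-chunk COW state. */
enum chunk_state {
	CHUNK_NOT_COPIED,	/* original data not yet read for the snapshot */
	CHUNK_READING,		/* COW read submitted, not yet completed */
	CHUNK_COPIED,		/* COW read completed, chunk may be overwritten */
};

enum write_action {
	WRITE_PASS,			/* submit the write immediately */
	WRITE_START_COW_AND_WAIT,	/* submit the COW read, then wait */
	WRITE_WAIT,			/* a COW read is in flight, wait for it */
	WRITE_WOULDBLOCK,		/* caller must not sleep */
};

/*
 * Decide what to do with a write to a chunk. Submitting the read before
 * the write is not enough, because the I/O stack may reorder them; the
 * write has to wait for the read's completion.
 */
static enum write_action cow_write_action(enum chunk_state *state, bool nowait)
{
	switch (*state) {
	case CHUNK_COPIED:
		return WRITE_PASS;
	case CHUNK_READING:
		return nowait ? WRITE_WOULDBLOCK : WRITE_WAIT;
	case CHUNK_NOT_COPIED:
	default:
		if (nowait)
			return WRITE_WOULDBLOCK;
		*state = CHUNK_READING;
		return WRITE_START_COW_AND_WAIT;
	}
}

/* Called from the COW read's completion: waiting writers may proceed. */
static void cow_read_complete(enum chunk_state *state)
{
	assert(*state == CHUNK_READING);
	*state = CHUNK_COPIED;
}
```

In this sketch a nowait write to an uncopied chunk does not start the COW read; a real driver might prefer to start it anyway so that a retried write finds the chunk already copied.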

>> 2.1 The easiest way to solve the problem is to block the writer's thread
>> with a semaphore, and to complete any bio carrying the REQ_NOWAIT flag
>> with bio_wouldblock_error(). This is the solution currently being used.
> This sounds ok.  The other option would be to put the write on hold and
> only queue it up from the read completion (or rather a workqueue kicked
> off from the read completion).  But this is basically the same, just
> without blocking the I/O submitter, so we could do the semaphore first
> and optimize later as needed.
> 
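The alternative of holding the write and releasing it from the read completion, so that the submitter never blocks, might look roughly like this. It is a sketch under stated assumptions: struct held_write and struct cow_chunk are hypothetical (in the kernel the held item would wrap a struct bio), submitting the COW read itself is not shown, the list would need a lock, and the release would run from a workqueue kicked off by the read's completion rather than being called directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical held write; in the kernel this would wrap a struct bio. */
struct held_write {
	int id;
	struct held_write *next;
};

/* Hypothetical per-chunk state with a FIFO of writes waiting for COW. */
struct cow_chunk {
	bool copied;			/* COW read has completed */
	struct held_write *head;	/* oldest held write */
	struct held_write *tail;	/* newest held write */
};

/* Returns true if the write was held; false means the chunk is already
 * copied and the caller may submit the write right away. */
static bool cow_hold_write(struct cow_chunk *c, struct held_write *w)
{
	if (c->copied)
		return false;
	w->next = NULL;
	if (c->tail)
		c->tail->next = w;
	else
		c->head = w;
	c->tail = w;
	return true;
}

/* Called after the COW read completes: mark the chunk copied and hand
 * back the held writes, oldest first, for the caller to submit. */
static struct held_write *cow_release_writes(struct cow_chunk *c)
{
	struct held_write *list = c->head;

	assert(!c->copied);
	c->copied = true;
	c->head = c->tail = NULL;
	return list;
}
```

Because nothing here sleeps, a REQ_NOWAIT write can simply be held like any other; the cost is that held writes must later be submitted from the worker instead of the original context.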
>> If I am blocked by q->q_usage_counter, then I will not
>> be able to execute COW in the context of the current thread due to deadlocks.
>> I will have to use a scheme with an additional worker thread.
>> Bio filtering will become much more complicated.
> q_usage_counter itself doesn't really block you from doing anything.
> You can still sleep inside of it, and most drivers do that.
> 
Ok. I will try to move the filter hook lower, under the protection of
q_usage_counter. Maybe I'm mistaken about the deadlocks.

Thank you so much for the review and for the explanatory answers!
I got a lot of useful recommendations.
I have a lot of work to do to improve the patch.