On Wed, Mar 30, 2022 at 08:28:28AM -0400, Dennis Zhou wrote:
> I think cloning is a special case that I might have gotten wrong. If
> there is a bio_set_dev() call after each clone(), then the
> bio_clone_blkg_association() is excess work. We'd need to audit how
> bio_alloc_clone() is being used to be safe. Alternatively, we could opt
> for a bio_alloc_clone_noblkg(), but that's a little bit uglier.

As of Linux 5.18, the cloning interfaces have changed and take the block
device that the clone is intended to be used for, and bio_set_dev() is
mostly (there are a few more spots to be cleaned up in dm/md/bcache/btrfs)
only used for remapping to a new device.

That being said, I've looked at the code in bio_associate_blkg a bit and
have been wondering about parts of how it is implemented as well. Is
recursive throttling really a thing? I.e., can we have cgroup policies on
the upper (e.g. dm) device and then again on the lower (e.g. nvme) device?
I think the code currently supports that, and if we want to keep that I
don't really see much of a way to avoid the lookup, but maybe we can make
it faster.
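To make the lookup question concrete, here is a toy user-space model (plain C, not kernel code) of associating a bio with per-(cgroup, device) state. Every type and function in it (struct blkg, blkg_lookup_model(), bio_associate_model(), run_model(), the single-entry "hint" cache) is hypothetical and chosen only for illustration; it makes no claim about how bio_associate_blkg is actually implemented. The one assumption it encodes is that the state is keyed by both cgroup and device, so a bio remapped from an upper (dm) device to a lower (nvme) device needs a second association.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space model; none of these types or functions are
 * kernel APIs. A "blkg" stands for per-(cgroup, device) state. */

struct queue { int id; };            /* stands in for one block device */

struct blkg {
    int queue_id;                    /* device this blkg belongs to */
    struct blkg *next;
};

struct blkcg {
    struct blkg *list;               /* every blkg this cgroup owns */
    struct blkg *hint;               /* single-entry cache of last hit */
    unsigned long slow_lookups;      /* list walks taken (cache misses) */
};

struct bio {
    struct blkcg *css;               /* cgroup the I/O is charged to */
    struct queue *q;                 /* device the bio currently targets */
    struct blkg *blkg;               /* association for (css, q) */
};

/* Try the cached hint first; fall back to walking the list. */
static struct blkg *blkg_lookup_model(struct blkcg *cg, const struct queue *q)
{
    if (cg->hint && cg->hint->queue_id == q->id)
        return cg->hint;
    cg->slow_lookups++;
    for (struct blkg *g = cg->list; g; g = g->next) {
        if (g->queue_id == q->id) {
            cg->hint = g;
            return g;
        }
    }
    return NULL;
}

/* Look up the blkg for (cg, q), creating it on first use. */
static struct blkg *blkg_lookup_create_model(struct blkcg *cg,
                                             const struct queue *q)
{
    struct blkg *g = blkg_lookup_model(cg, q);

    if (!g) {
        g = calloc(1, sizeof(*g));
        if (!g)
            abort();
        g->queue_id = q->id;
        g->next = cg->list;
        cg->list = g;
        cg->hint = g;
    }
    return g;
}

/* Point a bio at a device. Because the blkg is keyed by the device as
 * well as the cgroup, every remap needs a fresh association. */
static void bio_associate_model(struct bio *bio, struct queue *q)
{
    bio->q = q;
    bio->blkg = blkg_lookup_create_model(bio->css, q);
    assert(bio->blkg->queue_id == q->id);
}

/* Submit nbios bios; if stacked, each goes to "dm" first and is then
 * remapped to "nvme". Returns the number of slow (list-walk) lookups. */
static unsigned long run_model(int stacked, int nbios)
{
    struct blkcg cg = { 0 };
    struct queue dm = { 1 }, nvme = { 2 };
    unsigned long slow;

    for (int i = 0; i < nbios; i++) {
        struct bio bio = { .css = &cg };

        if (stacked)
            bio_associate_model(&bio, &dm);
        bio_associate_model(&bio, &nvme);
    }
    slow = cg.slow_lookups;
    while (cg.list) {                /* free every blkg we created */
        struct blkg *next = cg.list->next;
        free(cg.list);
        cg.list = next;
    }
    return slow;
}
```

In this model, run_model(0, 3) takes 1 slow lookup while run_model(1, 3) takes 6: alternating between two devices defeats the single-entry cache on every remap. That is only one way to picture why supporting policies at both levels keeps the lookup on the hot path; whether the real code behaves like this would need checking against the source.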