I haven't been able to deep-dive into the details, but the structure of this still makes me very unhappy. Most of that is related to the software fallback again. Please split the fallback into a separate file, and also into a separate data structure. There is absolutely no need to have the overhead of the software-only fields for the hardware case.

On the flip side, I think all the core block layer code added should go into a single file instead of being split into three with some odd layering.

Also, what I don't understand is why this manages keyslots on a per-bio basis. Wouldn't it make a whole lot more sense to manage them on a struct request basis, once most of the merging has been performed?
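
To illustrate the data structure split asked for above, here is a rough userspace sketch. All names in it (ksm_sketch, ksm_fallback_sketch, and the fields) are hypothetical and not taken from the patch series; it only shows the shape of the idea: the common structure carries a pointer that stays NULL for hardware, and the software-only fields live in their own allocation.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical: software-only state, allocated only when the fallback is used. */
struct ksm_fallback_sketch {
	void *bounce_pool;           /* stand-in for a bounce page pool */
	unsigned long pages_bounced; /* stand-in for fallback bookkeeping */
};

/* Hypothetical: state that every keyslot manager needs, hardware or not. */
struct ksm_sketch {
	unsigned int num_slots;
	struct ksm_fallback_sketch *fallback; /* NULL in the hardware case */
};

/* Allocate the common part; add the fallback part only on request. */
struct ksm_sketch *ksm_sketch_create(unsigned int num_slots, bool use_fallback)
{
	struct ksm_sketch *ksm = calloc(1, sizeof(*ksm));

	if (!ksm)
		return NULL;
	ksm->num_slots = num_slots;

	if (use_fallback) {
		ksm->fallback = calloc(1, sizeof(*ksm->fallback));
		if (!ksm->fallback) {
			free(ksm);
			return NULL;
		}
	}
	return ksm;
}

/* Free both parts; free(NULL) is a no-op, so the hardware case needs no check. */
void ksm_sketch_destroy(struct ksm_sketch *ksm)
{
	if (!ksm)
		return;
	free(ksm->fallback);
	free(ksm);
}
```

With a layout along these lines the hardware case pays for one pointer rather than for every fallback field, and the fallback code can sit in its own file that only touches ksm->fallback.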