On Mon, 11 Jan 2021 at 17:52, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote: > > CRC-T10DIF is a very poor match for the crypto API: > - every user in the kernel calls it via a library wrapper around the > shash API, so all callers share a single instance of the transform > - each architecture provides at most a single optimized implementation, > based on SIMD instructions for carryless multiplication > > In other words, none of the flexibility it provides is put to good use, > and so it is better to get rid of this complexity, and expose the optimized > implementations via static calls instead. This removes a substantial chunk > of code, and also gets rid of any indirect calls on architectures that > obsess about those (x86) > > If this approach is deemed suitable, there are other places where we might > consider adopting it: CRC32 and CRC32(C). > > Patch #1 does some preparatory refactoring and removes the library wrapper > around the shash transform. > > Patch #2 introduces the actual static calls, along with the registration > routines to update the crc-t10dif implementation at runtime. > > Patch #3 updates the generic CRC-T10DIF shash driver so it distinguishes > between the optimized library code and the generic library code. > > Patches #4 to #7 update the various arch implementations to switch over to > the new method. > > Special request to Peter to take a look at patch #2, and in particular, > whether synchronize_rcu_tasks() is sufficient to ensure that a module > providing the target of a static call can be unloaded safely. > It seems I may have managed to confuse myself slightly here: without an upper bound on the size of the input of the crc_t10dif() routine, I suppose we can never assume that all its callers have finished. Insights welcome ...