On Tue, Dec 05, 2023 at 03:25:16PM -0600, Tom Zanussi wrote: > Hi, this is v12 of the IAA crypto driver, incorporating feedback from > v11. > > v12 changes: > > - Moved __free_iaa_wq out of spinlock section. > > > v11 changes: > > - Changed the wq size default from 16 to wq->max_wq_Size in > 'dmaengine: idxd: Add support for device/wq defaults and updated > the Documentation to match. > > - Added new Reviewed-by and Acked-by tags. > > > v10 changes: > > - Fixed the warnings pointed out in v9 when using 'make W=1 C=1'. > > - Added a new patch, 'dmaengine: idxd: Add support for device/wq > defaults' which auto-configures a default set of devices and wqs > so users don't have to do any IAA configuration/setup if they > don't want to. Also updated the Documentation accordingly. > > - Removed the 'dmaengine: idxd: add wq driver name support for > accel-config user tool' patch since it's now upstream. > > - Added a section (in the Appendix) to > Documentation/driver-api/crypto/iaa/iaa-crypto.rst describing the > sysfs interface. > > - While converting my testing scripts to accel-config, realized > accel-config won't allow disabling the wqs if client_count is > 0, > so to address it moved the idxd_wq_get() into iaa_wq_get(). > > - Fixed a bug in the pr_debug() in wq_table_next_wq(), noticed while > testing. > > > v9 changes: > > - Renamed drv_enable/disable_wq() to idxd_drv_enable/disable_wq() > and exported it, changing all existing callers as well as the > iaa_crypto driver. > > - While testing, ran into a use-after-free bug in the irq support > flagged by KASAN so fixed that up in iaa_compress() (added missing > disable_async check). > > - Also, while fixing the use-after-free bug, rearranged the out: > part of iaa_desc_complete() to make it cleaner. > > - Also for the verify cases, reversed the dma mapping by adding and > calling a new iaa_remap_for_verify() function, since verify > basically does a decompress after reversing the src and dst > buffers. > > - Added new Acked-by and Reviewed-by tags. > > > v8 changes: > > - Rebased to current cryptodev tree. > > - Added generic-deflate software fallback for decompression > in cases where there's a hardware failure that would > otherwise prevent the data being recovered. > > - Changed the driver_name code to use strim(). > > - Changed the null-destination cases to use sgl_alloc_order() rather > than sgl_alloc(). > > - Added more Reviewed-by tags. > > > v7 changes: > > - Rebased to current cryptodev tree. > > - Removed 'canned' compression mode (deflate-iaa-canned) and fixed > up dependencies in other patches and Documentation. > > - Removed op_block checks. > > - Removed a stray debugging #ifdef. > > - Changed sysfs-driver-dma-idxd driver_name version to 6.6.0. > > > v6 changes: > > - Rebased to current cryptodev tree. > > - Changed code to create/register separate algorithms for each > compression mode - one for 'fixed' (deflate-iaa) and one for > 'canned' (deflate-iaa-canned). > > - Got rid of 'compression_mode' attribute and all the compression > mode code that deals with 'active' compression modes, since > there's no longer a single active compression mode. > > - Use crypto_ctx to allow common compress/decompress code to > distinguish between the different compression modes. Also use it > to capture settings such as verify_compress, use_irq, etc. In > addition to being cleaner, this will allow for easier performance > comparisons using different modes/settings. > > - Update Documentation and comments to reflect the changes. > > - Fixed a bug found by Rex Zhang in decompress_header() which > unmapped src2 rather than src as it should have. Thanks, Rex! > > > v5 changes: > > - Rebased to current cryptodev tree. > > - Changed sysfs-driver-dma-idxd driver_name version to 6.5.0. > > - Renamed wq private accessor functions to idxd_wq_set/get_private(). > > v4 changes: > > - Added and used DRIVER_NAME_SIZE for wq driver_name. > > - Changed all spaces to tabs in CRYPTO_DEV_IAA_CRYPTO_STATS config > menu. > > - Removed the private_data void * from wq and replaced with > wq_confdev() instead, as suggested by Dave Jiang. > > - Added more Reviewed-by tags. > > v3 changes: > > - Reworked the code to only allow the registered crypto alg to be > unregistered by removing the module. Also added an iaa_wq_get() > and iaa_wq_put() to take/give up a reference to the work queue > while there are compresses/decompresses in flight. This is > synchronized with the wq remove function, so that the > iaa_wq/iaa_devices can't go away beneath active operations. This > was tested by removing/disabling the iaa wqs/devices while > operations were in flight. > > - Simplified the rebalance code and removed cpu_to_iaa() function > since it was overly complicated and wasn't actually working as > advertised. > > - As a result of reworking the above code, fixed several bugs such > as possibly unregistering an unregistered crypto alg, a memory > leak where iaa_wqs weren't being freed, and making sure the > compression schemes were registered before registering the driver. > > - Added set_/idxd_wq_private() accessors for wq private data. > > - Added missing XPORT_SYMBOL_NS_GPL() to [PATCH 04/15] dmaengine: > idxd: Export descriptor management functions > > - Added Dave's Reviewed-by: tags from v2. > > - Updated Documentation and commit messages to reflect the changes > above. > > - Rebased to to cryptodev tree, since that has the earlier changes > that moved the intel drivers to crypto/intel. > > v2 changes: > > - Removed legacy interface and all related code; merged async > interface into main deflate patch. > > - Added support for the null destination case. Thanks to Giovanni > Cabiddu for making me aware of this as well as the selftests for > it. > > - Had to do some rearrangement of the code in order to pass all the > selftests. Also added a testcase for 'canned'. > > - Moved the iaa crypto driver to drivers/crypto/intel, and moved all > the other intel drivers there as well (which will be posted as a > separate series immediately following this one). > > - Added an iaa crypto section to MAINTAINERS. > > - Updated the documenation and commit messages to reflect the removal > of the legacy interface. > > - Changed kernel version from 6.3.0 to 6.4.0 in patch 01/15 (wq > driver name support) > > v1: > > This series adds Linux crypto algorithm support for Intel® In-memory > Analytics Accelerator (Intel IAA) [1] hardware compression and > decompression, which is available on Sapphire Rapids systems. > > The IAA crypto support is implemented as an IDXD sub-driver. The IDXD > driver already present in the kernel provides discovery and management > of the IAA devices on a system, as well as all the functionality > needed to manage, submit, and wait for completion of work executed on > them. The first 7 patches (patches starting with dmaengine:) add > small bits of underlying IDXD plumbing needed to allow external > sub-drivers to take advantage of this support and claim ownership of > specific IAA devices and workqueues. > > The remaining patches add the main support for this feature via the > crypto API, making it transparently accessible to kernel features that > can make use of it such as zswap and zram (patches starting with > crypto – iaa:). > > These include both sync/async support for the deflate algorithm > implemented by the IAA hardware, as well as an additional option for > driver statistics and Documentation. > > Patch 8 ('[PATCH 08/15] crypto: iaa - Add IAA Compression Accelerator > Documentation') describes the IAA crypto driver in detail; the > following is just a high-level synopsis meant to aid the following > discussion. > > The IAA hardware is fairly complex and generally requires a > knowledgeable administrator with sufficiently detailed understanding > of the hardware to set it up before it can be used. As mentioned in > the Documentation, this typically requires using a special tool called > accel-config to enumerate and configure IAA workqueues, engines, etc, > although this can also be done using only sysfs files. > > The operation of the driver mirrors this requirement and only allows > the hardware to be accessed via the crypto layer once the hardware has > been configured and bound to the the IAA crypto driver. As an IDXD > sub-driver, the IAA crypto driver essentially takes ownership of the > hardware until it is given up explicitly by the administrator. This > occurs automatically when the administrator enables the first IAA > workqueue or disables the last one; the iaa_crypto (sync and async) > algorithms are registered when the first workqueue is enabled, and > deregistered when the last one is disabled. > > The normal sequence of operations would normally be: > > < configure the hardware using accel-config or sysfs > > > < configure the iaa crypto driver (see below) > > > < configure the subsystem e.g. zswap/zram to use the iaa_crypto algo > > > < run the workload > > > There are a small number of iaa_crypto driver attributes that the > administrator can configure, and which also need to be configured > before the algorithm is enabled: > > compression_mode: > > The IAA crypto driver supports an extensible interface supporting > any number of different compression modes that can be tailored to > specific types of workloads. These are implemented as tables and > given arbitrary names describing their intent. > > There are currently only 2 compression modes, “canned” and “fixed”. > In order to set a compression mode, echo the mode’s name to the > compression_mode driver attribute: > > echo "canned" > /sys/bus/dsa/drivers/crypto/compression_mode > > There are a few other available iaa_crypto driver attributes (see > Documentation for details) but the main one we want to consider in > detail for now is the ‘sync_mode’ attribute. > > The ‘sync_mode’ attribute has 3 possible settings: ‘sync’, ‘async’, > and ‘async_irq’. > > The context for these different modes is that although the iaa_crypto > driver implements the asynchronous crypto interface, the async > interface is currently only used in a synchronous way by facilities > like zswap that make use of it. > > This is fine for software compress/decompress algorithms, since > there’s no real benefit in being able to use a truly asynchronous > interface with them. This isn’t the case, though, for hardware > compress/decompress engines such as IAA, where truly asynchronous > behavior is beneficial if not completely necessary to make optimal use > of the hardware. > > The IAA crypto driver ‘sync_mode’ support should allow facilities such > as zswap to ‘support async (de)compression in some way [2]’ once > they are modified to actually make use of it. > > When the ‘async_irq’ sync_mode is specified, the driver sets the bits > in the IAA work descriptor to generate an irq when the work completes. > So for every compression or decompression, the IAA acomp_alg > implementations called by crypto_acomp_compress/decompress() simply > set up the descriptor, turn on the 'request irq' bit and return > immediately with -EINPROGRESS. When the work completes, the irq fires > and the IDXD driver’s irq thread for that irq invokes the callback the > iaa_crypto module registered with IDXD. When the irq thread gets > scheduled, it wakes up the caller, which could be for instance zswap, > waiting synchronously via crypto_wait_req(). > > Using the simple madvise test program in '[PATCH 08/15] crypto: iaa - > Add IAA Compression Accelerator Documentation' along with a set of > pages from the spec17 benchmark and tracepoint instrumentation > measuring the time taken between the start and end of each compress > and decompress, this case, async_irq, takes on average 6,847 ns for > compression and 5,840 ns for decompression. (See Table 1 below for a > summary of all the tests.) > > When sync_mode is set to ‘sync’, the interrupt bit is not set and the > work descriptor is submitted in the same way it was for the previous > case. In this case the call doesn’t return but rather loops around > waiting in the iaa_crypto driver’s check_completion() function which > continually checks the descriptor’s completion bit until it finds it > set to ‘completed’. It then returns to the caller, again for example > zswap waiting in crypto_wait_req(). From the standpoint of zswap, > this case is exactly the same as the previous case, the difference > seen only in the crypto layer and the iaa_crypto driver internally; > from its standpoint they’re both synchronous calls. There is however > a large performance difference: an average of 3,177 ns for compress > and 2,235 ns for decompress. > > The final sync_mode is ‘async’. In this case also the interrupt bit > is not set and the work descriptor is submitted, returning immediately > to the caller with -EINPROGRESS. Because there’s no interrupt set to > notify anyone when the work completes, the caller needs to somehow > check for work completion. Because core code like zswap can’t do this > directly by for example calling iaa_crypto’s check_completion(), there > would need to be some changes made to code like zswap and the crypto > layer in order to take advantage of this mode. As such, there are no > numbers to share for this mode. > > Finally, just a quick discussion of the remaining numbers in Table 1, > those comparing the iaa_crypto sync and async irq cases to software > deflate. Software deflate took average of 108,978 ns for compress and > 14,485 ns for decompress. > > As can be seen from Table 1, the numbers using the iaa_crypto driver > for deflate as compared to software are so much better that merging it > would seem to make sense on its own merits. The 'async' sync_mode > described above, however, offers the possibility of even greater gains > to be had against higher-performing algorithms such as lzo, via > parallelization, once the calling facilities are modified to take > advantage of it. Follow-up patchsets to this one will demonstrate > concretely how that might be accomplished. > > Thanks, > > Tom > > > Table 1. Zswap latency and compression numbers (in ns): > > Algorithm compress decompress > ---------------------------------------------------------- > iaa sync 3,177 2,235 > iaa async irq 6,847 5,840 > software deflate 108,978 14,485 > > [1] https://cdrdv2.intel.com/v1/dl/getContent/721858 > > [2] https://lore.kernel.org/lkml/20201107065332.26992-1-song.bao.hua@xxxxxxxxxxxxx/ > > > Dave Jiang (1): > dmaengine: idxd: add external module driver support for dsa_bus_type > > Tom Zanussi (13): > dmaengine: idxd: Rename drv_enable/disable_wq to > idxd_drv_enable/disable_wq, and export > dmaengine: idxd: Export descriptor management functions > dmaengine: idxd: Export wq resource management functions > dmaengine: idxd: Add wq private data accessors > dmaengine: idxd: add callback support for iaa crypto > crypto: iaa - Add IAA Compression Accelerator Documentation > crypto: iaa - Add Intel IAA Compression Accelerator crypto driver core > crypto: iaa - Add per-cpu workqueue table with rebalancing > crypto: iaa - Add compression mode management along with fixed mode > crypto: iaa - Add support for deflate-iaa compression algorithm > crypto: iaa - Add irq support for the crypto async interface > crypto: iaa - Add IAA Compression Accelerator stats > dmaengine: idxd: Add support for device/wq defaults > > .../driver-api/crypto/iaa/iaa-crypto.rst | 824 +++++++ > Documentation/driver-api/crypto/iaa/index.rst | 20 + > Documentation/driver-api/crypto/index.rst | 20 + > Documentation/driver-api/index.rst | 1 + > MAINTAINERS | 7 + > crypto/testmgr.c | 10 + > drivers/crypto/intel/Kconfig | 1 + > drivers/crypto/intel/Makefile | 1 + > drivers/crypto/intel/iaa/Kconfig | 19 + > drivers/crypto/intel/iaa/Makefile | 12 + > drivers/crypto/intel/iaa/iaa_crypto.h | 173 ++ > .../crypto/intel/iaa/iaa_crypto_comp_fixed.c | 92 + > drivers/crypto/intel/iaa/iaa_crypto_main.c | 2182 +++++++++++++++++ > drivers/crypto/intel/iaa/iaa_crypto_stats.c | 313 +++ > drivers/crypto/intel/iaa/iaa_crypto_stats.h | 53 + > drivers/dma/idxd/Makefile | 2 +- > drivers/dma/idxd/bus.c | 6 + > drivers/dma/idxd/cdev.c | 6 +- > drivers/dma/idxd/defaults.c | 53 + > drivers/dma/idxd/device.c | 13 +- > drivers/dma/idxd/dma.c | 9 +- > drivers/dma/idxd/idxd.h | 83 +- > drivers/dma/idxd/init.c | 7 + > drivers/dma/idxd/irq.c | 12 +- > drivers/dma/idxd/submit.c | 9 +- > 25 files changed, 3897 insertions(+), 31 deletions(-) > create mode 100644 Documentation/driver-api/crypto/iaa/iaa-crypto.rst > create mode 100644 Documentation/driver-api/crypto/iaa/index.rst > create mode 100644 Documentation/driver-api/crypto/index.rst > create mode 100644 drivers/crypto/intel/iaa/Kconfig > create mode 100644 drivers/crypto/intel/iaa/Makefile > create mode 100644 drivers/crypto/intel/iaa/iaa_crypto.h > create mode 100644 drivers/crypto/intel/iaa/iaa_crypto_comp_fixed.c > create mode 100644 drivers/crypto/intel/iaa/iaa_crypto_main.c > create mode 100644 drivers/crypto/intel/iaa/iaa_crypto_stats.c > create mode 100644 drivers/crypto/intel/iaa/iaa_crypto_stats.h > create mode 100644 drivers/dma/idxd/defaults.c > > -- > 2.34.1 All applied. Thanks. -- Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt