This adds a new io_uring interface to exchange metadata along with read/write.

The patchset is on top of block for-next [1] and Keith's cleanup patch [2].

Interface:
A new meta_type field is introduced in the SQE, which describes the type of
metadata that is passed. Currently only one type, "PI", is supported. The
metadata is represented using a newly introduced 'struct io_uring_meta_pi'.
The application sets up an SQE128 ring and prepares io_uring_meta_pi in the
second half of the 128-byte SQE.

The application populates the 'struct io_uring_meta_pi' fields as below:

* pi_flags: meta-type specific flags. Three flags are exposed for the
  integrity type, namely IO_INTEGRITY_CHK_GUARD/APPTAG/REFTAG.
* len: length of the metadata buffer
* addr: address of the metadata buffer
* seed: seed value for ref tag remapping
* app_tag: optional application-specific 16-bit value; this goes along with
  the IO_INTEGRITY_CHK_APPTAG flag.
* rsvd: reserved space for the storage tag.

The block path (direct IO), NVMe and SCSI drivers are modified to support this.

Patch 1 is an enhancement patch.
Patch 2 is required to make the bounce buffer copy back work correctly.
Patches 3 to 5 are prep patches.
Patch 6 adds the io_uring support.
Patch 7 gives us a unified interface for user and kernel generated integrity.
Patch 8 adds support in SCSI and patch 9 in NVMe.
Patch 10 adds the support for block direct IO.

Some of the design choices came from this discussion [3].

An example program showing how to use the interface is appended below [4]
(it also tests whether reftag remapping happens correctly).

Tree:
https://github.com/SamsungDS/linux/tree/feat/pi_us_v5

Testing has been done by modifying fio to use this interface:
https://github.com/SamsungDS/fio/tree/priv/feat/pi-test-v6

Changes since v4:
https://lore.kernel.org/linux-block/20241016112912.63542-1-anuj20.g@xxxxxxxxxxx/
- better variable names to describe bounce buffer copy back (hch)
- move definition of flags in the same patch introducing uio_meta (hch)
- move uio_meta definition to include/linux/uio.h (hch)
- bump seed size in uio_meta to 8 bytes (martin)
- move flags definition to include/uapi/linux/fs.h (hch)
- s/meta/metadata in commit description of io-uring (hch)
- rearrange the meta fields in sqe for cleaner layout
- partial submission case is not applicable, as we are only plumbing for the
  async case
- s/META_TYPE_INTEGRITY/META_TYPE_PI (hch, martin)
- remove unlikely branching (hch)
- better formatting, misc cleanups, better commit descriptions, reordering
  commits (hch)

Changes since v3:
https://lore.kernel.org/linux-block/20240823103811.2421-1-anuj20.g@xxxxxxxxxxx/
- add reftag seed support (Martin)
- fix incorrect formatting in uio_meta (hch)
- s/IOCB_HAS_META/IOCB_HAS_METADATA (hch)
- move integrity check flags to block layer header (hch)
- add comments for BIP_CHECK_GUARD/REFTAG/APPTAG flags (hch)
- remove bio_integrity check during completion if IOCB_HAS_METADATA is set (hch)
- use goto label to get rid of duplicate error handling (hch)
- add warn_on if trying to do sync io with iocb_has_metadata flag (hch)
- remove check for disabling reftag remapping (hch)
- remove BIP_INTEGRITY_USER flag (hch)
- add comment for app_tag field introduced in bio_integrity_payload (hch)
- pass request to nvme_set_app_tag function (hch)
- fix indentation at one place in the scsi patch (hch)
- move IOCB_HAS_METADATA to a separate fs patch (hch)

Changes since v2:
https://lore.kernel.org/linux-block/20240626100700.3629-1-anuj20.g@xxxxxxxxxxx/
- io_uring error handling styling (Gabriel)
- add documented helper to get metadata bytes from data iter (hch)
- during clone specify "what flags to clone" rather than "what not to clone" (hch)
- Move uio_meta definition to bio-integrity.h (hch)
- Rename apptag field to app_tag (hch)
- Change datatype of flags field in uio_meta to bitwise (hch)
- Don't introduce BIP_USER_CHK_FOO flags (hch, martin)
- Driver should rely on block layer flags instead of seeing if it is
  user-passthrough (hch)
- update the scsi code for handling user-meta (hch, martin)

Changes since v1:
https://lore.kernel.org/linux-block/20240425183943.6319-1-joshi.k@xxxxxxxxxxx/
- Do not use new opcode for meta, and also add the provision to introduce new
  meta types beyond integrity (Pavel)
- Stuff IOCB_HAS_META check in need_complete_io (Jens)
- Split meta handling in NVMe into a separate handler (Keith)
- Add meta handling for __blkdev_direct_IO too (Keith)
- Don't inherit BIP_COPY_USER flag for cloned bios (Christoph)
- Better commit descriptions (Christoph)

Changes since RFC:
- modify io_uring plumbing based on recent async handling state changes
- fixes/enhancements to correctly handle the split for meta buffer
- add flags to specify guard/reftag/apptag checks
- add support to send apptag

[1] https://git.kernel.dk/cgit/linux-block/log/?h=for-next
[2] https://lore.kernel.org/linux-block/20241016201309.1090320-1-kbusch@xxxxxxxx/
[3] https://lore.kernel.org/linux-block/20240705083205.2111277-1-hch@xxxxxx/
[4]

#define _GNU_SOURCE
#include <endian.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>
#include <linux/io_uring.h>
#include <linux/types.h>
#include "liburing.h"

/*
 * Write data + metadata, read both back and compare. Sends an apptag too.
 * Prerequisite:
 * protected xfer: format the namespace with 4KB + 8B, pi_type = 1
 * For testing reftag remapping on device-mapper, create a
 * device-mapper and run this program. Device mapper creation:
 * # echo 0 80 linear /dev/nvme0n1 0 > /tmp/table
 * # echo 80 160 linear /dev/nvme0n1 200 >> /tmp/table
 * # dmsetup create two /tmp/table
 * # ./a.out /dev/dm-0
 */

#define DATA_LEN 4096
#define META_LEN 8

struct t10_pi_tuple {
	__be16 guard;
	__be16 apptag;
	__be32 reftag;
};

int main(int argc, char *argv[])
{
	struct io_uring ring;
	struct io_uring_sqe *sqe = NULL;
	struct io_uring_cqe *cqe = NULL;
	void *wdb, *rdb;
	char wmb[META_LEN], rmb[META_LEN];
	int fd, ret, blksize;
	struct stat st;
	unsigned long long offset = DATA_LEN * 10;
	struct t10_pi_tuple *pi;
	struct io_uring_meta_pi *md;

	if (argc != 2) {
		fprintf(stderr, "Usage: %s <block-device>\n", argv[0]);
		return 1;
	}

	if (stat(argv[1], &st) == 0) {
		blksize = (int)st.st_blksize;
	} else {
		perror("stat");
		return 1;
	}

	/* posix_memalign() returns the error code instead of setting errno */
	ret = posix_memalign(&wdb, blksize, DATA_LEN);
	if (ret) {
		fprintf(stderr, "posix_memalign failed: %s\n", strerror(ret));
		return 1;
	}
	ret = posix_memalign(&rdb, blksize, DATA_LEN);
	if (ret) {
		fprintf(stderr, "posix_memalign failed: %s\n", strerror(ret));
		return 1;
	}

	/* zeroed data: its T10-DIF CRC is 0, matching pi->guard = 0 below */
	memset(wdb, 0, DATA_LEN);

	fd = open(argv[1], O_RDWR | O_DIRECT);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	ret = io_uring_queue_init(8, &ring, IORING_SETUP_SQE128);
	if (ret) {
		fprintf(stderr, "ring setup failed: %d\n", ret);
		return 1;
	}

	/* write data + meta-buffer to device */
	sqe = io_uring_get_sqe(&ring);
	if (!sqe) {
		fprintf(stderr, "get sqe failed\n");
		return 1;
	}

	io_uring_prep_write(sqe, fd, wdb, DATA_LEN, offset);
	sqe->meta_type = META_TYPE_PI;
	md = (struct io_uring_meta_pi *) sqe->big_sqe;
	md->addr = (__u64)(uintptr_t)wmb;
	md->len = META_LEN;
	/* flags to ask for guard/reftag/apptag checks */
	md->pi_flags = IO_INTEGRITY_CHK_GUARD | IO_INTEGRITY_CHK_REFTAG |
		       IO_INTEGRITY_CHK_APPTAG;
	md->app_tag = 0x1234;
	md->seed = 10;

	/* PI fields are big-endian: reftag = seed, apptag = app_tag */
	pi = (struct t10_pi_tuple *)wmb;
	pi->guard = 0;
	pi->reftag = htobe32(10);
	pi->apptag = htobe16(0x1234);

	ret = io_uring_submit(&ring);
	if (ret <= 0) {
		fprintf(stderr, "sqe submit failed: %d\n", ret);
		return 1;
	}

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (ret < 0) {
		fprintf(stderr, "wait cqe failed: %d\n", ret);
		return 1;
	}
	if (cqe->res < 0) {
		fprintf(stderr, "write cqe failure: %d\n", cqe->res);
		return 1;
	}
	io_uring_cqe_seen(&ring, cqe);

	/* read data + meta-buffer back from device */
	sqe = io_uring_get_sqe(&ring);
	if (!sqe) {
		fprintf(stderr, "get sqe failed\n");
		return 1;
	}

	io_uring_prep_read(sqe, fd, rdb, DATA_LEN, offset);
	sqe->meta_type = META_TYPE_PI;
	md = (struct io_uring_meta_pi *) sqe->big_sqe;
	md->addr = (__u64)(uintptr_t)rmb;
	md->len = META_LEN;
	md->pi_flags = IO_INTEGRITY_CHK_GUARD | IO_INTEGRITY_CHK_REFTAG |
		       IO_INTEGRITY_CHK_APPTAG;
	md->app_tag = 0x1234;
	md->seed = 10;

	ret = io_uring_submit(&ring);
	if (ret <= 0) {
		fprintf(stderr, "sqe submit failed: %d\n", ret);
		return 1;
	}

	ret = io_uring_wait_cqe(&ring, &cqe);
	if (ret < 0) {
		fprintf(stderr, "wait cqe failed: %d\n", ret);
		return 1;
	}
	if (cqe->res < 0) {
		fprintf(stderr, "read cqe failure: %d\n", cqe->res);
		return 1;
	}

	pi = (struct t10_pi_tuple *)rmb;
	if (pi->apptag != htobe16(0x1234))
		printf("Failure: apptag mismatch!\n");
	if (pi->reftag != htobe32(10))
		printf("Failure: reftag mismatch!\n");
	io_uring_cqe_seen(&ring, cqe);

	/* metadata and data are binary: compare with memcmp, not strncmp */
	if (memcmp(wmb, rmb, META_LEN))
		printf("Failure: meta mismatch!\n");
	if (memcmp(wdb, rdb, DATA_LEN))
		printf("Failure: data mismatch!\n");

	io_uring_queue_exit(&ring);
	close(fd);
	free(rdb);
	free(wdb);
	return 0;
}

Anuj Gupta (7):
  block: define set of integrity flags to be inherited by cloned bip
  block: modify bio_integrity_map_user to accept iov_iter as argument
  fs, iov_iter: define meta io descriptor
  fs: introduce IOCB_HAS_METADATA for metadata
  io_uring/rw: add support to send metadata along with read/write
  block: introduce BIP_CHECK_GUARD/REFTAG/APPTAG bip_flags
  scsi: add support for user-meta interface

Christoph Hellwig (1):
  block: copy back bounce buffer to user-space correctly in case of split

Kanchan Joshi (2):
  nvme: add support for passing on the application tag
  block: add support to pass user meta buffer

 block/bio-integrity.c         | 84 ++++++++++++++++++++++++++++-------
 block/blk-integrity.c         | 10 ++++-
 block/fops.c                  | 42 ++++++++++++++----
 drivers/nvme/host/core.c      | 21 +++++----
 drivers/scsi/sd.c             |  4 +-
 include/linux/bio-integrity.h | 19 ++++++--
 include/linux/fs.h            |  1 +
 include/linux/uio.h           | 10 +++++
 include/uapi/linux/fs.h       |  9 ++++
 include/uapi/linux/io_uring.h | 29 ++++++++++++
 io_uring/io_uring.c           |  9 ++++
 io_uring/rw.c                 | 79 +++++++++++++++++++++++++++++++-
 io_uring/rw.h                 | 14 +++++-
 13 files changed, 290 insertions(+), 41 deletions(-)

-- 
2.25.1
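The program in [4] writes an all-zero data buffer with pi->guard left at 0, which passes IO_INTEGRITY_CHK_GUARD because the 16-bit T10-DIF CRC (polynomial 0x8BB7, initial value 0, no bit reflection, no final XOR) of all-zero data is 0. To write non-zero data with the guard check enabled, the application has to compute the guard for each logical block itself. Below is a minimal bitwise sketch, assuming the 16-bit CRC guard used by the 4KB + 8B, pi_type = 1 format named in [4]; crc16_t10dif() is a hypothetical helper, not part of this series or of liburing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: bitwise CRC-16/T10-DIF over len bytes.
 * Polynomial 0x8BB7, initial value 0, MSB-first, no final XOR.
 */
static uint16_t crc16_t10dif(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint16_t crc = 0;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		/* feed the next byte into the top 8 bits of the register */
		crc ^= (uint16_t)(p[i] << 8);
		for (int bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ 0x8BB7);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}
```

The guard is then stored big-endian in the tuple, for example pi->guard = htobe16(crc16_t10dif(wdb, 4096)). A table-driven variant is the usual choice for throughput; the bitwise loop is shown because it is easy to check against the standard CRC-16/T10-DIF check value 0xD0DB for the input "123456789".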
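The len field must cover one PI tuple per logical block, and the tuple fields are big-endian, so the tags have to be converted from host byte order when the metadata buffer is built. The sketch below extends the single-block case of [4] to a multi-block transfer under these assumptions: 8-byte tuples, PI type 1 (the reference tag increments by one per logical block), and, as in [4], a first reference tag equal to the byte offset divided by the logical block size. Whether md->seed is interpreted in logical blocks or in 512-byte sectors should be confirmed against the kernel side of the series. struct pi_tuple, pi_meta_len() and pi_fill() are hypothetical names.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <endian.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the 8-byte struct t10_pi_tuple used in [4]. */
struct pi_tuple {
	uint16_t guard;   /* big-endian CRC16 of the block's data */
	uint16_t app_tag; /* big-endian application tag */
	uint32_t ref_tag; /* big-endian reference tag */
};

/* Bytes of metadata needed for data_len bytes: one tuple per block. */
static size_t pi_meta_len(size_t data_len, size_t lba_size)
{
	assert(lba_size != 0 && data_len % lba_size == 0);
	return (data_len / lba_size) * sizeof(struct pi_tuple);
}

/*
 * Fill nblocks tuples for a transfer starting at byte offset 'offset'.
 * guards[] holds precomputed host-order CRCs, one per block. The first
 * reference tag is offset / lba_size and increments per block (type 1).
 */
static void pi_fill(struct pi_tuple *pi, const uint16_t *guards,
		    size_t nblocks, uint64_t offset, size_t lba_size,
		    uint16_t app_tag)
{
	uint32_t first_ref;

	assert(lba_size != 0);
	first_ref = (uint32_t)(offset / lba_size);
	for (size_t i = 0; i < nblocks; i++) {
		pi[i].guard = htobe16(guards[i]);
		pi[i].app_tag = htobe16(app_tag);
		pi[i].ref_tag = htobe32(first_ref + (uint32_t)i);
	}
}
```

With offset 40960 and 4096-byte blocks the first reference tag is 10, matching md->seed = 10 in [4]; a 16 KB write would need a 32-byte metadata buffer with reference tags 10 to 13.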
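With the dm table from the comment in [4], the program's byte offset 40960 is sector 80 of dm-0 (512-byte sectors). That sector falls in the second target (start 80, length 160), so it is remapped to sector 200 of /dev/nvme0n1, which is LBA 25 on a 4KB-formatted namespace. The write is therefore expected to reach the media with reference tag 25 even though the application supplied seed 10, and the read path is expected to map it back to 10, which is what the reftag check in [4] verifies. The arithmetic can be sketched as follows; struct linear_target, dm_linear_map() and sector_to_lba() are hypothetical helpers that mirror the "start length linear device offset" table lines, not device-mapper code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One dm-linear table line: "<start> <len> linear <dev> <dev_start>". */
struct linear_target {
	uint64_t start;     /* first sector on the dm device (512-byte units) */
	uint64_t len;       /* number of sectors covered */
	uint64_t dev_start; /* first sector on the underlying device */
};

/* Map a dm sector to the underlying device sector, or -1 if unmapped. */
static int64_t dm_linear_map(const struct linear_target *t, size_t n,
			     uint64_t sector)
{
	assert(t != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (sector >= t[i].start && sector < t[i].start + t[i].len)
			return (int64_t)(t[i].dev_start + (sector - t[i].start));
	}
	return -1;
}

/* Convert a 512-byte sector number to a logical block number. */
static uint64_t sector_to_lba(uint64_t sector, uint32_t lba_size)
{
	assert(lba_size >= 512 && lba_size % 512 == 0);
	return sector * 512 / lba_size;
}
```

For the table in [4], dm_linear_map() returns 200 for dm sector 80, and sector_to_lba(200, 4096) gives 25, the reference tag the device is expected to check against.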