On 04/06/2020 05:28 PM, Damien Le Moal wrote: > On 2020/04/07 8:24, Chaitanya Kulkarni wrote: >> One of the common application for the file systems and drivers to map >> the buffer to the bio and issue I/Os on the block device. > > Drivers generally do not do this. At least not lower ones (scsi, nvme, etc). I > guess you are referring to DM drivers. If yes, better be clear about this. > Also, "the buffer" is a little unclear. Better qualify that. > Agree most drivers deals with sg_list. Right now there is only one such an example I know which is rnbd as driver as it is mentioned in the cover-letter references. > Another thing: "to map the buffer to the bio" is a little strange. The reverse > seems more natural: a bio maps a buffer. > Make sense, will do. >> >> This is a prep patch which adds two helper functions for block layer >> which allows file-systems and the drivers to map data buffer on to the >> bios and issue bios synchronously using blkdev_issue_rw() or >> asynchronously using __blkdev_issue_rw(). > > The function names are basically the same but do completely different things. > Better rename that. May be blkdev_issue_rw_sync() and blkdev_issue_rw_async() ? > > This is exactly I named functions to start with (see the function documentation which I missed to update), but the pattern in blk-lib.c uses no such sync and async postfix, which is used is sync and async context all over the kernel :- # grep EXPORT block/blk-lib.c EXPORT_SYMBOL(__blkdev_issue_discard); EXPORT_SYMBOL(blkdev_issue_discard);, since as it is nr_sects and sector calculation doesn't matter after break. EXPORT_SYMBOL(blkdev_issue_write_same); EXPORT_SYMBOL(__blkdev_issue_zeroout); EXPORT_SYMBOL(blkdev_issue_zeroout); EXPORT_SYMBOL_GPL(__blkdev_issue_rw); EXPORT_SYMBOL_GPL(blkdev_issue_rw); In absence of documentation for naming how about we just follow existing naming convention ? Which matches the pattern for new API. >> >> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@xxxxxxx> >> --- >> block/blk-lib.c | 105 +++++++++++++++++++++++++++++++++++++++++ >> include/linux/blkdev.h | 7 +++ >> 2 files changed, 112 insertions(+) >> >> diff --git a/block/blk-lib.c b/block/blk-lib.c >> index 5f2c429d4378..451c367fc0d6 100644 >> --- a/block/blk-lib.c >> +++ b/block/blk-lib.c >> @@ -405,3 +405,108 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, >> return ret; >> } >> EXPORT_SYMBOL(blkdev_issue_zeroout); >> + >> +/** >> + * __blkdev_ssue_rw - issue read/write bios from buffer asynchronously > > s/__blkdev_ssue_rw/__blkdev_issue_rw Thanks, will fix. > >> + * @bdev: blockdev to read/write >> + * @buf: data buffer >> + * @sector: start sector >> + * @nr_sects: number of sectors >> + * @op: REQ_OP_READ or REQ_OP_WRITE >> + * @opf: flags >> + * @gfp_mask: memory allocation flags (for bio_alloc) >> + * @biop: pointer to anchor bio >> + * >> + * Description: >> + * Generic helper function to map data buffer into bios for read and write ops. >> + * Returns pointer to the anchored last bio for caller to submit asynchronously >> + * or synchronously. >> + */ >> +int __blkdev_issue_rw(struct block_device *bdev, char *buf, sector_t sector, >> + sector_t nr_sects, unsigned op, unsigned opf, >> + gfp_t gfp_mask, struct bio **biop) >> +{ >> + bool vm = is_vmalloc_addr(buf) ? true : false; > > You do not need the clunky "? true : false" here. is_vmalloc_addr() returns a > bool already. > I will remove it. >> + struct bio *bio = *biop; >> + unsigned int nr_pages; >> + struct page *page; >> + unsigned int off; >> + unsigned int len; >> + int bi_size; >> + >> + if (!bdev_get_queue(bdev)) >> + return -ENXIO; >> + >> + if (bdev_read_only(bdev)) >> + return -EPERM; > > One can read a read-only device. So the check here is not correct. You need a > "&& op == REQ_OP_WRITE" in the condition. > True, this shouldn't be here, also I'm thinking of lifting these checks to caller, we can add it later if needed. >> + >> + if (!(op == REQ_OP_READ || op == REQ_OP_WRITE)) >> + return -EINVAL; > > This probably should be checked before read-only. > Yes. >> + >> + while (nr_sects != 0) { >> + nr_pages = __blkdev_sectors_to_bio_pages(nr_sects); >> + >> + bio = blk_next_bio(bio, nr_pages, gfp_mask); >> + bio->bi_iter.bi_sector = sector; >> + bio_set_dev(bio, bdev); >> + bio_set_op_attrs(bio, op, 0); >> + >> + while (nr_sects != 0) { >> + off = offset_in_page(buf); >> + page = vm ? vmalloc_to_page(buf) : virt_to_page(buf); >> + len = min((sector_t) PAGE_SIZE, nr_sects << 9); >> + >> + bi_size = bio_add_page(bio, page, len, off); > > The variable naming is super confusing. bio_add_page() returns 0 if nothing is > added and len if the page was added. So bi_size as a var name is not the best of > name in my opinion. > Here bio, page, len, off variables are passed to the bio_add_page() function, which has the same names in the function parameter list. (I'll fix the off to offset) Regarding the bi_size, given that bio_add_page() returns bio->bi_iter.bi_size, isn't bi_size also maps to the what function is returning and explains what we are dealing with? Also, bi_size is used in the blk-lib.c: __blkdev_issue_zero_pages(), if that is confusing then we need to change that, what is your preference? I'll send a cleanup patch __blkdev_issue_zero_pages() also. >> + >> + nr_sects -= bi_size >> 9; >> + sector += bi_size >> 9; >> + buf += bi_size; >> + >> + if (bi_size < len) >> + break; > > You will get either 0 or len from bio_add_page. So the check here is not ideal. > I think it is better to move it up under bio_add_page() and make it: > > len = bioa_add_page(bio, page, len, off); > if (!len) > break; I'd still like to keep bi_size, I think I had received a comment about using same variable for function call and updating as a return value. Regarding the check, will move it. > >> + } >> + cond_resched(); >> + } >> + *biop = bio; >> + return 0; >> +} >> +EXPORT_SYMBOL_GPL(__blkdev_issue_rw); >> + >> +/** >> + * blkdev_execute_rw_sync - issue read/write bios from buffer synchronously >> + * @bdev: blockdev to read/write >> + * @buf: data buffer >> + * @sector: start sector >> + * @count: number of bytes >> + * @op: REQ_OP_READ or REQ_OP_WRITE >> + * @opf: flags >> + * @gfp_mask: memory allocation flags (for bio_alloc) >> + * >> + * Description: >> + * Generic helper function to map data buffer buffer into bios for read and >> + * write requests. >> + */ >> +int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector, >> + unsigned count, unsigned op, unsigned opf, gfp_t mask) > > function name differs between description and declaration. > blkdev_execute_rw_sync() is better than blkdev_issue_rw() in my opinion. > That was remained from initial version, will fix it, thanks for pointing it out. >> +{ >> + unsigned int is_vmalloc = is_vmalloc_addr(buf); >> + sector_t nr_sects = count >> 9; >> + struct bio *bio = NULL; >> + int error; >> + >> + if (is_vmalloc && op == REQ_OP_WRITE) >> + flush_kernel_vmap_range(buf, count); >> + >> + opf |= REQ_SYNC; > > You can add this directly in the call below. > Breaks calling of __blkdev_issue_rw() to new line. This also adds a new line which is unavoidable but keeps the function call smooth in one line. >> + error = __blkdev_issue_rw(b, buf, sector, nr_sects, op, opf, mask, &bio); >> + if (!error && bio) { >> + error = submit_bio_wait(bio); >> + bio_put(bio); >> + } >> + >> + if (is_vmalloc && op == REQ_OP_READ) >> + invalidate_kernel_vmap_range(buf, count); >> + >> + return error; >> +} >> +EXPORT_SYMBOL_GPL(blkdev_issue_rw); >> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h >> index 32868fbedc9e..cb315b301ad9 100644 >> --- a/include/linux/blkdev.h >> +++ b/include/linux/blkdev.h >> @@ -1248,6 +1248,13 @@ static inline int sb_issue_zeroout(struct super_block *sb, sector_t block, >> gfp_mask, 0); >> } >> >> +extern int __blkdev_issue_rw(struct block_device *bdev, char *buf, >> + sector_t sector, sector_t nr_sects, unsigned op, >> + unsigned opf, gfp_t gfp_mask, struct bio **biop); >> +extern int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector, >> + unsigned count, unsigned op, unsigned opf, >> + gfp_t mask); > > No need for the extern I think. > Again, following the exiting pattern in the file for the header:- # grep blkdev_issue include/linux/blkdev.h extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *); extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector, extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, extern int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector, extern int __blkdev_issue_rw(struct block_device *bdev, char *buf, extern int blkdev_issue_rw(struct block_device *b, char *buf, sector_t sector, >> + >> extern int blk_verify_command(unsigned char *cmd, fmode_t mode); >> >> enum blk_default_limits { >> > >