Bart, On 2019/06/21 0:25, Bart Van Assche wrote: > On 6/19/19 8:48 PM, Damien Le Moal wrote: >> + /* >> + * Limit the command buffer size to the arbitrary SD_ZBC_REPORT_SIZE >> + * size (1MB), allowing up to 16383 zone descriptors being reported with >> + * a single command. And make sure that this size does not exceed the >> + * hardware capabilities. To avoid disk revalidation failures due to >> + * memory allocation errors, retry the allocation with a smaller buffer >> + * size if the allocation fails. >> + */ >> + bufsize = min_t(size_t, *buflen, SD_ZBC_REPORT_SIZE); >> + bufsize = min_t(size_t, bufsize, >> + queue_max_hw_sectors(disk->queue) << 9); >> + for (order = get_order(bufsize); order >= 0; order--) { >> + page = alloc_pages(gfp_mask, order); >> + if (page) { >> + *buflen = PAGE_SIZE << order; >> + return page_address(page); >> + } >> + } > > Hi Damien, > > As you know Linux memory fragmentation tends to increase over time. The > above code has the very unfortunate property that the more memory is > fragmented the smaller the allocated buffer will become. I don't think > that's how kernel code should work. Have you considered to use vmalloc() > + blk_rq_map_sg() instead? See also efa_vmalloc_buf_to_sg() for an > example of how to build a scatterlist for memory allocated by vmalloc(). The REPORT_ZONES command is executed using scsi_execute_req(). Can we pass a vmalloc-ed buffer to that function ? It does look like it since scsi_execute_rq calls bio_rq_map_kern() which then calls bio_map_kern() which goes through the list of pages of the buffer. Just would like to confirm I understand this correctly. My concern with using vmalloc is that the worst case scenario will result in all pages of the buffer being non contiguous. In this case, since the report zones command cannot be split, we would need to limit the allocation to max_segments * page size, and that can be pretty small for some HBAs. Another reason I did not pursue the vmalloc route is that the processing of the report zones reply to transform zone information into struct blkzone is really painful to do with a vmalloced buffer as every page of the buffer needs to be kmap()/kunmap(). The code was like that when REPORT ZONES was processed as a BIO/request, but it was a lot of code for not much to be done. Or is there a more elegant solution for in-kernel mapping of a vmalloc buffer ? -- Damien Le Moal Western Digital Research