On 2019/06/21 0:25, Bart Van Assche wrote: > On 6/19/19 8:48 PM, Damien Le Moal wrote: >> + /* >> + * Limit the command buffer size to the arbitrary SD_ZBC_REPORT_SIZE >> + * size (1MB), allowing up to 16383 zone descriptors being reported with >> + * a single command. And make sure that this size does not exceed the >> + * hardware capabilities. To avoid disk revalidation failures due to >> + * memory allocation errors, retry the allocation with a smaller buffer >> + * size if the allocation fails. >> + */ >> + bufsize = min_t(size_t, *buflen, SD_ZBC_REPORT_SIZE); >> + bufsize = min_t(size_t, bufsize, >> + queue_max_hw_sectors(disk->queue) << 9); >> + for (order = get_order(bufsize); order >= 0; order--) { >> + page = alloc_pages(gfp_mask, order); >> + if (page) { >> + *buflen = PAGE_SIZE << order; >> + return page_address(page); >> + } >> + } > > Hi Damien, > > As you know Linux memory fragmentation tends to increase over time. The > above code has the very unfortunate property that the more memory is > fragmented the smaller the allocated buffer will become. I don't think > that's how kernel code should work. Have you considered to use vmalloc() > + blk_rq_map_sg() instead? See also efa_vmalloc_buf_to_sg() for an > example of how to build a scatterlist for memory allocated by vmalloc(). Bart, I have a fix along the lines you suggested, but since it modifies bio_rq_map_kern(), I would rather not push that as a bug fix this late in RC. Would you agree to accept the fix patch as is for now and I will send the more complete fix for 5.3 ? Note that this more complete fix also reworks the similar memory allocation for the struct blk_zone array used for zone revalidation. Put all together, the use of report zones uses only vmalloc-ed buffers and data structures, reduces pressure on the memory system and reducing chances of failures. Best regards. -- Damien Le Moal Western Digital Research