Re: [PATCH] sd_zbc: Fix report zones buffer allocation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Bart,

On 2019/06/21 0:25, Bart Van Assche wrote:
> On 6/19/19 8:48 PM, Damien Le Moal wrote:
>> +	/*
>> +	 * Limit the command buffer size to the arbitrary SD_ZBC_REPORT_SIZE
>> +	 * size (1MB), allowing up to 16383 zone descriptors being reported with
>> +	 * a single command. And make sure that this size does not exceed the
>> +	 * hardware capabilities. To avoid disk revalidation failures due to
>> +	 * memory allocation errors, retry the allocation with a smaller buffer
>> +	 * size if the allocation fails.
>> +	 */
>> +	bufsize = min_t(size_t, *buflen, SD_ZBC_REPORT_SIZE);
>> +	bufsize = min_t(size_t, bufsize,
>> +			queue_max_hw_sectors(disk->queue) << 9);
>> +	for (order = get_order(bufsize); order >= 0; order--) {
>> +		page = alloc_pages(gfp_mask, order);
>> +		if (page) {
>> +			*buflen = PAGE_SIZE << order;
>> +			return page_address(page);
>> +		}
>> +	}
> 
> Hi Damien,
> 
> As you know Linux memory fragmentation tends to increase over time. The 
> above code has the very unfortunate property that the more memory is 
> fragmented the smaller the allocated buffer will become. I don't think 
> that's how kernel code should work. Have you considered to use vmalloc() 
> + blk_rq_map_sg() instead? See also efa_vmalloc_buf_to_sg() for an 
> example of how to build a scatterlist for memory allocated by vmalloc().

The REPORT_ZONES command is executed using scsi_execute_req(). Can we pass a
vmalloc-ed buffer to that function ? It does look like it since scsi_execute_rq
calls bio_rq_map_kern() which then calls bio_map_kern() which goes through the
list of pages of the buffer. Just would like to confirm I understand this correctly.

My concern with using vmalloc is that the worst case scenario will result in all
pages of the buffer being non contiguous. In this case, since the report zones
command cannot be split, we would need to limit the allocation to max_segments *
page size, and that can be pretty small for some HBAs.

Another reason I did not pursue the vmalloc route is that the processing of the
report zones reply to transform zone information into struct blkzone is really
painful to do with a vmalloced buffer as every page of the buffer needs to be
kmap()/kunmap(). The code was like that when REPORT ZONES was processed as a
BIO/request, but it was a lot of code for not much to be done. Or is there a
more elegant solution for in-kernel mapping of a vmalloc buffer ?


-- 
Damien Le Moal
Western Digital Research




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux