Re: lib/scatterlist.c : sgl_alloc_order promises more than it delivers

On 2020-09-26 12:32 a.m., Bart Van Assche wrote:
> On 2020-09-24 21:55, Douglas Gilbert wrote:
>> My code steps down from 1024 KiB elements on failure to 512 KiB and if that
>> fails it tries 256 KiB. Then it gives up. The log output is consistent with
>> my analysis. So your stated equality is an inequality when length >= 4 GiB.
>> There is no promotion of unsigned int nent to uint64_t.
>>
>> You can write your own test harness if you don't believe me. The test machine
>> doesn't need much ram. Without the call to sgl_free() corrected, if it really
>> did try to get that much ram and failed toward the end, then (partially)
>> freed up what it had obtained, then you would see a huge memory leak ...
>>
>> Now your intention seems to be that a 4 GiB sgl should be valid. Correct?
>> Can that check just be dropped?
>
> Hi Doug,
>
> When I wrote that code, I did not expect that anyone would try to allocate
> 4 GiB or more as a single scatterlist. Are there any use cases for which a
> 4 GiB scatterlist works better than two or more smaller scatterlists?

Then one would wonder why it has this declaration:
    struct scatterlist *sgl_alloc_order(unsigned long long length,
                                        unsigned int order, bool chainable,
                                        gfp_t gfp, unsigned int *nent_p)

'unsigned long long length' [in bytes] is a lot: 64 or 128 bits' worth,
definitely more than 32 bits.

And vmalloc is declared:
    void *vmalloc(unsigned long size);

Which is 64 bits on a 64-bit machine (i.e. it must be able to hold a pointer).
And it is vmalloc() that I want to replace with sgl_alloc_order() in the
scsi_debug driver. Robert Love writes of vmalloc():

    "The vmalloc() function, to make nonphysically contiguous pages
    contiguous in the virtual address space, must specifically set up
    the page table entries. Worse, pages obtained via vmalloc() must
    be mapped by their individual pages (because they are not physically
    contiguous), which results in much greater TLB thrashing than you see
    when directly mapped memory is used. Because of these concerns,
    vmalloc() is used only when absolutely necessary—typically, to obtain
    large regions of memory." [Linux Kernel Development, 3rd edition, page 244]

And a scatterlist seems to do explicitly what vmalloc() does behind the
scenes (stitching non-contiguous pages into one large store), but without
those drawbacks.

My testing suggests that a store built with sgl_alloc_order() *** is a
little faster and has a lower standard deviation (i.e. spread) in timings
across repeated tests.

Another advantage of a scatterlist-based store in the scsi_debug driver
is that the data-in and data-out buffers associated with SCSI commands
also come through as scatterlist-based objects. Thus I can do almost all
the manipulations the driver needs to do to simulate a disk by adding
these general functions:
    - sgl_copy_sgl()
    - sgl_cmp_sgl()
    - sgl_memset()
    - sgl_prefetch()

A memmove() variant would be simple to implement, but the scsi_debug
driver doesn't need it.

> Do you agree that many hardware DMA engines do not support transferring
> 4 GiB or more at once?

I agree that one element of a scatter gather list should not exceed 4 GiB
of memory. In scsi_debug the scatter gather list (one per store) has
in some cases several thousand elements. But I do not agree that the _sum_
of the size of those elements should be limited to 4 GiB. With those two
lines removed from sgl_alloc_order() I can test an 8 GiB scsi_debug ram
disk on a 16 GiB machine. [I made it into 1 partition, did mkfs.ext4,
mounted it, rsync-ed the kernel source onto it and built a kernel that
runs. A reasonable test, no?]

Doug Gilbert


*** the very useful property of sgl_alloc_order() is that each element
    of the scatter gather list has the same order (or it fails). This
    allows O(1) navigation of a big store like an 8 GiB ramdisk since
    sg_miter_skip() can be avoided with some simple integer maths.

