Re: [PATCH 2/2] scsi: core: avoid to pre-allocate big chunk for sg list

Bart Van Assche <bvanassche@xxxxxxx> · Wed, 24 Apr 2019 08:32:31 -0700

On Wed, 2019-04-24 at 08:24 -0700, James Bottomley wrote:
> On Wed, 2019-04-24 at 15:52 +0800, Ming Lei wrote:
> > On Tue, Apr 23, 2019 at 08:37:15AM -0700, Bart Van Assche wrote:
> > > On Tue, 2019-04-23 at 18:32 +0800, Ming Lei wrote:
> > > >  #define  SCSI_INLINE_PROT_SG_CNT  1
> > > >
> > > > +#define  SCSI_INLINE_SG_CNT  2
> > >
> > > So this patch inserts one kmalloc() and one kfree() call in the hot
> > > path for every SCSI request with more than two elements in its
> > > scatterlist? Isn't
> >
> > Slab or its variants are designed for fast path, and NVMe PCI uses
> > slab for allocating sg list in fast path too.
>
> Actually, that's not really true: base kmalloc can do all sorts of
> things including kick off reclaim so it's not really something we like
> using in the fast path.  The only fast and safe kmalloc you can rely on
> in the fast path is GFP_ATOMIC which will fail quickly if no memory
> can easily be found.  *However* the sg_table allocation functions are
> all pool backed (lib/sg_pool.c), so they use the lightweight GFP_ATOMIC
> mechanism for kmalloc initially coupled with a backing pool in case of
> failure to ensure forward progress.
>
> So, I think you're both right: you shouldn't simply use kmalloc, but
> this implementation doesn't, it uses the sg_table allocation functions
> which correctly control kmalloc to be lightweight and efficient and
> able to make forward progress.

Another concern is whether this change can cause a livelock. If the system
is running out of memory and the page cache submits a write request whose
scatterlist has more than two elements, and the kmalloc() for that
scatterlist fails, will that prevent the page cache from making any progress
with writeback?
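
To make the question concrete, here is a minimal userspace sketch of the
"try the fast allocator first, fall back to a preallocated reserve" pattern
described in the quoted text above. It is not the lib/sg_pool.c code. Every
name in it (sketch_pool, RESERVE_NR, ELEM_SIZE, fast_path_fails) is
hypothetical, and malloc() merely stands in for a non-sleeping kmalloc().
Under those assumptions, the sketch suggests that forward progress depends
on in-flight requests completing and returning their elements to the reserve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RESERVE_NR 2    /* hypothetical reserve size */
#define ELEM_SIZE  128  /* hypothetical element size in bytes */

struct sketch_pool {
	void *reserve[RESERVE_NR]; /* preallocated fallback elements */
	size_t nr_free;            /* reserve elements currently available */
	bool fast_path_fails;      /* hook that simulates memory pressure */
};

/* Fill the reserve up front, while memory is still available. */
static int sketch_pool_init(struct sketch_pool *p)
{
	p->nr_free = 0;
	p->fast_path_fails = false;
	for (size_t i = 0; i < RESERVE_NR; i++) {
		void *e = malloc(ELEM_SIZE);
		if (!e) {
			while (p->nr_free)
				free(p->reserve[--p->nr_free]);
			return -1;
		}
		p->reserve[p->nr_free++] = e;
	}
	return 0;
}

/* Try the ordinary allocator first; on failure, dip into the reserve. */
static void *sketch_pool_alloc(struct sketch_pool *p)
{
	void *e = p->fast_path_fails ? NULL : malloc(ELEM_SIZE);

	if (e)
		return e;
	if (p->nr_free)
		return p->reserve[--p->nr_free];
	return NULL; /* reserve exhausted: caller must retry later */
}

/* Refill the reserve before handing memory back to the allocator. */
static void sketch_pool_free(struct sketch_pool *p, void *e)
{
	assert(e != NULL);
	if (p->nr_free < RESERVE_NR)
		p->reserve[p->nr_free++] = e;
	else
		free(e);
}
```

In this sketch, with fast_path_fails set, the first RESERVE_NR allocations
still succeed and the next one returns NULL until an earlier element is
freed. Whether the same holds for writeback in the patched SCSI path is
exactly the open question.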

Bart.