On Mon, Sep 23, 2019 at 06:36:32PM +0200, Vlastimil Babka wrote:
> On 8/26/19 1:16 PM, Vlastimil Babka wrote:
> > In most configurations, kmalloc() happens to return naturally aligned
> > (i.e. aligned to the block size itself) blocks for power-of-two sizes.
> > That means some kmalloc() users might unknowingly rely on that
> > alignment, until stuff breaks when the kernel is built with e.g.
> > CONFIG_SLUB_DEBUG or CONFIG_SLOB and blocks stop being aligned. Then
> > developers have to devise workarounds such as their own kmem caches
> > with specified alignment [1], which is not always practical, as
> > recently evidenced in [2].
> >
> > The topic has been discussed at LSF/MM 2019 [3]. Adding a
> > 'kmalloc_aligned()' variant would not help with code unknowingly
> > relying on the implicit alignment. For slab implementations it would
> > either require creating more kmalloc caches, or allocating a larger
> > size and only giving back part of it. That would be wasteful,
> > especially with a generic alignment parameter (in contrast with a
> > fixed alignment to size).
> >
> > Ideally we should provide mm users with what they need without
> > difficult workarounds or own reimplementations, so let's make the
> > kmalloc() alignment to size explicitly guaranteed for power-of-two
> > sizes under all configurations. What does this mean for the three
> > available allocators?
> >
> > * SLAB object layout happens to be mostly unchanged by the patch. The
> >   implicitly provided alignment could be compromised with
> >   CONFIG_DEBUG_SLAB due to redzoning; however, SLAB disables redzoning
> >   for caches with alignment larger than unsigned long long.
> >   Practically, on at least x86 this includes kmalloc caches, as they
> >   use cache line alignment, which is larger than that. Still, this
> >   patch ensures alignment on all arches and cache sizes.
> >
> > * SLUB layout is also unchanged unless redzoning is enabled through
> >   CONFIG_SLUB_DEBUG and a boot parameter for the particular kmalloc
> >   cache. With this patch, explicit alignment is guaranteed with
> >   redzoning as well. This will result in more memory being wasted, but
> >   that should be acceptable in a debugging scenario.
> >
> > * SLOB has no implicit alignment, so this patch adds it explicitly for
> >   kmalloc(). The potential downside is increased fragmentation. While
> >   pathological allocation scenarios are certainly possible, in my
> >   testing, after booting an x86_64 kernel+userspace with virtme,
> >   around 16MB of memory was consumed by slab pages both before and
> >   after the patch, with the difference in the noise.
> >
> > [1] https://lore.kernel.org/linux-btrfs/c3157c8e8e0e7588312b40c853f65c02fe6c957a.1566399731.git.christophe.leroy@xxxxxx/
> > [2] https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@xxxxxxxxxx/
> > [3] https://lwn.net/Articles/787740/
> >
> > Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
>
> So if anyone thinks this is a good idea, please express it (preferably
> in a formal way such as Acked-by), otherwise it seems the patch will be
> dropped (due to a private NACK, apparently).
>
> Otherwise I don't think there can be an objective conclusion.
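(Aside, to make the "own kmem caches with specified alignment" workaround
from the changelog concrete for anyone following along: a minimal sketch,
with the cache name, sizes and init function made up for illustration.)

  #include <linux/init.h>
  #include <linux/slab.h>

  /*
   * Today, a driver that must have 512-byte alignment for its 512-byte
   * buffers under every allocator/debug configuration has to create its
   * own cache with an explicit align argument:
   */
  static struct kmem_cache *buf_cache;

  static int __init buf_cache_init(void)
  {
          buf_cache = kmem_cache_create("buf_cache", 512, 512, 0, NULL);
          return buf_cache ? 0 : -ENOMEM;
  }

  /*
   * With the patch, a plain kmalloc(512, GFP_KERNEL) is guaranteed to
   * return a 512-byte-aligned block, so the dedicated cache and its
   * bookkeeping become unnecessary for this case.
   */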
> On the one hand we avoid further problems and workarounds due to
> misalignment (or objects allocated beyond the page boundary, which was
> only recently mentioned); on the other hand we potentially make future
> changes to SLAB/SLUB, or a hypothetical new implementation, either more
> complicated or less effective due to extra fragmentation. Different
> people can have different opinions on what's more important.
>
> Let me however explain why I think we don't have to fear the future
> implementation complications that much. There was an argument IIRC that
> extra non-debug metadata could start to be prepended/appended to an
> object in the future (e.g. an RCU freeing head?).
>
> 1) Caches can already be created with explicit alignment, so a naive
> pre/appending implementation would already waste memory on such caches.
> 2) Even without explicit alignment, a single slab cache for 512k
> objects with a few bytes added to each object would waste almost 512k,
> as the objects would no longer fit precisely in an (order-X) page. The
> percentage wasted depends on X.
> 3) Roman recently posted a patchset [1] that basically adds a cgroup
> pointer to each object. The implementation doesn't append it to objects
> naively, however, but adds a separately allocated array. Alignment is
> thus unchanged.

To be fair here, we *might* want to put this pointer just after/before
the object to reduce the number of cache misses. I've put it into a
separate place mostly for simplicity reasons (a rough sketch of the two
layouts follows below). It's not an objection though, just a note.

Thanks!

> [1] https://lore.kernel.org/linux-mm/20190905214553.1643060-1-guro@xxxxxx/
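P.S. To make the comparison concrete, a rough sketch of the two layouts;
the struct and field names are invented here and not taken from the
patchset:

  #include <linux/memcontrol.h>

  /*
   * (a) Roughly what the patchset does: the objects themselves stay
   *     untouched, and a separately allocated array holds one cgroup
   *     pointer per object on the slab page, so kmalloc() alignment is
   *     unaffected.
   */
  struct slab_memcg_vec {
          struct mem_cgroup **vec;        /* one entry per object */
  };

  /*
   * (b) The alternative I mention above: keep the pointer right next to
   *     each object. Fewer cache misses on lookup, but every object
   *     grows by sizeof(void *), which would defeat natural alignment
   *     for power-of-two kmalloc sizes.
   */
  struct obj_with_footer {
          unsigned char data[512];        /* the object itself */
          struct mem_cgroup *memcg;       /* per-object metadata */
  };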