On Thu, Mar 2, 2023 at 9:11 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > On Thu, Mar 02, 2023 at 11:54:03AM -0800, Kees Cook wrote: > > On Thu, Mar 02, 2023 at 07:19:49PM +0000, Al Viro wrote: > > > On Thu, Mar 02, 2023 at 11:10:03AM -0800, Linus Torvalds wrote: > > > > On Thu, Mar 2, 2023 at 11:03 AM Linus Torvalds > > > > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > > > > > > > It might be best if we actually exposed it as a SLAB_SKIP_ZERO thing, > > > > > just to make it possible to say - exactly in situations like this - > > > > > that this particular slab cache has no advantage from pre-zeroing. > > > > > > > > Actually, maybe it's just as well to keep it per-allocation, and just > > > > special-case getname_flags() itself. > > > > > > > > We could replace the __getname() there with just a > > > > > > > > kmem_cache_alloc(names_cachep, GFP_KERNEL | __GFP_SKIP_ZERO); > > > > > > > > we're going to overwrite the beginning of the buffer with the path we > > > > copy from user space, and then we'd have to make people comfortable > > > > with the fact that even with zero initialization hardening on, the > > > > space after the filename wouldn't be initialized... > > > > > > ACK; same in getname_kernel() and sys_getcwd(), at the very least. > > > > FWIW, much earlier analysis suggested opting out these kmem caches: > > > > buffer_head > > names_cache > > mm_struct > > anon_vma > > skbuff_head_cache > > skbuff_fclone_cache > > I would probably add dentry_cache to it; the only subtle part is > ->d_iname and I'm convinced that explicit "make sure there's a NUL > at the very end" is enough. FWIW, a couple of years ago I was looking into implementing the following scheme for opt-out that I also discussed with Kees: 1. Add a ___GFP_SKIP_ZERO flag that is not intended to be used explicitly (e.g. disallow passing it to kmalloc(), add a checkpatch.pl warning). Explicitly passing an opt-out flag to allocation functions was considered harmful previously: https://lore.kernel.org/kernel-hardening/20190423083148.GF25106@xxxxxxxxxxxxxx/ 2. Define new allocation API that will allow opt-out: struct page *alloc_pages_uninit(gfp_t gfp, unsigned int order, const char *key); void *kmalloc_uninit(size_t size, gfp_t flags, const char *key); void *kmem_cache_alloc_uninit(struct kmem_cache *, gfp_t flags, const char *key); , where @key is an arbitrary string that identifies a single allocation or a group of allocations. 3. Provide a boot-time flag that holds a comma-separated list of opt-out keys that actually take effect: init_on_alloc.skip="xyz-camera-driver,some_big_buffer". The rationale behind this two-step mechanism is that despite certain allocations may be appealing opt-out targets for performance reasons, some vendors may choose to be on the safe side and explicitly list the allocations that should not be zeroed. Several possible enhancements include: 1. A SLAB_NOINIT memcache flag that prohibits cache merging and disables initialization. Because the number of caches is relatively small, it might be fine to explicitly pass SLAB_NOINIT at cache creation sites. Again, if needed, we could only use this flag as a hint that needs to be acknowledged by a boot-time option. 2. No opt-out for kmalloc() - instead advise users to promote the anonymous allocations they want to opt-out to memory caches with SLAB_NOINIT. 3. Provide an emergency brake that completely disables ___GFP_SKIP_ZERO if a major security breach is discovered. Extending this idea of per-callsite opt-out we could generate unique keys for every allocation in the kernel (or e.g. group them together by the caller name) and decide at runtime if we want to opt them out. But I am not sure anyone would want this level of control in their distro.