On Tue, Feb 25, 2020 at 5:18 AM Kees Cook <keescook@xxxxxxxxxxxx> wrote: > On Mon, Feb 24, 2020 at 04:35:00PM +0100, glider@xxxxxxxxxx wrote: > > Certain copy_from_user() invocations in binder.c are known to > > unconditionally initialize locals before their first use, like e.g. in > > the following case: > > > > struct binder_transaction_data tr; > > if (copy_from_user(&tr, ptr, sizeof(tr))) > > return -EFAULT; > > > > In such cases enabling CONFIG_INIT_STACK_ALL leads to insertion of > > redundant locals initialization that the compiler fails to remove. > > To work around this problem till Clang can deal with it, we apply > > __do_not_initialize to local Binder structures. > > It should be possible to write a Coccinelle script to identify all these > cases. (i.e. a single path from struct declaration to copy_from_user()) > and apply the changes automatically. This script could be checked into > scripts/coccinelle/ to help keep these markings in sync... I wonder whether it would instead be reasonable to define a helper macro for "copy object from userspace to the kernel", and then use this macro. Something like this: #define copy_object_from_user(objp, src) ({ \ __attribute__((uninitialized)) typeof(*(objp)) __buf; \ copy_from_user(&__buf, (src), sizeof(*(objp))); \ *(objp) = __buf; \ }) void blah(unsigned long user_addr) { struct foo stackobj; copy_object_from_user(&stackobj, user_addr); do_stuff(&stackobj); } Unfortunately, clang runs a MemCpy optimization pass before the Dead Store Elimination pass, which makes the copy_from_user() go directly to `stackobj` instead of __buf before DSE has had a chance to get rid of the first memcpy(). Grrr... But looking at the list of passes that LLVM runs, we can see that between the MemCpy optimization and the DSE pass, we have Bit-Tracking Dead Store Elimination... okay, fine, so we hack together some code such that it contains a fake branch that is only resolved between MemCpy and DSE: unsigned long blub, blab; unsigned long get_blub(void) { return blub & 5; } #define copy_object_from_user(objp, src) ({ \ __attribute__((uninitialized)) typeof(*(objp)) __buf; \ *(char*)&__buf = 0; \ copy_from_user(&__buf, (src), sizeof(*(objp))); \ int x = get_blub(); int y = blab & 10;\ if ((x & y) == 0) { \ *(objp) = __buf; \ } \ }) But it still doesn't work, even though the IR looks like DSE ought to work: *** IR Dump After Value Propagation *** ; Function Attrs: nounwind uwtable define dso_local void @blah(i64) local_unnamed_addr #1 { %2 = alloca %struct.foo, align 4 %3 = alloca %struct.foo, align 4 %4 = bitcast %struct.foo* %2 to i8* call void @llvm.lifetime.start.p0i8(i64 1008, i8* nonnull %4) #4 call void @llvm.memset.p0i8.i64(i8* nonnull align 4 %4, i8 -86, i64 1008, i1 false) %5 = bitcast %struct.foo* %3 to i8* call void @llvm.lifetime.start.p0i8(i64 1008, i8* nonnull %5) #4 store i8 0, i8* %5, align 4, !tbaa !6 call void @copy_from_user(i8* nonnull %5, i64 %0, i64 1008) #4 call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 %4, i8* nonnull align 4 %5, i64 1008, i1 false), !tbaa.struct !7 call void @llvm.lifetime.end.p0i8(i64 1008, i8* nonnull %5) #4 call void @do_stuff(%struct.foo* nonnull %2) #4 call void @llvm.lifetime.end.p0i8(i64 1008, i8* nonnull %4) #4 ret void } *** IR Dump After Dead Store Elimination *** ; Function Attrs: nounwind uwtable define dso_local void @blah(i64) local_unnamed_addr #1 { %2 = alloca %struct.foo, align 4 %3 = alloca %struct.foo, align 4 %4 = bitcast %struct.foo* %2 to i8* call void @llvm.lifetime.start.p0i8(i64 1008, i8* nonnull %4) #4 call void @llvm.memset.p0i8.i64(i8* nonnull align 4 %4, i8 -86, i64 1008, i1 false) %5 = bitcast %struct.foo* %3 to i8* call void @llvm.lifetime.start.p0i8(i64 1008, i8* nonnull %5) #4 store i8 0, i8* %5, align 4, !tbaa !6 call void @copy_from_user(i8* nonnull %5, i64 %0, i64 1008) #4 call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 %4, i8* nonnull align 4 %5, i64 1008, i1 false), !tbaa.struct !7 call void @llvm.lifetime.end.p0i8(i64 1008, i8* nonnull %5) #4 call void @do_stuff(%struct.foo* nonnull %2) #4 call void @llvm.lifetime.end.p0i8(i64 1008, i8* nonnull %4) #4 ret void } I guess maybe clang can't do DSE past a function call or something? We also can't trick the DSE pass using an empty "asm volatile" with an output-only memory operand, because the DSE pass can't optimize inline asm. Is there really no way to somehow hack this together inside a macro? :( _______________________________________________ devel mailing list devel@xxxxxxxxxxxxxxxxxxxxxx http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel