On Tue, Feb 25, 2020 at 9:14 AM Jann Horn <jannh@xxxxxxxxxx> wrote: > > On Tue, Feb 25, 2020 at 5:18 AM Kees Cook <keescook@xxxxxxxxxxxx> wrote: > > On Mon, Feb 24, 2020 at 04:35:00PM +0100, glider@xxxxxxxxxx wrote: > > > Certain copy_from_user() invocations in binder.c are known to > > > unconditionally initialize locals before their first use, like e.g. in > > > the following case: > > > > > > struct binder_transaction_data tr; > > > if (copy_from_user(&tr, ptr, sizeof(tr))) > > > return -EFAULT; > > > > > > In such cases enabling CONFIG_INIT_STACK_ALL leads to insertion of > > > redundant locals initialization that the compiler fails to remove. > > > To work around this problem till Clang can deal with it, we apply > > > __do_not_initialize to local Binder structures. > > > > It should be possible to write a Coccinelle script to identify all these > > cases. (i.e. a single path from struct declaration to copy_from_user()) > > and apply the changes automatically. This script could be checked into > > scripts/coccinelle/ to help keep these markings in sync... > > I wonder whether it would instead be reasonable to define a helper > macro for "copy object from userspace to the kernel", and then use > this macro. Something like this: I really like this idea, but I think hacking something based on the order of LLVM passes is too fragile and can also lead to suboptimal code generation with GCC. Maybe we can pull the variable declaration together with __attribute__((uninitialized)) inside this macro instead? > #define copy_object_from_user(objp, src) ({ \ > __attribute__((uninitialized)) typeof(*(objp)) __buf; \ > copy_from_user(&__buf, (src), sizeof(*(objp))); \ > *(objp) = __buf; \ > }) > > void blah(unsigned long user_addr) { > struct foo stackobj; > copy_object_from_user(&stackobj, user_addr); > do_stuff(&stackobj); > } > > Unfortunately, clang runs a MemCpy optimization pass before the Dead > Store Elimination pass, which makes the copy_from_user() go directly > to `stackobj` instead of __buf before DSE has had a chance to get rid > of the first memcpy(). Grrr... > > But looking at the list of passes that LLVM runs, we can see that > between the MemCpy optimization and the DSE pass, we have Bit-Tracking > Dead Store Elimination... okay, fine, so we hack together some code > such that it contains a fake branch that is only resolved between > MemCpy and DSE: > > unsigned long blub, blab; > unsigned long get_blub(void) { return blub & 5; } > #define copy_object_from_user(objp, src) ({ \ > __attribute__((uninitialized)) typeof(*(objp)) __buf; \ > *(char*)&__buf = 0; \ > copy_from_user(&__buf, (src), sizeof(*(objp))); \ > int x = get_blub(); int y = blab & 10;\ > if ((x & y) == 0) { \ > *(objp) = __buf; \ > } \ > }) > > But it still doesn't work, even though the IR looks like DSE ought to work: > > *** IR Dump After Value Propagation *** > ; Function Attrs: nounwind uwtable > define dso_local void @blah(i64) local_unnamed_addr #1 { > %2 = alloca %struct.foo, align 4 > %3 = alloca %struct.foo, align 4 > %4 = bitcast %struct.foo* %2 to i8* > call void @llvm.lifetime.start.p0i8(i64 1008, i8* nonnull %4) #4 > call void @llvm.memset.p0i8.i64(i8* nonnull align 4 %4, i8 -86, i64 > 1008, i1 false) > %5 = bitcast %struct.foo* %3 to i8* > call void @llvm.lifetime.start.p0i8(i64 1008, i8* nonnull %5) #4 > store i8 0, i8* %5, align 4, !tbaa !6 > call void @copy_from_user(i8* nonnull %5, i64 %0, i64 1008) #4 > call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 %4, i8* > nonnull align 4 %5, i64 1008, i1 false), !tbaa.struct !7 > call void @llvm.lifetime.end.p0i8(i64 1008, i8* nonnull %5) #4 > call void @do_stuff(%struct.foo* nonnull %2) #4 > call void @llvm.lifetime.end.p0i8(i64 1008, i8* nonnull %4) #4 > ret void > } > *** IR Dump After Dead Store Elimination *** > ; Function Attrs: nounwind uwtable > define dso_local void @blah(i64) local_unnamed_addr #1 { > %2 = alloca %struct.foo, align 4 > %3 = alloca %struct.foo, align 4 > %4 = bitcast %struct.foo* %2 to i8* > call void @llvm.lifetime.start.p0i8(i64 1008, i8* nonnull %4) #4 > call void @llvm.memset.p0i8.i64(i8* nonnull align 4 %4, i8 -86, i64 > 1008, i1 false) > %5 = bitcast %struct.foo* %3 to i8* > call void @llvm.lifetime.start.p0i8(i64 1008, i8* nonnull %5) #4 > store i8 0, i8* %5, align 4, !tbaa !6 > call void @copy_from_user(i8* nonnull %5, i64 %0, i64 1008) #4 > call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 4 %4, i8* > nonnull align 4 %5, i64 1008, i1 false), !tbaa.struct !7 > call void @llvm.lifetime.end.p0i8(i64 1008, i8* nonnull %5) #4 > call void @do_stuff(%struct.foo* nonnull %2) #4 > call void @llvm.lifetime.end.p0i8(i64 1008, i8* nonnull %4) #4 > ret void > } > > I guess maybe clang can't do DSE past a function call or something? DSE only works within a basic block, which might actually be a problem with the real copy_from_user(), which expands to a call to _copy_from_user() and a branch. > We also can't trick the DSE pass using an empty "asm volatile" with an > output-only memory operand, because the DSE pass can't optimize inline > asm. This can be handled by something like https://reviews.llvm.org/D74853 (that's a WIP, still breaks the kernel) Not sure though, how good this is compared to a function attribute. _______________________________________________ devel mailing list devel@xxxxxxxxxxxxxxxxxxxxxx http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel