On Mon, 2023-08-07 at 09:13 +0200, Jo Van Bulck wrote: > On 03.08.23 05:58, Huang, Kai wrote: > > Putting aside whether we should consider building the selftests using "-Os", it > > would be helpful to explain how can the "-Os" break the existing code, so that > > we can review why this fix is reasonable. Perhaps it's obvious to others but > > it's not obvious to me what can go wrong here. > > I dug deeper into this and the real problem here is that the enclave ELF > is linked with -static but also needs to be relocatable, which, in my > understanding, is not what -static is meant for (i.e., the man pages > says -static overrides -pie). Particularly, with -static I noticed that > global variables are hard-coded assuming the ELF is loaded at base > address zero. > > When reading more on this, it seems like the proper thing to do for > static and relocatable binaries is to compile with -static-pie, which is > added since gcc 8 [1] (and similarly supported by clang). Thanks for analysing! > > As a case in point, to hopefully make this clearer, consider the > following C function: > > extern uint8_t __enclave_base; > void *get_base(void) { > return &__enclave_base; > } > > Compiling with -static and -fPIC hard codes the __enclave_base symbol to > zero (the start of the ELF enclave image): > > 00000000000023f4 <get_base>: > 23f4: f3 0f 1e fa endbr64 > 23f8: 55 push %rbp > 23f9: 48 89 e5 mov %rsp,%rbp > 23fc: 48 c7 c0 00 00 00 00 mov $0x0,%rax > 2403: 5d pop %rbp > 2404: c3 ret > > Compiling with -static-pie and -fPIE properly emits a RIP-relative address: > > 00000000000023f4 <get_base>: > 23f4: f3 0f 1e fa endbr64 > 23f8: 55 push %rbp > 23f9: 48 89 e5 mov %rsp,%rbp > 23fc: 48 8d 05 fd db ff ff lea -0x2403(%rip),%rax # 0 > <__enclave_base> > 2403: 5d pop %rbp > 2404: c3 ret > > Now, the fact that it currently *happens* to work is a mere coincidence > of how the local encl_op_array initialization is compiled without > optimizations with -static -fPIC: > > 00000000000023f4 <encl_body>: > /* snipped */ > 2408: 48 8d 05 ec fe ff ff lea -0x114(%rip),%rax > # 22fb <do_encl_op_put_to_buf> > 240f: 48 89 45 b0 mov %rax,-0x50(%rbp) > 2413: 48 8d 05 18 ff ff ff lea -0xe8(%rip),%rax > # 2332 <do_encl_op_get_from_buf> > 241a: 48 89 45 b8 mov %rax,-0x48(%rbp) > 241e: 48 8d 05 44 ff ff ff lea -0xbc(%rip),%rax > # 2369 <do_encl_op_put_to_addr> > /* snipped */ > > When compiling with optimizations with -static -fPIC -Os, encl_op_array > is instead initialized with a prepared copy from .data: > > 00000000000021b5 <encl_body>: > /* snipped */ > 21bc: 48 8d 35 3d 2e 00 00 lea 0x2e3d(%rip),%rsi > # 5000 <encl_buffer+0x2000> > 21c3: 48 8d 7c 24 b8 lea -0x48(%rsp),%rdi > 21c8: b9 10 00 00 00 mov $0x10,%ecx > 21cd: f3 a5 rep movsl %ds:(%rsi),%es:(%rdi) > /* snipped */ How is the "prepared copy" prepared, exactly? Could you paste the relevant code here too? IMHO w/o it it's hard to tell whether the code could be wrong or not after relocating. > > Thus, in this optimized code, encl_op_array will have function pointers > that are *not* relocated. The compilation assumes the -static binary has > base address zero and is not relocatable: > > $ readelf -r test_encl.elf > > There are no relocations in this file. > > When compiling with -static-pie -PIE -Os, the same code is emitted *but* > the binary is relocatable: > > $ readelf -r test_encl.elf > > Relocation section '.rela.dyn' at offset 0x4000 contains 12 entries: > Offset Info Type Sym. Value Sym. Name > + Addend > # tcs1.ossa > 000000000010 000000000008 R_X86_64_RELATIVE 7000 > # tcs1.oentry > 000000000020 000000000008 R_X86_64_RELATIVE 21e3 > # tcs2.ossa > 000000001010 000000000008 R_X86_64_RELATIVE 8000 > # tcs2.oentry > 000000001020 000000000008 R_X86_64_RELATIVE 21e3 > # encl_op_array > 000000006000 000000000008 R_X86_64_RELATIVE 2098 > 000000006008 000000000008 R_X86_64_RELATIVE 20ae > 000000006010 000000000008 R_X86_64_RELATIVE 20c4 > 000000006018 000000000008 R_X86_64_RELATIVE 20d7 > 000000006020 000000000008 R_X86_64_RELATIVE 20ea > 000000006028 000000000008 R_X86_64_RELATIVE 203e > 000000006030 000000000008 R_X86_64_RELATIVE 2000 > 000000006038 000000000008 R_X86_64_RELATIVE 20ef > > Apparently, for static-pie binaries, glibc includes a > _dl_relocate_static_pie routine that will perform the relocations as > part of the startup [2,3]. > I am not sure whether all those 'rela.dyn' matters due to the reason you mentioned below ... > Since the enclave loading process is > different and glibc is not included, we cannot rely on these relocations > to be performed and we need to do them manually. > ... here. Those relocation table are not used by enclave builder anyway. Only ".tsc" ".text" and ".data" + some heap are built as enclave. > Note: the first 4 > symbols in the relocation table above are from the TCS initialization in > test_encl_bootstrap.S and should *not* be relocated (as these are > relative to enclave base as per SGX spec). I don't quite follow here. Per my understanding TCS pages can be any page within the enclave. I don't quite understand what does "relocated" mean here. > > Bottom line: the way I see it, the enclave should either ensure no > relocations are needed, or perform the relocations manually where > needed, or include a _dl_relocate_static_pie equivalent that parses the > .rela.dyn ELF section and patches all relocations (minus the TCS > symbols). Since the latter (most general) approach is likely going to > make the selftest enclave unnecessarily complex by including ELF parsing > etc, I propose to simply relocate the function-pointer table manually > (which is indeed the only place that needs relocations). I think we are kinda mixing two things together: 1) the "relocation" supported by the "non-enclave" normal case, where the compiler/linker generates the PIC code, and the loader does the "runtime" fixup for those in the "rela.dyn"; 2) the "relocation" for the enclave case, where the compiler/linker still generates the PIC code, but the "enclave loader" loads the enclave into a random virtual address of the process. Obviously the "enclave loader" (implemented in this selftests/sgx/...) isn't as powerful as the "real loader" in the normal case. In fact, reading the code, IIUC, it simply gathers ".tsc"/".text"/".data" sections from the ELF file (and plus some heap) and load them into the enclave. Now the important thing is: those sections are "contiguous" in the enclave. That means the kernel needs to build the enclave ELF file with those sections "contiguously" in the same order too as a single piece, and this single piece can work at any random address that the "enclave loader" loads the enclave. Any address fixing up due to different location of ".data"/".tsc" section at loading time cannot be generated. This should be the thing that we need to focus on. That being said, I think ideally there shouldn't be _ANY_ "rela.dyn" in the enclave ELF file. > > I will include code to properly compile the selftest enclave with > -static-pie as per above and relocate the function-pointer table in the > next patch revision. I agree we should use "-static-pie" + "-fPIE" (or "-fPIC" is also OK??). However I am yet not convinced the "relocate function-pointer" thing. If you can paste the relevant code it would be helpful. Or am I missing something big?