On Thu, Jun 27, 2024 at 4:00 PM Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote: > > Andrii Nakryiko <andrii@xxxxxxxxxx> writes: > > > The need to get ELF build ID reliably is an important aspect when > > dealing with profiling and stack trace symbolization, and > > /proc/<pid>/maps textual representation doesn't help with this. > > > > To get backing file's ELF build ID, application has to first resolve > > VMA, then use it's start/end address range to follow a special > > /proc/<pid>/map_files/<start>-<end> symlink to open the ELF file (this > > is necessary because backing file might have been removed from the disk > > or was already replaced with another binary in the same file path. > > > > Such approach, beyond just adding complexity of having to do a bunch of > > extra work, has extra security implications. Because application opens > > underlying ELF file and needs read access to its entire contents (as far > > as kernel is concerned), kernel puts additional capable() checks on > > following /proc/<pid>/map_files/<start>-<end> symlink. And that makes > > sense in general. > > I was curious about this statement. It has still certainly potential > for side channels e.g. for files that are execute only, or with > some other special protection. > > But actually just looking at the parsing code it seems to fail basic > TOCTTOU rules, and since you don't check if the VMA mapping is executable > (I think), so there's no EBUSY checking for writes, it likely is exploitable. > > > /* only supports phdr that fits in one page */ > if (ehdr->e_phnum > > (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr)) > <---------- check in memory > return -EINVAL; > > phdr = (Elf64_Phdr *)(page_addr + sizeof(Elf64_Ehdr)); > > <---- but page is shared in the page cache. So if anybody manages to map > it for write > > > for (i = 0; i < ehdr->e_phnum; ++i) { <----- this loop can go > off into the next page. > if (phdr[i].p_type == PT_NOTE && > !parse_build_id(page_addr, build_id, size, > page_addr + phdr[i].p_offset, > phdr[i].p_filesz)) > return 0; > > Here's an untested patch > > Yep, makes sense. I'm currently reworking this whole lib/buildid.c implementation to remove all the restrictions on data being in the first page only, and making it work in a faultable context more reliably. I can audit the code for TOCTOU issues and incorporate your feedback. I'll probably post the patch set next week, will cc you as well. > diff --git a/lib/buildid.c b/lib/buildid.c > index 7954dd92e36c..6c022fcd03ec 100644 > --- a/lib/buildid.c > +++ b/lib/buildid.c > @@ -72,19 +72,20 @@ static int get_build_id_32(const void *page_addr, unsigned char *build_id, > Elf32_Ehdr *ehdr = (Elf32_Ehdr *)page_addr; > Elf32_Phdr *phdr; > int i; > + unsigned phnum = READ_ONCE(ehdr->e_phnum); > > /* only supports phdr that fits in one page */ > - if (ehdr->e_phnum > > + if (phnum > > (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr)) > return -EINVAL; > > phdr = (Elf32_Phdr *)(page_addr + sizeof(Elf32_Ehdr)); > > - for (i = 0; i < ehdr->e_phnum; ++i) { > + for (i = 0; i < phnum; ++i) { > if (phdr[i].p_type == PT_NOTE && > !parse_build_id(page_addr, build_id, size, > page_addr + phdr[i].p_offset, > - phdr[i].p_filesz)) > + READ_ONCE(phdr[i].p_filesz))) > return 0; > } > return -EINVAL; > @@ -97,15 +98,16 @@ static int get_build_id_64(const void *page_addr, unsigned char *build_id, > Elf64_Ehdr *ehdr = (Elf64_Ehdr *)page_addr; > Elf64_Phdr *phdr; > int i; > + unsigned phnum = READ_ONCE(ehdr->e_phnum); > > /* only supports phdr that fits in one page */ > - if (ehdr->e_phnum > > + if (phnum > > (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr)) > return -EINVAL; > > phdr = (Elf64_Phdr *)(page_addr + sizeof(Elf64_Ehdr)); > > - for (i = 0; i < ehdr->e_phnum; ++i) { > + for (i = 0; i < phnum; ++i) { > if (phdr[i].p_type == PT_NOTE && > !parse_build_id(page_addr, build_id, size, > page_addr + phdr[i].p_offset,