Hi!

Sorry, I missed this originally because it got filed into my lkml
archive and not kernel-hardening, but no one actually reads lkml
directly, myself included -- it's mostly a thread archive. I'll update
my filters, and I've added a handful of people to CC that might be
interested in looking at this too. Here's the full email, I trimmed
heavily since it's very detailed:
https://lore.kernel.org/lkml/20200324215049.GA3710@xxxxxxxxxx/

On Tue, Mar 24, 2020 at 10:50:49PM +0100, Adam Zabrocki wrote:
> Some curiosities which are interesting to point out:
>
> 1) Linus Torvalds in 2012 suspected that such 'overflow' might be possible.
>    You can read more about it here:
>
>    https://www.openwall.com/lists/kernel-hardening/2012/03/11/4
>
> 2) Solar Designer in 1999(!) was aware about the problem that 'exit_signal' can
>    be abused. The kernel didn't protect it at all at that time. So he came up
>    with the idea to introduce those two counters to deal with that problem.
>    Originally, these counters were defined as "long long" type. However, during
>    the revising between September 14 and September 16, 1999 he switched from
>    "long long" to "int" and introduced integer wraparound handling. His patches
>    were merged to the kernel 2.0.39 and 2.0.40.
>
> 3) It is worth to read the Solar Designer's message during the discussion about
>    the fix for the problem CVE-2012-0056 (I'm referencing this problem later in
>    that write-up about "Problem II"):
>
>    https://www.openwall.com/lists/kernel-hardening/2012/03/11/12

There was some effort made somewhat recently to get this area fixed:
https://lore.kernel.org/linux-fsdevel/1474663238-22134-3-git-send-email-jann@xxxxxxxxx/

Nothing ultimately landed, but it's worth seeing if we could revitalize
interest.
Part of Jann's series was also related to fixing issues with
cred_guard_mutex, which is getting some traction now too:
https://lore.kernel.org/lkml/AM6PR03MB5170938306F22C3CF61CC573E4CD0@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

> In short, if you hold the file descriptor open over an execve() (e.g. share it
> with child) the old VM is preserved (refcounted) and might be never released.
> Essentially, mother process' VM will be still in memory (and pointer to it is
> valid) even if the mother process passed an execve().
> This is some kind of 'memory leak' scenario. I did a simple test where process
> open /proc/self/maps file and calls clone() with CLONE_FILES flag. Next mother
> 'overwrite' itself by executing SUID binary (doesn't need to be SUID), and child
> was still able to use the original file descriptor - it's valid.

It'd be worth exploring where the resource counting is happening for
this. I haven't looked to see how much of the VM stays in kernel memory
in this situation. It probably wouldn't be hard to count it against an
rlimit or something.

Thanks for the details! I hope someone will have time to look into
this. It's a bit of a "long timeframe attack", so it hasn't gotten a lot
of priority (obviously). :)

-Kees

-- 
Kees Cook