On Sun, May 10, 2020 at 10:16:25AM +0800, wzt wzt wrote:
> This is a new kernel harden project called hksp(huawei kernel self
> protection), hope some of the mitigation ideas may help you, thanks.
> patch: https://github.com/cloudsec/hksp

Hi,

Thanks for letting people know about this! I took a quick look through
the patch and I think there are some interesting bits in there. As Greg
mentioned, it would probably be worth splitting these changes out into
separate patches with full commit logs, documentation, etc. Then it'll
be easier to follow the patch submission process[1]. And, similarly,
please send individual patches inline so people can comment on them
directly.

As you've mentioned, this code is still in early stages, but I have two
general observations that might help direct further work:

- Some features are x86-only. What do you think about extending those
  features to other architectures?

- Many of the features are notification-only, in that once a bad
  situation is detected, it will only perform a printk() and do no
  mitigation. It would be best to add situation-specific mitigations
  (though note that the kernel avoids using BUG()[2]).

I have some other notes below...

[1] https://www.kernel.org/doc/html/latest/process/submitting-patches.html
[2] https://www.kernel.org/doc/html/latest/process/deprecated.html#bug-and-bug-on

> =============================
> Huawei kernel self protection
> =============================
>
> Cred guard
> ----------
> - random cred's magic.

I think this would be easy to upstream. Perhaps note that this is a
per-boot random value now. And please don't printk the value[3]:
https://github.com/cloudsec/hksp/blob/master/hksp.patch#L986

- you could just redefine CRED_MAGIC to be the global instead of
  changing the code that uses CRED_MAGIC.

- the global cred magic should likely not live in init_cred, since
  that takes up space in every task, and gets copied all over the
  place. I think a single global variable would be better.
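
As a rough illustration of the two suggestions above, a userspace C
sketch might look like the following. This is not the hksp code: every
name other than CRED_MAGIC (cred_magic_global, cred_magic_init(),
struct cred_sketch, cred_init(), cred_is_valid()) is hypothetical, and
the caller-supplied value stands in for whatever boot-time randomness
source the kernel would actually use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: one global holds the per-boot magic, so it does
 * not take up space in init_cred or get copied with every cred.
 */
uint32_t cred_magic_global;

/*
 * Redefine CRED_MAGIC to read the global. Code that already compares
 * against CRED_MAGIC keeps working without being touched.
 */
#define CRED_MAGIC (cred_magic_global)

/* Stand-in for struct cred; only the fields needed for the sketch. */
struct cred_sketch {
	uint32_t magic;
	unsigned int uid;
};

/*
 * Stand-in for boot-time initialization. In the kernel the value would
 * come from a random source during early boot (an assumption here), and
 * it would never be printed.
 */
void cred_magic_init(uint32_t random_value)
{
	cred_magic_global = random_value;
}

/* Stamp a new cred with the current per-boot magic. */
void cred_init(struct cred_sketch *c, unsigned int uid)
{
	assert(c != NULL);
	c->magic = CRED_MAGIC;
	c->uid = uid;
}

/* A cred is considered valid only if it carries the per-boot magic. */
int cred_is_valid(const struct cred_sketch *c)
{
	return c != NULL && c->magic == CRED_MAGIC;
}
```

With this shape, a forged cred carrying the old fixed constant
(0x43736564) fails cred_is_valid() unless the attacker has first read
the per-boot value from memory.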
>   most kernel exploit try to find some offsets in struct cred,
>   but it depends on CONFIG_DEBUG_CREDENTIALS, then need to compute
>   the right offset by that kernel config, so mostly the exploit code
>   is something like that:
>   if (tmp0 == 0x43736564 || tmp0 == 0x44656144)
>       i += 4;

In this feature's description, please include a note that it raises the
effort needed by an attacker: mounting such a cred attack now requires
an additional memory content exposure. (i.e. they need to read a valid
cred magic to copy during a cred write attack.)

> - detect shellcode like:
>   commit_creds(prepare_kernel_cred(0));
>   the common kernel code is never write like that.

Is this meant to refer to this code?
https://github.com/cloudsec/hksp/blob/master/hksp.patch#L1000
This only appears to check that prepare_kernel_cred() was called from
kernel space? Given SMEP and KPTI (which has emulated SMEP), is this
check useful on x86?

> Namespace Guard
> ---------------
> This feature detects pid namespace escape via kernel exploits.
> The current public method to bypass namespace is hijack init_nsproxy
> to current process:
> switch_task_namespaces_p(current, init_nsproxy_p);
> commit_creds(prepare_kernel_cred(0));

This is check_pid_ns()? This appears to only get checked on process
death? What's your plan for this check?
https://github.com/cloudsec/hksp/blob/master/hksp.patch#L1140

> Rop stack pivot
> --------------
> - user process stack can't be is mmap area.
> - check kernel stack range at each system call ret.
>   the rsp pointer can point below __PAGE_OFFSET.

This is check_stack_pivot()? Same question about only being checked on
process death. Also, why are certain uid ranges excluded from this
check?
https://github.com/cloudsec/hksp/blob/master/hksp.patch#L1043

Also, I think this check needs a lot more specialization because lots
of multithreaded applications intentionally put their stacks into the
mmap range.
Perhaps a per-process flag that indicated the process's expectation
about its thread stacks?

There is a lot of history here in attempts to track userspace stacks.
You can see it in commits like these that tried to track it and how
they had to be reverted:
https://git.kernel.org/linus/b76437579d13
https://git.kernel.org/linus/65376df58217
https://git.kernel.org/linus/b18cb64ead40

Having something that better attempted to track what a process has done
might be more sensible? (i.e. if it has never called clone(), maybe it
can have a strict stack check, otherwise, something else?)

>
> Slub harden
> -----------
> - redzone/poison randomization.

This is interesting -- are production systems using redzoning?
Regardless, it seems like a reasonable change to provide. Though,
again, please don't print the value. ;)
https://github.com/cloudsec/hksp/blob/master/hksp.patch#L1611

> - double free enhance.
>   old slub can only detect continuous double free bugs.
>   kfree(obj1)
>   kfree(obj1)
>
>   hksp can detect no continuous double/multi free bugs.
>   kfree(obj1)
>   kfree(obj2)
>   kfree(obj1)
>
>   or
>
>   kfree(obj1)
>   kfree(obj2)
>   kfree(obj3)
>   kfree(obj1)

Is this the code?
https://github.com/cloudsec/hksp/blob/master/hksp.patch#L1377
(Please don't use panic() -- a better attempt to mitigate is needed)
This check appears to be pretty expensive -- it's walking all objects
looking for a duplicate?

> - clear the next object address information when using kmalloc function.

Is this the code?
https://github.com/cloudsec/hksp/blob/master/hksp.patch#L1589
What's the benefit of this change? It looks pretty inexpensive; could
it be done by default?

> Proc info leak
> --------------
> Protect important file with no read access for non root user.
> set /proc/{modules,keys,key-users},
> /proc/sys/kernel/{panic,panic_on_oops,dmesg_restrict,kptr_restrict,keys},
> /proc/sys/vm/{mmap_min_addr} as 0640.

These seem like general changes that could be broken out.
I don't see why there would be much objection to making these less
discoverable. (Though I would point out that most attackers are going
to assume all these features, etc, are enabled.)

> Aslr hardended
> --------------
> User stack aslr enhanced.
> Old user process's stack is between 0-1G on 64bit.
> the actually random range is 0-2^24.
> we introduce STACK_RND_BITS to control the range dynamically.
>
> echo "24" > /proc/sys/vm/stack_rnd_bits
>
> we also randomize the space between elf_info and environ.
> And randomize the space between stack and elf_info.

This needs some pretty extensive checking and validation that there
will be no memory range collisions. Past changes to ASLR ranges have
caused hard-to-find on-execve() crashes with some unlucky
randomization, etc. But otherwise, yeah, I think it'd be really nice to
have this be dynamic as was done with the mmap ASLR.

> Ptrace hardened
> ---------------
> Disallow attach to non child process.
> This can prevent process memory inject via ptrace.

Hmm, this looks like duplication of the existing Yama LSM controls?
What's different?

> Sm*p hardened
> -------------
> Check smap&smep when return from kernel space via a syscall,
> this can detect some kernel exploit code to bypass smap & smep
> feature via rop attack technology.

I don't think this works correctly: you're checking the shadow, not the
real CR4. This seems like it somewhat complements the existing pinning?
https://git.kernel.org/linus/873d50d58f67ef15d2777b5e7f7a5268bb1fbae2
I like the idea of doing this kind of sanity checking at process exit,
though. It may be quite expensive, though (reading CR4 is slow).

> Raw socket enhance
> ------------------
> Enhance raw socket for ipv4 protocol.
> - TCP data cannot be sent over raw sockets.
>   echo 1 > /proc/sys/net/ipv4/raw_tcp_disabled
> - UDP datagrams with an invalid source address cannot be sent
>   over raw sockets.
>   The IP source address for any outgoing UDP
>   datagram must exist on a network interface or the datagram is
>   dropped. This change was made to limit the ability of malicious
>   code to create distributed denial-of-service attacks and limits
>   the ability to send spoofed packets (TCP/IP packets with a forged
>   source IP address).
>   echo 1 > /proc/sys/net/ipv4/raw_udp_verify
> - A call to the bind function with a raw socket for the IPPROTO_TCP
>   protocol is not allowed.
>   echo 1 > /proc/sys/net/ipv4/raw_bind_disabled

These all seem pretty reasonable, though I suspect there will be a lot
of push-back under the expectation that there are already controls in
place to avoid these kinds of things (e.g. access to raw sockets is
already restricted). Regardless, I would break these up into individual
patches with justifications and rationales.

> Kernel self guard
> -----------------
> Ksguard is an anti rootkit tool on kernel level.
> Currently it can detect 4 types of kernel rootkits,
> These are the most popluar rootkits type on unix world.
>
> - keyboard notifer rootkits.
> - netfilter hooks rootkits.
> - tty sniffer rootkits and other DKOM(direct kernel object modify) rootkits.
> - system call table hijack rootkits.

These kinds of checks are extremely hard to justify in upstream. Any
system where standard kernel interfaces are being used to subvert the
kernel is basically impossible to defend. If something is loading
malicious kernel modules, there's virtually no way to defend the
kernel. A better solution is signed kernel modules, etc.
> Install:
> /sbin/insmod /lib/modules/5.6.7/kernel/security/ksguard/ksguard.ko
>
> Feature:
> Detect keyboard notifer rootkits:
> echo "1" > /proc/ksguard/state
>
> Detect netfilter hooks rootkits:
> echo "2" > /proc/ksguard/state
>
> Detect tty sniffer rootkits:
> echo "3" > /proc/ksguard/state
>
> Detect syscall table pointer:
> echo "4" > /proc/ksguard/state

If there is a rationale that isn't solved with kernel module signing,
this will need to be spelled out clearly. (Similarly, if this is a
post-exploitation forensics tool, again, the attacker can just change
how modules are loaded, etc.)

> Arbitrary code guard
> --------------------
> we extended the libc personality() to support:
> - mmap can't memory with PROT_WRITE|PROT_EXEC.
> - mprtect can't change PROT_WRITE to PROT_EXEC.

How does this compare to SARA?
https://lore.kernel.org/kernel-hardening/1562410493-8661-1-git-send-email-s.mesoraca16@xxxxxxxxx/
I do like the idea of a simple W^X protection in userspace, almost like
the PR_SET_NO_NEW_PRIVS flag that will allow a process to declare that
it is not a JIT, etc.

> Code integrity guard
> --------------------
> To support certificate for user process execve.
> it can prevent some internet explorer to load
> third party so librarys.

This seems like it could be implemented with the IMA LSM?

> Hide symbol
> -----------
> Hide symbols from /proc/kallsyms.

Can you describe how this complements kptr_restrict and the %p hashing?
This removes a symbol entirely from the list, but I'm curious what
specific benefit that provides above the other mentioned features?
Which symbols would you suggest get added to this list? (Should there
be a starting baseline block list?)

I look forward to further versions; thanks again for sending this!

--
Kees Cook