The attached program is a benchmark to iterate avc_has_perm_noaudit() in userspace. In the case when it is not reasonable to launch background worker process/thread, we need to check the status of in-kernel selinux before and after avc lookups to ensure the cached avc entries are still valid and queries to selinux were done atomically. If we use netlink socket to check the status of selinux. [kaigai@saba ~]$ gcc status_test.c -l selinux [kaigai@saba ~]$ time -p ./a.out 1 real 65.44 user 23.91 sys 41.48 It we use the new /selinux/status with mmap(2) [kaigai@saba ~]$ time -p ./a.out 0 real 4.71 user 4.68 sys 0.02 Although it loops 1,000,000 times, but it is not a corner case in RDBMS with row-level security on large tables, or key-value-store that handles massive number of requests from web applications. Well, the /selinux/status is strongly desired feature from the viewpoint of userspace object manager. If I need to revise the patch something, please point out it. Thanks, (2010/09/07 9:03), KaiGai Kohei wrote: > Do we have any issues to be revised about this patch? > If nothing to fix any more, please include this feature toward the next > merge window. > > It shall provide a key feature to implement a security server in userspace > application as loadable module, without expensive system-calls and invasive > patches. At least, PostgreSQL and Memcached need it. :-) > > Thanks, > > (2010/09/02 17:16), KaiGai Kohei wrote: >> (2010/08/28 12:24), KaiGai Kohei wrote: >>> (2010/08/28 1:19), Eric Paris wrote: >>>> On Fri, Aug 27, 2010 at 11:48 AM, Eric Paris<eparis@xxxxxxxxxxxxxx> wrote: >>>>> 2010/8/27 KaiGai Kohei<kaigai@xxxxxxxxxxxxx>: >>>>>> I revised the /selinux/status implementation. >>>>>> >>>>>> * It becomes to report 'deny_unknown'. Userspace object manager >>>>>> also reference this flag to decide its behavior when the loaded >>>>>> policy does not support expected object classes. >>>>>> * It provided PAGE_READONLY to remap_pfn_range() as page protection >>>>>> flag independent from argument of mmap(2), but it was uncommon. >>>>>> I fixed to pass vma->vm_page_prot instead of the hardwired flag >>>>>> according to any other implementation style. >>>>>> Now it returns an error, if user tries to map /selinux/status as >>>>>> writable pages. >>>>> >>>>> I really hate blowing 4k of memory on every system to show 40 bytes of >>>>> data on just a few systems. Is there any change we could allocate the >>>>> page the first time it is needed rather that at boot? I know compared >>>>> to the size of policy and other memory usage in SELinux it's odd for >>>>> me to complain, but I've decided to get on a reduction if possible >>>>> kick. >>>>> >>>>> Only other comment is that __initcall() is deprecated and we are >>>>> supposed to use device_initcall() now. >>>>> >>>>> If you plan to use it, I'll ack if you change both of those things.... >>>> >>>> actually if you move to dynamic allocation of the status page and use >>>> static DEFINE_SPINLOCK instead of static spinlock_t you can get rid of >>>> the __init() code altogether.... >>>> >>> >>> I revised the patch. >>> It was changed the selinux_kernel_page being allocated at the first time >>> when application tries to reference the /selinux/status. >>> At the same time, it declares selinux_status_lock using DEFINE_MUTEX(), >>> so whole of the __init section has gone. >>> >>> In addition, I changed first member of the selinux_kernel_status from >>> 'length' to 'version', because sizeof(struct ...) is aligned to 64bit >>> boundary (24bytes) on x86_64 system, although it is actually 20bytes. >>> If we want to add a 32bit member in the future, 'length' may not inform >>> applications enough. >>> >> How about getting the feature? >> Although I've not found out this idea for a long time, it is quite helpful >> feature to implement SE-PostgreSQL (and other upcoming userspace object >> managers) in less invasive way. >> >> I fixed up two minor points in the patch, as follows: >> * The 4K of status page becomes allocated at the file_operations::open() >> method, because it seems to me a bit unnatural that either read() or >> mmap() fails due to memory allocation error. >> * I forgot to eliminate an unnecessary declaration of extern variable. >> >> Signed-off-by: KaiGai Kohei<kaigai@xxxxxxxxxxxxx> >> -- >> security/selinux/include/security.h | 21 ++++++ >> security/selinux/selinuxfs.c | 56 ++++++++++++++ >> security/selinux/ss/Makefile | 2 +- >> security/selinux/ss/services.c | 3 + >> security/selinux/ss/status.c | 135 +++++++++++++++++++++++++++++++++++ >> 5 files changed, 216 insertions(+), 1 deletions(-) >> >> diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h >> index 1f7c249..e390e31 100644 >> --- a/security/selinux/include/security.h >> +++ b/security/selinux/include/security.h >> @@ -191,5 +191,26 @@ static inline int security_netlbl_sid_to_secattr(u32 sid, >> >> const char *security_get_initial_sid_context(u32 sid); >> >> +/* >> + * status notifier using mmap interface >> + */ >> +extern struct page *selinux_kernel_status_page(void); >> + >> +#define SELINUX_KERNEL_STATUS_VERSION 1 >> +struct selinux_kernel_status >> +{ >> + u32 version; /* version number of thie structure */ >> + u32 sequence; /* sequence number of seqlock logic */ >> + u32 enforcing; /* current setting of enforcing mode */ >> + u32 policyload; /* times of policy reloaded */ >> + u32 deny_unknown; /* current setting of deny_unknown */ >> + /* >> + * The version> 0 supports above members. >> + */ >> +} __attribute__((packed)); >> + >> +extern void selinux_status_update_setenforce(int enforcing); >> +extern void selinux_status_update_policyload(int seqno); >> + >> #endif /* _SELINUX_SECURITY_H_ */ >> >> diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c >> index 79a1bb6..a2e7a85 100644 >> --- a/security/selinux/selinuxfs.c >> +++ b/security/selinux/selinuxfs.c >> @@ -110,6 +110,7 @@ enum sel_inos { >> SEL_COMPAT_NET, /* whether to use old compat network packet controls */ >> SEL_REJECT_UNKNOWN, /* export unknown reject handling to userspace */ >> SEL_DENY_UNKNOWN, /* export unknown deny handling to userspace */ >> + SEL_STATUS, /* export current status using mmap() */ >> SEL_INO_NEXT, /* The next inode number to use */ >> }; >> >> @@ -171,6 +172,7 @@ static ssize_t sel_write_enforce(struct file *file, const char __user *buf, >> if (selinux_enforcing) >> avc_ss_reset(0); >> selnl_notify_setenforce(selinux_enforcing); >> + selinux_status_update_setenforce(selinux_enforcing); >> } >> length = count; >> out: >> @@ -205,6 +207,59 @@ static const struct file_operations sel_handle_unknown_ops = { >> .llseek = generic_file_llseek, >> }; >> >> +static int sel_open_handle_status(struct inode *inode, struct file *filp) >> +{ >> + struct page *status = selinux_kernel_status_page(); >> + >> + if (!status) >> + return -ENOMEM; >> + >> + filp->private_data = status; >> + >> + return 0; >> +} >> + >> +static ssize_t sel_read_handle_status(struct file *filp, char __user *buf, >> + size_t count, loff_t *ppos) >> +{ >> + struct page *status = filp->private_data; >> + >> + BUG_ON(!status); >> + >> + return simple_read_from_buffer(buf, count, ppos, >> + page_address(status), >> + sizeof(struct selinux_kernel_status)); >> +} >> + >> +static int sel_mmap_handle_status(struct file *filp, >> + struct vm_area_struct *vma) >> +{ >> + struct page *status = filp->private_data; >> + unsigned long size = vma->vm_end - vma->vm_start; >> + >> + BUG_ON(!status); >> + >> + /* only allows one page from the head */ >> + if (vma->vm_pgoff> 0 || size != PAGE_SIZE) >> + return -EIO; >> + /* disallow writable mapping */ >> + if (vma->vm_flags& VM_WRITE) >> + return -EPERM; >> + /* disallow mprotect() turns it into writable */ >> + vma->vm_flags&= ~VM_MAYWRITE; >> + >> + return remap_pfn_range(vma, vma->vm_start, >> + page_to_pfn(status), >> + size, vma->vm_page_prot); >> +} >> + >> +static const struct file_operations sel_handle_status_ops = { >> + .open = sel_open_handle_status, >> + .read = sel_read_handle_status, >> + .mmap = sel_mmap_handle_status, >> + .llseek = generic_file_llseek, >> +}; >> + >> #ifdef CONFIG_SECURITY_SELINUX_DISABLE >> static ssize_t sel_write_disable(struct file *file, const char __user *buf, >> size_t count, loff_t *ppos) >> @@ -1612,6 +1667,7 @@ static int sel_fill_super(struct super_block *sb, void *data, int silent) >> [SEL_CHECKREQPROT] = {"checkreqprot",&sel_checkreqprot_ops, S_IRUGO|S_IWUSR}, >> [SEL_REJECT_UNKNOWN] = {"reject_unknown",&sel_handle_unknown_ops, S_IRUGO}, >> [SEL_DENY_UNKNOWN] = {"deny_unknown",&sel_handle_unknown_ops, S_IRUGO}, >> + [SEL_STATUS] = {"status",&sel_handle_status_ops, S_IRUGO}, >> /* last one */ {""} >> }; >> ret = simple_fill_super(sb, SELINUX_MAGIC, selinux_files); >> diff --git a/security/selinux/ss/Makefile b/security/selinux/ss/Makefile >> index 15d4e62..974e11c 100644 >> --- a/security/selinux/ss/Makefile >> +++ b/security/selinux/ss/Makefile >> @@ -5,5 +5,5 @@ >> EXTRA_CFLAGS += -Isecurity/selinux -Isecurity/selinux/include >> obj-y := ss.o >> >> -ss-y := ebitmap.o hashtab.o symtab.o sidtab.o avtab.o policydb.o services.o conditional.o mls.o >> +ss-y := ebitmap.o hashtab.o symtab.o sidtab.o avtab.o policydb.o services.o conditional.o mls.o status.o >> >> diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c >> index 9ea2fec..494ff52 100644 >> --- a/security/selinux/ss/services.c >> +++ b/security/selinux/ss/services.c >> @@ -1791,6 +1791,7 @@ int security_load_policy(void *data, size_t len) >> selinux_complete_init(); >> avc_ss_reset(seqno); >> selnl_notify_policyload(seqno); >> + selinux_status_update_policyload(seqno); >> selinux_netlbl_cache_invalidate(); >> selinux_xfrm_notify_policyload(); >> return 0; >> @@ -1870,6 +1871,7 @@ int security_load_policy(void *data, size_t len) >> >> avc_ss_reset(seqno); >> selnl_notify_policyload(seqno); >> + selinux_status_update_policyload(seqno); >> selinux_netlbl_cache_invalidate(); >> selinux_xfrm_notify_policyload(); >> >> @@ -2374,6 +2376,7 @@ out: >> if (!rc) { >> avc_ss_reset(seqno); >> selnl_notify_policyload(seqno); >> + selinux_status_update_policyload(seqno); >> selinux_xfrm_notify_policyload(); >> } >> return rc; >> diff --git a/security/selinux/ss/status.c b/security/selinux/ss/status.c >> new file mode 100644 >> index 0000000..eeab696 >> --- /dev/null >> +++ b/security/selinux/ss/status.c >> @@ -0,0 +1,135 @@ >> +/* >> + * mmap based event notifications for SELinux >> + * >> + * Author: KaiGai Kohei<kaigai@xxxxxxxxxxxxx> >> + * >> + * Copyright (C) 2010 NEC corporation >> + * >> + * This program is free software; you can redistribute it and/or modify >> + * it under the terms of the GNU General Public License version 2, >> + * as published by the Free Software Foundation. >> + */ >> +#include<linux/kernel.h> >> +#include<linux/gfp.h> >> +#include<linux/mm.h> >> +#include<linux/mutex.h> >> +#include "avc.h" >> +#include "services.h" >> + >> +/* >> + * The selinux_status_page shall be exposed to userspace applications >> + * using mmap interface on /selinux/status. >> + * It enables to notify applications a few events that will cause reset >> + * of userspace access vector without context switching. >> + * >> + * The selinux_kernel_status structure on the head of status page is >> + * protected from concurrent accesses using seqlock logic, so userspace >> + * application should reference the status page according to the seqlock >> + * logic. >> + * >> + * Typically, application checks status->sequence at the head of access >> + * control routine. If it is odd-number, kernel is updating the status, >> + * so please wait for a moment. If it is changed from the last sequence >> + * number, it means something happen, so application will reset userspace >> + * avc, if needed. >> + * In addition, application should also checks the sequence number at >> + * tail of the access control routine. If it is changed from the value >> + * on the head, it means kernel status was changed under processing the >> + * routine. In this case, application should repeat to run the routine >> + * from head, but we expect it is much rare case. >> + * In most case, application can confirm the kernel status is not changed >> + * without any system call invocations. >> + * Hopefully, libselinux encapsulates this logic. >> + */ >> +static struct page *selinux_status_page = NULL; >> +static DEFINE_MUTEX(selinux_status_lock); >> + >> +/* >> + * selinux_kernel_status_page >> + * >> + * It returns a reference to selinux_status_page. If the status page is >> + * not allocated yet, it also tries to allocate it at the first time. >> + */ >> +struct page *selinux_kernel_status_page(void) >> +{ >> + struct selinux_kernel_status *status; >> + struct page *result = NULL; >> + >> + mutex_lock(&selinux_status_lock); >> + if (!selinux_status_page) >> + { >> + selinux_status_page = alloc_page(GFP_KERNEL|__GFP_ZERO); >> + if (selinux_status_page) >> + { >> + status = page_address(selinux_status_page); >> + >> + status->version = SELINUX_KERNEL_STATUS_VERSION; >> + status->sequence = 0; >> + status->enforcing = selinux_enforcing; >> + /* >> + * NOTE: the next policyload event shall set >> + * a positive value on the status->policyload, >> + * although it may not be 1, but never zero. >> + * So, application can know it was updated. >> + */ >> + status->policyload = 0; >> + status->deny_unknown = !security_get_allow_unknown(); >> + } >> + } >> + result = selinux_status_page; >> + mutex_unlock(&selinux_status_lock); >> + >> + return result; >> +} >> + >> +/* >> + * selinux_status_update_setenforce >> + * >> + * It updates status of the current enforcing/permissive mode. >> + */ >> +void selinux_status_update_setenforce(int enforcing) >> +{ >> + struct selinux_kernel_status *status; >> + >> + mutex_lock(&selinux_status_lock); >> + if (selinux_status_page) >> + { >> + status = page_address(selinux_status_page); >> + >> + status->sequence++; >> + smp_wmb(); >> + >> + status->enforcing = enforcing; >> + >> + smp_wmb(); >> + status->sequence++; >> + } >> + mutex_unlock(&selinux_status_lock); >> +} >> + >> +/* >> + * selinux_status_update_policyload >> + * >> + * It updates status of the times of policy reloaded, and current >> + * setting of deny_unknown. >> + */ >> +void selinux_status_update_policyload(int seqno) >> +{ >> + struct selinux_kernel_status *status; >> + >> + mutex_lock(&selinux_status_lock); >> + if (selinux_status_page) >> + { >> + status = page_address(selinux_status_page); >> + >> + status->sequence++; >> + smp_wmb(); >> + >> + status->policyload = seqno; >> + status->deny_unknown = !security_get_allow_unknown(); >> + >> + smp_wmb(); >> + status->sequence++; >> + } >> + mutex_unlock(&selinux_status_lock); >> +} >> > > -- KaiGai Kohei <kaigai@xxxxxxxxxxxxx>
#include <errno.h> #include <fcntl.h> #include <selinux/selinux.h> #include <selinux/avc.h> #include <stdio.h> #include <stdint.h> #include <stdlib.h> #include <string.h> #include <sys/mman.h> #include <sys/types.h> #include <sys/stat.h> #include <unistd.h> #define N_TRIAL 10000000 typedef struct { uint32_t version; uint32_t sequence; uint32_t enforcing; uint32_t policyload; uint32_t deny_unknown; } selinux_status_t; static uint32_t compat_seqno = 0; static int test_callback(int code) { compat_seqno++; return 0; } int main(int argc, const char *argv[]) { selinux_status_t *status; security_context_t scontext = "staff_u:staff_r:staff_t:s0"; security_context_t tcontext = "system_u:object_r:etc_t:s0"; security_id_t ssid, tsid; security_class_t tclass; struct av_decision avd; int compat, fd, i; if (argc < 2) { printf("usage: %s (1: compat| 0: mmap)\n", argv[0]); return 1; } compat = atoi(argv[1]); if (avc_open(NULL, 0) < 0) { printf("failed on avc_open() : %s\n", strerror(errno)); return 1; } /* * I think existing avc code is a bit rough, because it checks * netlink socket on the head of avc_has_perm_noaudit(), but * not on the tail of the function. It cannot ensure the kernel * status is not updated during the function. * So, we also need to check it after the avc lookups, then * it shall repeat from the head, if something updated. */ avc_netlink_acquire_fd(); if (compat) { union selinux_callback cb; /* * callback functions to update sequence number */ cb.func_setenforce = test_callback; selinux_set_callback(SELINUX_CB_SETENFORCE, cb); cb.func_policyload = test_callback; selinux_set_callback(SELINUX_CB_POLICYLOAD, cb); } else { /* * Open /selinux/status and mmap it. */ fd = open("/selinux/status", O_RDONLY); if (fd < 0) { printf("failed to open /selinux/status : %s\n", strerror(errno)); return 1; } status = mmap(NULL, sysconf(_SC_PAGESIZE), PROT_READ, MAP_SHARED, fd, 0); if (status == MAP_FAILED) { printf("failed to mmap : %s\n", strerror(errno)); return 1; } } /* * Start benchmarking */ for (i = 0; i < N_TRIAL; i++) { uint32_t seqno; repeat: if (compat) { avc_netlink_check_nb(); seqno = compat_seqno; } else { seqno = status->sequence; __sync_synchronize(); } tclass = string_to_security_class("file"); avc_context_to_sid_raw(scontext, &ssid); avc_context_to_sid_raw(tcontext, &tsid); avc_has_perm_noaudit(ssid, tsid, tclass, -1, NULL, &avd); if (compat) { avc_netlink_check_nb(); if (seqno != compat_seqno) goto repeat; } else { __sync_synchronize(); if (seqno != status->sequence) goto repeat; } } }