Yes indeed this is a stress test on ARM64 device with multicore where most of the cores /tasks are stuck in avc_reclaim_node . We still see this issue even after picking the earlier patch " selinux: ensure we cleanup the internal AVC counters on error in avc_insert() commit: d8db60cb23e4" Where selinux_state during issue was as below where all the slots are NULL and the count was more than threshold. Which seem to be calling avc_reclaim_node always and as the all the slots are empty its going for full for- loop with locks and unlock and taking too long . Not sure what could make the slots null , for sure its not due to flush() /Reset(). We think that still we need to call avc_kill_node in update_node function . Adding the patch below can you please review or correct the following patch . selinux_state = ( disabled = FALSE, enforcing = TRUE, checkreqprot = FALSE, initialized = TRUE, policycap = (TRUE, TRUE, TRUE, FALSE, FALSE, TRUE), avc = 0xFFFFFF9BEFF1E890 -> ( avc_cache_threshold = 512, /* <<<<<not configured and its with default*/ avc_cache = ( slots = ((first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first /*<<<< all are NULL */ slots_lock = ((rlock = (raw_lock = (val = (counter = 0), locked = 0, pending = 0, locked_pending = 0, tail = 0), magic = 3735899821, owner_cpu = 4294967295, owner = 0xFFFFFFFFFFFFFFFF, dep_map = (key = 0xFFFFFF9BEFF298A8, cla lru_hint = (counter = 616831529), active_nodes = (counter = 547), /*<<<<< increased more than 512*/ latest_notif = 1)), ss = 0xFFFFFF9BEFF2E578) -- In AVC update we don't call avc_node_kill() when avc_xperms_populate() fails, resulting in the avc->avc_cache.active_nodes counter having a false value.In last patch this changes was missed , so correcting it. Change-Id: Ic0298162cc766c0f21be7ab232e259766654dad3 Signed-off-by: Jaihind Yadav<jaihindyadav@xxxxxxxxxxxxxx> --- security/selinux/avc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security/selinux/avc.c b/security/selinux/avc.c index 91d24c2..3d1cff2 100644 --- a/security/selinux/avc.c +++ b/security/selinux/avc.c @@ -913,7 +913,7 @@ static int avc_update_node(struct selinux_avc *avc, if (orig->ae.xp_node) { rc = avc_xperms_populate(node, orig->ae.xp_node); if (rc) { - kmem_cache_free(avc_node_cachep, node); + avc_node_kill(avc, node); goto out_unlock; } } -- Regards, Ravi -----Original Message----- From: Stephen Smalley <sds@xxxxxxxxxxxxx> Sent: Wednesday, December 11, 2019 9:24 PM To: rsiddoji@xxxxxxxxxxxxxx; selinux@xxxxxxxxxxxxxxx Cc: paul@xxxxxxxxxxxxxx; linux-security-module@xxxxxxxxxxxxxxx Subject: Re: Looks like issue in handling active_nodes count in 4.19 kernel . On 12/11/19 10:35 AM, rsiddoji@xxxxxxxxxxxxxx wrote: > Thanks for tacking the patch fwd . On the question : > > Actually issue started when we were seeing most of the time "avc_reclaim_node" in the stack . > Which on debugging further avc_cache.active_nodes was already in 7K+ > nodes and as the logic is > > As below . > if (atomic_inc_return(&avc->avc_cache.active_nodes) > avc->avc_cache_threshold) > avc_reclaim_node(avc); > > So if the active_nodes count is > 512 (if not configured) we will be always be calling avc_reclaim_node() and eventually for each node insert we will be calling avc_reclaim_node and might be expansive then using > cache and advantage of cache might be null and void due to this overhead? Was this on a system with the default avc_cache_threshold value or was it set higher by the distro/user? If it was still 512 or any value significantly less than 7K, then the bug is that it ever reached 7K in the first place. The first bug should only trigger under severe memory pressure. The other potential reason for growing numbers of active nodes would be cache thrashing leading to avc_reclaim_node() being unable to take the lock on any buckets and therefore unable to release nodes. Possibly you need a larger cache threshold set on this system. It can be set via /sys/fs/selinux/avc/cache_threshold. Allowing AVC_CACHE_RECLAIM to also be set via selinuxfs or computed relative to avc_cache_threshold would make sense as a further improvement. > > Thanks , > Ravi > > -----Original Message----- > From: selinux-owner@xxxxxxxxxxxxxxx <selinux-owner@xxxxxxxxxxxxxxx> On > Behalf Of Stephen Smalley > Sent: Wednesday, December 11, 2019 8:18 PM > To: rsiddoji@xxxxxxxxxxxxxx; selinux@xxxxxxxxxxxxxxx > Cc: paul@xxxxxxxxxxxxxx; linux-security-module@xxxxxxxxxxxxxxx > Subject: Re: Looks like issue in handling active_nodes count in 4.19 kernel . > > On 12/11/19 9:37 AM, Stephen Smalley wrote: >> On 12/9/19 1:30 PM, rsiddoji@xxxxxxxxxxxxxx wrote: >>> Thanks for quick response , yes it will be helpful if you can raise >>> the change . >>> On the second issue in avc_alloc_node we are trying to check the >>> slot status as active_nodes > 512 ( default ) Where checking >>> the occupancy should be corrected as active_nodes >>>> 80% of slots occupied or 16*512 or >>> May be we need to use a different logic . >> >> Are you seeing an actual problem with this in practice, and if so, >> what exactly is it that you are seeing and do you have a reproducer? > > BTW, on Linux distributions, there is an avcstat(8) utility that can > be used to monitor the AVC statistics, or you can directly read the > stats from the kernel via /sys/fs/selinux/avc/cache_stats > >> >>> >>>> /*@ static struct avc_node *avc_alloc_node(struct selinux_avc *avc) >>>> */ >>>> >>>> if (atomic_inc_return(&avc->avc_cache.active_nodes) > >>>> avc->avc_cache_threshold) // default threshold is >>>> 512 >>>> avc_reclaim_node(avc); >>>> >>> >>> Regards, >>> Ravi >>> >>> -----Original Message----- >>> From: selinux-owner@xxxxxxxxxxxxxxx <selinux-owner@xxxxxxxxxxxxxxx> >>> On Behalf Of Stephen Smalley >>> Sent: Monday, December 9, 2019 11:35 PM >>> To: rsiddoji@xxxxxxxxxxxxxx; selinux@xxxxxxxxxxxxxxx >>> Cc: paul@xxxxxxxxxxxxxx; linux-security-module@xxxxxxxxxxxxxxx >>> Subject: Re: Looks like issue in handling active_nodes count in 4.19 >>> kernel . >>> >>> On 12/9/19 10:55 AM, rsiddoji@xxxxxxxxxxxxxx wrote: >>>> Hi team , >>>> Looks like we have issue in handling the "active_nodes" count in >>>> the Selinux - avc.c file. >>>> Where avc_cache.active_nodes increase more than slot array and >>>> code frequency calling of avc_reclaim_node() from >>>> avc_alloc_node() ; >>>> >>>> Where following are the 2 instance which seem to possible culprits >>>> which are seen on 4.19 kernel . Can you comment if my understand >>>> is wrong. >>>> >>>> >>>> #1. if we see the active_nodes count is incremented in >>>> avc_alloc_node >>>> (avc) which is called in avc_insert() Where if the code take >>>> failure path on avc_xperms_populate the code will not decrement >>>> this counter . >>>> >>>> >>>> static struct avc_node *avc_insert(struct selinux_avc *avc, >>>> u32 ssid, u32 tsid, u16 tclass, >>>> struct av_decision *avd, .... >>>> node = avc_alloc_node(avc); //incremented here .... >>>> rc = avc_xperms_populate(node, xp_node); // >>>> possibilities of this getting failure is there . >>>> if (rc) { >>>> kmem_cache_free(avc_node_cachep, node); // but on >>>> failure we are not decrementing active_nodes ? >>>> return NULL; >>>> } >>> >>> I think you are correct; we should perhaps be calling >>> avc_node_kill() here as we do in an earlier error path? >>> >>>> >>>> #2. where it looks like the logic on comparing the active_nodes >>>> against avc_cache_threshold seems wired as the count of active >>>> nodes is always going to be >>>> more than 512 will may land in simply removing /calling >>>> avc_reclaim_node frequently much before the slots are full maybe we >>>> are not using cache at best ? >>>> we should be comparing with some high watermark ? or my >>>> understanding wrong ? >>>> /*@ static struct avc_node *avc_alloc_node(struct selinux_avc *avc) >>>> */ >>>> >>>> if (atomic_inc_return(&avc->avc_cache.active_nodes) > >>>> avc->avc_cache_threshold) // default threshold is >>>> 512 >>>> avc_reclaim_node(avc); >>>> >>> >>> Not entirely sure what you are asking here. avc_reclaim_node() >>> should reclaim multiple nodes up to AVC_CACHE_RECLAIM. Possibly >>> that should be configurable via selinuxfs too, and/or calculated >>> from avc_cache_threshold in some way? >>> >>> Were you interested in creating a patch to fix the first issue above >>> or looking to us to do so? >>> >>> >>> >> > >