Yes this is the first time that we are getting this stress tested done on v4.19 kernel . We had not tested this prior version of kernel though . Current proposed changes seems to really help and testing is still going on . As per the delta it looks change 6b6bc620 seem to be missing in earlier version of kernel not sure if this was the cause. Br , Ravi. -----Original Message----- From: Stephen Smalley <sds@xxxxxxxxxxxxx> Sent: Tuesday, December 17, 2019 9:54 PM To: Ravi Kumar Siddojigari <rsiddoji@xxxxxxxxxxxxxx>; selinux@xxxxxxxxxxxxxxx Cc: paul@xxxxxxxxxxxxxx; linux-security-module@xxxxxxxxxxxxxxx Subject: Re: Looks like issue in handling active_nodes count in 4.19 kernel . On 12/17/19 10:52 AM, Stephen Smalley wrote: > On 12/17/19 10:40 AM, Ravi Kumar Siddojigari wrote: >> Yes indeed this is a stress test on ARM64 device with multicore >> where most of the cores /tasks are stuck in avc_reclaim_node . >> We still see this issue even after picking the earlier patch " >> selinux: ensure we cleanup the internal AVC counters on error in >> avc_insert() commit: d8db60cb23e4" >> Where selinux_state during issue was as below where all the slots >> are NULL and the count was more than threshold. >> Which seem to be calling avc_reclaim_node always and as the all the >> slots are empty its going for full for- loop with locks and unlock >> and taking too long . >> Not sure what could make the slots null , for sure its not due to >> flush() /Reset(). We think that still we need to call avc_kill_node >> in update_node function . >> Adding the patch below can you please review or correct the following >> patch . >> >> >> selinux_state = ( >> disabled = FALSE, >> enforcing = TRUE, >> checkreqprot = FALSE, >> initialized = TRUE, >> policycap = (TRUE, TRUE, TRUE, FALSE, FALSE, TRUE), >> avc = 0xFFFFFF9BEFF1E890 -> ( >> avc_cache_threshold = 512, /* <<<<<not configured and its >> with default*/ >> avc_cache = ( >> slots = ((first = 0x0), (first = 0x0), (first = 0x0), (first >> = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), >> (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first >> /*<<<< all are NULL */ >> slots_lock = ((rlock = (raw_lock = (val = (counter = 0), >> locked = 0, pending = 0, locked_pending = 0, tail = 0), magic = >> 3735899821, owner_cpu = 4294967295, owner = 0xFFFFFFFFFFFFFFFF, >> dep_map = (key = 0xFFFFFF9BEFF298A8, cla >> lru_hint = (counter = 616831529), >> active_nodes = (counter = 547), /*<<<<< increased more >> than 512*/ >> latest_notif = 1)), >> ss = 0xFFFFFF9BEFF2E578) >> >> >> -- >> In AVC update we don't call avc_node_kill() when >> avc_xperms_populate() fails, resulting in the >> avc->avc_cache.active_nodes counter having a false value.In last patch this changes was missed , so correcting it. >> >> Change-Id: Ic0298162cc766c0f21be7ab232e259766654dad3 >> Signed-off-by: Jaihind Yadav<jaihindyadav@xxxxxxxxxxxxxx> >> --- >> security/selinux/avc.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/security/selinux/avc.c b/security/selinux/avc.c index >> 91d24c2..3d1cff2 100644 >> --- a/security/selinux/avc.c >> +++ b/security/selinux/avc.c >> @@ -913,7 +913,7 @@ static int avc_update_node(struct selinux_avc >> *avc, >> if (orig->ae.xp_node) { >> rc = avc_xperms_populate(node, orig->ae.xp_node); >> if (rc) { >> - kmem_cache_free(avc_node_cachep, node); >> + avc_node_kill(avc, node); >> goto out_unlock; >> } >> } >> -- > > That looks correct to me; I guess that one got missed by the prior fix. > Still not sure how your AVC got into that state though... > > Acked-by: Stephen Smalley <sds@xxxxxxxxxxxxx> BTW, have you been running these stress tests on earlier kernels too? If so, what version(s) are known to pass them? I ask because this code has been present since v4.3 and this is the first such report.