On Thu, 2017-09-28 at 08:58 +0000, Marian Rainer-Harbach wrote:
> Hi everyone,
>
> we are running a small 389 DS cluster on two RHEL 7.4 machines. The version installed is the most recent in the Red Hat repositories, 1.3.6.1-19.el7_4. 389 DS is used as user storage for the Keycloak single sign-on system. It contains about 150k person objects.
>
> To test the whole system, we are running load tests each night. These tests login 100 users per second in Keycloak for 15 minutes, which in turn authenticates the users against 389 DS. On our machines, this normally results in a very low CPU load by 389 DS, about 10-25%.
>
> Up to now we used SSHA512 as password hashing algorithm. We now would like to switch to PBKDF2: As a first test, we changed the password of the user that Keycloak uses to bind to 389 DS to PBKDF2 hashing. In this configuration, we encountered a problem: When running the load tests, the system behaves normally for the first few minutes. After this, 389 DS CPU usage suddenly jumps to almost 800% on one of the servers (the machines have 8 CPUs) and authentications become very slow. This continues for the remaining runtime of the load test. When running the test again, 389 DS again behaves normally for the first few minutes, then CPU usage jumps to 800%.
>
> When changing the password hash back to SSHA512, everything is fine again.
>
> To me this looks like a bug in 389 DS. Please let me know what information to provide so you can investigate.
>

Hey mate,

It is and is not a bug at the same time.

The root issue is confidential (I don't know why; I don't think it needs to be):

https://bugzilla.redhat.com/show_bug.cgi?id=1439272

Because we use the NSS crypto provider, we are affected by this. The tl;dr is that NSS uses the wrong pbkdf2 algorithm, so it's twice as slow as openssl.

We also set a time factor in the pbkdf2 code. You can see that here:

https://pagure.io/389-ds-base/blob/master/f/ldap/servers/plugins/pwdstorage/pbkdf2_pwd.c#_263

The summary is that we want an attacker to have to spend a minimum amount of time to attempt a hash, in this case:

https://pagure.io/389-ds-base/blob/master/f/ldap/servers/plugins/pwdstorage/pbkdf2_pwd.c#_48

Finally, we have a cap that says "if we think it's too little, use this baseline"; here it's 10,000.

So what that amounts to is:

* Either our time factor is too high, and that causes your high CPU and latency, or
* Your machine at start up calculated say ... 3000 rounds for the time cap, but then pushed it up to 10,000, and this is amplified by the NSS implementation bug.

I would be happy to open a bug about this. I think that perhaps the issue is that our minimum is too high, and that our time factor is too high.

https://pagure.io/389-ds-base/issue/49387

Hope that helps,

--
Sincerely,

William Brown
Software Engineer
Red Hat, Australia/Brisbane
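
To make the rounds calculation described above a bit more concrete, here is a rough Python sketch of the general idea: time a short PBKDF2 run at startup, scale the round count to hit a target duration, then raise it to a minimum if the estimate comes out too low. This is not the 389 DS code. The function names, the target time, and the benchmark size are all made up for illustration; only the 10,000 baseline comes from the message, and the linear scaling is an assumption.

```python
import hashlib
import os
import time

# Only the 10,000 baseline is taken from the message above.
MIN_ROUNDS = 10_000
# Hypothetical values, chosen purely for illustration.
TARGET_SECONDS = 0.04
BENCH_ROUNDS = 2_000


def benchmark_seconds(rounds: int) -> float:
    """Time a single PBKDF2-HMAC-SHA256 derivation with the given round count."""
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.pbkdf2_hmac("sha256", b"benchmark-password", salt, rounds)
    return time.perf_counter() - start


def rounds_for_target(elapsed: float, bench_rounds: int,
                      target: float, floor: int) -> int:
    """Scale the benchmarked round count linearly to the target time,
    then apply the minimum ("if we think it's too little, use this baseline")."""
    if elapsed <= 0:
        return floor
    estimated = int(bench_rounds * target / elapsed)
    return max(estimated, floor)


if __name__ == "__main__":
    elapsed = benchmark_seconds(BENCH_ROUNDS)
    rounds = rounds_for_target(elapsed, BENCH_ROUNDS, TARGET_SECONDS, MIN_ROUNDS)
    print(f"{BENCH_ROUNDS} rounds took {elapsed:.4f}s, using {rounds} rounds")
```

In this sketch, a machine whose benchmark suggests about 3000 rounds still ends up at 10,000, so every bind costs more than three times the time that was budgeted for it. That is the second scenario in the list above, before the NSS slowdown is added on top.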