I'm trying to set up the OIDC provider for RGW so that I can have roles that can be assumed by people logging in with their regular Azure AD identities. The client I'm planning to use is Cyberduck – it seems like one of the few GUI S3 clients that manages the OIDC login process in a way that could work for relatively naive users.

I've gotten a fair ways down the road. I've been able to configure Cyberduck so that it performs the login with Azure AD, gets an identity token, and then sends it to Ceph to engage with the AssumeRoleWithWebIdentity process. However, I then get an error, which shows up in the Ceph rgw logs like this:

2024-07-08T17:18:09.749+0000 7fb2d7845700 0 req 15967124976712370684 1.284013867s sts:assume_role_web_identity Signature validation failed: evp verify final failed: 0 error:0407008A:rsa routines:RSA_padding_check_PKCS1_type_1:invalid padding

I turned the logging for rgw up to 20 to see how much of the process succeeds and to learn more about what fails. I can then see logging messages from this file in the source code:

https://github.com/ceph/ceph/blob/08d7ff952d78d1bbda04d5ff7e3db1e733301072/src/rgw/rgw_rest_sts.cc

We get to WebTokenEngine::get_from_jwt, and it logs the JWT payload, which looks as expected. The logs then indicate that a request is sent to the /.well-known/openid-configuration endpoint that appears to be appropriate for the issuer of the JWT, and they eventually show what looks like a successful and appropriate response to that. The logs then show that a request is sent to the jwks_uri that is indicated in the openid-configuration document. The response to that is logged, and it appears to be appropriate. We then get some logging starting with "Certificate is", so it looks like we're getting as far as WebTokenEngine::validate_signature.
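For anyone who wants to repeat the same two lookups outside of Ceph, here is a minimal Python sketch of the discovery sequence described above. It is not what RGW runs; the function names are made up for illustration, and it assumes the issuer publishes its discovery document at the standard /.well-known/openid-configuration path. The fetch functions make real network requests, so only the two pure helpers are safe to run offline.

```python
import json
import urllib.request


def discovery_url(issuer: str) -> str:
    """Build the OIDC discovery URL for an issuer (the JWT's iss claim)."""
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


def list_key_ids(jwks: dict) -> list:
    """Return the kid of every key in a JWKS document, in listed order."""
    return [key.get("kid") for key in jwks.get("keys", [])]


def fetch_json(url: str) -> dict:
    """Fetch a URL and parse the body as JSON (performs a network request)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def fetch_key_ids(issuer: str) -> list:
    """Follow issuer -> openid-configuration -> jwks_uri and list the kids."""
    config = fetch_json(discovery_url(issuer))
    jwks = fetch_json(config["jwks_uri"])
    return list_key_ids(jwks)
```

Calling fetch_key_ids with the iss value from the token should list the same key IDs that show up in the jwks_uri response in the rgw log.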
So, several things appear to have happened successfully – we've loaded the OIDC provider that corresponds to the iss, and we've found a client ID that corresponds to what I registered when I configured things. (This is why I say we appear to be a fair ways down the road – a lot of this is working).

It looks as though what's happening in the code now is that it's iterating through the certificates given in the jwks_uri content. There are 6 certificates listed, but the code only gets as far as the first one.

Looking at the code, what appears to be happening is that, among the various certificates in the jwks_uri content, it's finding the first one which matches a thumbprint registered with Ceph (that is, which I registered with Ceph). This must be succeeding (for the first certificate), because the "Signature validation failed" logging comes later. So, the code does verify that the thumbprint of the first certificate matches one of the thumbprints I registered with Ceph for this OIDC provider.

We then get to a part of the code where it tries to verify the JWT using the certificate, with jwt::verify. Given what gets logged ("Signature validation failed: "), this must be throwing an exception.

The thing I find surprising about this is that there really isn't any reason to think that the first certificate listed in the jwks_uri content is going to be the certificate used to sign the JWT. If I understand JWT correctly, it's appropriate to sign the JWT with any of the certificates listed in the jwks_uri content. Furthermore, the JWT header includes a reference to the kid, so it's possible for Ceph to know exactly which certificate the JWT purports to be signed by. And, Ceph knows that there might be multiple thumbprints, because we can register 5. So, the logic of trying the first valid certificate in x5c and then stopping if it fails seems broken, actually.
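To make the point about the kid concrete, here is a rough Python sketch of how one could check, outside of Ceph, which jwks_uri entry a given token claims to be signed by, and what thumbprint that entry's certificate would have. It is a diagnostic only and verifies no signature. The function names are invented for illustration. It assumes each key carries its certificate as base64-encoded DER in x5c, and that the thumbprint is the SHA-1 of the DER bytes written as upper-case hex; the exact format Ceph expects when comparing thumbprints is an assumption here.

```python
import base64
import hashlib
import json


def b64url_decode(segment: str) -> bytes:
    """Decode an unpadded base64url segment, as used in JWTs."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def jwt_header(token: str) -> dict:
    """Return the decoded (but unverified) header of a compact JWT."""
    return json.loads(b64url_decode(token.split(".")[0]))


def select_key(jwks: dict, kid: str):
    """Return the JWKS entry whose kid matches, or None if there is none."""
    for key in jwks.get("keys", []):
        if key.get("kid") == kid:
            return key
    return None


def x5c_thumbprint(key: dict) -> str:
    """SHA-1 hex digest of the first x5c certificate (assumed to be DER)."""
    der = base64.b64decode(key["x5c"][0])
    return hashlib.sha1(der).hexdigest().upper()
```

Given the token Cyberduck sends and the jwks_uri response from the log, select_key(jwks, jwt_header(token)["kid"]) should pick out the entry the token says it was signed with, and x5c_thumbprint on that entry gives a value to compare against the thumbprints registered with Ceph.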
I suppose what I could do as a workaround is try to figure out whether Azure AD is consistently using the same kid to sign the JWTs for me, and then only register that thumbprint with Ceph. Then, Ceph would actually choose the correct certificate (as the others wouldn't match a thumbprint I registered). I may try this – in part, just to verify what I think is happening. But it would be awfully fragile – I don't believe there is any requirement in JWT that the issuer stick to just one of the listed certificates.

An alternative would be to try rewriting the code to apply a different kind of logic. The way it ought to work (it seems to me) is something like this:

* Get the openid-configuration, and get the jwks content from the jwks_uri (which Ceph does already).
* Look at the header of the JWT to see which kid it purports to be signed by.
* Find the certificate that corresponds to that kid (from the jwks_uri content).
* Validate the JWT with that certificate.

That ought to work, at least given what I'm seeing. (But, I'm not a JWT expert, so I don't know whether there is something unusual in how Azure AD generates JWTs and handles the jwks_uri content).

Anyway, I'm curious whether anyone else has been trying to get this to work with Azure AD, and whether they have run into similar problems. And, of course, whether I appear to be misunderstanding anything about how this is supposed to work.

Ryan Rempel
Director of Information Technology
Canadian Mennonite University