Re: [PATCH v6 2/2] s390/crypto: New s390 specific protected key hash phmac

Harald Freudenberger <freude@xxxxxxxxxxxxx> · Mon, 02 Dec 2024 18:25:22 +0100

On 2024-11-29 15:48, Herbert Xu wrote:
On Fri, Nov 29, 2024 at 12:10:58PM +0100, Harald Freudenberger wrote:

+static inline int phmac_keyblob2pkey(const u8 *key, unsigned int keylen,
+				     struct phmac_protkey *pk)
+{
+	int i, rc = -EIO;
+
+	/* try three times in case of busy card */
+	for (i = 0; rc && i < 3; i++) {
+		if (rc == -EBUSY && msleep_interruptible(1000))
+			return -EINTR;

You can't sleep in an ahash algorithm either.  What you can do
however is schedule a delayed work and pick up where you left
off.  That's how asynchronous completion works.

But my question still stands, under what circumstances can
this fail? I don't think storage folks will be too happy with
a crypto algorithm that can produce random failures.

Cheers,

- Whether the attempt to derive a protected key usable by the cpacf
  instructions can fail depends on the raw key material used. For
  'clear key' material the derivation is a simple instruction which
  can't fail. The preferred way however is to use 'secure key' material,
  which is transferred to a crypto card and re-wrapped there to be
  usable with cpacf instructions. This requires communication with a
  crypto card and thus may fail: because there is no card at all, or
  there is temporarily no card available, or the card is in a bad state.
  If there is no usable card the AP bus returns -EBUSY via the
  pkey_key2protkey() function and triggers an asynchronous bus scan.
  As long as this scan is running (usually about 100ms or so) -EBUSY is
  returned to indicate that the caller should retry "later". Other
  states are covered by other return codes like -ENODEV or -EIO, and for
  these the caller is not supposed to loop but should fail. When there
  is no accessible hardware available to derive a protected key, either
  the user or the admin broke something or something went badly wrong,
  and then there is no help: the storage device must fail.
- How can it happen that a re-derive is needed? A re-derive is triggered
  when the cpacf instruction detects that the protected key is no longer
  valid. A protected key includes a verification pattern (hash) of the
  firmware key used to encrypt the key. This hash is checked on each
  invocation of a cpacf instruction. So when the code execution "awakes"
  on another machine ("live guest migration" of a KVM guest to another
  machine) the next cpacf instruction will complain about a verification
  pattern mismatch and the protected key needs to get re-derived from
  the source material. It could also happen via suspend/resume on the
  very same machine when something happens in between (for example the
  whole machine goes through a cold start). It does NOT happen out of
  the blue without any reason, but the crypto algorithm implementation
  is not aware that a "live guest migration" or a "suspend/resume cycle"
  just happened, so from its point of view it looks like this occurred
  suddenly.
- Do I understand you correctly that a completion is ok? I always had
  the impression that waiting on a completion is also a sleeping
  operation and thus not allowed?
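
To make the return code handling from the first point concrete, here is
a minimal user-space sketch. It is not the code from the patch: the
names derive_with_retry(), derive_fn, backoff_fn and struct fake_card
are made up for illustration, and the fake card only simulates a device
that reports -EBUSY a few times. The only behavior taken from the text
above is that -EBUSY means "retry later" while any other error is
final. The backoff hook is a placeholder for whatever deferral is
allowed in the calling context (in an ahash that would have to be
something like a delayed work item rather than a sleep).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a derivation call such as pkey_key2protkey():
 * returns 0 on success or a negative errno value. */
typedef int (*derive_fn)(void *ctx);

/* Hypothetical "wait before the next attempt" hook. */
typedef void (*backoff_fn)(void *ctx);

/*
 * Retry only while the derivation reports -EBUSY. Success and every
 * other error code (-ENODEV, -EIO, ...) end the loop immediately.
 * Returns -EBUSY if the device was still busy after max_tries attempts.
 */
static int derive_with_retry(derive_fn derive, backoff_fn backoff,
			     void *ctx, int max_tries)
{
	int rc = -EIO;
	int i;

	for (i = 0; i < max_tries; i++) {
		rc = derive(ctx);
		if (rc != -EBUSY)
			return rc;
		/* no point in waiting after the final attempt */
		if (backoff && i + 1 < max_tries)
			backoff(ctx);
	}
	return rc;
}

/* Simulated card: busy for 'busy_calls' attempts, then 'final_rc'. */
struct fake_card {
	int busy_calls;
	int final_rc;
	int calls;	/* number of derive attempts seen */
	int waits;	/* number of backoff invocations seen */
};

static int fake_derive(void *ctx)
{
	struct fake_card *card = ctx;

	card->calls++;
	if (card->busy_calls > 0) {
		card->busy_calls--;
		return -EBUSY;
	}
	return card->final_rc;
}

static void fake_backoff(void *ctx)
{
	struct fake_card *card = ctx;

	card->waits++;
}
```

With three tries, a card that is busy twice and then succeeds yields 0
after three attempts, a card that reports -ENODEV fails on the first
attempt without any retry, and a card that stays busy yields -EBUSY to
the caller.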

Thanks for your help and for being so patient with us.