Re: Bug in atmel-ecc driver

Uwe Kleine-König <u.kleine-koenig@xxxxxxxxxxxxxx> · Wed, 18 May 2022 23:36:38 +0200

Hello,

[fixed Ard's email address]

On Wed, May 18, 2022 at 10:07:32AM +0000, Tudor.Ambarus@xxxxxxxxxxxxx wrote:
> On 5/17/22 17:33, Uwe Kleine-König wrote:
> > On Tue, May 17, 2022 at 01:11:22PM +0000, Tudor.Ambarus@xxxxxxxxxxxxx wrote:
> >> On 5/17/22 13:24, Uwe Kleine-König wrote:
> >>> On Fri, May 13, 2022 at 03:59:54PM +0200, Uwe Kleine-König wrote:
> >>>> TL;DR: when a device bound to the drivers/crypto/atmel-ecc.c driver is
> >>>> unbound while tfm_count isn't zero, this probably results in a
> >>>> use-after-free.
> >>>>
> >>>> The .remove function has:
> >>>>
> >>>> 	if (atomic_read(&i2c_priv->tfm_count)) {
> >>>>                 dev_err(&client->dev, "Device is busy\n");
> >>>>                 return -EBUSY;
> >>>>         }
> >>>>
> >>>> before actually calling the cleanup stuff. If this branch is hit the
> >>>> result is likely:
> >>>>
> >>>>  - "Device is busy" from drivers/crypto/atmel-ecc.c
> >>>>  - "remove failed (EBUSY), will be ignored" from the i2c core
> >>>>  - the devm cleanup callbacks are called, including the one kfreeing
> >>>>    *i2c_priv
> >>>>  - at a later time atmel_ecc_i2c_client_free() is called which does
> >>>>    atomic_dec(&i2c_priv->tfm_count);
> >>>>  - *boom*
> >>>>
> >>>> I think to fix that you need to call get_device for the i2c device
> >>>> before increasing tfm_count (and a matching put_device when decreasing
> >>>> it). Having said that the architecture of this driver looks strange to
> >>>> me, so there might be nicer fixes (probably with more effort).
> >>> I tried to understand the architecture a bit, what I found is
> >>> irritating. So the atmel-ecc driver provides a static struct kpp_alg
> >>> atmel_ecdh_nist_p256 which embeds a struct crypto_alg (.base). During
> >>> .probe() it calls crypto_register_kpp on that global kpp_alg. That is,
> >>> if there are two or more devices bound to this driver, the same kpp_alg
> >>> structure is registered repeatedly.  This involves (among others)
> >>>
> >>>  - refcount_set(&atmel_ecdh_nist_p256.base.cra_refcount)
> >>>    in crypto_check_alg()
> >>>  - INIT_LIST_HEAD(&atmel_ecdh_nist_p256.base.cra_users)
> >>>    in __crypto_register_alg()
> >>>
> >>> and then a check about registering the same alg twice which makes the
> >>> call crypto_register_alg() return -EEXIST. So if a second device is
> >>> bound, it probably corrupts the first device and then fails to probe.
> >>>
> >>> So there can always be (at most) only one bound device which somehow
> >>> makes the whole logic in atmel_ecdh_init_tfm ->
> >>> atmel_ecc_i2c_client_alloc to select the least used(?) i2c client among
> >>> all the bound devices ridiculous.
> >> It's been a while since I last worked with ateccx08, but as far as I remember
> >> it contains 3 crypto IPs (ecdh, ecdsa, sha) that communicate over the same
> >> i2c address. So if someone adds support for all algs and plug in multiple
> >> ateccx08 devices, then the distribution of tfms across the i2c clients may work.
> > It would require to register the crypto backends independent of the
> > .probe() routine though.
> > 
> >> Anyway, if you feel that the complexity is superfluous as the code is now, we
> >> can get rid of the i2c_client_alloc logic and add it later on when/if needed.
> > If it's you who acts, do whatever pleases you. If it's me I'd go for a
> > quick and simple solution to get back to what I originally want to do
> > with this driver.
> > 
> > So I'd go for something like
> > 
> > diff --git a/drivers/crypto/atmel-ecc.c b/drivers/crypto/atmel-ecc.c
> > index 333fbefbbccb..e7f3f4793c55 100644
> > --- a/drivers/crypto/atmel-ecc.c
> > +++ b/drivers/crypto/atmel-ecc.c
> > @@ -349,8 +349,13 @@ static int atmel_ecc_remove(struct i2c_client *client)
> >  
> >  	/* Return EBUSY if i2c client already allocated. */
> >  	if (atomic_read(&i2c_priv->tfm_count)) {
> > -		dev_err(&client->dev, "Device is busy\n");
> > -		return -EBUSY;
> > +		/*
> > +		 * After we return here, the memory backing the device is freed.
> > +		 * If there is still some action pending, it probably involves
> > +		 * accessing free'd memory.
> 
> would be good to explain why i2c core will ignore -EBUSY.

In general it's impossible to do error handling (e.g. retry calling the
remove callback) because the device being removed might already be
physically gone. That mostly doesn't apply to i2c, but that's how the
device model works. So if you look into the i2c core: the remove
callback is called from i2c_device_remove(), which is the remove
callback for the i2c bus. Its prototype is:

	static void i2c_device_remove(struct device *dev)

so there is no way to pass an error to the device core layer.
Until fc7a6209d5710618eb4f72a77cd81b8d694ecf89 this wasn't so obvious:
many busses returned an int, which was however ignored by the driver
core. The quest here is to change the bus-specific methods to return
void, too. So the eventual goal for i2c is:

diff --git a/include/linux/i2c.h b/include/linux/i2c.h
index fbda5ada2afc..066b541a0d5d 100644
--- a/include/linux/i2c.h
+++ b/include/linux/i2c.h
@@ -273,7 +273,7 @@ struct i2c_driver {
 
 	/* Standard driver model interfaces */
 	int (*probe)(struct i2c_client *client, const struct i2c_device_id *id);
-	int (*remove)(struct i2c_client *client);
+	void (*remove)(struct i2c_client *client);
 
 	/* New driver model interface to aid the seamless removal of the
 	 * current probe()'s, more commonly unused than used second parameter.
 

so that i2c driver authors don't assume they can return an error value.

See

	a0386bba70934d42f586eaf68b21d5eeaffa7bd0
	b2c943e52705b211d1aa0633c9196150cf30be47
	15f83bb0191261adece5a26bfdf93c6eccdbc0bb

for a few more examples.
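
To make the control flow above concrete, here is a minimal user-space
sketch. It is only a model: struct model_device and all model_*()
functions are hypothetical names made up for illustration, not kernel
APIs, and the resources_freed flag merely stands in for the devm
cleanup. It shows that the int returned by an old-style driver callback
stops at the void bus callback and that the cleanup runs either way.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_device {
	int tfm_count;		/* users still holding a reference */
	int resources_freed;	/* set once the devm-like cleanup ran */
};

/* Old-style driver callback: returns an int that nobody can act on. */
static int model_driver_remove(struct model_device *dev)
{
	if (dev->tfm_count) {
		fprintf(stderr, "Device is busy\n");
		return -EBUSY;
	}
	return 0;
}

/* The bus-level callback returns void, so the error stops here. */
static void model_bus_remove(struct model_device *dev)
{
	int ret = model_driver_remove(dev);

	if (ret)
		fprintf(stderr, "remove failed (%d), will be ignored\n", ret);
}

/* Models the unbind path: it always runs to completion. */
static void model_unbind(struct model_device *dev)
{
	assert(dev != NULL);

	model_bus_remove(dev);

	/* Stand-in for the devm cleanup, which runs regardless of ret. */
	dev->resources_freed = 1;
}
```

In this model a device with tfm_count == 1 still ends up with
resources_freed == 1 after model_unbind(), which is the situation in
which a later atomic_dec() on the freed structure would go wrong.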

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |