+Cc: Greg
Hi Greg,
On 2022/11/11 23:51, James Bottomley wrote:
On Fri, 2022-11-11 at 22:44 +0800, Yang Yingliang wrote:
In sas_rphy_add(), if transport_add_device() fails, the device
is not added, the return value is not checked, it won't goto
error path, when removing rphy in normal remove path, it causes
null-ptr-deref, because transport_remove_device() is called to
remove the device that was not added.
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000108
pc : device_del+0x54/0x3d0
lr : device_del+0x37c/0x3d0
Call trace:
device_del+0x54/0x3d0
attribute_container_class_device_del+0x28/0x38
transport_remove_classdev+0x6c/0x80
attribute_container_device_trigger+0x108/0x110
transport_remove_device+0x28/0x38
sas_rphy_remove+0x50/0x78 [scsi_transport_sas]
sas_port_delete+0x30/0x148 [scsi_transport_sas]
do_sas_phy_delete+0x78/0x80 [scsi_transport_sas]
device_for_each_child+0x68/0xb0
sas_remove_children+0x30/0x50 [scsi_transport_sas]
sas_rphy_remove+0x38/0x78 [scsi_transport_sas]
sas_port_delete+0x30/0x148 [scsi_transport_sas]
do_sas_phy_delete+0x78/0x80 [scsi_transport_sas]
device_for_each_child+0x68/0xb0
sas_remove_children+0x30/0x50 [scsi_transport_sas]
sas_remove_host+0x20/0x38 [scsi_transport_sas]
scsih_remove+0xd8/0x420 [mpt3sas]
Fix this by checking and handling return value of
transport_add_device()
in sas_rphy_add().
Fixes: c7ebbbce366c ("[SCSI] SAS transport class")
Signed-off-by: Yang Yingliang <yangyingliang@xxxxxxxxxx>
---
v1 -> v2:
Update commit message.
---
drivers/scsi/scsi_transport_sas.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/scsi_transport_sas.c
b/drivers/scsi/scsi_transport_sas.c
index 74b99f2b0b74..accc0afa8f77 100644
--- a/drivers/scsi/scsi_transport_sas.c
+++ b/drivers/scsi/scsi_transport_sas.c
@@ -1526,7 +1526,11 @@ int sas_rphy_add(struct sas_rphy *rphy)
error = device_add(&rphy->dev);
if (error)
return error;
- transport_add_device(&rphy->dev);
+ error = transport_add_device(&rphy->dev);
+ if (error) {
+ device_del(&rphy->dev);
+ return error;
+ }
transport_configure_device(&rphy->dev);
if (sas_bsg_initialize(shost, rphy))
printk("fail to a bsg device %s\n", dev_name(&rphy-
dev));
There is a slight problem with doing this in that if
transport_device_add() ever fails it's likely because memory pressure
caused the allocation of the internal_container to fail. What that
means is that the visible sysfs attributes don't get added, but
otherwise the rphy is fully functional as far as the driver sees it, so
this condition doesn't have to be a fatal error which kills the device.
There are two ways of handling this:
1. The above to move the condition from an ignored to a fatal error.
It's so rare that we almost never see it in practice and if it
ever happened, the machine is so low on memory that something
else is bound to fail an allocation and kill the device anyway,
so treating it as non-fatal likely serves no purpose.
2. Simply to make the assumption that transport_remove_device() is
idempotent true by adding a flag in the internal_class to signify
removal is required. This would preserve current behaviour and
have the bonus that it only requires a single patch, not one
patch per transport class object that has this problem.
I'd probably prefer 2. since it's way less work, but others might have
different opinions.
Current some callers ignore the return value of transport_add_device(),
if it fails,
it will cause null-ptr-deref in transport_remove_device().
James suggested that add some check in transport_remove_device(), so all can
be fix in one patch.
Do you have any suggestion for this ?
Thanks,
Yang
James
.