Re: [PATCH] driver core: Add log when devtmpfs create node failed

yangxingui <yangxingui@xxxxxxxxxx> · Thu, 23 May 2024 19:01:40 +0800

On 2024/5/23 17:35, Greg KH wrote:
On Thu, May 23, 2024 at 05:23:07PM +0800, yangxingui wrote:
Hi Greg,

On 2024/5/23 15:25, Greg KH wrote:
On Thu, May 23, 2024 at 09:50:09AM +0800, yangxingui wrote:
Hi, Greg

On 2024/5/22 20:23, Greg KH wrote:
On Wed, May 22, 2024 at 11:43:46AM +0000, Xingui Yang wrote:
Currently, no error is logged when devtmpfs fails to create a device
node, so add a log message for it.

Why?  Who is going to do something with this?
When we run the lsscsi command after a disk is connected, we occasionally
find that some disks have no dev node, and these disks cannot be used.

Ok, but why do you think that devtmpfs create failed?
I found that lsscsi traverses the dev nodes and reads each node's major and
minor numbers. If no matching dev node is found, it displays "-" in place
of the device node.

However, no error is logged during the disk scan. Our analysis suggests
that devtmpfs may have failed to create the dev node, so we add a log
message there.

But is that the case?  Why is devtmpfs failing?  Shouldn't we fix that
instead?
My subsequent reply touches on these points.

The lsscsi output and kernel logs are as follows:

[root@localhost]# lsscsi
[9:0:4:0]	disk	ATA	ST10000NM0086-2A SN05	-

kernel: [586669.541218] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 link_rate=10(sata)
kernel: [586669.541341] sas: phy-9:0 added to port-9:0, phy_mask:0x1 (5000000000000900)
kernel: [586669.541511] sas: DOING DISCOVERY on port 0, pid:2330731
kernel: [586669.541518] hisi_sas_v3_hw 0000:b4:04.0: dev[4:5] found
kernel: [586669.630816] sas: Enter sas_scsi_recover_host busy: 0 failed: 0
kernel: [586669.665960] hisi_sas_v3_hw 0000:b4:04.0: phydown: phy0 phy_state=0xe
kernel: [586669.665964] hisi_sas_v3_hw 0000:b4:04.0: ignore flutter phy0 down
kernel: [586669.863360] hisi_sas_v3_hw 0000:b4:04.0: phyup: phy0 link_rate=10(sata)
kernel: [586670.024482] ata19.00: ATA-10: ST10000NM0086-2AA101, SN05, max UDMA/133
kernel: [586670.024487] ata19.00: 19532873728 sectors, multi 16: LBA48 NCQ (depth 32), AA
kernel: [586670.027471] ata19.00: configured for UDMA/133
kernel: [586670.027490] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
kernel: [586670.037541] sas: ata19: end_device-9:0: model:ST10000NM0086-2AA101 serial:            ZA2B3PR2
kernel: [586670.100856] scsi 9:0:4:0: Direct-Access     ATA ST10000NM0086-2A SN05 PQ: 0 ANSI: 5
kernel: [586670.101114] sd 9:0:4:0: [sdk] 19532873728 512-byte logical blocks: (10.0 TB/9.10 TiB)
kernel: [586670.101116] sd 9:0:4:0: [sdk] 4096-byte physical blocks
kernel: [586670.101125] sd 9:0:4:0: [sdk] Write Protect is off
kernel: [586670.101137] sd 9:0:4:0: [sdk] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
kernel: [586670.101620] sd 9:0:4:0: Attached scsi generic sg10 type 0
kernel: [586670.101714] sas: DONE DISCOVERY on port 0, pid:2330731, result:0
kernel: [586670.101731] sas: sas_form_port: phy0 belongs to port0 already(1)!
kernel: [586670.152512] sd 9:0:4:0: [sdk] Attached SCSI disk

Looks like sdk was found properly, what's the problem?

Yes, this problem occurs only occasionally. No error is logged while the
disk is scanned, but the disk cannot be used. We have confirmed that it is
related to fio testing: when the dev node does not exist, fio may create a
regular file at that path itself.

So that's a userspace issue.  If a device node is to be created, and the
file is already present with that name, yes, we will fail to create it
as obviously userspace did not want us to do so.

It's not the kernel's job to protect userspace from doing foolish things
itself, right?  :)
Yes.

If we want to solve this problem, should we delete any existing file first
when creating a dev node?

No.
Ok.

Or should we just print a message indicating that dev node creation failed?

We can do that, but will that cause error messages to be printed out for
normal situations today where userspace does this on purpose?

Again, this isn't fixing the root problem here (which is userspace doing
something it shouldn't be doing); adding kernel log messages might just be
noise at this point, given that it has been operating this way for many
years, if not decades.
Yes, there is currently no fix for the problem, and it rarely happens.
But once it does, the device is unusable and the cause is hard to track
down. In addition, devtmpfs can fail to create a dev node for many
reasons, including the scenario identified here and memory allocation
failures.

Thanks,
Xingui