On 6/5/24 02:00, Paul M Stillwell Jr wrote:
> On 6/4/2024 6:51 AM, Jiwei Sun wrote:
>> From: Jiwei Sun <sunjw10@xxxxxxxxxx>
>>
>> During boot, the following error messages appear:
>>
>> (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: Unable to get real path for '/sys/bus/pci/drivers/vmd/0000:c7:00.5/domain/device''
>> (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: /dev/nvme1n1 is not attached to Intel(R) RAID controller.'
>> (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: No OROM/EFI properties for /dev/nvme1n1'
>> (udev-worker)[2149]: nvme1n1: '/sbin/mdadm -I /dev/nvme1n1'(err) 'mdadm: no RAID superblock on /dev/nvme1n1.'
>> (udev-worker)[2149]: nvme1n1: Process '/sbin/mdadm -I /dev/nvme1n1' failed with exit code 1.
>>
>> This symptom prevents the OS from booting successfully.
>>
>> After an NVMe disk is probed/added by the nvme driver, udevd executes
>> some rule scripts by invoking the mdadm command to detect whether there
>> is an mdraid array associated with this NVMe disk. mdadm determines
>> whether an NVMe device is connected to a particular VMD domain by
>> checking the domain symlink. Here is the root cause:
>>
>> Thread A                    Thread B               Thread mdadm
>> vmd_enable_domain
>>   pci_bus_add_devices
>>     __driver_probe_device
>>      ...
>>      work_on_cpu
>>        schedule_work_on
>>        : wakeup Thread B
>>                             nvme_probe
>>                             : wakeup scan_work
>>                               to scan nvme disk
>>                               and add nvme disk
>>                               then wakeup udevd
>>                                                    : udevd executes
>>                                                      mdadm command
>>        flush_work                                  main
>>        : wait for nvme_probe done                  ...
>>     __driver_probe_device                          find_driver_devices
>>     : probe next nvme device                       : 1) Detect the domain
>>     ...                                              symlink; 2) Find the
>>     ...                                              domain symlink from
>>     ...                                              vmd sysfs; 3) The
>>     ...                                              domain symlink is not
>>     ...                                              created yet, failed
>>   sysfs_create_link
>>   : create domain symlink
>>
>> sysfs_create_link() is invoked at the end of vmd_enable_domain().
>> However, this implementation introduces a timing issue: mdadm may
>> fail to retrieve the vmd symlink path because the symlink has not
>> been created yet.
>>
>> Fix the issue by creating the VMD domain symlinks before invoking
>> pci_bus_add_devices().
>>
>> Signed-off-by: Jiwei Sun <sunjw10@xxxxxxxxxx>
>> Suggested-by: Adrian Huang <ahuang12@xxxxxxxxxx>
>> ---
>> v2 changes:
>>  - Add "()" after function names in subject and commit log
>>  - Move sysfs_create_link() after vmd_attach_resources()
>>
>> ---
>>  drivers/pci/controller/vmd.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/pci/controller/vmd.c b/drivers/pci/controller/vmd.c
>> index 87b7856f375a..d0e33e798bb9 100644
>> --- a/drivers/pci/controller/vmd.c
>> +++ b/drivers/pci/controller/vmd.c
>> @@ -925,6 +925,9 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>>  	dev_set_msi_domain(&vmd->bus->dev,
>>  			   dev_get_msi_domain(&vmd->dev->dev));
>>
>> +	WARN(sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj,
>> +		    "domain"), "Can't create symlink to domain\n");
>> +
>
> I think you should move the sysfs_remove_link() line in vmd_remove()
> down as well.

Indeed, thanks for your suggestion. I will modify it in the v3 patch.

Thanks,
Regards,
Jiwei

>
> Paul
>
>>  	vmd_acpi_begin();
>>
>>  	pci_scan_child_bus(vmd->bus);
>> @@ -964,9 +967,6 @@ static int vmd_enable_domain(struct vmd_dev *vmd, unsigned long features)
>>  	pci_bus_add_devices(vmd->bus);
>>
>>  	vmd_acpi_end();
>> -
>> -	WARN(sysfs_create_link(&vmd->dev->dev.kobj, &vmd->bus->dev.kobj,
>> -		    "domain"), "Can't create symlink to domain\n");
>>  	return 0;
>>  }
>>