Hi Konrad, Konrad Rzeszutek wrote: >>> Could also be a race condition that is present in SLES10 + RHEL5 >>> kernels. Where the SysFS directories are created (and the udev event it >>> sent out), but the kernel hasn't populated the SysFS directories. So >>> when multipathd tries to read them it finds no pertient information and >>> shoves it off to the 'orphan' state. >>> >> Really? With SLES10? Have you actually observed this? > > With SLES10 SP2 to be exact. It wasn't an issue with SLES10 since the > initial patch was there. The equipment I used to test this was an > AX150FC with failed batteries (so no cache writes) and with a failed > controller so it would run extra slow. > >> We're running multipath _after_ udev has processed the event. > > Right, the one where the SysFS directory is created. Then multipatd > reads the data. I remember posting it here and mentioning that this > problem exists on SLES10SP2 and RHEL5 but not on the upstream kernels. > >> And udev already waited for sysfs, so we should be safe there. > > Not so. The udev gets the SCSI uevent creation, creates the /dev/sdX, and > so. But the kernel hasn't yet fully populated the SysFS entries (so > /sys/block/sdX/device/vendor does exist, but has no data in it). >> It might be applicable to mainline multipath-tools, but > > It really depends on how the SysFS directories are populated and how > slow the SCSI target is. > >> the SLES10 one ... I'd be surprised. >> >> Well, reasonably surprised. multipath keeps on throwing >> an amazing number of issues still. >> >> Do you have more information here? > > Here is the patch along with a detailed description. > > The "multipath-tools-add-wait" patch is a backport/write of the > wait_for_file routine used in the sysfs_get_[vendor|model|rev] > macros. The SLES10 SP2 back-ported a lot of the upstream features > of multipath, and one of those was getting rid of this function. > I haven't yet found out the reason why it was deleted - looks > as if a mistake as the upstream kernel _should_ cause the same > set of problems with multipath. > [update: Upstream kernel has this fixed] > > The reason a wait is necessary is due to the way the kernel > sends the event. When a SCSI device is added the SCSI subsystem > pursues this path: > > _sysfs_add_sdev: > calls device_add ... > [ '/devices/platform/host16/session6/target16:0:0/16:0:0:17'] uevent > bus_attach_device > bus_for_each_drv > driver_probe_device > sd_probe > ['/class/scsi_disk/16:0:0:17' ] uevent > add_disk > ['/block/sdai'] [ Here multipath starts its job ] > > calls class_device_add ... > [ '/class/scsi_device/16:0:0:17' ] uevent > sg_add: > [ '/class/scsi_generic/sg35' ] uevent > > > done with device_add, and now we add the attributes: > --> scsi_sysfs_sdev_attrs[i].vendor, model, rev <-- THIS is the > problem. > > [Multipathd at the 'block/sdai' event has started analyzing the data, and > it reads the SysFS, but the 'vendor', 'model' have no data so multipathd > discards them an orphans the devices. That data gets to be there once > 'device_add' is finished.] > Ah. Hmm. Seems you are correct. I'll have to apply the patch, then. Fancy opening a bugzilla for it? Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@xxxxxxx +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg) -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel