On 05/25/2011 11:29 PM, Christophe Varoqui wrote:
On a 'multipathd reconfigure' command, the uxclient gets stuck, and
strace of the multipathd daemon shows:
$ sudo strace -f -p 17721
Process 17721 attached with 7 threads - interrupt to quit
[pid 17757] futex(0x7fdc6a1540a4, FUTEX_WAIT_PRIVATE, 3, NULL <unfinished ...>
[pid 17756] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 17755] ioctl(3, DM_DEV_WAIT <unfinished ...>
[pid 17724] futex(0x11167f0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 17723] recvmsg(6, <unfinished ...>
[pid 17722] futex(0x110a1b4, FUTEX_WAIT_PRIVATE, 15, NULL <unfinished ...>
[pid 17721] futex(0x612624, FUTEX_WAIT_PRIVATE, 1, NULL
OK, I traced it to commit 9e7b4d8d6fa8dc9433c1e60d4bd6717aec2f5296.
There you added acquire/release of the vector lock inside
multipathd/main.c:reconfigure(), but as the LCKDBG trace shows,
the lock is already held by the caller,
multipathd/main.c:uxsock_trigger().
Hence lock -> lock = hang.
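
The lock -> lock pattern described above can be reproduced in isolation.
The following is a minimal sketch, not multipathd code: vecs_lock,
init_lock(), reconfigure_sketch() and uxsock_trigger_sketch() are
hypothetical stand-ins. It assumes an error-checking pthread mutex so
that the second lock attempt returns EDEADLK rather than blocking
forever, which is what a default mutex typically does.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the daemon's vector lock. */
static pthread_mutex_t vecs_lock;

/* Use an error-checking mutex: a second lock by the owning thread
 * returns EDEADLK instead of hanging. */
int init_lock(void)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (!rc)
        rc = pthread_mutex_init(&vecs_lock, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Callee that takes the lock itself (the role of reconfigure()). */
int reconfigure_sketch(void)
{
    int rc = pthread_mutex_lock(&vecs_lock);

    if (rc)
        return rc; /* EDEADLK if this thread already holds the lock */
    /* ... rebuild configuration here ... */
    pthread_mutex_unlock(&vecs_lock);
    return 0;
}

/* Caller that already holds the lock (the role of uxsock_trigger()). */
int uxsock_trigger_sketch(void)
{
    int rc = pthread_mutex_lock(&vecs_lock);

    if (rc)
        return rc;
    rc = reconfigure_sketch(); /* second lock attempt by the same thread */
    pthread_mutex_unlock(&vecs_lock);
    return rc;
}
```

After init_lock(), calling uxsock_trigger_sketch() returns EDEADLK,
while calling reconfigure_sketch() on its own returns 0.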
I committed and pushed a partial revert of
9e7b4d8d6fa8dc9433c1e60d4bd6717aec2f5296
But maybe you'd rather we stop acquiring the lock in
uxsock_trigger() and instead acquire it more selectively in the
functions called from there ... Please comment.
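
The alternative raised above, locking in the called functions rather
than in the dispatcher, might look roughly like the sketch below. All
names are hypothetical (vecs_lock, config_generation, the handler
table, and the "ping" command are illustrative only and are not taken
from multipathd): the dispatcher holds no lock, and only the handler
that touches shared state acquires it.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical shared lock and shared state. */
static pthread_mutex_t vecs_lock = PTHREAD_MUTEX_INITIALIZER;
int config_generation;

/* Handler that modifies shared state: takes the lock itself. */
int cli_reconfigure_sketch(void)
{
    pthread_mutex_lock(&vecs_lock);
    config_generation++;
    pthread_mutex_unlock(&vecs_lock);
    return 0;
}

/* Handler that touches no shared state: never takes the lock. */
int cli_ping_sketch(void)
{
    return 0;
}

struct handler {
    const char *cmd;
    int (*fn)(void);
};

static const struct handler handlers[] = {
    { "reconfigure", cli_reconfigure_sketch },
    { "ping",        cli_ping_sketch },
};

/* Dispatcher: looks up the command and calls it without holding
 * the lock, so a handler can lock without deadlocking. */
int dispatch_sketch(const char *cmd)
{
    size_t i;

    for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++)
        if (strcmp(cmd, handlers[i].cmd) == 0)
            return handlers[i].fn();
    return -1; /* unknown command */
}
```

The trade-off is that every handler touching shared state must
remember to lock, whereas locking once in the dispatcher is simpler
but forbids any callee from taking the same lock.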
Hmm. Yes, your fix appears to be correct.
I had several locking issues during startup (calling CLI commands
while the daemon is still starting up is a nice way of testing it),
and made several (unsuccessful) attempts at fixing them.
The real cause was missing locking during initial configuration,
so it looks as if 9e7b4d8d6fa8dc9433c1e60d4bd6717aec2f5296
was in fact a left-over from the earlier attempts.
So yeah, your patch seems to be fine.
Will be doing more testing here.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@xxxxxxx +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)