On 08/11/2016 01:33 PM, Mike Christie wrote:
> Could you try the attached patch? I found two segfaults. If check_path
> returns less than 0 then we free the path, so we cannot call repair
> on it. If libcheck_init fails it memsets the checker, so we cannot call
> repair on it either.
> I moved the repair call to the specific code paths where the path is down.
Hello Mike,
Thanks for the patch. Unfortunately even with this patch applied I can
still trigger a segfault sporadically:
# valgrind --read-var-info=yes multipathd -d
Aug 11 14:02:21 | mpathbf: load table [0 2097152 multipath 3
queue_if_no_path pg_init_retries 50 0 2 1 queue-length 0 1 1 8:160 1000
queue-length 0 1 1 8:64 1000]
Aug 11 14:02:21 | mpathbf: event checker started
Aug 11 14:02:21 | sdk [8:160]: path added to devmap mpathbf
Aug 11 14:02:21 | sdd: add path (uevent)
==2452== Thread 4:
==2452== Jump to the invalid address stated on the next line
==2452== at 0x0: ???
==2452== by 0x409BBE: repair_path (main.c:1451)
==2452== by 0x40A905: check_path (main.c:1715)
==2452== by 0x40AE72: checkerloop (main.c:1808)
==2452== by 0x5047473: start_thread (pthread_create.c:333)
==2452== by 0x671B3EC: clone (clone.S:109)
==2452== Address 0x0 is not stack'd, malloc'd or (recently) free'd
==2452==
==2452==
==2452== Process terminating with default action of signal 11 (SIGSEGV)
==2452== Bad permissions for mapped region at address 0x0
==2452== at 0x0: ???
==2452== by 0x409BBE: repair_path (main.c:1451)
==2452== by 0x40A905: check_path (main.c:1715)
==2452== by 0x40AE72: checkerloop (main.c:1808)
==2452== by 0x5047473: start_thread (pthread_create.c:333)
==2452== by 0x671B3EC: clone (clone.S:109)
==2452==
(gdb) list main.c:1451
1446 void repair_path(struct path * pp)
1447 {
1448 if (pp->state != PATH_DOWN)
1449 return;
1450
1451 checker_repair(&pp->checker);
1452 if (strlen(checker_message(&pp->checker)))
1453 LOG_MSG(1, checker_message(&pp->checker));
1454 }
1455
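
The "Jump to the invalid address" at 0x0 from main.c:1451 looks consistent with a NULL function pointer being called on the checker_repair() path, although I have not confirmed that. Below is a minimal sketch of a defensive guard. The struct and function names (demo_checker, demo_repair, demo_checker_repair) are hypothetical stand-ins for illustration only and are not the libmultipath definitions:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a checker; NOT the libmultipath struct. */
struct demo_checker {
	/* Callback that may be NULL, e.g. after the struct was zeroed. */
	void (*repair)(struct demo_checker *);
	int repaired;
};

/* Example callback: records that a repair was attempted. */
void demo_repair(struct demo_checker *c)
{
	c->repaired = 1;
}

/*
 * Guarded wrapper: only invokes the callback when both the checker
 * and its repair pointer are set.
 * Returns 1 if the callback was called, 0 if the call was skipped.
 */
int demo_checker_repair(struct demo_checker *c)
{
	if (!c || !c->repair)
		return 0;
	c->repair(c);
	return 1;
}
```

With such a guard, a checker that has been zeroed with memset() (a NULL callback on common platforms) is skipped instead of jumping to address 0. Whether the guard belongs in the wrapper or in the caller is a design choice this sketch does not settle.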