Re: [PATCH] multipathd: avoid crash in uevent_cleanup()

>> Mar 02 11:40:35 localhost.localdomain multipathd[85474]: directio
>> checker refcount 6
>> Mar 02 11:40:35 localhost.localdomain multipathd[85474]: lxk free tur
>> checker  //checker_put
> 
> 
> So we do not see "unloading tur checker". Like you said, that suggests
> that the crash occurs between libcheck_free() and the thread exiting.



"lxk free tur checker" is printed by a condlog() call that I added in
free_checker(), which is called by checker_put(). I didn't change the log
level of "unloading tur checker", so that message does not show up.

@@ -58,7 +58,7 @@ void free_checker (struct checker * c)
                return;
        c->refcount--;
        if (c->refcount) {
-               condlog(3, "%s checker refcount %d",
+               condlog(2, "%s checker refcount %d",
                        c->name, c->refcount);
                return;
        }
@@ -77,6 +77,7 @@ void free_checker (struct checker * c)
                        pthread_join(ct->thread, NULL);
                };
        }
+       condlog(2, "lxk free %s checker", c->name);
        FREE(c);
 }


> I suggest you put a message in tur.c:libcheck_free (), AFTER the call
> to cleanup_context(), printing the values of "running" and "holders"
> Anyway:
> 
> 	holders = uatomic_sub_return(&ct->holders, 1);
> 	if (!holders)
> 		cleanup_context(ct);
> 
> Whatever mistakes we have made, only one actor can have seen 
> holders == 0, and have called cleanup_context().
> 

diff --git a/libmultipath/checkers/tur.c b/libmultipath/checkers/tur.c
index 4ea63af..900f960 100644
--- a/libmultipath/checkers/tur.c
+++ b/libmultipath/checkers/tur.c
@@ -105,8 +105,11 @@ void libcheck_free (struct checker * c)
                        pthread_cancel(ct->thread);
                ct->thread = 0;
                holders = uatomic_sub_return(&ct->holders, 1);
-               if (!holders)
+               if (!holders) {
+                       running = uatomic_xchg(&ct->running, 0);
                        cleanup_context(ct);
+                       condlog(2, "lxk tur running is %d", running);
+               }
                c->context = NULL;
        }
        return;


Here I added a print of "running", but the value it shows is zero.
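
To illustrate the point quoted above, that only one actor can see
holders == 0, here is a minimal standalone sketch. It is not libmultipath
code: it uses C11 atomics instead of uatomic_sub_return(), and
demo_context, demo_release() and demo_count_cleanups() are hypothetical
names made up for this example. A counter stands in for cleanup_context().

```c
#include <pthread.h>
#include <stdatomic.h>

#define N_ACTORS 8

/* Hypothetical stand-in for the tur checker context. */
struct demo_context {
	atomic_int holders;	/* actors still holding the context */
	atomic_int cleanups;	/* how many actors ran the cleanup path */
};

static void *demo_release(void *arg)
{
	struct demo_context *ct = arg;
	/*
	 * atomic_fetch_sub() returns the OLD value, so "old - 1" is the
	 * new value, which is what uatomic_sub_return() would return.
	 */
	int holders = atomic_fetch_sub(&ct->holders, 1) - 1;

	if (!holders)
		atomic_fetch_add(&ct->cleanups, 1); /* cleanup_context() here */
	return NULL;
}

/* Let N_ACTORS threads drop one reference each; return the cleanup count. */
int demo_count_cleanups(void)
{
	struct demo_context ct;
	pthread_t tid[N_ACTORS];
	int started[N_ACTORS];
	int i;

	atomic_init(&ct.holders, N_ACTORS);
	atomic_init(&ct.cleanups, 0);

	for (i = 0; i < N_ACTORS; i++) {
		started[i] = !pthread_create(&tid[i], NULL, demo_release, &ct);
		if (!started[i])
			demo_release(&ct); /* drop this reference ourselves */
	}
	for (i = 0; i < N_ACTORS; i++)
		if (started[i])
			pthread_join(tid[i], NULL);

	return atomic_load(&ct.cleanups);
}
```

Because each decrement is atomic, exactly one caller gets 0 back, so
demo_count_cleanups() should always return 1 no matter how the threads
are scheduled. Build with -pthread.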

> The stacks you have shown indicate that the instruction pointers were
> broken. That would suggest something similar to what was discussed in the ML
> thread leading to 38ffd89 ("libmultipath: prevent DSO unloading with
> astray checker threads"). Your logs show "tur checker refcount 1", so
> the next call to checker_put would have unloaded the DSO. 

Here I tested the 0.8.5 master code with commit 38ffd89. There was no crash
in five hours (without the patch, the crash happens after the test script
has been running for 30 to 40 minutes).
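
As I understand the problem described above, the DSO must not be unloaded
while a checker thread may still be executing code from it. Below is a
rough standalone sketch of that idea using a reference count. It is only
my illustration and not the code of commit 38ffd89: demo_class,
demo_class_get(), demo_class_put() and demo_unload_deferred() are
hypothetical names, and a flag stands in for the real dlclose() call.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a checker class that is backed by a DSO. */
struct demo_class {
	atomic_int refcount;
	atomic_int unloaded;	/* set where dlclose() would be called */
	pthread_mutex_t gate;	/* only used to order the demo */
};

static void demo_class_get(struct demo_class *cls)
{
	atomic_fetch_add(&cls->refcount, 1);
}

static void demo_class_put(struct demo_class *cls)
{
	/* Only the caller that drops the last reference unloads. */
	if (atomic_fetch_sub(&cls->refcount, 1) == 1)
		atomic_store(&cls->unloaded, 1);
}

static void *demo_thread(void *arg)
{
	struct demo_class *cls = arg;

	/* Wait until the owner has dropped its own reference. */
	pthread_mutex_lock(&cls->gate);
	pthread_mutex_unlock(&cls->gate);
	/* ... checker code from the DSO would still be running here ... */
	demo_class_put(cls);
	return NULL;
}

/* Returns 1 if the unload happened only after the thread finished. */
int demo_unload_deferred(void)
{
	struct demo_class cls;
	pthread_t tid;
	int early, late;

	atomic_init(&cls.refcount, 1);	/* the owner's reference */
	atomic_init(&cls.unloaded, 0);
	pthread_mutex_init(&cls.gate, NULL);

	pthread_mutex_lock(&cls.gate);
	demo_class_get(&cls);		/* reference held for the thread */
	if (pthread_create(&tid, NULL, demo_thread, &cls) != 0) {
		pthread_mutex_unlock(&cls.gate);
		pthread_mutex_destroy(&cls.gate);
		return -1;
	}
	demo_class_put(&cls);		/* owner is done: refcount 2 -> 1 */
	early = atomic_load(&cls.unloaded);
	pthread_mutex_unlock(&cls.gate);
	pthread_join(tid, NULL);
	late = atomic_load(&cls.unloaded);
	pthread_mutex_destroy(&cls.gate);

	return !early && late;
}
```

In this sketch the owner's put alone does not trigger the unload, because
the thread still holds its own reference. Without that extra reference,
the owner's put would unload the code while the thread is still inside it,
which would fit the broken instruction pointers in the stacks.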

Regards,
Lixiaokeng







