On Tue, 30 Nov 2010 16:03:16 +0000 "Kwolek, Adam" <adam.kwolek@xxxxxxxxx>
wrote:

> The problem is that, when raid0 array is about unfreezing and this is single/last array in container,
> Ping to this container causes to mdmon not to exit.
> In such condition managemon receives message and in handle_message() for ping case, calls wakeup_monitor()
> and then goes in to loop for monitor_loop_cnt update
> 1. this occurs after timeout
> 2. when this happens managemon stops on pselect() and as there is nothing to monitor in never wakeups.
> 3. monitor waits to be allowed to exit on open handlers.
>
> How can this be resolved:
> 1. do not ping for last raid0 array during unfreezing (I've reworked patch to meet this condition)
> 2. guard waiting for monitor_loop_cnt change in handle_message() with:
>        if (container->arrays)
>
> 3. change in manage member condition:
>        if (sigterm)
>                Wakeup_monitor();
>
>    To
>        if (sigterm || (container->arrays == NULL))
>                Wakeup_monitor();
>
>    This causes additional monitor wakeup.
>
> Any of method causes mdmon to exit as expected.
> In cases 2 and 3 it takes a while (we are waiting on communication timeouts).
> Method 1 is fast and we are not blocking mdmon exit by communication.

Thanks for the explanation!

I definitely want to fix the managemon/monitor interaction so that it
doesn't hang as you describe. I might end up with something a lot more
heavy-weight than the changes you suggest.

It might still be OK to include your option '1' as well - I'll decide when
you post the patch.

Thanks,
NeilBrown
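
For reference, the two conditions proposed as options 2 and 3 in the quoted
message can be written out as standalone predicates. This is only a rough
sketch and not mdmon code: the struct and function names below
(container_model, active_array_model, should_wait_for_monitor,
should_wakeup_monitor) are hypothetical, and the only thing taken from the
thread is the test on container->arrays and on sigterm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for mdmon's structures; only the one field
 * named in the thread (container->arrays) is modelled here. */
struct active_array_model {
    struct active_array_model *next;
};

struct container_model {
    struct active_array_model *arrays; /* NULL once the last array is gone */
};

/* Option 2: wait for monitor_loop_cnt to change only while the
 * container still has at least one array. */
bool should_wait_for_monitor(const struct container_model *container)
{
    assert(container != NULL);
    return container->arrays != NULL;
}

/* Option 3: wake the monitor on SIGTERM, and additionally whenever
 * the container has no arrays left. */
bool should_wakeup_monitor(bool sigterm, const struct container_model *container)
{
    assert(container != NULL);
    return sigterm || container->arrays == NULL;
}
```

With an empty container the first predicate is false (the wait is skipped)
and the second is true (the extra wakeup is issued), which is the case the
quoted message describes for the last raid0 array.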