On 07/29/2013 08:55 AM, NeilBrown wrote:
> Hi Martin.
>
> I don't think the state change needs to happen while mdmon is in the select
> call. It just needs to happen between one call to read_and_act, and the next.
> And everything happens between one call and the next...
>
> If sync_action is 'recovery' one time and then something else that isn't
> 'idle' the next time, then that would cause the transition to get lost.
> Can that ever happen? Do you see a particular transition that bypasses
> 'idle'?
> It is possible there is some race here...
>
> I'll try out your test script and see if I can reproduce it.

I don't see "recover" followed by anything else but "idle". But what I
do observe is that the "recover" status isn't seen at all. I've attached
a comparison of the good and bad cases for the same test.

BAD CASE: mdmon processes the metadata update for the first member and
writes the metadata. When it calls read_and_act after that, the recovery
on the first array is already finished, so it calls set_disk and then
sync_metadata() again. This double metadata write takes a long time.
Meanwhile, the manager has sent the update for the second member and
started the recovery on it. By the time mdmon gets around to processing
this update, the recovery on the 2nd array is already finished, and it
never sees the "recover" or "frozen" state on it. Consequently, it
doesn't realize that there ever was a recovery.

The example is somewhat pathological because the test arrays are
unrealistically small. In normal situations we'd expect recovery to take
much longer than 2 metadata writes. However, it doesn't feel good to
know that this can happen.

My current idea to solve this is yet another separate thread just for
monitoring kernel state changes. I don't have it ready yet, though.

Martin

>
>
>
>>
>> Martin
>>
>> PS: In that context, reading mdmon-design.txt, is it allowed at all to
>> add dprintf() messages in the code path called by mdmon? That would also
>> affect some DDF methods where I currently have lots of debug code.
>
> Yes, you can have dprintf messages anywhere. However if debugging is
> enabled, then I don't promise that mdmon will even try to survive low memory
> conditions.
>
> NeilBrown
"*" means monitor wakeup array (1) is the RAID10, (0) the RAID5 GOOD (2) Monitor Manager activate_spare(1) * 1:frozen 0: idle * process_update(1) ... activate_spare(0) sync_metadata 1: recover 0: frozen remove_old(md125) * process_update(0) sync_metadata 1: recover 0: recover * 1: recover 0: recover remove_old(md126) ... * 1: idle 0: recover set_disk sync_metadata ... * 1: clean 0: clean set_disk sync_metadata BAD Monitor Manager activate_spare(1) * 1: frozen 0: idle * remove_old(md125) 1: recover 0: idle * process_update(1) ... activate_spare(0) sync_metadata remove_old(md126) 1: idle 0: idle set_disk sync_metadata * process_update(0) sync_metadata 1: idle 0: idle (3s later)