On Wed, 16 Nov 2011 03:13:51 +0400 CoolCold <coolthecold@xxxxxxxxx> wrote:

> As I promised I was collecting data, but forgot to return to that
> problem, bumping thread returned me to that state ;)
> So, data was collected for almost the month - from 31 August to 26 September:
> root@gamma2:/root# grep -A 1 dirty component_examine.txt |head
>           Bitmap : 44054 bits (chunks), 190 dirty (0.4%)
> Wed Aug 31 17:32:16 MSD 2011
>
> root@gamma2:/root# grep -A 1 dirty component_examine.txt |tail -n 2
>           Bitmap : 44054 bits (chunks), 1 dirty (0.0%)
> Mon Sep 26 00:28:33 MSD 2011
>
> As i can understand from that dump, it was bitmap examination (-X key)
> of component /dev/sdc3 of raid /dev/md3.
> Decreasing happend, though after some increase on 23 of September, and
> first decrease to 0 happened on 24 of September (line number 436418).
>
> So almost for month, dirty count was no decreasing!
> I'm attaching that log, may be it will help somehow.

Thanks a lot.
Any idea what happened on Fri Sep 23?
Between 6:23am and midnight the number of dirty bits dropped from 180 to 2.

This does seem to suggest that md is just losing track of some of the pages
of bits, and once they are modified again md remembers to flush them and
write them out, which is a fairly safe way to fail.

The one issue I have found is that set_page_attr uses a non-atomic __set_bit
because it should always be called under a spinlock. But bitmap_write_all(),
which is called when a spare is added, calls it without the spinlock, so that
could corrupt some of the bits.

Thanks,
NeilBrown
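
To illustrate the set_page_attr concern above, here is a minimal userspace
sketch, not the md code. It assumes the attribute bits of several pages share
one word, and it replays by hand the interleaving that a missing spinlock
would allow. The function names are made up for illustration, and C11
atomic_fetch_or stands in for an atomic bit set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/*
 * Hypothetical replay of two unlocked, non-atomic bit sets on one word.
 * Each writer does a plain load, OR, store. Both load before either
 * stores, so the second store overwrites the first writer's bit.
 */
unsigned long racy_interleaving(unsigned long word, int bit_a, int bit_b)
{
    unsigned long seen_a = word;        /* writer A loads the word */
    unsigned long seen_b = word;        /* writer B loads the same old value */

    word = seen_a | (1UL << bit_a);     /* writer A stores its bit */
    word = seen_b | (1UL << bit_b);     /* writer B stores, dropping A's bit */
    return word;
}

/*
 * The same two updates done as atomic read-modify-write operations.
 * Neither update can be lost, whatever the ordering.
 */
unsigned long atomic_updates(unsigned long initial, int bit_a, int bit_b)
{
    atomic_ulong word;

    atomic_init(&word, initial);
    atomic_fetch_or(&word, 1UL << bit_a);
    atomic_fetch_or(&word, 1UL << bit_b);
    return atomic_load(&word);
}

/* Print both results for bits 0 and 1 starting from an empty word. */
void show_lost_update(void)
{
    unsigned long racy = racy_interleaving(0UL, 0, 1);
    unsigned long safe = atomic_updates(0UL, 0, 1);

    assert((safe & 3UL) == 3UL);        /* both bits survive atomically */
    printf("non-atomic: %#lx, atomic: %#lx\n", racy, safe);
}
```

In the non-atomic replay the bit set by the first writer is silently lost,
which would look like md forgetting that a page needs writing out until
something touches that page again.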