Re: raid1 bitmap and multiple removed disks

On Wed, Nov 23 2016, Diego Guella wrote:

> (2nd attempt: the previous one didn't make it)
> Hi,
>
> I am using Linux raid1 for a dual purpose: redundancy and backup.
>
> I have a raid1 array of 5 disks, 3 of which are kept for backup purposes.
> Let's call disks A, B, C, D, E.
> Disks A and B are _always_ connected to the system.
> Disks C, D, E are backup disks.
> Here follows a description of how I use the backup disks.
> This morning I connect disk C and let it resync.
> Tomorrow morning, I shut down the system, remove disk C and put it away
> as the daily backup.
> I connect the next disk (D), then start up the system.
> Linux raid1 recognizes the "old" disk and does not allow it to enter the
> array (as evidenced by the system logs).
> I then add disk D to the array and let it resync.

So this would be a full resync - right?
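For what it's worth, whether a swap like that ends in a bitmap-based
catch-up or a full rebuild generally depends on how the disk goes back
in.  A rough, untested sketch, with /dev/md0 and /dev/sdc1 standing in
for the real array and member device (names assumed):

  # mdadm /dev/md0 --re-add /dev/sdc1

attempts a bitmap-based catch-up and only succeeds if the array still
accepts the device, while

  # mdadm /dev/md0 --add /dev/sdc1

on a device the array no longer trusts ends up as a full recovery.
Watching /proc/mdstat during the rebuild shows which of the two you
actually got.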

> The next day, I connect the next disk (E), and so on, rotating them.
> The "connect and disconnect" is always performed when the system is 
> powered off, although sometimes I hot-connect the disk with the system 
> already powered up.
> The purpose of this is to have an emergency backup: I can disconnect ALL 
> disks from the system and connect only one of the daily backups, going 
> "back to the past"(TM).
>
> This array has a write-intent bitmap, in order to speed up resyncs
> (it is a 4TB array, and without a bitmap a resync can take nearly
> 20 hours due to system load).
>
> This worked flawlessly (for some years) until a few days ago, when the
> array suffered a strange inconsistency and the filesystem nearly went
> nuts within about 20 minutes of uptime. I will elaborate more on this
> later.

Did you ever test your backups?

>
> Since that problem happened, some questions have come to mind:
> What do raid1 bitmaps allow me to do?

- accelerate resync after a crash.
- accelerate recovery when you remove a drive and re-add it.
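A bitmap can be added to (or removed from) an existing array on the
fly.  A rough sketch, assuming the array is /dev/md0:

  # mdadm --grow /dev/md0 --bitmap=internal   <- add a write-intent bitmap
  # mdadm --grow /dev/md0 --bitmap=none       <- remove it again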

> Can they _correctly_ record the state of multiple removed disks, in
> order to overwrite only the out-of-sync chunks on each of them?

All that is recorded is the set of regions which have been written to
since the array was last in a non-degraded state.
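You can inspect that record directly.  A rough sketch, with /dev/sda1
standing in for any current member device (name assumed):

  # mdadm -X /dev/sda1

-X (--examine-bitmap) dumps the bitmap superblock and the count of
dirty chunks.  Note that it is one map for the whole array, not one
map per missing device, which is why it cannot track C, D and E
separately.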

> In other words, am I allowed to do what I described above?

If the recovery that happened when you swapped drives was not a full
recovery, then probably not.
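If you want the rotation to be safe regardless, one option (again a
rough, untested sketch, device names assumed) is to force every
returning disk through a clean, full rebuild by wiping its stale
metadata first:

  # mdadm --zero-superblock /dev/sdd1
  # mdadm /dev/md0 --add /dev/sdd1

The returning disk is about to be overwritten by the rebuild anyway,
so nothing is lost, but be careful never to zero the one copy you are
keeping as the backup.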

> If not, is there something I can change in my procedure in order to
> get a daily backup using raid1?

I wrote something about this a few years ago...
 http://permalink.gmane.org/gmane.linux.raid/35074

or this thread
  http://www.spinics.net/lists/raid/msg35532.html

NeilBrown

>
>
> System details:
> # cat /etc/debian_version
> 6.0.10
> # mdadm --version
> mdadm - v3.1.4 - 31st August 2010
