On 04/08/2015 06:55 PM, NeilBrown wrote:
> On Wed, 8 Apr 2015 14:24:14 -0500 Goldwyn Rodrigues <rgoldwyn@xxxxxxx> wrote:
>
>> This extends the capabilities of re-adding a failed device
>> to the clustering environment.
>> A new function, gather_bitmaps, gathers set bits from the bitmaps of
>> all nodes, sends a message to all nodes to re-add the disk,
>> and then initiates the recovery process.
>>
>> Question: Do you see a race in sending a READD and then performing
>> the bitmap resync/recovery? Should the initiating node perform the
>> recovery before sending the READD message? The recovery will send a
>> METADATA_UPDATE anyway.
>
> The RE-ADD has to happen *before* the bitmaps are gathered.
> After the RE-ADD, all writes will go to the new device.
> Any write before that RE-ADD will be recorded in the bitmap.
> To ensure that the recovery handles all regions affected by writes, it needs
> to know about all writes that didn't go to the new device. So it needs to
> collect bitmaps only once new writes have started going to the new device.
>
> Is that clear? If not, I'll try again.

Yes, I understood your point. Performing the re-add later would miss
the writes that happen between the recovery and the re-add.
--
Goldwyn