On November 24, 2014 12:28:08 PM EST, Anshuman Aggarwal <anshuman.aggarwal@xxxxxxxxx> wrote:
>On 24 November 2014 at 18:49, Greg Freemyer <greg.freemyer@xxxxxxxxx> wrote:
>>
>> On November 24, 2014 1:48:48 AM EST, Anshuman Aggarwal <anshuman.aggarwal@xxxxxxxxx> wrote:
>>>Sandeep,
>>> This isn't exactly RAID4 (the only thing in common is the single
>>>parity disk; the data is not striped at all). I did bring it up on
>>>the linux-raid mailing list and had a short conversation with Neil.
>>>He wasn't too excited about device mapper but didn't indicate why or
>>>why not.
>>
>> If it was early in your proposal, it may simply be that he didn't
>>understand it.
>>
>> The delayed writes to the parity disk you described would have been
>>tough for device mapper to manage. It doesn't typically maintain its
>>own longer-term buffers, so that might have given him concern. The
>>only reason you gave for the delay was reduced wear and tear on the
>>parity drive.
>>
>> Reduced wear and tear in this case is a red herring. The kernel
>>already buffers writes to the data disks, so there is no need to
>>separately buffer parity writes.
>
>Fair enough; the delayed buffering of the parity writes is an
>independent issue which can easily be deferred.
>
>>>I would like to have this as a layer for each block device on top of
>>>the original block devices (intercepting write requests to the block
>>>devices and updating the parity disk). Is device mapper the right
>>>interface?
>>
>> I think yes, but dm and md are actually separate. I think of dm as a
>>subset of md, but if you are really going to do this you will need to
>>learn the details better than I know them:
>>
>> https://www.kernel.org/doc/Documentation/device-mapper/dm-raid.txt
>>
>> You will need to add code to both the dm and md kernel code.
>>
>> I assume you know that both the mdraid (mdadm) and lvm userspace
>>tools are used to manage device mapper, so you would have to add
>>userspace support to mdraid/lvm as well.
>>
>>> What are the others?
>>
>> Well, btrfs as an example incorporates a lot of RAID capability into
>>the filesystem. Thus btrfs is a monolithic driver that has consumed
>>much of the dm/md layer. I can't speak to why they are doing that,
>>but I find it troubling. Monolithic design is something the Linux
>>kernel has always avoided.
>>
>>> Also if I don't store the metadata on the block device itself (to
>>>allow the block device to be unaware of the RAID4 on top), how would
>>>the kernel be informed of which devices together form the Split
>>>RAID?
>>
>> I don't understand the question.
>
>mdadm typically stores a metadata superblock on the block device which
>identifies it as part of the RAID and typically prevents it from being
>directly recognized by filesystem code. I was wondering if Split RAID
>block devices can be kept unaware of the RAID scheme on top of them
>and remain fully mountable and usable without the RAID drivers (of
>course invalidating the parity if any of them are written to). That
>would allow a parity disk to be added to existing block devices
>without having to set up the superblock on the underlying devices.
>
>Hope that is clear now?

Thank you. I knew about the superblock but didn't realize that was what
you were talking about. Does this address your desire?

https://raid.wiki.kernel.org/index.php/RAID_superblock_formats#mdadm_v3.0_--_Adding_the_Concept_of_User-Space_Managed_External_Metadata_Formats

FYI: I'm ignorant of any real details and I have not used the above new
feature, but it seems to be what you are asking for.

>> I haven't thought through the process, but with mdraid/lvm you would
>>identify the physical drives as under md/dm control (mdadm for md,
>>pvcreate for dm). Then configure the split raid setup.
>>
>> Have you gone through the process of creating a RAID5 with mdadm? If
>>not, at least read a howto about it:
>>
>> https://raid.wiki.kernel.org/index.php/RAID_setup
>
>Actually, I have maintained a six-disk RAID5/RAID6 cluster with mdadm
>for more than a few years and handled multiple failures, and I am
>reasonably familiar with md reconstruction too. It is the
>performance-oriented but disk-intensive nature of mdadm that I would
>like to get away from for a home media server.
>
>> I assume you would have mdadm form your multi-disk split raid volume
>>composed of all the physical disks, then use lvm commands to define
>>the block range on the first drive as an lv (logical volume). Same
>>for the other data drives.
>>
>> Then use mkfs to put a filesystem on each lv.
>
>Maybe it can also be done via md raid by creating a partitionable
>array where each partition corresponds to an underlying block device,
>without any striping.

I think I agree.
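Roughly like this, maybe (untested on my end, with /dev/sdb through
/dev/sdd as hypothetical stand-ins for your data disks; the parity disk
stays outside the array, since that layer would still be your new
code):

  # Concatenate the data disks with no striping, as a partitionable
  # array. A linear mapping keeps each member's blocks contiguous, so
  # the md partitions can be made to line up with member boundaries.
  mdadm --create /dev/md_d0 --auto=part --level=linear \
        --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd

  # Partition /dev/md_d0 so that each partition covers exactly one
  # member disk, then put a filesystem on each partition:
  mkfs.ext4 /dev/md_d0p1

Each filesystem then lives entirely on one spindle, which sounds like
the behavior you are after: only the disk holding a given file (plus
the parity disk, on writes) ever needs to be active.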
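And on the superblock concern: mdadm's legacy --build mode may already
be close to what you want for the non-parity part, since it assembles
an array without writing any metadata to the members at all (again
hedged, I have not tried it for anything like this):

  # --build assembles a legacy array with no superblock; the member
  # devices stay byte-for-byte untouched, and the geometry lives only
  # in the command line and/or mdadm.conf, never on the disks.
  mdadm --build /dev/md0 --level=linear --raid-devices=3 \
        /dev/sdb /dev/sdc /dev/sdd

  # Something like this in /etc/mdadm.conf then records which devices
  # make up the array, standing in for the on-disk superblock:
  # ARRAY /dev/md0 level=linear num-devices=3 devices=/dev/sdb,/dev/sdc,/dev/sdd

Either way, the parity maintenance itself (on each write: read the old
data and old parity, recompute new parity = old parity XOR old data
XOR new data, write both out) would still be your new dm/md code.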
>> The filesystem has no knowledge there is a split raid below it. It
>>simply reads/writes to the overall device; device mapper is layered
>>below it and triggers the required I/O calls.
>>
>> I.e., for a read it is a straight passthrough. For a write, the old
>>data and old parity have to be read in, modified, and written out.
>>Device mapper does this now for RAID 4/5/6, so most of the code is in
>>place.
>
>Exactly. Reads are passthrough; writes lead to the parity write being
>triggered. The only remaining concern for me is that the md superblock
>will require the block devices to be initialized using mdadm. That can
>be acceptable I suppose, but an ideal solution would be able to use
>existing block devices (which would be untouched), put passthrough
>block devices on top of them, and manage the parity updates on the
>parity block device. The information about which block devices
>comprise the array can be stored in a config file etc. and does not
>need a superblock as badly as a regular RAID setup does.

Hopefully the new user space feature does just that.

Greg

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies