RAID and LVM alignment when expanding PVs

Hey,
Before posting here I spent some time searching for answers, including on the LVM IRC channel. I don't actually use Linux software RAID (mdadm), but my question about how mdadm behaves in conjunction with LVM may well have the same answer.
The server is going to run MySQL with InnoDB. As I'm not going to change the InnoDB default page size, the block size it uses is 16KiB.
I have an LSI controller with a BBU and 25 SSDs (28 in total at the moment; 3 are hot spares).
I created 5 Logical Drives (LDs), each one a RAID5 (4+1) with a stripe size of 256KiB, and presented them to the OS.
They show up as /dev/sdd, /dev/sde, /dev/sdf, /dev/sdg and /dev/sdh (/dev/sd[defgh]).
What I want to achieve is full utilization of those LDs as one block device, with a way to expand it in the future.
At first I opted to use SW RAID (mdadm) with RAID0 and then create LVM on top of it, but at the moment a RAID0 array can't be grown (I know it will be possible in the future), so I decided to go with LVM striping instead.
So I created 5 PVs aligned at 1024KiB (4*256KiB), which is also LVM's default in RHEL 6:

$ pvcreate -M2 --pvmetadatacopies 2 --dataalignment=1024k /dev/sdd
$ pvcreate -M2 --pvmetadatacopies 2 --dataalignment=1024k /dev/sde
$ pvcreate -M2 --pvmetadatacopies 2 --dataalignment=1024k /dev/sdf
$ pvcreate -M2 --pvmetadatacopies 2 --dataalignment=1024k /dev/sdg
$ pvcreate -M2 --pvmetadatacopies 2 --dataalignment=1024k /dev/sdh
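
For reference, this is roughly how I check the alignment each PV ended up with (a sketch; the 1024.00k value is what I expect to see, assuming nothing in lvm.conf overrides it):
$ pvs -o pv_name,pe_start --units k
# pe_start (the "1st PE" column) should read 1024.00k on each of /dev/sd[defgh]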

Then I created the VG:
$ vgcreate -M2 --vgmetadatacopies 2 vg1 /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh

And then I created a striped LV with a 256KiB stripe size, leaving 5% free for snapshots (not that I will take many of them; perhaps one a week for a backup, deleted afterwards):
$ lvcreate -i 5 -I 256k -n lv1 -l 95%FREE vg1 /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh
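
To double-check the layout I look at the segment report (a sketch, using the names from above):
$ lvdisplay -m /dev/vg1/lv1
# the segment output should report Stripes 5 and Stripe size 256.00 KiB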

As I am using XFS, I created it with the following parameters:
$ mkfs.xfs -d su=256k,sw=20 /dev/vg1/lv1

And I used the following mount options:
noatime,nodiratime,nobarrier,logbufs=8,logbsize=256k
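
Once mounted, I sanity-check that XFS picked up the geometry (a sketch; /mnt/mysql is just a placeholder mount point):
$ xfs_info /mnt/mysql
# xfs_info reports sunit/swidth in filesystem blocks, so with a 4KiB block size
# I expect sunit=64 (256KiB) and swidth=1280 (20 * 256KiB)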

So everything should be aligned now.

When the time comes and I need to expand the capacity of the LV, I will not be able to add more PVs to it because it's a striped LV (unless I double the number of PVs, but I won't be expanding at that rate).
My only option in this setup is to expand the underlying RAID5 LDs by adding more disks to them; I will grow each LD by a multiple of 2 disks.
I will then have 5 LDs, each one a RAID5 (6+1) with a stripe size of 256KiB.
I was under the impression this would be simple: I expand the PVs, then the VG and then the LV. That will work, but now the PVs won't be aligned (or maybe I'm wrong here?).
My full stripe width becomes 1536KiB (6*256KiB), but the PVs were created with a data alignment of 1024KiB, which means that after adding more disks to the underlying RAID5 LDs I am no longer aligned on stripe boundaries.
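
Spelling out the arithmetic that worries me (just the numbers from above):
# first PE starts at 1024KiB; new full stripe = 6 * 256KiB = 1536KiB
# 1024 mod 1536 = 1024, so the first extent no longer starts on a full-stripe boundary
$ pvs -o pv_name,pe_start --units k    # still reports 1024.00k after pvresize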

As you know, it's not possible to change the PV data alignment (pe_start) after the PV has been created.
So I checked whether I could create those PVs (/dev/sd[defgh]) with no metadata copies at all (--pvmetadatacopies 0) and then add two small devices to the VG just to hold the metadata (I will never expand those devices, so they would stay aligned).
But after creating the PVs with --pvmetadatacopies 0, I saw that LVM still writes a small amount of metadata (PV UUID, label, etc.) at the start of the device, for obvious reasons.
I also searched and saw that it's not possible to place the PV metadata at the end of the block device, so if I grow the LDs (PVs) the data will no longer be aligned.
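
For completeness, this is roughly what I tried (a sketch; /dev/sdx and /dev/sdy stand in for the two small metadata-only devices, they are not real device names from this box):
$ pvcreate --pvmetadatacopies 0 /dev/sd[defgh]          # data PVs, no metadata areas
$ pvcreate -M2 --pvmetadatacopies 2 /dev/sdx /dev/sdy   # small devices meant to carry the VG metadata
$ vgcreate -M2 --vgmetadatacopies 2 vg1 /dev/sd[defgh] /dev/sdx /dev/sdy
# even so, each data PV still gets a small label/header at its start, which is the part I can't get rid of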

Because I know I can't extend the LV by adding one or two PVs, this is my only option. My estimated growth this way is up to 160 disks, and at that scale I will probably migrate the RAID5 to RAID6 (RAID5 with that many disks is not that reliable), so I would have 5 x RAID6 (30+2) at most.

The other option I see that might work is creating a linear LV from those 5 LDs. That would let me grow by adding more PVs (creating more RAID5 4+1 LDs) to the VG and then extending the LV.
As the MySQL InnoDB default block size is 16KiB (and I have no plans to change it), in either setup a single write/read will go to one disk.
The problem I have with this setup is that I couldn't make it work. I know I need to align the XFS allocation groups with the PV boundaries inside the LV, but I couldn't find a way to do it correctly; during my benchmarks I utilized only 1 disk and didn't get much parallel I/O (regardless of the number of threads).
My other concern is whether it's possible, after extending the LV, to tweak the XFS AGs so that they still fall on the LV boundaries. A rough sketch of what I mean follows.
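
This is the direction I tried (a sketch only; the agcount, the sw=4 geometry and the /dev/sdi name for a future LD are assumptions on my part, not something I have validated):
$ lvcreate -n lv1 -l 95%FREE vg1 /dev/sd[defgh]       # linear LV across the 5 LDs
$ mkfs.xfs -d agcount=20,su=256k,sw=4 /dev/vg1/lv1    # 4 AGs per LD, geometry of a single 4+1 LD
# later, after adding a new 4+1 LD (hypothetical /dev/sdi):
$ vgextend vg1 /dev/sdi
$ lvextend -l +100%FREE vg1/lv1
$ xfs_growfs /mnt/mysql                               # the new space gets new AGs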

I'm asking the question here because I think that either way, be it HW RAID or SW RAID (mdadm), expanding the PV (the underlying block device) puts me in the same situation, and since I know mdadm and LVM have some kind of integration (LVM reads the md sysfs topology for alignment, etc.), perhaps I'm missing something.
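
For what it's worth, this is how I understand that integration and what I would check on an md (or HW RAID) device before pvcreate (a sketch; md0 is a placeholder):
$ cat /sys/block/md0/queue/minimum_io_size /sys/block/md0/queue/optimal_io_size
# as I understand it, pvcreate uses these values for pe_start when
# devices/data_alignment_detection (and md_chunk_alignment) are enabled in lvm.conf,
# unless --dataalignment overrides them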

Thanks for any help!