Re: XFS on top of LVM span in AWS. Stripe or are AG's good enough?

Jeff Gibson <jgibson@xxxxxxxxxxxxxxx> · Tue, 16 Aug 2016 17:05:11 +0000

>On Mon, Aug 15, 2016 at 11:36:14PM +0000, Jeff Gibson wrote:
>> So I'm creating an LVM volume with 8 AWS EBS disks that are
>> spanned (linear) per Redhat's documentation for Gluster
>> (https://access.redhat.com/documentation/en-US/Red_Hat_Storage/3.1/html/Deployment_Guide_for_Public_Cloud/ch02s03.html#Provisioning_Storage_for_Three-way_Replication_Volumes).
>> 
>> 2 questions-
>> 
>> 1.  Will XFS's Allocation Groups essentially stripe the data for
>> me
>
>No. XFS does not stripe data. It does, however, *distribute* data
>across different AGs according to locality policy (e.g. inode32 vs
>inode64), so it uses all the AGs as the directory structure grows.
Poor wording on my part.  By "essentially stripe" I meant distributing data across all of the EBS subvolumes instead of filling one EBS subvolume at a time.  I do plan on using inode64.
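
To make concrete what I mean by AGs landing on different subvolumes, here is a rough sketch of my mental model, not anything taken from the XFS code. It assumes equally sized EBS volumes concatenated in order by LVM, and equally sized AGs that fill the whole logical volume; the function names and the example counts (32 AGs, 8 volumes) are made up for illustration.

```python
def ag_to_volume(ag_index, ag_count, volume_count):
    """Index of the linear-span volume on which an AG starts.

    Assumes equal-size volumes concatenated in order and equal-size
    AGs covering the whole logical volume (both are simplifications).
    """
    # The AG starts at fraction ag_index / ag_count of the logical volume;
    # each volume covers 1 / volume_count of it.
    return (ag_index * volume_count) // ag_count


def volumes_touched(ag_indices, ag_count, volume_count):
    """Sorted list of volumes on which the given AGs start."""
    return sorted({ag_to_volume(i, ag_count, volume_count) for i in ag_indices})


if __name__ == "__main__":
    # Hypothetical layout: 32 AGs over 8 spanned volumes, so 4 AGs per volume.
    print(volumes_touched([0, 1, 2, 3], 32, 8))
    print(volumes_touched(range(32), 32, 8))
```

Under these assumptions, data only reaches all eight subvolumes once allocations are spread over AGs that start on different volumes.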

>> or should I stripe the underlying volumes with LVM?
>
>No, you're using EBS. Forget anything you know about storage layout
>and geometry, because EBS has no guaranteed physical layout you can
>optimise for.
Right.  However, there could still be some gains from striping because of per-volume IOPS limits: the combined IOPS of all the volumes striped together can be higher than the IOPS of a single volume.
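
The arithmetic I have in mind is roughly the following. This is a back-of-the-envelope sketch only: the per-volume limit of 3000 IOPS is a hypothetical number, the function name is mine, and it ignores any instance-level limits and assumes each volume can actually reach its own cap.

```python
def iops_ceiling(per_volume_limit, volumes_hit):
    """Upper bound on combined IOPS for a workload.

    volumes_hit is an iterable of volume indices the workload's I/O
    lands on; only distinct volumes add to the ceiling.
    """
    return per_volume_limit * len(set(volumes_hit))


if __name__ == "__main__":
    limit = 3000  # hypothetical per-volume IOPS cap

    # Linear span, hot data sitting on a single volume.
    print(iops_ceiling(limit, [0, 0, 0]))

    # Stripe across 8 volumes: every volume takes part in the I/O.
    print(iops_ceiling(limit, range(8)))
```

With a linear span the ceiling depends on how many volumes the active data happens to sit on, whereas a stripe involves all of them for any sufficiently large I/O pattern.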

>> I'm not
>> worried as much about data integrity with a stripe/span since
>> Gluster is doing the redundancy work.
>> 
>> 2.  AWS volumes sometimes have inconsistent performance.  If I
>> understand things correctly, AG's run in parallel.
>
>Define "run". AGs can allocate/free blocks in parallel.
By "run" I meant reading and writing data to and from the AGs.

>If IO does
>not require allocation, then AGs play no part in the IO path.
Can you explain this a bit, please?  As I understand it, data is written to and read from space inside AGs, so I don't see how they could not be part of the IO path.  Or do you simply mean that reads just use inodes and don't care about the AGs?

>> In a
>> non-striped volume, if some of the AGs are temporarily slower to
>> respond than others due to one of the underlying volumes being
>> slow, will XFS prefer the quicker responding AGs
>
>No, it does not.
>
>> or is I/O always
>> evenly distributed?
>
>No, it is not.
>
>> If XFS prefers the more responsive AG's it
>> seems to me that it would be better NOT to stripe the underlying
>> disk since all AG's that are distributed in a stripe will
>> continuously hit all component volumes, including the slow volume
>> (unless XFS compensates for this?)
>
>I think you have the wrong idea about what allocation groups do.
I'm reading the XFS File System Structure doc on xfs.org.  It says, "XFS filesystems are divided into a number of equally sized chunks called Allocation Groups. Each AG can almost be thought of as an individual filesystem."  That is where most of my assumptions are coming from.

>They are for maintaining allocation concurrency and locality of
>related objects on disk - they have no influence on where IO is
>directed based on IO load or response time.
I understand that XFS applies locality in that it tries to write files to the same AG as the parent directory.  Are there other cases?
I get that it's probably not measuring the responsiveness of each AG.  I guess what I'm trying to ask is: will XFS *indirectly* compensate if one subvolume is busier?  For example, if writes to a "slow" subvolume and the AGs that reside on it take longer to complete, will XFS tend to use other, less busy AGs more often for writes (locality aside)?  What is the basic algorithm for determining where new data is written?  In load-balancer terms, does it round-robin, pick the least busy, etc.?

Thank you very much!
JG
