Re: Weird lvm2 performance problems

On Tue, Apr 21, 2009 at 07:24:19PM +0200, Sven Eschenberg wrote:
> Hi Luca,
>
> I gave this a little more thought ...
>
> Luca Berra wrote:
>> Because when you _write_ incomplete stripes, the raid code
>> would need to do a read-modify-write of the parity block.
>
> Okay, the question is: how often, if you modify files at random, do
> you really write a full stripe, even if the cache holds back all
> modifications for a couple of minutes? I wonder how often you can
> take advantage of this in normal mixed-load situations.
I am no expert in filesystem internals, but I believe the idea is to
minimize r-m-w, not necessarily to always write full stripes.
E.g., take a default raid5 4+1, chunk 64k, stripe 256k:
if you write an 800k file starting at chunk 1233, the array has to
r-m-w stripes 308 and 311, and can write stripes 309 and 310 in full.
If the fs were aware of the underlying device, it would try to
allocate the file starting from chunk 1236, resulting in three full
stripes and only one r-m-w.
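
(If you want to redo that arithmetic yourself, here is a small shell
sketch; the geometry is hard-coded to the 4+1/64k example above, the
file size and chunk numbers are the made-up ones from this thread,
and nothing is queried from a real array.)

    #!/bin/sh
    # count full-stripe writes vs read-modify-writes for one write,
    # assuming the 4+1 raid5 with 64k chunks from the example above
    CHUNK_KB=64
    DATA_DISKS=4
    FILE_KB=800
    START_CHUNK=${1:-1233}     # 1233 = unaligned, 1236 = aligned

    # last chunk touched (it may be only partially written)
    END_CHUNK=$(( START_CHUNK + (FILE_KB + CHUNK_KB - 1) / CHUNK_KB - 1 ))
    FIRST=$(( START_CHUNK / DATA_DISKS ))
    LAST=$(( END_CHUNK / DATA_DISKS ))

    FULL=0; RMW=0
    for S in $(seq $FIRST $LAST); do
        S_START=$(( S * DATA_DISKS ))
        S_END=$(( S_START + DATA_DISKS - 1 ))
        # a stripe is written in full only if the write covers all of
        # its chunks and does not end with a partial chunk inside it
        if [ $START_CHUNK -le $S_START ] && [ $END_CHUNK -ge $S_END ] && \
           { [ $END_CHUNK -gt $S_END ] || [ $(( FILE_KB % CHUNK_KB )) -eq 0 ]; }
        then
            FULL=$(( FULL + 1 ))
        else
            RMW=$(( RMW + 1 ))
        fi
    done
    echo "start chunk $START_CHUNK: $FULL full-stripe writes, $RMW r-m-w"

Run with 1233 it prints 2 full / 2 r-m-w; with 1236, 3 full / 1 r-m-w.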

>> Filesystems like ext3/4 and XFS have the ability to account for
>> stripe size in the block allocator to prevent unnecessary
>> read-modify-writes, but if you do not stripe-align the start of the
>> filesystem you cannot take advantage of this.

> Okay, understood, but doesn't this imply that, as long as my
> application running on top of an md and/or an LV on top of an md
> cannot take advantage of the layout information, it doesn't matter
> at all? I do see the advantage, e.g. if you have an RDBMS that can
> operate and organize itself on top of some block device with a
> certain layout, or any filesystem taking this into account. In
> contrast, if I export the block device as an iSCSI target from a
> plain NAS, this doesn't help me at all.
Probably not, unless the iSCSI client is also optimized.

> Now, even if I properly stripe-align the pe_start, what happens if I
> do a whole-disk online capacity expansion? Unless LVM can realign
> everything online and the filesystem can realign itself (or update
> its layout accordingly) online, this is pretty much pointless.
AFAIK LVM cannot realign itself automatically. I believe it is doable
manually by pvmoving away the first PE (or the first n PEs, depending
on the configuration), then vgcfgbackup, vi, vgcfgrestore; after that
you only have to realign the PEs.
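
(A rough sketch of that manual procedure; the PV, VG, and file names
are made up, and hand-editing VG metadata can destroy the VG, so take
it as an illustration of the steps named above, not a tested recipe.)

    # see where the data area currently starts
    pvs -o +pe_start /dev/sdb1

    # free the first PE (pvmove accepts a PV:first-last segment)
    pvmove /dev/sdb1:0-0

    # dump the VG metadata, hand-edit pe_start / the PE numbering,
    # then write it back
    vgcfgbackup -f /tmp/vg0.txt vg0
    vi /tmp/vg0.txt
    vgcfgrestore -f /tmp/vg0.txt vg0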
Another option is planning for possible capacity upgrades and using
n1*n2*...*nn * chunk_size as the unit for both pe_start and pe_size *
number_of_pe_i_align_lv_size_to (see my previous mail about
non-power-of-two stripe sizes).  This is at most 3*4*5*7*chunk_size.
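
(Spelled out with the 64k chunk used earlier in this thread, that
worst case is just a throwaway calculation:)

    # lcm of 2..7 data disks is 3*4*5*7 = 420, so a pe_start that is
    # a multiple of 420 chunks stays aligned for any width up to 7+1
    echo $(( 3 * 4 * 5 * 7 * 64 ))    # 26880 KiB, i.e. 26.25 MiB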
Filesystems _can_ be taught to update their layout (for future
writes, that is): ext3/4 with tune2fs, XFS with the sunit/swidth
mount options.
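
(Hedged examples of both knobs; the numbers assume the 4+1/64k array
from the example above with 4k filesystem blocks, and the device and
mount point names are placeholders.)

    # ext3/4: stride = chunk / fs block = 64k / 4k = 16 blocks,
    # stripe width = stride * data disks = 16 * 4 = 64 blocks
    tune2fs -E stride=16,stripe_width=64 /dev/vg0/lv0

    # XFS: sunit/swidth are given in 512-byte sectors:
    # 64k chunk = 128 sectors, 4 data disks -> swidth = 512 sectors
    mount -o sunit=128,swidth=512 /dev/vg0/lv0 /mnt/data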

> In the end it all comes down to this: in most cases aligning doesn't
> help, at least not if the whole array configuration might change
> over time - or am I mistaken there?
It all comes down to the fact that performance tuning is bound to the
environment we are tuning for. Some choices may give performance
boosts in one environment but be detrimental in another.
Sometimes it is not even clear at a project's start what the best
route is; sometimes unforeseen changes disrupt a well-thought-out
setup. Being able to adapt to all possible future changes is probably
impossible, but a little bit of forethought is not completely wasted.

L.

--
Luca Berra -- bluca@comedia.it
        Communication Media & Services S.r.l.
 /"\
 \ /     ASCII RIBBON CAMPAIGN
  X        AGAINST HTML MAIL
 / \

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/
