Re: Why LVM metadata locations are not properly aligned

On 21.4.2016 06:08, Ming-Hung Tsai wrote:
Hi,

I'm trying to find any opportunity to accelerate LVM metadata IO, in order to
take lvm-thin snapshots in a very short time. My scenario is connecting
lvm-thin volumes to a Windows host, then taking snapshots of those volumes for
Windows VSS (Volume Shadow Copy Service). Since Windows VSS can only
suspend IO for 10 seconds, LVM has to finish taking the snapshots within 10 seconds.


Hmm, do you observe that taking a snapshot takes more than a second?
IMHO the largest portion of the time should be the 'disk' synchronization
when suspending (full flush and fs sync).
Unless you have lvm2 metadata in the range of MiB (and lvm2 was not designed for that), you should be well below a second...

However, it's hard to achieve that if the PV is busy running IO. The major

Changing the disk scheduler to deadline?
Lowering the percentage of dirty pages?
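
Just as an illustration of these two knobs (a rough sketch - /dev/sdX is a placeholder for the disk under your PV, and the numbers are only starting points to experiment with):

    # use the deadline I/O scheduler for the PV's disk
    echo deadline > /sys/block/sdX/queue/scheduler

    # let the kernel keep fewer dirty pages, so the flush done while
    # suspending the origin has less work to do
    sysctl -w vm.dirty_background_ratio=2
    sysctl -w vm.dirty_ratio=5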


overhead is LVM metadata IO. There are some issues:

While your questions are valid points for discussion - you will save a couple of disk reads - but this will not help your timing problem much if you have an overloaded disk I/O system.
Note lvm2 is using direct I/O, which I guess is your real trouble maker here...


1. The metadata locations (raw_locn::offset) are not properly aligned.
    Function _aligned_io() requires the IO to be logical-block aligned,
    but metadata locations returned by next_rlocn_offset() are 512-byte aligned.
    If a device's logical block size is greater than 512B, then LVM needs to use
    a bounce buffer to do the IO.
    How about setting raw_locn::offset to a logical-block boundary?
    (or max(logical_block_size, physical_block_size) for 512-byte logical-/4KB
     physical-block drives?)

This looks like a bug - lvm2 should always start writing metadata at a physical-block-aligned position.
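
To see what a given drive reports, and why the bounce buffer is needed with direct I/O, something like this (sketch only - /dev/sdX is a placeholder):

    blockdev --getss /dev/sdX      # logical block size (the _aligned_io() requirement)
    blockdev --getpbsz /dev/sdX    # physical block size

    # on a drive with a 4KiB logical block size a 512B direct read is
    # expected to fail with EINVAL, which is what forces the bounce buffer:
    dd if=/dev/sdX of=/dev/null bs=512 count=1 iflag=direct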


2. In most cases, the memory buffers passed to dev_read() and dev_write() are
    not aligned. (e.g., raw_read_mda_header(), _find_vg_rlocn())

3. Why does LVM use such a complex process to update metadata?
    There are three operations to update metadata: write, pre-commit, then commit.
    Each operation requires one header read (raw_read_mda_header()),
    one metadata check (_find_vg_rlocn()), and a metadata update via a bounce
    buffer. So we need at least 9 reads and 3 writes for one PV.
    Could we simplify that?

It's already been simplified once ;) and we have lost a quite important property:
validation of the written data during pre-commit - which is quite useful when a
user is running on a misconfigured multipath device...

Each state has its logic, and with each state we need to be sure the data are there. This doesn't sound like a problem with a single PV - but in a server world with many different kinds of misconfigured and failing devices it may be more important than you might think.

A valid idea might be to support a 'riskier' variant of metadata update, where lvm2 would skip some of the disk safety checking - but then it may not catch all the associated trouble, so you could run for days with a dm table you will later not find in your lvm2 metadata....



4. Commits fb003cdf & a3686986 cause an additional metadata read.
    Could we improve that? (We had already checked the metadata in _find_vg_rlocn().)

The fight with disk corruption and duplicates is a major topic in lvm2....
But ATM we are fishing for bigger fish :)
So yes, these optimizations are in the queue - but not as a top priority.


5. Feature request: could we take multiple snapshots in a batch, to reduce
    the number of metadata IO operations?
    e.g., lvcreate vg1/lv1 vg1/lv2 vg1/lv3 --snapshot
    (I know that it would be trouble for the --addtag option...)

Yes, another already existing and planned RFE - support for
atomic snapshots of multiple devices at once - is in the queue.


    This post mentioned that lvresize will support resizing multiple volumes,

It's not about resizing multiple volumes with one command;
it's about resizing data & metadata more correctly in one command via policy.

    but I think that taking multiple snapshots is also helpful.
    https://www.redhat.com/archives/linux-lvm/2016-February/msg00023.html
    > There is also some ongoing work on better lvresize support for more then 1
    > single LV. This will also implement better approach to resize of lvmetad
    > which is using different mechanism in kernel.

    Possible IOCTL sequence:
      dm-suspend origin0
      dm-message create_snap 3 0
      dm-message set_transaction_id 3 4

Every transaction update here needs lvm2 metadata confirmation - i.e. a double commit; lvm2 does not allow jumping by more than 1 transaction here,
and the error path also cleans up 1 transaction.


      dm-resume origin0
      dm-suspend origin1
      dm-message create_snap 4 1
      dm-message set_transaction_id 4 5
      dm-resume origin1
      dm-suspend origin2
      dm-message create_snap 5 2
      dm-message set_transaction_id 5 6
      dm-resume origin2
      ...
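
For reference, one such cycle expressed with dmsetup (only a sketch - the dm device names are placeholders, the 3/0 and 3/4 numbers are taken from the example above, and as noted lvm2 only allows the transaction id to advance by 1 per commit):

      dmsetup suspend vg1-origin0
      dmsetup message vg1-pool-tpool 0 "create_snap 3 0"
      dmsetup message vg1-pool-tpool 0 "set_transaction_id 3 4"
      dmsetup resume vg1-origin0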

6. Is there any other way to accelerate LVM operations? I have enabled lvmetad
    and set global_filter and md_component_detection=0 in lvm.conf.

Reducing the number of PVs that hold metadata, in case your VG has lots of PVs
(this may reduce metadata resilience if the PVs holding it are lost...).

Filters are magic - try to accept only devices which are potential PVs and reject everything else (by default every device is accepted and scanned...). See the lvm.conf sketch below.

Disabling archiving & backup to the filesystem (in lvm.conf) may help a lot if you run lots of lvm2 commands and do not care about the archive.
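
For example, something along these lines in lvm.conf (just an illustration - adjust the filter regex to your real PV devices; with archive/backup disabled you lose the metadata history kept under /etc/lvm):

    devices {
        # accept only the devices that really are PVs, reject the rest
        global_filter = [ "a|^/dev/sd[bc]$|", "r|.*|" ]
    }
    backup {
        archive = 0
        backup = 0
    }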

Checking that /etc/lvm/archive is not full of thousands of files.

Checking with 'strace -tttt' what is delaying your command.
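
For instance (a sketch - the VG/LV names are only examples):

    strace -tttt -f -o /tmp/lvcreate.trace lvcreate -s -n snap1 vg1/thinlv1
    # then look for big gaps between the timestamps in /tmp/lvcreate.trace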

And yes - there are always a couple of ongoing transmutations in lvm2 which may have introduced some performance regression - so opening a BZ is always useful if you spot such a thing.

Regards

Zdenek

_______________________________________________
linux-lvm mailing list
linux-lvm@redhat.com
https://www.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/


