Fwd: how do i push my changes in dm layer to main stream

Hello,
I'm working on device-mapper multipath (dm-multipath).
This patch set adds a new hook to device-mapper for deciding the health of a
multipath path, which helps in getting deterministic application I/O throughput.

This patch set has had preliminary testing on active-active storage with 2 paths.
It still needs work and is not ready for inclusion.
I'm posting it because I'd like to get comments on the high-level
design before going further into the details.

This patch set should be applied on top of 3.10.0 #18
 
 
====================================================================
Background
=-=-=-=-=-=
 
“Sick but not Dead” MPIO path:
  - A path goes into the Failed state because of a path I/O error seen by the DM driver.
  - When the multipath daemon issues a TUR (TEST UNIT READY) command and finds the failed path healthy, it puts the same path back into the Active state.
  - The path repeatedly toggles between the Failed and Active states.
- DM I/O is retried on a path where we are hitting multiple errors.
- This causes erratic (non-deterministic) application I/O throughput.
 
The existing DM layer doesn't consider the number of errors when deciding the health of a path.
Since the failed path becomes active again as soon as the TUR command succeeds, the end user
assumes that all the paths are in a good state.
When we ran field tests with this scenario, we saw non-deterministic I/O throughput.
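
The toggling described above can be illustrated with a small user-space simulation. This is only a sketch under stated assumptions: the path hits an error on every err_every-th I/O, the TUR check always succeeds, and the names (count_flaps, err_every, path_state) are hypothetical and do not come from the kernel or from the patch set.

```c
#include <assert.h>

/* Hypothetical path states, for this illustration only. */
enum path_state { PATH_ACTIVE, PATH_FAILED };

/*
 * Simulate n_ios I/Os on one "sick but not dead" path.
 * Assumption: every err_every-th I/O hits an error, and the daemon's
 * TUR check always succeeds, so the path is reinstated immediately.
 * Returns the number of times the path was marked Failed.
 */
unsigned int count_flaps(unsigned int n_ios, unsigned int err_every)
{
    enum path_state state = PATH_ACTIVE;
    unsigned int flaps = 0;

    assert(err_every > 0);
    for (unsigned int i = 1; i <= n_ios; i++) {
        if (state == PATH_ACTIVE && i % err_every == 0) {
            state = PATH_FAILED;   /* I/O error seen by the DM driver */
            flaps++;
        }
        if (state == PATH_FAILED)
            state = PATH_ACTIVE;   /* TUR succeeded: reinstated at once */
    }
    return flaps;
}
```

With no limit on how often the path may come back, the number of Failed/Active transitions grows linearly with the number of I/Os, which is the erratic behavior seen by the application.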
 
 
 
 
=====================================================================
Design Overview
=-=-=-=-=-=-=-=-=
 
Deterministically bring the path to the “Faulty” state:
  - Configure per-DM-device data with an I/O error threshold and a time window within which the threshold must be hit.
  - Declare a path Faulty when the error threshold is hit within the configured time window.
  - Place the path in the Failed state for a predefined time, configured by the administrator in the config file.
  - Even though the multipath daemon validates the path with a TUR command, which succeeds, and tries to reinstate the path, ignore the reinstate request for the predefined time if the error threshold has been hit.
- Give the administrator time to correct the “Sick but not Dead” path and bring the path back to Active.
- Automatically re-enable a Faulty path to the Active state after a fixed time duration (given as config data for each DM device).
  - The admin can set the deterministic MPIO behavior on a per-DM-device basis.
  - This implies the failed path will be reinstated either by the admin or when the timeout expires.
- The above configs will be made persistent across server reboots.
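
The threshold, time window and hold time listed above can be sketched in user-space C as follows. This is not the code from the patch set: the structure and function names are hypothetical, time is passed in as plain seconds, and a simple fixed window (restarted at the first error after the previous window expires) is assumed, because the design does not say whether the window is fixed or sliding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-DM-device settings; names are not from the patch set. */
struct path_health_cfg {
    unsigned int err_threshold;   /* errors that mark a path Faulty */
    unsigned int err_window_secs; /* window in which they must occur */
    unsigned int hold_secs;       /* how long a Faulty path stays failed */
};

/* Hypothetical per-path bookkeeping. */
struct path_health {
    unsigned int err_count;       /* errors seen in the current window */
    unsigned long window_start;   /* time of the first error in the window */
    bool faulty;                  /* true while reinstate is being refused */
    unsigned long faulty_since;   /* time the threshold was hit */
};

/* Record one I/O error at time now (seconds); returns true if Faulty. */
bool record_io_error(struct path_health *ph,
                     const struct path_health_cfg *cfg, unsigned long now)
{
    assert(cfg->err_threshold > 0);
    /* Start a new window if none is open or the old one has expired. */
    if (ph->err_count == 0 ||
        now - ph->window_start > cfg->err_window_secs) {
        ph->err_count = 0;
        ph->window_start = now;
    }
    ph->err_count++;
    if (!ph->faulty && ph->err_count >= cfg->err_threshold) {
        ph->faulty = true;
        ph->faulty_since = now;
    }
    return ph->faulty;
}

/* Decide whether the daemon's reinstate request at time now may proceed. */
bool may_reinstate(struct path_health *ph,
                   const struct path_health_cfg *cfg, unsigned long now)
{
    if (ph->faulty && now - ph->faulty_since < cfg->hold_secs)
        return false;             /* still in the hold period: ignore */
    /* Not Faulty, or the hold period has expired: clear state and allow. */
    ph->faulty = false;
    ph->err_count = 0;
    return true;
}

/* Manual override by the administrator (assumed to bypass the hold time). */
void admin_clear_faulty(struct path_health *ph)
{
    ph->faulty = false;
    ph->err_count = 0;
}
```

In this sketch, an error handler would call record_io_error() on each path I/O error, and the reinstate handler would call may_reinstate() before bringing the path back; how this maps onto the real kernel functions is left open here.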
 
Expected benefit:
- Deterministic application I/O throughput.
- The administrator gets time to analyze the path failure and recover the path.
- User-space tools need minimal changes.
 
The above feature will be enabled only if the corresponding variables are defined in multipath.conf.

These changes are independent of the underlying algorithms used in the DM layer,
so they are applied in dm.c and dm-mpath.c.

alloc_dev(), reinstate_path(), parse_path() and fail_path() are the functions that will be changed.


We would appreciate more comments on this; we have started testing and the results look deterministic.


Regards,
Muneendra.

On Thu, Dec 15, 2016 at 3:00 PM, muneendra kumar <muneendra737@xxxxxxxxx> wrote:
Hello,
This is the area I am currently working on.

On Mon, Dec 5, 2016 at 9:35 PM, muneendra kumar <muneendra737@xxxxxxxxx> wrote:
Thanks a lot for sharing the info.
I will discuss the problem in detail in my earlier mail.

Regards,
Muneendra.

On Mon, Dec 5, 2016 at 5:45 PM, Zdenek Kabelac <zkabelac@xxxxxxxxxx> wrote:
On 5.12.2016 at 07:29, muneendra kumar wrote:
Hi,
This is a general question.
If I make changes in both the multipath tools and the dm driver (kernel),
how do I push my changes into the mainline?
Can someone explain the process to me? It would help me a lot.



Hi

You propose your changes here on the list. You get a review, and
if the patches are found useful, the maintainer of the dm subsystem
will accept them.

Note: it's usually better to ask and discuss ahead of time what your problem is
and how you want to improve/fix it,
so you avoid losing time implementing an unacceptable patch.

Regards

Zdenek







--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
