Yes, you are right. The added information might not be worth the extra allocation in the global vector entries, especially now that round-robin is not even the default selector anymore. On Thu, Feb 20, 2014 at 6:55 PM, Merla, ShivaKrishna <ShivaKrishna.Merla@xxxxxxxxxx> wrote: >> -----Original Message----- >> From: christophe.varoqui@xxxxxxxxx >> [mailto:christophe.varoqui@xxxxxxxxx] On Behalf Of Christophe Varoqui >> Sent: Thursday, February 20, 2014 12:58 AM >> To: Merla, ShivaKrishna >> Cc: dm-devel@xxxxxxxxxx; hare@xxxxxxx >> Subject: Re: [PATCH]multipath-tools: Re-ordering of child paths in priority >> group for round-robin path selector >> >> Interesting work indeed. >> With all this new information gathered, it would be nice to have new >> 'show' wildcards to report it. > > Hi Christophe, thanks for your inputs. I'm not sure if users are interested in > PCI information to be displayed. We gather the info dynamically to group > paths and reorder and currently we don't store this information in global > vectors or path group structures in mp. Also this will be needed for only > round-robin path selector as others will select paths in different order than > pushed by multipath. So having "show" wildcards to report this data might > not be good idea. Let me know if you think of other way. >> >> best regards, >> Christophe Varoqui >> OpenSVC >> >> On Thu, Feb 20, 2014 at 12:25 AM, Merla, ShivaKrishna >> <ShivaKrishna.Merla@xxxxxxxxxx> wrote: >> > The default Linux bus scanning order is depth first. Due to this default >> > ordering of paths in priority group is by host. With round robin path selector >> > this will lead to i/o clumping on each host controller and starvation on other >> > host controller for certain duration before routing i/o to alternate one. >> > Even with multiple LUNs sharing same host controllers, it will not yield flat >> > evenly distributed i/o pattern. This will impact performance when more >> than >> > eight active paths are available per LUN. This will be even more worse >> when >> > repeat count is more than one. >> > >> > This patch addresses this issue by re-ordering paths in priority group by >> > alternate host ports and even PCI adapters. This re-ordering is only >> necessary >> > for round-robin path selector. We have observed better IOPS and >> Throughput with >> > this crafted ordering of paths. >> > >> > Paths belonging to same host are grouped into host groups and hosts from >> same >> > adapter are grouped into adapter groups. Later paths are selected to yield >> > crafted order to help route i/o to alternate host ports and adapters >> available >> > within priority group of LUN. >> > >> > Signed-off-by: Shiva Krishna Merla<shivakrishna.merla@xxxxxxxxxx> >> > Reviewed-by: Krishnasamy >> Somasundaram<somasundaram.krishnasamy@xxxxxxxxxx> >> > --- >> > libmultipath/configure.c | 224 >> ++++++++++++++++++++++++++++++++++++++++++++++ >> > libmultipath/discovery.c | 87 ++++++++++++++++++ >> > libmultipath/structs.c | 84 +++++++++++++++++ >> > libmultipath/structs.h | 25 +++++- >> > 4 files changed, 419 insertions(+), 1 deletions(-) >> > >> > diff --git a/libmultipath/configure.c b/libmultipath/configure.c >> > index 8c09791..74e5a0b 100644 >> > --- a/libmultipath/configure.c >> > +++ b/libmultipath/configure.c >> > @@ -39,6 +39,214 @@ >> > #include "uxsock.h" >> > #include "wwids.h" >> > >> > +/* group paths in pg by host adapter >> > + */ >> > +int group_by_host_adapter(struct pathgroup *pgp, vector adapters) >> > +{ >> > + struct adapter_group *agp; >> > + struct host_group *hgp; >> > + struct path *pp, *pp1; >> > + char adapter_name1[SLOT_NAME_SIZE]; >> > + char adapter_name2[SLOT_NAME_SIZE]; >> > + int i, j; >> > + int found_hostgroup = 0; >> > + >> > + while (VECTOR_SIZE(pgp->paths) > 0) { >> > + >> > + pp = VECTOR_SLOT(pgp->paths, 0); >> > + >> > + if (sysfs_get_host_adapter_name(pp, adapter_name1)) >> > + return 1; >> > + /* create a new host adapter group >> > + */ >> > + agp = alloc_adaptergroup(); >> > + if (!agp) >> > + goto out; >> > + agp->pgp = pgp; >> > + >> > + strncpy(agp->adapter_name, adapter_name1, >> SLOT_NAME_SIZE); >> > + store_adaptergroup(adapters, agp); >> > + >> > + /* create a new host port group >> > + */ >> > + hgp = alloc_hostgroup(); >> > + if (!hgp) >> > + goto out; >> > + if (store_hostgroup(agp->host_groups, hgp)) >> > + goto out; >> > + >> > + hgp->host_no = pp->sg_id.host_no; >> > + agp->num_hosts++; >> > + if (store_path(hgp->paths, pp)) >> > + goto out; >> > + >> > + hgp->num_paths++; >> > + /* delete path from path group >> > + */ >> > + vector_del_slot(pgp->paths, 0); >> > + >> > + /* add all paths belonging to same host adapter >> > + */ >> > + vector_foreach_slot(pgp->paths, pp1, i) { >> > + if (sysfs_get_host_adapter_name(pp1, adapter_name2)) >> > + goto out; >> > + if (strcmp(adapter_name1, adapter_name2) == 0) { >> > + found_hostgroup = 0; >> > + vector_foreach_slot(agp->host_groups, hgp, j) { >> > + if (hgp->host_no == pp1->sg_id.host_no) { >> > + if (store_path(hgp->paths, pp1)) >> > + goto out; >> > + hgp->num_paths++; >> > + found_hostgroup = 1; >> > + break; >> > + } >> > + } >> > + if (!found_hostgroup) { >> > + /* this path belongs to new host port >> > + * within this adapter >> > + */ >> > + hgp = alloc_hostgroup(); >> > + if (!hgp) >> > + goto out; >> > + >> > + if (store_hostgroup(agp->host_groups, hgp)) >> > + goto out; >> > + >> > + agp->num_hosts++; >> > + if (store_path(hgp->paths, pp1)) >> > + goto out; >> > + >> > + hgp->host_no = pp1->sg_id.host_no; >> > + hgp->num_paths++; >> > + } >> > + /* delete paths from original path_group >> > + * as they are added into adapter group now >> > + */ >> > + vector_del_slot(pgp->paths, i); >> > + i--; >> > + } >> > + } >> > + } >> > + return 0; >> > + >> > +out: /* add back paths into pg as re-ordering failed >> > + */ >> > + vector_foreach_slot(adapters, agp, i) { >> > + vector_foreach_slot(agp->host_groups, hgp, j) { >> > + while (VECTOR_SIZE(hgp->paths) > 0) { >> > + pp = VECTOR_SLOT(hgp->paths, 0); >> > + if (store_path(pgp->paths, pp)) >> > + condlog(3, "failed to restore " >> > + "path %s into path group", >> > + pp->dev); >> > + vector_del_slot(hgp->paths, 0); >> > + } >> > + } >> > + } >> > + free_adaptergroup(adapters); >> > + return 1; >> > +} >> > + >> > +/* re-order paths in pg by alternating adapters and host ports >> > + * for optimized selection >> > + */ >> > +int order_paths_in_pg_by_alt_adapters(struct pathgroup *pgp, vector >> adapters, >> > + int total_paths) >> > +{ >> > + int next_adapter_index = 0; >> > + int num_adapters = 0; >> > + struct adapter_group *agp; >> > + struct host_group *hgp; >> > + struct path *pp; >> > + >> > + num_adapters = VECTOR_SIZE(adapters); >> > + >> > + while (total_paths > 0) { >> > + agp = VECTOR_SLOT(adapters, next_adapter_index); >> > + >> > + hgp = VECTOR_SLOT(agp->host_groups, agp->next_host_index); >> > + >> > + if (!hgp->num_paths) { >> > + agp->next_host_index++; >> > + agp->next_host_index %= agp->num_hosts; >> > + next_adapter_index++; >> > + next_adapter_index %= VECTOR_SIZE(adapters); >> > + continue; >> > + } >> > + >> > + pp = VECTOR_SLOT(hgp->paths, 0); >> > + >> > + if (store_path(pgp->paths, pp)) >> > + return 1; >> > + >> > + total_paths--; >> > + >> > + vector_del_slot(hgp->paths, 0); >> > + >> > + hgp->num_paths--; >> > + >> > + agp->next_host_index++; >> > + agp->next_host_index %= agp->num_hosts; >> > + next_adapter_index++; >> > + next_adapter_index %= VECTOR_SIZE(adapters); >> > + } >> > + >> > + /* all paths are added into path_group >> > + * in crafted child order >> > + */ >> > + return 0; >> > +} >> > + >> > +/* round-robin: order paths in path group to alternate >> > + * between all host adapters >> > + */ >> > +int rr_optimize_path_order(struct pathgroup *pgp) >> > +{ >> > + vector adapters; >> > + struct path *pp; >> > + int total_paths; >> > + int i; >> > + >> > + total_paths = VECTOR_SIZE(pgp->paths); >> > + vector_foreach_slot(pgp->paths, pp, i) { >> > + if (pp->sg_id.proto_id != SCSI_PROTOCOL_FCP || >> > + pp->sg_id.proto_id != SCSI_PROTOCOL_SAS || >> > + pp->sg_id.proto_id != SCSI_PROTOCOL_ISCSI || >> > + pp->sg_id.proto_id != SCSI_PROTOCOL_SRP) { >> > + /* return success as default path order >> > + * is maintained in path group >> > + */ >> > + return 0; >> > + } >> > + } >> > + adapters = vector_alloc(); >> > + if (!adapters) >> > + return 0; >> > + >> > + /* group paths in path group by host adapters >> > + */ >> > + if (group_by_host_adapter(pgp, adapters)) { >> > + condlog(3, "Failed to group paths by adapters"); >> > + free_adaptergroup(adapters); >> > + return 0; >> > + } >> > + >> > + /* re-order paths in pg to alternate between adapters and host ports >> > + */ >> > + if (order_paths_in_pg_by_alt_adapters(pgp, adapters, total_paths)) { >> > + condlog(3, "Failed to re-order paths in pg by adapters " >> > + "and host ports"); >> > + free_adaptergroup(adapters); >> > + /* return failure as original paths are >> > + * removed form pgp >> > + */ >> > + return 1; >> > + } >> > + >> > + free_adaptergroup(adapters); >> > + return 0; >> > +} >> > + >> > extern int >> > setup_map (struct multipath * mpp, char * params, int params_size) >> > { >> > @@ -100,6 +308,22 @@ setup_map (struct multipath * mpp, char * >> params, int params_size) >> > */ >> > mpp->bestpg = select_path_group(mpp); >> > >> > + /* re-order paths in all path groups in an optimized way >> > + * for round-robin path selectors to get maximum throughput. >> > + */ >> > + if (!strncmp(mpp->selector, "round-robin", 11)) { >> > + vector_foreach_slot(mpp->pg, pgp, i) { >> > + if (VECTOR_SIZE(pgp->paths) <= 2) >> > + continue; >> > + if (rr_optimize_path_order(pgp)) { >> > + condlog(2, "cannot re-order paths for " >> > + "optimization: %s", >> > + mpp->alias); >> > + return 1; >> > + } >> > + } >> > + } >> > + >> > /* >> > * transform the mp->pg vector of vectors of paths >> > * into a mp->params strings to feed the device-mapper >> > diff --git a/libmultipath/discovery.c b/libmultipath/discovery.c >> > index 228ffd3..b88edd5 100644 >> > --- a/libmultipath/discovery.c >> > +++ b/libmultipath/discovery.c >> > @@ -317,6 +317,93 @@ sysfs_get_tgt_nodename (struct path *pp, char * >> node) >> > return 1; >> > } >> > >> > +int sysfs_get_host_adapter_name(struct path *pp, char *adapter_name) >> > +{ >> > + int proto_id; >> > + >> > + if (!pp || !adapter_name) >> > + return 1; >> > + >> > + proto_id = pp->sg_id.proto_id; >> > + >> > + if (proto_id != SCSI_PROTOCOL_FCP || >> > + proto_id != SCSI_PROTOCOL_SAS || >> > + proto_id != SCSI_PROTOCOL_ISCSI || >> > + proto_id != SCSI_PROTOCOL_SRP) { >> > + return 1; >> > + } >> > + /* iscsi doesn't have adapter info in sysfs >> > + * get ip_address for grouping paths >> > + */ >> > + if (pp->sg_id.proto_id == SCSI_PROTOCOL_ISCSI) >> > + return sysfs_get_iscsi_ip_address(pp, adapter_name); >> > + >> > + /* fetch adapter pci name for other protocols >> > + */ >> > + return sysfs_get_host_pci_name(pp, adapter_name); >> > +} >> > + >> > +int sysfs_get_host_pci_name(struct path *pp, char *pci_name) >> > +{ >> > + struct udev_device *hostdev, *parent; >> > + char host_name[HOST_NAME_LEN]; >> > + char *driver_name, *value; >> > + >> > + if (!pp || !pci_name) >> > + return 1; >> > + >> > + sprintf(host_name, "host%d", pp->sg_id.host_no); >> > + hostdev = udev_device_new_from_subsystem_sysname(conf- >> >udev, >> > + "scsi_host", host_name); >> > + if (!hostdev) >> > + return 1; >> > + >> > + parent = udev_device_get_parent(hostdev); >> > + while (parent) { >> > + driver_name = udev_device_get_driver(parent); >> > + if (!driver_name) { >> > + parent = udev_device_get_parent(parent); >> > + continue; >> > + } >> > + if (!strcmp(driver_name, "pcieport")) >> > + break; >> > + parent = udev_device_get_parent(parent); >> > + } >> > + if (parent) { >> > + /* pci_device found >> > + */ >> > + value = udev_device_get_sysname(parent); >> > + >> > + strncpy(pci_name, value, SLOT_NAME_SIZE); >> > + udev_device_unref(hostdev); >> > + return 0; >> > + } >> > + udev_device_unref(hostdev); >> > + return 1; >> > +} >> > + >> > +int sysfs_get_iscsi_ip_address(struct path *pp, char *ip_address) >> > +{ >> > + struct udev_device *hostdev; >> > + char host_name[HOST_NAME_LEN]; >> > + char *value; >> > + >> > + sprintf(host_name, "host%d", pp->sg_id.host_no); >> > + hostdev = udev_device_new_from_subsystem_sysname(conf- >> >udev, >> > + "iscsi_host", host_name); >> > + if (hostdev) { >> > + value = udev_device_get_sysattr_value(hostdev, >> > + "ip_address"); >> > + if (value) { >> > + strncpy(ip_address, value, SLOT_NAME_SIZE); >> > + udev_device_unref(hostdev); >> > + return 0; >> > + } else >> > + udev_device_unref(hostdev); >> > + } >> > + return 1; >> > +} >> > + >> > static void >> > sysfs_set_rport_tmo(struct multipath *mpp, struct path *pp) >> > { >> > diff --git a/libmultipath/structs.c b/libmultipath/structs.c >> > index 049f17d..30d247d 100644 >> > --- a/libmultipath/structs.c >> > +++ b/libmultipath/structs.c >> > @@ -18,6 +18,70 @@ >> > #include "blacklist.h" >> > #include "prio.h" >> > >> > +struct adapter_group * >> > +alloc_adaptergroup(void) >> > +{ >> > + struct adapter_group *agp; >> > + >> > + agp = (struct adapter_group *)MALLOC(sizeof(struct adapter_group)); >> > + >> > + if (!agp) >> > + return NULL; >> > + >> > + agp->host_groups = vector_alloc(); >> > + if (!agp->host_groups) { >> > + FREE(agp); >> > + agp = NULL; >> > + } >> > + return agp; >> > +} >> > + >> > +void free_adaptergroup(vector adapters) >> > +{ >> > + int i; >> > + struct adapter_group *agp; >> > + >> > + vector_foreach_slot(adapters, agp, i) { >> > + free_hostgroup(agp->host_groups); >> > + FREE(agp); >> > + } >> > + vector_free(adapters); >> > +} >> > + >> > +void free_hostgroup(vector hostgroups) >> > +{ >> > + int i; >> > + struct host_group *hgp; >> > + >> > + if (!hostgroups) >> > + return; >> > + >> > + vector_foreach_slot(hostgroups, hgp, i) { >> > + vector_free(hgp->paths); >> > + FREE(hgp); >> > + } >> > + vector_free(hostgroups); >> > +} >> > + >> > +struct host_group * >> > +alloc_hostgroup(void) >> > +{ >> > + struct host_group *hgp; >> > + >> > + hgp = (struct host_group *)MALLOC(sizeof(struct host_group)); >> > + >> > + if (!hgp) >> > + return NULL; >> > + >> > + hgp->paths = vector_alloc(); >> > + >> > + if (!hgp->paths) { >> > + FREE(hgp); >> > + hgp = NULL; >> > + } >> > + return hgp; >> > +} >> > + >> > struct path * >> > alloc_path (void) >> > { >> > @@ -242,6 +306,26 @@ store_pathgroup (vector pgvec, struct pathgroup * >> pgp) >> > return 0; >> > } >> > >> > +int >> > +store_hostgroup(vector hostgroupvec, struct host_group * hgp) >> > +{ >> > + if (!vector_alloc_slot(hostgroupvec)) >> > + return 1; >> > + >> > + vector_set_slot(hostgroupvec, hgp); >> > + return 0; >> > +} >> > + >> > +int >> > +store_adaptergroup(vector adapters, struct adapter_group * agp) >> > +{ >> > + if (!vector_alloc_slot(adapters)) >> > + return 1; >> > + >> > + vector_set_slot(adapters, agp); >> > + return 0; >> > +} >> > + >> > struct multipath * >> > find_mp_by_minor (vector mpvec, int minor) >> > { >> > diff --git a/libmultipath/structs.h b/libmultipath/structs.h >> > index 64de06e..772a7d7 100644 >> > --- a/libmultipath/structs.h >> > +++ b/libmultipath/structs.h >> > @@ -15,7 +15,8 @@ >> > #define BLK_DEV_SIZE 33 >> > #define PATH_SIZE 512 >> > #define NAME_SIZE 512 >> > - >> > +#define HOST_NAME_LEN 8 >> > +#define SLOT_NAME_SIZE 32 >> > >> > #define SCSI_VENDOR_SIZE 9 >> > #define SCSI_PRODUCT_SIZE 17 >> > @@ -251,6 +252,20 @@ struct pathgroup { >> > char * selector; >> > }; >> > >> > +struct adapter_group { >> > + char adapter_name[SLOT_NAME_SIZE]; >> > + struct pathgroup *pgp; >> > + int num_hosts; >> > + vector host_groups; >> > + int next_host_index; >> > +}; >> > + >> > +struct host_group { >> > + int host_no; >> > + int num_paths; >> > + vector paths; >> > +}; >> > + >> > struct path * alloc_path (void); >> > struct pathgroup * alloc_pathgroup (void); >> > struct multipath * alloc_multipath (void); >> > @@ -263,6 +278,14 @@ void free_multipath_attributes (struct multipath >> *); >> > void drop_multipath (vector mpvec, char * wwid, enum free_path_mode >> free_paths); >> > void free_multipathvec (vector mpvec, enum free_path_mode >> free_paths); >> > >> > +struct adapter_group * alloc_adaptergroup(void); >> > +struct host_group * alloc_hostgroup(void); >> > +void free_adaptergroup(vector adapters); >> > +void free_hostgroup(vector hostgroups); >> > + >> > +int store_adaptergroup(vector adapters, struct adapter_group *agp); >> > +int store_hostgroup(vector hostgroupvec, struct host_group *hgp); >> > + >> > int store_path (vector pathvec, struct path * pp); >> > int store_pathgroup (vector pgvec, struct pathgroup * pgp); >> > -- -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel