Hi Alasdair, Thank you for the review and comments. I'm totally agree with your comments. I found that you have already updated the patch in your editing tree, thanks. But I'm considering to update the patch to be simpler. Please see below. On 05/09/2009 09:49 AM +0900, Alasdair G Kergon wrote: > On Fri, Apr 24, 2009 at 05:08:02PM +0900, Kiyoshi Ueda wrote: >> + struct selector *s = (struct selector *) ps->context; >> + status_type_t type, char *result, unsigned int maxlen) >> + int sz = 0; >> + DMEMIT("%u %lu ", atomic_read(&pi->in_flight_size), > > (similar comments) OK, you have already fixed the pointer cast, the sz type and the print format for in_flight_size. I'll fix the type of maxlen. >> +/* >> + * Returns: >> + * < 0 : pi1 is better >> + * 0 : no difference between pi1 and pi2 >> + * > 0 : pi2 is better >> + */ > > Please document the parameters and algorithm. > Nothing explains what the performance value is for and examples of anticipated values. OK, I'll add comments and documentation like the attached patch. >> + DMEMIT("%u %lu ", pi->repeat_count, pi->perf); > > Need to handle both 32 and 64 bit archs: cast to llu. > >> + if ((argc == 2) && (sscanf(argv[1], "%lu", &perf) != 1)) { > > Please do sscanf for size_t the same way as dm-crypt does. > (Or scan for lu, if the intent is to impose a size limit ahead of the later > do_div() then cast explicitly.) > >> + do_div(sz1, pi1->perf); >> + do_div(sz2, pi2->perf); > > Or sector_div ? > > But what's the performance of those divisions like? > Is there a better way of getting the same answer? > Is there validation limiting the size of 'perf'? I tried to use multiplication before, but it didn't work well, because overflow happened when 'in_flight_size' and 'perf' had pretty big values, and then I got wrong comparison results. So I decided to use division. However, if I limit the 'perf' value in smaller range (e.g. 0 to 100), such overflow shouldn't happen. So I should be able to use multiplication. Also, I don't need to use size_t for 'perf', because 'unsinged' should be enough for such small values. > (Also, did you check there's adequate locking & memory barriers around all the > atomics?) Yes, I did and I found no problem in both dm-queue-length and dm-service-time. Thanks, Kiyoshi Ueda
Add documents/comments for dm-service-time. Signed-off-by: Kiyoshi Ueda <k-ueda@xxxxxxxxxxxxx> Signed-off-by: Jun'ichi Nomura <j-nomura@xxxxxxxxxxxxx> --- Documentation/device-mapper/dm-service-time.txt | 87 ++++++++++++++++++++++++ drivers/md/dm-service-time.c | 26 +++++-- 2 files changed, 107 insertions(+), 6 deletions(-) Index: 2.6.30-rc5/Documentation/device-mapper/dm-service-time.txt =================================================================== --- /dev/null +++ 2.6.30-rc5/Documentation/device-mapper/dm-service-time.txt @@ -0,0 +1,87 @@ +dm-service-time +=============== + +dm-service-time is a path selector module for device-mapper targets, +which selects a path with the shortest estimated service time for +the incoming I/O. + +The service time for each path is estimated by dividing the total size +of in-flight I/Os on a path with the performance value of the path. +The performance value is a relative throughput value among all paths +in a path-group, and it can be specified as a table argument. + +The path selector name is 'service-time'. + +Table parameters for each path: [<repeat_count> [<performance>]] + <repeat_count>: The number of I/Os to dispatch using the selected + path before switching to the next path. + If not given, internal default is used. To check + the default value, see the activated table. + <performance>: The relative throughput value of the path + among all paths in the path-group. + If not given, minimum value '1' is used. + If '0' is given, the path isn't selected while + other paths having a positive value are available. + +Status for each path: <status> <fail-count> <in-flight-size> <performance> + <status>: 'A' if the path is active, 'F' if the path is failed. + <fail-count>: The number of path failures. + <in-flight-size>: The size of in-flight I/Os on the path. + <performance>: The relative throughput value of the path + among all paths in the path-group. + + +Algorithm +========= + +dm-service-time adds the I/O size to 'in-flight-size' when the I/O is +dispatched and substracts when completed. +Basically, dm-service-time selects a path having minimum service time +which is calculated by: + + ('in-flight-size' + 'size-of-incoming-io') / 'performance' + +However, some optimizations below are used to reduce the calculation +as much as possible. + + 1. If the paths have the same 'performance', skip the division + and just compare the 'in-flight-size'. + + 2. If the paths have the same 'in-flight-size', skip the division + and just compare the 'performance'. + + 3. If some paths have non-zero 'performance' and others have zero + 'performance', ignore those paths with zero 'performance'. + +If such optimizations can't be applied, calculate service time, and +compare service time. +If calculated service time is equal, the path having maximum 'performance' +may be better. So compare 'performance' then. + + +Examples +======== +In case that 2 paths (sda and sdb) are used with repeat_count == 128 +and sda has an average throughput 1GB/s and sdb has 4GB/s, 'performance' +value may be '1' for sda and '4' for sdb. + +# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \ + dmsetup create test +# +# dmsetup table +test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4 +# +# dmsetup status +test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4 + + +Or '2' for sda and '8' for sdb would be also true. + +# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \ + dmsetup create test +# +# dmsetup table +test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8 +# +# dmsetup status +test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8 Index: 2.6.30-rc5/drivers/md/dm-service-time.c =================================================================== --- 2.6.30-rc5.orig/drivers/md/dm-service-time.c +++ 2.6.30-rc5/drivers/md/dm-service-time.c @@ -105,22 +105,27 @@ static int st_add_path(struct path_selec unsigned repeat_count = ST_MIN_IO; unsigned long long tmpll = 1; + /* + * Arguments: [<repeat_count> [<performance>]] + * <repeat_count>: The number of I/Os before switching path. + * If not given, default (ST_MIN_IO) is used. + * <performance>: The relative throughput value of the path + * among all paths in the path-group. + * If not given, minimum value '1' is used. + * If '0' is given, the path isn't selected while + * other paths having a positive value are + * available. + */ if (argc > 2) { *error = "service-time ps: incorrect number of arguments"; return -EINVAL; } - /* First path argument is number of I/Os before switching path. */ if ((argc > 0) && (sscanf(argv[0], "%u", &repeat_count) != 1)) { *error = "service-time ps: invalid repeat count"; return -EINVAL; } - /* - * Second path argument is a relative performance value. - * If 0 is given, the path isn't used while other paths having - * a positive value are available. - */ if ((argc == 2) && (sscanf(argv[1], "%llu", &tmpll) != 1)) { *error = "service-time ps: invalid performance value"; return -EINVAL; @@ -164,10 +169,19 @@ static int st_reinstate_path(struct path } /* + * Compare the estimated service time of 2 paths, pi1 and pi2, + * for the incoming I/O. + * * Returns: * < 0 : pi1 is better * 0 : no difference between pi1 and pi2 * > 0 : pi2 is better + * + * Description: + * Basically, the service time is estimated by: + * ('pi->in-flight-size' + 'incoming') / 'pi->perf' + * To reduce the calculation, some optimizations are made. + * (See comments inline) */ static int st_compare_load(struct path_info *pi1, struct path_info *pi2, size_t incoming)
-- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel