On Thu, Apr 12, 2018 at 10:35:02PM +0200, Martin Wilck wrote: > On Thu, 2018-04-12 at 13:36 -0500, Benjamin Marzinski wrote: > > On Wed, Apr 04, 2018 at 06:16:23PM +0200, Martin Wilck wrote: > > > In "find_multipaths smart" mode, use time stamps under > > > /dev/shm/multipath/find_multipaths to track waiting for multipath > > > siblings. When a path is first encountered and is "maybe" > > > multipath, create a > > > file under /dev/shm, set its modification time to the expiry time > > > of the > > > timer, and set the FIND_MULTIPATHS_WAIT_UNTIL variable. On later > > > calls, also set > > > FIND_MULTIPATHS_WAIT_UNTIL to the expiry time (but don't change the > > > time > > > stamp) if it's not expired yet, or 0 if it is expired. Set > > > FIND_MULTIPATHS_WAIT_UNTIL even if enough evidence becomes > > > available to decide > > > if the path needs to be multipathed - this enables the udev rules > > > to detect > > > that this is a device a timer has been started for, and stop it. By > > > using > > > /dev/shm, we share information about "smart" timers between initrd > > > and root > > > file system, and thus only calculate the timeout once. > > > > > > Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> > > > --- > > > libmultipath/file.c | 2 +- > > > libmultipath/file.h | 1 + > > > multipath/main.c | 133 > > > ++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > 3 files changed, 135 insertions(+), 1 deletion(-) > > > > > > + > > > +/** > > > + * find_multipaths_check_timeout(wwid, tmo) > > > + * Helper for "find_multipaths smart" > > > + * > > > + * @param[in] pp: path to check / record > > > + * @param[in] tmo: configured timeout for this WWID, or value <= 0 > > > for checking > > > + * @param[out] until: timestamp until we must wait, > > > CLOCK_REALTIME, if return > > > + * value is FIND_MULTIPATHS_WAITING > > > + * @returns: FIND_MULTIPATHS_WAIT_DONE, if waiting has finished > > > + * @returns: FIND_MULTIPATHS_ERROR, if internal error occured > > > + * @returns: FIND_MULTIPATHS_NEVER, if tmo is 0 and we didn't wait > > > for this > > > + * device > > > + * @returns: FIND_MULTIPATHS_WAITING, if timeout hasn't expired > > > + */ > > > +static int find_multipaths_check_timeout(const struct path *pp, > > > long tmo, > > > + struct timespec *until) > > > +{ > > > + char path[PATH_MAX]; > > > + struct timespec now, ftimes[2], tdiff; > > > + struct stat st; > > > + long fd; > > > + int r, err, retries = 0; > > > + > > > + clock_gettime(CLOCK_REALTIME, &now); > > > + > > > > I'm worried about using pp->dev_t here with no method of removing > > these > > files. What happens when a path device, say 8:32 is removed and a > > completely new device comes in reusing the same dev_t? > > Hm, what else should we use? devnode name is even worse, and most other > options are a can of worms... In the worst case, the new device > wouldn't be waited for (or not long enough), because the marker existed > before it was detected. > > I could simply add a rule that removes the marker in case of a "remove" > uevent, ok? Fair enough. Nothing really bad happens, and removing the markers on remove uevents would solve it. > > > > > + if (snprintf(path, sizeof(path), "%s/%s", shm_find_mp_dir, > > > pp->dev_t) > > > + >= sizeof(path)) { > > > + condlog(1, "%s: path name overflow", __func__); > > > > shouldn't this be: > > return FIND_MULTIPATHS_ERROR; > > Bah, thank for catching it. > > > > static int print_cmd_valid(int k, const vector pathvec, > > > struct config *conf) > > > { > > > static const int vals[] = { 1, 0, 2 }; > > > + int wait = FIND_MULTIPATHS_NEVER; > > > + struct timespec until; > > > + struct path *pp; > > > > > > if (k < 0 || k >= sizeof(vals)) > > > return 1; > > > > > > + if (k == 2) { > > > + /* > > > + * Caller ensures that pathvec[0] is the path to > > > + * examine. > > > + */ > > > + pp = VECTOR_SLOT(pathvec, 0); > > > + select_find_multipaths_timeout(conf, pp); > > > + wait = find_multipaths_check_timeout( > > > + pp, pp->find_multipaths_timeout, &until); > > > + if (wait != FIND_MULTIPATHS_WAITING) > > > + k = 1; > > > + } else if (pathvec != NULL) { > > > + pp = VECTOR_SLOT(pathvec, 0); > > > + wait = find_multipaths_check_timeout(pp, 0, > > > &until); > > > + } > > > + if (wait == FIND_MULTIPATHS_WAITING) > > > + printf("FIND_MULTIPATHS_WAIT_UNTIL=\"%ld.%06ld\"\n > > > ", > > > + until.tv_sec, until.tv_nsec/1000); > > > + else if (wait == FIND_MULTIPATHS_WAIT_DONE) > > > + printf("FIND_MULTIPATHS_WAIT_UNTIL=\"0\"\n"); > > > > If we get an error trying to check the timeout, should we just keep > > FIND_MULTIPATHS_WAIT_UNTIL the same? Or would it be better to set it > > to > > 0, and fail the smart claiming? > > The latter is what we do. We set k=1 if find_multipaths_check_timeout() > returns anything but FIND_MULTIPATHS_WAITING, resulting in > DM_MULTIPATH_DEVICE_PATH="0". Oops. I got confused here. -Ben > Regards > Martin > > -- > Dr. Martin Wilck <mwilck@xxxxxxxx>, Tel. +49 (0)911 74053 2107 > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton > HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel