yumkam@xxxxxxxxx (Yuriy M. Kaminskiy) writes: > Karel Zak <kzak@xxxxxxxxxx> writes: > >> On Sat, Apr 09, 2016 at 12:40:18AM +0300, Yuriy M. Kaminskiy wrote: >>> I think this algorithm should work: >>> >>> 1) walk /dev/block/${dev}/slaves recursively (e.g. lvm-over- luks- >>> over-lvm- over-md, first-level slaves won't work), collect all >>> whole-disk devices; >>> 2) sort this list by device numbers (to avoid AB/BA deadlocks), remove >>> duplicates; >>> 3) acquire all locks consequently. >>> >>> There are some unavoidable flaws (e.g. if someone alters structure while fsck >>> is running), and some things could be improved, but it should work in >>> most cases (and if it fails, it is just no worse than current behavior). >>> >>> I've tested with LVM and LUKS-over-LVM on (debian's) 3.16 kernel, it seems >>> works as expected. >>> >>> What is not covered: /dev/loop* (and I have no plans for it). >>> >>> Comments? NACKs? >> >> The question is if we really need it :-) > > Yes? > >> All the care fsck has about whole-disks is about performance, because >> we assume (on classic rotation disks) that execute fsck on more >> partitions in the same time is bad idea. >> >> The problem with stacked devices is that we have no clue about sectors >> mapping. Your patch locks all the disks, but maybe kernel will read >> only from some subset of the disks, etc. > > Is this likely scenario? (Or even possible? I can imagine that, with > lvm-over-raid1+, kernel theoretically can schedule one fsck instance to > read only slaveA, and another to read only from slaveB, thus avoiding > seeking; but I suspect, in practice it will just roundrobin all requests > over both slaves, thus same seekstorm.) > >> From my point of view the best way is not to care about it in >> userspace at all and depend on kernel I/O scheduling only. The > > No problem. Please set in-kernel fsck-friendly io scheduler for fsck > instead. (Oh, there are none exists?) > >> optimization probably still makes sense for classic layout >> "HDD->partition", but I don't think we need to extend this >> functionality to another use-cases. > > `fsck -A` does not execute fsck for any stacked devices in > parallel. `fsck -l` does. This is inconsistent and unfair. Alternative > (simplier) patch attached (compile-tested only). v2 (path variable and snprintf removed, added comment about unlock/unlink).
diff --git a/disk-utils/fsck.c b/disk-utils/fsck.c index 84d2dcc..bd9114a 100644 --- a/disk-utils/fsck.c +++ b/disk-utils/fsck.c @@ -109,6 +109,7 @@ struct fsck_instance { int lock; /* flock()ed lockpath file descriptor or -1 */ char *lockpath; /* /run/fsck/<diskname>.lock or NULL */ + int lock_stacked; /* flock()ed stacked file descriptor or -1 */ int exit_status; struct timeval start_time; @@ -365,6 +366,21 @@ static void lock_disk(struct fsck_instance *inst) if (!diskname) diskname = diskpath; + /* + * Note: stacked.lock can be share-locked and MUST NOT be removed + * upon unlock! + */ + inst->lock_stacked = open(FSCK_RUNTIME_DIRNAME "/stacked.lock", + O_RDONLY|O_CREAT|O_CLOEXEC, + S_IWUSR|S_IRUSR|S_IRGRP|S_IROTH); + if (inst->lock_stacked >= 0) { + if (flock(inst->lock_stacked, + fs_is_stacked(inst->fs) ? LOCK_EX : LOCK_SH) != 0) { + close(inst->lock_stacked); /* failed */ + inst->lock_stacked = -1; + } + } + xasprintf(&inst->lockpath, FSCK_RUNTIME_DIRNAME "/%s.lock", diskname); if (verbose) @@ -410,10 +426,12 @@ static void unlock_disk(struct fsck_instance *inst) printf(_("Unlocking %s.\n"), inst->lockpath); close(inst->lock); /* unlock */ + close(inst->lock_stacked); /* unlock */ free(inst->lockpath); inst->lock = -1; + inst->lock_stacked = -1; inst->lockpath = NULL; }