On Fri, 2013-09-27 at 15:52 +0200, Hannes Reinecke wrote:
> On 09/27/2013 10:37 AM, Alasdair G Kergon wrote:
> > But this still dodges the fundamental problem:
> >
> > What is the right value to use for the timeout?
> > - How long should you wait for a path to (re)appear?
> > - In the current model, reinstating a path is a userspace
> > responsibility.
> >
> And with my proposed patch it would still be userspace which is
> setting the timeout.
> Currently, no_path_retry is not a proper measure anyway, as it
> depends on the time multipathd takes to complete a path check
> round. Which depends on the number of devices, the state of those etc.
>
> > The timeout, as proposed, is being used in two conflicting ways:
> > - How long to wait for path recovery when all paths went down
>
> That would be set via the new 'no_path_timeout' feature, which would
> be set instead of the (multipath-internal) no_path_retry
> setting.

Yes, this matches our setup as well.

> > - How long to wait when the system locks without enough free
> > memory even to reinstate a path (because of broken userspace
> > code) before having multipath fail queued I/O in a desperate
> > attempt at releasing memory to assist recovery
>
> Do we even handle that case currently?

My understanding is that the current code doesn't, no, but if it does I would love to know how.

> Methinks this is precisely the use-case this is supposed to address.

Yes, exactly.

> When currently 'no_path_retry' is set _and_ we're running under a
> low-mem condition there is quite a large likelihood that the
> multipath daemon will be killed by the OOM-killer or not able to
> send any dm messages down to the kernel, as the latter most likely
> require some memory allocations.
>
> So in the current 'no_path_retry' scenario the maps would have been
> created with 'queue_if_no_path', and the daemon would have to reset
> the 'queue_if_no_path' flag if the no_path_retry value expires.
> Which it might not be able to do due to the above scenario.
>
> So with the proposed 'no_path_timeout' we would enable the dm-mpath
> module to terminate all outstanding I/O, irrespective of all
> userland conditions. Which seems like an improvement to me ...

And to me, which is why I went in this direction in the first place. I could see no dependable way to deal with this outside of the kernel; if I had, I would have taken it, since userspace changes are _much_ easier for us to deal with than kernel changes.
-- 
Frank Mayhar
310-460-4042

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel