Re: [RFC][PATCH 3/3][cr][v2]: fileleases: C/R of an in-progress lease.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 06/16/2010 07:18 AM, Jamie Lokier wrote:
> Sukadev Bhattiprolu wrote:
>> If process P1 has a F_WRLCK lease on file F1 and process P2 opens the
>> file, P2's open() blocks for lease_break_time (45 seconds) and P1 gets
>> a SIGIO to cleanup it lease in preparation for P2's open.  If the two
>> processes are checkpointed/restarted in this window, we should address
>> following two issues:
>>
>> 	- P1 should get a SIGIO only once for the lease (i.e if P1 got the
>> 	  SIGIO before checkpoint, it should not get the SIGIO after restart).
>>
>> 	- If R seconds remain in the lease, P2's open should be blocked for
>> 	  at least the R seconds, so P1 has the time to clean up its lease.
>> 	  The previous patch gives P1 the entire lease_break_time but that
>> 	  can leave P2 stalled for 2*lease_break_time.
>>
>> To address first, we add a field ->fl_break_notified to "remember" if we
>> notified the lease-holder already. We save this field in the checkpoint
>> image and when restarting, we notify the lease-holder only if this field
>> is not set.
>>
>> To address the second issue, we also checkpoint the ->fl_break_time for
>> an in-progress lease. When restarting the process, we ensure that the
>> lease-holder sleeps only for the remaining-lease rather than the entire
>> lease.
>>
>> These two fixes sound like an approximation (see comments in do_setlease()
>> and __break_lease() below) and are also a bit kludgy (hence a separate patch
>> for now).
>>
>> Appreciate comments on how we can do this better. Specifically:
>>
>> 	- do we even need to try and address the second issue above or
>> 	  just let P1 have the entire lease_break_time again ?
>>
>> 	- theoretically, the R seconds should start counting after *all*
>> 	  processes in the application-process tree have been restarted,
>> 	  since P1 waits inside the kernel for a portion of the remaining
>> 	  lease - should we then add a delta to R ?
> 
> For P1 running out of time would be an abnormal error condition which
> it might not be prepared to handle gracefully, and it's best if c/r
> doesn't add new ones of those, perhaps due to the process tree restart
> taking up too much of the time, perhaps exarcerbated by the processes
> themselves if they take a burst of CPU reacting to the restart.
> 
> Also, P1 userspace may use an algorithm which is dependent on
> lease_break_time, so it can do "check for lease break events -> no
> events", and subsequently "check the time" is sufficient to confirm
> that a particular file has not been opened and its contents can be
> assumed unchanged.  Checking the time is faster.
> 
> P2 is unlikely to care about the timeout, as long as it doesn't get
> permanently stuck.
> 
> So I would let P1 have the lease_break_time again, and/or set the
> times ticking after the whole process tree is restarted and make sure
> to round up any errors in favour of P1.
> 
> -- Jamie
> 

For what it's worth, the restored break-time will be relative to after
all the processes are created (because that's where the kernel part of
the restart begins) - but yes, before they are fully restored.

So the problem that you describe exists, however I'm uncomfortable with
forcing this behavior in the kernel - because they may be other cases
in which userspace expect the lease to end within a certain time, and
such behavior may break it.

An alternative approach would be to modify the value of the lease break
time in _userspace_ - eg. by "massasging" the checkpoint image as we
feed it to the kenrel. This provides the flexibility to choose which
policy to use, by the application user/developer.

For example, we could have 'restart --lease-reset ...." parameter to
induce this behavior, or have it be default, and use 'restart --lease-keep'
to avoid it.

Oren.



--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux