Re: [PATCH 11/12] NFSv4.1: layoutcommit

Fred Isaman <iisaman@xxxxxxxxxx> · Thu, 24 Mar 2011 12:54:52 -0400

On Thu, Mar 24, 2011 at 12:48 PM, Trond Myklebust
<Trond.Myklebust@xxxxxxxxxx> wrote:
> On Thu, 2011-03-24 at 18:37 +0200, Benny Halevy wrote:
>> On 2011-03-24 15:57, William A. (Andy) Adamson wrote:
>> >>> Supporting only whole-file layouts means that there is only one IOMODE_RW
>> >>> layout segment.
>> >>>
>> >>> Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
>> >>> Signed-off-by: Alexandros Batsakis <batsakis@xxxxxxxxxx>
>> >>> Signed-off-by: Boaz Harrosh <bharrosh@xxxxxxxxxxx>
>> >>> Signed-off-by: Dean Hildebrand <dhildeb@xxxxxxxxxx>
>> >>> Signed-off-by: Fred Isaman <iisaman@xxxxxxxxxxxxxx>
>> >>> Signed-off-by: Mingyang Guo <guomingyang@xxxxxxxxxxxx>
>> >>> Signed-off-by: Tao Guo <guotao@xxxxxxxxxxxx>
>> >>> Signed-off-by: Zhang Jingwang <zhangjingwang@xxxxxxxxxxxx>
>> >>> Tested-by: Boaz Harrosh <bharrosh@xxxxxxxxxxx>
>> >>> Signed-off-by: Benny Halevy <bhalevy@xxxxxxxxxxx>
>> >>
>> >> The code in this patch is new and different enough from what I/we
>> >> originally signed off on that those sign-offs don't make sense here.
>> >
>> > Hi Benny
>> >
>> > OK with me
>> >
>> >>>
>> >>> +             /* references matched in nfs4_layoutcommit_release */
>> >>> +             wdata->lseg->pls_lc_cred =
>> >>> +                     get_rpccred(wdata->args.context->state->owner->so_cred);
>> >>> +             mark_inode_dirty_sync(wdata->inode);
>> >>> +             dprintk("%s: Set layoutcommit for inode %lu ",
>> >>> +                     __func__, wdata->inode->i_ino);
>> >>> +     }
>> >>> +     if (end_pos > wdata->lseg->pls_end_pos)
>> >>> +             wdata->lseg->pls_end_pos = end_pos;
>> >>
>> >> The end_pos is essentially per inode, why maintain it per lseg?
>> >> How do you see this working with multiple lsegs in mind?
>> >
>> > The end_pos is per lseg, not per inode: each LAYOUTCOMMIT applies to
>> > the range of WRITEs done through a layout segment, over the LAYOUTCOMMIT range.
>> >
>> > From RFC 5661, Section 18.42.3:
>> >
>> >    The byte-range being committed is
>> >    specified through the byte-range (loca_offset and loca_length).  This
>> >    byte-range MUST overlap with one or more existing layouts previously
>> >    granted via LAYOUTGET ...
>> >
>> >    Also, loca_last_write_offset MUST overlap the range
>> >    described by loca_offset and loca_length.
>> >
>> > For the multiple-lseg case: if the lsegs are merged, bookkeeping
>> > end_pos per lseg just works. If a layout driver does not use merged
>> > lsegs, then there is a bit of work to do to walk the list of lsegs and
>> > determine the final end_pos for a given LAYOUTCOMMIT.  If there are
>> > multiple non-contiguous lsegs, each used for WRITEs, then multiple
>> > LAYOUTCOMMITs will need to be sent; otherwise the LAYOUTCOMMIT
>> > byte-range will not overlap as required.
>> >
>>
>> For the current layout types I believe that the LAYOUTCOMMIT can "merge"
>> multiple layout segments into a single LAYOUTCOMMIT, with a byte range
>> covering all segments and a last_byte_written offset which is just the maximum.
>> Future layout types may need this method though...
>
> Is that safe?
>
> What if I'm doing blocks and have written layout segment 1 & 3, but not
> layout segment 2? I don't want to have the MDS commit layout segment 2,
> and make the (lack of) data there visible to future readers.
>

No, it is not safe.  Avoiding this problem is one of the major reasons
for putting the bookkeeping in the lseg.
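To make the hazard concrete, here is a minimal userspace sketch, not the kernel code: struct lseg, struct lc_range, lseg_record_write(), lc_range_valid() and build_commits() are hypothetical names, and the model assumes commit granularity is a whole lseg. Each lseg keeps its own written flag and end_pos, and commit ranges are built only from runs of adjacent lsegs that were actually written, so an unwritten segment in the middle splits the work into separate LAYOUTCOMMITs instead of being swept into one merged range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a pNFS layout segment; this is NOT
 * the kernel's struct pnfs_layout_segment. */
struct lseg {
	uint64_t offset;  /* start of the byte range this lseg covers */
	uint64_t length;  /* length of that byte range */
	bool written;     /* true once a WRITE has gone through this lseg */
	uint64_t end_pos; /* one past the highest byte written via this lseg */
};

/* Hypothetical LAYOUTCOMMIT arguments, named after the RFC 5661 fields. */
struct lc_range {
	uint64_t offset;            /* loca_offset */
	uint64_t length;            /* loca_length */
	uint64_t last_write_offset; /* loca_last_write_offset: last byte written */
};

/* Per-lseg bookkeeping for a completed WRITE of [pos, pos + count). */
static void lseg_record_write(struct lseg *l, uint64_t pos, uint64_t count)
{
	uint64_t end = pos + count;

	if (count == 0)
		return;
	/* The WRITE must lie inside the segment it was issued through. */
	assert(pos >= l->offset && end <= l->offset + l->length);
	l->written = true;
	if (end > l->end_pos)
		l->end_pos = end;
}

/* RFC 5661: the last write offset must fall inside the committed range. */
static bool lc_range_valid(const struct lc_range *r)
{
	return r->length > 0 && r->last_write_offset >= r->offset &&
	       r->last_write_offset - r->offset < r->length;
}

/*
 * Emit one LAYOUTCOMMIT range per run of adjacent, written lsegs.
 * segs[] must be sorted by offset; out[] needs room for up to n entries.
 * An unwritten lseg ends the current run, so it is never covered by a
 * commit range.  Returns the number of ranges stored in out[].
 */
static size_t build_commits(const struct lseg *segs, size_t n,
			    struct lc_range *out)
{
	size_t nr = 0, i = 0;

	while (i < n) {
		if (!segs[i].written) {
			i++;
			continue;
		}
		uint64_t start = segs[i].offset;
		uint64_t stop = segs[i].offset + segs[i].length;
		uint64_t last = segs[i].end_pos;

		/* Extend the run only over following lsegs that are both
		 * written and contiguous with the current run. */
		for (i++; i < n && segs[i].written && segs[i].offset == stop; i++) {
			stop = segs[i].offset + segs[i].length;
			if (segs[i].end_pos > last)
				last = segs[i].end_pos;
		}
		out[nr].offset = start;
		out[nr].length = stop - start;
		out[nr].last_write_offset = last - 1; /* offset of last byte */
		assert(lc_range_valid(&out[nr]));
		nr++;
	}
	return nr;
}
```

With three adjacent 1 MiB lsegs where only the first and third saw WRITEs (the "segment 1 & 3, but not 2" case above), build_commits() returns two ranges and never covers the second segment; once the middle segment is written too, the three collapse into a single range whose last write offset is the maximum end_pos minus one.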

Fred
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html