Re: [GIT PULL] vfs: Add support for timestamp limits

Mike Marshall <hubcap@xxxxxxxxxxxx> · Thu, 5 Sep 2019 12:58:27 -0400

I spoke to Walt Ligon about the versioning code, and also shared
this thread with him. He isn't a fan of the versioning code.
and we think it should go. As I read through the commit messages
from when the versioning code was added, it relates to mtime
on directories. If a directory is read, and it has enough entries,
it might take several operations to collect all the entries. During
this time the directory might change. The versioning is a way to
tell that something changed between one of the operations...

   commit: 7878027e9c2 (Oct 2004)
     - added a directory version that is passed back from the server to
       the client on each successful readdir call (happens to be the
       directory's mtime encoded as an opaque uint64_t)

We will work to see if we can figure out what we need to do to Orangefs
on both the userspace side and the kernel module side to have all 64 bit
time values.

I've also read up some on the y2038 cleanup work that Arnd and Deepa
have been doing...

Thanks to Arnd and Deepa for looking so deeply into the Orangefs
userspace code...

-Mike

On Sat, Aug 31, 2019 at 6:50 PM Deepa Dinamani <deepa.kernel@xxxxxxxxx> wrote:
>
> > I think it's unclear from the orangefs source code what the intention is,
> > as there is a mixed of signed and unsigned types used for the inode
> > stamps:
> >
> > #define encode_PVFS_time encode_int64_t
> > #define encode_int64_t(pptr,x) do { \
> >     *(int64_t*) *(pptr) = cpu_to_le64(*(x)); \
> >     *(pptr) += 8; \
> > } while (0)
> > #define decode_PVFS_time decode_int64_t
> > #define decode_int64_t(pptr,x) do { \
> >     *(x) = le64_to_cpu(*(int64_t*) *(pptr)); \
> >     *(pptr) += 8; \
> > } while (0)
> >
> > This suggests that making it unsigned may have been an accident.
> >
> > Then again,  it's clearly and consistently printed as unsigned in
> > user space:
> >
> >         gossip_debug(
> >             GOSSIP_GETATTR_DEBUG, " VERSION is %llu, mtime is %llu\n",
> >             llu(s_op->attr.mtime), llu(resp_attr->mtime));
>
> I think I had noticed these two and decided maybe the intention was to
> use unsigned types.
>
> > A related issue I noticed is this:
> >
> > PVFS_time PINT_util_mktime_version(PVFS_time time)
> > {
> >     struct timeval t = {0,0};
> >     PVFS_time version = (time << 32);
> >
> >     gettimeofday(&t, NULL);
> >     version |= (PVFS_time)t.tv_usec;
> >     return version;
> > }
> > PVFS_time PINT_util_mkversion_time(PVFS_time version)
> > {
> >     return (PVFS_time)(version >> 32);
> > }
> > static PINT_sm_action getattr_verify_attribs(
> >         struct PINT_smcb *smcb, job_status_s *js_p)
> > {
> > ...
> >     resp_attr->mtime = PINT_util_mkversion_time(s_op->attr.mtime);
> > ...
> > }
> >
> > which suggests that at least for some purposes, the mtime field
> > is only an unsigned 32-bit number (1970..2106). From my readiing,
> > this affects the on-disk format, but not the protocol implemented
> > by the kernel.
> >
> > atime and ctime are apparently 64-bit, but mtime is only 32-bit
> > seconds, plus a 32-bit 'version'. I suppose the server could be
> > fixed to allow a larger range, but probably would take it out of
> > the 'version' bits, not the upper half.
>
> I had missed this part. Thanks.
>
> > To be on the safe side, I suppose the kernel can only assume
> > an unsigned 32-bit range to be available. If the server gets
> > extended beyond that, it would have to pass a feature flag.
>
> This makes sense to me also. And, as Arnd pointed out on the IRC, if
> there are negative timestamps that are already in use, this will be a
> problem for those use cases.
> I can update tha patch to use limits 0-u32_max.
>
> -Deepa