Re: Hadoop and Ceph client/mds view of modification time

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



(Sorry for the dupe message. vger rejected due to HTML).

Thanks, I'll try this patch this morning.

Client B should perform a single stat after a notification from Client
A. But, won't Sage's patch still be required, since Client A needs the
MDS time to pass to Client B?

On Tue, Nov 20, 2012 at 12:20 PM, Sam Lang <sam.lang@xxxxxxxxxxx> wrote:
> On 11/20/2012 01:44 PM, Noah Watkins wrote:
>>
>> This is a description of the clock synchronization issue we are facing
>> in Hadoop:
>>
>> Components of Hadoop use mtime as a versioning mechanism. Here is an
>> example where Client B tests the expected 'version' of a file created
>> by Client A:
>>
>>    Client A: create file, write data into file.
>>    Client A: expected_mtime <-- lstat(file)
>>    Client A: broadcast expected_mtime to client B
>>    ...
>>    Client B: mtime <-- lstat(file)
>>    Client B: test expected_mtime == mtime
>
>
> Here's a patch that might work to push the setattr out to the mds every time
> (the same as Sage's patch for getattr).  This isn't quite writeback, as it
> waits for the setattr at the server to complete before returning, but I
> think that's actually what you want in this case.  It needs to be enabled by
> setting client setattr writethru = true in the config.  Also, I haven't
> tested that it sends the setattr, just a basic test of functionality.
>
> BTW, if its always client B's first stat of the file, you won't need Sage's
> patch.
>
> -sam
>
> diff --git a/src/client/Client.cc b/src/client/Client.cc
> index 8d4a5ac..a7dd8f7 100644
> --- a/src/client/Client.cc
> +++ b/src/client/Client.cc
> @@ -4165,6 +4165,7 @@ int Client::_getattr(Inode *in, int mask, int uid, int
> gid)
>
>  int Client::_setattr(Inode *in, struct stat *attr, int mask, int uid, int
> gid)
>  {
> +  int orig_mask = mask;
>    int issued = in->caps_issued();
>
>    ldout(cct, 10) << "_setattr mask " << mask << " issued " <<
> ccap_string(issued) << dendl;
> @@ -4219,7 +4220,7 @@ int Client::_setattr(Inode *in, struct stat *attr, int
> mask, int uid, int gid)
>        mask &= ~(CEPH_SETATTR_MTIME|CEPH_SETATTR_ATIME);
>      }
>    }
> -  if (!mask)
> +  if (!cct->_conf->client_setattr_writethru && !mask)
>      return 0;
>
>    MetaRequest *req = new MetaRequest(CEPH_MDS_OP_SETATTR);
> @@ -4229,6 +4230,10 @@ int Client::_setattr(Inode *in, struct stat *attr,
> int mask, int uid, int gid)
>    req->set_filepath(path);
>    req->inode = in;
>
> +  // reset mask back to original if we're meant to do writethru
> +  if (cct->_conf->client_setattr_writethru)
> +    mask = orig_mask;
> +
>    if (mask & CEPH_SETATTR_MODE) {
>      req->head.args.setattr.mode = attr->st_mode;
>      req->inode_drop |= CEPH_CAP_AUTH_SHARED;
> diff --git a/src/common/config_opts.h b/src/common/config_opts.h
> index cc05095..51a2769 100644
> --- a/src/common/config_opts.h
> +++ b/src/common/config_opts.h
> @@ -178,6 +178,7 @@ OPTION(client_oc_max_dirty, OPT_INT, 1024*1024* 100)
> // MB * n  (dirty OR tx.
>  OPTION(client_oc_target_dirty, OPT_INT, 1024*1024* 8) // target dirty (keep
> this smallish)
>  OPTION(client_oc_max_dirty_age, OPT_DOUBLE, 5.0)      // max age in cache
> before writeback
>  OPTION(client_oc_max_objects, OPT_INT, 1000)      // max objects in cache
> +OPTION(client_setattr_writethru, OPT_BOOL, false)  // send the attributes
> to the mds server
>  // note: the max amount of "in flight" dirty data is roughly (max - target)
>  OPTION(fuse_use_invalidate_cb, OPT_BOOL, false) // use fuse 2.8+ invalidate
> callback to keep page cache consistent
>  OPTION(fuse_big_writes, OPT_BOOL, true)
>
>
>>
>> Since mtime may be set in Ceph by both client and MDS, inconsistent
>> mtime view is possible when clocks are not adequately synchronized.
>>
>> Here is a test that reproduces the problem. In the following output,
>> issdm-18 has the MDS, and issdm-22 is a non-Ceph node with its time
>> set to an hour earlier than the MDS node.
>>
>> nwatkins@issdm-22:~$ ssh issdm-18 date && ./test
>> Tue Nov 20 11:40:28 PST 2012           // MDS TIME
>> local time: Tue Nov 20 10:42:47 2012  // Client TIME
>> fstat time: Tue Nov 20 11:40:28 2012  // mtime seen after file
>> creation (MDS time)
>> lstat time: Tue Nov 20 10:42:47 2012  // mtime seen after file write
>> (client time)
>>
>> Here is the code used to produce that output.
>>
>> #include <errno.h>
>> #include <sys/fcntl.h>
>> #include <sys/time.h>
>> #include <unistd.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <dirent.h>
>> #include <sys/xattr.h>
>> #include <stdio.h>
>> #include <string.h>
>> #include <assert.h>
>> #include <cephfs/libcephfs.h>
>> #include <time.h>
>>
>> int main(int argc, char **argv)
>> {
>>          struct stat st;
>>          struct ceph_mount_info *cmount;
>>          struct timeval tv;
>>
>>          /* setup */
>>          ceph_create(&cmount, "admin");
>>          ceph_conf_read_file(cmount,
>> "/users/nwatkins/Projects/ceph.conf");
>>          ceph_mount(cmount, "/");
>>
>>          /* print local time for reference */
>>          gettimeofday(&tv, NULL);
>>          printf("local time: %s", ctime(&tv.tv_sec));
>>
>>          /* create a file */
>>          char buf[256];
>>          sprintf(buf, "/somefile.%d", getpid());
>>          int fd = ceph_open(cmount, buf, O_WRONLY|O_CREAT, 0);
>>          assert(fd > 0);
>>
>>          /* get mtime for this new file */
>>          memset(&st, 0, sizeof(st));
>>          int ret = ceph_fstat(cmount, fd, &st);
>>          assert(ret == 0);
>>          printf("fstat time: %s", ctime(&st.st_mtime));
>>
>>          /* write some data into the file */
>>          ret = ceph_write(cmount, fd, buf, sizeof(buf), -1);
>>          assert(ret == sizeof(buf));
>>          ceph_close(cmount, fd);
>>
>>          memset(&st, 0, sizeof(st));
>>          ret = ceph_lstat(cmount, buf, &st);
>>          assert(ret == 0);
>>          printf("lstat time: %s", ctime(&st.st_mtime));
>>
>>          ceph_shutdown(cmount);
>>          return 0;
>> }
>>
>> Note that this output is currently using the short patch from
>> http://marc.info/?l=ceph-devel&m=133178637520337&w=2 which forces
>> getattr to always go to the MDS.
>>
>> diff --git a/src/client/Client.cc b/src/client/Client.cc
>> index 4a9ae3c..2bb24b7 100644
>> --- a/src/client/Client.cc
>> +++ b/src/client/Client.cc
>> @@ -3858,7 +3858,7 @@ int Client::readlink(const char *relpath, char
>> *buf, loff_t \
>> size)
>>   int Client::_getattr(Inode *in, int mask, int uid, int gid)
>>   {
>> -  bool yes = in->caps_issued_mask(mask);
>> +  bool yes = false; //in->caps_issued_mask(mask);
>>
>>     ldout(cct, 10) << "_getattr mask " << ccap_string(mask) << "
>> issued=" << yes << \
>> dendl;  if (yes)
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux