Hadoop and Ceph client/mds view of modification time

This is a description of the clock synchronization issue we are facing
in Hadoop:

Components of Hadoop use mtime as a versioning mechanism. Here is an
example where Client B tests the expected 'version' of a file created
by Client A:

  Client A: create file, write data into file.
  Client A: expected_mtime <-- lstat(file)
  Client A: broadcast expected_mtime to client B
  ...
  Client B: mtime <-- lstat(file)
  Client B: test expected_mtime == mtime
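
The check above can be sketched in C against plain POSIX lstat(). This
is only an illustration under stated assumptions: the helper names
record_expected_mtime and version_matches are hypothetical (they are
not Hadoop or Ceph functions), the sketch works on any local path
rather than going through libcephfs, and it compares mtime at
one-second granularity.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper for Client A: read the mtime that will be
 * broadcast as the expected 'version'. Returns 0 on success, -1 on error. */
static int record_expected_mtime(const char *path, time_t *expected)
{
        struct stat st;

        assert(path != NULL && expected != NULL);
        if (lstat(path, &st) != 0)
                return -1;
        *expected = st.st_mtime;
        return 0;
}

/* Hypothetical helper for Client B: compare the current mtime with the
 * value received from Client A.
 * Returns 1 on match, 0 on mismatch, -1 if lstat fails. */
static int version_matches(const char *path, time_t expected)
{
        struct stat st;

        assert(path != NULL);
        if (lstat(path, &st) != 0)
                return -1;
        return st.st_mtime == expected;
}
```

Client A would call record_expected_mtime() after writing the file and
send the result to Client B, which would then call version_matches()
with it. The comparison is an exact equality test, so any disagreement
about mtime between the two views makes the check fail.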

Since mtime may be set in Ceph by both the client and the MDS, clients
can observe inconsistent mtime values for the same file when the
clocks are not adequately synchronized.
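
The effect of clock skew can be illustrated with a toy model in C. It
is not Ceph code: the names toy_file, toy_create, toy_write and
toy_mtime_step are made up for this sketch, and it simply assumes, as
in the test output further down, that file creation is stamped with
the MDS clock and a later write is stamped with the client clock.

```c
#include <assert.h>

/* Toy model of a file that only tracks its mtime (seconds). */
struct toy_file {
        long mtime;
};

/* Assumption: creation is stamped with the MDS clock. */
static void toy_create(struct toy_file *f, long mds_now)
{
        f->mtime = mds_now;
}

/* Assumption: a write is stamped with the client clock. */
static void toy_write(struct toy_file *f, long client_now)
{
        f->mtime = client_now;
}

/* Returns how far mtime moves between create and write when the client
 * clock is offset from the MDS clock by client_skew seconds and the
 * write happens elapsed seconds (real time) after the create. */
static long toy_mtime_step(long mds_now, long client_skew, long elapsed)
{
        struct toy_file f;
        long after_create;

        assert(elapsed >= 0);
        toy_create(&f, mds_now);
        after_create = f.mtime;
        toy_write(&f, mds_now + elapsed + client_skew);
        return f.mtime - after_create;
}
```

With synchronized clocks (client_skew of 0) mtime advances by the
elapsed time. With a client one hour behind (client_skew of -3600) and
a write 5 seconds after the create, mtime moves backwards by 3595
seconds, so a reader that sampled mtime after the create will not
match one that samples it after the write.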

Here is a test that reproduces the problem. In the following output,
issdm-18 hosts the MDS, and issdm-22 is a non-Ceph node with its clock
set about an hour behind the MDS node.

nwatkins@issdm-22:~$ ssh issdm-18 date && ./test
Tue Nov 20 11:40:28 PST 2012           // MDS time
local time: Tue Nov 20 10:42:47 2012  // client time
fstat time: Tue Nov 20 11:40:28 2012  // mtime seen after file creation (MDS time)
lstat time: Tue Nov 20 10:42:47 2012  // mtime seen after file write (client time)

Here is the code used to produce that output.

#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>
#include <cephfs/libcephfs.h>

int main(int argc, char **argv)
{
        struct stat st;
        struct ceph_mount_info *cmount;
        struct timeval tv;
        time_t now;
        char buf[256];
        int fd, ret;

        (void)argc;
        (void)argv;

        /* setup: create the client handle, read the config, mount the root */
        ret = ceph_create(&cmount, "admin");
        assert(ret == 0);
        ret = ceph_conf_read_file(cmount, "/users/nwatkins/Projects/ceph.conf");
        assert(ret == 0);
        ret = ceph_mount(cmount, "/");
        assert(ret == 0);

        /* print local (client) time for reference */
        gettimeofday(&tv, NULL);
        now = tv.tv_sec;
        printf("local time: %s", ctime(&now));

        /* create a file with a per-process name */
        memset(buf, 0, sizeof(buf));
        snprintf(buf, sizeof(buf), "/somefile.%d", (int)getpid());
        fd = ceph_open(cmount, buf, O_WRONLY|O_CREAT, 0);
        assert(fd >= 0);

        /* get mtime for this new file (set at creation) */
        memset(&st, 0, sizeof(st));
        ret = ceph_fstat(cmount, fd, &st);
        assert(ret == 0);
        printf("fstat time: %s", ctime(&st.st_mtime));

        /* write some data into the file at the current file position */
        ret = ceph_write(cmount, fd, buf, sizeof(buf), -1);
        assert(ret == (int)sizeof(buf));
        ceph_close(cmount, fd);

        /* get mtime again, now that the file has been written */
        memset(&st, 0, sizeof(st));
        ret = ceph_lstat(cmount, buf, &st);
        assert(ret == 0);
        printf("lstat time: %s", ctime(&st.st_mtime));

        ceph_shutdown(cmount);
        return 0;
}

Note that this output was produced with the short patch from
http://marc.info/?l=ceph-devel&m=133178637520337&w=2 applied, which
forces getattr to always go to the MDS.

diff --git a/src/client/Client.cc b/src/client/Client.cc
index 4a9ae3c..2bb24b7 100644
--- a/src/client/Client.cc
+++ b/src/client/Client.cc
@@ -3858,7 +3858,7 @@ int Client::readlink(const char *relpath, char *buf, loff_t size)
 int Client::_getattr(Inode *in, int mask, int uid, int gid)
 {
-  bool yes = in->caps_issued_mask(mask);
+  bool yes = false; //in->caps_issued_mask(mask);
 
   ldout(cct, 10) << "_getattr mask " << ccap_string(mask) << " issued=" << yes << dendl;
   if (yes)