Hi all, I've been trying to track down a bug that looks like data corruption coming from the Gluster FUSE client. I'm pretty new to Gluster, so perhaps there are better ways to get to the bottom of this. I have a distributed, replicated volume that was initialized as 3.5.1 and it has been functioning fine in general. Recently we discovered a write corruption issue that is always reproducible by one of our binaries. The scenarios is as follows: There is one writer appending binary entries that have a sequence number and checksum. Writes are small - 38 bytes each - and there is no fsync while the file is streaming out. If this is the only activity, everything is perfectly fine. However, if that file is simultaneously opened for reading, in a tail-like fashion, the data written into gluster is corrupt. Reading the file back out of disk reveals a dropped message (a missing sequence number). This happens when the reader and writer are different processes, so this is not a bug in our userspace. What is even more interesting it that the first run during the writing reports the sequence number of the failure differently than all subsequent tests. It feels like there is some kind of cache coherency issue here, but after putting glusterfs into -L DEBUG and following along, I'm not closer to a understanding the issue. stracing glusterfs suggests that the missing records were actually sent to the remote glusterfs daemons. I have appended that strace at the bottom. For reference, here is a bit of strace that shows the overall data flow: The file is opened for writing like so: 32684 open(".../logtest", O_WRONLY|O_CREAT, 0664) = 7 32684 fcntl(7, F_SETFD, FD_CLOEXEC) = 0 32684 fcntl(7, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE) 32684 fcntl(7, F_SETFL, O_WRONLY|O_APPEND|O_LARGEFILE) = 0 32686 write(7, "...", 38) = 38 32686 nanosleep({0, 1000000}, NULL) = 0 The file is opened for reading like so: open(".../logtest", O_RDONLY) = 6 fstat(6, {st_mode=S_IFREG|0644, st_size=912, ...}) = 0 lseek(6, 0, SEEK_CUR) = 0 read(6, "..."..., 131072) = 912 read(6, "", 131072) = 0 lseek(6, 0, SEEK_SET) = 0 read(6, "..."..., 912) = 912 nanosleep({0, 10000000}, NULL) = 0 read(6, "...", 131072) = 114 read(6, "", 131072) = 0 lseek(6, 0, SEEK_SET) = 0 read(6, "..."..., 1026) = 1026 nanosleep({0, 10000000}, NULL) = 0 read(6, "...", 131072) = 114 read(6, "", 131072) = 0 lseek(6, 0, SEEK_SET) = 0 read(6, "..."..., 1140) = 1102 lseek(6, 38, SEEK_CUR) = 1140 nanosleep({0, 10000000}, NULL) = 0 read(6, "...", 131072) = 114 I have been trying to duplicate the failure with a small python script, but I have as yet been unable to do so. This C++ binary does reproduce it quite reliably, but the build system for this is quite complex and not readily shareable. I hope you can give me some advice about where to look next. Thanks, -Mike Strace from glusterfs client. fd 22 a tcp socket connected to glusterfsd on a remote host. 30049 writev(22, [{"\200\0\0\342", 4}, {"\0\0\0M\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\r\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\0\0\0\1\n\0\0\0&\0\0\204\1\0\0\0J\0\0\0\2\0\0\0\31\0\0\0\2glusterfs.write-is-append\0000\0\0\0\0\27\0\0\0\2glusterfs.open-fd-count\0004\0\0\0", 120}, {"\31D\6#\0\0\0\6\23\221\33-\357\306\250@\0\0\0\5\0\0\0\0\273\\^\v8\356\243\372\10\1\30\0 \7", 38}], 4) = 230 This is the last record that shows up in the glusterfs file before the "gap" - sequence number 7. 30049 writev(22, [{"\200\0\0\314", 4}, {"\0\0\0N\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\"\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\204\236\205QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0d\0\0\0\2\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-12\0\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-13\0\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0", 136}], 3) = 208 30049 writev(22, [{"\200\0\0\220", 4}, {"\0\0\0O\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\31\0\5\363\227\0\0\0L\0\0ul\0\0'\24\0\0\23\210\0\0\0\f\0\0\0\4\0\0\0\6\0\0\0.\0\0\0~\0\0\17\241\0\0\17\242\0\0\17\243\0\0\17\245\0\0\17\246\0\0\23\210\0\0'\20\0\0'\21\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 116}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\3\0\0\0\0", 28}], 3) = 148 30049 writev(22, [{"\200\0\0\240", 4}, {"\0\0\0P\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\36\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\364-\206QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0010\0\0\0\0\0\0\0L\0\0\0\0\0\0\0\0\0\0\0\21lab-0-replicate-6\0\0\0\0\0\0\0", 92}], 3) = 164 30049 writev(22, [{"\200\0\0\240", 4}, {"\0\0\0Q\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\36\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\204\236\205QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0\0\0\0\1\n\0\0\0\0\0\0\0&\0\0\0\0\0\0\0\0\0\0\0\21lab-0-replicate-6\0\0\0\0\0\0\0", 92}], 3 <unfinished ...> 30058 writev(22, [{"\200\0\0\240", 4}, {"\0\0\0R\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\36\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\10\264\205QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\1|\0\0\0\0\0\0\0&\0\0\0\0\0\0\0\0\0\0\0\21lab-0-replicate-6\0\0\0\0\0\0\0", 92}], 3 <unfinished ...> 30049 writev(22, [{"\200\0\0\314", 4}, {"\0\0\0S\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\"\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\364-\206QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0d\0\0\0\2\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-13\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-12\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0", 136}], 3) = 208 30049 writev(22, [{"\200\0\0\314", 4}, {"\0\0\0T\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\"\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\10\264\205QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0d\0\0\0\2\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-13\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-12\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0", 136}], 3) = 208 30049 writev(22, [{"\200\0\1\10", 4}, {"\0\0\0U\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\r\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0010\0\0\0L\0\0\204\1\0\0\0J\0\0\0\2\0\0\0\31\0\0\0\2glusterfs.write-is-append\0000\0\0\0\0\27\0\0\0\2glusterfs.open-fd-count\0004\0\0\0", 120}, {"\31D\6#\0\0\0\6\23\221\33-\357\350\336\20\0\0\0\5\0\0\0\0\345Ob/@P\226\233\10\1\30\0 \10\31D\6#\0\0\0\6\23\221\33-\357\373\321\240\0\0\0\5\0\0\0\0\27$\341,\303\324\25\202\10\1\30\0 \t", 76}], 4) = 268 30049 writev(22, [{"\200\0\0\342", 4}, {"\0\0\0V\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\r\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\0\0\0\1|\0\0\0&\0\0\204\1\0\0\0J\0\0\0\2\0\0\0\31\0\0\0\2glusterfs.write-is-append\0000\0\0\0\0\27\0\0\0\2glusterfs.open-fd-count\0004\0\0\0", 120}, {"\31D\6#\0\0\0\6\23\221\33-\360\16\212\230\0\0\0\5\0\0\0\0\4t\22\330\323\r4\10\10\1\30\0 \n", 38}], 4) = 230 30049 writev(22, [{"\200\0\0\314", 4}, {"\0\0\0W\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\"\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\364-\206QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0d\0\0\0\2\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-12\0\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-13\0\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0", 136}], 3) = 208 30049 writev(22, [{"\200\0\0\314", 4}, {"\0\0\0X\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\"\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\10\264\205QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0d\0\0\0\2\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-12\0\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-13\0\377\377\377\377\0\0\0\0\0\0\0\0\0\0\0\0", 136}], 3) = 208 30049 writev(22, [{"\200\0\0\240", 4}, {"\0\0\0Y\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\f\0\5\363\227\0\0\0L\0\0ul\0\0'\24\0\0\23\210\0\0\0\f\0\0\0\4\0\0\0\6\0\0\0.\0\0\0~\0\0\17\241\0\0\17\242\0\0\17\243\0\0\17\245\0\0\17\246\0\0\23\210\0\0'\20\0\0'\21\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 116}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\3\0\0\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0", 44}], 3) = 164 30049 writev(22, [{"\200\0\0\240", 4}, {"\0\0\0Z\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\36\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\364-\206QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0010\0\0\0\0\0\0\0L\0\0\0\0\0\0\0\0\0\0\0\21lab-0-replicate-6\0\0\0\0\0\0\0", 92}], 3) = 164 30049 writev(22, [{"\200\0\0\240", 4}, {"\0\0\0[\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\36\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\10\264\205QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\2\0\0\0\2\0\0\0\0\0\0\0\0\0\0\1|\0\0\0\0\0\0\0&\0\0\0\0\0\0\0\0\0\0\0\21lab-0-replicate-6\0\0\0\0\0\0\0", 92}], 3) = 164 30058 writev(22, [{"\200\0\0\240", 4}, {"\0\0\0\\\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\36\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\364-\206QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0010\0\0\0\0\0\0\0&\0\0\0\0\0\0\0\0\0\0\0\21lab-0-replicate-6\0\0\0\0\0\0\0", 92}], 3) = 164 30049 writev(22, [{"\200\0\0\314", 4}, {"\0\0\0]\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\"\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\364-\206QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0d\0\0\0\2\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-13\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-12\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0", 136}], 3) = 208 This is the first record that shows up in the file after the gap - sequence number 11. Somehow, records 8,9,10 are getting squashed. 30049 writev(22, [{"\200\0\0\342", 4}, {"\0\0\0^\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\r\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0010\0\0\0&\0\0\204\1\0\0\0J\0\0\0\2\0\0\0\31\0\0\0\2glusterfs.write-is-append\0000\0\0\0\0\27\0\0\0\2glusterfs.open-fd-count\0004\0\0\0", 120}, {"\31D\6#\0\0\0\6\23\221\33-\360mc\330\0\0\0\5\0\0\0\0\366\37\221\333\365V\4\223\10\1\30\0 \v", 38}], 4 <unfinished ...> 30049 writev(22, [{"\200\0\0\200", 4}, {"\0\0\0_\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\20\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\364-\206QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\32\0\0\0\1\0\0\0\v\0\0\0\2batch-fsync\0001\0\0\0", 60}], 3) = 132 30049 writev(22, [{"\200\0\0\240", 4}, {"\0\0\0`\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\36\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10D\21\206QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\1\0\0\0\0\0\0\0\0\0\0\1V\0\0\0\0\0\0\0&\0\0\0\0\0\0\0\0\0\0\0\21lab-0-replicate-6\0\0\0\0\0\0\0", 92}], 3) = 164 30049 writev(22, [{"\200\0\0\314", 4}, {"\0\0\0a\0\0\0\0\0\0\0\2\0\23\320\5\0\0\1J\0\0\0\"\0\5\363\227\0\0\0\34\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10D\21\206QP\177\0\0\0\0\0\0\0\0\0\0", 68}, {"\300}T\25\223\275Hd\245:%\37UM5\230\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0d\0\0\0\2\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-13\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\33\0\0\0\ftrusted.afr.lab-0-client-12\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0", 136}], 3) = 208 _______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://supercolony.gluster.org/mailman/listinfo/gluster-users