Re: 3.6.2, file-write events out of order - data missing temporarily

Hi,

Thanks for your reply.

No, the submitters are running Perl daemons, but the actual copy/move is
done by a call to rsync (for a bandwidth-limited transfer) followed by
Perl's built-in rename function to move the file into the correct place.
I also shortened the filenames and paths in my initial example to make
it easier on the eyes :)

I've since created a fresh gluster volume called "callrec", mounted it on
/mnt/callrec, and made the appropriate path changes in my scripts. Here
is an extract from the log, since making that change, of one of these
copy+move operations:

2015-04-10 12:25:14  rsync --bwlimit=5000 /dev/shm/callrec.9380.tmp /mnt/callrec/to_process/tmpspool/callrec_36602_1428664258.85282_1_group.control
2015-04-10 12:25:14  rename /mnt/callrec/to_process/tmpspool/callrec_36602_1428664258.85282_1_group.control /mnt/callrec/to_process/control_files/callrec_36602_1428664258.85282_1_group.control

(I'm only using rsync for the file copy because it's a convenient way to
do a bandwidth-limited transfer.)
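To make the sequence above concrete, here is a minimal Python sketch of the same copy-into-tmpspool-then-rename pattern. It is illustrative only and assumes a few things not in the scripts described: publish_job is a hypothetical name, the rsync bandwidth limit is omitted, and the os.fsync() call before the rename is an extra step that these scripts are not known to perform and that is not known to change the GlusterFS behaviour reported here.

```python
import os
import shutil


def publish_job(src_path, spool_dir, final_dir):
    """Hypothetical helper: copy src_path into spool_dir, flush it, then
    rename it into final_dir so readers should only see a complete file.

    Assumes spool_dir and final_dir are on the same filesystem, which is
    required for os.rename() to be an atomic move rather than an error.
    """
    name = os.path.basename(src_path)
    tmp_path = os.path.join(spool_dir, name)
    final_path = os.path.join(final_dir, name)
    with open(src_path, "rb") as src, open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()              # empty Python's userspace buffer
        os.fsync(dst.fileno())   # ask the OS to commit the data (assumption: extra step)
    os.rename(tmp_path, final_path)  # move into the pick-up directory
    return final_path
```

The rename is only issued after the data has been written and flushed, which is the ordering the tmpspool arrangement is meant to guarantee to the processing servers.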

The "control_files" directory is where the processing servers pick up
the job files.
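A processing server could, as a stop-gap, re-read a newly picked-up job file until two consecutive reads return identical bytes. The Python sketch below is a hypothetical workaround, not a fix: read_when_stable and its interval/attempts parameters are made-up names, and the default 5-second interval is only an assumption based on the "few seconds" delay reported in the original message. A trailing-newline check would not help here, since the short files reportedly always end on a whole line.

```python
import time


def read_when_stable(path, interval=5.0, attempts=4):
    """Hypothetical helper: re-read `path` until two consecutive reads,
    `interval` seconds apart, return identical bytes.

    Heuristic only: if the file keeps reading short for longer than
    `interval`, two short reads could still match.
    """
    previous = None
    for _ in range(attempts):
        with open(path, "rb") as f:
            data = f.read()
        if data == previous:
            return data  # unchanged since the last read; assume complete
        previous = data
        time.sleep(interval)
    raise TimeoutError(f"{path} did not settle within {attempts} reads")
```

This only narrows the race on the reader side; it does not address why the renamed file is briefly visible with missing data.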

I tried the command you suggested:

[root@gluster1a-1 ~]# gluster volume set callrec performance.flush-behind off
volume set: success
[root@gluster1a-1 ~]#

However, I'm still seeing the same issue; that does not appear to have
fixed it. I then tried restarting the volume and unmounting and
remounting it on the clients in case that helped, but it didn't.

Cheers,
Kingsley.

On Fri, 2015-04-10 at 07:03 -0400, Krutika Dhananjay wrote:
> Hi,
> 
> 
> So are the "submitter" clients doing exactly the commands you just
> pasted, i.e. cp /localdir/job1234.txt /mnt/gv0/to_process/tmpspool
> followed by mv /mnt/gv0/to_process/tmpspool/job1234.txt
> /mnt/gv0/to_process in a loop?
> Or are they perhaps executing a hand-written program that open()s the
> file, write()s to it, and then calls the rename() syscall?
> 
> 
> Also, is this issue hit if you turn off flush-behind (by doing a
> `gluster volume set <VOLNAME> performance.flush-behind off`)?
> 
> 
> -Krutika
> 
> ______________________________________________________________________
>         From: "Kingsley" <gluster@xxxxxxxxxxxxxxxxxxx>
>         To: gluster-users@xxxxxxxxxxx
>         Sent: Friday, April 10, 2015 2:01:54 PM
>         Subject: 3.6.2, file-write events out of order - data missing temporarily
>         
>         
>         Hi,
>         
>         We're running gluster 3.6.2 on CentOS 7, using a replicate-only
>         volume with 4-way replication.
>         
>         We have 10 hosts mounting the volume: 6 running CentOS 6 that
>         submit jobs to a "to-process" directory on the gluster volume,
>         and 4 running CentOS 7 that process entries from that directory.
>         
>         So that the 4 "processor" machines don't read partly written
>         files, the submitting machines first write to a tmpspool
>         subdirectory (a subdirectory of the to-process directory on the
>         gluster volume) and then move each file into the main to-process
>         directory once it is written, e.g.:
>         
>         cp /localdir/job1234.txt /mnt/gv0/to_process/tmpspool
>         mv /mnt/gv0/to_process/tmpspool/job1234.txt /mnt/gv0/to_process
>         
>         These job files are small (less than 500 bytes).
>         
>         However, if one of the processor machines picks up one of the
>         files quite quickly after it appears, it sees a smaller (i.e. not
>         fully written) file. If it waits a few seconds and tries again,
>         the file is complete.
>         
>         Is this a known bug that might be fixed in 3.6.3, or is it a new
>         issue?
>         
>         One I recently saw was a 441-byte file that was moved from
>         tmpspool into to_process by the client machine, but was read
>         from to_process as a 391-byte file by one of the processing
>         machines with the last 2 lines missing, then read again 3 seconds
>         later with all of the data in place.
>         
>         Curiously, when data is missing, it is always whole lines; the
>         temporarily short file never seems to end halfway along a line
>         of text.
>         
>         Cheers,
>         Kingsley.
>         
>         _______________________________________________
>         Gluster-users mailing list
>         Gluster-users@xxxxxxxxxxx
>         http://www.gluster.org/mailman/listinfo/gluster-users
> 
> 
