Vijay,
I haven't heard back from anyone yet. I have some more information
about one of the problems.
I have a program that write()'s to a file, keeping the file open.
While this program is writing, I restart the nodes one by one. After
the nodes have been restarted, no new data is written to the file.
However, the program doing the write() still gets the correct number
of bytes returned by the system call and behaves as if everything is
working when it clearly isn't.
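Here's a rough sketch of the kind of writer I'm running (not my exact
program; the one-second interval and the data being written are just
placeholders, and /gluster/m/test is simply the path from my setup):

/* Writer sketch: keep a file on the glusterfs mount open and append
 * to it once a second, checking what write() reports back. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/gluster/m/test", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    for (;;) {
        const char *line = "still writing\n";
        ssize_t n = write(fd, line, strlen(line));
        if (n < 0)
            fprintf(stderr, "write: %s\n", strerror(errno));
        else
            printf("write() returned %zd\n", n);
        /* After all the nodes have been restarted, write() keeps
         * returning the full byte count even though the file on the
         * volume stops growing. */
        sleep(1);
    }
}

On a healthy mount this just appends forever; after the rolling
restart the write() calls keep returning the full byte count while the
file itself stops growing.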
Meanwhile, if I tail this same file on another client while I reboot
the nodes, I eventually get "tail: /gluster/m/test: File descriptor in
bad state".
At some point gluster realizes it can't deal with this file and
reports back file descriptor in bad state to the reader, but continues
to happily report success to the program doing the writes.
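The reader side, for reference, behaves roughly like this sketch (tail
shows the same thing; this just makes the errno explicit):

/* Reader sketch: follow the same file like tail -f and report the
 * errno when read() fails. */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[4096];
    int fd = open("/gluster/m/test", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n < 0) {
            /* After the node restarts this eventually fails with
             * "File descriptor in bad state" (EBADFD) -- the same
             * error tail reports. */
            fprintf(stderr, "read: %s\n", strerror(errno));
            return 1;
        }
        if (n == 0)
            sleep(1);   /* at EOF, wait for more data like tail -f */
        else
            fwrite(buf, 1, (size_t)n, stdout);
    }
}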
The first part of this problem (open files not surviving gluster
restarts) seems like a pretty major design flaw that needs to be
fixed. The second part (gluster not reporting the error to the
writer when gluster chokes) is a critical problem that needs to be
fixed. However, it seems that there isn't much interest in fixing
these types of things. I've spent some time reading back in the mail
archives and there seems to be a pattern of instability and silence on
the part of the developers. This really isn't the way to make your
project a success or win advocates for your software.
I want to help identify issues and provide information to help get
things fixed, but I feel like I'm talking to deaf ears.
Please advise on how I can help with these issues.
--brian
On Aug 31, 2009, at 12:58 PM, Brian Hirt wrote:
Vijay,
Yes, I am using the same distributed-replicate scenario.
The file in the export directory does contain the correct
information, but somewhere along the line something being
communicated to the operating system by gluster must be wrong. I
say this because the client trying to read from an open file is not
getting the proper data returned from the system calls, which seems
to point to a bug in glusterfs.
I've also run into something that might be related but seems much
more serious. A program writing to a glusterfs file will fail when
you restart the gluster servers. You can recreate the problem by:
1) have a program open a file on a glusterfs mount and write data to
it periodically
2) while the file is being written to, restart the gluster servers
one by one, waiting for each server to come back online before
restarting the next
At all points in time, three of the four gluster servers are up and
running; however, the program trying to write data to the file
fails. This is a huge issue for any program that keeps a file open
for writing for more than a second or two.
As for the temporary files created by rsync, I'm willing to believe
they are benign in this particular situation. However, something
seems wrong with the idea that gluster would expect to have a file
and try to lstat() it, only to find it's not there. Shouldn't gluster
know where the files it maintains are? It really feels like a race
condition that will be triggered in other situations where it's not
so benign.
Thanks for any help you can provide.
--brian
On Aug 30, 2009, at 10:05 AM, Vijay Bellur wrote:
Brian Hirt wrote:
I'm running into some problems where one process is writing a log
file on a glusterfs mount and another is reading from it. The
process reading the file is not behaving as expected.
I am assuming you are using the distributed-replicate scenario that
you mentioned in the previous mail. Can you please confirm if the
file in the export
directory contains data that you did not intend to create?
I'm also continuing to get hundreds of the errors I mentioned in
that message with rsync.
[2009-08-28 10:21:20] E [posix.c:1155:posix_chmod] posix: lstat
on /gluster/exports/redacted/.1218486082-01.jpg.nkOkw9 failed: No
such file or directory
These usually have to do with temporary files created during an
rsync. These error messages would be benign in nature unless you
notice a discrepancy between the original and rsync'd directories.
Regards,
Vijay
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel