MORE AFR problems with 1.4qa63

At 10:17 AM 11/24/2008, Anand Avati wrote:
>Please see comments/questions inline.
>
>>I'm also noticing a problem with file times using AFR
>>
>>it seems that the file times get set to the time the file was AFR'ed
>>to the other server.
>
>Do you mean "heal"ed to the other server? In normal operation AFR
>modifies both servers together at the same time.

Well, I honestly don't know exactly what's happening.
The servers shouldn't need to "heal" because they're never offline
or out of communication, so yes, it should be modifying both
simultaneously, yet I constantly get Input/Output errors.

>>Here's what happens:
>>we have a process which modifies a file at 1:17 on server1;
>>this file gets AFR'ed to server2, but it takes some time and the
>>file gets there at 1:18.
>
>What do you mean 'a file is modified at 1:17 on server1'? Is it
>modifying the backend directly? Is it modifying from the mountpoint
>with server2 offline? Or are you just considering a network delay
>pushing the 'modification' to happen a minute late on server2?

I notice the problem most when I upload files with Dreamweaver via
FTP. Later I'll update the same file, go to push it, and it will
report that the file was modified on the server at 1:18 while our
last push to the server was at 1:17, so it thinks the file was
changed on the server since then.
Looking at the logs, it just seems things aren't working in
general. These Input/Output errors shouldn't ever happen, and it's just odd.

Everything is using the gluster mount point. The only time I touch
the back-end filesystem is when I delete a file because the log says
it should be deleted "from all but the preferred subvolume", and since
the servers are always communicating, I can't understand why this
happens at all.

>>So the process which updated the file knows it was updated at 1:17;
>>it then connects to the other server, sees that the file there is
>>newer than it thinks it should be, and raises an error.
>
>As long as both the servers are online, the times are returned from
>the first subvolume, so in both cases the process should see an
>mtime of 1:17.

It should, yes, but this is NOT what's happening.
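One way to test this directly is to stat each server's backend copy of the same file and compare the mtimes. The script below is a hypothetical sketch, not a GlusterFS tool: it assumes both servers' backend export directories are readable from one machine (for example through a read-only mount), and the script name `check_mtime.py`, the function names, and the one-second tolerance are all illustrative choices. It only calls `os.stat`, so it reads metadata without modifying anything AFR manages.

```python
import os
import sys


def backend_mtimes(paths):
    """Return {path: mtime} for each backend copy of the same file."""
    # os.stat only reads metadata; it never modifies the file.
    return {p: os.stat(p).st_mtime for p in paths}


def mtimes_disagree(paths, tolerance=1.0):
    """True if any two copies differ in mtime by more than `tolerance` seconds."""
    times = list(backend_mtimes(paths).values())
    if len(times) < 2:
        return False  # nothing to compare
    return max(times) - min(times) > tolerance


def main(argv):
    # argv: backend paths of one file, one per server (hypothetical locations).
    if len(argv) < 2:
        print("usage: check_mtime.py BACKEND_COPY BACKEND_COPY [...]")
        return
    for path, mtime in backend_mtimes(argv).items():
        print(f"{mtime:.0f}  {path}")
    print("MISMATCH" if mtimes_disagree(argv) else "consistent")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it right after a Dreamweaver push, with one backend path per server, would show whether the copies really carry 1:17 and 1:18, or whether the discrepancy only appears through the mount point.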


>>Also, I believe this is part of the problem with what I'm currently
>>getting, which is a bunch of Input/Output errors from gluster itself.
>>The error logs look like this:
>>[afr-self-heal-data.c:767:afr_sh_data_fix] home: Unable to resolve
>>conflicting data of /XYZ/public_html/brokenfile. Please resolve
>>manually by deleting the file /XYZ/public_html/brokenfile from all
>>but the preferred subvolume
>>[fuse-bridge.c:605:fuse_fd_cbk] glusterfs-fuse: 3013026: OPEN()
>>/XYZ/public_html/brokenfile => -1 (Input/output error)
>>
>>The frustration is that in these cases both servers are on, active,
>>and working, yet gluster seems to be causing its own problems.
>>Again, I believe it's due to the timestamps on the underlying
>>filesystem not being what is expected.
>
>The EIO problem is unrelated to mtimes. We are investigating the EIO
>problem already.

OK, good to know it's not related. Hopefully this will be fixed
soon too :)

>avati
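Because each "Unable to resolve conflicting data" message names a file that needs manual cleanup, a small filter over the client log can list every affected path before anything is deleted from a backend. This is a hypothetical sketch: the regular expression is built only from the log line quoted above, it assumes each log entry sits on a single line in the real log file (the email wrapped it), and the log file location is whatever your client is configured to use.

```python
import re
import sys

# Built from the afr_sh_data_fix message quoted above. The source line number
# (767 here) is not matched, since it may differ between builds.
CONFLICT_RE = re.compile(
    r"afr_sh_data_fix\] (?P<volume>\S+): Unable to resolve conflicting data "
    r"of (?P<path>.+?)\. Please resolve"
)


def conflicting_files(lines):
    """Return sorted, de-duplicated (volume, path) pairs from log lines."""
    found = set()
    for line in lines:
        match = CONFLICT_RE.search(line)
        if match:
            found.add((match.group("volume"), match.group("path")))
    return sorted(found)


def main(log_paths):
    # Each argument is a client log file; with no arguments nothing is printed.
    for log_path in log_paths:
        with open(log_path, errors="replace") as log:
            for volume, path in conflicting_files(log):
                print(f"{volume}\t{path}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

De-duplicating matters here because self-heal retries log the same file repeatedly; the output is one line per file, which makes it easier to see whether the conflicts cluster around files uploaded over FTP.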
