At 06:43 AM 1/8/2009, artur.k wrote:
>if the disk fails on a single server, will gluster remove files from
>the second (good) server?
>
>noc-xx-2:/mnt/glusterfs# echo 7 > 7
>bash: 7: Input/output error
>noc-xx-2:/mnt/glusterfs#
>noc-xx-2:/mnt/glusterfs# echo 7 > 7
>bash: 7: Input/output error
>noc-xx-2:/mnt/glusterfs# echo 7 > 7
>noc-xx-2:/mnt/glusterfs#
>noc-xx-2:/mnt/glusterfs#
>
>
>ehh... the 7 file exists on only one server. on the second server it
>was deleted by rm -f trac-xx-1:/var/storage/glusterfs/7

By deleting random files from the back-end filesystem, you are NOT
simulating a disk failure.

If there is a disk failure, you end up with a CLEAN/EMPTY filesystem
which has NO extended attributes. So when you start gluster, it will
see an empty filesystem with no attributes. It will then check with
the other AFR/HA server, see that that server has extended attributes
and is up to date, and conclude that that is the place to get the
latest files from. Then it will auto-magically start 'healing' the
entire disk. I've in fact had this exact situation happen to me in
late December, and I watched as the entire filesystem was recovered
from the other server.

By simply deleting parts of the filesystem, you are not affecting the
extended attributes, so once the filesystem is managed by gluster
again, gluster tries to put it back into the last known good state
from gluster's perspective.

So, again, if you're trying to simulate a failed drive, do this
(sketched in shell at the end of this message):

  - turn off gluster
  - delete the entire directory tree (including the last directory in
    the mountpoint); this clears out the extended attributes
  - make the directory again
  - remount the gluster mountpoint

For the examples you gave, here's what I'm understanding:

  - on the client you have the gluster filesystem mounted on
    /mnt/glusterfs
  - on the server you are serving files from /var/storage/glusterfs

Now, once /var/storage/glusterfs is in use by gluster, you SHOULD NOT
EVER modify this filesystem OUTSIDE of gluster. If you want to modify
this filesystem, you should mount it via gluster on the server and
then make changes that way.

So, if you want to remove the file /var/storage/glusterfs/xyz, you
should create a client volume on the server with the HA brick in it.
It doesn't necessarily have to have both servers; it can have just
itself as its own server. But (I think) it needs to have the ha/afr
brick configured, since that's the layer which maintains the extended
attributes. (There's a sketch of such a volume file at the end of
this message.) Then mount this somewhere on the server, say
/var/local/glusterfs, and then you can

  rm /var/local/glusterfs/xyz

You can verify that it was removed from /var/storage/glusterfs/, and
then you can restart the client and you'll see that it now auto-heals
as you expect. This is because the extended attributes on
/var/storage/glusterfs are updated and correct.

I suppose the best analogy to what you're simulating with your
examples would be to take a hard drive, scratch the platter, stick it
back in, and wonder why it's not giving you the right data. You're
effectively altering the "media" the filesystem runs on by altering
the underlying filesystem outside of gluster.

As for your input/output errors, again, make sure you're on a recent
build, since I think most of those are addressed.
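
Here is roughly what the failed-drive simulation looks like as shell
commands. This is only a sketch: the paths are the ones from your
example, and the init-script name (glusterfsd) and the client volume
file path are assumptions that will vary by distribution and gluster
version.

  ## on the server whose "disk" is failing:

  # 1. turn off gluster
  /etc/init.d/glusterfsd stop

  # 2. delete the entire export tree, INCLUDING the directory itself,
  #    which throws away the extended attributes along with the data
  rm -rf /var/storage/glusterfs

  # 3. recreate the directory empty, like a freshly formatted disk
  mkdir /var/storage/glusterfs

  # 4. bring the server back
  /etc/init.d/glusterfsd start

  ## on the client, remount so AFR can self-heal the whole tree:
  umount /mnt/glusterfs
  glusterfs -f /etc/glusterfs/glusterfs-client.vol /mnt/glusterfs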
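
For the server-local client volume, something like the volume file
below is what I mean. Sketch only: the volume names, the
remote-subvolume name ('brick'), and the transport option are
illustrative and must match your actual server volume file, and the
exact translator spelling (cluster/afr here) depends on your gluster
version.

  # /etc/glusterfs/local-client.vol  (sketch; names are illustrative)

  volume local-brick
    type protocol/client
    option transport-type tcp/client   # plain "tcp" on newer versions
    option remote-host 127.0.0.1       # this server itself
    option remote-subvolume brick      # must match the server's export
  end-volume

  # the ha/afr layer: this is what keeps the extended attributes
  # up to date, even with only a single subvolume under it
  volume afr
    type cluster/afr
    subvolumes local-brick
  end-volume

Then mount it and remove the file through it:

  glusterfs -f /etc/glusterfs/local-client.vol /var/local/glusterfs
  rm /var/local/glusterfs/xyz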
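
And if you want to see the extended attributes this all hinges on,
you can inspect the back-end directly from the server; reading them
is safe, it's only writes outside gluster that cause trouble. The
exact attribute names (trusted.afr.*, etc.) vary by version.

  # as root on the server; -m . shows all namespaces incl. trusted.*
  getfattr -d -m . -e hex /var/storage/glusterfs
  getfattr -d -m . -e hex /var/storage/glusterfs/xyz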