On 01/04/2013 10:18 PM, ??? wrote:
> Yes, the filesystem shutdown confuses glusterfs, since glusterfsd is
> still live for that brick; it tries to retrieve file extended
> attributes and fails.
> When accessing some of the files from the client side, an
> "Input/output error" occurs. The symptom is the same as when the
> underlying filesystem doesn't support extended attributes (for
> example, a volume created on /dev/shm).
> However, I still hope the glusterfs replica can handle this kind of
> failure, since that is what it is supposed to do (fault tolerance for
> a single hardware failure).
>

Your corruption issue aside, I was able to reproduce the EIO errors by
running an untar on a replica volume and shutting down the XFS
filesystem for one of my bricks (via the 'godown' utility in xfstests).
I've filed the following gluster bug to track this:

https://bugzilla.redhat.com/show_bug.cgi?id=892730

Thanks for calling this out.

Brian

> 2013/1/4, Brian Foster <bfoster at redhat.com>:
>> On 01/04/2013 01:00 AM, ??? wrote:
>>> Dear gluster experts,
>>>
>>> A glusterfs replica is supposed to handle the hardware failure of
>>> one brick (for example, a power outage). However, we recently
>>> encountered an issue related to an XFS filesystem crash and
>>> shutdown. When it happens, the whole volume doesn't work. Some
>>> files are inaccessible, and even worse, some directories become
>>> inaccessible, which leaves thousands of files missing.
>>> To handle it, we had to force a shutdown of the peer. This solved
>>> the problem, but our services were impacted and data loss occurred.
>>> A glusterfs replica should be able to handle a brick filesystem
>>> shutdown smoothly. What's your opinion on how to avoid this kind of
>>> failure?
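[Editor's note: the xattr failure symptom described above can be probed from a script. This is a minimal illustrative sketch, not gluster code; it uses a `user.*` attribute rather than the `trusted.*` attributes glusterfsd actually uses, since those require root, and the attribute name is made up for the example.]

```python
import errno
import os
import tempfile

def probe_xattr_support(path):
    """Try to set and read back a user xattr on `path`.

    Returns "ok" if xattrs work, "unsupported" if the filesystem
    rejects them (e.g. a volume created on /dev/shm), or "io-error"
    if the call fails with EIO -- the symptom reported when the
    brick's XFS filesystem has shut down underneath glusterfsd.
    """
    try:
        # "user.gluster.probe" is a hypothetical name for this sketch.
        os.setxattr(path, b"user.gluster.probe", b"1")
        value = os.getxattr(path, b"user.gluster.probe")
        return "ok" if value == b"1" else "io-error"
    except OSError as e:
        if e.errno in (errno.ENOTSUP, errno.EOPNOTSUPP):
            return "unsupported"
        if e.errno == errno.EIO:
            return "io-error"
        raise

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        print(probe_xattr_support(f.name))
```

Running this against a file on a brick distinguishes a filesystem that never supported xattrs from one that is returning EIO because it has shut down.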
>>>
>>
>> Hi,
>>
>> First, I would suggest you independently characterize your XFS crash
>> on the XFS mailing list (xfs at oss.sgi.com):
>>
>> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>>
>> Hopefully they can help assess the state and possible recovery of
>> your local filesystem. How to proceed on the gluster side of things
>> probably depends on the outcome of that analysis. My guess is that
>> the filesystem going into a shutdown state causes confusion for
>> gluster, due to the runtime limitations it imposes on the
>> filesystem. I haven't actually tested an active gluster mount on a
>> brick in the shutdown state, so I can't specifically characterize
>> the behavior (at minimum, I'd expect read-only behavior), but I'll
>> give it a try and see what happens...
>>
>> Brian
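[Editor's note: the fault-tolerance behavior the thread asks for, where an EIO from one replica brick is masked by reading from the other, can be sketched as below. This is only an illustration of the principle, not the logic of gluster's replication translator; the brick reader callables are hypothetical stand-ins.]

```python
import errno

def replica_read(readers):
    """Return data from the first replica brick whose read succeeds.

    `readers` is a list of zero-argument callables, one per replica
    brick. An EIO from one brick (e.g. its XFS filesystem has shut
    down) is masked by trying the next replica, rather than being
    propagated to the client as an "Input/output error".
    """
    last_err = None
    for read in readers:
        try:
            return read()
        except OSError as e:
            if e.errno != errno.EIO:
                raise  # only mask I/O errors from a dead brick
            last_err = e
    raise last_err  # all replicas failed

# Hypothetical bricks for illustration:
def dead_brick():
    raise OSError(errno.EIO, "Input/output error")  # fs shut down

def healthy_brick():
    return b"file contents"

if __name__ == "__main__":
    print(replica_read([dead_brick, healthy_brick]))
```

With one dead and one healthy brick, the read still succeeds; only if every replica returns EIO does the error reach the client.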