On Fri, 2016-07-15 at 21:41 +0530, Ravishankar N wrote:
> On 07/15/2016 09:32 PM, Kingsley wrote:
> > On Fri, 2016-07-15 at 21:06 +0530, Ravishankar N wrote:
> >> On 07/15/2016 08:48 PM, Kingsley wrote:
> >>> I don't have star installed so I used ls,
> >> Oops typo. I meant `stat`.
> >>> but yes they all have 2 links
> >>> to them (see below).
> >>>
> >> Everything seems to be in place for the heal to happen. Can you tailf
> >> the output of shd logs on all nodes and manually launch gluster vol heal
> >> volname?
> >> Use DEBUG log level if you have to and examine the output for clues.
> > I presume I can do that with this command:
> >
> > gluster volume set callrec diagnostics.brick-log-level DEBUG
> shd is a client process, so it is diagnostics.client-log-level. This
> would affect your mounts too.
> >
> > How can I find out what the log level is at the moment, so that I can
> > put it back afterwards?
> INFO. you can also use `gluster volume reset`.

Thanks.

> >> Also, some dumb things to check: are all the bricks really up and is the
> >> shd connected to them etc.
> > All bricks are definitely up. I just created a file on a client and it
> > appeared in all 4 bricks.
> >
> > I don't know how to tell whether the shd is connected to all of them,
> > though.
> Latest messages like "connected to client-xxx" and "disconnected from
> client-xxx" in the shd logs. Just like in the mount logs.

This has revealed something. I'm now seeing lots of lines like this in the
shd log:

[2016-07-15 16:20:51.098152] D [afr-self-heald.c:516:afr_shd_index_sweep] 0-callrec-replicate-0: got entry: eaa43674-b1a3-4833-a946-de7b7121bb88
[2016-07-15 16:20:51.099346] D [client-rpc-fops.c:1523:client3_3_inodelk_cbk] 0-callrec-client-2: remote operation failed: Stale file handle
[2016-07-15 16:20:51.100683] D [client-rpc-fops.c:2686:client3_3_opendir_cbk] 0-callrec-client-2: remote operation failed: Stale file handle. Path: <gfid:eaa43674-b1a3-4833-a946-de7b7121bb88> (eaa43674-b1a3-4833-a946-de7b7121bb88)
[2016-07-15 16:20:51.101180] D [client-rpc-fops.c:1627:client3_3_entrylk_cbk] 0-callrec-client-2: remote operation failed: Stale file handle
[2016-07-15 16:20:51.101663] D [client-rpc-fops.c:1627:client3_3_entrylk_cbk] 0-callrec-client-2: remote operation failed: Stale file handle
[2016-07-15 16:20:51.102056] D [client-rpc-fops.c:1627:client3_3_entrylk_cbk] 0-callrec-client-2: remote operation failed: Stale file handle

These lines continued to be written to the log even after I manually
launched the self heal (which it told me had been launched successfully).
I also tried repeating that command on one of the bricks that was giving
those messages, but that made no difference.

Client 2 would correspond to the one that had been offline, so how do I
get the shd to reconnect to that brick? I did a ps but I couldn't see any
processes with glustershd in the name, else I'd have tried sending that a
HUP.

Cheers,
Kingsley.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
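
For reference, the commands being discussed in this thread are roughly these
(using the volume name "callrec" from the thread; the syntax is from
GlusterFS 3.x and exact option names may differ on other versions):

    # raise the shd/client log level for the volume, then put it back afterwards
    gluster volume set callrec diagnostics.client-log-level DEBUG
    gluster volume reset callrec diagnostics.client-log-level

    # kick off an index heal and list entries still pending heal
    gluster volume heal callrec
    gluster volume heal callrec info

    # show per-node brick and Self-heal Daemon status, including PIDs
    gluster volume status callrec

On the ps question: glustershd normally runs as a glusterfs process started
with --volfile-id gluster/glustershd, so it does not show "glustershd" as the
process name itself; grepping the full command line (e.g. ps axww | grep
glustershd) or checking `gluster volume status` is usually the easier way to
find it.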