Check your client logs. I have seen that with network issues causing
disconnects.

Harry Mangalam <hjmangalam at gmail.com> wrote:

>Thanks for your comments.
>
>I use mdadm on many servers and I've seen md numbering like this a fair
>bit. Usually it occurs after another RAID has been created and the
>numbering shifts. Neil Brown (mdadm's author) seems to think it's fine,
>so I don't think that's the problem. And you're right - this is a
>Frankengluster made from a variety of chassis and controllers, and
>normally it's fine. As Brian noted, it's all the same to gluster, mod
>some small local differences in IO performance.
>
>Re the size difference, I'll explicitly rebalance the brick after the
>fix-layout finishes, but I'm even more worried about this fantastic
>increase in CPU usage and its effect on user performance.
>
>In the fix-layout routines (still running), I've seen CPU usage of
>glusterfsd rise to ~400% and loadavg go up to >15 on all the servers
>(except pbs3, the one that originally had that problem). That high load
>does not last long, though - maybe a few minutes. We've just installed
>nagios on these nodes and I'm getting a ton of emails about load
>increasing and then decreasing on all the nodes (except pbs3). When the
>load goes very high on a server node, the user-end performance drops
>appreciably.
>
>hjm
>
>On Sat, Aug 11, 2012 at 4:20 AM, Brian Candler <B.Candler at pobox.com> wrote:
>
>> On Sat, Aug 11, 2012 at 12:11:39PM +0100, Nux! wrote:
>> > On 10.08.2012 22:16, Harry Mangalam wrote:
>> > >pbs3:/dev/md127   8.2T  5.9T  2.3T  73% /bducgl   <---
>> >
>> > Harry,
>> >
>> > The name of that md device (127) indicates there may be something
>> > dodgy going on there. A device shouldn't be named 127 unless some
>> > problems occurred. Are you sure your drives are OK?
>>
>> I have systems with /dev/md127 all the time, and there's no problem. It
>> seems to number downwards from /dev/md127 - if I create an md array on
>> the same system it is /dev/md126.
>>
>> However, this does suggest that the nodes are not configured identically:
>> two are /dev/sda or /dev/sdb, which suggests either plain disk or
>> hardware RAID, while two are /dev/md0 or /dev/md127, which is software
>> RAID.
>>
>> Although this could explain performance differences between the nodes,
>> it is transparent to gluster and doesn't explain why the files are
>> unevenly balanced - unless there is one huge file which happens to have
>> been allocated to this node.
>>
>> Regards,
>>
>> Brian.
>
>--
>Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine
>[m/c 2225] / 92697  Google Voice Multiplexer: (949) 478-4487
>415 South Circle View Dr, Irvine, CA, 92697 [shipping]
>MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
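
A couple of concrete starting points, in case they help.

For the client-log check: the native (FUSE) client logs under
/var/log/glusterfs/, named after the mount point, so if the volume is
mounted at /bducgl on the clients the file would be something like
/var/log/glusterfs/bducgl.log - that path is only a guess from your df
output, so substitute your actual mount point. Grepping it around the
times the load spikes should show whether clients are dropping their
connections to the bricks:

  # log name below is assumed from the /bducgl mount point - adjust to yours
  grep -iE 'disconnect|connection.*(failed|refused)' /var/log/glusterfs/bducgl.log | tail -n 50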
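
On the explicit rebalance after fix-layout, something like the following
sequence from the gluster CLI should do it. VOLNAME is a placeholder - I
don't know your actual volume name, so substitute it:

  gluster volume rebalance VOLNAME status   # confirm the fix-layout pass has finished
  gluster volume rebalance VOLNAME start    # then migrate data onto the new layout
  gluster volume rebalance VOLNAME status   # re-run periodically to watch files/bytes move

Bear in mind the data migration will drive glusterfsd at least as hard as
fix-layout did, so you may want to kick it off outside working hours.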
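
For the load spikes themselves, it may be worth watching glusterfsd
directly on one of the servers while the rebalance runs, to see whether
the pain is CPU or IO wait; plain ps/vmstat is enough:

  ps -C glusterfsd -o pid,pcpu,pmem,etime,args   # per-brick daemon CPU and memory
  vmstat 5                                       # watch 'wa' (IO wait) vs 'us'/'sy' (CPU)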
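
And on md127: as Brian says, the number itself is harmless - mdadm
typically assigns minors counting down from 127 when an array isn't
listed in mdadm.conf (or the homehost doesn't match) at assembly time.
Checking the array health and, if you want, pinning the name is quick
(the config path varies by distro - /etc/mdadm.conf or
/etc/mdadm/mdadm.conf):

  cat /proc/mdstat            # all arrays: sync state, failed members
  mdadm --detail /dev/md127   # per-array detail: look for 'State : clean' and no failed devices
  mdadm --detail --scan       # ARRAY lines you can add to mdadm.conf, then rebuild the initramfs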