So, I had a drive failure on one of my boxes today, and it led to the discovery of numerous issues:

1) When a drive is failing and one of the AFR servers is dealing with I/O errors, the other server freaks out and sometimes crashes, but it never seems to hit a network timeout.

2) When I started gluster on the server with the new, empty drive, it gave me a bunch of errors about things being out of sync, telling me to delete a file from all but the preferred server. This struck me as odd, since the drive was empty. I worked around it with the favorite child option, but that isn't a good long-term solution.

3) One of the directories had 20GB of data in it. When I ran ls on that directory, I had to wait while it auto-healed all the files. The auto-heal is helpful, but it would be nice to get the directory listing back without waiting for 20GB of data to be sent over the network.

4) While the other server was down, the server that was still up kept failing (signal 11?), and I had to constantly remount the filesystem. It gave me messages about the other node being down, which was fine, but then it would just die after a while, consistently.