errors during rebalance, EC2

bcipriano at zerovfx.com (Brian Cipriano) · Wed, 06 Feb 2013 16:13:51 -0500

Hi all -

We're having some issues running a rebalance on our Gluster volume and 
are hoping to get some input.

We're running GlusterFS 3.3.0 on Amazon EC2 nodes. When we run a 
rebalance, we see a very high error rate, with lots of warnings like 
these in the log:

[2013-02-06 02:53:17.032693] W [client3_1-fops.c:474:client3_1_stat_cbk] 0-uswest2-client-0: remote operation failed: No such file or directory

[2013-02-06 02:53:17.361387] W [client3_1-fops.c:258:client3_1_mknod_cbk] 0-uswest2-client-4: remote operation failed: File exists. Path: /path/to/file (00000000-0000-0000-0000-000000000000)

In other words, various errors indicating a failure to copy files. The 
files involved in these errors seem to end up corrupted.
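
In case it helps to put a number on "very high error rate", here is a 
minimal Python sketch of how warnings like the two quoted above could be 
tallied by callback and failure reason. The regular expression is only 
inferred from those two lines, and it assumes each entry sits on a single 
line in the actual log (the wrapping above comes from the mail client). 
The names LOG_LINE, tally_warnings and SAMPLE are made up for 
illustration and are not part of gluster.

```python
import re
from collections import Counter

# Pattern inferred from the two warnings quoted above (an assumption, not
# an official description of the gluster log format):
#   [timestamp] LEVEL [source.c:line:function] translator: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<src>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]\s+"
    r"(?P<xlator>\S+):\s+(?P<msg>.*)$"
)


def tally_warnings(lines):
    """Count W/E entries, grouped by (callback function, failure reason)."""
    counts = Counter()
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match is None or match.group("level") not in ("W", "E"):
            continue  # skip lines that do not look like a warning or error
        # Drop the per-file "Path: ..." suffix so identical failures group.
        reason = match.group("msg").split(". Path:")[0]
        counts[(match.group("func"), reason)] += 1
    return counts


# The two entries from this message, joined back onto single lines.
SAMPLE = [
    "[2013-02-06 02:53:17.032693] W [client3_1-fops.c:474:client3_1_stat_cbk] "
    "0-uswest2-client-0: remote operation failed: No such file or directory",
    "[2013-02-06 02:53:17.361387] W [client3_1-fops.c:258:client3_1_mknod_cbk] "
    "0-uswest2-client-4: remote operation failed: File exists. "
    "Path: /path/to/file (00000000-0000-0000-0000-000000000000)",
]

if __name__ == "__main__":
    for (func, reason), n in tally_warnings(SAMPLE).most_common():
        print(n, func, reason)
```

To run it against a real log, the lines of the rebalance log file would 
be passed in place of SAMPLE.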

Has anyone else had this problem? Any suggestions?

Our bricks in this case are AWS EBS volumes, i.e., they are virtual 
drives that are network-attached but appear to the system as locally 
attached drives. I wonder if this has something to do with it: could 
some slight latency be causing Gluster or FUSE to think there's a file 
error, when really it just needs to wait a bit longer?

Thanks for your help,

- brian