Maybe I was not very clear in the previous email: even the forced rebalance
got stuck in the same folder (with few details in the log this time),
counting millions of files that did not exist, so I stopped it. Also, just
before the paragraph in which I described accidentally removing many files
because I was stuck, the timeline suddenly jumps to "this past night" (I'm
in Japan).

Stefano

On Sun, Jun 2, 2013 at 3:05 PM, Stefano Sinigardi <
stefano.sinigardi at gmail.com> wrote:

> Dear Vijay,
> the filesystem is ext4, on a GPT-structured disk, formatted by Ubuntu
> 12.10.
> The rebalance I ran was started with the command
>
> gluster volume rebalance data start
>
> but in the log it got stuck on a file that I cannot remember (a small
> working .cpp file), saying that it was going to be moved to a much more
> occupied replica, and it repeated this message until the log grew to a
> few GB.
> I then stopped it and restarted with
>
> gluster volume rebalance data start force
>
> in order to get rid of this problem of files being moved to bricks that
> were already highly occupied.
> Because I was almost stuck, and remembering that a rebalance had once
> miraculously solved another problem I had, I tried it again, but it got
> stuck in a .dropbox-cache folder. That is not a very important folder,
> so I thought I could remove it. I launched a script meant to find all
> the files by looking at all the bricks but to remove them through the
> fuse mountpoint. I don't know what went wrong (the script is very
> simple; the problem may have been that it was 4 am) but the fact is that
> the files got removed by calling rm at the brick mountpoints, not the
> fuse one. So I think that I'm now in an even worse situation than
> before.
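A minimal sketch of what such a removal script could look like, for reference: it discovers files by scanning each brick, but deletes them only through the fuse mountpoint, so that GlusterFS keeps replicas and its internal metadata consistent. The function name and all paths here are illustrative assumptions, not the original script.

```shell
# Hedged sketch: scan bricks to FIND files, but REMOVE them via the fuse
# mount. remove_via_fuse <fuse_mount> <subdir> <brick>... (names assumed).
remove_via_fuse() {
    fuse_mount=$1
    subdir=$2
    shift 2
    for brick in "$@"; do
        # Map each file found under the brick back to its volume-relative
        # path, then delete it via the fuse mountpoint, never via the brick.
        find "$brick/$subdir" -type f 2>/dev/null | while read -r f; do
            rel=${f#"$brick/"}
            rm -f "$fuse_mount/$rel"
        done
    done
}
```

Had the `rm` targeted the fuse mount like this, Gluster itself would have propagated the deletion to every brick holding a copy.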
> I have stopped working on it for now, asking my colleagues for some
> time (at least the data is still there, on the bricks, just spread
> across all of them) in order to think carefully about how to proceed
> (maybe destroying the volume and rebuilding it, but that will be very
> time-consuming, as I don't have much free space elsewhere to save
> everything; it is also very difficult to back up from the fuse
> mountpoint, as it is not listing all the files).
>
> Thanks a lot for your support.
> In any case, I'm learning really a lot.
>
> Stefano
>
>
> On Sun, Jun 2, 2013 at 2:52 PM, Vijay Bellur <vbellur at redhat.com> wrote:
>
>> On 05/31/2013 03:18 PM, Stefano Sinigardi wrote:
>>
>>> Dear Xavier,
>>> I realized that the volume was not built properly when doing the first
>>> analyses suggested by Davide, but I'm sure that this is not the
>>> problem, so I quickly dismissed it. Also, we need a replica, but not
>>> so strictly; maybe I'll build the next volume properly in the future.
>>> Anyway, yes, the volume was born on "pedrillo" with replica-2 and was
>>> expanded the next day onto "osmino", again with replica-2, just by
>>> adding bricks and attempting a rebalance. I'm saying "attempting"
>>> because it got "stuck", consuming a lot of RAM (almost all of it,
>>> 16 GB), and it was counting millions of files that I think don't even
>>> exist on the volume, so I stopped it. Do you think it might be worth
>>> restarting it?
>>>
>>
>> I might have missed this detail in the thread. What is the disk
>> filesystem on the bricks?
>>
>> Can you list the exact rebalance command that was triggered?
>>
>> Thanks,
>> Vijay
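For the backup problem mentioned above (the fuse mountpoint not listing every file), one possible sketch is to enumerate the full set of volume-relative paths by scanning the bricks directly, skipping Gluster's internal `.glusterfs` directory. The function name and brick paths are assumptions for illustration.

```shell
# Hedged sketch: union of all regular files across bricks, as
# volume-relative paths. list_all_brick_files <brick>... (name assumed).
list_all_brick_files() {
    for brick in "$@"; do
        # Prune the internal .glusterfs metadata tree, print data files,
        # and strip the brick prefix to get volume-relative paths.
        find "$brick" -path "$brick/.glusterfs" -prune -o -type f -print \
            | sed "s|^$brick/||"
    done | sort -u
}
```

The resulting list could then drive a per-file copy from the bricks to backup storage, with duplicates across replicas collapsed by the `sort -u`.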