Maybe I was not very clear in the previous email: even the forced rebalance
got stuck in the same folder (with few details in the log this time),
counting millions of files that did not exist, so I stopped it. Also, just
before the paragraph in which I described accidentally removing many files
because I was stuck, the timeline suddenly jumps to "this past night" (I'm
in Japan).

Stefano

On Sun, Jun 2, 2013 at 3:05 PM, Stefano Sinigardi <
stefano.sinigardi at gmail.com> wrote:

> Dear Vijay,
> the filesystem is ext4, on a GPT-structured disk, formatted by Ubuntu
> 12.10.
> The rebalance I ran was started with the command
>
> gluster volume rebalance data start
>
> but in the log it got stuck on a file that I cannot remember (a small
> working .cpp file), saying that it was going to be moved to a much more
> occupied replica, and it repeated this message until the log grew to a
> few GB.
> I then stopped it and restarted with
>
> gluster volume rebalance data start force
>
> in order to get rid of this problem of files being moved to bricks that
> were already highly occupied.
> Because I was almost stuck, and remembering that a rebalance had once
> miraculously solved another problem I had, I tried it again, but it got
> stuck in a .dropbox-cache folder. That is not a very important folder,
> so I thought I could remove it. I launched a script meant to find all
> the files by looking at all the bricks but to remove them through the
> fuse mountpoint. I don't know what went wrong (the script is very
> simple; the problem may have been that it was 4 am) but the fact is that
> the files got removed by calling rm at the brick mountpoints, not the
> fuse one. So I think that I'm now in an even worse situation than
> before.
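A minimal sketch of what such a removal script could look like, for reference: it discovers files by scanning each brick, but deletes them only through the fuse mountpoint, so that GlusterFS keeps replicas and its internal metadata consistent. The function name and all paths here are illustrative assumptions, not the original script.

```shell
# Hedged sketch: scan bricks to FIND files, but REMOVE them via the fuse
# mount. remove_via_fuse <fuse_mount> <subdir> <brick>... (names assumed).
remove_via_fuse() {
    fuse_mount=$1
    subdir=$2
    shift 2
    for brick in "$@"; do
        # Map each file found under the brick back to its volume-relative
        # path, then delete it via the fuse mountpoint, never via the brick.
        find "$brick/$subdir" -type f 2>/dev/null | while read -r f; do
            rel=${f#"$brick/"}
            rm -f "$fuse_mount/$rel"
        done
    done
}
```

Had the `rm` targeted the fuse mount like this, Gluster itself would have propagated the deletion to every brick holding a copy.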
> I have stopped working on it for now, asking my colleagues for some
> time (at least the data is still there, on the bricks, just spread
> across all of them) in order to think carefully about how to proceed
> (maybe destroying the volume and rebuilding it, but that will be very
> time-consuming, as I don't have much free space elsewhere to save
> everything; it is also very difficult to back up from the fuse
> mountpoint, as it is not listing all the files).
>
> Thanks a lot for your support.
> In any case, I'm learning really a lot.
>
> Stefano
>
>
> On Sun, Jun 2, 2013 at 2:52 PM, Vijay Bellur <vbellur at redhat.com> wrote:
>
>> On 05/31/2013 03:18 PM, Stefano Sinigardi wrote:
>>
>>> Dear Xavier,
>>> I realized that the volume was not built properly when doing the first
>>> analyses suggested by Davide, but I'm sure that this is not the
>>> problem, so I quickly dismissed it. Also, we need a replica, but not
>>> so strictly; maybe I'll build the next volume properly in the future.
>>> Anyway, yes, the volume was born on "pedrillo" with replica-2 and was
>>> expanded the next day onto "osmino", again with replica-2, just by
>>> adding bricks and attempting a rebalance. I'm saying "attempting"
>>> because it got "stuck", consuming a lot of RAM (almost all of it,
>>> 16 GB), and it was counting millions of files that I think don't even
>>> exist on the volume, so I stopped it. Do you think it might be worth
>>> restarting it?
>>>
>>
>> I might have missed this detail in the thread. What is the disk
>> filesystem on the bricks?
>>
>> Can you list the exact rebalance command that was triggered?
>>
>> Thanks,
>> Vijay
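For the backup problem mentioned above (the fuse mountpoint not listing every file), one possible sketch is to enumerate the full set of volume-relative paths by scanning the bricks directly, skipping Gluster's internal `.glusterfs` directory. The function name and brick paths are assumptions for illustration.

```shell
# Hedged sketch: union of all regular files across bricks, as
# volume-relative paths. list_all_brick_files <brick>... (name assumed).
list_all_brick_files() {
    for brick in "$@"; do
        # Prune the internal .glusterfs metadata tree, print data files,
        # and strip the brick prefix to get volume-relative paths.
        find "$brick" -path "$brick/.glusterfs" -prune -o -type f -print \
            | sed "s|^$brick/||"
    done | sort -u
}
```

The resulting list could then drive a per-file copy from the bricks to backup storage, with duplicates across replicas collapsed by the `sort -u`.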