Re: Cannot delete some empty dirs and weird sizes

On Wed, Feb 1, 2012 at 9:02 AM, Amon Ott <a.ott@xxxxxxxxxxxx> wrote:
> On Tuesday 31 January 2012 Gregory Farnum wrote:
>> On Tue, Jan 31, 2012 at 4:00 AM, Amon Ott <a.ott@xxxxxxxxxxxx> wrote:
>> > Hi again!
>> >
>> > We are running Ceph 0.41 and kernel 3.2.2 with current for-linus code
>> > (commit 3d882ce47de80e0294a536bec771b5651885b4d3) now.
>> >
>> > After some heavy workloads we see quite a few directories that cannot be
>> > deleted, although ls and find show that they are empty. rmdir says they
>> > are not empty.
>> >
>> > Additionally, ceph reports various weird size values for some, but not
>> > all of them:
>> > ls -la .tmp/tiny61/.mozilla/firefox/default.yat/
>> > total 0
>> > drwxr-xr-x 1 tiny61 users 18446744073705748665 25. Jan 10:02 .
>> > drwxr-xr-x 1 tiny61 users 18446744073705748665 25. Jan 10:02 ..
>> >
>> > Is this a known or a new bug? Can it be related to .snap pseudo dirs? The
>> > problem appeared without ever using snapshots, though.
>>
>> I believe this is new. Based on the odd sizes (that's a small negative
>> 64-bit value interpreted as unsigned, fyi), my guess is that the "recursive
>> accounting" statistics are off and that's leading the MDS to believe
>> the directory is not empty even though it is. It's unlikely to be
>> directly related to snapshots, though it's not impossible.
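A quick way to check the odd directory size is to reinterpret it as a signed 64-bit integer. The Python sketch below does this with the standard `struct` module; `as_signed_64` is a hypothetical helper name, not anything from Ceph. Under this reading, 18446744073705748665 comes out as -3802951, whereas a literal -1 would have printed as 18446744073709551615. In both cases a negative count shown through an unsigned field is what produces sizes just under 2^64.

```python
import struct


def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    # Pack the bits as an unsigned little-endian u64, then read them back as i64.
    return struct.unpack("<q", struct.pack("<Q", value))[0]


reported = 18446744073705748665  # directory size printed by `ls -la` in the report
print(as_signed_64(reported))    # -3802951
print(as_signed_64(2**64 - 1))   # -1, for comparison
```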
>>
>> Have you seen this on more than one MDS? If it's reproducible we could
>> more easily figure out the cause; otherwise the best we can do is to
>> maybe fix up the specific instance of it.
>
> I had to recreate ceph fs several times today because of kernel problems. Now
> I have only one dir that is wrong:
> ls -la .tmp/tiny14/.config/pcmanfm/LXDE/
> total 0
> drwxr-xr-x 1 32252 users 393  1. Feb 15:19 .
> drwxr-xr-x 1 32252 users   0  1. Feb 17:21 ..
>
> This is probably caused by another reboot I had to do, although I think ceph
> should have recovered here. It might also be caused by this setting, which I
> tried for a while and have now turned off:
> mds standby replay = true
> With this setting, if the active mds gets killed, no mds is able to become
> active, so everything hangs. Had to reboot again.

Hrm. That setting simply tells the non-active MDSes that they should
follow the journal of the active MDS(es). They should still go active
if the MDS they're following fails — although it does slightly
increase the chances of them running into the same bugs in code and
dying at the same time.


> Found this in the mds log; the reported wrong size matches the dir total:
>
> 2012-02-01 17:21:51.306561 4f830b70 mds.0.cache.dir(1000000b055) _fetched
> badness: got (but i already had) [inode 100000066c9
> [2,head] /tiny14/.config/pcmanfm/LXDE.conf auth v4 s=393 n(v0 b393 1=1+0)
> (iversion lock) cr={4711=0-4194304@1}
> caps={5313=pAsLsXsFscr/-@1} | caps 0x1d13c600] mode 33188 mtime 2012-01-24
> 15:55:59.000000
> 2012-02-01 17:21:51.306646 4f830b70 log [ERR] : loaded dup
> inode 100000066c9 [2,head] v7 at /tiny14/.config/pcmanfm/LXDE/pcmanfm.conf,
> but inode 100000066c9.head v4 already exists
> at /tiny14/.config/pcmanfm/LXDE.conf
> 2012-02-01 17:21:51.349424 4f830b70 mds.0.cache.dir(100000066ae) mismatch
> between head items and fnode.fragstat! printing dentries
> 2012-02-01 17:21:51.349457 4f830b70 mds.0.cache.dir(100000066ae)
> get_num_head_items() = 2; fnode.fragstat.nfiles=0 fnode.fragstat.nsubdirs=1
> 2012-02-01 17:21:51.349493 4f830b70 mds.0.cache.dir(100000066ae) [dentry
> #1/tiny14/.config/pcmanfm/LXDE [2,head] auth (dversion lock) pv=0 v=16
> inode=0x1cff3828 | inodepin 0x1b9f4de0]
> 2012-02-01 17:21:51.349521 4f830b70 mds.0.cache.dir(100000066ae) [dentry
> #1/tiny14/.config/pcmanfm/LXDE.conf [2,head] auth (dn xlock x=1 by
> 0x1ab21200) (dversion lock w=1 last_client=5313) pv=17 v=16 ap=2+2
> inode=0x1d13c600 | request lock inodepin authpin 0x1b90f064]
> 2012-02-01 17:21:51.349552 4f830b70 mds.0.cache.dir(100000066ae) mismatch
> between child accounted_rstats and my rstats!
> 2012-02-01 17:21:51.349573 4f830b70 mds.0.cache.dir(100000066ae) total of
> child dentrys: n(v0 rc2012-02-01 15:19:55.517733 b786 3=2+1)
> 2012-02-01 17:21:51.349591 4f830b70 mds.0.cache.dir(100000066ae) my rstats:
> n(v3 rc2012-02-01 15:19:55.517733 b393 2=1+1)
> 2012-02-01 17:21:51.349616 4f830b70 mds.0.cache.dir(100000066ae) [dentry
> #1/tiny14/.config/pcmanfm/LXDE [2,head] auth (dversion lock) pv=0 v=16
> inode=0x1cff3828 | inodepin 0x1b9f4de0] n(v0 rc2012-02-01 15:19:55.517733
> b393 2=1+1)
> 2012-02-01 17:21:51.349643 4f830b70 mds.0.cache.dir(100000066ae) [dentry
> #1/tiny14/.config/pcmanfm/LXDE.conf [2,head] auth (dn xlock x=1 by
> 0x1ab21200) (dversion lock w=1 last_client=5313) pv=17 v=16 ap=2+2
> inode=0x1d13c600 | request lock inodepin authpin 0x1b90f064] n(v0 b393 1=1+0)
>
>
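Reading the `n(...)` printouts as Ceph's recursive stats, the two child dentries of directory 100000066ae add up exactly to the "total of child dentrys" line, which is twice the 393 bytes the directory's own rstats claim. This relies on an assumption about the format that the thread does not state: `b` is the recursive byte count and `N=F+S` is total entries = files + subdirectories. The version and `rc` timestamp are ignored. The Python sketch below redoes that arithmetic, plus the simpler fragstat comparison. `Nest`, `parse_nest` and `sum_children` are hypothetical helpers for illustration, not Ceph's MDS code.

```python
import re
from typing import List, NamedTuple


class Nest(NamedTuple):
    """Hypothetical container for the recursive stats in an n(...) printout."""
    rbytes: int
    rfiles: int
    rsubdirs: int


# Assumed format: "b<bytes>" is recursive bytes, "N=F+S" is entries = files + subdirs.
# Terms missing from the string are treated as zero.
_BYTES = re.compile(r"\bb(\d+)\b")
_ENTRIES = re.compile(r"\b(\d+)=(\d+)\+(\d+)\b")


def parse_nest(text: str) -> Nest:
    """Parse one n(...) string copied from the MDS log into a Nest."""
    b = _BYTES.search(text)
    e = _ENTRIES.search(text)
    rbytes = int(b.group(1)) if b else 0
    rfiles, rsubdirs = (int(e.group(2)), int(e.group(3))) if e else (0, 0)
    return Nest(rbytes, rfiles, rsubdirs)


def sum_children(children: List[Nest]) -> Nest:
    """Add up the accounted rstats of all child dentries, field by field."""
    return Nest(
        sum(c.rbytes for c in children),
        sum(c.rfiles for c in children),
        sum(c.rsubdirs for c in children),
    )


# Values copied from the log lines for directory 100000066ae.
children = [
    parse_nest("n(v0 rc2012-02-01 15:19:55.517733 b393 2=1+1)"),  # dentry LXDE
    parse_nest("n(v0 b393 1=1+0)"),                               # dentry LXDE.conf
]
mine = parse_nest("n(v3 rc2012-02-01 15:19:55.517733 b393 2=1+1)")

child_total = sum_children(children)
print(child_total)          # Nest(rbytes=786, rfiles=2, rsubdirs=1)
print(child_total == mine)  # False: the rstat mismatch the MDS complains about

# Fragstat check: number of head dentries vs. nfiles + nsubdirs from the fnode.
head_items, nfiles, nsubdirs = 2, 0, 1
print(head_items == nfiles + nsubdirs)  # False: the fragstat mismatch
```

Both checks fail for the same directory, which fits the duplicate-inode error logged just before them: LXDE.conf appears to be counted both as a direct child and inside LXDE.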
> Then I killed the active mds, another one took over, and suddenly the missing
> file appeared:
> .tmp/tiny14/.config/pcmanfm/LXDE/
> total 1
> drwxr-xr-x 1 32252 users 393  1. Feb 15:19 .
> drwxr-xr-x 1 32252 users   0  1. Feb 17:21 ..
> -rw-r--r-- 1 root  root  393 24. Jan 15:55 pcmanfm.conf

So were you able to delete that file once it reappeared?

> Restarted the original mds; it does not appear in "ceph mds dump", although it
> is running at 100% cpu. The same happened with other mds processes after killing
> and restarting them, so now I have only one left that is working correctly.

Remind me, you do only have one active MDS, correct?
Did you look at the logs and see what the MDS was doing with 100% cpu?

> Will leave the cluster in this state now and have another look tomorrow -
> maybe the spinning mds processes recover by some miracle.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

