Re: Cannot delete some empty dirs and weird sizes

On Tuesday 31 January 2012, Gregory Farnum wrote:
> On Tue, Jan 31, 2012 at 4:00 AM, Amon Ott <a.ott@xxxxxxxxxxxx> wrote:
> > Hi again!
> >
> > We are running Ceph 0.41 and kernel 3.2.2 with current for-linus code
> > (commit 3d882ce47de80e0294a536bec771b5651885b4d3) now.
> >
> > After some heavy workloads we see quite a few directories that cannot be
> > deleted, although ls and find show that they are empty. rmdir says they
> > are not empty.
> >
> > Additionally, ceph reports various weird size values for some, but not
> > all of them:
> > ls -la .tmp/tiny61/.mozilla/firefox/default.yat/
> > insgesamt 0
> > drwxr-xr-x 1 tiny61 users 18446744073705748665 25. Jan 10:02 .
> > drwxr-xr-x 1 tiny61 users 18446744073705748665 25. Jan 10:02 ..
> >
> > Is this a known or a new bug? Can it be related to .snap pseudo dirs? The
> > problem appeared without ever using snapshots, though.
>
> I believe this is new. Based on the odd sizes (that's a 64-bit -1
> interpreted as unsigned, fyi), my guess is that the "recursive
> accounting" statistics are off and that's leading the MDS to believe
> the directory is not empty even though it is. It's unlikely to be
> directly related to snapshots, though it's not impossible.
>
> Have you seen this on more than one MDS? If it's reproducible we could
> more easily figure out the cause; otherwise the best we can do is to
> maybe fix up the specific instance of it.
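
For reference, the size in the quoted listing is what a negative 64-bit value
looks like when it is printed as unsigned. A minimal Python sketch recovers the
signed value; the helper name to_signed64 is my own and is not taken from Ceph:

```python
def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as two's-complement signed."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    # Values with the top bit set represent negative numbers.
    return value - 2**64 if value >= 2**63 else value


reported_size = 18446744073705748665  # directory size shown by ls -la
print(to_signed64(reported_size))     # prints -3802951
```

The result is a negative count of a few megabytes rather than exactly -1,
which would still fit a byte counter that was decremented below zero.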

I had to recreate the Ceph filesystem several times today because of kernel
problems. Now I have only one directory that is wrong:
ls -la .tmp/tiny14/.config/pcmanfm/LXDE/
insgesamt 0
drwxr-xr-x 1 32252 users 393  1. Feb 15:19 .
drwxr-xr-x 1 32252 users   0  1. Feb 17:21 ..

This was probably caused by another reboot I had to do, although I think Ceph
should have recovered here. It might also be caused by the following setting,
which I tried for a while and have since turned off:
mds standby replay = true
With this setting, if the active MDS gets killed, no other MDS is able to
become active, so everything hangs. I had to reboot again.

I found this in the MDS log; the wrong size reported by ls matches the byte
total recorded for the directory:

2012-02-01 17:21:51.306561 4f830b70 mds.0.cache.dir(1000000b055) _fetched
badness: got (but i already had) [inode 100000066c9 [2,head]
/tiny14/.config/pcmanfm/LXDE.conf auth v4 s=393 n(v0 b393 1=1+0) (iversion
lock) cr={4711=0-4194304@1} caps={5313=pAsLsXsFscr/-@1} | caps 0x1d13c600]
mode 33188 mtime 2012-01-24 15:55:59.000000
2012-02-01 17:21:51.306646 4f830b70 log [ERR] : loaded dup inode 100000066c9
[2,head] v7 at /tiny14/.config/pcmanfm/LXDE/pcmanfm.conf, but inode
100000066c9.head v4 already exists at /tiny14/.config/pcmanfm/LXDE.conf
2012-02-01 17:21:51.349424 4f830b70 mds.0.cache.dir(100000066ae) mismatch 
between head items and fnode.fragstat! printing dentries
2012-02-01 17:21:51.349457 4f830b70 mds.0.cache.dir(100000066ae) 
get_num_head_items() = 2; fnode.fragstat.nfiles=0 fnode.fragstat.nsubdirs=1
2012-02-01 17:21:51.349493 4f830b70 mds.0.cache.dir(100000066ae) [dentry 
#1/tiny14/.config/pcmanfm/LXDE [2,head] auth (dversion lock) pv=0 v=16 
inode=0x1cff3828 | inodepin 0x1b9f4de0]
2012-02-01 17:21:51.349521 4f830b70 mds.0.cache.dir(100000066ae) [dentry 
#1/tiny14/.config/pcmanfm/LXDE.conf [2,head] auth (dn xlock x=1 by 
0x1ab21200) (dversion lock w=1 last_client=5313) pv=17 v=16 ap=2+2 
inode=0x1d13c600 | request lock inodepin authpin 0x1b90f064]
2012-02-01 17:21:51.349552 4f830b70 mds.0.cache.dir(100000066ae) mismatch 
between child accounted_rstats and my rstats!
2012-02-01 17:21:51.349573 4f830b70 mds.0.cache.dir(100000066ae) total of 
child dentrys: n(v0 rc2012-02-01 15:19:55.517733 b786 3=2+1)
2012-02-01 17:21:51.349591 4f830b70 mds.0.cache.dir(100000066ae) my rstats:              
n(v3 rc2012-02-01 15:19:55.517733 b393 2=1+1)
2012-02-01 17:21:51.349616 4f830b70 mds.0.cache.dir(100000066ae) [dentry 
#1/tiny14/.config/pcmanfm/LXDE [2,head] auth (dversion lock) pv=0 v=16 
inode=0x1cff3828 | inodepin 0x1b9f4de0] n(v0 rc2012-02-01 15:19:55.517733 
b393 2=1+1)
2012-02-01 17:21:51.349643 4f830b70 mds.0.cache.dir(100000066ae) [dentry 
#1/tiny14/.config/pcmanfm/LXDE.conf [2,head] auth (dn xlock x=1 by 
0x1ab21200) (dversion lock w=1 last_client=5313) pv=17 v=16 ap=2+2 
inode=0x1d13c600 | request lock inodepin authpin 0x1b90f064] n(v0 b393 1=1+0)
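
To make the rstat mismatch above easier to follow, here is a small Python
sketch that parses the n(...) strings from the log and redoes the sum. I am
assuming the layout is "b<bytes> <total>=<files>+<subdirs>", which is only
inferred from the numbers in this log; the RStat class and parse_rstat helper
are hypothetical names of mine, not Ceph code:

```python
import re
from dataclasses import dataclass

# Assumed layout: n(v<ver> [rc<date> <time>] b<bytes> <total>=<files>+<subdirs>)
RSTAT_RE = re.compile(r"n\(v\d+ (?:rc\S+ \S+ )?b(\d+) (\d+)=(\d+)\+(\d+)\)")


@dataclass(frozen=True)
class RStat:
    rbytes: int
    rfiles: int
    rsubdirs: int

    def __add__(self, other: "RStat") -> "RStat":
        return RStat(self.rbytes + other.rbytes,
                     self.rfiles + other.rfiles,
                     self.rsubdirs + other.rsubdirs)


def parse_rstat(text: str) -> RStat:
    """Extract bytes, files and subdirs from one rstat string."""
    match = RSTAT_RE.search(text)
    if match is None:
        raise ValueError(f"not an rstat string: {text!r}")
    rbytes, total, rfiles, rsubdirs = map(int, match.groups())
    if total != rfiles + rsubdirs:
        raise ValueError(f"inconsistent totals in {text!r}")
    return RStat(rbytes, rfiles, rsubdirs)


# Per-dentry values copied from the log: LXDE (directory) and LXDE.conf (file).
children = [
    parse_rstat("n(v0 rc2012-02-01 15:19:55.517733 b393 2=1+1)"),
    parse_rstat("n(v0 b393 1=1+0)"),
]
child_total = sum(children, RStat(0, 0, 0))

# What the directory itself has recorded ("my rstats").
recorded = parse_rstat("n(v3 rc2012-02-01 15:19:55.517733 b393 2=1+1)")

print(child_total)              # RStat(rbytes=786, rfiles=2, rsubdirs=1)
print(child_total == recorded)  # False
```

Under that reading, the same 393-byte inode is counted twice, once as
LXDE.conf and once inside LXDE, which matches the "loaded dup inode" error.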


I then killed the active MDS. Another one took over, and suddenly the missing
file appeared:
ls -la .tmp/tiny14/.config/pcmanfm/LXDE/
insgesamt 1
drwxr-xr-x 1 32252 users 393  1. Feb 15:19 .
drwxr-xr-x 1 32252 users   0  1. Feb 17:21 ..
-rw-r--r-- 1 root  root  393 24. Jan 15:55 pcmanfm.conf

I restarted the original MDS, but it does not appear in "ceph mds dump",
although it is running at 100% CPU. The same happened with other MDS processes
after killing and restarting them; now I have only one left that is working
correctly.

Will leave the cluster in this state now and have another look tomorrow - 
maybe the spinning mds processes recover by some miracle.

Amon Ott
-- 
Dr. Amon Ott
m-privacy GmbH           Tel: +49 30 24342334
Am Köllnischen Park 1    Fax: +49 30 24342336
10179 Berlin             http://www.m-privacy.de

Amtsgericht Charlottenburg, HRB 84946

Geschäftsführer:
 Dipl.-Kfm. Holger Maczkowsky,
 Roman Maczkowsky

GnuPG-Key-ID: 0x2DD3A649