On Mon, Jan 25, 2016 at 9:43 PM, Burkhard Linke
<Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> On 01/25/2016 01:05 PM, Yan, Zheng wrote:
>>
>> On Mon, Jan 25, 2016 at 3:43 PM, Burkhard Linke
>> <Burkhard.Linke@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi,
>>>
>>> there's a rogue file in our CephFS that we are unable to remove. Any
>>> access to the file (removal, move, copy, open etc.) results in the MDS
>>> starting to spill the following message into its log file:
>>>
>>> 2016-01-25 08:39:09.623398 7f472a0ee700 0 mds.0.cache
>>> open_remote_dentry_finish bad remote dentry [dentry #1/<some file> [2,head]
>>> auth REMOTE(reg) (dversion lock) pv=0 v=1779105 inode=0 0x24bb98c0]
>>>
>>> I have to restart the MDS to keep it from filling up the log file
>>> partition. How can I get rid of this file?
>>>
>> Running the following commands will delete the file.
>>
>> First, flush the MDS journal:
>>
>> # ceph daemon mds.xxx flush journal
>>
>> Then run:
>>
>> # rados -p metadata listomapkeys 1.00000000
>>
>> The output should contain an entry named "<some file>_head".
>>
>> Finally, run:
>>
>> # rados -p metadata rmomapkey 1.00000000 <some file>_head
>>
>>
>> Before running the above commands, please help us debug this issue. Set
>> debug_mds = 10, restart the MDS and access the bad file.
>
> The situation is already partly resolved due to the restart of the MDS
> service. This had to happen twice (after trying to remove the file and
> after trying to move it to a different directory). The file was first
> listed in the target directory of the move operation, but vanished after
> some time. Nonetheless it is still available as a RADOS object
> ('rados getxattr <inode hex>.00000000 parent' contains the string 'stray',
> and get dumps its content).
>

Which pool does the RADOS object live in?
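
If it is not obvious from how you ran getxattr, one quick way to check is
to stat the object in each data pool in turn. This is only a sketch;
<data pool> below is a placeholder for each of your data pool names:

# rados -p <data pool> stat <inode hex>.00000000

The pool in which stat succeeds, rather than reporting an error, is the one
holding the object.
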
Regards
Yan, Zheng

> But listing the now empty directory results in a single log entry:
>
> 2016-01-25 14:24:10.205747 7f9d9c8c3700 0 mds.0.cache
> open_remote_dentry_finish bad remote dentry [dentry #1/volumes/attic/<some
> filename> [2,head] auth REMOTE(reg) (dversion lock) pv=0 v=183069 inode=0
> 0x10527d1c0]
>
> After setting the debug level to 10 (using ceph daemon ... config set ...),
> the following output refers to the broken file:
>
> 2016-01-25 14:25:23.204785 7f9d9afbf700 10 mds.0.cache open_remote_dentry
> [dentry #1/volumes/attic/<some filename> [2,head] auth REMOTE(reg) (dversion
> lock) pv=0 v=183069 inode=0 0x105cee640]
> 2016-01-25 14:25:23.204794 7f9d9afbf700 10 mds.0.cache open_ino 10002af7f78
> pool -1 want_replica 1
> 2016-01-25 14:25:23.204802 7f9d9afbf700 10 mds.0.cache do_open_ino_peer
> 10002af7f78 active 0 all 0 checked 0
> 2016-01-25 14:25:23.204806 7f9d9afbf700 10 mds.0.cache all MDS peers have
> been checked
> 2016-01-25 14:25:23.207729 7f9d9c8c3700 10 MDSIOContextBase::complete:
> 32C_IO_MDC_OpenInoBacktraceFetched
> 2016-01-25 14:25:23.207744 7f9d9c8c3700 10 mds.0.cache
> _open_ino_backtrace_fetched ino 10002af7f78 errno -2
> 2016-01-25 14:25:23.207749 7f9d9c8c3700 10 mds.0.cache no object in pool 7,
> retrying pool 8
> 2016-01-25 14:25:23.210403 7f9d9c8c3700 10 MDSIOContextBase::complete:
> 32C_IO_MDC_OpenInoBacktraceFetched
> 2016-01-25 14:25:23.210444 7f9d9c8c3700 10 mds.0.cache
> _open_ino_backtrace_fetched ino 10002af7f78 errno -2
> 2016-01-25 14:25:23.210450 7f9d9c8c3700 10 mds.0.cache failed to open ino
> 10002af7f78
> 2016-01-25 14:25:23.210452 7f9d9c8c3700 10 mds.0.cache open_ino_finish ino
> 10002af7f78 ret -2
> 2016-01-25 14:25:23.210524 7f9d9c8c3700 10 MDSInternalContextBase::complete:
> 22C_MDC_OpenRemoteDentry
> 2016-01-25 14:25:23.210530 7f9d9c8c3700 0 mds.0.cache
> open_remote_dentry_finish bad remote dentry [dentry #1/volumes/attic/<some
> filename> [2,head] auth REMOTE(reg) (dversion lock) pv=0 v=183069 inode=0
> 0x105cee640]
> 2016-01-25 14:25:23.210543 7f9d9c8c3700 10 MDSInternalContextBase::complete:
> 18C_MDS_RetryRequest
> 2016-01-25 14:25:23.210545 7f9d9c8c3700 7 mds.0.server
> dispatch_client_request client_request(client.1297540:16260418 readdir
> #100023ff2a0 2016-01-25 14:25:23.218538) v2
>
> 100023ff2a0 is the containing directory; the output was written while
> running ls on that directory. The metadata pool id is 8; pools 7, 12 and 20
> are data pools.
>
> Just send me a notice if you need more debug output (either mail or IRC).
> The other commands mentioned above have not been run yet.
>
>
> Regards,
> Burkhard
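
For reference, once the debug output has been collected, the removal
suggested earlier would look roughly like this for the file's current
location. This is only a sketch: it assumes the containing directory
(inode 100023ff2a0 from the readdir line above) is not fragmented, so that
its directory object is 100023ff2a0.00000000, and mds.<id> / <metadata pool>
are placeholders for your MDS id and the name of the metadata pool
(pool id 8):

# ceph daemon mds.<id> flush journal
# rados -p <metadata pool> listomapkeys 100023ff2a0.00000000

The output should contain an entry named "<some filename>_head"; remove it
with:

# rados -p <metadata pool> rmomapkey 100023ff2a0.00000000 "<some filename>_head"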