Re: Production 12.2.1 CephFS keeps crashing (assert(inode_map.count(in->vino()) == 0))

On Wed, Dec 13, 2017 at 6:45 PM, Tobias Prousa <tobias.prousa@xxxxxxxxx> wrote:
> Hi there,
>
> sorry to disturb you again, but I'm still not there. After restoring my
> CephFS to a working state (with a lot of help from Yan, Zheng, thank you so
> much), I got it back into service by restarting the MDSs and bringing up the
> first clients. Everything looked promising for about an hour. Then the MDSs
> suddenly started failing; standbys took over until all standbys were used up
> and the last MDS failed.
>
> /build/ceph-12.2.2/src/mds/MDCache.cc: 258: FAILED
> assert(inode_map.count(in->vino()) == 0)
>
>  ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x102) [0x55b9e585baf2]
>  2: (MDCache::add_inode(CInode*)+0x285) [0x55b9e5601495]
>  3: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*,
> inodeno_t, unsigned int, file_layout_t*)+0x1089) [0x55b9e55ac169]
>  4:
> (Server::handle_client_openc(boost::intrusive_ptr<MDRequestImpl>&)+0xf86)
> [0x55b9e55ae1c6]
>  5:
> (Server::dispatch_client_request(boost::intrusive_ptr<MDRequestImpl>&)+0xcd0)
> [0x55b9e55bfc50]
>  6: (Server::handle_client_request(MClientRequest*)+0x2b6) [0x55b9e55bff86]
>  7: (Server::dispatch(Message*)+0x37b) [0x55b9e55c474b]
>  8: (MDSRank::handle_deferrable_message(Message*)+0x7fc) [0x55b9e553a15c]
>  9: (MDSRank::_dispatch(Message*, bool)+0x1db) [0x55b9e554756b]
>  10: (MDSRankDispatcher::ms_dispatch(Message*)+0x15) [0x55b9e5548335]
>  11: (MDSDaemon::ms_dispatch(Message*)+0xf3) [0x55b9e5531b73]
>  12: (DispatchQueue::entry()+0x7ca) [0x55b9e5b58a9a]
>  13: (DispatchQueue::DispatchThread::entry()+0xd) [0x55b9e58e05fd]
>  14: (()+0x8064) [0x7fe6fa563064]
>  15: (clone()+0x6d) [0x7fe6f963362d]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
> I have now reduced the load by removing most of the active clients again;
> with only 3 mostly idle clients remaining, the MDS stays up and running. As
> soon as I start adding clients, the trouble returns.
>
> As long as the MDS is up, data consistency looks good; there simply seems to
> be something in the stored objects that drives the MDS into this assertion.

It seems there are still in-use inode numbers that are wrongly marked as free
in the inode table. You probably didn't take enough inode numbers when you
ran "cephfs-table-tool take_inos". Please try:

- remove all clients
- run 'ceph daemon mds.x flush journal'
- stop mds
- run 'cephfs-journal-tool event recover_dentries summary'
- run 'cephfs-journal-tool journal reset'
- run 'cephfs-table-tool all reset session'
- use 'cephfs-table-tool take_inos' to remove more free inode numbers
from the inode table (removing 100k should be enough; a consolidated
sketch of the whole sequence follows below)
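
Putting those steps together, a rough shell sketch might look like the
following. This is only a sketch, assuming a single filesystem with a single
active MDS rank; "mds.x", the unit name "ceph-mds@x" and the NEW_MAX_INO
value are placeholders you must replace for your own cluster. NEW_MAX_INO
should be roughly the highest inode number currently in use plus the ~100k
margin suggested above.

  # 1. unmount/evict all clients first, then flush the MDS journal
  #    while the daemon is still running
  ceph daemon mds.x flush journal

  # 2. stop the MDS daemon (unit name depends on your deployment)
  systemctl stop ceph-mds@x

  # 3. write dentries recovered from the journal back into the
  #    backing metadata store
  cephfs-journal-tool event recover_dentries summary

  # 4. reset the journal
  cephfs-journal-tool journal reset

  # 5. clear the session table
  cephfs-table-tool all reset session

  # 6. mark additional inode numbers as used so the MDS cannot hand out
  #    numbers that already exist on disk (check cephfs-table-tool --help
  #    for the exact argument form on your release)
  cephfs-table-tool all take_inos NEW_MAX_INO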


>
> Any idea anyone?
>
> Again, this all started to happen after I "trivially" upgraded my long-lived
> cluster (deployed back in the bobtail days) from 12.2.1 to 12.2.2 last
> Friday. By the way, the cluster has gone through every release from bobtail
> up to luminous as each new version was officially released.
>
> Thank you so much for any pointer!
>
> Best regards,
> Tobi
>
>
>
> --
> -----------------------------------------------------------
> Dipl.-Inf. (FH) Tobias Prousa
> Leiter Entwicklung Datenlogger
>
> CAETEC GmbH
> Industriestr. 1
> D-82140 Olching
> www.caetec.de
>
> Gesellschaft mit beschränkter Haftung
> Sitz der Gesellschaft: Olching
> Handelsregister: Amtsgericht München, HRB 183929
> Geschäftsführung: Stephan Bacher, Andreas Wocke
>
> Tel.: +49 (0)8142 / 50 13 60
> Fax.: +49 (0)8142 / 50 13 69
>
> eMail: tobias.prousa@xxxxxxxxx
> Web:   http://www.caetec.de
> ------------------------------------------------------------
>



