Super, thanks! Yeah, I read that an unclean reconnect might lead to data loss and a proper mount/unmount is better. So far, any evicted client was rebooting, so the reconnect works fine for us with blacklisting. Good to know the alternative though.

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
Sent: 10 November 2020 11:04:07
To: Frank Schilder
Cc: Andras Pataki; ceph-users
Subject: Re: Re: cephfs - blacklisted client coming back?

On Tue, Nov 10, 2020 at 10:59 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Dan.
>
> > For our clusters we use the auto-reconnect settings
>
> Could you give me a hint what settings these are? Are they available in mimic?

Yes. On the mds you need:

mds session blacklist on timeout = false
mds session blacklist on evict = false

And on the fuse client you need:

client reconnect stale = true

And kernel clients reconnect by default. (There might be some consistency sacrificed by this config, but tbh we never had an issue in a few years).

Cheers, Dan

>
> Thanks!
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
> Sent: 10 November 2020 10:47:11
> To: Andras Pataki
> Cc: ceph-users
> Subject: Re: cephfs - blacklisted client coming back?
>
> Hi Andras,
>
> I don't have much experience with blacklisting to know what is a safe default.
> For our clusters we use the auto-reconnect settings and never
> blacklist any clients.
>
> Cheers, Dan
>
> On Tue, Nov 10, 2020 at 2:10 AM Andras Pataki
> <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hi Dan,
> >
> > That makes sense - the time between blacklist and magic comeback was
> > around 1 hour - thanks for the explanation. Is this a safe default?
> > At eviction, the MDS takes all caps away from the client, so if it comes
> > back in an hour, doesn't it then write to files that it perhaps
> > shouldn't have access to?
> >
> > There is also another strange thing ceph-fuse was doing for an hour
> > (I had increased the objecter log level to 20).
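(As a quick sketch, raising the objecter log level on a running ceph-fuse client, as described above, can be done through its admin socket; the socket path below is the one that appears in the status output quoted further down, so treat it as an example rather than the exact command used:)

    # bump objecter debugging to 20 on the live ceph-fuse process
    ceph daemon /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok config set debug_objecter 20
    # and drop it back down once the capture is done
    ceph daemon /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok config set debug_objecter 0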
> >
> > Here is the eviction:
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 I was
> > blacklisted at osd epoch 1717894
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> > remove_session_caps still has dirty|flushing caps on
> > 0x100673a2613.head(faked_ino=0 ref=5 ll_ref=1
> > cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> > size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.139916
> > ctime=2020-11-09 14:34:28.139916 caps=- dirty_caps=Fw
> > objectset[0x100673a2613 ts 0/0 objects 1 dirty_or_tx 0]
> > parents=0x10067375a7c.head["pwaf-00680.ene"] 0x7fffd034b4d0)
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> > remove_session_caps still has dirty|flushing caps on
> > 0x100673a2614.head(faked_ino=0 ref=5 ll_ref=1
> > cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> > size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.145199
> > ctime=2020-11-09 14:34:28.145199 caps=- dirty_caps=Fw
> > objectset[0x100673a2614 ts 0/0 objects 1 dirty_or_tx 0]
> > parents=0x10067375a7c.head["pwaf-00685.ene"] 0x7fffd034bc20)
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> > remove_session_caps still has dirty|flushing caps on
> > 0x100673a2615.head(faked_ino=0 ref=5 ll_ref=1
> > cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> > size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.150306
> > ctime=2020-11-09 14:34:28.150306 caps=- dirty_caps=Fw
> > objectset[0x100673a2615 ts 0/0 objects 1 dirty_or_tx 0]
> > parents=0x10067375a7c.head["pwaf-00682.ene"] 0x7fffd034c1d0)
> > ... and a lot more of these ...
> >
> > then the following types of messages repeat:
> >
> > 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131065 dirty
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131065 dirty
> > firstbyte=-42] waiters = {}
> > 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131065 dirty
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131065 dirty
> > firstbyte=96] waiters = {}
> > ... about 200 or so of these ...
> >
> > followed by
> >
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4
> > 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804
> > 0x7fffac4d0460 (328804) v 131183 dirty firstbyte=-42] waiters = {} r =
> > -108 (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4
> > 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176
> > 0x7fffac4d0460 (8176) v 131183 dirty firstbyte=96] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> > ... about 200 or so of these ...
> >
> > then again:
> >
> > 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131183 dirty
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131183 dirty
> > firstbyte=-42] waiters = {}
> > 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131183 dirty
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131183 dirty
> > firstbyte=96] waiters = {}
> >
> > rejected again:
> >
> > 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4
> > 0x7fffac4d0460 (4) v 131349 dirty firstbyte=32] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804
> > 0x7fffac4d0460 (328804) v 131349 dirty firstbyte=-42] waiters = {} r =
> > -108 (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4
> > 0x7fffac4d0460 (4) v 131349 dirty firstbyte=32] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176
> > 0x7fffac4d0460 (8176) v 131349 dirty firstbyte=96] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> >
> > repeating for the whole hour - using about 1Gbps bandwidth from just one
> > node.
> > Then the blacklist was lifted, and presumably the writes went
> > through:
> >
> > 2020-11-09 16:56:36.840 7fffdb7fe700 10 objectcacher bh_write_commit
> > clean bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:56:36.840 7fffdb7fe700 10
> > objectcacher.object(100673a2670.00000008/head) try_merge_bh bh[
> > 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean firstbyte=32]
> > waiters = {}
> > 2020-11-09 16:56:36.840 7fffdb7fe700 10
> > objectcacher.object(100673a2670.00000008/head) merge_left bh[
> > 0x7fffac4d05b0 0~2026998 0x7fffac4d0460 (2026998) v 6079 clean
> > firstbyte=-122] waiters = {} + bh[ 0x7fff60a0ae40 2026998~4
> > 0x7fffac4d0460 (4) v 160426 clean firstbyte=32] waiters = {}
> >
> > This looks like it is unsafe?
> >
> > Andras
> >
> >
> > On 11/9/20 7:21 PM, Dan van der Ster wrote:
> > > Hi Andras,
> > >
> > > The osd blocklist entries expire after 1hr by default:
> > >
> > > Option("mon_osd_blacklist_default_expire", Option::TYPE_FLOAT,
> > > Option::LEVEL_ADVANCED)
> > > .set_default(1_hr)
> > > .add_service("mon")
> > > .set_description("Duration in seconds that blacklist entries for clients "
> > > "remain in the OSD map"),
> > >
> > > (Check mon/OSDMonitor.cc for the implementation)
> > >
> > > Cheers, Dan
> > >
> > > On Mon, Nov 9, 2020 at 11:59 PM Andras Pataki
> > > <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > >> We had some network problems (high packet drops) to some cephfs client
> > >> nodes that run ceph-fuse (14.2.13) against a Nautilus cluster (on
> > >> version 14.2.8). As a result, a couple of clients got evicted (as one
> > >> would expect). What was really odd is that the clients were trying to
> > >> flush data they had in cache and kept getting rejected by OSDs for
> > >> almost an hour, and then magically the data flush worked. When asked
> > >> afterwards, the client reported that it was no longer blacklisted. How
> > >> would that happen? I certainly didn't run any commands to un-blacklist
> > >> a client, and the docs say that otherwise the client will stay
> > >> blacklisted until the file system gets remounted.
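(The 1-hour expiry quoted above can be checked and, if desired, raised; a minimal sketch, where mon.<id> is a placeholder for one of your monitors, the 7200-second value is only an example, and whether a change applies without a mon restart may depend on the release:)

    # what the monitor is currently using (default 3600 seconds, i.e. 1 hour)
    ceph daemon mon.<id> config get mon_osd_blacklist_default_expire
    # raise it via the centralized config, e.g. to two hours
    ceph config set mon mon_osd_blacklist_default_expire 7200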
> > >>
> > >> Here is the status of the client when it was blacklisted:
> > >> [root@worker2033 ceph]# ceph daemon
> > >> /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok status
> > >> {
> > >>     "metadata": {
> > >>         "ceph_sha1": "1778d63e55dbff6cedb071ab7d367f8f52a8699f",
> > >>         "ceph_version": "ceph version 14.2.13
> > >> (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)",
> > >>         "entity_id": "cephfs2",
> > >>         "hostname": "worker2033",
> > >>         "mount_point": "/mnt/ceph",
> > >>         "pid": "7698",
> > >>         "root": "/"
> > >>     },
> > >>     "dentry_count": 252,
> > >>     "dentry_pinned_count": 9,
> > >>     "id": 111995680,
> > >>     "inst": {
> > >>         "name": {
> > >>             "type": "client",
> > >>             "num": 111995680
> > >>         },
> > >>         "addr": {
> > >>             "type": "v1",
> > >>             "addr": "10.254.65.33:0",
> > >>             "nonce": 410851087
> > >>         }
> > >>     },
> > >>     "addr": {
> > >>         "type": "v1",
> > >>         "addr": "10.254.65.33:0",
> > >>         "nonce": 410851087
> > >>     },
> > >>     "inst_str": "client.111995680 10.254.65.33:0/410851087",
> > >>     "addr_str": "10.254.65.33:0/410851087",
> > >>     "inode_count": 251,
> > >>     "mds_epoch": 3376260,
> > >>     "osd_epoch": 1717896,
> > >>     "osd_epoch_barrier": 1717893,
> > >>     "blacklisted": true
> > >> }
> > >>
> > >> This corresponds to server side log messages:
> > >> 2020-11-09 15:56:31.578 7fffe59a4700 1 mds.0.3376160 Evicting (and
> > >> blacklisting) client session 111995680 (10.254.65.33:0/410851087)
> > >> 2020-11-09 15:56:31.578 7fffe59a4700 0 log_channel(cluster) log [INF] :
> > >> Evicting (and blacklisting) client session 111995680
> > >> (10.254.65.33:0/410851087)
> > >> 2020-11-09 15:56:31.706 7fffe59a4700 1 mds.0.3376160 Evicting (and
> > >> blacklisting) client session 111995680 (10.254.65.33:0/410851087)
> > >> 2020-11-09 15:56:31.706 7fffe59a4700 0 log_channel(cluster) log [INF] :
> > >> Evicting (and blacklisting) client session 111995680
> > >> (10.254.65.33:0/410851087)
> > >>
> > >> and then some time later (perhaps half an hour or so) I got this from
> > >> the client:
> > >>
> > >> [root@worker2033 ceph]# ceph daemon
> > >> /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok status
> > >> {
> > >>     "metadata": {
> > >>         "ceph_sha1": "1778d63e55dbff6cedb071ab7d367f8f52a8699f",
> > >>         "ceph_version": "ceph version 14.2.13
> > >> (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)",
> > >>         "entity_id": "cephfs2",
> > >>         "hostname": "worker2033",
> > >>         "mount_point": "/mnt/ceph",
> > >>         "pid": "7698",
> > >>         "root": "/"
> > >>     },
> > >>     "dentry_count": 252,
> > >>     "dentry_pinned_count": 9,
> > >>     "id": 111995680,
> > >>     "inst": {
> > >>         "name": {
> > >>             "type": "client",
> > >>             "num": 111995680
> > >>         },
> > >>         "addr": {
> > >>             "type": "v1",
> > >>             "addr": "10.254.65.33:0",
> > >>             "nonce": 410851087
> > >>         }
> > >>     },
> > >>     "addr": {
> > >>         "type": "v1",
> > >>         "addr": "10.254.65.33:0",
> > >>         "nonce": 410851087
> > >>     },
> > >>     "inst_str": "client.111995680 10.254.65.33:0/410851087",
> > >>     "addr_str": "10.254.65.33:0/410851087",
> > >>     "inode_count": 251,
> > >>     "mds_epoch": 3376260,
> > >>     "osd_epoch": 1717897,
> > >>     "osd_epoch_barrier": 1717893,
> > >>     "blacklisted": false
> > >> }
> > >>
> > >> The cluster was otherwise healthy - nothing wrong with MDSs, or any
> > >> placement groups, etc. I also don't see any further log messages
> > >> regarding eviction/blacklisting in the MDS logs. I didn't run any ceph
> > >> commands that would change the state of the cluster - I was just looking
> > >> around, increasing log levels.
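(For completeness, whether an address is still blacklisted, and when its entry expires, can also be read straight from the cluster; a small sketch, reusing the client address from the status output above as an example:)

    # list current blacklist entries together with their expiry times
    ceph osd blacklist ls
    # an entry can also be removed ahead of its expiry if needed
    ceph osd blacklist rm 10.254.65.33:0/410851087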
> > >>
> > >> Any ideas how that could have happened?
> > >>
> > >> A separate problem (which perhaps needs a ticket filed) is that while the
> > >> ceph-fuse client was in a blacklisted state, it was retrying in an
> > >> infinite loop to flush data to the OSDs and got rejected every time. I
> > >> have some logs for the details of this too.
> > >>
> > >> Andras

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
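For anyone wanting to try the auto-reconnect approach Dan describes near the top of the thread, the options would typically be placed in ceph.conf roughly as follows; the option names are exactly the ones from his message, while the section placement is an assumption:

    [mds]
    mds session blacklist on timeout = false
    mds session blacklist on evict = false

    [client]
    client reconnect stale = true

As Dan notes, this trades some consistency guarantees for automatic reconnection, so whether it is appropriate is a per-cluster judgment call.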