Hi Andras,

I don't have enough experience with blacklisting to know what a safe
default is. For our clusters we use the auto-reconnect settings and
never blacklist any clients.

Cheers, Dan

On Tue, Nov 10, 2020 at 2:10 AM Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Dan,
>
> That makes sense - the time between blacklist and magic comeback was
> around 1 hour - thanks for the explanation. Is this a safe default?
> At eviction, the MDS takes all caps from the client away, so if it comes
> back in an hour, doesn't it then write to files that it perhaps
> shouldn't have access to?
>
> Then there is the other strange thing ceph-fuse was doing for an hour
> (I increased the objecter log level to 20).
>
> Here is the eviction:
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 I was
> blacklisted at osd epoch 1717894
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> remove_session_caps still has dirty|flushing caps on
> 0x100673a2613.head(faked_ino=0 ref=5 ll_ref=1
> cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.139916
> ctime=2020-11-09 14:34:28.139916 caps=- dirty_caps=Fw
> objectset[0x100673a2613 ts 0/0 objects 1 dirty_or_tx 0]
> parents=0x10067375a7c.head["pwaf-00680.ene"] 0x7fffd034b4d0)
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> remove_session_caps still has dirty|flushing caps on
> 0x100673a2614.head(faked_ino=0 ref=5 ll_ref=1
> cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.145199
> ctime=2020-11-09 14:34:28.145199 caps=- dirty_caps=Fw
> objectset[0x100673a2614 ts 0/0 objects 1 dirty_or_tx 0]
> parents=0x10067375a7c.head["pwaf-00685.ene"] 0x7fffd034bc20)
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> remove_session_caps still has dirty|flushing caps on
> 0x100673a2615.head(faked_ino=0 ref=5 ll_ref=1
> cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.150306
> ctime=2020-11-09 14:34:28.150306 caps=- dirty_caps=Fw
> objectset[0x100673a2615 ts 0/0 objects 1 dirty_or_tx 0]
> parents=0x10067375a7c.head["pwaf-00682.ene"] 0x7fffd034c1d0)
> ... and a lot more of these ...
>
> then the following types of messages repeat:
>
> 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131065 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131065 dirty
> firstbyte=-42] waiters = {}
> 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131065 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131065 dirty
> firstbyte=96] waiters = {}
> ... about 200 or so of these ...
>
> followed by
>
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4
> 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804
> 0x7fffac4d0460 (328804) v 131183 dirty firstbyte=-42] waiters = {} r =
> -108 (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4
> 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176
> 0x7fffac4d0460 (8176) v 131183 dirty firstbyte=96] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> ... about 200 or so of these ...
>
> then again:
>
> 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131183 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131183 dirty
> firstbyte=-42] waiters = {}
> 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131183 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131183 dirty
> firstbyte=96] waiters = {}
>
> rejected again:
>
> 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4
> 0x7fffac4d0460 (4) v 131349 dirty firstbyte=32] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804
> 0x7fffac4d0460 (328804) v 131349 dirty firstbyte=-42] waiters = {} r =
> -108 (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4
> 0x7fffac4d0460 (4) v 131349 dirty firstbyte=32] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176
> 0x7fffac4d0460 (8176) v 131349 dirty firstbyte=96] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
>
> repeating for the whole hour - using about 1Gbps bandwidth from just one
> node.
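For reference, the objecter/objectcacher logging mentioned above can be
raised at runtime through the client's admin socket (the same asok used for
the status dumps further down the thread). On a Nautilus ceph-fuse client
something along these lines should work - the 20/20 levels are only an
illustration, not what was necessarily used here:

    # bump client-side objecter/objectcacher debugging on the running ceph-fuse
    ceph daemon /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok \
        config set debug_objecter 20/20
    ceph daemon /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok \
        config set debug_objectcacher 20/20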
> Then the blacklist was lifted, and presumably the writes went through:
>
> 2020-11-09 16:56:36.840 7fffdb7fe700 10 objectcacher bh_write_commit
> clean bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean
> firstbyte=32] waiters = {}
> 2020-11-09 16:56:36.840 7fffdb7fe700 10
> objectcacher.object(100673a2670.00000008/head) try_merge_bh bh[
> 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean firstbyte=32]
> waiters = {}
> 2020-11-09 16:56:36.840 7fffdb7fe700 10
> objectcacher.object(100673a2670.00000008/head) merge_left bh[
> 0x7fffac4d05b0 0~2026998 0x7fffac4d0460 (2026998) v 6079 clean
> firstbyte=-122] waiters = {} + bh[ 0x7fff60a0ae40 2026998~4
> 0x7fffac4d0460 (4) v 160426 clean firstbyte=32] waiters = {}
>
> This looks unsafe, doesn't it?
>
> Andras
>
>
> On 11/9/20 7:21 PM, Dan van der Ster wrote:
> > Hi Andras,
> >
> > The osd blocklist entries expire after 1hr by default:
> >
> > Option("mon_osd_blacklist_default_expire", Option::TYPE_FLOAT,
> > Option::LEVEL_ADVANCED)
> > .set_default(1_hr)
> > .add_service("mon")
> > .set_description("Duration in seconds that blacklist entries for clients "
> > "remain in the OSD map"),
> >
> > (Check mon/OSDMonitor.cc for the implementation)
> >
> > Cheers, Dan
> >
> > On Mon, Nov 9, 2020 at 11:59 PM Andras Pataki
> > <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >> We had some network problems (high packet drops) to some cephfs client
> >> nodes that run ceph-fuse (14.2.13) against a Nautilus cluster (on
> >> version 14.2.8). As a result a couple of clients got evicted (as one
> >> would expect). What was really odd is that the clients were trying to
> >> flush data they had in cache and kept getting rejected by OSDs for
> >> almost an hour, and then magically the data flush worked. When asked
> >> afterwards, the client reported that it was no longer blacklisted. How
> >> would that happen? I certainly didn't run any commands to un-blacklist
> >> a client, and the docs say that otherwise the client will stay
> >> blacklisted until the file system gets remounted.
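For completeness, the cluster-side commands to inspect and manually remove a
blacklist entry would look roughly like this on a Nautilus cluster (the
address:port/nonce here is just this particular client's, taken from the
status output below):

    ceph osd blacklist ls                            # each entry is listed with its expiry time
    ceph osd blacklist rm 10.254.65.33:0/410851087   # remove a specific entry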
> >>
> >> Here is the status of the client when it was blacklisted:
> >> [root@worker2033 ceph]# ceph daemon
> >> /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok status
> >> {
> >>     "metadata": {
> >>         "ceph_sha1": "1778d63e55dbff6cedb071ab7d367f8f52a8699f",
> >>         "ceph_version": "ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)",
> >>         "entity_id": "cephfs2",
> >>         "hostname": "worker2033",
> >>         "mount_point": "/mnt/ceph",
> >>         "pid": "7698",
> >>         "root": "/"
> >>     },
> >>     "dentry_count": 252,
> >>     "dentry_pinned_count": 9,
> >>     "id": 111995680,
> >>     "inst": {
> >>         "name": {
> >>             "type": "client",
> >>             "num": 111995680
> >>         },
> >>         "addr": {
> >>             "type": "v1",
> >>             "addr": "10.254.65.33:0",
> >>             "nonce": 410851087
> >>         }
> >>     },
> >>     "addr": {
> >>         "type": "v1",
> >>         "addr": "10.254.65.33:0",
> >>         "nonce": 410851087
> >>     },
> >>     "inst_str": "client.111995680 10.254.65.33:0/410851087",
> >>     "addr_str": "10.254.65.33:0/410851087",
> >>     "inode_count": 251,
> >>     "mds_epoch": 3376260,
> >>     "osd_epoch": 1717896,
> >>     "osd_epoch_barrier": 1717893,
> >>     "blacklisted": true
> >> }
> >>
> >> This corresponds to server side log messages:
> >> 2020-11-09 15:56:31.578 7fffe59a4700 1 mds.0.3376160 Evicting (and
> >> blacklisting) client session 111995680 (10.254.65.33:0/410851087)
> >> 2020-11-09 15:56:31.578 7fffe59a4700 0 log_channel(cluster) log [INF] :
> >> Evicting (and blacklisting) client session 111995680
> >> (10.254.65.33:0/410851087)
> >> 2020-11-09 15:56:31.706 7fffe59a4700 1 mds.0.3376160 Evicting (and
> >> blacklisting) client session 111995680 (10.254.65.33:0/410851087)
> >> 2020-11-09 15:56:31.706 7fffe59a4700 0 log_channel(cluster) log [INF] :
> >> Evicting (and blacklisting) client session 111995680
> >> (10.254.65.33:0/410851087)
> >>
> >> and then some time later (perhaps half an hour or so) I got this from
> >> the client:
> >>
> >> [root@worker2033 ceph]# ceph daemon
> >> /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok status
> >> {
> >>     "metadata": {
> >>         "ceph_sha1": "1778d63e55dbff6cedb071ab7d367f8f52a8699f",
> >>         "ceph_version": "ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)",
> >>         "entity_id": "cephfs2",
> >>         "hostname": "worker2033",
> >>         "mount_point": "/mnt/ceph",
> >>         "pid": "7698",
> >>         "root": "/"
> >>     },
> >>     "dentry_count": 252,
> >>     "dentry_pinned_count": 9,
> >>     "id": 111995680,
> >>     "inst": {
> >>         "name": {
> >>             "type": "client",
> >>             "num": 111995680
> >>         },
> >>         "addr": {
> >>             "type": "v1",
> >>             "addr": "10.254.65.33:0",
> >>             "nonce": 410851087
> >>         }
> >>     },
> >>     "addr": {
> >>         "type": "v1",
> >>         "addr": "10.254.65.33:0",
> >>         "nonce": 410851087
> >>     },
> >>     "inst_str": "client.111995680 10.254.65.33:0/410851087",
> >>     "addr_str": "10.254.65.33:0/410851087",
> >>     "inode_count": 251,
> >>     "mds_epoch": 3376260,
> >>     "osd_epoch": 1717897,
> >>     "osd_epoch_barrier": 1717893,
> >>     "blacklisted": false
> >> }
> >>
> >> The cluster was otherwise healthy - nothing wrong with MDSs, or any
> >> placement groups, etc. I also don't see any further log messages
> >> regarding eviction/blacklisting in the MDS logs. I didn't run any ceph
> >> commands that would change the state of the cluster - I was just looking
> >> around, increasing log levels.
> >>
> >> Any ideas how that could have happened?
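The roughly one-hour gap matches the mon_osd_blacklist_default_expire
default Dan quotes above, and the expiry time of each entry also shows up
in the 'ceph osd blacklist ls' output. If a longer or shorter window is
wanted, on a Nautilus cluster the option can be inspected and changed with
something like the following - the 7200 is only an example value, in
seconds:

    ceph config get mon mon_osd_blacklist_default_expire
    ceph config set mon mon_osd_blacklist_default_expire 7200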
> >>
> >> A separate problem (which perhaps needs a ticket filed) is that while
> >> the ceph-fuse client was in a blacklisted state, it was retrying in an
> >> infinite loop to flush data to the OSDs and got rejected every time. I
> >> have some logs for the details of this too.
> >>
> >> Andras
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
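As for the "auto-reconnect settings" mentioned at the top of the thread:
they are not spelled out there, but presumably this refers to the CephFS
eviction/blacklisting options. A ceph.conf sketch for a Nautilus cluster
might look like this (an assumption on my part, not something stated in
the thread):

    [mds]
        # do not blacklist clients whose sessions time out or are evicted
        mds_session_blacklist_on_timeout = false
        mds_session_blacklist_on_evict = false

    [client]
        # let the client try to reconnect when its session goes stale
        client_reconnect_stale = true

Note that the CephFS documentation warns that disabling blacklisting this
way gives up the protection it provides against exactly the kind of delayed
writeback of dirty data discussed above.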