Super, thanks! Yeah, I read that an unclean reconnect might lead to data loss and a proper mount/unmount is better. So far, any evicted client was rebooting, so the reconnect works fine for us with blacklisting. Good to know the alternative though.

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
Sent: 10 November 2020 11:04:07
To: Frank Schilder
Cc: Andras Pataki; ceph-users
Subject: Re: Re: cephfs - blacklisted client coming back?

On Tue, Nov 10, 2020 at 10:59 AM Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Dan.
>
> > For our clusters we use the auto-reconnect settings
>
> Could you give me a hint what settings these are? Are they available in mimic?

Yes. On the mds you need:

mds session blacklist on timeout = false
mds session blacklist on evict = false

And on the fuse client you need:

client reconnect stale = true

And kernel clients reconnect by default. (There might be some consistency sacrificed by this config, but tbh we never had an issue in a few years).

Cheers, Dan

>
> Thanks!
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Dan van der Ster <dan@xxxxxxxxxxxxxx>
> Sent: 10 November 2020 10:47:11
> To: Andras Pataki
> Cc: ceph-users
> Subject: Re: cephfs - blacklisted client coming back?
>
> Hi Andras,
>
> I don't have much experience with blacklisting to know what is a safe default.
> For our clusters we use the auto-reconnect settings and never
> blacklist any clients.
>
> Cheers, Dan
>
> On Tue, Nov 10, 2020 at 2:10 AM Andras Pataki
> <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hi Dan,
> >
> > That makes sense - the time between blacklist and magic comeback was
> > around 1 hour - thanks for the explanation. Is this a safe default?
> > At eviction, the MDS takes all caps away from the client, so if it comes
> > back in an hour, doesn't it then write to files that it perhaps
> > shouldn't have access to?
> >
> > There is also another strange thing ceph-fuse was doing for an hour
> > (I had increased the objecter log level to 20).
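(As a quick sketch, raising the objecter log level on a running ceph-fuse client, as described above, can be done through its admin socket; the socket path below is the one that appears in the status output quoted further down, so treat it as an example rather than the exact command used:)

    # bump objecter debugging to 20 on the live ceph-fuse process
    ceph daemon /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok config set debug_objecter 20
    # and drop it back down once the capture is done
    ceph daemon /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok config set debug_objecter 0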
> >
> > Here is the eviction:
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 I was
> > blacklisted at osd epoch 1717894
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> > remove_session_caps still has dirty|flushing caps on
> > 0x100673a2613.head(faked_ino=0 ref=5 ll_ref=1
> > cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> > size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.139916
> > ctime=2020-11-09 14:34:28.139916 caps=- dirty_caps=Fw
> > objectset[0x100673a2613 ts 0/0 objects 1 dirty_or_tx 0]
> > parents=0x10067375a7c.head["pwaf-00680.ene"] 0x7fffd034b4d0)
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> > remove_session_caps still has dirty|flushing caps on
> > 0x100673a2614.head(faked_ino=0 ref=5 ll_ref=1
> > cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> > size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.145199
> > ctime=2020-11-09 14:34:28.145199 caps=- dirty_caps=Fw
> > objectset[0x100673a2614 ts 0/0 objects 1 dirty_or_tx 0]
> > parents=0x10067375a7c.head["pwaf-00685.ene"] 0x7fffd034bc20)
> > 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> > remove_session_caps still has dirty|flushing caps on
> > 0x100673a2615.head(faked_ino=0 ref=5 ll_ref=1
> > cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> > size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.150306
> > ctime=2020-11-09 14:34:28.150306 caps=- dirty_caps=Fw
> > objectset[0x100673a2615 ts 0/0 objects 1 dirty_or_tx 0]
> > parents=0x10067375a7c.head["pwaf-00682.ene"] 0x7fffd034c1d0)
> > ... and a lot more of these ...
> >
> > then the following types of messages repeat:
> >
> > 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131065 dirty
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131065 dirty
> > firstbyte=-42] waiters = {}
> > 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131065 dirty
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131065 dirty
> > firstbyte=96] waiters = {}
> > ... about 200 or so of these ...
> >
> > followed by
> >
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4
> > 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804
> > 0x7fffac4d0460 (328804) v 131183 dirty firstbyte=-42] waiters = {} r =
> > -108 (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4
> > 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176
> > 0x7fffac4d0460 (8176) v 131183 dirty firstbyte=96] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> > ... about 200 or so of these ...
> >
> > then again:
> >
> > 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131183 dirty
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131183 dirty
> > firstbyte=-42] waiters = {}
> > 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131183 dirty
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> > bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131183 dirty
> > firstbyte=96] waiters = {}
> >
> > rejected again:
> >
> > 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4
> > 0x7fffac4d0460 (4) v 131349 dirty firstbyte=32] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804
> > 0x7fffac4d0460 (328804) v 131349 dirty firstbyte=-42] waiters = {} r =
> > -108 (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4
> > 0x7fffac4d0460 (4) v 131349 dirty firstbyte=32] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> > 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> > marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176
> > 0x7fffac4d0460 (8176) v 131349 dirty firstbyte=96] waiters = {} r = -108
> > (108) Cannot send after transport endpoint shutdown
> >
> > repeating for the whole hour - using about 1Gbps bandwidth from just one
> > node.
> > Then the blacklist was lifted, and presumably the writes went
> > through:
> >
> > 2020-11-09 16:56:36.840 7fffdb7fe700 10 objectcacher bh_write_commit
> > clean bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean
> > firstbyte=32] waiters = {}
> > 2020-11-09 16:56:36.840 7fffdb7fe700 10
> > objectcacher.object(100673a2670.00000008/head) try_merge_bh bh[
> > 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean firstbyte=32]
> > waiters = {}
> > 2020-11-09 16:56:36.840 7fffdb7fe700 10
> > objectcacher.object(100673a2670.00000008/head) merge_left bh[
> > 0x7fffac4d05b0 0~2026998 0x7fffac4d0460 (2026998) v 6079 clean
> > firstbyte=-122] waiters = {} + bh[ 0x7fff60a0ae40 2026998~4
> > 0x7fffac4d0460 (4) v 160426 clean firstbyte=32] waiters = {}
> >
> > This looks like it is unsafe?
> >
> > Andras
> >
> >
> > On 11/9/20 7:21 PM, Dan van der Ster wrote:
> > > Hi Andras,
> > >
> > > The osd blocklist entries expire after 1hr by default:
> > >
> > > Option("mon_osd_blacklist_default_expire", Option::TYPE_FLOAT,
> > > Option::LEVEL_ADVANCED)
> > > .set_default(1_hr)
> > > .add_service("mon")
> > > .set_description("Duration in seconds that blacklist entries for clients "
> > > "remain in the OSD map"),
> > >
> > > (Check mon/OSDMonitor.cc for the implementation)
> > >
> > > Cheers, Dan
> > >
> > > On Mon, Nov 9, 2020 at 11:59 PM Andras Pataki
> > > <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > >> We had some network problems (high packet drops) to some cephfs client
> > >> nodes that run ceph-fuse (14.2.13) against a Nautilus cluster (on
> > >> version 14.2.8). As a result, a couple of clients got evicted (as one
> > >> would expect). What was really odd is that the clients were trying to
> > >> flush data they had in cache and kept getting rejected by OSDs for
> > >> almost an hour, and then magically the data flush worked. When asked
> > >> afterwards, the client reported that it was no longer blacklisted. How
> > >> would that happen? I certainly didn't run any commands to un-blacklist
> > >> a client, and the docs say that otherwise the client will stay
> > >> blacklisted until the file system gets remounted.
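(The 1-hour expiry quoted above can be checked and, if desired, raised; a minimal sketch, where mon.<id> is a placeholder for one of your monitors, the 7200-second value is only an example, and whether a change applies without a mon restart may depend on the release:)

    # what the monitor is currently using (default 3600 seconds, i.e. 1 hour)
    ceph daemon mon.<id> config get mon_osd_blacklist_default_expire
    # raise it via the centralized config, e.g. to two hours
    ceph config set mon mon_osd_blacklist_default_expire 7200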
> > >>
> > >> Here is the status of the client when it was blacklisted:
> > >> [root@worker2033 ceph]# ceph daemon
> > >> /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok status
> > >> {
> > >>     "metadata": {
> > >>         "ceph_sha1": "1778d63e55dbff6cedb071ab7d367f8f52a8699f",
> > >>         "ceph_version": "ceph version 14.2.13
> > >> (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)",
> > >>         "entity_id": "cephfs2",
> > >>         "hostname": "worker2033",
> > >>         "mount_point": "/mnt/ceph",
> > >>         "pid": "7698",
> > >>         "root": "/"
> > >>     },
> > >>     "dentry_count": 252,
> > >>     "dentry_pinned_count": 9,
> > >>     "id": 111995680,
> > >>     "inst": {
> > >>         "name": {
> > >>             "type": "client",
> > >>             "num": 111995680
> > >>         },
> > >>         "addr": {
> > >>             "type": "v1",
> > >>             "addr": "10.254.65.33:0",
> > >>             "nonce": 410851087
> > >>         }
> > >>     },
> > >>     "addr": {
> > >>         "type": "v1",
> > >>         "addr": "10.254.65.33:0",
> > >>         "nonce": 410851087
> > >>     },
> > >>     "inst_str": "client.111995680 10.254.65.33:0/410851087",
> > >>     "addr_str": "10.254.65.33:0/410851087",
> > >>     "inode_count": 251,
> > >>     "mds_epoch": 3376260,
> > >>     "osd_epoch": 1717896,
> > >>     "osd_epoch_barrier": 1717893,
> > >>     "blacklisted": true
> > >> }
> > >>
> > >> This corresponds to server side log messages:
> > >> 2020-11-09 15:56:31.578 7fffe59a4700 1 mds.0.3376160 Evicting (and
> > >> blacklisting) client session 111995680 (10.254.65.33:0/410851087)
> > >> 2020-11-09 15:56:31.578 7fffe59a4700 0 log_channel(cluster) log [INF] :
> > >> Evicting (and blacklisting) client session 111995680
> > >> (10.254.65.33:0/410851087)
> > >> 2020-11-09 15:56:31.706 7fffe59a4700 1 mds.0.3376160 Evicting (and
> > >> blacklisting) client session 111995680 (10.254.65.33:0/410851087)
> > >> 2020-11-09 15:56:31.706 7fffe59a4700 0 log_channel(cluster) log [INF] :
> > >> Evicting (and blacklisting) client session 111995680
> > >> (10.254.65.33:0/410851087)
> > >>
> > >> and then some time later (perhaps half an hour or so) I got this from
> > >> the client:
> > >>
> > >> [root@worker2033 ceph]# ceph daemon
> > >> /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok status
> > >> {
> > >>     "metadata": {
> > >>         "ceph_sha1": "1778d63e55dbff6cedb071ab7d367f8f52a8699f",
> > >>         "ceph_version": "ceph version 14.2.13
> > >> (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)",
> > >>         "entity_id": "cephfs2",
> > >>         "hostname": "worker2033",
> > >>         "mount_point": "/mnt/ceph",
> > >>         "pid": "7698",
> > >>         "root": "/"
> > >>     },
> > >>     "dentry_count": 252,
> > >>     "dentry_pinned_count": 9,
> > >>     "id": 111995680,
> > >>     "inst": {
> > >>         "name": {
> > >>             "type": "client",
> > >>             "num": 111995680
> > >>         },
> > >>         "addr": {
> > >>             "type": "v1",
> > >>             "addr": "10.254.65.33:0",
> > >>             "nonce": 410851087
> > >>         }
> > >>     },
> > >>     "addr": {
> > >>         "type": "v1",
> > >>         "addr": "10.254.65.33:0",
> > >>         "nonce": 410851087
> > >>     },
> > >>     "inst_str": "client.111995680 10.254.65.33:0/410851087",
> > >>     "addr_str": "10.254.65.33:0/410851087",
> > >>     "inode_count": 251,
> > >>     "mds_epoch": 3376260,
> > >>     "osd_epoch": 1717897,
> > >>     "osd_epoch_barrier": 1717893,
> > >>     "blacklisted": false
> > >> }
> > >>
> > >> The cluster was otherwise healthy - nothing wrong with MDSs, or any
> > >> placement groups, etc. I also don't see any further log messages
> > >> regarding eviction/blacklisting in the MDS logs. I didn't run any ceph
> > >> commands that would change the state of the cluster - I was just looking
> > >> around, increasing log levels.
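(For completeness, whether an address is still blacklisted, and when its entry expires, can also be read straight from the cluster; a small sketch, reusing the client address from the status output above as an example:)

    # list current blacklist entries together with their expiry times
    ceph osd blacklist ls
    # an entry can also be removed ahead of its expiry if needed
    ceph osd blacklist rm 10.254.65.33:0/410851087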
> > >>
> > >> Any ideas how that could have happened?
> > >>
> > >> A separate problem (which perhaps needs a ticket filed) is that while the
> > >> ceph-fuse client was in a blacklisted state, it was retrying in an
> > >> infinite loop to flush data to the OSDs and got rejected every time. I
> > >> have some logs for the details of this too.
> > >>
> > >> Andras

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
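For anyone wanting to try the auto-reconnect approach Dan describes near the top of the thread, the options would typically be placed in ceph.conf roughly as follows; the option names are exactly the ones from his message, while the section placement is an assumption:

    [mds]
    mds session blacklist on timeout = false
    mds session blacklist on evict = false

    [client]
    client reconnect stale = true

As Dan notes, this trades some consistency guarantees for automatic reconnection, so whether it is appropriate is a per-cluster judgment call.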