Hi Andras,

I don't have enough experience with blacklisting to know what a safe
default is. For our clusters we use the auto-reconnect settings and
never blacklist any clients.

Cheers, Dan

On Tue, Nov 10, 2020 at 2:10 AM Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Dan,
>
> That makes sense - the time between blacklist and magic comeback was
> around 1 hour - thanks for the explanation. Is this a safe default?
> At eviction, the MDS takes all caps from the client away, so if it comes
> back in an hour, doesn't it then write to files that it perhaps
> shouldn't have access to?
>
> Then there is the other strange thing ceph-fuse was doing for an hour
> (I increased the objecter log level to 20).
>
> Here is the eviction:
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680 I was
> blacklisted at osd epoch 1717894
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> remove_session_caps still has dirty|flushing caps on
> 0x100673a2613.head(faked_ino=0 ref=5 ll_ref=1
> cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.139916
> ctime=2020-11-09 14:34:28.139916 caps=- dirty_caps=Fw
> objectset[0x100673a2613 ts 0/0 objects 1 dirty_or_tx 0]
> parents=0x10067375a7c.head["pwaf-00680.ene"] 0x7fffd034b4d0)
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> remove_session_caps still has dirty|flushing caps on
> 0x100673a2614.head(faked_ino=0 ref=5 ll_ref=1
> cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.145199
> ctime=2020-11-09 14:34:28.145199 caps=- dirty_caps=Fw
> objectset[0x100673a2614 ts 0/0 objects 1 dirty_or_tx 0]
> parents=0x10067375a7c.head["pwaf-00685.ene"] 0x7fffd034bc20)
> 2020-11-09 15:56:32.762 7fffda7fc700 -1 client.111995680
> remove_session_caps still has dirty|flushing caps on
> 0x100673a2615.head(faked_ino=0 ref=5 ll_ref=1
> cap_refs={4=0,1024=0,4096=0,8192=0} open={3=1} mode=100640
> size=106/4194304 nlink=1 btime=0.000000 mtime=2020-11-09 14:34:28.150306
> ctime=2020-11-09 14:34:28.150306 caps=- dirty_caps=Fw
> objectset[0x100673a2615 ts 0/0 objects 1 dirty_or_tx 0]
> parents=0x10067375a7c.head["pwaf-00682.ene"] 0x7fffd034c1d0)
> ... and a lot more of these ...
>
> then the following types of messages repeat:
>
> 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131065 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131065 dirty
> firstbyte=-42] waiters = {}
> 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131065 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:10.236 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131065 dirty
> firstbyte=96] waiters = {}
> ... about 200 or so of these ...
>
> followed by
>
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4
> 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804
> 0x7fffac4d0460 (328804) v 131183 dirty firstbyte=-42] waiters = {} r =
> -108 (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4
> 0x7fffac4d0460 (4) v 131183 dirty firstbyte=32] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:10.896 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176
> 0x7fffac4d0460 (8176) v 131183 dirty firstbyte=96] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> ... about 200 or so of these ...
>
> then again:
>
> 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 131183 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7ffe6c405f80 2051562~328804 0x7fffac4d0460 (328804) v 131183 dirty
> firstbyte=-42] waiters = {}
> 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7fff9b14d850 2380366~4 0x7fffac4d0460 (4) v 131183 dirty
> firstbyte=32] waiters = {}
> 2020-11-09 16:51:11.260 7fffdaffd700 7 objectcacher bh_write_scattered
> bh[ 0x7fff9bc966f0 2380370~8176 0x7fffac4d0460 (8176) v 131183 dirty
> firstbyte=96] waiters = {}
>
> rejected again:
>
> 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff60a0ae40 2026998~4
> 0x7fffac4d0460 (4) v 131349 dirty firstbyte=32] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7ffe6c405f80 2051562~328804
> 0x7fffac4d0460 (328804) v 131349 dirty firstbyte=-42] waiters = {} r =
> -108 (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff9b14d850 2380366~4
> 0x7fffac4d0460 (4) v 131349 dirty firstbyte=32] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
> 2020-11-09 16:51:11.772 7fffdb7fe700 10 objectcacher bh_write_commit
> marking dirty again due to error bh[ 0x7fff9bc966f0 2380370~8176
> 0x7fffac4d0460 (8176) v 131349 dirty firstbyte=96] waiters = {} r = -108
> (108) Cannot send after transport endpoint shutdown
>
> repeating for the whole hour - using about 1Gbps bandwidth from just one
> node.
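For reference, the objecter/objectcacher logging mentioned above can be
raised at runtime through the client's admin socket (the same asok used for
the status dumps further down the thread). On a Nautilus ceph-fuse client
something along these lines should work - the 20/20 levels are only an
illustration, not what was necessarily used here:

    # bump client-side objecter/objectcacher debugging on the running ceph-fuse
    ceph daemon /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok \
        config set debug_objecter 20/20
    ceph daemon /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok \
        config set debug_objectcacher 20/20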
> Then the blacklist was lifted, and presumably the writes went through:
>
> 2020-11-09 16:56:36.840 7fffdb7fe700 10 objectcacher bh_write_commit
> clean bh[ 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean
> firstbyte=32] waiters = {}
> 2020-11-09 16:56:36.840 7fffdb7fe700 10
> objectcacher.object(100673a2670.00000008/head) try_merge_bh bh[
> 0x7fff60a0ae40 2026998~4 0x7fffac4d0460 (4) v 160426 clean firstbyte=32]
> waiters = {}
> 2020-11-09 16:56:36.840 7fffdb7fe700 10
> objectcacher.object(100673a2670.00000008/head) merge_left bh[
> 0x7fffac4d05b0 0~2026998 0x7fffac4d0460 (2026998) v 6079 clean
> firstbyte=-122] waiters = {} + bh[ 0x7fff60a0ae40 2026998~4
> 0x7fffac4d0460 (4) v 160426 clean firstbyte=32] waiters = {}
>
> This looks unsafe, doesn't it?
>
> Andras
>
>
> On 11/9/20 7:21 PM, Dan van der Ster wrote:
> > Hi Andras,
> >
> > The osd blocklist entries expire after 1hr by default:
> >
> > Option("mon_osd_blacklist_default_expire", Option::TYPE_FLOAT,
> > Option::LEVEL_ADVANCED)
> > .set_default(1_hr)
> > .add_service("mon")
> > .set_description("Duration in seconds that blacklist entries for clients "
> > "remain in the OSD map"),
> >
> > (Check mon/OSDMonitor.cc for the implementation)
> >
> > Cheers, Dan
> >
> > On Mon, Nov 9, 2020 at 11:59 PM Andras Pataki
> > <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >> We had some network problems (high packet drops) to some cephfs client
> >> nodes that run ceph-fuse (14.2.13) against a Nautilus cluster (on
> >> version 14.2.8). As a result a couple of clients got evicted (as one
> >> would expect). What was really odd is that the clients were trying to
> >> flush data they had in cache and kept getting rejected by OSDs for
> >> almost an hour, and then magically the data flush worked. When asked
> >> afterwards, the client reported that it was no longer blacklisted. How
> >> would that happen? I certainly didn't run any commands to un-blacklist
> >> a client, and the docs say that otherwise the client will stay
> >> blacklisted until the file system gets remounted.
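For completeness, the cluster-side commands to inspect and manually remove a
blacklist entry would look roughly like this on a Nautilus cluster (the
address:port/nonce here is just this particular client's, taken from the
status output below):

    ceph osd blacklist ls                            # each entry is listed with its expiry time
    ceph osd blacklist rm 10.254.65.33:0/410851087   # remove a specific entry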
> >>
> >> Here is the status of the client when it was blacklisted:
> >> [root@worker2033 ceph]# ceph daemon
> >> /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok status
> >> {
> >>     "metadata": {
> >>         "ceph_sha1": "1778d63e55dbff6cedb071ab7d367f8f52a8699f",
> >>         "ceph_version": "ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)",
> >>         "entity_id": "cephfs2",
> >>         "hostname": "worker2033",
> >>         "mount_point": "/mnt/ceph",
> >>         "pid": "7698",
> >>         "root": "/"
> >>     },
> >>     "dentry_count": 252,
> >>     "dentry_pinned_count": 9,
> >>     "id": 111995680,
> >>     "inst": {
> >>         "name": {
> >>             "type": "client",
> >>             "num": 111995680
> >>         },
> >>         "addr": {
> >>             "type": "v1",
> >>             "addr": "10.254.65.33:0",
> >>             "nonce": 410851087
> >>         }
> >>     },
> >>     "addr": {
> >>         "type": "v1",
> >>         "addr": "10.254.65.33:0",
> >>         "nonce": 410851087
> >>     },
> >>     "inst_str": "client.111995680 10.254.65.33:0/410851087",
> >>     "addr_str": "10.254.65.33:0/410851087",
> >>     "inode_count": 251,
> >>     "mds_epoch": 3376260,
> >>     "osd_epoch": 1717896,
> >>     "osd_epoch_barrier": 1717893,
> >>     "blacklisted": true
> >> }
> >>
> >> This corresponds to server side log messages:
> >> 2020-11-09 15:56:31.578 7fffe59a4700 1 mds.0.3376160 Evicting (and
> >> blacklisting) client session 111995680 (10.254.65.33:0/410851087)
> >> 2020-11-09 15:56:31.578 7fffe59a4700 0 log_channel(cluster) log [INF] :
> >> Evicting (and blacklisting) client session 111995680
> >> (10.254.65.33:0/410851087)
> >> 2020-11-09 15:56:31.706 7fffe59a4700 1 mds.0.3376160 Evicting (and
> >> blacklisting) client session 111995680 (10.254.65.33:0/410851087)
> >> 2020-11-09 15:56:31.706 7fffe59a4700 0 log_channel(cluster) log [INF] :
> >> Evicting (and blacklisting) client session 111995680
> >> (10.254.65.33:0/410851087)
> >>
> >> and then some time later (perhaps half an hour or so) I got this from
> >> the client:
> >>
> >> [root@worker2033 ceph]# ceph daemon
> >> /var/run/ceph/ceph-client.cephfs2.7698.93825141588944.asok status
> >> {
> >>     "metadata": {
> >>         "ceph_sha1": "1778d63e55dbff6cedb071ab7d367f8f52a8699f",
> >>         "ceph_version": "ceph version 14.2.13 (1778d63e55dbff6cedb071ab7d367f8f52a8699f) nautilus (stable)",
> >>         "entity_id": "cephfs2",
> >>         "hostname": "worker2033",
> >>         "mount_point": "/mnt/ceph",
> >>         "pid": "7698",
> >>         "root": "/"
> >>     },
> >>     "dentry_count": 252,
> >>     "dentry_pinned_count": 9,
> >>     "id": 111995680,
> >>     "inst": {
> >>         "name": {
> >>             "type": "client",
> >>             "num": 111995680
> >>         },
> >>         "addr": {
> >>             "type": "v1",
> >>             "addr": "10.254.65.33:0",
> >>             "nonce": 410851087
> >>         }
> >>     },
> >>     "addr": {
> >>         "type": "v1",
> >>         "addr": "10.254.65.33:0",
> >>         "nonce": 410851087
> >>     },
> >>     "inst_str": "client.111995680 10.254.65.33:0/410851087",
> >>     "addr_str": "10.254.65.33:0/410851087",
> >>     "inode_count": 251,
> >>     "mds_epoch": 3376260,
> >>     "osd_epoch": 1717897,
> >>     "osd_epoch_barrier": 1717893,
> >>     "blacklisted": false
> >> }
> >>
> >> The cluster was otherwise healthy - nothing wrong with MDSs, or any
> >> placement groups, etc. I also don't see any further log messages
> >> regarding eviction/blacklisting in the MDS logs. I didn't run any ceph
> >> commands that would change the state of the cluster - I was just looking
> >> around, increasing log levels.
> >>
> >> Any ideas how that could have happened?
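The roughly one-hour gap matches the mon_osd_blacklist_default_expire
default Dan quotes above, and the expiry time of each entry also shows up
in the 'ceph osd blacklist ls' output. If a longer or shorter window is
wanted, on a Nautilus cluster the option can be inspected and changed with
something like the following - the 7200 is only an example value, in
seconds:

    ceph config get mon mon_osd_blacklist_default_expire
    ceph config set mon mon_osd_blacklist_default_expire 7200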
> >>
> >> A separate problem (which perhaps needs a ticket filed) is that while
> >> the ceph-fuse client was in a blacklisted state, it was retrying in an
> >> infinite loop to flush data to the OSDs and got rejected every time. I
> >> have some logs for the details of this too.
> >>
> >> Andras
> >> _______________________________________________
> >> ceph-users mailing list -- ceph-users@xxxxxxx
> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
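As for the "auto-reconnect settings" mentioned at the top of the thread:
they are not spelled out there, but presumably this refers to the CephFS
eviction/blacklisting options. A ceph.conf sketch for a Nautilus cluster
might look like this (an assumption on my part, not something stated in
the thread):

    [mds]
        # do not blacklist clients whose sessions time out or are evicted
        mds_session_blacklist_on_timeout = false
        mds_session_blacklist_on_evict = false

    [client]
        # let the client try to reconnect when its session goes stale
        client_reconnect_stale = true

Note that the CephFS documentation warns that disabling blacklisting this
way gives up the protection it provides against exactly the kind of delayed
writeback of dirty data discussed above.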