The current setup is only for testing functionality with Ceph. My idea is to install a production setup on suitable hardware if all goes well...

The MDS is running on a node with 4 GB RAM, 1 GbE and a 4-core processor. The metadata pool is on 3 OSD servers with 2 OSDs per node: 24 GB RAM, 12 cores, 1 GbE. The monitor runs on the same node as the OSD server. I know this is poor hardware, but I only have one client writing and one client listing.

[cephuser@storage1demo ~]$ ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; 1/230761 objects misplaced (0.000%)
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdsstor1demo(mds.0): 100+ slow metadata IOs are blocked > 30 secs, oldest blocked for 895 secs
OBJECT_MISPLACED 1/230761 objects misplaced (0.000%)

[cephuser@storage1demo ~]$ ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE VAR  PGS
 0   hdd 1.81940  0.90002 1.8 TiB 183 GiB 1.6 TiB 9.83 1.07 147
 1   hdd 0.45479  1.00000 466 GiB  43 GiB 423 GiB 9.16 1.00  45
 2   hdd 1.81940  1.00000 1.8 TiB 182 GiB 1.6 TiB 9.80 1.07 109
 3   hdd 1.81940  1.00000 1.8 TiB 143 GiB 1.7 TiB 7.67 0.83 112
 4   hdd 1.81940  1.00000 1.8 TiB 182 GiB 1.6 TiB 9.78 1.06 118
 5   hdd 1.81940  1.00000 1.8 TiB 165 GiB 1.7 TiB 8.87 0.96 109
                   TOTAL  9.6 TiB 899 GiB 8.7 TiB 9.19
MIN/MAX VAR: 0.83/1.07  STDDEV: 0.77

[cephuser@storage1demo ~]$ ceph osd status
+----+--------------+-------+-------+--------+---------+--------+---------+-----------+
| id |     host     |  used | avail | wr ops | wr data | rd ops | rd data |   state   |
+----+--------------+-------+-------+--------+---------+--------+---------+-----------+
| 0  | storage1demo |  183G | 1679G |    8   |  20.0M  |    0   |    0    | exists,up |
| 1  | storage1demo | 42.6G |  423G |    4   |  9830k  |    0   |    0    | exists,up |
| 2  | storage2demo |  182G | 1680G |   15   |  24.8M  |    0   |    0    | exists,up |
| 3  | storage2demo |  142G | 1720G |    8   |  9833k  |    0   |    0    | exists,up |
| 4  | storage3demo |  182G | 1680G |   13   |  23.2M  |    0   |    0    | exists,up |
| 5  | storage3demo |  165G | 1697G |    7   |  13.6M  |    0   |    0    | exists,up |
+----+--------------+-------+-------+--------+---------+--------+---------+-----------+

[cephuser@storage1demo ~]$ rados df
POOL_NAME       USED    OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD      WR_OPS  WR
cephfs_data     446 GiB  115328      0 230656                  0       0        0  42754 111 GiB 2527943 5.0 TiB
cephfs_metadata  50 MiB      35      0    105                  0       0        0    166  38 MiB   24857  62 MiB

total_objects    115363
total_used       899 GiB
total_avail      8.7 TiB
total_space      9.6 TiB

While I'm running "ls", the MDS daemon socket shows events of the type:

    "event": "failed to xlock, waiting"
    "event": "failed to rdlock, waiting"
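For reference, these are the admin-socket queries I can run from the MDS host to dig into this (the daemon name stor1demo is taken from the MDS log, so it may need adjusting on another setup); as far as I can tell, dump_ops_in_flight is where those xlock/rdlock events show up:

    # show the blocked requests and their event history ("failed to rdlock, waiting", etc.)
    ceph daemon mds.stor1demo dump_ops_in_flight

    # list client sessions, to map client ids from the warnings back to hostnames/mounts
    ceph daemon mds.stor1demo session ls

    # show outstanding RADOS operations from the MDS against the metadata pool
    ceph daemon mds.stor1demo objecter_requests

session ls should let me map client.4375 from the warnings below back to a hostname, and objecter_requests should show whether the "slow metadata IOs" are actually stuck on the metadata-pool OSDs or whether the MDS is waiting on client capabilities.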
MDS logs:

2019-09-12 10:35:33.582 7fbaebfaf700  1 mds.stor1demo Updating MDS map to version 941 from mon.0
2019-09-12 10:35:36.012 7fbae9521700  0 log_channel(cluster) log [WRN] : 1 slow requests, 0 included below; oldest blocked for > 51.253125 secs
2019-09-12 10:35:37.333 7fbaebfaf700  1 mds.stor1demo Updating MDS map to version 942 from mon.0
2019-09-12 10:35:41.012 7fbae9521700  0 log_channel(cluster) log [WRN] : 1 slow requests, 0 included below; oldest blocked for > 56.253176 secs
2019-09-12 10:35:41.332 7fbaebfaf700  1 mds.stor1demo Updating MDS map to version 943 from mon.0
2019-09-12 10:35:46.012 7fbae9521700  0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 61.253205 secs
2019-09-12 10:35:46.012 7fbae9521700  0 log_channel(cluster) log [WRN] : slow request 61.253204 seconds old, received at 2019-09-12 10:34:44.760404: client_request(client.4394:5797 getattr pAsLsXsFs #0x10000000f6e 2019-09-12 10:34:44.759327 caller_uid=1000, caller_gid=1000{}) currently failed to rdlock, waiting
2019-09-12 10:35:46.013 7fbae9521700  0 log_channel(cluster) log [WRN] : client.4375 isn't responding to mclientcaps(revoke), ino 0x10000000f6e pending pAsLsXsFsc issued pAsLsXsFscb, sent 61.253848 seconds ago
2019-09-12 10:35:49.333 7fbaebfaf700  1 mds.stor1demo Updating MDS map to version 944 from mon.0
2019-09-12 10:35:51.013 7fbae9521700  0 log_channel(cluster) log [WRN] : 1 slow requests, 0 included below; oldest blocked for > 66.253237 secs

If I run "ls" from the same node that is writing, it works fine. Does this look like an MDS locking problem?

Thanks.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx