First noticed this problem in our ESXi/iSCSI cluster, but now I can replicate it in the lab with just Ubuntu:

1. Create an image with the journaling feature (and the exclusive-lock feature that journaling requires).

2. Map the image with rbd-nbd, create an XFS filesystem on it, mount it and write a large file to it:

    rbd-nbd map matte/scuttle2
    /dev/nbd0
    mkfs.xfs /dev/nbd0
    mount -t xfs /dev/nbd0 /srv/exports/sclun69
    xfs_io -c "extsize 256M" /srv/exports/sclun69

    root@lumd1:/var/log# dd if=/dev/zero of=/srv/exports/sclun69/junk bs=1M count=2800000
    2800000+0 records in
    2800000+0 records out
    2936012800000 bytes (2.9 TB, 2.7 TiB) copied, 35199.2 s, 83.4 MB/s

3. At some point, slow requests begin, and later a deep-scrub reports a read error and an inconsistent PG:

    2018-03-06 22:00:00.000175 mon.lumc1 [INF] overall HEALTH_OK
    2018-03-06 22:27:27.945814 mon.lumc1 [WRN] Health check failed: 1 slow requests are blocked > 32 sec (REQUEST_SLOW)
    2018-03-06 22:27:34.406352 mon.lumc1 [WRN] Health check update: 10 slow requests are blocked > 32 sec (REQUEST_SLOW)
    2018-03-06 22:27:38.496184 mon.lumc1 [INF] Health check cleared: REQUEST_SLOW (was: 10 slow requests are blocked > 32 sec)
    2018-03-06 22:27:38.496215 mon.lumc1 [INF] Cluster is now healthy
    2018-03-06 23:00:00.000196 mon.lumc1 [INF] overall HEALTH_OK
    2018-03-06 23:29:45.538387 osd.4 [ERR] 12.308 shard 17: soid 12:10dbc229:::rbd_data.39e1022ae8944a.00000000000cd96d:head candidate had a read error
    2018-03-06 23:29:56.937346 mon.lumc1 [ERR] Health check failed: 1 scrub errors (OSD_SCRUB_ERRORS)
    2018-03-06 23:29:56.937415 mon.lumc1 [ERR] Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
    2018-03-06 23:29:54.835693 osd.4 [ERR] 12.308 deep-scrub 0 missing, 1 inconsistent objects
    2018-03-06 23:29:54.835703 osd.4 [ERR] 12.308 deep-scrub 1 errors
    2018-03-07 00:00:00.000155 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
    2018-03-07 01:00:00.000201 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
    2018-03-07 02:00:00.000179 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
    2018-03-07 03:00:00.000235 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent

ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)

--
Alex Gorbachev
Storcium
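Step 1 of the reproduction above does not show the command used to create the image. The following is a minimal sketch of one way to do it, assuming the standard `rbd create` CLI with `--image-feature`. The helper name `create_journaled_image`, the 3T default size (chosen only so the ~2.9 TB dd file fits) and the `RBD` override variable are hypothetical and not part of the original report.

```shell
#!/bin/sh
# Sketch only: create an RBD image with exclusive-lock and journaling enabled.
# journaling depends on exclusive-lock, so both features are requested explicitly.
# Hypothetical: the helper name, the 3T default size and the RBD override
# (set RBD=echo to print the rbd arguments instead of running rbd).
create_journaled_image() {
    spec="$1"          # pool/image spec, e.g. matte/scuttle2
    size="${2:-3T}"    # hypothetical default, large enough for the dd test file
    ${RBD:-rbd} create --size "$size" \
        --image-feature exclusive-lock,journaling "$spec"
}
```

For example, `create_journaled_image matte/scuttle2` followed by `rbd info matte/scuttle2` should list both exclusive-lock and journaling among the image features before the image is mapped with rbd-nbd.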