Journaling feature causes cluster to have slow requests and inconsistent PG

I first noticed this problem in our ESXi/iSCSI cluster, but now I can
replicate it in the lab with just Ubuntu:

1. Create an image with the journaling feature (and the exclusive-lock feature it requires).
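For reference, step 1 might look like the following. This is a sketch that assumes the pool `matte` already exists. The 4T size is a hypothetical value because the post does not state the image size. By default the script only prints the command, and it runs `rbd create` only when DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of step 1: create an RBD image with journaling enabled.
# The journaling feature requires exclusive-lock, so both are requested.
# SIZE is hypothetical; the original report does not give the image size.
POOL_IMAGE="matte/scuttle2"
SIZE="4T"
FEATURES="exclusive-lock,journaling"

# Print the rbd command that would create the image.
create_image_cmd() {
  echo "rbd create --size $SIZE --image-feature $FEATURES $POOL_IMAGE"
}

# Only print the command by default; set DRY_RUN=0 to execute it.
if [ "${DRY_RUN:-1}" = "0" ]; then
  rbd create --size "$SIZE" --image-feature "$FEATURES" "$POOL_IMAGE"
else
  create_image_cmd
fi
```

On an image that already exists and has exclusive-lock enabled, `rbd feature enable matte/scuttle2 journaling` should have the same effect. Afterwards, `rbd info matte/scuttle2` should list both exclusive-lock and journaling under its features.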

2. Map the image with rbd-nbd, make an XFS filesystem on it, mount it, and write a large file to it:

rbd-nbd map matte/scuttle2
/dev/nbd0

mkfs.xfs /dev/nbd0
mount -t xfs /dev/nbd0 /srv/exports/sclun69
xfs_io -c "extsize 256M" /srv/exports/sclun69

root@lumd1:/var/log# dd if=/dev/zero of=/srv/exports/sclun69/junk bs=1M count=2800000
2800000+0 records in
2800000+0 records out
2936012800000 bytes (2.9 TB, 2.7 TiB) copied, 35199.2 s, 83.4 MB/s

3. At some point during the write, slow requests begin, and a later deep-scrub reports an inconsistent PG:

2018-03-06 22:00:00.000175 mon.lumc1 [INF] overall HEALTH_OK
2018-03-06 22:27:27.945814 mon.lumc1 [WRN] Health check failed: 1 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-03-06 22:27:34.406352 mon.lumc1 [WRN] Health check update: 10 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-03-06 22:27:38.496184 mon.lumc1 [INF] Health check cleared: REQUEST_SLOW (was: 10 slow requests are blocked > 32 sec)
2018-03-06 22:27:38.496215 mon.lumc1 [INF] Cluster is now healthy
2018-03-06 23:00:00.000196 mon.lumc1 [INF] overall HEALTH_OK
2018-03-06 23:29:45.538387 osd.4 [ERR] 12.308 shard 17: soid 12:10dbc229:::rbd_data.39e1022ae8944a.00000000000cd96d:head candidate had a read error
2018-03-06 23:29:56.937346 mon.lumc1 [ERR] Health check failed: 1 scrub errors (OSD_SCRUB_ERRORS)
2018-03-06 23:29:56.937415 mon.lumc1 [ERR] Health check failed: Possible data damage: 1 pg inconsistent (PG_DAMAGED)
2018-03-06 23:29:54.835693 osd.4 [ERR] 12.308 deep-scrub 0 missing, 1 inconsistent objects
2018-03-06 23:29:54.835703 osd.4 [ERR] 12.308 deep-scrub 1 errors
2018-03-07 00:00:00.000155 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
2018-03-07 01:00:00.000201 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
2018-03-07 02:00:00.000179 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
2018-03-07 03:00:00.000235 mon.lumc1 [ERR] overall HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
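To pull the affected PG IDs out of a longer cluster log, a small filter like the one below may help. It is a sketch that assumes the OSD log line format shown above ("osd.N [ERR] <pgid> deep-scrub <n> errors"). The function name `inconsistent_pgs` is hypothetical, and the function only matches scrub and deep-scrub error summaries, not every kind of scrub message.

```shell
#!/bin/sh
# Sketch: list PG IDs for which a (deep-)scrub reported errors in a Ceph
# cluster log. Reads the files given as arguments, or stdin if none.
# Assumes lines like: "... osd.4 [ERR] 12.308 deep-scrub 1 errors".
inconsistent_pgs() {
  grep -E 'osd\.[0-9]+ \[ERR\] [0-9]+\.[0-9a-f]+ (deep-)?scrub [0-9]+ errors' "$@" \
    | sed -E 's/.*\[ERR\] ([0-9]+\.[0-9a-f]+) (deep-)?scrub [0-9]+ errors.*/\1/' \
    | sort -u
}
```

For the log above this should print 12.308. The cluster log path is an assumption (commonly /var/log/ceph/ceph.log on monitor hosts), so a call might look like `inconsistent_pgs /var/log/ceph/ceph.log`. For each reported PG, `rados list-inconsistent-obj 12.308 --format=json-pretty` should then show which object and shard the deep-scrub flagged.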


ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)



--
Alex Gorbachev
Storcium
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
