slow mds requests with random read test

Hi,

We are running a couple of performance tests on CephFS using fio. fio runs
in a k8s pod, and 3 pods run at the same time, all mounting the same PVC
backed by a CephFS volume. Here is the command line for the random read test:
fio -direct=1 -iodepth=128 -rw=randread -ioengine=libaio -bs=4k -size=1G
-numjobs=5 -runtime=500 -group_reporting -directory=/tmp/cache
-name=Rand_Read_Testing_$BUILD_TIMESTAMP
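
To put the load in numbers, here is a rough back-of-the-envelope sketch in Python. It assumes all 3 pods run the same command at the same time and that each fio job works on its own 1 GiB file (fio's behaviour when only a directory is given); the function and variable names are made up for illustration.

```python
# Rough sizing of the random read workload described above.
# Assumptions: 3 pods run concurrently, every job has its own 1 GiB file.
PODS = 3                      # pods mounting the same PVC
NUMJOBS = 5                   # -numjobs=5
IODEPTH = 128                 # -iodepth=128 (max in-flight I/Os per job)
BLOCK_SIZE = 4 * 1024         # -bs=4k
SIZE_PER_JOB = 1 * 1024 ** 3  # -size=1G


def workload_summary(pods=PODS, numjobs=NUMJOBS, iodepth=IODEPTH,
                     block_size=BLOCK_SIZE, size_per_job=SIZE_PER_JOB):
    """Return upper bounds on concurrency and data volume for the test."""
    jobs = pods * numjobs
    inflight = jobs * iodepth
    return {
        "jobs": jobs,
        "max_inflight_reads": inflight,
        "max_inflight_bytes": inflight * block_size,
        "total_file_bytes": jobs * size_per_job,
    }


summary = workload_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```

These are upper bounds only; the queue depth actually reached depends on the client and the cluster.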
The random read test runs very slowly. Here is the cluster log from the
dashboard (newest entries first):

5/30/23 8:13:16 PM [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests)
5/30/23 8:13:16 PM [INF] Health check cleared: MDS_SLOW_METADATA_IO (was: 1 MDSs report slow metadata IOs)
5/30/23 8:13:16 PM [INF] MDS health message cleared (mds.?): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 33 secs
5/30/23 8:13:16 PM [INF] MDS health message cleared (mds.?): 1 slow requests are blocked > 30 secs
5/30/23 8:13:14 PM [WRN] Health check update: 2 MDSs report slow requests (MDS_SLOW_REQUEST)
5/30/23 8:13:13 PM [INF] MDS health message cleared (mds.?): 1 slow requests are blocked > 30 secs
5/30/23 8:13:08 PM [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
5/30/23 8:13:08 PM [WRN] Health check failed: 1 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
5/30/23 8:13:08 PM [WRN] slow request 34.213327 seconds old, received at 2023-05-30T12:12:33.951399+0000: client_request(client.270564:1406144 getattr pAsLsXsFs #0x700000103d0 2023-05-30T12:12:33.947323+0000 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
5/30/23 8:13:08 PM [WRN] 1 slow requests, 1 included below; oldest blocked for > 34.213328 secs
5/30/23 8:13:07 PM [WRN] slow request 33.169703 seconds old, received at 2023-05-30T12:12:33.952078+0000: peer_request:client.270564:1406144 currently dispatched
5/30/23 8:13:07 PM [WRN] 1 slow requests, 1 included below; oldest blocked for > 33.169704 secs
5/30/23 8:13:04 PM [INF] Cluster is now healthy
5/30/23 8:13:04 PM [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests)
5/30/23 8:13:04 PM [INF] Health check cleared: MDS_SLOW_METADATA_IO (was: 1 MDSs report slow metadata IOs)
5/30/23 8:13:04 PM [INF] MDS health message cleared (mds.?): 9 slow metadata IOs are blocked > 30 secs, oldest blocked for 45 secs
5/30/23 8:13:04 PM [INF] MDS health message cleared (mds.?): 2 slow requests are blocked > 30 secs
5/30/23 8:12:57 PM [WRN] 2 slow requests, 0 included below; oldest blocked for > 44.954377 secs
5/30/23 8:12:52 PM [WRN] 2 slow requests, 0 included below; oldest blocked for > 39.954313 secs
5/30/23 8:12:48 PM [WRN] Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
5/30/23 8:12:47 PM [WRN] slow request 34.935921 seconds old, received at 2023-05-30T12:12:12.185614+0000: client_request(client.270564:1406139 create #0x7000001045b/atomic7966567911433736706tmp 2023-05-30T12:12:12.182999+0000 caller_uid=0, caller_gid=0{}) currently submit entry: journal_and_reply
5/30/23 8:12:47 PM [WRN] slow request 34.954254 seconds old, received at 2023-05-30T12:12:12.167281+0000: client_request(client.270564:1406138 rename #0x70000010457/build.xml #0x70000010457/atomic6590865221269854506tmp 2023-05-30T12:12:12.162999+0000 caller_uid=0, caller_gid=0{}) currently submit entry: journal_and_reply
5/30/23 8:12:47 PM [WRN] 2 slow requests, 2 included below; oldest blocked for > 34.954254 secs
5/30/23 8:12:44 PM [WRN] Health check failed: 1 MDSs report slow metadata IOs (MDS_SLOW_METADATA_IO)
5/30/23 8:12:41 PM [INF] Cluster is now healthy
5/30/23 8:12:41 PM [INF] Health check cleared: MDS_SLOW_REQUEST (was: 1 MDSs report slow requests)
5/30/23 8:12:41 PM [INF] MDS health message cleared (mds.?): 1 slow requests are blocked > 30 secs
5/30/23 8:12:40 PM [INF] Health check cleared: MDS_SLOW_METADATA_IO (was: 1 MDSs report slow metadata IOs)
5/30/23 8:12:40 PM [INF] MDS health message cleared (mds.?): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 38 secs
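
To make it easier to see which operations are stuck and where, here is a minimal Python sketch that pulls the age, operation and current state out of the "slow request" messages in the log above. The regular expressions are only inferred from the messages shown here, not from any documented log format, and the helper name parse_slow_request is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the "slow request" messages pasted above (an assumption).
SLOW_RE = re.compile(
    r"slow request (?P<age>\d+\.\d+) seconds old, received at "
    r"(?P<received>\S+): (?P<request>.*) currently (?P<state>.+)$"
)
# The operation name is the second token inside client_request(...).
OP_RE = re.compile(r"client_request\(\S+ (\w+)")


def parse_slow_request(message):
    """Return age, operation and state of one slow request, or None."""
    text = " ".join(message.split())  # undo line wrapping from the mail
    match = SLOW_RE.search(text)
    if match is None:
        return None
    op_match = OP_RE.match(match.group("request"))
    return {
        "age": float(match.group("age")),
        "received": match.group("received"),
        "op": op_match.group(1) if op_match else None,
        "state": match.group("state"),
    }


# Two messages copied from the log above.
messages = [
    "slow request 34.213327 seconds old, received at "
    "2023-05-30T12:12:33.951399+0000: client_request(client.270564:1406144 "
    "getattr pAsLsXsFs #0x700000103d0 2023-05-30T12:12:33.947323+0000 "
    "caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting",
    "slow request 34.935921 seconds old, received at "
    "2023-05-30T12:12:12.185614+0000: client_request(client.270564:1406139 "
    "create #0x7000001045b/atomic7966567911433736706tmp "
    "2023-05-30T12:12:12.182999+0000 caller_uid=0, caller_gid=0{}) currently "
    "submit entry: journal_and_reply",
]

parsed = [parse_slow_request(m) for m in messages]
states = Counter(p["state"] for p in parsed if p is not None)
for entry in parsed:
    print(entry)
```

Messages that do not match the pattern, such as the health check lines, come back as None and can simply be skipped.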

However, the random write test performs very well.

Any suggestions on what the problem could be?

Thanks,
Ben
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


