Re: LOCK_SYNC_MIX state makes "getattr" operations extremely slow when lots of clients issue writes or reads to the same file

On Tue, Feb 6, 2018 at 12:19 PM, Xuehan Xu <xxhdx1985126@xxxxxxxxx> wrote:
> Hi, everyone.
>
> Recently, in our online Luminous Cephfs clusters, we found that there
> are occasionally lots of slow requests:
>
> 2018-01-16 15:16:53.233925 7fa59f7fe700  0 log_channel(cluster) log
> [WRN] : 49 slow requests, 5 included below; oldest blocked for >
> 89.995722 secs
> 2018-01-16 15:16:53.233934 7fa59f7fe700  0 log_channel(cluster) log
> [WRN] : slow request 39.966171 seconds old, received at 2018-01-16
> 15:16:13.267686: client_request(client.4925109:3759 getattr pAsLsXsFs
> #0x10000000fe0 2018-01-16 15:16:13.176343 caller_uid=33,
> caller_gid=33{}) currently failed to rdlock, waiting
> 2018-01-16 15:16:53.233938 7fa59f7fe700  0 log_channel(cluster) log
> [WRN] : slow request 39.949899 seconds old, received at 2018-01-16
> 15:16:13.283958: client_request(client.4925109:3772 getattr pAsLsXsFs
> #0x10000000fe0 2018-01-16 15:16:13.193343 caller_uid=33,
> caller_gid=33{}) currently failed to rdlock, waiting
> 2018-01-16 15:16:53.233942 7fa59f7fe700  0 log_channel(cluster) log
> [WRN] : slow request 39.899975 seconds old, received at 2018-01-16
> 15:16:13.333882: client_request(client.4925109:3785 getattr pAsLsXsFs
> #0x10000000fe0 2018-01-16 15:16:13.242344 caller_uid=33,
> caller_gid=33{}) currently failed to rdlock, waiting
> 2018-01-16 15:16:53.233946 7fa59f7fe700  0 log_channel(cluster) log
> [WRN] : slow request 39.965588 seconds old, received at 2018-01-16
> 15:16:13.268269: client_request(client.4925109:3760 getattr pAsLsXsFs
> #0x10000000fe0 2018-01-16 15:16:13.177343 caller_uid=33,
> caller_gid=33{}) currently failed to rdlock, waiting
> 2018-01-16 15:16:53.233949 7fa59f7fe700  0 log_channel(cluster) log
> [WRN] : slow request 39.964976 seconds old, received at 2018-01-16
> 15:16:13.268881: client_request(client.4925109:3761 getattr pAsLsXsFs
> #0x10000000fe0 2018-01-16 15:16:13.178343 caller_uid=33,
> caller_gid=33{}) currently failed to rdlock, waiting
> 2018-01-16 15:16:58.233997 7fa59f7fe700  0 log_channel(cluster) log
> [WRN] : 49 slow requests, 5 included below; oldest blocked for >
> 94.995785 secs
> 2018-01-16 15:16:58.234006 7fa59f7fe700  0 log_channel(cluster) log
> [WRN] : slow request 60.080351 seconds old, received at 2018-01-16
> 15:15:58.153569: client_request(client.4925109:3742 getattr pAsLsXsFs
> #0x10000003e7d 2018-01-16 15:15:58.062900 caller_uid=99,
> caller_gid=99{}) currently failed to rdlock, waiting
> 2018-01-16 15:16:58.234010 7fa59f7fe700  0 log_channel(cluster) log
> [WRN] : slow request 44.929603 seconds old, received at 2018-01-16
> 15:16:13.304317: client_request(client.4925109:3778 getattr pAsLsXsFs
> #0x10000000fe0 2018-01-16 15:16:13.213344 caller_uid=33,
> caller_gid=33{}) currently failed to rdlock, waiting
> 2018-01-16 15:16:58.234014 7fa59f7fe700  0 log_channel(cluster) log
> [WRN] : slow request 44.964894 seconds old, received at 2018-01-16
> 15:16:13.269026: client_request(client.4925109:3762 getattr pAsLsXsFs
> #0x10000000fe0 2018-01-16 15:16:13.178343 caller_uid=33,
> caller_gid=33{}) currently failed to rdlock, waiting
> 2018-01-16 15:16:58.234017 7fa59f7fe700  0 log_channel(cluster) log
> [WRN] : slow request 44.948258 seconds old, received at 2018-01-16
> 15:16:13.285662: client_request(client.4925109:3773 getattr pAsLsXsFs
> #0x10000000fcf 2018-01-16 15:16:13.194343 caller_uid=33,
> caller_gid=33{}) currently failed to rdlock, waiting
> 2018-01-16 15:16:58.234020 7fa59f7fe700  0 log_channel(cluster) log
> [WRN] : slow request 44.817911 seconds old, received at 2018-01-16
> 15:16:13.416009: client_request(client.4925109:3796 getattr pAsLsXsFs
> #0x10000000fe0 2018-01-16 15:16:13.325347 caller_uid=33,
> caller_gid=33{}) currently failed to rdlock, waiting
>
> Our use case is like this:
>        There is one client issuing writes to a file while tens of
> other clients issue reads and stats against the same file.
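
A minimal sketch of that workload (the mount point, file name, and counts
below are made up for illustration; the real setup uses tens of separate
client hosts, which a single-host script only approximates):

#!/usr/bin/env python3
# One process appends to a file on a CephFS mount while many other
# processes stat() and read the same file, so every stat becomes a
# getattr that contends with the writer's caps.
import os
import time
from multiprocessing import Process

MOUNT = "/mnt/cephfs"                       # hypothetical CephFS mount point
TARGET = os.path.join(MOUNT, "shared.log")  # hypothetical shared file
N_READERS = 32                              # "tens of other clients"

def writer():
    with open(TARGET, "ab") as f:
        while True:
            f.write(b"x" * 4096)
            f.flush()
            time.sleep(0.01)

def reader():
    while True:
        os.stat(TARGET)                     # a getattr request to the MDS
        with open(TARGET, "rb") as f:
            f.read(4096)
        time.sleep(0.01)

if __name__ == "__main__":
    open(TARGET, "ab").close()              # make sure the file exists first
    procs = [Process(target=writer)]
    procs += [Process(target=reader) for _ in range(N_READERS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
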
>
> After a series of tests, we found that these slow requests are all
> "getattr" operations, and they are all blocked by CInode::filelock in
> the LOCK_SYNC_MIX state, in which rdlock cannot be acquired. Further
> analysis showed that, in our use case, due to the writing client's
> continuous "wants" of CEPH_CAP_GWR, nearly every "getattr"
> operation's finish would cause the inode's filelock to go into the
> LOCK_SYNC_MIX state, in which the mds must wait for all clients holding
> caps other than the allowed "rwl" to release their unallowed caps and
> blocks all requests to acquire the "rdlock" of the inode's filelock. Due
> to the massive number of reading clients, the wait could last for
> about 950 ms, which means that only roughly one getattr operation can
> finish every 950 ms.
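
In other words, the gating works roughly like the toy model below
(invented names, an illustration only, not the actual Locker code in the
MDS): rdlock requests are parked while the filelock is in the SYNC->MIX
transition, and the transition only completes once every client holding
now-disallowed caps has acknowledged the revoke.

# Illustration only: a toy model of the stall described above.
from collections import deque

class ToyFilelock:
    def __init__(self, clients_with_disallowed_caps):
        self.state = "SYNC_MIX"                 # transition state
        self.pending_revokes = set(clients_with_disallowed_caps)
        self.waiting_rdlocks = deque()          # parked getattr-style requests

    def try_rdlock(self, request):
        if self.state == "SYNC_MIX":
            # rdlock is not allowed during the transition; the request is
            # parked, which is what "failed to rdlock, waiting" reports.
            self.waiting_rdlocks.append(request)
            return False
        return True

    def ack_cap_release(self, client):
        # Every client holding now-disallowed caps must acknowledge the
        # revoke; with many reader clients this fan-in is why the wait
        # can approach a second.
        self.pending_revokes.discard(client)
        if not self.pending_revokes:
            self.state = "MIX"                  # transition complete
            retry = list(self.waiting_rdlocks)
            self.waiting_rdlocks.clear()
            return retry                        # callers re-evaluate these
        return []
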
>
> Why must the acquisition of the "rdlock" on an inode's filelock be blocked
> while the filelock is in the LOCK_SYNC_MIX state? And why must the
> mds wait for all clients to release their unallowed caps?
>

A getattr request for stat(2) needs to get the inode size. The MDS revokes
the Fw capability from the writer to prevent the writer from changing the
inode size at the same time. This is the price of maintaining POSIX
compliance.
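
The stall is easy to see from a reader client with a timed stat() loop
like the sketch below (hypothetical path; illustration only):

#!/usr/bin/env python3
# Time stat() calls on the contended file to observe how long individual
# getattr requests block while the filelock transitions.
import os
import time

TARGET = "/mnt/cephfs/shared.log"   # hypothetical file under contention

while True:
    t0 = time.monotonic()
    os.stat(TARGET)                 # getattr; may block on the lock transition
    dt = time.monotonic() - t0
    if dt > 0.1:                    # report anything slower than 100 ms
        print("stat took %.0f ms" % (dt * 1000))
    time.sleep(0.05)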

> Thanks:-)
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


