Dear developers,
I use the rbd kernel module on the client side, and we are testing
random write performance. The throughput is quite poor and frequently
drops to zero.
I traced the debug logs on the server side and found that requests
are repeatedly blocked in the functions get_object_context, getattr()
and _setattrs. The average time spent there is on the order of
hundreds of milliseconds. Even worse, the maximum latency reaches 4-6
seconds, so the throughput observed on the client side stalls for
several seconds at a time. This is really ruining the performance of
the cluster.
Therefore, I carefully analyzed the functions mentioned above
(get_object_context, getattr() and _setattrs). I cannot find any
blocking code except for the xattr system calls (fgetxattr,
fsetxattr, flistxattr).
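For reference, a minimal standalone sketch (not Ceph code) that times
a single fgetxattr() on an object file can show whether the stall is
in the syscall itself; the object path and the user.ceph._ attribute
name are assumptions based on the default FileStore layout:

#include <sys/xattr.h>
#include <fcntl.h>
#include <unistd.h>
#include <chrono>
#include <cstdio>

int main(int argc, char **argv) {
    // Placeholder path; point this at a real object file under the
    // OSD's current/ directory.
    const char *path = argc > 1 ? argv[1]
        : "/var/lib/ceph/osd/ceph-0/current/0.0_head/some_object";
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[4096];
    auto t0 = std::chrono::steady_clock::now();
    // FileStore stores the "_" xattr under the user.ceph. prefix.
    ssize_t n = fgetxattr(fd, "user.ceph._", buf, sizeof(buf));
    auto t1 = std::chrono::steady_clock::now();
    double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
    printf("fgetxattr returned %zd bytes in %.3f ms\n", n, ms);
    close(fd);
    return 0;
}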
On the OSD nodes, I use the xfs file system as the underlying file
system. By default, Ceph uses the extended attribute feature of xfs
to store its user.ceph xattrs ("_" and "snapset"); a small listxattr
check like the one below can confirm this on a sample object file.
Since those xattr system calls are synchronous, I set the
io-scheduler of the disk to deadline so that no metadata read is
blocked for a long time before being served. However, even so, the
performance is still quite poor and the functions mentioned above are
still blocked, sometimes for several seconds.
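Here is the check (again a standalone sketch, with a placeholder
object path); it lists every xattr on the file together with its
size:

#include <sys/xattr.h>
#include <cstdio>
#include <cstring>

int main(int argc, char **argv) {
    const char *path = argc > 1 ? argv[1]
        : "/var/lib/ceph/osd/ceph-0/current/0.0_head/some_object";
    char names[8192];
    ssize_t len = listxattr(path, names, sizeof(names));
    if (len < 0) { perror("listxattr"); return 1; }
    // names holds a sequence of NUL-terminated xattr names.
    for (char *p = names; p < names + len; p += strlen(p) + 1) {
        ssize_t sz = getxattr(path, p, nullptr, 0); // size query only
        printf("%-30s %zd bytes\n", p, sz);
    }
    return 0;
}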
Therefore, I want to know how to solve this problem. Does Ceph
provide any user-space cache for xattrs? Is this problem caused by
the xfs file system and its xattr system calls?
Furthermore, I tried to bypass xfs xattrs by setting
"filestore_max_inline_xattrs_xfs = 0" and
"filestore_max_inline_xattr_size_xfs = 0", so that the xattr
key/value pairs are stored in the omap, implemented by LevelDB. This
helps somewhat: the maximum blocked interval drops to about 1-2
seconds. But when the xattrs are read from the physical disk rather
than the page cache, it is still quite slow.
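For reference, the exact ceph.conf change on the OSD nodes looks like
this (the [osd] section placement is my assumption; adjust to your
layout):

[osd]
    filestore_max_inline_xattrs_xfs = 0
    filestore_max_inline_xattr_size_xfs = 0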
So I wonder: would it be a good idea to cache all xattr data in a
user-space cache? For the "_" xattr, the length is just 242 bytes on
the xfs file system, so even for hundreds of thousands of objects it
would cost less than 100MB (e.g. 400,000 objects * 242 bytes is about
92MB). A sketch of what I have in mind follows.
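Purely as an illustration of the idea (this is not existing Ceph
code, and all names are made up): an LRU map from object name to the
"_" xattr bytes, capped by a byte budget so a few hundred thousand
~242-byte entries stay under ~100MB:

#include <list>
#include <string>
#include <unordered_map>
#include <cstdio>

class XattrCache {
public:
  explicit XattrCache(size_t max_bytes) : max_bytes_(max_bytes) {}

  // Return true and fill *out on a cache hit; refresh LRU order.
  bool get(const std::string &object, std::string *out) {
    auto it = index_.find(object);
    if (it == index_.end())
      return false;
    lru_.splice(lru_.begin(), lru_, it->second); // move to front
    *out = it->second->value;
    return true;
  }

  // Insert or refresh an entry, evicting least-recently-used
  // entries once the byte budget is exceeded.
  void put(const std::string &object, const std::string &value) {
    auto it = index_.find(object);
    if (it != index_.end()) {
      bytes_ -= it->second->value.size();
      it->second->value = value;
      bytes_ += value.size();
      lru_.splice(lru_.begin(), lru_, it->second);
    } else {
      lru_.push_front(Entry{object, value});
      index_[object] = lru_.begin();
      bytes_ += value.size();
    }
    while (bytes_ > max_bytes_ && !lru_.empty()) {
      bytes_ -= lru_.back().value.size();
      index_.erase(lru_.back().object);
      lru_.pop_back();
    }
  }

private:
  struct Entry { std::string object; std::string value; };
  std::list<Entry> lru_;
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
  size_t max_bytes_;
  size_t bytes_ = 0;
};

int main() {
  XattrCache cache(100 * 1024 * 1024); // ~100MB budget, as estimated above
  // Object name below is made up for illustration.
  cache.put("rbd_data.1234.0000000000000001", std::string(242, 'x'));
  std::string v;
  if (cache.get("rbd_data.1234.0000000000000001", &v))
    printf("hit: %zu bytes\n", v.size());
  return 0;
}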
Best Regards,
Neal Yao