Re: cephfs: some metadata operations take seconds to complete

Thanks for the replies.
I'll move our whole testbed installation to Luminous and redo the tests.

Cheers,
Tyanko

On 17 October 2017 at 10:14, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
On Tue, Oct 17, 2017 at 1:07 AM, Tyanko Aleksiev
<tyanko.alexiev@xxxxxxxxx> wrote:
> Hi,
>
> At UZH we are currently evaluating cephfs as a distributed file system for
> the scratch space of an HPC installation. Under certain circumstances we see
> a slowdown of metadata operations: in particular, commands issued right
> after the deletion of a big file can take several seconds to complete.
>
> Example:
>
> dd bs=$((1024*1024*128)) count=2048 if=/dev/zero of=./dd-test
> 274877906944 bytes (275 GB, 256 GiB) copied, 224.798 s, 1.2 GB/s
>
> dd bs=$((1024*1024*128)) count=2048 if=./dd-test of=./dd-test2
> 274877906944 bytes (275 GB, 256 GiB) copied, 1228.87 s, 224 MB/s
>
> ls; time rm dd-test2 ; time ls
> dd-test  dd-test2
>
> real    0m0.004s
> user    0m0.000s
> sys     0m0.000s
> dd-test
>
> real    0m8.795s
> user    0m0.000s
> sys     0m0.000s
>
> Additionally, the time it takes to complete the "ls" command appears to be
> proportional to the size of the deleted file. The issue described above is
> not limited to "ls" but extends to other commands:
>
> ls ; time rm dd-test2 ; time du -hs ./*
> dd-test  dd-test2
>
> real    0m0.003s
> user    0m0.000s
> sys     0m0.000s
> 128G    ./dd-test
>
> real    0m9.974s
> user    0m0.000s
> sys     0m0.000s
>
> What might be causing this behavior, and how could we improve it?
>

Seems like the mds was waiting for a journal flush; it can wait up to
'mds_tick_interval'. This issue should be fixed in the Luminous release.
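
If you want to check the value on your active mds, or experiment with
lowering it while still on Jewel, something along these lines should work
(the mds name is a placeholder):

ceph daemon mds.<name> config get mds_tick_interval
ceph tell mds.<name> injectargs '--mds_tick_interval 2'

(injectargs may report that the option requires a restart to take effect,
in which case it would have to go into ceph.conf instead.)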

Regards
Yan, Zheng

> Setup:
>
> - ceph version: 10.2.9, OS: Ubuntu 16.04, kernel: 4.8.0-58-generic,
> - 3 monitors,
> - 1 mds,
> - 3 storage nodes with 24 X 4TB disks on each node: 1 OSD/disk (72 OSDs in
> total). 4TB disks are used for the cephfs_data pool. Journaling is on SSDs,
> - we installed a 400GB NVMe disk on each storage node and aggregated the
> three disks in a CRUSH rule. The cephfs_metadata pool was then created using
> that rule and is therefore hosted on the NVMes. Journal and data share the
> same partition here.
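>
> For reference, a setup of this kind can be done on Jewel roughly as follows
> (bucket, rule, OSD ids and weights below are only illustrative, not our
> exact ones):
>
> ceph osd crush add-bucket nvme root
> ceph osd crush add-bucket node1-nvme host
> ceph osd crush move node1-nvme root=nvme
> ceph osd crush create-or-move osd.72 0.36 root=nvme host=node1-nvme
> ceph osd crush rule create-simple nvme-rule nvme host
> ceph osd pool create cephfs_metadata 128 128 replicated nvme-rule
>
> (the add-bucket and create-or-move steps are repeated for each storage
> node and its NVMe OSD)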
>
> So far we are using the default ceph configuration settings.
>
> Clients are mounting the file system with the kernel driver using the
> following options (again default):
> "rw,noatime,name=admin,secret=<hidden>,acl,_netdev".
>
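> In other words, the mount boils down to something like the following
> (monitor addresses and the mount point are placeholders):
>
> mount -t ceph mon1:6789,mon2:6789,mon3:6789:/ /scratch -o rw,noatime,name=admin,secret=<hidden>,acl,_netdev
>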
> Thank you in advance for the help.
>
> Cheers,
> Tyanko
>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
