Re: CephFS: 'ls -alR' performance terrible unless Linux cache flushed

Thanks everyone,

Update: I tried running the following on "node A":
# vmtouch -ev /storage/
# sync; sync

The problem persisted: an 'ls -alR' of the dir from node B still took about one minute.

After that I ran on node A:
# echo 2 > /proc/sys/vm/drop_caches   

And suddenly everything became fast on node B: ls, du, and tar all complete in a fraction of a second after dropping the cache on A.
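
That lines up with what the different drop_caches values actually cover: vmtouch -e and sync only deal with the page cache and dirty data, while writing 2 reclaims dentries and inodes, which is where the kernel client keeps its cached metadata (and, presumably, the caps that go with it). A minimal sketch of the options, run on node A:

# sync                                # flush dirty data first; drop_caches only discards clean entries
# echo 1 > /proc/sys/vm/drop_caches   # page cache only (roughly what vmtouch -ev already did)
# echo 2 > /proc/sys/vm/drop_caches   # dentries and inodes - the one that made node B fast here
# echo 3 > /proc/sys/vm/drop_caches   # both of the above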




On Tue, Jun 16, 2015 at 12:52 PM, Jan Schermer <jan@xxxxxxxxxxx> wrote:
Have you tried just running “sync;sync” on the originating node? Does that achieve the same thing or not? (I guess it could/should).

Jan


On 16 Jun 2015, at 13:37, negillen negillen <negillen@xxxxxxxxx> wrote:

Thanks again,

even 'du' performance is terrible on node B (testing on a directory taken from Phoronix):

# time du -hs /storage/test9/installed-tests/pts/pgbench-1.5.1/
73M     /storage/test9/installed-tests/pts/pgbench-1.5.1/
real    0m21.044s
user    0m0.010s
sys     0m0.067s


Reading the files from node B doesn't seem to help with subsequent accesses in this case:

# time tar c /storage/test9/installed-tests/pts/pgbench-1.5.1/>/dev/null
real    1m47.650s
user    0m0.041s
sys     0m0.212s

# time tar c /storage/test9/installed-tests/pts/pgbench-1.5.1/>/dev/null
real    1m45.636s
user    0m0.042s
sys     0m0.214s

# time ls -laR /storage/test9/installed-tests/pts/pgbench-1.5.1>/dev/null
real    1m43.180s
user    0m0.069s
sys     0m0.236s


Of course, once I unmount the CephFS on node A everything gets as fast as it can be.

Am I missing something obvious here?
Yes, I could drop the Linux cache as a 'fix', but that would drop the entire system's cache, which sounds a bit extreme! :P
Unless there is a way to drop the cache only for that single dir...?
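
Remounting just the CephFS on node A does work as a more scoped workaround than a system-wide drop_caches (a sketch, assuming /storage is listed in fstab and nothing on A holds files open under it):

# umount /storage && mount /storage   # releases only this client's cached dentries/inodes and its caps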


On Tue, Jun 16, 2015 at 12:15 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
On Tue, Jun 16, 2015 at 12:11 PM, negillen negillen <negillen@xxxxxxxxx> wrote:
> Thank you very much for your reply!
>
> Is there anything I can do to work around that? E.g. setting access caps to
> be released after a short while? Or is there a command to manually release
> access caps (so that I could run it in cron)?

Well, you can drop the caches. ;)

More generally, you're running into a specific hole here. If your
clients are actually *accessing* the files then they should go into
shared mode and this will be much faster on subsequent accesses.
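
If you want to check what the client is actually holding, the kernel client exposes some of this under debugfs (a rough sketch; assumes debugfs is available, and the directory name contains your cluster fsid and client id, hence the wildcard):

# mount -t debugfs none /sys/kernel/debug   # only if debugfs isn't already mounted
# cat /sys/kernel/debug/ceph/*/caps         # capability usage for this kernel client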

> This is quite a problem because we have several applications that need to
> access a large number of files and when we set them to work on CephFS
> latency skyrockets.

What kind of shared-file access do they have? If you have a bunch of
files being shared for read I'd expect this to be very fast. If
different clients are writing small amounts to them in round-robin
then that's unfortunately not going to work well. :(
-Greg

>
> Thanks again and regards.
>
> On Tue, Jun 16, 2015 at 10:59 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Mon, Jun 15, 2015 at 11:34 AM, negillen negillen <negillen@xxxxxxxxx>
>> wrote:
>> > Hello everyone,
>> >
>> > Something very strange is driving me crazy with CephFS (kernel driver).
>> > I copy a large directory onto the CephFS from one node. If I perform a
>> > 'time ls -alR' on that directory it gets executed in less than one second.
>> > If I do the same 'time ls -alR' from another node it takes several minutes.
>> > No matter how many times I repeat the command, the speed is always abysmal.
>> > The ls works fine on the node where the initial copy was executed from.
>> > This happens with any directory I have tried, no matter what kind of data
>> > is inside.
>> >
>> > After lots of experimenting I have found that in order to get fast ls
>> > speed for that dir from every node I need to flush the Linux cache on the
>> > original node:
>> > echo 3 > /proc/sys/vm/drop_caches
>> > Unmounting and remounting the CephFS on that node does the trick too.
>> >
>> > Does anyone have a clue about what's happening here? Could this be a
>> > problem with the writeback fscache for CephFS?
>> >
>> > Any help appreciated! Thanks and regards. :)
>>
>> This is a consequence of the CephFS "file capabilities" that we use to
>> do distributed locking on file states. When you copy the directory on
>> client A, it has full capabilities on the entire tree. When client B
>> tries to do a stat on each file in that tree, it doesn't have any
>> capabilities. So it sends a stat request to the MDS, which sends a cap
>> update to client A requiring it to pause updates on the file and share
>> its current state. Then the MDS tells client A it can keep going and
>> sends the stat to client B.
>> So that's:
>> B -> MDS
>> MDS -> A
>> A -> MDS
>> MDS -> B | MDS -> A
>> for every file you touch.
>>
>> I think the particular oddity you're encountering here is that CephFS
>> generally tries not to make clients drop their "exclusive" access caps
>> just to satisfy a stat. If you had client B doing something with the
>> files (like reading them) you would probably see different behavior.
>> I'm not sure if there's something effective we can do here or not
>> (it's just a bunch of heuristics for when we should or should not drop
>> caps), but please file a bug on the tracker (tracker.ceph.com) with
>> this case. :)
>> -Greg
>
>
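
For the cap hand-off Greg describes above, the MDS admin socket can also show how many caps each client session is holding, which should make node A's grip on that tree visible (a sketch; mds.<name> is a placeholder for the actual daemon name):

# ceph daemon mds.<name> session ls   # lists client sessions, including the caps each client holds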

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


