Re: fsping, why you no work no mo?

Hi Dan,

I don't have a solution to the problem; I can only second that we've also been seeing strange problems when more than one node accesses the same file in CephFS and at least one of them opens it for writing.  I've tried verbose logging on the client (fuse), and it seems that the fuse client sometimes sends a cap request to the MDS and does not get a response.  The client also appears to have a polling interval of about 5 seconds, which sometimes (but not always) saves the day, so the client continues after a roughly 5-second delay.  This does not happen when multiple processes open the file only for reading, but it does when processes open it for writing (even if they never write to the file and only read from it afterwards).  I posted some earlier messages to this list a week or two ago describing what we see in more detail (including log output).  I think the issue has something to do with cap requests being lost or miscommunicated between the client and the MDS.

Andras


On 04/13/2017 01:41 PM, Dan van der Ster wrote:
Dear ceph-*,

A couple weeks ago I wrote this simple tool to measure the round-trip latency of a shared filesystem.

   https://github.com/dvanders/fsping

In our case, the tool is run from two clients that mount the same CephFS.

First, start the server (a.k.a. the ping reflector) on one machine in a CephFS directory:

   ./fsping --server

Then, from another client machine and in the same directory, start the fsping client (aka the ping emitter):

    ./fsping --prefix <prefix from the server above>

The idea is that the "client" writes a syn file, the reflector notices it, and writes an ack file. The time for the client to notice the ack file is what I call the rtt.
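In case the mechanics aren't obvious from that description, the emitter side boils down to roughly the following. This is only a minimal sketch of the idea, not the actual fsping code; the prefix, file names, payload size, and poll interval here are made-up placeholders.

    import os
    import time

    # Sketch of the emitter's measurement loop (not the real fsping implementation).
    # Assumes a reflector is already running in the same shared directory and
    # answers <prefix>.syn by creating <prefix>.ack.
    prefix = "fsping-demo"                  # placeholder; the real tool picks its own prefix
    syn, ack = prefix + ".syn", prefix + ".ack"

    start = time.time()
    with open(syn, "w") as f:
        f.write("x" * 16)                   # the real tool writes --size bytes
    while not os.path.exists(ack):          # wait for the reflector's reply to appear
        time.sleep(0.001)
    print("rtt = %.1f ms" % ((time.time() - start) * 1000.0))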

And the output looks like normal ping, so that's neat. (The README.md shows a working example.)


Anyway, two weeks ago when I wrote this, it was working very well on my CephFS clusters (running 10.2.5, IIRC). I was seeing ~20ms rtt for small files, which is more or less what I was expecting on my test cluster.

But when I run fsping today, it misbehaves in one of two ways:

  1. Most of the time it just hangs, both on the reflector and on the emitter. The fsping processes are stuck in some uninterruptible state -- only an MDS failover breaks them out. I tried with and without fuse_disable_pagecache -- no big difference. (A quick way to confirm the stuck state is sketched after this list.)

  2. When I increase the fsping --size to 512kB, it works a bit more reliably. But there is a weird bimodal distribution with most "packets" having 20-30ms rtt, some ~20% having ~5-6 seconds rtt, and some ~5% taking ~10-11s. I suspected the mds_tick_interval -- but decreasing that didn't help.
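For anyone who wants to check the hang in 1. on their own cluster, a minimal sketch of reading a process's scheduler state from /proc follows. This is a generic Linux check, nothing fsping- or Ceph-specific; the PID is whichever fsping process is stuck.

    import sys

    # Print the scheduler state of the PID given on the command line.
    # 'D' means uninterruptible sleep (typically blocked inside the kernel/filesystem).
    pid = sys.argv[1]
    with open("/proc/%s/stat" % pid) as f:
        # the field right after the ')' closing the comm field is the state
        state = f.read().rsplit(")", 1)[1].split()[0]
    print("pid %s is in state %s" % (pid, state))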


In summary, if someone is curious, please give this tool a try on your CephFS cluster -- let me know whether it's working or not (and what rtt you can achieve with which configuration).
And perhaps a dev would understand why it is not working with the latest jewel ceph-fuse / MDS daemons?

Best Regards,

Dan




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
