On 04/22/15 19:50, Lionel Bouton wrote:
On 04/22/15 17:57, Jeff Epstein wrote:
On 04/10/2015 10:10 AM, Lionel Bouton wrote:
On 04/10/15 15:41, Jeff Epstein wrote:
[...]
This seems highly unlikely. We get very good performance
without Ceph: requisitioning and manipulating block devices
through LVM happens instantaneously. We expect Ceph to be
somewhat slower by its distributed nature, but we've seen
operations block for up to an hour, which is clearly beyond
the pale. Furthermore, as the performance measurements I
posted show, read/write speed is not the bottleneck: ceph is
simply waiting.
So, does anyone else have any ideas why mkfs (and other
operations) takes so long?
As your use case is pretty unique and clearly not something
Ceph was optimized for, if I were you I'd switch to a single
pool with an appropriate number of PGs based on your pool
size (replication) and the number of OSDs you use (you should
target about 100 PGs/OSD, which seems to be the sweet spot),
and create/delete RBDs instead of whole pools. You would be
in "known territory", and any remaining performance problem
would be easier to debug.
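For reference, the sizing arithmetic above can be sketched like this. It is only a sketch under assumptions from this thread (6 OSDs, replication size 3); "rbdpool" and "vol0" are placeholder names, and the ceph/rbd commands are left commented since they only make sense on a live cluster:

```shell
#!/bin/sh
# Assumptions (from the thread, not verified): 6 OSDs, replication size 3.
OSDS=6
SIZE=3            # pool replication factor
TARGET_PER_OSD=100

# Raw PG target: (OSDs * per-OSD target) / replication factor.
RAW=$(( OSDS * TARGET_PER_OSD / SIZE ))

# Round up to the next power of two, as is commonly recommended.
PG_NUM=1
while [ "$PG_NUM" -lt "$RAW" ]; do
  PG_NUM=$(( PG_NUM * 2 ))
done
echo "pg_num=$PG_NUM"

# On a real cluster, something along these lines (names are placeholders):
#   ceph osd pool create rbdpool $PG_NUM $PG_NUM
#   rbd create rbdpool/vol0 --size 10240
#   rbd rm rbdpool/vol0
```

With these numbers the raw target is 200 and the rounded pg_num is 256.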
I agree that this is a good suggestion. It took me a little
while, but I've changed the configuration so that we now have
only one pool, containing many rbds, and now all data is spread
across all six of our OSD nodes. However, the performance has
not perceptibly improved. We still see occasional long
(>10 minute) wait periods during write operations, and the
bottleneck still seems to be Ceph rather than the hardware: the
blocking process (most often, but not always, mkfs) is stuck
in an uninterruptible wait state ("D" in ps), yet no I/O is
actually being performed, so one can surmise that the physical
limitations of the disk medium are not the bottleneck. This is
similar to what is reported in the thread titled "100% IO Wait
with CEPH RBD and RSYNC".
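The D-state observation above can be cross-checked from the shell. This is a generic Linux sketch (standard procps/sysstat tools; exact column layout can vary by distro), not anything Ceph-specific:

```shell
#!/bin/sh
# List processes in uninterruptible sleep ("D"); the wchan column often
# hints at the kernel function they are blocked in.
ps -eo state,pid,wchan:32,cmd | awk '$1 == "D"'

# For a specific stuck pid, the kernel stack is more precise (needs root):
#   cat /proc/<pid>/stack

# Cross-check actual device activity; near-zero %util while a process sits
# in D state suggests the wait is in Ceph/the network, not the disk:
#   iostat -x 5
```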
Do you have some idea how I can diagnose this problem?
I'd look at the ceph -s output while you have these stuck
processes, to see if there's any unusual activity (scrub/deep
scrub/recovery/backfills/...). Is it correlated in any way with
rbd removal (i.e., do the write blocks only appear if you
removed at least one rbd within, say, the hour before the write
performance problems)?
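One way to capture that correlation is to log cluster state continuously and line it up with the stall timestamps afterwards. A sketch, assuming the standard ceph CLI is available on an admin node (commands commented since they need a live cluster):

```shell
# Watch for scrub/deep-scrub/recovery/backfill activity in real time:
#   watch -n 5 'ceph -s'

# Blocked/slow requests show up here:
#   ceph health detail

# Or keep a timestamped log to correlate with a stuck mkfs later:
#   while sleep 30; do date; ceph -s; done >> ceph-status.log
```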
I'm not familiar with Amazon VMs. If you map the rbds to local
block devices using the kernel driver, do you have control over
the kernel you run? (I've seen reports of various problems with
older kernels, and you probably want the latest possible.)
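A quick way to check which kernel client is in play; "rbdpool/vol0" is a placeholder, and the rbd commands are commented since they need a mapped cluster:

```shell
#!/bin/sh
# Running kernel version; older krbd clients had known issues, so this is
# the first thing worth noting.
uname -r

# Is the rbd kernel module available/loaded?
#   modinfo rbd

# Map/inspect/unmap via the kernel driver (placeholder names):
#   rbd map rbdpool/vol0          # appears as /dev/rbd0
#   rbd showmapped
#   rbd unmap /dev/rbd0
```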
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com