On 04/22/15 17:57, Jeff Epstein wrote:
On 04/10/2015 10:10 AM, Lionel Bouton wrote:
On 04/10/15 15:41, Jeff Epstein wrote:
[...]
This seems highly unlikely. We get very good performance
without ceph. Requisitioning and manipulating block devices
through LVM happens instantaneously. We expect that ceph will
be a bit slower due to its distributed nature, but we've seen
operations block for up to an hour, which is clearly beyond
the pale. Furthermore, as the performance measurements I posted
show, read/write speed is not the bottleneck: ceph is simply
waiting.
So, does anyone else have any ideas why mkfs (and other
operations) takes so long?
As your use case is pretty unique and clearly not something Ceph
was optimized for, if I were you I'd switch to a single pool
with the appropriate number of pgs based on your pool size
(replication) and the number of OSDs you use (you should target
100 pgs/OSD to be in what seems to be the sweet spot) and
create/delete rbds instead of whole pools. You would be in
"known territory" and any remaining performance problems would be
easier to debug.
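For example, something along these lines (untested sketch; the pool
name, image name, sizes, OSD count and replication factor below are
only placeholders to illustrate the idea, adjust them to your setup):

    # e.g. 6 OSDs, 3x replication -> 6 * 100 / 3 = 200 PGs,
    # rounded up to the next power of two
    ceph osd pool create rbdpool 256 256

    # create/delete individual rbd images inside that one pool
    # instead of creating/deleting whole pools
    rbd create rbdpool/vol0001 --size 10240   # size in MB
    rbd rm rbdpool/vol0001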
I agree that this is a good suggestion. It took me a little while,
but I've changed the configuration so that we now have only one
pool, containing many rbds, and now all data is spread across all
six of our OSD nodes. However, the performance has not perceptibly
improved. We still see occasional long (>10 minute) wait
periods during write operations, and the bottleneck still seems to
be ceph rather than the hardware: the blocking process (usually,
but not always, mkfs) is stuck in an uninterruptible wait state ("D"
in ps), but no I/O is actually being performed, so one can surmise
that the physical limitations of the disk medium are not the
bottleneck. This is similar to what is being reported in the
thread titled "100% IO Wait with CEPH RBD and RSYNC".
Do you have some idea how I can diagnose this problem?
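For reference, something like the following can be used to confirm
that a process is blocked without doing any actual I/O (a rough
sketch; <PID> is whichever process is stuck at the time, and reading
/proc/<PID>/stack needs root):

    # list processes stuck in uninterruptible sleep (state D)
    ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'

    # where in the kernel the stuck process is waiting
    cat /proc/<PID>/stack

    # check that the underlying disks are actually idle meanwhile
    iostat -x 5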
I'd look at ceph -s output while you get these stuck processes to see
if there's any unusual activity (scrub/deep
scrub/recovery/backfills/...). Is it correlated in any way with rbd
removal (i.e., does write blocking only appear when at least one rbd
was removed in, say, the hour before the write performance problems)?
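Something along these lines should be enough to get a timestamped
record of the cluster state to correlate with the hangs and the rbd
removals (the interval and log file name are arbitrary):

    # run this while reproducing the hang
    while true; do
        date
        ceph -s
        ceph health detail
        sleep 5
    done | tee ceph-status-during-hang.log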
Best regards,
Lionel Bouton
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com