Re: Safely Upgrading OS on a live Ceph Cluster

Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx> · Wed, 1 Mar 2017 15:39:03 +0100



    On 03/01/17 15:36, Heller, Chris wrote:

    
      I see. My journal is specified in ceph.conf. I'm not removing it
      from the OSD so sounds like flushing isn't needed in my case.
      

    Okay, but something still seems wrong if it's reporting a non-block
    journal (meaning a file, not a block device).
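
    To illustrate the file-versus-block-device distinction, here is a
    minimal Python sketch that reports what a journal path resolves to.
    The function name journal_kind is made up for this example and is
    not part of Ceph.

```python
import os
import stat


def journal_kind(path):
    """Classify what a journal path resolves to (symlinks are followed)."""
    try:
        mode = os.stat(path).st_mode  # os.stat follows symlinks
    except (FileNotFoundError, NotADirectoryError):
        return "missing"  # no entry at all, or a dangling symlink
    if stat.S_ISBLK(mode):
        return "block"  # a block device, e.g. a disk partition
    if stat.S_ISREG(mode):
        return "file"  # a plain file, i.e. a "non-block" journal
    return "other"
```

    Run against the journal path of the failing OSD, a result of "file"
    would match the "non-block journal" message in the log.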

    
    Double-check your ceph.conf: make sure the journal path is valid,
    and make sure the [osd.x] section actually matches that OSD (I don't
    know how to test that, especially if the OSD doesn't start; maybe
    just increase logging).
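
    One rough way to see which journal path a given OSD would pick up
    is to parse ceph.conf offline. The sketch below is only an
    approximation using Python's configparser, not Ceph's own parser.
    It assumes the option is spelled "osd journal" or "osd_journal",
    looks in [osd.N], then [osd], then [global], and expands only the
    $cluster and $id metavariables. The function name and the example
    path are hypothetical.

```python
import configparser


def _normalize_key(key):
    # Assumption: "osd journal" and "osd_journal" name the same option.
    return key.strip().lower().replace(" ", "_")


def journal_path_for_osd(conf_text, osd_id, cluster="ceph"):
    """Return the journal path the config assigns to osd.<osd_id>, or None."""
    parser = configparser.ConfigParser(
        interpolation=None,   # leave "%" and "$" in values untouched
        strict=False,         # tolerate duplicate sections or keys
        delimiters=("=",),    # do not split on ":" inside values
        inline_comment_prefixes=("#", ";"),
    )
    parser.optionxform = _normalize_key
    # Strip indentation so indented keys are not read as continuation lines.
    parser.read_string("\n".join(l.strip() for l in conf_text.splitlines()))

    # Most specific section first: [osd.N], then [osd], then [global].
    for section in (f"osd.{osd_id}", "osd", "global"):
        if parser.has_section(section) and parser.has_option(section, "osd_journal"):
            value = parser.get(section, "osd_journal")
            # Expand only the two metavariables this sketch knows about.
            return value.replace("$cluster", cluster).replace("$id", str(osd_id))
    return None


# Hypothetical config fragment, for illustration only.
example = """
[osd]
osd journal = /dev/disk/by-partlabel/journal-$id
"""
print(journal_path_for_osd(example, 3))  # /dev/disk/by-partlabel/journal-3
```

    If this returns None, nothing in the file sets a journal for that
    OSD, and the "journal" entry in the OSD directory is what matters.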

    
    Or just make a symlink for now to see whether it solves the
    problem; if it does, that would imply the ceph.conf is wrong.
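
    A cautious way to make that symlink could look like the sketch
    below. It assumes the OSD is stopped and that you already know
    which device holds its journal. The helper name link_journal is
    made up, and the paths you pass in are your own. It refuses to
    touch an existing "journal" entry, so a stray journal file has to
    be moved aside by hand first.

```python
import os


def link_journal(osd_dir, device):
    """Create <osd_dir>/journal as a symlink to device, never overwriting."""
    journal = os.path.join(osd_dir, "journal")
    # lexists is True even for a dangling symlink, so nothing gets clobbered.
    if os.path.lexists(journal):
        raise FileExistsError(f"{journal} already exists; move it aside first")
    os.symlink(device, journal)
    return journal
```

    Passing a stable /dev/disk/by-.../ path rather than /dev/sdx as
    the device keeps the link valid if device names change on reboot.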

    
      -Chris

        
            On Mar 1, 2017, at 9:31 AM, Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:
            

                On 03/01/17 14:41, Heller, Chris wrote:

                
                 That is a good question, and I'm
                  not sure how to answer. The journal is on its own
                  volume, and is not a symlink. Also how does one flush
                  the journal? That seems like an important step when
                  bringing down a cluster safely.
                  

                You only need to flush the journal if you are removing
                it from the osd, replacing it with a different journal.

                
                Since your journal is on its own volume, you need
                either a symlink in the OSD directory named "journal"
                that points to the device (ideally not /dev/sdx but
                /dev/disk/by-.../), or a journal setting in ceph.conf.

                
                And since it said you have a non-block journal now, it
                probably means there is a file... you should remove that
                (rename it to journal.junk until you're sure it's not an
                important file, and delete it later).
                
                  
                              This is where I've stopped.
                                All but one OSD came back online. One
                                has this backtrace:
                              

                                  2017-02-28 17:44:54.884235 7fb2ba3187c0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
                                
                              
                            Are the journals inline or separate? If
                            they're separate, the above means the
                            journal symlink/config is missing, so the
                            OSD may create a new journal, which would
                            be bad if you didn't flush the old journal
                            first.

                            
                            Also, a single OSD is easy enough to
                            replace (which I wouldn't do until the
                            cluster has settled down and recovered). So
                            it's lame for it to be broken, but it's still
                            recoverable if that's the only issue.

                          
    -- 

--------------------------------------------
Peter Maloney
Brockmann Consult
Max-Planck-Str. 2
21502 Geesthacht
Germany
Tel: +49 4152 889 300
Fax: +49 4152 889 333
E-mail: peter.maloney@xxxxxxxxxxxxxxxxxxxx
Internet: http://www.brockmann-consult.de
--------------------------------------------
  
