Re: Safely Upgrading OS on a live Ceph Cluster

Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx> · Wed, 1 Mar 2017 15:31:57 +0100



    On 03/01/17 14:41, Heller, Chris wrote:

    
      That is a good question, and I'm not sure how to answer. The
      journal is on its own volume, and is not a symlink. Also how does
      one flush the journal? That seems like an important step when
      bringing down a cluster safely.
      

    You only need to flush the journal if you are removing it from the
    OSD, for example to replace it with a different journal.
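
    For reference, a minimal sketch of what a flush would look like for a
    FileStore OSD, assuming the OSD daemon is already stopped. The helper
    name and the OSD id 12 are made up for illustration; the sketch only
    builds the command line and does not run anything.

```python
def flush_journal_command(osd_id):
    """Build the argv that flushes the journal of a stopped FileStore OSD.

    Only constructs the command; running it is left to the operator.
    """
    if not isinstance(osd_id, int) or osd_id < 0:
        raise ValueError("osd_id must be a non-negative integer")
    # ceph-osd -i <id> --flush-journal writes outstanding journal entries
    # to the object store so the journal can be removed or replaced.
    return ["ceph-osd", "-i", str(osd_id), "--flush-journal"]


if __name__ == "__main__":
    # 12 is a placeholder OSD id, not one taken from this thread.
    print(" ".join(flush_journal_command(12)))
```

    The printed command would be run once per OSD whose journal is being
    swapped out, before the old journal device is detached.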

    
    Since your journal is on its own volume, you need either a symlink
    named "journal" in the osd directory that points to the device
    (ideally not /dev/sdx but /dev/disk/by-.../), or an entry for it in
    ceph.conf.
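
    A quick way to see which case you are in is to look at what "journal"
    in the osd directory actually is. The following is only a sketch: the
    function name and the returned labels are my own, and the example
    path assumes the default /var/lib/ceph/osd/ceph-<id> layout.

```python
import os
import stat


def classify_journal(osd_dir):
    """Report what the 'journal' entry in an OSD data directory is."""
    path = os.path.join(osd_dir, "journal")
    if not os.path.lexists(path):
        return "missing"
    if os.path.islink(path):
        if not os.path.exists(path):
            # The link exists but its target does not (e.g. the device
            # name changed after the reinstall).
            return "dangling-symlink"
        mode = os.stat(path).st_mode  # follows the symlink
        if stat.S_ISBLK(mode):
            return "symlink-to-block-device"
        return "symlink-to-other"
    mode = os.lstat(path).st_mode
    if stat.S_ISBLK(mode):
        return "block-device"
    if stat.S_ISREG(mode):
        return "regular-file"
    return "other"


if __name__ == "__main__":
    # Assumed default data path; adjust the id for the affected OSD.
    print(classify_journal("/var/lib/ceph/osd/ceph-0"))
```

    For a journal on its own volume you would expect
    "symlink-to-block-device" here, unless the journal path is set in
    ceph.conf instead.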

    
    And since the log says you have a non-block journal now, there is
    probably a regular file at the journal path. You should move it out
    of the way: rename it to journal.junk until you're sure it's not an
    important file, and delete it later.
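
    The rename can be sketched as below. This is an illustration under
    the assumption that the OSD is stopped; the function name is made up,
    and it deliberately refuses to touch anything that is not a plain
    file, so a symlink or device node is left alone.

```python
import os


def set_aside_stray_journal(osd_dir, suffix=".junk"):
    """Rename a regular-file 'journal' to 'journal.junk'.

    Returns the new path, or None if there was nothing to rename.
    """
    path = os.path.join(osd_dir, "journal")
    # Only a plain file is considered stray; never rename a symlink.
    if os.path.islink(path) or not os.path.isfile(path):
        return None
    target = path + suffix
    if os.path.lexists(target):
        # Do not overwrite an earlier set-aside copy.
        raise FileExistsError(target)
    os.rename(path, target)
    return target
```

    After this, the journal symlink (or the ceph.conf entry) can be put
    in place, and journal.junk deleted once it is known to be
    unimportant.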

    
      -Chris
      

            On Mar 1, 2017, at 8:37 AM, Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:
            

                On 02/28/17 18:55, Heller, Chris wrote:

                
                 Quick update. So I'm trying out
                  the procedure as documented here.
                  

                  So far I've:
                  

                  1. Stopped ceph-mds
                  2. Set noout, norecover, norebalance, nobackfill
                  3. Stopped all ceph-osd
                  4. Stopped ceph-mon
                  5. Installed new OS
                  6. Started ceph-mon
                  7. Started all ceph-osd
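
                  For step 2, the flag commands can be sketched as
                  below. The helper is hypothetical and only builds
                  the command lines. Unsetting the flags once every
                  OSD is back up is an assumed final step that is not
                  in the list above.

```python
# Cluster flags from step 2, in the order listed.
FLAGS = ["noout", "norecover", "norebalance", "nobackfill"]


def flag_commands(action):
    """Return one 'ceph osd set|unset <flag>' argv per flag."""
    if action not in ("set", "unset"):
        raise ValueError("action must be 'set' or 'unset'")
    return [["ceph", "osd", action, flag] for flag in FLAGS]


if __name__ == "__main__":
    # Before stopping the OSDs:
    for argv in flag_commands("set"):
        print(" ".join(argv))
    # Assumed follow-up once all OSDs are back online:
    for argv in flag_commands("unset"):
        print(" ".join(argv))
```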
                  

                  This is where I've stopped. All but one
                    OSD came back online. One has this backtrace:
                  

                      2017-02-28 17:44:54.884235 7fb2ba3187c0 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
                    
                  
                Are the journals inline or separate? If they're
                separate, the message above means the journal
                symlink/config is missing, so the OSD may have made a
                new journal, which would be bad if you didn't flush the
                old journal first.

                
                Also, a single OSD is easy enough to replace (which I
                wouldn't do until the cluster has settled down and
                recovered). So it's annoying that it's broken, but it's
                still recoverable if that's the only issue.

              
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com