Good morning David,
Assuming you don't need/want to see the data about the other 31 OSDs, osd.14 is showing:
ID  CLASS  WEIGHT   REWEIGHT  SIZE  RAW USE  DATA  OMAP  META  AVAIL  %USE  VAR  PGS  STATUS
14  hdd    2.72899         0   0 B      0 B   0 B   0 B   0 B    0 B     0    0    1  up
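(If it helps to grab a single OSD's row without pasting the whole table, something along these lines should work; the awk/jq filtering is just an illustration, and the JSON variant assumes jq is installed:)

    # header plus the row for osd.14 only
    ceph osd df | awk 'NR==1 || $1 == 14'

    # machine-readable variant; recent releases put the per-OSD rows in a "nodes" array
    ceph osd df --format json | jq '.nodes[] | select(.id == 14)'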
What's "ceph osd df" show?
On Wed, Dec 1, 2021 at 2:20 PM Zach Heise (SSCC) <heise@xxxxxxxxxxxx> wrote:
I wanted to swap out an existing OSD, preserve its number, remove the HDD that had it (osd.14 in this case), and give the ID of 14 to a new SSD that would be taking its place in the same node. This is my first time ever doing this, so I'm not sure what to expect.
I followed the instructions here, using the --replace flag.
However, I'm a bit concerned that the operation is taking so long in my test cluster. Out of 70TB in the cluster, only 40GB were in use. This is a relatively large OSD in comparison to others in the cluster (2.7TB versus 300GB for most other OSDs) and yet it's been 36 hours with the following status:
ceph04.ssc.wisc.edu> ceph orch osd rm status

OSD_ID  HOST                 STATE     PG_COUNT  REPLACE  FORCE  DRAIN_STARTED_AT
14      ceph04.ssc.wisc.edu  draining  1         True     True   2021-11-30 15:22:23.469150+00:00

Another note: I don't know why it has "force = true" set; the command that I ran was just "ceph orch osd rm 14 --replace", without specifying --force. Hopefully not a big deal, but still strange.
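(A few read-only checks that might show whether that one remaining PG is actually moving or whether the drain is stuck; these are standard Ceph/cephadm commands, and the OSD id 14 is taken from the output above:)

    # which PG is still mapped to osd.14, and what state it is in
    ceph pg ls-by-osd 14

    # whether Ceph considers the OSD safe to destroy yet
    ceph osd safe-to-destroy 14

    # recent cephadm log lines, in case the orchestrator module itself is wedged
    ceph log last cephadm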
At this point, is there any way to tell whether it's still actually doing something, or whether it is hung? If it is hung, what would be the 'recommended' way to proceed? I know that I could just manually eject the HDD from the chassis, run "ceph osd crush remove osd.14", and then manually delete the auth keys, etc., but the documentation seems to state that this shouldn't be necessary if a Ceph OSD replacement goes properly.
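(For reference, the classic manual removal sequence is roughly the following; note that fully removing the OSD this way gives up the ID reservation that --replace / "ceph osd destroy" is meant to keep, so it's more of a last resort than the documented replacement flow:)

    ceph osd out 14               # make sure nothing maps data to it
    ceph osd crush remove osd.14  # drop it from the CRUSH map
    ceph auth del osd.14          # remove its authentication key
    ceph osd rm 14                # remove it from the OSD map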
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx