Re: Extremely high OSD memory utilization on Kraken 11.2.0 (with XFS -or- bluestore)

Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx> · Sat, 15 Apr 2017 12:39:52 +0000



    How many PGs do you have? And did you
      change any config, like mds cache size? Show your ceph.conf.

      
      On 04/15/17 07:34, Aaron Ten Clay wrote:

    
                    Hi all,

                      
                    Our cluster is experiencing a very odd issue and I'm
                    hoping for some guidance on troubleshooting steps
                    and/or suggestions to mitigate the issue. tl;dr:
                    Individual ceph-osd processes try to allocate >
                    90GiB of RAM and are eventually nuked by oom_killer.

                    
                    I'll try to explain the situation in detail:

                    
                  We have 24-4TB bluestore HDD OSDs, and 4-600GB SSD
                  OSDs. The SSD OSDs are in a different CRUSH "root",
                  used as a cache tier for the main storage pools, which
                  are erasure coded and used for cephfs. The OSDs are
                  spread across two identical machines with 128GiB of
                  RAM each, and there are three monitor nodes on
                  different hardware.

                  
                Several times we've encountered crippling bugs with
                previous Ceph releases when we were on RC or betas, or
                using non-recommended configurations, so in January we
                abandoned all previous Ceph usage, deployed LTS Ubuntu
                16.04, and went with stable Kraken 11.2.0 with the
                configuration mentioned above. Everything was fine until
                the end of March, when one day we find all but a couple
                of OSDs are "down" inexplicably. Investigation reveals
                oom_killer came along and nuked almost all the ceph-osd
                processes.

                
              We've gone through a bunch of iterations of restarting the
              OSDs, trying to bring them up one at a time gradually, all
              at once, various configuration settings to reduce cache
              size as suggested in this ticket: http://tracker.ceph.com/issues/18924...

              
            I don't know if that ticket really pertains to our situation
            or not, I have no experience with memory allocation
            debugging. I'd be willing to try if someone can point me to
            a guide or walk me through the process.

            
          I've even tried, just to see if the situation was  transitory,
          adding over 300GiB of swap to both OSD machines. The OSD procs
          managed to allocate, in a matter of 5-10 minutes, more than
          300GiB of RAM pressure and became oom_killer victims once
          again.

          
        No software or hardware changes took place around the time this
        problem started, and no significant data changes occurred
        either. We added about 40GiB of ~1GiB files a week or so before
        the problem started and that's the last time data was written.

        
                    I can only assume we've found another crippling
                      bug of some kind, this level of memory usage is
                      entirely unprecedented. What can we do?

                      
                    Thanks in advance for any suggestions.

                    
                    -Aaron

                    
      _______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

    
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com