Re: ceph-fuse segfaults ( jewel 10.2.2)

Hi Brad, Patrick, All...

I think I've understood this second problem. In summary, it is memory-related.

This is how I found the source of the problem:

1./ I copied and adapted the user application to run in another cluster of ours. The idea was for me to understand the application and run it myself to collect logs and so on...

2./ Once I submitted it to this other cluster, everything went fine. I was hammering cephfs from multiple nodes without problems. This pointed to something different between the two clusters.

3./ I started to look more closely at the segmentation fault message and, assuming that the names of the methods and functions mean something, the log seems related to issues in the management of objects in cache. This pointed to a memory-related problem.

4./ On the cluster where the application ran successfully, machines have 48 GB of RAM and 96 GB of swap (I don't know why we have such a large swap size; it is a legacy setup).

# top
top - 00:34:01 up 23 days, 22:21,  1 user,  load average: 12.06, 12.12, 10.40
Tasks: 683 total,  13 running, 670 sleeping,   0 stopped,   0 zombie
Cpu(s): 49.7%us,  0.6%sy,  0.0%ni, 49.7%id,  0.1%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  49409308k total, 29692548k used, 19716760k free,   433064k buffers
Swap: 98301948k total,        0k used, 98301948k free, 26742484k cached

5./ I have noticed that ceph-fuse (in 10.2.2) consumes about 1.5 GB of virtual memory when there are no applications using the filesystem.

 7152 root      20   0 1108m  12m 5496 S  0.0  0.0   0:00.04 ceph-fuse

When I have only one instance of the user application running, ceph-fuse (in 10.2.2) slowly rises with time up to 10 GB of memory usage.

If I submit a large number of user applications simultaneously, ceph-fuse goes very fast to ~10 GB.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                             
18563 root      20   0 10.0g 328m 5724 S  4.0  0.7   1:38.00 ceph-fuse                                                           
 4343 root      20   0 3131m 237m  12m S  0.0  0.5  28:24.56 dsm_om_connsvcd                                                     
 5536 goncalo   20   0 1599m  99m  32m R 99.9  0.2  31:35.46 python                                                              
31427 goncalo   20   0 1597m  89m  20m R 99.9  0.2  31:35.88 python                                                              
20504 goncalo   20   0 1599m  89m  20m R 100.2  0.2  31:34.29 python                                                             
20508 goncalo   20   0 1599m  89m  20m R 99.9  0.2  31:34.20 python                                                              
 4973 goncalo   20   0 1599m  89m  20m R 99.9  0.2  31:35.70 python                                                              
 1331 goncalo   20   0 1597m  88m  20m R 99.9  0.2  31:35.72 python                                                              
20505 goncalo   20   0 1597m  88m  20m R 99.9  0.2  31:34.46 python                                                              
20507 goncalo   20   0 1599m  87m  20m R 99.9  0.2  31:34.37 python                                                              
28375 goncalo   20   0 1597m  86m  20m R 99.9  0.2  31:35.52 python                                                              
20503 goncalo   20   0 1597m  85m  20m R 100.2  0.2  31:34.09 python                                                             
20506 goncalo   20   0 1597m  84m  20m R 99.5  0.2  31:34.42 python                                                              
20502 goncalo   20   0 1597m  83m  20m R 99.9  0.2  31:34.32 python  

6./ On the machines where the user had the segfault, we have 16 GB of RAM and 1 GB of swap:

Mem:  16334244k total,  3590100k used, 12744144k free,   221364k buffers
Swap:  1572860k total,    10512k used,  1562348k free,  2937276k cached

7./ I think what is happening is that once the user submits his set of jobs, the memory usage goes to the very limit on this type of machine, and the rise is so fast that ceph-fuse segfaults before the OOM killer can kill it. (A simple way to watch the growth is sketched after this list.)

8./ We have run the user application on the same type of machine but with 64 GB of RAM and 1 GB of swap, and everything went fine there as well.
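
A minimal sketch of the memory watch mentioned in 7./ (the 10-second interval and the single ceph-fuse instance are my own assumptions):

pid=$(pidof ceph-fuse)                          # assumes a single ceph-fuse instance
while kill -0 "$pid" 2>/dev/null; do            # loop while the process is alive
    grep -E 'VmSize|VmRSS' /proc/"$pid"/status  # virtual and resident sizes
    sleep 10
done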


So, in conclusion, our second problem (besides the locks issue, which was fixed by Pat's patch) is the memory usage profile of ceph-fuse in 10.2.2, which seems to be very different from what it was in ceph-fuse 9.2.0.

Are there any ideas on how we can limit the virtual memory usage of ceph-fuse in 10.2.2?
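
One thing I am wondering about is whether shrinking the client-side caches in ceph.conf would help. If I read the jewel docs correctly, the relevant [client] options are the ones below; the values are just examples to illustrate the idea, not tested recommendations:

[client]
# object cacher size in bytes (jewel default is 200 MB, I believe)
client_oc_size = 104857600
# max dirty bytes held in the object cacher before writeback starts (default 100 MB, I believe)
client_oc_max_dirty = 52428800
# max number of objects kept in the object cacher (default 1000, I believe)
client_oc_max_objects = 500
# number of inodes kept in the client metadata cache (default 16384, I believe)
client_cache_size = 8192

Whether these bound the total virtual size or only the caches, I don't know.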

Cheers
Goncalo



On 07/08/2016 09:54 AM, Brad Hubbard wrote:
Hi Goncalo,

If possible it would be great if you could capture a core file for this with
full debugging symbols (preferably glibc debuginfo as well). How you do
that will depend on the ceph version and your OS, but we can offer help
if required, I'm sure.
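
For example, on a RHEL/CentOS style machine the rough steps would be
something like this (package names and the core location vary by distro):

$ ulimit -c unlimited        # allow core files in the shell that starts ceph-fuse
$ sudo sh -c 'echo /tmp/core.%e.%p > /proc/sys/kernel/core_pattern'   # where cores land
$ sudo debuginfo-install ceph-fuse glibc    # install matching debug symbols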

Once you have the core, do the following.

$ gdb /path/to/ceph-fuse core.XXXX
(gdb) set pag off
(gdb) set log on
(gdb) thread apply all bt
(gdb) thread apply all bt full

Then quit gdb and you should find a file called gdb.txt in your
working directory. If you could attach that file to
http://tracker.ceph.com/issues/16610, that would be great.

Cheers,
Brad

On Fri, Jul 8, 2016 at 12:06 AM, Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
On Thu, Jul 7, 2016 at 2:01 AM, Goncalo Borges
<goncalo.borges@xxxxxxxxxxxxx> wrote:
Unfortunately, the other user application breaks ceph-fuse again (it is a
completely different application than the one in my previous test).

We have tested it on 4 machines with 4 cores each. The user is submitting 16
single-core jobs which are all writing different output files (one per job)
to a common dir in cephfs. The first 4 jobs run happily and never break
ceph-fuse, but the remaining 12 jobs, running on the remaining 3 machines,
trigger a segmentation fault which is completely different from the other
case.

ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
1: (()+0x297fe2) [0x7f54402b7fe2]
2: (()+0xf7e0) [0x7f543ecf77e0]
3: (ObjectCacher::bh_write_scattered(std::list<ObjectCacher::BufferHead*,
std::allocator<ObjectCacher::BufferHead*> >&)+0x36) [0x7f5440268086]
4: (ObjectCacher::bh_write_adjacencies(ObjectCacher::BufferHead*,
std::chrono::time_point<ceph::time_detail::real_clock,
std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> > >, long*,
int*)+0x22c) [0x7f5440268a3c]
5: (ObjectCacher::flush(long)+0x1ef) [0x7f5440268cef]
6: (ObjectCacher::flusher_entry()+0xac4) [0x7f5440269a34]
7: (ObjectCacher::FlusherThread::entry()+0xd) [0x7f5440275c6d]
8: (()+0x7aa1) [0x7f543ecefaa1]
9: (clone()+0x6d) [0x7f543df6893d]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.

This one looks like a very different problem. I've created an issue
here: http://tracker.ceph.com/issues/16610

Thanks for the report and debug log!

--
Patrick Donnelly



-- 
Goncalo Borges
Research Computing
ARC Centre of Excellence for Particle Physics at the Terascale
School of Physics A28 | University of Sydney, NSW  2006
T: +61 2 93511937
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
