Seeing huge number of open pipes per OSD process

I am testing a Ceph cluster running Ceph v9.0.3 on Trusty with the
4.3-rc4 kernel, and I am seeing a huge number of open pipes on my OSD
processes while running a sequential load on the system from a single
Ceph file system client.  An "lsof -n > file.txt" on one of the OSD
servers produced a 9 GB file with 101 million lines. I have 6 OSD
servers, each with around 28 spinning drives (no SSDs) of assorted
sizes from 300 GB to 4 TB.
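
For a quicker ballpark than the full lsof dump, a one-liner like this
(just a sketch; it counts lsof lines mentioning "pipe" for ceph-osd
processes, so it may overcount slightly) gives a similar picture:

# -c selects by command name; -n skips name resolution as above
lsof -n -c ceph-osd 2> /dev/null | grep -c pipe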

I wrote a script that uses the output of "ls -l /proc/OSD-PID/fd" to
count all the open files, sockets, and pipes for each OSD process, and
I am seeing over 150K open pipes per server across the OSD processes.
This is the output from one of the 6 OSD servers:

HOST: dfss03, OSD: 97, PID: 1701, total fd: 2464, Pipes: 2020, Sockets: 327, OSD Files: 111, Other Files: 6
HOST: dfss03, OSD: 47, PID: 1952, total fd: 17321, Pipes: 16580, Sockets: 599, OSD Files: 136, Other Files: 6
HOST: dfss03, OSD: 27, PID: 2111, total fd: 2531, Pipes: 2122, Sockets: 283, OSD Files: 120, Other Files: 6
HOST: dfss03, OSD: 51, PID: 2483, total fd: 13623, Pipes: 12930, Sockets: 551, OSD Files: 136, Other Files: 6
HOST: dfss03, OSD: 86, PID: 2649, total fd: 2324, Pipes: 1926, Sockets: 286, OSD Files: 106, Other Files: 6
HOST: dfss03, OSD: 92, PID: 2906, total fd: 2037, Pipes: 1650, Sockets: 271, OSD Files: 110, Other Files: 6
HOST: dfss03, OSD: 32, PID: 3644, total fd: 1919, Pipes: 1554, Sockets: 265, OSD Files: 94, Other Files: 6
HOST: dfss03, OSD: 56, PID: 3877, total fd: 2235, Pipes: 1840, Sockets: 270, OSD Files: 119, Other Files: 6
HOST: dfss03, OSD: 81, PID: 4058, total fd: 1266, Pipes: 942, Sockets: 238, OSD Files: 80, Other Files: 6
HOST: dfss03, OSD: 2, PID: 6539, total fd: 15317, Pipes: 14590, Sockets: 585, OSD Files: 136, Other Files: 6
HOST: dfss03, OSD: 37, PID: 7285, total fd: 2490, Pipes: 2058, Sockets: 303, OSD Files: 123, Other Files: 6
HOST: dfss03, OSD: 121, PID: 10396, total fd: 15670, Pipes: 14956, Sockets: 572, OSD Files: 136, Other Files: 6
HOST: dfss03, OSD: 76, PID: 13786, total fd: 17741, Pipes: 17000, Sockets: 599, OSD Files: 136, Other Files: 6
HOST: dfss03, OSD: 73, PID: 14061, total fd: 2127, Pipes: 1744, Sockets: 271, OSD Files: 106, Other Files: 6
HOST: dfss03, OSD: 136, PID: 18644, total fd: 16408, Pipes: 15702, Sockets: 564, OSD Files: 136, Other Files: 6
HOST: dfss03, OSD: 23, PID: 21892, total fd: 11883, Pipes: 11198, Sockets: 543, OSD Files: 136, Other Files: 6
HOST: dfss03, OSD: 8, PID: 24732, total fd: 11611, Pipes: 10952, Sockets: 517, OSD Files: 136, Other Files: 6
HOST: dfss03, OSD: 140, PID: 25935, total fd: 2329, Pipes: 1926, Sockets: 288, OSD Files: 109, Other Files: 6
HOST: dfss03, OSD: 18, PID: 27969, total fd: 2589, Pipes: 2190, Sockets: 281, OSD Files: 112, Other Files: 6
HOST: dfss03, OSD: 131, PID: 28158, total fd: 1513, Pipes: 1176, Sockets: 229, OSD Files: 102, Other Files: 6
HOST: dfss03, OSD: 12, PID: 28702, total fd: 2464, Pipes: 2050, Sockets: 292, OSD Files: 116, Other Files: 6
HOST: dfss03, OSD: 116, PID: 40256, total fd: 16070, Pipes: 15348, Sockets: 581, OSD Files: 136, Other Files: 5
HOST: dfss03, OSD: 106, PID: 41265, total fd: 2206, Pipes: 1816, Sockets: 273, OSD Files: 111, Other Files: 6
HOST: dfss03, OSD: 112, PID: 43745, total fd: 2016, Pipes: 1624, Sockets: 272, OSD Files: 114, Other Files: 6
HOST: dfss03, OSD: 42, PID: 44213, total fd: 1215, Pipes: 908, Sockets: 214, OSD Files: 87, Other Files: 6
HOST: dfss03, OSD: 101, PID: 61234, total fd: 15438, Pipes: 14722, Sockets: 574, OSD Files: 136, Other Files: 6
HOST: dfss03, OSD: 68, PID: 63906, total fd: 2404, Pipes: 2032, Sockets: 254, OSD Files: 112, Other Files: 6
HOST: dfss03, OSD: 62, PID: 65931, total fd: 1920, Pipes: 1564, Sockets: 248, OSD Files: 102, Other Files: 6
HOST: dfss03, OSD: 125, PID: 73596, total fd: 13319, Pipes: 12630, Sockets: 547, OSD Files: 136, Other Files: 6
HOST: dfss03, Grand Total fd: 202450, Pipes: 187750, Sockets: 11097, OSD Files: 3430, Other Files: 173

The Grand Total lines for all 6 OSD Servers are:
HOST: dfss01, Grand Total fd: 209425, Pipes: 194002, Sockets: 11653, OSD Files: 3590, Other Files: 180
HOST: dfss02, Grand Total fd: 171960, Pipes: 159564, Sockets: 9390, OSD Files: 2862, Other Files: 144
HOST: dfss03, Grand Total fd: 202450, Pipes: 187750, Sockets: 11097, OSD Files: 3430, Other Files: 173
HOST: dfss04, Grand Total fd: 191108, Pipes: 176800, Sockets: 10724, OSD Files: 3410, Other Files: 174
HOST: dfss05, Grand Total fd: 182578, Pipes: 168880, Sockets: 10275, OSD Files: 3261, Other Files: 162
HOST: dfss06, Grand Total fd: 193653, Pipes: 179438, Sockets: 10649, OSD Files: 3392, Other Files: 174

The number of open files keeps increasing over time. The cluster is
healthy and is moving data, but I don't think this is good, and I am
not sure what to look at. Is this expected?
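
If it helps to rule out the OSDs simply running into a descriptor
limit, here is a rough check (a minimal sketch; it reads the soft
"Max open files" value from /proc/PID/limits) that compares each
OSD's current fd count against its limit:

# Compare open fd count to the soft fd limit for each ceph-osd
for pid in `pgrep ceph-osd`
do
   count=`ls /proc/$pid/fd 2> /dev/null | wc -l`
   limit=`awk '/Max open files/ {print $4}' /proc/$pid/limits`
   echo "PID $pid: $count open fds, soft limit $limit"
done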


More info on the systems and setup:

The pipe entries in the fd directory look like:
lr-x------ 1 root root 64 Oct  5 19:55 979 -> pipe:[364792]
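
Picking one of those inode numbers, a scan like the following (just a
sketch; 364792 is the inode from the example line above) lists every
process and fd holding the same pipe, which at least shows whether
both ends stay inside one OSD process:

# Find every fd in /proc that points at the same anonymous pipe
INODE=364792
for fd in /proc/[0-9]*/fd/*
do
   if [ "`readlink $fd 2> /dev/null`" = "pipe:[$INODE]" ]
   then
      pid=`echo $fd | cut -d/ -f3`
      echo "$fd (comm: `cat /proc/$pid/comm 2> /dev/null`)"
   fi
done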

There is a single Ceph file system client, using the kernel interface,
that is running the following script to generate the load:

# Repeatedly write a ~100 GB file, read it back, then delete it
PID=$$
while true
do
  dd if=/dev/zero of=test.dat.$PID count=100000 bs=1M
  sleep 10
  dd if=test.dat.$PID of=/dev/null bs=1M
  rm test.dat.$PID
done

The OSDs are using BTRFS with the following options:
  filestore_btrfs_snap = false
  filestore_btrfs_clone_range = false
  filestore_journal_parallel = false
  osd_mount_options_btrfs = rw,noatime,autodefrag,user_subvol_rm_allowed
  osd_mkfs_options_btrfs = -f -m single -n 32768
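
To double-check that the running daemons actually picked these up, one
of them can be queried over its admin socket, something like this
(osd.97 is just one of the OSDs on dfss03 from the listing above):

# Show the btrfs/filestore settings the daemon is actually running with
ceph daemon osd.97 config show | grep -E 'btrfs|filestore_journal_parallel'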

Here is the script I wrote to count the open files:
#!/bin/bash
HOST=`hostname`
for I in `pgrep ceph-osd`
do
   OSD=`ps auxwww | grep "^root *$I " | perl -nae 'chomp;print $F[13]'`
   TF=`ls -l /proc/$I/fd 2> /dev/null | wc -l`
   PIPES=`ls -l /proc/$I/fd 2> /dev/null | grep -cF 'pipe:[' `
   SOCKETS=`ls -l /proc/$I/fd 2> /dev/null | grep -cF 'socket:[' `
   CEPHF=`ls -l /proc/$I/fd 2> /dev/null |  grep -cF '> /var/lib/ceph/osd/' `
   NCEPHF=$((TF-PIPES-SOCKETS-CEPHF))
   TSOCKETS=$((TSOCKETS+SOCKETS))
   TPIPES=$(($TPIPES+PIPES))
   TNCEPHF=$((TNCEPHF+NCEPHF))
   TCEPHF=$((TCEPHF+CEPHF))
   TTF=$((TTF+TF))
   echo "HOST: $HOST, OSD: $OSD, PID: $I, total fd: $TF, Pipes:
$PIPES, Sockets: $SOCKETS, OSD Files: $CEPHF, Other Files: $NCEPHF"
done
echo "HOST: $HOST, Grand Total fd: $TTF, Pipes: $TPIPES, Sockets:
$TSOCKETS, OSD Files: $TCEPHF, Other Files: $TNCEPHF"
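
For completeness, one way to collect the per-host output above from
all six servers (count_fds.sh is just a placeholder name for the
script above):

# Run the fd-counting script on every OSD server over ssh
for h in dfss01 dfss02 dfss03 dfss04 dfss05 dfss06
do
   ssh root@$h 'bash -s' < count_fds.sh
done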


I have Ceph file system snapshots turned on and in use on this cluster.
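(These are the standard CephFS snapshots, taken by creating a
directory under .snap; the path below is only illustrative:)

mkdir /mnt/cephfs/some/dir/.snap/snap-20151005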

$ uname -a
Linux dfss06 4.3.0-040300rc4-generic #201510041330 SMP Sun Oct 4
17:32:10 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ ceph -v
ceph version 9.0.3 (7295612d29f953f46e6e88812ef372b89a43b9da)

$ ceph -s
 cluster c261c2dc-5e29-11e5-98ba-68b599c50db0
  health HEALTH_OK
  monmap e1: 3 mons at
{dfmon01=10.16.51.21:6789/0,dfmon02=10.16.51.22:6789/0,dfmon03=10.16.51.23:6789/0}
   election epoch 18, quorum 0,1,2 dfmon01,dfmon02,dfmon03
    mdsmap e3269: 1/1/1 up {0=dfmds02=up:active}, 1 up:standby
    osdmap e36471: 176 osds: 168 up, 168 in
    pgmap v702480: 18496 pgs, 4 pools, 49562 GB data, 12601 kobjects
         145 TB used, 100 TB / 246 TB avail
          18496 active+clean
  client io 24082 kB/s rd, 11 op/s

Thanks!
Eric


