CephFS client issue

On Monday, June 15, 2015 3:05 AM, "ceph-users-request@xxxxxxxxxxxxxx" <ceph-users-request@xxxxxxxxxxxxxx> wrote:


Send ceph-users mailing list submissions to
    ceph-users@xxxxxxxxxxxxxx

To subscribe or unsubscribe via the World Wide Web, visit
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
or, via email, send a message with subject or body 'help' to
    ceph-users-request@xxxxxxxxxxxxxx

You can reach the person managing the list at
    ceph-users-owner@xxxxxxxxxxxxxx

When replying, please edit your Subject line so it is more specific
than "Re: Contents of ceph-users digest..."


Today's Topics:

  1. Re: Erasure coded pools and bit-rot protection (Paweł Sadowski)
  2. CephFS client issue (Matteo Dacrema)
  3. Re: Erasure coded pools and bit-rot protection (Gregory Farnum)
  4. Re: CephFS client issue (Lincoln Bryant)
  5. Re: .New Ceph cluster - cannot add additional monitor
      (Mike Carlson)
  6. Re: CephFS client issue (Matteo Dacrema)


----------------------------------------------------------------------

Message: 1
Date: Sat, 13 Jun 2015 21:08:25 +0200
From: Paweł Sadowski <ceph@xxxxxxxxx>
To: Gregory Farnum <greg@xxxxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxxx>
Subject: Re: [ceph-users] Erasure coded pools and bit-rot protection
Message-ID: <557C7FA9.1020604@xxxxxxxxx>
Content-Type: text/plain; charset=utf-8

Thanks for taking care of this so fast. Yes, I'm getting a broken object.
I haven't checked this on other versions, but is this bug present
only in Hammer or in all versions?


On 12.06.2015 at 21:43, Gregory Farnum wrote:
> Okay, Sam thinks he knows what's going on; here's a ticket:
> http://tracker.ceph.com/issues/12000
>
> On Fri, Jun 12, 2015 at 12:32 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> On Fri, Jun 12, 2015 at 1:07 AM, Paweł Sadowski <ceph@xxxxxxxxx> wrote:
>>> Hi All,
>>>
>>> I'm testing erasure coded pools. Is there any protection from bit-rot
>>> errors on object read? If I modify one bit in an object part (directly on
>>> the OSD) I get a *broken* object:
>> Sorry, are you saying that you're getting a broken object if you flip
>> a bit in an EC pool? That should detect the chunk as invalid and
>> reconstruct on read...
>> -Greg
>>
>>>    mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
>>>    bb2d82bbb95be6b9a039d135cc7a5d0d  -
>>>
>>>    # modify one bit directly on OSD
>>>
>>>    mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
>>>    02f04f590010b4b0e6af4741c4097b4f  -
>>>
>>>    # restore bit to original value
>>>
>>>    mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
>>>    bb2d82bbb95be6b9a039d135cc7a5d0d  -
>>>
>>> If I run a deep-scrub on the modified bit I get an inconsistent PG, which is
>>> correct in this case. After restoring the bit and running deep-scrub again,
>>> all PGs are clean.
>>>
>>>
>>> [ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)]
--
PS
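
For anyone reproducing this test, the scrub side of it can be driven with the
standard commands below (a rough sketch; the pool and object names come from the
example above, the PG id is a placeholder):

    # find the PG and the OSDs holding the object's shards
    ceph osd map ecpool `hostname -f`_16

    # after flipping a bit in one shard, deep-scrub that PG to surface the error
    ceph pg deep-scrub <pgid>
    ceph health detail    # the PG should now be reported as inconsistent

    # once the shard is restored, another deep-scrub returns the PG to clean
    ceph pg deep-scrub <pgid>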


------------------------------

Message: 2
Date: Sun, 14 Jun 2015 15:26:54 +0000
From: Matteo Dacrema <mdacrema@xxxxxxxx>
To: ceph-users <ceph-users@xxxxxxxx>
Subject: CephFS client issue
Message-ID: <d28e061762104ed68e06effd5199ef06@Exch2013Mb.enter.local>
Content-Type: text/plain; charset="us-ascii"

Hi all,


I'm using CephFS on Hammer and sometimes I need to reboot one or more clients because, as ceph -s tells me, they are "failing to respond to capability release". After that, all clients stop responding: they can't access files or mount/umount CephFS.

I have 1.5 million files, 2 metadata servers in an active/standby configuration with 8 GB of RAM, 20 clients with 2 GB of RAM each, and 2 OSD nodes with 4 x 80 GB OSDs and 4 GB of RAM.



Here is my configuration:


[global]
        fsid = 2de7b17f-0a3e-4109-b878-c035dd2f7735
        mon_initial_members = cephmds01
        mon_host = 10.29.81.161
        auth_cluster_required = cephx
        auth_service_required = cephx
        auth_client_required = cephx
        public network = 10.29.81.0/24
        tcp nodelay = true
        tcp rcvbuf = 0
        ms tcp read timeout = 600

        #Capacity
        mon osd full ratio = .95
        mon osd nearfull ratio = .85


[osd]
        osd journal size = 1024
        journal dio = true
        journal aio = true

        osd op threads = 2
        osd op thread timeout = 60
        osd disk threads = 2
        osd recovery threads = 1
        osd recovery max active = 1
        osd max backfills = 2


        # Pool
        osd pool default size = 2

        #XFS
        osd mkfs type = xfs
        osd mkfs options xfs = "-f -i size=2048"
        osd mount options xfs = "rw,noatime,inode64,logbsize=256k,delaylog"

        #FileStore Settings
        filestore xattr use omap = false
        filestore max inline xattr size = 512
        filestore max sync interval = 10
        filestore merge threshold = 40
        filestore split multiple = 8
        filestore flusher = false
        filestore queue max ops = 2000
        filestore queue max bytes = 536870912
        filestore queue committing max ops = 500
        filestore queue committing max bytes = 268435456
        filestore op threads = 2

[mds]
        max mds = 1
        mds cache size = 750000
        client cache size = 2048
        mds dir commit ratio = 0.5



Here is the ceph -s output:


root@service-new:~# ceph -s
    cluster 2de7b17f-0a3e-4109-b878-c035dd2f7735
    health HEALTH_WARN
            mds0: Client 94102 failing to respond to cache pressure
    monmap e2: 2 mons at {cephmds01=10.29.81.161:6789/0,cephmds02=10.29.81.160:6789/0}
            election epoch 34, quorum 0,1 cephmds02,cephmds01
    mdsmap e79: 1/1/1 up {0=cephmds01=up:active}, 1 up:standby
    osdmap e669: 8 osds: 8 up, 8 in
      pgmap v339741: 256 pgs, 2 pools, 132 GB data, 1417 kobjects
            288 GB used, 342 GB / 631 GB avail
                256 active+clean
  client io 3091 kB/s rd, 342 op/s

Thank you.
Regards,
Matteo
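
When a client is flagged for failing to release caps, the MDS admin socket is
usually the first place to look (a sketch, assuming the admin socket is at its
default path on the MDS host and that these asok commands exist in your release):

    # on the active MDS (cephmds01 in the ceph -s output above):
    # list client sessions, including how many caps each client holds
    ceph daemon mds.cephmds01 session ls

    # dump MDS perf counters; compare inode/cap counts against mds cache size
    ceph daemon mds.cephmds01 perf dump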





------------------------------

Message: 3
Date: Sun, 14 Jun 2015 16:32:33 +0000
From: Gregory Farnum <greg@xxxxxxxxxxx>
To: ceph@xxxxxxxxx
Cc: ceph-users <ceph-users@xxxxxxxx>
Subject: Re: Erasure coded pools and bit-rot protection
Message-ID:
    <CAC6JEv-XwreCeYHvsD3fHO0BnkeJWEw_Vk0pezei5JFi1uwrGA@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"

Unfortunately this will be an issue in all versions of the code. I can't
speak with authority but I suspect Sam will want to backport the fix to
Firefly as well.
-Greg
On Sat, Jun 13, 2015 at 8:08 PM Paweł Sadowski <ceph@xxxxxxxxx> wrote:

> Thanks for taking care of this so fast. Yes, I'm getting a broken object.
> I haven't checked this on other versions, but is this bug present
> only in Hammer or in all versions?
>
>
> On 12.06.2015 at 21:43, Gregory Farnum wrote:
> > Okay, Sam thinks he knows what's going on; here's a ticket:
> > http://tracker.ceph.com/issues/12000
> >
> > On Fri, Jun 12, 2015 at 12:32 PM, Gregory Farnum <greg@xxxxxxxxxxx>
> wrote:
> >> On Fri, Jun 12, 2015 at 1:07 AM, Paweł Sadowski <ceph@xxxxxxxxx> wrote:
> >>> Hi All,
> >>>
> >>> I'm testing erasure coded pools. Is there any protection from bit-rot
> >>> errors on object read? If I modify one bit in an object part (directly on
> >>> the OSD) I get a *broken* object:
> >> Sorry, are you saying that you're getting a broken object if you flip
> >> a bit in an EC pool? That should detect the chunk as invalid and
> >> reconstruct on read...
> >> -Greg
> >>
> >>>    mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
> >>>    bb2d82bbb95be6b9a039d135cc7a5d0d  -
> >>>
> >>>    # modify one bit directly on OSD
> >>>
> >>>    mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
> >>>    02f04f590010b4b0e6af4741c4097b4f  -
> >>>
> >>>    # restore bit to original value
> >>>
> >>>    mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
> >>>    bb2d82bbb95be6b9a039d135cc7a5d0d  -
> >>>
> >>> If I run a deep-scrub on the modified bit I get an inconsistent PG,
> >>> which is correct in this case. After restoring the bit and running
> >>> deep-scrub again, all PGs are clean.
> >>>
> >>>
> >>> [ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)]
> --
> PS
>

------------------------------

Message: 4
Date: Sun, 14 Jun 2015 12:31:32 -0500
From: Lincoln Bryant <lincolnb@xxxxxxxxxxxx>
To: Matteo Dacrema <mdacrema@xxxxxxxx>, ceph-users
    <ceph-users@xxxxxxxx>
Subject: Re: CephFS client issue
Message-ID: <557DBA74.3020704@xxxxxxxxxxxx>
Content-Type: text/plain; charset="windows-1252"; Format="flowed"

Hi Matteo,

Are your clients using the FUSE client or the kernel client? If the
latter, what kernel version?

--Lincoln
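
One quick way to answer that on a client is the filesystem type reported by
mount (the mount point below is illustrative, not from the thread):

    # kernel client mounts report type "ceph"
    mount | grep ' type ceph '
    #   10.29.81.161:6789:/ on /mnt/cephfs type ceph (rw,...)

    # ceph-fuse mounts report type "fuse.ceph-fuse"
    mount | grep fuse.ceph
    #   ceph-fuse on /mnt/cephfs type fuse.ceph-fuse (rw,...)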

On 6/14/2015 10:26 AM, Matteo Dacrema wrote:
> Hi all,
>
>
> I'm using CephFS on Hammer and sometimes I need to reboot one or more clients because, as ceph -s tells me, they are "failing to respond to capability release". After that, all clients stop responding: they can't access files or mount/umount CephFS.
>
> I have 1.5 million files, 2 metadata servers in an active/standby configuration with 8 GB of RAM, 20 clients with 2 GB of RAM each, and 2 OSD nodes with 4 x 80 GB OSDs and 4 GB of RAM.
>
>
>
> Here is my configuration:
>
>
> [global]
>          fsid = 2de7b17f-0a3e-4109-b878-c035dd2f7735
>          mon_initial_members = cephmds01
>          mon_host = 10.29.81.161
>          auth_cluster_required = cephx
>          auth_service_required = cephx
>          auth_client_required = cephx
>          public network = 10.29.81.0/24
>          tcp nodelay = true
>          tcp rcvbuf = 0
>          ms tcp read timeout = 600
>
>          #Capacity
>          mon osd full ratio = .95
>          mon osd nearfull ratio = .85
>
>
> [osd]
>          osd journal size = 1024
>          journal dio = true
>          journal aio = true
>
>          osd op threads = 2
>          osd op thread timeout = 60
>          osd disk threads = 2
>          osd recovery threads = 1
>          osd recovery max active = 1
>          osd max backfills = 2
>
>
>          # Pool
>          osd pool default size = 2
>
>          #XFS
>          osd mkfs type = xfs
>          osd mkfs options xfs = "-f -i size=2048"
>          osd mount options xfs = "rw,noatime,inode64,logbsize=256k,delaylog"
>
>          #FileStore Settings
>          filestore xattr use omap = false
>          filestore max inline xattr size = 512
>          filestore max sync interval = 10
>          filestore merge threshold = 40
>          filestore split multiple = 8
>          filestore flusher = false
>          filestore queue max ops = 2000
>          filestore queue max bytes = 536870912
>          filestore queue committing max ops = 500
>          filestore queue committing max bytes = 268435456
>          filestore op threads = 2
>
> [mds]
>          max mds = 1
>          mds cache size = 750000
>          client cache size = 2048
>          mds dir commit ratio = 0.5
>
>
>
> Here is the ceph -s output:
>
>
> root@service-new:~# ceph -s
>      cluster 2de7b17f-0a3e-4109-b878-c035dd2f7735
>      health HEALTH_WARN
>              mds0: Client 94102 failing to respond to cache pressure
>      monmap e2: 2 mons at {cephmds01=10.29.81.161:6789/0,cephmds02=10.29.81.160:6789/0}
>              election epoch 34, quorum 0,1 cephmds02,cephmds01
>      mdsmap e79: 1/1/1 up {0=cephmds01=up:active}, 1 up:standby
>      osdmap e669: 8 osds: 8 up, 8 in
>        pgmap v339741: 256 pgs, 2 pools, 132 GB data, 1417 kobjects
>              288 GB used, 342 GB / 631 GB avail
>                  256 active+clean
>    client io 3091 kB/s rd, 342 op/s
>
> Thank you.
> Regards,
> Matteo
>
>
>
>
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


------------------------------

Message: 5
Date: Sun, 14 Jun 2015 11:32:46 -0700
From: Mike Carlson <mike@xxxxxxxxxxxx>
To: Alex Muntada <alexm@xxxxxxxxx>
Cc: ceph-users@xxxxxxxx
Subject: Re: .New Ceph cluster - cannot add additional
    monitor
Message-ID:
    <CA+KW7xQRCcfz+enHgWODSO86j3ni4WFA7XvF0uG=gTwAgb0AAA@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"

Thank you for the reply, Alex. I'm going to check into that and see if it
helps resolve the issue.

Mike C

On Fri, Jun 12, 2015 at 11:57 PM, Alex Muntada <alexm@xxxxxxxxx> wrote:

> We've recently found similar problems creating a new cluster over an older
> one, even after using "ceph-deploy purge", because some of the data
> remained in /var/lib/ceph/*/* (Ubuntu Trusty) and the nodes were trying to
> use old keyrings.
>
> Hope it helps,
> Alex
>
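
A sketch of the full cleanup Alex describes, for reference (host names are
placeholders; double-check before deleting anything):

    # remove the packages, then the leftover data and keyrings on each node
    ceph-deploy purge node1 node2 node3
    ceph-deploy purgedata node1 node2 node3
    ceph-deploy forgetkeys

    # verify nothing stale is left behind on the nodes
    ls /var/lib/ceph/mon /var/lib/ceph/osd /var/lib/ceph/bootstrap-*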

------------------------------

Message: 6
Date: Sun, 14 Jun 2015 19:00:47 +0000
From: Matteo Dacrema <mdacrema@xxxxxxxx>
To: Lincoln Bryant <lincolnb@xxxxxxxxxxxx>, ceph-users
    <ceph-users@xxxxxxxx>
Subject: Re: CephFS client issue
Message-ID: <deb408dee1b74d8d88221edbd72d1cd1@Exch2013Mb.enter.local>
Content-Type: text/plain; charset="us-ascii"

Hi Lincoln,


I'm using the kernel client.

Kernel version is: 3.13.0-53-generic


Thanks,

Matteo
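
If it helps to rule the kernel client in or out, the same filesystem can be
mounted with ceph-fuse for comparison (a sketch; the mount point and keyring
path are assumptions, only the monitor address comes from the configuration
above):

    # FUSE client (needs the ceph-fuse package and a client keyring)
    ceph-fuse -m 10.29.81.161:6789 /mnt/cephfs

    # kernel client mount, for comparison
    mount -t ceph 10.29.81.161:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret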

________________________________
From: Lincoln Bryant <lincolnb@xxxxxxxxxxxx>
Sent: Sunday, June 14, 2015 19:31
To: Matteo Dacrema; ceph-users
Subject: Re: CephFS client issue

Hi Matteo,

Are your clients using the FUSE client or the kernel client? If the latter, what kernel version?

--Lincoln

On 6/14/2015 10:26 AM, Matteo Dacrema wrote:

Hi all,


I'm using CephFS on Hammer and sometimes I need to reboot one or more clients because, as ceph -s tells me, they are "failing to respond to capability release". After that, all clients stop responding: they can't access files or mount/umount CephFS.

I have 1.5 million files, 2 metadata servers in an active/standby configuration with 8 GB of RAM, 20 clients with 2 GB of RAM each, and 2 OSD nodes with 4 x 80 GB OSDs and 4 GB of RAM.



Here is my configuration:


[global]
        fsid = 2de7b17f-0a3e-4109-b878-c035dd2f7735
        mon_initial_members = cephmds01
        mon_host = 10.29.81.161
        auth_cluster_required = cephx
        auth_service_required = cephx
        auth_client_required = cephx
        public network = 10.29.81.0/24
        tcp nodelay = true
        tcp rcvbuf = 0
        ms tcp read timeout = 600

        #Capacity
        mon osd full ratio = .95
        mon osd nearfull ratio = .85


[osd]
        osd journal size = 1024
        journal dio = true
        journal aio = true

        osd op threads = 2
        osd op thread timeout = 60
        osd disk threads = 2
        osd recovery threads = 1
        osd recovery max active = 1
        osd max backfills = 2


        # Pool
        osd pool default size = 2

        #XFS
        osd mkfs type = xfs
        osd mkfs options xfs = "-f -i size=2048"
        osd mount options xfs = "rw,noatime,inode64,logbsize=256k,delaylog"

        #FileStore Settings
        filestore xattr use omap = false
        filestore max inline xattr size = 512
        filestore max sync interval = 10
        filestore merge threshold = 40
        filestore split multiple = 8
        filestore flusher = false
        filestore queue max ops = 2000
        filestore queue max bytes = 536870912
        filestore queue committing max ops = 500
        filestore queue committing max bytes = 268435456
        filestore op threads = 2

[mds]
        max mds = 1
        mds cache size = 750000
        client cache size = 2048
        mds dir commit ratio = 0.5



Here is the ceph -s output:


root@service-new:~# ceph -s
    cluster 2de7b17f-0a3e-4109-b878-c035dd2f7735
    health HEALTH_WARN
            mds0: Client 94102 failing to respond to cache pressure
    monmap e2: 2 mons at {cephmds01=10.29.81.161:6789/0,cephmds02=10.29.81.160:6789/0}
            election epoch 34, quorum 0,1 cephmds02,cephmds01
    mdsmap e79: 1/1/1 up {0=cephmds01=up:active}, 1 up:standby
    osdmap e669: 8 osds: 8 up, 8 in
      pgmap v339741: 256 pgs, 2 pools, 132 GB data, 1417 kobjects
            288 GB used, 342 GB / 631 GB avail
                256 active+clean
  client io 3091 kB/s rd, 342 op/s

Thank you.
Regards,
Matteo









_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx<mailto:ceph-users@xxxxxxxxxxxxxx>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




------------------------------

Subject: Digest Footer

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


------------------------------

End of ceph-users Digest, Vol 29, Issue 15
******************************************


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
