On Monday, June 15, 2015 3:05 AM, "ceph-users-request@xxxxxxxxxxxxxx" <ceph-users-request@xxxxxxxxxxxxxx> wrote:
Send ceph-users mailing list submissions to
ceph-users@xxxxxxxxxxxxxx
To subscribe or unsubscribe via the World Wide Web, visit
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
or, via email, send a message with subject or body 'help' to
ceph-users-request@xxxxxxxxxxxxxx
You can reach the person managing the list at
ceph-users-owner@xxxxxxxxxxxxxx
When replying, please edit your Subject line so it is more specific
than "Re: Contents of ceph-users digest..."
Today's Topics:
1. Re: Erasure coded pools and bit-rot protection (Paweł Sadowski)
2. CephFS client issue (Matteo Dacrema)
3. Re: Erasure coded pools and bit-rot protection (Gregory Farnum)
4. Re: CephFS client issue (Lincoln Bryant)
5. Re: .New Ceph cluster - cannot add additional monitor
(Mike Carlson)
6. Re: CephFS client issue (Matteo Dacrema)
----------------------------------------------------------------------
Message: 1
Date: Sat, 13 Jun 2015 21:08:25 +0200
From: Paweł Sadowski <ceph@xxxxxxxxx>
To: Gregory Farnum <greg@xxxxxxxxxxx>
Cc: ceph-users <ceph-users@xxxxxxxx>
Subject: Re: [ceph-users] Erasure coded pools and bit-rot protection
Message-ID: <557C7FA9.1020604@xxxxxxxxx>
Content-Type: text/plain; charset=utf-8
Thanks for taking care of this so fast. Yes, I'm getting a broken object.
I haven't checked this on other versions, but is this bug present
only in Hammer, or in all versions?
On 12.06.2015 at 21:43, Gregory Farnum wrote:
> Okay, Sam thinks he knows what's going on; here's a ticket:
> http://tracker.ceph.com/issues/12000
>
> On Fri, Jun 12, 2015 at 12:32 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> On Fri, Jun 12, 2015 at 1:07 AM, Paweł Sadowski <ceph@xxxxxxxxx> wrote:
>>> Hi All,
>>>
>>> I'm testing erasure coded pools. Is there any protection from bit-rot
>>> errors on object read? If I modify one bit in an object part (directly on
>>> the OSD) I'm getting a *broken* object:
>> Sorry, are you saying that you're getting a broken object if you flip
>> a bit in an EC pool? That should detect the chunk as invalid and
>> reconstruct on read...
>> -Greg
>>
>>> mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
>>> bb2d82bbb95be6b9a039d135cc7a5d0d -
>>>
>>> # modify one bit directly on OSD
>>>
>>> mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
>>> 02f04f590010b4b0e6af4741c4097b4f -
>>>
>>> # restore bit to original value
>>>
>>> mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
>>> bb2d82bbb95be6b9a039d135cc7a5d0d -
>>>
>>> If I run a deep-scrub on the modified bit I'm getting an inconsistent PG,
>>> which is correct in this case. After restoring the bit and running
>>> deep-scrub again, all PGs are clean.
>>>
>>>
>>> [ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)]
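For anyone reproducing this, a minimal sketch of the "modify one bit directly on OSD" step (this is not from the original post; the OSD id, FileStore path, and find pattern are assumptions and may need adjusting for FileStore's file-name escaping, and the OSD should be stopped first, e.g. service ceph stop osd.3, so the change is not masked by caching):

mon-01:~ # OSDDIR=/var/lib/ceph/osd/ceph-3/current                      # assumed OSD id and FileStore layout
mon-01:~ # CHUNK=$(find "$OSDDIR" -type f -name '*16*' | head -n1)      # one on-disk shard of the test object
mon-01:~ # od -An -tx1 -N1 "$CHUNK"                                     # note the original first byte
mon-01:~ # printf '\xff' | dd of="$CHUNK" bs=1 count=1 conv=notrunc     # overwrite it in place (use a value different from the one noted)

Reading the object back with rados, as above, reproduces the changed md5sum; writing the original byte back the same way restores it.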
--
PS
------------------------------
Message: 2
Date: Sun, 14 Jun 2015 15:26:54 +0000
From: Matteo Dacrema <mdacrema@xxxxxxxx>
To: ceph-users <ceph-users@xxxxxxxx>
Subject: CephFS client issue
Message-ID: <d28e061762104ed68e06effd5199ef06@Exch2013Mb.enter.local>
Content-Type: text/plain; charset="us-ascii"
Hi all,
I'm using CephFS on Hammer and sometimes I need to reboot one or more clients because, as ceph -s tells me, they are "failing to respond to capability release". After that, all clients stop responding: they can't access files or mount/umount CephFS.
I have 1.5 million files, 2 metadata servers in an active/standby configuration with 8 GB of RAM, 20 clients with 2 GB of RAM each, and 2 OSD nodes with 4 x 80 GB OSDs and 4 GB of RAM each.
Here is my configuration:
[global]
fsid = 2de7b17f-0a3e-4109-b878-c035dd2f7735
mon_initial_members = cephmds01
mon_host = 10.29.81.161
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 10.29.81.0/24
tcp nodelay = true
tcp rcvbuf = 0
ms tcp read timeout = 600
#Capacity
mon osd full ratio = .95
mon osd nearfull ratio = .85
[osd]
osd journal size = 1024
journal dio = true
journal aio = true
osd op threads = 2
osd op thread timeout = 60
osd disk threads = 2
osd recovery threads = 1
osd recovery max active = 1
osd max backfills = 2
# Pool
osd pool default size = 2
#XFS
osd mkfs type = xfs
osd mkfs options xfs = "-f -i size=2048"
osd mount options xfs = "rw,noatime,inode64,logbsize=256k,delaylog"
#FileStore Settings
filestore xattr use omap = false
filestore max inline xattr size = 512
filestore max sync interval = 10
filestore merge threshold = 40
filestore split multiple = 8
filestore flusher = false
filestore queue max ops = 2000
filestore queue max bytes = 536870912
filestore queue committing max ops = 500
filestore queue committing max bytes = 268435456
filestore op threads = 2
[mds]
max mds = 1
mds cache size = 750000
client cache size = 2048
mds dir commit ratio = 0.5
Here is the ceph -s output:
root@service-new:~# ceph -s
cluster 2de7b17f-0a3e-4109-b878-c035dd2f7735
health HEALTH_WARN
mds0: Client 94102 failing to respond to cache pressure
monmap e2: 2 mons at {cephmds01=10.29.81.161:6789/0,cephmds02=10.29.81.160:6789/0}
election epoch 34, quorum 0,1 cephmds02,cephmds01
mdsmap e79: 1/1/1 up {0=cephmds01=up:active}, 1 up:standby
osdmap e669: 8 osds: 8 up, 8 in
pgmap v339741: 256 pgs, 2 pools, 132 GB data, 1417 kobjects
288 GB used, 342 GB / 631 GB avail
256 active+clean
client io 3091 kB/s rd, 342 op/s
Thank you.
Regards,
Matteo
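A minimal sketch of how the active MDS can be inspected when this happens (the daemon name mds.cephmds01 is assumed from the mdsmap above; run on the MDS host):

root@cephmds01:~# ceph daemon mds.cephmds01 session ls          # per-client sessions, including caps held and client addresses
root@cephmds01:~# ceph daemon mds.cephmds01 perf dump           # MDS counters, e.g. inode and cap totals
root@cephmds01:~# ceph tell mds.0 injectargs '--mds-cache-size 1000000'   # temporarily raise the cache limit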
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20150614/5102e408/attachment.html>
------------------------------
Message: 3
Date: Sun, 14 Jun 2015 16:32:33 +0000
From: Gregory Farnum <greg@xxxxxxxxxxx>
To: ceph@xxxxxxxxx
Cc: ceph-users <ceph-users@xxxxxxxx>
Subject: Re: Erasure coded pools and bit-rot protection
Message-ID:
<CAC6JEv-XwreCeYHvsD3fHO0BnkeJWEw_Vk0pezei5JFi1uwrGA@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"
Unfortunately this will be an issue in all versions of the code. I can't
speak with authority but I suspect Sam will want to backport the fix to
Firefly as well.
-Greg
On Sat, Jun 13, 2015 at 8:08 PM Paweł Sadowski <ceph@xxxxxxxxx> wrote:
> Thanks for taking care of this so fast. Yes, I'm getting a broken object.
> I haven't checked this on other versions, but is this bug present
> only in Hammer, or in all versions?
>
>
> On 12.06.2015 at 21:43, Gregory Farnum wrote:
> > Okay, Sam thinks he knows what's going on; here's a ticket:
> > http://tracker.ceph.com/issues/12000
> >
> > On Fri, Jun 12, 2015 at 12:32 PM, Gregory Farnum <greg@xxxxxxxxxxx>
> wrote:
> >> On Fri, Jun 12, 2015 at 1:07 AM, Paweł Sadowski <ceph@xxxxxxxxx> wrote:
> >>> Hi All,
> >>>
> >>> I'm testing erasure coded pools. Is there any protection from bit-rot
> >>> errors on object read? If I modify one bit in an object part (directly on
> >>> the OSD) I'm getting a *broken* object:
> >> Sorry, are you saying that you're getting a broken object if you flip
> >> a bit in an EC pool? That should detect the chunk as invalid and
> >> reconstruct on read...
> >> -Greg
> >>
> >>> mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
> >>> bb2d82bbb95be6b9a039d135cc7a5d0d -
> >>>
> >>> # modify one bit directly on OSD
> >>>
> >>> mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
> >>> 02f04f590010b4b0e6af4741c4097b4f -
> >>>
> >>> # restore bit to original value
> >>>
> >>> mon-01:~ # rados --pool ecpool get `hostname -f`_16 - | md5sum
> >>> bb2d82bbb95be6b9a039d135cc7a5d0d -
> >>>
> >>> If I run a deep-scrub on the modified bit I'm getting an inconsistent PG,
> >>> which is correct in this case. After restoring the bit and running
> >>> deep-scrub again, all PGs are clean.
> >>>
> >>>
> >>> [ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)]
> --
> PS
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20150614/a6e52313/attachment-0001.htm>
------------------------------
Message: 4
Date: Sun, 14 Jun 2015 12:31:32 -0500
From: Lincoln Bryant <lincolnb@xxxxxxxxxxxx>
To: Matteo Dacrema <mdacrema@xxxxxxxx>, ceph-users
<ceph-users@xxxxxxxx>
Subject: Re: CephFS client issue
Message-ID: <557DBA74.3020704@xxxxxxxxxxxx>
Content-Type: text/plain; charset="windows-1252"; Format="flowed"
Hi Matteo,
Are your clients using the FUSE client or the kernel client? If the
latter, what kernel version?
--Lincoln
On 6/14/2015 10:26 AM, Matteo Dacrema wrote:
> Hi all,
>
>
> I'm using CephFS on Hammer and sometimes I need to reboot one or more clients because, as ceph -s tells me, they are "failing to respond to capability release". After that, all clients stop responding: they can't access files or mount/umount CephFS.
>
> I have 1.5 million files, 2 metadata servers in an active/standby configuration with 8 GB of RAM, 20 clients with 2 GB of RAM each, and 2 OSD nodes with 4 x 80 GB OSDs and 4 GB of RAM each.
>
>
>
> Here is my configuration:
>
>
> [global]
> fsid = 2de7b17f-0a3e-4109-b878-c035dd2f7735
> mon_initial_members = cephmds01
> mon_host = 10.29.81.161
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> public network = 10.29.81.0/24
> tcp nodelay = true
> tcp rcvbuf = 0
> ms tcp read timeout = 600
>
> #Capacity
> mon osd full ratio = .95
> mon osd nearfull ratio = .85
>
>
> [osd]
> osd journal size = 1024
> journal dio = true
> journal aio = true
>
> osd op threads = 2
> osd op thread timeout = 60
> osd disk threads = 2
> osd recovery threads = 1
> osd recovery max active = 1
> osd max backfills = 2
>
>
> # Pool
> osd pool default size = 2
>
> #XFS
> osd mkfs type = xfs
> osd mkfs options xfs = "-f -i size=2048"
> osd mount options xfs = "rw,noatime,inode64,logbsize=256k,delaylog"
>
> #FileStore Settings
> filestore xattr use omap = false
> filestore max inline xattr size = 512
> filestore max sync interval = 10
> filestore merge threshold = 40
> filestore split multiple = 8
> filestore flusher = false
> filestore queue max ops = 2000
> filestore queue max bytes = 536870912
> filestore queue committing max ops = 500
> filestore queue committing max bytes = 268435456
> filestore op threads = 2
>
> [mds]
> max mds = 1
> mds cache size = 750000
> client cache size = 2048
> mds dir commit ratio = 0.5
>
>
>
> Here is the ceph -s output:
>
>
> root@service-new:~# ceph -s
> cluster 2de7b17f-0a3e-4109-b878-c035dd2f7735
> health HEALTH_WARN
> mds0: Client 94102 failing to respond to cache pressure
> monmap e2: 2 mons at {cephmds01=10.29.81.161:6789/0,cephmds02=10.29.81.160:6789/0}
> election epoch 34, quorum 0,1 cephmds02,cephmds01
> mdsmap e79: 1/1/1 up {0=cephmds01=up:active}, 1 up:standby
> osdmap e669: 8 osds: 8 up, 8 in
> pgmap v339741: 256 pgs, 2 pools, 132 GB data, 1417 kobjects
> 288 GB used, 342 GB / 631 GB avail
> 256 active+clean
> client io 3091 kB/s rd, 342 op/s
>
> Thank you.
> Regards,
> Matteo
>
>
>
>
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20150614/77e13e5a/attachment-0001.htm>
------------------------------
Message: 5
Date: Sun, 14 Jun 2015 11:32:46 -0700
From: Mike Carlson <mike@xxxxxxxxxxxx>
To: Alex Muntada <alexm@xxxxxxxxx>
Cc: ceph-users@xxxxxxxx
Subject: Re: .New Ceph cluster - cannot add additional
monitor
Message-ID:
<CA+KW7xQRCcfz+enHgWODSO86j3ni4WFA7XvF0uG=gTwAgb0AAA@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"
Thank you for the reply, Alex. I'm going to check into that and see if it
helps resolve the issue.
Mike C
On Fri, Jun 12, 2015 at 11:57 PM, Alex Muntada <alexm@xxxxxxxxx> wrote:
> We've recently found similar problems creating a new cluster over an older
> one, even after using "ceph-deploy purge", because some of the data
> remained in /var/lib/ceph/*/* (Ubuntu Trusty) and the nodes were trying to
> use old keyrings.
>
> Hope it helps,
> Alex
>
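For reference, a minimal sketch of cleaning up that leftover state before re-deploying (assuming the default ceph-deploy workflow and Ubuntu paths; <node> is a placeholder for each host being wiped):

ceph-deploy purge <node>          # remove the Ceph packages and /etc/ceph
ceph-deploy purgedata <node>      # remove /var/lib/ceph contents
ceph-deploy forgetkeys            # drop keyrings cached in the admin working directory
ssh <node> 'sudo rm -rf /var/lib/ceph/*/*'    # catch anything a previous purge left behind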
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20150614/866f83f9/attachment-0001.htm>
------------------------------
Message: 6
Date: Sun, 14 Jun 2015 19:00:47 +0000
From: Matteo Dacrema <mdacrema@xxxxxxxx>
To: Lincoln Bryant <lincolnb@xxxxxxxxxxxx>, ceph-users
<ceph-users@xxxxxxxx>
Subject: Re: CephFS client issue
Message-ID: <deb408dee1b74d8d88221edbd72d1cd1@Exch2013Mb.enter.local>
Content-Type: text/plain; charset="us-ascii"
Hi Lincoln,
I'm using the kernel client.
Kernel version is: 3.13.0-53-generic.
Thanks,
Matteo
________________________________
From: Lincoln Bryant <lincolnb@xxxxxxxxxxxx>
Sent: Sunday, June 14, 2015 19:31
To: Matteo Dacrema; ceph-users
Subject: Re: CephFS client issue
Hi Matteo,
Are your clients using the FUSE client or the kernel client? If the latter, what kernel version?
--Lincoln
On 6/14/2015 10:26 AM, Matteo Dacrema wrote:
Hi all,
I'm using CephFS on Hammer and sometimes I need to reboot one or more clients because, as ceph -s tells me, they are "failing to respond to capability release". After that, all clients stop responding: they can't access files or mount/umount CephFS.
I have 1.5 million files, 2 metadata servers in an active/standby configuration with 8 GB of RAM, 20 clients with 2 GB of RAM each, and 2 OSD nodes with 4 x 80 GB OSDs and 4 GB of RAM each.
Here is my configuration:
[global]
fsid = 2de7b17f-0a3e-4109-b878-c035dd2f7735
mon_initial_members = cephmds01
mon_host = 10.29.81.161
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 10.29.81.0/24
tcp nodelay = true
tcp rcvbuf = 0
ms tcp read timeout = 600
#Capacity
mon osd full ratio = .95
mon osd nearfull ratio = .85
[osd]
osd journal size = 1024
journal dio = true
journal aio = true
osd op threads = 2
osd op thread timeout = 60
osd disk threads = 2
osd recovery threads = 1
osd recovery max active = 1
osd max backfills = 2
# Pool
osd pool default size = 2
#XFS
osd mkfs type = xfs
osd mkfs options xfs = "-f -i size=2048"
osd mount options xfs = "rw,noatime,inode64,logbsize=256k,delaylog"
#FileStore Settings
filestore xattr use omap = false
filestore max inline xattr size = 512
filestore max sync interval = 10
filestore merge threshold = 40
filestore split multiple = 8
filestore flusher = false
filestore queue max ops = 2000
filestore queue max bytes = 536870912
filestore queue committing max ops = 500
filestore queue committing max bytes = 268435456
filestore op threads = 2
[mds]
max mds = 1
mds cache size = 750000
client cache size = 2048
mds dir commit ratio = 0.5
Here is the ceph -s output:
root@service-new:~# ceph -s
cluster 2de7b17f-0a3e-4109-b878-c035dd2f7735
health HEALTH_WARN
mds0: Client 94102 failing to respond to cache pressure
monmap e2: 2 mons at {cephmds01=10.29.81.161:6789/0,cephmds02=10.29.81.160:6789/0}
election epoch 34, quorum 0,1 cephmds02,cephmds01
mdsmap e79: 1/1/1 up {0=cephmds01=up:active}, 1 up:standby
osdmap e669: 8 osds: 8 up, 8 in
pgmap v339741: 256 pgs, 2 pools, 132 GB data, 1417 kobjects
288 GB used, 342 GB / 631 GB avail
256 active+clean
client io 3091 kB/s rd, 342 op/s
Thank you.
Regards,
Matteo
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx<mailto:ceph-users@xxxxxxxxxxxxxx>
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ceph.com/pipermail/ceph-users-ceph.com/attachments/20150614/62614a4e/attachment-0001.htm>
------------------------------
Subject: Digest Footer
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
------------------------------
End of ceph-users Digest, Vol 29, Issue 15
******************************************