On Mon, Oct 23, 2017 at 9:35 AM, Eric Eastman <eric.eastman@xxxxxxxxxxxxxx> wrote:
> With help from the list we recently recovered one of our Jewel based
> clusters that started failing when we got to about 4800 cephfs snapshots.
> We understand that cephfs snapshots are still marked experimental. We are
> running a single active MDS with 2 standby MDS. We only have a single file
> system, we are only taking snapshots from the top level directory, and we
> are now planning on limiting snapshots to a few hundred. Currently we have
> removed all snapshots from this system, using rmdir on each snapshot
> directory, and the system is reporting that it is healthy:
>
> ceph -s
>     cluster ba0c94fc-1168-11e6-aaea-000c290cc2d4
>      health HEALTH_OK
>      monmap e1: 3 mons at
> {mon01=10.16.51.21:6789/0,mon02=10.16.51.22:6789/0,mon03=10.16.51.23:6789/0}
>             election epoch 202, quorum 0,1,2 mon01,mon02,mon03
>       fsmap e18283: 1/1/1 up {0=mds01=up:active}, 2 up:standby
>      osdmap e342543: 93 osds: 93 up, 93 in
>             flags sortbitwise,require_jewel_osds
>       pgmap v38759308: 11336 pgs, 9 pools, 23107 GB data, 12086 kobjects
>             73956 GB used, 209 TB / 281 TB avail
>                11336 active+clean
>   client io 509 kB/s rd, 2548 B/s wr, 0 op/s rd, 1 op/s wr
>
> The snapshots were removed several days ago, but just as an experiment I
> decided to query a few PGs in the cephfs data storage pool, and I am seeing
> they are all listing:
>
> "purged_snaps": "[2~12cd,12d0~12c9]",

purged_snaps contains the IDs of snapshots whose data have been completely
purged. Currently the purged_snaps set is append-only; the OSD never removes
IDs from it. (A small parsing sketch is appended below the quoted message.)

Regards
Yan, Zheng

>
> Here is an example:
>
> ceph pg 1.72 query
> {
>     "state": "active+clean",
>     "snap_trimq": "[]",
>     "epoch": 342540,
>     "up": [
>         75,
>         77,
>         82
>     ],
>     "acting": [
>         75,
>         77,
>         82
>     ],
>     "actingbackfill": [
>         "75",
>         "77",
>         "82"
>     ],
>     "info": {
>         "pgid": "1.72",
>         "last_update": "342540'261039",
>         "last_complete": "342540'261039",
>         "log_tail": "341080'260697",
>         "last_user_version": 261039,
>         "last_backfill": "MAX",
>         "last_backfill_bitwise": 1,
>         "purged_snaps": "[2~12cd,12d0~12c9]",
> …
>
> Is this an issue?
> I am not seeing any recent trim activity.
> Are there any procedures documented for looking at snapshots to see if
> there are any issues?
>
> Before posting this, I have reread the cephfs and snapshot pages at:
> http://docs.ceph.com/docs/master/cephfs/
> http://docs.ceph.com/docs/master/dev/cephfs-snapshots/
>
> Looked at the slides:
> http://events.linuxfoundation.org/sites/events/files/slides/2017-03-23%20Vault%20Snapshots.pdf
>
> Watched the video “Ceph Snapshots for Fun and Profit” given at the last
> OpenStack conference.
>
> And I still can’t find much info on debugging snapshots.
>
> Here is some additional information on the cluster:
>
> ceph df
> GLOBAL:
>     SIZE     AVAIL     RAW USED     %RAW USED
>     281T     209T      73955G       25.62
> POOLS:
>     NAME                ID     USED       %USED     MAX AVAIL     OBJECTS
>     rbd                 0          16         0        56326G            3
>     cephfs_data         1      22922G     28.92        56326G     12279871
>     cephfs_metadata     2      89260k         0        56326G        45232
>     cinder              9        147G      0.26        56326G        41420
>     glance              10          0         0        56326G            0
>     cinder-backup       11          0         0        56326G            0
>     cinder-ssltest      23      1362M         0        56326G          431
>     IDMT-dfgw02         27      2552M         0        56326G          758
>     dfbackup            28     33987M      0.06        56326G         8670
>
> Recent tickets and posts on problems with this cluster:
> http://tracker.ceph.com/issues/21761
> http://tracker.ceph.com/issues/21412
> https://www.spinics.net/lists/ceph-devel/msg38203.html
>
> ceph -v
> ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
>
> Kernel is 4.13.1
> uname -a
> Linux ss001 4.13.1-041301-generic #201709100232 SMP Sun Sep 10 06:33:36 UTC
> 2017 x86_64 x86_64 x86_64 GNU/Linux
>
> OS is Ubuntu 16.04
>
> Thanks
> Eric
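
For anyone who wants to check the same thing on their own cluster, here is a
minimal Python sketch that pulls "purged_snaps" out of `ceph pg <pgid> query`
and expands it. It assumes the JSON layout shown in the quoted output above
and that purged_snaps uses Ceph's interval-set notation, i.e.
"[start~length,...]" with hexadecimal snap IDs; the script name and the
default pgid are just examples, not part of any Ceph tooling.

#!/usr/bin/env python3
# purged_snaps_report.py (hypothetical name) -- expand a PG's purged_snaps.
import json
import subprocess
import sys

def parse_interval_set(text):
    # "[2~12cd,12d0~12c9]" -> [(0x2, 0x12cd), (0x12d0, 0x12c9)]
    text = text.strip().strip("[]")
    if not text:
        return []
    intervals = []
    for part in text.split(","):
        start, length = part.split("~")
        intervals.append((int(start, 16), int(length, 16)))
    return intervals

def main(pgid):
    # `ceph pg <pgid> query` prints JSON, as in the output quoted above.
    out = subprocess.check_output(["ceph", "pg", pgid, "query"])
    info = json.loads(out.decode("utf-8"))["info"]
    intervals = parse_interval_set(info.get("purged_snaps", "[]"))
    total = sum(length for _, length in intervals)
    print("pg %s: purged_snaps covers %d snap IDs" % (pgid, total))
    for start, length in intervals:
        print("  0x%x .. 0x%x" % (start, start + length - 1))

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "1.72")

Run it as "python3 purged_snaps_report.py 1.72" on a node with admin access
to the cluster. Per the explanation above, a large purged_snaps list is a
record of snapshots that have already been purged, not a backlog of trimming
work (pending trims would show up in snap_trimq).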