Re: metadata spill back onto block.slow before block.db filled up

On Wed, 29 Nov 2017, Igor Fedotov wrote:
> I've just updated the bug notes.
> 
> Most probably the issue is caused by an already fixed bug in RocksDB,
> 
> see
> https://github.com/facebook/rocksdb/commit/65a9cd616876c7a1204e1a50990400e4e1f61d7e
> 
> Hence the question is whether we plan to backport the fix and how to arrange that.
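
For readers following the tracker: the mechanism in play is RocksDB's
level-to-path selection when several db_paths are configured, where the
compaction output for a level is supposed to go to the first path whose
target_size can still hold the accumulated level targets.  The C++ sketch
below only illustrates that general scheme under assumed default level
sizing; it is not the actual RocksDB code and says nothing about what the
commit above changes.

#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct DbPathSpec {
  std::string path;       // directory, e.g. "db" or "db.slow"
  uint64_t target_size;   // bytes this path is allowed to hold
};

// Illustrative only: place the output of `output_level` on the first path
// whose target_size can still hold the accumulated level targets, assuming
// classic level sizing (level_base for L1, times `multiplier` per level).
size_t PickPathForLevel(const std::vector<DbPathSpec>& paths,
                        int output_level,
                        uint64_t level_base,   // e.g. 256 MB
                        double multiplier)     // e.g. 10
{
  uint64_t accumulated = 0;
  uint64_t level_size = level_base;
  size_t path_id = 0;
  for (int level = 1; level <= output_level; ++level) {
    // Skip to the next path while the current one cannot also hold this level.
    while (path_id + 1 < paths.size() &&
           accumulated + level_size > paths[path_id].target_size) {
      accumulated = 0;
      ++path_id;
    }
    accumulated += level_size;
    level_size = static_cast<uint64_t>(level_size * multiplier);
  }
  return path_id;  // the last path absorbs whatever does not fit earlier
}

int main() {
  // Targets taken from the rocksdb_db_paths setting quoted further down the thread.
  std::vector<DbPathSpec> paths = {{"db", 51002736640ULL},
                                   {"db.slow", 284999998054ULL}};
  for (int level = 1; level <= 5; ++level)
    std::printf("L%d -> %s\n", level,
                paths[PickPathForLevel(paths, level, 256ULL << 20, 10)].path.c_str());
  return 0;
}

With those numbers such a picker keeps L1-L3 on "db" and only starts spilling
at L4.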

If you can confirm the problem doesn't reproduce after cherry-picking that 
commit, we can just do that for the luminous branch.

For master, let's fast-forward rocksdb past it?

sage


> 
> Thanks,
> Igor
> 
> On 11/28/2017 6:44 PM, Igor Fedotov wrote:
> > here it is
> > 
> > http://tracker.ceph.com/issues/22264
> > 
> > 
> > On 11/28/2017 6:37 PM, Mark Nelson wrote:
> > > Looks like a bug, guys! :)  Mind making a ticket in the tracker?
> > > 
> > > Mark
> > > 
> > > On 11/28/2017 07:39 AM, Igor Fedotov wrote:
> > > > Looks like I can easily reproduce that (note slow_used_bytes):
> > > > 
> > > >  "bluefs": {
> > > >         "gift_bytes": 105906176,
> > > >         "reclaim_bytes": 0,
> > > >         "db_total_bytes": 4294959104,
> > > >         "db_used_bytes": 76546048,
> > > >         "wal_total_bytes": 1073737728,
> > > >         "wal_used_bytes": 239075328,
> > > >         "slow_total_bytes": 1179648000,
> > > >         "slow_used_bytes": 63963136,
> > > >         "num_files": 13,
> > > >         "log_bytes": 2539520,
> > > >         "log_compactions": 3,
> > > >         "logged_bytes": 255176704,
> > > >         "files_written_wal": 3,
> > > >         "files_written_sst": 10,
> > > >         "bytes_written_wal": 1932165189,
> > > >         "bytes_written_sst": 340957748
> > > >     },
> > > > 
> > > > 
> > > > On 11/28/2017 4:17 PM, Sage Weil wrote:
> > > > > Hi Shasha,
> > > > > 
> > > > > On Tue, 28 Nov 2017, shasha lu wrote:
> > > > > > Hi, Mark
> > > > > > We are testing bluestore with 12.2.1.
> > > > > > There are two hosts in our rgw cluster, and each host contains 2 OSDs.
> > > > > > The rgw pool size is 2.  We are using a 5GB partition for db.wal and a
> > > > > > 50GB SSD partition for block.db.
> > > > > > 
> > > > > > # ceph --admin-daemon ceph-osd.1.asok config get rocksdb_db_paths
> > > > > > {
> > > > > >      "rocksdb_db_paths": "db,51002736640 db.slow,284999998054"
> > > > > > }
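
For reference, that rocksdb_db_paths string corresponds to RocksDB's
DBOptions::db_paths.  A minimal standalone C++ sketch of the equivalent
configuration (the byte targets are the ones printed above; the surrounding
open/close code is purely illustrative):

#include <rocksdb/db.h>
#include <rocksdb/options.h>

int main() {
  rocksdb::Options opts;
  opts.create_if_missing = true;

  // Equivalent of rocksdb_db_paths = "db,51002736640 db.slow,284999998054":
  // SST files are placed on the first path while it has room under its
  // target_size and are expected to spill to the next path only after that.
  opts.db_paths.emplace_back("db", 51002736640ULL);        // block.db partition
  opts.db_paths.emplace_back("db.slow", 284999998054ULL);  // slow device

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(opts, "db", &db);
  if (!s.ok()) {
    return 1;
  }
  delete db;
  return 0;
}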
> > > > > > 
> > > > > > After writing about 4 million (400W) 4k rgw objects, we used
> > > > > > ceph-bluestore-tool to export the rocksdb files.
> > > > > > 
> > > > > > # ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/osd1
> > > > > > --out-dir /tmp/osd1
> > > > > > # cd /tmp/osd1
> > > > > > # ls
> > > > > > db  db.slow  db.wal
> > > > > > # du -sh *
> > > > > > 2.8G    db
> > > > > > 809M    db.slow
> > > > > > 439M    db.wal
> > > > > > 
> > > > > > The block.db partition has 50GB of space, but it only contains ~3GB of
> > > > > > files, and the metadata is already rolling over onto db.slow.
> > > > > > It seems that only the L0-L2 files are located in block.db (L0 256M;
> > > > > > L1 256M; L2 2.5GB), while L3 and higher-level files are located in db.slow.
> > > > > > 
> > > > > > According to the ceph docs, the metadata should roll over onto db.slow
> > > > > > only when block.db is filled up, but in our env the block.db partition is
> > > > > > far from full.
> > > > > > Did I make any mistakes?  Are there any additional options that should be
> > > > > > set for rocksdb?
> > > > > You didn't make any mistakes--this should happen automatically.  It looks
> > > > > like rocksdb isn't behaving as advertised.  I've opened
> > > > > http://tracker.ceph.com/issues/22264 to track this.  We need to start by
> > > > > reproducing the situation.
> > > > > 
> > > > > My guess is that rocksdb is deciding that all of L3 can't fit on db and
> > > > > so it's putting all of L3 on db.slow?
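
A rough sanity check of that guess, assuming the RocksDB defaults of a 256 MB
max_bytes_for_level_base and a 10x max_bytes_for_level_multiplier (which match
the per-level sizes reported above): L1+L2+L3 comes to roughly 28 GB, well
under the 51 GB target of the db path, so L3 on its own should still fit.  A
throwaway C++ sketch that prints the numbers:

#include <cstdint>
#include <cstdio>

int main() {
  const uint64_t level_base = 256ULL << 20;     // max_bytes_for_level_base (default)
  const uint64_t db_target  = 51002736640ULL;   // target_size of the "db" path
  uint64_t level_size = level_base;
  uint64_t cumulative = 0;
  for (int level = 1; level <= 4; ++level) {
    cumulative += level_size;
    std::printf("L%d target %7.2f GB  cumulative %7.2f GB  fits on db: %s\n",
                level, level_size / 1e9, cumulative / 1e9,
                cumulative <= db_target ? "yes" : "no");
    level_size *= 10;                           // max_bytes_for_level_multiplier (default)
  }
  return 0;
}

Under that accounting only L4 and beyond would need to spill, which is
consistent with reading the spill at L3 as a bug rather than a real capacity
limit.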
> > > > > 
> > > > > sage
> > > > > 
