Re: Re: How's cephfs going?

Hi,

While not necessarily CephFS specific - we frequently end up with objects that have inconsistent omaps. This seems to be a replication issue (anecdotally it's a replica that ends up diverging, and at least a few times it happened after the OSD holding that replica was restarted). I had hoped http://tracker.ceph.com/issues/17177 would solve this, but it doesn't appear to have solved it completely.
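For anyone wanting to spot the same thing, something along these lines works as a rough sketch - it just wraps the rados list-inconsistent-pg / list-inconsistent-obj commands, and assumes a metadata pool named "cephfs_metadata" plus the JSON field names those commands have emitted since jewel (adjust to taste):

#!/usr/bin/env python
# Rough sketch: list the objects that deep-scrub has flagged as inconsistent,
# so you can see whether it's the omaps that diverged again.
# Assumes the pool name below, that the rados CLI is in $PATH, and the
# JSON field names used by list-inconsistent-obj in recent releases.
import json
import subprocess

POOL = "cephfs_metadata"  # assumption: adjust to your pool name

def rados_json(*args):
    return json.loads(subprocess.check_output(["rados"] + list(args)))

for pgid in rados_json("list-inconsistent-pg", POOL):
    report = rados_json("list-inconsistent-obj", pgid, "--format=json")
    for item in report.get("inconsistents", []):
        name = item["object"]["name"]
        errors = set(item.get("errors", [])) | set(item.get("union_shard_errors", []))
        print("%s %s: %s" % (pgid, name, ", ".join(sorted(errors)) or "unknown"))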

We also have one workload we'd need to re-engineer to be a good fit for CephFS: we create a lot of hard links with no clear "origin" file, which is slightly at odds with the hard-link implementation. If I understand correctly, an unlink moves the inode from the directory tree into the stray directories and decrements the link count; if the count reaches zero it gets purged, otherwise it's kept around until another link to it is encountered and it's re-integrated back into the tree. This netted us hilariously large stray directories, which combined with the above were less than ideal.
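If anyone wants to keep an eye on the same thing, a minimal sketch of reading the stray counters off the MDS admin socket - this assumes an MDS id of "a" and that the counters still sit under "mds_cache" (num_strays / strays_created / strays_reintegrated), which may differ by release:

#!/usr/bin/env python
# Minimal sketch: read the stray-directory counters from an MDS admin socket
# to watch how large the stray directories are getting.
# Assumes an MDS id of "a" and the counter names under "mds_cache".
import json
import subprocess

MDS_ID = "a"  # assumption: replace with your MDS id

perf = json.loads(subprocess.check_output(
    ["ceph", "daemon", "mds.%s" % MDS_ID, "perf", "dump"]))

cache = perf.get("mds_cache", {})
for counter in ("num_strays", "strays_created", "strays_reintegrated"):
    print("%-20s %s" % (counter, cache.get(counter, "n/a")))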

Beyond that, there have been other small(-ish) bugs we've encountered, but they've all been solvable by cherry-picking fixes, upgrading, or doing surgery with the available tools, guided by the internet and/or an approximate understanding of how it's supposed to work.

-KJ

On Wed, Jul 19, 2017 at 11:20 AM, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
Thanks Greg. I thought it was impossible when I reported 34MB for 52 million files. 

On Jul 19, 2017 1:17 PM, "Gregory Farnum" <gfarnum@xxxxxxxxxx> wrote:


On Wed, Jul 19, 2017 at 10:25 AM David <dclistslinux@xxxxxxxxx> wrote:
On Tue, Jul 18, 2017 at 6:54 AM, Blair Bethwaite <blair.bethwaite@xxxxxxxxx> wrote:
We are a data-intensive university, with an increasingly large fleet
of scientific instruments capturing various types of data (mostly
imaging of one kind or another). That data typically needs to be
stored, protected, managed, shared, connected/moved to specialised
compute for analysis. Given the large variety of use-cases we are
being somewhat more circumspect in our CephFS adoption and really only
dipping toes in the water, ultimately hoping it will become a
long-term default NAS choice from Luminous onwards.

On 18 July 2017 at 15:21, Brady Deetz <bdeetz@xxxxxxxxx> wrote:
> All of that said, you could also consider using rbd and zfs or whatever filesystem you like. That would allow you to gain the benefits of scaleout while still getting a feature rich fs. But, there are some down sides to that architecture too.

We do this today (KVMs with a couple of large RBDs attached via
librbd+QEMU/KVM), but the throughput achievable this way is
nothing like native CephFS - adding more RBDs doesn't seem to help
increase overall throughput. Also, if you have NFS clients you will
absolutely need SSD ZIL. And of course you then have a single point of
failure and downtime for regular updates etc.

In terms of small file performance I'm interested to hear about
experiences with in-line file storage on the MDS.

Also, while we're talking about CephFS - what size metadata pools are
people seeing on their production systems with 10s-100s millions of
files?

On a system with 10.1 million files, the metadata pool is 60MB


Unfortunately that's not really an accurate assessment, for good but terrible reasons:
1) CephFS metadata is principally stored via the omap interface (which is designed for handling things like the directory storage CephFS needs)
2) omap is implemented via LevelDB/RocksDB
3) there is not a good way to determine which pool is responsible for which portion of RocksDB's data
4) So the pool stats do not incorporate omap data usage at all in their reports (it's part of the overall space used, and is one of the things that can make that larger than the sum of the per-pool spaces)

You could try and estimate it by looking at how much "lost" space there is (and subtracting out journal sizes and things, depending on setup). But I promise there's more than 60MB of CephFS metadata for 10.1 million files!
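For a more direct (if slow) number, you could also walk the metadata pool and add up the omap keys/values yourself with the python-rados bindings. A rough sketch - it assumes a pool named "cephfs_metadata", and since it reads every key of every object it will take a long while on pools backing tens of millions of files:

#!/usr/bin/env python
# Rough sketch: tally the omap keys/values stored for the metadata pool,
# since the pool stats don't account for them. Assumes the pool name below
# and the python-rados bindings; slow on large pools.
import rados

POOL = "cephfs_metadata"  # assumption: adjust to your metadata pool name
BATCH = 500               # page through omap keys in chunks of this size

def omap_bytes(ioctx, oid):
    """Return (key count, key+value bytes) for one object's omap."""
    count, nbytes, start_after = 0, 0, ""
    while True:
        with rados.ReadOpCtx() as read_op:
            it, _ = ioctx.get_omap_vals(read_op, start_after, "", BATCH)
            ioctx.operate_read_op(read_op, oid)
            got = 0
            for key, val in it:
                count += 1
                nbytes += len(key) + len(val)
                start_after = key
                got += 1
        if got < BATCH:
            return count, nbytes

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx(POOL)

total_keys = total_bytes = 0
for obj in ioctx.list_objects():
    keys, nbytes = omap_bytes(ioctx, obj.key)
    total_keys += keys
    total_bytes += nbytes

print("%d omap keys, roughly %.1f MB of omap data" %
      (total_keys, total_bytes / (1024.0 * 1024.0)))
ioctx.close()
cluster.shutdown()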
-Greg





--
Kjetil Joergensen <kjetil@xxxxxxxxxxxx>
SRE, Medallia Inc
Phone: +1 (650) 739-6580
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
