Well... only because I had this discussion in the back of my mind when I watched the video yesterday. ;-)

Cheers,
Frédéric.

----- On 22 Nov 24, at 8:59, Eugen Block eblock@xxxxxx wrote:

> Then you were clearly paying more attention than me. ;-) We had some
> maintenance going on during that talk, so I couldn't really focus
> entirely on listening. But thanks for clarifying!
>
> Quoting Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx>:
>
>> Hi Eugen,
>>
>> During the talk you mentioned, Dan said there's a hard-coded
>> limit of 256 MDSs per cluster. So with one active and one
>> standby-ish MDS per filesystem, that would be 128 filesystems at
>> most per cluster.
>> Mark said he got to 120, but things start to get wacky by 80. :-)
>>
>> More fun to come, for sure.
>>
>> Cheers,
>> Frédéric.
>>
>> [1] https://youtu.be/qiCE1Ifws80?t=2602
>>
>> ----- On 21 Nov 24, at 9:36, Eugen Block eblock@xxxxxx wrote:
>>
>>> I'm not aware of any hard limit on the number of filesystems, but
>>> that doesn't really mean very much. IIRC, last week during a Clyso
>>> talk at Eventbrite I heard someone say that they had deployed around
>>> 200 filesystems or so; I don't remember if it was a production
>>> environment or just a lab environment. I assume that you would
>>> probably be limited by the number of OSDs/PGs rather than by the
>>> number of filesystems, since 200 filesystems require at least 400
>>> pools. But maybe someone else has more experience in scaling CephFS
>>> that way. What we did was to scale the number of active MDS daemons
>>> for one CephFS. I believe in the end the customer had 48 MDS daemons
>>> on three MDS servers: 16 of them were active with directory pinning,
>>> and at that time they had 16 standby-replay and 16 standby daemons.
>>> But it turned out that standby-replay didn't help their use case, so
>>> we disabled standby-replay.
>>>
>>> Can you show the entire 'ceph fs status' output? And maybe also
>>> 'ceph fs dump'?
>>>
>>> Quoting Александр Руденко <a.rudikk@xxxxxxxxx>:
>>>
>>>>> Just for testing purposes, have you tried pinning rank 1 to some
>>>>> other directory? Does it still break the CephFS if you stop it?
>>>>
>>>> Yes, nothing changed.
>>>>
>>>> It's no problem that the FS hangs when one of the ranks goes down;
>>>> we will have standby-replay for all ranks. What I don't like is that
>>>> a rank which is not pinned to some dir still handles some IO for
>>>> this dir, or from clients which work with this dir.
>>>> I mean that I can't robustly and fully separate client IO by ranks.
>>>>
>>>>> Would it be an option to rather use multiple Filesystems instead
>>>>> of multi-active for one CephFS?
>>>>
>>>> Yes, it's an option, but it is much more complicated in our case.
>>>> Btw, do you know how many different FSs can be created in one
>>>> cluster? Maybe you know some potential problems with 100-200 FSs in
>>>> one cluster?
>>>>
>>>> On Wed, 20 Nov 2024 at 17:50, Eugen Block <eblock@xxxxxx> wrote:
>>>>
>>>>> Ah, I misunderstood, I thought you wanted an even distribution
>>>>> across both ranks.
>>>>> Just for testing purposes, have you tried pinning rank 1 to some
>>>>> other directory? Does it still break the CephFS if you stop it?
>>>>> I'm not sure if you can prevent rank 1 from participating, I
>>>>> haven't looked into all the configs in quite a while. Would it be
>>>>> an option to rather use multiple Filesystems instead of
>>>>> multi-active for one CephFS?
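
To illustrate the multiple-filesystems route: below is a minimal sketch of adding a second filesystem on a Pacific-era cluster. The names (fs2, fs2_metadata, fs2_data) and PG counts are made up for the example; adjust them to your environment.

  # allow more than one filesystem in the cluster (a no-op if already enabled)
  ceph fs flag set enable_multiple true

  # every additional filesystem needs its own metadata and data pool,
  # which is why 200 filesystems mean at least 400 pools
  ceph osd pool create fs2_metadata 16
  ceph osd pool create fs2_data 64
  ceph fs new fs2 fs2_metadata fs2_data

  # with the orchestrator, 'ceph fs volume create fs2' can create the
  # pools and deploy the MDS daemons in one step instead
  ceph fs status

Each filesystem also needs at least one active MDS plus, ideally, a standby, which is where the hard-coded limit of 256 MDSs per cluster mentioned above comes into play.
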
>>>>> Quoting Александр Руденко <a.rudikk@xxxxxxxxx>:
>>>>>
>>>>> > No, it's not a typo.
>>>>> > It's a misleading example. :)
>>>>> >
>>>>> > dir1 and dir2 are pinned to rank 0, but the FS and dir1/dir2 can't
>>>>> > work without rank 1.
>>>>> > rank 1 is used for something when I work with these dirs.
>>>>> >
>>>>> > ceph 16.2.13; the metadata balancer and policy-based balancing are
>>>>> > not used.
>>>>> >
>>>>> > On Wed, 20 Nov 2024 at 16:33, Eugen Block <eblock@xxxxxx> wrote:
>>>>> >
>>>>> >> Hi,
>>>>> >>
>>>>> >> > After pinning:
>>>>> >> > setfattr -n ceph.dir.pin -v 0 /fs-mountpoint/dir1
>>>>> >> > setfattr -n ceph.dir.pin -v 0 /fs-mountpoint/dir2
>>>>> >>
>>>>> >> is this a typo? If not, you did pin both directories to the same
>>>>> >> rank.
>>>>> >>
>>>>> >> Quoting Александр Руденко <a.rudikk@xxxxxxxxx>:
>>>>> >>
>>>>> >> > Hi,
>>>>> >> >
>>>>> >> > I am trying to distribute all top-level dirs in CephFS across
>>>>> >> > different MDS ranks.
>>>>> >> > I have two active MDSs with ranks *0* and *1*, and I have 2
>>>>> >> > top-level dirs, */dir1* and */dir2*.
>>>>> >> >
>>>>> >> > After pinning:
>>>>> >> > setfattr -n ceph.dir.pin -v 0 /fs-mountpoint/dir1
>>>>> >> > setfattr -n ceph.dir.pin -v 0 /fs-mountpoint/dir2
>>>>> >> >
>>>>> >> > I can see the following DNS and INOS distribution:
>>>>> >> > RANK  STATE   MDS  ACTIVITY      DNS    INOS   DIRS  CAPS
>>>>> >> >  0    active   c   Reqs: 127 /s  12.6k  12.5k   333   505
>>>>> >> >  1    active   b   Reqs:  11 /s     21     24    19     1
>>>>> >> >
>>>>> >> > When I write to dir1, I can see a small number of Reqs on rank 1.
>>>>> >> >
>>>>> >> > Events in the journal of the MDS with rank 1:
>>>>> >> > cephfs-journal-tool --rank=fs1:1 event get list
>>>>> >> >
>>>>> >> > 2024-11-20T12:24:42.045056+0300 0xc5c1cb UPDATE: (scatter_writebehind)
>>>>> >> >   A2037D53
>>>>> >> > 2024-11-20T12:24:46.935934+0300 0xc5c629 SESSION: ()
>>>>> >> > 2024-11-20T12:24:47.192012+0300 0xc5c7cd UPDATE: (lock inest accounted
>>>>> >> >   scatter stat update)
>>>>> >> > 2024-11-20T12:24:47.904717+0300 0xc5ca0b SESSION: ()
>>>>> >> > 2024-11-20T12:26:46.912719+0300 0xc5ca98 SESSION: ()
>>>>> >> > 2024-11-20T12:26:47.910806+0300 0xc5cc3c SESSION: ()
>>>>> >> > 2024-11-20T12:27:35.746239+0300 0xc5ccc9 SESSION: ()
>>>>> >> > 2024-11-20T12:28:46.923812+0300 0xc5ce63 SESSION: ()
>>>>> >> > 2024-11-20T12:28:47.903066+0300 0xc5d007 SESSION: ()
>>>>> >> > 2024-11-20T12:29:08.063326+0300 0xc5d094 EXPORT: ()
>>>>> >> >   di1/A2037D53
>>>>> >> > 2024-11-20T12:30:46.909621+0300 0xc5d96f SESSION: ()
>>>>> >> > 2024-11-20T12:30:47.908050+0300 0xc5db13 SESSION: ()
>>>>> >> > 2024-11-20T12:32:46.907649+0300 0xc5dba0 SESSION: ()
>>>>> >> > 2024-11-20T12:32:47.905962+0300 0xc5dd44 SESSION: ()
>>>>> >> > 2024-11-20T12:34:44.349348+0300 0xc5ddd1 SESSIONS: ()
>>>>> >> >
>>>>> >> > But the main problem: when I stop the MDS with rank 1 (without any
>>>>> >> > kind of standby), the FS hangs for all operations.
>>>>> >> > Is this correct? Is it possible to completely exclude rank 1 from
>>>>> >> > processing dir1 and not stop IO when rank 1 goes down?
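
For completeness, here is a minimal sketch of the setup this thread circles around, assuming the filesystem is named fs1 (as in the cephfs-journal-tool call above) and is mounted at /fs-mountpoint: pin each top-level directory to its own rank and keep standby daemons around, since a stopped rank will always stall at least the metadata it is authoritative for.

  # two active ranks for fs1
  ceph fs set fs1 max_mds 2

  # pin each top-level directory to a different rank
  setfattr -n ceph.dir.pin -v 0 /fs-mountpoint/dir1
  setfattr -n ceph.dir.pin -v 1 /fs-mountpoint/dir2

  # verify the pins (ceph.dir.pin is a virtual xattr, readable on
  # reasonably recent kernel and FUSE clients)
  getfattr -n ceph.dir.pin /fs-mountpoint/dir1 /fs-mountpoint/dir2

  # keep a standby-replay daemon per rank so a failed rank is taken
  # over quickly instead of hanging the whole filesystem
  ceph fs set fs1 allow_standby_replay true

Even with pinning, some cross-rank traffic remains (rank 0 is always authoritative for the root directory, and the scatter_writebehind and EXPORT events above show inter-rank lock and migration activity), so pinning alone cannot fully isolate client IO per rank.
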
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx