Re: [PATCH] ceph: add halt mount option support

On Wed, 2020-02-19 at 21:42 +0100, Ilya Dryomov wrote:
> On Wed, Feb 19, 2020 at 8:22 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
> > On Tue, Feb 18, 2020 at 6:59 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> > > > Yeah, I've mostly done this using DROP rules when I needed to test things.
> > > > But, I think I was probably just guilty of speculating out loud here.
> > > 
> > > I'm not sure what exactly Xiubo meant by "fulfilling" iptables rules
> > > in libceph, but I will say that any kind of iptables manipulation from
> > > within libceph is probably out of the question.
> > 
> > I think we're getting confused about two thoughts on iptables: (1) to
> > use iptables to effectively partition the mount instead of this new
> > halt option; (2) use iptables in concert with halt to prevent FIN
> > packets from being sent when the sockets are closed. I think we all
> > agree (2) is not going to happen.
> 
> Right.
> 
> > > > I think doing this by just closing down the sockets is probably fine. I
> > > > wouldn't pursue anything relating to iptables here, unless we have
> > > > some larger reason to go that route.
> > > 
> > > IMO investing into a set of iptables and tc helpers for teuthology
> > > makes a _lot_ of sense.  It isn't exactly the same as a cable pull,
> > > but it's probably the next best thing.  First, it will be external to
> > > the system under test.  Second, it can be made selective -- you can
> > > cut a single session or all of them, simulate packet loss and latency
> > > issues, etc.  Third, it can be used for recovery and failover/fencing
> > > testing -- what happens when these packets get delivered two minutes
> > > later?  None of this is possible with something that just attempts to
> > > wedge the mount and acts as a point of no return.
> > 
> > This sounds attractive but it does require each mount to have its own
> > IP address? Or are there options? Maybe the kernel driver could mark
> > the connection with a mount ID so we could filter on it? From a
> > quick Google, maybe [1] could be used for this purpose. I wonder
> > however if the kernel driver would have to do that marking of the
> > connection... and then we have iptables dependencies in the driver
> > again which we don't want to do.
> 
> As I said yesterday, I think it should be doable with no kernel
> changes -- either with IP aliases or with the help of some virtual
> interface.  Exactly how, I'm not sure because I use VMs for my tests
> and haven't had to touch iptables in a while, but I would be surprised
> to learn otherwise given the myriad of options out there.
> 
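
The iptables and tc helpers suggested above for teuthology could start as little more than functions that build command lines to run on the client host. The sketch below is hypothetical: the names cut_session, heal_session, degrade_link and restore_link are illustrative and are not part of teuthology. It shows the three properties argued for in the thread. The fault is injected outside the client, it can target a single peer, and it can be undone later to exercise recovery.

```python
from typing import List

Command = List[str]  # one argv list, e.g. for subprocess.run(cmd, check=True)


def cut_session(peer_ip: str, peer_port: int) -> List[Command]:
    """Silently drop TCP traffic to and from a single peer (hypothetical helper).

    DROP generates no RST or ICMP reply, so the client sees a dead link
    rather than a socket that was closed on it.
    """
    return [
        ["iptables", "-A", "OUTPUT", "-p", "tcp", "-d", peer_ip,
         "--dport", str(peer_port), "-j", "DROP"],
        ["iptables", "-A", "INPUT", "-p", "tcp", "-s", peer_ip,
         "--sport", str(peer_port), "-j", "DROP"],
    ]


def heal_session(peer_ip: str, peer_port: int) -> List[Command]:
    """Delete the rules added by cut_session (same rule spec, -D instead of -A)."""
    return [["-D" if arg == "-A" else arg for arg in cmd]
            for cmd in cut_session(peer_ip, peer_port)]


def degrade_link(dev: str, loss_pct: float, delay_ms: int) -> Command:
    """Attach a netem qdisc to dev's egress with fixed delay and random loss."""
    return ["tc", "qdisc", "add", "dev", dev, "root", "netem",
            "delay", f"{delay_ms}ms", "loss", f"{loss_pct:g}%"]


def restore_link(dev: str) -> Command:
    """Remove the root qdisc again, returning dev to its default."""
    return ["tc", "qdisc", "del", "dev", dev, "root"]
```

Each argv list would be run as root on the client, for instance with subprocess.run(cmd, check=True). Cutting one monitor session with cut_session("192.0.2.10", 6789) and healing it some time later is one way to probe the late-delivery and failover cases mentioned above. That is something a one-way "halt" cannot do.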

...and really, doing this sort of testing with the kernel client outside
of a VM is sort of a mess anyway, IMO.

That said, I think we might need a way to match up a superblock with the
sockets associated with it -- so mon, osd and mds socket info,
basically. That could be a very simple thing in debugfs though, in the
existing directory hierarchy there. With that info, you could reasonably
do something with iptables like we're suggesting.
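
To illustrate how such per-superblock socket info might be consumed, here is a sketch that assumes a hypothetical debugfs listing with one "<type> <ip>:<port>" line per session (for example "osd 192.0.2.21:6801"). No such file exists today, and the format, parse_peers and isolate are all illustrative. The sketch handles IPv4 only, so that splitting on the last ":" is unambiguous. It turns the listing into iptables DROP rules for just the sessions belonging to that mount, optionally restricted to one daemon type.

```python
import ipaddress
from typing import Iterable, List, Tuple

Peer = Tuple[str, str, int]  # (daemon type, IPv4 address, port)


def parse_peers(text: str) -> List[Peer]:
    """Parse a hypothetical per-mount listing like 'mds 192.0.2.30:6800'."""
    peers: List[Peer] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        kind, addr = line.split()
        host, port = addr.rsplit(":", 1)
        ipaddress.IPv4Address(host)  # raises ValueError for anything but IPv4
        peers.append((kind, host, int(port)))
    return peers


def isolate(peers: Iterable[Peer],
            kinds: Tuple[str, ...] = ("mon", "osd", "mds")) -> List[List[str]]:
    """Build one outbound DROP rule per distinct peer of the selected types.

    Only egress is dropped here; add a matching INPUT rule for a
    symmetric cut.
    """
    rules: List[List[str]] = []
    seen = set()
    for kind, host, port in peers:
        if kind not in kinds or (host, port) in seen:
            continue
        seen.add((host, port))
        rules.append(["iptables", "-A", "OUTPUT", "-p", "tcp", "-d", host,
                      "--dport", str(port), "-j", "DROP"])
    return rules
```

Passing kinds=("mds",) would cut only the MDS sessions of one mount and leave its mon and osd traffic alone, which is the kind of selectivity a mount-wide switch cannot offer.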

> > From my perspective, this halt patch looks pretty simple and doesn't
> > appear to be a huge maintenance burden. Is it really so objectionable?
> 
> Well, this patch is simple only because it isn't even remotely
> equivalent to a cable pull.  I mean, it aborts in-flight requests
> with EIO, closes sockets, etc.  Has it been tested against the test
> cases that currently cold reset the node through the BMC?
> 
> If it has been tested and the current semantics are sufficient,
> are you sure they will remain so in the future?  What happens when
> a new test gets added that needs a harder shutdown?  We won't be
> able to reuse existing "umount -f" infrastructure anymore...  What
> if a new test needs to _actually_ kill the client?
> 
> And then a debugging knob that permanently wedges the client sure
> can't be a mount option for all the obvious reasons.  This bit is easy
> to fix, but the fact that it is submitted as a mount option makes me
> suspect that the whole thing hasn't been thought through very well.

Agreed on all points. This sort of fault injection is really best done
via other means. Otherwise, it's really hard to know whether it'll
behave the way you expect in other situations.

I'll also add that experience shows these sorts of interfaces end up
bitrotting, because they're too specialized to be used outside of very
specific environments. We need to think larger than just teuthology's
needs here.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
