On 09/10/2009 06:25 AM, Stephan von Krawczynski wrote:
> On Wed, 09 Sep 2009 19:43:15 -0400
> Mark Mielke<mark at mark.mielke.cc> wrote:
>
>> In this case, there is too many unknowns - but I agree with Anand's
>> logic 100%. Gluster should not be able to cause a CPU lock up. It should
>> be impossible. If it is not impossible - it means a kernel bug, and the
>> best place to have this addressed is the kernel devel list, or, if you
>> have purchased a subscription from a company such as RedHat, than this
>> belongs as a ticket open with RedHat.
>>
> You know, I am really bothered about the way the maintainers are acting since
> I read this list. There is really a lot of ideology going on ("can't be", "is
> impossible for userspace" etc) and very few real debugging.

In general, if one didn't understand the architecture (a black box), you would be right. I would not want to hear these things either. However, Anand did not *just* say "can't be" or "is impossible for userspace". He provided significant explanation which exactly matches my own prior understanding of the separation between user space and kernel space.

In particular, if you read about the intent of FUSE - the technology being used to create the file system - I think you will find that what Anand is saying is the *exact* purpose of that project. Why have a file system in user space in the first place? It introduces performance limitations. The "why do it" is precisely to provide this level of isolation and separation from the kernel.

Take into account that FUSE *encourages* users to use or write their own file system layers. That is, you do not need to be admin/root in order to call fusermount and mount a FUSE file system. If any user of your system can create one, don't you think it is reasonable to expect the kernel and FUSE developers to prevent a complete system lock-up from occurring? If FUSE cannot provide separation from kernel space, then FUSE is garbage and should be thrown away.
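To make the isolation argument concrete, here is a toy sketch (nothing GlusterFS-specific, just a hypothetical crashing process): the kernel confines even a fatal user space bug to the process that committed it.

```python
import ctypes
import os
import signal

# Fork a child that deliberately dereferences a NULL pointer - the kind
# of fatal bug a user space file system process could contain.
pid = os.fork()
if pid == 0:
    ctypes.string_at(0)  # read address 0 -> SIGSEGV
    os._exit(0)          # never reached

# Parent: the kernel killed only the faulting process. The parent, the
# kernel, and the rest of the system carry on untouched.
_, status = os.waitpid(pid, 0)
assert os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGSEGV
print("child segfaulted; the rest of the system is unaffected")
```

That containment is exactly what a kernel lockup violates: no sequence of system calls from an unprivileged process should be able to take the whole machine down.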
GlusterFS folk chose to use FUSE to obtain this separation. If this separation is not being provided, then the value of FUSE in the first place is brought into question.

> This application is not the only one in the world. People use heavily file-
> and net-acting applications like firefox, apache, shell-scripts, name-one on
> their boxes. None leads to effects seen if you play with glusterfs. If you
> really think it is a logical way of debugging to go out and simply tell
> "userspace can't do that" while the rest of the application-world does not
> show up with dead-ends like seen on this list, how can I change your mind?
> I hardly believe I can. I can only tell you what I would do: I would try to
> document _first_ that my piece of code really does behave well. But as you may
> have noticed there is no real way to provide this information. And that is
> indeed part of the problem.

Other applications fail all of the time as well - it seems to be a question, in this case, of how critical the failure is. But let us say that the problem is "every time you send too many bytes to the network device, it dies" - whose responsibility is it to fix? Is it GlusterFS's, for being too efficient? Is it the Linux kernel driver's, for not loading the data onto the device properly? Is it the device's, for locking up under a certain type of load? Do you think the best and only place to look for an answer is the user space program?

Let's say every time you used Firefox, your CPU + network device locked up - would you go to the Firefox devel mailing list and demand they fix it? Because, you know, you would get the same response. They would laugh you out - whether you were right or wrong. The onus is on you, as it is your hardware which is dying.
Now, if you have paid a subscription price, the responsibility might increase. But if it's free / open source / community-based support, and the general consensus on the technology is that FUSE is a user space capability, and user space should not be able to lock up a CPU or network device - which is most definitely true, at least as an ideal - then yes, the onus is on the community member to prove that others should reconsider their long-held beliefs.

I disagree that other user space programs do not trigger kernel bugs. Read the kernel devel list for a while, or the kernel release notes. I think you will find that most bugs - of which there are thousands or more - are triggered by user space programs. The kernel developers fix the problems, because they are kernel problems. Unless you tell them about the problem, they won't know to look into it.

In this particular case - what do you want Anand or gluster.com to do? Let's say every time they send one particular packet very quickly after another particular packet, it locks up. Do you want Anand or gluster.com to stop sending these packets? Or do you want the Linux developers to fix the device so this no longer happens? Which is a better solution to the problem? Which helps the most people, and prevents the problem from occurring in the future?

> Wouldn't it be a nice step if you could debug the ongoings of a
> glusterfs-server on the client by simply reading an exported file (something
> like a server-dependant meta-debug-file) that outputs something like strace
> does? Something that enables you to say: "Ok, here you can see what the
> application did, and there you can see what the kernel made of it". As we
> noticed a server-logfile is not sufficient.

Sure, that would be awesome - and I think it's already provided by such things as DTrace, or strace. This is a kernel problem. If every user space application must re-invent this wheel, that's surely a lot of effort on the part of application developers.
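For what it's worth, strace already gives the "here is what the application asked, and there is what the kernel answered" view on Linux. A minimal sketch, traced against a trivial command (the output path is arbitrary; to watch a running glusterfs process you would attach with `-f -p <pid>` instead):

```shell
# Log every system call 'ls /' makes, one line per call, including the
# kernel's return value for each.
strace -f -o /tmp/ls.trace ls / > /dev/null
# Each logged line pairs the request with the kernel's answer, along the
# lines of: openat(AT_FDCWD, "/", O_RDONLY|...) = 3
grep -c '=' /tmp/ls.trace
```

No cooperation from the traced application is needed, which is exactly why this belongs in the kernel/tooling layer rather than being re-implemented per application.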
Should Firefox provide the same functionality?

> Is ideology really a prove for anything in todays' world? Do you really think
> it is possible to understand the complete world by seeing half of it and the
> other half painted by ideology? What is wrong about _proving_ being not
> guilty? About acting defensive ?

Ideology? It depends. Some basic principles and basic understandings of how the systems are designed are required to help us triage the problems we face every day. Knowing the system has failed is insufficient. The first question is: where did the failure occur, and who should I ask for help? In the case of a CPU lockup and/or network device lockup, most people would start with the Linux kernel. Now, if this were an NFS problem, then - NFS being part of the kernel - the NFS part of the kernel would be open for consideration. But we know that GlusterFS does *not* introduce any code into the kernel. This little bit of information is important, and should not be ignored.

Now, maybe it's easier to start with GlusterFS, and try to pull on the community and the GlusterFS support people for *help*, because we can argue "it's a failure that, from a superficial level, seems to be triggered by your software, so you should be concerned" - but there is a limit to how effective this is as a means of addressing the problem. First, the GlusterFS people probably have little or no clue how to diagnose a CPU lockup or network device lockup. Since their software is entirely in user space, they would not require this capability as part of their employment requirements. If somebody on the staff *happens* to have the right experience, and happens to know the answer, then you would be lucky. This is not the sort of thing I would expect, however, even if I had a subscription for support.
For them to say "we have looked into this, and this is not our problem because of such and such, but please come back if you can prove otherwise, such that we can do something about it" is a fair answer.

> It is important to understand that this application is a kind of core
> technology for data storage. This means people want to be sure that their
> setup does not explode just because they made a kernel update or some other
> change where their experience tells them it should have no influence on the
> glusterfs service. You want to be sure, just like you are when using nfs. It
> just does work (even being in kernel-space!).
> Now, answer for yourself if you think glusterfs is as stable as nfs on the
> same box.

One could say the same thing about every layer between user space and the hardware. If your disk dies, is this a GlusterFS problem? If your disk controller dies, is this a GlusterFS problem? If your network device dies, is this a GlusterFS problem? If your CPU dies, is this a GlusterFS problem?

I don't agree that NFS "just works". NFS has gone through a lot of evolution and maturing. At our company, I've been aware of numerous problems with NFS. Coincidentally, I was on a call with one of the owners of the Linux NFS code from RedHat a few weeks ago to discuss the subject of file system caching, and whether or not NFS would be a solution to a problem we were having with another network file system (ClearCase MVFS). During the call, the subject of NFS development did come up, and as I recall, he humbly acknowledged that NFS has had a lot of problems and that they have been working hard on it. I told him I thought they were doing a great job, and I meant it.

Everything is relative. Yes, we rely on NFS every day at work - and, for the most part, it works great. We have problems, but RedHat has been responsive to our problems *once we have identified them as NFS problems*, and we work together towards a solution. But then, we also pay RedHat money to support us.
Cheers,
mark

--
Mark Mielke<mark at mielke.cc>