Re: file(1)

Russell Coker <russell@xxxxxxxxxxxx> · Wed, 2 Apr 2008 22:02:32 +1100

On Tuesday 01 April 2008 01:12, Stephen Smalley <sds@xxxxxxxxxxxxx> wrote:
> > There are three or four formats depending on how you are counting:
> > - the kernel binary policy file (policy.N),
> > - the module policy package file (.pp files),
> > - the binary module file (.mod files, and these come in two flavors -
> > base and non-base).
> >
> > base.pp is a module policy package file containing a binary module
> > (.mod) file, a file contexts (.fc) file, and potentially other
> > components (e.g. seusers, users_extra).
> >
> > So don't confuse the module policy package file with a module file - the
> > module policy package file has its own header before the module file.
>
> So, to be precise, we've got the following format for the module policy
> package file:
> module package magic number (different from the module magic number)
> module package format version (different from the module format version)
> number of sections
> offset array, indexed by section number
> binary module (with its own header)
> any other sections (file contexts, etc)
>
> Note that the offset array is variable length - it will be the number of
> sections * sizeof uint32.

This is where things start to become tricky: magic(5) doesn't support arrays.
I could probably handle this with nested if statements for a reasonable
number of sections (it seems that we generally don't have that many).
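
For reference, here is a minimal Python sketch of reading the header layout
described in the quoted text, including the variable-length offset array. It
assumes that every header field is a little-endian uint32 (the quoted
description only gives the width of the offset entries). The function name
and the magic value in the demo are hypothetical placeholders, not the real
constants.

```python
import struct

def parse_package_header(data: bytes) -> dict:
    """Parse magic, version, section count and the offset array.

    Assumption: all fields are little-endian uint32 values.
    """
    fixed_size = 12  # magic + version + number of sections
    if len(data) < fixed_size:
        raise ValueError("too short for a module package header")
    magic, version, nsec = struct.unpack_from("<III", data, 0)

    # The offset array is variable length: nsec * sizeof(uint32).
    if len(data) < fixed_size + 4 * nsec:
        raise ValueError("truncated offset array")
    offsets = list(struct.unpack_from("<%dI" % nsec, data, fixed_size))

    return {"magic": magic, "version": version,
            "nsec": nsec, "offsets": offsets}

# Synthetic header: placeholder magic, version 1, two sections.
demo = struct.pack("<III", 0x12345678, 1, 2) + struct.pack("<II", 20, 28)
header = parse_package_header(demo)
print(header["nsec"], header["offsets"])
```

The loop over nsec entries is the part that would have to be unrolled into
nested tests in a magic(5) entry.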

> And note that the order of sections isn't necessarily fixed, although
> the current module package writing code always puts the binary module
> first.  But the read code doesn't make any assumptions about ordering
> and just determines what each section is as it encounters it.

My current effort just gives information on the first section. The binary
module seems to be the most interesting one (if, for example, qmail.pp
becomes /lost+found/#1234 then having the string "qmail\5" displayed would
be helpful).
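
To illustrate the "first section only" approach, here is a small Python
sketch that follows just the first entry of the offset array and reads the
32-bit value found there. It assumes little-endian uint32 fields and offsets
counted from the start of the file. The function name and all values in the
demo are made up for illustration. This is roughly the kind of lookup that
an indirect offset in magic(5) can express without needing array support.

```python
import struct

def first_section_magic(data: bytes) -> int:
    """Return the 32-bit value at the start of the first section.

    Assumptions: little-endian uint32 fields, and offsets measured
    from the beginning of the file.
    """
    _magic, _version, nsec = struct.unpack_from("<III", data, 0)
    if nsec == 0:
        raise ValueError("package has no sections")
    # Only the first offset is needed; the rest of the array is skipped.
    (first_offset,) = struct.unpack_from("<I", data, 12)
    (section_magic,) = struct.unpack_from("<I", data, first_offset)
    return section_magic

# Synthetic package: 12-byte header, two offsets, then two fake sections.
package = (struct.pack("<III", 0x11111111, 1, 2)
           + struct.pack("<II", 20, 28)
           + struct.pack("<I", 0xAABBCCDD) + b"\0" * 4   # first section
           + b"fc")                                       # second section
print(hex(first_section_magic(package)))
```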

> One other fun tidbit - in libsepol 2.0.23, I changed the policy reading
> code to accept either "Flask" or "SE Linux" as the string identifier as
> other projects, like Solaris FMAC, are using "Flask" as the string
> identifier as a more general specifier.  So it would be nice if file(1)
> ultimately accepted either as well.

Given that we already have hex magic numbers to recognise the various 
structures and that the most desirable way for humans to recognise file types 
is via file(1), what is the benefit of having either "SE Linux" or "Flask" in 
there?

Would it be possible to just accept that for the current version of the file 
format "SE Linux" will be contained in there and not bother about this?  It's 
no secret that the Flask code was not originally developed for Solaris.

Eventually we will just have to lose some data from file(1) output when these
situations arise. I suspect that we have already reached the stage where we
exceed the flexibility of the magic(5) configuration language. It may be
that file(1) on Linux simply reports the wrong data for files from other
platforms.

It seems that there are ongoing plans for new file formats.  Could we agree 
that getting things to work with file(1) will be a prerequisite for future
file formats?

-- 
russell@xxxxxxxxxxxx
http://etbe.coker.com.au/          My Blog

http://www.coker.com.au/sponsorship.html Sponsoring Free Software development
