Re: [GIT PULL] Orangefs (text only resend)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I don't think any of us can *guarantee* that our code is bug free.
And, for me, coherency locking in giant programs like the Kernel
is hard... I think some of you guys can just shut your eyes and
see in your head how every lock in the kernel interacts. Now
that I've been refreshed on lockdep, I remember finding problems
(not with memory allocations) when Christoph was working with us,
and it was a good thing we had his help to fix them. Lockdep doesn't
point out any problems to me now, and I don't see problems with our
GFP_KERNEL allocations when I study the code surrounding
them (now that you have pointed out that I should). That you
think there might be problems there make me think there
might be too, so I at least won't be blithe about it...

Here is the iov_iter code that causes no regressions... it is
not checked into even my github repository yet.

file.c:

if (to_user) {
    iov_iter_init(&iter, READ, vec, nr_segs, total_size);
    ret = pvfs_bufmap_copy_to_user_iovec2(bufmap,
    &iter,
    total_size,
    buffer_index,
    nr_segs);
    ret = 0;
}

pvfs2-bufmap.c:

int pvfs_bufmap_copy_to_user_iovec2(struct pvfs2_bufmap *bufmap,
                    struct iov_iter *iter,
                    size_t total_size,
                    int buffer_index,
                    int nr_segs)
{
    struct pvfs_bufmap_desc *from;
    struct page *page;
    size_t remaining = total_size;
    size_t written;
    size_t copy;
    int i;

    from = &bufmap->desc_array[buffer_index];

    for (i = 0; i < bufmap->page_count; i++) {
        page = from->page_array[i];
        copy = min_t(size_t, remaining, PAGE_SIZE);
        written = copy_page_to_iter(page, 0, copy, iter);
        remaining -= written;
        if (written < copy && iov_iter_count(iter) > 0)
            break;
    }
    return remaining ? -EFAULT : 0;
}

The loop is ridiculous, page_count is always 10240, and that
loop churns 10240 times on a file containing "hello"...

I studied on the code all day, and changed/simplified it in ways
that I thought were sane, but my changes always caused
regressions...

cat hello.file        always worked

cp client.txt (a giant dbench input file) from orangefs to /tmp always worked

diff /pvfsmnt/client.txt /tmp/client.txt       broken, even though
both files md5summed the same...

the dbench test I run broke.

It all works with the above code (and with our open-coded non-iov-iter code),
but the above code is "not right"... I expect to make it right.

I learned about fuzzing at Vault. I will use this to help me
start trying to add meaningful fuzzing to my tests:

https://lwn.net/Articles/637151/

I am open to fuzzing suggestions...

-Mike

On Fri, Sep 11, 2015 at 6:24 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Fri, Sep 11, 2015 at 05:12:08PM -0400, Mike Marshall wrote:
>> I'm about to leave for the day...
>>
>> I haven't found any problems with the GFP_KERNEL allocations that
>> Al warned me about... that doesn't mean there aren't any...
>
> *IF* you have nothing that would require locks in any of the pathways
> related to memory pressure (and can guarantee that no such thing will
> appear), GFP_KERNEL should be OK.  Still, doing that under the system-wide
> mutex taken whenever you need to send a request looks like a Bad Idea(tm) -
> too easy to introduce such deadlocks on subsequent changes.
>
>> I'm using copy_page_to_iter in my new branch as Al suggested. I've
>> changed the code quite a bit from any samples I've posted, there were
>> regressions with it. I finally stole some code from
>> cifs/file.c/cifs_readdata_to_iov
>> and everything works... but I'm not happy with it yet... Linus once posted
>> a message to the effect that "you don't fix bugs by thrashing around until
>> stuff seems to work, you fix them by doing the right thing on purpose..."
>> and I'm working towards that end...
>
> Could you tell where does the current code live?  What's in -next appears
> to be unchanged...
>
> BTW, as for passing all your tests...  Do those include fuzzing it by
> misbehaving server?  And getdents() from a directory that has a bunch of
> long names *and* a short one in the very end looks like it would misbehave
> even on correctly working server...
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux