[users@httpd] mod_dav Problem

Hello. I am new to the list because of the problem I am seeing on our servers, which I explain below.

Some Info On The Problem:
We handle over 250,000 PUTs a night. Of those, about 50 usually end up with a 204 status even though the file does not exist once the upload is finished. This is obviously a huge problem. I was able to narrow it down: it seems to happen only with requests that first return a 500 (the "Could not get next bucket brigade" error). The client gets the 500 and starts the transfer again. Apache then responds with a 204, and the client thinks the upload worked even though it really didn't. Every uploaded filename is unique, so a PUT should always return a 201. A 204 means the file already existed, which must mean that the first request (the 500) had not finished when the second request (the 204) started. Some log entries will hopefully explain this better...

+---------------------+--------+------------------------------------+-------------+--------+-----------+---------+----------+
| requesttime         | method | requesturl                         | querystring | status | timetaken | ioinput | iooutput |
+---------------------+--------+------------------------------------+-------------+--------+-----------+---------+----------+
| 2005-12-29 03:59:55 | PUT    | /webdav/username1/folder/file1.txt |             |    204 |    428980 |    1200 |     1277 |
| 2005-12-29 03:58:57 | PUT    | /webdav/username1/folder/file1.txt |             |    500 | 306521783 |     495 |     1311 |
| 2005-12-29 06:05:49 | PUT    | /webdav/username2/folder/file1.txt |             |    204 |   2497618 |  142558 |     1277 |
| 2005-12-29 06:02:55 | PUT    | /webdav/username2/folder/file1.txt |             |    500 | 303576082 |   96329 |     1311 |
+---------------------+--------+------------------------------------+-------------+--------+-----------+---------+----------+

The above entries show the problem for two different users. The first request by each user eventually times out and returns the 500. Our Timeout is set to 300s, and the request is killed after basically 300s (timetaken shows about 306s and 304s). Based on the requesttime, though, the second request for each user starts before the first request has finished. So the second request finishes successfully with a 204 while the first one is still running for some reason, and when the first one times out, it deletes the file. Unfortunately, the client now thinks the file is there because the second request was successful.
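To make the overlap concrete, here is a minimal sketch of the arithmetic behind that reading of the log. It assumes requesttime is the moment the request started and timetaken is in microseconds; the struct and function names are made up for illustration and are not part of Apache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One log row. Assumptions: start_s is requesttime as seconds since
 * midnight (request start), duration_us is timetaken in microseconds. */
struct put_request {
    int64_t start_s;
    int64_t duration_us;
};

/* Convert hh:mm:ss to seconds since midnight. */
static int64_t hms(int h, int m, int s)
{
    assert(h >= 0 && h < 24 && m >= 0 && m < 60 && s >= 0 && s < 60);
    return (int64_t)h * 3600 + (int64_t)m * 60 + s;
}

/* Moment the request ended, in microseconds since midnight. */
static int64_t end_us(struct put_request req)
{
    return req.start_s * 1000000 + req.duration_us;
}

/* True when `second` begins while `first` is still in progress. */
static bool starts_during(struct put_request first, struct put_request second)
{
    int64_t second_start_us = second.start_s * 1000000;
    return second_start_us >= first.start_s * 1000000
        && second_start_us < end_us(first);
}
```

For username1, the 500 request is { hms(3, 58, 57), 306521783 } and the 204 request is { hms(3, 59, 55), 428980 }. Under the assumptions above, starts_during() is true for that pair, and the 204 request also ends well before the 500 request does.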

The Setup:
We have a few web servers, and a few more file servers. This problem happens on every web and file server. The file servers are mounted over NFS with the sync option. The web servers are running Apache 2.0.55 on FC3. The file servers are mostly FC3, but two are RH9. The client software that uploads the files will retry the upload up to 3 times, until it gets a status >= 200 and < 300. I've been experimenting with different setups on one webserver.
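For reference, the client's retry rule amounts to something like the sketch below. This is not the real client code: the callback type, the scripted stand-in server, and the reading of "3 times" as three attempts in total are all assumptions made for illustration.

```c
#include <stdbool.h>

/* Hypothetical callback: performs one PUT attempt, returns the HTTP status. */
typedef int (*put_attempt_fn)(void *ctx);

/* Assumption: "retry 3 times" means at most three attempts in total. */
enum { MAX_ATTEMPTS = 3 };

/* The client's success rule: any status >= 200 and < 300. */
static bool is_success(int status)
{
    return status >= 200 && status < 300;
}

/* Returns the first 2xx status, or the last status seen once the
 * attempts are used up. */
int upload_with_retry(put_attempt_fn attempt, void *ctx)
{
    int status = 0;
    for (int i = 0; i < MAX_ATTEMPTS; i++) {
        status = attempt(ctx);
        if (is_success(status)) {
            break;
        }
    }
    return status;
}

/* Stand-in for the server: replays a fixed list of statuses. */
struct scripted_server {
    const int *statuses;
    int count;
    int next;
};

int scripted_attempt(void *ctx)
{
    struct scripted_server *s = ctx;
    if (s->next >= s->count) {
        return 500; /* script exhausted: keep failing */
    }
    return s->statuses[s->next++];
}
```

With a scripted sequence of 500 followed by 204, upload_with_retry() stops after the second attempt and reports 204. The client treats that as a finished upload, since 204 falls inside the 2xx range just like 201.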

More Info:
With the experimental webserver, I have tried Apache 2.0.46 (it is pre bitbucket). I have tried turning KeepAlive off. I have changed the Timeout. None of that has made any difference.

At the end of the dav_method_put function, I added code that actually checks the existence of the file that was uploaded, and also checks the size if it does exist.

So right before the return, I added:

    /* needs #include <sys/stat.h> at the top of the file */
    struct stat statinfo;

    if (stat(r->filename, &statinfo) != 0) {
        /* the PUT looked successful, but the file is not on disk */
        err = dav_new_error(r->pool, HTTP_NOT_FOUND, 0,
                            apr_psprintf(r->pool,
                                         "File Not Found After PUT: %s",
                                         r->filename));

        return dav_handle_err(r, err, NULL);

    } else {
        /* THIS SECTION HAS NEVER BEEN NEEDED */
        /* pass the format string to ap_log_rerror directly, so a '%' in
         * the filename is never interpreted as a format specifier */
        if (statinfo.st_size != total_written) {
            ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, r,
                          "Invalid PUT: %s (WRITTEN: %" APR_OFF_T_FMT
                          ", SIZE: %" APR_OFF_T_FMT ")",
                          r->filename, (apr_off_t)total_written,
                          (apr_off_t)statinfo.st_size);
        } else {
            ap_log_rerror(APLOG_MARK, APLOG_NOTICE, 0, r,
                          "Successful PUT: %s (WRITTEN: %" APR_OFF_T_FMT ")",
                          r->filename, (apr_off_t)total_written);
        }
    }


Unfortunately, this has only reduced the number of lost files, not eliminated them. I would also recommend that something like this be added to the actual mod_dav code.

The only thing I can think of is that the process handling the 500 request somehow stays alive until after the 204 request has finished, and then deletes the file. Also, the fact that a 204 is returned means the second request was writing over the first request's version of the file, so the first request had not finished when the second one happened.
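The ordering I suspect can be replayed outside Apache with plain POSIX calls. The sketch below is only an illustration of the hypothesis: it runs the steps one after another in a single process, it assumes the failed PUT removes the target path when it gives up, and replay_overlapping_puts() is a made-up name, not anything in mod_dav.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

static bool file_exists(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0;
}

/* Replays the suspected ordering on one path.
 * Returns the status the second PUT would report (201 or 204),
 * or -1 on an I/O error. *survives tells whether the file is still
 * there after both requests are done. */
int replay_overlapping_puts(const char *path, bool *survives)
{
    /* Request 1 creates the target, then stalls waiting for the body. */
    int fd1 = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd1 < 0) {
        return -1;
    }

    /* Request 2 (the retry) sees an existing file, so it reports 204. */
    int second_status = file_exists(path) ? 204 : 201;
    int fd2 = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd2 < 0 || write(fd2, "data", 4) != 4) {
        if (fd2 >= 0) {
            close(fd2);
        }
        close(fd1);
        unlink(path);
        return -1;
    }
    close(fd2);

    /* Request 1 finally times out and cleans up by removing the path
     * (assumed behaviour of the failed PUT). */
    close(fd1);
    unlink(path);

    *survives = file_exists(path);
    return second_status;
}
```

Called on a scratch path, this returns 204 and leaves *survives false, which matches what the logs show. If this is what is happening, it would also explain why a stat() at the end of the second request only helps sometimes: the check passes whenever the first request has not timed out yet.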

Thoughts???
