Hi,

It's probably slow because you run many queries where a few would work:

DELETE FROM unassignedjobs WHERE jobid IN (6, 8, 2, 99, 66)

But I wouldn't know how to build a query like that in C. A script in Python
or even bash that did it that way would be faster than your current C
implementation. (A sketch of building such an IN list from C++ follows
below.)

What you can do in C is this:

sprintf( buffer,
         "INSERT INTO assignedjobs (jobid,nodeid)\n"
         "SELECT jobid, '%s' FROM unassignedjobs LIMIT %d\n",
         nodename.c_str(), number );

Some smart SQL with some 'RETURNING' clauses could run the whole assignJobs
function in a single query, and it would be a lot faster; see the second
sketch below.
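Building the IN list from C++ is actually not hard with a string stream.
Here is a minimal, untested sketch; the helper name deleteJobs and the
variable names conn and ids are just for illustration:

#include <libpq-fe.h>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: delete a whole batch of job ids in one round trip
// instead of issuing one DELETE per id.
void deleteJobs(PGconn* conn, const std::vector<long>& ids)
{
    if (ids.empty())
        return;

    std::ostringstream sql;
    sql << "DELETE FROM unassignedjobs WHERE jobid IN (";
    for (size_t i = 0; i < ids.size(); ++i) {
        if (i > 0)
            sql << ", ";
        sql << ids[i];  // the ids are integers, so no quoting/escaping needed
    }
    sql << ")";

    PGresult* res = PQexec(conn, sql.str().c_str());
    PQclear(res);
}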
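And to show what I mean by RETURNING: something like the following could
replace the whole loop. This is an untested sketch, not a drop-in fix: the
data-modifying WITH clause it uses requires PostgreSQL 9.1 or later, and
nodename is interpolated directly into the SQL (as in your original code),
so it must be a trusted string:

#include <libpq-fe.h>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

std::vector<long> assignJobs(PGconn* pDB, int number, const std::string& nodename)
{
    char buffer[1000];
    // Move up to 'number' rows from unassignedjobs to assignedjobs and
    // hand the job ids back to the caller, all in a single query.
    snprintf(buffer, sizeof buffer,
             "WITH moved AS ("
             "  DELETE FROM unassignedjobs"
             "  WHERE jobid IN (SELECT jobid FROM unassignedjobs LIMIT %d)"
             "  RETURNING jobid)"
             " INSERT INTO assignedjobs (jobid, nodeid)"
             " SELECT jobid, '%s' FROM moved"
             " RETURNING jobid",
             number, nodename.c_str());

    PGresult* res = PQexec(pDB, buffer);

    std::vector<long> jobs;
    for (int i = 0; i < PQntuples(res); ++i)
        jobs.push_back(atol(PQgetvalue(res, i, 0)));
    PQclear(res);
    return jobs;
}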
Kind regards,
Arjen Nienhuis

On Wed, Sep 15, 2010 at 2:55 AM, Michael Hull <mikehulluk@xxxxxxxxxxxxxx> wrote:
> Hi Everyone,
> I am fairly new to practical databases, but I am trying out the C
> interface to postgres and am wondering how to improve performance. I
> am a researcher, and I am trying to perform a large parameter sweep.
> Since this will involve a couple of thousand simulations, I have a
> process that manages which simulations have been done, and which still
> need to be done, so I can run it easily on a cluster.
>
> So, I have a fairly simple schema of 4 tables.
>
> -- na, ca, ks, kf, lk, iinj are the parameters for my simulation.
> CREATE TABLE alljobs (
>     id SERIAL,
>     ca int,
>     na int,
>     lk int,
>     ks int,
>     kf int,
>     iinj int,
>     PRIMARY KEY(id)
> );
>
> CREATE TABLE assignedjobs (
>     jobid int,
>     nodeid varchar(100),
>     assignedtime timestamp,
>     PRIMARY KEY(jobid)
> );
>
> CREATE TABLE completedjobs (
>     jobid int,
>     PRIMARY KEY(jobid)
> );
>
> CREATE TABLE unassignedjobs (
>     jobid int,
>     PRIMARY KEY(jobid)
> );
>
> alljobs is initially populated, and contains all the simulations that
> will ever be run.
> unassignedjobs contains the ids in alljobs that haven't been run yet.
> assignedjobs contains the ids in alljobs that have been dispatched to
> some cpu on the cluster and are currently simulating.
> completedjobs contains all the completed jobs.
>
> So fairly simply, I have a daemon running on a machine, which accesses
> this DB. Clients connect and request the details for, say, 1000
> simulations, at which point the daemon takes 1000 entries from the
> unassigned table and moves them to the assigned table. Then once the
> client is finished with those jobs, it signals this to the daemon,
> which then moves those jobs from 'assigned' to 'complete'.
>
> So this is fairly simple to implement, but my problem is that it is very slow.
>
> In particular, I have a 'select' waiting for network connections,
> which then calls this function:
>
> typedef vector<long> VectorLong;
> VectorLong assignJobs(PGconn* pDB, int number, string nodename)
> {
>     char buffer[1000];
>     sprintf(buffer, "SELECT jobid from unassignedjobs LIMIT %d", number);
>     PGresult* pRes = PQexec(pDB, buffer);
>
>     printf("assigning jobs");
>
>     //PGresult* pRes = PQexec(pDB, "SELECT * from alljobs LIMIT 100");
>     //PGresult* pRes = PQexec(pDB, "SELECT * from alljobs");
>
>     int nRes = PQntuples(pRes);
>     printf("Results found: %d", nRes);
>
>     VectorLong jobs;
>     for (int i = 0; i < nRes; i++)
>     {
>         long id = atol( PQgetvalue(pRes, i, 0) );
>         cout << id << " ";
>         jobs.push_back(id);
>
>         sprintf(buffer, "DELETE FROM unassignedjobs WHERE jobid = %ld", id);
>         PQexec(pDB, buffer);
>
>         sprintf(buffer, "INSERT INTO assignedjobs (jobid,nodeid) VALUES (%ld, '%s')",
>                 id, nodename.c_str());
>         PQexec(pDB, buffer);
>     }
>
>     return jobs;
> }
>
> But it is painfully slow. I was wondering if there is a way to improve
> this? I feel there should be, since I already have a 'pointer' to the
> rows I want to delete.
>
> Any help would be greatly appreciated.
>
> Many thanks
>
> Mike Hull

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general