Hi,

It's probably slow because you run many queries where a few would work:

DELETE FROM unassignedjobs WHERE jobid IN (6, 8, 2, 99, 66)

But I wouldn't know how to build a query like that in C. A script in Python
or even bash that did it that way would be faster than your current C
implementation. (A sketch of building such an IN list from C++ follows
below.)

What you can do in C is this:

sprintf( buffer,
         "INSERT INTO assignedjobs (jobid,nodeid)\n"
         "SELECT jobid, '%s' FROM unassignedjobs LIMIT %d\n",
         nodename.c_str(), number );

Some smart SQL with some 'RETURNING' clauses could run the whole assignJobs
function in a single query, and it would be a lot faster; see the second
sketch below.
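Building the IN list from C++ is actually not hard with a string stream.
Here is a minimal, untested sketch; the helper name deleteJobs and the
variable names conn and ids are just for illustration:

#include <libpq-fe.h>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: delete a whole batch of job ids in one round trip
// instead of issuing one DELETE per id.
void deleteJobs(PGconn* conn, const std::vector<long>& ids)
{
    if (ids.empty())
        return;

    std::ostringstream sql;
    sql << "DELETE FROM unassignedjobs WHERE jobid IN (";
    for (size_t i = 0; i < ids.size(); ++i) {
        if (i > 0)
            sql << ", ";
        sql << ids[i];  // the ids are integers, so no quoting/escaping needed
    }
    sql << ")";

    PGresult* res = PQexec(conn, sql.str().c_str());
    PQclear(res);
}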
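And to show what I mean by RETURNING: something like the following could
replace the whole loop. This is an untested sketch, not a drop-in fix: the
data-modifying WITH clause it uses requires PostgreSQL 9.1 or later, and
nodename is interpolated directly into the SQL (as in your original code),
so it must be a trusted string:

#include <libpq-fe.h>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

std::vector<long> assignJobs(PGconn* pDB, int number, const std::string& nodename)
{
    char buffer[1000];
    // Move up to 'number' rows from unassignedjobs to assignedjobs and
    // hand the job ids back to the caller, all in a single query.
    snprintf(buffer, sizeof buffer,
             "WITH moved AS ("
             "  DELETE FROM unassignedjobs"
             "  WHERE jobid IN (SELECT jobid FROM unassignedjobs LIMIT %d)"
             "  RETURNING jobid)"
             " INSERT INTO assignedjobs (jobid, nodeid)"
             " SELECT jobid, '%s' FROM moved"
             " RETURNING jobid",
             number, nodename.c_str());

    PGresult* res = PQexec(pDB, buffer);

    std::vector<long> jobs;
    for (int i = 0; i < PQntuples(res); ++i)
        jobs.push_back(atol(PQgetvalue(res, i, 0)));
    PQclear(res);
    return jobs;
}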
Kind regards,
Arjen Nienhuis

On Wed, Sep 15, 2010 at 2:55 AM, Michael Hull <mikehulluk@xxxxxxxxxxxxxx> wrote:
> Hi Everyone,
> I am fairly new to practical databases, but I am trying out the C
> interface to postgres and am wondering how to improve performance. I
> am a researcher, and I am trying to perform a large parameter sweep.
> Since this will involve a couple of thousand simulations, I have a
> process that manages which simulations have been done, and which still
> need to be done, so I can run it easily on a cluster.
>
> So, I have a fairly simple schema of 4 tables.
>
> -- na, ca, ks, kf, lk, iinj are the parameters for my simulation.
> CREATE TABLE alljobs (
>     id SERIAL,
>     ca int,
>     na int,
>     lk int,
>     ks int,
>     kf int,
>     iinj int,
>     PRIMARY KEY(id)
> );
>
> CREATE TABLE assignedjobs (
>     jobid int,
>     nodeid varchar(100),
>     assignedtime timestamp,
>     PRIMARY KEY(jobid)
> );
>
> CREATE TABLE completedjobs (
>     jobid int,
>     PRIMARY KEY(jobid)
> );
>
> CREATE TABLE unassignedjobs (
>     jobid int,
>     PRIMARY KEY(jobid)
> );
>
> alljobs is initially populated, and contains all the simulations that
> will ever be run.
> unassignedjobs contains the ids in alljobs that haven't been run yet.
> assignedjobs contains the ids in alljobs that have been dispatched to
> some cpu on the cluster and are currently simulating.
> completedjobs contains all the completed jobs.
>
> So fairly simply, I have a daemon running on a machine, which accesses
> this DB. Clients connect and request the details for, say, 1000
> simulations, at which point the daemon takes 1000 entries from the
> unassigned table and moves them to the assigned table. Then once the
> client is finished with those jobs, it signals this to the daemon,
> which then moves those jobs from 'assigned' to 'complete'.
>
> So this is fairly simple to implement, but my problem is that it is very slow.
>
> In particular, I have a 'select' waiting for network connections,
> which then calls this function:
>
> typedef vector<long> VectorLong;
> VectorLong assignJobs(PGconn* pDB, int number, string nodename)
> {
>     char buffer[1000];
>     sprintf(buffer, "SELECT jobid from unassignedjobs LIMIT %d", number);
>     PGresult* pRes = PQexec(pDB, buffer);
>
>     printf("assigning jobs");
>
>     //PGresult* pRes = PQexec(pDB, "SELECT * from alljobs LIMIT 100");
>     //PGresult* pRes = PQexec(pDB, "SELECT * from alljobs");
>
>     int nRes = PQntuples(pRes);
>     printf("Results found: %d", nRes);
>
>     VectorLong jobs;
>     for (int i = 0; i < nRes; i++)
>     {
>         long id = atol( PQgetvalue(pRes, i, 0) );
>         cout << id << " ";
>         jobs.push_back(id);
>
>         sprintf(buffer, "DELETE FROM unassignedjobs WHERE jobid = %ld", id);
>         PQexec(pDB, buffer);
>
>         sprintf(buffer, "INSERT INTO assignedjobs (jobid,nodeid) VALUES (%ld, '%s')",
>                 id, nodename.c_str());
>         PQexec(pDB, buffer);
>     }
>
>     return jobs;
> }
>
> But it is painfully slow. I was wondering if there is a way to improve
> this? I feel there should be, since I already have a 'pointer' to the
> rows I want to delete.
>
> Any help would be greatly appreciated.
>
> Many thanks
>
> Mike Hull

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general