Matt Domsch wrote:
On Fri, Nov 02, 2007 at 11:06:11AM -0700, Toshio Kuratomi wrote:
Chuck Anderson wrote:
Won't there be performance problems with a TurboGears-based wiki? I
thought MirrorManager was having issues with TG performance and had to
enable form-data caching to get acceptable performance at the cost of
possibly stale data. I don't know the details behind it, but that was
the reason I was given for why MM sometimes returns old pre-edit field
values when you edit forms.
We might have performance issues but I'm confident they'll be different
performance issues than we're currently experiencing ;-)
The issues we're running into with moin right now are largely caused by
Moin's philosophy of running off the filesystem, not a db. This means
1) we're unable to spread the load among multiple app servers, so we are
constrained to a single server's memory and CPU resources, and 2) it
makes multiple views of the data much harder than they need to be. In the
subscription list case, Moin has to walk the filesystem, find each
user's prefs file, parse it for a watchlist, check (if the watchlist
exists) whether the page and page categories are in it, and only then
send the notification. With a db, we'd have a separate table for the
watchlist, with indexes on the userid and the pagename. Searching for a
page wouldn't have to open a file for every single one of our users;
instead it would access a single table and pull out the users who have
that page in their watchlist.
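To make the watchlist idea concrete, here is a minimal sketch using
Python's built-in sqlite3 module. The table name, column names, index
names and sample users are all assumptions for illustration; nothing
here reflects an actual schema, and page categories are left out.

```python
import sqlite3


def build_watchlist_db():
    """Create an in-memory db with a hypothetical watchlist table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE watchlist (userid TEXT NOT NULL, pagename TEXT NOT NULL)"
    )
    # One index per lookup direction: by user and by page.
    conn.execute("CREATE INDEX watchlist_userid_idx ON watchlist (userid)")
    conn.execute("CREATE INDEX watchlist_pagename_idx ON watchlist (pagename)")
    return conn


def watchers_of(conn, pagename):
    """Return the users watching a page with a single indexed query."""
    rows = conn.execute(
        "SELECT userid FROM watchlist WHERE pagename = ? ORDER BY userid",
        (pagename,),
    )
    return [userid for (userid,) in rows]


conn = build_watchlist_db()
conn.executemany(
    "INSERT INTO watchlist (userid, pagename) VALUES (?, ?)",
    [
        ("alice", "Infrastructure"),
        ("bob", "Docs"),
        ("carol", "Infrastructure"),
    ],
)
print(watchers_of(conn, "Infrastructure"))
```

The point is only that the notification lookup becomes one query
against one table, rather than one file open per user.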
With MirrorManager I know we've had memory and db query speed issues
trying to serve the mirrorlist directly from the TG app. I wasn't aware
that mirrormanager was having trouble keeping up with its management
functionality. Matt, is that still true, or is caching a leftover from
when the two functions were combined?
I'm sure it's still true, it predated having any mirrorlist
functionality at all.
The short story is, TG (well, SQLObject) either caches data very
aggressively, so you can see stale data on changes, or not at all, so
each field read in each row results in a DB query. Even with
object.sync() calls scattered through the UI actions like I did, with
caching left enabled we do still see stale data on occasion. With
caching disabled, generating the UI pages, or certainly the publiclist
pages, takes _forever_: hundreds of thousands of small DB queries.
Maybe SQLAlchemy has a better caching mechanism, I don't know.
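As a rough illustration of why "one query per field per row" adds up,
the sketch below counts queries for the two access patterns against an
in-memory sqlite3 table. It does not use SQLObject at all; the
CountingDB wrapper, the two-column site table and the sample rows are
assumptions made up for the example.

```python
import sqlite3

FIELDS = ("name", "org_url")


class CountingDB:
    """Hypothetical wrapper that counts every query it runs."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def load_per_field(db):
    """Uncached pattern: one query for the ids, then one per field per row."""
    ids = [row[0] for row in db.query("SELECT id FROM site ORDER BY id")]
    records = []
    for site_id in ids:
        record = {}
        for field in FIELDS:
            # field comes from the fixed FIELDS tuple, never from user input
            sql = "SELECT %s FROM site WHERE id = ?" % field
            record[field] = db.query(sql, (site_id,))[0][0]
        records.append(record)
    return records


def load_bulk(db):
    """Fetch every field of every row in a single query."""
    rows = db.query("SELECT name, org_url FROM site ORDER BY id")
    return [dict(zip(FIELDS, row)) for row in rows]


db = CountingDB()
db.conn.execute(
    "CREATE TABLE site (id INTEGER PRIMARY KEY, name TEXT, org_url TEXT)"
)
db.conn.executemany(
    "INSERT INTO site (name, org_url) VALUES (?, ?)",
    [("a", "http://a.example"), ("b", "http://b.example"), ("c", "http://c.example")],
)

per_field_records = load_per_field(db)
per_field_queries = db.queries

db.queries = 0
bulk_records = load_bulk(db)
bulk_queries = db.queries

print(per_field_queries, bulk_queries)
```

With 3 rows and 2 fields the per-field pattern already needs 7 queries
against 1 for the bulk read, and the gap grows with rows times fields.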
I've just taken an extremely quick look at this and I don't know where
the stale data problem is coming from, but it does look like SQLObject
could make more db calls than SQLAlchemy even with caching on. The
first part of this is okay::
    In [30]: import model
    In [31]: sites = model.Site.select(orderBy='name')
    In [32]: for site in sites:
       ....:     pass
       ....:
     1/Select  :  SELECT site.id, site.name, site.password, site.org_url,
    site.private, site.admin_active, site.user_active, site.created_at,
    site.created_by, site.all_sites_can_pull_from_me,
    site.downstream_comments FROM site WHERE 1 = 1 ORDER BY name
     1/QueryR  :  SELECT site.id, site.name, site.password, site.org_url,
    site.private, site.admin_active, site.user_active, site.created_at,
    site.created_by, site.all_sites_can_pull_from_me,
    site.downstream_comments FROM site WHERE 1 = 1 ORDER BY name
     1/COMMIT  :  auto
This second part is inefficient::
    In [33]: site.hosts
     1/QueryAll:  SELECT id FROM host WHERE site_id = (173)
     1/QueryR  :  SELECT id FROM host WHERE site_id = (173)
     1/COMMIT  :  auto
     1/QueryAll:  SELECT id FROM host WHERE site_id = (173)
     1/QueryR  :  SELECT id FROM host WHERE site_id = (173)
     1/COMMIT  :  auto
    Out[33]:
    [Snip values of site.hosts]
    In [34]: site.hosts
     1/QueryAll:  SELECT id FROM host WHERE site_id = (173)
     1/QueryR  :  SELECT id FROM host WHERE site_id = (173)
     1/COMMIT  :  auto
     1/QueryAll:  SELECT id FROM host WHERE site_id = (173)
     1/QueryR  :  SELECT id FROM host WHERE site_id = (173)
     1/COMMIT  :  auto
    Out[34]:
The list of hosts is retrieved from the db each time the attribute is
accessed (twice per access in the log above), even though caching is
enabled. This will make a difference if you access the attribute more
than once, for instance, printing the name of every host in site.hosts
in a menu of links at the top of the page and then looping through
site.hosts to print out a complete record for each.
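One way to sidestep the repeated lookups, whatever the ORM does
underneath, is to read the attribute once per request and reuse the
local list. The sketch below shows the difference with a stand-in
class; FakeSite and its query_count are hypothetical and only imitate
an attribute that hits the db on every access. It is not SQLObject
code.

```python
class FakeSite:
    """Stand-in for a model object whose .hosts queries the db each time."""

    def __init__(self, host_names):
        self._host_names = list(host_names)
        self.query_count = 0

    @property
    def hosts(self):
        # Every attribute access counts as one simulated db round trip.
        self.query_count += 1
        return list(self._host_names)


def render_naive(site):
    """Access site.hosts twice: once for the menu, once for the records."""
    menu = [name for name in site.hosts]
    records = ["host record: " + name for name in site.hosts]
    return menu, records


def render_cached(site):
    """Access site.hosts once and reuse the local list for both parts."""
    hosts = site.hosts
    menu = [name for name in hosts]
    records = ["host record: " + name for name in hosts]
    return menu, records


names = ["mirror1.example.org", "mirror2.example.org"]

naive_site = FakeSite(names)
naive_output = render_naive(naive_site)

cached_site = FakeSite(names)
cached_output = render_cached(cached_site)

print(naive_site.query_count, cached_site.query_count)
```

Both functions produce the same page content; the second just does it
with one lookup instead of two. The local list is a snapshot, so it
would not pick up a change made to the hosts partway through the
request.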
For the stale data problem I'd have to know how to reproduce it. Is the
data stale when two people are editing the same information? Is it
stale on a page refresh? Etc.
-Toshio