A few questions and comments about a small subset of your ideas,  
since hillct has already put your patch to him on the bug muncher.

On 15-Jul-06, at 6:28 PM, Cory R. King wrote:

Regarding the comments table:

> * sid - no default.  Maybe make a story_comments and poll_comments  
> table so referential integrity can be enforced...

I'm not sure about this, for a few reasons. First of all, comments  
are comments; whether they're posted to stories or polls doesn't  
change their nature. Splitting them up would make several things  
harder, including comment searches. Second, Rusty went out of his way  
to allow "hidden sids" in scoop after the change from CGI parameter  
URLs to pseudo-paths, and while they aren't often used now that scoop  
has diaries, I don't really want to disable them. Yes, I know many  
people could consider hidden sids a bug, and it takes advantage of a  
lack of referential integrity. I just fail to see this as a problem.

> * pid - index.  FK to comments(cid).  Consider a nested tree  
> algorithm instead of parent-child: http://threebit.net/tutorials/ 
> nestedset/tutorial1.html
> * cid - make it auto_increment, drop the default value.

These two points seem to me to be two separate issues but not one per  
point.

* cid as globally unique index, pid with FK to cid
* nested tree algorithm for comment threading

Having the cid as a globally unique index doesn't strike me as being  
any more efficient, because comments are accessed in one of three ways:

1) grouped by story - you're using the sid to decide which comments  
to fetch
2) grouped by author - you're using the uid to decide which comments  
to fetch
3) search results - you're using the content of the comments to  
decide which comments to fetch

In none of those cases does having the cid as a unique index help in  
any way. Having cids unique within a given story is much more user- 
friendly, too; comment #25 is actually the 25th comment posted to  
that story, instead of the 25th comment in a story being #62649.

As for the nested tree algorithm, I took a look at it and it seemed  
interesting at first. When they drew the diagram showing the comment  
groupings, however, a red flag went up, and the short list of  
disadvantages confirmed what the drawing made me suspect. On every  
comment insert or delete, *every* comment in the table for that sid  
would need to have its position tracker updated so the groupings stay  
correct, with the associated time in updating the index to match.

A small improvement in search performance would come along with a  
huge deterioration in comment posting performance. Since search is  
generally used much less frequently than comment posting, this  
strikes me as a very bad strategy to take for Scoop.

I've been looking around to try and find better ways of storing the  
comment thread structure, but all of the stuff I can find are data  
structures suitable for handling such a tree in code, not for storing  
them in the database. I'm aware that the current strategy doesn't  
allow nifty things such as searching threads and comment  
relationships, but I've been unable to find anything better to date  
that allows that and doesn't degrade the performance and feature set  
we're accustomed to.

-janra