A few questions and comments about a small subset of your ideas, since hillct has already put your patch to him on the bug muncher. On 15-Jul-06, at 6:28 PM, Cory R. King wrote: Regarding the comments table: > * sid - no default. Maybe make a story_comments and poll_comments > table so referential integrity can be enforced... I'm not sure about this, for a few reasons. First of all, comments are comments; whether they're posted to stories or polls doesn't change their nature. Splitting them up would make several things harder, including comment searches. Second, Rusty went out of his way to allow "hidden sids" in scoop after the change from CGI parameter URLs to pseudo-paths, and while they aren't often used now that scoop has diaries, I don't really want to disable them. Yes, I know many people could consider hidden sids a bug, and it takes advantage of a lack of referential integrity. I just fail to see this as a problem. > * pid - index. FK to comments(cid). Consider a nested tree > algorithm instead of parent-child: http://threebit.net/tutorials/ > nestedset/tutorial1.html > * cid - make it auto_increment, drop the default value. These two points seem to me to be two separate issues but not one per point. * cid as globally unique index, pid with FK to cid * nested tree algorithm for comment threading Having the cid as a globally unique index doesn't strike me as being any more efficient, because comments are accessed in one of three ways: 1) grouped by story - you're using the sid to decide which comments to fetch 2) grouped by author - you're using the uid to decide which comments to fetch 3) search results - you're using the content of the comments to decide which comments to fetch In none of those cases does having the cid as a unique index help in any way. Having cids unique within a given story is much more user- friendly, too; comment #25 is actually the 25th comment posted to that story, instead of the 25th comment in a story being #62649. As for the nested tree algorithm, I took a look at it and it seemed interesting at first. When they drew the diagram showing the comment groupings, however, a red flag went up, and the short list of disadvantages confirmed what the drawing made me suspect. On every comment insert or delete, *every* comment in the table for that sid would need to have its position tracker updated so the groupings stay correct, with the associated time in updating the index to match. A small improvement in search performance would come along with a huge deterioration in comment posting performance. Since search is generally used much less frequently than comment posting, this strikes me as a very bad strategy to take for Scoop. I've been looking around to try and find better ways of storing the comment thread structure, but all of the stuff I can find are data structures suitable for handling such a tree in code, not for storing them in the database. I'm aware that the current strategy doesn't allow nifty things such as searching threads and comment relationships, but I've been unable to find anything better to date that allows that and doesn't degrade the performance and feature set we're accustomed to. -janra