Cory, You were absolutely correct that this discussion should be taking place on the dev list. Most of the index modifications you mention are obvious when applied to databases that support foreign key relations but have little bearing when applied to mysql, for example where an index on topic ID is only valuable if enough queries were performed that would benefit from that index, to justify the additional database impact of maintaining the index. At least that was the reasoning when the schema was first designed back when resource allocation on server hardware was still an issue. These days you could add an index on any field without meaningful impact but you have to ask yourself, would there be meaningful benefit from the index? What percentage of queries over the course of a day on an average Scoop site would benefit from an index on 'tid' in the stories table? Certainly a few, but what percentage of the total? Which indexing alterations would result in the greatest benefit? Certainly making 'nickname' unique is a requirement. It's unfortunate that was overlooked when the user data structures were reorganized but it's important now, to also handle the erroneous data where in your case you said you had 50 conflicts (which could only have occurred if your code tree and database schema are or were at one time out of sync, or modified from the distribution). A mechanism must be established to correct the conflicts, the simplest being to update each but the first userid in a conflicting set and generate an automated email informing them of the change in userid. Regardless, a migration mechanism must be created. Other of the observations are a bit more questionable. Other than NULL, for example, what default sould be suitable for the field 'sid' in stories? There really isn't one, so the default of an empty string is no less appropriate than NULL and possibly more so since under certain circumstances when parsed as a string, NULL will no longer evaluate as FALSE. You asked about IP data types and rather than repeat the explanation I'll refer you here: http://arjen-lentz.livejournal.com/44290.html where you can see there are slightly more efficient ways to store IP addresses but using them relies on non-portable SQL queries. With regard to time fields (datetime and timestamp) what default other than beginning of the epoch or NOW() would you like to suggest? Perhaps YESTERDAY() or DATE_SUB(CURDATE(),INTERVAL 45 MINUTE) just to mess with people. Regardless, I look forward to your solution to the compatibility issues between databases and database versions. There have been a lot of advancements. When you're ready, the patch should be submitted as a series of ALTER statements so it can be applied to the schemas in currently deployed sites. Send me an email off-list and I'll be happy to review and test your patch for inclusion as soon as it's delivered. Best Regards, Colin Hill PS: Thanks for picking up the gauntlet on database schema cleanup. If on the other hand, you won't be following through and completing the work you've started , then good luck to you and have a nice weekend. scoop-help-request at lists.kuro5hin.org wrote: > Send Scoop-help mailing list submissions to > scoop-help at lists.kuro5hin.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.kuro5hin.org/mailman/listinfo/scoop-help > or, via email, send a message with subject or body 'help' to > scoop-help-request at lists.kuro5hin.org > > You can reach the person managing the list at > scoop-help-owner at lists.kuro5hin.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Scoop-help digest..." > > > Today's Topics: > > 1. Database changes? (Cory R. King) > 2. Re: Database changes? (William Scott Lockwood III) > 3. Re: Database changes? (Cory R. King) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sat, 15 Jul 2006 18:28:40 -0700 > From: "Cory R. King" <coryking at xlan.org> > Subject: [Scoop-help] Database changes? > To: scoop-help at lists.kuro5hin.org > Message-ID: <44B99648.6090909 at xlan.org> > Content-Type: text/plain; charset=UTF-8; format=flowed > > Ladies, Gentleman, > > Forgive me for posting dev stuff on this list, but I dev looks rather > dead. This is long, so if you don't care about the guts of scoop, don't > read :-) > > I'm looking over the latest and greatest scoop.sql and notice a severe > lack of indices, both unique and "regular". There are also some > downright disturbing "user friendly" default values added by MySQL. > > These are something I believe I can fix. I've included the current > schema definition for a couple major tables and added remarks below them: > > ----------------------------------------------------- > > CREATE TABLE `users` ( > `uid` int(11) NOT NULL auto_increment, > `nickname` varchar(50) default NULL, > `realemail` varchar(50) default NULL, > `origemail` varchar(50) default NULL, > `passwd` varchar(12) default NULL, > `trustlev` int(1) default NULL, > `perm_group` varchar(50) default NULL, > `mojo` varchar(5) default NULL, > `creation_ip` varchar(16) NOT NULL default '', > `creation_time` datetime NOT NULL default '0000-00-00 00:00:00', > `newpasswd` varchar(12) default NULL, > `pass_sent_at` datetime default NULL, > `is_new_account` tinyint(1) default NULL, > `creation_passwd` varchar(50) default NULL, > PRIMARY KEY (`uid`) > ) TYPE=MyISAM; > > Notes: > > * nickname - This *MUST* be unique index. Making this unique is VERY > important - when I tried adding this constraint on my existing data, I > found at least 50 duplicate nicknames in the database!! I don't care > that "the code enforces this", having unique nicknames is a critical > thing that MUST be forced by the database! > > * creation_time - for some reason or other, I put an index on this > column in my database. Don't recall why - probably for signup code. > > * creation_ip - should get one just to avoid a sequential scan when > creating accounts. Speaking of, does MySQL have a datatype for IP's? > > WARNING: Thinking of Postgres? THIS TABLE HAS DRAGONS! > Like the majority of databases, Postgres has CaSE SENsitiVE strings > that will break scoop's login system. The nickname "CorYKing" is > different then "coryking". Since forcing all the existing nicknames to > lower-case wasn't an option, login's were a HUGE problem for me. If > anybody is considering Postgres (and you should be), contact me for a > nice little case-insensitive, index-safe datatype I found. > > > ------------------------------------------------------------------------------------------------- > CREATE TABLE `stories` ( > `sid` varchar(20) NOT NULL default '', > `tid` varchar(20) NOT NULL default '', > `aid` int(11) NOT NULL default '0', > `title` varchar(100) default NULL, > `dept` varchar(100) default NULL, > `time` datetime NOT NULL default '0000-00-00 00:00:00', > `introtext` text, > `bodytext` text, > `writestatus` int(1) NOT NULL default '0', > `hits` int(1) NOT NULL default '0', > `section` varchar(30) NOT NULL default '', > `displaystatus` int(1) NOT NULL default '0', > `commentstatus` int(1) default NULL, > `totalvotes` int(11) NOT NULL default '0', > `score` int(11) NOT NULL default '0', > `rating` int(11) NOT NULL default '0', > `attached_poll` varchar(20) default NULL, > `sent_email` int(1) NOT NULL default '0', > `edit_category` tinyint(1) NOT NULL default '0', > PRIMARY KEY (`sid`), > KEY `section_idx` (`section`,`displaystatus`), > KEY `displaystatus_idx` (`displaystatus`), > FULLTEXT KEY `storysearch_idx` (`title`,`introtext`,`bodytext`) > ) TYPE=MyISAM; > > * aid - index, FK to users, no default. > * section - index, foreign key to sections. No default. > * tid - index, foreign key to topics. No Default, or make it nullable. > * sid - no default > * time - index?, no default > > ------------------------------------------------------------------------------------------------ > CREATE TABLE `comments` ( > `sid` varchar(30) NOT NULL default '', > `cid` int(15) NOT NULL default '0', > `pid` int(15) NOT NULL default '0', > `date` datetime default NULL, > `rank` int(1) default NULL, > `subject` varchar(50) NOT NULL default '', > `comment` text NOT NULL, > `pending` int(1) default '0', > `uid` int(1) NOT NULL default '-1', > `points` decimal(4,2) default NULL, > `lastmod` int(1) default '-1', > `sig_status` int(1) default '1', > `sig` varchar(160) default NULL, > `commentip` varchar(16) default NULL, > `pre_rating` decimal(4,2) default NULL, > PRIMARY KEY (`sid`,`cid`), > KEY `stuff` (`uid`,`pid`), > FULLTEXT KEY `commentsearch_idx` (`subject`,`comment`) > ) TYPE=MyISAM; > > * sid - no default. Maybe make a story_comments and poll_comments table > so referential integrity can be enforced... > * pid - index. FK to comments(cid). Consider a nested tree algorithm > instead of parent-child: > http://threebit.net/tutorials/nestedset/tutorial1.html > * cid - make it auto_increment, drop the default value. > * uid - index, FK to users (does the default make sense?) > > ---------------------------------------------------------------------------------------------- > > .... I could go on but you get my drift. In short: > > 1) Scoop's default schema file should include some reasonable indices on > each table. > > 2) The schema should not contain any mysql's "friendly" default values; > if a column is supposed to be NOT NULL and scoop isn't setting a value, > well then scoop should get an error and blow up! Last I checked, a > blank story id, nickname or section isn't valid... > > 3) I don't know much about MySQL's support, but appropriate foreign keys > should also be created. > > Postgres Notes: > 1) Beware of MySQL date handling. Once in Postgres, make it all > "timestamp with timezone". > > 2) Beware of MySQL strings. "CoryKing" is a different binary value then > "coryking" and should be treated as such in your database. MySQL treats > these as the same value. The rest of the world does. Beware.... > > Have a great weekend! > > -- > Cory R. King -- Scoophost.com - a service of Pinnacle Digital Scoop consulting and hosting services