Cory,
   You were absolutely correct that this discussion should be taking
place on the dev list. Most of the index modifications you mention are
obvious when applied to databases that support foreign key relations but
have little bearing when applied to mysql, for example where an index on
topic ID is only valuable if enough queries were performed that would
benefit from that index, to justify the additional database impact of
maintaining the index. At least that was the reasoning when the schema
was first designed back when resource allocation on server hardware was
still an issue. These days you could add an index on any field without
meaningful impact but you have to ask yourself, would there be
meaningful benefit from the index? What percentage of queries over the
course of a day on an average Scoop site would benefit from an index on
'tid' in the stories table? Certainly a few, but what percentage of the
total? Which indexing alterations would result in the greatest benefit?
   Certainly making 'nickname' unique is a requirement. It's unfortunate
that was overlooked when the user data structures were reorganized but
it's important now, to also handle the erroneous data where in your case
you said you had 50 conflicts (which could only have occurred if your
code tree and database schema are or were at one time out of sync, or
modified from the distribution). A mechanism must be established to
correct the conflicts, the simplest being to update each  but the first
userid in a conflicting set and generate an automated email informing
them of the change in userid. Regardless, a migration mechanism must be
created.
   Other of the observations are a bit more questionable. Other than
NULL, for example, what default sould be suitable for the field 'sid' in
stories? There really isn't one, so the default of an empty string is no
less appropriate than NULL and possibly more so since under certain
circumstances when parsed as a string, NULL will no longer evaluate as
FALSE.
   You asked about IP data types and rather than repeat the explanation
I'll refer you here: http://arjen-lentz.livejournal.com/44290.html where
you can see there are slightly more efficient ways to store IP addresses
but using them relies on non-portable SQL queries.
   With regard to time fields (datetime and timestamp) what default
other than beginning of the epoch or NOW() would you like to suggest?
Perhaps YESTERDAY() or DATE_SUB(CURDATE(),INTERVAL 45 MINUTE) just to
mess with people.
   Regardless, I look forward to your solution to the compatibility
issues between databases and database versions. There have been a lot of
advancements. When you're ready, the patch should be submitted as a
series of ALTER statements so it can be applied to the schemas in
currently deployed sites. Send me an email off-list and I'll be happy to
review and test your patch for inclusion as soon as it's delivered.

Best Regards,
Colin Hill

PS: Thanks for picking up the gauntlet on database schema cleanup. If on
the other hand, you won't be following through and completing the work
you've started , then good luck to you and have a nice weekend.

scoop-help-request at lists.kuro5hin.org wrote:
> Send Scoop-help mailing list submissions to
> 	scoop-help at lists.kuro5hin.org
> 
> To subscribe or unsubscribe via the World Wide Web, visit
> 	http://lists.kuro5hin.org/mailman/listinfo/scoop-help
> or, via email, send a message with subject or body 'help' to
> 	scoop-help-request at lists.kuro5hin.org
> 
> You can reach the person managing the list at
> 	scoop-help-owner at lists.kuro5hin.org
> 
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Scoop-help digest..."
> 
> 
> Today's Topics:
> 
>    1. Database changes? (Cory R. King)
>    2. Re: Database changes? (William Scott Lockwood III)
>    3. Re: Database changes? (Cory R. King)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Sat, 15 Jul 2006 18:28:40 -0700
> From: "Cory R. King" <coryking at xlan.org>
> Subject: [Scoop-help] Database changes?
> To: scoop-help at lists.kuro5hin.org
> Message-ID: <44B99648.6090909 at xlan.org>
> Content-Type: text/plain; charset=UTF-8; format=flowed
> 
> Ladies, Gentleman,
> 
> Forgive me for posting dev stuff on this list, but I dev looks rather 
> dead.  This is long, so if you don't care about the guts of scoop, don't 
> read :-)
> 
> I'm looking over the latest and greatest scoop.sql and notice a severe 
> lack of indices, both unique and "regular".  There are also some 
> downright disturbing "user friendly" default values added by MySQL.
> 
> These are something I believe I can fix.  I've included the current 
> schema definition for a couple major tables and added remarks below them:
> 
> -----------------------------------------------------
> 
> CREATE TABLE `users` (
>    `uid` int(11) NOT NULL auto_increment,
>    `nickname` varchar(50) default NULL,
>    `realemail` varchar(50) default NULL,
>    `origemail` varchar(50) default NULL,
>    `passwd` varchar(12) default NULL,
>    `trustlev` int(1) default NULL,
>    `perm_group` varchar(50) default NULL,
>    `mojo` varchar(5) default NULL,
>    `creation_ip` varchar(16) NOT NULL default '',
>    `creation_time` datetime NOT NULL default '0000-00-00 00:00:00',
>    `newpasswd` varchar(12) default NULL,
>    `pass_sent_at` datetime default NULL,
>    `is_new_account` tinyint(1) default NULL,
>    `creation_passwd` varchar(50) default NULL,
>    PRIMARY KEY  (`uid`)
> ) TYPE=MyISAM;
> 
> Notes:
> 
>   * nickname - This *MUST* be unique index.  Making this unique is VERY 
> important - when I tried adding this constraint on my existing data, I 
> found at least 50 duplicate nicknames in the database!!  I don't care 
> that "the code enforces this", having unique nicknames is a critical 
> thing that MUST be forced by the database!
> 
>   * creation_time - for some reason or other, I put an index on this 
> column in my database.  Don't recall why - probably for signup code.
> 
>   * creation_ip - should get one just to avoid a sequential scan when 
> creating accounts.  Speaking of, does MySQL have a datatype for IP's?
> 
> WARNING: Thinking of Postgres?  THIS TABLE HAS DRAGONS!
> Like the majority of databases, Postgres  has CaSE SENsitiVE strings 
> that will break scoop's login system.  The nickname "CorYKing" is 
> different then "coryking".  Since forcing all the existing nicknames to 
> lower-case wasn't an option, login's were a HUGE problem for me.  If 
> anybody is considering Postgres (and you should be), contact me for a 
> nice little case-insensitive, index-safe datatype I found.
> 
> 
> -------------------------------------------------------------------------------------------------
> CREATE TABLE `stories` (
>    `sid` varchar(20) NOT NULL default '',
>    `tid` varchar(20) NOT NULL default '',
>    `aid` int(11) NOT NULL default '0',
>    `title` varchar(100) default NULL,
>    `dept` varchar(100) default NULL,
>    `time` datetime NOT NULL default '0000-00-00 00:00:00',
>    `introtext` text,
>    `bodytext` text,
>    `writestatus` int(1) NOT NULL default '0',
>    `hits` int(1) NOT NULL default '0',
>    `section` varchar(30) NOT NULL default '',
>    `displaystatus` int(1) NOT NULL default '0',
>    `commentstatus` int(1) default NULL,
>    `totalvotes` int(11) NOT NULL default '0',
>    `score` int(11) NOT NULL default '0',
>    `rating` int(11) NOT NULL default '0',
>    `attached_poll` varchar(20) default NULL,
>    `sent_email` int(1) NOT NULL default '0',
>    `edit_category` tinyint(1) NOT NULL default '0',
>    PRIMARY KEY  (`sid`),
>    KEY `section_idx` (`section`,`displaystatus`),
>    KEY `displaystatus_idx` (`displaystatus`),
>    FULLTEXT KEY `storysearch_idx` (`title`,`introtext`,`bodytext`)
> ) TYPE=MyISAM;
> 
>   * aid - index, FK to users, no default.
>   * section - index, foreign key to sections.  No default.
>   * tid - index, foreign key to topics.  No Default, or make it nullable.
>   * sid - no default
>   * time - index?, no default
> 
> ------------------------------------------------------------------------------------------------
> CREATE TABLE `comments` (
>    `sid` varchar(30) NOT NULL default '',
>    `cid` int(15) NOT NULL default '0',
>    `pid` int(15) NOT NULL default '0',
>    `date` datetime default NULL,
>    `rank` int(1) default NULL,
>    `subject` varchar(50) NOT NULL default '',
>    `comment` text NOT NULL,
>    `pending` int(1) default '0',
>    `uid` int(1) NOT NULL default '-1',
>    `points` decimal(4,2) default NULL,
>    `lastmod` int(1) default '-1',
>    `sig_status` int(1) default '1',
>    `sig` varchar(160) default NULL,
>    `commentip` varchar(16) default NULL,
>    `pre_rating` decimal(4,2) default NULL,
>    PRIMARY KEY  (`sid`,`cid`),
>    KEY `stuff` (`uid`,`pid`),
>    FULLTEXT KEY `commentsearch_idx` (`subject`,`comment`)
> ) TYPE=MyISAM;
> 
> * sid - no default.  Maybe make a story_comments and poll_comments table 
> so referential integrity can be enforced...
> * pid - index.  FK to comments(cid).  Consider a nested tree algorithm 
> instead of parent-child: 
> http://threebit.net/tutorials/nestedset/tutorial1.html
> * cid - make it auto_increment, drop the default value.
> * uid - index, FK to users (does the default make sense?)
> 
> ----------------------------------------------------------------------------------------------
> 
> .... I could go on but you get my drift.  In short:
> 
> 1) Scoop's default schema file should include some reasonable indices on 
> each table.
> 
> 2) The schema should not contain any mysql's "friendly" default values; 
> if a column is supposed to be NOT NULL and scoop isn't setting a value, 
> well then scoop should get an error and blow up!  Last I checked, a 
> blank story id, nickname or section isn't valid...
> 
> 3) I don't know much about MySQL's support, but appropriate foreign keys 
> should also be created.
> 
> Postgres Notes:
> 1) Beware of MySQL date handling.  Once in Postgres, make it all 
> "timestamp with timezone".
> 
> 2) Beware of MySQL strings.  "CoryKing" is a different binary value then 
> "coryking" and should be treated as such in your database.  MySQL treats 
> these as the same value.  The rest of the world does.  Beware....
> 
> Have a great weekend!
> 
> --
> Cory R. King

-- 
Scoophost.com - a service of Pinnacle Digital
Scoop consulting and hosting services