Ladies, Gentleman,

Forgive me for posting dev stuff on this list, but I dev looks rather 
dead.  This is long, so if you don't care about the guts of scoop, don't 
read :-)

I'm looking over the latest and greatest scoop.sql and notice a severe 
lack of indices, both unique and "regular".  There are also some 
downright disturbing "user friendly" default values added by MySQL.

These are something I believe I can fix.  I've included the current 
schema definition for a couple major tables and added remarks below them:

-----------------------------------------------------

CREATE TABLE `users` (
   `uid` int(11) NOT NULL auto_increment,
   `nickname` varchar(50) default NULL,
   `realemail` varchar(50) default NULL,
   `origemail` varchar(50) default NULL,
   `passwd` varchar(12) default NULL,
   `trustlev` int(1) default NULL,
   `perm_group` varchar(50) default NULL,
   `mojo` varchar(5) default NULL,
   `creation_ip` varchar(16) NOT NULL default '',
   `creation_time` datetime NOT NULL default '0000-00-00 00:00:00',
   `newpasswd` varchar(12) default NULL,
   `pass_sent_at` datetime default NULL,
   `is_new_account` tinyint(1) default NULL,
   `creation_passwd` varchar(50) default NULL,
   PRIMARY KEY  (`uid`)
) TYPE=MyISAM;

Notes:

  * nickname - This *MUST* be unique index.  Making this unique is VERY 
important - when I tried adding this constraint on my existing data, I 
found at least 50 duplicate nicknames in the database!!  I don't care 
that "the code enforces this", having unique nicknames is a critical 
thing that MUST be forced by the database!

  * creation_time - for some reason or other, I put an index on this 
column in my database.  Don't recall why - probably for signup code.

  * creation_ip - should get one just to avoid a sequential scan when 
creating accounts.  Speaking of, does MySQL have a datatype for IP's?

WARNING: Thinking of Postgres?  THIS TABLE HAS DRAGONS!
Like the majority of databases, Postgres  has CaSE SENsitiVE strings 
that will break scoop's login system.  The nickname "CorYKing" is 
different then "coryking".  Since forcing all the existing nicknames to 
lower-case wasn't an option, login's were a HUGE problem for me.  If 
anybody is considering Postgres (and you should be), contact me for a 
nice little case-insensitive, index-safe datatype I found.


-------------------------------------------------------------------------------------------------
CREATE TABLE `stories` (
   `sid` varchar(20) NOT NULL default '',
   `tid` varchar(20) NOT NULL default '',
   `aid` int(11) NOT NULL default '0',
   `title` varchar(100) default NULL,
   `dept` varchar(100) default NULL,
   `time` datetime NOT NULL default '0000-00-00 00:00:00',
   `introtext` text,
   `bodytext` text,
   `writestatus` int(1) NOT NULL default '0',
   `hits` int(1) NOT NULL default '0',
   `section` varchar(30) NOT NULL default '',
   `displaystatus` int(1) NOT NULL default '0',
   `commentstatus` int(1) default NULL,
   `totalvotes` int(11) NOT NULL default '0',
   `score` int(11) NOT NULL default '0',
   `rating` int(11) NOT NULL default '0',
   `attached_poll` varchar(20) default NULL,
   `sent_email` int(1) NOT NULL default '0',
   `edit_category` tinyint(1) NOT NULL default '0',
   PRIMARY KEY  (`sid`),
   KEY `section_idx` (`section`,`displaystatus`),
   KEY `displaystatus_idx` (`displaystatus`),
   FULLTEXT KEY `storysearch_idx` (`title`,`introtext`,`bodytext`)
) TYPE=MyISAM;

  * aid - index, FK to users, no default.
  * section - index, foreign key to sections.  No default.
  * tid - index, foreign key to topics.  No Default, or make it nullable.
  * sid - no default
  * time - index?, no default

------------------------------------------------------------------------------------------------
CREATE TABLE `comments` (
   `sid` varchar(30) NOT NULL default '',
   `cid` int(15) NOT NULL default '0',
   `pid` int(15) NOT NULL default '0',
   `date` datetime default NULL,
   `rank` int(1) default NULL,
   `subject` varchar(50) NOT NULL default '',
   `comment` text NOT NULL,
   `pending` int(1) default '0',
   `uid` int(1) NOT NULL default '-1',
   `points` decimal(4,2) default NULL,
   `lastmod` int(1) default '-1',
   `sig_status` int(1) default '1',
   `sig` varchar(160) default NULL,
   `commentip` varchar(16) default NULL,
   `pre_rating` decimal(4,2) default NULL,
   PRIMARY KEY  (`sid`,`cid`),
   KEY `stuff` (`uid`,`pid`),
   FULLTEXT KEY `commentsearch_idx` (`subject`,`comment`)
) TYPE=MyISAM;

* sid - no default.  Maybe make a story_comments and poll_comments table 
so referential integrity can be enforced...
* pid - index.  FK to comments(cid).  Consider a nested tree algorithm 
instead of parent-child: 
http://threebit.net/tutorials/nestedset/tutorial1.html
* cid - make it auto_increment, drop the default value.
* uid - index, FK to users (does the default make sense?)

----------------------------------------------------------------------------------------------

.... I could go on but you get my drift.  In short:

1) Scoop's default schema file should include some reasonable indices on 
each table.

2) The schema should not contain any mysql's "friendly" default values; 
if a column is supposed to be NOT NULL and scoop isn't setting a value, 
well then scoop should get an error and blow up!  Last I checked, a 
blank story id, nickname or section isn't valid...

3) I don't know much about MySQL's support, but appropriate foreign keys 
should also be created.

Postgres Notes:
1) Beware of MySQL date handling.  Once in Postgres, make it all 
"timestamp with timezone".

2) Beware of MySQL strings.  "CoryKing" is a different binary value then 
"coryking" and should be treated as such in your database.  MySQL treats 
these as the same value.  The rest of the world does.  Beware....

Have a great weekend!

--
Cory R. King