Post #100,706
5/7/03 9:50:41 AM
|
Is this schema screwed up, or is it just me?
We track what vendors offer what services in what counties. The key values are: \r\n vendornum varchar(14)\r\n countycode varchar(4)\r\n template bigint(20) (This defines which service is offered.) \r\n\r\nWe are also interested in the state. The last two characters of the countycode are the state, so it can be derived from that or listed as its own field. Here's the current schema: \r\n\r\n \r\n \r\n coverage_service | \r\n \r\n \r\n idnum | bigint(20) | auto_increment | \r\n \r\n \r\n county_idnum | bigint(20) | | \r\n \r\n \r\n state_idnum | bigint(20) | | \r\n \r\n \r\n template | bigint(20) | | \r\n \r\n \r\n \r\n\r\n \r\n \r\n coverage_state | \r\n \r\n \r\n idnum | bigint(20) | auto_increment | \r\n \r\n \r\n vendornum | varchar(14) | | \r\n \r\n \r\n state | char(2) | | \r\n \r\n \r\n \r\n\r\n \r\n \r\n coverage_county | \r\n \r\n \r\n idnum | bigint(20) | auto_increment | \r\n \r\n \r\n state_idnum | bigint(20) | | \r\n \r\n \r\n county_code | char(4) | | \r\n \r\n \r\n \r\n\r\nSo in order to determine what vendors offer service in what counties, I have to join three tables: \r\n\r\n SELECT \r\n coverage_state.vendornum , \r\n coverage_county.countycode , \r\n coverage_service.template\r\nFROM \r\n coverage_service ,\r\n coverage_state ,\r\n coverage_county\r\nWHERE\r\n coverage_service.county_idnum = coverage_county.idnum\r\n AND coverage_service.state_idnum = coverage_state.idnum\r\n AND coverage_county.countycode = '$countycode' \r\n\r\nIf you're wondering why state_idnum is in coverage_county, I can't tell you. But whenever we add/delete/update coverage we have to keep everything in sync manually, because MySQL (as of the version this was written for) doesn't do triggers or foreign key constraints. \r\n\r\nI'm planning to change the schema to this: \r\n\r\n \r\n \r\n coverage_service | \r\n \r\n \r\n idnum | bigint(20) | auto_increment | \r\n \r\n \r\n vendornum | varchar(14) | | \r\n \r\n \r\n countycode | char(4) | | \r\n \r\n \r\n state | char(2) | | \r\n \r\n \r\n template | bigint(20) | | \r\n \r\n \r\n \r\n\r\nThen my query becomes: \r\n\r\n SELECT \r\n vendornum , \r\n countycode , \r\n template\r\nFROM\r\n coverage_service\r\nWHERE\r\n countycode = '$countycode' \r\n\r\nI might still keep the coverage_state table, as it provides a quick lookup of what vendors cover what states, but I could also do: \r\n\r\n SELECT DISTINCT\r\n vendornum\r\nFROM\r\n coverage_service\r\nWHERE\r\n state = '$state' \r\n\r\nI believe integer indexes are slightly faster to look up, but since this new schema would eliminate two joins I suspect it will be faster. Other than possible speed of the lookups, is there any reason to prefer the current schema to what I'm proposing?
===\r\n\r\nImplicitly condoning stupidity since 2001.
|
Post #100,715
5/7/03 10:31:20 AM
|
Your new schema looks good.
That's how I would have designed it in the first place.
[link|mailto:jbrabeck@attbi.com|Joe]
|
Post #100,719
5/7/03 10:46:26 AM
|
I guess what I'm looking for is ...
Is there any possible reason to have it structured the way it is now? I don't usually like to change things until I understand why they're the way they are to begin with. Problem is the guy who built this is long gone, and it's entirely possible he was just an idiot.
===
Implicitly condoning stupidity since 2001.
|
Post #100,721
5/7/03 10:57:34 AM
|
Guessing
Over normalized. When normalizing, keep going until all redundant data is removed. Then look at results to see if usage would be enhanced/performance increased by some "denormalization".
If you always need the county AND state then they both should be in the data table instead of requiring a lookup to find the state.
Looks like the developer didn't perform final analysis.
[link|mailto:jbrabeck@attbi.com|Joe]
|
Post #100,722
5/7/03 10:58:46 AM
|
Dependent Foreign Keys
If you're wondering why state_idnum is in coverage_county, I can't tell you. But whenever we add/delete/update coverage we have to keep everything in sync manually, because MySQL (as of the version this was written for) doesn't do triggers or foreign key constraints Well, the drill down would be you have 50 states and then within those 50 states you have n-counties. Not all county names are unique, so you may get a Smith county in both Texas and New Jersey. Anyhow, you'd segregate the two if you want to enforce a rule that says any selected county must be available for the selected state. Also helps when you want to restrict user input of counties based on the state input.
|
Post #100,724
5/7/03 11:05:16 AM
|
Can't happen, each county is unique
We're not using county name, we're using countycode. eg: Rockingham County New Hapmshire is '01NH'. So countycode is a sufficient identifier even without the state field. The only reason to still have the state field is that the state is the last two charachters of the countycode instead of the first. So joins on that would not be able to use the index.
We plan on changing to a new 5-digit countycode where the state will be the first two. Then we will be able to drop the state field entirely.
===
Implicitly condoning stupidity since 2001.
|
Post #100,726
5/7/03 11:15:42 AM
|
Still a function of integrity
How do you ensure that the countycd is consistent with the statecd?
Of course, if the statecd is truly embedded within the countycd, then the best course of action is to eliminate the statecd column, and just use a lookup table that maps county to state.
|
Post #100,730
5/7/03 11:18:53 AM
|
Don't even need a lookup table
The 2-letter state abbreviation is always the last two characters of the countycode. Like I said, the only reason to maintain state as a seperate field is that a join on RIGHT( countycode, 2 ) can't use the index.
===
Implicitly condoning stupidity since 2001.
|
Post #100,735
5/7/03 11:46:48 AM
|
Maybe it's just me
But I've always found JOINs (or any other structural operations) which are dependent upon string parsing to be rather inelegant (read: expensive, not portable, etc.). If you want to mash related values into a single value in order to perpetuate the user's perception that computers are cryptic, or to reduce typing, be my guest. But I would either place the state values in the county table (assuming there are no other referential constraints among state and county) and do a lookup, or create an intermediate view of county+state for what is rapidly turning out to be a "management-level" (read: data warehousing) query.
Many fears are born of stupidity and ignorance - Which you should be feeding with rumour and generalisation. BOfH, 2002 "Episode" 10
|
Post #100,736
5/7/03 11:51:52 AM
|
Agreed.
A separate state field should be kept for the queries. Data encoded in a key is a really, really bad idea.
Even if the state code is in the county code, it should be treated as a separate piece of data that just happens to be in the county code.
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #100,740
5/7/03 12:35:35 PM
|
That's how I've already set it up
Dropping the state field entirely was just a possibility. I can't remember if it was MySQL or Postgres, but one of them allows you to define an index on a portion of a string. Then joins against that same portion -- ie: LEFT( countycode, 2 ) -- would be as fast as against a separate field.
===
Implicitly condoning stupidity since 2001.
|
Post #100,742
5/7/03 12:41:23 PM
|
It's not a matter of speed.
Embedded data is a bad idea.
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #100,744
5/7/03 12:43:59 PM
|
Generally I agree
But in this case we're going to start using the oficial county codes from the USPS, and the data is already there. I wouldn't advocate creating it that way, but since we're going to be getting the data that way it seems reasonable.
===
Implicitly condoning stupidity since 2001.
|
Post #100,757
5/7/03 2:35:22 PM
|
Even so.
You put the official county code in one column, and extract the state for the other column.
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #103,049
5/22/03 2:00:35 AM
|
I agree with Malraux on this one
The county code could change in the future and no longer have state embedded in it. Best to keep an explicit State column around rather than parse the county ID to get State. "Smart ID's" are dangerous things.
________________ oop.ismad.com
|
Post #101,209
5/10/03 11:56:00 AM
|
Thoughts...
You're going from a 3rd normal form (possibly higher) to a single table. Generally speaking it's a bad idea in the database world (but - there are exceptions).
However, given your data, something is indeed strange. Why the hell are there 3 autoincrementing identifiers? The identifier for state (primary key) should be the 2 letter code (and nothing else). County should be a FK (state cd) with no primary (imo).
But that's just my opinion. YMMV
|
Post #101,398
5/12/03 7:35:55 AM
|
You've about got it
All we need in the primary table is the state and county foreign keys.[1] It looks like someone had just taken a class in normalization and went a little bit crazy.
[1] Actually just the county one, because the state is encapsulated within it, but I don't want to get Scott started again. ;)
===
Implicitly condoning stupidity since 2001.
|