IWETHEY v. 0.3.0 | TODO
1,095 registered users | 0 active users | 0 LpH | Statistics
Login | Create New User
IWETHEY Banner

Welcome to IWETHEY!

New Can you post
A few records of your raw data and what the result should look like from that data. I must be getting old 'cuz I don't quite follow what you are trying to accomplish. Or it could be I just like working with "visual" information. I keep thinking that the solution is relatively straight forward so I must be missing something.
New 'fraid not
With the Internet policy at work, I can't get any information out without surreptitiously installing ssh software on one of the internet terminals, which is likely to get me asked to leave and not come back. When I need to get help from lists or newsgroups, I have to write down error messages etc on a piece of paper and sneak it out the door.

Here's some bogus data that might help explain what I'm dealing with:
\na  b  date   c   d   e\n----------------------\na  1  1999   1   2   3\na  2  2000   1   2   3\na  1  1999   2   2   3\na  1  1999   2   3   3


The new table will have primary key (a,b), and is supposed to contain all the data in the most recent record from the original table for each a,b item.
\nselect B.* from \n  (select a,b, max(date) as xDate from tbl group by a,b) as A,\n  tbl as B where B.date = a.xDate


max(date) works for a,2 because the data is nice and there is only one last record, but for a,1 it will return 3 records. To pare that down, I decide that c is the next most important field and some values are more important than others, and for the sake of simplicity max(c) will return the most important value.
\nselect B.* from\n  (select * from\n    (select a,b, max(date) as xDate from tbl group by a,b) as AA,\n    (select a,b, date, max(c) as xC from tbl group by a,b,date) as BB\n    where BB.date = AA.xDate\n  ) as A, tbl as B where B.c = A.xC


That still leave duplicates, so I pick the next most important field. As the query takes three hours to run, I start to think that I'm going about this the wrong way, but I don't see another way of doing it.
New pump to file, cat file |sort -k3 >outputfile

"You're just like me streak. You never left the free-fire zone.You think aspirins and meetings and cold showers are going to clean out your head. What you want is God's permission to paint the trees with the bad guys. That wont happen big mon." Clete
questions, help? [link|mailto:pappas@catholic.org|email pappas at catholic.org]
New If you're allowed correlated subquery, then:
\nSELECT a.*\n  FROM table_name aa,\n       (SELECT a, b, max(date)\n          FROM table_name\n         GROUP BY a, b) bb\n WHERE aa.a = bb.a\n   AND aa.b = bb.b\n   AND aa.date = bb.date\n

You shouldn't need more than one subquery to do this.

Edit: Sorry. I didn't see what you said about getting multiples of "a, b, date" back from the GROUP BY in the subquery. If that's the case, and you need to go to a variable number of tie-breakers, then you've effectively blown the limits of relational calculus.

IMO, your only choice at this point is to drop to a procedural language, stream the records by, and pick-out the ones you want to keep. Straight SQL simply will not work if you have *any* duplicate records. Even if each row is unique, nested subqueries will make your SELECT move like pondwater.


Second Edit:

Actually, combining with Ben's solution, you can do something like:

\nselect aa.*\n  from table aa,\n       (select a, b, \n               max(to_char(date)||c||d||e) sortkey\n          from table) bb\n where aa.a = bb.a\n   and aa.b = bb.b\n   and to_char(aa.date)||aa.c||aa.d||aa.e = bb.sortkey\n


Like I said above, though, if you have a duplicate row, then you're screwed. Unless, of course, you're using Oracle and can append the 'rowid' psuedocolumn to the sortkey field.
Expand Edited by morganek April 16, 2004, 11:33:52 PM EDT
Expand Edited by morganek April 16, 2004, 11:43:11 PM EDT
New Re: If you're allowed correlated subquery, then:
combining with Ben's solution, you can do something like:
\nselect aa.*\n  from table aa,\n       (select a, b, \n               max(to_char(date)||c||d||e) sortkey\n          from table) bb\n where aa.a = bb.a\n   and aa.b = bb.b\n   and to_char(aa.date)||aa.c||aa.d||aa.e = bb.sortkey\n


Thanks, I think that'll do it.

I didn't understand what Ben meant when I first read his post, but it looks clear in retrospect. I'm in the habit of thinking of data as discrete objects and didn't consider stringing their text together.
New Re: 'fraid not
I am going to assume that in the real table there is some data that would preclude you simply doing this.

select a,b, max(date), max(c), max(d) from tbl group by a,b

I can see two possible solutions other then what Morgan and Ben suggested.

The ugliest is to use a loop and scan through the records one by one. Create an index that contains all the fields you want to include in the selection critera and then use that to pick through the records one by one, inserting the records you want into a second table.

I do not think you need to go that far here, but I have ended up using them when the critera for selecting records would not condense into a single query or when the critera depended on looking at more then one record at a time.

The second goes like this. Run a query to group by (a,b) and stick them in the second table, with data filled in with nulls. Then do a series of queries like this to fill in the data.
\nUpdate tbl2 set (all the data fields) from\n  (select B.* from \n    (select a,b, max(date) as xDate, count(1) as RowCount, c,d\n      from tbl group by a,b,c,d) as A,\n  tbl as B where B.date = a.xDate and RowCount = 1) as GD,\ntbl2 where GD.a = tbl2.a and GD.b = tbl2.b and tbl2.date is null\n

The idea in this case is that query picks out all the rows where there is only one date and moves them over. Then you can write another query that looks at the next field and only moves them over and so on.

I've used this a couple of times when 90% or more of the records where clean and there where only a handful of records that needed scrubbing, often by hand.

Given what you have said, in SLQ Sever I would probably end up using the same solution you came up with, but would put a primary key on the table and create an index that covers the fields that are part of the critera.

Jay
New Is "SQL Sever" intentional? (Regardless, I like it. :-)
To deny the indirect purchaser, who in this case is the ultimate purchaser, the right to seek relief from unlawful conduct, would essentially remove the word consumer from the Consumer Protection Act
- [link|http://www.techworld.com/opsys/news/index.cfm?NewsID=1246&Page=1&pagePos=20|Nebraska Supreme Court]
     Slow query needs optimizing - (tangaroa) - (19)
         Re: Slow query needs optimizing - (pwhysall) - (3)
             Just imagine a Beowolf cluster of those. -NT - (ChrisR) - (2)
                 Great, now we're turning into /. - (inthane-chan)
                 Running DOSiX! -NT - (Another Scott)
         Can you post - (jbrabeck) - (6)
             'fraid not - (tangaroa) - (5)
                 pump to file, cat file |sort -k3 >outputfile - (boxley)
                 If you're allowed correlated subquery, then: - (morganek) - (1)
                     Re: If you're allowed correlated subquery, then: - (tangaroa)
                 Re: 'fraid not - (JayMehaffey) - (1)
                     Is "SQL Sever" intentional? (Regardless, I like it. :-) -NT - (ben_tilly)
         I don't recognize the "within" clause - (ben_tilly)
         Not quite sure what you are doing - (JayMehaffey) - (6)
             Whew, glad it's not just me that didn't understand. -NT - (jbrabeck)
             Pet peeve on SQLServer - (ChrisR) - (4)
                 Rarely use real temporary tables - (JayMehaffey) - (3)
                     I just finished writing about 15 stored procedures. - (mmoffitt) - (2)
                         Table variables are great - (ChrisR) - (1)
                             Thanks. - (mmoffitt)

Such firms gave a bad name to the penis enhancement business.
103 ms