Post #241,808
1/18/06 8:29:18 PM

Here ya go
#1 - Isolate out address elements into address1, address2, city, state, zip.
#2 - Run through a CASS certified address standardization program that does zip+4 appends. Do not bother trying to write this yourself. You MUST buy it or rent someone else's. They have the Postal database with all the street synonyms and aliases, route number->friendly name, list of legal address ranges per street, etc. There is NO WAY you can code a program to do this without the postal database.
I've used a variety of them depending on accuracy desired vs cost and speed. If you like, give me your list and I'll run it through.
#3 - Merge/Purge. You need to determine what to keep vs. drop, right? Start thinking about the different ways people write their first name (Rob, Robert, Bob), etc. Then add in typos. Account for titles, suffixes, etc. Then start adding in confidence factors: do you want to err on the side of sending more mail or less? Oh, don't forget married vs. maiden names, or the multi-family / mother-in-law household living together. Who exactly IS Mrs John Smith? Do you drop her if you have another female Smith in the household? Oh, and do you want individual, family (single last name), or household (multiple last names) matching? What about company?
The better MP (merge/purge) programs are amazing. Poorer matchcode-based ones will treat a lot of obvious (to the human eye) dupes as unique people. My home-brewed one is matchcode-based, which means more false uniques, but I can't offer my corp one to you.
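To make "matchcode-based" concrete, here is a minimal sketch, assuming the addresses have already been CASS-standardized (so identical addresses are identical strings). The field names (`name`, `address1`, `zip`), the tiny nickname/title/suffix tables, and the "first 3 letters of the first name" rule are all hypothetical choices for illustration, not any vendor's product. It shows the individual / family / household levels and why a typo like "Rboert" comes out as a false unique.

```python
from collections import defaultdict

# Hypothetical, deliberately tiny lookup tables; commercial merge/purge
# products ship far larger ones (plus typo handling, gender, etc.).
NICKNAMES = {"BOB": "ROBERT", "ROB": "ROBERT", "BOBBY": "ROBERT",
             "BILL": "WILLIAM", "WILL": "WILLIAM",
             "LIZ": "ELIZABETH", "BETH": "ELIZABETH"}
TITLES = {"MR", "MRS", "MS", "DR"}
SUFFIXES = {"JR", "SR", "II", "III"}


def name_tokens(full_name):
    """Upper-case, turn punctuation into spaces, drop titles and suffixes."""
    cleaned = "".join(c if c.isalnum() else " " for c in full_name.upper())
    return [t for t in cleaned.split() if t not in TITLES | SUFFIXES]


def matchcode(rec, level="individual"):
    """Build a match key from an already-standardized address.

    level: "household"  = address only,
           "family"     = address + last name,
           "individual" = address + last name + first 3 letters of first name.
    """
    key = (rec["address1"], rec["zip"][:5])  # ignore the +4 for matching
    if level == "household":
        return key
    tokens = name_tokens(rec["name"])
    first, last = tokens[0], tokens[-1]
    if level == "family":
        return key + (last,)
    first = NICKNAMES.get(first, first)  # Bob/Rob -> Robert
    return key + (last, first[:3])


def merge_purge(records, level="individual"):
    """Group records sharing a matchcode; each group would get one mail piece."""
    groups = defaultdict(list)
    for rec in records:
        groups[matchcode(rec, level)].append(rec)
    return list(groups.values())


# Hypothetical sample data.
SAMPLE = [
    {"name": "Mr. Bob Smith", "address1": "123 MAIN ST", "zip": "12345-6789"},
    {"name": "Robert Smith Jr", "address1": "123 MAIN ST", "zip": "12345"},
    {"name": "Mrs John Smith", "address1": "123 MAIN ST", "zip": "12345"},
    {"name": "Rboert Smith", "address1": "123 MAIN ST", "zip": "12345"},  # typo
]

if __name__ == "__main__":
    for level in ("individual", "family", "household"):
        print(level, len(merge_purge(SAMPLE, level)))
```

At the individual level, "Mr. Bob Smith" and "Robert Smith Jr" collapse into one record, "Mrs John Smith" is keyed as a John (exactly the ambiguity raised above), and the typo'd "Rboert" survives as a separate person. At the family and household levels all four collapse to one piece.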
Dunno about Ben's geocoding answer. Rooftop geocoding is usually way too expensive for casual use, and centroid geocoding is way too coarse for deduping purposes.
Post #241,815
1/18/06 9:02:40 PM

I think #2 does what I did
I only used geocoding because it was free for me and I had code for it. I was using it to turn an address into a location, and using that to match addresses. But if you can standardize addresses another way, go for it.
For my project I was able to ignore the name issue - I was just merging several lists into one, and needed to spit out "these aren't duplicated" and "these may be, look them over by eye". So it was only semi-automated.
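The semi-automated split Ben describes ("not duplicated" vs. "look them over by eye") can be sketched with plain string similarity. Note this is a swapped-in technique, not Ben's geocode-based matching: it uses Python's `difflib.SequenceMatcher` ratio on a hypothetical one-string-per-record layout, and the 0.85 review threshold is an arbitrary assumption to tune on real data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """difflib ratio in [0, 1]; 1.0 means the strings are identical (ignoring case)."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def triage(records, review_threshold=0.85):
    """Split records into ones with no close match and pairs to eyeball.

    Compares every pair (O(n^2)), which is fine for a few thousand rows.
    The threshold is an assumption, not a recommended value.
    """
    review = []
    flagged = set()
    for i, j in combinations(range(len(records)), 2):
        score = similarity(records[i], records[j])
        if score >= review_threshold:
            review.append((records[i], records[j], round(score, 2)))
            flagged.update((i, j))
    clean = [r for k, r in enumerate(records) if k not in flagged]
    return clean, review


# Hypothetical merged list.
SAMPLE = ["SMITH 123 MAIN ST 12345",
          "Smith 123 Main St 12345",
          "JONES 9 ELM AVE 54321"]

if __name__ == "__main__":
    clean, review = triage(SAMPLE)
    print("not duplicated:", clean)
    for a, b, score in review:
        print("review:", a, "|", b, score)
```

Anything above the threshold goes to a human rather than being dropped automatically, which keeps the process only semi-automated, as described.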
Cheers, Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
Post #241,819
1/18/06 9:10:51 PM

Your geocoding process standardized first
When we geocode stuff, 1st step is to standardize. They just did it for you under the covers.
Post #241,821
1/18/06 9:26:04 PM
1/18/06 9:28:16 PM

Come to think of it, I really did need the geocoding.
There is a problem with apartment buildings where one lot has 2 buildings with different addresses, but they're really the same place. So I counted any two addresses that were close as possible dupes to review. (I used "within adjacent 100' by 100' boxes", so that range was 100-300 feet.) I couldn't have done that without geocoding, but it was only a nice-to-have anyway. And it obviously doesn't apply in this case.
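A minimal sketch of the "adjacent 100' boxes" check, assuming each geocoded address has already been projected into local planar coordinates in feet (the projection step and the `(x_ft, y_ft)` layout are assumptions, not from the post). Each point is bucketed into a 100 ft grid cell, and any two points in the same or a neighbouring cell (diagonals included) are flagged. Two points within 100 ft of each other always land in the same or adjacent cells; the farthest flagged pair is about 283 ft apart (the diagonal of a 200 ft square), which matches the "100-300 feet" range.

```python
import math
from collections import defaultdict

CELL_FT = 100.0  # box size from the post


def cell_of(x_ft, y_ft):
    """Grid cell index for a point in planar feet coordinates."""
    return (math.floor(x_ft / CELL_FT), math.floor(y_ft / CELL_FT))


def nearby_pairs(points):
    """points: dict of id -> (x_ft, y_ft).

    Returns sorted id pairs whose cells are identical or adjacent,
    i.e. candidates for "possible dupe, review by eye".
    """
    grid = defaultdict(list)
    for pid, (x, y) in points.items():
        grid[cell_of(x, y)].append(pid)
    pairs = set()
    for (cx, cy), ids in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get() avoids inserting empty cells while iterating
                for b in grid.get((cx + dx, cy + dy), []):
                    for a in ids:
                        if a < b:
                            pairs.add((a, b))
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical projected coordinates for four addresses.
    pts = {"a": (10, 10), "b": (150, 20), "c": (450, 450), "d": (290, 290)}
    print(nearby_pairs(pts))
```

The grid avoids comparing every address against every other one: each point is only checked against the nine cells around it.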
Cheers, Ben

Edited by ben_tilly
Jan. 18, 2006, 09:28:16 PM EST
Post #241,833
1/18/06 11:29:46 PM

Re: Here ya go
#1: I have those as discrete fields already.
#2: Actually, we ran CASS and NCOA on the entire database Tuesday (Donnelly Marketing was the vendor), and loaded the changes today.
#3: Ahhh, the kicker. Is there a specific merge-purge program you've used or recommend? More on what we do later; I need to spend some time with my son.
-- Steve [link||Ubuntu]
Post #241,834
1/18/06 11:36:11 PM

ObLRPD: "Vote him off the island!"
The Dot is feeling cranky tonight.
Enjoy being with your boy.
Cheers, Scott.
Post #241,836
1/18/06 11:49:43 PM

That'd be harder than giving a bath to a bobcat.
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
Post #241,842
1/19/06 1:43:28 AM

the trick is...
the anaesthetic dart beforehand. hth :-)
Have fun, Carl Forde
Post #241,853
1/19/06 7:45:39 AM

Talk your talk, wee man.
-scott anderson
Post #241,837
1/18/06 11:50:33 PM

Firstlogic match/consolidate is verra nice
Group1 MP is (shudder) OK, but the interface is horrendous.
But there is a large amount of tuning and expertise that goes into setting these up. You might spend more on consulting and setup than on the software itself.
How much is the cost to mail each duplicate each month? What is your expected payback time?
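The payback question is simple arithmetic once you have the numbers: months to payback = (software + consulting/setup cost) / (duplicates removed x cost per piece x mailings per month). A minimal sketch follows; the function name and the sample figures are hypothetical, and it counts only direct postage/print savings, not soft costs like annoyed recipients.

```python
import math


def payback_months(setup_cost, duplicates, cost_per_piece, mailings_per_month=1):
    """Months until dedupe setup/consulting cost is recovered by pieces not mailed.

    Counts only direct postage/print savings; soft costs are ignored.
    """
    monthly_savings = duplicates * cost_per_piece * mailings_per_month
    if monthly_savings <= 0:
        return math.inf  # never pays back
    return setup_cost / monthly_savings


if __name__ == "__main__":
    # Hypothetical figures, not from the thread.
    print(payback_months(setup_cost=10_000, duplicates=5_000, cost_per_piece=0.40))
```

With those made-up figures ($10,000 setup, 5,000 dupes at $0.40 each, one mailing a month) payback is about 5 months; double the mailing frequency and it halves.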
Post #241,844
1/19/06 2:09:27 AM

Thanks, having a look.
-- Steve
Post #241,850
1/19/06 6:36:40 AM

Re "How much is the cost to mail each duplicate each month?"
One cost that people might not think of, because it's not a direct monetary outlay at the time of mailing, is how much less effective the marketing dross (presumably?) you're sending becomes when recipients are annoyed at receiving two identical copies of it.
Hard as Hell to quantify, of course... But I know that shit like that, being a sign of inefficiency and thus stupidity (not to mention being a waste of natural resources), pisses *me* off quite a bit; and since I'm hardly unique in this respect, there's *some* opportunity cost incurred there.
Christian R. Conrad (I live in Finland, and my e-mail in-box is at the Saunalahti company.)
Yes Mr. Garrison, genetic engineering lets us correct God's horrible, horrible mistakes, like German people. - Mr. Hat
Post #241,870
1/19/06 11:02:52 AM

And we have some who gripe if everyone in the house doesn't get it. Though I think the risk is on the side of sending too many.
Can't please everyone :)
-- Steve