IWETHEY v. 0.3.0 | TODO
1,095 registered users | 0 active users | 0 LpH | Statistics
Login | Create New User
IWETHEY Banner

Welcome to IWETHEY!

New Here ya go
#1 - Isolate out address elements into
\taddress1,
\taddress2,
\tcity,
\tstate,
\tzip.

#2 - Run through a CASS certified address standardization
program that does zip+4 appends. Do not bother trying
to write this yourself. You MUST buy it or rent someone
else's. They have the Postal database with all the street
synonyms and aliases, route number->friendly name, list
of legal address ranges per street, etc. There is NO WAY
you can code a program to do this without the postal database.

I've used a variety of them depending on accuracy desired vs
cost and speed. If you like, give me your list and I'll run
it through.


#3 - Merge/Purge. You need to determine what to keep VS drop,
right? Start thinking about the different ways people write
their 1st name (Rob, Robert, Bob), etc. Then add in typos.
Account for titles, etc. Suffixes. Then start adding in
confidence factors - you want to error on the side of sending
more or less mail? Oh, don't forget the married vs maiden
name or the multi-family / mother-inlaw living together.
Who exactly IS Mrs John Smith? Do you drop her if you have
another female smith in the household? Oh, and do you
want individual, family (single last name), and household
(multiple last names). What about company?

The better MP programs are amazing. Poorer matchode based
ones will consider a lot of obvious (to the human eye) dupes
as unique people. My home brewed one is matchode based which
means more false uniques, but I can't offer my corp one to you.

Dunno about Ben's geocoding answer. Rooftop geocoding is usually
way too expensive for casual use, and centroid is way to wide
for deduping purposes.
New I think #2 does what I did
I only used geocoding because it was free for me and I had code for it. I was using it to turn an address into a location, and using that to match addresses. But if you can standardize addresses another way, go for it.

For my project I was able to ignore the name issue - I was just merging several lists into one, and needed to spit out "these aren't duplicated" and "these may be, look them over by eye". So it was only semi-automated.

Cheers,
Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
New Your geocoding process standardized first
When we geocode stuff, 1st step is to standardize.
They just did it for you under the covers.
New Exactly
Come to think of it, I really did need the geocoding.

There is a problem with apartment buildings where on one lot you have 2 buildings with different addresses, but they're really the same. So I counted as possible dupes to review any two that were close. (I used "within adjacent 100' by 100' boxes, so that range was 100-300 feet.) That I couldn't have done without geocoding, but it was only a nice to have anyways. And it obviously doesn't apply in this case.

Cheers,
Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
Expand Edited by ben_tilly Jan. 18, 2006, 09:28:16 PM EST
New Re: Here ya go
#1: have those as discrete fields already

#2: actually did CASS and NCOA to the entire database Tuesday (Donnelly Marketing was the vendor), and loaded the changes today.

#3: ahhh the kicker. Is there a specific merge-purge program you've used or recommend? More on what we do later, need to spend some time with my son.
--
Steve
[link|http://www.ubuntulinux.org|Ubuntu]
New ObLRPD: "Vote him off the island!"
The Dot is feeling cranky tonight.

;-)

Enjoy being with your boy.

Cheers,
Scott.
New That'd be harder than giving a bath to a bobcat.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New the trick is...
the anaesthetic dart before hand.
hth :-)
Have fun,
Carl Forde
New Talk your talk, wee man.
Regards,

-scott anderson

"Welcome to Rivendell, Mr. Anderson..."
New Firstlogic match/consolidate is verra nice
Group1 MP is (shudder) OK, but the interface is horrendous.

But there is a large amount of tuning and expertise that goes into setting these up. You might spend more on consulting and setup than on the software itself.

How much is the cost to mail each duplicate each month? What is your expected payback time?
New Thanks, having a look.
--
Steve
[link|http://www.ubuntulinux.org|Ubuntu]
New Re "How much is the cost to mail each duplicate each month?"
One cost that people might not think of, because it's not a direct monetary outlay at the time of mailing, is the extent to which the efficiency of the marketing dross (presumably?) you're sending is lessened because of the annoyance of recieving two identical copies of it.

Hard as Hell to quantify, of course... But I know that shit like that, being a sign of inefficiency and thus stupidity (not to mention being a waste of natural resources), pisses *me* off quite a bit; and since I'm hardly unique in this respect, there's *some* opportunity cost incurred there.


   [link|mailto:MyUserId@MyISP.CountryCode|Christian R. Conrad]
(I live in Finland, and my e-mail in-box is at the Saunalahti company.)
Yes Mr. Garrison, genetic engineering lets us correct God's horrible, horrible mistakes, like German people. - [link|http://maxpages.com/southpark2k/Episode_105|Mr. Hat]
New Exactly
And we have some who gripe if everyone in the house doesn't get it. Though I think the risk is on the side of sending too many.

Can't please everyone :)
--
Steve
[link|http://www.ubuntulinux.org|Ubuntu]
     Postal address list cleansing - (Steve Lowe) - (21)
         I just did a deduping project kind of like that - (ben_tilly) - (1)
             Thanks for the tip! - (Steve Lowe)
         do a dump then sort by address. the dupes are identified - (boxley) - (4)
             YM, delete HALF of them... Or NOBODY at that address is left -NT - (CRConrad)
             2 things - (ben_tilly) - (2)
                 so going by lat and long makes more sense, thx -NT - (boxley) - (1)
                     There are apps that standardize addresses - (drewk)
         Here ya go - (broomberg) - (12)
             I think #2 does what I did - (ben_tilly) - (2)
                 Your geocoding process standardized first - (broomberg) - (1)
                     Exactly - (ben_tilly)
             Re: Here ya go - (Steve Lowe) - (8)
                 ObLRPD: "Vote him off the island!" - (Another Scott) - (3)
                     That'd be harder than giving a bath to a bobcat. -NT - (admin) - (2)
                         the trick is... - (cforde) - (1)
                             Talk your talk, wee man. -NT - (admin)
                 Firstlogic match/consolidate is verra nice - (broomberg) - (3)
                     Thanks, having a look. -NT - (Steve Lowe)
                     Re "How much is the cost to mail each duplicate each month?" - (CRConrad) - (1)
                         Exactly - (Steve Lowe)
         Send me a dump of the list in e-mail. - (folkert)

Everyone is terrible and the world is burning.
175 ms