
Two things:
1. The language is Perl. Note capitalization and spelling.
2. The problem with deduping is that the same address can be entered in many different ways, so you have to handle variations. And trust me, you'll get a *lot* of variations. Looking only for exact matches entered twice finds some duplicates, but it leaves enough of a problem that people generally want to do something better. For instance, sometimes you'll see St and other times Street. People also move around where the apartment number goes in the address. (Not an issue for my project, but for this one it would be.)
If the list is small enough to go through by hand, humans will handle lots of those issues correctly. But coding up a program to handle these issues is surprisingly tricky.
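To give a feel for where the trickiness starts, here is a minimal sketch of the usual first step: boil each address down to a rough comparison key and group on that. The abbreviation table and the name normalize_address are made up for illustration, and it only covers the two variations mentioned above (St vs Street, and a movable apartment number). Real data will need far more than this.

```perl
use strict;
use warnings;

# Hypothetical abbreviation table; a real one would be much longer.
my %abbrev = (
    street    => 'st',
    road      => 'rd',
    avenue    => 'ave',
    boulevard => 'blvd',
);

# Reduce an address to a rough comparison key (illustrative helper).
sub normalize_address {
    my ($addr) = @_;
    my $key = lc $addr;
    $key =~ s/[.,]/ /g;    # punctuation becomes whitespace

    # Pull out the apartment number wherever it appears in the address.
    my $unit = '';
    if ($key =~ s/\b(?:apt|apartment|unit)\s+(\w+)\b/ /) {
        $unit = $1;
    }

    # Map long forms to abbreviations, then put the unit last.
    my @words = map { $abbrev{$_} // $_ } split ' ', $key;
    push @words, "apt $unit" if length $unit;
    return join ' ', @words;
}

# Group the raw addresses by their normalized key.
my %seen;
for my $addr ('123 Main Street, Apt 4', 'Apt. 4 123 Main St', '123 Main Rd') {
    push @{ $seen{ normalize_address($addr) } }, $addr;
}

for my $key (sort keys %seen) {
    print "$key: ", join(' | ', @{ $seen{$key} }), "\n";
}
```

The first two addresses collapse to the same key, while the Rd address stays separate. Note that this deliberately does not try to guess at typos or treat different street types as equivalent, since that kind of guess is easy to get wrong.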
And even humans run into problems with this. For instance, a human who is not familiar with Denver may think that Colorado St and Colorado Rd are the same. A human who is not familiar with Boston may think that if two addresses have the same city name, street name and street number, then having the zip be off by one digit has to be a typo. Both times they'd be wrong.
Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)