
Welcome to IWETHEY!

New Standardizing Data using the command line
The data is very whacked: it is single-space delimited, but the fields themselves contain spaces as well.

  1. Number, variable length of 5-11 digits (and it turns out to be the primary key as well)
  2. Proper Name (always correct)
  3. Previous Acceptable Name or Previous Not-Acceptable Name
  4. Previous Acceptable Name or Previous Not-Acceptable Name
  5. Previous Acceptable Name or Previous Not-Acceptable Name
  6. Previous Acceptable Name or Previous Not-Acceptable Name
  7. Previous Acceptable Name or Previous Not-Acceptable Name
  8. Previous Acceptable Name or Previous Not-Acceptable Name
  9. Previous Acceptable Name or Previous Not-Acceptable Name
  10. Previous Acceptable Name or Previous Not-Acceptable Name
  11. Previous Acceptable Name or Previous Not-Acceptable Name
  12. Previous Acceptable Name or Previous Not-Acceptable Name
  13. Previous Acceptable Name or Previous Not-Acceptable Name
  14. Previous Acceptable Name or Previous Not-Acceptable Name
  15. Previous Acceptable Name or Previous Not-Acceptable Name
  16. Previous Acceptable Name or Previous Not-Acceptable Name
  17. Previous Acceptable Name or Previous Not-Acceptable Name
  18. Previous Acceptable Name or Previous Not-Acceptable Name
  19. Previous Acceptable Name or Previous Not-Acceptable Name
  20. Previous Acceptable Name or Previous Not-Acceptable Name
  21. Previous Acceptable Name or Previous Not-Acceptable Name
  22. Previous Acceptable Name or Previous Not-Acceptable Name
  23. Previous Acceptable Name or Previous Not-Acceptable Name
  24. Previous Not-Acceptable Name
  25. Previous Not-Acceptable Name
  26. Previous Not-Acceptable Name
  27. Previous Not-Acceptable Name
  28. Previous Not-Acceptable Name
  29. Previous Not-Acceptable Name
  30. Previous Not-Acceptable Name
  31. Previous Not-Acceptable Name
  32. Previous Not-Acceptable Name
  33. Previous Not-Acceptable Name
AND I have to make sure I don't intermix the Previous Acceptable (still in use) Names and the Previous Not-Acceptable (completely deprecated) Names.

In this data, guess what distinguishes a "Previous Acceptable Name" from a "Previous Not-Acceptable Name"...

NOTHING. Sweet!

The company said the only way they can get at that kind of data (yes, even the IT people don't know how the systems work) is through a web interface that requires a login and supposedly works only with IE.

Well, besides IE, it works with Firefox, lynx, and curl. I can look up by number or by name. A lookup by number gives me all the info I need, except that it comes back as horribly formatted HTML.

I have to dump each number-lookup HTML response to a file. Then I run a script against the 580K files that strips all the unneeded info and collates everything for one number onto a single, properly tab-delimited line, with an "A-" separator for the Previous Acceptable Names and an "N-A" separator for the Previous Not-Acceptable Names. It also puts all of this into a single file for ease of use. Major pain: the website sometimes returns a null reply or drops the network connection.
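A minimal sketch of this first stage, using only curl, sed and paste, might look like the following. It is an illustration, not the script actually used: LOGIN_URL, LOOKUP_URL, the num query parameter, the login form fields and the page headings are all hypothetical placeholders, since the real site's layout isn't described. fetch_record retries when a reply comes back empty or the connection drops; page_to_line turns one saved page into one tab-delimited line with A- and N-A markers.

```shell
#!/bin/sh
# HYPOTHETICAL placeholders: the real URLs, query parameter and form
# field names are not given in the post.
LOGIN_URL=${LOGIN_URL:-https://example.invalid/login}
LOOKUP_URL=${LOOKUP_URL:-https://example.invalid/lookup}
COOKIES=${COOKIES:-cookies.txt}

# login USER PASSWORD: log in once and keep the session cookie (-c jar).
login() {
    curl -s -c "$COOKIES" -d "user=$1" -d "password=$2" "$LOGIN_URL" > /dev/null
}

# fetch_record NUM OUTFILE [TRIES]: save one lookup page, retrying when the
# server sends an empty reply or drops the connection.
fetch_record() {
    tries=${3:-5}
    while [ "$tries" -gt 0 ]; do
        # -f: treat HTTP errors as failures; -b: send the saved session cookie
        if curl -s -f -b "$COOKIES" -o "$2" "$LOOKUP_URL?num=$1" && [ -s "$2" ]; then
            return 0
        fi
        rm -f "$2"                  # discard an empty or partial page
        tries=$((tries - 1))
        sleep "${RETRY_DELAY:-2}"
    done
    return 1
}

# page_to_line FILE: collapse one saved page into a single tab-delimited line.
# ASSUMPTION (hypothetical layout): after stripping tags each name sits on its
# own line, and the two lists follow the headings "Previous Acceptable Names:"
# and "Previous Not-Acceptable Names:", which become "A-" and "N-A".
page_to_line() {
    num=$(basename "$1" .html)      # the record number is the file name
    sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        -e '/^$/d' \
        -e 's/^Previous Acceptable Names:$/A-/' \
        -e 's/^Previous Not-Acceptable Names:$/N-A/' "$1" |
        { printf '%s\t' "$num"; paste -s -d '\t' -; }
}
```

Run over every saved page, something like `for f in pages/*.html; do page_to_line "$f"; done > all_records.tsv` puts everything in one file. The sed works line by line, so tags split across lines and entities such as `&amp;` would need extra handling.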

Then I run another script that chops each record into fields, so I end up with a file of 80 fields, something like:
  1. Num_Desig(PrimaryKey and Indexed)
  2. CurrentName char 40 (Indexed)
  3. PAN[1-30] char 40 (Previous Acceptable Names)
  4. PNAN[1-48] char 40 (Previous Not-Acceptable Names)
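The 80-column layout above (1 + 1 + 30 + 48) can be sketched with awk, a standard member of the same toolbox even though it isn't in the tool list. This assumes tab-separated input lines of the form NUM, current name, A-, acceptable names, N-A, not-acceptable names; that intermediate shape is an assumption, and split_record is a hypothetical helper name.

```shell
#!/bin/sh
# split_record: read NUM<TAB>Name<TAB>A-<TAB>...<TAB>N-A<TAB>... on stdin and
# write exactly 80 tab-separated columns per record:
#   1 Num_Desig, 2 CurrentName, 3-32 PAN1-30, 33-80 PNAN1-48
# Values are cut to 40 characters for the char 40 columns; unused slots stay empty.
split_record() {
    awk -F '\t' -v OFS='\t' -v npan=30 -v npnan=48 '
    {
        na = 0; nn = 0; mode = ""
        for (i = 3; i <= NF; i++) {
            if ($i == "A-")  { mode = "A"; continue }
            if ($i == "N-A") { mode = "N"; continue }
            if (mode == "A")      pan[++na]  = substr($i, 1, 40)
            else if (mode == "N") pnan[++nn] = substr($i, 1, 40)
        }
        if (na > npan || nn > npnan)   # warn rather than lose names silently
            print "record " $1 ": too many names, extras dropped" | "cat 1>&2"
        line = $1 OFS substr($2, 1, 40)
        for (i = 1; i <= npan; i++)  line = line OFS (i <= na ? pan[i] : "")
        for (i = 1; i <= npnan; i++) line = line OFS (i <= nn ? pnan[i] : "")
        print line
    }'
}
```

Padding every record to the same width keeps the output loadable as fixed columns, at the cost of many empty fields.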
As it turns out, I "found" a bunch of data they wanted to get out of their system but didn't know how to. Plus, it wasn't commingled.

The pieces I used were:
  • bash
  • cat
  • sed
  • grep (and egrep)
  • cut
  • sort
  • tee
  • uniq
  • the pipe symbol (|)
  • curl
  • stdout redirection (> and >>)
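Since the record number is supposed to be the primary key, sort, tee and uniq from the list above can check that it really is unique before it gets indexed. A small sketch; check_keys and its keys.sorted default file name are hypothetical:

```shell
#!/bin/sh
# check_keys FILE [SORTED]: print every record number that appears more than
# once in column 1 of a tab-delimited FILE. tee keeps a sorted copy of all
# the keys in SORTED (default keys.sorted) for later use.
check_keys() {
    cut -f1 "$1" | sort | tee "${2:-keys.sorted}" | uniq -d
}
```

uniq only compares adjacent lines, which is why the sort has to come first.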
Oh, you want an example of the executed lines in the scripts? Okay, this one is from the script that splits the records and glues them back together properly:
# Each pass keeps the record number plus one Not-Acceptable name, picked by $FIELD
# (GNU sed: \t in the replacement means a tab)
cat "$INFILE" | grep -e 'N-A ' | grep -e 'A-' | \
    sed -e 's/ /|/' -e 's/, /\t/g' -e 's/A- /\t/' -e 's/ N-A /|/' | \
    cut -f1,3 -d'|' | sed -e 's/|/\t/' | cut -f1,"$FIELD" | \
    cut -c1-6,10- | grep -e '[AEIOUYRSTLN]' >> "$OUTFILE"
How do you like that? Now let me see you do that on Windows without Cygwin or some other *NIX tools package added.
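Read stage by stage, the line above appears to pull one Previous Not-Acceptable Name per record per pass, with $FIELD choosing which one. The sketch below wraps those stages in a hypothetical extract_pnan function and runs them on a made-up sample record. It leaves out the cut -c1-6,10- step, whose purpose depends on the real data, and it assumes GNU sed.

```shell
#!/bin/sh
# A made-up sample record in the "NUM Name A- names N-A names" shape the sed
# expressions appear to expect; it is not the company's data.
sample='12345 Proper Name A- Old One, Old Two N-A Dead One, Dead Two'

# extract_pnan N: the posted stages minus "cut -c1-6,10-".
extract_pnan() {
    grep -e 'N-A ' | grep -e 'A-' |
        sed -e 's/ /|/' -e 's/, /\t/g' -e 's/A- /\t/' -e 's/ N-A /|/' |
        cut -f1,3 -d'|' |            # keep NUM and the N-A list
        sed -e 's/|/\t/' | cut -f1,"$1" |
        grep -e '[AEIOUYRSTLN]'      # heuristic: drop records with no name there
}

printf '%s\n' "$sample" | extract_pnan 2   # 12345<TAB>Dead One
printf '%s\n' "$sample" | extract_pnan 3   # 12345<TAB>Dead Two
```

Asking for a column that doesn't exist (extract_pnan 4 here) leaves only the number, which the final grep filters out.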

Total time involved (not counting waiting for the web server): 2 hours, mostly spent getting the data into an easily machine-readable, tab-delimited format.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
Freedom is not FREE.
Yeah, but 10s of Trillions of US Dollars?
SELECT * FROM scog WHERE ethics > 0;

0 rows returned.
New Do you consider Perl a Unix tool?
It is available for Windows.

It can be used to do everything that you did.

Cheers,
Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
New I think you misinterpreted the Windows comment
The way I took it, he used native Unix tools, following the Unix toolbox philosophy.
Given a mildly complicated task, he picked out each piece and handled it as a stage of a pipeline.

And he was saying that the only way to accomplish the same thing, with the same ease, on Windows, would be to install Unix tools on it.

On the other hand, reading that description, and given that it took 2 hours to come up with the solution, I'd say it took 1 hour and 45 minutes too long. Tweak tweak.

But compare him with most non-Unix-tool people, who would screw around with C or Basic or (name your non-scripting language of choice) and might take MANY hours munging through the data. So he did well.
New I don't think I misinterpreted it
If you have Perl, you can do everything he did pretty easily.

So you can install Perl on Windows and then solve the problem.

I am asking him whether he considers that "installing Unix tools". The argument for is that Perl deliberately borrows a lot of its design from a number of Unix tools. The argument against is that there has been a native version of Perl for Windows for many years now. (In fact, since Perl 5.005, Windows has been a core target.)

One could easily ask the same question naming other scripting languages such as Python and Ruby.

Cheers,
Ben
New I think he may be comparing OS built-ins with the DOS
built-ins; as he stated, put the Unix tools on Windows to do the same (or put Perl etc. on Windows to do the same).
thanx,
bill
Any opinions expressed by me are mine alone, posted from my home computer, on my own time as a free american and do not reflect the opinions of any person or company that I have had professional relations with in the past 50 years. meep
New We'll see. Well?
New I think in *NIX terms or Everything pipelines into the next
The way I took it, he used native Unix tools, following the Unix toolbox philosophy. Given a mildly complicated task, he picked out each piece and handled it as a stage of a pipeline.
Yessir, exactly the line I took.

On the other hand, reading that description, and given that it took 2 hours to come up with the solution, I'd say it took 1 hour and 45 minutes too long. Tweak tweak.
Err, yeah. I am just not as adept at handling incomprehensibly formatted data as you are. Of course, when you think in an incomprehensibly formatted way, it does help reduce the time to fix it.

But compare him with most non-Unix-tool people, who would screw around with C or Basic or (name your non-scripting language of choice) and might take MANY hours munging through the data. So he did well.
The company actually asked me to evaluate their systems. Evidently many people have tried to clean up the data; it has been a problem for nearly 2 years, ever since the "Programmer" they had left to go live in the great outdoors, in a one-room cabin in the mountains about 200 miles from nowhere. He hated technology, and it shows in his work. I haven't seen any code, application, or DB, but something tells me: "Clipper".
--
[link|mailto:greg@gregfolkert.net|greg],
New Yes. I consider Perl
the very essence of the *NIX tools model: treat everything as a file and deal with the problems.

I have it installed on Windows, but for lack of a better term, I wanted to use "typically included" tools, available in most commercial and free *NIX implementations. Perl is a default mainstay on Linux, while most BSDs consider it an add-on. Commercial *NIX doesn't always support Perl, but makes it available under "provided as-is" terms.

It wasn't a slam against Perl, Python, or any of the other interpreted languages, most of which are available on Windows. It was an "okay, here is a base system, do your thing" test, and Windows by default failed miserably.
--
[link|mailto:greg@gregfolkert.net|greg],
New Re: Standardizing Data using the command line
UNIX = text gudness
Windows = Counter-Strike gudness

Perl/Python = portable workee environment

sed/tee/cut = vive la difference! (Welcome to proprietary UNIX, where everything's *slightly* different, just because!)


Peter
[link|http://www.no2id.net/|Don't Let The Terrorists Win]
[link|http://www.kuro5hin.org|There is no K5 Cabal]
[link|http://guildenstern.dyndns.org|Home]
Use P2P for legitimate purposes!
New hmmm...
Perl/Python = portable workee environment
I also consider these *NIX tools.

sed/tee/cut = vive la difference! (Welcome to proprietary UNIX, where everything's *slightly* different, just because!)
Yes, but typically nothing that can't be overcome easily.
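One such difference, and how easily it is overcome: `\t` in a sed replacement is a GNU sed extension, and other seds may not turn it into a tab. A sketch of a portable workaround, with a hypothetical comma_to_tab helper:

```shell
#!/bin/sh
# Embed a real tab character in the sed script instead of relying on "\t",
# so the same command works with GNU, BSD and other POSIX seds.
TAB=$(printf '\t')

# comma_to_tab: turn every ", " separator into a tab
comma_to_tab() {
    sed -e "s/, /$TAB/g"
}

printf '%s\n' 'Old One, Old Two' | comma_to_tab
```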
--
[link|mailto:greg@gregfolkert.net|greg],
     Standardizing Data using the command line - (folkert) - (9)
         Do you consider Perl a Unix tool? - (ben_tilly) - (6)
             I think you misinterpreted the Windows comment - (broomberg) - (4)
                 I don't think I misinterpreted it - (ben_tilly) - (2)
                     I think he may be comparing OS built ins with the dos - (boxley)
                     We'll see. Well? -NT - (broomberg)
                 I think in *NIX terms or Everything pipelines into the next - (folkert)
             Yes. I consider Perl - (folkert)
         Re: Standardizing Data using the command line - (pwhysall) - (1)
             hmmm... - (folkert)
