
Welcome to IWETHEY!

New Way overkill
#1 - I want him to be able to be VERY productive in Perl using it for both Windows and Unix automation, and for data processing (direct mail type data) and for database access.

It will take at least a concentrated year for me to get enough Perl/SQL/Unix idiom into his head. He works full time, and this is not directly his job (yet), so this is off-hours stuff.

And he has NO programming background. None. Really. We are at ground zero here.

I just don't want him to baby chick imprint on anything. I want some alternative methods of doing things in his mind so he doesn't think anything is the one true way.

I also want him to have a better understanding of low-level coding, so I want him to know enough C to be able to do some simple file processing and a bit of in-memory manipulation. We will NOT go as far as B-Trees or linked lists. This is just so he can get the experience of using a debugger and stepping through the machine code that his program produced.

Since he will be so focused on Perl, I'd like him to have exposure to some other high level language such as Python or Ruby to make sure he doesn't become a Perl bigot.

But past that, nope. I have my own life to live, and that does not include writing any Java code.
New Python and C then.
As I said, Python is very different from Perl in many ways. Ruby is like a cross of Smalltalk and Perl, with the most important similarity to Perl (and difference from Python) being TIMTOWTDI.
Regards,
-scott
Welcome to Rivendell, Mr. Anderson.
New Is python TIMTOWTDI?
Not quite sure what that comment is saying based on that.

And how weird is that word? Yes, I know it's an acronym, but at this point, we "say" it.
New No, it's the opposite.
Well, not entirely: usually there is more than one way to do anything, but with Python there is generally one best way and then all of the hacks.
Regards,
-scott
Welcome to Rivendell, Mr. Anderson.
New Good, that's what I want.
I'll try to get him posting here for ongoing advice.

Is it reasonable to assume that anything I come up with as an example perl script that is running on my local ubuntu desktop will also be implementable in a python script?

As long as we don't depend on any CPAN based drivers.

For example, we just write a program that takes a variable number of command line arguments. The arguments are file names.

The goal of the program is to parse and store the IDs of several files, then run a final file through and test if the final file matched any of the previous files (at the ID level), and output SOME of the final file (records that have IDs that matched), and output it in a different order depending on which of the input files it matched.

The files can be either tab, pipe, or CSV delimited. We do not really parse them out fully, though, since the ID can be pulled via a REGEXP from all the files.
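A minimal Python sketch of that idea: one regular expression pulls the ID out of a line no matter which delimiter the file uses. The layout here is an assumption made up for illustration (the ID is the first field, all digits, optionally quoted), and `ID_PATTERN` and `extract_id` are hypothetical names; the real files may need a different pattern.

```python
import re

# Assumed layout: the ID is the first field, all digits, optionally quoted,
# and followed by a tab, pipe, or comma.
ID_PATTERN = re.compile(r'^"?(\d+)"?[\t|,]')

def extract_id(line):
    """Return the ID as a string, or None if the line does not match."""
    match = ID_PATTERN.match(line)
    return match.group(1) if match else None

samples = [
    '51207\tLAST\tFIRST',      # tab delimited
    '51207|LAST|FIRST',        # pipe delimited
    '"51207","LAST","FIRST"',  # CSV
]
for line in samples:
    print(extract_id(line))
```

The point is that the rest of the record never gets split into fields, so the delimiter only matters right next to the ID.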

The code exercises REGEXPs, splits, joins, multi-level hashes, REFs, and refs to arrays in anonymous hashes (for storing and later lookup), plus the Perl idiom of pushing an element onto an array held by reference, which requires the obscure casting syntax (the @{...} dereference).
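For comparison, a rough Python sketch of the same data structures. The file names and record strings are placeholders; the Perl lines in the comments are the idioms described above, not a quote from any particular script.

```python
from collections import defaultdict

# Perl:   push @{ $save_records{$file} }, $record;
# Python: a dict whose values are lists; defaultdict creates the list on first use,
# so there is no dereference syntax to remember.
save_records = defaultdict(list)
save_records['year_1.txt'].append('record A')
save_records['year_1.txt'].append('record B')
save_records['year_2.txt'].append('record C')

# Perl:   $seen{$file}{$id}++   (a two-level hash used as a truth test)
# Python: a dict of sets does the same job.
seen = defaultdict(set)
seen['year_1.txt'].add('51207')

print(save_records['year_1.txt'])
print('51207' in seen['year_1.txt'])
```

Containers nest directly in Python, so the explicit reference and dereference steps from Perl disappear.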

The program reads the files and stores ONLY the primary key of each file into a file specific hash for a quick truth test later.
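A small Python sketch of that step, assuming the key is the first tab-separated field of each line. `load_ids` is a hypothetical helper, and `io.StringIO` stands in for a real file so the example runs on its own.

```python
import io

def load_ids(lines, delimiter='\t'):
    """Collect the first field of every line into a set for fast lookups."""
    ids = set()
    for line in lines:
        key = line.rstrip('\r\n').split(delimiter, 1)[0]
        if key:  # skip blank lines
            ids.add(key)
    return ids

# Stand-in for a priority file; with a real file, pass the open file object instead.
fake_file = io.StringIO('100\tSMITH\n200\tJONES\n100\tSMITH\n')
ids = load_ids(fake_file)
print(sorted(ids))
```

A set keeps only the keys, so duplicate IDs collapse and the later truth test is just `rec_id in ids`.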

We then read STDIN.

We parse the ID from each record and see if the ID was in any of the previous files.

We store each record (the whole thing, which then leads to Perl/DBM/Tie discussion for when we run out of memory and need to start using disk) into an array that is specific to the matching previous file.
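On the Python side, the nearest standard-library relative of a tied DBM hash is probably `shelve`. The sketch below is one possible way to keep the per-file record lists on disk; `append_record` is a hypothetical helper, and it rewrites the whole list on every append, so treat it as an illustration of the API rather than something tuned for large inputs.

```python
import os
import shelve
import shutil
import tempfile

def append_record(shelf, key, record):
    """Append one record to the list stored under key, persisting it to disk."""
    records = shelf.get(key, [])
    records.append(record)
    shelf[key] = records  # reassign: a plain shelf only saves on assignment

workdir = tempfile.mkdtemp()
try:
    with shelve.open(os.path.join(workdir, 'save_records')) as shelf:
        append_record(shelf, 'year_1.txt', 'record A\n')
        append_record(shelf, 'year_2.txt', 'record B\n')
        append_record(shelf, 'year_1.txt', 'record C\n')
        stored = {key: shelf[key] for key in sorted(shelf)}
finally:
    shutil.rmtree(workdir)  # clean up the temporary database files

print(stored['year_1.txt'])
```

For really large inputs, one temporary file per priority file (append lines, then concatenate at the end) might be simpler and faster than any keyed store.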

We then output all the records that we read via STDIN that matched ANY of the named files (some didn't), and we output them in the order of the previous files that they matched.

That means all the records that matched the 1st file are output, then all the records that matched the 2nd file, etc, until done.
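The matching and ordering described above can be sketched in a few lines of Python. Everything here is illustrative: `prioritize` is a hypothetical function, the records are made up, and the ID extractor is a stand-in that just takes the first pipe-separated field.

```python
def prioritize(records, ids_by_file, file_order, get_id):
    """Return the records that match any priority file, grouped in file order."""
    buckets = {name: [] for name in file_order}
    for record in records:
        rec_id = get_id(record)
        for name in file_order:
            if rec_id in ids_by_file[name]:
                buckets[name].append(record)
                break  # the first matching file wins
    # Flatten: everything that matched the 1st file, then the 2nd, and so on.
    return [record for name in file_order for record in buckets[name]]

file_order = ['first', 'second']
ids_by_file = {'first': {'1', '2'}, 'second': {'3'}}
records = ['3|c', '1|a', '9|z', '2|b']

ordered = prioritize(records, ids_by_file, file_order, lambda r: r.split('|')[0])
print(ordered)
```

Records keep their input order within each group, and anything that matches no file (like `9|z` here) is dropped.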

Essentially a 15 line perl script the way I write it, maybe 5 for BT.

Knowing no python at all, but with my Perl background pushing implementation directions (possibly poorly), how long should it take for us to write this together in python?

Wanna give a pseudocode outline so we do it the "right" way?
Edited by crazy Sept. 3, 2011, 04:52:46 PM EDT
Edited by crazy Sept. 3, 2011, 05:10:35 PM EDT
New Data examples?
The problem with giving you Python pseudocode is that Python is pseudocode that runs... ;-)

Post the Perl REs if you want: Python can use them.
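To make that concrete, here is a small sketch of carrying a Perl-style pattern over to Python's `re` module. The pattern and the sample line are made up for illustration.

```python
import re

# Perl:   if ($line =~ /\|(\d+)"/) { my $id = $1; ... }
# Python: the same pattern text, written as a raw string so the backslashes survive.
pattern = re.compile(r'\|(\d+)"')

line = '"ABC","X|Work|51207","LAST"'  # made-up sample record

# search() scans the whole line like Perl's =~ does;
# match() would only test at the start of the line.
match = pattern.search(line)
if match:
    rec_id = match.group(1)  # the equivalent of Perl's $1
    print(rec_id)
```

The one habit to change is reaching for `search()` instead of `match()` when the pattern is not anchored to the start of the line.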
Regards,
-scott
Welcome to Rivendell, Mr. Anderson.
New Here's the code
Note: Lots of silly assignments to tmp vars.
This is so we can dump it using Data::Dumper.


#!/usr/bin/perl -W
use strict;
use Data::Dumper;
# Usage: year_pri.pl 3 year_1.txt year_2.txt year_3.txt input_data.txt
# ###### Note the stupid 3 to tell it 3 files to follow, I wasn't showing him
# the getopts lib yet.

my $file_count = shift;
my $bug = 1;

my @pri_files;
while ($file_count--){
    my $file = shift;
    push(@pri_files, $file);
}

my %priority_rec;
foreach my $priority (@pri_files){
    my ($tmp) = read_file($priority);
    $priority_rec{$priority} = $tmp;
}

my %save_records;

while (<>){
    my $in_rec = $_;

    #"1990118STTSKYJH","BA","INH_ALL-PNCOA3-2011|Work|51207","LAST","FIRST","","","","PATTERSON & KELLY PA","",

    if (/\|(\d+)"/){
        my $id = $1;
        YEAR_LOOP:
        foreach my $priority (@pri_files){
            if (defined($priority_rec{$priority}->{$id})){
                push (@{$save_records{$priority}},$in_rec);
                last YEAR_LOOP;
            }
        }
    } #end if we got a match for the id
    else{
        die "Cannot get id from: [$_]\n";
    }

}
foreach my $priority (@pri_files){
    print @{$save_records{$priority} || []}; # empty list if nothing matched this file
}

exit (0);

sub read_file {
    my $file = shift;

    my %ret;

    open (IN, $file) or die "Can't open $file for read - $!\n";
    while (<IN>){
        my (@rec) = split(/\t/, $_, 2);
        $ret{$rec[0]}++;
    }
    close(IN);

    return(\%ret);
}
New Re: Here's the code
#!/usr/bin/python


# Usage: cat input_data.txt | ./sort-id.py pri1.txt pri2.txt pri3.txt -

import fileinput, re, sys

pri_files, id_cache, save_records = [], {}, {}

matcher = re.compile(r'"(\d+)') # not necessary but faster

def process_priority_file(line):
    if fileinput.isfirstline(): # save the file name and initialize the record capture list
        pri_files.append(fileinput.filename())
        save_records[fileinput.filename()] = []

    id, separator, rest_of_line = line.partition('\t') # compare to line.split()...

    # You're not printing out multiple copies of the same record if it shows up in more than one
    # priority file, so I'm not sure what the point of saving a hash per priority filename was.
    if id not in id_cache:
        id_cache[id] = fileinput.filename()


def process_stdin(line):
    matches = matcher.match(line) # matches = re.match(r'"(\d+)', line) if not precompiled
    if not matches:
        print('Cannot get id from %s' % line)
        sys.exit(1)

    rec_id = matches.group(1)

    pri = id_cache.get(rec_id)
    if pri:
        save_records[pri].append(line)

def print_prioritized():
    for priority in pri_files:
        for r in save_records[priority]:
            # didn't see you doing a chomp, but here's the Python version since print adds a newline
            print(r.rstrip('\r\n'))

def read_files():
    # http://docs.python.o...ry/fileinput.html is worth reading.
    # I didn't feel like reading the filename count.
    for line in fileinput.input():
        if fileinput.isstdin():
            process_stdin(line)
        else:
            process_priority_file(line)

    print_prioritized()


# This test lets you only run something if the module is run from the command line.
# This code won't run if someone does 'import thisfilename'; idiomatically, people will
# sometimes put test code in this stanza so tests can be easily run from the command line.
# You could also put getopts tests for --help, etc. in here.
if __name__ == "__main__":
    read_files()


Also, if you don't want to add a number to the command line and don't like the stdin trick I used, just use:

def read_files():
    priority_files, input_file = sys.argv[1:-1], sys.argv[-1]
    for line in fileinput.input(priority_files):
        process_priority_file(line)

    for line in fileinput.input(input_file):
        process_stdin(line)

    print_prioritized()
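
If slicing `sys.argv` by hand starts to feel fragile, the standard `argparse` module is one option. This is only a sketch of how the arguments could be declared; the `--input` option and the description text are assumptions for illustration, not part of either script above.

```python
import argparse

def parse_args(argv=None):
    """Parse priority file names plus an optional input file ('-' means stdin)."""
    parser = argparse.ArgumentParser(
        description='Output input records that match priority files, in priority order.')
    parser.add_argument('priority_files', nargs='+',
                        help='one or more files of IDs, highest priority first')
    parser.add_argument('-i', '--input', default='-',
                        help="file to filter; '-' (the default) reads stdin")
    return parser.parse_args(argv)

# Passing a list makes the parser easy to try out without touching sys.argv.
args = parse_args(['year_1.txt', 'year_2.txt', '--input', 'input_data.txt'])
print(args.priority_files)
print(args.input)
```

With `nargs='+'` there is no need for a leading file count, and `--help` comes for free.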

Regards,
-scott
Welcome to Rivendell, Mr. Anderson.
New Thanks
New How do *you* pronounce it? Me: tim-TOW-te-dee
--

Drew
New Re: How do *you* pronounce it? Me: tim-TOW-te-dee
TIM-TOW-DEE
