Post #196,662
2/28/05 5:09:09 PM
|
Can someone do my job for me? :-)
To cut a long story short, we've received a corrupted text file - to keep it all anonymous, say each record has a fixed length of 100 bytes, but bytes 20 and 21 sometimes contain a carriage return. So the program that processes said file runs into trouble.
It's on HP-UX.
My unix script-fu is weak - I could do it in COBOL, but, I don't have access to the compiler :(
If someone has a moment, can they make me feel like an idiot by showing me how straightforward it is to move spaces to two bytes in a fixed-record-length file?
Thanks, coz we're in a right panic at the mo'. John.
Two out of three people wonder where the other one is.
|
Post #196,665
2/28/05 5:16:20 PM
|
man tr
Peter [link|http://www.ubuntulinux.org|Ubuntu Linux] [link|http://www.kuro5hin.org|There is no K5 Cabal] [link|http://guildenstern.dyndns.org|Home] Use P2P for legitimate purposes!
|
Post #196,669
2/28/05 5:22:01 PM
|
Only if there are no legitimate carriage returns
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
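To illustrate the caveat: a blanket tr-style substitution cannot tell a stray newline from a record terminator. A minimal Python sketch, using made-up 9-byte records (the sample data and the name blanket_replace are assumptions for illustration only, not anything from the real file):

```python
def blanket_replace(data: bytes) -> bytes:
    # Mimics what a tr-style translation does: every newline becomes a space,
    # with no regard for where in the record it sits.
    return data.replace(b"\n", b" ")


# Two hypothetical records of 9 data bytes, each followed by a newline terminator.
# The first record has a stray newline in its fourth byte.
sample = b"abc\nefghi\n" + b"012345678\n"
result = blanket_replace(sample)

# The stray newline is gone, but so are both record terminators.
print(result)
```

So a plain translation is only safe when the file has no newlines that need to survive.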
|
Post #196,668
2/28/05 5:20:22 PM
|
Is there a record separator?
Let me assume a return. Do you want error checks? Let me assume yes. Here's an ugly solution. Save this to a file named, say, fixup and then run "perl fixup in > out". In case it does not work, do NOT try to edit in place. (Right now you have one problem, you do not want two...)

#! /usr/bin/perl -w
use strict;

while (<>) {
  if (length($_) < 101) {
    # A stray return in byte 20 or 21 leaves a short line of 20 or 21 characters.
    if (20 <= length($_) and length($_) <= 21) {
      # Join the next line on, then replace bytes 20, 21 with two spaces.
      my $next = <>;
      $_ .= $next if defined $next;
      substr($_, 19, 2, "  ");
    }
    else {
      print STDERR "Unexpected return at line $. not in bytes 20 or 21???\n";
    }
  }
  # 100 data bytes plus the trailing return.
  if (length($_) != 101) {
    print STDERR "Line $. is not of length 101?\n";
  }
  print $_;
}

Ben
|
Post #196,672
2/28/05 5:23:51 PM
2/28/05 5:55:12 PM
|
Re: Can someone do my job for me? :-)
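If it's truly fixed length, with no record separator:

#!/usr/bin/python

badchar = '\n'
reclen = 100

inf = open('/home/anderson/corrupt.txt', 'r')
outf = open('/home/anderson/fixed.txt', 'w')

while True:
    # Read one fixed-length record; an empty string means end of file.
    rec = inf.read(reclen)
    if rec == '':
        break

    # Blank bytes 20 and 21 (indexes 19 and 20) only if they hold the bad character.
    # The length checks guard against a short final record.
    if len(rec) > 19 and rec[19] == badchar:
        rec = rec[:19] + ' ' + rec[20:]
    if len(rec) > 20 and rec[20] == badchar:
        rec = rec[:20] + ' ' + rec[21:]
    outf.write(rec)

inf.close()
outf.close()

Otherwise, if each record ends with a line terminator, change the 100 to 101 on UNIX, 102 on Windows. Very rough, but it should get you there.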
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
Edited by admin
Feb. 28, 2005, 05:55:12 PM EST
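For the case where each record does carry a terminator, the same idea could be sketched on raw bytes with the terminator length as a parameter. This is only a sketch under assumptions: the function name blank_stray_newlines and its parameters are made up here, and it blanks both LF and CR at those offsets, which goes slightly beyond the script above.

```python
def blank_stray_newlines(data: bytes, reclen: int = 100, seplen: int = 1,
                         positions=(19, 20)) -> bytes:
    """Walk the data in fixed strides and blank LF/CR bytes at the given offsets.

    positions are zero-based, so (19, 20) means bytes 20 and 21 of each record.
    seplen is the assumed terminator length: 0 for none, 1 for LF, 2 for CR LF.
    """
    stride = reclen + seplen
    out = bytearray(data)
    for start in range(0, len(out), stride):
        for pos in positions:
            i = start + pos
            # Guard against a short final record.
            if i < len(out) and out[i] in (0x0A, 0x0D):
                out[i] = 0x20  # ASCII space
    return bytes(out)


if __name__ == "__main__":
    # One hypothetical record: stray newline in byte 20, real terminator at the end.
    rec = b"A" * 19 + b"\n" + b"B" * 80 + b"\n"
    fixed = blank_stray_newlines(rec)
    print(len(fixed), fixed[19:21], fixed.endswith(b"\n"))
```

Working on bytes (files opened with 'rb' and 'wb') avoids any newline translation, and the record terminators are left alone because only the listed offsets are touched.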
|
Post #196,676
2/28/05 5:29:25 PM
|
Might lose data
I'm a real naysayer today.
But anyways if bytes 20 and 21 sometimes have legitimate data, you'll overwrite them with spaces.
Ben
|
Post #196,678
2/28/05 5:33:13 PM
2/28/05 5:33:50 PM
|
Overwriting is fine - it's a field we don't actually use.
(I know, I'm a typical user, putting in spec changes at the last minute...)
Two out of three people wonder where the other one is.
Edited by Meerkat
Feb. 28, 2005, 05:33:50 PM EST
|
Post #196,679
2/28/05 5:34:41 PM
|
I try not to assume about such things
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
|
Post #196,691
2/28/05 5:55:53 PM
|
Well, I changed it to check regardless.
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #196,677
2/28/05 5:32:41 PM
2/28/05 5:35:40 PM
|
You guys *all* rock!
I'd tried tr but was foiled because the file does have carriage returns at the end. I couldn't get dd to do what I wanted either in the short amount of time I spent trying.
It was easy to identify the records that were 'bad', and since there were only 64 of them, and people were jumping up and down, I just got the line numbers of the bad records and fixed it in vi. Very low-tech, I know.
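For next time, finding the line numbers of the bad records might look something like the following Python sketch. It assumes newline-terminated records of 100 data bytes; the function name bad_line_numbers and the sample lines are invented for illustration.

```python
def bad_line_numbers(lines, reclen=100):
    """Return the 1-based numbers of lines whose length, minus the newline, is not reclen."""
    return [n for n, line in enumerate(lines, start=1)
            if len(line.rstrip("\n")) != reclen]


if __name__ == "__main__":
    # Hypothetical data: a good record, a record split in two by a stray newline,
    # then another good record.
    lines = ["x" * 100 + "\n", "x" * 19 + "\n", "x" * 80 + "\n", "y" * 100 + "\n"]
    print(bad_line_numbers(lines))
```

On a real file the lines argument could simply be the open file object, since iterating over it yields one line at a time. Each split record shows up as two consecutive short lines.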
Proof of the pudding should follow in a few minutes when the job in question re-cycles.
I've saved off your scripts for next time this happens - and for my own learning.
Thanks again all - as an old tv show used to (almost) say: 'The nature of ziwt was irrepressible!' :)
edit: Can't even spell ziwt. Shoot me now...
Edited by Meerkat
Feb. 28, 2005, 05:35:40 PM EST
|
Post #196,680
2/28/05 5:36:20 PM
|
The Proper Way
#!/usr/bin/pfy

while (!sorted) do {
  my $attitude = pester(pfy);
  if ($attitude = "bad") then {
    threaten(pfy);
    harangue(pfy);
  }
  drink ($coffee);
  send ($email);
  return $to_pub;
}
Peter [link|http://www.ubuntulinux.org|Ubuntu Linux] [link|http://www.kuro5hin.org|There is no K5 Cabal] [link|http://guildenstern.dyndns.org|Home] Use P2P for legitimate purposes!
|
Post #196,687
2/28/05 5:51:01 PM
|
Perfect! :)
Two out of three people wonder where the other one is.
|