IWETHEY v. 0.3.0 | TODO

Hey Ben: Perl fork / return value question
I need to have a bunch of processes running at once.
I reviewed CPAN; none seems to do what I want.
I need to get the exit values.

It SEEMS the various POSIX::WIFEXITED and POSIX::WEXITSTATUS
should do what I want, but I can't make them work.

Here's some sample code:

#!/usr/bin/perl -w

use strict;

use POSIX;


print STDERR " forking...\n";
my $pid = fork_child("sleep 1;exit 0");
print STDERR " back... [$pid]\n";
while (1){

	#
	# This is the "correct" way according to the docs.
	# It never exits according to WIFEXITED but I know it does.
	#
	if (WIFEXITED($pid)){
		my $ret = WEXITSTATUS($pid);
		print STDERR " Process exited: : [$pid]:[$ret]\n";
	}
	else{
		next;
	}

	#
	# This will reap, but not get a return value
	#
	my $kid = waitpid($pid, WNOHANG);


	if ($kid == -1){

		#
		# This makes no sense since waitpid reaped the return
		#
		my $ret = WEXITSTATUS($pid);

		print STDERR " Exit for PID: [$pid]:[$ret]\n";
		exit(0);
	}

	print STDERR "Sleeping...\n";
	sleep(1);
}



sub fork_child{
	my $command = shift;

	my $pid = fork;

	unless(defined($pid)){
		die "Fork failed: $!";
	}
	unless ($pid){
		my $ret = system($command);
		$ret = $ret / 256;
		print STDERR "System return: [$ret]\n";
		exit($ret);
	}
	return ($pid); # parent
}


Any ideas?
Just follow the documentation...
Under [link|http://www.perl.com/doc/manual/html/pod/perlfunc/waitpid.html|waitpid] it says, "The status is returned in $?." No need for POSIX. The [link|http://www.perl.com/doc/manual/html/pod/perlfunc/wait.html|wait] command does the same thing.
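As a minimal sketch of what the documentation describes (the exit value 3 here is just an illustration): block in wait(), then read the status from $? with no POSIX macros needed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Minimal sketch of the documented approach: block in wait(),
# then read the raw status from $?.  The exit value is in the
# high byte; the low byte holds the signal/coredump bits.
my $pid = fork;
die "Fork failed: $!" unless defined $pid;
exit(3) unless $pid;              # child exits with status 3

my $reaped = wait();              # returns the child's pid
my $status = $? >> 8;             # the child's exit value
print "pid $reaped exited with $status\n";
```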

For an example that I wrote a long time ago using this, see [link|http://www.perlmonks.org/?node_id=28870|Run Commands in Parallel]. (Note that these days the cool kids all seem to recommend [link|http://search.cpan.org/~dlux/Parallel-ForkManager-0.7.5/ForkManager.pm|Parallel::ForkManager]. I haven't had occasion to use it though - I wrote that snippet before it existed, and since I already had working code, I never saw a good reason to add another external dependency.)

Cheers,
Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
Smack! (forehead)
Doesn't seem to work
What am I doing wrong?

#!/usr/bin/perl -w

use strict;

use POSIX;


print STDERR " forking...\n";
my $pid = fork_child("sleep 1;exit 2");
print STDERR " back... [$pid]\n";
while (1){

   my $kid = waitpid($pid, WNOHANG);


   if ($kid == -1){

      my $ret = $?;


      print STDERR " Exit for PID: [$pid]:[$ret]\n";
      exit(0);
   }

   print STDERR "Sleeping...\n";
   sleep(1);
}

sub fork_child{
   my $command = shift;

   my $pid = fork;

   unless(defined($pid)){
      die "Fork failed: $!";
   }
   unless ($pid){
      my $ret = system($command);
      $ret = $ret / 256;
      print STDERR "System return: [$ret]\n";
      exit($ret);
   }
   return ($pid); # parent
}


Produces this:

[broom@mix sweep]$ ./pid_test.pl
 forking...
 back... [21692]
Sleeping...
Sleeping...
System return: [2]
Sleeping...
 Exit for PID: [21692]:[-1]


I always get the -1.
I think you wanted $kid == $pid, not $kid == -1
As the documentation says, you get -1 as a return if there is no child process left to reap. You get 0 as a return if there is a child process left to reap but you didn't reap it. Otherwise you get the pid of the reaped process.

So you'll have your return on the call which reaps your kid.

An incidental efficiency note: for the effort of using Time::HiRes and its fractional sleep(0.1), you can convert from a CPU-wasting busy-wait to a fairly laid-back parent process with a barely perceptible change in responsiveness.
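Putting both suggestions together, a rough sketch (command and exit value are just for illustration): test waitpid's return against the child's pid rather than -1, and poll ten times a second.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ":sys_wait_h";          # for WNOHANG
use Time::HiRes qw(sleep);        # sleep() now accepts fractions

# Sketch combining both fixes: compare waitpid's return to the
# child's pid (not -1), and poll with short fractional sleeps.
my $pid = fork;
die "Fork failed: $!" unless defined $pid;
unless ($pid) {                   # child
    exec "sleep 1; exit 2" or die "exec failed: $!";
}

my $ret;
while (1) {
    my $kid = waitpid($pid, WNOHANG);
    if ($kid == $pid) {           # reaped: status is now in $?
        $ret = $? >> 8;
        last;
    }
    sleep(0.1);                   # fractional sleep via Time::HiRes
}
print STDERR "Exit for PID [$pid]: [$ret]\n";
```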

Cheers,
Ben
Got it.
I was confused, as usual.

What do you mean by "busywait"? A busy-wait would be running a loop with no sleep, checking the time, and seeing if enough time has passed to make it worthwhile to run more code. sleep(1) takes no CPU cycles.
D'oh, I missed the call to sleep in your loop
Must be that amnesia due to having a kid...

Cheers,
Ben
No prob
Anyway, that sleep was for the example code only.

Imagine code that reads the following control file, executing things as they show up in a variety of directories.

I needed to catch the return values to keep a history of failures and raise alarms if they hit a certain threshold in a certain timeframe.

Different dirs have different sleep cycles, so I sleep on the shortest one and compare to see if I've slept enough for the longer ones.
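The shortest-cycle bookkeeping described above can be sketched roughly like this (dir names, intervals, and the tick()/sweep structure are all hypothetical; a real loop would call sleep($shortest) where tick() is invoked here):

```perl
use strict;
use warnings;

# Hedged sketch of sleeping on the shortest cycle and only sweeping
# a dir once its own interval has elapsed.
my %sleep_time = (cust_b => 15, cust_a => 60);
my %elapsed    = map { $_ => 0 } keys %sleep_time;
my ($shortest) = sort { $a <=> $b } values %sleep_time;

my @swept;
sub tick {
    for my $dir (sort keys %sleep_time) {
        $elapsed{$dir} += $shortest;
        next if $elapsed{$dir} < $sleep_time{$dir};
        $elapsed{$dir} = 0;
        push @swept, $dir;          # stand-in for the real sweep
    }
}

tick() for 1 .. 4;                  # one simulated minute of wakeups
```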

Note: Ignore the crypt commands; they are just placeholders.

[GLOBALS]
#
# Set a single section here in order to test
# the setup of a new section while the main
# sweep is still running.
#
section_to_process=Customer_B_crypt_dir
#
#
daemon=0
verbose=1
time_stamp=1
#
# It is up to the called program to handle
# error messages, recovery, etc., since the
# sweep program simply does not have the
# application-specific awareness to give error
# messages of any real value.
#
# But if enough things go wrong in a certain
# period (determined on a dir-group by dir-group basis),
# there is probably a serious problem SOMEWHERE.
#
#
email_on_failure=broom@somewhere.com
#
# Sequential by default
#
max_parallel_jobs_per_dir=1
#
# Kill myself if this file arrives
#
abort_file=/tmp/sweep_abort.flag
#
# All dirs will be based from here in the test
#
base=/home/broom/sweep
log_dir={{base}}/logs
stdout={{log_dir}}/stdout.log
stderr={{log_dir}}/stderr.log
#
# Default amount of time to sleep between
# dir sweeps
#
sleep_time=5
global_email=broom@somewhere.com mgraham@somewhere.com
operations_email=operations_manager@somewhere.com
#
# These are the defaults in the program
#
# Match everything
#match_pattern=.*
# Except things that start with a period
#exclude_pattern=^\.
#

#
# {{file}} is special; it will be substituted with
# the new file.  All others must be defined before use.
#
#
################################################################################
#
#
[Customer_B_crypt_dir]
#
#
################################################################################

success_email=customer_b_csr@somewhere.com {{global_email}}
failure_email={{success_email}} {{operations_email}}
dir_group=Crypt
cust=cust_b
source={{base}}/{{cust}}/q
dest={{base}}/{{cust}}/inprocess
done={{base}}/{{cust}}/done
error={{base}}/{{cust}}/bad
log_file={{base}}/logs/{{cust}}.log
nice=5
user=broom
#
command=sleep 10;rm -f /tmp/test.tmp;set >/tmp/test.tmp;set;\
ps -eo "%n %U %u %p %y %x %c";echo {{file}} --process={{dest}} \
--done={{done}} --error={{error}} --log={{log_file}}
#
max_parallel_jobs_per_dir=3
stdout={{log_dir}}/cust_b.stdout.log
stderr={{log_dir}}/cust_b.stderr.log


################################################################################
#
#
[Customer_A_crypt_dir]
#
#
################################################################################

dir_group=Crypt
exclude_pattern=(^\.|README|\.key\.asc$)
success_email=customer_a_csr@somewhere.com {{global_email}}
failure_email={{success_email}} {{operations_email}}
cust=cust_a
source={{base}}/{{cust}}/q
dest={{base}}/{{cust}}/inprocess
done={{base}}/{{cust}}/done
error={{base}}/{{cust}}/bad
log_file={{base}}/logs/{{cust}}.log
#
command=echo {{file}} --process={{dest}} \
--done={{done}} --error={{error}} --log={{log_file}}
#
sleep_time=60
max_parallel_jobs_per_dir=1
max_load=3
stdout={{log_dir}}/cust_a.stdout.log
stderr={{log_dir}}/cust_a.stderr.log


#
# The following FTP vars should supply everything required for
# a generic set of FTP scripts.  If you find yourself hardcoding
# ANYTHING in the script, back off and put the var in here.
#
################################################################################
#
#
[West_FTP_Server]
#
#
################################################################################
#
# Set the disable flag to turn off processing
# of a particular directory
#
disable=1

dir_group=ftp_internet
success_email=west_ftp_production_person@somewhere.com {{global_email}}
failure_email={{success_email}} {{operations_email}}
ftp_base=west_ftp
source={{base}}/{{ftp_base}}/q
dest={{base}}/{{ftp_base}}/inprocess
done={{base}}/{{ftp_base}}/done
error={{base}}/{{ftp_base}}/bad
log_file={{base}}/logs/{{ftp_base}}.log
nice=5
user=ftp_production_user
#
# Note:  This file MUST be root read-only!
#
ftp_server=ftp3.somewhere.com
ftp_user=user_a
ftp_pass=pass_a
#
command=ftp_via_sweep.sh {{file}}
#
max_parallel_jobs_per_dir=1
stdout={{log_dir}}/{{ftp_base}}.stdout.log
stderr={{log_dir}}/{{ftp_base}}.stderr.log


################################################################################
#
#
[Central_FTP_Server]
#
#
################################################################################
disable=1

dir_group=ftp_internet
success_email=central_ftp_production_person@somewhere.com {{global_email}}
failure_email={{success_email}} {{operations_email}}
ftp_base=central_ftp
source={{base}}/{{ftp_base}}/q
dest={{base}}/{{ftp_base}}/inprocess
done={{base}}/{{ftp_base}}/done
error={{base}}/{{ftp_base}}/bad
log_file={{base}}/logs/{{ftp_base}}.log
nice=5
user=ftp_production_user
ftp_server=dfwfap01.somewhere.com
ftp_user=user_b
ftp_pass=pass_b
#
command=ftp_via_sweep.sh {{file}}
#
max_parallel_jobs_per_dir=1
stdout={{log_dir}}/{{ftp_base}}.stdout.log
stderr={{log_dir}}/{{ftp_base}}.stderr.log


################################################################################
# There are a series of settings that this file controls in addition to listing
# particular directories to sweep.  These are:
#
# dir_group - if the dir_group matches the section name.
#
#
#
[ftp_internet]
#
#
################################################################################
dir_group=ftp_internet

#
# 5 failures in 60 minutes is something
# to start getting worried about.  We
# need to allow for this timeframe because
# a single transfer may go for quite a while before failing.
#
max_failure_count=5
max_failure_minutes=60

################################################################################
#
#
[Crypt]
#
#
################################################################################
dir_group=Crypt
#
# 5 failures in 5 minutes.
# This would mean someone is continuously submitting
# jobs and they fail quickly, which means there is
# probably a core GPG or key problem.
#
#
max_failure_count=5
max_failure_minutes=5
Got it, but define shortest?
If one place wants to sleep 15 seconds and another 20 seconds, done naively you might wind up waiting close to 35 seconds for the 20-second people: you wake every 15 seconds, at the 15-second mark it is not yet time for the 20-second people, and by the next wakeup you are well past due.

I'd be tempted to solve this problem in a really cheesy way. For instance, lose the "sleep/wake" logic entirely: just set up a bunch of cron jobs that send to a specified email address, and have an automated job that monitors that address and notifies you in turn if errors happen too fast.

Your sweep utility becomes simpler. Your notification utility is also straightforward. And the "too many errors" logic now applies to a variety of kinds of programs, not just this one kind of sweep. Besides, it is now easy to tweak this so that one machine monitors sweeps done on several different machines. And a final bonus: you don't have to write most of the code, since the heavy lifting gets pushed to cron and email.
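The cron-based alternative might look something like this crontab fragment (the sweep script name, paths, and schedules are all hypothetical; note that cron's finest granularity is one minute):

```crontab
# Hypothetical crontab replacing the sleep/wake loop entirely.
# Each sweep runs on its own schedule; output and errors go to a
# monitored address, where a separate job counts failures.
MAILTO=sweep-errors@somewhere.com
*/1  * * * *  /home/broom/sweep/sweep_dir.sh cust_a
*/15 * * * *  /home/broom/sweep/sweep_dir.sh west_ftp
```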

Cheers,
Ben
Yick
I originally thought about doing it that way, since my original version handled a single dir. But the per-launch overhead (read a database of historical files, scan the dir, check the current list, write a database of new files) adds up to a huge amount of unnecessary work.

Don't worry about the odd sleep calc when the intervals don't mesh: I think in 15-second increments. Almost all dirs will be 1 minute, but a couple will be 15 seconds.

The code was easy, and the X-errors-in-Y-time logic was a breeze: aging off errors that are too old, and so on.
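The X-errors-in-Y-time check with aging could be sketched like this (the sub name and the way alarms are signaled are hypothetical; the 5-in-5-minutes numbers come from the [Crypt] section above):

```perl
use strict;
use warnings;

# Hedged sketch of "X errors in Y minutes": keep failure timestamps,
# age off entries older than the window, alarm past the threshold.
my $max_failure_count   = 5;
my $max_failure_minutes = 5;

my @failures;    # epoch seconds of recent failures
sub record_failure {
    my $now = shift // time;
    push @failures, $now;
    my $cutoff = $now - $max_failure_minutes * 60;
    @failures = grep { $_ >= $cutoff } @failures;   # age off old ones
    return @failures >= $max_failure_count;         # true => raise alarm
}
```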

Sounds reasonable