I had a user start running jobs on my system for the 1st time today.
His very first job aborted with an error.
There are many steps in the run, each doing the same things for
different files. All the other files / jobs succeeded.
When I reran the failed job, it worked just fine.
The error said it could not look up a customer ID in the
mainframe VSAM file. The only ways that could happen
would be if the project ID did not match anything, or if there
was an underlying failure in the mainframe VSAM SQL
interface.
But previous tests showed the project / customer ID had
been there for months.
I emailed the MF sysprog and asked if there was anything
going on in the error logs that could help.
Nope.
I emailed the MIS COBOL programmers, asking if they were
doing anything with the file that could lock me out.
Nope.
Then one of them responded that they had seen this type
of error before in their programs, but they reran the job, it
worked, and they never followed up.
ARRGG!!
The MF sysprog followed up, telling me he had noticed an FTP log
entry in that time frame. Our accounting department has a
process that overwrites that file every day.
During the upload, it is inaccessible to everyone else. And
if I don't code for the error condition, my program will
abort.
For 27 seconds a day.
It has been this way for years, and none of our current
COBOL programmers have ever coded around it. They'd rather
have their code abort occasionally and rerun it.
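Coding around it doesn't take much: retry the lookup while the file is locked, for longer than the upload window. Below is a minimal Python sketch of that idea, not my actual program. `FileBusyError` is a hypothetical stand-in for whatever "file in use" error your VSAM interface really returns, and the 60-second budget is an assumption chosen to comfortably exceed the 27-second upload.

```python
import time


class FileBusyError(Exception):
    """Hypothetical: raised when the file is locked by the daily FTP overwrite."""


def with_retry(operation, max_wait=60.0, delay=5.0, sleep=time.sleep):
    """Call `operation`, retrying on FileBusyError for up to `max_wait` seconds.

    Any other exception (e.g. a genuinely missing customer ID) propagates
    immediately, so real data errors still abort the job.
    """
    waited = 0.0
    while True:
        try:
            return operation()
        except FileBusyError:
            if waited >= max_wait:
                raise  # lock outlasted the budget; fail as before
            sleep(delay)
            waited += delay
```

Usage would be something like `with_retry(lambda: lookup_customer(project_id))`, where `lookup_customer` is a hypothetical name for the existing lookup call. Retrying only on the "busy" error, rather than on every failure, keeps a truly unmatched project ID from being silently retried for a minute.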
And my user managed to hit this the 1st time he ever used
my program!!!!!
And this same user triggered the Windows rename / lock problem twice,
compared to my seeing it only once in the previous 6 months.