Post #200,387
3/24/05 6:39:39 PM
|
File locking problem
I have what might be a Samba problem, or it might be a stupid Windows scripting problem. On Win2K, I have a simple batch file that executes another batch file.

    SET CX_FONT_DIRECTORY=c:\nofont
    t:
    cd \afp_server\watch\in_process
    date /t
    time /t
    echo Executing %1
    cmd /c %1
    date /t
    time /t
    move %1 ..\done

One file could be run_16669.2005_21_24_14_21_04.16695.bat:

    REM TIMEOUT=360
    pnettc T:/prod_data/drop_off/wfd/p7934/prg_p7934_03142005_14_19_25.wfd ^
    -c T:/wfd/afp_job_files/afp300.job ^
    -e AFP ^
    -f c:/pnt/output/afp/ccip_p7934_c080_es1_seg_k01e_s0012.afp ^
    -difDataInput1 ^
    T:/prod_data/in/p7934es1/ccip_p7934_c080_es1_seg_k01e_s0012.txt ^
    -l c:/pnt/output/afp/ccip_p7934_c080_es1_seg_k01e_s0012.afp.log

The final move places the batch file in a "done" dir, which signifies it is done. The master workflow system grabs it, picks up the various logs, and moves on. Sometimes though, i.e. about 1 job in 100, the final "move" fails. This happens:

    The process cannot access the file because it is being used by another process.

The process had completed, but the job appears to be hung since the file never shows up in the done dir. T: is a Samba share. I have logging turned up to an insane level, so I have a huge amount of detail. Except I was notified 2 hours after this failure, and the logs wrapped about 30 seconds after it happened. ARRGG!! I could rewrite it as a Perl script and add in some error-check / retry logic. But I'd really like to know WHY it was happening.
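The error-check / retry idea mentioned above could look roughly like the sketch below. It is written in Python rather than Perl, and the function name, attempt count and delay are assumptions chosen for illustration, not anything from the actual workflow. It uses os.rename rather than shutil.move because shutil.move falls back to copy-and-delete when the rename fails, which could leave a copy in both directories.

```python
import os
import time


def move_with_retry(src, dest_dir, attempts=5, delay=2.0):
    """Rename src into dest_dir, retrying while the file is reported in use.

    Hypothetical helper: returns the destination path on success and
    re-raises the last error if every attempt fails. Source and
    destination are assumed to be on the same share/volume.
    """
    dest = os.path.join(dest_dir, os.path.basename(src))
    for attempt in range(1, attempts + 1):
        try:
            os.rename(src, dest)
            return dest
        except PermissionError:
            # On Windows, "The process cannot access the file because it
            # is being used by another process" is raised as PermissionError.
            if attempt == attempts:
                raise
            time.sleep(delay)
```

A retry only hides the symptom. It does not answer why the file is still held open, so logging each failed attempt with a timestamp would help line the failures up against the Samba logs.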
|
Post #200,389
3/24/05 7:15:13 PM
|
I would assume that...
The process cannot access the file because it is being used by another process. :-P
Seriously, my painful recollections are that many Windows programs take out exclusive locks on anything that they open, and those locks block all other operations. So if anyone else wants to open your file for anything, they can cause huge problems.
My first guess would be to look for users who might be opening a file in some kind of editor (or Excel, or whatever else). My second guess would be to ask what happens if your program is trying to move a file that some automated job (eg a virus scan) happens to be looking at.
This information is based on things that used to be a PITA for me quite a few years ago. The default behaviour could well have changed and I have no relevant recent experience.
Cheers, Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
|
Post #200,395
3/24/05 7:58:29 PM
|
Grrr...
The "drive" is a Linux share. Virus scanning is set to look at local disk only. No one else is accessing it, or at least that would be incredibly unlikely. Here are the "permissions":

    -rw-r--r-- 1 spooler prod 335 Mar 24 14:21 run_16669.2005_21_24_14_21_04.16695.bat

The directory itself has more open access to allow the files to be renamed in and out of it by the Windows "user", but only the spooler id can open/lock it. Right now only 3 people can even drill down into that area of the system, and they are not the "type". Hey Greg: I can have fake locks set at a directory level in Samba, right? Or am I forced to do it at share level? Hmmm - need to go research.
|
Post #200,446
3/25/05 1:22:46 AM
|
I'm guessing, obviously
But my guess is that Samba makes what Windows thinks are locks act like what Windows thinks locks should act like. So it doesn't really matter what is actually happening on the Linux side - just how it looks to Windows.
Cheers, Ben
|
Post #200,394
3/24/05 7:55:10 PM
|
Turn off Opportunistic locking for those files.
Or first try veto oplock files = /*.bat/ on the share (or something similar, look at man smb.conf for multiples)
If that dunnah help, try strict locking = no. This may be the cause, as it is very slow when it is yes. If *THAT* doesn't work...
oplocks = no on the share.
Sheesh, if that doesn't work... Turn off kernel oplocks for Samba using kernel oplocks = no in the global section rather than the share section.
If that doesn't do it, it is a Windows issue.
BTW, I slacked off on 3.0.12 due to some bugs that showed up in optimized code. v3.0.13 is available as of last night.
-- [link|mailto:greg@gregfolkert.net|greg], [link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey[link|http://it.slashdot.org/comments.pl?sid=134485&cid=11233230|"Microsoft Security" is an even better oxymoron than "Military Intelligence"] No matter how much Microsoft supporters whine about how Linux and other operating systems have just as many bugs as their operating systems do, the bottom line is that the serious, gut-wrenching problems happen on Windows, not on Linux, not on Mac OS. -- [link|http://www.eweek.com/article2/0,1759,1622086,00.asp|source]
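Pulled together, the suggestions in this post might look like the smb.conf fragment below. This is only a sketch: the share name and path are hypothetical placeholders, and the later steps are left commented out so they can be enabled one at a time, in the order suggested. The veto pattern uses wildcard form with / as the separator, so several patterns can be listed in one setting.

```ini
[global]
    # Last resort on the Samba side. This is a global parameter,
    # so it does not belong in the share section.
;   kernel oplocks = no

# Hypothetical share name and path, for illustration only.
[afp_share]
    path = /path/to/share

    # Step 1: do not grant oplocks on the job batch files.
    veto oplock files = /*.bat/

    # Step 2: only if step 1 does not help.
;   strict locking = no

    # Step 3: only if step 2 does not help.
;   oplocks = no
```

Running testparm after each change is a cheap way to confirm that the file still parses before the setting goes live.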
|
Post #200,396
3/24/05 8:00:05 PM
|
I was thinking of the same thing
and writing it while you were writing this!!!
|
Post #200,398
3/24/05 8:07:22 PM
3/24/05 8:07:37 PM
|
Done
Case doesn't count, right?
Edited by broomberg
March 24, 2005, 08:07:37 PM EST
|
Post #200,399
3/24/05 8:08:57 PM
|
You sure I don't want fake oplocks instead?
|
Post #200,464
3/25/05 11:31:09 AM
|
NEVER.
That option is only for seriously Broken software.
Never. I just can't imagine.
-- greg
|
Post #200,465
3/25/05 11:33:02 AM
|
Not that I am aware of. Of course...
in this case I may be wrong or right.
-- greg
|
Post #200,402
3/24/05 8:42:16 PM
|
And "blocking locks" should be set to off.
[link|http://www.aaxnet.com|AAx]
|
Post #200,409
3/24/05 9:06:07 PM
|
Err?
    blocking locks (S)
        This parameter controls the behavior of smbd(8) when given a request by a client to obtain a byte range lock on a region of an open file, and the request has a time limit associated with it.

        If this parameter is set and the lock range requested cannot be immediately satisfied, samba will internally queue the lock request, and periodically attempt to obtain the lock until the timeout period expires.

        If this parameter is set to no, then samba will behave as previous versions of Samba would and will fail the lock request immediately if the lock range cannot be obtained.

        Default: blocking locks = yes

Why? Seems like it is less likely to trigger a failure if set on, allowing a grace period between lock request and final failure.
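The difference between the two settings can be illustrated with an ordinary mutex. This is an analogy only, not how smbd implements byte-range locks: the function names are made up, and Samba's queue-and-retry is stood in for by a timed wait on a Python threading.Lock.

```python
import threading


def try_lock(lock, blocking_locks, timeout):
    """Mimic the two documented behaviours with an ordinary mutex.

    blocking_locks=True : keep waiting for the lock until `timeout` expires.
    blocking_locks=False: give up immediately if the lock is already held.
    """
    if blocking_locks:
        return lock.acquire(timeout=timeout)
    return lock.acquire(blocking=False)


def demo(hold_seconds=0.2, timeout=2.0):
    """Return {setting: lock_obtained} for both settings."""
    results = {}
    for setting in (False, True):
        lock = threading.Lock()
        lock.acquire()  # another "client" already holds the lock
        # The holder lets go after a short delay, well inside the timeout.
        timer = threading.Timer(hold_seconds, lock.release)
        timer.start()
        got = try_lock(lock, blocking_locks=setting, timeout=timeout)
        results[setting] = got
        timer.join()
        if got:
            lock.release()
    return results
```

With a holder that releases quickly, the waiting variant gets the lock and the immediate variant does not, which matches the grace-period reasoning above. If the holder never lets go, the waiting variant simply turns an immediate failure into a delayed one.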
|
Post #200,411
3/24/05 9:08:21 PM
|
I've seen it produce time-outs.
[link|http://www.aaxnet.com|AAx]
|
Post #200,412
3/24/05 9:08:33 PM
|
I've seen it produce time-outs.
[link|http://www.aaxnet.com|AAx]
|
Post #200,414
3/24/05 9:12:33 PM
|
In that case: Done
|
Post #200,433
3/24/05 10:44:22 PM
|
And double posts, too! ;-)
-YendorMike
"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." - Benjamin Franklin, 1759 Historical Review of Pennsylvania
|
Post #200,401
3/24/05 8:23:29 PM
|
If the various suggestions don't help
You may want to place a do-nothing command at the end. It could be a timing issue: the last process completes at the start of a cycle and the move tries to execute within the same cycle. Adding a dir or anything would give the processor enough time to complete.
I have seen this on Windows and on big iron.
A good friend will come and bail you out of jail ... but, a true friend will be sitting next to you saying, "Damn...that was fun!"
|
Post #200,407
3/24/05 9:02:48 PM
|
I hate timing dependent problems!
I had a user start running jobs on my system for the 1st time today. His very first job aborted with an error.
There are many steps in the run, doing the same things for different files. All other files / jobs succeeded.
When I reran the failed job, it worked just fine.
This error said it could not look up a customer id in the mainframe VSAM file. The only way that could happen would be if the project id did not match anything or there was an underlying failure in the mainframe VSAM SQL interface.
But previous tests showed the project / customer IDs have been there for months.
I emailed the MF sysprog and asked if there was anything going on in the error logs that could help.
Nope.
I emailed the MIS COBOL programmers, asking if they were doing anything with the file that could lock me out.
Nope.
Then one of them responded, that they have seen this type of error before in their programs, but they reran and it worked and they never followed up.
ARRGG!!
The MF sysprog followed up, telling me he noticed an FTP log entry in that time frame. Our accounting department has a process that overwrites that file every day.
During the upload, it is inaccessible to everyone else. And if I don't code for the error condition, my program will abort.
For 27 seconds a day.
This has been this way for years, and none of our current COBOL programmers have ever coded around it. They'd rather have their code abort occasionally and rerun it.
And my user managed to hit this the 1st time he ever used my program!!!!!
And this user triggered the Windows rename / lock problem twice, as opposed to me seeing it once in the previous 6 months.
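Coding for that 27-second window mostly means retrying for longer than the upload can last. A minimal sketch, assuming the lookup is a plain callable and that the failure can be told apart from a real "not found". TransientLookupError and call_with_deadline are hypothetical names, not part of any mainframe interface.

```python
import time


class TransientLookupError(Exception):
    """Hypothetical error raised while the file is locked by the upload."""


def call_with_deadline(func, deadline_seconds=60.0, pause=5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Call func() until it succeeds or deadline_seconds have elapsed.

    Only TransientLookupError is retried; any other exception
    propagates at once so that genuine errors are not masked.
    """
    give_up_at = clock() + deadline_seconds
    while True:
        try:
            return func()
        except TransientLookupError:
            # Stop if another pause would take us past the deadline.
            if clock() + pause > give_up_at:
                raise
            sleep(pause)
```

The deadline is set to about twice the known outage so that one slow upload does not still abort the run. The sleep and clock parameters are there only so the loop can be exercised without real waiting.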
|
Post #200,421
3/24/05 10:08:29 PM
|
First time I saw it was in COBOL class
Learning loops. One student straight coded, no loop. The program kept abending. Not knowing what we were looking at, we dumped the error log and looked at the generated assembler code. Comparing the first pass to the 2nd pass, we found one instruction out of sequence. The student (not me) put in a goto dummy and a goto back, and everything worked. Since then I've always been aware of timing issues. They pop up at the most inopportune times. As you have just seen. :-(
|
Post #200,413
3/24/05 9:10:31 PM
|
Done. And I added another move
    SET CX_FONT_DIRECTORY=c:\nofont
    t:
    cd \afp_server\watch\in_process
    date /t
    time /t
    echo Executing %1
    cmd /c %1
    date /t
    time /t
    REM KLUDGE - try a couple of moves for when the move fails.
    REM The extra should not hurt
    dir
    move %1 ..\done
    dir
    move %1 ..\done
|
Post #200,420
3/24/05 10:04:37 PM
|
Lemme know if it worked.
|
Post #200,497
3/25/05 2:56:48 PM
|
Happens when you delete a file and it's opened elsewhere
IIRC, you have to use CHKDSK (or whatever the heck that utility is). I've seen the problem when updating web pages from another server: the web server has the file in cache, and deleting it from the other server will put the file in limbo.
|