New File locking problem
I have what might be a Samba problem, or it might be a
stupid Windows scripting problem.

On Win2K, I have a simple batch file that executes another batch file.

    SET CX_FONT_DIRECTORY=c:\nofont
    t:
    cd  \afp_server\watch\in_process
    date /t
    time /t
    echo Executing %1
    cmd /c %1
    date /t
    time /t
    move %1 ..\done


One file could be run_16669.2005_21_24_14_21_04.16695.bat:

    REM TIMEOUT=360
    pnettc T:/prod_data/drop_off/wfd/p7934/prg_p7934_03142005_14_19_25.wfd ^
    -c T:/wfd/afp_job_files/afp300.job ^
    -e AFP ^
    -f c:/pnt/output/afp/ccip_p7934_c080_es1_seg_k01e_s0012.afp ^
    -difDataInput1 ^
    T:/prod_data/in/p7934es1/ccip_p7934_c080_es1_seg_k01e_s0012.txt ^
    -l c:/pnt/output/afp/ccip_p7934_c080_es1_seg_k01e_s0012.afp.log


The final move places the batch file in a "done" dir, which signifies
that the job is finished. The master workflow system grabs it, picks up the
various logs, and moves on.

Sometimes though (about 1 job out of 100), the final "move"
fails. This happens:

    The process cannot access the file because it is being used by another process.


The process has completed, but the job appears to be hung since the
file never shows up in the done dir.

T: is a Samba share.

I have logging turned up to an insane level, so I have a
huge amount of detail.

Except I was notified 2 hours after this failure, and the logs
wrapped about 30 seconds after it happened. ARRGG!!

I could rewrite it as a Perl script and add in some
error-check / retry logic. But I'd really like to know
WHY it is happening.
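
The error-check / retry idea mentioned above could look roughly like the following. This is only a minimal sketch, written in Python rather than Perl. The name `move_with_retry`, its parameters, and the injectable `rename` argument are illustrative and not taken from the actual workflow system. It assumes the "being used by another process" failure surfaces as `PermissionError`, which is how Python reports a Windows sharing violation (WinError 32), and that source and destination live on the same share so a plain rename is enough.

```python
import os
import time


def move_with_retry(src, dest_dir, attempts=5, delay=2.0, rename=os.rename):
    """Rename src into dest_dir, retrying while the file is reported busy.

    Returns the number of attempts used. If every attempt fails, the last
    PermissionError is re-raised so the caller can log it instead of the
    job silently appearing hung.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    target = os.path.join(dest_dir, os.path.basename(src))
    for attempt in range(1, attempts + 1):
        try:
            rename(src, target)
            return attempt
        except PermissionError:
            # Assumption: a sharing violation shows up as PermissionError.
            if attempt == attempts:
                raise
            time.sleep(delay)
```

A call such as `move_with_retry(r"T:\afp_server\watch\in_process\job.bat", r"T:\afp_server\watch\done")` (with `job.bat` standing in for the real file name) would take the place of the final `move`. Retrying only works around the symptom; it does not explain who is holding the file.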
New I would assume that...
The process cannot access the file because it is being used by another process. :-P

Seriously, my painful recollections are that many Windows programs take out exclusive locks on anything that they open, and those locks block all other operations. So if anyone else wants to open your file for anything, they can cause huge problems.

My first guess would be to look for users who might be opening a file in some kind of editor (or Excel, or whatever else). My second guess would be to ask what happens if your program is trying to move a file that some automated job (e.g. a virus scan) happens to be looking at.

This information is based on things that used to be a PITA for me quite a few years ago. The default behaviour could well have changed and I have no relevant recent experience.

Cheers,
Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
New Grrr...
The "drive" is a Linux share.

Virus scanning is set to look at local disk only.

No one else is accessing it; at least, that would be incredibly unlikely.
Here are the "permissions":

    -rw-r--r--    1 spooler  prod 335 Mar 24 14:21 run_16669.2005_21_24_14_21_04.16695.bat


The directory itself has more open access to allow the files to be renamed in and out of it by the Windows "user", but only the spooler id can open/lock the file.

Right now only 3 people can even drill down into that area of the system, and they are not the "type".

Hey Greg: I can have fake locks set at a directory level in samba, right? Or am I forced to do it at share level? Hmmm - need to go research.

New I'm guessing, obviously
But my guess is that Samba makes what Windows thinks are locks act the way Windows thinks locks should act. So it doesn't really matter what is actually happening on the Linux side - just how it looks to Windows.

Cheers,
Ben
New Turn off Opportunistic locking for those files.
Or first try veto oplock files = /*.bat/ on the share (or something similar; look at man smb.conf for multiples).

If that dunnah help, try strict locking = no. This may be the cause, as it is very slow when set to yes. If *THAT* doesn't work...

oplocks = no on the share.

Sheesh, if that doesn't work... Turn off kernel oplocks for Samba using kernel oplocks = no, in the global section rather than the share section.

If that doesn't do it, it is a Windows issue.



BTW, I slacked off on 3.0.12 due to some bugs that showed up in optimized code. v3.0.13 is available as of last night.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey

[link|http://it.slashdot.org/comments.pl?sid=134485&cid=11233230|"Microsoft Security" is an even better oxymoron than "Military Intelligence"]
No matter how much Microsoft supporters whine about how Linux and other operating systems have just as many bugs as their operating systems do, the bottom line is that the serious, gut-wrenching problems happen on Windows, not on Linux, not on Mac OS. -- [link|http://www.eweek.com/article2/0,1759,1622086,00.asp|source]
New I was thinking of the same thing
and writing it while you were writing this!!!
New Done
Case doesn't count, right?
Edited by broomberg March 24, 2005, 08:07:37 PM EST
New You sure I don't want fake oplocks instead?
New NEVER.
That option is only for seriously Broken software.

Never. I just can't imagine.
New Not that I am aware of. Of course...
in this case I may be wrong or right.
New And "blocking locks" should be set to off.
[link|http://www.aaxnet.com|AAx]
New Err?
    blocking locks (S)
        This parameter controls the behavior of smbd(8) when given a
        request by a client to obtain a byte range lock on a region of
        an open file, and the request has a time limit associated with
        it.

        If this parameter is set and the lock range requested cannot be
        immediately satisfied, samba will internally queue the lock
        request, and periodically attempt to obtain the lock until the
        timeout period expires.

        If this parameter is set to no, then samba will behave as
        previous versions of Samba would and will fail the lock request
        immediately if the lock range cannot be obtained.

        Default: blocking locks = yes


Why? Seems like it is less likely to trigger a failure if set on, allowing a grace period between lock request and final failure.
New I've seen it produce time-outs.
New I've seen it produce time-outs.
New In that case: Done
New And double posts, too! ;-)
-YendorMike

"They that can give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety."
- Benjamin Franklin, 1759 Historical Review of Pennsylvania
New If the various suggestions don't help
You may want to place a do-nothing command at the end. It could be a timing issue: the last process completes at the start of a cycle and the move tries to execute within the same cycle. Adding a dir or anything would give the processor enough time to complete.

Have seen on windows, and on big iron.
A good friend will come and bail you out of jail ... but, a true friend will be sitting next to you saying, "Damn...that was fun!"
New I hate timing dependent problems!
I had a user start running jobs on my system for the 1st time today.
His very first job aborted with an error.

There are many steps in the run, doing the same things for
different files. All other files / jobs succeeded.

When I reran the failed job, it worked just fine.

This error said it could not lookup a customer id in the
mainframe VSAM file. The only way that could happen
would be the project id did not match anything or there
was an underlying failure in the mainframe VSAM SQL
interface.

But previous tests showed the project / customer ID have
been there for months.

I emailed the MF sysprog and asked if there was anything
going on in the error logs that could help.

Nope.

I emailed the MIS COBOL programmers, asking if they were
doing anything with the file that could lock me out.

Nope.

Then one of them responded, that they have seen this type
of error before in their programs, but they reran and it
worked and they never followed up.

ARRGG!!

The MF sysprog followed up, telling me he noticed an FTP log
entry in that time frame. Our accounting department has a
process that overwrites that file every day.

During the upload, it is inaccessible to everyone else. And
if I don't code for the error condition, my program will
abort.

For 27 seconds a day.

This has been this way for years, and none of our current
COBOL programmers have ever coded around it. They'd rather
have their code abort occasionally and rerun it.

And my user managed to hit this the 1st time he ever used
my program!!!!!

And this user triggered the Windows rename / lock problem twice, as
opposed to my seeing it once in the previous 6 months.
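
One hedged way to code for that error condition is to keep retrying for longer than the known outage window. The following is a minimal Python sketch, assuming the lookup raises an exception that can be recognized as transient. The names `retry_until` and `is_transient`, and the default window and pause values, are hypothetical; the `clock` and `sleep` arguments exist only so the behavior can be exercised without real waiting.

```python
import time


def retry_until(operation, is_transient, window=60.0, pause=5.0,
                clock=time.monotonic, sleep=time.sleep):
    """Call operation() until it succeeds or `window` seconds have elapsed.

    Exceptions for which is_transient(exc) is false are re-raised at once,
    so a genuinely bad project id still fails fast. Transient failures are
    retried every `pause` seconds until the deadline has passed.
    """
    deadline = clock() + window
    while True:
        try:
            return operation()
        except Exception as exc:
            if not is_transient(exc) or clock() >= deadline:
                raise
            sleep(pause)
```

With a window comfortably larger than the 27 seconds the upload takes, a lookup that collides with the FTP overwrite would wait it out instead of aborting the run.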
New First time I saw it was in COBOL class
Learning loops. One student straight coded, no loop. The program kept abending. Not knowing what we were looking at, we dumped the error log and looked at the generated assembler code. Comparing the first pass to the 2nd pass, we found one instruction out of sequence. The student (not me) put in a goto dummy and a goto back, and everything worked.

Since then I've always been aware of timing issues. They pop up at the most inopportune times. As you have just seen. :-(
New Done. And I added another move
    SET CX_FONT_DIRECTORY=c:\nofont
    t:
    cd  \afp_server\watch\in_process
    date /t
    time /t
    echo Executing %1
    cmd /c %1
    date /t
    time /t
    REM  KLUDGE - try a couple of moves for when the move fails.
    REM  The extra should not hurt
    dir
    move %1 ..\done
    dir
    move %1 ..\done

New Lemme know if it worked.
New Happens when you delete a file and it's opened elsewhere
IIRC, you have to use CHKDSK (or whatever the heck that utility is). I've seen the problem when updating web pages from another server: the web server has the file in cache, and deleting it from the other server puts the file in limbo.
     File locking problem - (broomberg) - (22)
         I would assume that... - (ben_tilly) - (3)
             Grrr... - (broomberg) - (2)
                 I'm guessing, obviously - (ben_tilly)
                 I'm guessing, obviously - (ben_tilly)
         Turn off Opportunistic locking for those files. - (folkert) - (11)
             I was thinking of the same thing - (broomberg)
             Done - (broomberg) - (3)
                 You sure I don't want fake oplocks instead? -NT - (broomberg) - (1)
                     NEVER. - (folkert)
                 Not that I am aware of. Of course... - (folkert)
             And "blocking locks" should be set to off. -NT - (Andrew Grygus) - (5)
                 Err? - (broomberg) - (4)
                     I've seen it produce time-outs. -NT - (Andrew Grygus)
                     I've seen it produce time-outs. -NT - (Andrew Grygus) - (2)
                         In that case: Done -NT - (broomberg)
                         And double posts, too! ;-) -NT - (Yendor)
         If the various suggestions don't help - (jbrabeck) - (4)
             I hate timing dependent problems! - (broomberg) - (1)
                 First time I saw it was in COBOL class - (jbrabeck)
             Done. And I added another move - (broomberg) - (1)
                 Lemme know if it worked. -NT - (jbrabeck)
         Happens when you delete a file and it's opened elsewhere - (ChrisR)
