
Welcome to IWETHEY!

New Intermittent problem with Oracle
Well here's a fun one.

We have an application that is humming along on high volume against Oracle 8i. Normal (fairly low) levels of errors (mostly due to robots submitting bad data).

And then 2 web pages start having problems. Not every time you try to save data on them, just 80% of the time or so. If you're lucky enough to get a successful save, the write really does make it to the database. And it doesn't depend on the data: if you hit "Refresh" enough times, it eventually saves. Luckily they are low-volume pages, so not a lot of people see them. Unluckily, we tend to care about the people who do see them: they are within walking distance of the programmers and we try to stay on good terms with them.
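
For a rough sense of what "hit Refresh enough" means, here is a back-of-the-envelope sketch in Python. It assumes each save attempt fails independently with probability 0.8 (the "80% or so" figure); the independence assumption and the function names are illustrative, not something measured on the real system.

```python
def chance_saved_within(tries, failure_rate=0.8):
    """Probability that at least one of `tries` independent attempts succeeds."""
    return 1 - failure_rate ** tries


def expected_tries(failure_rate=0.8):
    """Mean number of attempts until the first success (geometric distribution)."""
    return 1 / (1 - failure_rate)


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(f"{n:2d} tries: {chance_saved_within(n):.1%} chance of a save")
    print(f"expected tries until a save: about {expected_tries():.0f}")
```

Under those assumptions a user needs about five attempts on average, which matches the "eventually it saves" experience without saying anything about the cause.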

No code changed on disk. No DDL happened on the database. The application logs show tons of ORA-06502: PL/SQL: numeric or value error messages. (You know, what you'd expect if you tried to save text to an integer...) No errors in Oracle's alert log, from the operating system, or from Veritas.

3 hours later the problem stopped on its own.

So what to do now? Try to escalate it when we have no diagnostics indicating what happened? Or sit tight because we're planning to upgrade to 10g shortly anyway?

Luckily the decision is out of my hands, but I've never seen anything like it. (Nor do I wish to see anything like it again...)

Cheers,
Ben
I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)
New To me this sounds like a
Socket or networking problem.

Is the DB local to the webserver? (I doubt it)

Assuming Remote DB.

I'd like to believe it was some kind of framing issue betwixt the Webserver and the DB.

Maybe some kind of unusual fragmenting.

Of course... these come from /dev/ass
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
No matter how much Microsoft supporters whine about how Linux and other operating systems have just as many bugs as their operating systems do, the bottom line is that the serious, gut-wrenching problems happen on Windows, not on Linux, not on Mac OS. -- [link|http://www.eweek.com/article2/0,1759,1622086,00.asp|source]
Here is an example: [link|http://www.greymagic.com/security/advisories/gm001-ie/|Executing arbitrary commands without Active Scripting or ActiveX when using Windows]
New Anything is possible, but that seems unlikely
Of course everything is unlikely in this scenario. There are no likely explanations.

Yes, the database is remote from the webservers. Each webserver has multiple Apache children with independent database connections. The problem showed up on all webservers at the same time, across multiple children. It cleared up on all at the same time. And, as I said, it did not affect other tables, many of which have far higher levels of updates.

That makes me suspect that the problem must be internal to the database. Having the purportedly impossible happen once is unlikely. Having it happen 4 times in sync is less likely. Having it happen at a layer that doesn't know enough to decide when to work and when to fail in a way that produces this symptom is, again, unbelievable.

This looks like what I'd expect if, say, you corrupted the internal memory in a validation routine, and then later that corrupted memory got reloaded from disk. (I'm not saying that that's what happened, just that that's something that could wind up looking like this.)

Cheers,
Ben
New Then I'd agree...
as you left out the crucial pieces I needed to rule out the communications issues.
[SNIP stuff I really shouldn't have written about wonderful Oracle and Memory usage]
Yeah, I know... SHUDDUP.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
New Sounds like Oracle have been taking lessons from Microsoft.
You know, writing non-deterministic programs. IMnsHO, Microsoft's greatest sin is teaching the great unwashed that computers are non-deterministic (e.g. a reboot often fixes things).

Wade.

Is it enough to love
Is it enough to breathe
Somebody rip my heart out
And leave me here to bleed
 
Is it enough to die
Somebody save my life
I'd rather be Anything but Ordinary
Please

-- "Anything but Ordinary" by Avril Lavigne.

New Yeah - which is why I dislike the situation so much
New You want non-deterministic?
We install Office XP via group policy here at work.

Over the past few days, local computers have taken to becoming convinced that they are no longer members of the security groups which authorize the installation of Office XP onto the computers. In effect, when people reboot their computers...

Office XP uninstalls. Sometimes.

Then we "gpupdate /force" about a dozen or so times, it reinstalls after a reboot, goes humming along for a few days...

And then does it again.
"Here at Ortillery Command we have at our disposal hundred megawatt laser beams, mach 20 titanium rods and guided thermonuclear bombs. Some people say we think that we're God. We're not God. We just borrowed his 'SMITE' button for our fire control system."
New And people wonder why I won't admin Windows Servers... :-)


New No I don't.
I have zero interest in doing the same.
--
[link|mailto:greg@gregfolkert.net|greg],
[link|http://www.iwethey.org/ed_curry|REMEMBER ED CURRY!] @ iwethey
New I have zero interest in administrating Windows.
However, I have a very high interest level in my paycheck.
New Anything else...
...is the tail wagging the dog.


Peter
[link|http://www.debian.org|Shill For Hire]
[link|http://www.kuro5hin.org|There is no K5 Cabal]
[link|http://guildenstern.dyndns.org|Home]
New Adminning Windows is too much stress for me.


New It's not the computers that stress me...
...and never has been.

It’s the people. It’s the commute. It’s the numbskull HR policies. It’s the mental maze that is the expenses form. It’s the political wrangling over who’s got control of the local DNS/AD/LDAP/NDS/DTSS/NTP/whatever.

The actual computers and the software that runs on them are fairly low down on the list of things to get wound up about.

An OS is an OS is an OS. They’re all shit, in one way or another. Fanboidom is pointless; I used to say that I’d work on anything with a power switch.



Peter
New When Linux breaks, there's a reason.
And usually a nice sane reason. And if it's an insane reason, it's usually because someone you probably work with has made an insane change.

IME, Windows likes to sometimes break "just because it feels like it". This is stressful. More so when you have someone relying on you to have it working - which it was an hour earlier - and they are wondering what kind of meds you're on because it's broken, you keep telling them you don't know why, and they're not sure they believe you when you say you didn't touch it.

</rant>

OTOH, I *have* had major major stress from administering a security policy created by auditors who decided to Do Things Right, secure in the knowledge that It Would Not Be Them Doing It. Right down to specifying equipment they couldn't believe was reliable only because we wouldn't follow some of their more extreme rules. I believe Those In Charge wisely and quietly decided, unfortunately after this was irrevocably in place, that auditors were never to be allowed to do that again.

Wade.


New anything in the system logs?
If you are in a cluster environment, check the cluster logs for around that timestamp. If it happens again, try using top (a program that shows which processes are using the most system resources) to see which piece of the application is choking.
regards,
daemon
that way too many Iraqis conceived of free society as little more than a mosh pit with grenades. ANDISHEH NOURAEE
clearwater highschool marching band [link|http://www.chstornadoband.org/|http://www.chstornadoband.org/]
New We checked those logs, and we intend to do more
if it happens again.

For one thing, this highlighted a gap in our monitoring, which we intend to fix so that we can be on it in a more timely way next time. Then we have better odds of being able to observe it in real time.
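
A minimal sketch of the kind of check that could catch this sooner, in Python. It assumes a hypothetical application log format (timestamp, page, then the ORA- code); the format, the function names, and the threshold are illustrative assumptions, not the application's real setup.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical log line (an assumed format, not the real application's):
#   2004-08-12 14:03:27 /some/page ORA-06502: PL/SQL: numeric or value error
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<page>\S+) (?P<code>ORA-\d{5})\b"
)


def errors_per_minute(lines):
    """Count ORA- errors per (minute, page, code); ignore lines that don't match."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        minute = ts.replace(second=0)  # bucket by minute
        counts[(minute, m.group("page"), m.group("code"))] += 1
    return counts


def find_spikes(lines, threshold=5):
    """Return sorted (minute, page, code, count) tuples at or above threshold."""
    return sorted(
        (minute, page, code, n)
        for (minute, page, code), n in errors_per_minute(lines).items()
        if n >= threshold
    )
```

Run against a tail of the application log every minute or so, something like this would flag a page whose error count jumps while the rest of the site stays quiet. The threshold would need tuning against the normal background level of errors.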

Cheers,
Ben
New that's always the hardest part; it takes a sneaky outage
to realize where the monitoring has gaps.
regards,
daemon
     Intermittent problem with Oracle - (ben_tilly) - (16)
         To me this sounds like a - (folkert) - (2)
             Anything is possible, but that seems unlikely - (ben_tilly) - (1)
                 Then I'd agree... - (folkert)
         Sounds like Oracle have been taking lessons from Microsoft. - (static) - (9)
             Yeah - which is why I dislike the situation so much -NT - (ben_tilly)
             You want non-deterministic? - (inthane-chan) - (7)
                 And people wonder why I won't admin Windows Servers... :-) -NT - (static) - (6)
                     No I don't. - (folkert) - (5)
                         I have zero interest in administrating Windows. - (inthane-chan) - (4)
                             Anything else... - (pwhysall)
                             Adminning Windows is too much stress for me. -NT - (static) - (2)
                                 It's not the computers that stress me... - (pwhysall) - (1)
                                     When Linux breaks, there's a reason. - (static)
         anything in the system logs? - (daemon) - (2)
             We checked those logs, and we intend to do more - (ben_tilly) - (1)
                 that's always the hardest part; it takes a sneaky outage - (daemon)
