Post #65,745
11/29/02 10:35:45 PM
|
AARG
What are the symptoms? APM can cause lots of headaches.
Some older laptops have weirdo Cardbus controllers.
-drl
|
Post #65,750
11/29/02 11:19:44 PM
|
Not home @ the moment, but
In no particular order...
- Format fails
- During the RPM transaction for MDK9&RH8 kernel installation, install locks hard.
- During Knoppix knx-hdinstall, locks hard
- Debian base installs, but during successive apt-get installs, the 'available' database would corrupt, usually with screwed up spacing or line endings
- X Windows, when I get that far, only starts with knoppix-cd; others fail with 'font fixed not found'; trying to use mkfontdir results in segmentation faults
There are other problems, but they escape me at the moment...
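For the 'available' corruption in particular, a quick scan can point at the damaged lines before hand-editing. The following is a minimal sketch, assuming the file follows the usual stanza layout ("Field: value" lines, continuation lines that start with a space or tab, blank lines between packages). The function name `find_suspect_lines` and the field-name pattern are made up for illustration; this is a heuristic, not dpkg's own parser.

```python
import re

# Assumption: a field line looks like "Name: value" (or "Name:" alone).
FIELD_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9-]*:( |$)")


def find_suspect_lines(text):
    """Return (line_number, reason) pairs for lines that break the stanza layout."""
    problems = []
    for number, line in enumerate(text.split("\n"), start=1):
        if "\r" in line:
            # Stray carriage returns are one form of "screwed up line endings".
            problems.append((number, "carriage return in line"))
        elif line == "":
            continue  # a blank line separates package stanzas
        elif line[0] in " \t":
            continue  # continuation of the previous field
        elif not FIELD_RE.match(line):
            problems.append((number, "not a 'Field: value' line"))
    return problems
```

To try it on a real system, read the file with `open("/var/lib/dpkg/available", encoding="utf-8", errors="replace", newline="")` so that carriage returns are preserved, and pass the contents to `find_suspect_lines`.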
Imric's Tips for Living- Paranoia Is a Survival Trait
- Pessimists are never disappointed - but sometimes, if they are very lucky, they can be pleasantly surprised...
- Even though everyone is out to get you, it doesn't matter unless you let them win.
|
Post #65,766
11/30/02 1:55:39 PM
|
ROFLMAO - Now Knoppix won't boot! This thing IS haunted!
Imric's Tips for Living- Paranoia Is a Survival Trait
- Pessimists are never disappointed - but sometimes, if they are very lucky, they can be pleasantly surprised...
- Even though everyone is out to get you, it doesn't matter unless you let them win.
|
Post #65,822
12/1/02 1:35:47 AM
|
Exorcism in progress
Posting this from Debian/Konqueror right now... only two manual edits of 'available', too!
What a PITA this is!
Imric's Tips for Living- Paranoia Is a Survival Trait
- Pessimists are never disappointed - but sometimes, if they are very lucky, they can be pleasantly surprised...
- Even though everyone is out to get you, it doesn't matter unless you let them win.
|
Post #65,823
12/1/02 1:46:05 AM
|
Pita??? Thought it was pea soup ;-)
You were born...and so you're free...so Happy Birthday! Laurie Anderson
[link|mailto:bepatient@aol.com|BePatient]
|
Post #65,851
12/1/02 9:56:31 AM
|
Pea soup for dipping PITAs...
Seriously, I have NEVER had this much trouble installing an OS. It's still shaky, too. KDE crashed last night, for example... But - I WIN!
Imric's Tips for Living- Paranoia Is a Survival Trait
- Pessimists are never disappointed - but sometimes, if they are very lucky, they can be pleasantly surprised...
- Even though everyone is out to get you, it doesn't matter unless you let them win.
|
Post #65,887
12/1/02 1:59:19 PM
|
Grrrr. available hosed again.
Imric's Tips for Living- Paranoia Is a Survival Trait
- Pessimists are never disappointed - but sometimes, if they are very lucky, they can be pleasantly surprised...
- Even though everyone is out to get you, it doesn't matter unless you let them win.
|
Post #65,921
12/1/02 7:57:52 PM
|
It's looking like a hardware problem, Imric.
"Ah. One of the difficult questions."
|
Post #67,654
12/9/02 6:47:01 PM
|
Yup. Memory. Posting from Debian now.
Unfortunately I'm running on 64mb until I can afford more. :( I hate being poor.
Imric's Tips for Living- Paranoia Is a Survival Trait
- Pessimists are never disappointed - but sometimes, if they are very lucky, they can be pleasantly surprised...
- Even though everyone is out to get you, it doesn't matter unless you let them win.
|
Post #67,656
12/9/02 6:51:38 PM
|
Skip... Drop me a message to my e-mail...
We'll talk.
I got lots o hardware. Might cost you shipping if all goes as planned.
[link|mailto:curley95@attbi.com|greg] - Grand-Master Artist in IT [link|http://www.iwethey.org/ed_curry/|REMEMBER ED CURRY!!!]
Your friendly Geheime Staatspolizei reminds: [link|http://www.wired.com/news/wireless/0,1382,56742,00.html|Wi-Fi enabled device use] comes with an all inclusive free trip to the (county)Photographer! Overbooking, is a problem, please be prepared for "room-ies".
Why You ask? Here is the answer to your query: SELECT * FROM politicians WHERE iq > 40 OR \\ WHERE ego < 1048575; 0 rows found
|
Post #66,031
12/2/02 10:13:18 AM
|
Re: Not home @ the moment, but
Hard locks without warning usually are interrupt conflicts. Use PCI setup to reserve interrupts for ISA devices, or force all PCI interrupt steering to IRQ 11 or some such.
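One way to see which devices currently share an interrupt line is to look at /proc/interrupts on a running Linux system. The sketch below assumes the common layout in which devices sharing an IRQ appear comma-separated at the end of a row; the exact columns vary between kernels, the function name `shared_irqs` is hypothetical, and the sample text is invented for illustration rather than taken from any real machine. Sharing is normal on PCI and does not by itself prove a conflict.

```python
def shared_irqs(proc_interrupts_text):
    """Map IRQ number -> list of device names for rows listing several devices."""
    shared = {}
    for line in proc_interrupts_text.splitlines():
        label, sep, rest = line.partition(":")
        if not sep or not label.strip().isdigit():
            continue  # header row, or non-numeric rows such as NMI/ERR
        if "," not in rest:
            continue  # only one device on this line
        segments = [s.strip() for s in rest.split(",")]
        head = segments[0].split()
        if not head:
            continue
        # The first segment still carries the counters and controller name;
        # assume the device name is its last whitespace-separated token.
        shared[int(label)] = [head[-1]] + segments[1:]
    return shared


# Invented sample input, for illustration only.
SAMPLE = """\
           CPU0
  0:     123456          XT-PIC  timer
  1:       2345          XT-PIC  keyboard
 11:      67890          XT-PIC  eth0, usb-uhci
NMI:          0
"""

print(shared_irqs(SAMPLE))
```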
Linux installs are not perfect. My old laptop would not seat Linux after getting upgraded to 256M RAM. Windows 2000 did install and runs fine.
Hardware probing was locking my brand new Thinkpad until I futzed with the PCI interrupt settings.
If you want to ship me the machine, I GUARANTEE you I will get Linux running on it :)
-drl
|
Post #66,183
12/2/02 3:39:24 PM
|
What?
If you want to ship me the machine, I GUARANTEE you I will get Linux running on it :)
Where's the fun in that (yeah, I have a twisted idea of fun...)? Thanks, though! I've played with the PCI settings - and that WAS part of the problem, initially - playing with them is what got Debian to boot at all. Right now, I think it is marginal HW that's screwing me up.
This box is built from parts of old dead PCs - it's my 'Frankenbox Monster' - heck, the HDDs were scavenged from a dumpster here in my apt. complex! The memory was mostly bought (128m), but the last 64mb came from dead machines here in my apt. The video (the Trident Cyberblade / 32m) was also purchased, because I could pick it up for $30 at a PC trade show that a buddy took me to. The 3c509 card is many years old - it used to run in 'Max', my old server - but it gave me problems there until I replaced it with a generic PCI card a year ago.
Replacing an IDE cable has bought me a little more reliability, it seems - now I can 'reliably' do a Debian net install of the base system, at least. Next on the 'hit parade' is trying new memory (borrowing some for the experiment).
Imric's Tips for Living- Paranoia Is a Survival Trait
- Pessimists are never disappointed - but sometimes, if they are very lucky, they can be pleasantly surprised...
- Even though everyone is out to get you, it doesn't matter unless you let them win.
|
Post #66,130
12/2/02 1:20:46 PM
|
Looks to be bad hardware
Well, you know, if my machine did that, I'd immediately think bad hardware. The primary candidates would be the RAM and the hard-drive host adapter, in that order. If there's a possibility of heat build-up, that would also be worth checking. (Check for overall heat build-up, jammed or blocked fans, slipped heat-sinks.) Also, on a theory that you might be trying to push some hardware component past spec, go into your BIOS Setup program, and revert its state to factory defaults (because those are the most conservative and least aggressive).
My first step actually would be to burn an LNX-BBC disk, or in any other way acquire bootable media capable of running memtest86, which I would then run overnight. RAM that passes an overnight run of memtest86 can be pretty much ruled out as a suspect.
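To illustrate the basic idea behind a pattern test (write known values, read them back, report any differences), here is a toy user-space sketch. It is emphatically not a substitute for memtest86: the operating system chooses which physical pages back the buffer, CPU caches sit in between, and only a tiny region is touched. The function name `pattern_test` and the pattern choice are assumptions made for the example.

```python
def pattern_test(size_bytes=1 << 20, patterns=(0x00, 0xFF, 0x55, 0xAA)):
    """Fill a buffer with each byte pattern and return (pattern, offset) mismatches."""
    buf = bytearray(size_bytes)
    mismatches = []
    for pattern in patterns:
        # Write the pattern across the whole buffer.
        buf[:] = bytes([pattern]) * size_bytes
        # Fast check first: every byte should equal the pattern.
        if buf.count(pattern) != size_bytes:
            # Slow path: locate the offending offsets.
            mismatches.extend(
                (pattern, offset)
                for offset, value in enumerate(buf)
                if value != pattern
            )
    return mismatches


if __name__ == "__main__":
    bad = pattern_test()
    print("no mismatches seen" if not bad else f"{len(bad)} mismatches, first: {bad[0]}")
```

An empty result here proves very little; a non-empty one is a strong hint to go run the real thing from bootable media.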
Sometimes, people I tell this to will resist, saying "But I've installed operating system Foo a dozen times, with no problem." However, either the hardware glitch has arisen since your last installation, or Foo is oblivious to most hardware faults, or both. For example, all Microsoft OSes' installers of my acquaintance will blithely ignore severe memory errors and cheerfully create configurations that will then slowly and silently autocorrupt on account of data passing through the bad RAM patch thereafter. NetWare and OS/2 are significantly better. FreeBSD and Linux show signs of distress that are immediate and unmistakeable, if you're familiar with them.
For non-RAM hardware errors (such as screwy ATA chipsets), FreeBSD and Linux are sometimes a bit more opaque in the clarity of their complaints, but they'll definitely let you know something is wrong. And that's precisely what your situation sounds like.
Rick Moen rick@linuxmafia.com
If you lived here, you'd be $HOME already.
|
Post #66,153
12/2/02 2:21:01 PM
|
On the face of it, disagree because this is a laptop
..and you know they have elaborate POSTs, special BIOSen, power management issues (I've heard that using lm-sensors on a ThinkPad can just wipe it out, fragging the EEPROM).
As for memtest86, don't know how that is put together but it fails on my Toshiba Tecra 8000 with 256M (not 128M), while Windows 2000 runs fine there. I'll guess that Windows is more conservative about timing issues.
-drl
|
Post #66,187
12/2/02 3:55:49 PM
|
Yeah, my 390x was a pain at first, too!
I had trouble with the 'legacy modem' setting with it, though...
Imric's Tips for Living- Paranoia Is a Survival Trait
- Pessimists are never disappointed - but sometimes, if they are very lucky, they can be pleasantly surprised...
- Even though everyone is out to get you, it doesn't matter unless you let them win.
|
Post #66,219
12/2/02 6:22:45 PM
|
Be very careful about bad RAM, please
deSitter wrote:
On the face of it, disagree because this is a laptop..and you know they have elaborate POSTs, special BIOSen, power management issues (I've heard that using lm-sensors on a ThinkPad can just wipe it out, fragging the EEPROM).
I honestly hadn't noticed that he said this was a laptop. In the case of laptops, you can indeed have special problems: I tend to almost unconsciously disable power management on them, and avoid using funny stuff that can complicate diagnosis (lm-sensors, apmd, tweaked non-conservative ATA driver settings, etc.). Anyone who's playing with such things should (obviously) switch them off as a necessary initial diagnostic step.
I'm not sure what your point is about laptops' Power-On Self-Tests. They should not be relevant subsequent to boot-up. I urge placing scant-to-no reliance on any assurance they might purport to give about one's hardware being healthy. In my experience, their alleged hardware-OK results are nearly meaningless, particularly as to RAM, but also in other areas.
As for memtest86, don't know how that is put together but it fails on my Toshiba Tecra 8000 with 256M (not 128M), while Windows 2000 runs fine there. I'll guess that Windows is more conservative about timing issues.
If my RAM failed the current memtest86 version, and wasn't some extremely new type of RAM on some very new motherboard chipset, I would be strongly inclined to believe it defective. I have seen various NT versions pose no objections to RAM that turned out later to be provably defective. I would be especially cautious on this matter given the dropping of parity support from practically all x86 machines, following the example of Intel's Triton series motherboard chipsets, some years back.
Again, the reason to be cautious is that undetected bad RAM can and does lead to gradual, progressive, and silent corruption of data that happens to pass through it.
On the other hand, if some assurance about hardware quality from MS-Windows 2000 strikes you as good enough to bank on, then I hope your trust is justified, and wish you the best of luck.
Rick Moen rick@linuxmafia.com
If you lived here, you'd be $HOME already.
|
Post #66,235
12/2/02 6:58:59 PM
|
Laptop?
Ross, having reviewed the gentleman's post, it most definitely does not sound like a laptop: I note mention of a "3c509 card", later replaced "with a generic PCI card".
If that's a laptop, I want to see it, soonest.
Rick Moen rick@linuxmafia.com
If you lived here, you'd be $HOME already.
|
Post #66,186
12/2/02 3:54:05 PM
|
I concur.
No heat build up, though. I have started over from scratch BIOS-wise, too. *smile*
either the hardware glitch has arisen since your last installation, or Foo is oblivious to most hardware faults, or both.
This whole thing started when the HDD where Debian used to live fried itself badly enough to keep the rest of the system from booting.
and OS/2 are significantly better
I've been thinking of dusting off my old Warp CDs and installing just for giggles, anyway. I get nostalgic for the WPS, sometimes (DAMN IBM!)
For non-RAM hardware errors (such as screwy ATA chipsets), FreeBSD and Linux are sometimes a bit more opaque in the clarity of their complaints, but they'll definitely let you know something is wrong. And that's precisely what your situation sounds like.
Yah. Well - it's a learning experience! At least this machine isn't vital to anything - it's the box I've got in my living room for surfing and games...
Imric's Tips for Living- Paranoia Is a Survival Trait
- Pessimists are never disappointed - but sometimes, if they are very lucky, they can be pleasantly surprised...
- Even though everyone is out to get you, it doesn't matter unless you let them win.
|
Post #66,233
12/2/02 6:48:02 PM
12/2/02 10:17:19 PM
|
Hardware diagnosis
imric wrote:
This whole thing started when the HDD where Debian used to live fried itself badly enough to keep the rest of the system from booting.
Depending on what you mean by "fried", this might point to the underlying cause.
Back a number of years ago, we were having a several-day heat wave in San Francisco, reaching about 40 Celsius (maybe 103 F). That day, I was in my apartment, stripped to my waist, and made the judgement error of trying to reload my 486 tower with OS/2 3.0 from CD media, with the case open (and thus no proper forced-air flow). Furthermore, I'd made the error of mounting one of the system's two fast SCSI drives with only about 5 mm of air space between the drive's electronics and a blocking steel bracket. As a result, the drive's electronics partially cooked, and some data writes thereafter were not reliably correct. Everything seemed OK at the software level: CRC values got calculated and confirmed, but in some cases the CRCs were generated and stored (accurately) on the basis of already incorrect datastreams. I didn't figure out what had happened for a few days, and could confirm it only indirectly after swapping out the drive.
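The CRC point is easy to demonstrate in a few lines. This is a small sketch, assuming the corruption happens before the checksum is computed; `store` and `verify` are made-up helper names standing in for whatever the drive or filesystem actually does, and `zlib.crc32` stands in for the real CRC.

```python
import zlib


def store(block):
    """Pretend to write a block: keep the data together with its CRC."""
    return block, zlib.crc32(block)


def verify(block, crc):
    """Pretend to read a block back: recompute the CRC and compare."""
    return zlib.crc32(block) == crc


original = b"example data block"

# Simulate damage that occurs *before* the CRC is calculated: flip one bit.
damaged = bytearray(original)
damaged[0] ^= 0x01

stored_block, stored_crc = store(bytes(damaged))

print(verify(stored_block, stored_crc))  # the CRC check passes
print(stored_block == original)          # yet the data is not what was intended
```

The checksum faithfully protects whatever it was handed, so it can only catch damage that happens after it was computed.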
Anyhow, my point is that if there has been a past heat episode in the machine we're talking about, you need to consider the possibility that more damage occurred at that time, not just the drive that got "fried", then. There may have been heat stress to other components, or electrical damage. (People are sometimes surprised when they ask me "May I test this board I think is bad on your motherboard?" and I reply "No!" Somehow "Oops, I'm sorry" doesn't suffice when someone damages my motherboard.)
If it eventually comes down to "Some hardware is acting up, I'm not sure which, and I've ruled out software/firmware configuration", then I guess you'll be spending a fun time honing your hardware diagnostic skills: You can try removing non-essential hardware, to see if the problem goes away. You can test suspect equipment a piece at a time in a different system that's known-good. You can swap in temporary replacements for suspect components in the suspect machine. There are no doubt other basic approaches that aren't readily coming to mind, but those are some of the classics.
Rick Moen rick@linuxmafia.com
If you lived here, you'd be $HOME already.
Edited by rickmoen
Dec. 2, 2002, 10:17:19 PM EST
|
Post #66,237
12/2/02 7:15:17 PM
|
Yup! Fun...
The genesis of this machine. Why else build a machine out of trash? *grin*
Imric's Tips for Living- Paranoia Is a Survival Trait
- Pessimists are never disappointed - but sometimes, if they are very lucky, they can be pleasantly surprised...
- Even though everyone is out to get you, it doesn't matter unless you let them win.
|
Post #66,295
12/2/02 11:00:59 PM
|
Smoke Emitting FPUs
I had an easy diagnosis once.
I learned UNIX from a guru, and he put me, his one and only student, in charge of the engineering development machine for a Star Wars project.
Well, the Main Programmer was a PhD in math - so he hated physicists. He'd always look at me in the lab as if to say - "Physicists! Bah!"
But root was the brains of the operation. Root was so called because he was ugly as sin but had a beautiful lady on his arm all the time, and drove a purple Cadillac. I drove a red Mustang 5-liter convertible. He was good at ballroom dancing. I wasn't. (But I did OK too :)
Anyway for the development system root put me in charge of - one day the long sought-after external FPU for the 68030 (68882?) arrived, and root bid me install it in my own system! Well, Mr. math PhD would not hear of it! He insisted on installing it himself! Well, he installed it sideways (it was square) and that was not good. The very expensive FPU was destroyed, but the rest of the system lived on and accepted a new FPU a week later, installed by me.
I never talked much to Mr. Math PhD after that. He said my C programming looked like FORTRAN - and he was right. He wouldn't admit it, but he was impressed with my VAX FORTRAN parser written in XEDIT on an IBM 4341 running VM/CMS.
Pretty soon Star Wars ended, and Business by Numbers began, and then it was now.
-drl
|
Post #66,336
12/3/02 1:18:05 AM
12/3/02 4:18:55 PM
|
Re: Smoke Emitting FPUs
Good story there. Thanks.
I did a bonehead manoeuvre, once, at the time I bought my K6/233 motherboard. Not as bad as your math Ph.D colleague's, and more cheaply recoverable, but funny.
This was back when you paid only a small premium for building up from individual components of your choosing: The price gap has widened since then. I was trying to put together a system from quality, maximum-bang-for-buck components, and so got an FIC PA-2007 mini-AT-format mainboard, K6/233, a separate huge CPU heat sink with ball-bearing fan, big tower case, PC Power & Cooling TurboCool PSU, an Adaptec AHA-2940UW, Number Nine Imagine 128 video, and 64 MB Corsair SDRAM (augmented later on, but only 64 MB initially) rated for CAS2 operation at 100MHz on the local memory bus. Carried forward from older boxen were an old Keytronics keyboard, Logitech Trackman Marble, a pair of 10k RPM IBM wide-SCSI hard drives, an old Toshiba ATAPI CD-ROM, a Toshiba floppy, and a pair of antique 3Com 3C509 ethernet cards.
I banged the machine together, flipped it on, loaded Debian, and started trying to use it. SIG11 errors started cropping up. Hmm, that's odd. Power it down, check for sources of heat problems, spaced out the cards and hard drives, made sure all fans were working, made sure there was no air blockage. Took a coping saw and made a cutout on the drive bracket, and mounted another muffin fan pointing upwards at the hard drives through it, just for an additional safety factor. Made sure nothing was jamming the CPU or power supply fans. Put the beast back together, powered up: No problems. Good. Half hour later, a consistent string of SIG11s, especially if I started doing strenuous things like kernel compiles. Damn.
Open 'er up again. Recheck for heat concentrations: None. Everything seems to be running cool. OK, am I seeing a sensitive CPU, dodgy RAM, or what? Run memtest86, no problems. Hmm. I'm being tempted, at this point, to try exchanging the CPU. But I want to try one thing out: I check out where the heat sink was mounted onto the CPU: I had a sort of heat-transferring pad between them, that came with the heat sink. So, I go down to Fry's Electronics, and buy some more of that stuff, and also a tube of heat-conducting paste, which one can use instead, in such places. Get back to my machine, take the heat sink off. Try a freshly cut piece of conductive pad. Basically the same results, except a pattern is starting to emerge: I can do the heaviest number-crunching I like within the first 20-30 minutes from power-on, without trouble. Afterwards, anything even mildly strenuous (e.g., kernel compiles) will generate Signal 11 errors. Conclusion: Something is heating up, 1/2 hour from power-on. It's not a strictly operational thing, but rather heat build-up over time.
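The "fine for the first 20-30 minutes, flaky afterwards" pattern can be measured rather than eyeballed. Below is a minimal sketch, assuming a deterministic CPU-bound workload whose result is recorded once while the machine is cold and then recomputed in a loop; the names `workload` and `time_to_first_mismatch` are hypothetical. On genuinely failing hardware the process may simply die with signal 11 instead of returning a wrong answer, which is just as informative.

```python
import hashlib
import time


def workload(rounds=2000):
    """Deterministic busy-work: repeatedly hash a value, return the final digest."""
    digest = b"seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest


def time_to_first_mismatch(max_runs=5, rounds=2000):
    """Re-run the workload; return (run_number, seconds_elapsed) at the first
    result that differs from the cold reference, or None if all runs agree."""
    reference = workload(rounds)  # computed first, ideally right after power-on
    start = time.monotonic()
    for run in range(1, max_runs + 1):
        if workload(rounds) != reference:
            return run, time.monotonic() - start
    return None


if __name__ == "__main__":
    print(time_to_first_mismatch())
```

Raising `max_runs` and `rounds` until the loop runs well past the half-hour mark would be the natural way to use it for a case like this one.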
I shut down, and take the CPU heat sink off, again. Pull the pad out, and clean all surfaces. Put the heat-conducting paste onto the bottom of the heat sink, latch it onto the CPU. Lock everything back in place, re-test. Same results. Hmm, dodgy CPU?
Shortly before I jump to that (wrong) conclusion, I pull the heat sink off, again. I put the heat sink upside-down on my desk and stare at it. Oh, fsck. What a dumbass error.
The outline of the K6 is square, as is its socket. The heat sink is also square. Unlike your Ph.D friend's FPU, the K6's socket is keyed to admit it in only one orientation, so you cannot fry it that way. So, you think, square CPU that can be inserted only the correct way; square heat sink that goes on top; lever that locks the heat sink down. Put heat sink A on chip B in socket C; what can go wrong? There are two orientations the heat sink can mount onto the CPU; 180 degree rotations of one another. Both must be right, right?
Wrong. The contact area on top of the CPU is offset towards one side, and is maybe 65% of the total square. The bottom surface of the heat sink projects downwards only on the lateral side where the matching 65% of its contact surface can touch the CPU's contact area below it. You can verify that this has happened if/when you yank the heat sink and newly-squeezed thermal paste, because the thermal paste will be mashed from contact with the CPU. Likewise, the CPU's contact area will have some detached thermal paste smeared across it.
Only mine wasn't. On the heat sink bottom, only the paste near the centre of its contact area was mashed, with the remainder dangling into space exactly the way I'd dabbed it on. The CPU's upper surface sported only a little paste, near the middle: Most of the CPU's contact surface had been air-gapped, rather than heat-sinked. Luckily, trapped heat across most of the CPU had not turned it into silicon carne, and everything worked beautifully once I flipped the friggin' heat sink around the correct way.
The silver lining is that that machine emerged from the exercise with awesomely efficient cooling.
Rick Moen rick@linuxmafia.com
If you lived here, you'd be $HOME already.
Edited by rickmoen
Dec. 3, 2002, 04:18:55 PM EST
|
Post #66,354
12/3/02 4:52:24 AM
|
Heh. Good story. :-)
"Ah. One of the difficult questions."
|
Post #66,249
12/2/02 8:20:59 PM
|
+5 Informative.
Besides, most Unices (Linux included) have a reputation of being tougher on the memory hardware than any version of Windows.
Wade.
"Ah. One of the difficult questions."
|
Post #66,261
12/2/02 9:07:05 PM
|
OS/2's install was always killer on bad memory.
TRAP 0E or somesuch. We had a machine that refused to install OS/2, turned out it was bad memory.
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #66,284
12/2/02 10:15:26 PM
|
Re: OS/2's install was always killer on bad memory.
Ja, I remember that. TRAP0002, which was a rather opaquely worded software-level report of NMI (non-maskable interrupt) problems at the underlying hardware level. Damn, that's starting to come back to me.
You generally got slightly more immediate, slightly more strident, and slightly easier-to-interpret memory-fault error reporting from the Linux kernel (not that "Signal 11" and "Segmentation Fault" is exactly crystal-clear, either), but OS/2 was/is pretty good at that.
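For anyone puzzled by the connection between "Signal 11" and "Segmentation Fault": on Linux, signal number 11 is SIGSEGV. As a small sketch of how that surfaces in practice, Python's `subprocess` module reports a child killed by signal N as return code -N on POSIX systems, and the helper below (a made-up name, `describe_exit`) turns such a code into something readable.

```python
import signal


def describe_exit(returncode):
    """Translate a subprocess-style return code into a readable description."""
    if returncode < 0:
        number = -returncode
        try:
            name = signal.Signals(number).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {number} ({name})"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # -SIGSEGV is what a segfaulting child would yield on a POSIX system.
    print(describe_exit(-signal.SIGSEGV))
    print(describe_exit(0))
```

A string of such exits from otherwise trustworthy programs (compilers, especially) is the classic hint that the hardware, not the software, is at fault.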
Rick Moen rick@linuxmafia.com
If you lived here, you'd be $HOME already.
|
Post #66,291
12/2/02 10:46:56 PM
|
This was ca. 1992
I didn't start running Linux until '93. And at that point Linux certainly wasn't being used in "real businesses", let alone a big 6 CPA firm... ;-)
Which suggests a topic for a new thread...
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|