Computer Stories from Field Service and System Support
------------------------------------------------------

When I worked for a company that had a contract with 3M, 3M had asked me to write them a memo describing why we were having problems with diskette failures. I said in the memo that the disks were failing due to head crashes. "If the customers would just clean their heads periodically, we wouldn't have these problems," I said in the memo. One customer responded with "What kind of shampoo do you recommend?"
------------------------------------------------------------------------------
An end-user hotline received a call about a bad software disk. They asked the customer to make a copy of the disk and mail it in to the hotline. A few days later, they received a letter with a photocopy of the disk. Since it was a double-sided disk, both sides of the disk had been xeroxed.
-----------------------------------------------------------------------------
A Computer Operator says as she is lifting an RP06 disk pack from the drive: "Gee, how much does one of these weigh?"
Me: "It depends on how much data is on the disk...."
The operator believed it.
-----------------------------------------------------------------------------
I had a similar experience while working as a student operator at Michigan Tech. One particularly trying afternoon, the computer was merrily crashing for a number of reasons. After about four such spectacles, we broadcast that the computer would be down for the remainder of the afternoon. There was a resigned groan from the users and they began to file out of the Center, except for one comely young woman with wide blue eyes who wandered up to the counter and queried: "What's wrong with the computer?" Too tired and irritated to give her a straight answer, I looked her straight in the eye and replied: "Broken muffler belt." A look of deep concern wafted into her expression as she asked: "Oh, that's bad. Can you call Midas?"
------------------------------------------------------------------------------
I work for University Computing Services answering questions about any and all aspects of computing here, and as a result I run into some truly astonishing mental densities... A few excerpts from the Helpdesk:

Caller: "What's the name for when you're entering data into the computer?"
HD: "Data Entry."
Caller: "Thank you!"
----------------------------------------------------------------------------
Overheard in a student computer lab:

Client (raising hand and waving frantically): "The computer says 'Enter your name and press RETURN.' What do I do??"
Lab Assistant: "Enter your name and press RETURN."
Client (as if a revelation has struck): "Oh!"
-----------------------------------------------------------------------------
Another friend of mine in a similar situation reports having a student in the lab one day, who had to abort out of the SET PASSWORD sequence because he couldn't think of a six-letter word.
-----------------------------------------------------------------------------
Here's another drive story. I work at the computer workstation labs here on campus and a number of people always come in to utilize the equipment to write their papers, do their programs, etc... Well, the lab is stocked with IBM PS/2's, Mac II's, and VaxStation 2000's. When it gets busy towards the middle of the semester, people are rushing to use any machine to write their papers. Most of them have never used a computer before, much less learned to tell the three apart. Well, it seems like the biggest problem we have is people mistaking the Vax's for Macintoshes and PS/2's. If any of you are familiar with the VaxStation 2000, they have 2 half-height drive bays with a GAP BETWEEN the two. So naturally people RAM these 3.5" disks and 5.25" disks into the gap and yell at us lab monitors when the computer won't "read" their disk.
-------------------------------------------------------------------------------
These disk drive stories reminded me of the time my brother visited me, and wanted to try out my (then) new Mac SE. He's a dBASE consultant, working strictly with IBM's, so I thought I'd let him do everything on the Mac for himself, so he could see how intuitive and superior the interface is. That's why I wasn't concerned when he said "This disk is a little hard to shove in". I figured he just wasn't used to the 3.5" drives. "Go ahead and push it in", I said. Next, he couldn't eject the disk. And my Dark Castle disk was missing. I used tweezers to pull the disk out, but it wasn't the Dark Castle disk. I was confused, and he was not convinced of the superiority of the Mac.

You've probably figured it out, but I didn't know until the next day, when the service guy called me and said that I was going to need a new drive, because someone had inserted a disk, and then inserted *another* disk into the space formed when the tape-cassette-like disk mechanism drops down into place. Then, when we tried to eject the disk, the mechanism tried but failed to move upward, mashing the disks, the heads and the drive mechanism. The tech said he had never seen that happen before. Only $170 for labor and a rebuilt drive!

I have had 30-40 people, all much less experienced than my brother, sit down and use the Mac with minimal help. None of them has ever done anything like that. I'm not sure what this means, but my brother still hasn't bought a Mac.
-------------------------------------------------------------------------------
Friend of mine was a researcher. He had an AT at home with two floppy drives that he installed himself. There was a tiny space between the two drives. One day he comes home and finds that he forgot to lock the door to his study and his FOUR year old son got in. He sees before him utter destruction, diskettes everywhere. He picks up all the disks and sorts them and is missing four.
He asks his son, where are the other four? His son points to the disk drives as any good four-year-old would. But both drives are empty. My friend tears the house apart looking for the disks, even looking in the other baby's dirty diaper bin. Nothing. A week later he gets a call from his father-in-law, for whom he had installed a small business system. It seems a secretary had lost a disk in the machine. He goes to take a look and sure enough the secretary had stuck the disk into the space between the two disk drives. He opens the system and there is the diskette lying on the motherboard. He rushes home and sure enough his four-year-old had done the same thing!
-------------------------------------------------------------------------------
Before I transferred out of Georgetown, I used to work as repair technician/customer support guy at a systems integration house in Rosslyn. One morning, a customer called demanding to know why his brand new 3.5" floppy drive wouldn't work with any of his files. After about half an hour of talking him through "drive not ready" messages, I asked him whether the metal slide on the disk he was using might be jammed. "What metal slide?" he replied. It turns out that he and his assistant had spent the previous afternoon cutting their old disks down from 5.25" to 3.5" so that they would fit in their snazzy new drive with four times the storage!
-------------------------------------------------------------------------------
Yeah, along similar lines, I used to work at Magnetic Scrolls (UK adventure games house) and we provided Amigas for the artists to use along with some s/w. We wrote some instructions for them about how to get started along the lines of:

1. insert disk into drive
2. ...

We got a call from them, "My computer's not working, etc...", and someone went round to sort the problem out. Lo and behold, they didn't realise you had to take out the disk that was already in the drive. They had forced a second disk in there. Ho hum...
-------------------------------------------------------------------------------
How about a user, confronted by a typical "press any key" message, calling tech support complaining... "But I don't have an ANY key on my keyboard!"
-------------------------------------------------------------------------------
The best story I ever heard (knowing half of the computer clinic people, I heard a lot of stories) had to be the girl with the Mac disk that had been run over by a truck and folded in half in the process. Undeterred, she unfolded it, took it down to a Mac cluster, and popped it in. Of course it didn't work, and it was so badly mangled that the Mac couldn't even eject it. So she went and complained to somebody, and they got ahold of the guy with the Mac cracker (this was a few years ago and the victim was a Mac Plus). He popped the case and removed the disk. When he handed it back to her, he said, clearly and distinctly, "DON'T PUT THIS DISK IN ANOTHER MAC." So she went upstairs to the clinic, said "My disk doesn't work," and inserted it into the nearest Mac (yet another Plus) before anybody could stop her. And the guy with the Mac cracker had just left...
-------------------------------------------------------------------------------
I used to run a small computer shop that provided support to some of its better customers. I had a secretary calling for several days saying that the computer wouldn't read the disks a few days after she formatted them. I tried all the usual stuff over the phone and finally decided to stop by her office since it was a few blocks away. I took a diagnostic disk with me. When I got there I asked to see the problem with one of the disks. She took one off the side of the file cabinet next to her desk; it was stuck up there with a real cute little magnet. Need I say more??
-----------------------------------------------------------------------------
The other comes from a friend at HP tech support.
He had a call from a fellow who couldn't get the 3.5" disk into his machine. He said he'd followed the instructions to the letter where it said to remove the disk from the plastic box first. He was a little ticked off that it had taken 20 min to get that plastic box off the disk and wasn't sure how to get it back on.
-----------------------------------------------------------------------------
While working as a consultant at North Carolina State University (Raleigh), I was in one of the terminal clusters one day, minding my own business when a user walked up and asked a question about his terminal. Well, this was a Televideo 910 connected via an IBM 7171 Protocol converter to the campus Iron Pig. This guy's problem was that his screen was so burned in that he could not read any of the 'normal' text, so only the highlighted (bright) text showed up on the screen. To get around this problem, the 7171 allowed special escape sequences to turn off things like highlighting, underline, etc. I explained to the user that he needed to type "Escape, Tick, and a lower case k". I went into great detail explaining where the 'Escape' key was, and that the 'Tick' is one of these -> `

Anyway, as he was walking away, he turned back to me and said, and I quote: "How do I type a lower case k?"
-------------------------------------------------------------------------------
Back in my days at UCSB I was responsible for taking care of some VAXen that were shared between researchers and secretaries. One day a particularly crazed secretary called me up with the usual complaint "My computer doesn't work." For some reason these people, supposedly trained extensively in word processing and technical writing, never quite understood that they had a terminal and the computer was a long ways from them and probably working fine.
Nevertheless, I went through my standard list of things to try and avoid walking to the secretary's office until I was finally convinced that the terminal was in fact switched on, plugged in, online and the person in question hadn't hit the scroll-lock key. Somewhat dejectedly I went up to the office to find it empty. I sat down at the terminal and spent ten minutes playing with it until I was pretty sure that the keyboard had died. I unplugged it and was carrying it out of the office when in walked the secretary holding a cloth dripping with water. She looked at the keyboard and said "Oh, you're not taking my keyboard are you? I've just spent twenty minutes cleaning it." I suppose some people were just not meant to use computers.
------------------------------------------------------------------------
Me and another guy who work in the same shop have a routine for users who complain about their disks being bad. Here's an example of how it works:

luser walks in with a 5.25" floppy muttering and groaning about "all that work and no other copy"

luser> My files are all on this disk and now when I put it
     > in the machine, it says it can't read the disk!!! What
     > am I going to do? Can you help me?
gaspo> Well, first of all, where's the sleeve?
luser> The sleeve? What sleeve?
gaspo> You know, the little paper envelope type thing it
     > came in?
luser> Oh, you mean the wrapper!! I threw that away

Ok, so I'm thinking that the disk has been through hell, and there has gotta be a mondo big scratch right on sector 0. So, I tell her to let me try it. I put the disk in my machine, and sure enough, it's hosed bad. I pull another 5.25 out of the second drive (kinda like the old slip of the hand magician type stuff) and tell her to take it over to Tym's desk and he'll fix it.

gaspo> Make sure you tell him that I'm all out of powder.

So, while the luser is going over to Tym's, I use some random utils to recover the files, format a new 5.25, copy them over...
scene 2, luser arrives at Tym's desk

luser> Gaspo said you could fix this. And he said to tell you
     > that he's all out of powder. What did he mean by that?
Tym  > Oh, he meant Magic Powder. No problem. I have some right
     > here.
luser> Magic Powder?
Tym  > You don't have the sleeve for this, right? Well, when you
     > don't use the sleeve, the bits can fall off. When the
     > bits fall off, letters start falling off of your files,
     > and you see, it can get real bad.
luser> Oh, so you can fix it?
Tym  > Sure!!

Tym opens up desk drawer, pulls out small plastic box, and pretends to sprinkle the "Magic Powder" on the disk.

Tym  > Now, take this back to gaspo.

scene 3, luser returns to gaspo's desk

luser> Tym put some magic powder on it for me, will it work now?
gaspo> Not yet, I have to say the incantations first.

I do some hand waving, "abra cadabra" stuff, and in the process, switch back the disks and labels with the one I replaced her files on.

gaspo> Here you go.
luser> That's it? It's fixed?
gaspo> Yup.
luser> But how did you learn how to do all this?
gaspo> Hey, I made it to a few classes at college!!

One of these days, I'm going to sneak out and watch when they get back to the lab and tell the other lusers how we used the "Magic Powder" to fix her disk.

BTW, they're right. I do spend too much time in front of a computer. "Tym, let's pick up the chicks and go get some brewhahas!!"
----------------------------------------------------------------------------
I heard a version of the diskettes-stored-on-refrigerator-door-with-magnets story from my very first computer prof, back in the dark ages of punched-card equipment. Seems the customer had a card file of name & address data, used to print mailing labels. She (it's always a "she" in these stories, sigh) called her local IBM service office and complained that even though she had changed the cards to reflect new addresses, her labels kept coming out with the OLD addresses.
Turns out she was updating her cards by erasing the old data on the top line, then writing in the new data with a pencil -- no word as to what she thought the holes in the card were for.
----------------------------------------------------------------------------
But we have to realize that stupid users do have their uses. I worked in a lab once, where some of the machines were not covered under a maintenance contract. One day a workstation's screen remained blank when it was turned on and rebooted after being turned off for a while. After the engineers and maintenance people had checked all the obvious things like power supply etc, they concluded "the tube's gone" and abandoned it. After a couple of weeks, when most workstations were in use, a novice walked in, found the abandoned workstation and switched it on. All the engineers around chuckled under their breath as the novice sat looking at the blank screen and waited and waited. Nothing happened. Then this "stupid user" turned round and asked his neighbour, a hardware wizard, "Where's the brightness control on this thing?" As the chilling truth dawned, the wizard reached round the back and - voila!
-------------------------------------------------------------------------
The scenario: A warehouse, with loads of shelves of stuff. A few terminals in the warehouse. When an item is removed from the shelves, the information on its tag is entered into the computer. There's a chart on the wall behind each terminal which has the code to enter for each part. The computer then knows what's in stock. Easy really.

So... the system has been working for a while, and then the support team receive a telephone call from them. ``We're getting spurious spaces on one of our VDUs, can you send a chap round to fix it?'' Now, these VDUs are RS232 at 9600 baud, running at about the limit of their range.
So, the field engineer expects that it's noise on the line, so he follows the routing, unwraps the wire from various hazards (you know the sort of thing, electric motors, fluorescent lights, telephones, the usual problems with long serial wires). Having made sure that the cable is now well protected from external magnetic fields, only earthed at one end and all that jazz, he leaves. And the system is fine. For a week. Then the support engineers are phoned again: ``Those spurious spaces are back.''

So, in goes a new VDU. The wire is replaced. The serial interface card is replaced. The operating system is upgraded (one visit to the site for each modification). By now the support guys are getting worried. The customer is getting a bit p****d off. Words are being said. You know the sort of thing.

Eventually, a field service engineer is sent to watch for these spurious spaces. So he waits near the VDU, watching. The warehouse people take items off the shelf. Most of them know most of the computer codes already, so just go to the VDU and tap in the code. Sometimes, someone looks up a code. And then it happened. One of the ladies comes up to the VDU with several items. Now, she's a big girl (know what I mean?) and she put her little collection down beside the VDU, sat down, picked up each one in turn, and typed its number in. Then she picked up one she didn't recognise, *leaned forward* to read the number from the list on the wall, and the VDU started beeping. Her bosoms had made firm contact with the space bar....
-------------------------------------------------------------------------
And then there's the MD that I heard about, who had to send a memo to all the WP staff, asking them not to wear nylon panties at work, as the wriggling of the nylon on the typist's chairs created enough static to blow the WP line driver chips up. No-one knows if the MD checked all the WP staff himself.....

(MD: Managing Director - very important chap in UK companies.)
(WP: Word Processing [I believe that's something like using vi and troff])
-------------------------------------------------------------------------
Then there is finding a keyboard in pieces. Way back when, when the Apple II's and PET computers were starting to come out, the Lawrence Hall of Science in Berkeley (just above the UC campus and the LBL), a sort of hands-on science museum, had strings of them sitting on tables, where people would pay a buck or two to spend an hour in front of them, playing games or whatever. This was just before the idea of personal computers was considered a reality to the laypeople. Anyway, you would have been astounded at the number of people who saw the message "Hit any key to continue", and actually SLAMMED their hands down on the keys! Later, those in charge noticed this trend and spent a couple of days running through their software changing all instances of 'hit' to 'press'.
-------------------------------------------------------------------------
Now the hardware was "Toytown" stuff... we're talking Z80, 64K of mem and 8" floppies here (and no, that is not a pun on the first part of Jon's anecdote!), but the real fun part of the job was that all of the operators were female, mostly young and mostly single (it was a great job for a young, single guy and I enjoyed it... despite the fact that I wasn't either!).

Anyway, the `static' problem that was mentioned did crop up quite regularly. The "classic" situation would have the operator using a swivel chair with castors occasionally zooming over to the printer to pick up her work (one good shove off the desk on the outward `zoom' and a good shove off the printer stand to get back) and then zapping the keyboard when she started typing again. The "classic" cure was to get the floor area sprayed with some anti-static gunk and put some sort of discharge device on the chair.

The fun part was always the banter, though. It was always great fun to trot into some company where the M.D.
(or whoever) was hopping up and down because the system was down. Operator and F.E. would immediately launch into a colourful and intimate conversation on the pros and cons of various types of underwear, much to the amusement of the other staff and chagrin of management type!

All of this, by the way, was brought to mind not only by the posting, but also by an entry in our database which I came across this morning (I kid you not!). The entry was related to a CPU/Memory failure and the symptoms were listed as:-

PANIC: PANTY ERROR
------------------------------------------------------------------------
My favorite story from my days selling PCs on the phone was the guy who called up and wanted to order a _slot_. After some cross-examination, he finally explained that he'd bought a bus mouse, and the box said "requires one slot".
------------------------------------------------------------------------
When I was working with a custom software house, we had a client running a vertical-market application I had written who called up with a problem one day. I suggested he edit a certain field on a certain screen, to make a one-character change near the end of the (long) string. I heard lots and lots of typing. It turned out that in over a year of using the program daily, despite the help screens and the nice manual and the personal tutorials, this guy had never figured out what the Insert key was for.
------------------------------------------------------------------------
I had a user ask about how to avoid spacing problems on the Laserwriter, and we advised her to replace indenting spaces with tabs. She then asked how to put in tabs. We told her how to find the ruler, etc. Then she asked, no, how do I put in a tab. I looked at her for a minute and said, "You use the Tab key." The troubles went on. I ended up doing the whole reformatting for her. At the end, one of the resumes printed too high up on the page, so I said, just put in a couple of Returns.
Then she asked how to put in Returns. I said, use the Return or Enter key. The sad part was that she actually was an owner of MS Word. She had had a Mac for three years with Word and all that time she had not learned how to use a Tab key, among other things. I couldn't help pointing this out to her, and she smiled sheepishly and said, "Sad, isn't it?"
------------------------------------------------------------------------
Two "dumb user" stories experienced during my time as a student worker in a lab at Louisiana Tech University. The lab I worked in had approx 30 3278 terminals hooked to an IBM 4341.

1) A young lady walks up carrying a 5 1/4" diskette, hands it to my compadre (Jerry) and says: "There is something wrong with this diskette, can you tell me what is wrong with it?" Well, Jerry holds the diskette up towards the light and stares at it intently while spinning the disk inside the jacket. After a few seconds, he points to a spot on the disk and says: "There! That bit right there is wrong!" The girl replies excitedly "Thank you!", takes the diskette and leaves. I thought I would die.

2) The terminals in the lab were connected of course to a 3274 controller which was tied in to the Computing Center via a 9600 baud modem. It was miserably slow. The terminal that was at the desk we used was connected to a controller which was "locally" attached and was extremely fast. One day, while at work, this guy comes back to us from the terminals and asks Jerry (different one!): "Why is this terminal so much faster than the ones out there?" Jerry responds in a nonchalant tone: "I don't know, someone told me it was because the cables to those are so tangled that it takes the bits longer to get to the terminal." The guy responds: "Oh! That makes sense," and returns to his terminal. Again I almost died.
--------------------------------------------------------------------------
At a large Canadian university in the habit of hiring student operators, the following occurred: One night, a disk crashed. No banshee wail or smoking special effects, but the disk spun down and refused to start again. The operator popped the top, looked at the platter with its dangling ribbons of oxide, and correctly inferred that the disk pack was no longer useful. Probably defective. The solution? A fresh disk pack, of course. The drive actually almost came up to speed, before the bits of oxide ground into the heads stripped the new pack. The drive spins down, the operator pops the top, and determines ... that this new pack must be defective as well. Two packs later, mercifully, the operator is confused enough to leave the drive alone until the end of her shift. Thank goodness she didn't decide that the first pack was okay, it must have been a faulty drive, requiring the pack to be moved to another drive ...
-------------------------------------------------------------------------
Once, in another lifetime, I was the SA for a large Unix system that had removable-media disks. One night, my graveyard shift operator was performing disk-to-disk backups and, in the process of changing a pack, spilled a can of Pepsi into the drive. Not wanting to bother me (it was about 2 AM), she got a rag and wiped up all the Pepsi she could. She then closed the lid and spun up the disk! Upon running the backup program, she got bunches of write errors. She deduced that the deluged pack was bad and she should try another. Five disk packs and two disk drives later, she developed a headache and went home early, leaving me to walk into a maelstrom of confusion and irate users the next morning. When she got in to work that night (I stayed late to wait for her), I asked her to read to me the big sign hanging on the wall: "No food or drink allowed in the computer room."
-------------------------------------------------------------------------
The office where I work had just set up this particular user with a word processing system costing quite a bit of money. Now my opinion of this particular person will not be detailed here, but it isn't pretty. He had initially wanted us to spend $30000 just so that he could have the WP he was used to, then latched onto WordStar 4 and was resisting all attempts to move him to something more supportable. He had screwed up so many things that at one point I was visiting him as a matter of course just to see what he'd managed to foul up that day, as were two other members of the staff here.

Then, one day as I was off to grab a can of drink from the vending machine, I saw him coming along the corridor with a determined look in his eyes. I briefly entertained a regressive fantasy about running like hell and hoping that he had not seen me, but suppressed it and just stood still whilst he caught up with me.

"My computer is making a funny sound and it isn't even turned on," he said.

"Uh, huh," I replied, as neutrally as I could after a statement like the one I had just heard. "What sort of sound?" I was wondering if he actually meant that WordStar wasn't loaded, or if the sound was coming from the Silentwriter attached to the computer, or if the men in white coats had finally arrived to take him away.

"Well," he replied, "it's sort of a burr, burr noise. Could you come and have a look at it?"

Fortunately, the office wasn't far away, and we went there. I looked at his computer. It was indeed turned off. I looked at the printer. It too was off. I looked at the telephone, reached over and hung it up, and the noise ceased. He had been listening to the dial-tone.
-------------------------------------------------------------------------
I was working on a dBASE programming project.
The senior programmer on the project, Bill, a very competent guy, was taking a beginning C class at night to expand his job skills. He asked me for help on his latest assignment. Bill was to write a C program to reproduce a decimal-to-binary conversion table that he had been handed. That seemed straightforward enough, so I quickly wrote down three or four variant loops that would produce the desired results, and a couple of print statements. I answered a few questions Bill had about the various techniques I'd used. His last question was: "Is there some pattern or algorithm to explain which binary numbers correspond to which decimal numbers, or are they just randomly assigned?"
------------------------------------------------------------------------------
There is a story going around HP about a site that kept having unexplained disk crashes. They tracked it down to a security guard who would hide out in the computer room in the wee hours of the morning. He was a smoker and ignored all the "No Smoking" signs in the room. He found that if he held his cigarette close to the disk fan intake, none of the smoke ever made it into the room.
------------------------------------------------------------------------------
Then there were the school kids who discovered that they didn't have to go outside to clean the erasers. They would hold them next to the fan intake on the IBM-PC and it would suck up all of the chalk dust.
------------------------------------------------------------------------------
There was a good story told to me by a DEC guy. The fellow in question is kind of a metafield-service guy; if he can't fix it, it don't get fixed. There was this terminal server that would mysteriously die at 2:00 AM every morning for about 15 minutes, at which point it would come back up. They tried everything to fix it, but they could never figure out what was going wrong and nothing they did worked. So they decided to watch it one morning.
At 2:00 AM, the janitor came in, unplugged the terminal server, plugged his vacuum cleaner in, vacuumed the floor, and then plugged the terminal server back in. Problem solved.
------------------------------------------------------------------------------
At a big unnamed DEC site in an unnamed previous decade, we started to get mysterious disk faults and crashes on three -- just three -- of our nodes. Always at about 11:30pm. A few times when the systems people would come in to look at it, some packs would be offline. Operators were questioned but knew nothing. Sabotage was considered somewhat unlikely as the nodes were by no means heavily used or commercially very important.

Finally a couple of us decided to stay late, order a pizza and hang out in a corner of the computer room, down at the end of the aisle. We wanted to see what happened at 11:30 pm! The AC plant was right behind us so we could hardly hear ourselves think -- and the pizza got cold before we could finish it! -- but our patience was rewarded.

At 11pm the night janitorial shift showed up, drank coffee, shot the bull with their buddies on afternoons, etc. At about 11:30 they got their equipment and started to work. Into the computer room toddled this old guy, looked about 70, pushing one of those huge plastic trashcans on wheels. No carts or cans were *allowed* on the floor, of course, but this guy never got the message. He took the nearest shortcut, DOWN our aisle. The trashcan barely fit between the rows of machines. The handles stuck out to either side... JUST at the height of the disk pack front panel switches! The can wobbled down the aisle, bumping into machinery on either side every so often. >>>POOK<<< there went a disk pack! >>>PUNK<<< there went another one! We could see the red SYSDOWN lights flicker on across the room by the time the janitor reached our end. He never understood why three young computer jocks were giving him a standing O!
:-) We solved the problem by shoving the disk cabinets at either end of the aisle FORWARD a few inches, so that humans could still get through but big trashcans couldn't even get started. ------------------------------------------------------------------------------ In a computer room, somewhere in New England, they had a mysterious problem. Occasionally, late at night, the system would crash. No explanation, many visits from service reps, etc. The usual stuff. Since the 1's and 0's flowing through this particular facility represented dollars and cents, there were security cameras in the computer room. Each day's activities were recorded. One morning after a crash, the D.P. manager resorted to watching the tape from the night before. What he saw was a very tired, bored night operator repeatedly pressing the "." key on the main console. When the screen filled with ".............", the operator would press "Send". Buffer overflow. Reserved memory overwritten. A few minutes later, the O.S. does the wrong kind of call, and... ------------------------------------------------------------------------------ It seems that a big mainframe facility (IBM I think) started having tape problems suddenly. This after several years of virtually no problems. Something was corrupting tapes, which resulted in bad record errors on reading. Not every block read bad, just every so often. At first they suspected media, so they studied it awhile and found that no one tape brand was involved. Further, you could rewrite a bad tape and it never failed to work perfectly. It ALWAYS worked on a rewrite. There was no pattern apparent in the tape drives involved. All units were checked repeatedly and no one could ever force the problem to occur. They started logging all tape operations and finally did determine that one particular (very experienced) operator seemed to have removed all the bad tapes. Yet none of the tapes had ever failed on his shift. 
So they started watching the operator, but could see nothing going on there. He was competent, experienced, and not the sort of person who would do harm to data. He was the very model of efficiency, in fact. What corrupts a tape, but not permanently? Further analysis of a bad tape showed the bad records fairly periodic, but not perfectly so. They seemed to slowly become more frequent toward the end of the tape. Still, they looked and looked at the tape units, for something which would corrupt a record every few feet. It almost HAD to be mechanical. Some roller, some guide or capstan, or something that rolled along the tape. But something that didn't track the tape perfectly and somehow tracked a little differently as the tape was used up. What was it? Further analysis looked at bad tapes of different density. Guess what! The higher density tape had proportionately FEWER errors. Man, it HAD to be something mechanical doing this! But still, the bad records got a little closer together as the end of the tape approached. Finally, they got out the magnetic viewing fluid and actually looked at the magnetic patterns on the tape. Sure enough, every few feet, there was a tiny disturbance, on one edge, which was corrupting a record. They measured the distance between several of these bad spots in several tape positions and plotted them and got roughly a straight line slope. After much head scratching, someone saw a correlation: with the tape on the reel, the bad spots were in a line approximately radial to the reel. Aha! So they started looking at reels and spent a good deal of time on that, with NO mechanism discovered. All the reels were just fine. None were measurably magnetic. Thereupon they watched the operator, and finally figured it out. On changing a reel, the operator would sometimes temporarily set a reel on the top edge of the drive door, which, strangely enough, had a magnetic strip catch on it. 
The strip, well out of the way of any tape inside the unit, was magnetizing the tape along a straight line where its edge was exposed by the hole in the side of the reel. Since the operator didn't always set the reel up there and it wasn't often in exactly the correct position to expose the tape to the strip, it happened very seldom. And, the reel involved was always from a tape being removed, so the problem was not seen until the reel was again loaded and read. Of course, setting a reel on the top edge of the tape unit's door was never a recommended procedure. But, supposedly, this little "manufacturing design defect" in the drive was later corrected. I thought it was a pretty cute story, which is probably pretty badly corrupted in the re-telling. I heard it first about 8 or 9 years ago, when I asked someone about the magnetic viewing fluid.. ------------------------------------------------------------------------------ Brad Allen's collection of "Hacking Horror Stories" from 1982. These are the responses to my ARPANET post of 27 October 82. If reading any of the stories below inspires you to send your own, please do! I will continue to update this file as long as macabre tales of men and their machines continue to come in. - Brad ------------------------------------------------------------------------------ Date: 27 October 1982 1917-EDT From: Bob Colwell at CMU-10A When I was finishing my Master's here at CMU, we were using a PDP-11/45 that was showing incipient senility. One week before the final demo, the RT-11 monitor stopped powering up properly and instead took to halting the machine at some incredibly non-obvious spot. This was not acceptable performance, so we scratched our heads faster and faster for about two days trying to fix it. Finally, in desperation, we single-stepped the RT-11 boot sequence, and found that it was doing a memory check that it believed was failing. 
It then tried to jump to a "memory check failed" diagnostic that it expected to find in memory, which of course was not there. What was there, however, was a random collection of bits that just happened to look like a jump to the original totally bogus location that we could see on the lights of the front panel. (Incidentally, we could read and write the supposedly bad memory location using the front panel). The solution? We powered up the machine with the halt switch asserted. Then we loaded in a "Return from Interrupt" instruction where the random bit collection was. Presto. By the way, until this problem occurred, we were competing for use of the 11/45 with two other groups of students. Since they all gave up when this difficulty hit, we had sole use of the machine until it got officially fixed. ------------------------------------------------------------------------------ Date: 27 Oct 1982 1708-PDT From: Dave Dyer On a TOPS-10 system I was responsible for, I made a typo installing a bug fix to the monitor's file system code. The result was that for several days (until the file system began seriously degrading) a randomly selected physical block of the disk was written with a copy of the retrieval information for the system's accounting files. Another time, we had installed a new memory box, which unknown to us was responding with the wrong word once in 10^8 or so operations. We ran with this flake for about a month before the bit decay was tracked down to the culprit. At that point, EVERYTHING that had been done during the bad time was "possibly" damaged, and quite a few things were in fact damaged. It took about a year before the last artifacts of that episode were filtered out. ------------------------------------------------------------------------------ Date: 27 October 1982 20:40-EDT From: Peter Szolovits My first paying programming job was to convert some FORTRAN programs from the 7094 to an IBM 360 in 1966 at UCLA. 
Some of these were unbelievably hairy (doing memory management within Fortran, character manipulation before there were characters in Fortran, etc.) and obscure (some of the code was in fact Fortran II code that first needed conversion to Fortran IV). The real horror was that my predecessor had been taken away by the men in the white coats, and lived in a mental hospital; so there really was no way to get any additional info on much of this code, and I had a graphic example of where my job led. ------------------------------------------------------------------------------ Date: 27 Oct 1982 20:30-EDT From: James.Gosling at CMU-CS-VLSI at CMU-10A Several years ago I was doing some development work on a compiler for a language like Pascal. And like most Pascal implementations, the compiler was written in the same language and was used to compile itself. It was broken into many modules. To make a change to the compiler I would just recompile the affected module and link it back in with the rest of the modules. At some point, I took one of these test versions of the compiler and replaced the production compiler with it -- it seemed to be just fine. In fact, it was fine for quite a while. So long that this new version got onto the backups and all of the backups of the production compiler were lost. There was also the problem that the old production compiler couldn't have compiled the new compiler anyway, since the language had changed quite a lot. Well... In one of the modules that had never been through the new compiler was a piece of code that tickled a bug in the code generator. The bug was a cooperative one between one of the new pieces of code and one of the old ones. What I ended up with was a compiler which I couldn't recompile because fixing the bug involved compiling a module that tickled the bug. Because of the circularity in the compiler (that it compiled itself) I was up the proverbial creek without a paddle. 
There was no way that I could recompile or shuffle anything to fix the beast. All backups were either of the broken compiler or had been overwritten. The solution was incredibly messy: I spent a long time doing intensive octal surgery on the object modules that I had. This was made very difficult because there was essentially no information left around to correlate program text to compiled code and because the bug caused bad code to be generated in many places. ------------------------------------------------------------------------------ Date: 27 Oct 1982 2231-EDT From: Larry Seiler My most painful bug was a simple uninitialized variable (I had moved the initialization statement to a position after the first reference). This variable was a pointer, and its position in the call stack just happened to contain an address in code space. So running the program caused certain instructions in a different procedure to be changed into loops, with bizarre results. Loading the debugger caused the program to work correctly, by transferring the target of the modification into an unused part of the debugger (I think). Even after I discarded my innocent assumption that the code I wrote was the code that was being executed, I still had to guess what routine was writing to code space (and by what mechanism). Total time required to fix the bug: 8 hours. How embarrassing. Why am I telling you this? Well, why not? ------------------------------------------------------------------------------ Date: 28 Oct 1982 0012-MDT From: JW-Peterson at UTAH-20 (John W. Peterson) In trying to learn the graphics/animation biz, I've run into a few. In making some films this summer I wound up working strictly at night, to help prevent any light from entering the room. The filming had to be completed entirely over the weekend, so it would not interfere with normal business activity (like turning the lights on...). 
Worse yet, the old Bolex I was using had no way for the computer to trip its shutter, so I had to manually press the cable release every time the computer rang the terminal bell, for several hours at a stretch. Some other animation stories: Before color graphics CRT's & framebuffers were invented, the poor filmmaker had to sleep next to the camera. When the bell rang, he would wake up, change the color filter wheel to the next primary color, backwind the film all the way, and go back to sleep... Perhaps best of all is Jim Blinn's "Korean Janitor" movie. During the creation of the DNA sequences for "Cosmos", they decided to let the camera run overnight, with the computer tripping it every several seconds. So they locked up the room and put a big "Filming in process: Do Not Enter" sign on the door. Unfortunately, the Korean janitor could not read the English sign but DID have a pass key. The resulting film shows a DNA molecule twisting in space, a flood of light, and then a jerky sequence of the janitor cleaning the room at 200mph, seen as a reflection in the screen. ------------------------------------------------------------------------------ Date: 28 Oct 1982 1054-EDT From: Geoffrey H. Cooper This is our favorite "what happens when people are taught higher level models before the lower level ones" story. I get this second hand, so some of the details might be a little off. It may not be of the sort you had in mind, but it's amusing enough to bear repeating anyway. Around here, we teach a course in software engineering in which the students are taught and write programs in CLU (a language which lets user defined abstractions work the same way that the language defined ones do). One common final project for the course involved writing an assembler in CLU. The problem statement required that numbers be input and output in octal, rather than decimal. 
Most of the students, I am told, defined an OCTAL abstraction, with all the normal integer arithmetic operations, and with Parse and Unparse operations that converted strings into OCTALs and back again. This was implemented by representing an OCTAL as an array of integers, each of which represented an octal digit. The arithmetic operations simulated octal arithmetic on this representation. None of the students was apparently aware that the normal integer data abstraction that they had been using was really just stored as bits, which were more easily converted to octal than decimal. ------------------------------------------------------------------------------ Date: Thursday, 28 October 1982 10:57-EDT From: Jon Webb Well, here it is: I was working as an undergraduate programmer at my undergraduate university, and I basically had the run of the time-sharing user interface (it was TSO, on an IBM 360/65). I decided it would be nice if you could edit lines you'd typed, like the facility in the C-shell on Unix except more primitive. Well, it was a pretty trivial change to allow this, but unfortunately to be effective the change had to be installed in the system; I couldn't test it in advance. So I installed it one night, and TSO wouldn't work anymore. Very embarrassing, especially as the backup method I thought would work didn't. In fact one of the systems programmers had to be called in to fix the system, in the middle of the night. I gave up on editing in TSO. This is an argument for personal computers. ------------------------------------------------------------------------------ Date: 28 Oct 1982 08:55:57-PDT From: CSVAX.bitar@Berkeley I was working late one night developing a file under the Unix operating system. I was in a hurry at one point, and wanting to rename the file, I executed the Unix move command. 
A moment later Unix complained of indigestion, and I noticed that instead of typing 'mv oldname newname', which is Unix's way of renaming a file, I had typed 'rm oldname newname'. So Unix had executed 'rm oldname', then run into newname and vomited. I nearly did the same. Fortunately I did have a backup copy of the file, which I subsequently re-edited, bringing it up to date. After that incident, though, I was very careful about slight cognitive mistakes, such as thinking 'move' (mv) and typing 'rm' (remove) instead. ------------------------------------------------------------------------------ Date: 28 October 1982 1155-EDT From: Robert Frederking at CMU-10A Yourdon's book on software engineering has a few of these. Most of my really horrible experiences happened due to politics or manufacturer's screw-ups. (Example of first): CWRU was building a network, and had to pick between DEC and Harris computers (Harris won because one of their VPs was a trustee at CWRU - they were clearly inferior machines). Besides teaching their staff how to program, we had to constantly show them that feature X was broken, and how to fix it. The project finally collapsed due to their crufty machines. The operating system was *not* virtual memory (although user space was), and while adding networking software to their OS, they ran out of room. "Sorry". (Example of second): in trying to microprogram Intel's hack-of-a-bit-slice-machine, you had to fit your instructions into a 2-dimensional address space! Some instructions could only branch in rows, others only in columns, yet others only to specific clusters of locations. It was clearly a hack to cover running out of instruction bits. They even had to sell a program designed to find a fit for your microcode to the available space (I think the problem is NP-complete - 2d bin packing). The best example is the interrupt disable instruction on the 6800. 
If the least significant bit of the *preceding* instruction is 1, the whole processor hangs when you try to disable the interrupt. Also, some of the illegal opcodes (which aren't masked out) will cause the processor to hang so badly, it can't be reset. You have to turn it off, and wait for the dynamic RAM register to fade out! ------------------------------------------------------------------------------ Date: 28 Oct 1982 14:47:27-EDT From: David.Cunnius at CMU-CS-SPEECH at CMU-10A The old 15-311, Software Engineering Methods, will probably be one of the more fertile sources of horror stories. The semester I took this course, Spring '80, one of the tasks was a database implementation for a science-fiction wargame. Looking back now, I think our project group was doomed from the start. Of the original five-man team, one dropped the course before anyone else even met him, one had to take some time off to deal with a family crisis around mid-term, and one simply disappeared for a period of three weeks, coming back without even a memory of where he'd been. Despite all that, we did get something together for the final demo. We were using a modular design and had divided the task into thirteen subtasks. At the demo, four of the thirteen modules worked properly, two that had tested out perfectly the previous day didn't work at all at the demo, and most of the other seven hadn't even been coded yet. Of the four modules that worked, the most impressive one was the display package; unfortunately, that was also the only module which was optional in the original specification. Two of the members of the group somehow managed to pull 'D's as our final grade; to this day I haven't had the nerve to ask the other two what their grades were. 
------------------------------------------------------------------------------ Date: 28 Oct 1982 1318-PDT From: Bob Bandes As a senior project when I was going to school at UC Santa Cruz, I put together a real-time voice controlled operating system. The entire thing was written in assembly language on a PDP-11/32 running RT11. Since this was a single user system with a fixed disk, it was necessary to make a tape backup at the end of every session. Well, after one particularly furious day of hacking, I decided to write my backup tape and go home for the day. My normal procedure was to mount my backup tape and use ROLLIN to copy an entire disk-image to the tape. Unbeknownst to me, the procedure that I used had the effect of first initializing the tape before making the backup. This had always worked just fine. But on this particular day, I had been working on my disk I/O routines and apparently had somehow managed to write garbage on some unknown portion of the disk. I had no idea that anything was wrong as I went to make my backup tape. As usual, first the tape was initialized, then, as ROLLIN began to write the disk image, the program hung! There I was with no backup tape and having major problems making a backup. My next move was to panic. After settling down somewhat, I tried rebooting the operating system and making the backup again. Still the same problem. Then I remembered about the DECtape drive on the machine. If I could only find a DECtape and manage to individually transfer the files that I needed I would be home free. I ran over to the cabinets and began frantically looking for DECtapes. AHA! I found one! As I ran back over to the computer, I took a bounding step and landed on the side of my ankle. I proceeded to lie on the floor writhing and screaming in agony for the next fifteen minutes. "This just isn't my day," I was saying to myself. When the pain began to subside I tried to get up. I couldn't walk on the ankle since it hurt so much. 
So I hopped over to the DECtape drive and mounted the DECtape. Then I hopped over to the console and sat down. At least something went right that day, as the machine allowed me (without hanging) to individually transfer all my files to DECtape. I then read a clean version of the operating system onto the disk and proceeded to transfer all of my files from DECtape back onto the disk. This time all went normally with the magtape backup and the world was safe again for future hacking. Fortunately my ankle wasn't broken. It was only severely sprained. For the next few weeks I was forced to do my hacking with an ace-bandage wrapped around my ankle. ------------------------------------------------------------------------------ Date: 28 Oct 1982 20:26:51-PDT From: Kim.norvig@Berkeley Lucky for me, most of the stories I remember are happy ones, not horror stories. My favorite story about someone else is when Jim Meehan was writing TALESPIN, his AI program that generated stories, mostly about birds and bears running around the forest. One story started off fine, then started to slow down, and finally ended with the line "Joe Bear thinks that FREE STORAGE IS EXHAUSTED". Oh well, I thought it was cute. ------------------------------------------------------------------------------ Date: 30 Oct 1982 1635-EDT From: RG.JMTURN at MIT-OZ at MIT-MC The experience that still makes my skin crawl is the time I was debugging some Lisp Machine board at the MIT AI lab. I had spent several hours trying to isolate a noisy signal which seemed to be tied to another one, but I could not find a common wire and I had replaced all the common chips. In desperation, I pulled out the board and yanked the extender, about to give up hope. As I stared down at the extender, I muttered some curse to the designers of the machine...and noticed a solder splash on the extender shorting two lines! For ghu's sake, if you can't trust your tools, what can you trust? 
On the other hand, for an example of the other extreme, this week, I was in Montreal doing an installation for Lisp Machine, Inc. A crufty Bus Interface seemed to be making the machine go 1/2 speed, and sometimes fail entirely. The person I was working with and I decided to call it a day around 5, and go to our hotel. When we came back the next morning, the machine worked perfectly. The best we can figure it, the machine wanted us to be able to have a night in Montreal, and the afternoon the next day... ------------------------------------------------------------------------------ Date: 30 Oct 1982 03:44:28-PDT From: CSVAX.fishkin@Berkeley My name is Ken Fishkin, and I'm a grad at Berkeley. My most painful hack occurred while hacking a 6K-line C database program at the University of Wisconsin-Madison as an undergrad. My program worked perfectly, with all debug prints on. When I set my 'const' debug to false, however, the program would crash! To make things even more fun, if I deleted 1 debug print the program would still run correctly, but if I deleted another instead it wouldn't! I wound up doing a sort of tree traversal, individually deleting some 200(!) debug prints, finding the proper sequence of delete-compile-delete that would keep my program intact. To this day, I still have no idea what was wrong with the program. ------------------------------------------------------------------------------ Date: 2 Nov 1982 1128-EST From: MASON at CMU-20C Many roboticists have reported the following demo problem: when filming or demonstrating, we often raise venetian blinds, turn on the lights, or bring in floods. The increase in ambient light may cause optical-interrupt type sensors on the robot to stop functioning, and the heat from floods may affect other components of the system. Thus a system which has functioned flawlessly for months begins to malfunction the very minute the generals arrive. 
Real-time programming has its special frustrations, but the most difficult bugs arise from difficulties in the timing of process interactions. Most of these are too complicated to make good stories. One of the most confusing PDP11 bugs I had may be worth telling. When a byte is pushed onto the stack, the stack pointer is first incremented to keep the pointer at word boundaries. Hence the odd byte is garbage, left over from no-longer-active stack frames. I had a program which pushed a byte, but popped a word, thus accessing this garbage. Even careful inspection of the code didn't turn up this violation of stack discipline. The worst part is that the manifestation of the bug would vary depending on which process last used the stack. In particular, the bug became invisible when single-stepping with our symbolic debugger---the debugger (im)providentially cleared the relevant byte in the act of saving some registers. This reminds me of another PDP11 bug. Our 11/40 had a micro-code error. The SOB instruction (subtract one and branch, used for simple loops) didn't test the TRAP bit, which is used by debuggers for single-stepping. Hence, when single-stepping, the programmer was not shown the instruction following the SOB. It was executed "in secret", with very confusing results. ------------------------------------------------------------------------------ Date: 2 Nov 1982 17:19:35-EST From: jfw at mit-vax at mit-xx Two summers ago, while I was working on an improvement to our UNIX at LL-ASG, I fired up a test version a little too fast, and watched with puzzlement as the filesystem check program started printing out random things. I wound up killing a 100Mb filesystem full of useful things. After 2 weeks of poring over the code I wrote which did that, I found the bug: " = " instead of " |= ". One character did all that... 
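For readers who have not been bitten by this one, here is a minimal sketch in C of how " = " and " |= " differ on a word of flag bits. The flag names and helper functions are made up for illustration; the story does not say which variable or which bits were actually involved.

```c
/* Hypothetical flag bits; the original story does not say what was being set. */
enum {
    FLAG_MOUNTED  = 1u << 0,
    FLAG_READONLY = 1u << 1,
    FLAG_DIRTY    = 1u << 2
};

/* Buggy version: plain assignment throws away every bit that was already set. */
unsigned mark_dirty_buggy(unsigned flags)
{
    flags = FLAG_DIRTY;
    return flags;
}

/* Intended version: OR-assignment sets one bit and leaves the others alone. */
unsigned mark_dirty_fixed(unsigned flags)
{
    flags |= FLAG_DIRTY;
    return flags;
}
```

Called with FLAG_MOUNTED | FLAG_READONLY, the buggy version returns only FLAG_DIRTY, while the fixed version returns all three bits. Both lines compile without complaint, which is presumably part of why such a bug can take two weeks to find.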
------------------------------------------------------------------------------ Date: 4 November 1982 0036-EST (Thursday) From: Mark.Sherman at CMU-10A As an undergrad I worked as systems staff on a time sharing system that resembled Multics (called DSL/TSS - think of it as Unix on HP21 series machines). On such systems, the login program is like any other program; when a user sits down he "calls" this program from a predefined file system path to gain access to the system. For some unrememberable reason, I had to make some modifications to this program, did so, and installed the new version. The only real way to try this program out was to log out and then log back in. Having logged out, I tried to log back in. To my chagrin, I had accidentally set the protection on the new login program to read instead of its normal read-execute. Thus the system refused to run the login program. By S.O.P., this would not be a problem - when doing such a drastic change, we always made sure that at least one other systems programmer was logged in so that he could patch anything that was necessary, like changing access control on the login program. Before my attempt to change the login program, there were two other systems programmers logged in. After my mistake, I walked over to the two other staff people only to find that they had both logged out - after all each knew that the other was logged in and so saw no reason to stay on as the "protection". Thus there was no way to log into the system and no way to patch it while it ran. We had to move the system to a spare disk, boot a backup system, bring up the extra disk with the file system containing the bogus protection as a "raw" disk and use a special disk utility to set the one necessary bit giving execute access to the login program. 
------------------------------------------------------------------------------ Date: Thursday, 4 November 1982 01:39-EST From: Skef Wholey CMU's 15-311 is indeed a source of horrors, and I experienced a rather horrible one in that class last year. There were five of us in our group, which we called "SPAM", each of us competent hackers. Our project was a 68000 simulator and debugger, which would run 68000 machine code and let you look at registers and memory and so forth. Our work progressed on schedule (with the aid of many all-nighters), and we were able to run simple assembly language programs just about a week before the demo. Being a rather noisy bunch, wanting our demo to be as slick as possible, we decided that we'd run a backgammon program written in C compiled with cc68. We had used small programs compiled with cc68 to test the simulator. The programs were small enough to compile and assemble on a Vax, print the hex object code, and type it into a file which we would load into our simulator. The backgammon program was too large for this, obviously, so the object code was FTP'ed to another machine, put on tape, and brought to the Computation Center, where we pulled it off of tape and loaded it into our simulator. The program didn't work. It didn't work the day before the demo. We found a few bugs in our simulator, but worse yet we found bugs in the cc68 compiler, now N machines away. Fixing these we found bugs in the game playing program itself. Compiling the program on the Vax and transporting the object code was out of the question at this point -- too little time left before the demo (we had all announced that we'd appear in coat and tie). So we ever so carefully patched the hex files, and voila! The program ran beautifully. That year Comp Center gave each undergrad who needed a computer account an account on each undergrad machine (TOPS-D and TOPS-E). These machines were on Comp Center's DECnet: not a reliable network at that time. 
We had the current version of our system and the patched hex files on TOPS-D, because the load was lower there that night, but were scheduled to demo on TOPS-E terminals. DECnet was, of course, down for quite a while, but finally came up. We quickly transferred the current system to the E and ran back to our rooms or homes to shower and dress. We marched triumphantly into the terminal room and sat at our terminals while our SPAMmascots fed cookies to the waiting crowd and our professor. The system came up fine, and we demonstrated how to deposit into and read from memory and registers before moving onto the demo programs. We loaded the hex files, set breakpoints at our test locations, and lo! IT DIDN'T WORK. We were all somewhat bummed and embarrassed, and managed to muddle through at the mercy of this mysterious adversary that had destroyed a system that worked an hour before. The professor suggested that we get our act a little more together and have a somewhat less flashy demo in his office a few days hence. The problem: we had neglected to copy the patched hex files from the D to the E. We were demoing buggy 68000 code. The second demo went a bit better. We now laugh about the first. Comp Center no longer gives out accounts to one student on more than one machine. Good idea. ------------------------------------------------------------------------------ Date: 4 Nov 1982 8:36-EST From: Ed.Frank at CMU-CS-VLSI at CMU-10A While working on the software for a Graphics terminal we built at Stanford, I ran into the following problem. The software was written in assembly language, and was burnt into EPROMS. For a long time the software easily fit in four 2708 (1K x 8) EPROMS. Well, one week after adding the graphics support code to the terminal, I simply could not get it to work. I spent literally dozens of hours going over at most 500 assembly language statements, to no avail. Things were so bad in fact that I seriously began to question my abilities as a programmer. 
One evening while I was checking the output of the assembler (for at this point I was convinced it was an assembler bug) I noticed that one of the target addresses of a jump was greater than FFF (hex). I didn't think anything of it, until a few seconds later when it occurred to me that addresses > 4K required 5 PROMs. I quickly went back to work, burned the extra EPROM, and the program worked perfectly!
------------------------------------------------------------------------------
Date: 4 November 1982 0955-EST (Thursday)
From: Richard.Korf at CMU-10A (C410RK40)

My favorite bug of all time concerned an ASR35 Teletype. I was trying to format some output and found that directly after printing a long line, the second line was indented by one space. Naturally, the bug went away when I ran the debugger. It finally turned out that the printing head was physically bouncing off the left hand stop. If it didn't have to print again immediately, it would have a chance to settle back to the beginning of the line.
------------------------------------------------------------------------------
Date: 4 November 1982 1134-EST (Thursday)
From: Steven.Shafer at CMU-10A (C410SS40)

I had a nasty experience with an old PDP-11/40E running UNIX. I had written a program which juggled several processes, one of which was the largest core-image of any program in existence on the machine (<64K, of course). One day, it died a sudden death. I started tracking it down with print statements. At first, the problem looked like something being set to 0; then, as I added more debugging code, the 0's jumped around. I never knew which routines they would crop up in, or whether global data structures were affected, or even if code itself was being overwritten. Sometimes, the program would die even though the debugging code showed nothing extraordinary. I eventually gave up and rewrote the program from scratch, using smaller processes and succeeding.
Several months later, a paging bug was fixed: it was responsible for writing 0's on pages when the core-image of a process was beyond a certain length. What makes this a horror story is a UNIX vagary tickled by the bug: within the code being executed, there was a statement to close a file. The file, like all UNIX files, was indexed by a small integer. When the zeroes struck this variable, the effect was to close file 0, i.e. disconnect the keyboard! So, not only did the program die, but it refused to talk to me long before the actual moment of death, leaving me to watch helplessly as it writhed in agony, unable to talk to it, unable to interrupt it, and never knowing where the Flying Fickle Finger of Fate would strike next! ------------------------------------------------------------------------------ Date: 4 November 1982 1411-EST From: Ellen Lowenfeld at CMU-10A This one's kind of embarrassing, looking back on it... When I was a sophomore at Brown, I took a course which had a big project, I guess like 311 here, except that the groups were pairs. So that I and my partner could test pre-compiled code separately (IBM 370, batch mode) we each had a dummy main routine. Mine printed its name, and then called whatever routine(s) I wanted to test. Unfortunately, I left out the quotes around its name, and sent it into infinite recursion. IBM's great error message once I found it after looking in 3 manuals, and poring over pages of IEFH01X (or something like that), was "user error". Not until I had spent most of a day looking for a wizard did I go back and just look at the code I had written. Was my face red when all the people I had talked to while trying to find out the problem asked what it turned out to be! ------------------------------------------------------------------------------ Date: 4 Nov 1982 13:09:55-EST From: Neil.Swartz at CMU-RI-FAS at CMU-10A Several stories come to mind. At Princeton, they had WATFIV on a 360/91. 
You got 2 seconds of computer time and 600 lines of output. One job came out in WATFIV that printed a line of characters and then overstruck the characters again and again. The computer counted this as one line so it would do this forever. The print heads tore through the paper, the ribbon and started in on the carriage. The system was down for more than 12 hours. Another good one which I have heard about- (If anybody knows more about this I would like to hear about it) The Phantom Teletype Program. The way it worked was this: At a random time interval the program would start up and pick a teletype on the system. It would print "The Phantom Teletype Strikes Again!!" and then it would copy itself somewhere else on disk, set up the parameters for its re-execution, and delete the old copy. System programmers could find out where it had been, but not where it was currently. Because it was too difficult to track, they left it on the system. ------------------------------------------------------------------------------ Date: 4 Nov 1982 1538-CST From: CMP.LSMITH at UTEXAS-20 My first hacking horror story goes back to my very first programming course. My program kept exceeding its time limit and aborting. I checked my code carefully and decided it was correct, but only needed a little more time to finish. So I confidently upped my limit from 7 seconds to a CPU minute of CDC 6600 time. I was really horrified when it timed out again, blowing my entire semester's allotment. A sharp consultant found my bug. I made the FORTRAN equivalent of "FOR X = 1.0 BY 0.1 TO 10.0," with my final test an equal. Since 0.1 is a repeating fraction in binary, it never equaled 10, so it went past and on to infinity. Years later I was working on a PDP11/45 Unix system. The system began crashing some time after we retrieved something from the backup tapes, using Unix's raw mode access to the tape. In cooked mode, things worked right, so we knew it couldn't be a hardware problem. 
After some months of trying to debug the problem, we modified the tape device handler so that it spun and monitored its registers until the transfer completed. One of the high bits in the address register was sticking off. In cooked mode, Unix read into its system buffers in low core and everything worked because that bit stayed off anyway. In raw mode, it read into user space directly. Whenever the address register was incremented past that bit boundary, the DMA transfer would drop down and wipe out some random locations and the system would slowly collapse. The worst horror stories are when you spend days hacking at a program, only to discover that you've invoked a compiler bug. We are extremely fortunate to have the ELISP system. I had a problem with a lengthy computation sometimes returning NIL from compiled code. Between the (RETURN RESULT) in the called function and (SETQ X (CALLED ...)) in the caller, the value was being lost. Interpreted, it worked. If I traced the function, it worked. If I traced any function in a chain below it, it worked. It turns out that if you have a chain of calls about 10 deep, then a MAPCAR over a list of at least 3 values, then about three more calls down, and all the functions are compiled, then the time bomb NIL is stuck up on the stack. If any function in the chain is interpreted, for example by tracing it, then the behavior goes away. As far as I know, this bug still hasn't been found. ------------------------------------------------------------------------------ Date: 4 Nov 1982 20:08-EST From: Victor.Milenkovic at CMU-CS-IUS at CMU-10A One version of the PL/I debugger at Yorktown had no provision for displaying the hex values of pointer variables. However, it would, on request, display the hex address of any other type of variable, as well as its value. And so, in my program, I would create records, containing a single float variable, based at the pointer I wanted to see, and recompile. 
By requesting the address of these records, I could determine the value of the pointer. In PL/I one can allocate an area of memory and declare offset variables into it. One can freely assign offset variables into pointer variables and back again -- or so I thought. If a pointer to offset assignment results in a negative offset, nothing complains (although it should), but if one assigns the offset back to the pointer, it gets garbage. This peculiarity caused a very tenacious bug. ------------------------------------------------------------------------------ Date: 4 November 1982 2210-EST From: eddie caplan at CMU-10A i was doing research in the computer music lab. i was trying to generate emotional responses in subjects by producing sympathetic vibrations from the 64 loudspeakers surrounding the listening room. normally, we would add sub- and ultrasonic frequencies to classical "standards", and then play them to the subjects. now, usually we just use frequency modulation to synthesize the instruments of the classic orchestras. but one day as i was making an undergraduate volunteer retch to beethoven's seventh symphony, a thought struck me. if i changed to additive synthesis for the instruments, i could elicit REALLY BIG responses! i mean, i had been having pretty good results up 'til then, and i wasn't complaining. but, with FM there was lots of data lost. additive synthesis would make the music itself generate an emotional response. full fidelity beethoven combined with me could convert hasidic jews to catholicism! so, i spent the next week redoing the beethoven. i finished at 2:30am, and the only other person around was my officemate, dana. i asked her if she had heard beethoven's seventh recently. i told her that i had a recording of boston symphony conducted by klaus tennstedt. i still remember her eyes lighting up at the prospect. i hated to lie to her, but she couldn't be told the truth or the data would be tainted. 
i had to expose her to it without her suspecting. i put dana into the listening room and turned on the music with my sub- and ultrasonic frequencies added. i watched through the soundproof glass from the observation room. during the first movement, dana cried uncontrollably. she curled up in the chair and whimpered. dana laughed insanely, and had what appeared to be several orgasms. "i've done it!", i cried. but then, the second movement began. i shudder still when i think of it. i looked in at dana. she was sitting upright in the chair, staring straight ahead, her hands gripping her knees. there was blood starting to drip from her fingernails. she was becoming catatonic and starting to shake. i had to halt the processor before permanent damage was done. but before i was able to stand, dana let out an excruciating scream. she shook violently and fell to the floor. then, dana began to float into the air. i pulled open the door and rushed into the listening room. dana was screaming far above my head. beethoven was screaming from the 64 speakers. then, i called her name. it was too much. dana dissolved. i think that the added sound of me yelling to her exceeded the threshold. i know now that i am to blame for her dissolving, and that i'm responsible for bringing her back. perhaps it can be done with bartok. dana always liked bartok.
------------------------------------------------------------------------------
Date: 4 Nov 1982 22:08-EST
From: Rob.MacLachlan at CMU-CS-SPICE at CMU-10A

I ran into my most obscure bug last summer when I was working on a boot image builder for Accent to run under Accent. What I had to do was convert the original program, which had POS filesystem calls that read and wrote random things scattered throughout it, to use the Accent primitives, which read and write an entire file. After factoring this code out into a separate module I found that the program died the same way about one time out of five.
Since the debugger was virtually non-existent I proceeded to put in debugging code. First I put in a check where it was dying for the fatal condition, which would print various information. I found that when the error occurred the cause was that the Pascal Get intrinsic was returning a random value instead of the correct one, but no particular pattern was observable. I then put in code to dump the contents of the Pascal file object after every value read from the file to see if it was getting clobbered; with this code in place the program died with an illegal memory reference inside the system print routine inside of one of the debugging WriteLn's. At this point it was obvious that something earlier in the program was damaging the environment somehow, so I tried successively commenting out earlier parts of the program to find the offending code, and I found that if I did not read an earlier file, then the problem did not occur. This caused me to suspect my file handling module, so I put debugging code in it to check that all of the pointers it was returning were valid. When this debugging code was inserted the program then died earlier in the program, but this time consistently during the reading of the third microcode file. Insertion of debugging code at this point revealed that up to a point the buffer contained the correct data, but the rest was zero. At this point I felt reasonably sure that I had found a bug in Accent, so I called in the wizards, who looked at the address of the buffer and said: 'Oh, that crosses a 64k boundary'. Evidently it was a "Known" bug that a Pascal object could not cross a 64k boundary, because the address calculations wrap around, and the ReadFile routine I was calling read the file into a place in memory such that it crossed a 64k boundary. The execution of the debugging code I put in caused storage to be allocated, thus causing the heap to cross a 64k boundary earlier in the program.
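
The wrap-around the wizards described can be sketched in a few lines. This is only an illustration, assuming the address calculation keeps the low 16 bits of the offset and throws away the carry into the next 64k block; the function names and the sample base address are made up for the example and are not taken from Accent or its Pascal runtime.

```python
# Hypothetical model of an address calculation that wraps at 64k.
# Assumption: only the low 16 bits of the offset are kept, so the carry
# into the next 64k block is lost. Not the real Accent code.

BLOCK_SIZE = 1 << 16          # 64k
OFFSET_MASK = BLOCK_SIZE - 1  # low 16 bits


def correct_address(base, index):
    """Address of element `index` when the carry is propagated."""
    return base + index


def wrapped_address(base, index):
    """Address of element `index` when the offset wraps inside its block."""
    block_start = base & ~OFFSET_MASK
    offset = (base + index) & OFFSET_MASK
    return block_start | offset


# A buffer that starts 16 bytes below a 64k boundary (made-up address).
base = 0x1FFF0
for index in (0, 15, 16, 17):
    print(index, hex(correct_address(base, index)), hex(wrapped_address(base, index)))
```

Under these assumptions the first 16 elements resolve correctly, and element 16 lands at the bottom of the same 64k block instead of the start of the next one. That would fit a buffer that looks right up to a point and wrong after it, and it would also explain why allocating a little extra storage moves the failure around.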
------------------------------------------------------------------------------
Date: 5 November 1982 0122-EST (Friday)
From: Tom.Lane at CMU-10A

I have spent too many years of my life hacking systems which tried to enlarge a processor's address space by using software-controlled bank switching (C.mmp/Hydra & Cm* locally, Hewlett-Packard 9845 out in the real world; personal computing CP/M systems seem to be going down the same garden path). These machines extend a processor with (say) a 64K address space to handle megabytes, by dividing the processor address space into two to 16 blocks. Each block is mapped to a block of physical memory by means of an associated processor register. Accessing a particular memory location requires loading up one of the map registers with the block number of the location, then accessing the processor-visible address "register number * block size + location's offset within block".

This scheme is a LOSER. The majority of bugs found in each system I have worked with have been directly related to bank switching; it's just too easy to forget to load or restore a map register. This leads to reading or clobbering semi-random locations in blocks other than the one wanted. Worse, the bugs are often very difficult to duplicate, since they only show up when two data structures being manipulated at once happen to reside in different physical blocks. HP's testing records showed that 75% of the bugs discovered during system testing were of this ilk; many of them required an unreasonable amount of effort to track down.
------------------------------------------------------------------------------
Well, I worked for this large 3-letter organization funded by a certain 3-letter government agency, as part of a large 3-letter University located in Pittsburgh PA. They decided that they could save money by hiring fresh undergrads for systems work, instead of experienced (and expensive) Unix professionals. I was given a MicroVax II as my computing engine.
Each of our microVaxen had two 70Mb drives, which were coupled via links into a single file system. This ran EMACS a great deal more slowly than my PC ran Epsilon, and I found that I could edit faster if I used the 24x80 screen on the serial line. Eventually, I was so frustrated by the incredibly poor environment that I demanded, and got, an IBM PC in my office, on which I proceeded to do my work. I would occasionally upload a file from the PC, until they broke Kermit (by replacing it with a non-working version) and used to read my mail on the Unix box (until they installed a new mailer which broke all existing mail-reading programs; the program I used was written in a subset-Ada compiler for which the original compiler sources had been lost, and the current compiler binary wouldn't run). Then I stopped using Unix entirely. Every night, a daemon job on the central file server (a VAX/750) would back up as many user machines as it could reach, which turned out to be all of them. This was done faithfully. I went to Australia for 6 weeks. When I returned, I found that I had made more progress in that time than most of the people who stayed... It seems that the documentation group was working on the major 1-and-5-year plans, and had filled up their disk. "No problem" quoth Facilities, "We will install a new, larger disk for you". Said disk was installed, first thing in the morning, and the backup tapes were restored. The disk was empty. "What of our backups?" said everyone... Well, it seems that the backup software was very clever and didn't follow links, so every night, faithfully, we had backed up the Unix kernel, man pages, /usr/bin, /etc, and all those valuable, irreplaceable bits that were identical from machine to machine. But not a single user file! "So give us back our old disk" said the documenters. 
At this point my memory and/or the story gets a little muddled, but the upshot is that the original disk was no longer readable (or had been reformatted, or had suffered hardware failure), and the entire pair of plans were gone, three days before delivery! So, they tried two attacks. The first was to co-opt the Kurtzweil (sp?) optical-character-recognition system and try to optically scan in the entire document. This was not entirely successful, and the secretary of one of my colleagues went slightly nuts because every time it didn't recognize a letter it displayed the bitmap and asked for a correction, and she spent a couple days at this. Meanwhile, everyone in the building who could type was given some segment of the document to type in. Who says parallel processing can't work... Meanwhile, the backup procedures were undergoing some changes, and were being taught to follow links. One night, one of the systems hackers was startled to see his entire file system disappear out from under him... He quickly investigated and found out what happened, and shut down the automatic updater, but not before it had nuked about 30 machines. These machines, of course, did not have any backups... The story, as told to me, was something like this... Every night this program ran on the master file server, whose goal in life was to bring /usr/bin and other such files into conformance with the master copies on the file server. Any file on the file server which was newer than one on the target machine was sent to the target machine. Any file /not/ on the file server, but present on the target machine, was deleted (so that obsolete programs could be deleted simply by deleting them from the file server, and they would disappear from the client machines the next night) But since the program would now follow links, it discovered /all these other files/ on each target machine /which weren't on the file server!/ Files like Mailbox, or any other user-defined file... 
So this updater promptly deleted them. One of these machines was mine. I lost 6 weeks of email. Interestingly enough, I seemed to have not suffered unduly from this particular problem. All my other work had been done on a PC, where I had a decent editor, and a printer (FX80) that could actually print what I worked on. So I lost no important work. But the concept of the hardcopy paper backup was singularly amusing. I once met a salesman for Kurtzweil and said "Oh yes, you sell computer mass-storage backup devices!" When he tried to explain that he actually sold optical scanners, I told him this story, to his considerable amusement. This is not the end of the story, however... The special switch was enabled that allowed the file daemon to back up the user disks by tracing down symlinks. This also seems to have had, as a side effect, an impact on the automatic updating software. It seems that this software would compare files on the user disk to files on the master file server and would download to the user disk any files which had changed on the master server. Since it did this with root privileges, there was no way, for example, for me to keep my version of Kermit from going away from my /usr/bin; I would have to put it in some other directory (this is the "we know best for you what files should be on your machine" attitude). It also noticed that if a file was /not/ on the master file machine and /was/ on your local disk, then it was a file which had been deleted, so it was deleted from the local disk. Now that the user's personal files were visible, the daemon looked around, saw that there were no files by that name on the master server, and deleted the user's personal files. Note that the backup had not yet run. We lost something like 30 or 40 complete machines. Months of work, with no backups. It was discovered before it did in all 80 by one of our late-night hackers, who had his file system disappear out from under him. 
Fortunately, he was a systems person and deduced that something in the master update program was bad, so he was able to kill it before all the machines were nuked. What was well and truly amazing was that the facilities manager was still employed by noon the next day. In fact, he stayed on for months. Another moral: make sure procedures work, and when they fail make sure that your backups (which from moral 1, above, you already know work) are current. I arrived back from a 6-week trip to Australia 3 days after this happened. I was the only one who had not lost any work. My machine was nuked; all the mail I had received while I was away was gone, but no important work had been damaged. This was because I did it all on the PC, which I, personally, took responsibility for backing up every day. In 6 weeks, I made zero progress. Everyone else made significant negative progress. I have not trusted any central facilities management system at all, ever, since that incident, and nothing I've experienced since has convinced me I'm wrong (although there is more evidence that I am right). Fortunately, I was able to escape from that place (although it took nearly another year) and go out and do something useful with my life. ------------------------------------------ The mail program insisted upon verifying addresses before it sent mail. For some reason, if it didn't find "wherever.someplace.domain" in its list, it knew that it couldn't send the mail out (never mind that full domain addressing was in place for the mail server. This was the mail program. It wouldn't just hand the mail off to the server and say "here, make your best effort". It refused to hand it off at all, even though the server was perfectly capable of delivering the mail). It used an index into the massive database of known hosts, kept on each machine. Since lookup was cheap, it would look up the address 3 or 5 times (I forget which) each time a verification before sending was needed. 
The author was careful; if the index file was unavailable, he searched the host table sequentially. The operations staff, in response to complaints from users about disk space (remember, the 70Mb nanodisks), deleted the index from the master server. The update program promptly deleted the index from all the local disks. It now took 5-9 minutes to send a piece of mail. Longer, if the name couldn't be found. If they put the index back, everyone would get it. If they put it only on my machine, it would be deleted later that night by the updater. I seemed to be under the delusion that I had a better understanding of my needs than they did. For me, the MicroVax as a mail machine was more important than its use for any other purpose. I needed that index. No luck. I obviously didn't know what was best for me. I stopped using email entirely at that point. When I missed an important (to him) meeting with the director, he complained that I hadn't shown up. I explained that I did not use email because it was so broken that it was unusable. He was shocked. ---------------------------------------------------------------------- While at another 3-letter organization associated with the same 3-letter University, I was working on some development. The development protocol was to archive the sources in RCS, check them out, edit them if needed, and do the build. The RCS vault files were kept on the "RCS volume". The source files were checked out to the "source volume" and built to the "release volume". Since I was the only developer at the time, and all the files were being changed minute-by-minute, there was very little apparent need to check them in. So I continued editing. We had a server crash. Sigh. I'd probably lose a few hours work, since backups ran overnight. I got a call that evening asking if we had anything important on the source volume "Only my last week's work" I said, not yet concerned. 
Well, it seems that everyone "knew" (never mind that new hires were given an office, an undocumented workstation, some undocumented software, and told "make something new happen", and /absolutely nothing more/, like perhaps an overview of the system...) that source volumes are not backed up. Since they are only transient places to hold the expanded RCS vault files, they are all "derived" files and don't need to be backed up! Fortunately, my work was rescued. However, it proves once again that "You Can't Trust Facilities Administrators" because they set policies and never again bother to explain them, certainly not to new hires. I could have lost all my work, which in retrospect would not have been so bad, since I never got it to work at all because basic tools (compilers, debuggers, printing facilities, document production, text editors, etc.) were somewhere between laughably obsolete, quaint, terminally cute, or broken. Not one piece of software, with the exception of the mail system (which I'm using right now), was close to the quality, completeness, correctness, or general utility of what I use daily on my PC.
------------------------------------------------------------------------------
When I worked at the Faculty Computing Center at Central Connecticut State University, I spent a lot of time running around doing software upgrades for faculty members. One day, the head of a department called my boss and said she wanted to upgrade to the latest version of Word Perfect. He told her to make backups of all of her documents. She said okay, and I got sent over a couple of hours later. "Make sure she made the backups," my boss said as I left.

I sat down at her computer and asked, "You make your backups?" She said yes, so I deleted all the files in her Word Perfect directory and installed the new version. "Okay," I said, "you can put your data back on," and I left.

I had just gotten back to my desk when my boss got a frantic phone call. "Where's all my stuff?"
she wanted to know. "What do you mean?" my boss asked. "Well, all of my reports and letters. They're gone!" "Copy them back from your floppy disks." "WHAT floppy disks? He didn't give me any floppy disks." "Hold on a minute." Boss conferred with me, went back to the phone. "You told Jeff that you made backups." "Well, I didn't know what that meant, so I said yes because I didn't want to look stupid." ------------------------------------------------------------------------------ I was an SA out in NY, and once a new user approached me and asked why I would periodically delete files from the system. I told her that there were only so many A's, B's, etc. in the system and that when all of one particular letter was "used up" the computer would just substitute a blank on the screen when you hit the letter. Therefore, I had to clear off old files to free up some of the alphabet for new files. :^) She believed me and tried to explain this to a coworker. It was pretty funny to watch!! ------------------------------------------------------------------------------ I was teaching a very basic class in BASIC Programming to a group of adults. Adults who have never been around computers before are very nervous and much harder to teach than children, however I am a patient person so I enjoy their successes. However, I must share the following: After putting a short program on the board, I told the students to type "R", "U", "N" and press return to see the program execute. A hand went up in the back of the room, waving to get my attention, and the person attached to the hand said, "I did what you said and it didn't work". Knowing full-well that all of us make mistakes when typing at the computer, I suggested she retype "R", "U", "N" and press return. A few seconds later, the lady's hand goes up again. "It still doesn't work", she said. So.. I went back to see what the problem was ... 
only to find that instead of typing RUN, she had typed in the following:

ARE YOU IN

Could I make this up??
------------------------------------------------------------------------------
How about a young system administrator who decided to free up some space in root by moving all of the /dev entries to another partition and symbolically linking over to the new directory. A slightly closer read of the documentation that suggests this technique for other areas specifically suggests NOT doing this to /dev since the machine won't run.../boot nada... a couple of hours later after a complete reinstall of the OS (SunOS 3.?) from tape (what a pain...) things were fixed and I was a bit wiser...
------------------------------------------------------------------------------
(A prime example of Net interest...someone mentioned a system support call where a new sys admin decided to save space by compressing this "worthless" file vmunix. So naturally all the folks on the net run a comparison to see just how much the file really does compress. Please note that this is not advised, as the system will fail to operate after compressing is complete.)

SunOS 4.1.1 on a Sparc 2   compressed to     61.3%
Ultrix 4.2, with AFS       ................. 54.7%
Pyramid OSx 4.1            ................. 49.5%
SYSVR2 on NS32000          ................. 69.0%
Ultrix 4.2                 ................. 47.2%
Apple's Unix 2.0           ................. 60.1%
MIPS                       ................. 57.1%
Amiga (SVR4)               ................. 51.9%
SGI IRIX                   ................. 55.76%
SIEMENS SINIX v2.1         ................. 68.74%

(Ever wonder how much MSDOS.SYS would PKZIP? Leave it to the net...)

DOS 5.0:

file          orig. size   .ZIP size (%)   .LZH size (%)
------------------------------------------------------------------
io.sys        33430        21674 (64%)     20926 (62.6%)
msdos.sys     37394        27718 (74%)     26711 (71.4%)
command.com   47845        29966 (62%)     29171 (61.0%)
---------------------------------------------------------------------------
Back many moons ago, I had become the admin for one of the two Data General MV's used by the languages group. Since disk drives in that era were generally removable 190 or 330 meg disk packs from CDC, we often had alternate packs on top of the drives. I came up with the bright idea of putting a sticky label on the inside spindle to label the pack (instead of on the plexiglass cover). After I loaded the disk, and spun it up, it took about an hour for the label to come free and fling itself into the head assembly. They had the entire school of field engineers in training come by because it wasn't often that they got to rebuild an entire disk drive.
---------------------------------------------------------------------------
One day I had the monumental task of removing a user from our eight-user NeXT (that's 12.5% of the users!). The NeXT has a window-based, menu-based user administration program in which you (as root) simply highlight the user's name and select 'Delete User.' After a confirmation, the program deletes the user's entry in the password files and removes their home directory.

Well, I went in to delete a user. But I missed on the mouse click and the user's name didn't get highlighted. My hand being on cruise control, I selected 'Delete User' and confirmed it. Moments later (while the disk was still running), I realized what had happened. You see, the program, by default, highlights the first user on the list. Namely, the bin user whose home directory is /bin. Whoosh. Everything under /bin was gone. Unfortunately, the NeXT's boot disk was locked away and inaccessible to me.
I had to wait a few days before it could be brought out and the OS rebuilt on the hard drive. The solution: NeXT should change the program so that NO user is hilighted by default. --------------------------------------------------------------------------- >SunOS uses kill -1 to tell init to recheck the ttys file. SCO uses kill -2. Yes, the init between SunOS and other operating systems are confusing: once my assistant didn't realize that the main development machine was running BSD, not ATT like the other machines. She typed "init 0", and our BSD system really didn't like running multiple copies of init. Of course, some of us learned that we could quickly log out by typing "kill -1 0" to kill all the processes we own. But if you`re root at the time... Anyway, the first day I got root was soon after I made the big switch from sh to csh (aw, c'mon, I was going between v6 and BSD 4.1). Anyway, I had some finds to run, so I decided to run them at ultra-low priority. I used "nice -20". Oops. We had to reboot it. The system that nobody could su on confused us....until we finally found out that somebody hit CTRL-S on the console, and su wouldn't complete until it could write a message to /dev/console. Or the time we forgot one argument in our testing of: dump 0sdbuf 3100 6250 32b /dev/rmt0 /dev/sd0a and dumped *to* the root filesystem. But my personal favorite was our system which ran progressively slower each day. Funny, it had been up for so long recently, which was not typical. Finally, when the performance was unbearable, somebody happened to do a "ls /tmp", which took over 10 minutes. There were thousands of files in the /tmp directory! So every process had to lock that huge dir each time, and that was the bottleneck. But how did it get so big? The /etc/rc file's "rm /tmp/*" command had quietly failed with the error message "arg list too long". Now it uses xargs. 
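
For anyone who has not met "arg list too long": the shell expands /tmp/* into one argument per file before rm ever runs, and past a certain total size the exec fails. Here is a minimal sketch of the xargs-style fix, not the actual rc line from the story. It assumes a find and xargs that support -print0 and -0 (GNU and BSD versions do), and CLEAN_DIR is a hypothetical variable that defaults to a throwaway directory so the sketch never touches the real /tmp.

```sh
#!/bin/sh
# Sketch: empty a directory without expanding a glob on the command line.
# CLEAN_DIR is a made-up name for this example; it defaults to a scratch
# directory from mktemp rather than the real /tmp.
CLEAN_DIR="${CLEAN_DIR:-$(mktemp -d)}"

# Create a pile of files to stand in for the thousands left in /tmp.
i=0
while [ "$i" -lt 2000 ]; do
    : > "$CLEAN_DIR/file.$i"
    i=$((i + 1))
done

# find writes the names to a pipe instead of the argument list, and xargs
# splits them into batches small enough for each rm invocation.
# -print0 / -0 (an assumption: GNU or BSD tools) keep odd file names intact.
find "$CLEAN_DIR" -mindepth 1 -maxdepth 1 -type f -print0 | xargs -0 rm -f

# Count what is left; the arithmetic expansion strips any padding from wc.
remaining=$(( $(find "$CLEAN_DIR" -mindepth 1 -maxdepth 1 | wc -l) ))
echo "files left: $remaining"
```

The other half of the lesson is that the original rm failed quietly. Whatever form the cleanup takes, an rc script probably ought to log a non-zero exit status somewhere a human will see it.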
---------------------------------------------------------------------------
I work at the support hotline for a fairly large Unix vendor. Customer calls are intercepted by a group of receptionists, who determine the general nature of each caller's problem or question and then place it on an electronic queue. The receptionists attach a "headline" to each call, so that the support analysts can decide whether a particular call is in their area of expertise. Unfortunately, the receptionists are not generally familiar with Unix.

Spelling errors can happen.

"The cron log file has exceeded 250 mega bite"
"Air message on consol"

Sometimes there is strange imagery involved. Picture this:

"Cannot get into the library"
"Runaway process boards"
"Terminals need to be brightened up"
    ...you can ignore this problem until they're suicidal.
"Question about braking when dialing in from a modem"
    ...calling from your car phone?
"Does not see the boot"
    ...check the end of your foot.
"Terminal has no cusor and making a high pitch wine"
    ...mmmm, just LOVE that high pitch wine!
"Cannot get into Telnet"
    ...yeah, telnet is pretty boring.
"Constant memory vaults"
    ...you're using too many JUMP instructions.
"X's and O's on terminal"
    ...how cute, it's just telling you it loves you.
"Terminal density is gone - cannot see screen"
    ...someone call a physicist -- their system is losing its mass!
"Bust fault and reset of system"
    ...can the hardware guy install a bra?

There is some hardware we just don't support.

"Install wife terminal"
"Has a PC that knocks down all terminals"
"Foot disk needs to be reformatted"
    ...contact your chiropractor.
"Actuary on printer is out"
    ...are they at an insurance company?

This is clearly NOT a software problem.

"Trouble with electrical smell on system"

This one came up a few weeks after Gorbachev had his trouble:

"When logging on, getting overthrow signal"

Similarly:

"Warning regent table overthrow"

Here's a stumper.

"EGA controller error grade andy controller, bell doesn't work"

Users may get a little fed up.

"Is it possible to communicate with a Unix machine?"
"Too much paper during printing"

Sometimes, you just have to wonder...

"Getting a parody error"
"If terminal is off, can't get prompt back"
"Having ahard disfailure"
"Question about configuration of Woodperfect"
"Set off a background process accidentally and wants to kill"
    ...I, too, would kill after making such a mistake.
"Questions on fox based software"
    ...those animals really do understand relational databases!
"Problem logging onto root, gets Chinese characters"
    ...oh, your console is upside-down.
"Each time he accesses a dose you have to reset the terminal"
    ...wow, man, the screen is breathing...
"Kill process logs users off system"
    ...it does tend to do that.
"Question on repetitioning the disc"
    ...we have here a signed statement: you should increase swap.
"Q how to do PCP over x dot 25"
    ...please, don't network under the influence.
"UPS DOWN"
    ...and down is up, right, sir?
---------------------------------------------------------------------------
My debug story (to get back to the thread, having nearly lost it to whinging comp sci types wittering about how FORTRAN doesn't stop them from being stupid -- Real Programmers don't have a problem):

1987, a pdp11/70, running RSTS/E V8. Real World commercial timesharing/facilities management shop. Blinking lights, washing machine disk drives. 40-odd users on a 2MB sub-1MIPS machine.

I had been given the job of debugging a program that had been giving wrong answers for a number of months. This particular program ran for about 12 hours every month doing things to the stock file. Among other things, it tallied up the depreciation, and was producing the wrong answers. Several people had tried to track the bug down already. Data being fed into this brute was sorted in a way that didn't allow simple summarising of records, so it would be stored in a "virtual array".
A virtual array in this case is an array that lives on disk; you opened a file then dimensioned the array on the channel. In fact, there were three arrays: the main two-dimensional array, which held totals broken down into part number and something else which I forget, and two arrays containing the part codes and the code for the something else. This array would be closed when the program had finished and opened by the next program to run, which had the same array dimensioned over it. It would then read the array and tally up the depreciation.

OK, I said, let's see where the data is wrong. Tallied the array down each column. Produced a report from the real data showing the same data. They matched perfectly. I did the same going across the rows. Same result. Fine, I said. It's the second program. But careful examination couldn't fault it.

Finally, I declared war on the problem. I wrote a file dump program to see what was really in the array, as a sanity check that I wasn't missing something silly. Yes, there's the first array. There's the second. There's the main data array. 'ang on. The data in the second "boundary" array ended exactly on a block boundary. More scratching around revealed that there should be more data in that array. Of course! The data was in the main array, but there was stuff missing from the boundary arrays that told the reading program it was there! But why? More examination showed that the block where the array should have extended would have been the last block written before the file was closed. Hmmm....

Then I spotted something odd. The virtual array was closed with:

CLOSE #-1

This needs explanation. Under RSTS, when you open a file, a buffer gets allocated in your precious 32k words of virtual memory. Normally, when you close a file, RSTS would assume that you might want that buffer again and keep it allocated. A negative channel number in a CLOSE statement explicitly told RSTS to chuck away the buffer.
In this case, the program was going to terminate very soon anyway, so there was no point at all in using the negative channel. Basically, the negative channel CLOSE had been put in by someone trying to conserve memory, and doing it by "cargo cult" methods. I wondered: is the negative channel causing the buffer to be scratched before writing it back to the file? I removed the minus sign. The program worked.

-- Don Stokes (don@zl2tnm.gen.nz)
--------------------------------------------------------------------------
A friend of mine (name withheld to protect the innocent), who was used to C but needed to write a program in Pascal, told of a particularly disturbing debugging incident. It seems that there was a loop in the program that just wasn't getting executed! When you stepped through with the debugger, it would just step right past the loop, with no apparent reason for its failure to execute. It turns out that he had used braces { } instead of BEGIN/END to delimit the body of the loop. Braces, of course, are comment markers in Pascal.

-- Carl Witty (cwitty@ai.mit.edu)
--------------------------------------------------------------------------
Here is one of the stranger bugs I ever encountered. While trying to build a multi-user server for what was, at that time, a single-user 4GL database product called RAMIS, we designed a module that would open several update paths through a single file (table in relational parlance) for different users. The vendor-supplied API routine was documented to permit this (although, as we later found out, they had never tested it themselves because they "didn't see any practical use for the capability" :-)

Anyway, we tied our front-end screen-painter/forms package to our multi-user server and logged two users into the system. Each attempted to add a record (row in relational parlance :-) to a file that had exactly one record already in it.
The initial attempt appeared to work (both users got back OK signals which suggested the code had worked properly). Both users logged out and we ran a simple RAMIS report request to list the supposedly three records now in the file. Well, after waiting for several minutes for the report to come back (it typically should have taken about 2 seconds or less), we killed the execution. It occurred to me that the request we issued asked for the report writer to gather all the data, sort it and then print it. So I modified the request slightly to eliminate the sorting option and have it report each record as it was retrieved. Well, I ran the new test and it printed out:

KEY
---
RECORD 1
RECORD 2
RECORD 3
RECORD 1
RECORD 2
RECORD 3
RECORD 1
RECORD 2
RECORD 3
....

!!!&&$%####&@@@!!!##

The database had gone into a loop (read that again closely...) No, not the program, the database itself had actually put itself into a loop! RAMIS was implemented as a linked list of records and apparently, due to a bug in the API routine supplied to us by the vendor, when we had multiple paths open through a single file, the writes to the file would collide, and in this particular instance the pointer on RECORD 3, which should have pointed to EOF, actually pointed back to RECORD 1. So, as the report writer tried to collect records, it just kept going back through the same three records over and over again. We later tried it and let it run indefinitely and it ultimately abended when it ran out of temporary disk storage to put the three rows on (several million times :-)

We had to wait till the next release for a bug fix (about 6 months) and they indeed did make it work, but what a mess :-)

-- Jon Rosen
--------------------------------------------------------------------------
Some years ago I was putting together an X.25 driver under TCP/IP on a particular manufacturer's box.
The box had dual 68020's running *nix & the communication board (on which most of the dirty work for X.25 would be done) was dual 68000's (not running anything in particular except the protocol). I got the thing running, sort of, and started to do some stress testing. Once in a while, varying from 10 minutes to 10 hours after start, the communication between the main processor and the board would get hosed. I was using the "standard" subroutine calls which had been developed for the kernel/board communications by the folks who built the board. "Well," methinks, "must be them hardware types botched some boundary condition in them routines!" Spent about a week peering through the bowels of some pretty obscure code (comments in Italian!) to no avail. Board designer back in Italy doesn't believe it's happening!

Time is getting short. Need to stage LTD (Live Test Demo) for gummint. Management and technical types from Italy descend upon site of LTD about a month before actual date. Among them is designer of board. I recreate problem with board and stick his nose into debugging trace. NOW he believes me. Symptoms indicate producer pointer stepping all over consumer pointer in producer-consumer circular queue, causing queue to get consumed TWICE, violating something in the way the board software expected to do things, and causing it to hold its breath until it turned blue (translation: only a reboot cured it).

We found a workaround where, by INCREASING the amount of interrupt traffic between kernel and board, the bug only manifested itself about every 3 days, rather than every few hours. This was still a risk for the LTD, but I'd rather be lucky than good, and in this case we got lucky and passed.

Three months later, I get a FedEx package from Italy. In it is a note from the guy, and a replacement set of Pr. Logic Arrays for my boards.
Seems that there was a condition under which the PALs would go off the deep end IFF at the time an interrupt was being serviced there was a hex E in a particular memory location. (How he found this, I'll never know. I have visions of him sitting up nights with logic probes attached to every single lead on the board!). We had uncovered this bug because for our testing I had been shipping the largest convenient file around for the FTP test, which happened to be /unix. The LTD only required us to ship ASCII files around to pass the test. Ah well! -- Nick Landsberg -------------------------------------------------------------------------- This one is nasty, all right. It involves manslaughter by software. Many years ago I was speeding up our generic CODASYL database unload/resize/reload program so that customers would be able to restructure their production databases more easily -- it had been taking some shops 100 hours (yes, a long 4-day weekend)! The basic approach of the program, which BTW was Fortran with embedded DML running VAX DBMS on VMS, was to read a text file that listed a sequence of top-level (system-owned) records and sets to walk. Each record found was written to a flat file along with the key values of all its set-owners. The database was then destroyed, recreated empty with its areas the proper new sizes. Then the flat file was read and each set owner found and the record inserted. I sped the program up in two ways. First, the program was made to know about set currencies so that the reload, which took 3/4 of the time, would not have to do multiple redundant set-owner FINDs when loading in each record. That is, if the record I want to insert is owned by record X in set Y, and some previously loaded record already made record X the currency-of-set for Y, then I can skip this FIND(OWNER,SET=Y). This cut the entire unload/resize/reload in half. The other feature was restartability on the reload. 
That is, if the 70-hour reload blew up at hour 69 due to power failure or disk overflow or some damn thing (in a 70-hour window, or even a 20-hour window with the speedup, the likelihood of a system crash was good), the customer could restart without having to go back to hour zero. This was accomplished by COMMITing the transaction every "x" records, writing to a singular db record the total "n*x" records committed so far, retaining all currencies in the running program and proceeding; on a restart the first "n*x" records in the flat file were skipped, and the reload would proceed with the next record, all of whose set-owners would be FINDed before the insert, naturally. I was pretty elated with this performance improvement and was running a series of crash-&-restart tests on a couple of small databases, so far with no error, but by no means satisfied with my testing suite yet. But my manager came to me with the news that customer Q direly needed to resize their db because response time had degenerated from 20 seconds to 20 minutes on certain things. Resizing using the released unload/load program would take them a week, clearly impossible for a live manufacturing plant. "Could they have an unofficial, pre-beta copy of the new code?" "Sure, but I'm not done testing it yet. In two weeks." "No, they need it now because performance is so bad, and besides this weekend includes a holiday Monday." Part of me was cautious, part of me wanted to satisfy the customer, a large part was callow and cocksure. I agreed to give the new program to Q. I also gave them my home phone number if any problems arose. Saturday went by without a hitch. But on Sunday I got to know the tension in the voice of Jim N., the database administrator, quite well. The program had overrun some VMS process limit and crashed. No problem, they just restarted ... but the restarted load was running very verrrry slowly. Aha, I diagnosed a bug, gave them some code to type in, recompile and restart. 
Back and forth on this when my top-of-the-head fix wasn't quite right. Ok, now it's running at normal speed ... but it's connecting records to the wrong sets! Abort it, here's another fix, try again. ...Thus hours went by. It wasn't until early the next day that the penny dropped and I realized there was a fatal bug in my restart capability AND that one of my fixes had just kissed off all possibility of undoing the damage and actually restarting. What had I overlooked? In all my tests, the "n*x+1"th record was always a top-level or owned only by top-levels -- none happened to be two levels down in the set hierarchy/network. Q's crash/restart involved a second-level record; the reload FINDed as owner of its second-level set the *first* of a non-unique group of first-level records, when it should have been FINDing the unique owner-of-owner-of-set and then unique second-level WITHIN that set. Yes, a goodly lot of records got connected wrong. Testing with a larger database before releasing it probably would've shaken this bug out. It was obvious in hindsight. Thus when I got to work Tuesday morning I was feeling vaguely guilty about all the sweat and trouble Q's DP staff had gone through. I called them up and asked to speak to Jim N. to see if they had restored everything okay. Ann R. came on the line. "Where's Jim?" I asked. "He died last night," she replied. "Heart attack. Stress." Surprisingly, Q didn't sue us. But then, it might have been bad publicity for them: because the final irony is that what the company manufactured was pacemakers. -- Matthew Rabuzzi -------------------------------------------------------------------------- Their PDP1145 had one program that occasionally trashed a variable. Much software debugging determined that one little used bit bashing command would "occasionally" not work correctly. 
As a test, they wrote a program that used that command and displayed the number of errors on the display register which showed up on the front of the machine. Amazingly, they quickly noted that the error rate decreased as the number of errors displayed on the front panel increased, until the display was all 1's (16 lamps on). When the error count overflowed to 0, the error rate shot up. They then discovered (with a different readout, of course) that when they raised the "lamp test" switch (which lit all lamps on the front panel), they no longer got any errors. Subsequent scoping showed that the output of the switching power supply which ran the front panel (among other things) was very noisy.

Unfortunately, local field circus had to order a power supply from MS, so their quick fix was to tape the lamp test in the ON position. All was noted in the log book. A day later the log book also noted, "System crash. Cause, taped switch came unstuck. Cure, re-taped with better tape."

-- Frank R. Borger (Frank@rover.uchicago.edu)
--------------------------------------------------------------------------
I was at a major oil tanker company when it was decided to bring IBM's brand new RT computers into the office. We set up a computer room on the 34th floor. We experienced only one major problem with these machines: we would have at least two disk drive failures a week, and they would fail either early in the morning or late in the afternoon. This went on for about two months. IBM was perplexed: after all, Walmart and Boeing were running these things without a problem. Post-mortem analysis of the drives indicated a noise-induced problem. IBM flew in their "Noise Determination Team". If you haven't seen these guys, then you're really missing something. They instrumented our computer room with dozens of little antenna-like devices.
The problem: the computer room adjoined the elevator shaft and the excess traffic at 7:45am and 4:15pm was generating more noise than the IBM electronics could handle. They provided us with special shielding for the computers. An interesting side note: we had been running ancient Zilog S8000 Unix boxes in that same room for two years with no problem.

-- Steve McDowell (mcdowell@exlog.com)
--------------------------------------------------------------------------
A story told often at DECUS concerns one site where the computer seemed to crash every day at about the same time. Location was an automated warehouse, with the first floor a large, high warehouse and the computers running the stacker cranes etc. on the floor above. Field circus boys are really dying trying to figure things out. In desperation, one FE is looking out the window, pounding on the sill, and suddenly groks, "Hey, it makes no sense, but the last couple of days, I've heard that garbage truck drive up, and a couple of minutes later the system crashes." Sure enough, he's prophetic. Few seconds later comes the crash. They quickly run down the stairs, and see a fork lift driving down the middle of the warehouse with an empty dumpster (just swapped with the full one that the garbage people took out). Fork lift is driving directly under the computer one floor above.

Final solution? Turns out the fork lift had totally unshielded ignition wires, which caused enough broadcast energy to overload the high speed disk port on the computer. (As I remember, someone also noticed a radio go bonkers when the fork lift went by.) Anyway, after many kilo-$ of parts and people, problem was fixed with a set of resistance spark-plug wires for the lift truck.

-- Frank R. Borger (Frank@rover.uchicago.edu)
--------------------------------------------------------------------------
Here's a pretty bad story. I wanted to have root use tcsh instead of the Bourne shell. So I decided to copy tcsh to /usr/local/bin.
I created the file, /etc/shells, and put in /usr/local/bin/tcsh, along with /bin/sh and /bin/csh. All seemed fine, so I used the chsh command and changed root's shell to /usr/local/bin/tcsh. So I logged out and tried to log back in. Only to find out that I couldn't get back in. Every time I tried to log in, I only got the statement:

/usr/local/bin/tcsh: permission denied!

I instantly realized what I had done. I forgot to check that tcsh had execute privileges and I couldn't get in as root! After about 30 minutes of getting mad at myself, I finally figured out to just bring the system down to single-user mode, which ONLY uses /bin/sh, thankfully, and edited the password file back to /bin/sh. I'll never do that again. This wasn't that much of a horror story, but good enough if you aren't that familiar with the system.

John Ellithorpe
Internet: jdell@maggie.mit.edu
--------------------------------------------------------------------------
From: dbrillha@dave.mis.semi.harris.com (Dave Brillhart)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

We can laugh (almost) about it now, but... Our operations group, a VMS group but trying to learn UNIX, was assigned account administration. They were cleaning up a few non-used accounts like they do on VMS - backup and purge. When they came across the account "sccs", which had never been accessed, away it went. The "deleteuser" utility from DEC asks if you would like to delete all the files in the account. Seems reasonable, huh? Well, the home directory for "sccs" is "/". Enough said :-(
--------------------------------------------------------------------------
From: tzs@stein.u.washington.edu (Tim Smith)
Newsgroups: comp.unix.admin,alt.folklore.computers
Subject: Re: WANTED: Unix administration horror stories !

I was working on a line printer spooler, which lived in /etc. I wanted to remove it, and so issued the command "rm /etc/lpspl." There was only one problem.
Out of habit, I typed "passwd" after "/etc/" and removed the password file. Oops. I called up the person who handled backups, and he restored the password file. A couple of days later, I did it again! This time, after he restored it, he made a link, /etc/safe_from_tim. About a week later, I overwrote /etc/passwd, rather than removing it. After he restored it again, he installed a daemon that kept a copy of /etc/passwd on another file system, and automatically restored it if it appeared to have been damaged. Fortunately, I finished my work on /etc/lpspl around this time, so we didn't have to see if I could find a way to wipe out a couple of filesystems...
--------------------------------------------------------------------------
From: nickp@BNR.CA ("Nick Pitfield", N.T.)
Newsgroups: comp.unix.admin
Subject: WANTED: Unix administration horror stories

The following horror story occurred only last week.... One of my colleagues had been itching to get into sys admin for some time, so last week he was finally sent on a 5-day sys admin course run by HP in Bracknell. On the following Sunday, he decided to try out his newfound knowledge by trying to connect and configure a DAT drive on one of our critical test systems. He connected the cables up okay, and then created the device file using 'mknod'. Unfortunately, he gave the device file the same minor & major device numbers as the root disk; so as soon as he tried to write to this newly installed 'DAT drive', the machine went tits up with a corrupt root disk....ho hum.
--------------------------------------------------------------------------
From: pd@x.co.uk (Paul Davey)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

How about the following (true) story which I used to tell system administration trainees. A friend of mine went to upgrade the Unix on a customer's machine. Before he arrived one of the users decided to backup all the application data to save time.
Before doing this he decided to delete an application specific directory called ``&saved&'' which contained old and unwanted files. So: The user typed (as root) ``rm -rf /&saved/&'' (He knew you had to escape the ``&''s.) (You know what he did wrong I hope.) When my friend arrived the whole filesystem was (needless to say) empty. Oh well says he, let's restore the backups... (Yes they had made backups with cpio regularly.) The first physical tape of the backup restored OK, but subsequent volumes would not read. It turned out that for 2 years the backup script had been giving messages of the following form as each tape was filled:

Write error on archive
Reached end of medium on /dev/tape
If you want to go on, type device/file name when ready.

Cpio had failed to write the volume header on all but the first tape.

Morals:
1) Unix is unforgiving if you make a mistake.
2) Don't ignore any sort of error message
3) Check that your backups read back (at least occasionally)

Happy Ending: My friend was able to dd in the backup tapes and recreate the volume header from the actual data with a specially written C program.
--------------------------------------------------------------------------
From: philip@haas.berkeley.edu (Philip Enteles)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

As a new system administrator of a Unix machine with limited space I thought I was doing myself a favor by keeping things neat and clean. One day as I was 'cleaning up' I removed a file called 'bzero'. Strange things started to happen, like vi didn't work; then the complaints started coming in. Mail didn't work. The compilers didn't work. About this time the REAL system administrator poked his head in and asked what I had done. Further examination showed that bzero is the zeroed memory without which the OS had no operating space so anything using temporary memory was non-functional. The repair? Well things are tough to do when most of the utilities don't work.
Eventually the REAL system administrator took the system to single user and rebuilt the system including full restores from a tape system. The Moral is don't be too anal about things you don't understand. Take the time to learn what those strange files are before removing them and screwing yourself.
--------------------------------------------------------------------------
From: hirai@cc.swarthmore.edu (Eiji Hirai)
Newsgroups: alt.folklore.computers,comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Some of these are stories of pure stupidity rather than of interesting horror, but they did happen. [ BTW, these happened at a different place at a different time than where I am now. Don't bother my current employer about it. ]

(1) A consultant we had hired (and not a very good one) was installing Unix on one of our workstations. He was mucking with creating and deleting /dev/tty* files and made /dev/tty a regular file. Weird things started to happen. Commands would only print their output if you pressed return twice, etc. Fortunately, we solved the problem by re-mknod-ing /dev/tty. However, it took a while to realize what was causing this problem.

(2) I wanted to create a second swap partition on another disk and made the partition start at sector 0 of the disk! (which sounded ok at the time since all other regular 'a' partitions started on sector 0) Every time I rebooted, fsck would complain about missing partition tables - I initially suspected that the disk was bad but I later realized that swapping was overwriting the partition table. I had lost an unknown percentage of the financial data for the institution that I was working for at the time, right when they were being audited! Yikes! Anyway, we were able to recover the data and life returned to normal but I did wonder at the time whether I could still keep my job there.

(3) At the same institution, we were running system software that had a serious bug where if anyone had logged out ungracefully, the system wouldn't let any more users onto the system and users who were logged on couldn't execute any new commands. (The newest release of the software later on did fix this bug.) I had to reboot the machine to restore the system to a sane state. I did a wall <.
--------------------------------------------------------------------------
From: rickf@pmafire.inel.gov (Rick Furniss)
Newsgroups: alt.folklore.computers,comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Horror stories: Did this myself many years ago, and have come close to it since. Murphy's law #??, preventive maintenance doesn't. Try this one:

/etc/dump /dev/rmt/0m /dev/dsk/0s1

Or:

tar cvf /dev/root /dev/rmt0

Backups on unix can be one of the most dangerous commands used, and they are used to prevent rather than cause a problem. If any Unix utility were a candidate for a warning message, or error checking, this would be it. Just in case you didn't catch the HORROR above, the parameters are backwards, causing a TOTAL wipe out of the root file systems. More systems have been wiped out by admins than any hacker could do in a life time.

***** standard DISKclamer ***** personal views of my person only
rickf@pmafire.inel.gov
--------------------------------------------------------------------------
From: djd@csg.cs.reading.ac.uk (David J Dawkins)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

weave@bach.udel.edu (Ken Weaverling) writes:
>A friend of mine called me up saying he no longer could log into his
>system. I asked him what he had done recently, and found out that he
>thought that all executable programs in /bin /usr/bin /etc and so on
>should be owned by bin, since they were all binaries! So he had
>chown'ed them all.

Oh you bastards.
I was hoping that a thread like this would never appear, because if it did, I knew I would have to confess. Oh well...

About a year back, I was looking through /etc and found that a few system files had world write permission. Gasping with horror, I went to put it right with something like

    dipshit# chmod -r 664 /etc/*

(I know, I know, goddamnit!.. now)

Everything was OK for about two to three weeks, then the machine went down for some reason (other than the obvious). Well, I expect that you can imagine the result. The booting procedure was unable to run fsck, so barfed and mounted the file systems read-only, and bunged me into single-user mode. Dumb expression..gradual realisation..cold sweat. Of course, now I can't do a frigging chmod +x on anything because it's all read-only. In fact I can't run anything that isn't part of sh. Wedgerama. Hysteria time. Consider reformatting disks. All sorts of crap ideas. Headless chicken scene. Confession. "You did WHAT??!!" Much forehead slapping, solemn oaths and floor pacing.

Luckily, we have a local MegaUnixGenius who, having sat puzzled for an hour or more, decided to boot from a cdrom and take things from there. He fixed it.

My boss, totally amazed at the fix I'd got the system into, luckily saw the funny side of it. I didn't. Even though at that stage, I didn't know much about unix/suns/booting/admin, I did actually know enough to NOT use a command like the one above. Don't ask. Must be the drugs.

BTW, if my future employer _is_ reading this (like they say he/she might), then I have certainly learned tonnes of stuff in the last year, especially having had to set up a complete Sun system, fix local problems, etc :-)

Anyone else got a tale of SGS (Spontaneous Gross Stupidity) ?

-dave
"I'm much better now, honest.. no, really.. hey what's this button doooooooooOOOOOO..."
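For the record, the narrower fix for a world-writable file is to clear just that one bit rather than assign an absolute mode to everything in sight. A minimal sketch in POSIX shell, assuming a find that supports -perm with a leading dash and -exec ... {} + (both are specified by POSIX). The function name fix_world_writable is made up for illustration; it is not a standard tool and not what the poster ran.

```shell
#!/bin/sh
# Hypothetical helper: clear only the world-write bit on regular files
# under a directory. Every other permission bit, including the execute
# bits that "chmod 664" wipes out, is left exactly as it was.
fix_world_writable() {
    # -perm -0002 matches files whose "other" write bit is set.
    # chmod o-w removes that single bit and nothing else.
    find "$1" -type f -perm -0002 -exec chmod o-w {} +
}

# Usage (on a scratch copy first, never straight on /etc):
#   fix_world_writable /tmp/etc-copy
```

Because the change is relative (o-w) rather than absolute (664), binaries and scripts stay executable, which is the difference between a tidy-up and a machine that cannot run fsck at boot.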
djd@csg.cs.rdg.ac.uk

--------------------------------------------------------------------------
From: kochmar@sei.cmu.edu (John Kochmar)
Newsgroups: comp.unix.admin,alt.folklore.computers
Subject: Re: WANTED: Unix administration horror stories !

A long time ago, back when the Apollo 460 was around and I had just graduated from college, I had the good fortune of being one of two administrators in charge of making a cluster of 460's a part of our environment. One of the things I was tasked with was getting them onto our network. Well, I was young, I had the manuals, and a guy from Apollo tech support was there to help. How hard could it be, right?

Well, we got out the manuals, configured the system (relying heavily on the defaults), and within 2 hours, we had that puppy on the network. Life was good.

About 3 hours later, I got a phone call from a systems programmer / developer from CMU campus (the SEI is a part of CMU, and we are on their network.) He told me that if I didn't take the &%@*ing Apollo off the network, he was going to do hurtful things to me physically. Life was not so good.

As it turned out, in default mode, the Apollo answered every address request it saw, even if it was not the machine the request was for. Kind of a "hey, I'm not who you are looking for, but I'm out here in case you decide you'd rather talk to me." Apollo considered this a feature, and they took advantage of it in their OS environment. However, one of the earlier versions of a heavily network dependent OS developed at CMU considered this a bug. The OS would issue a request, and expect only the machine it was looking for to answer it. Of course, it would assume that if it got an answer to its request, it must be from the machine it expected to talk to. It didn't look at the address of the answer it got, so if it wasn't the correct machine, most of the time the OS would hang or panic.

The outcome?
Over about 3 hours' time, more and more of campus was talking to our little 460, which had just enough muscle to keep up with the requests. By the time campus figured out what was going on, we had an Apollo merrily answering the network requests for hundreds of machines (the ones that were still up, that is.) The result: the part of campus that used the new OS going to hell in a bucket, one very busy Apollo 460, and one very warm ethernet.

Well, we turned off the Apollo, configured it not to chat to all of campus before putting it back on the ethernet (this time, we did it while talking with campus, making sure we didn't cause the same problems we did the last time -- we didn't have a packet monitor at the time), and campus changed their OS to look at the request response before assuming it was the correct one.

I also learned to think very carefully about default values before using them.

John
Manager, Systems and Tools admin SEI Computing Facilities kochmar@sei.cmu.edu | brain each day, expressed in M&Ms: 250

--------------------------------------------------------------------------
From: alan@spuddy.uucp (Alan Saunders)
Newsgroups: alt.folklore.computers,comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

About inexperienced sysadmins .. One such had been on a Sun sysadmin course, and learned all about security. One of the topics was file and group access. On his return, he decided to put what he had learned into practice, and changed the ownership of all files in /bin and /usr/bin to bin.bin! I was called in when no one could log in to the system (of course /bin/login needs to be setuid root!)

Regards .. Alan

--------------------------------------------------------------------------
From: Iain.Lea%anl433.uucp@Germany.EU.net (Iain Lea)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !
Arne Asplem (aras@multix.no) wrote:
:
: I'm looking for actual horror stories of what have gone wrong because
: of bad system administration, as an early morning wakeup.

Try this one for size.

I used to work at Siemens R&D in Erlangen (33000 people out of 115000 population work at Siemens - 12000 in the R&D area). We were working on a project porting an ISO FTAM implementation in Ada to C. About 2 months into the project we received a new project leader who decided there were too few people working on the project (sigh!). Anyway we were promised that a "Spitzen Klasse" (Outstanding) SW guy was being sent over from the next lab.

The fateful day turned up (had to be a Monday) and there was our very own 'Einstein'. We gave him a tour of the lab (ie. Coffee machine on the left, laser on the right etc.) finally getting to our work area. We had a couple of fast 386's (this happened in '89) running Xenix 386. We told Einstein that I was the sysadmin for both machines and that if *anything* was strange or not working to speak with me.

OK so the first morning went off without a hitch and we all went to get something to eat around midday. All except Einstein who said he wanted to check a few things out (Code practices we thought etc. - turned out to be Page 3 of that month's Playboy). We came back from eating to find Einstein twiddling his thumbs and saying that he could no longer log in on either machine. Ermmm...

I asked him if *anything* had happened while we were away. He thought and thought and then said "Nothing really but the lights went out for a few minutes". OK I thought "fsck the disks, remount them and away we go" but then I stopped and asked him again "Anything else?". He then really started looking around and found the palms of his hands the most interesting thing he'd ever seen.
He answered "Well I know a little about Unix and fsck is the 'ajax' cleaning program of Unix so when it started again after the lights came back on it started fsck and asked me for a scratchpad file. I just took the one it printed on the line above!" (ie. the name of the filesystem to clean). Another comment he made was "Must be a fast machine as fsck ran quick".

Bad you might say until he told me he had done the same thing to our backup machine.

Needless to say Einstein & our project leader exited stage left... And we eventually got a backup tape from our data safe stored at another lab.

The SW guy is kind of a living legend around here :-)

NAME   Iain Lea
EMAIL  Iain.Lea%anl433.uucp@Germany.EU.net
SNAIL  Siemens AG, ANL A433SZ, Gruendlacher Str. 248, 8510 Fuerth, Germany.
PHONE  +49-911-3089-407 (work) +49-911-331963 (home) +49-911-3089-290 (FAX)

--------------------------------------------------------------------------
From: matthews@oberon.umd.edu (Mike Matthews)
Newsgroups: alt.folklore.computers,comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

When I had first gotten my NeXTstation, it had the lil' 105M hard drive in it. I had a 330M external, but alas, no cable for it. (Life was not fun when I was essentially netbooting off a "test" machine.... ".. um, guys, did you just reboot is-next?")

Finally got the cable, just in time for the winter holiday (read: no network). Brought the machine home, and I figured I'd just copy the configuration files over from the internal to the external (as a nice gesture to my users so they wouldn't have to change their passwords and everything). The external was a brand new BuildDisk'd disk (had stock NeXTstep on it).

NeXT keeps the private information of each machine (/dev, /etc, stuff like that) in a /private directory to make netbooting easier. Hey, I'll just move /private from the 105M to /private on the external. So I deleted the external's /private and tried to move it via the workspace.
/dev is in /private. /dev contains device files. Can't move them. BUT. The workspace happily deleted all the files it DID copy, so the internal couldn't boot (no /etc) and the external couldn't boot (no /dev).

This is before the advent of boot floppies so I was stuck for about a week at home with $5000 of NeXT computer that I couldn't boot.

The moral? *NEVER* move something important. Copy, VERIFY, and THEN delete.

------
Mike Matthews, matthews@oberon.umd.edu (NeXTmail accepted)

--------------------------------------------------------------------------
From: robjohn@ocdis01.UUCP (Contractor Bob Johnson)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Another horror story (mine this time)...

Cleaning out an old directory, I did 'rm *', then noticed several files that began with dot (.profile, etc) still there. So, in a fit of obtuse brilliance, I typed...

    rm -rf .* &

By the time I got it stopped, it had chewed through 3 filesystems which all had to be restored from tape (.* matches .. too, and the -r makes it keep walking up the directory tree). Live and learn...

And another...

After changing my /etc/inittab file, I was going to kick init by sending it a HUP signal to tell it the file had changed. Unfortunately, I missed and the 1 became a Q...

    kill -q 1

Large systems die in interesting ways when you lose init!

But the best (IMHO)...

We had an operator lay a book on the console keyboard, throwing the console into system monitor mode. This stops the system clock, which locks every session dead in its tracks. At that time we had over 100 user sessions running. Most of our inbound lines are essentially modem lines on a very large "rotor". After their session hung for a minute or so, many users disconnected and called back. They got connected, but received no login prompt (the system was in a sort of suspended animation). Little did they know that they were now on a different port than the one they just abandoned.
A call to the computer room soon identified the problem, and the operator was given the commands to resume normal system operation. As near as we can figure, somewhere around half of the users had disconnected but the system didn't notice because it never saw carrier drop on those ports (being dead). New, different users had now connected to those ports. We received several semi-confused user calls, realized what had happened and invoked the magic "/etc/shutdown NOW" command.

The procedure (should this ever happen again) will be to manually panic the system and reboot. I also surgically removed the keycap from that particular key on our terminal - you have to work to press it now!

Bob Johnson, Systems Administrator
Tinker AFB, Oklahoma

--------------------------------------------------------------------------
From: greep@Speech.SRI.COM (Steven Tepper)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

> But the stories told now are more folklore than real horror. Having read
> 2 Stephen Kings this weekend I beg everyone to tell more interesting
> stories, about demons, the system clock running backwards, old files
> reappearing etc !

I once had problems with files that mysteriously refused to stay changed for very long. It was a PDP-11 Unix system that had crashed, and I brought it up single-user. I would change some file and it would stay changed for a minute or so but then revert to its earlier state (contents, protection mode, etc).

What happened was that the write-protect switch on the disk drive had gotten bumped into the "on" position but the device driver failed to report any write errors. As long as the data stayed in kernel buffers the changes "took", but they would disappear once the buffers were reused and the system had to reread the disk.
-greep

--------------------------------------------------------------------------
From: cjc@ulysses.att.com (Chris Calabrese)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

In article <7515@blue.cis.pitt.edu.UUCP> broadley@neurocog.lrdc.pitt.edu writes:
>On a old decstation 3100 I was deleting last semesters users to try to
>dig up some disk space, I also deleted some test users at the same time.
>
>One user took longer then usual, so I hit control-c and tried ls.
>"ls: command not found"
>
>Turns out that the test user had / as the home directory and the remove
>user script in ultrix just happily blew away the whole disk.
>[...]

Reminds me of a bit of local folk-lore (this happened before I was in the admin group)...

We have a home-grown admin system that controls accounts on all of our machines. It has a remove user operation that removes the user from all machines at the same time in the middle of the night.

Well, one night, the thing goes off and tries to remove a user with the home directory '/'. All the machines went down, with varying amounts of stuff missing (depending on how soon the script, rm, find, and other important things were clobbered).

Nobody knew what was going on! The systems were restored from backup, and things seemed to be going OK, until the next night when the remove-user script was fired off by cron again.

This time, Corporate Security was called in, and the admin group's supervisor was called back from his vacation (I think there's something in there about a helicopter picking the guy up from a rafting trip in the Grand Canyon).

By chance, somebody checked the cron scripts, and all was well for the next night...

Name: Christopher J. Calabrese
att!ulysses!cjc cjc@ulysses.att.com

--------------------------------------------------------------------------
From: obi@gumby.ocs.com (Obi Thomas)
Newsgroups: alt.folklore.computers,comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !
This isn't nearly as bad as some of the stories in this thread, but...

I once mistakenly partitioned my Sun's boot disk so that the swap partition overlapped the usr partition. The machine ran fine for a long time (many months), presumably because the swap space was always nearly empty. Then, one day there was a memory parity error and the system crash dumped at the *end* of the swap partition. What should have been a simple reboot after the crash dump turned into a long and painful re-install of the entire system (Suns cannot boot without a /usr partition).

Now when I partition a disk I sit there with a calculator and make sure all the numbers add up correctly (offsets, number of cylinders, number of blocks, and so on).

--------------------------------------------------------------------------
From: rik@nella15.cc.monash.edu.au (Rik Harris)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Sometimes it takes a few tries to get it through the tired brain...

Most of our disks reside on a single, high-powered server. We decided this probably wasn't too good an idea, and put a new disk on one of the workstations (particularly since the w/s has a faster transfer rate than the server does!). It's still really useful to be able to use all disks from the one machine, so I mounted the w/s disk on the server. I said to myself (being a Friday afternoon...see previous post) "it's only temporary.../mnt is already being used...I'll mount it in /tmp". So, I mounted on /tmp/a (or something). This was fine for a few hours, but then the auto-cleanup script kicked in, and blew away half of my source (the stuff over 2 weeks old). I didn't notice this for a few days, though. After I figured out what had happened, and restored the files (we _do_ have a good backup strategy), everything was OK.

Until a few months later.
We were trying to convince a sysadmin from another site that he shouldn't NFS export his disks rw,root to everyone, so I mounted the disk to put a few suid root programs in his home directory to convince him. Well, it's only a temporary mount, so.... You guessed it, another Friday afternoon. I did a umount /tmp/b, and forgot about it. I noticed this one about halfway through the next day. (NFS over a couple of 64k links is pretty slow). The disk had not unmounted because it was busy...busy with two find scripts, happily checking for suid programs, and deleting anything over a week old. A df on the filesystem later showed about 12% full :-( Sorry Craig.

Now, I create /mnt1, /mnt2, /mnt3.... :-)

Remember....Friday afternoons are BAD news.

Rik Harris - rik.harris@fcit.monash.edu.au
+61 3 571-2895 (AH & ans.mach) +61 3 573-2679 (BH)
Faculty of Computing and Information Technology, Caulfield Campus, Monash University, Australia

--------------------------------------------------------------------------
From: ranck@joesbar.cc.vt.edu (Wm. L. Ranck)
Newsgroups: alt.folklore.computers,comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Hello folks,

Well, after reading some of the stories in this thread I guess I can tell mine. I got an RS/6000 mod. 220 for my office about 6 months ago. The OS was preloaded so I had little chance to learn that process. Being used to a full-screen editor I was not happy with vi, so when I read in the manual that INED (IBM's editor for AIX) was full-screen, I logged in as root and installed it. I immediately started to play with the new editor and somehow found a series of keys that told the editor to delete the current directory. To this day I don't know what that sequence of keys was, but I was unfortunately in the /etc directory when I found it. I got a prompt that said "do you want to remove this?" and I thought I was just removing the file I had been playing with, but instead I removed /etc!
I got the chance to learn how to install AIX from scratch. I did reinstall INED even though I was a little gun-shy but I made sure that whenever I used it from then on I was *not* root. I have since decided that EMACS may be a better choice.

* Bill Ranck ranck@joesbar.cc.vt.edu *
* DoD #496 Bikes past and present: CB175, CB550F, Norton 750, CB350F, XV535 *

--------------------------------------------------------------------------
From: mt00@eurotherm.co.uk (Martin Tomes)
Newsgroups: comp.unix.admin,alt.folklore.computers
Subject: Re: WANTED: Unix administration horror stories !

We had something really weird happen one day. I copied a file to /usr/local on someone else's machine and all seemed to be OK. A bit later the user of the machine noticed that the files and directories they were using on another disk partition were corrupted. There were 2 gigabyte files on a 650Mb disk - and lots of them with weird names and permissions. At first I did not connect the two events. This disk had given trouble when the power failed a week before, so I fsck'ed it. Now I have run fsck more times than I can begin to imagine and seen plenty of errors, some needing 'manual intervention' but I had never seen anything like this before! It was spectacular. And what was more, when I ran it a second time things got worse. Then I tried to back up the /usr/local partition before restoring this corrupt data and lo, that was corrupt too.

It turned out that our sysadmin had created the /usr/local disk partition in the wrong place on the disk and put it over the top of the alternate sectors partition. By writing to the /usr/local disk I had written all over the alts which were mapped into the user's partition. Oh dear, what a mess.

Solution: rebuild all the partitions so they don't overlap and restore; also buy the sysadmin a calculator.

Moral: always do your sums on the /etc/partitions file very carefully before using mkpart.
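The sums can be handed to a script instead of a calculator. A minimal sketch in POSIX shell and awk, assuming the planned layout has first been written out by hand as lines of "name start length", all in the same unit (sectors or blocks). That input format and the function name check_overlap are inventions for this example; this is not the format of /etc/partitions or of any vendor tool.

```shell
#!/bin/sh
# Hypothetical helper: read "name start length" lines on stdin, report every
# pair of partitions whose ranges overlap, and return non-zero if any do.
check_overlap() {
    awk '
        NF >= 3 {
            n++
            name[n]  = $1
            start[n] = $2 + 0
            stop[n]  = $2 + $3      # first unit past the end of the partition
        }
        END {
            bad = 0
            for (i = 1; i <= n; i++)
                for (j = i + 1; j <= n; j++)
                    # Two half-open ranges overlap when each one starts
                    # before the other one stops.
                    if (start[i] < stop[j] && start[j] < stop[i]) {
                        printf "OVERLAP: %s and %s\n", name[i], name[j]
                        bad = 1
                    }
            exit bad
        }
    '
}

# Usage:
#   printf "root 0 1000\nswap 0 500\n" | check_overlap
```

The check is pairwise, so it catches a slice dropped on top of a distant one (such as an alternate-sectors area) as well as neighbours that run into each other; it knows nothing about reserved label sectors, which still have to be accounted for by hand.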
Martin Tomes
Internet: mtomes@eurotherm.co.uk

--------------------------------------------------------------------------
From: caa@Unify.Com (Chris A. Anderson)
Newsgroups: alt.folklore.computers,comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

At a company that I used to work for, the CEO's brother was the "system operator". It was his job to do backups, maintenance, etc. Problem was, he didn't have a clue about Unix. We were required to go through him to do anything, though.

Well, I was setting up a Plexus P-95 to be a news/mail/communications machine and needed to wipe the disks and install a new OS. El CEO requested that his brother do the installation and disk partitioning. He had done this before, so I gave him the partition maps and let him at it. When he was done, everything seemed to be ok. Great, on with the install and setup.

Things went fine until I started compiling the news and mail software. All of a sudden, the machine panicked. I brought it back up and the root file system was amazingly corrupt. After rebuilding things, it all seemed to be fine -- diagnostics all ran fine, etc. So I started again -- this time keeping an eye on things. Sure enough, the root file system became corrupted again when the system started to load.

This time I brought it down and checked everything. The problem? Swap space started at block zero and so did the root file system. ARRRGGGHHHHH!!

Oh yes, the brother still works there.

Chris Anderson, Unify Corp.
caa@unify.com

--------------------------------------------------------------------------
From: bmk@box.ssd.loral.com (Bruce Krawetz)
Newsgroups: alt.folklore.computers,comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Back when I was installing X-windows on a Sun-3, I accidentally deleted the console's font. Not only would that machine not boot, it wouldn't tell me _why_ it wouldn't boot. It seems that without that font, /vmunix dies most ungracefully very quickly.
--------------------------------------------------------------------------
From: stehman%citron.cs.clemson.edu@hubcap.clemson.edu (Jeff Stehman)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

From article <3965@wzv.win.tue.nl>, by rob@wzv.win.tue.nl (Rob J. Nauta):
>
> But the stories told now are more folklore than real horror. Having read
> 2 Stephen Kings this weekend I beg everyone to tell more interesting
> stories, about demons, the system clock running backwards, old files
> reappearing etc !

Hmmm. Maybe this is a little closer to what you're looking for...

Many years ago a tiny little college in the middle of nowhere purchased an NCR tower, then a newfangled contraption. A half-dozen of us were using it for an assembly class. The prof should have made his warnings about TRAP a little more clear.

One student ran his program and it suddenly began spawning processes, rapidly filling the machine. The prof came in, amused, logged on as superuser, and killed a process. Another process was immediately spawned. The prof tried again. He was ignored. He was also no longer amused. After several minutes he gave up and turned off the box.

The tower didn't even flinch.

He pulled the plug. Nothing. He ripped the back off the box and dug around. Finally he found the fuse and pulled it, killing the machine. Some of us later claimed we heard laughter as it went down.

(Many times since then I have wished other computers came with a backup battery as standard issue.)

Jeff Stehman (stehman@cs.clemson.edu)

--------------------------------------------------------------------------
From: grover@ccai.clv.oh.us (grover davidson)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Several months ago here, we were reorganizing our disk space on an RS/6000 with AIX 3.1. I have done this many times before, but for some reason, I was rushing through expanding a file system.
Instead of entering the new file system size where it belongs, I entered it into the mount point. It also turns out that I was attached 2 levels down in the file system. Since the size was entered as a number ('234567') and was INTERPRETED as a mount point directory, the result was a circular hard link that basically left the file system unusable. IBM was not able to help, and since we had done quite a bit of work that day, we had to somehow recover some of the stuff. We ended up doing a dd of the raw volume, then read it back in a couple of MB at a time and extracted the pieces that we needed from the mess.

The other day I was reading Stevens' new book, "Advanced Programming in the UNIX Environment", and he stated that he had done the exact same thing during the preparation of his book. At least I am not alone.....

--------------------------------------------------------------------------
From: rickert@cco.caltech.edu (Keith Warren Rickert)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Well, there was the time one of the /dev/tty* things got messed up, and I decided to remake all of them from some big nasty script that came with the system. Unfortunately, that script deleted all the old /dev files, but remade them in the current dir...which in this case happened to be /etc. :( Needless to say, they didn't do much good there, and when I later tried to login and got an out of tty's error....ugh. Lucky there weren't any long jobs running, so I could reboot it to single user mode (using a tty def that is apparently in PROM or some such...*whew*) and remake the tty's correctly.

--------------------------------------------------------------------------
From: joslin_paul@ae.ge.com
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

cjc@ulysses.att.com (Chris Calabrese) writes:
>We have a home-grown admin system that controls accounts on all of our
>machines.
>It has a remove user operation that removes the user from
>all machines at the same time in the middle of the night.
>Well, one night, the thing goes off and tries to remove a user with
>the home directory '/'. All the machines went down, with varying
>ammounts of stuff missing (depending on how soon the script, rm, find,
>and other importing things were clobbered).
>Nobody knew what what was going on! The systems were restored from
>backup, and things seemed to be going OK, until the next night when
>the remove-user script was fired off by cron again.

True confession time: Cron is a great way to hide your flubs.

I installed the COPS security package on a system, then set up cron to recheck the system once a month. No problem, right? Except that I had configured COPS to put the reports in /. As a security measure, COPS chmods its directory to u-rwx,w-rwx so that only the COPS owner can read the reports.

The chronology was

1) Run cops. Add cops entry to root's crontab. Later that day, notice that / was 600; change it back.

2) 30 days later: get calls from users - can't log in, "No shell" error messages. Find / is 600; change it. Vaguely remember that this happened once before. The machine was a sandbox, so almost anything could have changed /.

3) 30 days later: get calls from users - can't log in, "No shell" error messages. Find / is 600; change it. Vaguely remember that this happened once before. Happen to think "cron"; notice that the only cron activity for root last night was COPS. Read COPS source and discover problem.

Moral: RTFM. Keep logs, so that you can notice patterns in your data. Don't do anything as root that you can do as a mortal.

--
Paul R. Joslin | joslin@c0223.ae.ge.com | +1 513 583 3537
"For every problem, there is a solution that is simple, elegant, and wrong." (H. L. Mencken)

--------------------------------------------------------------------------
From: frank@krill.NoSubdomain.NoDomain (Frank Flesland)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Okay, my own little mistakes..

One of our machines was away, so I moved its disk to another machine so the files were accessible. Unfortunately I had forgotten that they used different versions of SunOS... One ran 4.1.1 and the other 4.0.3. These had different versions of the file system (to support bigger systems etc.)

When I mounted the partitions, it worked fine. BUT - next time I booted the machine, fsck complained about some files, and refused to continue booting. I tried running fsck by hand, and was surprised about all the files that were corrupted, but told fsck to fix them.. After a while I thought the disk was beyond help, and then it hit me.. Different file system types!

Sun had made a conversion program, but now the file system was so messy that it wouldn't do anything... I tried rebooting from the messy disk (to run fsck, old version), but discovered that one of the files 'fixed' was /bin/sh...

The solution was that I got in as myself, changed /etc/passwd via FTP from the outside, and performed a couple of other tricks to get the system up to date.

(Need I say that there were some kind of problems with the backup tapes too. Media errors or something...)

------------------------------

More fun:

A friend, working as root, tried to be extra careful when terminating a runaway background command. Instead of killing the process, and risking killing the wrong process by a typo, he used "kill % 1"....

When I was a novice to Unix, I made my first shell script something like this:

    #/bin/sh
    # "foo" does something
    doo .....
    doo ....

and put it in my ~/bin, and ran it... This was on a Pyramid, that luckily had a limit on processes per user. I haven't tried this on other systems.....

Frank
"Share and enjoy!"
- Sirius Cybernetic Corporation

--------------------------------------------------------------------------
From: eps@siivu.oulu.fi (Erkka Sutinen)
Newsgroups: alt.folklore.computers
Subject: Re: WANTED: Unix administration horror stories !

In article <1992Oct12.080944.23519@jet.uk> djs@jet.uk (David J Stevenson) writes:
> When this happened to a colleague (when I worked somewhere else) he restored
> vmunix by copying from another machine. Unfortunately, a 68000 kernel does
> not run very well on a Sparc...

Uh. This reminds me of a long and hard day a couple of weeks ago.....

I was walking merrily towards my room while the world was beautiful and the sun was shining. I got the first hint that something might be wrong when I saw our sysadmin banging his head on a nearby door. I asked if something might be wrong, and he told me so while continuing banging.

He had made backups of our old faithful, tolsun (sun3/160) and made a minor typo. Instead of

    tar cf /dev/rst0 .

he had written

    tar xf /dev/rst0 .

(Scripts?? we don't use any bleeding scripts!! They restrict creativity and make improvising impossible! )

Oh well. And the tape happened to be an old backup from a sparc. So. No binaries worked, except that one could login as box. inetd had crashed (at which point it did is irrelevant.) There were no active sessions for that system and there was no way to get in.....

Let's take a break, have some coffee and think it over:

There was a lighter side: all of the disks were mounted to another system via nfs, and that daemon was still working. A list of the files overwritten was in a log file, and there weren't very many of them. Backups were on 8mm tapes and our only 8mm drive was on our server, but with nfs, that wasn't a problem.

On the other hand we had lost /usr/bin, /usr/etc, /usr/kvm and /ucb . Ugh! I think I don't like this.

Fine. Let's take everything back from tape.... wait a minute...
We had just installed a new operating system, which had taken several days since the additional upgrade hadn't worked due to lack of disk space and we had a rather unorthodox system running, which worked fine with our add-ons, such as the appletalk box... And we hadn't made backups of this new system yet... Aww shit. We couldn't afford to lose this system, it had too much sweat already in it.

Fortunately we had all the binaries of the upgraded system on our server's disks, and with nfs we copied everything to the faulty system. Blessed is the network, who rules our lives!

Nothing happened. Nothing worked. Ah! We just have to restart inetd. Hmmmm. but how. Do not worry, this is Unix, full of possibilities!

First cron, but the clock of that system was totally out of this world, showing probably the time of Ouagadougou, and we didn't know where that was!

Ah! Making some false login attempts will bring error messages to the console. Nothing happened... I guess syslog had died too.... When daemons are in agony, no program can stand without feeling some sympathy towards those who suffer so....

Oh no... But of course. Removing root's password.... Why are your brain cells sometimes so slow.... This made it....

But still..... Nothing worked... Ah, we had the upgraded binaries... Hm. Ahh. Sun's unupgrade option was used while upgrading the system, so we just reunupgraded the system. No problem. Everything is fine again...

?? Why doesn't that system work?? Why doesn't our server let anyone in... Awwww...Jesus Christ. What had I done!!! Unupgraded our SparcServer with sun3's binaries.... Awww shit and .

At this point I was trying to find a good way to get rid of all of our computers.... Would anyone notice if I just dumped our systems into the river and claimed that we have never had any computers.... No hope. Back to work.

And at the same moment, the cron miraculously worked, and we had two inetd's running. No harm done. It is just a nuisance. No problem. Don't panic.
Both of the systems are up and running.... Well aware of the fact that if there are any more mistakes, the systems will crash and they will not come up. We didn't even have our server's operating system handy, (one disk in the whole university... )

I took the most recent backup of our server and carefully extracted rpc.passwdauth and a couple of other files from the tape, and lo ... everything worked again. The reunupgrade procedure was made this time on the right computer, and to our surprise it started to work again.... The world is a miraculous place......

With ps operational again, we checked the daemons still running.... nfsd and getty were the only ones that had stayed up in addition to the usual bunch (init etc...)

At this point we thought everything was fine... Let's make the backup now and take good care not to touch anything..... It worked....

A couple of days later I noticed that the restoring of our server's daemons had gone to a wrong directory and the mail hadn't been running for two days... No problem... Except that delivering two days' worth of mail brought our server to its knees... Fortunately it didn't crash but it was ssslllllooooowwwwww for a couple of hours....

.....and once again our heroes had defeated evil nazis and were heading home for a meal.......

Moral of the story: DON'T PANIC . There are several ways to handle unix as long as it is running. Do NOT try to reboot it. It only brings sorrow with it.

Moral of the story 2: Double-check every command you give. If one of your commands has an error, it is the one which causes the biggest damage.

Moral of the story 3: Check the machine you are on before you give any even possibly destructive commands... Networks can be a real nuisance you know.....
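
Moral 3 can be turned into a habit with a small wrapper. What follows is only a minimal sketch, assuming a POSIX shell; the helper name on_host and the host name in the usage note are hypothetical and do not come from the post above. It compares the output of uname -n with the host you believe you are on and refuses to run the command if they differ.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): run a command only
# when the current host is the one the operator expects to be on.
# Usage: on_host EXPECTED_HOST command [args...]
on_host() {
    expected=$1
    shift
    actual=$(uname -n)    # node name of the machine we are really on
    if [ "$actual" != "$expected" ]; then
        echo "refusing: this is '$actual', not '$expected'" >&2
        return 1
    fi
    "$@"                  # hosts match, so run the command as given
}
```

Something like `on_host tolsun tar xf /dev/rst0 .` would then fail loudly when typed in a window logged in to the wrong machine. It does nothing about typing x where c was meant.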
Erkka Pietari Sutinen eps@rieska.oulu.fi / sutinen@csc.fi
--------------------------------------------------------------------------
From: pinard@IRO.UMontreal.CA (Francois Pinard)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Many things happened in those many years I've been with computers. The most horrible story I've seen is not UNIX related, but it is certainly worth a tale. Here it goes.

This big (:-) CDC 6600 system was bootable from tape drive 0, using these 12 inch wheels containing 1/2" tape. The *whole* system was reloaded anew from the tape each time we restarted the machine, because there was no permanent file system yet; the disks were not meant to retain files through computer restarts (unbelievable today, I know :-). The deadstart tapes (as they were called) were quite valuable, and we were keeping at least a dozen backups of those, going back maybe one or two years in development.

The problem was that the two vacuum capstans which were driving tape 0, near the magnetic heads, were not perfectly synchronized, due to a hardware misadjustment. So they were stretching the tape while they were reading it, wearing it in a way invisible to the eye, but nevertheless making the tape irrecoverable. Besides that, everything looked normal in the tape's physical and electrical operations. Of course, nobody knew about this problem when it suddenly appeared.

All this happened while all the system administration team went on vacation at the same time. Not being a traveler, I just stayed available `on call'. The knowledgeable operators were able to solve many situations, and being kind guys to me (I was to them :-), they would not disturb me just for a non-working deadstart tape. Further, they had a full list of all deadstart backup tapes. So, they first tried (and destroyed) half a dozen backups before turning the machine over to the hardware guys, who themselves destroyed a few more.
The technicians had their own systems for diagnostics, all bootable from tape drive 0, of course. They had far fewer backups than we did. They destroyed almost all of them before calling me in.

Once told what happened, my only suggestion was to alter the deadstart sequence so as to be able to boot from another tape drive. Strangely enough, nobody had thought of it yet. In these old times, software guys were always suspecting hardware, and vice versa :-). Happily enough, the few tapes left started, both for production and for the technicians. Tape drive 0 being quite suspect, the technicians finally discovered the problem and repaired it.

My only job left was to upgrade the system from almost one year back, before turning it over to operations. This was at the time, now seemingly lost, when system teams were heavily modifying their operating system sources. This was also the time when everything not on big tapes was all on punched Hollerith cards, the only interactive device being the system console. It took me many days, alone, having the machine in standalone mode. The crowd of users stopped regularly at the windows of the computer room, taking bets, as they were used to doing, on how fast I would get the machine back up (I got some of my supporters losing their money, this time :-).

This was quite hard work for me, done under high pressure. When the remainder of the staff returned from their trips, and when I told them the whole tale, we decided to never synchronize our holidays again.

Franc,ois Pinard ``Vivement GNU!'' pinard@iro.umontreal.ca
(514) 588-4656 ...!uunet!iros1!pinard
--------------------------------------------------------------------------
From: jcm@coombs.anu.edu.au (J. McPherson)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

A few months ago in comp.sys.hp, someone posted about their repairs to an HP 7x0, after a new sysadmin had just started work.
They {the new person} had been looking through the file system to try to make some space, saw /dev and the mainly 0 length files therein. Next command was "rm -f /dev/*" and they wondered why they couldn't login ;)

I think the result was that the new person was sent on a sysadmin's course a.s.a.p.

Internet: clement@arp.anu.edu.au jcm@coombs.anu.edu.au
--------------------------------------------------------------------------
From: grant@unisys.co.nz (Grant McLean)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

One of my customers (who shall remain nameless) was having a problem with insufficient swap space. I recommended that he back up the system, boot off the OS tape, repartition the disk, remake the filesystems and restore the data (any idiot could do this, right? :-) ). I also suggested that if he wasn't confident of achieving all this, we could provide a skilled person for a modest fee. Of course he was fully confident so I left him to it.

Next day I get a call from the guy to say he'd been there all night and he'd had all sorts of funny messages when restoring from tape. Eventually we tracked his problem down to the backup script he'd been using. It was a simple one liner:

    find / -print | cpio -oc | dd -obs=100k of=/dev/rmt0 2>/dev/null

This was a problem because:

1) His system had two 300MB drives
2) He only had a 150MB tape drive
3) The same script was being run every night by a cron job
4) All his backups were created by this script

(In case you haven't worked it out, the dd is to speed up writes to tape but it has the unfortunate side effect that CPIO never finds out about the end of tape. Because the errors were going to the bit bucket, they never knew their backups were incomplete until they came to restore from them).

I would have loved to be a fly on the wall when he explained to his boss that the data was gone and there was no way of getting it back.

I haven't heard from the guy since then. Hmmm ...
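
The two faults named above (errors sent to the bit bucket, and a backup nobody ever read back) can be guarded against along the following lines. This is a hedged sketch rather than the customer's script or a recommended replacement: it swaps tar in for the cpio | dd pipeline so that it stays short, assumes mktemp and a tar that accepts -C, assumes file names contain no newlines, and uses the hypothetical function names make_backup and verify_backup.

```shell
#!/bin/sh
# Sketch only: tar stands in for the cpio | dd pipeline in the post,
# and both function names are hypothetical.

# make_backup SRC ARCHIVE LOG
# Archive everything under SRC; errors are appended to LOG instead of
# being thrown away in /dev/null.
make_backup() {
    src=$1; archive=$2; log=$3
    tar -cf "$archive" -C "$src" . 2>>"$log"
}

# verify_backup SRC ARCHIVE LOG
# Read the archive back and compare its list of non-directory entries
# with what is on disk now. Succeeds only if the two lists are identical.
verify_backup() {
    src=$1; archive=$2; log=$3
    want=$(mktemp) || return 1
    got=$(mktemp) || { rm -f "$want"; return 1; }
    (cd "$src" && find . ! -type d -print) | sort >"$want"
    # Directory entries are listed with a trailing slash; drop them.
    tar -tf "$archive" 2>>"$log" | grep -v '/$' | sort >"$got"
    if cmp -s "$want" "$got"; then
        status=0
    else
        status=1
        echo "verify failed: $archive does not match $src" >>"$log"
    fi
    rm -f "$want" "$got"
    return $status
}
```

A nightly job would call make_backup and then verify_backup, and mail the log to someone whenever either one fails. The read-back pass is the part that would have exposed a 150MB tape holding a fraction of two 300MB drives.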
Grant McLean            | | EMail: grant@unisys.co.nz
Unix Support Specialist | | Phone: (GMT+12)(64 4) 4984558
Unisys New Zealand Ltd  | | Fax: (64 4) 4711953
--------------------------------------------------------------------------
From: nagappa@menudo.uh.edu (Chaitanya Nagappa)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

The following article is posted by a friend from my account:
Chai Nagappa
===================================================================

Hi, This is Ravi. Needed to add just a couple of stories from all the weird stuff that has happened. So, are these tales for around a campfire on Halloween?

At one time, there were three of us working on a unique SVR3.2 motorola based machine, on an R&D project. I took care of all the SysAdmin tasks, I had a back up administrator, and the third person had been stuck into my group (company politics). The group project files were in /user and the individual ones in /user2. We had managed to get backup from the operations department for /user only (not even /; security paranoia?). Anyway, I had another scsi hard disk that I used for making a disk copy of the primary scsi hard disk every Friday. This disk was connected, but not mounted, so that I could do the disk backup from my desk when I wanted to.

This machine used to sometimes get a scsi error such that you could not log in, but the processes already running on the machine were not affected. If you were logged in at the console, you just powered off the machine for a few minutes and rebooted it.

Around holiday time the other Admin was off on a long vacation. I had taken Monday off, and headed off for a four day weekend. The machine does the same blurp. The third person decides to power off the machine & turn it back on immediately. It does not come up properly. She decides to reinstall the machine using the installation tape that I had unfortunately left in the open.
Reformats the hard disk, installs the base system, and is stuck at that point when I come back in on Tuesday. I almost blow a blood vessel but try to keep calm 'cause I had made a disk copy about 10 days before (too anxious to get on my holiday the previous week). Try to mount the disk... hit vacuum. Try using dd to look at the disk... Seemed to be a large /dev/null :-? When the lady decided to reinstall the system, it asked her what scsi disks she wanted to reformat, and she said "y" for both 0 & 1!! All my sample/trial&error work for a year had bitten the dust. My only (small) consolation was that I was not the only one affected.

Story 2. Live 24 hour online system. Does backup over the ethernet to a SCSI tape. Unfortunately, no SCSI on this system to recover if root/ethernet dies. This was a Compaq Systempro running SCO Unix. Slated a downtime of 4-6am. I thought that it would take me only 30 minutes, as I had installed a similar (Adaptec) SCSI board on similar hardware on SCO. The only difference was that this machine was running MPX (multiprocess extension) and you had to deinstall it, install the SCSI, and then reinstall MPX (proper procedure). I had made all my slot/IRQ charts the previous day, and so got busy removing MPX. Then said "mkdev tape", go through the IDs, and am almost at home base. Then... "link kit not installed, use floppy X1" when I tried to remake the kernel. For some reason, when I removed the multiprocessor extension, the single processor files were not moved to their right location. And if I reinstalled the single, all my changes would be lost. Finally, restored the OS (from backup) on the remote machine, and then rcp-ed the files over to bring back the MPX version. Unfortunately, rcp does not maintain the date/permissions, etc. Got a limping version of the machine back on-line about 45 minutes after its slated time, and spent the rest of the day fixing vagrant files.
The next week, I moved the online programs to another machine (a headache), and reinstalled this machine from scratch.

Ok, that should be enough horror. Please send any replies to "ravi@usv.com" instead of this account.

Thanks,
--Ravi Ramachandran
--------------------------------------------------------------------------
From: adb@geac.com (Anthony DeBoer)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

In article JRowe@cen.ex.ac.uk (J.Rowe) writes:
>One thing I would like all vendors to do (I know one or two do) is
>to give root the option of logging in using another shell. Am I the
>only one to have mangled a root shell?

This actually leads me back to a Unix admin horror story. At a former employer, I once watched our sysadmin reboot from the distribution tape after making a typing error editing the root line in /etc/passwd. After munging the colon count in this line, nobody could login or su, and he hadn't left himself in root in another session while testing his changes (a rule I've adopted for myself). My "big break", the moment I became sysadmin, was partly by virtue of being the only one to ask him for the root password the day he went out the door for the last time.

What I've found preferable, when wanting to set up an alternative shell for root (bash, in my case), is to add a second line in /etc/passwd with a slightly different login name, same password, UID 0, and the other shell. That way, if /usr/local/bin/bash or /usr/local/bin or the /usr partition itself ever goes west, I still have a login with good ol' /bin/sh handy. (I know, installing it as /bin/bash might bypass some potential problems, but not all of them.)

This might, of course, be harder to do on a security fascist system like AIX. Simply trying to create a "backup" login with UID 0 there once so that the operator didn't get a prompt and have to remember what to type next was a nightmare.
(I wound up giving "backup" a normal UID, put it in a group by itself, and gave it setuid-root copies of find and cpio, with owner root, group backup, and permissions 4550). BTW, this was to make things easier for the backup operator, not to make it secure from that person.

--
Anthony DeBoer adb@geac.com | uunet!geac!adb | GEM: ANTHONY.DEBOER
--------------------------------------------------------------------------
From: williams@nssdcs.gsfc.nasa.gov (Jim Williams)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Well, I guess I'll throw in a couple of stories too. The first isn't really a horror story, more of an unexpected failure mode.

Story One is about The Sun 3/260 That Froze Solid. One day a user reported that the Sun 3/260 he was using was "dead". On inspection, I found the Sun at the console prompt and the keyboard totally unresponsive. The L1-A sequence did nothing. So I power cycled it. Nothing. A blank screen, no activity. I was ready to call service, then decided to try rebooting with the normal/diag switch set to diag. On looking at the back of the pedestal, I saw that the ethernet cable had been pressed up against the reset switch! ARGGGHHHH! The user had pushed the machine back just enough to press the switch and keep it pressed. (I don't recall if there was a "watchdog reset" message on the console when I found it, but I was new enough to Suns that that would not have been a dead giveaway.)

Story Two involved connecting an HP laserjet to a Sun 3/280. This sucker just would NOT do flow control correctly. I put a dumb terminal in place of the HP and manually typed ^S/^Q sequences to prove that the serial port really was honoring X-ON/X-OFF. But for some reason the ^Ss from the HP didn't "taste right" to the Sun, which ignored them. Switching the HP serial port between RS422/RS232 had no effect. It eventually turned out to be some sort of flakiness with the Sun ALM-II board.
Everything worked fine after I moved the printer to one of the built-in Zilog ports.

Death to flakey hardware...

Cheers!
Spoken: James W. Williams  Company: Hughes STX
Internet: williams@nssdcs.gsfc.nasa.gov  Phone: +1 301 286-1131
--------------------------------------------------------------------------
From: keith@ksmith.uucp (Keith Smith)
Newsgroups: alt.folklore.computers,comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

My dumbest move ever.

Client in Charlotte, NC (3 hours + away) has a Xenix box with like 15 users running a single app. They have a tape backup of course. Anyway they ran slam out of space on the 70MB disk drive so I upgraded them from an MFM to a SCSI 150MB disk. Restored their app & data files, and they were off and running.

Anyway they did an application directories backup (tar) on a daily basis and backed the rest of the system up with tar on Monday morning. Being a nice guy I built a menu system and installed the backups on the menu so they could do it with a push of the button. Swell, It's Monday. Call if anything else comes up.

1 week later I get a call. Console is scrolling messages, App seems to be missing yesterday's orders, etc. Call in, and cannot log in. 'w' doesn't work. Crazy stuff. Really strange. Grab old drive/controller, fly to Charlotte, replace drive, install app backup tape. They re-key missing stuff, etc.

Bring new disk back. Won't boot, won't do anything. Boot emergency floppy set. Looking around. Can't figure it out but have backup tape from that morning that "completed successfully". tar tvf /dev/rct0. Hmm, why do all these files look very OLD. Uh, Where, Uh. Look at menu: the command for the "backup" is 'tar xvf /dev/rct0 /'

Anyway, I owned up to the mistake, re-loaded the SCSI drivers and changed the command to 'tar cvf ..' Hehehe, Now I DOUBLE check what I put on a menu, and try not to be in a *HURRY* when I do this stuff.

Keith Smith uunet!ksmith!keith 5719 Archer Rd.
Digital Designs BBS 1-919-423-4216 Hope Mills, NC 28348-2201
--------------------------------------------------------------------------
From: corwin@ensta.ensta.fr (Gilles Gravier)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Well, talk about horror stories... We have a DataGeneral Aviion machine where I work. I was doing regular admin tasks on it and decided, logged in as root, to clean /tmp... (I can already see you laughing there!). So, as usual, I typed "cd / tmp" then "rm *", so I was placed in / when the dreaded rm was entered... My root directory was erased... I realized my error fast enough... So, since I had deleted the kernel, and the administration kernels (that both reside in /), I had to recreate a new kernel. Luckily for me, DG/UX allows one to recreate it "on the fly", using parameters of the running kernel (in memory!)... So I did, and then rebooted.

Things started getting bad when I still couldn't work on my machine; logins didn't work (No Shell messages...)... Until I could access the /etc/passwd file using a trojan shell through an NFS mounted directory, and create a root account whose shell was not /sbin/sh... On a DG, /sbin and /bin are both links to /usr/sbin... The links were killed when I did my "rm"...

Well, now I do backups!

This message has been brought to you by Gilles GRAVIER - corwin@ensta.fr
--------------------------------------------------------------------------
From: jn11+@andrew.cmu.edu (Joseph M. Newcomer)
Newsgroups: alt.folklore.computers
Subject: Stupid terminal tricks

Then there was the infamous "stuck cursor key" problem on TOPS-10. TOPS-10 was able to run with the user files on DECtape and only a transient file area for users (we used it that way for quite a while at CMU!). One of the logout options was to delete all files on the hard disk. The logout command was 'kjob', abbreviated 'k'.
If you just typed 'kjob', it prompted for the type of logout, for example, 'f' for 'fast' logout, another option (I forget what) that deleted scratch files that might be on the disk, etc. And of course, the option to delete all the files, 'k'. This was a dangerous option. If you selected it, you had to confirm it with 'y', or the word "kill", which could be abbreviated "k". On TOPS-10, an escape character could terminate buffered line input, same as newline.

So our poor victim accidentally hit one of the cursor keys on the terminal, which was defective, and it sent the sequence KKKK... and he watched an entire day's work evaporate.

The logout program was modified that same evening to disable forever the "kill files" option.

joe
--------------------------------------------------------------------------
From: mike@pacsoft.com (Mike Stefanik)
Newsgroups: alt.folklore.computers,comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

One of the more interesting problems that I ran into was a customer that was having problems with their SCSI tape drive on a XENIX box. Around midnight, every night, the system would automatically backup and verify their data. One day, the customer needed to restore some data files from the last night's backup. She called because, although the restore worked just fine, she didn't see the busy light on the drive come on, and it didn't sound like the tape was moving.

I dialed up the system, had her put a tape in and did a retension -- the drive started winding the tape back and forth, and we both concluded that she was mistaken. After all, the tape was retensioning, and she wasn't getting any backup or verify errors at all. I just chalked this one up to user confusion.

A few days later, she called back saying that there really was something wrong with the tape. She needed to restore some data from a few days ago, and like before, the busy light on the drive didn't come on, but files did restore.
However when she started the application program, the data hadn't changed. I dialed up the system again, and just on a fluke, issued a "df" -- it showed their rather large root filesystem to be nearly full. Confused, I did a "find", searching for files over 1MB. Of course, what I found was this huge file named /dev/rct0.

As I later discovered, their system had crashed a few weeks ago, and she had simply answered "yes" to a bunch of questions that it asked when she brought it back up. The /dev/rct0 device was removed (but /dev/xct0 was still there, which allowed me to retension the tape) and the backup script never checked to make sure that it was actually writing to a character device.

Needless to say, I modified the backup program to make sure that it was really writing to a device, and I made her promise to call me whenever the system crashed or asked "funny questions" when it was booting.

Mike Stefanik mike@pacsoft.com ...!uunet!pacsoft!mike (714) 681-2623
Pacific Software Group, Riverside, CA
--------------------------------------------------------------------------
From: gert@greenie.gold.sub.org (Gert Doering)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

russells@ccu1.aukuni.ac.nz (Russell Street) writes:
>So when I came in this morning a user's session had crashed while
>he was replying to mail and emacs had spent the night quietly
>filling up the root partition (where /tmp) was.

Well... sounds familiar... I was on a 5 day vacation, and on the first day my machine crashed...

How? Well... cron started a shell script to extract some files from a ".lzh" archive. LHarc found that the target file already existed, asked "file exists, overwrite (y/n)?" ... since it was started from cron, it just read "EOF". Tried again. Read "EOF". And so on. All output went to /tmp... which was full after the file reached 90 MB!

What happened next?
I'm using a SCO machine, /tmp is in my root filesystem and when trying to login, the machine said something about not being able to write login information - and threw me out again. Switched machine off. Power on, go to single user mode. Tried to login - immediately thrown out again.

I finally managed to repair the mess by booting from floppy disk, mounting (and fsck-ing) the root filesystem and cleaning /tmp/*

Gert Doering | SubNet : gert@greenie.ucrc.sub.org | mailbox / uucp:
Munich / FRG | InterNet: gdoering@physik.tu-muenchen.de | call (089)3243328
(089)3243228 | FidoNet : gert.doering@2:246/55.4 | login bbs / nuucp
--------------------------------------------------------------------------
From: almquist@chopin.udel.edu (Squish)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Two miserable flubs:

1) /etc/rc cleans tmp but it wasn't cleaning up directories so I changed the line:

    echo clearing /tmp
    (cd /tmp; rm -f - *)

to

    echo clearing /tmp
    (cd /tmp; rm -f -r - *; rm -f -r - .*)

About 15 minutes later I had wiped out the hard drive.

2) One of the user discs got filled so I needed to move everyone over to the new disc partition. So, I used the tar to tar command and flubbed:

    cd /user1; tar cf - . | (cd /user1; tar xfBp - )

Next thing I know /user1 is coming up with lots of weird consistency errors and other such nonsense. I meant to type /user2 not /user1. OOOPS!

My moral of the story is: when you are doing something BIG, type the command and reread what you've typed about 100 times to make sure it's sunk in (:

- mike
--------------------------------------------------------------------------
From: anne@maxwell.concordia.ca (Anne Bennett)
Newsgroups: comp.unix.admin
Subject: WANTED: Unix administration horror stories !

After about four months as a Unix sysadm, and still feeling rather like a novice, I was asked to "upgrade" a Sun lab (3/280 server and ten 3/50 diskless clients) from SunOS 4.0.3 to 4.1 -- of course, this "upgrade" was actually a complete re-install.

Well, the server had no tape drive, not even any SCSI controller. There were no other machines on its subnet other than the clients, so I had no boothost (at that time, I did not know that the routers could be reconfigured to pass the appropriate rarp packets, nor do I think our network people would have taken kindly to such a hack!). The clients did have SCSI controllers, but I had no portable tape drive. Luckily, I had a portable disk.

So, with great trepidation (remember, I was still a novice), I set up one of the clients, with the spare disk, to be a boothost. I booted the server off the client and read the miniroot from a tape on a remote machine, and copied it to the server's swap partition. Then I manually booted the miniroot on the server by booting off the temporary boothost with the appropriate options, and specified the server's swap partition as containing the kernel to be loaded. Once in the miniroot, I started up routed to permit me to reach the tapehost, and finally invoked suninstall. From then on, it worked like a charm.

Needless to say, I was extremely pleased with myself for figuring all of this out. I then settled down to do the "easy stuff", and got around to configuring NIS (Yellow Pages). I decided to get rid of everything I didn't need, under the assumption that a smaller system is easier to understand and keep track of. The Sun System and Network Administration Manual, which is in many ways an admirable tome, had on page 476 a section on "Preparing Files on NIS Clients", which said: "Note that the files networks, protocols, ethers, and services need not be present on any NIS clients.
However, if a client will on occasion not run NIS, make sure that the above mentioned files do have valid data in them."

So I removed them.

Several hours later, when I had finished configuring the server to my satisfaction, reloading the user files, etc., I finally got around to booting up the clients. Well, I *tried* to boot up the clients, but got the strangest errors: the clients loaded their kernels and mounted /, but failed trying to mount /usr with the message "server not responding. RPC: Unknown protocol".

I was mystified. I tried putting back the generic kernels on server and clients, several different ifconfig values for the ethernet interfaces, enabling mountd and rexd in the server's inetd.conf, removing the clients' /etc/hostname.le0 (which I had added)... all to no avail. 'Twas the last work day before the Christmas break, and I was flummoxed.

Of course, I finally connected the error message "unknown protocol" with the removed /etc/protocols (and other) files, restored these files, after which everything was fine again.

I was pretty mad, since I had wasted a whole day on this problem, but *technically*, the Sun manual above is correct. It just neglected to mention that of course, *no* machine is running NIS at boot time, therefore *every* machine needs valid data in the networks, services, protocols, and ethers files *at boot time*. Grrr!
--------------------------------------------------------------------------
From: rick@sadtler.com (Rick Morris)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Okay, I'll bite. We had Zenith Data System's Z-286's, boosted to 386's via an excellerator (imagine a large boot stomping lots of data through a small 16 bit funnel...). We were running SCO's Xenix. The user filesystem crashed in such a way that it couldn't be repaired via fsck. fsck would try to repair a specific file and then just stop, leaving the filesystem dirty.
The "dirty bit" in the superblock said that it couldn't be mounted because it was dirty. But it couldn't be cleaned. But there was lots of data on it and I hadn't been doing backups because the only I/O device to do backups was the floppy drive and I wasn't about to sit there every night or even once a week and slam 30 odd floppies into the drive while the backups ran, even worse try to restore a file from a backup of 30 floppies....

Anyway, to recover the data I used fsdb to edit the superblock and change the dirty bit to clean, mounted the disk, got off all the good data, and remade the filesystem.

Thanks, Xenix. fsck couldn't clean it, but you did supply fsdb! *whew*

--
Rick Morris Email: rick@sadtler.com
--------------------------------------------------------------------------
From: helen@nomad.urich.edu (Helen C. O'Boyle)
Newsgroups: alt.folklore.computers
Subject: Re: WANTED: Unix administration horror stories !

In article <1429@pacsoft.com> mike@pacsoft.com (Mike Stefanik) writes:
> [ beginnings of backup horror story deleted ]
>However when she started the application program, the data hadn't changed. I
>dialed up the system again, and just on a fluke, issued a "df" -- it showed
>their rather large root filesystem to be nearly full. Confused, I did a "find",
>searching for files over 1MB. Of course, what I found was this huge file named
>/dev/rct0.

That sounds familiar. A couple years ago, when I was out at a client site doing some network support, they asked me to look at their main UNIX box, to see if some files could be deleted to make more space. ZAPPPP went the 13 meg cron log. ;-) ZAPPPP went the 17 meg pacct file. Then for the "find", just to make sure there weren't some other accounting files growing without bound in an unfamiliar place. "GOLLLL-LLLEEEEE," said I, quickly followed by "UHHHHH OHHHHHH," when I saw a 55 meg file in /dev, whose name sounded like a tape device.
An ls confirmed that it was ALMOST, but not quite, the tape drive device name. ;-) I checked their backup script (written by the manufacturer's techs when the system was installed over a year prior to that date) and saw the typo. I fixed the typo, pointed out the problem to the SA, and made a few friends at that company. They had never noticed that their tape drive didn't run during backups, because backups ran at 2am when they were asleep. Likewise, they'd never realized that the dozens of tapes they'd purchased (and some, duly discarded after N "uses"!!! ;-) had never been written to, because they never needed to do a restore. I don't know what's more amazing: 1) That the techs didn't at least test the backup script before they left the site after the installation. 2) That this customer did not, in the space of more than a year, *ever* need to do a restore (at least, that is what the admin maintained to me with a straight face, not really comprehending that anyone would ever WANT to restore the previous day's contents of a file (???)). Of course, if I'm getting some yukkks in at others' expense, might as well throw some stones at my house, too. Another backup story. Six years ago or so, AT&T 3B2's had a command which would (allegedly ;-) write verified backups. By failing verify periodically, it even did a good job of convincing the user that it did a decent verify. Until Programmer Z decided she needed a file back. Admin H grabbed the evening backup clerk's logs, found the tape set, put the first tape in, and after a little while, SPLAT, I/O error. Skipping past that, very soon, SPLAT, another I/O error. The online backup record indicated that the backup had completed without errors. Puzzled was I, but I grabbed the weeklies. Same problem. A good copy of the file was eventually retrieved from SOME backup set, but most of the tapes were full of I/O errors, despite system claims of successful verifies. This pleased management no end. 
It was a consulting company, and the tapes being written by that machine contained the company's bread and butter. In many cases, the only copies of the sources to systems at client sites were found on that machine. Yawn. I don't recall how the utility did its verify, but it didn't verify nearly enough for most users/programmers *I* know to be comfortable with it. So, I added a manual verify pass to the backup (well, like probably everyone else who's ever seriously attacked the job of being an SA, I added LOTS of things when I rewrote it in disgust, but that's another story). Backups (along with network cabling) just seem to be a great place for trouble to hide, waiting for the unwary. In theory it's simple, and in practice, it's not *too* hard to do *mostly* right (online file catalog, dd at optimal streaming blocksize, tape file checksum verification, etc.) and not impossible to do *really* right (for one, toss QIC tapes altogether, use a backup method that accounts for special files, etc.) ....... but in the real world of non-UNIX-wizard SA's, "right" and "recommended by the vendor, so it MUST be the best way, right?" are often considered synonyms when they're really closer to being antonyms. -- Helen C. O'Boyle | Netnews admin, GNU support beastie, C hacker, helen@nomad.urich.edu | UNIX contractor and hanger-on in Richmond, VA -------------------------------------------------------------------------- From: yared@anteros.enst.fr (Nadim Yared) Newsgroups: comp.unix.admin Subject: Re: WANTED: Unix administration horror stories ! Well, My story happened on a Sun Sparcstation 2 I once wanted to update the libc.so.1.7 to libc.so.1.8 by myself, so I got root, and then ftp the /lib/libc.so.1.8 to my /lib. Unfortunately there was not enough room on this partition. So all i got was a file with zero length. The problem is that I ran /usr/etc/ldconfig in the directory /lib, and that was all. 
No command could be executed any more, because ld.so checked for libc.so.1.8, that being the newest version. All I needed was a statically linked mv, but Sun does not usually provide the source. Even going single user didn't do anything. So I had to install a miniroot on the swap partition, cp /bin/mv from the CD-ROM, and execute it. It sounds like an American film: a happy ending saved my life.

Nadim YARED.
Ecole Nationale Superieure des Telecommunications de PARIS.
--------------------------------------------------------------------------
From: valdis@vttcf.cc.vt.edu (Valdis Kletnieks)
Newsgroups: comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

Well, here's a few contributions of mine, over 10 years of hacking Unixoid systems:

1) yesterday's panic: Applying a patch tape to an AIX 3.2 system to bring it to 3.2.3. Having had reasonable success at this before, I used an xterm window from my workstation. Well, at some point, a shared library got updated.. I'd seen this before on other machines - what happens is that 'more', 'su', and a few other things start failing mysteriously. Unfortunately, I then managed to nuke ANOTHER window on my workstation - and the SIGHUP semantics took out all windows I spawned from the command line of that window. So - we got a system that I can log in to, but can't 'su' to root. And since I'm not root, I can't continue the update install, or clean things up. I was in no mood to pull the plug on the machine when I didn't know what state it was in - was kind of in no mood to reboot and find out it wasn't rebootable. I finally ended up using FTP to coerce all the files in /etc/security so that I could log in as root and finish cleaning up.... Ended up having to reboot *anyhow* - just too much confusion with the updated shared library...

2) Another time, our AIX/370 cluster managed to trash the /etc/passwd file. All 4 machines in the cluster lost their copies within milliseconds.
In the next few minutes, I discovered that (a) the nightly script that stashed an archive copy hadn't run the night before and (b) that our backups were pure zorkumblattum as well. (The joys of running very beta-test software). I finally got saved when I realized the cluster had *5* machines in it - a lone PS/2 had crashed the night before, and failed to reboot. So it had a propagated copy of /etc/passwd as of the previous night. Go to that PS/2, unplug its Ethernet.. reboot it. Copy /etc/passwd to floppy, carry to a working (?) PS/2 in the cluster, tar it off, let it propagate to other cluster sites. Go back, hook up the crashed PS/2's Ethernet.. All done. Only time in my career that having beta-test software crash a machine saved me from bugs in beta-test software. ;)

3) Once I was in the position of upgrading a Gould PN/9080. I was a good sysadmin, took a backup before I started, since the README said that they had changed the I-node format slightly. I do the upgrade, and it goes with unprecedented (for Gould) smoothness. mkfs all the user partitions, start restoring files. Blam. I/O error on the tape. All 12 tapes. Both sets of backups. However, 'dd' could read the tape just fine. 36 straight hours later, I finally track it down to a bad chip on the tape controller board - the chip was involved in the buffer/convert from a 32-bit backplane to an 8-bit I/O cable. Every 4 bytes, the 5th bit would reverse sense. 20 mins later, I had a program written, and 'dd | my_twiddle | restore -f -' running.

Moral: Always *verify* the backups - the tape drive didn't report a write error, because what it *received* and what went on the tape were the same....

I'm sure I have other sagas, but those are some of the more memorable ones I've had...
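The moral above can be made concrete with a read-back check. What follows is only a minimal sketch, not what the poster actually ran: verify_backup is a hypothetical helper name, and it assumes the archive was written with tar relative to the source directory, that mktemp -d is available, and that there is enough scratch space to unpack the archive. It reads the archive back from the medium and compares the result with the live files, which is the sort of check that can catch data mangled on the way to the drive even when the drive itself reports no write error.

```shell
#!/bin/sh
# verify_backup SRC_DIR ARCHIVE
# Hypothetical helper (not from the original post): read ARCHIVE back,
# unpack it into a scratch directory, and compare it with SRC_DIR.
# Assumes ARCHIVE was created with: tar -cf ARCHIVE -C SRC_DIR .
# Returns 0 if the trees match, 1 on any mismatch or read error,
# 2 if no scratch directory could be created.
verify_backup() {
    src=$1
    archive=$2

    scratch=$(mktemp -d) || return 2

    # Step 1: actually read the archive back from the medium.
    if ! tar -xf "$archive" -C "$scratch"; then
        rm -rf "$scratch"
        return 1
    fi

    # Step 2: compare what came back with what is on disk now.
    if diff -r "$src" "$scratch" >/dev/null; then
        status=0
    else
        status=1
    fi

    rm -rf "$scratch"
    return $status
}
```

A backup script could call verify_backup right after writing each archive and refuse to report success unless it returns 0. Files that change between the backup and the check will show up as mismatches, so this fits best on a quiet filesystem.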
Valdis Kletnieks
Computer Systems Engineer
Virginia Tech
--------------------------------------------------------------------------
From: cowan@cerianthus.pinetree.org (Darin Cowan - root)
Newsgroups: alt.folklore.computers,comp.unix.admin
Subject: Re: WANTED: Unix administration horror stories !

fischer@iesd.auc.dk (Lars Peter Fischer) writes:
> In a related move, a new computing center sysadmin here once decided
> that there was no reason for people to be able to copy system
> commands, so he did a "chmod a-r /bin/* /usr/bin/*".
>

I used to be the tech support for a large system of unix machines in the Military. Military system managers are often "volunteers", so you don't always get knowledgeable people there... One day I get a call from a Master Corporal who says his system was locked up. Seems he just got back from a DOS course where he noted that when you type "del *.*" it asks "Are you sure?" Wondering if that was true with Unix as well, he logged in as root and typed "rm -r *" at the prompt.

cowan@cerianthus.pinetree.org (Darin Cowan - root)
--------------------------------------------------------------------------
From: chuck@edsi.plexus.COM (Chuck Tomasi)
Subject: Re: WANTED: Unix administration horror stories !

I was assigned to part of a team to upgrade our board test stations back in January. There were about six or seven systems on which to upgrade the OS (HP-UX 7.0 -> 8.0) and upgrade the board test software. Before starting each system we made a backup of the data to cartridge tape. This was taking an incredibly long time for each system (located in different buildings) so we got the idea of doing a backup across the network to a 4mm DAT located on yet another system. When the backup had completed cranking away we reformatted the disks (to reconfigure swap and other things), loaded the new OS, loaded the new software, and then found out to our horror that not all of the data had been backed up.
There were approximately 12 designs lost to varying degrees, but none were usable. Each design is valued at $30,000-40,000. My stomach sank. We spent the rest of the night and all the next day trying to figure out where the data might be recovered from and how much data we may have lost. It was a 36 hour work day and we were supposed to go out to dinner with some friends that night (Friday). I dropped my flatware continuously, couldn't make simple decisions (eg: "What would you like to drink?") and for the most part was out of the conversation entirely unless someone mentioned backups or Unix.

As it turns out we were able to recover (from a backup made two months earlier) all but one design, which I believe eventually surfaced. We had egg on our face - especially since I knew that backups over NFS (as root) had problems. I should have seen it coming and didn't. It wasn't long thereafter that I got permission to order an 8mm Exabyte drive to improve our backup policy.

The good thing about this is that I got to do a talk about backups at HP two months later (it started as a joke "Hey let's get Chuck to talk about backups, ha ha.") and received an HP 95LX for my efforts. I can't imagine life without the 95LX as a System Administrator.
--------------------------------------------------------------------------
From: lennart@blade.stack.urc.tue.nl (Lennart Benschop)
Newsgroups: alt.folklore.computers
Subject: Unix administration horror stories.

Just my Unix (well it's Linux) administration horror story: I installed Linux (a free Unix clone) from the SLS distribution. The SLS distribution is a distribution of Linux and all utilities on floppies. That distribution contained a few little mistakes. One of them was that there was a uuencode program but no uudecode program. Both uuencode and uudecode were hard links to the same program that happened to be uuencode.
There may be versions of the program around that behave like either uuencode or uudecode depending on the name through which they are called, but this one is not. The makers of SLS are so kind as to provide fixes for their mistakes. So there was a tar file that contained the missing uudecode binary. A few days later I tried to uuencode a file. Oops. The new uudecode had replaced the old one that was a hard link to uuencode. So uuencode was gone. Both names were still links to one binary. So I erased uuencode and tried to get the original uuencode from the distribution disks. Guess what! Still no uuencode! It was still a link to the uudecode program. Tried it a second time. How was that possible? Finally I got it. The tar file on the distribution disk contained the uuencode program under the name 'uudecode' and an entry 'uuencode' as a hard link to 'uudecode'. By restoring uuencode I merely got a new hard link to uudecode! So I removed uuencode again, renamed uudecode to something completely different, restored 'uudecode' from the distribution disk, renamed that to 'uuencode' (it really was the uuencode program), and finally I renamed the uudecode program back to 'uudecode'. Then it worked as it should. Links can be a major source of confusion if you have them where you don't want and if you don't get rid of them in time. This is maybe not a real horror story but just funny, but I could imagine that similar situations cause real horror at times. -------------------------------------------------------------------------- Newsgroups: alt.folklore.computers Subject: New dumb user story? Hospital context From: sietsma@latcs1.lat.oz.au (Jocelyn Sietsma Penington) The girlfriend of a friend of mine at work is the computer support person at a Melbourne hospital. One day she got an irate phonecall from a matron: "Where are you? I pushed the help key an hour ago, and nobody's come to help yet." The PC wasn't even part of a network! 
I guess in a hospital...'call button calls a nurse, help key should call support personnel..' ?
--------------------------------------------------------------------------
From: cba@hpuerca.atl.hp.com (Brian Abernathy)
Subject: Re: Ooops!! and Mainframes

Here are some more "fun" things that I have seen happen:

There was a LARGE red glass-covered button at the main computer room door. It was, of course, the EPO (Emergency Power Out) button for the room. Room was being painted and the painter had some experience with these EPO beasties. (Or at least, so he thought). He carefully unscrewed the cover, so that it "wouldn't get hit". The cover came off, the spring loaded button popped out, and every internal breaker in every machine popped. It sounded like the entire machine room crashed through the floor.

Heard this from a CE who experienced it: He had to maintain a site that had a large number of disk drive "ovens". These were the older removable-pack units. They had two switches that had to be thrown before the disk pack was replaced. One was the spin-up/spin-down switch (supplied power to the drive motor), and the other was the Ready/Not Ready switch. Each unit had a handle which was used by the operator. When the drive had spun down they could pull the drawer open, lift the cover and remove the pack. This particular site had experienced a large number of drive failures, including Head to Disk Interference (HDI), or head crash. CE went onsite, but couldn't find anything wrong with diagnostics. He noticed that there were hooks on all the drives - the kind used to hang potholders. On each hook there was an oven mitt. He just happened to notice that one operator came over to change out the disk pack. He flipped both switches to "OFF", put the mitt on, and opened the drive - while it was still spinning. He pressed the mittened hand down on the disk, forcing it to stop! There was the problem!
Needless to say - the customer was very definitely charged for any disk drive problems that they had! Before anyone says that there aren't any disk drives that will open up like that: Maybe not now, but I have actually used some of the type described. So, they DID exist at one time!
--------------------------------------------------------------------------
From: rich@zeugma.csusb.edu (Rich McGee)
Subject: Best Ways to Crash Heads?

What's the best head crash you've ever caused? My story:

I was a student working on a PDP 11/45, running several types of removable disk media. Since these were all at least 8 platter drives, we paid several hundred dollars per year for a company to come in and physically "wash" the disk surfaces. They would remove the disk from its cover, clean it, and then inspect it on some type of lathe machine. Net result: If that disk crashed within a year, the company would repair (and pay for) any damage, since the disks were "certified". Usually, this certification process took about 2 hours per pack, and was extremely demanding.

Until we got a new tech one day. Instead of taking the usual 2 hours per pack, he did 6 packs - all in under one hour! Since we had paid for several hours of company time, he was left idle. With nothing else to do, we handed him our stock of "gift packs", platters that had been given to us years earlier, but were of unknown quality, so were never spun up on the drives. He quickly inspected these, and stamped a certification decal on the case.

That night, we were ordered by our instructor to vacate the labs by 5 pm. Yeah, right. Around 2 in the morning, having finally beaten the latest VT100 StarWars game, we were bored, and decided to have a look at the now-certified "gift packs". I placed one in the second drive unit, and spun it up. Hmmm. Never seen this "file unsafe" light on before. Must be a bad drive... So we removed that pack, and placed it in the primary drive unit, and spun him up.
I've never heard a squeal that loud in my life. Picture a sackfull of kittens on their way to the river, and you come close. More squeals, another file unsafe light. Shut everything down. Put it all back the way it was. Reboot. Yup. Squeals, squeaks and the smell of a burning pack quickly filled the room. Imagine having to call your teacher at 3 am and say "Hi! We just wiped out both disk drives..." So *that's* what crow tastes like! The next day, I had the pleasure of calling the certification company. I explained our problem. What follows is the actual conversation, as well as I can remember it: Me: Hello, SCOPUS? Yeah, this is Rich. I've got two drives down, and a grand total of 8 packs gone. I think it was your fault. Them: We only have a one year warranty on disks we certify. Me: I know that. Them: That pack's been out of warranty. Read me the date on the certification decal. Me: Tough to read. What day was "yesterday"? Them: Who was the tech? Me: Steve So-and-So Them: He's fired! Net result: 8 packs, 12 heads, 4 days downtime. The packs were actually OK, the tech was right. What he forgot to inspect was the foam gasket that lined the inside of the cover. Being over ten years old, the gasket had decomposed, sort of like the dashboard of a '57 Chevy that's been parked in the sun too long. When we pulled off the disk cover, the vacuum sucked the decomposed foam into the heads. Those were the days! -------------------------------------------------------------------------- From: jw_lamp@bruny.cc.utas.edu.au (John Lamp) Subject: Re: Best Ways to Crash Heads? Best one I ever did was on a CDC 300Mb drive - the old removable multi platter packs - aka the spin dryer. These ones had been badge engineered to the name of the company whose machine it was (Hint press PF10 for disks, and they show up as 288Mb). Come lunch time, I loaded a brand new pack, started initialise and went to lunch. 
When I came back, the fault light was on, so I pressed the fault button in accordance with the manual. Yes, I remember the noise, as you said it's unforgettable. I remember even better the grooves up to 2mm deep into the platter and the fact that it took two techs two days of working their proverbials off to get it back up. Then they had the hide to send me a bill for the job. I had established from the I/O error log that the crash happened 45 mins into initialise (high spot on pack?) therefore not operator error, it was a new pack, supplied by them so they ate it.
--------------------------------------------------------------------------
From: tecdah1@sdc (Dave Hough,55095)
Subject: Re: Best Ways to Crash Heads?
Newsgroups: alt.folklore.computers

1. A sort of hardware virus story. The fellow in charge of the systems group arrived on site at 1 a.m. for a new system test. He unloaded the production disks and mounted the two system packs, but the system wouldn't boot. [Head crash on one pack]. So he swapped packs. [Bad drive crashed good pack, bad pack crashed good drive]. Confused, he unmounted an additional production pack and moved the system pack to it. [Bad pack crashed another good drive] Finally convinced that the new system software was failing to boot, he restored the production packs (intending to go home). [Bad drives crash three production packs]. Final total was three drives, two system test packs and three production packs, one VERY embarrassed supervisor.

2. The chief field engineer on site was working on a disk many years ago: a CDC 808 with 3 foot diameter platters and hydraulic positioners. [Aside: the disks were 7'x9'x4' hwd; stored about 12 megawords; platters were so big that the sectors/track changed from 50 to 62 about halfway across; imagine the driver!] Anyway, the CFE grabbed the wrong lever on a drive and pulled the main hydraulic fluid dump lever. Gallons of it poured out under the raised floor and into the cable nest.
Every FE in the area was under the floor with a rag trying to mop hydraulic fluid off the floor, wipe down cables, etc. Always had cable problems in that area afterwards.
--------------------------------------------------------------------------
From: mikeu@rocky.sbi.COM (Michael Urbanski)
Newsgroups: comp.sys.sun.admin
Subject: Re: Bonehead things to do (Re: / fills up and I cannot find out why...)

Here's the stupidest question I ever got. A user called me and said his keyboard was not working. Like a good SA, I grabbed a spare keyboard, and went to the user. The user demonstrated the problem ("See? No matter which key I hit, no response!"). I asked the user to try the same thing on his Sun keyboard, not his Macintosh keyboard and everything worked fine. Sheesh. I wish I could say this has only happened once.
--------------------------------------------------------------------------
From: mike@msc.cornell.edu (Mike Heisler)
Subject: Bonehead errors (was: / fills...)
Newsgroups: comp.sys.sun.admin

One day I got tired of cleaning out the pcnfs spool directory by hand and cut-pasted the following command into ROOT's crontab file to run each night:

    find . -mtime +3 -exec rm {} \;

What a mess. It took us two days to realize that it wasn't a disk failure.
--------------------------------------------------------------------------
From: matt@centerline.com (Matt Landau)
Newsgroups: comp.sys.sun.admin
Subject: Re: Most boneheaded thing done on my system

Well, not too long ago, in the panic and pressure that always precedes a major release, I typed:

    % su
    # reboot

to reboot my workstation, then realized (an instant too late!) that I was typing at an rlogin window... talking to the main fileserver for all of engineering... which takes in excess of 20 minutes to fsck all of its disks.
*Sigh*
--------------------------------------------------------------------------
From: kevinmac@ll.mit.edu (Kevin McElearney)
Subject: Re: Most boneheaded thing done on my system
Newsgroups: comp.sys.sun.admin

In the "old days" I had a user with a PC connected via switch box to an ALM tty port. The switch box gave him the ability to use the same COM port to Kermit into the Sun or print to a set of serial printers. One day he forgot to switch it from the Sun before he printed (anyone cringing?). Well needless to say the Sun tty port was not a happy camper. It crashed the machine.

Most boneheaded thing I have done..... Rebooted a machine from home during OS upgrades and forgot to run installboot... twice 8-}
--------------------------------------------------------------------------
From: adh@utwou6 (Andre Hoekstra)
Subject: Most boneheaded thing done on my system
Newsgroups: comp.sys.sun.admin

One user once managed to shut the computer off (just flipping the switches). (This was the print server and one of the NIS slave servers) He thought it was like all other PeeCees: put a floppy in and it will boot. Needless to say that we are not in the C.S. department and that the people here are rather new to workstations (or should I say workstations are rather new to these people?) We have access control to the room where this machine was, so I could easily track down who he was. When I asked him if he had powered the computer down he said: "Damn! I had so much trouble finding all the switches!" (I am seriously thinking of disabling all power switches on all boxes...)
--------------------------------------------------------------------------
From: gross@ssd.kodak.com (Dan Gross)
Subject: Re: Bonehead things to do (Re: / fills up and I cannot find out why...)
Newsgroups: comp.sys.sun.admin

Or how about bonehead over-analyzing of a problem? A user came into my office saying her computer was hung trying to log her in. She said she'd even tried to log out, and it wouldn't let her.
I went out there, and sure enough the terminal was unresponsive. (I think you can see what's coming...). Well, I went back to my machine, had no trouble rlogging over. I blew away her shell, went back to the machine...nothing. I did a kill -HUP 1, and still nothing. After about 5 minutes of puzzling over what happened, the light came on in my head (you know, the one that flashes "You Idiot!" over and over again), went over and hit Ctrl-Q. The machine worked fine after that. -------------------------------------------------------------------------- From: etxorst@eos.ericsson.se (Torsten Lif) Subject: More boneheads Newsgroups: comp.sys.sun.admin My personal worst (by far) was when we were going to install a new disk on one of our servers. We'd informed the users that the system would be down around lunchtime, halted the machine nice and synced. Then we were ready to turn off the power, so I turn around and hit the power switch - on ANOTHER server! Needless to say it was the one with 8 GB disk and still running 4.1.1 (NO quickcheck)... Several partitions fsck'd semi-bad so that after the first boot it started over with a second round of fsck. It all repaired automatically but that time staring at the near-infinite fsck's while the phone was ringing like crazy was kind of frustrating. :-/ Another fun incident (with no egg on my face this time) happened back in the days of 3/280 servers where the power switch moved vertically (with "off" being down). Suddenly all the clients of one server froze. We ran to the computer room and found that the company electricians had let themselves in (with their pass key) and were changing the fluorescent tubes in the overhead lights. Except they were currently sweeping glass debris off the floor since they'd dropped one tube. But what they hadn't noticed was that on its way down, that tube had hit the power switch on the server and turned it off too... 
Yet another "boner": We were doing maintenance work on a weekend and our manager was there to "help". Suddenly we notice that one server is no longer responding to NFS requests. We look at the console terminal for that server and see our manager hacking away on the keyboard with no results. A quick scan showed that he'd got two keyboards crossed and was entering commands to a CISCO router while looking at the screen of a Sun server console. Unfortunately, when he got no results from his hackings (failing to notice the error messages from the CISCO on the adjacent screen) he'd assumed the terminal was hung and switched off the power to it. This made the server think the wire was disconnected (or maybe it looked like a "break", I don't know) and halted the server into monitor mode... Fortunately, this time it was just a matter of uncrossing the keyboards and typing "c" to continue.
--------------------------------------------------------------------------
From: ritter@CS.MsState.EDU (Tom Ritter)
Newsgroups: comp.sys.sun.admin
Subject: Re: Most boneheaded thing done on my syste

Well, THE most bonehead thing ever done on an old system I used to run was done by me. (As a student years ago...) I was given a UNIX system to run with zero experience. But, it had a menu based system adm setup, so it should be easy right?? :-) Well, when I went to add some new users and found /usr2 was full, I added them in the partition with the most free space. ( Which happened to be "/tmp" ) We were all surprised when the system went down a few days later, that all the new folks' home directories had gone away.... :-(((( I had some real incentive to learn UNIX administration...
--------------------------------------------------------------------------
From: bruce@st-pauli.med.utah.edu (Bruce Milner)
Subject: Re: Most boneheaded thing done on my syste
Newsgroups: comp.sys.sun.admin

One of the people I work with accidentally selected swap as the raw partition to run sybase on.
Needless to say, this caused unexpected things to happen on our file server here.
--------------------------------------------------------------------------
From: mikulska@sirius.Princeton.EDU (Margaret Mikulska)
Newsgroups: comp.sys.sun.admin
Subject: Re: Most boneheaded thing done on my syste

On one of my previous jobs, I inherited a UNIX site where I had to deal with some amateur sysadmins roaming around. One of them was more inclined to experimenting than to reading the manual. One day he called me with "our Sparc doesn't boot". Indeed it didn't. (BTW, the system was configured to be diskFULL.) First he claimed he had no idea what happened. Finally, he confessed that he tried to increase the swap partition. How? He invoked 'format' and saw that the largest partition was the 'c' partition (on the only disk of the sparc), so he put swap on sd0c and rebooted the system ...

On another occasion, he was trying to save disk space by putting some filesystem on a file server and NFS-mounting it. So far so good. Unfortunately, he didn't always have a good idea of what can be NFS-mounted and what had better be local. Trying to squeeze a few KB from the disks, he set it up to NFS-mount "/sbin". That's /sbin under SunOS 4.x ...

Needless to say, neither he nor his colleagues had root access after these incidents.
--------------------------------------------------------------------------
Newsgroups: comp.sys.sun.admin
From: kevin@csbvax.varian.com (Kevin Myers)
Subject: Re: Most boneheaded thing done on my system

I have been guilty of most of these (kill 1, reboot wrong machine, etc). I have one more to add, from about 5 years ago. Scenario: VAX 11/780 w/many ra81 disks and a VERY busy machine. I was adding a new ra81 to a vax 11/750 and read that you pull the plastic cap off the drive select light and break off tabs to set the drive select ID. I wanted to see what they looked like so I pulled one off the 11/780 which brought the drive down and crashed the 11/780.
Watch out for seemingly harmless plastic caps.
--------------------------------------------------------------------------
From: morrow@cns.ucalgary.ca (Bill Morrow)
Newsgroups: comp.sys.sun.admin
Subject: Re: Most boneheaded thing done on my system

My first week here, I ran strip(1) on all the lib directories to save some disk space.
--------------------------------------------------------------------------
From: olsenc@stein2.u.washington.edu (Clint Olsen)
Newsgroups: comp.sys.sun.admin
Subject: Re: Most boneheaded thing done on my system

I was cleaning out a matlab directory one time and there's an etc directory, but rather than say rm -r ./etc, I said:

    rm -r /etc

Ouch!

Another time I was running a script which takes a user's mail and moves it to their home directory. When the script ran, it moved /usr/mail/username to ~username. One time, the script happened to have a null user, and I managed to move /usr/mail somewhere. It was a little fun finding it :)
--------------------------------------------------------------------------
Newsgroups: comp.sys.sun.admin
From: network@research.canon.oz.au (Andrew Raphael)
Subject: Re: Most boneheaded thing done on my system

I did a good bone-headed thing last night to CISRA's SPARCserver 670MP. I was installing 2 cabinets full of disks to be attached to 4 Fast SCSI/Buffered Ethernet Sbus boards, so lots of new disks were going to appear in /etc/fstab and /etc/exports. I control important files with RCS, so I checked them out with co -l, edited them, and checked them back in with ci -r3 to mark a major system change.

Kangaroos in the top paddock or what? I'd left off the -u flag to ci, & I didn't check them back out with co, so when I rebooted the system, there was no /etc/fstab. fsck had kittens, / was mounted read-only, no /usr so most commands are gone, etc. I'm glad it was the middle of the night.

The fix wasn't too bad.
After mucking around trying to mount /usr & /usr/local where the RCS programs live, I took the easy way out and booted off the CD-ROM. Mount /a, /a/usr, /a/usr/local, check out /a/etc/fstab, and re-boot. [ I haven't told anyone at work this story, so keep it to yourself, OK? :-)] -------------------------------------------------------------------------- From: reedlb@hoser.dnet.dupont.com (Boyd Reed) Newsgroups: comp.sys.sun.admin Subject: Re: Most boneheaded thing done on my system Working on one of our test systems (an RS/6000) trying to clean up the disks to get it ready for some more testing, I found an empty directory /mnt and deleted it. /mnt is where the RS/6000 mounts root. Next reboot.....nothin'! It took a little bit of time to figure out what was going on. -------------------------------------------------------------------------- From: jpd@ucs.usl.edu (Dugal James P.) Newsgroups: comp.sys.sun.admin Subject: Re: Most boneheaded thing... We add kernel patches in such a manner that they are listable by running the RCS program, ident. One day I wanted to see what patches I had applied, and typed "indent /vmunix". Sure enough, after the kernel was indented, I discovered my error. Fortunately, indent had created vmunix.BAK... -------------------------------------------------------------------------- From: bolton@rx.xerox.com (Andy Bolton) Newsgroups: comp.sys.sun.admin Subject: Re: Reboot wrong machine (was Re: Most bonehea A sysadm sitting not 10 feet from me remotely rebooted the wrong machine, as he couldn't remember the name of it. So he tried another.... And another.... Needless to say the users weren't too happy. -------------------------------------------------------------------------- From: rtilson@sundry.Corp.Sun.COM (Rick Tilson) Newsgroups: comp.sys.sun.admin Subject: Re: Most boneheaded thing done on my syste Way back when I started working here at Sun, I became the proud keeper of a Sun3/50 running SunOS 3.0. 
Of course I only had a 41Mb hard drive and in my zeal to make filespace I removed [ from /usr/bin (what's this thing doing here?). It's amazing how much test is needed to run /etc/rc.*. Then there was the time I didn't check my disk partitioning well enough and one day found to my surprise that fsck failed when checking my home filesystem with BAD MAGIC NUMBER. Seems sd0b extended just slightly into sd0h. -------------------------------------------------------------------------- From: simpsong@sun1.bham.ac.uk (Glyn Simpson) Newsgroups: comp.sys.sun.admin Subject: Re: Most boneheaded thing done on my system I was copying a partition to a spare disk one time and wrote a little program to do it for me. The backup disk was mounted (something like /root1) but had some stuff in it, so part of the program cleared this out first. $DESTDIR=/root1 rm -rf $DESTDIR/* Unfortunately I typed DESTDIR wrongly, so it was a null variable. The worst thing was that I decided to log out after I was to run the program, so scheduled the job to go through 'at'. I couldn't interrupt it as by the time I worked out what was happening, the kill program had been deleted. A quick run down to the machine (which took about 30 seconds) and the machine was off. Too late really - I had to rebuild the system. The worst thing was that we had backups of the machine, but I had forgotten to include one partition on the tape. This partition had all my backup and administration programs. I had spent weeks on them, and had to retype them all in again! -------------------------------------------------------------------------- From: bip@sekas.de (Paul Bininda) Newsgroups: comp.sys.sun.admin Subject: Re: Most boneheaded thing done on my system Once upon a time we didn't have enough disk space left on our HP. I therefore wanted to remove the old oracle stuff we still had lying around. I logged in as user oracle on the system console in order to snoop around a little (maybe there are some important files there?). 
Then I did "su" and started SAM (the system administration shell). And told it to remove the user oracle. This started a "find" command which removed all files on the disk that were owned by user oracle. What I only realised after logging out and not being able to log in again was: at that time /dev/console was owned by user oracle! I logged in from somewhere else and (instead of just creating /dev/console again) rebooted the machine. At least that's what I wanted to do. No console, no boot! -------------------------------------------------------------------------- From: wargopl@sun.soe.clarkson.edu (Peter L. Wargo) Newsgroups: comp.sys.sun.admin Subject: Re: Most boneheaded thing done on my system One recipe I'll always remember: Take one valuable TAR archive on 1/4" tape, one bleary-eyed system admin, and a screaming deadline. Forget to write-protect the tape, place it in the drive, and mistakenly type: tar cvf /dev/rst0 (I meant to type: tar tvf ...) The scream was reported to be heard all the way across NY state... -------------------------------------------------------------------------- From: dtb@squonk.Eng.Sun.COM (David Berry) Newsgroups: comp.sys.sun.admin Subject: Re: Most boneheaded thing done on my system My two worst things, On 4.x editing /etc/fstab so that the / partition could not be mounted. Most recently on 5.x moving /usr/lib/ld.so and /usr/lib/ld.so.1 to a different name. Even mv on 5.x is dynamically linked, so there is very little you can do to get things working again. A statically linked copy of mv from a 4.x machine just dumped core. In both cases I booted a mini-root, mounted the offending disk and repaired the damage. -------------------------------------------------------------------------- Newsgroups: comp.sys.sun.admin From: laszlo@eclipse.cs.colorado.edu (Laszlo Nemeth) Subject: Re: Most boneheaded thing done on my system one of my favorites is: towards the end of a major re-arrangement. 
you know the kind where 3/260 cpu's swapped out for 4/260 cpu's, disks and controllers swapped around, and extra ethernet cards installed........front cases left open with power supplies out so we could re-jumper back planes.....we get 2 servers (out of 5) up and all sysadmins login......to check sendmail (actually we just wanted email after 2 days without it)......the server we thought was most stable crashes HARD.....so we all go running up to the machine room to see what we had screwed up and find that a student sys-admin had shorted the power supply with the machine room keys (they live on a VERY large key ring so that they don't go home).....she still hasn't lived it down... -------------------------------------------------------------------------- From: pejn@wampyr.cc.uow.edu.au (Paul Nulsen) Newsgroups: comp.sys.sun.admin Subject: Re: Most boneheaded thing done on my system One of our users with extensive PC experience had a tendency to vent his frustrations on the keyboard. Not long ago his fortran program seized up for some reason, so he tried to wake it up by punching numerous random keys. Unfortunately, his random sequence included L1-A (what are the odds of that?). Being a PC user, the offer of a reboot was too good to refuse - and, of course, he knew that would unstick his program. This was on a machine acting as our file server. Very soon (about +30 seconds) after this I received a deputation from other users requesting that I try to prevent him from randomly rebooting the computer. It took a little while to work out what had happened, but he has given up randomly thrashing the keyboard now. -------------------------------------------------------------------------- From: markb@advtech.slc.paramax.com (Mark Baranowski) Newsgroups: comp.sys.sun.admin Subject: Re: Most boneheaded thing done on my system Consider the case when you are using the same tape over and over to backup your working directory each night before quitting. 
On night 1 you type "tar cf /dev/rst0". On night 2 after making extensive changes you mount the same tape and accidentally type "tar xf /dev/rst0". When you come to work on day 3 you wonder why all the files you worked on so hard on day 2 suddenly reverted back to the copies you had on day 1. -------------------------------------------------------------------------- From: wrd@oasys.dt.navy.mil (Wm. Race Dowling) Newsgroups: comp.sys.sun.admin Subject: Most boneheaded things in one day I was installing nroff to a network of 30 nodes using rdist. Here was the Distfile (from memory), which I had used before: HOSTS = ( a b c ... ) FILES = ( nroff ) ${FILES} -> ${HOSTS} install /bin ; About 15 seconds after running rdist I get a call from a user, who cannot login - error (you probably guessed): /bin/login: not found. Cold sweat time. Sun had replaced the directory /bin with a link to /usr/bin. Instead of following the link, the rdist program had overwritten the link with a perfectly good copy of nroff that was now named /bin! The prospect of being up all night having to restore 30 nodes from backup did not exactly make my day. In the middle of the second restore I realized that I could just boot single user and relink /bin. With the help of two friendly power users (who were more than happy to accept root privilege) we were done by 7:30. I took them both to dinner. I drove back later and changed all the root passwords. Rdist bomb, 2 unnecessary restores, root access to two hackers. I guess that's three boneheaded things in one 4-hour period. What's the record? -------------------------------------------------------------------------- From: cmk@hardrock. (chris kushnir) Newsgroups: comp.sys.sun.admin Subject: Re: Most bonehead thing done on ny sys The first was when i was using OW's filemanager as root. 
I wanted to look in /dev so i double clicked on the file icon but instead the file was selected for drag and drop and when i released the mouse button the /dev/ directory went *pooof*. Not much worked after that. I had to steal a friend's external harddisk w/ its OS and specify it on the boot line. Then came mounting and fixing. I have been very leery of using the OW filemgr as root since then. The second thing was a couple of weeks ago. The other sys-admin (the same guy whose hard disk i borrowed) tried to install some software on my machine but it failed part way through. He didn't clean up the partially installed dir structure. I did rm -r * one directory too high. After a minute i got worried and did a ps only to see rm where file was NOT one of the ones i wanted to erase. Quick kill, damage done. -------------------------------------------------------------------------- From: dhollist@iphase.com (David Hollister) Newsgroups: comp.sys.sun.admin Subject: Re: Most boneheaded thing done on my system Well, at my previous job, we had to congratulate our realtime closed loop developer when, after adding a new user to his machine, he did a chown -R kessinge . which would have been fine, except that he was in the /usr directory when he did it. Oh, what a wonderful experience. Needless to say, we reloaded the operating system after that as it had become quite unwilling to function properly. UNIX, gotta have it. -------------------------------------------------------------------------- From: andyj@phoebe.loni.ucla.edu Newsgroups: comp.sys.sun.admin Subject: Most boneheaded thing done on my system This was a real nightmare.... I made a new shared library of libc.so.1.8, got everything I needed right, tested it with LD_LIBRARY_PATH set to the new file, ran ldconfig and was just about to leave it when I decided to make a floppy backup. This I did, and as a habit, I always check the tar file after it's done to make sure the archive is ok. 
So there I am with a floppy with my new libc.so.1.8.1, I'm root, and still in /usr/lib. I type tar xvf. The new library is deleted and the console freezes. All telnets die off, everything grinds to a halt. ldconfig is set to the new file, so the whole thing is history. Booting dies as soon as it tries to use libc.so, so I came very close to going the route of having to rebuild the system from mini-root on up. The system would not reboot, but if I ^C enough times, I can halt the boot procedure in the process of disk checking. I can get a single user shell, but every command you could imagine gives a bus error without that libc.so. I try something simple like mounting disks. No dice. The file system is mounted readonly (that's the point in the boot procedure I was able to break out) I run to another machine with my freshly made archive, tar x it to a _safe_ location, and try mounting that machine from the dead one, again no luck, I can't mount anything as mtab is on a read only partition. Never thought I'd ever ever use the -n option, but I am truly thankful for it now. I am able to mount -n the other machine's disk. I can't even do ls, but sh will let me do export LD_LIBRARY_PATH;LD_LIBRARY_PATH=/remote/pathtolib (Of course I first cd'ed to the directory and set LD_LIBRARY_PATH=. , that didn't work for long). From there I was able to rebuild the damage, and replace the libc.so. From then on no backups were checked immediately, and only then in a safe place.. -Andy Jacobson -------------------------------------------------------------------------- From: pete@sst.icl.co.uk (Pete Bevin) Newsgroups: rec.humor.funny Subject: Why doesn't my password work? Our support department had a phone call recently from a user who had received a new keyboard for his workstation, and found he couldn't log in. So we got him to go in as root and change his password, and that fixed the problem. That afternoon, the user phoned back with the same problem. 
While we were changing his password again, he added: ``By the way, my secretary came along this afternoon and noticed the C and V keys were the wrong way around, so she switched them around. Does that make a difference?'' No, I'm not telling you who supplied the workstation. -------------------------------------------------------------------------- From: don@zl2tnm.gen.nz (Don Stokes) Newsgroups: alt.folklore.computers Subject: Re: YKYBHTLW abr8030@TAMUTS.TAMU.EDU (Adam Boyd Roach) writes: > Yeah, I've done this myself before. Felt a bit sheepish, since it was > at a lab. Our computer at home has an unusally large gap between its > drives, and on many occasions, if I don't look at what I'm doing, > a 5.25 will end up inside. I've had to take the case off several times > for that particular error. I saw the aftermath of where an experienced computerist did that with a 3.5" floppy, in such a way that the disk went into the top of the drive and tore the drive's upper head off. I couldn't duplicate the action required to do the damage though.... My wiring tech came back to me the other day reporting that a user had succeeded in getting a 9pin D-shell monitor plug into the socket _and_ _screwed_in_ upside down! I'd have said that it wasn't possible if I hadn't personally discovered a DB25 serial plug firmly inserted upside down into its socket on a printer on an earlier occasion. It seems that anything's possible if the user doesn't know otherwise. Oh, if only I had a dollar for every time someone succeeded in putting together something that "can only go together one way" the *other* way. (In the DB9 case above, I think I made the mistake of saying precisely that to the user in question... 
8-( ) -- Don Stokes, CSC Network Manager, Victoria University of Wellington, New Zealand Ph+64 4 495-5052 Fax+64 4 471-5386 Work:don@vuw.ac.nz Home:don@zl2tnm.gen.nz -------------------------------------------------------------------------- From: klaus@diku.dk (Klaus Ole Kristiansen) Newsgroups: alt.folklore.computers Subject: Re: YKYBHTLW andy@madhouse.demon.co.uk (Andrew Bray) writes: >In article <25ed4s$nhv@panix.com> wookie@panix.com (Steve Houle) writes: >>Yeah, nothing is impossible to those that don't know anything about it. >>I've had secretaries fold 5.25 inch disks to get them into 3.5 inch disk >>drives. Or cut them up to fit better. You usually get a hint of disaster >>when they ask if they should cut the part off that is hanging out the >>front of the slot. >This reminds me of two stories I heard about "secretaries" and floppy >disks. Perhaps someone could comment if they have any basis in truth. >1) The secretary who diligently backed up here floppies - using the > photocopier. Last time this came up here, someone who had worked on a support hot line told that he had once told someone to send a copy of the instalation floppy, and received a photocopy. This was enough, the problem was that the customer had got a wrong version of the product. Klaus O K -------------------------------------------------------------------------- From: mike@maths.tcd.ie (Mike Rogers) Newsgroups: alt.folklore.computers Subject: Re: Incompetant user stories... swkgohcp@leonis.nus.sg (The Blue Beetle) writes: > Can you beat that?? Well, I for one am glad that 3.5" floppies arrived, 'cause I gave a class once in DOS for Med students with 5.25", and one woman *punched* her disk to make it fit in her folder binder rings. No, really! 
-- Mike Rogers, #3, 44 Westland##EveryoneHasTheRightToFreedomOfOpinionAndExpressio Row, Dublin 2, Ireland FNORD##nThisRightIncludesFreedomToHoldOpinionsWithoutInt ##############################erferenceAndToSeekReceiveAndImpartInformationAndI deasThroughAnyMediaAndRegardlessOfFrontiers..#10 UN Declaration of Human Rights -------------------------------------------------------------------------- From: cm620@cleveland.Freenet.Edu (Karl W. Reinsch) Newsgroups: alt.folklore.computers Subject: Stupid User Tricks I work in the computer hardware repairs on my college campus. I was given a work order for an IBM compatible in our library that had a "broken" 5 1/4 floppy drive. I arrived to find that the handle to close the drive had been broken off and discs could not be inserted into the drive. I took the computer back to the shop. Upon removing the cover, I could see something shiny and silver through the center of the floppy drive. Someone had stuck a CD-ROM into it. They stuck it in far enough that the little brace for holding floppys flat had come all the way down and locked the cd in the drive. They had also tried to close the drive door so much that the drive grips had carved circles in the cd. Another favorite of mine is a secretary on campus. She was having trouble connecting with her pc to the Vax. A worker called her and asked "What software are you running?" Her reply, "We don't run any software, we only run hardware." -karl. -------------------------------------------------------------------------- From: stremler@sdsu.edu (Stewart Stremler) Newsgroups: alt.folklore.computers Subject: Re: silly user tricks (was: Re: YKYBHTLW) [RING] Me: I.D.S. Computer Room. How may I help you? Them: Um, yes. My computer is hung? Me: And you are? Them: Oh, I'm Sally, at 28th street med center. Me: Um, lessee... [go the the console and check...] ...No, the computer is fine and running perfectly. Them: No, I tell you it is crashed. No matter what we type, nothing happens. 
I tried rebooting, but nothing still happens. Me: Rebooting? [Glance at the Prime 9955 mod II] How, exactly, did you do that? Them: Why, cntrl-alt-del, of course. *EVERYONE* knows that! Me: Oh, Really. Please, don't do that. You have a terminal, not a PC. You will have to turn the terminal off and then back on now. Them: Now? Me: Yes. Them: Ummmmm.... [click...click] ....Okay. Me: Can you see anything? [Console> STAT US ] Hit return a couple of times. Them: No. Nope, nothing. Me: [LO -96] Now? Them: Oh! It says..."Logout at 21:00...." Me: Okay. Now log back in again. And please refrain from trying to reboot your terminal. It won't work, and will just lock it up. Them: *I* know how to reboot a computer. I took a class at JC! I *know* these things! [CLICK] The thing is...the terminals they used don't have an ALT key. I wonder what she was using..... Stu ---------------------------------------------------------------------------- "OK, so, ten out of ten for style, but minus | Stewart Stremler several million for good thinking, right?" | stremler@ucssun1.sdsu.edu --Zaphod Beeblebrox | FidoNet: 1:202/1111 ---------------------------------------------------------------------------- -------------------------------------------------------------------------- From: edward.rice@his.com (Edward Rice) Newsgroups: alt.folklore.computers Subject: Mysterious computer hardwrae fixes. A friend of mine at Honeywell, the company's senior software fixer-upper for their large-scale 6000 systems, got a trouble call from the Army War College. He got home a week or so later. Furious and amused. The problem appeared to be software failing under heavy load, but in fact turned out to be a sporadically failing piece of hardware whose effects were nullified under most conditions by the condition of a particular register. Unless the top three bits of the register were at exactly the same time that the circuit blipped, nothing would happen. 
All conditions met, the system crashed hard, and never the same way twice. -------------------------------------------------------------------------- From: andyr@wizzy.com (Andy Rabagliati) Newsgroups: alt.folklore.computers Subject: Re: Mysterious computer hardware fixes. I remember moving an M700 computer (Ferranti military box, 19" rack) across the room (1220 Lab, BAC Bristol), and it failing to power up. Symptom :- blown fuse. Time to debug. Hmmm - put in another one ... fuse blows. Put in a bigger one - it blows too - small puff of smoke. Station five people round the box as spotters - put in big fuse again. Puff of smoke - somewhere at the bottom of the rack. Again .. Aha ! Near power connector. Someone had put the power receptacle too close to the steel wheel castor - which was resting against the live pin of the receptacle. Replace burned receptacle, turn wheel - back on track again. Cheers, Andy. -------------------------------------------------------------------------- Newsgroups: alt.folklore.computers From: wollman@trantor.emba.uvm.edu (Garrett Wollman) Subject: Re: Big short circuits? Date: Thu, 16 Sep 1993 20:23:23 GMT In article , Data Rentals and Sales wrote: Kevin, why can't you put your own name in? >[somebody else writes:] >> Any tales of big short circuits out there? I remember reading about one in Telecom Digest a while back, but I was unable to find it in the archives, so here it is as I remember it. The setting is a telephone central office, in the battery room. In this room, maintenance people walked along above the power system on a metal cat-walk. One day, this telco had a plumber in to fix something up near the ceiling of this room; he did this successfully, but on the way down he dropped his thick plumber's crescent wrench. Oops. The wrench fell on the 48-V battery bus, shorting it with an adjacent ground bus. BLAM! The wrench evaporated, and the arcing almost cut one of the busses in half. The phone system continued to operate. 
-GAWollman -- Garrett A. Wollman | Shashish is simple, it's discreet, it's brief. ... wollman@emba.uvm.edu | Shashish is the bonding of hearts in spite of distance. uvm-gen!wollman | It is a bond more powerful than absence. We like people UVM disagrees. | who like Shashish. - Claude McKenzie + Florent Vollant -------------------------------------------------------------------------- Newsgroups: alt.folklore.computers From: msawyer@soest.hawaii.edu (Michael Sawyer) Subject: Re: Big short circuits? Date: Fri, 17 Sep 1993 03:04:05 GMT DAVE G. (an13353@anon.penet.fi) wrote: : : Any tales of big short circuits out there? No, I don't mean some wimpy Well, I have a few (not really computer related, but so few posted so far are). This one I saw... I work in the oceanography dept at UH, and a few years back, we were headed out to sea for some on-site research. We were on a 230' ship with a fairly massive power circuit. One of the electric winches used for lowering instruments into the water had been painted between two of our trips to cut down on the rust, and reassembled just before the ship was supposed to leave. However, sure enough, when it was back on deck, the motor didn't work properly. This is one of those big motors which has both a main power breaker and "start" and "stop" buttons to control the actuators which actually power the motor. We would push "start," the motor would start up, and as soon as the starting condensers disengaged, the motor shut off. Of course, first we checked that the hydraulics weren't loading the motor, etc, which they weren't, and made other similar tests. Oddly enough, even after the motor shut off, you had to push "stop" to clear some actuators before pushing "start" again to get it running. Well, they got ready to take one of the access panels off, which was in a very hard to get to spot, and something in my mind started to say "something's not right here." 
As I thought that, a nice little fireball appeared from the access panel they were removing followed by the engine room alarm siren from below decks. They had not "stopped" the motor and at least one of the actuators was still sending voltage out to the motor, and (VERY luckily; this was either a 480V or 1KV circuit, I don't remember) the access panel hit the pin, shorting it to ground instead of someone's finger doing the same job. The engine room alarm; for some reason instead of blowing the breaker on the ship, we managed to trip the circuit breaker on shore, cutting power to the whole ship. On a more computer related topic, when I was an undergrad in Physics, I used to work in the Physics demonstration room. We had one of those nice rear-projection TV's for use in the classes showing videos. One of the instructors was showing, as the main part of a "physics for poets" class of a few hundred people the video of the Tacoma Narrows Bridge collapse with a computer controlling the videodisk player, allowing him to switch between the real bridge, computer models, real models, and a number of other things to really explain what was happening. The computer and disk were on the lectern, and the TV a few feet over plugged into a different outlet. A few minutes after the start of class, he comes back in asking for me to come into the room for some reason. The video cable between the TV and computer system had caught fire (but only burned briefly) and flames had shot out of a number of places on various pieces of equipment. Somehow we had managed to go 5 or 6 years in this building without noticing that one of the plugs was wired with Hot and Neutral reversed. He did however, say that this was a good way to wake up a sleepy class. 
:> -- Michael Sawyer - My opinions are mine, not necessarily UH's, NSF's, or NASA's University of Hawaii Physical Oceanography/Satellite Remote Sensing RIPEM public key available, MD5OfPublicKey: C53C8744A87664168D135C0763DCCC1D -------------------------------------------------------------------------- Newsgroups: alt.folklore.computers From: drs@netcom.com (Data Rentals and Sales) Subject: Re: Big short circuits? Date: Thu, 16 Sep 1993 18:53:25 GMT > Any tales of big short circuits out there? A company I worked for built large power managers/supplies. These handle 440V 1/2MW. The cabinets are 3' wide, half as deep, and 8' tall. Three of these were on board an aircraft carrier. What we were told (when the melted heaps were shipped back), was that a sailor had been ordered to clean the power supplies. He carefully turned off the power to all three of them, opened up the water-tight front doors, and hosed them down with sea-water. He then carefully closed the doors again, and powered them all up. (Wish I'd been there). -- #include _ Kevin D Quitt 96.37% of all statistics are made up -------------------------------------------------------------------------- Newsgroups: alt.folklore.computers From: dnichols@d-and-d.com (DoN. Nichols) Subject: Re: Big short circuits? Date: Fri, 17 Sep 1993 00:01:39 GMT In article <021302Z16091993@anon.penet.fi> an13353@anon.penet.fi writes: > > Any tales of big short circuits out there? No, I don't mean some wimpy > sparks while you were fixing your car stereo, I mean A BIG FUCKING > CONFLAGRATION that scared you SHITLESS or you would have been if you > had been closer... explosions, meltdowns, general cataclysmic type > stuff. Well, a friend of mine once worked for AT&T Long Lines. Downstairs from his area was the room with the big racks of equipment for the normal telephone exchange. As you may know, they float the system off some rather hefty batteries. Each cell is too big for anybody that *I* know to lift. 
They just hook enough of each in series to get the voltage they need. One of the voltages that they needed was 150VDC. The system always has two sets of batteries for each voltage needed. One is being charged, and the other is being used at any given time. These are switched by *massive* knife switches. The fellow (not my friend) who was given the job of maintaining this stuff was supposed to switch over to the alternate bank, and grease the mating surfaces of the switches to prevent oxidation. Well, one of the switches was a double-pole switch (which meant a crossbar of insulating material and a central handle). He couldn't pull hard enough to open the switch, so he went and got a steel pry bar. He placed this under the insulating crossbar, and levered. Of course, the switch opened suddenly. When it did, he lost control of the prybar. One end moved into contact with the 150VDC bus. (These things are *trickle charged* at several hundred amps.) The other end hit the metal door of the relay rack behind him, and simply sliced through it down to the floor - throwing off a few sparks in the process. Also molten metal droplets. He spent some time in the hospital before having to come back to work to face the question of "was he in the right line of work?" ******************** Second story was told to me a lot more distant in time from the event. A co-worker of mine was in the U.S. Navy, and the ship that he was on (fitted with a rather major machine shop, from what I have been told) was connected to other ships at dock. It seems that multiple ships were parked parallel to each other, and lashed down so they wouldn't drift too far and bash each other. There were walkways bridging the gaps between the ships. There were also high-current power distribution leads, three phase, some unspecified voltage, and extremely thick cables. (He was not an electronics tech. 
:-) In the process of removing one ship from such a link-up, one of the wires came loose and was swinging just above the walkway between the ships. At the bottom of its swing, it would brush against the walkway, and shoot major (and noisy) showers of sparks. The current surge would generate enough of a magnetic field so the wire was given a kick. The result was a pyrotechnic electric powered pendulum. He just watched from what he hoped was a safe distance. Eventually, someone on shore cut the power and they secured the runaway lead, but nobody wanted to use that walkway after that. (Something about the thickness having been reduced by quite a bit in some areas.) Sorry that neither of these were computer-related, (unless you think of the phone system as a giant distributed computer. :-) Enjoy DoN. -- Email: | ...!uunet!ceilidh!dnichols Donald Nichols (DoN.) | Voice (Days): (703) 704-2280 (Eves): (703) 938-4564 --- Black Holes are where God is dividing by zero --- -------------------------------------------------------------------------- From: jrh0017@tamsun.tamu.edu (Joel Ray Holveck) Newsgroups: alt.folklore.computers Subject: Re: Big short circuits? Date: 16 Sep 1993 17:29:56 -0500 In article <930915.081044.0w8.rusnews.w165w@mulvey.com>, Rich Mulvey wrote: >> Any tales of big short circuits out there? > Hmmm.... well, I went to the Antique Wireless Museum in Bloomfield, >NY two weekends ago, and saw a demonstration of a 7000W spark-gap >transmitter. From about 3 feet away. ( I hope the antenna wasn't >hooked up. ;-) If it were, that would explain a lot. >:-) Ob: Then there is the old story that most readers here should know, but the newbies need to. At MIT in the Golden Age of Hacking, many repairs to the computer were performed by the Midnight Computer Wiring Society, a bunch of hackers who felt like adding a new command or some such to the PDP-10. (Does anybody have an opcode list for the ol' 10? Or ANYTHING on the 10? 
(Oh, dear, I think a new thread's about to form...)) One particular toolsmith's toolbox was often raided for tools to perform this task. The toolsmith in question (call him J. Random Toolsmith (JRT (no pun intended)) for brevity) was particularly upset about this; I mean, these were HIS tools, after all! At one point, he even drew a line on the floor (yes, with paint or tape or something along these lines), and told the hackers not to pass this line. No matter, they kept at it. Well, one particular hacker (J. Random Hacker) had borrowed a screwdriver, and the tip became somewhat marred in the process of using it. When JRT found the screwdriver, he knew who to come to. Now, JRH was normally this little mousey guy, but when backed into a corner, positively exploded. In the conversation (pron: /AR-gyu-mint/) which followed, JRH declared, "Well, it was just about used up anyway!" This blew JRT's mind. A tool being USED UP? Tools were things meant to LAST, not just used until neglect damaged them and then to be tossed out. A few weeks later, JRH was performing some surgery on a power supply. He had shut off the switch, but had neglected to short the (sizable) capacitors. While working on the supply (with, of course, JRT's screwdriver), he accidentally bridged the capacitors. BaHoom! JRH was thrown across the room, but otherwise was (relatively) unharmed. However, the end of the screwdriver was melted off. He left it on JRT's toolbox with a sign saying, "USED UP". -- ------------------------------------------------ -=<[Joel Ray "The Brane" Holveck]>=- --- my *real* sigblock is under construction --- ------------------------------------------------ -------------------------------------------------------------------------- From: edward.rice@his.com (Edward Rice) Newsgroups: alt.folklore.computers Subject: Big short circuits? Date: Fri, 17 Sep 1993 20:37:35 Something like a dozen years ago, this area in Northern Virginia was in the middle of a major building boom.
Outages due to cut cables were not unusual, and the system was both stressed and expanding rapidly. In Reston, ALL cables are underground (no overhead wiring at all), but Miss Utility is pretty compliant and has been around for quite a while. One weekday morning, someone with a backhoe cut a major power cable. It was probably a nasty experience for the cable, too -- it may have taken several swipes with the blade to accomplish the cut. From the results, VEPCO's system's reaction was inappropriate, probably due to the local grid not being in a well-formed state that month. The supply went way, way down, then (as circuits flipped to correct the outage) back up to a fairly severe spike, then down again, then back up to a really significant spike, and then off completely -- in well under a second. My guess is, the consecutive drops spoofed most of the power protection circuitry into dropping out of action, and the final spike arrived (everywhere in the area) unimpeded. We had Lexitron WP devices, Omron and other glass TTY scopes, and DECwriters. The DECwriters survived. Omron and Lexitron ended up shipping in spares from other parts of the country, because ALL units in about a ten mile square (ca. 170 square kilometers) of high-tech R&D activity got zapped. The final spike, or maybe the one just before that, took down a multi-kilowatt motor generator that was driving a large mainframe up at USGS -- even though the motor generator set was explicitly designed to deal with the worst possible incoming power problems. (It did, however, protect the mainframe from costly damage -- but the mainframe remained down for a day or two while they got parts flown in for the MG set.) -------------------------------------------------------------------------- From: frisbie@flying-disk.com Newsgroups: alt.folklore.computers Subject: Re: Big short circuits? Date: 17 Sep 93 08:36:42 PDT > : Any tales of big short circuits out there?
No, I don't mean some wimpy In 1965, I had a summer job at a Beckman Instruments manufacturing plant in Hemet, California. One of the large machines I was responsible for maintaining had been mis-manufactured with *both* the ground (green) and neutral (white) power leads connected to the frame. In normal use this presented no problem, as both are at the same potential. One day I had to make some adjustments, requiring that the machine be moved from its normal position. The power cord would no longer reach and would have stretched across an aisle if it had, so I fetched a 50-foot long heavy-duty extension cord. This cord had been fabricated in-house using 10-gauge rubber-covered cable. It was a beast. Unfortunately, the person who fabricated it had switched the neutral (white) and hot (black) leads at one end of the cable. Normally, while a violation of the NEC, this would cause no problems with most equipment. However, the combination of a mis-wired extension cord with mis-wired equipment created a long, but effective, short circuit. The result was, shall we say, spectacular. When I jammed the plug of the extension cord into the wall outlet, it did not immediately trip the breaker. No, there was enough resistance in the 100 feet (round trip) of wire (plus the machine's cord) to allow the entire length time to heat bright red before melting. Of course this was accompanied by lots of smoke from the burning *rubber* insulation. I was not a popular person in the factory that afternoon. Not only did the place stink to high heaven, but the machine I had set out to adjust had to have all its power wiring replaced (by me), putting it out of production. The next day I ordered an outlet tester and made sure *every* outlet and extension cord was correctly wired. Alan "Flash" Frisbie -- Alan E. Frisbie Frisbie@Flying-Disk.Com -- Flying Disk Systems, Inc. 
-- 4759 Round Top Drive (213) 256-2575 (voice) -- Los Angeles, CA 90065 (213) 258-3585 (FAX) -------------------------------------------------------------------------- From: nick@cs.unc.edu (Nick England) Newsgroups: alt.folklore.computers Subject: Re: Big short circuits? Date: 17 Sep 1993 10:59:54 -0400 In the late 1960's, beer cans had pull tabs. Students drank beer and collected the tabs as proof of something or other. These tabs were usually linked together into a chain and draped around dorm rooms, growing longer as the school year progressed. Older readers, forgive me - life just hasn't been the same since they started using those lift-and-bend tabs for beer cans - heck you can't even solder the damn aluminum things together to make antennas anymore - but I digress. One night a couple of guys (call them experimenters A and B) on the 12th floor of our dorm decided to find out just how long their chain really was, so they tossed one end out the window. But the chain was longer than the 12 story building was tall, and the end of the chain piled up on the ground. Experimenter B was dispatched downstairs to grab the end and walk away from the building - this being an engineering school (NC State Univ) and all they figured they'd be able to compute the length of the hypotenuse pretty easily. Unfortunately, this experiment failed to take into account the high voltage feeder line for the entire campus (110kv maybe) which ran nearby. At some point, said chain (grounded to the building and still being dragged along the ground by experimenter B) and said high voltage line were close enough together that an impressive but short-lived circuit was achieved. Both experimenters survived (with some burns), a spray of aluminum was deposited on the dorm and ground, and my friend who had been leaning out of our 10th floor window trying to snag the chain with his umbrella was very glad he hadn't succeeded.
-------------------------------------------------------------------------- From: wb8foz@mthvax.cs.miami.edu (David Lesher) Newsgroups: alt.folklore.computers Subject: Re: Big short circuits? Date: 19 Sep 1993 11:32:21 -0400 A certain classical music station had its studios & transmitter in the Terminal Tower in Cleveland. As you might guess, the TT is a 52+ story building, that up until recently had a train station beneath it. For many years, it was the tallest building between NYC & Chicago. The studios were on the ?15th? floor, and the transmitter on the 42nd. The antenna itself was not on the building but rather bolted to the side of the massive flagpole on the peak. (This to avoid Motorboatarola's "exclusive" on all antennas on the building.) The station had a "no-cut" power clause in its lease. But the Terminal management needed to do some work on the distrib. panel at trackside. So one weekend night, they ran three fat temporary cables up the stairs from the 2nd floor sub to the 10th, where one big breaker fed both the studio, and the transmitter (via a dedicated line up the access shaft.) This with the help of Muny Light, the infamous local utility. So at the given time, all forces acting in concert: yanked/locked the usual trackside breaker feeding the trackside vault, killing everything on the 10th, and all the rest of the building. When dead, started cleaning busbars, etc. confirmed the 10th floor box was dead. Open breaker there & lock off. wired the temp cables up to the breaker terminals on 10 then closed the temp. breaker on 2. In the words of a friend, you've never seen such an impressive display of right-hand rule. The lights on 2 went DIM, a massive HUMMM filled the air, and all up and down the stairwell, the cables LEAPED off the floor and thrashed around in some weird dance. Then the breaker on 2 opened, and all was quiet, midst the dark night. Hmm, maybe the guy on 10 did not open breaker. Nope, breaker is OPEN. 
So of course, they tried it again. Dim, hum, thrash, WACK, silence. Hmm, we have a problem here. Station's off the air. Money is being pissed away. (In those days, many people still listened to Wxxx, a situation that has since changed.) Minor panic. Of course, in a building that is dark, the elevators do not work. This makes it harder to get up to 10. Big Brass accumulates. Somebody calls the local commercial utility, that while competitors with Muny, often bails them out. They confer. Dim, hum, thrash, WACK, silence. Finally somebody questions the 10th floor electrician carefully. Yep, he'd opened the breaker, locked it off, and wired the cables in to feed power up to the 15th & 42nd. The operative word is up. He was thinking 'up' and had connected to the top of the breaker, the LINE side. The contractors at trackside, being no dummies, had chained all the busbars together, and bolted that to the train tracks. Their asses were in the cabinets, and they were not going to be surprised by dead circuits being live again. So the 2nd floor breaker fed the cables up to 10, where it went back down to trackside, thru the chains & jumpers, and back again. And it's interesting. I'm no electrician, but I've seen this same mistake twice since then. It never happens if cable connects the breaker to the subpanel. But if busbars behind the breaker are there........ Let's be careful out there......
-- A host is a host from coast to coast.....wb8foz@mthvax.cs.miami.edu & no one will talk to a host that's close............[301] 56-LINUX Unless the host (that isn't close).........................pob 1433 is busy, hung or dead....................................20915-1433 -------------------------------------------------------------------------- Newsgroups: comp.programming,comp.lang.misc,alt.folklore.computers,rec.humor From: hugh@nezsdc.icl.co.nz (Hugh Grierson) Subject: Re: The Programmer's Handy Guide to the Languages (humor) Date: Tue, 21 Sep 93 04:41:46 GMT In article amiga@hosta.dircon.co.uk (Dino Dini) writes: >> $ rm -rf ~ > >Ouch... that's nasty. Maybe we could put together a compendium of dangerous >rm mis-types. One clever fellow decided to clean out core files each night, with the following command in root's crontab. They wondered why they couldn't log in the next morning... find / name core -exec rm {} \; -- Hugh Grierson Fujitsu New Zealand - Local Government Group hugh@icl.co.nz [if that fails, try H.M.Grierson.nez1201@oasis.icl.co.uk] Ski today - work tomorrow -------------------------------------------------------------------------- From: cfb@fc.hp.com (Charlie Brett) Newsgroups: alt.folklore.computers Subject: Cabling stories Date: Wed, 22 Dec 1993 16:36:03 GMT Organization: Hewlett-Packard Fort Collins Site There must be some good cabling stories (such as not bending the cable too tight). Here's one that I was involved with. Many years ago, when I was still repairing equipment, I was on a service call to repair a plotter. The symptom was that it would get confused at times. Since the plotter was connected to another vendor's computer, I took a semi-portable system with me to run diagnostics. Running the tests showed no problem. Now, being the seasoned engineer, I was well aware that there were some things the tests may not be checking. So, I had the customer run their application.
Sure enough, it would plot for a while, then start getting errors. Hmmmmm, well time to start swapping boards. First datacomm, then processor, control, drive, arghhh! O.K., I'll try the power supply. Still, not fixed. I look at the cable between the computer and the plotter. Nothing unusual, not handwired, well made, goes from the plotter back to the computer, or does it? Now the computer was on one side of the room, and the plotter on the other. Maybe 20 feet at the most. So, I start following the cable. Along the edge of the room, under the computer, and .... what's this? Apparently they had bought a 50 foot cable, and in order to keep it neat, spooled it up, ... around the transformer! Charlie Brett - Ft. Collins, CO -------------------------------------------------------------------------- From: dnk@gtech.com (David N. Kidd) Newsgroups: alt.folklore.computers Subject: Re: Cabling stories Date: Thu, 23 Dec 1993 18:02:38 GMT Worked on two problems at remote sites which were horrible to track. Eventually fixed, and you think SHEESH! How stupid can I get? The first: the DTE transmit pin on the business equipment was very slightly shorter than all others. It didn't reliably make contact in the modem. But it did make contact in our line monitor patch cables and cable tester. So the only time it didn't work was (sometimes) when you weren't looking. Grrr! The second: overnight reports didn't arrive. But the equipment worked fine all day. Always worked when we tested. Always worked if the customer stayed to watch. Always worked when we went there to test. Always failed when run unattended. Turned out the modem was plugged into an outlet controlled by the same switch that turned out the office lights. And the last one out always turned off the lights! ... not to mention the X.25 service that carefully stripped the checksum, corrupted (sometimes) one bit, rechecksummed and retransmitted ... -- David N.
Kidd, Software Quality Assurance GTECH Corporation, 55 Technology Way, West Greenwich RI 02817 Phone: (401) 392-7321 Fax: (401) 392-0476 Internet: dkidd@gtech.com -------------------------------------------------------------------------- From: glaz@illuminati.io.com (Yevgeny Glazamitsky) Newsgroups: alt.folklore.computers Subject: Re: Wanted: Best luser story Date: 6 Jan 1994 02:21:40 -0600 >mpcline@cats.ucsc.edu (Matthew Paul Cline) writes: > > To all you who work or have worked on text-support: whats the best >story you have about a luser that called in with a stupid problem? This one happened to a co-worker: The application we were running was not using terminal handler but was hard-wired to a Televideo 950 terminal. Televideo 950 is an old square box- shaped thing, similar in looks to old VT100. So we shipped TV950 to a client who received it and called with the problem: Client: I got the terminal and there is a problem with it. There is no screen. Us: Huh? Is the screen blank and not displaying any text? Client: No, there is no screen. Us: Is the screen broken? Like is there a hole where screen should be? Client: No. ... that went on for a while ... Us: Let's see. You took the terminal out of the box, plugged it into outlet, right? Client: Right. Us: Then you plugged the modem cable into the connector on the back of terminal? Client: On the back of terminal? No, the connector is on the top of terminal. Us: Turn the terminal over!!!! Client: Ah, there it is. -------------------------------------------------------------------------- From: davidh@harlequin.co.uk (David Hembrow) Newsgroups: alt.folklore.computers Subject: Re: Wanted: Best luser story Date: Thu, 6 Jan 1994 10:48:53 GMT In article <2gfov6$bes@darkstar.UCSC.EDU> mpcline@cats.ucsc.edu (Matthew Paul Cline) writes: > To all you who work or have worked on text-support: whats the best >story you have about a luser that called in with a stupid problem? 
I wasn't really supposed to be doing technical support, but I did take some calls a while back when all the support people were otherwise engaged. The most memorable was the secretary who had gone through the computer deleting any files which she didn't remember typing in. Stuff like autoexec.bat, cconfig.sys, ccpm.sys. "Some of them just contained garbage" she told me. A member of the real technical support staff had to go out to the site to get everything set up and working again - the next week she repeated the performance... -------------------------------------------------------------------------- Newsgroups: alt.folklore.computers From: douc1950@mach1.wlu.ca (Jason Doucette u) Subject: Re: Wanted: Best luser story Date: Thu, 6 Jan 1994 14:18:39 GMT Reminds me of my first journey into the land of tech-support (why'd I stay? Somebody moved the door...), I had just gotten my first computer (Da big bad Tandy 1000SX), and a friend of mine got a similar system a week later. The day his machine arrived he gave me a call... "Something's wrong with our computer" "What's the problem?" "Nothing shows up on the screen" "Is it plugged into the wall?" "YES. I'm not STUPID!" [various questions to see if the computer itself is starting up properly, everything seems normal...] Finally, in a last-ditch effort to exhaust my limited storage of knowledge, "You DO have the monitor plugged into the computer, don't you?" "Uhh...Just a sec...[pause] Found the problem. " -------------------------------------------------------------------------- From: dale@inacc.com (Dale Mensch) Newsgroups: alt.folklore.computers Subject: Re: Wanted: Best luser story Date: 06 Jan 1994 21:35:39 GMT In article <2gfov6$bes@darkstar.UCSC.EDU> mpcline@cats.ucsc.edu (Matthew Paul Cline) writes: > To all you who work or have worked on text-support: whats the best > story you have about a luser that called in with a stupid problem? 
The kind I worked at a place that sold Suns bundled with a publishing package (A turn-key system if you sold it to Bell Labs). A fledgling SunOS administrator called late one night: he'd been perusing some system files and decided that everyone's passwords were too cryptic. He was reading field 2 in /etc/passwd: the ENCRYPTED password. Using vi, he was typing in "better" passwords for everyone, and couldn't figure out why those accounts could no longer log back in. By the time he called us, the only valid account left was root. After playing 20 (thousand) questions to determine what he'd done, versus what he thought he was doing, we told him what we thought was wrong. "Oh. Just a minute." (Long pause while he decides to test the theory by editing root's password string, logging out and trying to log back in) "Yep. That was it. Now what do I do?" -------------------------------------------------------------------------- From: johng@oce.orst.edu (John A. Gregor) Newsgroups: alt.folklore.computers Subject: Re: Wanted: Best luser story Date: 9 Jan 1994 11:15:52 GMT Back in a previous life there was this fellow who liked to type at about 300 words per minute. Unfortunately, he got 299 of these words wrong. He was also possessive about his keyboard, so one would be forced to endure: Luser: "I have this problem, let me show you..." 133 vax: ls /uszr/tmp Luser: "Fuck" 134 vax: las /.udt/tnp Luser: "Fuck" 135 vax: ls /surtmp Luser: "Fuck" Etc. But I digress. Anyway, one day this fellow was blazing away at warp 11 on his terminal when things lock up. He bangs on the return key, hits the fav ^C, ^\, etc. Nothing. So, what does he do next? Does he talk to anyone else? No, he walks into the machine room and hits the power on the 11/785 with about 35 people logged in and about 50 NFS mounts. The rest of the company is now tapping at their return key... The problem? Mr.
rocket scientist had hit ^S in his mad typing frenzy and failed to hit ^Q which was the only way to unsuspend his terminal. The real problem? Mr. Brainiac was the leader for the support group! He did several tricks like this per day. He managed to crash the main vax twice while I was there for my interview for the job. Someday I might recount the fun he had with the 19 byte file that contained the characters: Permission denied.\n -------------------------------------------------------------------------- From: nose@cis.ohio-state.edu (ken william nose) Newsgroups: alt.folklore.computers Subject: Re: Humor wanted Date: 20 Jan 1994 16:18:47 -0500 This from the help desk of a now-defunct PC clone manufacturer: User calls up after unpacking their PC & going through everything that came with it. Said they had looked through the box & all the other packaging, but "couldn't find their DOS prompt." -------------------------------------------------------------------------- Newsgroups: alt.folklore.computers From: sjm1@crux4.cit.cornell.edu (Seth Morabito) Subject: Re: Joys of doing Tech Support Date: 9 Mar 94 21:53:22 GMT stefanj@illuminati.io.com (Stefan E. Jones) writes: >There are certain >people who lack the cognitive capacity to deal with information abstractly. > Along these lines, I recall a somewhat funny story from last summer, when a friend of mine was working tech support at Cornell University. (Alas, I was not there, and did not witness the events of the story first hand). Apparently, one fine summer morning they got a call at the support office, from a somewhat panicked user, who wanted to get the "triangle" back. After a bit of inquiry on my friend's part, they determined that said user was on a Macintosh, and some strange, ephemeral "triangle" figure on the screen which she was used to having around was no longer there, and would they be kind enough to come by and see if they could get it back for her.
They had nothing better to do, so they went over straight away. When they arrived they had to suppress their laughter; the user was in Microsoft Word, and had moved the mouse cursor from the Mac desktop to the Word window, thus turning the friendly arrow cursor into the Word I-bar cursor. Having done so, she panicked and didn't think she should touch the evil computer thingy until tech support could see the problem. That's about the saddest story I've heard of in person, and it always makes me feel, somehow, totally sorry for the user. Her embarrassment must have been tremendous. I only hope she's achieved a better rapport with her computer since then. -------------------------------------------------------------------------- From: jdahl@gamera.rchland.ibm.com (Jared Dahl) Newsgroups: alt.folklore.computers Subject: More Stupid User Tricks Date: 10 Mar 1994 19:01:41 GMT This is interesting ********************************************* Here's a summary of the Wall Street Article - "Befuddled PC Users Flood Help Lines, And No Question Seems to Be Too Basic," by Jim Carlton - Staff Reporter of The Wall Street Journal Austin, TX - Exasperated caller said she couldn't get her new Dell computer to turn on. Customer: "I've pushed and pushed on this foot pedal and nothing happens." Dell Tech: "Foot Pedal?" Customer: "Yes, this little white foot pedal with the on switch."
- Foot Pedal turned out to be the mouse - PC makers discovering it's still a low-tech world out there - having success selling PC's to households - now have to deal with people to whom monitors and disk drives are as foreign as another language - 2 years ago, most calls came from techies seeking help on complex problems - Now, as many as 70 percent of calls come from rank novices - Part of reason some companies are now charging for tech support - Questions often so basic, they could be answered by opening the manual - One woman called Dell asking how to install batteries in her new laptop computer - Told directions were on first page of the manual - Woman replied angrily, "I just paid $2000.00 for this damn thing, and I'm not going to read a book." - These buyers rarely refer to manuals - Would rather use the phone - "It's a phenomenon of people wanting to talk to people." - Craig McQuilken of AST Research - Compaq help center in Houston inundated with 8000 calls a day with inquiries like - " A frustrated customer called, who said her... {PC}... would not work. She said she had unpacked the unit, plugged it in, opened it up and sat there for 20 minutes waiting for something to happen. When asked what happened when she pressed the power switch, she asked, "What power switch?"" - So many people have called to ask where the "any" key is on their keyboards when the "Press Any Key" message is displayed - Compaq considering changing message to "Press Return Key" - AST - one customer complained that her mouse was hard to control with the dust cover on it - dust cover turned out to be the plastic bag in which the mouse was packaged - Dell - one customer held the mouse in the air and pointed it at the screen, all the while clicking madly - Compaq - one customer was having diskette problems. After trouble shooting for a while (magnets, heat, etc.), tech asked the customer what else was being done with the diskette.
Response: "I put a label on the diskette, roll it into the typewriter..." - AST - customer complied with tech's request to send in a copy of a defective diskette. A few days later, tech received a letter from the customer along with a Xerox copy of the floppy. - Dell - tech advised customer to put his troubled floppy back in the drive and close the door. Customer put the phone down and was heard walking over to shut the door to his room. - Dell - customer called to say he couldn't get his computer to fax anything. After 40 minutes, tech discovered the man was trying to fax a piece of paper by holding it in front of the monitor screen and hitting the "send" key. - Dell - customer needed help setting up an app. Tech referred him to the local Egghead. Customer: "Yeah, I got me a couple of friends." When told that Egghead was a software store, the man replied, "Oh! I thought you meant for me to find a couple of geeks." - Dell - Customer called complaining his keyboard no longer worked. Customer had cleaned his keyboard by submerging it for a day in warm soapy water in his bathtub. - Dell tech once calmed a man who was enraged because "his computer had told him he was bad and an invalid." Tech patiently explained that the computer's "bad command" and "invalid" responses shouldn't be taken personally. - Techs increasingly find themselves taking on role of amateur psychologists - Dell tech (formerly a psychiatric nurse) once defused a potential domestic fight by soothingly talking a man through a computer problem after the man had screamed threats at his wife and children in the background. - Also the lonely hearts reaching out for human contact, even if it happens to be a computer techie. - man from New Hampshire calls Dell every time he experiences a life crisis. Gets a tech to walk him through a contrived computer problem, apparently feeling uplifted by the process. 
-------------------------------------------------------------------------- From: frisbie@flying-disk.com Newsgroups: alt.folklore.computers,alt.folklore.urban Subject: Re: vibrating hard disk makes computer move Date: 11 Mar 94 07:27:34 PST In article , jcmorris@mwunix.mitre.org (Joe Morris) writes: > The early development of rotating memory devices at IBM was a somewhat > clandestine affair since it didn't have Upper Management approval. One > of the late-development cycle boxes was similar to the eventual > 1301 RAMAC product, with huge platters (30 inch diameter, give or take > 30 years memory loss) with a single head which worked like the tone arm > on an old juke box, running up and down a guide, darting into the disk > stack as needed to read or write data. The disk stack (How many? That > number is completely gone) was mounted on a vertical axle perhaps 40 inches > high. The disks were *massive*, especially by today's standards. In 1967 I was a student, working weekends for IBM as an assistant to a CE. One day we got a panic call from a customer. Their disk (the version of the above 1301 that was used on 1401 & 1440 systems -- I forget the model), was emitting smoke! We hopped in the car and raced to the site. As we ran into the computer room we stopped dead in amazement; everything seemed normal and the operators were going about their tasks. They told us, "The smoke stopped and everything seemed to be working OK, and we had a payroll to get out, so we just kept running". (Does this count as a "blindly ignorant user"?) As it turned out, a fan bearing had seized, causing the motor windings to burn up. Since *all* the fans were on a single large fuse, it took a while for the extra current to blow the fuse. When it finally did, the smoke ceased, along with all cooling air for the racks of circuit cards. When we arrived you could just about fry eggs on that machine. 
We were afraid that there was severe damage, but upon replacing the fan (and waiting for everything to cool down), it ran all the diagnostics without fail. -------------------------------------------------------------------------- From: n1gak@netcom.com (Scott Statton) Newsgroups: alt.folklore.computers Subject: Re: vibrating hard disk makes computer move Date: Fri, 11 Mar 1994 20:11:44 GMT I have an example from my own real life, no FOAF here. While I was working for a long-distance company in Boston, we ran a NTI CTSS-4000 (a medium sized toll tandem) which used a pair of DG Nova-3s. Each Nova-3 had a Ball 80 Meg washingmachine drive (same geometry as a CDC 9762). One day, the "A" CPU was running, and I got a page around 2 in the morning, that it had crashed (dropping to the "B" CPU) so I log in and try to see what happened. Well -- the crash log wasn't very helpful, but we were running traffic, so I went back to sleep. I went in early and found the "A" disk drive had walked about 4 feet forward from its usual place, until such time as it pulled its power cord out. I push it back into position, and force a bus-swap over to it, and watch. It appears that some disk-thrashing had occurred, and the vibration caused it to follow the oh-so-gentle slope of the machine room floor. If I rotated the drive 90 degrees, it now moved to the right. So it wasn't following the motion of the head, but the slope of the floor. I re-adjusted the leveling feet, and ty-wrapped the power cord to the chassis. (I had recently replaced that drive, so it was on its casters instead of its feet.) -------------------------------------------------------------------------- Newsgroups: alt.folklore.computers From: cstkb@cend4c1.caledonia.hw.ac.uk (Keith Braithwaite) Subject: Re: More Stupid User Tricks Date: Fri, 11 Mar 1994 14:36:35 GMT When I was at school in Co. Durham, some years ago, there was a drive to increase the number of micros in schools.
The DoE (as it was then) had a deal with Acorn to flood schools with BBC Microcomputers. Well, I heard from the teacher in charge of computing in my school (who I helped out from time to time with various things) a story he was told by the technician for Durham schools: A junior school gets its machine, and the teachers start ringing the tech complaining that when the kids type on it the wrong letters show up on the screen. He puzzles about this for a while, asks various questions, and eventually has to go out to the school to look at the machine himself. It turns out that the teachers had decided that the QWERTY layout was too hard for the kids to learn, so they prised off all the keytops with a screwdriver and put them back on in alphabetical order. Not a word of a lie! -------------------------------------------------------------------------- Subject: More tech support nightmares Date: Tue, 17 May 1994 07:12:11 -0700 (PDT) From: "Robert L. Lewis" Still more of the same. 1. "My hard disk has a virus!". How can you tell, I ask? "When I type DIR, it says VIRUS and some date stuff". (Hint: Never name the directory for virus scanning software VIRUS). 2. Some monitor manufacturers suggest using alcohol to clean the screen. They forget to mention that the monitor should be off. Boom. 3. I told a customer to take his machine to a gas station and have them blow the dust out. The gas station hands him a 150psi air nozzle that belches rusty water and oil. I got to clean up the mess for free. He also mangled the floppy heads with the high pressure. 4. Oxymoron candidate: Disk Protector. That's the cardboard disk they shove in the floppy drive for shipping. More drives have been mangled by shoving in the wrong shape, backwards, or bent than have ever been protected by them. Use a floppy disk instead. 5. What's the difference between a Van de Graaff static generator and a belt driven vacuum cleaner? Answer: Not much. Don't use a vacuum to clean your computer. 6.
After the cleaning service crashed the computer for the 4th time by plugging the floor sweeper into the UPS, I decided to take action. I suggested they install "child proof" plastic plugs in any outlets deemed worthy of protection. The order went through the chain of confusion, and I was soon blessed with 1000 child proof plugs hot stamped with "Protected". I gave instructions to install about 10 of them on the protected outlets. However, the maintenance person assigned to the task knew nothing and proceeded to plaster every outlet in the building with the plugs. Mutiny was averted by 5 of us spending all night removing the monsters. 3 years later, they are still appearing. 7. Hint: Do not allow long-haired black cats to sleep atop laser printers and tape drives. The black hair is almost invisible in black platens, gears, and rollers. 8. Forensic filth analysis is a new part of computer repair. I now carry a microscope and some chemicals which are used to determine the exact nature of the filth I remove from keyboards, mice, computers, light pens. While nobody pays me to do this, it definitely adds to the entertainment value. 9. New recommended warning labels: Do not put pizza slices in the CD-ROM caddy. Do not apply peanut butter to the floppy disk. Do not use the light pen for a coffee stirring rod. 10. Why do customers think that I maintain a document and device driver library for every conceivable board ever made? 11. From a hard disk drive manufacturer: "The drive stopped working. I popped the little plug and noticed it was awful dry inside. I added some oil but it didn't help". 12. Which arrow key? There are 15 arrows on the keyboard. -------------------------------------------------------------------------- From: rnturn@delphi.com (Richard N. Turner) Newsgroups: alt.folklore.computers Subject: Re: Radios on computers - the olden days Date: 18 Aug 1994 00:53:24 GMT >I remember doing it as far back as a PDP-8/I at school.
Once FOCAL [text deleted] >Later on, as a student at WPI, I found out about it being done on >a KA-10. The program was more sophisticated (and developed, I think, [text deleted] >Many songs were developed for it (and when I got the same sort of >thing working on my PDP-11, which I called the MK11, I entered It sounds like nearly everyone has done this in some way in the past. Has anyone else had a radio affect their computer? We did: We had a PDP-11/34 that was running software (under RT-11) to control a GPS satellite receiver and collect data onto tape. The system had a programmable clock running off a 50 Hz signal generated by the satellite receiver. The receiver and the PDP were in different rooms; data and clock signals were run through shielded RS-232 cabling over a 100 foot or so long run. Strange things began happening in the evenings. After a data collection run we examined the data and noticed periods where the measurements we expected to see were *WAY* off. We were tracking as many as six satellites with the receiver and the first thing we looked at was whether the strange measurements were connected to a particular satellite. No, weird data across the board. This happened every night for a couple of weeks and it was starting to drive us crazy. We began to suspect that there was something wrong with the real-time clock. We set up a program to display the system time every ten clock ticks which should have given us five displays each second. We noticed that it occasionally scrolled off the console too fast to read. Since all of our receiver control software relied on this real-time clock it was no wonder our measurements were fouled up. One night we stayed around to witness what the heck was going on and we noticed that the speakerphones in our offices were talking to us when we weren't on the phone. You know how your powered-off stereo emits voices when the nut with the 100W linear amp on his CB drives by?
It suddenly hit us: it must be that PhD candidate two floors up who had his ham radio in his lab. He'd chat with people when his dissertation work was going slow (some people pace the floor, he'd talk to Tasmania). Every time he keyed the mike the real time clock in the PDP went absolutely nuts. We complained to the powers that be (were?) and he was banished to a lab out at our airport test site (where he was immensely happy to be away from any disturbances -- he was sort of a hermit-type anyways). Anyone else have any radio-interferes-with-computer (man bites dog) stories? -- RNT -------------------------------------------------------------------------- Newsgroups: alt.folklore.computers From: frank@rover.uchicago.edu (Frank R. Borger) Subject: Re: Radios on computers - the olden days Date: Thu, 18 Aug 1994 14:57:14 GMT In article <9408172052591.DLITE.rnturn@delphi.com> rnturn@delphi.com (Richard N. Turner) writes: >Anyone else have any radio-interferes-with-computer (man bites dog) stories? Saw one article posted in Bob Pease's column in Electronic Design. This engineer was in charge of a remote meteorological data collection site in Antarctica, and the remote unit would just up and stop. He'd drive out, open up the unit, everything was fine, button things up, and things would be fine. Finally he noticed that all he had to do was drive to the site, sit there and the unit would restart. Obviously RFI from the truck, (probably ignition,) would get into the system and cause a reboot. Further study showed that only one truck was the magic truck, if he drove up in that one, it would reboot the data collection unit, but other trucks, (same make, model,) wouldn't affect the magic cure. He also described the problems he had convincing the guy who ran the motor pool that he had to have one certain truck. Anyway, for his whole tour of duty, he ended up driving out to the site once a day, and then driving back. Frank R.
Borger - Physicist, Michael Reese - U of Chicago Center for Radiation Therapy net: Frank@rover.uchicago.edu ph: 312-791-8075 fa: 791-2517 "I think medical research would show that being a Cubs fan lengthens your life. Or maybe it just _seems_ longer." - Mike Royko -------------------------------------------------------------------------- From: rnturn@delphi.com (Richard N. Turner) Newsgroups: alt.folklore.computers Subject: Re: Radios on computers - the olden days Date: 23 Aug 1994 01:56:50 GMT >>the real time clock in the PDP went absolutely nuts. We complained to the >>powers that be (were?) and he was banished to a lab out at our airport test >>site (where he was immensely happy to be away from any disturbances -- he >>was sort of a hermit-type anyways). > >[..] > >I'm sure the airport authorities would have been thrilled >with their new resident. Actually, they didn't care because it was a rural airport and did not have a certified ILS (that's Instrument Landing System to you non-pilot types) that he'd interfere with. Also, he only seemed to interfere with equipment that was close enough (say a few hundred feet) to pick up the ham radio signals. The theory was that his antenna was so close (only 10-20 feet) to our satellite antennas that he was injecting enough signal to cause a problem. We never got around to hanging an oscilloscope off the programmable clock interface to see what was happening to the clock signals. Besides, keeping a scope probe on a KW11-P while inserting it into a fully loaded UNIBUS backplane would have been a major pain. -- RNT -------------------------------------------------------------------------- From: beaumont@microwave.msfc.nasa.gov (Bruce Beaumont) Newsgroups: alt.folklore.urban,alt.folklore.computers Subject: Re: Magnetic Fields (was Re: The "any" key) Date: 9 Sep 1994 20:36:40 GMT A friend of mine was stationed in Turkey while in the Air Force back in the 60s.
They set up their data center in the basement of an old building, placing a tape drive in an alcove which was the support for a fireplace on the floor above. Once a month the tape on that drive would error, and a subsequent inspection would reveal that the tape was totally garbled. This posed a mystery for some time until someone discovered that the *other* side of the fireplace foundation in the building adjacent had been turned into an elevator shaft, with a huge unshielded motor at the bottom of the shaft. Once a month the business in that building got a delivery... -------------------------------------------------------------------------- From: ddm@wpi.WPI.EDU (Duane D Morin) Newsgroups: alt.folklore.computers Subject: Oh, to be in support once again... Date: 16 Sep 1994 16:33:36 GMT I haven't worked the support lines in a long time. I miss it... "Ok, it says installation completed. Now what?" "Take the disk out and reboot the system." "Reboot? What's that?" "Reset the computer." "How do I do that?" "The reset button is on the front of the computer. Push it." "Just, like, push it? Let it go, too, or hold it in?" "The printer's not working." "What's not working about it?" "It just keeps printing out square c square c square c square c all over the page." "...???...Oh. We call those left brackets." And from the "users know better than the programmer" department: "It says 'baseline error failed.', I think." "Could you check? Having the exact wording would help me." "I'm sure, that's what it says. Baseline error failed." "No it doesn't." "Yes it does." "Ma'am, I wrote the software, and I didn't write that message. Now Please go check and tell me what it actually says." There are two ways that our product can crash in mid test - Signal Failure, or Signal Too {High|Low}. Sure enough, we're regularly told that they've gotten the message "Signal Failure Too Low." A personal favorite. You need to picture this. On the "patient information screen" are two date fields. 
One labelled "date of birth" is about halfway down the lefthand side of the screen. "Date of test" is in the lower right corner. It's also a read-only field. "Ok, so, explain to me what you do." "Ok well, we enter in the name and id number and everything, right? And then we go over to the corner where it says date of test and type in today's date.." "You do WHAT?" "In the corner, where it says date of test, we put in today's date." "Are you SURE?" "Yes." "In the corner?" "Yes." "You enter in a value?" "Yes." "No, you don't." "Pardon me?" "That field is read only. You can't change it." "Oh, well, not that one. Where it says date of birth. We put in today's date." "Oh. Your use of the words "in the corner" confused me. My mistake." I don't even want to know why they're changing everybody's date of birth to be today's date. -------------------------------------------------------------------------- Newsgroups: alt.folklore.computers Subject: Re: Debugging War Stories From: stuart@zen.mathcs.rhodes.edu (Brian L. Stuart) Date: 12 Dec 94 15:03:44 -0500 My favorite was a hardware problem on an EAI 680 analog computer. (Or was it the 640? One of them was the analog and the other was the digital.) Well, I was installing one that had been given to the school (my undergrad alma mater Rose-Hulman). It wouldn't change from the initial condition mode to the operate mode. Now I had noticed earlier that some of the panel indicator lights had burned out, but I figured that I could always replace light bulbs when I got the thing up and running--first things first and all that. So I began to pore over the schematics for the control logic. I was hit with the sort of feeling that you know you'll remember for the rest of your life saying something to the effect: No, surely that wouldn't have... They DID! The designers had used the indicator lights as pull-ups in the logic.
When I tried to switch to the operate mode with a burned out light bulb one of the signals stayed low instead of going high. The computer literally didn't work because it had a burned out light bulb. -------------------------------------------------------------------------- From: brown@cs.swarthmore.edu (Randolph G. Brown) Newsgroups: alt.folklore.computers Subject: Re: Debugging War Stories Date: 16 Dec 1994 21:09:30 GMT In article , Tab Julius wrote: >A big portion of what I'm doing is the "bug hunt", How did you trace it Ok, here's one... Less a programmer, than a sysadmin bug hunt... Over the course of the summer, I was peripherally involved in installing SunOS 4.1.3_U1 on our little cluster. We'd brought one client up with it, and discovered that the DNS didn't work. After a little thought and some poking around we discovered that previous admins had patched BIND into the shared library of the other clients, and so DNS wasn't done through yp. (Why are these things never documented --- Interesting things happen when the admins are students...) So, with some trepidation, since we'd never patched shared libraries before, we set up the shared library on the client with BIND. Bingo, DNS worked. We went to bed. I come over the next day and discover that my colleague has been puzzling over a message frequently showing up on the console of the client, which read something like: gethostbyaddr: A record of (name of client) != PTR record of (name) This made us nervous --- I, at least, had visions of network daemons failing miserably and silently. We patched the libraries again, this time being very careful of our procedure. The messages still showed up. My colleague went away for a little while, and I pored over the source to BIND. After a while, I found a minor bug in the gethostbyaddr routine and fixed it, recompiled the library, and lo and behold, I didn't see any console messages. I went to bed.
There were these messages on the console when I got there the next day though, and so I continued to look at the BIND source. We did nslookups on the machine, and I couldn't get them to cause the message. I compiled my own little programs which called gethostbyaddr, and the only way I could reliably get the message to go on the console was to call gethostbyaddr with an invalid value for the length of an internet address. ("but nobody would _do_ that!" I was heard to say...) However, fiddling alone at the machine, I didn't see any other messages. My colleague logged in, and I saw the message. A light went on. I told him to get off the machine. I told him to get on. There was the message. I realized that I had never seen the messages appear spontaneously when I was alone on the machine. I use bash. My colleague uses tcsh. A quick hunt of the tcsh sources found a call to gethostbyaddr in the setting of the REMOTEHOST variable. I read it three times before I caught the bug --- they called gethostbyaddr with the size of the socket address, not the internet address. (see --- people would do that, and it would make sense the first two times; never say never.) One patch, recompile tcsh --- bingo no console messages. After a few days of running smoothly, we transferred the new system to the other clients, and I sent out a couple of bug patches. All that work, not to find a failing network daemon, but to fix a little-used variable in a shell I don't use. -------------------------------------------------------------------------- Newsgroups: alt.sysadmin.recovery,alt.folklore.computers From: wb8foz@netcom.com (David Lesher) Subject: Re: Exploding electric components Date: Mon, 2 Jan 1995 15:48:07 GMT I once designed a set of IO boards for a custom Falcon job. (A Falcon was a single-board LSI 11) For various reasons, it used NE555's as lamp drivers, 16 to a 4"x4" board. The lamps were small 14v ?327's? in illuminated pushbuttons. We had 32 boards in a rack.
As you might guess, that's a lot of 14 volts, so there was a BIG SCR-regulated supply, 50amps or so. The first time we installed it was in the simulator in the main machine room, along with 50-odd PDP-11/34's, one 3033, and a Cray 1. All was fine till I commanded a lamp test. To keep the loads down to reasonable values, lamp test cycled through them, lamp #1 on all cards, then lamp #2, etc. Second sequence came on with a loud CRACK! and a puff of smoke headed for the detector on the ceiling. I recall thinking "gee - we just had an evacuation drill last week" and "I hope it does not hit TWO, as that will charge the sprinkler systems.." The technician just looked at me, and I looked at him. Somehow, the smoke cloud missed the detector. Silence reigned. (sort of -- it WAS the machine room...) There were 4 or 5 boards missing 555's or having only part of them. Seems as if a bunch of the lamps were shorted out of the box. And given the available 14v current......... That's how I learned a 555 makes a great fuse. ISTM they are also CHEAPER than one, too. -------------------------------------------------------------------------- Newsgroups: alt.folklore.computers From: glamm@i10.msi.umn.edu (Bob Glamm) Subject: Re: Exploding electric components Date: Tue, 3 Jan 1995 03:49:30 GMT >That reminds me of one of my favorite warning messages: >"WARNING: Use of this program may damage your monitor hardware" > -- from vgaset, a linux utility >-nate Argh. That brings back heinous memories. I managed to destroy a 14" NI monitor trying to set up a @#Q*$ Diamond SpeedStar 24X to run X. Of course, when the monitor bandwidth is 65MHz, and I programmed it for 80MHz, I didn't quite expect it to destroy itself. Just push the limits a little bit. ;) That was 4 weeks, $100, and 4 calls to CTX. CTX also screwed up the repair, BTW.
-------------------------------------------------------------------------- From: pgut1@cs.aukuni.ac.nz (Peter Gutmann) Newsgroups: alt.folklore.computers Subject: Re: Exploding electric components Date: 3 Jan 1995 13:46:28 GMT aardvark@hera.bf.rmit.edu.au (Chris Anderson) writes: > >While on work experience some 8 years ago (or more), I was observing the >techs calibrating some CB radios before they were shipped out. Anyway, >while working on one of them, we heard a loud bang, and smoke went all >over the joint. When, after about 20 minutes of searching, we found the >problem, we discovered that the audio amplifier chip had blown - leaving a >nice crater over the silicon (we could see the chip itself - pretty neat >effect, hey?). A friend of mine once put 240V across a PET motherboard. The main noticeable casualty was some garden-variety TTL chip which exploded with a small bang and blew the ceramic top half across the room. I've found that some 68K's don't like current spikes much (that's what you get for poking around in the circuitry when it's attached to a 5V 20A supply) and turn into rather expensive (at the time) heating elements very quickly. I have a VT-55 next door which arrived here dead. I scraped the guts of an exploded electrolytic out from all over the inside, replaced it with a new one, and it's worked fine ever since (modulo the EPROM character generator slowly dying, but what would you expect from something that's currently at twice its rated life expectancy). -------------------------------------------------------------------------- From: Rich Greenberg (richgr@netcom.com) Newsgroups: alt.folklore.computers Subject: Re: Exploding electric components Date: 15 Jan 1995 03:02:25 GMT One day I was setting up the experiment du jour and there was a bright flash and a loud bang from behind me. I turned around and a cloud of smoke was rising from one of the other benches and a student was clawing for the breaker.
He had connected one of the light weight wires (normally used to connect instruments into the circuit) that was 8-10 feet or so directly across one phase of the 440. The entire wire had vaporized except for the brass connectors at the ends. Gave me a little more respect for the AC line after that one. -------------------------------------------------------------------------- From: mstatz@cei.net (Mark Statzer) Newsgroups: alt.folklore.computers Subject: Re: Exploding electric components Date: 15 Jan 1995 03:02:25 GMT I had a hot water heater burst in my apartment a few years ago ( while I was on vacation, yet - pumped steaming hot vapor into the place for a *week* ) and after I took my computer to work ( to de-humidify for a few days ) the only damage was when I turned the SVGA Magnavox monitor on - it buzzed real loud for about 15 seconds, then went BANG!, and was happy ever since. I'm still using it now - runs great! I believe, however, that I may have the 'best BANG for your buck' here - I'm the Chief Engineer at a local UHF TV station. I have (had?) a voltage regulator go south, just as I had my hand on the switch to turn the transmitter on, yet! I had been doing some adjustments on the transmitter ( mind you, it's the middle of Nowhere, Arkansas at 3am with my boss and my TX Consultant ) and hit the switch to turn it back on when KA-BANG! This *was* a *mechanical* voltage regulator (480v 3 phase, 500 amp) - looks like a transformer with a face cut out of the side - like a huge wire-wound pot. Four feet high, three feet wide and four feet deep, in its own rack, with motorized wipers to mechanically regulate the voltage. Presently shorted directly to ground. It melted both drive chains, burned most of the winding insulation off the transformer ( quickly, too!
) and scared the bejesus out of the consultant ( a man who regularly sticks his hand into the transmitter, but happened to be standing right next to the rack, reading an article on, of all things, the Internet. ) My head *still* hurts. The replacement part costs $13,000. Good thing *I* don't have to pay for it. -------------------------------------------------------------------------- From: fms@highpower.cs.umd.edu (Marat Fayzullin) Newsgroups: alt.folklore.computers Subject: Re: Computers that refuse to die Date: 21 Mar 1995 16:31:50 GMT Jeffrey Thomas Sheldon (jeff@tiger.lsu.edu) wrote: : Anyone else have any miracles of computing on either micros or mainframes? A long time ago, a friend of mine was playing with his homebrew 8080-based computer. The board was connected to a universal power supply. He touched the power supply knob and the thing pumped 30V into the board instead of 12V. He did manage to blow the clock generator which had two NAND elements with outputs connected to the power line through resistors. Actually, the silicon in the clock generator chip was gone completely; as I remember, there was a hole in the chip, but no crystal. The computer itself continued working though, after replacing the chip. There was another case of this kind, when somebody powered a ZX Spectrum clone with 9V instead of 5V. The board was full of 74xx chips. It worked almost fine afterwards, although all signal fronts leveled a bit. -------------------------------------------------------------------------- From: turtle@charm.net (The Turtle) Newsgroups: alt.folklore.computers Subject: Re: Computers that refuse to die Date: Wed, 22 Mar 1995 22:13:35 LOCAL In article dbird@eskimo.com (Dan Bird) writes: >: >: Anyone else have any miracles of computing on either micros or mainframes? I have an Always IN-2000 SCSI controller that is running quite happily in my file server now. When it was still in my workstation, something on the bus freaked out and the fuse on the board blew.
Not having a replacement, and not having an open electronics store at 11:00 on a Sunday night, I stuck a bent staple in the socket and restarted the machine. The thing fired up, even though a short (which I later corrected) caused the bent staple to heat up red hot and melt the fuse socket and burn the *&@#&^@# out of my fingers. As a result, the staple is permanently ensconced in the fuse socket and the controller kicks out 8 mb/sec anytime I tell it to. I won't even tell you about the hard drive I threw out a second-story window. --------------------------------------------------------------------------