Searching google for answers is something we all do with every question or request for information we have. I sometimes wonder how much of this data is simply wrong. This can be because people write stuff they don’t know about, people are sharing false information (that was published by others by accident or on purpose), people with agenda that don’t care about facts, and so on.
This is the same in the professional world as well. Anyone can start a blog (I did it, so there is your proof) and anybody can go to forums and answer questions. With the professional world (as well as in other areas) you need to know if you can trust this person or not. There are several database related sites on the internet that me (and many others) don’t trust and don’t like (don’t ask me for names please), while other sites are excellent and very trustful. The junior DBA or someone that needs answers but is not a DBA at all, wouldn’t know which site is which. they will probably click the few first that appear on google.
There is usually no way to know who is behind a site, blog or answer. There is usually no way to validate that they are what they claim, so we basically need to decide for ourselves and take chances. For example, I claim to be an Oracle ACE and a senior consultant. You can check that I’m an ACE on Oracle’s site, and you can check my past positions on LinkedIn, for example. But does that mean I know what I’m talking about? Does the fact that I am a consultant for almost 20 years say I’m a good one?
I have several examples for stuff on the internet and it sometimes annoys me to see these things. The worst example I had was a few years ago with one of my biggest clients. They have a NetApp storage (they are using NFS) and they had a network problem that disconnected the storage from database server. This is a known issue, once the network is disconnected, NetApp keeps a certain lock on the files and Oracle can’t start because it has a problem with the control files (the error in the alert log says that the control file cannot be locked because it’s already in use). When the sysadmin (a great guy) called me, I told him that I know the problem, but I didn’t remember the command that he needs to use in the NetApp console to release the lock. I was driving and my laptop wasn’t with me so I couldn’t look it up. He opened google and searched the Oracle error message to look for the command. I was on the line when he said that there is a forum thread with the same error, so he read it to me. It was something like this (I’m paraphrasing as I don’t remember, but this is how it sounded to me):
q: “I have a problem starting my database. I had a network problem, so my NFS got disconnected from the database server and now I cannot start the database. Oracle claims that it cannot lock the control files, and I see them in the directory. What should I do?”
a: “Delete the control files and try again”
“WHAT?” I shouted into the phone “DID HE JUST SAY TO DELETE THE CONTROL FILES? Whatever you do or try, DO NOT delete the control files”. The sysadmin said “Of course I won’t unless you tell me”, but I was shocked that this guy suggested it in a forum just like that.
For the “fun” of it, he continue reading. The response was “OK, now Oracle claims that it can’t find the control files”, how surprising.
At the end we found the command. The command “priv set diag; lock status -f” will show the existing locks and “priv set diag; sm_mon -l <server_ip>” will release the locks for the specific IP.
Another thing that I saw lately was a discussion on LinkedIn. This is not a case of people suggesting bad things, but it will explain that using forums or even LinkedIn might be problematic. If, by any chance, you know the discussion or took part of it, I just want to explain the problem, I don’t mean to insult or criticize anyone (as opposed to the person who suggested to delete the control files in the forum thread).
The thread talked about a problem when starting the database. The error was “ORA-01113: file 1 needs media recovery”, and file 1 is SYSTEM tablespace, so this is indeed a problem. The thing with discussions like that is that everyone (me included) are trying to give a short answer to help in this situation. Well, this is tricky since there are many reasons for the system datafile to need recovery, these are a few:
- The file got completely corrupted because of disk problem or whatever. In this case restore will probably be the only solution.
- The database crashed while it was in “backup mode”. In this case all we need to do is to execute “alter database end backup;” and open the database (it actually happened to me in the past). Just as a side note, if you read my post how to diagnose a problem this is an example for why I first check the logs before I do anything. Starting a recovery in this case will be the wrong solution. Doing the right thing will bring the database up and running in no time.
- The database simply crashed and there was a write problem to the datafiles. A regular database recovery will take the required information from the redo logs and will solve the problem.
You see what I mean? Even if I wrote that in the discussion, many other people will write other stuff as well, everyone with their own experience and cases they saw in the past. Some of the answers were actually suggesting to check the alert log and get some more information, which is great, but with so many answers, what should the guy who asked the question do?
This is not simple, the amount of data out there is unbelievable, and some (if not most) is inaccurate (to say the least). We all need to realize who we can trust. We need to be suspicious when asking something in a forum, LinkedIn or Facebook (or even Oracle forums), and think for ourselves (or consult someone) if the answer looks reasonable.