It seems there are not many Oracle Restart environments out there (or there are, but Oracle doesn't pay much attention to them). Otherwise, I can't explain why I found so little information about upgrading them in the documentation.
So, to help all of you out there that do use it, here are some notes about the upgrade process.
When I prepared an upgrade document for a customer to plan their database upgrade from 12.2 to 18c, I ran into a few unclear issues. As I said, this is an Oracle Restart environment, so we have GI installed with ASM and a single Oracle database. To make things even more complicated, we also have Active Data Guard, and the standby side is set up as Oracle Restart as well.
Here are the issues I had:
According to the GI documentation, before upgrading the GI we should run the cluster verification utility (cluvfy) to make sure that everything looks right. I wasn't sure I actually needed it (the name of the utility contains "cluster," which Oracle Restart is not), but the documentation doesn't specify what to do when upgrading a stand-alone GI, so I tried it.
When running cluvfy as described, this is what I got:

[oracle@db18 grid]$ ./runcluvfy.sh stage -pre crsinst -upgrade -src_crshome /oracle/product/grid -dest_crshome /oracle/product/18/grid -dest_version 18.0.0.0.0

ERROR:
Parse error: 'crs' is an invalid argument
Brief usage:
  crsctl query has releaseversion
     Lists the Oracle Clusterware release version
  crsctl query has softwareversion
     Lists the version of Oracle Clusterware software installed on the local node
For complete usage, use:
  crsctl [-h | --help]
For detailed help on each command and object and its options use:
  crsctl <command> <object> -h  e.g. crsctl relocate resource -h

ERROR:
Parse error: 'crs' is an invalid argument
Brief usage:
  crsctl query has releaseversion
     Lists the Oracle Clusterware release version
  crsctl query has softwareversion
     Lists the version of Oracle Clusterware software installed on the local node
For complete usage, use:
  crsctl [-h | --help]
For detailed help on each command and object and its options use:
  crsctl <command> <object> -h  e.g. crsctl relocate resource -h

PRVG-0712 : Cannot upgrade: Oracle Clusterware stack not running on this node.

Verification cannot proceed
When I debugged cluvfy I saw that it tried to execute the command “crsctl query crs activeversion”. Apparently, this command doesn’t exist in stand-alone GI, so obviously it failed.
The solution (from an SR I opened with Oracle): don't run cluvfy in an Oracle Restart environment. Just run the upgrade; it uses cluvfy internally with the correct settings to check the GI.
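You can see the root cause yourself by comparing the two crsctl flavors. A clustered GI exposes the "crs" object, while Oracle Restart (the HAS, High Availability Services, stack) only accepts "has" — which is exactly the command cluvfy assumed. A minimal sketch:

```shell
# On Oracle Restart (stand-alone GI), only the "has" object exists:
crsctl query has releaseversion     # works
crsctl query has softwareversion    # works

# What cluvfy runs internally assumes a full cluster stack,
# so on Oracle Restart it fails with a parse error:
crsctl query crs activeversion      # Parse error: 'crs' is an invalid argument
```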
Another tool that we are supposed to use before upgrading the GI is orachk. Again, I wasn't sure I needed to run it (the documentation doesn't say anything about Oracle Restart here either), but it should list required patches and settings, so I used it too. It did mention a few patches and some other things, which I fixed/implemented. But even after I did everything, I still got a few warnings at the end of the report:
============================================================
Node name - db18
============================================================

Collecting - ASM Diskgroup Attributes
Collecting - ASM initialization parameters

WARNING - All the nodes not have active roles in a Flex Cluster.
WARNING - Clusterware active version should be set to recommended value
INFO    - Queries for Monitoring Object Recompilation Before or After Upgrade
INFO    - Behavior Changes During/After Upgrade To Be Aware Of
WARNING - ASM disks permissions and owner are not correct
INFO    - Be Aware of New Instant Client for Simplified Deployment
It would be nice if it told me what it found vs. what it expected (for the ASM disk permissions, for example), but it doesn't. My guess is that this is similar to the cluvfy issue: it simply gets unexpected output because this is a stand-alone GI.
Solution: no real solution yet; the SR is still open with no relevant response so far. My guess is that I'll get the same answer: don't run orachk on Oracle Restart environments.
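For reference, the pre-upgrade orachk run looked roughly like this (run as the GI owner from the directory where orachk is staged; the path is illustrative):

```shell
# -u -o pre restricts orachk to the pre-upgrade best-practice checks,
# which is the mode the upgrade documentation refers to:
cd /stage/orachk
./orachk -u -o pre
```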
According to the Oracle documentation, the ASM password file should be located in ASM in 18c. If we upgrade from 12.1 or 12.2, we should move the file to ASM prior to the upgrade; if we upgrade from 11.2, we move it after the upgrade (probably because 11.2 doesn't support having the password file in ASM). I upgraded the GI (which went smoothly, by the way) and then tried to move the file. The documentation doesn't say how to do it, so I had to look it up. This can be done with the "pwcopy" command in ASMCMD. When I tried it I got an error (ORA-15124: ASM file name '+DATA/orapw+ASM' contains an invalid alias name), and I tried different paths with no luck.
Solution: according to the SR engineer, in Oracle Restart there is no need to copy the ASM password file to ASM. In any case, my command was wrong: apparently you cannot have the "+" symbol in the path; it is allowed only before the diskgroup name, which makes perfect sense. So eventually I did move the file to ASM (to +DATA/ASM/orapwasm), and if you do this, don't forget to register the new location with "srvctl modify asm -pwfile <path>".
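Putting it together, the working sequence looked roughly like this (the source path is illustrative — check where your password file actually lives with "srvctl config asm" first):

```shell
# As the grid owner, with the ASM environment set.
# Copy the OS-level password file into ASM; note the "+" appears
# only in front of the diskgroup name, never inside the alias:
asmcmd pwcopy --asm $ORACLE_HOME/dbs/orapw+ASM +DATA/ASM/orapwasm

# Register the new location so the stack knows where the file is:
srvctl modify asm -pwfile +DATA/ASM/orapwasm

# Verify the registered password file location:
srvctl config asm
```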
The last issue I ran into was after the entire upgrade process. At this point I had a running 18.3 GI with an 18.3 database, and everything looked just fine. The last step of my procedure was to patch everything to the latest RU (18.6 from April 2019 in my case). So I went and looked for the correct patch (the GI RU, patch# 29301682) and checked the readme.
In the readme, under the table in section 1.1 (patch information), there is a footnote: "Opatchauto does not support patching in Data Guard environments. See Installing patch in Standby-First Mode for more information". I clicked the link (which leads to a different section of the same readme), and it explains that:
- The standby database has to have patch 29301631 installed (which is the database RU 18.6)
- The datapatch script should not be executed on the standby
- The datapatch script should be executed on the primary only after the primary and all standby environments were patched with the RU.
This all makes sense and was the same for previous versions as well, but I don't understand how this is related to opatchauto (maybe I'm missing something?). Also, no alternative way to patch is given. They do refer to note 1265700.1, which explains how to patch a standby database, but not in an Oracle Restart environment. The support engineer in the SR I opened referred me to note 2246888.1 for manually installing the RU. That note also explains how to use opatchauto (which, according to the readme, cannot be used).
Solution: the support engineer told me to simply use opatchauto and install the patch (ignoring the footnote), but not to run datapatch on the standby (obviously). So this is what I did:
- Install the RU on the standby using root and opatchauto
- Install the RU on the primary using root and opatchauto
- Execute datapatch on the primary
This worked smoothly; I have no idea why the footnote is there.
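The three steps above can be sketched as follows (the patch staging path is illustrative; 29301682 is the GI RU, which bundles the 29301631 database RU):

```shell
# 1) On the standby, as root: opatchauto patches both the GI and DB homes.
#    Do NOT run datapatch here.
export PATH=$PATH:/oracle/product/18/grid/OPatch
opatchauto apply /stage/29301682

# 2) On the primary, as root, the same way:
opatchauto apply /stage/29301682

# 3) On the primary only, as the oracle user, once ALL sites are patched:
cd $ORACLE_HOME/OPatch
./datapatch -verbose
```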
Everyone says RTFM when you ask questions that are already documented. In this post I say FTFM (the first F stands for Fix). If the documentation doesn't cover my environment, how am I supposed to use it to know what to do?
In this scenario the issues were not critical, and even without consulting support I could have handled them. But things are clearly missing from the documentation, and if stand-alone GI is an officially supported configuration, the GI documentation should cover it too.