Playing with RAC Voting Disks

Lately I ran into a case of a lost voting disk, which led me to create a GI environment on my laptop (using VirtualBox) and play with voting disk failures. In this post I’ll explain what I did and what happened.

Background

I’m not going to explain too much about voting disks here, just mention that in order for a node to stay up, it has to have access to a majority of the voting disks. This means that if we have 3 voting disks, a node needs access to 2 of them. Otherwise, the node will reboot and the GI stack won’t start until it gains access to at least 2 voting disks.

Because of this rule, there must be an odd number of voting disks: 1, 3, or 5. The number depends on the diskgroup configuration. If the diskgroup for the OCR and voting disks is configured as “external redundancy”, there will be a single voting disk. For a “normal redundancy” diskgroup we will have 3 voting disks (so we need 3 failure groups as well, and not just 2), and for “high redundancy” we’ll have 5 voting disks (and will need 5 failure groups).
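Since the voting disk count follows directly from the diskgroup redundancy, a quick sanity check is to query v$asm_diskgroup from the ASM instance. This is just a sketch; OCRVT is the diskgroup name used later in this post:

```sql
-- Run from the ASM instance (e.g. sqlplus / as sysasm).
-- TYPE is EXTERN, NORMAL or HIGH, which maps to 1, 3 or 5 voting disks.
SELECT name, type
  FROM v$asm_diskgroup
 WHERE name = 'OCRVT';
```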

The Environment

In this scenario, I will use a normal redundancy diskgroup with 3 voting disks. For this I simply installed RAC with a single node using VirtualBox. I also don’t really need a database, so I installed only GI and skipped the Oracle Database software.

In VirtualBox, I created three 2GB disks for the OCR and voting disk diskgroup and marked them as “hot-pluggable” so I could disconnect and reconnect them while the machine was running.
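As a side note, the hot-pluggable flag can also be set from the VBoxManage command line. This is a sketch only; the VM name (RacSingle), controller name (SATA), port number, and image file name are from my setup and will differ in yours:

```shell
# Attach a disk image to the VM and mark it hot-pluggable,
# so it can be detached and reattached while the VM is running.
VBoxManage storageattach "RacSingle" \
  --storagectl "SATA" --port 2 --device 0 \
  --type hdd --medium ocrvt2.vdi \
  --hotpluggable on
```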

I’m using Oracle Linux 6.7 with Oracle 11.2.0.4 and the latest PSU (11.2.0.4.171017). I also configured the diskgroup with a DISK_REPAIR_TIME of 10 minutes. The DISK_REPAIR_TIME attribute controls how Oracle behaves when a disk is lost. Once a disk becomes inaccessible, Oracle buffers the changes that were supposed to be written to it for a specific amount of time (set by DISK_REPAIR_TIME). If the disk becomes available within this window, Oracle simply writes the buffered changes to it and brings it up to date. If the disk is still unavailable after this time has passed, Oracle force-drops it, causing a full rebuild of the disk when it becomes available. In my case, I wanted to check how this affects voting disks and didn’t want to wait 3.6 hours (the default), so I changed it to 10 minutes.
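For reference, DISK_REPAIR_TIME is a diskgroup attribute, so setting it to 10 minutes looks like this (run as SYSASM on the ASM instance; the verification query assumes OCRVT is group number 1):

```sql
-- '10m' means 10 minutes; the default is '3.6h'.
ALTER DISKGROUP ocrvt SET ATTRIBUTE 'disk_repair_time' = '10m';

-- Verify the current value:
SELECT name, value
  FROM v$asm_attribute
 WHERE name = 'disk_repair_time'
   AND group_number = 1;
```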

I started with a working environment where “crsctl query css votedisk” shows:


[oracle@RacSingle ~]$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 54d05cdbfed24fd2bf73c98723c63012 (ORCL:OCRVT1) [OCRVT]
2. ONLINE 3c99e9b0ec144f83bfea775f2f9d3356 (ORCL:OCRVT3) [OCRVT]
3. ONLINE 8f1a57d2343c4f1fbf80d81034dbc7a7 (ORCL:OCRVT2) [OCRVT]
Located 3 voting disk(s).

Disconnecting a Disk

The next step was to disconnect one disk from the group, so I disconnected disk2. After a few seconds the crsctl command showed this:


[oracle@RacSingle ~]$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 54d05cdbfed24fd2bf73c98723c63012 (ORCL:OCRVT1) [OCRVT]
2. ONLINE 3c99e9b0ec144f83bfea775f2f9d3356 (ORCL:OCRVT3) [OCRVT]
3. PENDOFFL 8f1a57d2343c4f1fbf80d81034dbc7a7 (ORCL:OCRVT2) [OCRVT]
Located 3 voting disk(s).

And after another few seconds, it showed this:


[oracle@RacSingle ~]$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 54d05cdbfed24fd2bf73c98723c63012 (ORCL:OCRVT1) [OCRVT]
2. ONLINE 3c99e9b0ec144f83bfea775f2f9d3356 (ORCL:OCRVT3) [OCRVT]
Located 2 voting disk(s).

In the GI alert log I saw this:


[cssd(4528)]CRS-1649:An I/O error occured for voting file: ORCL:OCRVT2; details at (:CSSNM00059:) in /oracle/app/grid/log/racsingle/cssd/ocssd.log

And in the ASM alert log I saw this:


WARNING: Write Failed. group:1 disk:1 AU:1 offset:1044480 size:4096

WARNING: GMON has insufficient disks to maintain consensus. Minimum required is 2: updating 2 PST copies from a total of 3.

NOTE: cache closing disk 1 of grp 1: OCRVT2

Now, this is not what I expected to see. I expected error messages showing beyond any doubt that something bad had happened. Losing a voting disk is not a “warning”, it’s critical. Besides that, I would expect to see the voting disk as OFFLINE as long as we’re within the DISK_REPAIR_TIME window, while in fact it simply disappears from the crsctl output.
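To be fair, ASM itself does expose the offline state even though crsctl hides it. While the repair timer is ticking, a query along these lines against the ASM instance should show the disk as OFFLINE along with the remaining time (again assuming the OCR/voting diskgroup is group number 1):

```sql
-- MODE_STATUS should show OFFLINE during the repair window;
-- REPAIR_TIMER counts down the seconds left before a forced drop.
SELECT name, mode_status, repair_timer
  FROM v$asm_disk
 WHERE group_number = 1;
```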

As expected, after 10 minutes the disk was dropped by ASM with the following messages in the alert log:


WARNING: Disk 1 (OCRVT2) in group 1 will be dropped in: (234) secs on ASM inst 1
Fri Nov 10 00:04:43 2017
WARNING: Disk 1 (OCRVT2) in group 1 will be dropped in: (51) secs on ASM inst 1
Fri Nov 10 00:07:46 2017
WARNING: PST-initiated drop of 1 disk(s) in group 1(.322522870))
SQL> alter diskgroup OCRVT drop disk OCRVT2 force /* ASM SERVER */

Number of Expected Voting Disks

Another thing that I’m missing (and maybe I simply don’t know the way to do it) is asking Oracle for the expected number of voting disks. In this case I can guess that if my system is up and I see 2 disks, there should probably be 3 (if I had 5 voting disks, the system wouldn’t start with only 2 available). But what about a case where I see 3 voting disks? Are these all the voting disks, or are these 3 out of 5? I’m not aware of a way to get the number of voting disks Oracle expects to see. I can, of course, check the configuration of the diskgroup to work out the expected number, but I wish there were something more obvious.
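Checking the diskgroup configuration means something along these lines (a sketch, assuming the OCR/voting diskgroup is group number 1): the redundancy type gives the expected voting disk count, and the failure group count tells you whether the diskgroup can actually hold that many:

```sql
-- EXTERN redundancy expects 1 voting disk, NORMAL 3, HIGH 5.
SELECT name, type FROM v$asm_diskgroup WHERE name = 'OCRVT';

-- Count the failure groups currently in the diskgroup.
SELECT COUNT(DISTINCT failgroup) AS failgroups
  FROM v$asm_disk
 WHERE group_number = 1;
```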

Adding the Disk Back

In any case, after such a disk failure, we should add the disk back to the diskgroup (or another disk in its place). Remember that if you add the same disk, its header still contains ASMLib (if you use ASMLib) and ASM diskgroup information, so this is what you’ll get when adding it to the diskgroup (even though it’s not part of the group any more):


SQL> alter diskgroup ocrvt add disk 'ORCL:OCRVT2';
alter diskgroup ocrvt add disk 'ORCL:OCRVT2'
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15033: disk 'ORCL:OCRVT2' belongs to diskgroup "OCRVT"

In order to add the disk, overwriting the old header information, you should use the “force” option:


SQL> alter diskgroup ocrvt add disk 'ORCL:OCRVT2' force;

Diskgroup altered.

Now the disk is added, and after a while you’ll see the third voting disk as well:


[oracle@RacSingle ~]$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 54d05cdbfed24fd2bf73c98723c63012 (ORCL:OCRVT1) [OCRVT]
2. ONLINE 3c99e9b0ec144f83bfea775f2f9d3356 (ORCL:OCRVT3) [OCRVT]
3. ONLINE 129242c4f8254f30bfff3a4f719769d6 (ORCL:OCRVT2) [OCRVT]
Located 3 voting disk(s).
