Centos5: OCFS2 cluster FS on dual primary DRBD: part 2 – setup dual primary DRBD

This is part 2, where we actually install and configure DRBD devices on top of the LVM logical volumes.

Prerequisites:

  • the same 2 servers we configured in part 1 with LVM over RAID1,
  • internet connectivity (to download DRBD packages and additional RPMs if need be)

Time required: up to 2 hours (depends on many factors)
1. Install DRBD tools and kernel modules

yum -y install drbd83 kmod-drbd83
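Before touching any configuration it's worth a quick sanity check that the userland tools actually landed on the PATH (a trivial sketch; on a real node you'd also `modprobe drbd` and check `lsmod`):

```shell
# Check that the DRBD userland tools were installed by the yum step.
if command -v drbdadm >/dev/null 2>&1; then
    MSG="drbd tools installed"
else
    MSG="drbd tools missing - re-check the yum install step"
fi
echo "$MSG"
```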

2. Now we have the DRBD tools and kernel module installed, along with an initial set of configuration files in /etc/drbd*.
Note: the /etc/drbd.d configuration has to be identical on both nodes for this setup to work. Both nodes also have to be reachable from each other over the network by hostname, so I suggest you just add them to /etc/hosts:

127.0.0.1 localhost
10.0.0.10 node1
10.0.0.11 node2

Test by pinging the short names. Make sure you have the corresponding firewall rules in place to allow communication between the hosts – remember that CentOS runs a restrictive firewall by default (this one cost me a few minutes to remember).
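The replication traffic in this setup will run over TCP port 7789 (the port used in the resource config further down), so with the default CentOS iptables firewall you would need something along these lines on each node (a sketch – adjust the chain name and peer address to your own DRBD segment):

```shell
# Allow DRBD replication from the peer node on TCP 7789.
# 10.0.0.11 is node2's address; on node2 use 10.0.0.10 instead.
iptables -I RH-Firewall-1-INPUT -s 10.0.0.11 -p tcp --dport 7789 -j ACCEPT
service iptables save    # persist the rule across reboots
```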
3. Configuration files:
/etc/drbd.conf

# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example
include "drbd.d/global_common.conf";
include "drbd.d/*.res";

– nothing to edit there.
Now, in /etc/drbd.d there is global_common.conf – settings that are global to the server and apply to all resources:

global {
   usage-count yes;
   # minor-count dialog-refresh disable-ip-verification
}
common {
  protocol C;
  handlers {
     pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; \
     /usr/lib/drbd/notify-emergency-reboot.sh; \
     echo b > /proc/sysrq-trigger ; reboot -f"
;
     pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; \
     /usr/lib/drbd/notify-emergency-reboot.sh; \
     echo b > /proc/sysrq-trigger ; reboot -f"
;
     local-io-error "/usr/lib/drbd/notify-io-error.sh; \
     /usr/lib/drbd/notify-emergency-shutdown.sh; \
     echo o > /proc/sysrq-trigger ; halt -f"
;
     # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
     # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
     # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
     # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
     # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
      }
     startup {
        # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb;
        }
     disk {
        # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
        # no-disk-drain no-md-flushes max-bio-bvecs  
        }
     net {
        # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
        # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
        max-epoch-size 8000;
        # after-sb-0pri
        # after-sb-1pri
        # after-sb-2pri
        # data-integrity-alg no-tcp-cork
        }
     syncer {
        # rate after al-extents use-rle cpu-mask verify-alg csums-alg
        rate 128M;
        }
}

Most of the commented-out parameters are used for detailed performance/behavior tuning of DRBD and are well out of the scope of this article (where uncommented, they are set to defaults). What’s important are these 2 lines:
protocol C; and rate 128M;. The latter depends on the speed of the connection between the nodes and defines the speed of resource synchronization. For my 1Gbps direct switch connection (a segment dedicated to DRBD only) I chose 128MB/s. Basically, it makes sense to do some testing and select your rate based on your network configuration/speed.
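For picking the rate, the DRBD documentation's rule of thumb is roughly 30% of the slowest component in the replication path (network or disk throughput); a quick back-of-the-envelope, assuming a dedicated 1Gbps segment with about 120MB/s of usable throughput:

```shell
# Rule-of-thumb syncer rate: ~30% of available replication bandwidth,
# so the background sync doesn't starve application I/O.
BANDWIDTH_MB=120                      # ~usable MB/s on a 1Gbps link
RATE=$((BANDWIDTH_MB * 30 / 100))
echo "syncer { rate ${RATE}M; }"      # prints: syncer { rate 36M; }
```

On a segment that carries nothing but DRBD traffic you can push this much higher, as the 128M above does.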
And the last configuration file is our resource config. Since /etc/drbd.conf includes drbd.d/*.res, in our case /etc/drbd.d/shared.res will be the config file for our resource.

resource shared {
   startup {
      become-primary-on both;
      }
   meta-disk internal;
   device /dev/drbd1;
   syncer {
      verify-alg sha1;
      }
   net {
      allow-two-primaries;
      after-sb-0pri discard-zero-changes;
      after-sb-1pri consensus;
      after-sb-2pri disconnect;
      }
   on node1 {
      disk /dev/vg0/shared;
      address 10.0.0.10:7789;
      }
   on node2 {
      disk /dev/vg0/shared;
      address 10.0.0.11:7789;
      }
}

This is also pretty self-explanatory, just a few key points. The device name /dev/drbd1 is just a matter of convenience – it might as well be /dev/MyPrecious. IPs and hostnames should match your actual nodes, and the port number (7789 here) is also adjustable for your convenience.
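One gotcha worth checking before going further: drbdadm matches each `on <name>` section against the local hostname, so `uname -n` must print exactly node1 or node2 on the respective machine. A sketch of the check (the node names come from the resource file above):

```shell
# drbdadm resolves "on node1 { ... }" by comparing against the local
# hostname; if neither section matches, the resource won't start.
HOST=$(uname -n | cut -d. -f1)
case "$HOST" in
    node1|node2) echo "hostname '$HOST' matches a resource section" ;;
    *)           echo "WARNING: '$HOST' not listed in the resource config" ;;
esac
```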
4. Let’s start.

node1#
drbdadm create-md shared
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
node1#
drbdadm up shared
node1#
cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16

 1: cs:WFConnection ro:Secondary/Unknown ds:Diskless/Inconsistent C r----
    ns:291712 nr:0 dw:0 dr:297152 al:0 bm:17 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

(don’t mind the small number of sectors – these are screenshots from lab servers). Now on node2:

node2#
 drbdadm create-md shared
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
node2#
drbdadm up shared
node2#
cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16

 1: cs:WFConnection ro:Secondary/Unknown ds:Inconsistent/DUnknown C r----
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:297152

Back to node1

node1#
 drbdadm -- --overwrite-data-of-peer primary shared

node1#
cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16

 1: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r----
    ns:24456 nr:0 dw:0 dr:32640 al:0 bm:1 lo:1 pe:12 ua:256 ap:0 ep:1 wo:b oos:273056
        [>...................] synced:  9.6% (273056/297152)K delay_probe: 1
        finish: 0:00:11 speed: 24,096 (24,096) K/sec
node2#
cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16

 1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r----
    ns:0 nr:85056 dw:85056 dr:0 al:0 bm:5 lo:1 pe:2467 ua:0 ap:0 ep:1 wo:b oos:212096
        [=====>..............] synced: 30.2% (212096/297152)K queue_delay: 15.2 ms
        finish: 0:00:09 speed: 21,264 (21,264) want: 131,072 K/sec

Give it some time to sync – depending on disk size and network speed initial sync could take a while.
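Rather than re-running cat /proc/drbd by hand, you can poll drbdadm dstate, which prints the local and peer disk states joined by a slash. The parsing is shown here on a sample string, since the loop itself needs a live DRBD device:

```shell
# drbdadm dstate shared prints e.g. "UpToDate/Inconsistent";
# the local state is the part before the slash. A wait loop on a
# real node would be:
#   until [ "$(drbdadm dstate shared | cut -d/ -f1)" = "UpToDate" ]; do
#       sleep 10
#   done
sample="UpToDate/Inconsistent"        # example dstate output
local_state=${sample%%/*}             # strip everything after the first /
echo "$local_state"                   # prints: UpToDate
```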
After initial sync is done…

node1#
cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16

 1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
    ns:297152 nr:0 dw:0 dr:297152 al:0 bm:19 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

node2#
  cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16

 1: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:297152 dw:297152 dr:0 al:0 bm:19 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

All done, and you are ready to switch to “dual primary” mode. On node2

node2#
 drbdadm primary shared
node2#
 cat /proc/drbd
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:16

 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r----
    ns:0 nr:297152 dw:297152 dr:0 al:0 bm:19 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

You are done.
Notes:
– At any time during the setup process you can verify the disk state by issuing drbdadm dstate shared, or the connection state with drbdadm cstate shared.
– If for some reason you see a Diskless state on one of the nodes, you can try service drbd restart to reset the resource/driver state.
Since this device is writable from both nodes, it has to be formatted with a special cluster filesystem that allows concurrent read-write access to the disk. OCFS2 is one of these.
We are almost there – only 1 step left to have a real clustered OCFS2 shared resource configured for our nodes.
