CentOS 5: OCFS2 cluster FS on dual-primary DRBD: why isn’t OCFS2 mounted on boot?

I have a couple of these. How to configure them is pretty easy to find (maybe later I’ll go over the setup here too). The problem is that, with everything else running perfectly fine, OCFS2 partitions are never mounted automagically on boot. Just recently I found out why.

In SysV-type init scripts (Red Hat, Debian, a whole bunch of other Linux distros and Solaris use them), each startup script has a specific precedence defined by a number. For example, runlevel 2 (system + networking) on the CentOS box in question is described by the links in /etc/rc2.d, which have names like S25ocfs2.

S means Start, K means Kill, and the number is the position in the execution sequence: the lower the number, the sooner the script is executed when the system switches into this runlevel (on boot, or manually via telinit). K scripts follow the same ordering rule; they are run on entering a runlevel in which the service should be stopped, such as 0 (halt) or 6 (reboot).
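To see the ordering rule in practice, here is a minimal sketch of a helper that prints the S or K links of a runlevel directory in the order the shell glob returns them, which is the order they run in. The function name rc_order is made up for this post, not part of any standard tool.

```shell
# Print the S (start) or K (kill) links of a runlevel directory
# in glob order, i.e. lowest sequence number first.
# Usage: rc_order DIR S|K
rc_order() {
    dir=$1 kind=$2
    for link in "$dir"/"$kind"[0-9][0-9]*; do
        # Skip the literal pattern when nothing matches.
        [ -e "$link" ] || [ -L "$link" ] || continue
        echo "${link##*/}"      # strip the directory part
    done
}
```

For example, `rc_order /etc/rc2.d S | grep -E 'drbd|ocfs2'` should show at a glance which of the two comes first.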

What we are particularly interested in are DRBD, the low-level block devices, and OCFS2, the cluster file system that resides on these devices.

For some reason the precedence is wrong: the system first tries to mount the OCFS2 file systems and only then starts the DRBD network block devices.
How can we fix this?
We could decrease the DRBD sequence number, but we don’t want the system to start DRBD before networking. So it is more practical to increase the OCFS2 sequence number, so that it is started AFTER DRBD.

mv S25ocfs2 S75ocfs2

We should also repeat the procedure for runlevel 3 (system with networking and network filesystems) and for runlevels 4 and 5 (just to be consistent).

cd /etc/rc.d; for i in 2 3 4 5; do cd rc$i.d; mv S25ocfs2 S75ocfs2; cd ..; done
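The one-liner above complains if a link has already been renamed (as in rc2.d after the first mv). A slightly more careful variant could look like the sketch below; rc_rename is a hypothetical helper name, and it only moves a link when the old name exists and the new one is free.

```shell
# Rename a runlevel link only if the old name exists and the new one does not.
# Usage: rc_rename DIR OLD NEW
rc_rename() {
    dir=$1 old=$2 new=$3
    if [ ! -L "$dir/$old" ] && [ ! -e "$dir/$old" ]; then
        echo "skip: $dir/$old not found" >&2
        return 0                # nothing to do, not an error
    fi
    if [ -L "$dir/$new" ] || [ -e "$dir/$new" ]; then
        echo "skip: $dir/$new already exists" >&2
        return 1                # refuse to overwrite an existing link
    fi
    mv "$dir/$old" "$dir/$new"
}
```

With it, the loop becomes `for i in 2 3 4 5; do rc_rename /etc/rc.d/rc$i.d S25ocfs2 S75ocfs2; done`, and it can be re-run safely.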

Now, on boot, the system will start DRBD and then attempt to mount the OCFS2 filesystem located on this network block device.

But wait, there is more. On shutdown the subsystems are stopped in sequence in the same way, so what happens if DRBD is stopped BEFORE OCFS2 is unmounted?
Right, the FS will be left in an inconsistent state, and on the next boot you will most likely run into problems.
Let’s check if this is true.

This is exactly what happens: DRBD is stopped first, and OCFS2 is left inconsistent. This problem has to be corrected the same way: we need to decrease the OCFS2 script sequence number so that it is stopped BEFORE DRBD, not after (there is no point in trying to cleanly unmount an FS when the block device is gone, is there?).

cd /etc/rc.d; for i in 0 1 6; do cd rc$i.d; mv K19ocfs2 K03ocfs2; cd ..; done

Now, there is a chance that OCFS2 will be started properly after DRBD, and stopped before it, not the other way around.
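To double-check the result without rebooting, something like the following sketch can compare the sequence numbers directly. The helper names (rc_seq, starts_after_drbd, stops_before_drbd) are invented for this post, and the code assumes the links are named with a two-digit number followed by exactly drbd or ocfs2.

```shell
# Print the two-digit sequence number of SERVICE's S or K link in DIR.
# Usage: rc_seq DIR S|K SERVICE
rc_seq() {
    dir=$1 kind=$2 svc=$3
    for link in "$dir"/"$kind"[0-9][0-9]"$svc"; do
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=${link##*/}        # e.g. S75ocfs2
        name=${name#"$kind"}    # e.g. 75ocfs2
        echo "${name%"$svc"}"   # e.g. 75
        return 0
    done
    return 1                    # no such link in this directory
}

# Succeeds if ocfs2 starts after drbd in DIR (use for rc2.d .. rc5.d).
starts_after_drbd() {
    o=$(rc_seq "$1" S ocfs2) && d=$(rc_seq "$1" S drbd) &&
        [ "${o#0}" -gt "${d#0}" ]   # strip a leading zero before comparing
}

# Succeeds if ocfs2 is stopped before drbd in DIR (use for rc0.d, rc1.d, rc6.d).
stops_before_drbd() {
    o=$(rc_seq "$1" K ocfs2) && d=$(rc_seq "$1" K drbd) &&
        [ "${o#0}" -lt "${d#0}" ]
}
```

For example, `for i in 2 3 4 5; do starts_after_drbd /etc/rc.d/rc$i.d || echo "rc$i.d: wrong start order"; done`, and the same with stops_before_drbd for 0, 1 and 6.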
