MySQL Master – Master recovery

This is now the second time I have had to fix a broken master-master MySQL (MariaDB) setup on one of my machines, so here is a step-by-step tutorial to fix it. I assume that you already know which machine you still trust to have the “valid” data and that the connection between the two machines was working before.


# MASTER_LOG_POSITION = Read_Master_Log_Pos
# MASTER_LOG_FILE = Master_Log_File


mysql -e "STOP SLAVE;"
mysql -h $SLAVE -e "STOP SLAVE;"
rm -f /tmp/mysqldump.sql 2>/dev/null
mysqldump --quick --all-databases --all-tablespaces --opt --master-data=1 --apply-slave-statements --user=root --lock-all-tables --flush-privileges --flush-logs --events --tz-utc --result-file=/tmp/mysqldump.sql

mysql -h $SLAVE < /tmp/mysqldump.sql # or do this locally on the $SLAVE
mysql -e "UNLOCK TABLES;"
mysql -h $SLAVE -e "UNLOCK TABLES;"

mysql -h $SLAVE -e "reset slave"

mysql -e "reset slave"

mysql -h $SLAVE -e "START SLAVE;"
mysql -e "START SLAVE;"

mysql -h $SLAVE -e "FLUSH LOGS;"
mysql -e "FLUSH LOGS"
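
To check whether replication is running again after these steps, the output of SHOW SLAVE STATUS can be inspected. The following is only a minimal sketch and not part of the procedure above: the function names replication_ok and master_coordinates are made up for illustration, and the sketch relies only on the Slave_IO_Running, Slave_SQL_Running, Master_Log_File and Read_Master_Log_Pos fields of that output.

```shell
#!/bin/sh
# Read the output of `SHOW SLAVE STATUS\G` on stdin and succeed only if
# both replication threads report "Yes".
# (Function names are hypothetical helpers, not part of MySQL/MariaDB.)
replication_ok() {
    awk -F': ' '
        /^ *Slave_IO_Running:/  { io  = $2 }
        /^ *Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Print the two values named in the comments at the top of this post:
# Master_Log_File and Read_Master_Log_Pos, separated by a space.
master_coordinates() {
    awk -F': ' '
        /^ *Master_Log_File:/     { file = $2 }
        /^ *Read_Master_Log_Pos:/ { pos  = $2 }
        END { print file, pos }
    '
}

# Usage on a real host (not executed here):
#   mysql -h "$SLAVE" -e 'SHOW SLAVE STATUS\G' | replication_ok \
#       && echo "replication is running on $SLAVE"
```

Running the same check against both machines should show quickly whether both directions of the master-master setup came back.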


Fixing virtual disk problems

What if the GRUB boot loader of a virtual machine is broken, or you need to repair or resize a filesystem in one of your virtual machines?

In that case you can access the storage of the virtual machine directly from your virtualization server.

  1. Shut down the virtual machine (or virsh destroy $machine) to make sure nothing else is using the file system
  2. Check for the next available loopback device on your virtualization server: losetup -f  (let’s assume /dev/loop0 for the next steps)
  3. Now attach the device (disk) from your virtual machine as loopback device on your server: losetup /dev/loop0 /dev/mapper/$DEVICE
  4. Use kpartx to discover and device-map the partitions: kpartx -av /dev/loop0
  5. Check the partitioning: fdisk -l /dev/loop0
  6. Extend the partitioning, if you like/need (see the man pages for fdisk and resize2fs, if needed)
  7. Run fsck on the filesystem: fsck -vc /dev/mapper/loop0p2
  8. Mount the filesystem into your local system: mount /dev/mapper/loop0p2 /mnt
  9. Do whatever you want there (or chroot into the filesystem after bind-mounting /proc /sys and /dev for example)
  10. Unmount the filesystem from your local system: umount /mnt
  11. Delete the partition mapping for the device: kpartx -dv /dev/loop0
  12. Detach the loop device from your local system: losetup -d /dev/loop0
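
The steps above can be collected in a small script. This is a sketch under assumptions, not a finished tool: the function names inspect_vm_disk and release_vm_disk and the run wrapper are made up here, and the partition number is passed in by hand. By default the commands are only printed, so the sequence can be reviewed before running it as root with RUN=1.

```shell
#!/bin/sh
# Print each command instead of executing it, unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi
}

# inspect_vm_disk DISK LOOPDEV PARTNO   (steps 3, 4, 5, 7 and 8)
inspect_vm_disk() {
    disk=$1     # the VM's disk, e.g. /dev/mapper/$DEVICE
    loop=$2     # free loop device as reported by `losetup -f`
    part=$3     # partition number to check and mount, e.g. 2
    mapped=/dev/mapper/$(basename "$loop")p$part

    run losetup "$loop" "$disk"
    run kpartx -av "$loop"
    run fdisk -l "$loop"
    run fsck -vc "$mapped"
    run mount "$mapped" /mnt
}

# release_vm_disk LOOPDEV   (steps 10, 11 and 12)
release_vm_disk() {
    loop=$1
    run umount /mnt
    run kpartx -dv "$loop"
    run losetup -d "$loop"
}

# Dry run with an example disk name (vm1 is a placeholder):
inspect_vm_disk /dev/mapper/vm1 /dev/loop0 2
release_vm_disk /dev/loop0
```

Keeping the release steps in their own function makes it harder to forget the cleanup after working inside /mnt.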

RackTables: Permissions

Once you have figured out how it works, everything is easy, right? As it took me a while and some googling to find out how the permission system (Main page -> Configuration -> Permissions) in RackTables works, here is my short summary:

  • In general, the RackTables permission engine works “top to bottom”: the first rule that matches wins, and all later rules are simply ignored.
  • You can combine conditions via “and“, “or” and “()” to keep the rule set simpler.
  • Comments can be written directly in the rule text; they start with a hash mark (#).
  • Keep a second browser tab open and browse to the page/tab you want to include or exclude in your permission section. If I go to “Configuration” -> “Tag tree” -> “Edit”, I end up with a URL like this: /index.php?page=tagtree&tab=edit. If I want to block or allow access to this page, the rule would look like

allow {$page_tagtree} and {$tab_edit}
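
The translation from URL to rule is mechanical, so it can be scripted. The helper below is just an illustration of that mapping: rule_for_url is a made-up name, not something RackTables ships, and it only looks at the page and tab parameters of the query string.

```shell
#!/bin/sh
# rule_for_url ACTION URL
# Turn the query string of a RackTables URL into a permission rule.
# (Hypothetical helper; ACTION is "allow" or "deny".)
rule_for_url() {
    action=$1
    query=${2#*\?}            # keep only the part after the "?"
    page=
    tab=
    old_ifs=$IFS
    IFS='&'                   # split the query string at "&"
    for pair in $query; do
        case $pair in
            page=*) page=${pair#page=} ;;
            tab=*)  tab=${pair#tab=} ;;
        esac
    done
    IFS=$old_ifs
    rule="$action {\$page_$page}"
    # Only add the tab condition if the URL names a tab.
    [ -n "$tab" ] && rule="$rule and {\$tab_$tab}"
    echo "$rule"
}

rule_for_url allow '/index.php?page=tagtree&tab=edit'
```

For the example URL this prints the same rule as shown above.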

Below is my current rule set, including comments. It allows the super admins to edit everything (just add your user name as “allow {$username_lars}” directly below the “allow {$userid_1}” entry to gain super user privileges), while all other users can look at all the other pages, but not edit anything.

# the super admins:
allow {$userid_1}
# general rules for all others
# allow to see the “read only” pages per default
allow {$page_reports} or {$tab_default}
# those are always read/write pages: deny
deny {$tab_rackcode} or {$tab_system} or {$page_config} or {$tab_tags}
# permissions per page
# IPv4 space
deny {$page_ipv4space} and ( {$tab_newrange} or {$tab_manage} )
allow {$page_ipv4space}
# IPv6 space
deny {$page_ipv6space} and ( {$tab_newrange} or {$tab_manage} )
allow {$page_ipv6space}
# single Racks
deny {$page_rack} and ( {$tab_tags} or {$tab_design} or {$tab_edit} or {$tab_problems} )
allow {$page_rack}
# whole Rackspace
deny {$page_rackspace} and ( {$tab_editlocations} or {$tab_editrows} )
allow {$page_rackspace}
# VLANs (8021q)
deny {$page_8021q} and ( {$tab_vdlist} or {$tab_vstlist} )
allow {$page_8021q}
# Files
deny {$page_files} and {$tab_manage}
allow {$page_files}
deny {$page_ipv4slb} and ( {$tab_defconfig} or {$tab_new_vs} or {$tab_new_vsg} or {$tab_new_rs} )
allow {$page_ipv4slb}
# Cables
deny {$page_cables} and {$tab_heaps}
allow {$page_cables}
# Objects
deny {$page_depot} and {$tab_addmore}
allow {$page_depot}
# special page (a separate extension to include our monitoring as separate tab)
allow {$page_object} and {$tab_Monitor}

As a result, my team is able to work with RackTables (as they are added as super admins), and everyone else can check what we are doing (as we have LDAP authentication enabled).

After editing, you are forced to verify your changes before you can save. But there is more: please also check Main page -> Reports -> RackCode for errors that are not found by the verify script (as of RackTables version 0.20.10).


Datacenter in a box

Network setup

Running multiple VMs on one bare metal system like a datacenter.

Today I want to tell you something about my “datacenter in a box” setup, which I’m currently running on my root server at Hetzner.

The base system is a bare metal EX root server with 32GB RAM and 2x2TB hard disk running SLES 11 SP3 with KVM support.

On top of that, there are multiple virtual machines up and running that connect to the world via an internal gateway. This gateway runs SuSEfirewall2 (to forward some ports directly to some VMs), xinetd (for port redirection) and haproxy (mainly for WWW pages).

The virtual machines just use a private network – all external traffic is handled by the gateway.

I defined three RAID 1 arrays:

  1. /boot with 1 GB space (/dev/md0)
  2. swap with 16 GB (/dev/md1), which might be reduced next time, but as long as I have some space, who cares 😉
  3. the “rest” (/dev/md2 with ~1.8 TB), set up as LVM volume named “lss” (I’m using the machine name here; using something like “system” might be normal, but you might run into trouble once you mount the disk in another PC)
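
For illustration, such a layout could be created roughly like this. This is a sketch based on guesses, not the commands that were used on this server: the partition names /dev/sda1 to /dev/sdb3 are assumptions, “LVM volume” is taken to mean an LVM volume group, and the commands are only printed unless RUN=1 is set.

```shell
#!/bin/sh
# Print each command instead of executing it, unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "+ $*"; fi
}

# mirror MD_DEVICE PARTITION_A PARTITION_B: build one RAID 1 array
mirror() {
    run mdadm --create "$1" --level=1 --raid-devices=2 "$2" "$3"
}

create_layout() {
    # Partition names are assumed; adapt them to the real disks.
    mirror /dev/md0 /dev/sda1 /dev/sdb1   # /boot, 1 GB
    mirror /dev/md1 /dev/sda2 /dev/sdb2   # swap, 16 GB
    mirror /dev/md2 /dev/sda3 /dev/sdb3   # the rest, ~1.8 TB
    # Assumption: "lss" is the name of a volume group on /dev/md2.
    run pvcreate /dev/md2
    run vgcreate lss /dev/md2
}

create_layout
```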


Piping commandline output through while loop

As I often forget the exact syntax – here is a working example (using osc to search for build results of a specific package in all projects that have this package):

while read line ; do case $line in [a-z]*) echo; echo $line; osc results $line ;; esac; done < <(osc se mosh)

So the following characters (and the whitespace between them) are important at the end of the while loop:

< <()
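
Why bother with this syntax instead of a plain pipe? With a pipe, bash runs the loop in a subshell, so variables set inside the loop are gone afterwards. The small demonstration below needs bash (process substitution is not available in plain sh); list_items is a made-up stand-in for any command that prints one item per line, such as the osc call above.

```shell
#!/usr/bin/env bash
# Stand-in for a command that prints one item per line.
list_items() { printf '%s\n' alpha beta gamma; }

# Pipe: the loop runs in a subshell, so the counter is lost afterwards.
piped=0
list_items | while read -r line; do piped=$((piped + 1)); done

# Process substitution: the loop runs in the current shell,
# so the counter keeps its value after the loop.
kept=0
while read -r line; do kept=$((kept + 1)); done < <(list_items)

echo "pipe: $piped, process substitution: $kept"
```

With bash's default settings only the second counter reaches 3; the first one is still 0 after the loop.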


Trap: failover bonding and miimon on directly connected machines

For HA reasons, I set up a cluster of two nodes that use “bond0” for the external networks and “bond1” for the communication between the two hosts.

As I set up the machines in the standard way, the bonding module options for the bond1 interface were simply “miimon=100”, which monitors the link state, so a broken or removed network cable should bring the interface down and trigger a failover.

Now, as I wanted to test the setup, I did what I always do on such bonding devices to test the failover:

  1. find out the available slaves by cat /sys/class/net/bond1/bonding/slaves
  2. find out the active slave by cat /sys/class/net/bond1/bonding/active_slave
  3. trigger a failover by echo eth0 > /sys/class/net/bond1/bonding/active_slave
  4. test, if ping still works from outside and inside
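
Steps 1 to 3 can be combined so that the failover always goes to a slave that is not currently active. This is a minimal sketch; standby_slave is a made-up helper name, and the sysfs paths in the usage comment are the ones from the list above.

```shell
#!/bin/sh
# standby_slave "SLAVE LIST" ACTIVE
# Print the first slave from the list that is not the active one.
# Fails (returns 1) if there is no other slave to switch to.
standby_slave() {
    slaves=$1    # content of .../bonding/slaves, e.g. "eth0 eth1"
    active=$2    # content of .../bonding/active_slave
    for s in $slaves; do
        if [ "$s" != "$active" ]; then
            echo "$s"
            return 0
        fi
    done
    return 1
}

# Usage on a real host (as root), not executed here:
#   dir=/sys/class/net/bond1/bonding
#   new=$(standby_slave "$(cat $dir/slaves)" "$(cat $dir/active_slave)") &&
#       echo "$new" > $dir/active_slave

standby_slave "eth0 eth1" eth0
```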

But for the direct connection, this did not work. At first I assumed a broken network cable. But standing behind the machines and seeing all interfaces blinking, I got another idea: if you monitor your bonding devices via miimon, the bonding driver has no reason to move to another device as long as the interface reports an active link. And exactly that happened on the other machine, where I had not switched the active interface by hand. The monitored interface on both hosts was still OK, but as the active slave had switched on one host only and the bonding was set up as active-backup, the other node still ran its IP addresses on the other interface, so the network traffic no longer got through.

As a result, I now configure this bonding interface with “mode=802.3ad” (or easier: “mode=4”), meaning “Dynamic link aggregation”. So now I not only have the fail-over I wanted, but also increased bandwidth, allowing twice the number of packets to go over the line. To speed up the boot time, I also set the additional option “primary=eth0” on both hosts, which will bring up those devices as primary slaves on each boot.

…and indeed: the test above now succeeds. 🙂


Running csync2 with different hostnames

The paper for csync2 is a bit unclear to me when it comes to hosts that have different hostnames on two or more interfaces. I usually end up solving the problem in the following way, which luckily works without much trouble.

Let’s assume, you have two hosts with their public names and, which are also connected via a second network and their hostnames stay more or less the same, but the domain changes: and (yes: this works 😉

To avoid confusion and lengthy trial-and-error setups, I use the following approach:

In the /etc/csync2/csync2.cfg file, use the line:
host csync2test1 csync2test2;
…and adapt the xinetd configuration file /etc/xinetd.d/csync2 to use:
server_args = -N csync2test1 -i
server_args = -N csync2test2 -i
according to your hosts.
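
If you prefer to script this change, the server_args line can be rewritten with sed. A minimal sketch, assuming the xinetd file already contains a server_args line; set_csync2_name is a made-up helper name.

```shell
#!/bin/sh
# set_csync2_name FILE NAME
# Rewrite the server_args line of an xinetd service file so that
# csync2 is started with "-N NAME -i". (Hypothetical helper.)
set_csync2_name() {
    file=$1
    name=$2
    tmp=$(mktemp) || return 1
    # Keep everything up to the "=" and replace the arguments after it.
    sed "s|^\([[:space:]]*server_args[[:space:]]*=\).*|\1 -N $name -i|" \
        "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Usage on the first node (as root), not executed here:
#   set_csync2_name /etc/xinetd.d/csync2 csync2test1
```

Writing the result back with cat instead of moving the temporary file keeps the owner and permissions of the original file.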

Now create an alias for your csync2 command line call in ~/.alias (or wherever you like to place such alias definitions – I used the one that is included in SUSE per default here):
alias csync2="csync2 -N csync2test1"
alias csync2="csync2 -N csync2test2"
on the other node.

To be safe (and to prevent someone else from using these ugly hostnames), also reserve the DNS entries for the new hostnames (csync2test1 and csync2test2) in the network you are using to synchronize via csync2.

After restarting your xinetd servers and sourcing the alias file via
source ~/.alias
everything should work as expected when you call for example csync2 -xv (assuming you followed the rest of the paper correctly).
