Tag Archives: network manager

First thoughts and tips on EL 7

Generally one RHEL/CentOS/Scientific Linux (aka EL) release isn’t radically different to another, however the introduction of EL 7 is a bit of a shake up, introducing systemd, which means new init system, new ways of reading logs plus dropping some older utilities you may rely on and introducing new defaults.

I’m going through and upgrading some of my machines currently so I’ve prepared a few tips for anyone familiar with EL 4/5/6 and getting started with the move to EL 7.

 

systemd

The big/scary change introduced by RHEL 7 is systemd – love it or hate it, either way it’s here to stay. The good news is that an existing RHEL admin can keep doing most of their old tricks and existing commands.

Red Hat’s “service” command still works and now hooks into either legacy init scripts or new systemd processes. And rather than forcing everyone to use the new binary logging format, RHEL 7 logs messages to both the traditional syslog plain text files, as well as the new binary log format that you can access via journalctl – so your existing scripts or grep recipes will work as expected.

Rather than write up a whole bunch about systemd, I recommend you check out this blog post by CertDepot which details some of the commands you’ll want to get familiar with. The Fedora wiki is also useful and details stuff like enabling/disabling services at startup time.

I found the transition pretty easy and some of the new tricks like better integration between output logs and init are nice changes that should make Linux easier to work with for new users longer term thanks to better visibility into what’s going on.

 

Packages to Install

The EL minimum install lacks a few packages that I’d consider key, you may also want to install them as part of your base installs:

  • vim-enhanced – No idea why this doesn’t ship as part of minimum install so as a vim user, it’s very, very frustrating not having it.
  • net-tools – this provides the traditional ifconfig/route/netstat family of network tools. Whilst EL has taken the path of trying to force people onto the newer iproute tools there are still times you may want the older tools, such as for running older shell scripts that haven’t been updated yet.
  • bind-utils – Like tools like host or nslookup? You’ll want this package.
  • mailx – Provides the handy mail command for when you’re debugging your outbound mail.

 

Networking

Firstly be aware that your devices might no longer be simple named ethX, as devices are now named based on their type and role. Generally this is an improvement, since the names should line up more with the hardware on big systems for easier identification, and you can still change the device names if you prefer something else.

Changing the hostname will cause some confusion for long time RHEL users, rather than a line in /etc/sysconfig/network, the hostname is now configured in /etc/hostname like other distributions.

The EL 7 minimum installation now includes NetworkManager as standard. Whilst I think NetworkManager is a fantastic application, it doesn’t really have any place on my servers where I tend to have statically configured addresses and sometimes a few static routes or other trickiness like bridges and tunnels.

You can disable network manager (and instead use the static “network” service) by running the following commands:

systemctl stop NetworkManager
systemctl disable NetworkManager
systemctl restart network

Red Hat have documentation on doing static network configuration, although it is unfortunately weak on the IPv6 front.

Most stuff is the same as older versions, but the approach of configuring static routes bit me. On EL 5 you configured a /etc/sysconfig/network-scripts/route-ethX file to define IPv4 and IPv6 routes that should be created when that interface comes up. With EL7 you now need to split the IPv4 and IPv6 routes apart, otherwise you just get a weird error when you bring the interface up.

For example, previously on an EL 5 system I would have had something like:

# cat /etc/sysconfig/network-scripts/route-eth1
10.8.0.0/16 via 10.8.5.2 dev eth1
2001:db8:1::/48 via 2001:db8:5::2 dev eth1
#

Whereas you now need something like this:

# cat /etc/sysconfig/network-scripts/route-eth1
10.8.0.0/16 via 10.8.5.2 dev eth1
#

# cat /etc/sysconfig/network-scripts/route6-eth1
2001:db8:1::/48 via 2001:db8:5::2 dev eth1
#

Hopefully your environment is not creative enough to need static routes around the place, but hey, someone out there might always be as crazy as me.

 

Firewalls

EL 7 introduces FirewallD as the default firewall application – it offers some interesting sounding features for systems that frequently change networks such as mobile users, however I’m personally quite happy and familiar with iptables rulesets for my server systems which don’t ever change networks.

Fortunately the traditional raw iptables approach is still available, Red Hat dragged their existing iptables/ip6tables service scripts over into systemd, so you can still save your firewall rules into /etc/sysconfig/iptables and /etc/sysconfig/iptables respectively.

# Disable firewalld:
systemctl disable firewalld
systemctl stop firewalld

# Install iptables
yum install iptables-service
systemctl enable iptables
systemctl enable ip6tables
systemctl start iptables
systemctl start ip6tables

 

LAMP Stack

  • Apache has been upgraded from 2.2 to 2.4. Generally things are mostly the same, but some modules have been removed which might break some of your configuration if you take a lift+shift approach.
  • MySQL has been replaced by MariaDB (community developed fork) which means the package names and service have changed, however all the mysql command line tools still exist and work fine.
  • PHP has been upgraded to 5.4.16 which a little bit dated already – over the lifespan of EL 7 it’s going to feel very dated very quickly, so I hope Red Hat puts out some php55 or php56 packages in future releases for those whom want to take advantage of the latest features.

 

Other Resources

  1. If you haven’t already, check out Red Hat’s release notes,they detail heaps of new features and additions to the platform.
  2. To learn more about the changes from previous releases, check out Red Hat’s Migration Guide as a starter.
  3. My guide to running EL 7 on EL 5 as a Xen guest for those of you running older Xen hypervisors.

A tale of two route controllers

Ever since I built a Linux 3.2.0 kernel for my Debian Stable laptop to take advantage of some of the newer kernel features, I have been experiencing occasional short periods of disconnect/reconnect on the Wi-Fi network.

This wasn’t happening heaps (maybe a couple times a day), but it was starting to get annoying, so I decided to sort it out properly and do a kernel driver and microcode update for my Intel Centrino Wireless-N 1000 card.

The firmware/microcode update was easy enough, simply a case of downloading the latest code from Intel and installing into /lib/firmware/ – the kernel driver does the rest, finding it and loading it into the Wi-Fi card at boot time.

Next step was building a new kernel for my machine, I went through and tuned the module selection very carefully tossing out all the hardware my laptop will never use, as I was getting sick of wasting lots of disk space on the billion+ device modules in Linux these days.

After finding that my initial kernel lacked support for my video card (turns out the Lenovo X201i laptops still use AGP-based i915 cards, I was assuming PCIe) I got a working kernel up and running.

Except that my Wi-Fi stability problem was worse than ever, instead of losing connectivity every few hours, it was now doing so ever few minutes. :-(

The logs weren’t particularly helpful – NetworkManager likes to give reason numbers but I couldn’t easily find a documented explanation of these (but maybe I’m looking in the wrong place).

19:44:36 NetworkManager[1650]: <info> (wlan0): device state change: 8 -> 9 (reason 5)
19:44:36 NetworkManager[1650]: <warn> Activation (wlan0) failed for access point (b201)
19:44:36 NetworkManager[1650]: <warn> Activation (wlan0) failed.
19:44:36 NetworkManager[1650]: <info> (wlan0): device state change: 9 -> 3 (reason 0)
19:44:36 NetworkManager[1650]: <info> (wlan0): deactivating device (reason: 0).
19:44:36 NetworkManager[1650]: <info> (wlan0): canceled DHCP transaction, DHCP client pid 3354
19:44:36 kernel: [  391.070772] wlan0: deauthenticating from 00:0c:42:67:8b:bc by local choice (reason=3)
19:44:36 kernel: [  391.185461] wlan0: moving STA 00:0c:42:67:8b:bc to state 2
19:44:36 kernel: [  391.185466] wlan0: moving STA 00:0c:42:67:8b:bc to state 1
19:44:36 kernel: [  391.185470] wlan0: moving STA 00:0c:42:67:8b:bc to state 0
19:44:36 wpa_supplicant[1682]: CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
19:44:36 NetworkManager[1650]: <error> [1337240676.376011] [nm-system.c:1229] check_one_route(): (wlan0): \
         error -34 returned from rtnl_route_del(): Netlink Error (errno = Numerical result out of range)
19:44:36 kernel: [  391.233344] cfg80211: Calling CRDA to update world regulatory domain
19:44:36 avahi-daemon[1633]: Withdrawing address record for 192.168.1.11 on wlan0.
19:44:36 avahi-daemon[1633]: Leaving mDNS multicast group on interface wlan0.IPv4 with address 192.168.1.11.
19:44:36 avahi-daemon[1633]: Interface wlan0.IPv4 no longer relevant for mDNS.
19:44:36 avahi-daemon[1633]: Withdrawing address record for 2407:1000:1003:99:226:c7ff:fe66:b822 on wlan0.
19:44:36 avahi-daemon[1633]: Leaving mDNS multicast group on interface wlan0.IPv6 with address 2407:1000:1003:99:226:c7ff:fe66:b822.
19:44:36 NetworkManager[1650]: <info> (wlan0): writing resolv.conf to /sbin/resolvconf
19:44:36 avahi-daemon[1633]: Joining mDNS multicast group on interface wlan0.IPv6 with address fe80::226:c7ff:fe66:b822.
19:44:36 avahi-daemon[1633]: Registering new address record for fe80::226:c7ff:fe66:b822 on wlan0.*.

So I proceeded to debug:

  1. Cursed and wished my 300m spool of Cat6 ethernet wasn’t in Wellington.
  2. Rolled back the microcode update – my initial thought was that the new code was making the card unstable and the result was the card dropping the connection and NetworkManager doing the clean up.
  3. Did a full power down to make sure that the microcode wasn’t remaining active on the card across reboots (had this problem once with a dodgy GPU once).
  4. Verdict: Microcode upgrade was OK, must be something else.
  5. Upgraded NetworkManager from 0.8.1 to 0.8.4 from Debian Backports – 0.8.1 isn’t too recent, was tempted to try 0.9 series but would have required a lot more backporting work.
  6. Verdict: Appears not to be a NetworkManager issue in the 0.8 series – maybe something fixed in 0.9 or later?
  7. Upgraded wpasupplicant from 0.6.10 to 1.0 by manual backport from unstable – the activation error made me consider it might have been a bug with newer kernels & wpasupplicant’s AP negotiation.
  8. Verdict: No change to the issue.
  9. Built a Linux 3.3 kernel with the older less-crashy 3.2 iwlwifi driver to see if it was driver specific, or otherwise-kernel related.
  10. Verdict: Same issue continued to occur, rolling back driver version infact made no change – something about the 3.3 kernel itself was the problem.
  11. Got suspicious about NetworkManager – either it or the kernel had to be at fault, one possibility was some weird API breakage with the age gap between the software versions being used. The kernel is *usually* pretty solid and something like wifi drivers dropping every couple of minutes would be a pretty serious bug to get through, so I looked through the logs to see if I could get anything more useful with NetworkManager’s logs.
  12. Spotted a kernel error “ICMPv6 RA: ndisc_router_discovery() failed to add default route.“. This error tended to occur shortly before any WiFi disconnection occurred, but not immediately so.
  13. Found an entry in Red Hat’s bugzilla.
  14. And then the upstream bug fix from 19th April.

Turns out that the Linux 3.3 kernel and NetworkManager fight over which one is going to control the default route for each router advertised link – the kernel adds one, Network Manager removes and then the kernel gets upset and drops all router advertisements.

In hindsight, I should have spotted it sooner, but I had discarded the RA statement from being related initially as the disconnection often didn’t happen till a minute of two after the log entry occurred – eg:

19:51:40 kernel: [  814.274903] ICMPv6 RA: ndisc_router_discovery() failed to add default route.
19:52:47 NetworkManager[1650]: <info> (wlan0): device state change: 8 -> 9 (reason 5)

What’s interesting about this bug, is that at first reading it explains a loss of IPv6 connectivity perfectly – however it doesn’t explain why IPv4 or the Wi-Fi connection itself was impacted.

The reason this happened, is that NetworkManager was set to have IPv6 as a requirement for that connection to be established – in the event of IPv6 not working, NetworkManager would consider the interface to be down, even if IPv4 was up.

There is a good reason for this, that the developers detailed on their (excellently written) blog, explaining that by having NetworkManager check for IPv6, it allows applications to be written smarter to better understand their level of connectivity.

For users of the NetworkManager 0.9 series, there’s a patch already committed which you can grab here and I would expect the next NetworkManager update will have this fix.

If you’re on the NetworkManager 0.8 series, this patch won’t apply cleanly – I might make some time to go and backport it, but you can workaround it for now by using the Ignore method so that NetworkManager does nothing and leaves it up to the Linux kernel in the background to negotiate IPv6 addressing.

Breaking vs Working Network Manager Settings

Of course if you’re not connecting to any IPv6 capable networks, you don’t have anything to worry about (other than the fact you’re still stuck in the 20th century).

 

Initially I was a bit annoyed at NetworkManager for being so silly as to drop the whole interface when just one of the two networking stacks was broken, however after thinking about it for a bit, it does make some sense as to why it chose that behavior – often most interface issues can be fixed by reconnecting – maybe the AP got rebooted, maybe the laptop just moved between two of them, etc – a reconnect can solve many of these.

But a smarter approach, would be to determine whether network issues are layer 2 or layer 3 – if it’s just a layer 3 issue, then there’s little need to drop the Wi-Fi connection itself, instead attempt to re-establish IPv4 or IPv6 connectivity where appropriate, and if unable to do so, use the notifications to tell the user that “IPv6 connectivity is experiencing a problem, some hosts and services may be unreachable”.

It’s actually something that Windows does semi-OK – it figures out roughly how borked a user’s connection is and then does a balloon popup stating that there’s limited connectivity or IP conflict, or some other sometimes helpful message.

This may be better in newer versions of NetworkManager, I’ll have to have a play with a more recent release and see.