Planet HantsLUG

August 20, 2014

Adam Trickett

August 19, 2014

Adam Trickett

August 18, 2014

Adam Trickett

August 17, 2014

Adam Trickett

August 16, 2014

Adam Trickett

Bog Roll: Using Less

Since we moved into our current house we have continually aimed to reduce our gas and electricity use year on year. We started with a number of behaviour changes relative to the previous owners - which saved 50% on gas an electricity on the first year. Since then we have gradually replaced old/worn out/inefficient things with modern/highest efficiency replacements/alternatives and significantly improved the insulation in the loft space and under the ground floor.

Year kWh Total
Electricity Gas kWh·m-2
2009 5 000 22 000 233
2010 3 000 19 000 188
2011 2 675 13 945 137.4
2012 3 526 9 606 108.5
2013 2 391 9 402 97.5
2014 1 295 3 749 41.7

The 2009 and 2010 years are estimated and 2010 contains both our and the previous owners usage. 2014 contains only 61% of the year, so isn't comparable yet. I could also do with a scaling factor for the weather as some winters are radically different from others and that makes a big difference to the winter gas figure.

Even though the 2009 figure is an esitmate it's still only about two thirds the EPC survey the house came with when we bought it of E/48 - 314 kWh·m-2, and we're already loads better the best estimate of D/55 - 275 kWh·m-2.

August 16, 2014 11:36 AM

Debian Bits

Debian turns 21!

Today is Debian's 21st anniversary. Plenty of cities are celebrating Debian Day. If you are not close to any of those cities, there's still time for you to organize a little celebration!

Debian 21

Happy 21st birthday Debian!

by Ana Guerrero Lopez and Valessio Brito at August 16, 2014 09:45 AM

August 15, 2014

Steve Kemp

A tale of two products

This is a random post inspired by recent purchases. Some things we buy are practical, others are a little arbitrary.

I tend to avoid buying things for the sake of it, and have explicitly started decluttering our house over the past few years. That said sometimes things just seem sufficiently "cool" that they get bought without too much thought.

This entry is about two things.

A couple of years ago my bathroom was ripped apart and refitted. Gone was the old and nasty room, and in its place was a glorious space. There was only one downside to the new bathroom - you turn on the light and the fan comes on too.

When your wife works funny shifts at the hospital you can find that the (quiet) fan sounds very loud in the middle of the night and wakes you up..

So I figured we could buy a couple of LED lights and scatter them around the place - when it is dark the movement sensors turn on the lights.

These things are amazing. We have one sat on a shelf, one velcroed to the bottom of the sink, and one on the floor, just hidden underneath the toilet.

Due to the shiny-white walls of the room they're all you need in the dark.

By contrast my second purchase was a mistake - The Logitech Harmony 650 Universal Remote Control should be great. It clearly has the features I want - Able to power:

  • Our TV.
  • Our Sky-box.
  • OUr DVD player.

The problem is solely due to the horrific software. You program the device via an application/website which works only under Windows.

I had to resort to installing Windows in a virtual machine to make it run:

# Get the Bus/ID for the USB device
bus=$(lsusb |grep -i Harmony | awk '{print $2}' | tr -d 0)
id=$(lsusb |grep -i Harmony | awk '{print $4}' | tr -d 0:)

# pass to kvm
kvm -localtime ..  -usb -device usb-host,hostbus=$bus,hostaddr=$id ..

That allows the device to be passed through to windows, though you'll later have to jump onto the Qemu console to re-add the device as the software disconnects and reconnects it at random times, and the bus changes. Sigh.

I guess I can pretend it works, and has cut down on the number of remotes sat on our table, but .. The overwhelmingly negative setup and configuration process has really soured me on it.

There is a linux application which will take a configuration file and squirt it onto the device, when attached via a USB cable. This software, which I found during research prior to buying it, is useful but not as much as I'd expected. Why? Well the software lets you upload the config file, but to get a config file you must fully complete the setup on Windows. It is impossible to configure/use this device solely using GNU/Linux.

(Apparently there is MacOS software too, I don't use macs. *shrugs*)

In conclusion - Motion-activated LED lights, more useful than expected, but Harmony causes Discord.

August 15, 2014 12:14 PM

Adam Trickett

August 14, 2014

Adam Trickett

August 13, 2014

Adam Trickett

August 12, 2014

Adam Trickett

August 10, 2014

Adam Trickett

August 09, 2014

Martin Wimpress

Installing Willie IRC Bot on Debian

Willie is an IRC bot written in Python that I've recently started using. This blog post describes how to install Willie on Debian and as usual I will be using virtualenv to isolate this Python application from the rest of the system.

Installing Python

First you'll need Python.

sudo apt-get install libpython2.7 python2.7 python2.7-dev python2.7-minimal

The following will also be required to enable all the features Willie supports.

sudo apt-get install enchant python2.7-dev libxslt1-dev libxml2-dev

Remove any apt installed Python packages that we are about to replace. The versions of these packages in the Debian repositories soon get stale.

sudo apt-get purge python-setuptools python-virtualenv python-pip python-profiler

Install pip.

wget https://bootstrap.pypa.io/get-pip.py
sudo python2.7 get-pip.py

Use pip to install virtualenv.

sudo pip install virtualenv --upgrade

The Snakepit

Create a "Snakepit" directory for storing all the Python virtual environments.

mkdir ~/Snakepit

Create a virtualenv for Willie

The following will create a new virtualenv called willie using Python 2.7 as the interpreter.

virtualenv -p /usr/bin/python2.7 ~/Snakepit/willie

Working on a virtualenv

Activate the virtualenv for Willie.

source ~/Snakepit/willie/bin/activate

Your shell prompt will change, something like (willie)user@host:~$, while a virtualenv is being worked on to indicate which virtualenv is currently active.

While working on a virtualenv you can pip install what you need or manually install any Python libraries safe in the knowledge you will not upset any other virtualenvs or the global packages in the process. Very useful for developing a new branch which may have different library requirements than the current stable release.

When you are finished working in a virtualenv you can deactivate it by simply executing deactivate.

Install Willie

I've decided to use Python 2.7 to run Willie and therefore have to install backports.ssl_match_hostname which is not required if you use Python 3.3.

pip install willie backports.ssl_match_hostname

Additional functionality

Willie has no external dependencies, besides Python. However, some of the modules do have external dependencies. So install the following Python modules so that I can make use of everything Willie can do.

pip install feedparser pytz lxml praw pyenchant pygeoip ipython --upgrade

Configure Willie

I am not going to explain to how to configure Willie because all that good stuff is very well documented by the project.

But for reference, my default.cfg looks something like this:

[core]
nick = nicofyourbot
user = nicofyourbot
name = Give You Bot A Name
host = chat.freenode.net
use_ssl = true
verify_ssl = true
port = 6697
owner = nicofthebotowner
channels = #example
nickserv_password = ************
prefix = \.
timeout = 120

[db]
userdb_type = sqlite
userdb_file = /home/username/.willie/willie.db

Willie as a daemon

From this point on I assume you've completed the first run configuration of Willie and have .willie/default.cfg in your home directory.

Add the following to /etc/init.d/willie.

#!/bin/sh
### BEGIN INIT INFO
# Provides: willie
# Required-Start: $local_fs $remote_fs
# Required-Stop: $local_fs $remote_fs
# Should-Start: $network
# Should-Stop: $network
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Willie IRC Bot.
# Description: Start and stops the Willie IRC bot for a given user.
### END INIT INFO

# NOTE! Replace with the user you want to run Willie.
willie_USER="yourusername"

HOMEDIR=$(getent passwd $willie_USER | awk -F: '{print $6}')
DAEMON="$HOMEDIR/Snakepit/willie/bin/willie"
CONFIG="$HOMEDIR/.willie/default.cfg"

startd() {
    if [ -f ${CONFIG} ]; then
        echo "Starting Willie for $willie_USER"
        start-stop-daemon -c $willie_USER -u $willie_USER -x $DAEMON -S -- --config ${CONFIG} --fork --quiet
    else
        echo "Couldn't start Willie for $willie_USER (no $CONFIG found)"
    fi
}

stopd() {
    echo "Stopping Willie for $willie_USER"
    willie_PID=$(pgrep -fu $willie_USER $DAEMON)
    if [ -z "$willie_PID" ]; then
        echo "Willie for USER $willie_USER: not running."
    else
        kill -15 $willie_PID
    fi
}

status() {
    willie_PID=$(pgrep -fu $willie_USER $DAEMON)
    if [ -z "$willie_PID" ]; then
        echo "Willie for USER $willie_USER: not running."
    else
        echo "Willie for USER $willie_USER: running (pid $willie_PID)"
    fi
}

case "$1" in
    start) startd ;;
    stop) stopd ;;
    restart|reload|force-reload) stopd && startd ;;
    status) status ;;
    *) echo "Usage: /etc/init.d/willie {start|stop|reload|force-reload|restart|status}"
       exit 1
       ;;
esac

exit 0

Set the permissions.

sudo chmod +x /etc/init.d/willie

Check that you can start/stop Willie.

sudo /etc/init.d/willie start
sudo /etc/init.d/willie status
sudo /etc/init.d/willie stop

Add willie to the startup/shutdown sequence.

sudo update-rc.d willie defaults

And that's it. Willie is now running as a daemon inside a virtualenv.

by Martin Wimpress at August 09, 2014 10:11 AM

Steve Kemp

Rebooting the CMS

I run a cluster for the Debian Administration website, and the code is starting to show its age. Unfortunately the code is not so modern, and has evolved a lot of baggage.

Given the relatively clean separation between the logical components I'm interested in trying something new. In brief the current codebase allows:

  • Posting of articles, blog-entries, and polls.
  • The manipulation of the same.
  • User-account management.

It crossed my mind the other night that it might make sense to break this code down into a number of mini-servers - a server to handle all article-related things, a server to handle all poll-related things, etc.

If we have a JSON endpoint that will allow:

  • GET /article/32
  • POST /article/ [create]
  • GET /articles/offset/number [get the most recent]

Then we could have a very thin shim/server on top of that whihc would present the public API. Of course the internal HTTP overhead might make this unworkable, but it is an interesting approach to the problem, and would allow the backend storage to be migrated in the future without too much difficulty.

At the moment I've coded up two trivial servers, one for getting user-data (to allow login requests to succeed), and one for getting article data.

There is a tiny presentation server written to use those back-end servers and it seems like an approach that might work. Of course deployment might be a pain..

It is still an experiment rather than a plan, but it could work out: http://github.com/skx/snooze/.

August 09, 2014 08:59 AM

August 08, 2014

Andy Smith

What’s my btrfs doing? And how do I recover from it?

I’ve been experimenting with btrfs on my home file server for a while. Yes, I know it’s not particularly bleeding edge or anything any more but I’m quite conservative even for just my household’s stuff as restoring from backup would be quite tedious.

Briefly, the btrfs volume was initially across four 2TB disks in RAID10 for data and metadata. At a later date I also added a 500G disk but had never rebalanced so that had no data on it.

$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 5 FS bytes used 1.08TB
        devid    1 size 1.82TB used 555.03GB path /dev/sdh
        devid    3 size 1.82TB used 555.03GB path /dev/sdi
        devid    4 size 1.82TB used 555.03GB path /dev/sdj
        devid    5 size 465.76GB used 0.00 path /dev/sdk
        devid    2 size 1.82TB used 555.03GB path /dev/sdg
 
Btrfs v0.20-rc1-358-g194aa4a
$ sudo btrfs filesystem df /srv/tank
Data, RAID10: total=1.08TB, used=1.08TB
System, RAID10: total=64.00MB, used=128.00KB
System: total=4.00MB, used=0.00
Metadata, RAID10: total=2.52GB, used=1.34GB

Yesterday, one of the disks started misbehaving:

Aug  7 12:17:32 specialbrew kernel: [5392685.363089] ata5.00: failed to read SCR 1 (Emask=0x40)
Aug  7 12:17:32 specialbrew kernel: [5392685.369272] ata5.01: failed to read SCR 1 (Emask=0x40)
Aug  7 12:17:32 specialbrew kernel: [5392685.375651] ata5.02: failed to read SCR 1 (Emask=0x40)
Aug  7 12:17:32 specialbrew kernel: [5392685.381796] ata5.03: failed to read SCR 1 (Emask=0x40)
Aug  7 12:17:32 specialbrew kernel: [5392685.388082] ata5.04: failed to read SCR 1 (Emask=0x40)
Aug  7 12:17:32 specialbrew kernel: [5392685.394213] ata5.05: failed to read SCR 1 (Emask=0x40)
Aug  7 12:17:32 specialbrew kernel: [5392685.400213] ata5.15: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug  7 12:17:32 specialbrew kernel: [5392685.406556] ata5.15: irq_stat 0x00060002, PMP DMA CS errata
Aug  7 12:17:32 specialbrew kernel: [5392685.412787] ata5.00: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug  7 12:17:32 specialbrew kernel: [5392685.419286] ata5.00: failed command: WRITE DMA
Aug  7 12:17:32 specialbrew kernel: [5392685.425504] ata5.00: cmd ca/00:08:56:06:a1/00:00:00:00:00/e0 tag 1 dma 4096 out
Aug  7 12:17:32 specialbrew kernel: [5392685.425504]          res 9a/d7:00:00:00:00/00:00:00:10:9a/00 Emask 0x2 (HSM violation)
Aug  7 12:17:32 specialbrew kernel: [5392685.438350] ata5.00: status: { Busy }
Aug  7 12:17:32 specialbrew kernel: [5392685.444592] ata5.00: error: { ICRC UNC IDNF ABRT }
Aug  7 12:17:32 specialbrew kernel: [5392685.451016] ata5.01: exception Emask 0x100 SAct 0x0 SErr 0x0 action 0x6 frozen
Aug  7 12:17:32 specialbrew kernel: [5392685.457334] ata5.01: failed command: WRITE DMA
Aug  7 12:17:32 specialbrew kernel: [5392685.463784] ata5.01: cmd ca/00:18:de:67:9c/00:00:00:00:00/e0 tag 0 dma 12288 out
Aug  7 12:17:32 specialbrew kernel: [5392685.463784]          res 9a/d7:00:00:00:00/00:00:00:00:9a/00 Emask 0x2 (HSM violation)
.
.
(lots more of that)
.
.
Aug  7 12:17:53 specialbrew kernel: [5392706.325072] btrfs: bdev /dev/sdh errs: wr 9, rd 0, flush 0, corrupt 0, gen 0
Aug  7 12:17:53 specialbrew kernel: [5392706.325228] btrfs: bdev /dev/sdh errs: wr 10, rd 0, flush 0, corrupt 0, gen 0
Aug  7 12:17:53 specialbrew kernel: [5392706.339976] sd 4:3:0:0: [sdh] Stopping disk
Aug  7 12:17:53 specialbrew kernel: [5392706.346436] sd 4:3:0:0: [sdh] START_STOP FAILED
Aug  7 12:17:53 specialbrew kernel: [5392706.352944] sd 4:3:0:0: [sdh]  
Aug  7 12:17:53 specialbrew kernel: [5392706.356489] end_request: I/O error, dev sdh, sector 0
Aug  7 12:17:53 specialbrew kernel: [5392706.365413] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Aug  7 12:17:53 specialbrew kernel: [5392706.475838] lost page write due to I/O error on /dev/sdh
Aug  7 12:17:53 specialbrew kernel: [5392706.482266] lost page write due to I/O error on /dev/sdh
Aug  7 12:17:53 specialbrew kernel: [5392706.488496] lost page write due to I/O error on /dev/sdh

After that point, /dev/sdh no longer existed on the system.

Okay, so then I told btrfs to forget about that device:

$ sudo btrfs device delete missing /srv/tank
$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 5 FS bytes used 1.08TB
        devid    3 size 1.82TB used 555.03GB path /dev/sdi
        devid    4 size 1.82TB used 555.03GB path /dev/sdj
        devid    5 size 465.76GB used 0.00 path /dev/sdk
        devid    2 size 1.82TB used 555.03GB path /dev/sdg
        *** Some devices missing
 
Btrfs v0.20-rc1-358-g194aa4a

Apart from the obvious fact that a device was then missing, things seemed happier at this point. I decided to pull the disk and re-insert it to see if it still gave errors (it’s in a hot swap chassis). After plugging the disk back in it pops up as /dev/sdl and rejoins the volume:

$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 5 FS bytes used 1.08TB
        devid    1 size 1.82TB used 555.04GB path /dev/sdl
        devid    3 size 1.82TB used 555.03GB path /dev/sdi
        devid    4 size 1.82TB used 555.03GB path /dev/sdj
        devid    5 size 465.76GB used 0.00 path /dev/sdk
        devid    2 size 1.82TB used 555.03GB path /dev/sdg
 
Btrfs v0.20-rc1-358-g194aa4a

…but the disk is still very unhappy:

Aug  7 17:46:46 specialbrew kernel: [5412439.946138] sd 4:3:0:0: [sdl] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
Aug  7 17:46:46 specialbrew kernel: [5412439.946142] sd 4:3:0:0: [sdl] 4096-byte physical blocks
Aug  7 17:46:46 specialbrew kernel: [5412439.946247] sd 4:3:0:0: [sdl] Write Protect is off
Aug  7 17:46:46 specialbrew kernel: [5412439.946252] sd 4:3:0:0: [sdl] Mode Sense: 00 3a 00 00
Aug  7 17:46:46 specialbrew kernel: [5412439.946294] sd 4:3:0:0: [sdl] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug  7 17:46:46 specialbrew kernel: [5412439.952286]  sdl: unknown partition table
Aug  7 17:46:46 specialbrew kernel: [5412439.990436] sd 4:3:0:0: [sdl] Attached SCSI disk
Aug  7 17:46:47 specialbrew kernel: [5412440.471412] btrfs: device label tank devid 1 transid 504721 /dev/sdl
Aug  7 17:47:17 specialbrew kernel: [5412470.408079] btrfs: bdev /dev/sdl errs: wr 7464, rd 0, flush 332, corrupt 0, gen 0
Aug  7 17:47:17 specialbrew kernel: [5412470.415931] lost page write due to I/O error on /dev/sdl

Okay. So by then I was prepared to accept that this disk was toast and I just wanted it gone. How to achieve this?

Given that data was still being read off this disk okay (confirmed by dd, iostat), I thought maybe the clever thing to do would be to tell btrfs to delete this disk while it was still part of the volume.

According to the documentation this would rebalance data off of the device to the other devices (still plenty of capacity available for two copies of everything even with one disk missing). That way the period of time where there was a risk of double disk failure leading to data loss would be avoided.

$ sudo btrfs device delete /dev/sdl /srv/tank

*twiddle thumbs*

Nope, still going.

Hmm, what is it doing?

$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 5 FS bytes used 1.08TB
        devid    1 size 1.82TB used 555.04GB path /dev/sdl
        devid    3 size 1.82TB used 556.03GB path /dev/sdi
        devid    4 size 1.82TB used 556.03GB path /dev/sdj
        devid    5 size 465.76GB used 26.00GB path /dev/sdk
        devid    2 size 1.82TB used 556.03GB path /dev/sdg

Seems that it’s written 26GB of data to sdk (previously unused), and a little to some of the others. I’ll guess that it’s using sdk to rebalance onto, and doing so at a rate of about 1GB per minute. So in around 555 minutes this should finish and sdl will be removed, and I can eject the disk and later insert a good one?

Well, it’s now quite a few hours later and sdk is now full, but the btrfs device delete still hasn’t finished, and in fact iostat believes that writes are still taking place to all disks in the volume apart from sdl:

$ sudo iostat -x -d 5 sd{g,i,j,k,l}
Linux 3.13-0.bpo.1-amd64 (specialbrew.localnet)         08/08/14        _x86_64_        (2 CPU)
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdg               6.50     0.89    2.49    1.60    54.30   136.42    93.31     0.43  105.19   73.77  154.12   1.63   0.67
sdk               0.00     0.79    0.00    0.89     0.02    97.93   218.89     0.08   91.43    5.69   91.79   5.70   0.51
sdj               2.26     1.10    0.79    1.38    65.45   136.39   185.57     0.19   86.94   46.38  110.20   5.17   1.12
sdi               8.27     1.34    3.39    1.21    88.11   136.39    97.55     0.60  130.79   46.89  365.87   2.72   1.25
sdl               0.24     0.00    0.01    0.00     1.00     0.00   255.37     0.00    1.40    1.40    0.00   1.08   0.00
 
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdg               0.00     0.00    0.00   87.20     0.00  4202.40    96.39     0.64    7.39    0.00    7.39   4.43  38.64
sdk               0.00     0.20    0.00  102.40     0.00  3701.60    72.30     2.40   23.38    0.00   23.38   8.63  88.40
sdj               0.00     0.00    0.00   87.20     0.00  4202.40    96.39     0.98   11.28    0.00   11.28   5.20  45.36
sdi               0.00     0.20    0.00  118.00     0.00  4200.80    71.20     1.21   10.24    0.00   10.24   4.45  52.56
sdl               0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 5 FS bytes used 1.08TB
        devid    1 size 1.82TB used 555.04GB path /dev/sdl
        devid    3 size 1.82TB used 555.29GB path /dev/sdi
        devid    4 size 1.82TB used 555.29GB path /dev/sdj
        devid    5 size 465.76GB used 465.76GB path /dev/sdk
        devid    2 size 1.82TB used 555.29GB path /dev/sdg
 
Btrfs v0.20-rc1-358-g194aa4a

Worse still, btrfs thinks it’s out of space:

$ touch foo
touch: cannot touch `foo': No space left on device
$ df -h .
Filesystem      Size  Used Avail Use% Mounted on
-               7.8T  2.2T  5.2T  30% /srv/tank/backups

So, that’s a bit alarming.

I don’t think that btrfs device delete is ever going to finish. I think what I probably should have done is just forcibly yanked sdl and then done btrfs device delete missing, and put up with the window of possible double disk failure.

But what’s done is done and now I need to recover from this.

Should I ctrl-c the btrfs device delete? If I do that and the machine is still responsive, should I then yank sdl?

I have one spare disk slot into which I could place the new disk when it arrives, without rebooting or interrupting anything. I assume that will then register as sdm and I could add it to the btrfs volume. Would the rebalancing then start using that and complete, thus allowing me to yank sdl?

Some input from anyone who’s actually been through this would be appreciated!

Update 2014-08-12

It’s all okay again now. Here’s a quick summary for those who just want to know what I did:

  • Asked for some advice from Hugo, who knows a lot more about btrfs than me!
  • Found I could not ctrl-c the device delete and had to reboot.
  • Discovered I could mount the volume with -oro,degraded,recovery, i.e. read-only. It couldn’t be mounted read-write at this stage.
  • Took a complete local backup of the 1.08TiB of data via the read-only mount onto one of the new 3TB disks that had arrived on the Friday.
  • Made a bug report against the Linux kernel for the fact that mount -odegraded,recovery would go into deadlock.
  • Compiled the latest mainline kernel from source using the instructions in the Debian Linux Kernel Handbook. After booting into it mount -odegraded,recovery worked and I had a read-write volume again.
  • Compiled a new btrfs-tools.
  • Inserted one of the new 3TB disks and did a btrfs replace start /dev/sdj /dev/sdl /srv/tank in order to replace the smallest 500GB device (sdj) with the new 3TB device (sdl).
  • Once that was complete, did btrfs filesystem resize 5:max /srv/tank in order to let btrfs know to use the entirety of the device with id 5 (sdl, the new 3TB disk).
  • Did a btrfs balance start -v -dconvert=raid1,soft -mconvert=raid1,soft /srv/tank to convert everything from RAID-10 to RAID-1 so as to be more flexible in future with different-sized devices.
  • Finally btrfs device delete missing /srv/tank to return the volume to non-degraded state.
$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 4 FS bytes used 1.09TiB
        devid    2 size 1.82TiB used 372.03GiB path /dev/sdg
        devid    3 size 1.82TiB used 373.00GiB path /dev/sdh
        devid    4 size 1.82TiB used 372.00GiB path /dev/sdi
        devid    5 size 2.73TiB used 1.09TiB path /dev/sdl
 
Btrfs v3.14.2

A more detailed account of the escapade follows, with some chat logs between Hugo and I thrown in to help people’s web searching.

A plan is hatched

<grifferz> according to iostat it's writing quite a lot to all four
           disks, and doing no reading at all
<grifferz> but it is also constantly saying
<grifferz> Aug  8 06:48:28 specialbrew kernel: [5459343.262187] btrfs:
           bdev /dev/sdl errs: wr 122021062, rd 0, flush 74622, corrupt
           0, gen 0
<darkling> OK, reading further, I don't think you'll be able to ^C the
           dev delete.
<darkling> So at this point, it's probably a forcible reboot (as polite
           as you can make it, but...)
<darkling> Take the dead disk out before the OS has a chance to see it.
<grifferz> if I waited and did nothing at all until the new disk
           arrives, if I insert it and add it to the volume do you think
           it will recover?
<darkling> This is then the point at which you run into the other
           problem, which is that you've got a small disk in there with
           4 devices on a RAID-10.
<grifferz> if adding the new disk would allow the dev delete to
           complete, presumably I could then do another dev delete for
           the 500G disk
<darkling> No, dev delete is going to fall over on the corrupt sections
           of the device.
<darkling> I wouldn't recommend using it in this case (unless it's dev
           delete missing)
<grifferz> so you would suggest to reboot, yank sdl, hopefully get up
           and running with a missing device, do dev delete missing,
           insert replacement disk, rebalance?
<darkling> It's kind of a known problem. We probably need a "device
           shoot-in-the-head" for cases where the data can't be
           recovered from a device.
<darkling> Yes.
<darkling> With the small device in the array, it might pay to do the
           dev delete missing *after* adding the new disk.
<grifferz> what problems is the 500G disk going to cause me?
<grifferz> apart from this one that I am having now I suppose :)
<darkling> Well, RAID-10 requires four devices, and will write to all
           four equally.
<darkling> So the array fills up when the smallest device is full.
<darkling> (If you have 4 devices)
<darkling> Have a play with http://carfax.org.uk/btrfs-usage/ to see
           the effects.
<grifferz> is that why it now thinks it is full because I had four 2T
           disks and a 500G one and I tried to delete one of the 2T
           ones?
<darkling> Yes.
<grifferz> ah, it's a shame it couldn't warn me of that, and also a
           shame that if I added a new 2T one (which I can probably do
           today) it won't fix itself
<darkling> I generally recommend using RAID-1 rather than RAID-10 if you
           have unequal-sized disks. It behaves rather better for space
           usage.
<grifferz> I bet I can't convert RAID-10 to RAID-1 can I? :)
<darkling> Of course you can. :)
<darkling> btrfs balance start -dconvert=raid1,soft
           -mconvert=raid1,soft /
<grifferz> oh, that's handy. I saw balance had dconvert and mconvert to
           raid1 but I thought that would only be from no redundancy
<darkling> No, it's free conversion between any RAID level.
<grifferz> nice
<grifferz> well, thanks for that, at least I have some sort of plan now.
           I may be in touch again if reboot ends up with a volume that
           won't mount! :)

Disaster!

In which it doesn’t mount, and then it only mounts read-only.

fuuuuuuuuuuuuuuuuuuuuuu

<grifferz> oh dear, I hit a problem! after boot it won't mount:
<grifferz> # mount /srv/tank
<grifferz> Aug  8 19:05:37 specialbrew kernel: [  426.358894] BTRFS:
           device label tank devid 5 transid 798058 /dev/sdj
<grifferz> Aug  8 19:05:37 specialbrew kernel: [  426.372031] BTRFS
           info (device sdj): disk space caching is enabled
<grifferz> Aug  8 19:05:37 specialbrew kernel: [  426.379825] BTRFS:
           failed to read the system array on sdj
<grifferz> Aug  8 19:05:37 specialbrew kernel: [  426.403095] BTRFS:
           open_ctree failed
<grifferz> mount: wrong fs type, bad option, bad superblock on
           /dev/sdj,
<grifferz> googling around but it seems like quite a generic message
<darkling> Was sdj the device that failed earlier?
<grifferz> no it was sdl (which used to be sdh)
<darkling> OK.
<grifferz> # btrfs fi sh
<grifferz> Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
<grifferz>  Total devices 5 FS bytes used 1.08TB
<grifferz>  devid    5 size 465.76GB used 465.76GB path /dev/sdj
<grifferz>  devid    3 size 1.82TB used 555.29GB path /dev/sdh
<grifferz>  devid    4 size 1.82TB used 555.29GB path /dev/sdi
<grifferz>  devid    2 size 1.82TB used 555.29GB path /dev/sdg
<grifferz>  *** Some devices missing
<grifferz> (now)
<grifferz> perhaps ask it to do it via one of the other disks as sdj
           is now the small one?
<darkling> Yeah.
<darkling> Just what I was going to suggest. :)
<grifferz> even when specifying another disk it still says "failed to
           read the system array on sdj"
<darkling> But, with that error, it's not looking very happy. :(
<darkling> What kernel was this on?
<grifferz> it was on 3.13-0 from debian wheezy backports but since I
           rebooted it booted into 3.14-0.bpo.2-amd64
<grifferz> I can try going back to 3.13-0
<darkling> 3.14's probably better to stay with.
<darkling> Just checking it wasn't something antique.
<grifferz> I could also plug that failing disk back in and remove sdj.
           it probably still has enough life to be read from
<darkling> Well, first, what does btrfs check say about the FS?
<darkling> Also try each drive, with -s1 or -s2
<grifferz> check running on sdj, hasn't immediately aborted…
<darkling> Ooh, OK, that's good.
# btrfs check /dev/sdj
Aug  8 19:13:15 specialbrew kernel: [  884.840987] BTRFS: device label tank devid 2 transid 798058 /dev/sdg
Aug  8 19:13:15 specialbrew kernel: [  885.058896] BTRFS: device label tank devid 4 transid 798058 /dev/sdi
Aug  8 19:13:15 specialbrew kernel: [  885.091042] BTRFS: device label tank devid 3 transid 798058 /dev/sdh
Aug  8 19:13:15 specialbrew kernel: [  885.097790] BTRFS: device label tank devid 5 transid 798058 /dev/sdj
Aug  8 19:13:15 specialbrew kernel: [  885.129491] BTRFS: device label tank devid 2 transid 798058 /dev/sdg
Aug  8 19:13:15 specialbrew kernel: [  885.137456] BTRFS: device label tank devid 4 transid 798058 /dev/sdi
Aug  8 19:13:15 specialbrew kernel: [  885.145731] BTRFS: device label tank devid 3 transid 798058 /dev/sdh
Aug  8 19:13:16 specialbrew kernel: [  885.151907] BTRFS: device label tank devid 5 transid 798058 /dev/sdj
warning, device 1 is missing
warning, device 1 is missing
warning devid 1 not found already
Checking filesystem on /dev/sdj
UUID: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 49947638987 bytes used err is 0
total csum bytes: 1160389912
total tree bytes: 1439944704
total fs tree bytes: 150958080
total extent tree bytes: 55762944
btree space waste bytes: 69500665
file data blocks allocated: 1570420359168
 referenced 1568123219968
Btrfs v0.20-rc1-358-g194aa4a
<grifferz> it doesn't seem to have complained. shall I give mounting
           another try, or fsck again from another disk?
<darkling> Hmm. Odd that it's complaining about the system array, then.
<darkling> That check you just did is read-only, so it won't have
           changed anything.
<grifferz> doing the fsck with another device gives identical output
<grifferz> and no, I still can't mount it
<darkling> Oooh, hang on.
<darkling> Try with -odegraded
<grifferz> # mount -odegraded /srv/tank
<grifferz> Aug  8 19:20:58 specialbrew kernel: [ 1347.388182] BTRFS:
           device label tank devid 5 transid 798058 /dev/sdj
<grifferz> Aug  8 19:20:58 specialbrew kernel: [ 1347.628728] BTRFS
           info (device sdj): allowing degraded mounts
<grifferz> Aug  8 19:20:58 specialbrew kernel: [ 1347.633978] BTRFS
           info (device sdj): disk space caching is enabled
<grifferz> Aug  8 19:20:58 specialbrew kernel: [ 1347.725065] BTRFS:
           bdev (null) errs: wr 122025014, rd 0, flush 293476, corrupt
           0, gen 0
<grifferz> Aug  8 19:20:58 specialbrew kernel: [ 1347.730473] BTRFS:
           bdev /dev/sdg errs: wr 3, rd 8, flush 0, corrupt 0, gen 0
<grifferz> prompt not returned yet
<darkling> OK, that's probably good.
<grifferz> bit worrying it says it has an error on another disk though!
<darkling> Those are cumulative over the lifetime of the FS.
<darkling> Wouldn't worry about it too much.
<grifferz> right okay, some of those happened the other day when the
           whole bus was resetting
<grifferz> prompt still not returned :(
<darkling> Bugger...
<grifferz> yeah iostat's not showing any disk activity although the
           rest of the system still works
<darkling> Anything in syslog?
<grifferz> no that was the extent of the syslog messages, except for a
           hung task warning just now but that is for the mount and
           for btrs-transactiblah
<darkling> How about -oro,recovery,degraded?
<darkling> You'll probably have to reboot first, though.
<grifferz> I can't ctrl-c that mount so should I try that in another
           window or reboot and try it?
<grifferz> probably best to reboot I suppose
<grifferz> I suspect the problem's here though:
<grifferz> Aug  8 19:26:33 specialbrew kernel: [ 1682.538282]
           [<ffffffffa02f1610>] ? open_ctree+0x20a0/0x20a0 [btrfs]
<darkling> Yeah, open_ctree is a horrible giant 1000-line function.
<darkling> Almost every mount problem shows up in there, because
           that's where it's used.
<grifferz> hey that appears to have worked!
<darkling> Cool.
<grifferz> but it doesn't say anything useful in the syslog
<grifferz> so I worry that trying it normally will still fail
<darkling> Now unmount and try the same thing without the ro option.
<darkling> Once that works, you'll have to use -odegraded to mount the
           degraded FS until the new disk arrives,
<darkling> or simply balance to RAID-1 immediately, and then balance
           again when you get the new disk.
<grifferz> that mount command hasn't returned :(
<darkling> That's -odegraded,recovery ?
<grifferz> I think I will put the new disk in and take a copy of all my
           data from the read-only mount
<grifferz> and yes that is correct
<darkling> OK, might be worth doing one or both of upgrading to 3.16
           and reporting to bugzilla.kernel.org
<darkling> You could also take a btrfs-image -c9 -t4 of the filesystem
           (while not mounted), just in case someone (josef) wants to
           look at it.

A bug report was duly filed.

A new kernel, and some success.

It’s been a long time since I bothered to compile a kernel. I remember it as being quite tedious. Happily the procedure is now really quite easy. It basically amounted to:

$ wget -qO - https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.16.tar.xz | xzcat | tar xvf -
$ cd linux-3.16
$ cp /boot/config-3.14-0.bpo.2-amd64 .config
$ make oldconfig
(press Return a lot)
$ make deb-pkg
# dpkg -i ../linux-image-3.16.0_3.16.0-1_amd64.deb

That procedure is documented in the Debian Linux Kernel Handbook.

I wasn’t expecting this to make any difference, but it did! When booted into this kernel I was able to do:

# mount -odegraded,recovery /srv/tank
# umount /srv/tank
# mount -odegraded /srv/tank

and end up with a read-write, working volume.

There were no interesting syslog messages.

Thankfully from this point on the volume was fully read-write, so even though a fair bit of work was still needed I could put it back in service and no further reboots were required.

<grifferz> oh, that's interesting. after a short delay, mounting
           -orecovery,degraded on 3.16 does actually work. it appears!
<darkling> \o/
<grifferz> do I need to unmount it and remount it with just -odegraded 
           now?
<darkling> Yes, that should work.
<grifferz> and then I can put the new disk in, add it to the volume,
           rebalance it, remove the small 500G disk, convert to raid-1?
<darkling> A bit faster to use btrfs dev replace to switch out the
           small device for the new one.
<darkling> Then btrfs dev resize n:max /mountpoint for the n that's
           the new device.
<darkling> Then restripe to RAID-1.
<grifferz> right, great, it's mounted with just -odegraded
<grifferz> so: 1) insert new disk, 2) "dev replace" the 500G one for
           this new device?
<darkling> Yes.
<darkling> That will leave the new device with an FS size of 500G, so
           you need to resize it.
<darkling> (Same idea as resizing the partition but not the ext* FS on
           it)
<darkling> The resize should take a few ms. :)
<grifferz> I don't seem to have a "btrfs device replace" command. do I
           need to build a new btrfs-progs?
<darkling> What --version is it?
<darkling> (Probably build a new one, yes)
<grifferz> Btrfs v0.20-rc1-358-g194aa4a
<darkling> Yeah, that's old enough you're mising some features.
<grifferz> ah, it's not "btrfs device replace" it's just "btrfs
           replace …" I've built v3.14.2 now

So that was:

$ sudo btrfs replace start /dev/sdj /dev/sdl /srv/tank

after carefully confirming that /dev/sdj really was the 500G disk and /dev/sdl really was the new 3TB disk I just inserted (several of the device names change throughout this post as disks are ejected and inserted!).

<darkling> Oh, OK. Seems like a slightly odd place to put it. :(
<darkling> The userspace tools are a bit of a mess, from a UI point of
           view.
<darkling> I'm currently struggling with several problems with btrfs
           sub list, for example.
<grifferz> heh: $ sudo btrfs replace status /srv/tank
<grifferz> 0.4% done, 0 write errs, 0 uncorr. read errs
<darkling> Look on the bright side: it's way faster than two balances.
<grifferz> won't this still leave me with a volume that it thinks has a
           device missing though?
<darkling> Yes, but if you're going to remove the small device, this is
           still probably the fastest approach.
<grifferz> after it's finished with the replace and I've done the
           resize, will a "device delete" of the small one leave it
           satisfied?
<darkling> Once the replace has finished, the small device should no
           longer be a part of the FS at all.
<grifferz> oh yeah
<grifferz> surely it should be happy at that point then, with 4 equal
           sized devices?
<darkling> You might want to run wipefs or a shitload of /dev/zero
           with dd over it, just to make sure. (Bonus points for doing
           it from orbit. ;) )
<darkling> The replace is a byte-for-byte replacement of the device.
<darkling> So if you were degraded before that, you're degraded after
           it.
<grifferz> right but after the replace and resize then?
<darkling> The resize just tells the FS that there's more space it can
           use -- it's a trivial O(1) operation.
<grifferz> what will I need to do to make it happy that there aren't
           any missing devices then?
<darkling> An ordinary balance. (Or a balance with -dconvert=raid1 if
           you want to go that way)
<grifferz> I do ultimately. In which case do you think there is any
           reason to do the balances separately?
<darkling> No reason at all.
<grifferz> righto :)

The replace finishes:

Started on 11.Aug 20:52:05, finished on 11.Aug 22:29:54, 0 write errs, 0 uncorr. read errs

It turns out wipefs wasn’t necessary; I did it with -n anyway just to see if it would find anything, but it didn’t.

Time to do the balance/convert.

<grifferz> $ sudo btrfs balance start -v -dconvert=raid1,soft
           -mconvert=raid1,soft /srv/tank
<grifferz> Dumping filters: flags 0x7, state 0x0, force is off
<grifferz>   DATA (flags 0x300): converting, target=16, soft is on
<grifferz>   METADATA (flags 0x300): converting, target=16, soft is on
<grifferz>   SYSTEM (flags 0x300): converting, target=16, soft is on
<grifferz> fingers crossed :)
<grifferz> I am a bit concerned that syslog is mentioning sdj which is
           no longer part of the volume (it was the smallest disk)
<grifferz> Aug 11 22:45:23 specialbrew kernel: [10551.595830] BTRFS
           info (device sdj): found 18 extents
<grifferz> for example
<grifferz> and btrfs fi sh confirms that sdj is not there any more
<grifferz> well I think it is just confused because iostat does not
           think it's touching sdj any more
<grifferz> hah, balance/convert complete, but:
<grifferz> $ sudo btrfs fi sh
<grifferz> Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
<grifferz>         Total devices 5 FS bytes used 1.09TiB
<grifferz>         devid    2 size 1.82TiB used 372.03GiB path /dev/sdg
<grifferz>         devid    3 size 1.82TiB used 373.00GiB path /dev/sdh
<grifferz>         devid    4 size 1.82TiB used 372.00GiB path /dev/sdi
<grifferz>         devid    5 size 2.73TiB used 1.09TiB path /dev/sdl
<grifferz>         *** Some devices missing
<grifferz> Btrfs v3.14.2
<grifferz> so now half my data is on sdl, the rest is split between
           three, and it still thinks something is missing!
<darkling> btrfs dev del missing /mountpoint
<grifferz> aha!
<darkling> And the way that the allocator works is to keep the amount
           of free space as even as possible -- that maximises the
           usage of the FS.
<grifferz> that was it :)
$ sudo btrfs filesystem show
Label: 'tank'  uuid: 472ee2b3-4dc3-4fc1-80bc-5ba967069ceb
        Total devices 4 FS bytes used 1.09TiB
        devid    2 size 1.82TiB used 372.03GiB path /dev/sdg
        devid    3 size 1.82TiB used 373.00GiB path /dev/sdh
        devid    4 size 1.82TiB used 372.00GiB path /dev/sdi
        devid    5 size 2.73TiB used 1.09TiB path /dev/sdl
 
Btrfs v3.14.2

Phew!

everything went better than expected

by Andy at August 08, 2014 06:29 AM