Andrew's Stuff

LVM + ReiserFS Partition Recovery

Posted at Fri, 4 Oct 2013, 23:13:16

Almost a year ago now, one of the legs of an LVM mirror failed on me. Ordinarily this should not be a problem; however, my particular set-up was a little odd. Specifically, I was passing both physical disks through to a KVM/libvirt virtual machine and applying LVM inside the guest, with the two disks as standard mirrors of each other and a handful of ReiserFS partitions on top. While the disks were healthy, this worked fine. Unfortunately, when one of the disks started failing, the host could see that it was clearly broken (it was reporting I/O errors pretty constantly), but the guest just hung all the time and refused to acknowledge that the disk was actually dead.

As it was a long time ago now, I don't recall all of the details, but I think that I then removed the disks from the VM and attempted to recover things directly on the host. Unfortunately, the host and the guest assigned different UUIDs to the disks, so this was an absolute disaster. After a whole load of fruitless LVM commands I was left with two disks that theoretically contained the same data as each other. One was still working flawlessly, but because it was only one leg of a mirrored pair, LVM refused to let me activate any of the logical volumes on it while the other disk was missing. "Yes, of course it's missing! It's a dead disk! Now let me mount this bloody thing so I can copy the data off it and onto a new (much larger) disk!"

I poked around with it for a bit longer, inevitably only making things worse, getting frustrated and forgetting what I'd done. I ended up with disks that no longer even contained their LVM headers, so I had no idea what the disk structure should even have been!

At that point I kind of gave up. I had better things to spend my time on, so I took a full clone of the live disk (in case that disk also failed) and then just left it.

Until, that is, I replaced my main desktop PC and needed to access the lost data, as it included a whole load of software installers that I needed. Some of them I could re-download, if I could figure out where to re-download them from, but others I wouldn't have that luxury with: software from dubious developers who say things like "once you've purchased this software, your download link will be valid for 30 days, after which time you'll have to re-purchase it if you lose it! ... or you can spend £50 on a CD!"

This leads me to last week, when I decided to have another stab at rescuing the data. In my case I was very lucky, as all of my LVM logical volumes were completely contiguous. The recovery process I describe below will not work for non-contiguous logical volumes! With contiguous volumes, all I needed to do was find the start of each volume/partition, figure out its size, and then read out that much data. With a non-contiguous volume you would have to figure out how and where it's split and where each chunk of it is, which would be near-impossible without a backup of the LVM structure somewhere. In my case that had been lost, but you may be lucky and find a copy of it in /etc/lvm/backup/ or /etc/lvm/archive/, at the top of one of your LVM physical volumes (each physical volume contains the metadata describing the structure of the whole volume group), or, if you use `etcbackup`, in an old copy of the LVM configuration (probably /etc/lvm/backup/).

I initially tried using TestDisk, which ran quite happily (once I'd recompiled it with ReiserFS support...), however it ended up finding dozens upon dozens of copies of each partition. It was even able to get a list of files from each partition, so I thought that I was pretty much finished and safe, but any attempt to actually copy the data out of the partition resulted in it just hanging indefinitely. So much for that.

I then had my first of two significant flashes of inspiration: my problem was that I didn't know where the start of each partition was, but surely the start of a ReiserFS partition must have a common structure? A known header or "magic string"? If I could just find out what that is, I could search through the disk's contents for that string and get myself a list of all of the partition start points!

It turns out that, yes, ReiserFS has a magic string in the "superblock" of data that it stores at the top of each partition. This magic string is, in ASCII, "ReIsEr2Fs" (yes, magic capitalisation, too)!

"Great!" I thought. I proceeded to open up `hexedit` on the disk clone, [TAB] into ASCII mode, hit "/" to open the "Search" function and search for that "ReIsEr2Fs" string.
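Searching in `hexedit` works, but the same search can also be scripted to list every match at once. The following is a minimal sketch, assuming GNU grep and a disk image file; `find_reiserfs_magic` is a hypothetical helper name, not part of any tool. It prints the decimal byte offset of every "ReIsEr2Fs" match, one per line.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of any tool): print the decimal byte offset
# of every "ReIsEr2Fs" magic string in a disk image, one per line.
# Assumes GNU grep:
#   -a  treat binary data as text
#   -b  print the byte offset (with -o, the offset of the match itself)
#   -o  print only the matching part
#   -F  match a fixed string rather than a regex
#   -z  split input on NUL bytes, so long zero-filled stretches of the
#       disk do not turn into enormous "lines"
find_reiserfs_magic() {
    local image="$1"
    LC_ALL=C grep -aboFz 'ReIsEr2Fs' "$image" \
        | tr '\0' '\n' \
        | cut -d: -f1
}

# Example usage (the image name is an assumption):
# find_reiserfs_magic disk-clone.img
```

Because of the duplicate superblocks described below, expect this to return far more matches than there are partitions.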

Now that I knew where my first partition started, I could look around that location (i.e. the rest of the 'superblock') to get all of the other details of the partition. forensicswiki.org has a page detailing the structure of the ReiserFS superblock, so from that page I was able to determine that 52 bytes before the start of the magic string is the start of the superblock itself (the superblock is outlined in yellow, the magic string in blue, and the superblock location in purple):

[Screenshot: the ReiserFS superblock viewed in hexedit]

(Before I go any further, a note about the forensicswiki.org page: it currently cites its source as being http://homes.cerias.purdue.edu/~florian/reiser/reiserfs.php, but this URL was 404ing when I tried accessing it. I ended up finding an archived mirror of that page at http://archive.is/O1dy).

The first four bytes (outlined in red, above) contain the number of blocks in the partition. This is stored in little-endian format, so where `hexedit` reports that the block count in my example is 0x00 00 58 02, the "real" hexadecimal number of blocks is 0x02 58 00 00, which is 39,321,600 in decimal. This is not the size of the partition: it is merely the number of blocks in the partition. To determine the full size you also need to know how big the blocks are.

The block size is stored at bytes 44 and 45 of the superblock (outlined in green, above). In my example above this is 0x00 10. Again, this is little-endian, so it's really 0x10 00, i.e. 4096 bytes. This is actually the default block size, so it's fairly likely that you'll also have 4096-byte blocks, but that isn't guaranteed.

Combining this information, I was able to determine that this partition was 39,321,600 x 4096 = 161,061,273,600 bytes in size, i.e. exactly 150 GB (counting 1 GB as 2^30 bytes).
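The little-endian decoding above is easy to get wrong by hand. Here is a minimal bash sketch of the same calculation, assuming GNU coreutils `od` and the superblock field offsets given above (block count in bytes 0 to 3, block size in bytes 44 and 45, magic string at byte 52). `read_le` and `describe_superblock` are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read an unsigned little-endian integer of LEN bytes
# from FILE at byte OFFSET. Bytes are combined manually, so the result does
# not depend on the byte order of the machine running the script.
read_le() {
    local file="$1" offset="$2" len="$3"
    local value=0 bits=0 byte
    for byte in $(od -An -v -t u1 -j "$offset" -N "$len" "$file"); do
        value=$(( value + (byte << bits) ))   # least significant byte first
        bits=$(( bits + 8 ))
    done
    echo "$value"
}

# Hypothetical helper: given the byte offset of a "ReIsEr2Fs" match, print
# "superblock_offset block_count block_size size_in_bytes".
describe_superblock() {
    local file="$1" magic_offset="$2"
    local sb=$(( magic_offset - 52 ))           # magic starts 52 bytes in
    local blocks bsize
    blocks=$(read_le "$file" "$sb" 4)           # bytes 0-3: block count
    bsize=$(read_le "$file" $(( sb + 44 )) 2)   # bytes 44-45: block size
    echo "$sb $blocks $bsize $(( blocks * bsize ))"
}
```

For the superblock in the screenshot (magic string at byte 262,196), this should print `262144 39321600 4096 161061273600`, matching the hand calculation.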

The rest of the superblock is surprisingly irrelevant for our purposes!

I then made a note of the superblock's location (0x04 00 00 in this example) and the size of the partition, then hit "/" to find the next magic string. It's at this point that things get a bit weird. By rights, the next magic string should be the start of the next partition (assuming, of course, that no files in that partition happen to contain the same magic string, but that isn't what happened here); however, what I found was that each partition had dozens (possibly hundreds) of identical superblock sectors. At first I didn't notice this and ended up creating a list of the locations of each of them. Very quickly I had a list that confidently proclaimed that the first 15 MB of my disk contained more than 4 TB worth of 150 GB partitions. This is clearly complete nonsense.

I still have no idea what caused that weirdness (according to the ReiserFS specs, the superblock should only exist once...), but it would certainly explain the odd results that TestDisk came back with. At this point I went to bed feeling defeated, as there was no way I wanted to search endlessly through 500 GB of data in a hex editor trying to find things!

In the morning I had my second flash of inspiration: I already knew how large the partition was (from the superblock metadata), so I could simply find the superblock and then move forward by almost the size of the partition. I subtracted 512 MB from the partition size to allow for any slight miscalculations; it would later turn out that this probably saved me from running into another problem!

I calculated the location to jump to by taking the current superblock's location (in the image above, this was 0x04 00 00), converting it to decimal (262,144), adding on the size of the partition (150 GB, or 161,061,273,600 bytes, giving me 161,061,535,744), subtracting my 512 MB safety buffer (536,870,912 bytes, leaving 160,524,664,832), then converting the result back into hex: 0x25 60 04 00 00. You can then press [Enter] in `hexedit` and type the resulting hexadecimal address to jump to that location. Note that `hexedit` expects the entered hex address to be big-endian (i.e. "normal"), so it should be entered exactly as your calculator displays it, rather than with the bytes reversed.
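The same arithmetic as a bash sketch, assuming 64-bit shell arithmetic (as in bash on common platforms); `jump_target` is a hypothetical helper name. It prints the address in the big-endian form that `hexedit` expects.

```bash
#!/usr/bin/env bash
# Hypothetical helper: hexedit "goto" address that lands shortly before the
# end of a partition, using the 512 MB (2^29 bytes) safety margin from above.
# Arguments: superblock byte offset, partition size in bytes.
jump_target() {
    local sb_offset="$1" part_bytes="$2"
    local margin=$(( 512 * 1024 * 1024 ))          # 536,870,912 bytes
    printf '%X\n' $(( sb_offset + part_bytes - margin ))
}

# The worked example: superblock at 0x40000, 39,321,600 blocks of 4096 bytes.
jump_target $(( 0x40000 )) $(( 39321600 * 4096 ))   # prints 2560040000
```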

Once I'd moved forward to the approximate end of the partition, I could run another search for the magic string without the interference of the phantom duplicate superblocks. Sure enough, my first new search found a superblock describing a completely differently-sized partition! I then simply repeated this search process for each of the partitions until I'd built up a complete list of partition locations & sizes:

[Screenshot: the list of partition locations and sizes]
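The "skip forward by roughly the partition size" trick can also be expressed as a filter over a list of magic-string matches, rather than the interactive `hexedit` process used here. This is a minimal sketch under stated assumptions: each input line holds `magic_offset block_count block_size` for one match (for example, read from each match's superblock as described above), lines are sorted by ascending offset, and the partitions are contiguous. `filter_partitions` is a hypothetical helper name. Any match that falls before the end of the previous partition minus the 512 MB margin is treated as a phantom duplicate and dropped.

```bash
#!/usr/bin/env bash
# Hypothetical helper: drop "phantom" superblock copies.
# Input (stdin): lines of "magic_offset block_count block_size", ascending.
# Output: "superblock_offset size_in_bytes" for each partition kept.
filter_partitions() {
    local margin=$(( 512 * 1024 * 1024 ))   # same 512 MB safety buffer
    local next_allowed=0 magic blocks bsize sb size
    while read -r magic blocks bsize; do
        # Anything before the approximate end of the previous partition
        # is treated as a duplicate superblock and skipped.
        if (( magic < next_allowed )); then
            continue
        fi
        sb=$(( magic - 52 ))           # superblock starts 52 bytes before the magic
        size=$(( blocks * bsize ))
        echo "$sb $size"
        next_allowed=$(( sb + size - margin ))
    done
}
```

For example, feeding it the 150 GB partition's match (262196), a phantom copy at an illustrative offset of 300000, and the 5 GB partition's match (161061535796) should print `262144 161061273600` and `161061535744 5368709120`, with the phantom discarded.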

Once I had my full list of partitions, I chose the smallest one to attempt recovery on first (as I would then waste less time waiting around when I inevitably screwed up...). This was a 5 GB partition.

The first stage of retrieving the files on this partition is to copy the raw partition data to a temporary location. Unfortunately, this process requires a fair amount of spare hard disk space, but hard disks are pretty cheap these days (~2.6p/GB!), so this shouldn't be too challenging a problem to resolve. The `dd` command will happily do this work for you, but you'll probably want to use a larger-than-default block size (GNU `dd` defaults to 512 bytes) to aid performance.

I chose 4096 bytes (i.e. 4 kB) to match the block size my partitions were using, so I had to divide all of my locations and sizes by 4096. Choosing the same block size for `dd` as ReiserFS used ensured that all of my partition sizes would be exact multiples of the block size, so I wouldn't have to tell `dd` to copy fractions of blocks, which it likely wouldn't enjoy. This meant that the superblock location went from byte 0x25 80 04 00 00 (161,061,535,744) to block 0x02 58 00 40 (39,321,664), and the partition size went from 5,368,709,120 bytes (5 GB) to 1,310,720 blocks. Both `skip` and `count` are measured in blocks of `bs` bytes, so the `dd` command was:

`dd if=/dev/sdd1 of=5gb-disk.img bs=4096 skip=39321664 count=1310720`

I opted to run the `dd` command through `pv` (Pipe Viewer) so that I could get a real-time view of the transfer speed, a progress bar and an ETA, but this was optional:

`dd if=/dev/sdd1 bs=4096 skip=39321664 count=1310720 | pv -s 5G | dd of=5gb-disk.img bs=4096`
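Converting byte offsets and sizes into `dd` blocks, and checking that nothing is left over, is another easily scripted step. A minimal sketch, with `to_blocks` as a hypothetical helper name; it refuses any value that is not an exact multiple of the block size, which is exactly the situation choosing `bs=4096` was meant to avoid.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a byte count or offset into dd blocks.
# Fails (exit status 1) if the value is not an exact multiple of the block
# size, since by default dd's skip= and count= are measured in whole
# bs-sized blocks.
to_blocks() {
    local bytes="$1" bs="${2:-4096}"   # default: the 4096-byte ReiserFS block size
    if (( bytes % bs != 0 )); then
        echo "error: $bytes is not a multiple of $bs" >&2
        return 1
    fi
    echo $(( bytes / bs ))
}

to_blocks 161061535744   # superblock location -> 39321664
to_blocks 5368709120     # partition size      -> 1310720
```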

Once `dd` had finished I attempted to mount the extracted partition image via a loopback device:

`mkdir 5gb-mount`
`mount -t reiserfs -o loop 5gb-disk.img 5gb-mount`

Unfortunately this failed. Checking `dmesg | tail` revealed a complaint that `mount` couldn't find a ReiserFS superblock at the start of the image. I figured that I was probably supposed to have copied some extra data before the superblock as well, but I didn't fancy trawling through more documentation and web search results. Instead I just created a temporary 50 MB file, created a new ReiserFS filesystem on it, and used `hexedit` to determine the location of its superblock (using the same method detailed above):

`dd if=/dev/zero of=dummy-partition bs=1k count=50k`
`mkfs.reiserfs dummy-partition`
`hexedit dummy-partition`

This revealed that the superblock is actually stored 65,536 (0x1 00 00) bytes after the start of the partition, so I simply subtracted that offset from the locations of each of the superblocks that I had noted down (this was easy, as each location ended in 0x4 00 00, so I simply changed the '4' to a '3'!) and re-ran the `dd` command with the adjusted start. For the 5 GB partition, 65,536 bytes is 16 blocks of 4096 bytes, so `skip=39321664` became `skip=39321648`.
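With the 64 KiB offset known, each extraction can be worked out and checked before mounting. The sketch below is a hedged illustration, assuming the ReiserFS 3.6 layout found above (magic "ReIsEr2Fs" 52 bytes into a superblock that starts 65,536 bytes into the partition) and a `dd` block size equal to the filesystem block size. `dd_command_for` and `has_reiserfs_superblock` are hypothetical helper names. The first prints a `dd` command rather than running it, so it can be reviewed first; the second checks an extracted image for the magic string where `mount` will look for it.

```bash
#!/usr/bin/env bash
SB_OFFSET_IN_PARTITION=65536   # superblock position within a partition (assumed 3.6 layout)
MAGIC_OFFSET_IN_SB=52          # "ReIsEr2Fs" position within the superblock

# Hypothetical helper: print (not run) a dd command that extracts a partition.
# Arguments: device/image, superblock byte offset on it, block count from the
# superblock, block size from the superblock, output file.
dd_command_for() {
    local device="$1" sb_offset="$2" blocks="$3" bs="$4" out="$5"
    local start=$(( sb_offset - SB_OFFSET_IN_PARTITION ))
    if (( start < 0 || start % bs != 0 )); then
        echo "error: partition start $start is negative or not a multiple of $bs" >&2
        return 1
    fi
    echo "dd if=$device of=$out bs=$bs skip=$(( start / bs )) count=$blocks"
}

# Hypothetical helper: succeed if IMAGE has the ReiserFS 3.6 magic string at
# the expected offset. NUL bytes are stripped before comparing; a 9-byte read
# containing NULs can then never equal the 9-character magic string.
has_reiserfs_superblock() {
    local image="$1"
    local at=$(( SB_OFFSET_IN_PARTITION + MAGIC_OFFSET_IN_SB ))
    [ "$(dd if="$image" bs=1 skip="$at" count=9 2>/dev/null | tr -d '\000')" = "ReIsEr2Fs" ]
}

dd_command_for /dev/sdd1 161061535744 1310720 4096 5gb-disk.img
# dd if=/dev/sdd1 of=5gb-disk.img bs=4096 skip=39321648 count=1310720
```

Running `has_reiserfs_superblock 5gb-disk.img` on the first, superblock-aligned extraction would have failed, which is a cheaper signal than a failed `mount`.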

This time the `mount` command was successful so I was able to run a simple `cp -r` command to copy all of the files from the recovered partition onto a new disk, before `umount`'ing the partition and deleting the temporary file.

After repeating that process with each of the partitions I had a copy of all of my data, but in partitions and on disks with a known structure that could be properly mounted! (Also not residing on a half-failing disk...)
