Rescuing a corrupt tarfile

Having upgraded OS recently, I was using a poor quality sneakernet of free USB sticks to transfer some data from my previous installation. This dodgy process strangely enough managed to result in some data corruption of my .tar.bz2 file, leaving me in the position of having to go to other backups to recover my data. :-(

$ tar -xkjvf corrupt_archive.tar.bz2
....
jcarr/Pictures/fluffy_cats.jpg
jcarr/Documents/favourite_java_exceptions.txt

bzip2: Data integrity error when decompressing.
    Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

This is the first time I’ve ever experienced a corruption like this with .tar.bz2. The file was the expected size, so it wasn’t a case of a truncated file, the data was there but something part way through the file was corrupted and causing bzip2 to fail with decompression.

Bzip2 comes with a recovery utility, which works by rescuing each block into an individual file. We then run -t over them to identify any blocks which are clearly corrupt, and delete them accordingly.

$ bzip2recover corrupt_archive.tar.bz2
$ bzip2 -t rec*.tar.bz2

Then we can put the blocks back together in an uncompressed form of the original file (in this case tar);

$ bzip2 -dc rec*.tar.bz2 > recovered_data.tar

Finally we want to extract the actual tar file itself to get the data. However, tar might not be too happy about having lost some blocks inside it, or having other forms of corruption.

# tar -xvf recovered_data.tar
...
jcarr/Pictures/fluffy_cats.jpg
jcarr/Documents/favourite_java_exceptions.txt
tar: Skipping to next header
tar: Archive contains ‘\223%\322TGG!XہI.’ where numeric off_t value expected
tar: Exiting with failure status due to previous errors

I couldn’t figure out a way to get tar to skip over, or repair the file, however I did find a few posts online suggesting the use of the much older cpio utility that still exists on most unixes today.

$ cpio -ivd -H tar < recovered_data.tar

This worked perfectly! cpio complained about some files it couldn’t recover, but it recovered the vast majority of the damaged contents. Of course I can’t trust any files completely that I’ve restored, always possible there is some small corruption after such a restore, however if you lack backups, or your backups themselves are corrupted, this could be the way to go to get back some of your precious data.

In this case I was lucky that the header of the file was still intact  – if bzip2 or tar can’t read the file header to identify it as a tar.bz2 to being with, other measures may need to be taken. There’s heaps of suggestions online, just make a copy of the corrupted file first then try the different suggested methods till you find an approach that (hopefully) works for you.

This entry was posted in Uncategorized and tagged , , , , , , , , . Bookmark the permalink.

1 Response to Rescuing a corrupt tarfile

  1. cancian says:

    awesome, saved my day

Leave a Reply