FileSystem problems on 2.6.18?

Posted: Sun Nov 12, 2006 11:08 pm
by Raf256

To sum it up:
On 2.6.18 + grsecurity, the following steps damage a file system that is in active use:
1. Use encrypted swap, for example via this /etc/fstab entry:

Code:
/dev/hda2   none   swap   sw,pri=5,loop=/dev/loop7,encryption=AES128,phash=random   0   0


2. create a new file system on a partition (confirmed on JFS, reiserfs, ext3). The file system itself is NOT encrypted.

3. keep writing data to that file system (in both cases I wrote about 1 GB of data, but perhaps much less is enough)

4. do swapoff -a && swapon -a

5. mkdir a directory on the filesystem

6. this results in an error in dmesg about a corrupted filesystem, and mkdir fails with a read-only filesystem error

7. unmount the filesystem and try to mount it again - it will not work at all (corrupted superblocks etc). The swap partition might also be damaged

Other partitions on that box (on the same disc and on another disc) appear unharmed,
but they were either created long before and also have the noatime option, or were not written to. Perhaps partitions with noatime somehow avoid corrupting the superblock (but still carry silent corruption), or they were just lucky not to be corrupted.

Could someone please test whether this is reproducible?

I think the kernel swap-off code wrongly handles flushing of filesystem data from encrypted swap, for example by forgetting to decrypt it before the flush.
When I ran hexedit or hexdump on the partition holding the damaged FS, the beginning of it looked like random data (or... AES encrypted).
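
The "random data vs. zeros" question can be checked a bit more objectively than by eyeballing a hexdump. Below is a minimal sketch, assuming Python is available on the box and that the first few KiB of the damaged partition have been read (as root) into a bytes object. The helper names and the 7.5 bits/byte threshold are illustrative assumptions, not part of any existing tool.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def classify_block(data: bytes, threshold: float = 7.5) -> str:
    """Roughly classify a block as empty, zeroed, random-looking or structured.

    The threshold is an assumed cut-off: encrypted or random data is
    close to 8 bits/byte, typical filesystem metadata is far below that.
    """
    if not data:
        return "empty"
    if not any(data):
        return "zeroed"
    if shannon_entropy(data) >= threshold:
        return "random-looking"
    return "structured"


def classify_device_start(path: str, size: int = 4096) -> str:
    """Classify the first `size` bytes of a device or image file (read-only)."""
    with open(path, "rb") as dev:
        return classify_block(dev.read(size))
```

For example, `classify_device_start("/dev/hdc8")` would only read the partition, never write to it. A "random-looking" result is consistent with encrypted data but does not prove it, since compressed data has similarly high entropy.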


So far I reproduced it on:
a) 2.6.18 (.2 AFAIR, or maybe .1) + grsecurity, Debian Stable, Athlon with 512 MB RAM
b) 2.6.18 + grsecurity, Debian Testing, AMD64 in 32-bit mode (but using a k8 kernel)


The full history:

Hi,
I experienced filesystem problems recently.
I thought it was a hardware problem, but it reoccurs on different hardware.

A JFS filesystem failed suddenly, shortly after I created it, on 2.6.18.x (.1 probably) + grsecurity. It displayed "corrupted" errors in dmesg, and after unmounting and trying to re-mount it turned out that it was totally destroyed (no superblocks, no nothing).

On the other box, with ReiserFS, the same problem. I am trying to rebuild it but it's hard (no superblock, no journal data, no tree - all was destroyed).

It's on totally different hardware (other boxes, other HDDs, other ATA cables). One of the boxes is protected by a UPS from power spikes etc.
Both HDDs are brand new (from different stores) - 80 and 250 GB ATA.

Both boxes have other hard drives that have worked fine for over a year, and for a long time on the 2.6.18 kernel.

It is hard to find a common cause, but I don't think it's a coincidence.
Old file systems work fine on both systems, and new ones fail shortly after creation.

Btw, the first problem happened not long after I ran swapoff -a then swapon -a. The same happened on the second box. On both I encrypt swap.

So, perhaps there is a bug somewhere related to one of the following:
- turning on/off encrypted swap
- growing a newly-created file system tree for JFS and/or ReiserFS (3.6)

If not, then two brand new HDs from different stores both have a serious hardware malfunction (corruption of filesystem superblocks) that was not detected by long badblocks runs and other low-level tests, with the failure occurring after just 2 days of use, on 2 out of the 2 discs I tried within one week.


== EDIT ==

I can confirm: the error reoccurs exactly after doing swapoff -a && swapon -a
Now on ext3.

Here I am using a 2.6.18 kernel with the grsecurity patch.

Steps to reproduce the error:
0. 2.6.18 kernel on amd64 (also on athlon), using encrypted swap partitions
1. create a new filesystem on a partition
2. write heavily to it
3. after finishing the writes do swapoff -a && swapon -a
4. try to mkdir on that file system - you will receive an error

With ext3 I got the following dmesg output:


Adding 248992k swap on /dev/loop7. Priority:5 extents:1 across:248992k
Adding 506032k swap on /dev/loop6. Priority:3 extents:1 across:506032k
Adding 48829524k swap on /dev/loop5. Priority:4 extents:1 across:48829524k
attempt to access beyond end of device
hdc8: rw=0, want=2999390520, limit=97659072
EXT3-fs error (device hdc8): read_inode_bitmap: Cannot read inode bitmap - block_group = 121, inode_bitmap = 374923814
Aborting journal on device hdc8.
EXT3-fs error (device hdc8) in ext3_new_inode: IO failure
EXT3-fs error (device hdc8) in ext3_mkdir: IO failure
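
The numbers in this log can be cross-checked with a little arithmetic. The following is a sketch under two assumptions that the log does not state: that want= is the end of the request in 512-byte sectors, and that the ext3 block size is 4096 bytes. Under those assumptions the out-of-range sector corresponds exactly to the reported inode_bitmap block, which suggests the group descriptor itself contains garbage rather than the read path miscalculating.

```python
import re

SECTOR_SIZE = 512          # assumption: want=/limit= are 512-byte sectors
ASSUMED_BLOCK_SIZE = 4096  # assumption: ext3 block size is not shown in the log


def parse_beyond_end(line: str) -> tuple[int, int]:
    """Extract (want, limit) from an 'attempt to access beyond end of device' detail line."""
    match = re.search(r"want=(\d+), limit=(\d+)", line)
    if match is None:
        raise ValueError("no want=/limit= pair found")
    return int(match.group(1)), int(match.group(2))


def block_end_sector(block: int, block_size: int = ASSUMED_BLOCK_SIZE) -> int:
    """Sector number just past the end of the given filesystem block."""
    return (block + 1) * (block_size // SECTOR_SIZE)


want, limit = parse_beyond_end("hdc8: rw=0, want=2999390520, limit=97659072")
inode_bitmap_block = 374923814  # from the read_inode_bitmap error line

# The failed read ends exactly where the bogus bitmap block would end.
matches_bitmap = block_end_sector(inode_bitmap_block) == want

# Highest block that actually fits on hdc8, and the partition size in KiB.
last_valid_block = limit // (ASSUMED_BLOCK_SIZE // SECTOR_SIZE) - 1
partition_kib = limit * SECTOR_SIZE // 1024

# Swap area reported for /dev/loop5 in the same log, in KiB.
loop5_swap_kib = 48829524
size_gap_kib = partition_kib - loop5_swap_kib

print(matches_bitmap, last_valid_block, partition_kib, size_gap_kib)
```

Under these assumptions the last valid block on hdc8 is 12207383, far below the reported 374923814. The partition size also works out to 48829536 KiB, only 12 KiB more than the 48829524k swap area added on /dev/loop5. That may be a coincidence, but it could be worth checking which device /dev/loop5 is actually attached to after swapon -a.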


After unmounting and remounting, the filesystem will probably be totally shredded.

Re: FileSystem problems on 2.6.18?

Posted: Mon Nov 13, 2006 8:46 am
by PaX Team
Raf256 wrote:A JFS filesystem failed suddenly, shortly after I created it, on 2.6.18.x (.1 probably) + grsecurity. It displayed "corrupted" errors in dmesg, and after unmounting and trying to re-mount it turned out that it was totally destroyed (no superblocks, no nothing).
can you determine the nature of the corruption? such as, 'file system metadata overwritten with garbage' or 'zeroed out', etc. also, you should post your grsec .config. the only thing i can think of that could possibly be causing this is SANITIZE and if that's what does this then you've likely found a 'use-after-free' kind of bug in the filesystem code (in which case you should turn off SANITIZE and enable DEBUG_PAGEALLOC instead and hope you'll get an oops).
Raf256 wrote:On the other box, with ReiserFS, the same problem. I am trying to rebuild it but it's hard (no superblock, no journal data, no tree - all was destroyed).
again, what does destroyed mean exactly? 'garbage' or 0s?
Raf256 wrote:Both boxes have other hard drives that have worked fine for over a year, and for a long time on the 2.6.18 kernel.
did you also use grsec with 2.6.18? if so, with the same .config (in particular SANITIZE, if you ever used it)?
Raf256 wrote:Steps to reproduce the error:
0. 2.6.18 kernel on amd64 (also on athlon), using encrypted swap partitions
1. create a new filesystem on a partition
2. write heavily to it
3. after finishing the writes do swapoff -a && swapon -a
4. try to mkdir on that file system - you will receive an error
what if you don't encrypt your swap (i.e., leave loop out of the picture)?

Posted: Mon Nov 13, 2006 2:12 pm
by Raf256
It was filled with random garbage.

--EDITED--
On both kernels I used the same config, including:

# CONFIG_PAX_MEMORY_SANITIZE is not set
# CONFIG_PAX_MEMORY_UDEREF is not set


I will analyze it further when I have free time.