I'm running IPsec in tunnel mode on a 3.10.1 box (Gentoo Hardened, amd64/SMP). When the constify plugin is enabled, the kernel crashes under heavy load.
It's pretty easy to spot that the crash is caused by a null pointer dereference in the function xfrm4_garbage_collect: the garbage_collect field of the static structure xfrm4_policy_afinfo seems to be null.
What's interesting is that the garbage_collect field is initialized in the function xfrm_policy_register_afinfo. I verified this by adding some printks. After xfrm_policy_register_afinfo is called, the structure has the following values:
[ 0.358427] xfrm_policy_register_afinfo: Struct xfrm_policy_afinfo: family=2, dst_ops=8183c4c0, garbage_collect=81416ee0, dst_lookup=81415090, get_saddr=81415020, decode_session=814152e0, get_tos=81414df0, init_dst=0, init_path=81414e10, fill_dst=81414e20, backhole_router=813bfec0
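The debugging code itself is not shown in the post. As a rough userspace sketch of this kind of dump, assuming a reduced, hypothetical stand-in for struct xfrm_policy_afinfo (the names afinfo_sketch and afinfo_dump are made up for illustration, and in the kernel the output would go through printk rather than snprintf):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, reduced stand-in for struct xfrm_policy_afinfo:
 * only the family and two of the function-pointer fields. */
struct afinfo_sketch {
    unsigned short family;
    void (*garbage_collect)(void);
    int (*dst_lookup)(int);
};

/* Format the fields of *a into buf, tagged with the caller's name.
 * Function pointers are printed as hex integers; a null pointer shows as 0.
 * Returns the value of snprintf(). */
static int afinfo_dump(char *buf, size_t len, const char *where,
                       const struct afinfo_sketch *a)
{
    return snprintf(buf, len,
                    "%s: family=%u, garbage_collect=%lx, dst_lookup=%lx",
                    where, (unsigned)a->family,
                    (unsigned long)(uintptr_t)a->garbage_collect,
                    (unsigned long)(uintptr_t)a->dst_lookup);
}
```

Calling the same helper from every function that touches the structure makes the lines directly comparable, which is how a single field that changes between call sites stands out.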
Even after initialization has completed, the structure retains all of its field values on each xfrm_alloc_dst call:
[ 120.876704] xfrm_alloc_dst: Struct xfrm_policy_afinfo: family=2, dst_ops=8183c4c0, garbage_collect=81416ee0, dst_lookup=81415090, get_saddr=81415020, decode_session=814152e0, get_tos=81414df0, init_dst=0, init_path=81414e10, fill_dst=81414e20, backhole_router=813bfec0
Unfortunately, by the time xfrm4_garbage_collect runs, the garbage_collect field seems to be zero:
[ 176.176290] xfrm4_garbage_collect: Struct xfrm_policy_afinfo: family=2, dst_ops=8183c4c0, garbage_collect=00000000, dst_lookup=81415090, get_saddr=81415020, decode_session=814152e0, get_tos=81414df0, init_dst=0, init_path=81414e10, fill_dst=81414e20, backhole_router=813bfec0
What's even more interesting: when I add a condition that skips the call to garbage_collect if it's null, which prevents the kernel panic, the garbage_collect value seen in xfrm_alloc_dst is still intact:
[ 301.313524] xfrm_alloc_dst: Struct xfrm_policy_afinfo: family=2, dst_ops=8183c4c0, garbage_collect=81416ee0, dst_lookup=81415090, get_saddr=81415020, decode_session=814152e0, get_tos=81414df0, init_dst=0, init_path=81414e10, fill_dst=81414e20, backhole_router=813bfec0
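The bypass condition mentioned above is not shown in the post. A minimal userspace sketch of such a guard, assuming a hypothetical one-field structure (gc_ops_sketch, demo_gc and guarded_garbage_collect are illustrative names, not kernel symbols):

```c
#include <stddef.h>

/* Hypothetical stand-in holding only the hook in question. */
struct gc_ops_sketch {
    void (*garbage_collect)(void);
};

/* Counts how many times the demo hook has run. */
static int gc_runs;

/* Demo hook used in place of the real garbage collector. */
static void demo_gc(void)
{
    gc_runs++;
}

/* Call the hook only if it is set.
 * Returns 1 if the hook was called, 0 if it was null and skipped. */
static int guarded_garbage_collect(const struct gc_ops_sketch *ops)
{
    /* Read the pointer once so the null check and the call
     * are guaranteed to use the same value. */
    void (*gc)(void) = ops->garbage_collect;

    if (gc == NULL)
        return 0;
    gc();
    return 1;
}
```

A guard like this only hides the symptom: it avoids the panic but also silently skips garbage collection whenever the field reads as null.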
Why is this happening? Is this some sort of CPU cache issue?