Wednesday, 10 December 2014

[solved] Symantec Backup Exec on linux 3.x kernel [fixed]

I just want to publish some findings about a problem I encountered at work. There is a simple fix for it, which I could not find anywhere else. I did not have any other place to publish it, so I hope Google finds it for the right people ;).

At work we use Symantec Backup Exec 2010 to back up our Windows machines, and also some data on our (my) Linux servers. After a kernel update, which came with the upgrade from Debian Squeeze to Debian Wheezy, the Backup Exec client beremote failed to start (sorry for the layout mess, but I want Google to find it):

ksh # /opt/VRTSralus/bin/beremote
GetIfAddrs(LINUX): failed err = 11
*** glibc detected *** /opt/VRTSralus/bin/beremote: free(): invalid pointer: 0xb5ff84a4 ***
======= Backtrace: =========
/lib/i386-linux-gnu/i686/cmov/libc.so.6(+0x70c91)[0xb58f2c91]
/lib/i386-linux-gnu/i686/cmov/libc.so.6(+0x724f8)[0xb58f44f8]
/lib/i386-linux-gnu/i686/cmov/libc.so.6(cfree+0x6d)[0xb58f763d]
/opt/VRTSralus/bin/libbesocket.so(_Z11freeifaddrsP7ifaddrs+0x2e)[0xb5fd75f6]
/opt/VRTSralus/bin/libbesocket.so(_Z10getifaddrsPP7ifaddrs+0x72c)[0xb5fd75a0]
/opt/VRTSralus/bin/libbesocket.so(_Z20GetAdaptersAddressesmmPvP21_IP_ADAPTER_ADDRESSESPm+0x69)[0xb5fd7ef1]
/opt/VRTSralus/bin/libbesocket.so(_ZN8BESocket13BENetConfigEx18RefreshInformationEPKb+0x41)[0xb5fe1451]
/opt/VRTSralus/bin/libbesocket.so(_ZN8BESocket13BENetConfigExC1Eb+0xf4)[0xb5fe12f8]
/opt/VRTSralus/bin/libbesocket.so(_ZN8BESocket13BENetConfigEx23GetNetworkConfigurationEb+0x2c)[0xb5fe18a4]
/opt/VRTSralus/bin/libbedsvx.so(_Z14VX_EnumSelfDLEP6BE_CFGP8HEAD_DLE+0x300)[0xb61d9d30]
/opt/VRTSralus/bin/libndmpcomm.so(+0x24a58)[0xb6057a58]
/opt/VRTSralus/bin/libndmpcomm.so(+0x24e41)[0xb6057e41]
/opt/VRTSralus/bin/libndmpcomm.so(_Z20NrdsAdvertiserThreadPv+0x12a)[0xb60596f6]
/opt/VRTSralus/bin/libvxACEI.so.3(_ZN18ACE_Thread_Adapter8invoke_iEv+0x52)[0xb77229d6]
/opt/VRTSralus/bin/libvxACEI.so.3(_ZN18ACE_Thread_Adapter6invokeEv+0x61)[0xb7722941]
/opt/VRTSralus/bin/libvxACEI.so.3(ace_thread_adapter+0xe)[0xb76ebd1a]
/lib/i386-linux-gnu/i686/cmov/libpthread.so.0(+0x5c39)[0xb6977c39]
/lib/i386-linux-gnu/i686/cmov/libc.so.6(clone+0x5e)[0xb59599fe]

Reverting to a 2.x kernel solved the problem. As I needed the new kernel for other reasons, that was not an option.
So I asked Google and discovered that only Backup Exec 2014 is supposed to work with 3.x kernels. Not nice, as we did not want to upgrade our whole environment just because of the Linux client.

Time for some strace and gdb sessions.

It was not too difficult to see that an ioctl call was the last thing that happened before the crash. Some more searching on the internet revealed that the following kernel patch is responsible for the crash: http://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=41c31f318a5209922d051e293c61e4724daad11c. The interesting parts:

- err = -EINVAL;
+ err = -ENOTTY;

- return -EINVAL;
+ return -ENOTTY;

- return -EINVAL;
+ return -ENOTTY;

So the call now returns a different error code (ENOTTY instead of EINVAL), which seems to cause the crash in beremote. Who coded beremote!?
Nevertheless, I patched the 3.x kernel to return the old value, and beremote started working again.
But this might affect other software that expects the new (correct) return code, and I did not want to build every new kernel manually on all of my machines.
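
For what it's worth, code that probes an ioctl and wants to survive this kernel change can simply accept both values as "not supported". The following is only a sketch of that idea in Python, not how beremote or libbesocket.so is actually written; the helper name is made up for illustration.

```python
import errno

# Both values can mean "this ioctl is not supported here":
# EINVAL (22) before the kernel patch above, ENOTTY (25) after it.
UNSUPPORTED_IOCTL_ERRNOS = frozenset({errno.EINVAL, errno.ENOTTY})


def is_unsupported_ioctl(err: int) -> bool:
    """Hypothetical helper: True if errno `err` means 'ioctl not supported'."""
    return err in UNSUPPORTED_IOCTL_ERRNOS
```

A caller would check the errno of a failed ioctl with this helper and fall back gracefully, instead of comparing against a single hard-coded number.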
Looking at the crash output above, it is clear that the problem is in libbesocket.so.

Time for some disassembler sessions.

It's been a while since I last read assembler code, but it seems I am old enough to still know such stuff ;).

.text:0001FF05   call    _ioctl
.text:0001FF0A   add     esp, 10h
.text:0001FF0D   test    eax, eax
.text:0001FF0F   jns     short loc_1FF26
.text:0001FF11   call    ___errno_location
.text:0001FF16   cmp     dword ptr [eax], 16h
.text:0001FF19   jnz     loc_2026C

EINVAL is defined as 22, hex 0x16. As this is the only place where such a comparison happens, let's patch it! Fire up your favorite hex editor and have a look at offset 0x1ff10:
0001ff10: 15 E8 5A DA FE FF 83 38 16 0F 85 4D 03 00 00 C7
Ah, there is a 0x16! It is the ninth byte of that line, so it sits at offset 0x1ff18.
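
To double-check that this dump line really is the code from the disassembly, the bytes can be decoded by hand. A small Python sketch, assuming (as the text above does) that the .text addresses shown by the disassembler are identical to file offsets in libbesocket.so:

```python
# The 16 bytes shown in the hex editor, starting at file offset 0x1ff10.
DUMP_BASE = 0x1FF10
DUMP = bytes.fromhex("15 E8 5A DA FE FF 83 38 16 0F 85 4D 03 00 00 C7")

# 83 38 16 encodes "cmp dword ptr [eax], 16h"; its last byte is the constant.
cmp_index = DUMP.find(bytes.fromhex("83 38 16"))
imm_offset = DUMP_BASE + cmp_index + 2

# 0F 85 rel32 encodes the near "jnz"; rel32 counts from the next instruction.
jnz_index = cmp_index + 3
rel32 = int.from_bytes(DUMP[jnz_index + 2:jnz_index + 6], "little", signed=True)
jnz_target = DUMP_BASE + jnz_index + 6 + rel32

print(hex(imm_offset))  # file offset of the 0x16 byte
print(hex(jnz_target))  # compare with loc_2026C in the disassembly
```

If the computed jump target agrees with the label in the disassembly, the hex editor is looking at the right place.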

Time to patch beremote!

Now we replace it with the value of ENOTTY, which is 25, hex 0x19!
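
If you prefer a script to a hex editor, the same one-byte change can be sketched in Python. The offset 0x1ff18 is the position of the 0x16 within the dump line above, and the SHA-1 is that of the unpatched libbesocket.so listed below. The function names and the output file name are just choices for this example.

```python
import hashlib
from pathlib import Path

ORIGINAL_SHA1 = "a320e4236f75185aa5299fba44291c471f2ac235"  # unpatched libbesocket.so
PATCH_OFFSET = 0x1FF18  # the 0x16 in "83 38 16"
EINVAL = 0x16
ENOTTY = 0x19


def patch_byte(data: bytes, offset: int, old: int, new: int) -> bytes:
    """Return a copy of data with one byte changed, refusing unexpected input."""
    if data[offset] != old:
        raise ValueError(
            f"expected {old:#04x} at {offset:#x}, found {data[offset]:#04x}"
        )
    return data[:offset] + bytes([new]) + data[offset + 1:]


def patch_file(src: str, dst: str) -> None:
    """Write a patched copy of src to dst, only if src is the known version."""
    data = Path(src).read_bytes()
    if hashlib.sha1(data).hexdigest() != ORIGINAL_SHA1:
        raise ValueError("checksum mismatch: this is a different libbesocket.so")
    Path(dst).write_bytes(patch_byte(data, PATCH_OFFSET, EINVAL, ENOTTY))
```

Calling patch_file("libbesocket.so", "libbesocket.so.patched") in the directory that holds the library writes a patched copy and leaves the original untouched. Compare its checksum with the one given below before putting it in place.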

And ... now beremote starts again and can do backups, even on newer kernels! So, if you have the same problem and Google brought you here: the patch above works for the following file versions:
ksh # shasum beremote libbesocket.so
c11b9872438ffbca29f5bf5af9288eb994f92bd0  beremote
a320e4236f75185aa5299fba44291c471f2ac235  libbesocket.so

patched libbesocket.so:

sh # shasum libbesocket.so.patched
e52c0d1bb948bdd472dd6192af52626715f7419c  libbesocket.so.patched

filesizes:

390312 beremote
271624 libbesocket.so

Warning:
  • Only apply the patch if you have exactly those files.
  • If you have a different version, have a look above; that is why I wrote all that stuff down. You should be able to search for the above hex values and find the 0x16 yourself.
  • If not, you should know where to look in the disassembly.
  • If not, maybe you should upgrade to Backup Exec 2014 ;).
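
For a different file version, the search described in the list above might look like the following Python sketch. It looks for the bytes 83 38 16 (cmp dword ptr [eax], 16h) and reports whether a conditional jump follows. It is an assumption that another build uses the same register and encoding, so treat every hit as a candidate to check in the disassembly, not as a confirmed patch location. The function names are made up for this example.

```python
from pathlib import Path

CMP_EINVAL = bytes.fromhex("83 38 16")  # cmp dword ptr [eax], 16h


def find_candidates(data: bytes, pattern: bytes = CMP_EINVAL) -> list:
    """Return (offset of the 0x16 byte, followed_by_jnz) for every match."""
    hits = []
    pos = data.find(pattern)
    while pos != -1:
        after = data[pos + len(pattern):pos + len(pattern) + 2]
        # 0F 85 is the near jnz seen above; 75 would be a short jnz.
        followed_by_jnz = after[:2] == b"\x0f\x85" or after[:1] == b"\x75"
        hits.append((pos + len(pattern) - 1, followed_by_jnz))
        pos = data.find(pattern, pos + 1)
    return hits


def scan_file(path: str) -> None:
    """Print every candidate offset found in the given file."""
    for offset, jnz in find_candidates(Path(path).read_bytes()):
        print(f"{offset:#x}  jnz follows: {jnz}")
```

For the file version listed above, the hits reported by scan_file("libbesocket.so") should include 0x1ff18.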
If I saved you a lot of trouble and/or a lot of money, there is a donate button on http://sourceforge.net/projects/janus-uae/, which you may also use for this patch ;). But of course, this patch is free!
