G++ and GCC don’t mix

My boss recently volunteered me to work with another group having problems calling a library via JNI on a Solaris box. I know Linux, so I wasn’t totally out of my depth, but I’ve never done JNI, and the GNU commands on Solaris (SunOS 5.8) behave a bit differently from what I’m used to. I was a bit surprised, but said I’d try my best – and found myself enjoying the return to actual code a great deal. Thank goodness they had bash in addition to the default ksh, though.

When I ran the Java program, the only information was a java.lang.UnsatisfiedLinkError exception with the method name. There was also a line above that stating that the library in question couldn’t be used. Not very helpful. The Solaris box wasn’t under my control, so no attaching a debugger or installing packages. The library being used had been generated on a different Solaris box (same SunOS version); the box I was using didn’t even have GCC installed.

I won’t go through everything I did (four hours, with lots of red herrings), but I’ll mention a few of the things that helped.

  • Ran the Java app with jdb. jdb is pretty minimal, but the major debugger functions are there. In particular, narrowing down to the exact line where the exception was thrown (through a mixture of source code inspection and trial and error) helped a lot. Along the way I accidentally picked up an earlier UnsatisfiedLinkError that was involved in the printing of that single-line error message. It turned out that the code (auto-generated?) was swallowing an exception and printing to System.out without any information from the causal exception – important details disappeared.
  • Reproduced the problem by creating a test application that did something similar. A little searching turned up the System.loadLibrary(String) function; since that is what loads a native library, calling it directly should reasonably reproduce the failure. Interestingly, the exception message was quite different: “symbol __gxx_personality_v0: referenced symbol not found”. Some searching revealed a conversation that suggested this had to do with a C++ symbol not being pulled in. This was the tidbit that triggered the discovery of the fatal flaw: the compilation instructions used g++ for the first two steps, and then gcc for the last. Only g++ links in the C++ runtime (libstdc++) by default, and that is where the symbol lives, so the gcc-linked library was left with an unresolved reference. Using g++ throughout fixed the problem.
  • Walked through the Java runtime source code to learn what was involved in loading a native library. Doing that, along with some searches, helped me understand what the filename of the library on Solaris had to be, how it was searched for, and so on. Not strictly necessary, but it helped me understand the whole situation.
  • This may have been a red herring, but I add it for completeness. The original compilation was done on SunOS 5.8, but with v3.2 of GCC. The libgcc_s.so.1 on the target box was v3.3. To address that, I went to a site that hosts many precompiled Solaris binaries and found v3.3 for Solaris 8. The site is incredibly slow, however, so a search on the package name turned up a Sunfreeware mirror that was many times faster. A quick trip to the newsgroups taught me that Solaris uses a pkg scheme, and that I had to use pkgtrans to extract the binaries. Oh, and I also learned that IE can have trouble downloading a gzipped file, treating it like text and corrupting it for some odd reason.
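
For anyone chasing a similar problem, a test application along the lines described above might look roughly like the sketch below. It is not the program I actually wrote: the class name LoadCheck and the default library name mylib are placeholders. It prints the filename the JVM expects for the library, the path it searches, and the linker’s own message if the load fails, which is exactly the detail the original program was swallowing.

```java
public class LoadCheck {

    /** Platform-specific file name the JVM looks for (libNAME.so on Solaris and Linux). */
    public static String expectedFileName(String libName) {
        return System.mapLibraryName(libName);
    }

    /**
     * Tries to load the library. Returns null on success, otherwise the
     * message carried by the UnsatisfiedLinkError, so the cause is not lost.
     */
    public static String tryLoad(String libName) {
        try {
            System.loadLibrary(libName);
            return null;
        } catch (UnsatisfiedLinkError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // "mylib" is a hypothetical default; pass the real library name as an argument.
        String libName = args.length > 0 ? args[0] : "mylib";

        System.out.println("Looking for: " + expectedFileName(libName));
        System.out.println("Search path: " + System.getProperty("java.library.path"));

        String error = tryLoad(libName);
        System.out.println(error == null ? "Loaded OK" : "Load failed: " + error);
    }
}
```

Run it with java.library.path pointing at the directory that holds the library (for example java -Djava.library.path=. LoadCheck mylib). A missing file and an unresolved symbol produce different messages, which is what makes a tiny harness like this useful for telling the two apart.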

To be honest, I had a lot of fun. There’s something about being able to call on heterogeneous types and sources of information to find a proper solution that’s quite enjoyable for me.
