View from the Trenches: Looking at Thread-Dumps


By Alexandre Rafalovitch

Any software system, however well coded, will at one point or another run into into the slowdown or hang situations caused by interactions and bottlenecks not foreseen by its designers. For a system as complex as BEA WebLogic Platform, with final layer compositions varying by the customer, knowing how to deal with such issues is important.

There are multiple solutions in the market to monitor the runtime behaviour of a complex Java solution such as WebLogic. Dirig's Performance Monitor and Wily Tech's Introscope are just a couple. But what if you don't have one installed or it does not reach deep enough into the system to record the required statistics (such as thread bottleneck information)? Seasoned professionals and BEA support personnel have a ready answer for those all-too-common situations: thread-dumps. Every JVM comes with an ability to generate a thread-dump that shows the state of all threads at that point in time. Take five thread-dumps 30 seconds apart and you can see which threads are moving, which are stuck waiting for a global synchronized method, and which ones are stuck in the eternal deadlock. Or such is the theory anyway.

The Current State of Affairs


In reality, every JVM or even a major release of JVM has a slightly different format of the thread-dump printout; some do simple deadlock analysis, others don't show the locks at all. Even commercial tools that analyze thread-dumps use a JVMPI debugging interface to retrieve their information and are not able to work post-factum on already taken thread-dumps. Below are three excerpts of what the same program state will look like in three different JVMs:

Sun JVM 1.3.1 snapshot (Full dump)

Full thread dump:
		"TDeadLock-1a" prio=5 tid=0x89c0d98 nid=0xa90 waiting for monitor entry [0x8fcf000..0x8fcfdb8]
			at TDeadLock.method2(ThreadDumpTester.java:147)
			at TDeadLock.method1(ThreadDumpTester.java:138)
			at TDeadLock.run(ThreadDumpTester.java:121)
			at java.lang.Thread.run(Thread.java:479)

			......

			FOUND A JAVA LEVEL DEADLOCK:
		----------------------------
		"TDeadLock-1b":
		  waiting to lock monitor 0x8022f4 (object 0x2968468, a java.lang.Object),
		  which is locked by "TDeadLock-1a"
		"TDeadLock-1a":
		  waiting to lock monitor 0x8022d4 (object 0x2968470, a java.lang.Object),
		  which is locked by "TDeadLock-1b"

		JAVA STACK INFORMATION FOR THREADS LISTED ABOVE:
		------------------------------------------------
		Java Stack for "TDeadLock-1b":
		==========
			at TDeadLock.method2(ThreadDumpTester.java:147)
			- waiting to lock <2968468> (a java.lang.Object)
			at TDeadLock.method1(ThreadDumpTester.java:138)
			- locked <2968470> (a java.lang.Object)
			at TDeadLock.run(ThreadDumpTester.java:121)
			at java.lang.Thread.run(Thread.java:479)
		Java Stack for "TDeadLock-1a":
		==========
			at TDeadLock.method2(ThreadDumpTester.java:147)
			- waiting to lock <2968470> (a java.lang.Object)
			at TDeadLock.method1(ThreadDumpTester.java:138)
			- locked <2968468> (a java.lang.Object)
			at TDeadLock.run(ThreadDumpTester.java:121)
			at java.lang.Thread.run(Thread.java:479)

		....
		Found 2 deadlocks.

	


Sun JVM 1.4.1 snapshot (Full dump)

Full thread dump Java HotSpot(TM) Client VM (1.4.1_03-b02 mixed mode):
		"TDeadLock-1a" prio=5 tid=0x00960A60 nid=0x8bc waiting for monitor entry [b0cf000..b0cfd88]
			at TDeadLock.method2(ThreadDumpTester.java:147)
			- waiting to lock <02A3F7E8> (a java.lang.Object)
			at TDeadLock.method1(ThreadDumpTester.java:138)
			- locked <02A3F7E0> (a java.lang.Object)
			at TDeadLock.run(ThreadDumpTester.java:121)
			at java.lang.Thread.run(Thread.java:536)

		.....
		Found one Java-level deadlock:
		=============================
		"TDeadLock-1b":
		  waiting to lock monitor 0x8fec7c (object 0x2a3f7e0, a java.lang.Object),
		  which is held by "TDeadLock-1a"
		"TDeadLock-1a":
		  waiting to lock monitor 0x8fec5c (object 0x2a3f7e8, a java.lang.Object),
		  which is held by "TDeadLock-1b"

		Java stack information for the threads listed above:
		===================================================
		"TDeadLock-1b":
			at TDeadLock.method2(ThreadDumpTester.java:147)
			- waiting to lock <02A3F7E0> (a java.lang.Object)
			at TDeadLock.method1(ThreadDumpTester.java:138)
			- locked <02A3F7E8> (a java.lang.Object)
			at TDeadLock.run(ThreadDumpTester.java:121)
			at java.lang.Thread.run(Thread.java:536)
		"TDeadLock-1a":
			at TDeadLock.method2(ThreadDumpTester.java:147)
			- waiting to lock <02A3F7E8> (a java.lang.Object)
			at TDeadLock.method1(ThreadDumpTester.java:138)
			- locked <02A3F7E0> (a java.lang.Object)
			at TDeadLock.run(ThreadDumpTester.java:121)
			at java.lang.Thread.run(Thread.java:536)

		....
		Found 2 deadlocks.
	


JRockit JVM 1.4.2 snapshot (Full dump)

		===== FULL THREAD DUMP ===============
		Tue Dec 23 16:43:14 2003

		"TDeadLock-1a"  id: 0x00000980  prio: 5  ACTIVE, NONDAEMON, GCABLE
		thread: 0x40EC5808  lastj: 0x4191FDB0  pt_thr: 0x00000228
			at jrockit/vm/Threads.progressiveNap(Native Method)@40D10D60
			at jrockit/vm/Locks.stage2Lock(Native Method)@40D10FD4
			at jrockit/vm/Locks.monitorEnter(Native Method)@40D11070
			at TDeadLock.method2(ThreadDumpTester.java:145)@41690788
			at TDeadLock.method1(ThreadDumpTester.java:138)@416906F7
			at TDeadLock.run(ThreadDumpTester.java:121)@416906A2
			at java/lang/Thread.run(Unknown Source)@4152D501
			at java/lang/Thread.startThreadFromVM(Unknown Source)@40D1F173
			--- End of stack trace

		....

		=====================
		


What's Common?


What's Different?


As you can see, even though there is a lot of useful common information present in all the thread-dump types, a lot of work would have to go into parsing thread-dumps from more than one vendor or even more than one version of JVM from the same vendor. Combine that with a code that will be doing actual comparison, correlating and interpolating the results, and you have an application that is brittle and that is hard to maintain and extend. In fact, I did have two diverging applications parsing the thread-dumps for text and HTML representation, and they were quite hard to keep in synch in terms of features.

A Modest Proposal


To Try these yourself, download the author's application with extractor and xsl processing scripts. What would be a perfect solution here is to be able to convert these thread-dumps into some sort of intermediate structure/language that has flexible structure, can tolerate missing fields, and can be processed by third-party tools in various ways. And, of course, if it can somehow involve a currently popular technology, so much the better.

Do we have such an intermediate language? Indeed we do it is XML. It is somewhat human readable, has many tools written for it, and already has not one but several transformation and reporting technologies built on top of it, such as XSLT, XQuery, and XDB.

In this scenario, an extractor tool will only have one responsibility: parse the JVM-specific format and convert it into the XML document with all encompassing schema.

Following is what the XML document would look like for Sun's 1.4.1 JVM:

Sun JVM 1.4.1 XML (Pretty Printed Fragment) (Full XML)

So we have an XML document now. And while it may be more consistent than original sources, what can we actually do with it? Let's see. We will use a real-life thread-dump sequence with three dumps in it.
(Original Thread-Dumps --> Extracted XML) (Stylesheet --> Full result)
	Dump-1
		d1-Prototyper
			method:: java.lang.Object:wait---Native Method:
			lock:: waiting on-->org.logicalcobwebs.proxool.ConnectionPool$Prototyper(0x5baf0200)
			method:: java.lang.Object:wait---Object.java:426
			method:: org.logicalcobwebs.proxool.ConnectionPool$Prototyper:run---Unknown Source:
		d1-HouseKeeper
			method:: java.lang.Thread:sleep---Native Method:
			method:: org.logicalcobwebs.proxool.ConnectionPool$HouseKeeper:run---Unknown Source:
			method:: java.lang.Thread:run---Thread.java:536
	



(Stylesheet and XML file used for highlight info --> Full resulting HTML page)







Notice that if you will check the full image, only 12 out of the 95 threads had been displayed. The transformation script had automatically identified and isolated only those threads that participate in the locking spanning multiple threads. Other threads may be in the locked state as well (e.g. Object.wait() methods waiting for explicit notify() call), but as no useful relationship could be extracted for them from the thread-dump, they were not displayed here. Other transformations are more useful to analyze the flow for those threads.

Is This All There Is To It?


So, are these examples all that can be done with XML representation of the thread-dumps? Far from it. The transformations given in this article are only a small sample of what can be done with newly liberated thread-dump information. Just to get your creative juices flowing, here are a couple of other possibilities: For now, we invite you to try it for yourself.

Conclusion


By converting the various thread-dump formats into the unified XML format, we can increase portability of thread-dump analysis knowledge and visualize the relations previously hidden in the bulk of data. Additionally, as there are more XML/XSLT tools and experts than Java thread-dump experts, many more people can contribute to creating useful statistics and visualizations.