Thread exhaustion on an HP-UX machine manifests itself as one or more of the following errors in /var/adm/syslog/syslog.log:
vmunix: kthread: table is full
vmunix: WARNING: hponc_thread_create(): error creating thread for autofskd (12)
sshd[8474]: fatal: fork of unprivileged child failed
sshd[1895]: error: fork: Resource temporarily unavailable
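To pull these messages out of a busy log in one pass, a small filter can help. This is only a sketch: the function name thread_errors is made up for illustration, and the patterns are derived from the sample messages above, so other HP-UX releases may word them differently.

```shell
# thread_errors: hypothetical helper. Prints only the log lines that look
# like the thread/fork exhaustion messages quoted above.
# Reads the files given as arguments, or standard input if there are none.
thread_errors() {
    grep -E 'kthread: table is full|error creating thread|fork.*(failed|unavailable)' "$@"
}
```

Typical use would be: thread_errors /var/adm/syslog/syslog.log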
The keywords are thread, fork and failed. You should immediately look at nkthread with kcusage:
sudo kcusage nkthread
Tunable                Usage / Setting
=============================================
nkthread                5254 / 4096
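If you want the same information as a percentage, for instance to feed a monitoring check, the usage row can be converted with awk. A minimal sketch, assuming the "name usage / setting" row layout shown above; the function name tunable_pct is hypothetical.

```shell
# tunable_pct: hypothetical helper. Reads `kcusage <tunable>` output on
# standard input and prints usage as a percentage of the current setting.
# Assumes data rows of the form: "<name>  <usage> / <setting>".
tunable_pct() {
    awk -v name="$1" '
        $1 == name && $3 == "/" && $4 > 0 {
            printf "%s %.1f%%\n", $1, 100 * $2 / $4
        }'
}
```

Used as: sudo kcusage nkthread | tunable_pct nkthread. For the output above this prints "nkthread 128.3%".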
You then get a description with kctune:
sudo kctune -v nkthread
Tunable            nkthread
Description        Maximum number of threads on the system
Module             pm_proc
Current Value      4096
Value at Next Boot 4096
Value at Last Boot 4096
Default Value      8416
Constraints        nkthread >= 200
                   nkthread <= 4194304
                   nkthread >= max_thread_proc
                   nkthread >= nproc + 100
                   nkthread >= (5 * vx_era_nthreads)
Can Change         Immediately or at Next Boot
You then resolve the problem with, for example:
sudo kctune nkthread+=8192
After that you would, as a good sysadmin, start to look at the usage back in time. Mind you, the percentage you see is relative to the tunable value currently in effect (the one you just set), not to the value that applied when the measurement was taken!
sudo kcusage -m nkthread
Tunable:       nkthread
Setting:       28051
Time                          Usage     %
=============================================
Thu 09/09/10                   6285  22.4
Fri 09/10/10                   6403  22.8
Sat 09/11/10                   6368  22.7
Sun 09/12/10                   6150  21.9
Mon 09/13/10                   6336  22.6
Tue 09/14/10                   6436  22.9
Wed 09/15/10                   6382  22.8
Thu 09/16/10                   6416  22.9
Fri 09/17/10                   6277  22.4
Sat 09/18/10                   6157  21.9
Sun 09/19/10                   6203  22.1
Mon 09/20/10                   6319  22.5
Tue 09/21/10                   6420  22.9
Wed 09/22/10                   6306  22.5
Thu 09/23/10                   6474  23.1
Fri 09/24/10                   6567  23.4
Sat 09/25/10                   6452  23.0
Sun 09/26/10                   5910  21.1
Mon 09/27/10                   8260  29.4
Tue 09/28/10                   8240  29.4
Wed 09/29/10                   6617  23.6
Thu 09/30/10                   6461  23.0
Fri 10/01/10                   5799  20.7
Sat 10/02/10                   5558  19.8
Sun 10/03/10                   5892  21.0
Mon 10/04/10                   6983  24.9
Tue 10/05/10                   6542  23.3
Wed 10/06/10                   6479  23.1
Thu 10/07/10                  12289  43.8
Fri 10/08/10                  11108  39.6
Sat 10/09/10                   5292  18.9
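With a month of history it is easy to miss the days that matter. The sketch below picks out the days whose usage is clearly above the average of the listed period. It is an illustration, not part of kcusage: the function name usage_spikes and the default factor of 1.25 are arbitrary assumptions, and it relies on the "Day MM/DD/YY usage percent" row layout shown above.

```shell
# usage_spikes: hypothetical helper. Reads `kcusage -m <tunable>` output on
# standard input and prints the days whose usage is more than FACTOR times
# the average of all listed days. FACTOR defaults to 1.25 (arbitrary).
usage_spikes() {
    awk -v factor="${1:-1.25}" '
        # Data rows look like: "Thu 09/09/10   6285  22.4"
        $2 ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ {
            day[n] = $1 " " $2
            use[n] = $3 + 0      # force a numeric value
            sum += $3
            n++
        }
        END {
            if (n == 0) exit
            mean = sum / n
            for (i = 0; i < n; i++)
                if (use[i] > factor * mean) print day[i], use[i]
        }'
}
```

Used as: sudo kcusage -m nkthread | usage_spikes. Run against the table above with the default factor, it should single out Thu 10/07/10 and Fri 10/08/10, which are the dates to hold up against the change log.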
Now you might be able to see what happened when, and correlate it with your Change Management procedures to figure out what went wrong. I was not that lucky. This was on a database hotel consisting of 80 databases and database-related applications, with nearly no change control. So what to do? I needed to correlate the process information visible with ps with the threads. But how do you do that in HP-UX?
First guess: the always valuable tool glance. And lo and behold, the capital Z will show you the thread information (threads are also called Light Weight Processes in HP-UX, or LWP for short). But only on a screen-by-screen basis, which is useless if you have thousands of processes. After a bit of ping-pong with a fellow sysadmin we ended up with the pstack (print stack) tool. It works like this:
$ ps -ef | grep -i java | head -1
dma65t7 6728 6708 0 17:48:17 ?        3:23 /opt/dma65t7/product/6.5/classes/com/documentum/jboss4.2.0/jdk/bin/IA64N/java -Dprogram.name=run.sh -server -Xms256m -Xmx512m -
$ sudo pstack 6728 | grep -i lwpid | sed -e 's,-*,,g' | head -10
lwpid : 7653569
lwpid : 7653570
lwpid : 7653572
lwpid : 7653573
lwpid : 7653574
lwpid : 7653575
lwpid : 7653576
lwpid : 7653577
lwpid : 7653578
lwpid : 7653579
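For thread exhaustion the individual lwpids matter less than how many there are. A minimal sketch that counts them, assuming pstack prints exactly one line containing "lwpid" per thread, as in the sample above; the function name count_lwps is hypothetical.

```shell
# count_lwps: hypothetical helper. Reads pstack output on standard input
# and prints the number of lines containing "lwpid" (one per thread,
# assuming the output format shown above).
count_lwps() {
    grep -ci 'lwpid' | tr -d ' '
}
```

Used as: sudo pstack 6728 | count_lwps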
So basically I ended up with the following one-liner
ps -ef > out ; ps -ef | awk '{ print $1 " " $2 }' | grep -v root | while read user pid ; do sudo pstack $pid | egrep -i "($pid|lwpid)"| sed -e 's,-*,,g' ; done >> out 2>&1
This gives a quick-and-dirty indication of which process eats up all the resources, so you can go and ask the application owner whether that is normal. It wasn’t!
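If you would rather have a ranked list than a raw dump to read through, the same idea can be wrapped in a function that prints one line per process. This is a sketch under assumptions, not the exact command I ran: rank_lwps is a made-up name, it assumes one "lwpid" line per thread in the pstack output, and it skips processes whose owner is root by matching the user column instead of using grep -v root on the whole line.

```shell
# rank_lwps: hypothetical variant of the one-liner above. For every
# non-root process it prints "threads user pid", busiest process first.
rank_lwps() {
    # NR > 1 skips the ps header line; $1 is the user, $2 the PID.
    ps -ef | awk 'NR > 1 && $1 != "root" { print $1, $2 }' |
    while read user pid; do
        # A process may exit between ps and pstack, so errors are dropped.
        # </dev/null keeps sudo/pstack from eating the PID list on stdin.
        n=$(sudo pstack "$pid" 2>/dev/null </dev/null | grep -ci 'lwpid')
        echo "$n $user $pid"
    done | sort -rn
}
```

Used as: rank_lwps | head -20. The first few lines are the processes to ask the application owners about.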