Thread exhaustion on an HP-UX machine manifests itself as one or more of the following errors in /var/adm/syslog/syslog.log:
vmunix: kthread: table is full
vmunix: WARNING: hponc_thread_create(): error creating thread for autofskd (12)
sshd[8474]: fatal: fork of unprivileged child failed
sshd[1895]: error: fork: Resource temporarily unavailable
The keywords to look for are thread, fork and failed. You should immediately check nkthread with kcusage:
sudo kcusage nkthread
Tunable                Usage / Setting
=============================================
nkthread                5254 / 4096
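The usage/setting pair is easier to judge as a percentage. A minimal sketch, assuming the kcusage rows keep the "name usage / setting" layout shown above; the function name kc_pct is made up for this example and is not part of HP-UX:

```shell
# kc_pct: read kcusage output on stdin and print each tunable's usage
# as a percentage of its setting. Assumes data rows look like
# "nkthread   5254 / 4096"; header and separator lines are skipped
# because their second field is not a number.
kc_pct() {
  awk '$3 == "/" && $2 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $4 > 0 {
    printf "%s %.1f%%\n", $1, 100 * $2 / $4
  }'
}
```

It would be used as: sudo kcusage nkthread | kc_pct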
You then get a description of the tunable with kctune:
sudo kctune -v nkthread
Tunable             nkthread
Description         Maximum number of threads on the system
Module              pm_proc
Current Value       4096
Value at Next Boot  4096
Value at Last Boot  4096
Default Value       8416
Constraints         nkthread >= 200
                    nkthread <= 4194304
                    nkthread >= max_thread_proc
                    nkthread >= nproc + 100
                    nkthread >= (5 * vx_era_nthreads)
Can Change          Immediately or at Next Boot
You then resolve the immediate problem by raising the limit, e.g.:
sudo kctune nkthread+=8192
After that you should, as a good sysadmin, look at the usage history. Mind you, the percentages you see are relative to the tunable value currently in effect (the one you just set), not to the value that was in effect when each measurement was taken!
sudo kcusage -m nkthread
Tunable:       nkthread
Setting:       28051
Time                          Usage     %
=============================================
Thu 09/09/10                   6285  22.4
Fri 09/10/10                   6403  22.8
Sat 09/11/10                   6368  22.7
Sun 09/12/10                   6150  21.9
Mon 09/13/10                   6336  22.6
Tue 09/14/10                   6436  22.9
Wed 09/15/10                   6382  22.8
Thu 09/16/10                   6416  22.9
Fri 09/17/10                   6277  22.4
Sat 09/18/10                   6157  21.9
Sun 09/19/10                   6203  22.1
Mon 09/20/10                   6319  22.5
Tue 09/21/10                   6420  22.9
Wed 09/22/10                   6306  22.5
Thu 09/23/10                   6474  23.1
Fri 09/24/10                   6567  23.4
Sat 09/25/10                   6452  23.0
Sun 09/26/10                   5910  21.1
Mon 09/27/10                   8260  29.4
Tue 09/28/10                   8240  29.4
Wed 09/29/10                   6617  23.6
Thu 09/30/10                   6461  23.0
Fri 10/01/10                   5799  20.7
Sat 10/02/10                   5558  19.8
Sun 10/03/10                   5892  21.0
Mon 10/04/10                   6983  24.9
Tue 10/05/10                   6542  23.3
Wed 10/06/10                   6479  23.1
Thu 10/07/10                  12289  43.8
Fri 10/08/10                  11108  39.6
Sat 10/09/10                   5292  18.9
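Because the % column is computed against the current setting, it can help to recompute it against the limit that was in force at the time. A rough sketch, assuming the daily rows keep the "Day MM/DD/YY usage percent" layout shown above; rescale_usage is a made-up helper name and the limit is whatever value you pass in:

```shell
# rescale_usage OLD_LIMIT: read `kcusage -m` output on stdin and reprint
# each daily row with the percentage recomputed against OLD_LIMIT.
# Rows whose second field is not a MM/DD/YY date are ignored.
rescale_usage() {
  awk -v limit="$1" '
    limit > 0 && $2 ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ && $3 ~ /^[0-9]+$/ {
      printf "%s %s %d %.1f\n", $1, $2, $3, 100 * $3 / limit
    }'
}
```

For example, sudo kcusage -m nkthread | rescale_usage 20000 would show each day's usage as a share of a limit of 20000 (a hypothetical figure chosen only for illustration). Values above 100 mark the days on which that old limit would have been exceeded.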
Now you might be able to see what happened when, and correlate it with your Change Management records to figure out what went wrong. I was not that lucky. This was a database hotel consisting of 80 databases and database-related applications, with nearly no change control. So what to do? I needed to correlate the process information visible with ps with the threads each process owns. But how do you do that in HP-UX?
First guess: the always valuable tool glance. And lo and behold, the capital Z will show you the thread information (threads are also called Light Weight Processes in HP-UX, or LWP for short). But only on a screen-by-screen basis, which is useless if you have thousands of processes. After a bit of ping-pong with a fellow sysadmin we ended up with the pstack (print stack) tool. It works like this:
$ ps -ef | grep -i java | head -1
dma65t7 6728 6708 0 17:48:17 ?        3:23
/opt/dma65t7/product/6.5/classes/com/documentum/jboss4.2.0/jdk/bin/IA64N/java
-Dprogram.name=run.sh -server -Xms256m -Xmx512m -
sudo pstack 6728 | grep -i lwpid | sed -e 's,-*,,g' | head -10
lwpid : 7653569
lwpid : 7653570
lwpid : 7653572
lwpid : 7653573
lwpid : 7653574
lwpid : 7653575
lwpid : 7653576
lwpid : 7653577
lwpid : 7653578
lwpid : 7653579
So basically I ended up with the following one-liner:
ps -ef > out ; ps -ef | awk '{ print $1 " " $2 }' | grep -v root |
while read user pid ; do sudo pstack $pid | egrep -i "($pid|lwpid)"|
sed -e 's,-*,,g' ; done >> out 2>&1
This gives a quick and dirty indication of which process is eating up all the threads, so you can go and ask the application owner whether that is normal. It wasn’t!
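If you only want the ranking, a variation is to count the lwpid lines per process and sort on that count. This is a sketch rather than what I actually ran: threads_per_process and the STACK_CMD variable are names invented for this example, and it assumes pstack prints one line containing "lwpid" per thread, as in the sample output above.

```shell
# threads_per_process: read "user pid" pairs on stdin and print
# "thread_count user pid", busiest process first.
# STACK_CMD (hypothetical knob) is the command used to dump a process's
# stack; it defaults to pstack and can be set to "sudo pstack".
threads_per_process() {
  while read -r user pid; do
    # One "lwpid" line per thread is assumed; errors count as 0 threads.
    n=$(${STACK_CMD:-pstack} "$pid" 2>/dev/null | grep -ci 'lwpid')
    printf '%s %s %s\n' "$n" "$user" "$pid"
  done | sort -rn
}
```

It could be fed like this: ps -ef | awk 'NR > 1 && $1 != "root" { print $1, $2 }' | STACK_CMD="sudo pstack" threads_per_process | head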