Archive for the ‘hp-ux’ Category

LUNZ and HP-UX 11.31

Wednesday, November 28th, 2012

LUNZ is a placeholder. It is as simple as that. In HP-UX 11.31, ioscan runs without human intervention. Thus you can end up in a situation where your server can see the storage cabinet, but no LUNs are presented to the server … yet. And thus you get LUNZ. The same applies to 11.23 and below, but there it is unlikely that you scan for devices before your storage admin comes along.

So back on track. You then end up with

disk 3 0/0/0/5/0/0/0.101.140.255.0.0.0 sdisk NO_HW DEVICE DGC LUNZ
/dev/dsk/c26t0d0 /dev/rdsk/c26t0d0
disk 4 0/0/0/5/0/0/0.101.156.255.0.0.0 sdisk NO_HW DEVICE DGC LUNZ
/dev/dsk/c27t0d0 /dev/rdsk/c27t0d0

And no matter how many times you run ioscan -fn, you get no devices. Before you go yelling at your storage admin, try this

sudo ioscan -fnkC disk | grep LUNZ
disk 3 0/0/0/5/0/0/0.101.140.255.0.0.0 sdisk NO_HW DEVICE DGC LUNZ
disk 4 0/0/0/5/0/0/0.101.156.255.0.0.0 sdisk NO_HW DEVICE DGC LUNZ
disk 7 0/0/0/5/0/0/1.201.140.255.0.0.0 sdisk NO_HW DEVICE DGC LUNZ

sudo rmsf -H 0/0/0/5/0/0/0.101.140.255.0.0.0
sudo rmsf -H 0/0/0/5/0/0/0.101.156.255.0.0.0
sudo rmsf -H 0/0/0/5/0/0/1.201.140.255.0.0.0
sudo rmsf -H 0/0/0/5/0/0/1.201.156.255.0.0.0

Check

sudo ioscan -fnkC disk | grep LUNZ | wc -l
0

Try to get them onboard

sudo ioscan -fn
sudo insf
sudo ioscan -fnkC disk

Check

sudo ioscan -fnkC disk | grep DGC | wc
0 0 0

Still not there?! Odd? Not really. On HP-UX 11.31, you also have the new agile devices, and a LUNZ placeholder there is blocking the access.

sudo ioscan -fnNkC disk |grep LUNZ
disk 5 64000/0xfa00/0x5 esdisk NO_HW DEVICE DGC LUNZ

Run

sudo rmsf -H 64000/0xfa00/0x5

Now run

sudo ioscan -fnC disk
sudo insf

Check. Lo and behold. They are there now.

sudo ioscan -fnkC disk | grep DGC | wc
24 204 2244
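
If you have more than a handful of LUNZ placeholders, typing the rmsf lines by hand gets old fast. Here is a minimal sketch, assuming the ioscan -f column layout shown above (class, instance, hardware path, and so on). The function name lunz_rmsf_commands is my own invention for this example, and it only prints the rmsf commands. It does not run them.

```shell
#!/bin/sh
# Hypothetical helper: reads ioscan -f style output on stdin and prints
# one "rmsf -H <hardware path>" line per LUNZ placeholder.
# Assumption: the hardware path is the third column, as in the listings above.
# Nothing is removed; the commands are only printed for review.
lunz_rmsf_commands() {
    awk '$1 == "disk" && /LUNZ/ { print "rmsf -H " $3 }'
}
```

On the box itself you would feed it both views, e.g. sudo ioscan -fnkC disk | lunz_rmsf_commands and sudo ioscan -fnNkC disk | lunz_rmsf_commands, read the result, and only then run the printed lines with sudo.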

HP-UX filecache_max tunable and system unresponsiveness

Thursday, January 12th, 2012

During work today I tweaked the online tunable called filecache_max on an HP-UX 11.31 box. From 1 to 5%. Went fine. I tested what I needed to test and then decided to lower the value again. So I ran kctune filecache_max=2% and then nothing.

Everything(!) stopped working. Well, the system could be pinged, but the other cluster node started to complain about its sister disappearing. I literally got cold hands. This was supposed to be an online operation. I waited for 4 very, very long minutes (being a computer professional has taught me to wait for stuff to finish).

After 4 minutes the system was back, running as if nothing had happened. Everything literally just froze up while trying to reduce the file cache. I learnt it the hard way. I hope this post will keep you out of the same kind of trouble.

Debugging dns problems

Thursday, February 17th, 2011

Recently I faced a DNS problem in a complex setup. I had a very locked down jumphost with one public network and two internal networks and a very strict firewall controlling what packets went in and out.

On the inside I had a linux machine running BIND, also with a firewall and a locked down setup.

On yet another host on the inside, running HP-UX, DNS resolving worked just fine.

On the jumphost it didn’t work at all.

Took me hours to figure out what was going on. I went over the firewall again and again, on both the jumphost and the DNS server. I went over the BIND configuration again and again. The network setup. To no avail. All I got was

Got recursion not available from 192.168.1.79, trying next server

In the end it turned out that on the linux jump server I had two nameserver lines in /etc/resolv.conf

domain zensonic.dk
search zensonic.dk
nameserver 192.168.1.79
nameserver 192.168.1.80

I hadn’t bothered to set up the DNS at 192.168.1.80 and thus my linux client would not function. As soon as I removed 192.168.1.80 from /etc/resolv.conf everything was as it should be. I hope that you, reading this, save some hours' worth of debugging. If you do, drop me a line/mail/beer :-)
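
A broken second nameserver is easy to overlook, so it helps to test every nameserver line on its own instead of trusting the resolver as a whole. A minimal sketch: list_nameservers is a made-up helper name, and it assumes the usual resolv.conf format with one nameserver address per line.

```shell
#!/bin/sh
# Hypothetical helper: print the address from every "nameserver" line
# of a resolv.conf style file (default: /etc/resolv.conf).
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Each address it prints can then be queried by itself, for example with dig @192.168.1.80 zensonic.dk if dig is installed, which should make a server that was never set up stand out.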

Debugging thread exhaustion on HP-UX

Saturday, October 9th, 2010

Thread exhaustion on an HP-UX machine manifests itself by one or more of the following errors in

/var/adm/syslog/syslog.log
vmunix: kthread: table is full
vmunix: WARNING: hponc_thread_create(): error creating thread for autofskd (12)
sshd[8474]: fatal: fork of unprivileged child failed
sshd[1895]: error: fork: Resource temporarily unavailable

The keywords are thread, fork and failed. You should immediately look at nkthread with kcusage

sudo kcusage nkthread
Tunable                 Usage / Setting
=============================================
nkthread                 5254 / 4096
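
If you would rather catch this before the table is full, the same kcusage output can be checked mechanically. A rough sketch, assuming the "name usage / setting" layout shown above. kcusage_over is a name I made up, and 80 percent is an arbitrary default threshold.

```shell
#!/bin/sh
# Hypothetical helper: reads kcusage style output on stdin and prints every
# tunable whose usage is at or above the given percentage of its setting.
# Assumption: data lines look like "name usage / setting"; other lines are skipped.
kcusage_over() {
    awk -v limit="${1:-80}" '
        $3 == "/" && $2 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $4 > 0 {
            pct = $2 * 100 / $4
            if (pct >= limit) printf "%s %d/%d %.0f%%\n", $1, $2, $4, pct
        }'
}
```

Something like sudo kcusage nkthread | kcusage_over 90 would then print a line only when nkthread is at 90 percent or more of its setting.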

You then get a description with kctune

sudo kctune -v nkthread
Tunable             nkthread
Description         Maximum number of threads on the system
Module              pm_proc
Current Value      4096
Value at Next Boot  4096
Value at Last Boot  4096
Default Value       8416
Constraints         nkthread >= 200
 nkthread <= 4194304
 nkthread >= max_thread_proc
 nkthread >= nproc + 100
 nkthread >= (5 * vx_era_nthreads)
Can Change          Immediately or at Next Boot

You then resolve the problem with, e.g.,

sudo kctune nkthread+=8192

After that you would, as a good sysadmin, start to look at the usage back in time. Mind you, the percentage you see is relative to the new tunable value you just set a moment ago, not to what it was at the time of the measurement!

sudo kcusage -m nkthread
Tunable:        nkthread
Setting:        28051
Time                           Usage      %
=============================================
Thu 09/09/10                    6285   22.4
Fri 09/10/10                    6403   22.8
Sat 09/11/10                    6368   22.7
Sun 09/12/10                    6150   21.9
Mon 09/13/10                    6336   22.6
Tue 09/14/10                    6436   22.9
Wed 09/15/10                    6382   22.8
Thu 09/16/10                    6416   22.9
Fri 09/17/10                    6277   22.4
Sat 09/18/10                    6157   21.9
Sun 09/19/10                    6203   22.1
Mon 09/20/10                    6319   22.5
Tue 09/21/10                    6420   22.9
Wed 09/22/10                    6306   22.5
Thu 09/23/10                    6474   23.1
Fri 09/24/10                    6567   23.4
Sat 09/25/10                    6452   23.0
Sun 09/26/10                    5910   21.1
Mon 09/27/10                    8260   29.4
Tue 09/28/10                    8240   29.4
Wed 09/29/10                    6617   23.6
Thu 09/30/10                    6461   23.0
Fri 10/01/10                    5799   20.7
Sat 10/02/10                    5558   19.8
Sun 10/03/10                    5892   21.0
Mon 10/04/10                    6983   24.9
Tue 10/05/10                    6542   23.3
Wed 10/06/10                    6479   23.1
Thu 10/07/10                   12289   43.8
Fri 10/08/10                   11108   39.6
Sat 10/09/10                    5292   18.9
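
Since the percentage column is computed against the current setting, it is worth recomputing it against the value that was in force back then. A small sketch, assuming the kcusage -m line layout above (weekday, date, usage, percent). usage_vs_setting is a hypothetical name, and the old setting is something you have to supply yourself.

```shell
#!/bin/sh
# Hypothetical helper: reads `kcusage -m` style lines on stdin and prints
# weekday, date, usage and the usage as a percentage of the setting given
# as the first argument. Header and separator lines are skipped.
usage_vs_setting() {
    awk -v setting="$1" '
        $2 ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ && setting > 0 {
            printf "%s %s %d %.1f\n", $1, $2, $3, $3 * 100 / setting
        }'
}
```

Fed the listing above with the old setting of 4096 (usage_vs_setting 4096), Thursday 10/07/10 comes out at roughly 300 percent rather than 43.8, which is a very different picture.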

Now you might be able to see what happened when and correlate it with your Change Management procedures to figure out what went wrong. I was not that lucky. This was on a database hotel consisting of 80 databases and database related applications and nearly no change control. So what to do? I needed to correlate the process information visible with ps with the threads. But how do you do that in HP-UX?

First guess: the always valuable tool glance. And lo and behold, the capital Z will show you the thread (also called Light Weight Process in HP-UX, or LWP for short) information, but only on a screen-by-screen basis. Useless if you have thousands of processes. After a bit of ping-pong with a fellow sysadmin we ended up with the pstack (print stack) tool. It works like this

$ ps -ef | grep -i java | head -1
 dma65t7  6728  6708  0 17:48:17 ?         3:23 
/opt/dma65t7/product/6.5/classes/com/documentum/jboss4.2.0/jdk/bin/IA64N/java 
-Dprogram.name=run.sh -server -Xms256m -Xmx512m -

sudo pstack 6728 | grep -i lwpid | sed -e 's,-*,,g' | head -10
lwpid : 7653569
lwpid : 7653570
lwpid : 7653572
lwpid : 7653573
lwpid : 7653574
lwpid : 7653575
lwpid : 7653576
lwpid : 7653577
lwpid : 7653578
lwpid : 7653579

So basically I ended up with the following one-liner

ps -ef > out ;  ps -ef | awk 'NR > 1 { print $1 " " $2 }' | grep -v root |
 while read user pid ; do sudo pstack $pid | egrep -i "($pid|lwpid)"|
 sed -e 's,-*,,g' ; done >> out 2>&1

Which gives a quick and dirty indication of which process eats up all the resources and you can go ask that application owner if that is normal. It wasn’t!
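
If all you want is a ranking, counting the lwpid lines per process is enough. A sketch under a few assumptions: pstack prints one line containing lwpid per thread, as in the output above. The names count_lwps and rank_by_threads and the PSTACK_CMD override are mine, the latter only there so the function can be tried on a machine without pstack.

```shell
#!/bin/sh
# Count the lines mentioning lwpid in pstack output read from stdin.
count_lwps() {
    awk 'tolower($0) ~ /lwpid/ { n++ } END { print n + 0 }'
}

# Read "user pid" pairs on stdin and print "threads user pid",
# highest thread count first. PSTACK_CMD is a hypothetical override;
# by default the real command, sudo pstack, is used.
rank_by_threads() {
    while read -r user pid; do
        n=$(${PSTACK_CMD:-sudo pstack} "$pid" 2>/dev/null </dev/null | count_lwps)
        echo "$n $user $pid"
    done | sort -rn
}
```

Something along the lines of ps -ef | awk 'NR > 1 && $1 != "root" { print $1, $2 }' | rank_by_threads | head should then put the worst offenders at the top.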

Certified CSA – HP-UX 11i v3

Friday, June 18th, 2010

So I finally got around to getting certified in HP-UX. I passed with a score of 80% in 75 minutes at Atea using a standard Prometric test. I had hoped for a little more, but I was under a lot of pressure at work right up until the test, so I did not get to study as much as I wanted to.

I can recommend ‘HP Certified Systems Administrator – 11i V3, 3rd Edition’ by Asghar Ghori as a help in getting your CSA.

Next up is HP-UX CSE – High Availability.

Virtual interfaces under HPUX 10.20

Friday, January 22nd, 2010

To define one, do (almost) as you would on HPUX 11.00+

sudo vi /etc/rc.config.d/netconf
RARPD=0
INTERFACE_NAME[0]=lan2
IP_ADDRESS[0]=10.17.137.227
LANCONFIG_ARGS[0]=ether
SUBNET_MASK[0]=255.255.255.0
DHCP_ENABLE[0]=0

INTERFACE_NAME[1]=lan2
IP_ADDRESS[1]=10.17.137.226
LANCONFIG_ARGS[1]=ether
SUBNET_MASK[1]=255.255.255.0
DHCP_ENABLE[1]=0
sudo /sbin/init.d/net start

To remove it again

sudo ifalias lan2 del 10.17.137.226