Tuesday, September 16, 2014

Locked accounts and cronjobs on Solaris

In Solaris 10 and above, cron jobs stop running once an account is locked.  So Sun added a special kind of locking (passwd -N <account>) where the jobs still run but the account can't be logged into.  This is all well and good, except that if someone does try to log in 3 times and fails (which they always will, because it is impossible to enter a password that encrypts to the NP string), the password field goes from NP to *LK*NP.  Cron then treats the account as properly locked and the jobs stop running.  This behaviour varies with the release: the early Solaris 10 releases had NP accounts that locked, the later one I checked didn't, and in Solaris 11 NP accounts lock once again.
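The states described above can be told apart by the password field in /etc/shadow.  Here is a minimal sketch of that mapping; the function name classify_pw_field and its messages are made up for illustration, and the cron behaviour it reports is simply the behaviour described in this post, not something the function checks.

```sh
# Hypothetical helper: map a shadow password field to the cron
# behaviour described above. It only inspects the string it is given.
classify_pw_field() {
    case "$1" in
        NP)      echo "NP: login disabled, cron jobs still run" ;;
        '*LK*'*) echo "locked: cron jobs stop" ;;
        *)       echo "other: not locked" ;;
    esac
}

# Example use (as root), with unixlad as the account name:
#   classify_pw_field "$(grep '^unixlad:' /etc/shadow | cut -d: -f2)"
```

An NP account that has gone to *LK*NP falls into the second branch, which is exactly the case that stops the jobs.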

There is a simple solution in that case (usermod -K lock_after_retries=no <account>), but it only suits NP accounts.  If the account is supposed to have a usable password that you want locked after failed logins, then you're shooting yourself in the foot by turning off locking just to keep cron running.

I worked out a better solution: modify /etc/pam.conf.  For those accounts that are so important that their jobs should continue to run even if the account is locked, add them to a list on a line inserted before the existing cron entry:

cron   account sufficient      pam_sample.so.1 allow=unixlad,root
cron   account required        pam_unix_account.so.1

Basically the pam_sample.so.1 module is shipped as a sample so you can learn/debug PAM, or use it as a starting point for writing your own modules.  Because of the options it takes (man pam_sample), you can also use it to manipulate PAM by forcing it to return success or failure according to terms you set.

Thursday, March 20, 2014

How can I write shell code to run safely under cron?

One of the problems running shell (ksh/bash) scripts under cron is suddenly finding that your brilliant script no longer behaves the same under cron as it did from the shell.

There are a number of causes of this, and the most common are incorrect assumptions about the environment the script will run in.

When writing a script to run under cron the PATH may be different from your PATH in your shell.  Always set an explicit PATH at the beginning of the shell script.

e.g.
      PATH=/usr/bin:/usr/local/bin

You can put export at the start of the line if you want, but cron already exports a default PATH, so you only need to change its value.  On Solaris you may also need to set the search paths for loading dynamic libraries (such as LD_LIBRARY_PATH), if this is applicable.
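As a sketch of how the top of such a script might look: the PATH value is the one from the example above, while require_cmds is a made-up helper (not a standard command) that reports anything the script needs but cannot find in that PATH.

```sh
# Set an explicit PATH rather than trusting whatever cron provides.
PATH=/usr/bin:/usr/local/bin
export PATH

# Hypothetical helper: check that each named command resolves under
# the PATH set above. Returns 0 if all were found, 1 otherwise.
require_cmds() {
    missing=0
    for cmd in "$@"; do
        type "$cmd" >/dev/null 2>&1 || {
            echo "not found in PATH: $cmd" >&2
            missing=1
        }
    done
    return $missing
}
```

A script could then call something like require_cmds awk sed || exit 1 straight after setting PATH, so a missing command fails loudly at the start instead of halfway through the job.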

Further (and sometimes fatal) problems will arise if you have environment variables in your shell environment that do not exist at all when running under cron.

Consider this scenario in your script:

    cd $TMP; rm -rf *

If TMP is not defined, $TMP expands to nothing and cd with no argument changes to the home directory.  On older systems that use / as the home directory for root this can cause the entire filesystem tree to be deleted - something I have seen happen because of this exact problem in someone's startup script.  It is always a good idea for root to have its own home directory and not use / itself.

Always set the following option in your shell scripts:

    set -u

This will cause the expansion of an undefined variable to halt execution of the program.
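The difference is easy to see without deleting anything.  This sketch runs the cd from the scenario above in a child shell with TMP unset, once without set -u and once with it, and uses pwd in place of rm; HOME=/ is set only for the child shell to mimic root on an older system.

```sh
# Without set -u: $TMP expands to nothing, so "cd" runs with no
# argument and moves to $HOME, which is "/" here.
without=$(unset TMP; HOME=/ sh -c 'cd $TMP; pwd')
echo "without set -u, the next command would run in: $without"

# With set -u: the child shell aborts at the expansion of $TMP,
# so neither "cd" nor anything after it runs.
with=$(unset TMP; HOME=/ sh -c 'set -u; cd $TMP; pwd' 2>/dev/null)
with_status=$?
echo "with set -u, exit status: $with_status, output: '$with'"
```

The first case reports /, which is where the rm would have run.  The second produces no output and a non-zero exit status.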

By the way, both of these techniques should be used in all your scripts, regardless of whether they are to be run from the command line or cron.  Remember, the scripts you write may also be run by someone else and in these cases the environment may be sufficiently different to cause problems.

How to make tables from multiple files

      pr -t -m /tmp/a /tmp/b

This will read a line from file 'a' and place it in the first column, followed by a line from file 'b' placed in the second column.

More files result in more columns. Be careful with long lines as individual columns may be truncated.
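A small self-contained run, as a sketch: the directory name and file contents below are invented for the example, and the files are created under TMPDIR (or /tmp) rather than assumed to exist.

```sh
# Create two small sample files in a scratch directory.
dir=${TMPDIR:-/tmp}/pr_demo.$$
mkdir "$dir" || exit 1
printf 'alpha\nbeta\n' > "$dir/a"
printf '1\n2\n' > "$dir/b"

# Merge them side by side: -m prints one file per column,
# -t suppresses the page header and trailer.
table=$(pr -t -m "$dir/a" "$dir/b")
printf '%s\n' "$table"

rm -rf "$dir"
```

Each output line holds one line of 'a' followed by the matching line of 'b'.  If columns are being truncated, the -w option sets the total page width (the default is 72 characters), which gives each column more room.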



Solaris cat is Super-Fast!!!

A colleague asked someone at work to do a test of disk performance by cat-ing a 32GB file to /dev/null to determine why we had slow backups.  It only took a fraction of a second - and he wondered why that could be the case.  

So I looked into it.  Firstly I used:

dd if=/path/to/largefile > /dev/null 

to see if it exhibited the same behaviour as cat.  It didn't.  Then I truss-ed both cat and dd to find out what the difference was.  I could see the data as an argument to the write system call in both processes,  but it turns out that cat uses mmap to map chunks of the file into the process address space rather than using the read system call.

So why does this make it really quick to read the file?  Well, it doesn't.  I tried it this way:

cat /path/to/largefile | cat > /dev/null

Now it's much slower: the speed is far more in line with what you would expect for reading a large file from a disk array.

So what is going on?  Well, when you mmap a chunk of a file into a process it's not really reading the data, it's just making it available to page in on demand.  When the only thing you do with that data pointer is pass it to the write system call, and write is sending its output to /dev/null, the kernel just throws away the pointer.  Under normal circumstances, if the data were written to stdout or another file, it would be the write call that causes the data to be paged in.  But since write is doing nothing and just returning, the data is never paged in.

But, you say, if the data is not being read, how could we have seen it in the truss output?  Well, truss itself causes a small amount of each mmap-ed segment to be read from disk: a tiny amount compared to the size of the segment, just enough for truss to display a few bytes from the start of each block the pointer refers to.  The kernel pages in this small amount of data because it is truss that wants to display it; if we weren't truss-ing the cat command, the file wouldn't be read at all.  The slight slowdown that these few reads add to the run-time of cat is masked by truss printing its exhaustive list of the system calls cat makes.  So take away truss and yes, the process is very nearly instant.
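To repeat the comparison without truss in the way, the three reads can be lined up in one place.  This is only a sketch: timed and compare_reads are made-up names, the of=/dev/null and bs=1024k settings for dd are my own choice, and the elapsed time relies on the SECONDS variable of ksh and bash, so it only has whole-second resolution.  The near-instant result depends on the mmap behaviour of the Solaris cat described above; other cat implementations may show no difference.

```sh
# Hypothetical helper: run one command line against a file and report
# the elapsed whole seconds (SECONDS is provided by ksh and bash).
timed() {
    label=$1
    start=${SECONDS:-0}
    sh -c "$2" sh "$3"
    echo "$label: $(( ${SECONDS:-0} - start )) seconds"
}

# Run the three variants discussed above against the same file.
compare_reads() {
    timed "cat to /dev/null" 'cat "$1" > /dev/null' "$1"
    timed "dd to /dev/null" 'dd if="$1" of=/dev/null bs=1024k 2>/dev/null' "$1"
    timed "cat through pipe" 'cat "$1" | cat > /dev/null' "$1"
}
```

Call it as compare_reads /path/to/largefile.  Use a file much larger than memory, otherwise the later runs may be served from cache left behind by the earlier ones.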