/***************************************************************/
/* Short note on:                                              */
/* 1. how to effectively kill all processes of an application. */
/* 2. how to use ipcrm in order to remove shared memory,       */
/*    queues and semaphores.                                   */
/*                                                             */
/* Version: 0.1                                                */
/* Date   : 3/12/2010                                          */
/* By     : Albert van der Sel                                 */
/***************************************************************/

If an application on Unix suddenly crashes, "left over" shared memory identifiers, message queues, or semaphores may, in some circumstances, still exist. In exceptional cases the effect is that you can't start that application anymore. A reboot is then often the quick fix, since it evidently 'clears' memory in a rigorous way.

Normally, a well-behaved application removes these structures when it shuts down, and a shared memory segment that is already marked for removal disappears once the last attached process is gone. But one (or a few) processes are easily "overlooked" in the output of "ps -ef", unless you can readily identify each process of that application. As long as such processes are still running, some of those queues or segments will not be freed.
Still, sometimes the effect of "left over structures" may be witnessed even when it's certain that all processes of that application are gone: System V IPC objects that were never explicitly removed stay in the kernel until someone removes them (with ipcrm) or the system reboots.

But of course, you are not always willing, or able, to reboot the system! That's why we will review here some specific methods and commands that:

1. Make sure that we have killed all the processes of that application.
2. Look for a file that "sort of" signals that the application is still up, while it isn't, and thereby prevents you from starting the application again.
3. Remove structures that may remain in memory, using ipcrm.

1. How to make sure you have killed ALL processes of an application:
=====================================================================

Of course, using the commands below poses some risk.
If you are not very familiar with "kill" and "ps", then, if possible, first practise a bit on a test system before you try any of these commands on something of value.

==> 1.1 If an application (and all its processes) runs under one account:

Often, an application (and thus all of its processes) runs under a specific user account. In that case it's quite easy to identify all of its processes, and thus easy to "kill" all of them.

Suppose all processes run under the account "albert". To kill all of them, in most unixes, you might use:

# ps -ef | awk '$1 == "albert" {print $2}' | xargs kill -9

Matching on the first column (the owner) is safer than "grep albert", which would also catch the grep process itself and any process of another user that merely has "albert" in its command line.

On Linux, a variant is:

# ps -u albert | awk 'NR > 1 {print $1}' | xargs kill -9

Here the PID is in the first column, and NR > 1 skips the header line.

Keep in mind that "kill -9" (SIGKILL) gives a process no chance to clean up after itself, so it cannot remove its own shared memory, queues or semaphores. Where possible, try a plain "kill" (SIGTERM) first.

==> 1.2 Kill processes that run from "/dir/abc":

Suppose you are sure that the application "starts" from the directory "/dir/abc". To kill all of them, use:

# kill `ps -ef | grep /dir/abc | grep -v grep | awk '{print $2}'`

==> 1.3 The "killall" command:

Maybe your unix provides the "killall" command. If so, it can be used in the following way. Log on as the functional user of the application, that is, the account the application runs under. Then use the killall command, which terminates all processes under your account.

Note: 'killall' on unix is a bit different from 'killall' on most Linux distros. On Linux, killall (from the psmisc package) kills processes by name, and "killall -u albert" kills all processes owned by albert, so it can be very useful there too.

In most unixes, just use it as follows (when logged on as that specific user):

# killall

To check whether your unix has it, and what exactly it does there, use "man killall" or check the documentation. On Solaris, for example, killall is used by shutdown to kill all active processes. Perhaps your system has another equivalent command.

2. Pid files:
=============

Warning: the theory below describes a sort of flag file, often with the extension '.pid' (or another extension), whose purpose is to prevent the application from accidentally being started a second time.
But it's quite possible that your application uses such file(s) for an entirely different purpose (!).

Some application developers, or vendors, use a "trick" that prevents us from accidentally starting an application twice. When the application starts in the normal way, it places certain files in a certain directory. These files have no other purpose than to function as a "flag" meaning that the application is "up". Now, if somebody accidentally tries to start the application again, the second instance checks for the existence of those file(s). If they exist, the application refuses to start.

This doesn't sound bad at all. But suppose the application crashes, and thus has no way to remove those "flag" file(s). Then you have a problem starting the application afterwards. The solution then is to remove the flag file(s).

Quite often the files have a filename that ends with ".pid", but of course they might use other names as well. Still, if you suspect that this theory applies in your situation, it can't hurt to search for these special files. So, if the application is for example installed in "/apps/abc", you might try:

# cd /apps/abc
# find . -name "*.pid" -print

If found, take a look at the date/time of those files. Many applications also write their own process ID into such a file; if "ps -p" with that number shows no such process, the file is most likely a stale flag. If it's likely that those files are indeed flags, then (for safety reasons) just mv them to /tmp (or another place). Then try to start the application.

3. Using ipcs and ipcrm:
========================

This might be a somewhat "risky" topic. If possible, first practise a bit on a test system before you try any of the commands on production.

Shared memory, message queues and semaphores are all facilities for Inter Process Communication (IPC). As said in the introduction of this note, it's possible that shared memory identifiers, queues, etc. stick around in memory after an application has crashed. To clear those structures, use the versatile "ipcrm" command.
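Before removing anything, it can help to first see which shared memory segments really look orphaned. Below is a minimal sketch, assuming the Linux (util-linux) "ipcs -m" column layout (key, shmid, owner, perms, bytes, nattch, status); the function name orphan_shm_ids is just a hypothetical helper. It prints every segment to which no process is attached anymore (nattch is 0). Such a segment is a candidate for ipcrm, not proof: some software deliberately keeps a segment around while nothing is attached.

```shell
# Hypothetical helper: list shared memory segments with no attached processes.
# Assumes the Linux "ipcs -m" layout:
#   key  shmid  owner  perms  bytes  nattch  status
# On AIX or Solaris the columns differ, so the field numbers must be adjusted.
#
# Reads "ipcs -m" output on stdin and prints "shmid owner" for each such segment.
orphan_shm_ids() {
    # Data lines start with a hexadecimal key (e.g. 0x0052e2c1), which skips
    # the blank, title and header lines. $6 is the nattch column.
    awk '$1 ~ /^0x/ && $6 == 0 { print $2, $3 }'
}
```

For example: # ipcs -m | orphan_shm_ids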

Note: some unixes provide the 'ipcrmall' script, which usually clears ALL left over structures. Since it just clears everything, it can be used on a machine that must be "cleaned" completely after one application has crashed. But on multipurpose machines you should be a bit careful. Before using it, find out exactly how it works on your system.

- With "ipcs" you can only view semaphores, queues and shared memory segments (their IDs and keys).
- With "ipcrm" you can delete semaphores, queues and shared memory segments (by ID or by key).

==> Shortened syntax of ipcs:

ipcs [ -asmq ]

  -m   shared memory segments
  -q   message queues
  -s   semaphore arrays
  -a   all

==> Shortened syntax of ipcrm (Unix/Linux):

ipcrm [ -M key | -m id | -Q key | -q id | -S key | -s id ] ...

Purpose: removes message queue, semaphore set, or shared memory identifiers.

  -m SharedMemoryID     -M SharedMemoryKey
  -q MessageQID         -Q MessageQKey
  -s SemaphoreID        -S SemaphoreKey

Some unixes also accept the following syntax; check "man ipcrm" before relying on it:

ipcrm -r {-q|-m|-s} Name
ipcrm -r -u [-o Owner] [-g Group]

Some examples:
--------------

Example 1: an application runs under the account 'albert':

# for i in `ipcs -s | grep albert | awk '{print $2}'` ; do ipcrm -s $i; done

Example 2: a small script that removes all shared memory segments of the owner given as its parameter. It assumes that the owner is in column 5 of the "ipcs -m" output, as on AIX and Solaris; on Linux the owner is in column 3.

if [ $# -ne 1 ]
then
    echo "Usage: $0 <SHM owner>"
    exit 1
fi

for mem in $(ipcs -m | awk -v owner="$1" '$5 == owner {print $2}')
do
    ipcrm -m "${mem}"
    if [ $? -eq 0 ]
    then
        echo "Shared memory ${mem} removed"
    else
        echo "Could not remove shared memory ${mem}; probably it didn't exist in the first place"
    fi
done

Just a few things that might give you ideas:
--------------------------------------------

# ipcs -m | awk -v u="$USER" '$0 ~ u { print $2 }' | xargs -n 1 ipcrm -m
# ipcs -s | awk 'NR > 3 && NF {print "ipcrm -s", $2}' | sh
# ipcs -s | awk '$3 == "apache" {print $2}' | while read i; do ipcrm -s $i; done

The first removes the shared memory segments of the current user ($USER is passed with -v, because the shell does not expand it inside single quotes). The second removes every semaphore listed (Linux ipcs prints three header lines), so use it with care. The third removes the semaphores owned by "apache" (Linux column layout).
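The examples above can be combined into one small helper. Below is a minimal sketch, assuming the Linux "ipcs" layout (id in column 2, owner in column 3); on AIX or Solaris the owner is in column 5, so set OWNER_FIELD=5 there. The names ipc_ids_for_owner, remove_ipc_for_owner, OWNER_FIELD and DRY_RUN are hypothetical. By default it only prints the ipcrm commands it would run, so you can review them before removing anything for real.

```shell
# Hypothetical helpers: remove every IPC object (shared memory segment,
# message queue, semaphore set) owned by one account.
# Assumptions: "ipcs" prints the id in column 2 and the owner in column
# OWNER_FIELD (3 on Linux; on AIX/Solaris use OWNER_FIELD=5).
# With DRY_RUN=1 (the default) the ipcrm commands are only printed.

OWNER_FIELD=${OWNER_FIELD:-3}
DRY_RUN=${DRY_RUN:-1}

# Reads "ipcs -m", "ipcs -q" or "ipcs -s" output on stdin and prints
# the ids of the objects owned by $1 (header lines never match).
ipc_ids_for_owner() {
    awk -v owner="$1" -v f="$OWNER_FIELD" '$f == owner { print $2 }'
}

# For each IPC type, prints (dry run) or executes one ipcrm per object of $1.
remove_ipc_for_owner() {
    owner=$1
    for type in m q s; do
        ipcs -"$type" | ipc_ids_for_owner "$owner" | while read -r id; do
            if [ "$DRY_RUN" = 1 ]; then
                echo "ipcrm -$type $id"
            else
                ipcrm -"$type" "$id"
            fi
        done
    done
}
```

For example, "# remove_ipc_for_owner albert" shows what would be removed for albert; after reviewing that list, set DRY_RUN=0 and run it again to really remove the objects.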