__Ganglia__ (https://uhhpc.herts.ac.uk/ganglia/) can be useful to see the state of nodes. 

If a node goes down while a user’s job is running on it, the job will not terminate properly 
and may flood the user’s inbox with notifications. If `Ganglia` or `showstate` reports that a node 
is down, consider rebooting it with 

`sudo rebootnode.pl nodexxx` 

This will prompt you for the iDRAC password, which is `rianhs4b`. Once the node has rebooted, 
wait a few minutes, then check that you can `ssh` into it as a normal user and list your home 
directory and `/beegfs`. If so, bring it back online with 

`sudo pbsnodes -c nodexxx` 
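
The post-reboot check described above can be sketched as a small shell helper. This is only an illustration, not a script that exists on the cluster: the function name `check_node` and the `SSH_CMD` variable are hypothetical, and the sketch assumes you can reach the node as a normal user without a password prompt.

```shell
#!/bin/sh
# Hypothetical helper: check a rebooted node before returning it to the pool.
# SSH_CMD is an assumed override so the check can be dry-run without a node.
SSH_CMD="${SSH_CMD:-ssh -o ConnectTimeout=10}"

check_node() {
    node="$1"
    # Both the home directory and /beegfs must be listable as a normal user.
    if $SSH_CMD "$node" 'ls "$HOME" >/dev/null && ls /beegfs >/dev/null'; then
        echo "$node OK: safe to run: sudo pbsnodes -c $node"
    else
        echo "$node FAILED: leave it offline and investigate"
        return 1
    fi
}
```

Run it as `check_node nodexxx`. The helper only reports; clearing the node with `pbsnodes -c` is left as a deliberate manual step.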

If a node is misbehaving and you don’t want to reboot it (or can’t), you can temporarily remove it 
from the pool used by the job control system with 

`pbsnodes -o nodexxx` 

Taking a node offline this way is also reversed by 

`pbsnodes -c nodexxx`