From 3b0142cedcde39e4c2097ecd916a870a3ced5ec6 Mon Sep 17 00:00:00 2001
From: Vito Graffagnino
Date: Tue, 8 Sep 2020 18:10:49 +0100
Subject: Added the relevant parts of the .config directory. Also add ssh config

---
 vimwiki/Nodes.md | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
 create mode 100644 vimwiki/Nodes.md

(limited to 'vimwiki/Nodes.md')

diff --git a/vimwiki/Nodes.md b/vimwiki/Nodes.md
new file mode 100644
index 0000000..4555781
--- /dev/null
+++ b/vimwiki/Nodes.md
@@ -0,0 +1,24 @@
+__Ganglia__ (https://uhhpc.herts.ac.uk/ganglia/) can be useful to see the state of nodes.
+
+If a node goes down while a user’s job is running on it, the job will not terminate properly
+and may flood the user’s inbox with notifications. If `Ganglia` or `showstate` reports that a
+node is down, consider rebooting it with
+
+`sudo rebootnode.pl nodexxx`
+
+This will prompt you for the iDRAC password, which is `rianhs4b`. Once a node has been rebooted,
+wait a few minutes, then check that you can ssh into it as a normal user and view your home
+directory and /beegfs. If so, bring it back online with
+
+`sudo pbsnodes -c nodexxx`
+
+If a node is misbehaving and you don’t want to/can’t reboot it, you can temporarily remove it
+from the pool used by the job control system with
+
+`pbsnodes -o nodexxx`
+
+This is also reversed by
+
+`pbsnodes -c nodexxx`
+
+
-- 
cgit v1.2.3
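
The reboot-and-verify procedure described in `Nodes.md` could be wrapped in a small shell helper. This is only a sketch: the names `run`, `recover_node`, `DRY_RUN` and `WAIT_SECS` are hypothetical and not part of the commit, the 300-second default is an assumed reading of "wait a few minutes", and the flags are written with plain ASCII hyphens. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: reboot a down node, verify it, then return it to the pool.
# DRY_RUN defaults to 1, so commands are printed rather than executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover_node() {
    node=$1
    # Reboot the node; this prompts for the iDRAC password.
    run sudo rebootnode.pl "$node" || return 1
    # Give the node time to come back up (assumed default: 300 seconds).
    run sleep "${WAIT_SECS:-300}"
    # As a normal user, check that the home directory and /beegfs are visible.
    # $HOME is single-quoted so that it expands on the remote node.
    run ssh "$node" 'ls "$HOME" /beegfs >/dev/null' || return 1
    # Only if the checks passed, bring the node back online.
    run sudo pbsnodes -c "$node"
}
```

A dry run such as `recover_node node001` (where `node001` is a placeholder node name) prints the four commands in order. Setting `DRY_RUN=0` would execute them, and the node is left untouched by `pbsnodes -c` if the ssh check fails.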