After one week, the first real problem... and even a call on the SMoD phone!
Eric B calls at 19:00 to discuss high loads on many lxbatch nodes, as well as on voalice03. He discovers that the high loads are caused by jobs from the alicesgm user...
At 20:00, I bkill ~200 of these jobs and notify alice-support. As these are grid jobs, it is not easy to find out which end user submitted them.
Killing the jobs with bkill does not seem to make the nodes accessible again. Eric will ask Herve to reboot any machines that have not recovered by 21:00.
Patricia replies that she wants to have a look; Herve will reboot voalice03 and lxb7281 (both nodes are Alice VO boxes).
At last I know which ringtone has been chosen for the SMoD phone :)
Friday, 29 December 2006