Friday, 29 December 2006

Action! alicesgm jobs causing high loads

After one week, the first real problem... And even a call on the SMoD phone!

Eric B calls at 19:00 to discuss high loads on many lxbatch nodes, as well as on voalice03. He discovers that the high loads are caused by jobs from the alicesgm user...

At 20:00, I bkill ~200 of these jobs, and notify alice-support. As these are grid jobs, it is not easy to find out which end-user has submitted the jobs.
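For anyone facing a similar clean-up, here is a minimal sketch of how the job IDs could be picked out of `bjobs`-style output and turned into batched `bkill` command lines. It is not what was actually typed that evening. It assumes the default LSF `bjobs` column order (JOBID, USER, STAT first), the sample rows and the queue name are made up, and the helper names `select_job_ids` and `build_bkill_commands` are hypothetical. It only builds the command lines; it does not run them.

```python
def select_job_ids(bjobs_output, user, states=("RUN",)):
    """Return the job IDs in bjobs-style text owned by `user` and in one of `states`."""
    job_ids = []
    for line in bjobs_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row; plain job IDs are all digits.
        # (Array elements such as 1234[5] are not handled in this sketch.)
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        job_id, owner, state = fields[0], fields[1], fields[2]
        if owner == user and state in states:
            job_ids.append(job_id)
    return job_ids


def build_bkill_commands(job_ids, batch_size=50):
    """Split job IDs into batches and return one bkill argument list per batch."""
    return [
        ["bkill"] + job_ids[i:i + batch_size]
        for i in range(0, len(job_ids), batch_size)
    ]


# Made-up sample in the default bjobs layout (assumption, not real data).
SAMPLE = """\
JOBID   USER      STAT  QUEUE       FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
1001    alicesgm  RUN   grid_alice  voalice03   lxb0001     job_a      Dec 29 18:40
1002    alicesgm  PEND  grid_alice  voalice03               job_b      Dec 29 18:41
1003    someone   RUN   grid_other  lxplus001   lxb0002     job_c      Dec 29 18:42
1004    alicesgm  RUN   grid_alice  lxb7281     lxb0003     job_d      Dec 29 18:43
"""

if __name__ == "__main__":
    ids = select_job_ids(SAMPLE, "alicesgm")
    for command in build_bkill_commands(ids, batch_size=50):
        print(" ".join(command))
```

Batching keeps each command line short and makes it easy to stop halfway if the wrong jobs turn out to be selected.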

It seems that bkill-ing the jobs does not cure the inaccessibility of the nodes. Eric will ask Herve to reboot the machines that do not recover before 21:00.

Patricia replies that she wants to have a look. Herve will reboot voalice03 and lxb7281 (both nodes are Alice VO boxes).

Finally I know what ring-tone has been chosen for the SMoD-phone :)
