Thursday 4 January 2007

Action! Host-Certificate-wrong alarms on Castor-2 diskservers

Carlos C calls about Host-Certificate-wrong alarms that the operators received this afternoon. This is the first time these alarms have ever fired...

This alarm is generated on Castor diskservers when the host certificate will expire in less than 2 weeks. It is a 'last resort', as the service manager has already been contacted by other means well before. In this case, the service manager has not acted (it's Christmas time!), and the expiry date of January 18 is getting close :)
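The "less than 2 weeks" rule can be sketched as follows. This is only an illustration, not the actual sensor code: the function names, the exact 14-day threshold and the expiry time of day are assumptions, and the input format is the `notAfter=...` line printed by `openssl x509 -enddate -noout`.

```python
from datetime import datetime, timedelta

# Assumption: "less than 2 weeks" means strictly less than 14 days.
ALARM_WINDOW = timedelta(days=14)


def parse_not_after(line):
    """Parse a line like 'notAfter=Jan 18 12:00:00 2007 GMT'.

    The month name is parsed with %b, so this assumes an English/C locale.
    """
    value = line.strip().split("=", 1)[1]
    if value.endswith(" GMT"):
        value = value[: -len(" GMT")]
    return datetime.strptime(value, "%b %d %H:%M:%S %Y")


def certificate_alarm(not_after, now):
    """Return True when the certificate expires within the alarm window."""
    return not_after - now < ALARM_WINDOW


# Hypothetical expiry time of day; the post only gives the date (January 18).
expiry = parse_not_after("notAfter=Jan 18 12:00:00 2007 GMT")
print(certificate_alarm(expiry, datetime(2007, 1, 4, 15, 0, 0)))
print(certificate_alarm(expiry, datetime(2006, 12, 22, 12, 0, 0)))
```

With these inputs the first check falls inside the window (about 13.9 days left) and the second does not (27 days left).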

Thanks to our new host-certificate-manager script (integrated in PrepareInstall) it is simple to fix this:


PrepareInstall --noks --noaims --hostcert lxfsra{060{3,4,5},080{1,2,3,6,7,8}}
wash root@lxfsra060\[3,4,5],lxfsra080\[1-3,6-8] ccm-fetch \; ncm-ncd --co sindes
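
The two commands above spell the same node list in two notations: shell brace expansion for PrepareInstall and bracketed lists and ranges for wash. A small cross-check that both cover the same nine diskservers might look like this. The helper names are made up for the illustration, and the bracket ranges are assumed to be inclusive.

```python
def expand_ranges(spec):
    """Expand '1-3,6-8' into [1, 2, 3, 6, 7, 8], assuming inclusive ranges."""
    numbers = []
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            numbers.extend(range(int(low), int(high) + 1))
        else:
            numbers.append(int(part))
    return numbers


def expand_hosts(groups):
    """Build host names from a mapping of prefix to numeric suffixes."""
    return [f"{prefix}{n}" for prefix, suffixes in groups.items() for n in suffixes]


# Suffixes as written in the PrepareInstall brace expression.
brace_hosts = expand_hosts({
    "lxfsra060": [3, 4, 5],
    "lxfsra080": [1, 2, 3, 6, 7, 8],
})

# Suffixes as written in the wash bracket expression.
bracket_hosts = expand_hosts({
    "lxfsra060": expand_ranges("3,4,5"),
    "lxfsra080": expand_ranges("1-3,6-8"),
})

print(len(brace_hosts), "hosts:", " ".join(brace_hosts))
print("same list in both notations:", brace_hosts == bracket_hosts)
```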


Even though the alarm was merely a warning, it had a nasty side effect: the nodes were disabled by the CastorFilesystemConfiguration.pl cronjob, which runs every ten minutes on the rmmaster nodes. High time to get rid of this cronjob...

Tuesday 2 January 2007

Action! Mail flood coming from lxfsrk421

Frederic Hemmer forwards:

From: SmtpMonitorSink
Sent: Tuesday, January 02, 2007 8:33 AM
To: exchange-service (Exchange service list)
Subject: CERNMX06: Flood blocked !

CERNMX06:
Flood from: 128.142.169.11 in scope InternalIpOutgoing-ByIp blocked !

This is an automatic information email, do not reply.

It seems that the machine is trying to send mails to byniek.zb@wp.pl, and that this started to fail at 7:15 this morning. In /var/log/maillog there are plenty of records like this one:


Jan 2 09:31:57 lxfsrk421 sendmail[3830]: l028Vqt6003828: to=, delay=00:00:05, xdelay=00:00:05, mailer=relay, pri=30478, relay=cernmxlb.cern.ch. [137.138.166.163], dsn=5.7.1, stat=User unknown
Jan 2 09:31:57 lxfsrk421 sendmail[3830]: l028Vqt6003828: l028Vvt6003830: DSN: User unknown
Jan 2 09:31:57 lxfsrk421 sendmail[3830]: l028Vvt6003830: to=, delay=00:00:00, xdelay=00:00:00, mailer=relay, pri=31502, relay=cernmxlb.cern.ch., dsn=4.0.0, stat=Deferred: Connection reset by cernmxlb.cern.ch.


I stop sendmail on lxfsrk421 at 9:30, and notify PDB.Service@cern.ch.
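
To get a quick picture of how the deliveries are failing, records like the ones quoted above can be tallied by their `dsn=` and `stat=` fields. This is a rough sketch that assumes the record layout shown in the excerpt. The function name is made up, and the sample lines are copied from above (including the empty `to=` field).

```python
import re
from collections import Counter

# Matches the trailing 'dsn=<code>, stat=<text>' part of a sendmail record.
STATUS_RE = re.compile(r"dsn=(?P<dsn>\d\.\d+\.\d+), stat=(?P<stat>.*)$")

SAMPLE = [
    "Jan 2 09:31:57 lxfsrk421 sendmail[3830]: l028Vqt6003828: to=, delay=00:00:05, "
    "xdelay=00:00:05, mailer=relay, pri=30478, relay=cernmxlb.cern.ch. "
    "[137.138.166.163], dsn=5.7.1, stat=User unknown",
    "Jan 2 09:31:57 lxfsrk421 sendmail[3830]: l028Vqt6003828: l028Vvt6003830: "
    "DSN: User unknown",
    "Jan 2 09:31:57 lxfsrk421 sendmail[3830]: l028Vvt6003830: to=, delay=00:00:00, "
    "xdelay=00:00:00, mailer=relay, pri=31502, relay=cernmxlb.cern.ch., dsn=4.0.0, "
    "stat=Deferred: Connection reset by cernmxlb.cern.ch.",
]


def tally_delivery_status(lines):
    """Count records per (dsn code, status text); lines without dsn= are skipped."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line.rstrip("\n"))
        if match:
            counts[(match.group("dsn"), match.group("stat"))] += 1
    return counts


for (dsn, stat), count in tally_delivery_status(SAMPLE).most_common():
    print(count, dsn, stat)
```

On a real log the same function could be fed the open file object for /var/log/maillog, since it only iterates over lines.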

Friday 29 December 2006

Action! alicesgm jobs causing high loads

After one week, the first real problem... And even a call on the SMoD phone!

Eric B calls at 19:00 to discuss high loads on many lxbatch nodes, as well as on voalice03. He discovers that the high loads are caused by jobs from the alicesgm user...

At 20:00, I bkill ~200 of these jobs, and notify alice-support. As these are grid jobs, it is not easy to find out which end-user has submitted the jobs.

It seems that bkill-ing the jobs does not cure the inaccessibility of the nodes. Eric will ask Herve to reboot the machines that do not recover before 21:00.

Patricia replies that she wants to have a look; Herve will reboot voalice03 and lxb7281 (both nodes are Alice VO boxes).

Finally I know what ring-tone has been chosen for the SMoD-phone :)

Sunday 24 December 2006

My fellow humans received a package

It's Christmas day, and all is foggy and peaceful.

The crabs were excellent yesterday, as were the Tokay pinot gris and the dill schnaps. And our Christmas lunch was very good as well; a mini-julbord, with gravad lax, janssons, rödkål, äggröra, and köttbullar for Kajsa. We left the sill untouched for now...
Tonight there will be chapon, aux marrons et potiron, and tomorrow we are invited to Laila and Michael's for an annandags julbord.

So maybe this blog is about food and drink after all. Because all seems quiet in the Computer Center as well. So far, the operators have sent a few mails about filesystem errors and the like on Castor diskservers (which Eric Bonfillou should be handling), but that's it. No Castor LSF meltdowns, no Maarten Litmaath P.I., no user questions, no nothing.

Back to the glögg and the lusekatten.

Small update: the chapon was excellent, although next time we will use sugar to caramelize the marrons. We drank a Volnay 'les Caillerets' premier cru, 1995, Jean Boillot & Fils, one of the bottles from Wim, Olivier, Africa and Francois.

Friday 22 December 2006

Vanity of vanities

Voila, c'est parti.

The Christmas break has arrived, two freshly cooked crabs are cooling, the gravad lax is halfway done, as is a bottle of Cerdon. All perfectly good things to scribble down in a blog.

So, this blog is about my diet then? Well, no. But I am not sure what it is about. Let's see... It is very likely that it will be boring, because I intend to keep track of work-related stuff here. You see, in the coming two weeks I will be on call for many of the services in the CERN Computer Center. In principle these services run themselves, but in practice they need lots of TLC.

I will therefore keep track of the support calls I receive, and of how I handle them. This should give an idea of the maturity of our services, and of the quality of our procedures and workflows. I think we should organize a post-mortem with the other support people, to compare notes.

But let's not forget that the Christmas break has started. Or, in the words of Nikos: C'est parti!