Thursday, May 17, 2007

Why?

This is a work rant. Proceed with caution.

An issue came up Tuesday evening with a piece of middleware that our customer uses. It is not supported by us but by another company, which just happens to be a major competitor of ours. The middleware was not processing outbound messages, although inbound messages seemed to be working very well. The issue was reported by one customer, who stated that it was the same issue that had happened the previous week, and then gave the help desk the old ticket number. Armed with this knowledge, the help desk called the Incident Mgmt team to request that this be handled as a high severity case. It was then assigned to the team that had "fixed" the issue the week before, and they were notified. The team quickly saw that it was not the same issue. As a matter of fact, the issue would have to be handled by another group. The team let the Inc Mgmt team know, and the IM team contacted the "responsible" group to tell them that they had a high sev ticket that needed attention.

The following morning, the "responsible team" rejected the ticket, stating that they felt the hardware (which is my company's responsibility) should be rebooted. This is an old trick they have used before when they do not feel like troubleshooting. Hoops are jumped through, red tape is untangled, and we get the server rebooted. Now, I was very skeptical that the server needed rebooting, because it is a rather new Sun (Solaris) server...rarely does a Sun box need rebooting. Kill and restart processes, sure...that needs to be done on any OS. But Unix is a pretty darn stable OS. We have workstations in my office that have been on for tens of months without rebooting.

Okay, the server has been rebooted. The customer was contacted and, guess what, the problem is not fixed. Surprise. The Inc Mgmt team is scrambling to get people to look into the issue, and still NO ONE ELSE has reported a problem. The middleware has several interfaces, and normally a lot of wailing and gnashing of teeth occurs when something like this goes down. Our competitor is contacted again and they tell us they will look into it. Our hopes are not very high at this point...the clock has been ticking, and they have about 2 hours to resolve it or it will miss the target resolution time.

Inc Mgmt calls the tech to check on it and is informed that another team is looking into the issue in parallel with the competitor. It seems that this middleware is deeply tied to an SAP system module that is acting up. The team only took this seriously after another messaging service that uses the same middleware was reported as not working. Since we had the SAP team looking into this, we were sure that a solution would be found soon.

At 5:00 PM I decided to go home, but planned on checking the status once I got home and logged on to my laptop. When I got home and had taken care of the dogs, I logged in and noticed that all the messaging was now working. The only real complaint was the speed: things were moving very slowly. That is understandable, since a lot of messages were most likely queued up. Since inbound and outbound messaging was working, I felt safe going to Care Group with my wife.

Care Group goes well, but my wife and I are dog tired. We leave a bit early, go home, walk the dogs, and go to bed. We both wake up early, but only she gets up. I stay in bed and drift in and out of a dreamless sleep until my alarm goes off. I get up, go through my normal morning routine, and go to work.

I arrive at work, check on the issue, and notice no new notes. I check my e-mail and see that other things are happening related to the issue, but they are not what the original complaint was about. The case has gotten very messy, and I am confused as to why the original case is still open when the original complaint was resolved.

It appears that the ORIGINAL complaint was that our customer was not receiving any inbound messages. After that was fixed, they had trouble with outbound messaging (a different issue, but on the same ticket...clock still ticking). That was fixed, and then the customer complained about slowness. That is normal and should clear up in 12 hours or so. Now I see that a NEW complaint has been added TO THE SAME TICKET: a partner of our customer cannot see the portal to the middleware. To my knowledge, the messages (inbound and outbound) are processing, but the interface they use to check on them is down...a completely separate issue.

This case gets more complicated the more we dig into it, and it is starting to make me a little angry. Why are the processes not being followed? I am on my way to the Inc Mgmt team's office to check on the status of the issue. I am most likely going to have to get medieval on someone...

*** Update ***

After my meeting with the Inc Mgmt team, we gleaned that there were 4 separate issues, 3 of which were dumped on one ticket. That ticket may have missed its deadline, and if it did, the group responsible for supporting the application heavily influenced that. I plan on getting to the bottom of this early next week. This crap has to stop.

A second ticket was opened for the front-end interface that our customer uses to check on submitted jobs. It is failing intermittently, and the tech thinks the cause is a corrupt database. The whole interface is too complex for me to wrap my head around, and I hope the teams can get this sorted out. It is not a show stopper, but without this interface, the customer and their partners cannot check the status of transmittals.
