Use OMS (Log Analytics) to monitor and send alerts for Bluescreen of Death

At times there is a driver or two that’s misbehaving and causing bluescreens. As the server automatically reboots after dumping memory to the memory.dmp file, you might not get a report from your users that there has been a problem. And depending on your monitoring tool, you might not get an alert there either. Operations Manager can easily alert you for things like that, but far from all customers use OpsMgr due to its complexity. Luckily, it’s just a one-minute job to get an alert in OMS when you have had a bluescreen! And as OMS can be run in Free mode, you may be able to monitor your servers for free (depending on the amount of data you collect). Otherwise it’s really cheap, so it’s no big deal if you need to use a standard subscription. Anyway, let’s get to the technical stuff!

First of all, configure OMS to collect the System event log, including all Error messages.

[Screenshot: omserrordata]

Then create an alert like this:

[Screenshot: oms_bsod]

The Alert text to be used is:

That will only alert on crashes. You can also enable an alert for Event ID 6008, which fires on an unexpected shutdown. The difference is that my alert only fires when there was a BSOD, while an unexpected-shutdown alert would also fire if someone pulled the power. You could even combine both into one alert with an OR statement. In my case, I just want to get alerted about BSODs, so that’s the only thing I look for right now.
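As a rough illustration of the difference between the two alerts, here is a small Python sketch of the filtering logic. It is not the OMS query itself. It assumes that a bugcheck shows up in the System log as event ID 1001 (the post only names event ID 6008), and the `SystemEvent` record and the function names are made up for this example.

```python
from dataclasses import dataclass

# Assumption: a BSOD is recorded in the System log as event ID 1001.
# Event ID 6008 (unexpected shutdown) is the one named in the post.
BUGCHECK_EVENT_ID = 1001
UNEXPECTED_SHUTDOWN_EVENT_ID = 6008


@dataclass
class SystemEvent:
    """Hypothetical, simplified view of a collected event record."""
    computer: str
    log: str        # e.g. "System" or "Application"
    event_id: int


def is_bsod(event: SystemEvent) -> bool:
    # Restrict to the System log, since other logs may reuse the same ID.
    return event.log == "System" and event.event_id == BUGCHECK_EVENT_ID


def is_unexpected_shutdown(event: SystemEvent) -> bool:
    return event.log == "System" and event.event_id == UNEXPECTED_SHUTDOWN_EVENT_ID


def should_alert(event: SystemEvent, include_unexpected_shutdown: bool = False) -> bool:
    """BSOD only by default; optionally OR in unexpected shutdowns."""
    if is_bsod(event):
        return True
    return include_unexpected_shutdown and is_unexpected_shutdown(event)
```

With the default setting only the bugcheck event triggers an alert. Passing `include_unexpected_shutdown=True` corresponds to the combined OR variant, which would also catch a pulled power cable.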

Specify how often it should check. There is usually no need to check more than once or twice an hour. And finally, define whether it should send an email alert or use one of the other alert methods.

Easy as that! Next time you get a bluescreen on a server, you will get an alert by mail so you can debug the dump and find out what’s causing it.

It will look like this:

[Screenshot: bsodmail]

Bugcheck: DRIVER_POWER_STATE_FAILURE (9f)

I experienced a Bluescreen of Death (BSOD) on my Windows 8 Laptop (HP EliteBook 8560w) this morning when it resumed from Hibernate.
I quickly launched WinDBG and opened the crashdump.

WinDBG managed to find the driver that caused this problem by itself this time. But if WinDBG had not been able to show me the faulty driver, the next step would have been to use the Bugcheck info (0x0000009f) to dig further into this:

The last argument is the interesting one, and the one we should look into further with the !irp command.

It will show something similar to this. And it’s the e1c63x64.sys driver that was active at the time of the bluescreen. That is the same info that !analyze -v managed to figure out by itself.
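The step from the bugcheck arguments to the !irp command can be sketched in a few lines of Python. This assumes the summary line WinDBG prints when loading a dump has the form `BugCheck 9F, {arg1, arg2, arg3, arg4}`. The helper names are hypothetical, and the addresses in the usage example are made up rather than taken from my dump.

```python
import re

# Assumed format of the WinDBG summary line: "BugCheck 9F, {a, b, c, d}"
BUGCHECK_LINE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def parse_bugcheck(line: str) -> tuple[int, list[str]]:
    """Return the bugcheck code and its arguments (as hex strings)."""
    match = BUGCHECK_LINE.search(line)
    if match is None:
        raise ValueError("no bugcheck summary found in line")
    code = int(match.group(1), 16)
    args = [arg.strip() for arg in match.group(2).split(",")]
    if len(args) != 4:
        raise ValueError("expected four bugcheck arguments")
    return code, args


def irp_command(line: str) -> str:
    """Build the !irp command from the last argument of a 0x9F bugcheck."""
    code, args = parse_bugcheck(line)
    if code != 0x9F:
        raise ValueError("not a DRIVER_POWER_STATE_FAILURE (0x9F) bugcheck")
    return f"!irp {args[-1]}"


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    sample = "BugCheck 9F, {3, fffffa800a1c2060, fffff80000b9c4d8, fffffa8011b0b010}"
    print(irp_command(sample))
```

The resulting string is what you would type at the WinDBG prompt. The sketch only handles 0x9F, because other bugcheck codes give their arguments different meanings.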

Hmm, so what driver is that?

[Screenshot: intel_driver1]

Too bad that it was unable to provide more detailed information. But some old-school file properties of the \SystemRoot\system32\DRIVERS\e1c63x64.sys file gave this:

And a quick search on Intel’s support site showed that there was a newer version available for my NIC, the Intel(R) 82579LM Gigabit Network Connection.

Driver updated, and hopefully no more bluescreens due to this driver bug.