Yesterday was one of those moments in IT when you are called into action with little to no idea of what is going on. There is no magic book or class that can prepare you for a debacle like the one we experienced yesterday. The morning was going fine until I received a report from a user that their machine was going to shut down within 60 seconds because of a DCOM server error on their PC. I didn't think much of it until I got a second report from a different user. I checked BigFix to make sure that nothing being pushed to the organization might be causing these issues and found that nothing was being rolled out. I then went to physically investigate a single PC and immediately saw a McAfee memory error. I shut down our ePolicy Orchestrator server and started trying to deal with the fallout.
The affected machines had only about five services running, no internet or network access, and no access to the Start menu or other GUI functions.
After some quick investigating, I found that the file c:\windows\system32\svchost.exe was 0 bytes and had been modified that morning. I concluded that McAfee must have been wiping this file, so I ran:
dir /s svchost.exe
to locate another copy of the svchost.exe file on the machine. I found two located in:
c:\windows\system32\dllcache
c:\windows\servicepackfiles\i386
I copied one of these over the 0-byte file, started the Windows Installer service and then uninstalled McAfee. Once rebooted, the machine worked okay. I made some notes and sent them out to our IT team to get them going. This procedure was in place by 10:30am EST, after our initial report at 9:57am EST.
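For anyone who would rather script the copy step than do it by hand, here is a rough sketch in Python. It is an illustration only, not what was run on the day: the function names `find_good_copy` and `restore_if_empty` are made up for this example, it assumes Python is available on the machine (or that the drive is mounted from a working one), and it does not start the Windows Installer service or uninstall McAfee.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def find_good_copy(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists and is not empty."""
    for path in candidates:
        if path.is_file() and path.stat().st_size > 0:
            return path
    return None


def restore_if_empty(target: Path, candidates: Iterable[Path]) -> bool:
    """Overwrite a 0-byte target with a non-empty backup copy.

    Returns True if the file was restored, False if nothing was done.
    """
    if not target.is_file() or target.stat().st_size != 0:
        return False  # missing or already non-empty: leave it alone
    source = find_good_copy(candidates)
    if source is None:
        return False  # no usable backup found
    shutil.copy2(source, target)
    return True


if __name__ == "__main__":
    # Paths taken from the post; adjust if Windows lives elsewhere.
    windows = Path(r"c:\windows")
    restored = restore_if_empty(
        windows / "system32" / "svchost.exe",
        [
            windows / "system32" / "dllcache" / "svchost.exe",
            windows / "servicepackfiles" / "i386" / "svchost.exe",
        ],
    )
    print("restored" if restored else "no change")
```

The script deliberately does nothing when svchost.exe is already non-empty, so it is safe to run on a machine that was never hit.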
Compared to others, we were very fortunate. Because we disabled the updates so quickly, we avoided a complete meltdown; approximately 240 machines were affected in total. Using faxes, emails, phone calls, and some contractors, we were able to get 30+ field sites back up and running, including two corporate headquarters. No servers were affected, and the users whose machines we kept from being hit were able to continue working.
Looking back on the events of yesterday, I am pleased with how well everyone worked together and kept their cool. You don't often see that in a time of crisis, but the IT department here did a fantastic job.
Obviously we will not be using McAfee anymore after yesterday, and I will never recommend them to anyone in the future. How this passed any type of quality assurance testing is beyond me, no matter what BS they may come up with in a press release to try to offload some of the responsibility for this disaster. I have some meetings today to look at other antivirus products and am looking forward to not having to go through this again.
