I guess there is a first time for everything, but this is not something you want to experience. Today, for the first time since I started using EqualLogic SANs, one rebooted in the middle of the day. The logs indicated a CPU kernel panic, and the controller modules failed over as expected. We are running firmware 5.1.1, and I have sent the diagnostics off to support to see what they determine the root cause to be. These SANs have been rock-solid for us up to this point, and I am hoping that this is just a one-off incident. As I get more information back from support I will update this post. Hopefully I will get more information than just "upgrade your firmware to 5.1.2" - which I plan to do this weekend.
Update: Dell got back in touch with me and wants to swap out the faulty controller and also a drive that showed errors in the diagnostics. I should have the parts within 4 hours and will see if this problem occurs again.
Got the new controller and drive delivered on time. I installed the new controller, and approximately two hours later the SAN failed over to it after complaining, again, that communication had been lost between the two controllers. No problems have occurred since that event, but I am still concerned about what may be going on.
Rick Mitchell's Stuff - RMSBlog
Ramblings of a sys admin who has been working with networks large and small since 1997. Current obsession - building the ideal private cloud for enterprise use.
Tuesday, January 3, 2012
Friday, December 23, 2011
GFI FaxMaker document conversion problems
GFI FaxMaker is a cheap and effective way to allow for VoIP faxing with minimal headaches. One issue you may run into is document conversion of Excel and Word documents.
Once you install Office on your GFI server, you need to log in to the server using the domain account that the GFI services are running under and open Word and Excel to make sure they work properly. In our case, both were hanging on the initial prompt to configure your name and initials, which caused document conversion to fail. Once we clicked the OK button, document conversion worked without a hitch.
I would highly recommend this solution for folks running over a Cisco IPT system - just do not follow their instructions on configuring it through Call Manager. It is much easier to just configure a dial peer using the virtual BrookTrout card and send those calls from the gateway to the GFI server.
Thursday, December 22, 2011
SQL Server 2008 R2 migrations
Migrating to a new version of SQL Server can be a fairly nerve-racking experience, especially for those that have not done it before. What if all of my data is gone? What about my applications? FREAK OUT!
Luckily, Microsoft has done a good job of making the transition and testing of the data move about as simple as it can be. You can use the "Copy Database" wizard to migrate databases either hot (the database stays online; takes a bit longer) or cold (detach, copy and attach the database files - faster) from an existing database server to your new environment for testing. Simply right-click on a database and go to Tasks, then Copy Database.
I have been looking into migrating from SQL Server 2005 to SQL Server 2008 R2 on a failover cluster and this feature has worked pretty well for me. The only issue I ran into was that a vSphere 4.1 database would not migrate while hot but that is to be expected in some circumstances.
As far as building a 2008 R2 failover SQL Server cluster, it is easy to accomplish. Start with at least two nodes running the Enterprise edition of Windows Server 2008 R2 - set up some shared storage for use as a quorum disk for the cluster and then a larger disk to house the databases. Once that is complete and you have the initial cluster set up, the SQL Server 2008 R2 installation wizard will guide you through the installation for the first node and even has an option to add additional nodes to the cluster. It really doesn't get much simpler than that.
One issue I ran into was the installation of SP1 for SQL Server 2008 R2 in a cluster. You really can't install it via Windows Update, as the installation method requires that you install SP1 on the passive node first, then fail the SQL Server over to that passive node, and then install SP1 on the previously active node. Windows Update is not smart enough to handle all of that, so you need to download and install the full version of SP1 to make this happen.
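The patch order described above can be written down as a small planning helper. This is only a sketch of the ordering, not something that touches a cluster: the function name and the node names (SQLNODE1, SQLNODE2) are made up for illustration, and the steps it prints still have to be carried out by hand with the full SP1 installer.

```python
def rolling_patch_plan(nodes, active):
    """Return the ordered steps for patching a failover cluster.

    Passive nodes are patched first, SQL Server is then failed over to a
    patched node, and the formerly active node is patched last.
    Node names are hypothetical placeholders.
    """
    if active not in nodes:
        raise ValueError(f"{active} is not one of the cluster nodes")
    passive = [node for node in nodes if node != active]
    if not passive:
        raise ValueError("a failover cluster needs at least one passive node")

    steps = [f"install SP1 on {node} (passive)" for node in passive]
    # Fail over to the first node that has already been patched.
    steps.append(f"fail SQL Server over from {active} to {passive[0]}")
    steps.append(f"install SP1 on {active} (now passive)")
    return steps


plan = rolling_patch_plan(["SQLNODE1", "SQLNODE2"], active="SQLNODE1")
for number, step in enumerate(plan, start=1):
    print(f"{number}. {step}")
```

Writing the plan out before the maintenance window makes it harder to patch the active node first by accident.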
Wednesday, December 21, 2011
Exchange 2007 Client Access Servers and a DMZ
One thing that you must keep in mind with Exchange 2007 is that Client Access Servers (OWA, ActiveSync) do not work in a classic DMZ architecture. Not only will it not work, it is also not supported by Microsoft. Why this is the case I have no idea since it really makes no logical sense.
In the end, you will want to NAT to the internal address of a hardened CAS server on your network. Allowing only HTTPS will help, but ultimately, like any machine exposed to the Internet, you must make sure you are keeping up with patches.
Clustering of your CAS servers can be accomplished with Network Load Balancing, which is available on the Standard edition of Windows Server 2008 R2. Using this setup, you can NAT to your NLB cluster address so that the failure of a single node will not cause your OWA/ActiveSync environment to fail. I have successfully done this at a few places and it works pretty slick.
Hosting multiple subnets from your ISP on a Cisco ASA
I ran into this issue a few weeks ago when I received an additional block of IP addresses from my ISP due to our expansion of hosted services at one of our data centers. The new block was not on the same subnet as our previous IP addresses, and we needed to be able to host services on this new range. Cisco ASAs do not allow multiple IP addresses to be bound to a particular outside interface, so you must use NAT to host these addresses. Keep in mind that your ISP is handling the routing, so you don't need to change the default route on your ASA or add any additional routes.
static (DMZ, outside) NEW_IP_FROM_ISP INTERNAL_IP netmask 255.255.255.255
Then you can create an ACL that allows whatever ports you would like to be open for that new outside IP address.
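If you have several addresses to publish, the two steps above are easy to generate. Here is a rough Python sketch that prints the static NAT line from the post plus one ACL entry per port. The ACL name OUTSIDE_IN, the example addresses (203.0.113.10 and 192.168.50.10) and port 443 are all assumptions for illustration, and the ACL entry matches on the public address, which assumes the older (pre-8.3) ASA software that the `static` command belongs to.

```python
def publish_host(public_ip, internal_ip, ports, acl_name="OUTSIDE_IN", inside_if="DMZ"):
    """Build the ASA config lines to publish one internal host.

    acl_name and inside_if are placeholders; use the names from your own config.
    """
    # One-to-one static NAT for the new ISP address, as in the post.
    lines = [
        f"static ({inside_if},outside) {public_ip} {internal_ip} netmask 255.255.255.255"
    ]
    # One permit entry per TCP port that should be reachable from outside.
    for port in ports:
        lines.append(
            f"access-list {acl_name} extended permit tcp any host {public_ip} eq {port}"
        )
    return lines


for line in publish_host("203.0.113.10", "192.168.50.10", [443]):
    print(line)
```

The ACL only takes effect once it is bound to the interface with `access-group OUTSIDE_IN in interface outside`. If an ACL is already bound to your outside interface, add the permit lines to that ACL rather than binding a new one, since an interface only takes one inbound ACL.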
Sunday, November 27, 2011
Maintenance on your servers - worth it?
I have been debating maintenance on servers and what is appropriate in terms of frequency and downtime for organizations of different sizes. Most statistical analysis suggests that the majority of IT issues are the result of human error, not technical failure. Security patching on the Microsoft side is usually the reason most shops decide to reboot their servers on a weekly or monthly basis. However, is guarding against an internal security threat worth the risk of actually introducing issues in your data center?
I remember early in my career the philosophy was to patch only when necessary for internal threats. This allowed servers to run relatively stable for longer periods of time and allowed the business to use them with maximum uptime. Now that times have changed and security flaws are discovered more and more frequently, it is not uncommon to hear of shops rebooting their Windows servers on a weekly basis to apply the myriad of security patches that have been released for various products. Advancements in VDI, clustering and cloning have decreased the need to do a mass reboot of all of your devices throughout the month, since you can patch "hot" and avoid rebooting everything on a fixed schedule. For example, let's say you have a cluster of terminal servers that host your applications - you can remove one from the farm, patch it and reboot it - if issues are encountered, you restore to the pre-patch state using a SAN/VM snapshot. Unfortunately, a lot of shops do not rely on these technologies to make life easier on their IT staff.
The days of mass patching and praying for success should be over. Having already-constrained IT departments test security patches, or relying on business units to actually test them, is not practical in my opinion. IT shops should be working smarter - not harder - in order to provide the right balance between security and uptime/stability. I am not discounting the need to actively manage security patching, but the devices that are end-user facing (PCs, VDI, terminal servers) should take priority over critical systems that are not user facing (Exchange, SQL, AD, etc.). Shops that have hundreds or thousands of PCs to manage are really the ones who struggle mightily during these times of mass patching. How can you test or prepare for every possible scenario that is out there? It really is a never-ending battle, and one that puts your department in a bad light when "nothing works" after a maintenance weekend.
VDI eliminates many of these issues and is something I am a big proponent of going forward. Remember these soft costs as you calculate your ROI for pushing a project like this forward, and you will see why the simplification of IT - making IT a service for your organization - can only help provide the reliability and uptime your company deserves.
Gambling - mathematical odds of going broke
Tonight my wife, a few friends and I went to our local casino to show my cousin, who is in for the holidays, the booming WV economy that is gambling. I am not talking about poker - the game that truly is one of the very few hobbies I have outside of work. I am talking about slot machines, blackjack, roulette, craps, etc. - mindless games played against the house and stacked odds, where money is lost frequently. My wife played slots for around an hour and lost $100. A trivial amount, and well within our means for her to gamble and enjoy herself.
I can't bring myself to actually play these games, but it was pretty interesting to watch people throw their money away and get excited when they "won." People think they win when they hit a blackjack or when that 35-1 roulette number comes up on the board, but then you see how much each person has lost to get to that win. It truly is addicting - the rush of gambling is hard to ignore. However, anyone who takes the time to do a quick Google search on the odds of these games will see that it is a losing proposition in the long run. Make no mistake - I don't belittle or think less of anyone who plays these games, as each person has the right to do what they want with their money.
A perfect example of this environment was when a guy walked up out of the blue and placed a $500 bet on "black" on the roulette wheel. He had a chance to double his money, but of course the ball wound up on 00, making him a loser. The man slowly drank his beer and looked away. $500 on an even-money bet that wins less than half the time (the house wins red/black and even/odd bets whenever 0 or 00 comes up). How can anyone think this is a good situation? I understand that poker is all based on odds and calculating, and while it is true that every hand is a winner and every hand is a loser (thank you, Mr. Kenny Rogers), you at least have a good probability in your mind as you place each bet based on the situation of the particular hand.
In the case of these table games, you are truly just praying that luck smiles on you. Unfortunately for many, their luck ran out before they ever set foot in the casino.
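For anyone who would rather run the numbers than search for them, here is a quick Python sketch of the roulette math. It assumes a standard American wheel with 38 pockets (1-36 plus 0 and 00, which matches the 00 that sank the $500 bet) and the usual payouts of 1-1 on black and 35-1 on a single number; the function name is just for illustration.

```python
from fractions import Fraction

POCKETS = 38  # American wheel: 1-36 plus 0 and 00


def expected_profit(stake, payout, winning_pockets):
    """Average profit per spin for a bet that pays `payout` to 1."""
    p_win = Fraction(winning_pockets, POCKETS)
    p_lose = 1 - p_win
    return stake * payout * p_win - stake * p_lose


# $500 on black: 18 winning pockets, pays 1-1.
black = expected_profit(500, payout=1, winning_pockets=18)
# $500 on a single number: 1 winning pocket, pays 35-1.
single = expected_profit(500, payout=35, winning_pockets=1)

print(f"chance of hitting black: {18 / POCKETS:.2%}")       # 47.37%
print(f"expected profit, $500 on black: {float(black):.2f}")  # -26.32
print(f"expected profit, $500 on one number: {float(single):.2f}")  # -26.32
```

Both bets cost the same on average, a bit over 5% of whatever is wagered on every spin. The only thing the bet selection changes is how bumpy the ride to zero is.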
Saturday, November 26, 2011
The private cloud - a business enabler
The concept of a "cloud" architecture in terms of data center design has been around for quite some time, but building one relatively cheaply, while avoiding the IT outsourcing plague, has now become fairly straightforward thanks to the advances of Windows Server 2008 R2, VMware (Hyper-V to some extent) and thin client machines. I have taken on the task of using Remote Desktop Services on Windows Server 2008 R2 to allow application/desktop access from anywhere with an Internet connection. Using an RD Gateway server in our DMZ, we can allow tablets, smart phones and Macs access via RDP to all of our applications in our terminal server, errr, Remote Desktop Services farm, using an RD Connection Broker (clustered, of course) to load balance cloned terminal server application servers. I have also created a website using RDWeb so that PCs outside of our network can reach our internal applications in a user-friendly environment. After doing all of this, the question becomes - do you need a desktop or a laptop? For 90% of our users, the answer is a resounding "NO."
I have started a rollout of Wyse R10L thin clients with built-in dual monitor support via DVI. These machines run around $400 and allow IT to have users connect straight into the RDP farm via round robin DNS and the load balancing connection cluster. Desktops are shared across all terminal servers, which allows remote access to be the same as when users are in the office - they have the same look and feel as if they were in their cubes. Users believe these machines are "faster" because this is all running on top of our VMware cluster on ESXi 4.1 U1 with large amounts of RAM and 64-bit Office 2010 SP1. I truly believe this is the mecca that all IT departments should reach for, in that you eliminate the user issues and application patch problems that are 90% of your tasks. As your company grows, you simply add blocks - more storage needed? Add another EqualLogic array to your group. More terminal servers needed? Clone and add additional servers to the farm. More physical resources needed? Purchase another ESX host and add it to your cluster. This is IT as a service, provided by an internal IT department, without the need to try to figure out your resource needs from an ever-changing business environment. Acquiring a company and need to close within 30 days? Simply re-use their existing Internet connection and point to the terminal server farm, where all applications are installed and controlled securely via AD access and 2048-bit SSL encryption.
Frankly speaking, this is one of the highlights of my career, where all of my experience can be utilized to truly enable the business to do more, and do it faster, cheaper and better than the competition. IT is not about just fixing a printer or building a new PC - it is about being an enabler for the business. By using the technologies above, those dreams have been realized.
Thursday, November 24, 2011
VMware ESX 4.1 migration to ESXi
The age of the command line and a fat install of ESX is officially over. This week I bit the bullet and migrated our cluster from ESX 4.1 U1 to ESXi 4.1 U1 due to the fact that U2 has been released and of course is not out for ESX. Pretty sad, actually, since I remember that the start of the VMware revolution was all about the CLI.
The process was pretty simple and straightforward - here are the steps I took:
1. Take a screenshot of the network configuration tab inside the vSphere client. The NIC binding will not change once you install ESXi.
2. I decided to stop using the internal storage and just install on the built-in flash drive of my Dell R710/910 servers.
3. When you install on a Dell box, make sure you enable booting from the SD card and disable booting from the internal RAID controller in the BIOS.
4. Damn, those servers boot slowly - so be prepared to sit around and watch.
5. ESXi does not use a service console, so make sure your cluster config allows the Management Network to communicate for HA; otherwise your new node will not be allowed in the cluster. The HA advanced option for this is das.allowNetworkX.
6. When you install the Dell MEM module 1.0.1 and the Dell OpenManage pack - make sure you use OpenManage 6.3 or else you get an internal server error when you try to connect to the OpenManage port (1311). Install the MEM module BEFORE you set up your vSwitches - otherwise your config may get wiped like mine did.
7. The vSphere CLI is your friend. I used the commands esxcli, vicfg-snmp.pl and vicfg-vmknic.pl to set up the multipathing - I am really looking forward to going to vSphere 5 just to get around these insane CLI tasks just to set up jumbo frames and multipathing.
8. vMotion is your friend. I actually did the migration during the day, which was pretty cool as the users had no idea what was going on. It helps to have a cluster with over 600 GB of physical RAM and 67 GHz of CPU power behind it.
The entire process took around 1.5 hours per server. I decided NOT to use a port-channel on our Cisco 4948 for our iSCSI traffic but am continuing to use a port-channel for the LAN side traffic. Since the management network is actually a VMkernel port, you need to be careful about your default gateway selection for your VMkernel ports.
Now that we are fully on ESXi, the process to migrate to vSphere 5 will be much more straightforward.