As solid-state drives (SSDs) continue to proliferate thanks to falling costs and increased reliability, IT administrators all the way up to CTOs and CIOs will begin to ask themselves when the jump to SSDs makes sense. One may liken the decision to a Windows 7 upgrade or a move from laptops to tablets, both tough decisions, but converting to SSDs can be a low-cost, high-impact effort.
Our company operates almost 350 laptops and desktops across the main and satellite offices, including those used by road-warrior employees. As many of these PCs were almost five years old and performing more poorly every day, we had to decide whether an OS refresh, including a possible upgrade to Windows 7, was going to be worth the effort. An upgrade to Windows 7 would mean a new training program, additional software licenses, potential data loss, and an assured loss of productivity for at least several months as users adapted to the new operating system. SSDs quickly came under review as a potential panacea.
The price of a typical 120-GB SSD had steadily dropped from $200+ to under $100, there were few if any reports of failures, and the claim that these drives wear out over time was beginning to look unfounded. What was apparent was that the performance of the SSDs was remarkable. Reports of data throughput upwards of 500 MB/s shone in comparison to a typical laptop 5400 RPM HDD or desktop 7200 RPM HDD averaging 250-300 MB/s.
Surprising power reduction
In addition, we expected the IOPS to reach upwards of 20 times that of comparable HDDs. A side benefit for the laptop users was the predicted power reduction of almost 50 percent due to the lack of moving parts in the SSD. We put the SSD through its paces, testing each application against its HDD cousin in both desktop and laptop configurations. In all cases, the SSD’s performance was staggering. Boot times came in under one minute, compared to 3-5 minutes with HDDs. Application performance also improved by 20 percent to almost 80 percent in some cases.
In August 2011, we began by testing several different brands in the sub-$120 category; all were 115-120 GB. As all of our drives were full-disk encrypted, the conversion process required decrypting the drive before cloning and then re-encrypting it afterward. A bit-by-bit copy process was available and would have allowed us to maintain the encryption through the clone, but this process actually took longer to complete (eight hours vs. four).
Training the field staff to perform the upgrades/conversions was not difficult; we used a typical over-the-counter SATA-to-USB kit to connect the SSD externally. This proved to be a plug-and-play but slow technique, which we later improved by using a dual-drive dock with a built-in bit-by-bit cloning tool.
Blue screens of death
Initially, we experienced some problems with Windows blue screens of death and drive lock-ups on both desktop and laptop computers. Users reported that a reboot would allow the computer to resume its normal behavior, but in one case, the drive was rendered useless. Unfortunately, all of that user’s data was lost. Apparently, many of the SSD manufacturers use a chipset that erases the drive contents in the event of a chipset failure.
We ultimately found that some SSD brands were more likely to have these problems, and after four months of testing, we settled on one brand in particular. After performing roughly 20 conversions with this brand, we scheduled a slow upgrade effort beginning in January 2012. Our plan was to upgrade the remaining 330 computers by June 2012. We believed the slow conversion process would give us time to identify any remaining issues with the new drive.
In mid-March, disaster struck. Within two weeks of each other, three users reported non-responsive drives. Replacement computers were sent and the affected computers were returned for diagnosis. After significant review both at headquarters and at the drive manufacturer’s lab, it was found that these SSDs had catastrophically failed in similar fashion to the units tested in August. By now, we had 143 SSDs in production and a total recall (reverting to HDD) was not an option. The field staff was enjoying the extra speed and computer responsiveness. We simply had to determine the root cause of the failure.
By April of 2012, two weeks after the most recent failure, the drive manufacturer had concluded its review and assured us that a firmware update would fix the problem. In fact, online research at this time indicated that all of the SSD units available, with the exception of those from one or two manufacturers who designed their own controllers, were using the same controller model. Additional research revealed that several other manufacturers were also releasing firmware updates. We scheduled firmware updates on all existing and future SSD installations and assumed we were safe moving forward.
A surge of failures
By June of 2012, we had 234 SSD units in production – many of which had their firmware upgraded. We had only experienced one failure in April, but by the end of May, we had experienced 10 failures company-wide. Of these failures, three had their firmware updated; no data was recoverable from any of these units. We immediately shipped the manufacturer three of our computers for additional testing.
Each computer was running our current software and OS platform. Our hope was that the manufacturer, through verbose logging, could help determine what was causing the failures. The field staff was worried, and discussions about a complete reversal of the rollout began. Fears of total data loss and talk of full-disk backups were commonplace. We realized that the manufacturer had another SSD available, which used a different controller. These units were rated at twice the speed of the original units and cost twice as much, but we were running out of options. We immediately purchased several units for testing.
During our testing, we encountered no problems whatsoever. In fact, the new units performed 25-45 percent better in the performance tests first applied in August. We contacted the manufacturer immediately. Upon verifying that the new drives had been in the market at a volume slightly over 80 percent of the original units and had experienced only a fraction of the failure rate, we asked that they replace all the original units with the newer, faster drives. They complied.
As of this writing, we have replaced 248 SSDs in use and have converted 301 HDDs to SSDs. We have not experienced any additional BSODs or drive failures. We have, however, extended the life of our existing fleet and increased productivity in the field by at least 25 percent. While we would have loved not to experience this level of failure and frustration, we are pleased with the results of the upgrade and recommend, especially now that most of the controller bugs are gone, that you consider SSD upgrades for the reasons above.
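To put the June 2012 figures in perspective, the failure numbers can be turned into rates with a few lines of arithmetic. This is only a rough sketch: it assumes the 10 failures reported by the end of May are a cumulative count measured against the 234 units then in production, which the account above does not state precisely, and the function and variable names are illustrative.

```python
def failure_rate(failed: int, total: int) -> float:
    """Return failed/total as a fraction; total must be positive."""
    if total <= 0:
        raise ValueError("total must be positive")
    return failed / total


# Figures taken from the account above (assumed to be cumulative counts).
deployed_units = 234          # SSDs in production by June 2012
failed_units = 10             # failures company-wide by the end of May
failed_after_firmware = 3     # failed units that already had the firmware update

overall = failure_rate(failed_units, deployed_units)
updated_share = failure_rate(failed_after_firmware, failed_units)

print(f"Share of deployed SSDs that failed: {overall:.1%}")
print(f"Share of failures with updated firmware: {updated_share:.1%}")
```

Under those assumptions, roughly one drive in 23 had failed, and close to a third of the failed drives had already received the firmware fix, which is why the update alone no longer looked like a sufficient remedy.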
