(Vadym/stock.adobe.com; generated with AI)
In the race to build faster, smarter, and more powerful artificial intelligence (AI) servers, designers often focus on the headline components like graphics processing units (GPUs), tensor processing units (TPUs), and high-speed interconnects. But beneath the surface, passive components like capacitors quietly shoulder the burden of maintaining system stability. When capacitors fail, the consequences ripple through the entire system: voltage regulators falter, processors crash, and servers go offline. And yet, these failures often occur despite the capacitors having passed rigorous lab tests.
So, why is this happening? In this blog, we examine why lab-tested passive components such as capacitors may fail in the field, and consider how designers can mitigate component failure by selecting passive components specifically designed for demanding AI server environments.
AI servers operate in environments far more demanding than the conditions under which most capacitors are tested. High temperatures, elevated humidity, and intense power densities create a perfect storm for component degradation. In data centers, ambient temperatures can exceed 50–60°C, and localized hotspots near processors can push well beyond 100°C. Add to that the presence of moisture from air economizers or liquid cooling systems, and you have a recipe for accelerated wear—even for components that passed standard qualification tests.
To understand why capacitors fail, we need to confront misconceptions surrounding them. Although capacitors are characterized in textbooks as simple components with a dielectric between conducting plates, their actual construction is far more varied and complicated.
Among the many capacitor types are three basic groups and some of their associated failure modes:
Admittedly, capacitors and their designation terminology can be confusing. Sometimes they are named by their conductive or dielectric materials, such as aluminum, ceramic, or plastic, while other times they are called out by their construction, such as film or multilayered, and these designations overlap.
Capacitors do not necessarily fail outright or have a single failure mode. Their capacitance can drift significantly from its nominal value, and other key parameters can degrade as well, such as a rise in equivalent series resistance (ESR) or leakage current.
Do not be misled by their functional simplicity. As with any component, capacitors have multiple potential points of failure (Figure 1).
Figure 1: Despite its conceptual simplicity, the capacitor—like all other components—has many possible failure causes, modes, effects, and consequences, shown here for the metallized film capacitor. (Source: CERN, CC BY 4.0 http://creativecommons.org/licenses/by/4.0/)[1]
Changes in capacitor performance can lead to processor slowdowns, noise-induced issues, voltage-regulator instability, erratic system performance, or full server outages that could negatively affect uptime service level agreements (SLAs) and customer workloads. Many of these system problems are difficult to diagnose because they are intermittent or lack an obvious link between cause and effect.
Capacitor reliability is typically assessed using standardized tests that simulate stress conditions—such as 105°C for 2,000 hours—but often in dry ovens, without ripple current, and under controlled humidity. These include the numerous standards for testing and evaluating capacitors with details for pre-, during-, and post-test setups and procedures, such as:
While these standards and tests are comprehensive, detailed, and valuable for benchmarking, they do not adequately reflect the chaotic reality of AI server deployments. Modern AI servers run 24/7, and these systems are not only thermally stressed but also exposed to humidity levels that can lead to condensation.
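To see what a temperature-only endurance rating does and does not tell you, consider the rough sketch below. It applies the common rule of thumb that capacitor life roughly doubles for every 10°C below the rated temperature. That rule is an assumption brought in here for illustration: it is usually quoted for liquid-electrolyte aluminum capacitors, it does not come from any of the standards discussed above, and the function name and parameters are hypothetical. Note that humidity and ripple current do not appear in the calculation at all, which is exactly the gap between the rating and field conditions.

```python
def estimated_life_hours(rated_life_h, rated_temp_c, operating_temp_c,
                         degrees_per_doubling=10.0):
    """Temperature-only life estimate from an endurance rating.

    Rule of thumb (an assumption, not taken from any datasheet): life doubles
    for every `degrees_per_doubling` degrees C below the rated temperature.
    Humidity, ripple current, and DC bias are deliberately ignored.
    """
    if rated_life_h <= 0 or degrees_per_doubling <= 0:
        raise ValueError("rated life and doubling interval must be positive")
    exponent = (rated_temp_c - operating_temp_c) / degrees_per_doubling
    return rated_life_h * 2.0 ** exponent


if __name__ == "__main__":
    # Endurance rating used as the example in the text: 2,000 hours at 105 degrees C.
    for temp_c in (105, 95, 85, 65):
        hours = estimated_life_hours(2000, 105, temp_c)
        print(f"{temp_c:>3} C -> about {hours:,.0f} h (temperature only)")
```

An estimate like this can look comfortable on paper, yet it says nothing about a part that also sees moisture and heavy ripple current in service.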
According to ASHRAE guidelines, the recommended temperature range for a data center is 18–27°C.[2] With power densities reaching 30–50kW per rack,[3] and projections suggesting clusters may soon hit 1,000kW,[4] thermal dissipation becomes a real challenge. Additionally, ASHRAE guidelines allow dew points up to 15°C, meaning moisture intrusion is a real concern. In such conditions, capacitors face challenges that lab tests simply don’t account for.
Humidity and ripple current are particularly damaging. Moisture can degrade packaging materials, while ripple current stresses the internal structure of the capacitor. Together, they accelerate failure mechanisms that are rarely triggered in lab environments.
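The dew point figure is easier to reason about with a quick calculation. The sketch below estimates dew point from air temperature and relative humidity using the Magnus approximation, then checks it against the 15°C limit mentioned above. The Magnus coefficients are one commonly published set, the function names are hypothetical, and the 27°C / 60 percent RH operating point is an illustrative assumption, not a measurement from any data center.

```python
import math

# One commonly published set of Magnus coefficients for water vapor over liquid water.
MAGNUS_B = 17.62
MAGNUS_C = 243.12  # degrees C

# Dew point limit quoted in the text from the ASHRAE guidelines.
DEW_POINT_LIMIT_C = 15.0


def dew_point_c(air_temp_c, rh_percent):
    """Approximate dew point (degrees C) from air temperature and relative humidity."""
    if not 0 < rh_percent <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rh_percent / 100.0) + MAGNUS_B * air_temp_c / (MAGNUS_C + air_temp_c)
    return MAGNUS_C * gamma / (MAGNUS_B - gamma)


def condensation_risk(surface_temp_c, air_temp_c, rh_percent):
    """True if a surface is at or below the dew point of the surrounding air."""
    return surface_temp_c <= dew_point_c(air_temp_c, rh_percent)


if __name__ == "__main__":
    # Hypothetical operating point: 27 degrees C inlet air at 60 percent RH.
    td = dew_point_c(27.0, 60.0)
    print(f"dew point: {td:.1f} C, above limit: {td > DEW_POINT_LIMIT_C}")
```

Any surface cooler than the computed dew point, such as a cold plate or a board that has just powered down, can collect condensation even when the room air itself seems dry enough.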
Recognizing the difficulties surrounding capacitors within data centers, YAGEO Group has introduced capacitors that are optimized and evaluated for these AI server applications. The A798 aluminum organic capacitors (AO-CAP®) are high-humidity, high-temperature solid-state aluminum capacitors built to withstand the rigors of AI server operation. With a rated voltage of 2V to 2.5V, these polarized units are available in capacitance values from 150µF to 470µF and housed in two diminutive package sizes measuring just 7.3mm × 4.3mm × 1.9mm and 7.3mm × 4.3mm × 2.8mm (L × W × H).
Their cathode is a solid conductive organic polymer, which results in very low ESR and improved capacitance retention at high frequency. Since there is no liquid electrolyte, the A798 offers long operational lifetimes and high operating temperatures. The inherent low ESR makes the capacitors suitable for handling normally detrimental high ripple currents.
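The link between ESR and ripple-current capability follows from basic circuit theory: the power dissipated inside the capacitor is the square of the RMS ripple current multiplied by the ESR, and that power raises the internal temperature. The sketch below illustrates the relationship. The ESR values, ripple current, and thermal resistance are hypothetical numbers chosen only for comparison and are not A798 datasheet figures.

```python
def ripple_power_w(ripple_current_a_rms, esr_ohm):
    """Power dissipated in the capacitor: P = I_rms^2 * ESR."""
    if esr_ohm < 0:
        raise ValueError("ESR cannot be negative")
    return ripple_current_a_rms ** 2 * esr_ohm


def self_heating_c(ripple_current_a_rms, esr_ohm, thermal_resistance_c_per_w):
    """Approximate internal temperature rise above ambient from ripple heating."""
    return ripple_power_w(ripple_current_a_rms, esr_ohm) * thermal_resistance_c_per_w


if __name__ == "__main__":
    ripple_a = 3.0          # hypothetical RMS ripple current, amperes
    r_th = 40.0             # hypothetical thermal resistance, degrees C per watt
    for label, esr in (("lower-ESR part", 0.009), ("higher-ESR part", 0.050)):
        rise = self_heating_c(ripple_a, esr, r_th)
        print(f"{label}: {ripple_power_w(ripple_a, esr) * 1000:.0f} mW, "
              f"about {rise:.1f} C rise")
```

Because dissipation scales linearly with ESR and with the square of the current, a lower ESR directly lowers self-heating for the same ripple load.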
The A798 construction is based on a stack of aluminum elements that include the Al₂O₃ dielectric and the polymer counter electrode on the surface, while the external layers are built with carbon and silver (Figure 2).
Figure 2: Components in the A798 family offer high capacitance and long life under the stressful conditions of AI servers, due to their advanced materials, sophisticated design, and enhanced implementation. (Source: YAGEO Group)
Internally, several element foils are stacked and positioned within the capacitor construction, which is largely responsible for the very low ESR (Figure 3).
Figure 3: A cutaway diagram of an A798 capacitor shows the many elements that are needed to create the capacitance function. (Source: YAGEO Group)
Enhancements to the design and selected material upgrades were introduced in the A798 series to deliver 1,000 hours at 85°C and a very high 85 percent relative humidity (RH)—at the rated voltage—along with 125°C endurance life and storage. The small package size, high ripple-current capability, high operating temperature, low parasitics, and capacitance stability over life span make the A798 an ideal solution in demanding AI server applications.
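One way engineers sometimes relate a combined temperature and humidity test such as 85°C at 85 percent RH to milder field conditions is Peck's model, which multiplies a humidity power-law term by an Arrhenius temperature term. The sketch below is an illustration under stated assumptions, not the method used to qualify the A798: the humidity exponent and activation energy defaults are placeholders, the field conditions are hypothetical, and real values depend on the specific failure mechanism and must come from test data.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def peck_acceleration_factor(test_temp_c, test_rh, use_temp_c, use_rh,
                             humidity_exponent=2.7, activation_energy_ev=0.79):
    """Peck's model: AF = (RH_test / RH_use)^n * exp(Ea/k * (1/T_use - 1/T_test)).

    The default exponent and activation energy are illustrative placeholders,
    not values for the A798 or for any specific capacitor.
    """
    for rh in (test_rh, use_rh):
        if not 0 < rh <= 100:
            raise ValueError("relative humidity must be in (0, 100]")
    test_k = test_temp_c + 273.15
    use_k = use_temp_c + 273.15
    humidity_term = (test_rh / use_rh) ** humidity_exponent
    thermal_term = math.exp(
        activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / use_k - 1.0 / test_k)
    )
    return humidity_term * thermal_term


if __name__ == "__main__":
    # Test condition from the text: 85 degrees C, 85 percent RH, 1,000 hours.
    # Field condition below is hypothetical: 55 degrees C at 50 percent RH.
    af = peck_acceleration_factor(85.0, 85.0, 55.0, 50.0)
    print(f"acceleration factor: {af:.0f}, "
          f"1,000 test hours ~ {1000 * af:,.0f} field hours (model estimate)")
```

The result is only as good as the exponent and activation energy fed into it, and the model does not capture ripple current or thermal cycling, so it complements rather than replaces testing under combined stresses.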
Capacitors are essential in ensuring the reliable operation of demanding AI workloads. Failures often result not from poor quality, but from the inability of standardized lab tests to capture all the harsh realities of modern data centers. Standardized tests capture many individual stresses, yet real-world AI server operation often combines multiple factors, such as thermal cycling, ripple current, DC bias, humidity, and localized hotspots that interact in ways the tests do not fully replicate.
As system power densities continue to rise, designers must account for the full range of capacitor limitations and select components proven to perform beyond the lab and specifically designed for demanding AI server environments.
Bill Schweber is a contributing writer for Mouser Electronics and an electronics engineer who has written three textbooks on electronic communications systems, as well as hundreds of technical articles, opinion columns, and product features. In past roles, he worked as a technical website manager for multiple topic-specific sites for EE Times, as well as both the Executive Editor and Analog Editor at EDN.
At Analog Devices, Inc. (a leading vendor of analog and mixed-signal ICs), Bill was in marketing communications (public relations); as a result, he has been on both sides of the technical PR function, presenting company products, stories, and messages to the media and also as the recipient of these.
Prior to the MarCom role at Analog, Bill was associate editor of their respected technical journal, and also worked in their product marketing and applications engineering groups. Before those roles, Bill was at Instron Corp., doing hands-on analog- and power-circuit design and systems integration for materials-testing machine controls.
He has an MSEE (Univ. of Mass) and BSEE (Columbia Univ.), is a Registered Professional Engineer, and holds an Advanced Class amateur radio license. Bill has also planned, written, and presented online courses on a variety of engineering topics, including MOSFET basics, ADC selection, and driving LEDs.
Sources
[1] http://dx.doi.org/10.5170/CERN-2015-003.45
[2] https://www.ashrae.org/file%20library/technical%20resources/bookstore/ashrae_tc0909_power_white_paper_22_june_2016_revised.pdf
[3] https://174powerglobal.com/blog/how-ai-changes-data-center-design-forever/
[4] https://www.datacenterdynamics.com/en/news/hyperscalers-prepare-for-1mw-racks-at-ocp-emea-google-announces-new-cdu/
YAGEO Group makes the future possible with electrical innovations and solutions that power the world forward. With one of the broadest selections of component technologies from some of the industry’s most recognized brands, YAGEO Group components are designed to meet the diverse requirements of customers and a full range of end-market segments.