It was late on the night of Thursday, July 28, and ALS Electronics Engineer Chris Pappas was getting ready for bed when he got the call from the ALS Electronics Maintenance shop. The power supply to the ALS’s booster bend magnets had tripped, been reset, and tripped again. “I’ll come up there,” Pappas said.
The power supply contains hundreds of capacitors that act as an intermediate filter between the electricity grid input and rectifiers on one side and the booster bend magnets on the other. Without the supply, the ALS can’t accelerate the electrons to the energy necessary to produce x-ray light.
Pappas and electronics maintenance technicians worked well into the night but couldn’t find an obvious solution, so they decided to try again the next morning.
By happenstance, the morning before the failure, retired engineer Mike Fahmie, who originally specified, installed, and commissioned the power supply, had been at the ALS to discuss the electrical safety training he needed to continue to support the supply’s functioning. On Friday morning, ALS staff called him, and he agreed to come in.
The ALS engineering crews worked throughout the day Friday but went into the weekend with no solutions. The fuses continued to blow consistently on two capacitor banks. Disabling the two banks hadn’t helped, nor had running the booster below its typical 1.9 GeV energy, injecting into the storage ring, then ramping up the storage ring to full energy—the mode in which the ALS operated before the power supply was put in place.
On Saturday, Fahmie developed a new idea. Maybe the capacitor modules blowing the fuses weren’t the problem. Instead, other capacitor modules in the circuit might be degraded, diverting excess current into the good modules and causing them to blow the fuses.
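Fahmie's hypothesis can be illustrated with a rough sketch of how ripple current divides among parallel capacitor branches. This is not the team's analysis: the function name, the four-module bank, the component values, and the 120 Hz ripple frequency below are all hypothetical, chosen only to show the effect. Each branch is modeled as a capacitance in series with its ESR, and a fixed total ripple current splits in proportion to branch admittance.

```python
import math

def share_ripple_current(total_current_a, modules, freq_hz):
    """Split a ripple current among parallel capacitor branches.

    modules: list of (capacitance_farads, esr_ohms) pairs (hypothetical values).
    Returns the current magnitude in each branch, in amperes.
    Branch magnitudes need not sum exactly to the total, because
    branches with different ESR/C ratios carry slightly different phases.
    """
    omega = 2 * math.pi * freq_hz
    # Series R-C branch impedance: Z = ESR - j / (omega * C)
    admittances = [1 / complex(esr, -1 / (omega * c)) for c, esr in modules]
    total_admittance = sum(admittances)
    return [abs(total_current_a * y / total_admittance) for y in admittances]

# Hypothetical bank: four identical modules share 100 A of ripple equally.
healthy = share_ripple_current(100.0, [(10e-3, 0.01)] * 4, 120.0)

# Same bank with two degraded modules (half the capacitance, five times the ESR).
mixed = share_ripple_current(
    100.0,
    [(10e-3, 0.01), (10e-3, 0.01), (5e-3, 0.05), (5e-3, 0.05)],
    120.0,
)
```

In this toy model the two good modules end up carrying more than their original quarter share, which is the kind of overload that could blow fuses on otherwise healthy banks.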
The team equipped themselves with flashlights and mirrors and went into the banks to visually inspect the capacitors for signs of degradation. Again, they reached a dead end.
“There was no smoking gun,” said Ken Baptiste, Electrical and Controls Engineering leader. “We saw some cans that showed some burns and physical damage, but in hindsight they were probably damaged during manufacturing or when the power supply was being commissioned,” added Pappas.
Next, over a period of two shifts, the team directly measured the ESR—the equivalent series resistance—of all 600 capacitors to try to pinpoint the culprits. For this measurement, test equipment first needed to be designed, fabricated, and tested. During the measurement process, Electronics Maintenance Technicians (EMs) Curtis Gomez and Eric Kawakami found and corrected a safety issue related to the capacitor wiring. But unfortunately, the ESR measurements yielded nothing conclusive.
By the end of the weekend, the team was no closer to a solution, and the pressure was mounting. More than two days of user operations had been lost. Monday and Tuesday were scheduled maintenance days, but user operations were scheduled to restart on Wednesday at 8 a.m.
On Monday morning, Accelerator Operations Deputy Christoph Steier, freshly returned from backpacking in Yosemite, revisited the weekend’s results with Pappas and Baptiste and helped retrieve an additional piece of information from the digital power supply controller: the waveform charts from the last power supply trip. The charts suggested the problem might lie not in the large banks where the fuses had blown but in another set of capacitors, in two different locations, that act as a filter to prevent high-frequency ringing.
The 600 electrolytic capacitors that had already been inspected were cylindrical and the size of large cans. The high-frequency capacitors used in the power supply’s final and IGBT filters were a different beast entirely, made of Mylar film with metallization on both sides, wound up and put in series and in parallel in a large case the size of a truck battery. Reaching the final filter was straightforward, and the team quickly determined that all eight capacitors were bad. But getting access to the IGBT filters was no easy task.
“A bunch of things had to come out. [The capacitors] were surrounded by a set of rectifiers and by IGBTs [insulated-gate bipolar transistors]. Then there was bus work in the front,” explained Ken Berg, electronics coordinator for the engineering division. According to Baptiste, Berg saved nearly a shift’s worth of work by realizing that only the upper half of the bus work needed to come out to access the capacitors.
EM Doug Bashaw measured the capacitance and found that four of the 4000 μF capacitors read low: two below 2000 μF and two between 2000 and 3500 μF. Once they were removed, it was clear their cases were cracked. At last, on Tuesday, less than 24 hours before user operations were scheduled to begin, the smoking guns had been found.
“At this point we had some degree of confidence we’d found the problem, although everything we’d thought up to this point had been wrong,” said Pappas. “Things started to add up,” Baptiste elaborated. “The fact that all 600 capacitors checked out rudimentarily okay. The fact that these high-frequency filter capacitors and snubber capacitors were found to be bad and correlated with the high-frequency noise on Christoph’s plots from the night of the failure.”
Fortunately, the ALS had eight spare final filter capacitors of this type and two spare IGBT filter capacitors. The team formed a plan to replace the bad capacitors. Some staff had been working for eight to ten hours by this point, and they knew the work would take at least six to eight more, including testing. They decided to do the assembly that night, then come in at 7 a.m. the next day to do the initial testing of the power supply. Quickly, however, they discovered a problem. The new capacitor cases were wider, and space inside the bank was tight.
“Electrically, they were the same exact capacitor,” said Berg. “But physically, the cases were larger. The screw holes lined up on the first one, but when you went to put the others in, the cases were a quarter inch wider, so now you’re a half inch off.”
The team had to redrill the holes, develop a new way to hang the capacitors, and mill the bus bars that bolt onto the capacitors. Electronics Installation Technician Bob Gassaway came up with some of the creative solutions and provided access to the machine shop. Once the capacitors were in place, it was 9 p.m., and one of the final steps was for Mechanical Technician Jason Borsos to reconnect the water hoses that cool the rectifier units. The bottom hose fit, but the top one was half an inch off. “It was one of those things where we just looked at each other and went, okay, what else can go wrong?” said Berg. After some adjustments, Borsos reconnected the water, and Gassaway, Berg, and Gomez finished getting the power supply ready for the morning’s test.
On Wednesday morning, the team prepared to turn on the machine. “By mid-morning we had all our ducks in a row,” said Baptiste. “There were eight to ten people in the booster pit. We had the instrumentation on, we had scopes, we had people, we had the safety officer’s okay to proceed with the initial turn on. And it turned on without event.” By lunchtime, the machine was up to full energy, and everything was running nominally.
After a successful test, the final step was to turn off the power supply, take out all the instrumentation that was plugged in, put the supply covers back on, and turn it on one more time. By 3:15 p.m. Wednesday, after nearly six days of almost continuous work, user operations resumed.
Despite the long hours and the difficulty of finding a solution, the team reflected positively on the experience. “There wasn’t any huge special thing that happened… It’s what we do here,” said Borsos. “We have to problem solve and we have to make it happen.” According to Berg, “This was one of the better examples of teamwork. Everybody’s comments and ideas were considered.”
The engineering staff are currently looking into measures to mitigate this type of failure in the future. “This was a failure that had been approaching for years,” said Baptiste. “Those capacitors degraded, degraded, degraded, then finally the ripple in the high frequency was enough to start blowing fuses. Had those fuses not been there, we would have had something much more significant happen.” More specific calculations are being done to estimate the lifetime of the remaining capacitors, some degraded capacitors are being replaced, and more spare parts are being ordered.
Full team credits: Ken Baptiste, Doug Bashaw, Jacque Bell, Ken Berg, Jason Borsos, Ronny Colston, Mike Decool, Mike Fahmie, Bob Gassaway, Dennis Gibson, Curtis Gomez, Eric Kawakami, Tim Kuneli, Octavian Matei, Chris Pappas, Ed Rim, Dave Robin, Sergio Rogoff, Fernando Sannibale, Christoph Steier, Scott Taylor, Marcos Turqueti, Max Vinco, Will Waldron.