South Korea’s ICT ministry has given Kakao and other players one month to come up with emergency response measures to prevent a repeat of the disastrous fire that brought down the KakaoTalk messenger app in October.
The Ministry of Science and ICT’s investigation into South Korea’s worst-ever tech outage, published today, told all those involved to implement an emergency response plan, but singled out Kakao for particular criticism. The KakaoTalk app, which is used for identification in a vast array of Korean business transactions, was down for days after a fire broke out on October 15th in the Pangyo Data Center, run by SK C&C in Gyeonggi Province near Seoul.
Another Korean giant, Naver, suffered an outage, but recovered quickly.
SK, Naver, and Kakao’s fire failures
According to the Ministry, the fire started in lithium-ion batteries, but the impact was made worse because SK C&C’s data center design did not separate the batteries from the UPS. With the fire raging, the UPS was unusable. SK C&C’s inadequate fire prevention procedures and the poor information it provided to firefighters also contributed.
The Ministry condemns Kakao for concentrating all its services in one data center, which apparently also held the backup servers, making a failover impossible. Kakao is instructed to diversify its services, so that they are delivered from multiple data centers within South Korea.
According to Korea Bizwire, the fire started in a battery room in the third basement of the Pangyo building, which is run by SK Group’s IT services division, SK C&C. Earlier reports said the batteries were provided by another SK Group subsidiary.
At a briefing, the ministry confirmed that the failure was in lithium-ion batteries, adding that the fire was difficult to recover from because the batteries were not physically separated from the uninterruptible power supply (UPS) at the data center, according to the Korea Herald.
Lithium-ion battery fires cannot be put out with fire extinguishers, and SK C&C failed to provide information on where to spray water in the event of a fire. The fire disabled the UPS completely, and the power to the facility had to be suspended, the Herald reports.
It took eight hours to extinguish the fire, but by that time Kakao’s and Naver’s servers were all out for the count because of the power shutdown. Kakao had a standby server, but this appears to have been at the same site, and therefore of no value. The ministry report says the backup system did not work well as “the company has only a built-in backup and does not have a double system in a separate data center to take over the role in case of emergency”, reports Bizwire.
By contrast, Naver was able to restore its services in a separate data center.
The ministry has ordered Kakao to establish a backup system which can fail over to a different location, and to produce plans for “multiple nonintegrated data protection solutions”.
Kakao has also been ordered to spell out its plans to compensate customers who suffered damage due to the service disruptions. Already, several customers have joined class action suits claiming loss and damage due to the outage.
The ministry also criticized SK C&C for having a sketchy fire response protocol with no detailed action plans and no fire drills. The services company also needs to strengthen its battery monitoring and fire detection system, the ministry said.
The fire affected nearly all of Kakao’s online services, including Kakao Bank and the Kakao T traffic app, as most of the functions run from the Pangyo data center. The heaviest impact came from the failure of KakaoTalk, as it is used for authentication by many other third-party applications.
All three companies – SK C&C, Kakao, and Naver – were told to draw up a contingency plan for any future data center fires or other disasters, and to report the results of fire drills – which the Ministry ordered them to carry out in the immediate aftermath of the fire.
For its part, the ICT ministry will itself publish guidelines for resilient digital services early in 2023. This will include input from SK C&C, Kakao, and Naver, as well as other industry experts.
“As the failure of data centers and digital services have a profound impact on the economy and society as a whole, operators who provided the cause of the accident will be able to restore public trust in digital services by strictly recognizing the causes of the accident and doing their best to prevent future damage,” said Lee Jong-ho, Minister of Science and ICT, the Herald reports.
Data Center Dynamics