NEC Corporation announced the launch of “FireDucks,” a free software program designed to accelerate “pandas,” the tabular data analysis library used with Python, the most widely used programming language in the world today. Capable of carrying out the data preparation required for data analysis up to 16 times faster than existing products, this newly developed software significantly shortens the time spent on data analysis and lowers computing costs.
In recent years, it has become easier than ever to collect massive amounts of data, including sales data from point-of-sale (POS) terminals and e-commerce, as well as data from financial transactions. To extract valuable analytical results from such data, there is a growing need for data scientists to analyze it using artificial intelligence (AI) and machine learning (ML).
However, in order to prepare for data analysis, large data sets must first be preprocessed. Data scientists are said to spend approximately 45% (*3) of their time preparing data, and this has become a major issue. In addition, the surge in data volume and evolution of AI and ML have led to increased computational complexity. As a result, higher computational costs (e.g., cloud costs) and the consequent rise in power consumption and CO2 emissions have also become problematic.
In view of this, NEC set out to develop FireDucks, a software program designed to accelerate pandas. To develop this software, NEC leveraged the high-performance programming technology and acceleration know-how it has cultivated in its thirty-plus years of experience developing supercomputers.
By making the beta version of FireDucks available to the general public free of charge, NEC hopes to reduce the time data scientists spend analyzing data and to help resolve environmental issues by conserving power and lowering CO2 emissions.
1. Accelerated performance
FireDucks is capable of accelerating software programs created using pandas by up to 16 times and on average by about five times (*2). This reduces the overall time data scientists spend working on data analysis by approximately 30% (*4).
Two techniques account for this level of acceleration: parallel use of all CPU cores and computation reduction. FireDucks uses every core of a multi-core CPU to efficiently process large data sets in parallel. Moreover, rather than executing processes in the same order and range specified in the program, FireDucks identifies in advance, from the overall process, the data needed to produce the results, and performs processing only on that data. This reduces the amount of computation and in turn accelerates processing.
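The idea of computation reduction can be illustrated with a small sketch written in plain pandas. It is not FireDucks's internal implementation; the function names (eager, reduced) and the sample data are hypothetical and exist only to show that identifying the needed rows and columns first gives the same result with less work.

```python
import pandas as pd


def eager(df):
    # Runs the steps in the order written: sort every row and column first,
    # then keep only the rows and the column the result actually needs.
    ordered = df.sort_values("price")
    selected = ordered[ordered["region"] == "east"]
    return selected["price"].reset_index(drop=True)


def reduced(df):
    # Same result, but the needed rows and column are identified first,
    # so the sort only touches that smaller subset.
    needed = df.loc[df["region"] == "east", ["price"]]
    return needed.sort_values("price")["price"].reset_index(drop=True)


# Hypothetical sample data; the "note" column is never needed for the result.
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west", "east"],
    "price": [30, 10, 20, 50, 40],
    "note": ["a", "b", "c", "d", "e"],
})

# Both orderings produce identical output.
assert eager(sales).equals(reduced(sales))
print(reduced(sales).tolist())  # [20, 30, 40]
```

In this sketch the reordering is done by hand. The point of the approach described above is that the software determines the necessary data on the user's behalf, so the program can stay in its original form.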
2. High compatibility
Another feature of this software is its high compatibility with pandas. While some libraries are able to achieve faster processing speeds than pandas, adopting them takes multiple steps, including rewriting the program. FireDucks, on the other hand, can be easily applied because only one line of the program needs to be rewritten; analysis and coding then proceed just as they would with pandas.
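A minimal sketch of what the one-line change might look like follows. It assumes that the line in question is the pandas import and that FireDucks is installed as a package named fireducks exposing a fireducks.pandas module; neither detail is stated in this announcement. The fallback to pandas is added only so that the sketch runs where FireDucks is not installed, and the sample data is hypothetical.

```python
# Assumption: the single rewritten line is the import, and the package is
# available as `fireducks` with a pandas-compatible `fireducks.pandas` module.
try:
    import fireducks.pandas as pd  # replaces: import pandas as pd
except ImportError:
    import pandas as pd  # fallback so the sketch runs without FireDucks

# Everything below is ordinary pandas-style code and is left unchanged.
sales = pd.DataFrame({
    "store": ["A", "B", "A", "B"],
    "amount": [100, 250, 300, 50],
})

# Total sales amount per store.
totals = sales.groupby("store")["amount"].sum()
result = totals.to_dict()
print(result)
```

Because the rest of the program refers to the library only through the pd alias, no other line needs to change in this sketch.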