High Energy Physics (HEP) is among the scientific fields facing the largest challenges in processing huge amounts of data. In the quest to observe rare interactions among fundamental particles, HEP experiments increasingly rely on advanced and distributed computing techniques, especially in light of the anticipated rise in performance demands over the coming years. This article focuses on the urgent need to develop portable programming solutions that enable these experiments to make the most of the available computational resources, including Graphics Processing Units (GPUs) from various manufacturers. In this context, we review experiments and results from testing particle-tracking benchmark algorithms and compare the performance and challenges associated with different portable programming methods. Through this work, we aim to highlight the importance of innovation in computing environments and its role in enhancing the capabilities of high energy physics experiments.
Challenges Facing High Energy Physics
High Energy Physics (HEP) experiments face significant challenges in processing the massive amounts of data produced by interactions of fundamental particles. The CMS experiment at the Large Hadron Collider (LHC) at CERN, for example, processed hundreds of petabytes of detector data and Monte Carlo (MC) simulations during the period from 2015 to 2018. Looking ahead, programs such as the HL-LHC and DUNE present new computational challenges: the event rate at the LHC is expected to increase by a factor of 7.5, meaning that data volumes will grow into the exabyte range. A shift away from traditional computing methods in HEP therefore becomes a necessity rather than a choice.
Dealing with these massive amounts of data requires intensive research and development, as well as significant changes in the software infrastructure and the techniques used for data analysis. One of the key areas that can assist in this transformation is leveraging diverse and parallel computing resources. For instance, in the past, LHC experiments relied on traditional x86 CPUs for most of their computing needs. However, in light of the challenges associated with the increasing data volumes, experts have started to make modifications to their software frameworks to take advantage of High Performance Computing (HPC) resources that increasingly rely on Graphics Processing Units (GPUs).
In light of these challenges, it has become clear that developing HEP algorithms to fit the diverse architecture of computing clusters will be required to meet future needs. This requires rewriting a significant portion of the original programming code, necessitating substantial efforts from multidisciplinary teams. The critical factor here is the choice of available programming tools, which must allow for the same source code to run on multiple computing platforms. This is an essential requirement in terms of efficiency and energy usage.
The Need for Portable Programming Tools
The need for portable programming tools emphasizes the importance of developing HEP software in a way that allows the same source code to run efficiently across a variety of architectures. One class of solutions is portability layers: the application code is written against libraries or frameworks that manage the execution details on each architecture. This approach not only optimizes performance but also simplifies code maintenance and reduces the workload on programming teams responsible for large and complex code bases.
Available solutions include libraries such as Kokkos and Alpaka, which provide high-level data structures and options for parallel execution. These libraries support HEP data processing effectively, reducing the burden of rewriting and re-optimizing code. This lets scientists focus on physics analysis rather than the intricate details of targeting multiple environments.
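As a rough illustration of what such a portability layer looks like in practice, the sketch below uses the public Kokkos API; it is a minimal example, not code from the HEP benchmarks themselves, and the array names are invented for the illustration.

```cpp
#include <Kokkos_Core.hpp>

// Minimal sketch of the Kokkos programming model (illustrative only): the
// same source targets CPUs or GPUs depending on the execution space chosen
// at build time.
int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int n = 1 << 20;
    // Kokkos::View is a portable array; its memory space follows the default
    // execution space (host or device).
    Kokkos::View<float*> x("x", n), y("y", n);

    // Fill and combine the arrays with portable parallel loops.
    Kokkos::parallel_for("init", n, KOKKOS_LAMBDA(const int i) {
      x(i) = 1.0f;
      y(i) = 2.0f;
    });
    Kokkos::parallel_for("axpy", n, KOKKOS_LAMBDA(const int i) {
      y(i) = 2.0f * x(i) + y(i);
    });
    Kokkos::fence();  // wait for any asynchronous device work to finish
  }
  Kokkos::finalize();
  return 0;
}
```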
This transition represents just one stop on HEP's journey toward innovation. When it comes to performance, outcomes can vary significantly based on implementation details. Experiments with baseline reference algorithms are therefore essential, providing a deep understanding of how to leverage these tools effectively to improve performance.
Software Portability Experiments and Tools Used
The software portability experiment involves several tools and techniques aimed at enhancing performance and simplifying the developers' work. One important observation is that performance may vary depending on memory organization and data layout. Studies have shown that advanced memory-management strategies can lead to significant performance improvements even in simple applications.
For example, a standalone benchmark algorithm was used to explore the feasibility of software portability. By comparing its performance with reference implementations, valuable insights were gained about how various portability solutions can be applied. These include modern programming features such as std::execution::par, part of the C++ standard since C++17, which provides a high-level API around parallel loops but does not expose the low-level optimizations available in native implementations.
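For orientation, the following minimal sketch shows what the std::execution::par policy looks like in standard C++17; the data and lambdas are invented placeholders, not the benchmark's actual kernels.

```cpp
#include <algorithm>
#include <execution>
#include <numeric>
#include <vector>

// Illustrative use of C++17 parallel execution policies: the policy requests
// parallel execution, but leaves the low-level mapping to the standard
// library implementation.
int main() {
  std::vector<double> energies(1'000'000, 1.5);

  // Transform every element in parallel.
  std::transform(std::execution::par, energies.begin(), energies.end(),
                 energies.begin(), [](double e) { return 2.0 * e; });

  // Parallel reduction over the same data.
  const double total =
      std::reduce(std::execution::par, energies.begin(), energies.end(), 0.0);

  return total > 0.0 ? 0 : 1;
}
```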
Many tools and programming environments have been developed, including SYCL, which provides a programming model based on the C++ standard, making it easier to include both host code and kernel code in the same source file. This reflects the importance of continuous research in high-performance programming and how these advancements can meet the growing demand for computing in HEP experiments.
Lessons Learned
The findings presented in this work, and the experience gained during the porting effort, yield a range of important lessons. The first is the importance of choosing the right tools. Good tools not only ease the transition but also unlock new efficiencies. Portable software solutions provide strategic benefits from the outset, with the freedom to work on multiple platforms without radical code modifications.
One of the main issues that arises from the experiments is the need to remain aware of rapid developments in programming fields. Scientists and developers must stay connected with new communities and emerging technologies to ensure optimal performance is achieved. Furthermore, it is important to emphasize that each toolset has its own strengths and weaknesses that must be considered when making decisions.
Ultimately, it is clear that success in addressing computing challenges in HEP relies on flexible code that can adapt to future changes. Well-organized development strategies and an analytical approach to programming will enable scientists to push the boundaries of knowledge in high-energy physics, contributing to the advancement of the scientific community as a whole.
Traditional Kalman Filtering Algorithm and Its Impact on Large Experiments
The traditional Kalman filtering algorithm is one of the fundamental techniques used in High Energy Physics (HEP) experiments to track the movement of particles. This algorithm was developed for multiple purposes related to reconstructing the trajectories of particles, providing high accuracy in the measurements needed to understand particle behavior in experiments like the CMS experiment. The algorithm includes a mathematical model that allows predicting the particle’s path based on a set of measurements and surrounding noise. Despite its apparent simplicity, this algorithm forms the basis for the particle tracking process as it relies on complex calculations to link readings to a specific particle and infer its properties.
The performance of this algorithm varies based on several factors, including the use of small benchmark datasets containing many tracks. For instance, standalone benchmarks such as "propagate to z" (p2z) or "propagate to r" (p2r) are used to evaluate the algorithm's performance under controlled conditions. Both tests provide a convenient environment for studying how to improve efficiency and rewrite the algorithms for better performance. The development of these algorithms also plays a significant role in the near future of high-luminosity running at the LHC.
The MKFIT Project: Performance Improvement with Enhanced Algorithms
The MKFIT project is a collective effort to modernize traditional tracking algorithms used in high-energy physics experiments. This project aims to rewrite the KF algorithm to be more efficient using a multithreaded and vectorized implementation. The new setup aims to process large amounts of data more quickly, up to twice the speed compared to previous applications.
Research indicates that by utilizing parallel processing techniques, performance can be enhanced in processing thousands of tracks present in a single experiment. Additionally, storing data closely together in memory, so that corresponding elements are stored in neighboring locations, enhances the algorithm’s ability to take advantage of SIMD operations, leading to faster individual track computations.
The success achieved by MKFIT includes notable improvements on multi-core CPUs, representing a real advancement in the field of particle physics research. It is important to note, however, that transitioning to accelerator-based computing infrastructure such as GPUs requires additional effort, especially in dealing with irregular memory access patterns.
Challenges in Porting the MKFIT Algorithm to GPU Processing Units
Despite the clear benefits of using GPUs to accelerate computational processes, the process of porting MKFIT to graphical computing environments has not been straightforward. Initial attempts to port MKFIT to CUDA revealed significant challenges in adapting data inputs and achieving acceptable performance in terms of time and efficiency. The irregular patterns of memory access while trying to organize data from different tracks posed the biggest obstacle to successful implementation in the graphical environment.
Previous experience has made clear that, in many cases, a comprehensive rewrite of the underlying code is necessary to achieve the desired performance. The focus has therefore been on developing portable tools to maximize the benefit from the available infrastructure, with the p2z project, created for this purpose, representing a promising testbed for exploring portability techniques in the context of charged-particle tracking.
Algorithmic Process Description for Particle Tracking
Particle tracking consists of several computationally demanding steps, including the "track finding" process. During this process, many sets of measurements are tested to identify a consistent group that matches the expected helical path of a particle in a magnetic field. Two main operations dominate this process: the propagation and update steps, both of which require intensive calculations involving matrix operations.
The propagation step re-evaluates the track state using predictive equations, forecasting the particle's location from its current estimated parameters. The update step then incorporates new measurements to improve the accuracy of the estimate. Together they dominate the time taken to reconstruct tracks, making it essential to optimize both in order to improve overall experimental efficiency.
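For reference, the textbook form of the Kalman filter propagation (prediction) and update steps reads as follows; the actual tracking code uses a specific track-state parameterization and includes material effects, so this is only the schematic structure.

```latex
% Prediction of the track state x and covariance P, followed by the update
% with a new measurement (hit) m_k; F is the propagation matrix, H the
% measurement projection, Q the process noise, R the measurement noise.
\begin{align*}
  x_{k|k-1} &= F_k\, x_{k-1|k-1}, &
  P_{k|k-1} &= F_k\, P_{k-1|k-1}\, F_k^{\mathsf{T}} + Q_k, \\
  K_k &= P_{k|k-1} H_k^{\mathsf{T}}
        \left( H_k P_{k|k-1} H_k^{\mathsf{T}} + R_k \right)^{-1}, \\
  x_{k|k} &= x_{k|k-1} + K_k \left( m_k - H_k\, x_{k|k-1} \right), &
  P_{k|k} &= \left( I - K_k H_k \right) P_{k|k-1}.
\end{align*}
```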
Conclusions and Future Outlook
The ongoing efforts to improve tracking algorithms in particle physics research represent a bridge towards achieving a deeper understanding of the fundamental components of the universe. Technologies like MKFIT and p2z are powerful tools that will help in processing the massive amounts of data generated in future experiments, highlighting the urgent need for investment in new equipment and technologies. Collaboration between different research teams and experimentation with various strategies are key elements in achieving collective goals, emphasizing the importance of teamwork in developing effective data analysis tools.
In light of the significant progress made through new technologies and software, it is hoped that these projects will enhance the effective use of graphical computing, enabling scientists to uncover the mysteries of the universe and provide new answers to deep questions in modern physics.
Organizing Track Data into Batches: The MPTRK Concept
Organizing tracks into suitable data structures is one of the fundamental steps in high-performance programming, aimed at improving efficiency and speed when executing these algorithms. The MPTRK data type used within the MKFIT algorithm was introduced as an effective layout for optimizing track processing. MPTRK groups tracks into batches of a given batch size (bsize), stored in a Structure-Of-Arrays fashion within each batch. This structure enables SIMD operations across the elements of a batch, improving performance on large workloads.
Defining the batch size (bsize) is a critical aspect that can be optimized according to the platform used. For example, the optimal size on GPUs may be the NVIDIA warp size of 32, while the batch size on CPUs may align with the AVX-512 vector width of 16. To ensure consistency, a size of 32 is used in all cases. This flexibility in defining sizes allows for the development of algorithms tailored to the characteristics of different hardware, leading to performance enhancements.
Examining data storage in the AOSOA pattern that follows MKFIT's methodology, the data is stored in a specific order: the first bsize elements of the arrays are stored contiguously, followed by the second elements, and so on. This arrangement simplifies memory access and reduces the time taken to retrieve data. The p2r and p2z benchmarks use this layout for inputs containing 8192 and 9600 tracks respectively, and the same model applies to the hit data, underlining the importance of structural organization in speeding up processing.
The significance of data organization within the MPTRK layout lies in allowing the system to fully exploit parallel operations, saving processing time and improving efficiency. The effective grouping of tracks into batches within this structure is therefore a key reason for the performance of MKFIT and the related algorithms.
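A minimal sketch of such an AoSoA batch, with simplified and invented field names (the real MPTRK type also carries covariances, charge, and hit information), might look like this:

```cpp
#include <array>
#include <vector>

// Simplified sketch of an Array-Of-Structures-Of-Arrays (AoSoA) track layout.
constexpr int bsize = 32;  // batch size: warp-sized on GPUs in this study

struct MPTRK {
  // For each of the 6 track parameters, bsize consecutive values are stored
  // contiguously, so that lane i of a SIMD unit (or thread i of a warp)
  // touches neighboring memory locations.
  std::array<std::array<float, bsize>, 6> par;
};

// A full event is then a plain array of batches (ntracks / bsize of them).
using EventTracks = std::vector<MPTRK>;

// Accessor: parameter ip of the track at position it within the batch.
inline float& trackParam(MPTRK& batch, int ip, int it) {
  return batch.par[ip][it];
}
```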
Portability Tools and Implementations: A Survey of Techniques
Portability tools for parallel programming are evolving continuously, with new features and support for compilers and other tools added periodically. In this context, nine different parallel programming tools were tested across four diverse architectures; testing all possible combinations was beyond the scope of this study.
The final evaluation of the p2z and p2r implementations was presented, including tables showing which tools or compilers were used, as well as a complete set of p2r implementations in the accompanying tables. This indicates the diversity of options available to developers in the field. For instance, the oneAPI Threading Building Blocks (TBB) library was used to ensure that the implementation accurately reflected what was utilized in the MKFIT project for improved performance.
Regarding GPU implementations, the baseline version was built on the CUDA programming model, which exposes massive multithreading. One notable property of this model is that each batch of tracks can be processed by a block of threads in parallel. The ability to leverage shared-memory features and other memory optimizations is critical, allowing developers to implement high-performance algorithms on graphics processing units.
The CUDA model offers a level of abstraction comparable to general programming models such as OpenCL and SYCL. However, it is a proprietary NVIDIA model, so code written with it is not directly portable to other processing units. The HIP model for AMD devices was therefore adopted as well; it aims for portability across architectures while keeping an API very similar to CUDA's.
Data organization and the strategies developed under MPTRK and parallel programming principles are essential for optimizing performance and scalability in data-intensive applications.
Directive-based solutions, such as OpenMP and OpenACC, are prominent examples of technologies that enable developers to use code annotations to define application characteristics. These programming models can be incrementally integrated with existing sequential applications, aiding in the acceleration of the transition to parallel versions that can leverage advanced processing libraries. From this perspective, cooperative programming tools are now more available to developers than ever before, allowing them to fully exploit the resources of modern hardware.
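As a hedged illustration of this incremental, annotation-based style (not the directives used in the actual p2z/p2r code), a simple loop can be offloaded as follows:

```cpp
#include <vector>

// Sketch of directive-based offload: the same loop annotated for OpenMP
// target offload, with the equivalent OpenACC annotation shown alongside.
// Directive details in the real benchmarks differ; this only shows the style.
void scale_tracks(std::vector<float>& pt, float factor) {
  float* p = pt.data();
  const std::size_t n = pt.size();

  // OpenMP offload version: map the data to the device and distribute the
  // loop over teams of threads.
  #pragma omp target teams distribute parallel for map(tofrom: p[0:n])
  for (std::size_t i = 0; i < n; ++i) {
    p[i] *= factor;
  }

  // Equivalent OpenACC annotation (normally one model would be chosen):
  // #pragma acc parallel loop copy(p[0:n])
  // for (std::size_t i = 0; i < n; ++i) p[i] *= factor;
}
```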
Experiences with TBB and CUDA Libraries
The oneAPI Threading Building Blocks (TBB) library is a pivotal tool for implementing parallel operations on multi-core processors, providing a simple API for creating and managing threads. Work is expressed as tasks that are executed in parallel, making optimization and control of the computation easier. In this context, using TBB parallel_for loops over events and sets of track batches is an important step toward achieving good performance.
High performance in the TBB implementation arises not only from the track data layout but also from effective scheduling of the parallel work, with tasks distributed across threads so that resources are used appropriately. MKFIT employs the same approach to achieve good multi-threaded performance.
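A simplified sketch of this nested parallel_for pattern, with invented event and batch types standing in for the real data structures, is shown below:

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <vector>

// Rough sketch of the nested TBB pattern described above: an outer
// parallel_for over events and an inner parallel_for over batches of tracks
// within each event. The payload here is a placeholder for real track data.
struct Event { std::vector<float> batches; };

void process_all(std::vector<Event>& events) {
  tbb::parallel_for(tbb::blocked_range<std::size_t>(0, events.size()),
    [&](const tbb::blocked_range<std::size_t>& evts) {
      for (std::size_t e = evts.begin(); e != evts.end(); ++e) {
        auto& batches = events[e].batches;
        tbb::parallel_for(tbb::blocked_range<std::size_t>(0, batches.size()),
          [&](const tbb::blocked_range<std::size_t>& r) {
            for (std::size_t b = r.begin(); b != r.end(); ++b) {
              batches[b] *= 2.0f;  // stand-in for propagation + update work
            }
          });
      }
    });
}
```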
The CUDA model, in turn, is an effective approach to parallel programming with a stronger focus on extracting performance from graphics processing units. In the CUDA implementation, each MPTRK batch is processed by a block of threads, exploiting the independence of the operations on different tracks. Comparing the TBB version with the CUDA version highlights the advantages offered by each type of programming.
These tools, as mentioned, are available in the form of frameworks that provide software developers with flexible capabilities to select the most suitable for their specific needs. Each tool has its strengths and weaknesses, and as development tools progress, these technologies continue to compete with each other, delivering better outcomes and easing the transition between different programming models.
Comparing Portability Choices: Analyzing Different Solutions
Choosing among parallel programming models is a crucial aspect of application development. With several models available, such as OpenMP, OpenACC, HIP, and CUDA, each option offers advantages that make it suitable for specific purposes. For example, OpenMP lets developers annotate their code to express parallelism, making it easy to add the necessary abstraction to CPU-based code, while OpenACC is similar but with a stronger focus on accelerator hardware.
Transitioning between programming models has become feasible thanks to portability techniques that accelerate software development. Their importance lies in reducing the need to rewrite large portions of code, which eases the development process for large projects. The experience of moving between OpenMP and OpenACC illustrates how to handle the challenges of switching models, since the parallelization and mapping strategies can differ from one compiler to another.
Experience with these directive-based ports demonstrates the feasibility of such tools, as they deliver the desired results with minimal modifications. The more flexible the adopted solutions, the more efficiently developers can improve their software structures, opening new avenues for the evolution of custom software tools.
Libraries like Alpaka also commit to compatibility by adopting the idea of adding an abstraction layer to enhance usability across different platforms. All of this points not only to innovation in computational processes but also to providing developers with convenient tools for their development path without compromising performance quality.
Usage of Software Libraries in Data Processing
There are various software libraries aimed at improving data processing performance, among which are Alpaka and Kokkos. These libraries focus on providing unified solutions targeting high performance across various computing architectures. The Alpaka library is a notable example, especially in the field of High-Energy Physics (HEP) experiments, where the CMS experiment has chosen to rely on it as a unified solution to support the use of Graphics Processing Units (GPUs) in LHC Run 3. This choice demonstrates the gradual shift towards using modern and unified technologies due to their role in improving productivity and flexibility.
At the same time, the Kokkos library offers similar solutions based on the concept of performance-portable programming. It uses C++ template metaprogramming to create code that can be executed across multiple platforms, aiming for consistent performance across a variety of computing devices. Kokkos is designed to reduce the complexity of programming for multiple devices, saving developers significant time and resources in data processing.
Additionally, Kokkos encourages developers to express their algorithms using general-purpose parallel programming concepts, which simplifies compatibility with various processing devices. Kokkos also provides specific parallel execution models, giving developers the ability to fine-tune execution details for optimal performance.
Standard Parallelism Techniques Using stdpar in C++
The C++ programming language is the preferred choice for implementing high-performance scientific applications, and recent updates to the ISO C++ standard provide a set of algorithms that can be executed in parallel and, with suitable compilers, offloaded to graphics processors. C++17, for example, includes a wide array of parallel algorithms that extend the Standard Template Library (STL) algorithms with execution policies, helping applications adapt to various computing devices, including multi-core systems and GPUs.
These policies, such as std::execution::par and std::execution::par_unseq, let programmers achieve improved performance by specifying the execution behavior of algorithms. The NVIDIA nvc++ compiler, in addition, supports offloading stdpar algorithms onto graphics processing units, allowing a higher level of integration between CPU and GPU through unified memory management.
However, challenges abound, as developers must be cautious in allocating and partitioning memory between CPU and GPU to ensure that applications execute smoothly and without violating memory rules. Experiments indicate that developing code using these methods often requires a deep understanding of how devices and memory interact, reflecting the importance of technological knowledge in designing advanced scientific applications.
Programming with SYCL and Its Features
SYCL represents a multi-platform abstraction layer that allows coding on diverse processors using standard C++. SYCL was developed to enhance programming efficiency and facilitate access to a range of different computing architectures. One of the prominent advantages of SYCL is the ability to use regular C++ code for the central processing unit and C++ kernel code for various processors within the same source file, providing a streamlined and integrated development process.
SYCL is designed to be fully compatible with standard C++, allowing developers to use any C++ library in a SYCL application. The SYCL model also focuses on delivering consistent performance across a range of devices, as the abstractions are structured in a way that enables high performance without relying on a specific architecture, making SYCL a versatile tool for developers.
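The following minimal single-source SYCL sketch (illustrative only, using an invented kernel rather than the tracking code) shows host setup and a device kernel in one file:

```cpp
#include <sycl/sycl.hpp>
#include <vector>

// Minimal single-source SYCL sketch: host code and the device kernel live in
// the same C++ file, and the runtime picks a device through the default queue.
int main() {
  constexpr std::size_t n = 1 << 20;
  std::vector<float> data(n, 1.0f);

  sycl::queue q;  // default device selection

  {
    // A buffer makes the host data visible to the device within this scope.
    sycl::buffer<float, 1> buf{data.data(), sycl::range<1>{n}};
    q.submit([&](sycl::handler& h) {
      sycl::accessor acc{buf, h, sycl::read_write};
      // The kernel body is ordinary C++ executed once per work-item.
      h.parallel_for(sycl::range<1>{n}, [=](sycl::id<1> i) {
        acc[i] = 2.0f * acc[i] + 1.0f;
      });
    });
  }  // buffer destruction copies the results back into 'data'

  return data[0] == 3.0f ? 0 : 1;
}
```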
Looking at practical examples, experimental studies of SYCL have revealed significant improvements in the performance of computing applications. We conducted a thorough analysis comparing programming approaches, and the model used with SYCL resembles the traditional approach used with CUDA, facilitating the transition between programming models and enhancing collaboration among different development teams. This also shows the importance of compatibility among multiple programming approaches in improving efficiency and the speed of processing required in complex research environments.
Performance Measurements and Practical Applications
One of the main challenges in developing scientific applications is measuring the performance of the algorithms. The most important performance metric from the perspective of HEP computing is throughput, i.e. the number of tracks that can be processed per second. This metric was measured for the various implementations across a range of computing systems, including NVIDIA, Intel, and AMD graphics processing units.
During the experiments, the measurements demonstrated that algorithmic optimization can lead to significant improvements in throughput. The performance of the different tools was tested on a variety of computing systems, evaluating the efficiency and accuracy of the parallel implementations. For instance, the tracking benchmarks derived from MKFIT delivered viable results with different programming techniques, highlighting the importance of code optimization and of the technologies employed.
Analyzing the results provides valuable insights into how software models can be improved to enhance performance. For example, experiments utilized data collected from multiple platforms, allowing for a broader understanding of the impacts resulting from programming choices and libraries used. This reflects the importance of investment in advanced programming strategies to ensure that applications keep pace with rapid technological developments in the field of scientific computing.
Implementation and Performance Analysis across Various Parallel Libraries
Different parallel libraries are a key tool for performance optimization in computations run on graphics processing units (GPUs). This section addresses the differences between libraries such as CUDA, Alpaka, and Kokkos, and how the implementations use different compilers. Each implementation is built with its own compiler, such as the nvcc compiler for the CUDA version and OpenARC for the OpenMP and OpenACC versions. This diversity of compilers can lead to variable performance results, especially when evaluating kernel-only performance versus overall performance.
During performance evaluation, launch parameters such as the number of blocks and the number of threads per block must be considered. In some libraries, such as Alpaka and Kokkos, these parameters have to be specified manually, as leaving them undefined may lead to suboptimal values that significantly reduce performance. These parameters can directly influence the final performance; in some cases, such as setting the number of registers per thread, an impact of up to 10% has been observed. The stdpar approach, on the other hand, does not allow manual specification of these parameters, which may significantly reduce its performance relative to the other solutions.
Benchmarks such as p2z and p2r were run on various GPUs, such as the V100 and A100, with kernel time measured separately from the overall execution time to capture the precise performance of each library. In general, the various portability solutions achieved performance very close to that of the native CUDA versions, except for the stdpar implementation, which suffered from performance issues because its reliance on unified memory required data transfers for each execution, increasing the processing cost.
The Impact of Compilers and Software on Performance
Studies have shown a clear dependence of performance on the compiler used. For instance, in the versions relying on OpenACC and OpenMP, performance was tested on the V100 GPU, and the version compiled with OpenARC exhibited better performance because this compiler follows the user's settings directly, unlike compilers such as llvm and gcc. The performance variation is attributed to how each compiler handles the configurations requested in the code: the launch configurations produced by llvm and gcc were smaller than those specified, resulting in a performance decline and added pressure on global memory accesses.
For example, in the p2z version based on OpenMP, data was temporarily staged in team-private memory; the results were much better with the OpenARC-compiled version because of its efficient use of CUDA shared memory. In contrast, the versions compiled with llvm and gcc suffered from suboptimal choices for the numbers of blocks and threads, which negatively impacted performance. Overall, the results indicate that a compiler that closely follows the user's settings significantly benefits program performance.
When it comes to data transfer, pinning host memory showed a significant effect on performance, since pinned (page-locked) memory improves the achievable transfer bandwidth. Across the different versions of the application, those that use pinned memory achieve much higher data-transfer performance, allowing the programs to operate more efficiently in the implementations based on libraries such as OpenACC and Kokkos.
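As a hedged host-side sketch of this pinning technique (buffer sizes and names are invented; the benchmarks pin their own track and hit buffers), the CUDA runtime API can be used as follows:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

// Host-memory pinning with the CUDA runtime API. Pinned (page-locked)
// buffers let cudaMemcpy reach much higher transfer bandwidth than pageable
// memory, which is the effect described above for the OpenACC and Kokkos
// versions.
int main() {
  const std::size_t bytes = 64ull << 20;  // 64 MiB of track data (example)

  float* h_pinned = nullptr;
  if (cudaMallocHost(&h_pinned, bytes) != cudaSuccess) {  // page-locked alloc
    std::fprintf(stderr, "pinned allocation failed\n");
    return 1;
  }

  float* d_buf = nullptr;
  cudaMalloc(&d_buf, bytes);

  // Transfers to/from pinned memory can also overlap with kernels via streams.
  cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(h_pinned, d_buf, bytes, cudaMemcpyDeviceToHost);

  cudaFree(d_buf);
  cudaFreeHost(h_pinned);
  return 0;
}
```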
Performance of Solutions Across Different Architectures
When evaluating performance on other GPU architectures, such as AMD and Intel, support is still less mature than for NVIDIA graphics cards, although this area is expanding rapidly. The p2r version of the application was used for testing on these cards; the results indicate that overall performance still requires some improvement, and each application needs specific optimizations to run efficiently on the different GPUs.
The performance of portable versions built with HIP (for AMD devices) and SYCL (for Intel devices) was compared with the native performance on each. Measurements on the AMD MI100 and the Intel A770 showed that performance was acceptable, but more effort is needed to improve compatibility and approach native performance. As the libraries become better optimized for these architectures, overall performance improves relative to the earlier implementations.
Some observations indicate that optimal performance is not always achieved without the correct settings and optimizations, and that bandwidth and data-transfer time must be taken into account when measuring performance. The current results reflect that these solutions cannot yet compete with the more mature CUDA ecosystem on NVIDIA graphics cards; reaching a comparable performance level on AMD and Intel hardware will require further work on performance tuning and library design.
Analysis of AMD and Intel GPU Architectures
AMD and Intel GPU architectures are among the available options for improving the performance of complex applications such as charged-particle track reconstruction. The discussion addresses how portability tools such as HIP, Alpaka, and Kokkos are used to achieve reasonable performance across different GPU platforms. Testing on the JLSE platform, which features AMD EPYC processors and MI100 GPUs, shows that the transition between environments such as CUDA and HIP can be made essentially without code changes. For instance, the HIP backend of Alpaka showed better performance than the version derived directly from CUDA, while Kokkos performance was close, within roughly a factor of 2.
The results also show that, despite this progress, performance tests on Intel GPUs such as the A770 gave significantly lower figures because the card favors single-precision floating-point operations. Reducing the precision of the computations to single precision improved performance, whereas relying on double precision can lead to substantial slowdowns, ranging from 3 to 30 times. It is important to note that the SYCL backend of Alpaka is still experimental and that Kokkos development remains very active, emphasizing the importance of using the latest versions of these tools to obtain significant performance improvements.
CPU Performance in Application Execution
The MKFIT application uses the Threading Building Blocks (TBB) library as the reference for CPU-level performance. The original implementation relied on an older version of the Intel C++ Compiler, which gave performance improvements of up to 2.7 times in execution time; because that compiler version is no longer supported, it was not included in the main results. The p2z and p2r benchmarks were run on a dual-socket system equipped with Intel Xeon Gold 6248 processors, with all implementations compiled using gcc, so that the comparison with the reference TBB implementation is consistent.
The Alpaka implementation of the p2z benchmark was able to surpass the reference TBB implementation, reflecting Alpaka's ability to use memory allocation more efficiently. Threading and vectorization directives had a significant impact on performance: the best results required optimizing the data layout and ensuring that loops were designed for vectorized execution. The SYCL implementation of the p2r model, achieving only 27% of the reference implementation's performance, illustrates the challenges many developers face when moving to new programming models.
Challenges in Performance Optimization in Diverse Applications
Efforts to move performance-critical applications from CPU to GPU faced multiple challenges. When comparing the tools, it was found that memory layout and allocation strategy had a significant impact on the final performance. Performance improvements of up to six times were reported in some cases when optimizing how data is stored in memory and taking advantage of pinned host memory on NVIDIA graphics processing units. The choice of compiler, meanwhile, can also significantly affect the achieved throughput on different devices.
Continuous updates to tools and libraries are an essential necessity to ensure sustained performance, as experiments have shown a tangible improvement in Intel GPU performance when updated to newer versions of the Kokkos library. The experiments are varied, as workflows that were effective on certain types of processors or units may not necessarily be effective on others. This dynamic highlights the importance of diversity in tools and their specialization, which helps provide suitable solutions that meet the growing needs of data analysis applications in high-energy physics experiments.
Future Development Opportunities in Performance Tools
Future developments in the computing world are trending towards improving application execution across different processing units while enhancing portability between them. Current tools, such as Alpaka, Kokkos, and SYCL, offer opportunities to reuse legacy code across new infrastructures without the need to rewrite the entire application code. However, achieving the desired performance remains a challenge that requires investment of time and effort in optimizing processes and organizing data appropriately.
Collaboration among diverse resources, from research laboratories to educational institutions, is required to support research related to high-energy physics using these advanced technologies. Providing access to data and applications is an important step towards achieving these goals, which necessitates building an effective community of developers addressing performance and efficiency issues. With the increasing pressure to develop advanced analytical techniques, it is essential to continue exploring new options for performance and portability to achieve success in these ambitious projects.
Data Processing Challenges in High-Energy Physics
Modern experiments in high-energy physics require processing vast amounts of data while searching for extremely rare interactions between fundamental particles. For example, the CMS experiment at the Large Hadron Collider (LHC) at CERN processed hundreds of petabytes of detector data and Monte Carlo simulations during the second run of the collider (2015-2018). Data rates are expected to increase significantly in experiments like the High Luminosity LHC and DUNE in the coming decade. The high-energy physics sector faces additional challenges with the anticipated event rate increasing by a factor of 7.5, meaning data volumes could reach exabyte levels. Therefore, dealing with these large data volumes requires significant transformations in traditional computing methods and an advanced vision for the effective use of computational resources.
The Necessity of Parallel and Versatile Programming
The need for parallel and versatile programming is increasing within modern high-energy physics experiments. Past experiments primarily relied on traditional central processing units (CPUs), but this is changing with the shift towards graphics processing units (GPUs) that offer enhanced performance. The way forward is to extend existing software to take advantage of the diverse architectures provided by high-performance computing (HPC) centers. Portability frameworks such as Kokkos or Alpaka help researchers prepare their programs for the new challenges posed by exascale computing. By restructuring programs to be compatible with these new resources, performance can be improved and peak computing power exploited.
The Shift Towards Distributed Computing
Distributed computing contributes to enhancing the ability to process massive amounts of data through extensive networks of interconnected computing centers. The Worldwide LHC Computing Grid (WLCG) model exemplifies how multiple centers across various countries cooperate to process data efficiently. However, the need will grow and complicate due to the increasing complexity and volume of data in the future. This will require developing new models that facilitate collaboration among multiple machines and making algorithms capable of operating efficiently across this distributed environment.
Software Development Strategies
Enhancing the capacity to process data from high-energy physics experiments requires developing new strategies in software programming. Developing algorithms to be scalable and utilizing versatile resources such as GPUs is a priority. Additionally, software must be flexible enough to keep pace with potential technological shifts in the coming years. By creating a framework that can easily adapt to future infrastructures, scientists will be able to optimize analytical performance and extract deeper insights from data.
Innovation in Programming Applications
Innovation in programming applications requires creative flexibility to handle big data from high-energy physics research. Various applications such as “Parallel Kalman Filter” provide new models for analyzing spatial and particle data, helping to accelerate and improve performance. By measuring optimal performance, researchers can predict how these new techniques will fit with the forthcoming massive computing systems. Continuous innovation is what will enable experiments to cope with the enormous volume of data and maximize the benefit of meticulous examination of tiny particles.
Future Directions and Remaining Challenges
Despite significant advances in computing strategies for high-energy physics, challenges remain that require ongoing attention. The evolving processing and storage requirements may demand unconventional solutions to ensure that big data can be handled efficiently and securely. These challenges range from the need for new processing and storage models to fast communication between systems. Retaining expertise and keeping pace with technological development is key to a promising future in this research. Success lies in the ability to absorb rapid technological shifts and employ them creatively to enhance the capabilities of modern experiments in this sensitive field.
Challenges in Data Processing in Particle Physics Experiments
Data processing for experiments in high-energy physics (HEP) is a complex operation that involves several successive steps: data collection, reconstruction of the raw data into higher-level information, and finally data analysis and statistical interpretation. The primary difficulty lies in the volume of data that must be processed and the complexity of the computations required to understand each event. For example, in proton collisions at the Large Hadron Collider (LHC), analyzing each event requires extracting precise information about the trajectories of charged particles. This involves advanced algorithms such as the Kalman filter, through which the position and momentum of particles are inferred from the available measurements.
These technological challenges create an urgent need to develop efficient algorithms that can run on a variety of computational platforms. The challenges are not limited to performance alone; they also include the usability of programming languages, since developers must rewrite code to make it compatible with multiple platforms. This places an additional burden on scientists already facing time pressure in analyzing the vast amounts of data generated by the experiments.
Performance Improvement in HEP Algorithms
The main goal of the efforts to improve the performance of HEP algorithms is to achieve maximum efficiency during data processing. This is achieved by accelerating computational operations through techniques such as parallel processing and using specialized processors like Graphics Processing Units (GPUs). The MKFIT program is a successful example of optimization, designed to rewrite traditional particle tracking algorithms with the aim of significantly speeding up performance. According to experiments, MKFIT can achieve improvements of up to six times compared to previous implementations in particle tracking.
MKFIT utilizes techniques such as sequential data storage to improve access during computational processes, enabling SIMD (Single Instruction, Multiple Data) operations to be executed quickly and efficiently. The focus is on leveraging data structures that allow for complex calculations to be performed faster. These enhancements are not only limited to algorithm performance but also concern reducing the time taken to analyze each event, allowing scientists to obtain results in a timely manner.
Code Translation Tools and Portable Solutions
Code translation tools and portable solutions are modern necessities in scientific programming, especially regarding the development of HEP algorithms. In the past, programs were written in specialized languages, such as CUDA, making them confined to specific platforms. However, creating code that enhances compatibility with multiple platforms has become a vital reality today. Numerous solutions such as HIP, Kokkos, and SYCL have been developed, all aimed at facilitating the process of transferring code across different environments.
One of the primary goals in developing these tools is the ability to write a single source code that can be compiled and executed on multiple platforms. This not only saves effort and time in maintenance but also enhances the capability to share knowledge and algorithms between different research teams. For example, integrating tools such as OpenMP and OpenACC enhances the ability to manage parallel behaviors and allocate memory efficiently. It is worth noting that all these solutions rely on open standards, but they require specific software packages to run on certain GPU units.
Performance Experiments and Evaluation
Performance evaluation in high-performance software is an integral part of the development process, requiring rigorous tests of how efficiently programs execute their functions under various conditions. In this work, standalone benchmark algorithms were used to evaluate the effectiveness of the different portability solutions. This included measuring the performance of the reference implementations against the ported versions built with the various tools, while also assessing the overall developer experience.
Performance results represent vital aspects of achieving a comprehensive understanding of the competitiveness of software. By measuring quantitative performance values and user experience, the extracted data provides insights into how to improve solutions. Additionally, these experiments contribute to identifying aspects that need continuous improvement, whether in terms of computational performance or ease of programming. These experiments conclude with numerous lessons learned on how to enhance software performance and portable solutions in the field of HEP.
Particle Tracks and Experimental Measurements
In particle physics studies, it is vital to understand the paths taken by charged particles in the presence of a magnetic field. A track is defined by the points, or "hits," registered during an experiment; these hits can be multidimensional, containing positions and angles. Consequently, many sets of hits need to be tested to determine a coherent set representing the expected path of the particle in the magnetic field. Track-finding algorithms and data reduction are among the core techniques used in this regard.
The process requires testing many combinations of hits. This can be achieved, for instance, with the Kalman filter, a mathematical model used to estimate uncertain states from a set of measurements. The Kalman algorithm relies on a prediction step, in which the track state is propagated from the previously known state, followed by an update step that incorporates the new hits. By improving the accuracy of the estimated positions, researchers obtain more precise results about the original particle interactions.
Structural Design of Detectors
Particle detectors in major experiments, such as CMS and ATLAS, are divided into two main sections: the “barrel” which represents the cylindrical part parallel to the beam tube, and the “endcap” which extends on both ends of the barrel. This design is utilized to enhance the system’s ability to effectively detect charged particles. The motion measurements of charged particles in a uniform magnetic field are based on a helical path, allowing researchers to accurately calculate their positions.
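In textbook form, and independent of any particular detector, the transverse projection of such a helix is a circle whose radius is fixed by the transverse momentum and the field strength:

```latex
% Standard relation for a charged particle of charge q and transverse
% momentum p_T (in GeV/c) in a uniform axial field B (in tesla): the
% transverse projection of the helix is a circle of radius R (in metres),
R \;=\; \frac{p_T}{0.3\, |q|\, B},
% so the track can be described by its curvature, direction, and position
% along the beam axis, which are the quantities the fit estimates.
```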
Each layer of the detectors possesses different characteristics, allowing for precise information capture regarding motion and systematic study of impact points. This level of organization facilitates effective information exchange among various system elements and aids in estimating particle motion over the flow of data.
Analysis and Computational Steps
Depending on the input data, the initial step in the track-finding algorithm is to create "track seeds," which represent a series of preliminary guesses about the track state. These seeds are typically built from a set of hits taken from the inner layers of the detector. The processing involves numerically intensive computations, including trigonometric functions and matrix operations.
The computational steps aim to optimize the overall performance of the system. They operate on a compact array-based data model for the track and hit data, which allows faster and simpler data access and thereby improves the effectiveness of the processing. Implemented correctly, this reduces execution time and can double throughput.
Implementation of Algorithms and Software Tools
Research into tools and implementation mechanisms reflects the importance of innovation in data processing. A range of analysis tools has been tested on multiple systems, but it has not been possible to test all conceivable scenarios. Most of the methods used rely on advanced software libraries, such as multithreaded libraries that offer improved performance when handling large data sets.
Programming libraries like TBB and CUDA help enhance the effectiveness of implementing pathfinding algorithms by organizing threads and managing tasks to achieve the highest levels of efficiency. These libraries have been utilized in CMS experiments to streamline large complex applications, contributing to accelerating mathematical computations and enhancing the system’s capability to handle data more effectively and accurately.
Performance Improvement through Analysis and Evaluation
The process of analysis and evaluation is a critical task to ensure accuracy and speed of performance in the particle detection system. These operations require continuous monitoring of measurements and results produced from various models. When enhancements are implemented appropriately, high performance can be achieved through a blend of traditional and modern programming techniques.
The core challenge lies in achieving a balance between model complexity and processing speed. Through practical tests and multiple criteria, it can be verified that the techniques used operate efficiently across various platforms. The appropriate selection of software tools and the optimization of data structures distinguish each application, helping researchers and experts achieve their goals accurately.
Memory Access Efficiency and Shared Memory Usage
Parallel programming for devices such as Graphics Processing Units (GPUs) brings challenges related to memory access efficiency. Analysis of how the data is processed showed that keeping intermediate results in local registers provides more efficient memory access; inefficient access patterns lead to excessive execution time. Shared memory, which all threads in a processing block can access, is often assumed to improve throughput; however, the studies showed that this option can have the opposite effect in some cases.
In the context of various technologies used in parallel programming, CUDA serves as an example of how to leverage certain features in NVIDIA devices to enhance performance. Although CUDA provides the ability to exploit specific features of NVIDIA hardware, developers must contend with portability barriers when using this platform. This means that code written using CUDA may require significant modifications to function correctly on different hardware.
In parallel with these challenges, AMD introduced the HIP programming model, which focuses on portability between NVIDIA and AMD devices. This model demonstrates how to mitigate portability barriers through a design comparable to CUDA, making the process of writing and translating code between base frameworks easier. This shift highlights the increasing importance of compatibility across multiple systems and the ability to exploit the specific features of each architecture.
Directive-Based Models: OpenMP and OpenACC
High-level directive-based models such as OpenMP and OpenACC are powerful tools for developers seeking to convert sequential applications written in C, C++, or Fortran into parallel versions. These models operate based on the use of specific directives that the code compiler can understand, allowing developers to specify application characteristics such as available parallelism and data sharing rules.
The main advantage of directive-based programming models lies in the ability to gradually transform existing applications into parallel versions without radical changes to the current software architecture, which eases adaptation to the computing needs of multiple platforms. For example, the initial OpenMP version was produced by converting the reference TBB CPU implementation to OpenMP, demonstrating how smooth the transition between models can be.
The parallelization strategies and directives vary between versions targeting different devices. For instance, when targeting a GPU, certain OpenMP directives become essential for approaching native performance, while on CPU systems those same directives may be unnecessary. Converting between OpenMP and OpenACC can involve some challenges because of differences in mapping strategies and in the configurations supported by different compilers, which can also affect the final performance.
Transitioning to Programming Libraries like Alpaka and Kokkos
Libraries such as Alpaka and Kokkos offer new pathways to facilitate parallel application programming. Alpaka is a library that relies on the “single source” concept and has a similar orientation to CUDA at the API level. Alpaka makes it easier for programmers to achieve portability by adding an abstraction layer between applications and device-based programming models, allowing code to be written more effectively across different hardware.
Kokkos, on the other hand, stands out as a library that relies on template metaprogramming, allowing the creation of device-independent code. Kokkos offers a set of concepts and abstractions that require developers to express their algorithms in a general way before they are automatically mapped to the specific processing device, separating the algorithm description from device-dependent code while adhering to modern C++ standards.
Kokkos and Alpaka can be excellent choices for scientific research, offering solutions that achieve performance close to native when the optimizations are applied correctly. Applications used in physics experiments, such as CMS, use Alpaka as a reliable solution to support portable GPU usage, combining performance with software efficiency.
Standard Parallelism Using stdpar in C++
The C++ programming language is the preferred choice for many high-performance scientific applications. Recent updates to the ISO C++ standard have introduced a set of algorithms capable of running on multiple devices. The use of stdpar, introduced in the C++17 standard, provides a balance between code productivity and computational efficiency.
stdpar allows developers to indicate where algorithms may be executed in parallel. The general-purpose algorithms in the Standard Template Library (STL) are designed to be efficient on current multi-threaded architectures, simplifying the use of these techniques in production applications. This update reflects the trend towards improving performance by enabling parallel execution of standard algorithms.
Execution policies in C++17, such as std::execution::par and std::execution::par_unseq, offer advanced execution patterns and give developers flexibility in writing code that maps well onto modern hardware. However, moving such programs between different systems requires care with memory allocation and data layout to avoid errors caused by unmanaged allocations.
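A small sketch of the stricter std::execution::par_unseq policy (with an invented quantity as the reduction target) illustrates the kind of loop body these policies require:

```cpp
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

// std::execution::par_unseq additionally permits vectorized (unsequenced)
// execution within a thread, so the lambda must be free of locks and of
// dependencies between iterations. When offloaded with nvc++ -stdpar, it
// should also avoid raw pointers to CPU stack storage.
double sum_pt_squared(const std::vector<double>& pt) {
  return std::transform_reduce(std::execution::par_unseq,
                               pt.begin(), pt.end(), 0.0,
                               std::plus<>{},                  // reduction
                               [](double v) { return v * v; }  // per-element
  );
}
```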
Memory Management in GPU and CPU Programming
Memory management is a critical issue when programming heterogeneous systems: code that mixes CPU and GPU execution requires careful strategies to achieve good performance and avoid invalid memory accesses. Using pointers to the CPU stack, or to ordinary host objects, inside GPU code can cause memory faults, which underlines the importance of disciplined memory handling. The nvc++ approach to standard parallelism relies on strict conventions about where data may be allocated, so careful allocation and referencing matter. Although writing code for these applications looks very similar to standard C++ programming, the main difference lies in exactly these considerations.
This requires developers to understand how to allocate and manage memory effectively. A classic example of what can go wrong with misused pointers is dereferencing a freed memory location, which leads to unpredictable behavior. Developers therefore need precise programming habits and sufficient knowledge of how GPUs actually execute code and access memory.
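The stack-pointer pitfall mentioned above can be sketched as follows. This is an illustrative example, assuming the usual nvc++ -stdpar=gpu behavior in which heap allocations are placed in CUDA managed memory while stack variables are not device-accessible; it is not code from the paper.

```cpp
#include <algorithm>
#include <execution>
#include <vector>

void scale_safe(std::vector<float>& data) {
  // Safe under nvc++ -stdpar=gpu: the vector's storage comes from the heap,
  // which the compiler places in managed memory, and the scalar is captured
  // by value, so the offloaded lambda only touches device-visible data.
  const float factor = 3.0f;
  std::for_each(std::execution::par_unseq, data.begin(), data.end(),
                [factor](float& x) { x *= factor; });
}

void scale_broken(std::vector<float>& data) {
  float factor = 3.0f;
  float* factor_ptr = &factor;  // pointer into the CPU stack
  // Risky: dereferencing a stack address inside the offloaded lambda is the
  // kind of access that can fault or corrupt memory when run on the GPU.
  std::for_each(std::execution::par_unseq, data.begin(), data.end(),
                [factor_ptr](float& x) { x *= *factor_ptr; });
}
```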
SYCL: A Multiplatform Abstraction Layer
SYCL is a standard designed to make it possible to write code executable on multiple processors in a "single-source" manner using standard C++. The standard is maintained by the Khronos Group, while much of the current performance-focused implementation work is driven by Intel. One of SYCL's main advantages is that ordinary C++ host code and the GPU kernels can live in the same source file, which simplifies development considerably.
SYCL therefore offers an integrated programming approach in which developers can also use existing C++ libraries inside SYCL applications, reflecting its flexibility as a portability tool. The standard places strong emphasis on performance portability across a wide range of hardware architectures, allowing applications to be optimized without being tied to a specific architecture or vendor language.
One of the issues that arises when using SYCL is how to manage memory effectively. The Unified Shared Memory (USM) feature is used to manage data, which requires developers to understand how data moves between CPU and GPU in order to avoid latency from unnecessary transfers. The success of SYCL applications depends largely on striking the right balance between performance and resource management.
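As a minimal, self-contained sketch of the single-source model and USM described above (not code from the article), the following allocates shared memory with sycl::malloc_shared and launches a kernel from the same C++ file:

```cpp
#include <sycl/sycl.hpp>

int main() {
  sycl::queue q;  // selects a default device (a GPU if one is available)

  const size_t n = 1 << 20;
  // Unified Shared Memory: one pointer usable on both host and device;
  // the runtime migrates the data as needed.
  float* data = sycl::malloc_shared<float>(n, q);
  for (size_t i = 0; i < n; ++i) data[i] = 1.0f;

  // The device kernel lives in the same source file as the host code.
  q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
    data[i] *= 2.0f;  // placeholder computation
  }).wait();

  sycl::free(data, q);
  return 0;
}
```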
Performance Results of Software on Various Systems
In simulating the detector response to physics events, throughput, that is, the number of tracks processed per second, is the primary performance metric. Performance was measured by processing roughly 800,000 tracks per run, with versions of the benchmark developed in each of the programming models discussed. Performance was tested on several systems, including NVIDIA and AMD GPUs and Intel CPUs.
The results show that most of the portability solutions achieved throughput close to that of the native implementations. Some SYCL-based versions were the lowest performing; more detailed profiling revealed significant branching in the SYCL versions, pointing to behavior that requires a deeper understanding of its performance impact. Versions that manage data movement carefully and handle memory effectively generally performed better.
The impact of different compilers was also studied: performance results differed significantly between them, showing that the right choice of tools can strongly affect final results and execution time. Results can also be influenced by external factors such as hardware configuration and scaling behavior, which demands close attention from developers.
Parallel Performance in the OpenMP Version
The performance of OpenMP versions of applications such as p2z depends mainly on how memory is managed and how the parallel execution is organized. Versions compiled with llvm, gcc, or IBM compilers faced significant challenges related to less efficient memory management compared with the OpenACC versions. The version compiled with OpenARC, for instance, handles temporary data more efficiently by placing it in CUDA shared memory, while the llvm/gcc/IBM-compiled versions made heavier use of global memory, adding overhead from repeated memory accesses. These accesses and transfers between memory types accounted for a significant share of execution time and contributed to the lower performance of those versions.
For comparison on a V100 GPU, measurements across the versions showed clear differences. Among the OpenACC versions, OpenARC achieved better data-transfer performance than the versions built with nvc++. The difference lies in how each compiler implements the data-transfer calls: OpenARC turns each element of the data-transfer list into a single transfer call, whereas nvc++ splits the transfer into many small calls, which results in poorer performance in this case.
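To make this concrete, here is an illustrative OpenACC data region (not code from the paper) with several arrays listed in its data clauses; how a compiler turns that list into actual host-device transfer calls is precisely the implementation detail discussed above.

```cpp
// Hypothetical kernel: the array names are placeholders, not the real p2z data.
void propagate(float* par, float* err, const float* hits, int n) {
  // Several arrays appear in one data region; a compiler may issue one
  // transfer per array (as OpenARC does) or split each into smaller calls.
  #pragma acc data copyin(hits[0:n]) copy(par[0:n], err[0:n])
  {
    #pragma acc parallel loop
    for (int i = 0; i < n; ++i) {
      par[i] += hits[i];  // placeholder update
      err[i] += 0.1f;     // placeholder update
    }
  }
}
```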
It can therefore be said that the execution environment and the compilation tools play a critical role in achieving good parallel performance. Optimizing performance requires careful thought about how data is managed across the memory hierarchy and which compilers are used.
The Impact of Memory Pinning on Performance
Host memory pinning is a crucial concept for optimizing GPU application performance. Pinned (page-locked) memory enables direct memory access (DMA) transfers, which provide better bandwidth than transfers from pageable memory. In the Kokkos and OpenACC versions of p2z, enabling memory pinning had a significant impact: data-transfer performance improved markedly and transfer time dropped considerably.
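As a hedged illustration of the mechanism (not code from the paper), the following host-side CUDA snippet allocates one buffer with ordinary malloc and one with cudaMallocHost, then copies both to the device; only the pinned buffer can be transferred by DMA without an intermediate staging copy.

```cpp
#include <cuda_runtime.h>
#include <cstdlib>

int main() {
  const size_t n = 1 << 22;
  const size_t bytes = n * sizeof(float);

  // Ordinary (pageable) host memory.
  float* h_pageable = static_cast<float*>(std::malloc(bytes));
  // Page-locked (pinned) host memory: eligible for direct DMA transfers.
  float* h_pinned = nullptr;
  cudaMallocHost(reinterpret_cast<void**>(&h_pinned), bytes);

  float* d_buf = nullptr;
  cudaMalloc(reinterpret_cast<void**>(&d_buf), bytes);

  // The pinned copy typically reaches higher bandwidth; the pageable copy
  // forces the driver to stage the data through an internal pinned buffer.
  cudaMemcpy(d_buf, h_pageable, bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice);

  cudaFree(d_buf);
  cudaFreeHost(h_pinned);
  std::free(h_pageable);
  return 0;
}
```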
In a comparison across several OpenACC versions of the application, the results showed that the version using shared memory with synchronized transfers performs worse than the version that keeps thread-private data in local memory. It thus became clear that choosing data placement carefully and exploiting memory pinning are essential for good performance.
Experiments with other GPUs, such as AMD and Intel devices, confirmed that good memory-pinning practice can significantly boost performance. For example, tests on the AMD MI-100 showed a substantial speedup from memory pinning, demonstrating that the same optimization carries over to different programming environments.
Performance Results of AMD and Intel GPUs
Support for AMD and Intel GPU environments still lags behind NVIDIA, but notable progress has been made over time. Initial p2r measurements on both AMD and Intel showed sizeable differences in performance. On the AMD GPU, frameworks such as Kokkos and Alpaka performed well even without dedicated tuning for that hardware, indicating how easily code can move between frameworks without major changes.
Throughput measured on the Intel A770 was lower, since it is a consumer GPU not designed for high-performance computing. It was also found that relying on double-precision operations degrades performance by up to a factor of 30, highlighting the need to avoid unnecessary double precision, for example by using single-precision data and constants, to keep execution efficient.
The results underline the importance of choosing an environment that matches the specific programming needs, so that developers can make the best use of the available tools to improve performance and broaden support for different systems. They also show steady progress in supporting non-NVIDIA platforms and the need to engage a wider community to sustain performance improvements over time.
Central Processing Unit (CPU) Performance
On the CPU side, Intel's Threading Building Blocks (TBB) library is a key tool for parallel performance in applications like MKFIT. Results varied: some versions achieved good performance compared with the original releases, especially when TBB was used appropriately. Comparisons were complicated, however, by the fact that the original version had been developed with an outdated compiler that is no longer supported. Notably, the ported versions of p2z reached more than 70% of the performance of the original TBB implementation.
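For reference, the TBB pattern looks roughly like the following minimal, illustrative sketch (not code from MKFIT or p2z): a blocked range over the work items is handed to a lambda that processes each chunk on a worker thread.

```cpp
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#include <cstddef>
#include <vector>

void scale_all(std::vector<float>& data) {
  // TBB splits the index range into chunks and schedules them across its
  // thread pool; each chunk is processed by the lambda below.
  tbb::parallel_for(tbb::blocked_range<std::size_t>(0, data.size()),
                    [&](const tbb::blocked_range<std::size_t>& r) {
                      for (std::size_t i = r.begin(); i != r.end(); ++i) {
                        data[i] *= 2.0f;  // placeholder computation
                      }
                    });
}
```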
Optimization on CPUs requires careful planning, particularly regarding data-structure layout and scheduling choices, to exploit the available cores, vector units, and memory hierarchy efficiently. The Alpaka versions also performed well on the CPU, showing that portability does not have to come at the expense of efficiency.
Performance work of this kind requires iterative review and thorough analysis to identify the best approach to each challenge as the system runs increasingly advanced algorithms. These experiences provide valuable insight into how the libraries interact with the underlying hardware architecture in the contexts discussed.
Rediscovering Performance in Computing
Rediscovering performance in computing is one of the critical topics that must be addressed with a deep understanding of the challenges associated with different processor types and parallel computing systems. With the growing use of GPU processors, especially of the NVIDIA type, it has become evident that the steps taken to improve performance may yield varying results when applied to other types of processors like AMD and Intel. This necessitates the development of new strategies to ensure that performance improvements on a certain architecture can translate into gains on another architecture.
Throughout this process, several factors were identified that significantly affect final performance, such as memory layout and explicit pinning of host memory. For example, optimizing the memory layout was observed to speed up execution by up to a factor of six in some applications. Performance in high-energy-physics computing environments also depends on the choice of compiler, and because these tools evolve rapidly, keeping up with newer releases is crucial for maintaining performance. Libraries such as Kokkos can likewise deliver noticeable gains: updating the library's Intel GPU backend was observed to roughly double performance.
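Because memory layout accounts for some of the largest gains mentioned above, the following hedged sketch contrasts an array-of-structures layout with a structure-of-arrays layout; the types and fields are invented for illustration and are not the actual p2z/p2r data structures.

```cpp
#include <cstddef>
#include <vector>

// Array-of-Structures: the fields of one track sit together, so a loop that
// reads only x strides through memory and wastes cache and vector bandwidth.
struct TrackAoS {
  float x, y, z, pt;
};

// Structure-of-Arrays: each field is contiguous, so per-field loops are
// cache-friendly, vectorize well on CPUs, and coalesce well on GPUs.
struct TracksSoA {
  std::vector<float> x, y, z, pt;
  explicit TracksSoA(std::size_t n) : x(n), y(n), z(n), pt(n) {}
};

float sum_x_aos(const std::vector<TrackAoS>& tracks) {
  float s = 0.0f;
  for (const auto& t : tracks) s += t.x;  // strided access pattern
  return s;
}

float sum_x_soa(const TracksSoA& tracks) {
  float s = 0.0f;
  for (float v : tracks.x) s += v;        // contiguous access pattern
  return s;
}
```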
All these factors emphasize the importance of rethinking traditional approaches and developing new solutions for portability across different systems. Regarding performance, testing specific libraries and applications on several different architectures is essential for achieving optimal performance. Some programming libraries like Alpaka, Kokkos, and SYCL offer portable solutions, but many of them require significant optimization to achieve good performance. Thus, the ability to execute algorithms on different processors will allow important physical experiments like HEP to benefit from diverse computing resources.
The Importance of Developing Portable Solutions
The ability to run algorithms on processors from different vendors is one of the core elements of success in physics experiments like HEP. These technologies provide a significant advantage in exploiting available computing resources, including those present in existing and future centers. Providing robust computing systems means we can better handle data analysis that requires processing a vast amount of data. Many HEP applications need continuous performance optimization to respond to the increasing challenges posed by complex big data.
Moreover, developing tools and software that allow reusing existing algorithms on new platforms requires ongoing effort. The transition from central processing units (CPUs) to graphical processing units (GPUs) is not straightforward, and equal performance is not expected without fine-tuning. Therefore, periodic testing should be conducted to ensure that all software updates improve performance, not the opposite. Also, improvements in libraries like OpenMP and OpenACC open new horizons for researchers, but technical knowledge is still required to enhance how these libraries are applied to build portable solutions.
Furthermore, the speed of data processing requires periodic re-evaluation of programming methods and ensuring that planned solutions can adapt to changes in infrastructure and technological innovations. Efficiency in complex computational processes will only be achieved when developers and researchers collaborate in software optimization and knowledge sharing regarding the most effective methods.
Future Challenges in Software Development for High-Energy Physics
While significant progress has been made in the area of algorithm portability, there are still many challenges that the scientific community must face to ensure the continuity of these efforts. Researchers today face the barrier of complexities associated with multiprocessor systems and constantly changing technology. Increased processing capacity also necessitates dealing with new potential issues, such as compatibility and reliability between different libraries and performance discrepancies across diverse computing systems.
By focusing on the continuous development of tools and software, alternative and innovative methods must be explored to sustainably enhance performance. Researching deep learning and artificial intelligence tools may provide useful insights. Employing new techniques in data analysis can contribute to improving responsiveness to the needs of ambitious projects like those associated with HEP, necessitating flexibility in the strategies employed.
International collaboration is also crucial: different research centers can share knowledge and technologies, contributing to cost-effective and efficient solutions. It is therefore essential that software be organized so that it is available and usable across different platforms worldwide. Efforts to improve these methods will have a significant impact on the quality of future research.
Source link: https://www.frontiersin.org/journals/big-data/articles/10.3389/fdata.2024.1485344/full