Pareon allows you to parallelize your code using a simple, visual, point-and-click interface. After building and analyzing your program, Pareon’s graphical user interface allows you to gain insight into your program’s behavior, and shows you where and how to parallelize.
Pareon shows performance or memory bottlenecks, and lets you investigate where it pays off to add parallelism.
Multicore support for multiple platforms
Pareon supports several multicore architectures. Select your target hardware architecture, and Pareon runs your application on this model.
This allows Pareon to understand your application’s behavior in detail. Performance bottlenecks, cache hits/misses, the memory hierarchy, and buses are all modeled to estimate application performance and parallelization speedup.
Mobile, tablets, desktops, embedded
Many systems have multicore processors inside these days, since adding cores is now the main way to increase performance. Pareon supports optimizing any code that is written in C/C++ for ARM Cortex-A or x86 architectures. This means Pareon supports optimization of code for smartphones, tablets, laptops, desktops, or other devices or systems that use these multicore architectures.
C/C++ and binary libraries
Pareon can analyze and optimize applications written in C or C++. This includes applications that are partially written in assembly or in languages other than C/C++, or that make calls into binary libraries. Pareon can handle non-C/C++ code, but only parallelizes those sections that are written in C/C++.
Build & analyze
Pareon can be called in the same way you call your standard compiler, after which the program is built and linked. After running the executable, Pareon’s GUI is started, which takes you through three parallelization steps: gaining insight into your program’s behavior, investigating where it’s best and easiest to parallelize, and following step-by-step instructions on how to implement the speedup.
Instead of just analyzing the source code, Pareon compiles the application into its own internal representation and runs its own model of the target architecture. This means the analysis engine has full visibility into everything that is going on inside your program, and provides the detailed analysis that’s required to ensure parallelization is performed correctly and actually increases application performance.
Fully optimizing software requires a deep understanding of the underlying processor hardware architecture. Vector Fabrics works closely with processor vendors to develop a model of their multicore hardware. Pareon runs your application on this model, allowing it to provide key insight into effects such as cache hits and misses, bus bandwidth, and memory access times, since these effects influence the parallelization strategy and application speedup.
Pthreads and Win32 threads
Pareon uses the vfTasks library to add threading to your application. This tasking library uses the Pthreads or Win32 threading libraries underneath, covering a wealth of platforms. The additional vfTasks layer simplifies adding threading to your application compared to making calls to the Pthreads or Win32 threading libraries directly.
vfTasks tasking library
vfTasks is a C library that eases the implementation of concurrent tasks. It abstracts the Pthreads and Win32 thread libraries, simplifies the creation of a thread pool that efficiently creates and reuses tasks, and provides highly efficient inter-task synchronization as well as streaming interfaces to stream data from one task to another. The vfTasks library is provided under a BSD license with the added restriction that the software may not be redistributed as a competing product.
Profiling identifies hotspots
Pareon performs an elaborate profiling step, tracking where the compute cycles go and counting cache hits/misses, memory accesses, and many other program internals for each code section. Profiling shows where the hotspots in your code are and provides the basis to explore and find your best parallelization and implementation strategy.
The profile view shows a hierarchy of all invocations in your program, including details about the dependencies and communication patterns that occur. The application code is cross-linked with the profile view and the dependencies, making it easy to browse around.
Pareon performs code coverage analysis to ensure your test set is complete. Since the parallelization decisions are taken based on the execution profile of the application, it’s important that code coverage of the application is high. Code coverage ensures the test set is complete and that you are ready to start parallelizing.
You can query Pareon to see whether a selected loop can be parallelized. If the loop contains dependencies restricting partitioning, the tool will detect them. Otherwise, you get a list of options for loop parallelization, allowing you to select from the possible parallelization options and choose the number of threads. The tool immediately shows the speedup, taking into account time spent spawning and synchronizing tasks, and other parallelization side effects such as memory contention.
Pareon immediately shows the impact of the parallelization on program performance, taking into account the multicore processor architecture, memory and cache bottlenecks, synchronization and communication, and even thread creation overhead. The result is that you can see the impact on application performance, preventing you from wasting time on parallelizing code that wouldn't speed up your program.
The schedule view shows how much time is spent on creating threads, how much is spent on cleaning up threads, and when, how, and why the threads are waiting on each other. The dependencies between the threads are shown and fully cross-referenced to the source code, allowing you to immediately see what code construct limits further speedup.
Pareon keeps track of where you added parallelism. Once your strategy is complete, and you’ve reached the desired speedup, Pareon presents you with parallelization refactoring steps: detailed step-by-step instructions that show you how to refactor your code to implement the parallelism.