Vector Fabrics Blog

How Pareon reduces cost and cuts risks

Many silicon vendors rely on multicore architectures to improve performance. Yet the same vendors have failed to deliver compilation tools that the vast majority of software developers can use effectively. The few tools that are available require both a good understanding of the application and a deep understanding of the target platform. As a result, few engineers can exploit multicore architectures to their full potential, which raises the bar for many companies that want to benefit from multicore silicon.

 

Figure 1: Current multicore programming practice

Current practice

In many cases the design of an embedded system starts with a software collection that has not yet been partitioned to match the multicore structure of the target hardware. As a result, the software does not meet its performance requirements and hardware resources are left idle. To resolve this, an expert (or a team of experts) comes in to change the software so that it fits the target multicore structure. Here is what these experts do.

     
  1. Analyze the application. Find the bottlenecks.
  2. Partition the software over the available cores. This requires a good understanding of data access patterns and data communication inside the application, so that they can be matched with the cache architecture and the available bandwidth of buses and channels on the target platform. Optimize some of the software kernels for the target instruction set (e.g. Intel SSE, ARM NEON, ...).
  3. Identify the loops that can be parallelized. This requires a good understanding of the application: find the data dependencies, find the anti- and output dependencies, find the shared variables. The dependencies can be hidden very deeply, and finding them often requires complex pointer analysis.
  4. Predict the speedup. Predict the overhead of synchronizing and the cost of creating and joining threads. Predict the impact of the additional cache overhead introduced by distributing the workload over multiple CPUs. If parallelizing a loop still seems worth it, go to the next step.
  5. Change the software to introduce semaphores, FIFOs and other means of communication and synchronization. Add calls to create and join threads. This requires a good understanding of the APIs available on the target platform. Subtle bugs are often introduced at this stage: data races, deadlocks or livelocks that may only manifest themselves much later, e.g. after the product has been shipped to the customer.
  6. Test. Does it seem to function correctly? Measure. Does the system achieve the required performance level? If not, observe and probe the system. Tooling exists to observe the system, but the experts must interpret these low-level observations in the context of their own system knowledge before they can draw conclusions.
  7. Try again to improve performance or to handle data races and deadlocks. This involves repeating the above from Step 2 or 3.
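To make the loop-identification and thread-creation steps above concrete, here is a minimal sketch. It is written in Python for brevity; on an embedded target the same structure would typically be C or C++ using the platform's threading API. All function names (`scale`, `prefix_sum`, `parallel_scale`, `naive_chunked_prefix_sum`) are hypothetical and chosen for illustration only. The sketch contrasts a loop whose iterations are independent with one that carries a dependency from one iteration to the next, and shows what happens when both are split over workers in the same way.

```python
from concurrent.futures import ThreadPoolExecutor


def scale(data, factor):
    """Independent iterations: each output depends only on its own input."""
    return [x * factor for x in data]


def prefix_sum(data):
    """Loop-carried dependency: each iteration reads the previous total."""
    out, running = [], 0
    for x in data:
        running += x  # uses the value left behind by the previous iteration
        out.append(running)
    return out


def _split(data, workers):
    """Cut data into at most `workers` contiguous chunks."""
    chunk = -(-len(data) // workers)  # ceiling division
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]


def parallel_scale(data, factor, workers=4):
    """Safe: the independent loop is partitioned, one chunk per worker."""
    if not data:
        return []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps chunk order; leaving the with-block joins the threads.
        parts = list(pool.map(lambda c: scale(c, factor), _split(data, workers)))
    return [y for part in parts for y in part]


def naive_chunked_prefix_sum(data, workers=4):
    """Unsafe: the dependent loop is partitioned as if it were independent."""
    if not data:
        return []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(prefix_sum, _split(data, workers)))
    return [y for part in parts for y in part]
```

For the input `[1, 1, 1, 1]` and two workers, `prefix_sum` returns `[1, 2, 3, 4]` while `naive_chunked_prefix_sum` returns `[1, 2, 1, 2]`: the running total is lost at the chunk boundary. In this toy case the dependency is visible at a glance; in real code it is often hidden behind pointers and function calls, which is why this step demands so much expertise.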

Figure 1 visualizes this process. Clearly there are many problems with this design flow. Experts who can successfully complete this flow are a rare breed. Even if you can find them, it is hard to predict at the start of a project how many improvement and bug-fix iterations they will need before the system stabilizes. As a result, product development lead times are uncertain and project costs are hard to control.
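The "predict the speedup" step in this flow can be illustrated with a back-of-the-envelope model. The sketch below is an assumption-laden simplification, not a description of how any particular tool computes its estimates: it assumes the loop divides perfectly over the threads, charges a fixed, hypothetical overhead per thread for creation, joining and synchronization, and ignores cache effects entirely. The function name `estimated_speedup` and all parameter names are invented for this example.

```python
def estimated_speedup(total_ms, loop_ms, threads, overhead_per_thread_ms):
    """Rough speedup estimate for parallelizing one loop.

    total_ms: measured sequential run time of the whole program
    loop_ms: part of total_ms spent inside the candidate loop
    threads: number of threads the loop is split over
    overhead_per_thread_ms: assumed fixed cost per thread (create, join, sync)
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if not 0 <= loop_ms <= total_ms or total_ms <= 0:
        raise ValueError("need 0 <= loop_ms <= total_ms and total_ms > 0")

    serial_ms = total_ms - loop_ms  # code outside the loop is not sped up
    # With a single thread the loop runs in place, so no overhead is charged.
    overhead_ms = threads * overhead_per_thread_ms if threads > 1 else 0.0
    parallel_total_ms = serial_ms + loop_ms / threads + overhead_ms
    return total_ms / parallel_total_ms
```

With hypothetical numbers: if the loop accounts for 80 ms of a 100 ms run and overhead is zero, four threads give an estimated speedup of 2.5. If the loop accounts for only 10 ms and each thread costs 5 ms of overhead, the estimate drops below 1, meaning the parallel version would be slower than the original. Real systems add cache and bandwidth effects on top of this, which is part of what makes the prediction hard to do by hand.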

The Pareon solution

Multicore platforms are quickly becoming a very attractive option in terms of their cost-performance ratio. But they also become more complex every year, making it harder for developers to benefit from this technology. Pareon enables software developers to realize the highest possible performance on their multicore platform, as shown in Figure 2. Pareon analyzes the program, focuses the programmer's attention on the hot spots, finds the loops that can be parallelized, and shows the impact of loop parallelization on overall system cost and performance. This information is presented to the programmer through an intuitive graphical user interface. With all information immediately available, the programmer can select a preferred optimization and parallelization strategy and then receive detailed instructions from Pareon on how to implement it, with the confidence that it will not introduce data races or deadlocks and with prior knowledge of the cost and performance of the final result.

 

Figure 2: The Pareon flow

In addition to supporting the programmer, Pareon enables the manager of a multicore project to estimate at an early stage how much time and how many engineering resources will be needed to optimize an application for a particular multicore target. This greatly reduces the risk of multicore projects and results in predictable product delivery lead times.

Posted in category: Market & Skills on Tuesday, September 4, 2012 - 11:30
