Topic: tunable pattern-based parallelization

[title: confidential]

This paper presents a tool that detects parallelizable locations in sequential software and transforms them into parallel code. The paper first demonstrates that this work is feasible and beneficial through pilot experiments on 31 Java programs totaling about 300,000 lines of code. These experiments show that there is roughly one parallelizable location per 60 lines of code, and that parallelizing with Java's Future yields an average speedup of 1.8 (the maximum speedup attainable with a single Future is limited to 2.0).
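As a rough illustration (not the paper's own transformation), a Future-based refactoring of the kind behind these pilot numbers submits one half of a computation to an executor while the other half keeps running on the calling thread, which is why a single Future cannot exceed a 2x speedup. The class and method names below are assumptions for illustration only.

```java
import java.util.concurrent.*;

public class FutureSketch {
    // Hypothetical stand-ins for two independent halves of a sequential computation.
    static long partA() { long s = 0; for (int i = 0; i < 50_000_000; i++) s += i; return s; }
    static long partB() { long s = 0; for (int i = 0; i < 50_000_000; i++) s -= i; return s; }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Run partA asynchronously while partB continues on the calling thread.
        Future<Long> futureA = pool.submit(FutureSketch::partA);
        long b = partB();
        long a = futureA.get();   // wait for the asynchronous half
        System.out.println(a + b);
        pool.shutdown();
    }
}
```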

Since Future does not scale well on multicore hardware, the actual implementation of the tool transforms sequential code into two common parallel patterns: Master/Worker and Pipeline. Tunable parameters are added to the descriptions of these two patterns; they determine how workers or stages are combined or split and how many threads are assigned to each worker or stage. All tuning information is stored in a configuration file. The tool basically consists of four steps: 1) build a code-profiling graph that describes data dependencies and control flow; 2) analyze the graph statically and dynamically to identify subgraphs that meet the conditions for Master/Worker or Pipeline; 3) based on step 2, construct the corresponding tunable parallel architecture; 4) transform the code.
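A minimal sketch of what a tunable Master/Worker might look like in plain Java, assuming the tunable worker count is read from a properties file; the file name, property key, and task body are illustrative and not taken from the paper.

```java
import java.io.InputStream;
import java.util.*;
import java.util.concurrent.*;

public class MasterWorkerSketch {
    public static void main(String[] args) throws Exception {
        // Tunable worker count read from a configuration file; "tuning.properties"
        // and the key "workers" are assumptions for illustration.
        Properties cfg = new Properties();
        try (InputStream in = MasterWorkerSketch.class.getResourceAsStream("/tuning.properties")) {
            if (in != null) cfg.load(in);
        }
        int workers = Integer.parseInt(cfg.getProperty("workers", "4")); // tunable parameter

        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<Long>> results = new ArrayList<>();
        for (int task = 0; task < 16; task++) {              // master: hand out independent tasks
            final long t = task;
            results.add(pool.submit(() -> t * t));           // worker: placeholder computation
        }
        long sum = 0;
        for (Future<Long> f : results) sum += f.get();       // master: collect results
        System.out.println(sum);
        pool.shutdown();
    }
}
```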

The evaluation yields two main conclusions. First, on average 99.98% of the parallelizable locations are detected, and 65.56% of them result in a speedup. Second, the tool can even outperform a skilled and experienced engineer's manual refactoring by 4%. In addition, different tunable parameter settings lead to different performance, which demonstrates the value of tunable parameters for performance optimization on different target platforms. The paper advances the authors' previous work mainly in 1) automatic detection and transformation, and 2) higher detection accuracy.

For related work, some research tries to detect parallelizable code and exchange it for its parallel counterpart. However, the alternative parallel implementation has to exist already, so this approach cannot handle code refactoring. Other research focuses only on loop and pipeline parallelism under the assumption that most runtime is spent in program loops, whereas this paper covers different forms of parallelism. Another important aspect is that the authors detect parameters that influence runtime behavior, which enables the parallel code to adapt to current and future multicore hardware. Finally, some commercial tools try to find the regions with the highest runtime share by sampling or instrumentation. These regions might bear the highest parallel potential, but the tools neither answer whether parallelization is possible given the data or control dependencies, nor do they help in identifying how to parallelize such a location.

For automatic code transformation, the authors use the Tunable Architecture Description Language (TADL) to represent the detected tunable parallel patterns; a source-to-source compiler then executes the transformation based on this description. Related techniques include parallel languages (or language extensions) and parallelizing compilers. A parallel language still cannot tell where and how to parallelize, and a parallelizing compiler's parallel potential is bound to parallel loops with a fixed number of iterations and no dependencies.
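Since the TADL description itself is not reproduced in this summary, the following is only a hand-written Java sketch of the kind of Pipeline code such a source-to-source transformation might correspond to, with a tunable thread count for the second stage; the stage logic and all names are assumptions.

```java
import java.util.concurrent.*;

public class PipelineSketch {
    public static void main(String[] args) {
        int stage2Threads = 2;                                 // tunable parameter (illustrative)
        BlockingQueue<Integer> between = new LinkedBlockingQueue<>();
        ExecutorService stage1 = Executors.newSingleThreadExecutor();
        ExecutorService stage2 = Executors.newFixedThreadPool(stage2Threads);

        stage1.submit(() -> {                                  // stage 1: produce items
            for (int i = 0; i < 10; i++) between.add(i);
        });
        for (int t = 0; t < stage2Threads; t++) {
            stage2.submit(() -> {                              // stage 2: consume and process
                try {
                    Integer item;
                    while ((item = between.poll(1, TimeUnit.SECONDS)) != null) {
                        System.out.println("processed " + item * item);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        stage1.shutdown();                                     // let submitted tasks finish, then exit
        stage2.shutdown();
    }
}
```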
