Abstract
We present a dataflow model for modelling parallel Unix shell pipelines. To accurately capture the semantics of complex Unix pipelines, the dataflow model is order-aware, i.e., the order in which a node in the dataflow graph consumes inputs from different edges plays a central role in the semantics of the computation and therefore in the resulting parallelization. We use this model to capture the semantics of transformations that exploit data parallelism available in Unix shell computations and prove their correctness. We additionally formalize the translations from the Unix shell to the dataflow model and from the dataflow model back to a parallel shell script. We implement our model and transformations as the compiler and optimization passes of a system parallelizing shell pipelines, and use it to evaluate the speedup achieved on 47 pipelines.
Supplemental Material
- 2014. A Catalog of Stream Processing Optimizations. ACM Comput. Surv., 46, 4 (2014), Article 46, March, 34 pages. issn:0360-0300 https://doi.org/10.1145/2528412 Google ScholarDigital Library
- Arvind Arasu, Shivnath Babu, and Jennifer Widom. 2006. The CQL Continuous Query Language: Semantic Foundations and Query Execution. The VLDB Journal, 15, 2 (2006), June, 121–142. issn:1066-8888 https://doi.org/10.1007/s00778-004-0147-z Google ScholarDigital Library
- K. Arvind, E. David Culler, Robert Iannucci, Vinod Kathail, Keshav Pingali, and Robert Thomas. 1984. The tagged token dataflow architecture. Technical report, MIT Laboratory for Computer Science.Google Scholar
- K. Arvind and Rishiyur S. Nikhil. 1990. Executing a Program on the MIT Tagged-Token Dataflow Architecture. IEEE Trans. Comput., 39, 3 (1990), March, 300–318. issn:0018-9340 https://doi.org/10.1109/12.48862 Google ScholarDigital Library
- Amnon Barak and Oren La’adan. 1998. The MOSIX Multicomputer Operating System for High Performance Cluster Computing. Future Gener. Comput. Syst., 13, 4–5 (1998), March, 361–372. issn:0167-739X https://doi.org/10.1016/S0167-739X(97)00037-X Google ScholarDigital Library
- Jon Bentley. 1985. Programming Pearls: A Spelling Checker. Commun. ACM, 28, 5 (1985), May, 456–462. issn:0001-0782 https://doi.org/10.1145/3532.315102 Google ScholarDigital Library
- Jon Bentley, Don Knuth, and Doug McIlroy. 1986. Programming Pearls: A Literate Program. Commun. ACM, 29, 6 (1986), June, 471–483. issn:0001-0782 https://doi.org/10.1145/5948.315654 Google ScholarDigital Library
- Gérard Berry and Georges Gonthier. 1992. The ESTEREL Synchronous Programming Language: Design, Semantics, Implementation. Sci. Comput. Program., 19, 2 (1992), Nov., 87–152. issn:0167-6423 https://doi.org/10.1016/0167-6423(92)90005-V Google ScholarDigital Library
- Pawan Bhandari. 2020. Solutions to unixgame.io. https://git.io/Jf2dn Accessed: 2020-04-14.Google Scholar
- Timothy Bourke and Marc Pouzet. 2013. ZéLus: A Synchronous Language with ODEs. In Proceedings of the 16th International Conference on Hybrid Systems: Computation and Control (HSCC ’13). Association for Computing Machinery, New York, NY, USA. 113–118. isbn:9781450315678 https://doi.org/10.1145/2461328.2461348 Google ScholarDigital Library
- Jeffrey Dean and Sanjay Ghemawat. 2008. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM, 51, 1 (2008), Jan., 107–113. issn:0001-0782 https://doi.org/10.1145/1327452.1327492 Google ScholarDigital Library
- Jack B. Dennis. 1974. First Version of a Data Flow Procedure Language. In Programming Symposium, B. Robinet (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg. 362–376. isbn:978-3-540-37819-8 https://doi.org/10.1007/3-540-06859-7_145 Google ScholarCross Ref
- Tom Duff. 1990. Rc—a Shell for Plan 9 and UNIX. 283–296. isbn:0030475295Google Scholar
- Jeff Epstein, Andrew P. Black, and Simon Peyton-Jones. 2011. Towards Haskell in the Cloud. In Proceedings of the 4th ACM Symposium on Haskell (Haskell ’11). ACM, New York, NY, USA. 118–129. isbn:978-1-4503-0860-1 https://doi.org/10.1145/2034675.2034690 Google ScholarDigital Library
- Azadeh Farzan and Victor Nicolet. 2017. Synthesis of Divide and Conquer Parallelism for Loops. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). Association for Computing Machinery, New York, NY, USA. 540–555. isbn:9781450349888 https://doi.org/10.1145/3062341.3062355 Google ScholarDigital Library
- Azadeh Farzan and Victor Nicolet. 2019. Modular Divide-and-Conquer Parallelization of Nested Loops. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019). Association for Computing Machinery, New York, NY, USA. 610–624. isbn:9781450367127 https://doi.org/10.1145/3314221.3314612 Google ScholarDigital Library
- Michael I Gordon, William Thies, and Saman Amarasinghe. 2006. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. ACM SIGPLAN Notices, 41, 11 (2006), 151–162.Google ScholarDigital Library
- Michael Greenberg. 2018. The POSIX shell is an interactive DSL for concurrency.Google Scholar
- Michael Greenberg and Austin J. Blatt. 2020. Executable Formal Semantics for the POSIX Shell: Smoosh: the Symbolic, Mechanized, Observable, Operational Shell. Proc. ACM Program. Lang., 4, POPL (2020), Article 43, Jan., 31 pages. https://doi.org/10.1145/3371111 Google ScholarDigital Library
- Michael Greenberg, Konstantinos Kallas, and Nikos Vasilakis. 2021. The Future of the Shell: Unix and Beyond. In Proceedings of the Workshop on Hot Topics in Operating Systems (HotOS ’21). Association for Computing Machinery, New York, NY, USA. 240–241. isbn:9781450384384 https://doi.org/10.1145/3458336.3465296 Google ScholarDigital Library
- Michael Greenberg, Konstantinos Kallas, and Nikos Vasilakis. 2021. Unix Shell Programming: The Next 50 Years. In Proceedings of the Workshop on Hot Topics in Operating Systems (HotOS ’21). Association for Computing Machinery, New York, NY, USA. 104–111. isbn:9781450384384 https://doi.org/10.1145/3458336.3465294 Google ScholarDigital Library
- Shivam Handa, Konstantinos Kallas, Nikos Vasilakis, and Martin Rinard. 2021. An Order-aware Dataflow Model for Parallel Unix Pipelines (Artifact). https://doi.org/10.5281/zenodo.4776838 Google ScholarCross Ref
- Michael Hart. 1971. Project Gutenberg. https://www.gutenberg.org/Google Scholar
- Dan Jurafsky. 2017. Unix for Poets. https://web.stanford.edu/class/cs124/lec/124-2018-UnixForPoets.pdfGoogle Scholar
- Gilles Kahn. 1974. The Semantics of a Simple Language for Parallel Programming. Information Processing, 74 (1974), 471–475.Google Scholar
- Gilles Kahn and David B. MacQueen. 1977. Coroutines and Networks of Parallel Processes. Information Processing, 77 (1977), 993–998.Google Scholar
- Konstantinos Kallas, Filip Niksic, Caleb Stanford, and Rajeev Alur. 2020. DiffStream: Differential Output Testing for Stream Processing Programs. Proc. ACM Program. Lang., 4, OOPSLA (2020), Article 153, Nov., 29 pages. https://doi.org/10.1145/3428221 Google ScholarDigital Library
- Richard M. Karp and Raymond E. Miller. 1966. Properties of a Model for Parallel Computations: Determinacy, Termination, Queueing. SIAM J. Appl. Math., 14, 6 (1966), 1390–1411. https://doi.org/10.1137/0114108 Google ScholarDigital Library
- Charles Edwin Killian, James W. Anderson, Ryan Braud, Ranjit Jhala, and Amin M. Vahdat. 2007. Mace: Language Support for Building Distributed Systems. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’07). ACM, New York, NY, USA. 179–188. isbn:978-1-59593-633-2 https://doi.org/10.1145/1250734.1250755 Google ScholarDigital Library
- Nokia Bell Labs. 2019. The Unix Game—Solve puzzles using Unix pipes. https://unixgame.io/unix50 Accessed: 2020-03-05.Google Scholar
- Paul Le Guernic, Albert Benveniste, Patricia Bournai, and Thierry Gautier. 1986. Signal–A data flow-oriented language for signal processing. IEEE transactions on acoustics, speech, and signal processing, 34, 2 (1986), 362–374.Google ScholarCross Ref
- Edward Ashford Lee and David G. Messerschmitt. 1987. Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing. IEEE Trans. Comput., 36, 1 (1987), Jan., 24–35. issn:0018-9340 https://doi.org/10.1109/TC.1987.5009446 Google ScholarDigital Library
- Edward A. Lee and David G. Messerschmitt. 1987. Synchronous data flow. Proc. IEEE, 75, 9 (1987), 1235–1245.Google ScholarCross Ref
- Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, and Peter A. Tucker. 2005. Semantics and Evaluation Techniques for Window Aggregates in Data Streams. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data (SIGMOD ’05). Association for Computing Machinery, New York, NY, USA. 311–322. isbn:1595930604 https://doi.org/10.1145/1066157.1066193 Google ScholarDigital Library
- Konstantinos Mamouras. 2020. Semantic Foundations for Deterministic Dataflow and Stream Processing. In European Symposium on Programming. 394–427.Google Scholar
- Konstantinos Mamouras, Mukund Raghothaman, Rajeev Alur, Zachary G. Ives, and Sanjeev Khanna. 2017. StreamQRE: Modular Specification and Efficient Evaluation of Quantitative Queries over Streaming Data. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). Association for Computing Machinery, New York, NY, USA. 693–708. isbn:9781450349888 https://doi.org/10.1145/3062341.3062369 Google ScholarDigital Library
- Konstantinos Mamouras, Caleb Stanford, Rajeev Alur, Zachary G. Ives, and Val Tannen. 2019. Data-Trace Types for Distributed Stream Processing Systems. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019). ACM, New York, NY, USA. 670–685. isbn:978-1-4503-6712-7 https://doi.org/10.1145/3314221.3314580 Google ScholarDigital Library
- Florence Maraninchi and Yann RéMond. 2001. Argos: An Automaton-Based Synchronous Language. Comput. Lang., 27, 1–3 (2001), April, 61–92. issn:0096-0551 https://doi.org/10.1016/S0096-0551(01)00016-9 Google ScholarDigital Library
- Chris McDonald and Trevor I. Dix. 1988. Support for Graphs of Processes in a Command Interpreter. Softw. Pract. Exper., 18, 10 (1988), Oct., 1011–1016. issn:0038-0644 https://doi.org/10.1002/spe.4380181007 Google ScholarDigital Library
- Malcolm D McIlroy, Elliot N Pinson, and Berkley A Tague. 1978. UNIX Time-Sharing System: Foreword. Bell System Technical Journal, 57, 6 (1978), 1899–1904.Google ScholarCross Ref
- Sape J. Mullender, Guido van Rossum, Andrew S. Tanenbaum, Robbert van Renesse, and Hans van Staveren. 1990. Amoeba: A Distributed Operating System for the 1990s. Computer, 23, 5 (1990), May, 44–53. issn:0018-9162 https://doi.org/10.1109/2.53354 Google ScholarDigital Library
- Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. 2013. Naiad: A Timely Dataflow System. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (SOSP ’13). ACM, New York, NY, USA. 439–455. isbn:978-1-4503-2388-8 https://doi.org/10.1145/2517349.2522738 Google ScholarDigital Library
- John K Ousterhout, Andrew R. Cherenson, Fred Douglis, Michael N. Nelson, and Brent B. Welch. 1988. The Sprite network operating system. Computer, 21, 2 (1988), 23–36. http://www.research.ibm.com/people/f/fdouglis/papers/sprite.pdfGoogle ScholarDigital Library
- Rob Pike, Dave Presotto, Ken Thompson, and Howard Trickey. 1990. Plan 9 from Bell Labs. In Proceedings of the summer 1990 UKUUG Conference. 1–9. http://css.csail.mit.edu/6.824/2014/papers/plan9.pdfGoogle Scholar
- Deepti Raghavan, Sadjad Fouladi, Philip Levis, and Matei Zaharia. 2020. POSH: A Data-Aware Shell. In 2020 USENIX Annual Technical Conference (USENIX ATC 20). USENIX Association, 617–631. isbn:978-1-939133-14-4 https://www.usenix.org/conference/atc20/presentation/raghavanGoogle Scholar
- Christophe Ratel, Nicolas Halbwachs, and Pascal Raymond. 1991. Programming and Verifying Critical Systems by Means of the Synchronous Data-Flow Language LUSTRE. 112–119. isbn:0897914554 https://doi.org/10.1145/125083.123062 Google ScholarDigital Library
- Radu Rugina and Martin Rinard. 1999. Automatic Parallelization of Divide and Conquer Algorithms. In Proceedings of the Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’99). Association for Computing Machinery, New York, NY, USA. 72–83. isbn:1581131003 https://doi.org/10.1145/301104.301111 Google ScholarDigital Library
- Scott Schneider, Martin Hirzel, Bugra Gedik, and Kun-Lung Wu. 2015. Safe Data Parallelism for General Streaming. IEEE Trans. Comput., 64, 2 (2015), Feb., 504–517. issn:0018-9340 https://doi.org/10.1109/TC.2013.221 Google ScholarDigital Library
- Peter Sewell, James J. Leifer, Keith Wansbrough, Francesco Zappa Nardelli, Mair Allen-Williams, Pierre Habouzit, and Viktor Vafeiadis. 2005. Acute: High-level Programming Language Design for Distributed Computation. In Proceedings of the Tenth ACM SIGPLAN International Conference on Functional Programming (ICFP ’05). ACM, New York, NY, USA. 15–26. isbn:1-59593-064-7 https://doi.org/10.1145/1086365.1086370 Google ScholarDigital Library
- Calvin Smith and Aws Albarghouthi. 2016. MapReduce Program Synthesis. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’16). Association for Computing Machinery, New York, NY, USA. 326–340. isbn:9781450342612 https://doi.org/10.1145/2908080.2908102 Google ScholarDigital Library
- Diomidis Spinellis and Marios Fragkoulis. 2017. Extending Unix Pipelines to DAGs. IEEE Trans. Comput., 66, 9 (2017), 1547–1561.Google ScholarDigital Library
- Ole Tange. 2011. GNU Parallel—The Command-Line Power Tool. ;login: The USENIX Magazine, 36, 1 (2011), Feb, 42–47. https://doi.org/10.5281/zenodo.16303 Google ScholarCross Ref
- Ole Tange. 2020. Differences Between GNU parallel and Alternatives. https://www.gnu.org/software/parallel/parallel_alternatives.htmlGoogle Scholar
- Dave Taylor and Brandon Perry. 2016. Wicked Cool Shell Scripts: 101 Scripts for Linux, OS X, and UNIX Systems (2nd ed.). No Starch Press, USA. isbn:1593276028Google Scholar
- William Thies, Michal Karczmarek, and Saman P. Amarasinghe. 2002. StreamIt: A Language for Streaming Applications. In Proceedings of the 11th International Conference on Compiler Construction (CC ’02). Springer-Verlag, Berlin, Heidelberg. 179–196. isbn:3540433694Google ScholarDigital Library
- Eleftheria Tsaliki and Diomidis Spinellis. 2021. The real statistics of buses in Athens. https://insidestory.gr/article/noymera-leoforeia-athinas?token=0MFVISB8N6 Published by the Inside Story news organization.Google Scholar
- Nikos Vasilakis, Konstantinos Kallas, Konstantinos Mamouras, Achilles Benetopoulos, and Lazar Cvetković. 2021. PaSh: Light-Touch Data-Parallel Shell Processing. Association for Computing Machinery, New York, NY, USA. 49–66. isbn:9781450383349 https://doi.org/10.1145/3447786.3456228 Google ScholarDigital Library
- Nikos Vasilakis, Ben Karel, and Jonathan M. Smith. 2015. From Lone Dwarfs to Giant Superclusters: Rethinking Operating System Abstractions for the Cloud. In Proceedings of the 15th USENIX Conference on Hot Topics in Operating Systems (HOTOS’15). USENIX Association, Berkeley, CA, USA. 15–15. http://dl.acm.org/citation.cfm?id=2831090.2831105Google Scholar
- Robert Virding, Claes Wikström, Mike Williams, and Joe Armstrong. 1996. Concurrent Programming in ERLANG (2nd Ed.). Prentice Hall International (UK) Ltd., GBR. isbn:013508301XGoogle Scholar
- W. Gentzsch (Sun Microsystems). 2001. Sun Grid Engine: Towards Creating a Compute Power Grid. In Proceedings of the 1st International Symposium on Cluster Computing and the Grid (CCGRID ’01). IEEE Computer Society, USA. 35. isbn:0769510108Google Scholar
- Edward Walker, Weijia Xu, and Vinoth Chandar. 2009. Composing and Executing Parallel Data-Flow Graphs with Shell Pipes. In Proceedings of the 4th Workshop on Workflows in Support of Large-Scale Science (WORKS ’09). Association for Computing Machinery, New York, NY, USA. Article 11, 10 pages. isbn:9781605587172 https://doi.org/10.1145/1645164.1645175 Google ScholarDigital Library
- Ian Watson and John Gurd. 1979. A prototype data flow computer with token labelling. In 1979 International Workshop on Managing Requirements Knowledge (MARK). 623–628.Google ScholarCross Ref
- Andy B Yoo, Morris A Jette, and Mark Grondona. 2003. Slurm: Simple linux utility for resource management. In Workshop on Job Scheduling Strategies for Parallel Processing. 44–60.Google ScholarCross Ref
- Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, and Ion Stoica. 2012. Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation (NSDI’12). USENIX Association, Berkeley, CA, USA. 2–2. http://dl.acm.org/citation.cfm?id=2228298.2228301Google ScholarDigital Library
- Zhao Zhang, Daniel S. Katz, Timothy G. Armstrong, Justin M. Wozniak, and Ian Foster. 2013. Parallelizing the Execution of Sequential Scripts. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC ’13). Association for Computing Machinery, New York, NY, USA. Article 31, 12 pages. isbn:9781450323789 https://doi.org/10.1145/2503210.2503222 Google ScholarDigital Library
Index Terms
- An order-aware dataflow model for parallel Unix pipelines
Recommendations
PaSh: light-touch data-parallel shell processing
EuroSys '21: Proceedings of the Sixteenth European Conference on Computer SystemsThis paper presents PaSh, a system for parallelizing POSIX shell scripts. Given a script, PaSh converts it to a dataflow graph, performs a series of semantics-preserving program transformations that expose parallelism, and then converts the dataflow ...
Unix shell programming: the next 50 years
HotOS '21: Proceedings of the Workshop on Hot Topics in Operating SystemsThe Unix shell is a powerful, ubiquitous, and reviled tool for managing computer systems. The shell has been largely ignored by academia and industry. While many replacement shells have been proposed, the Unix shell persists. Two recent threads of ...
Comments