Want to create an interactive transcript for this episode?
Podcast: Software Engineering Daily
Episode: Modin: Pandas Scalability with Devin Petersohn
Description: Pandas is a Python data analysis library, and an essential tool in data science. Pandas allows users to load large quantities of data into a data structure called a dataframe, over which the user can call mathematical operations. When the data fits entirely into memory this works well, but sometimes there is too much data for a single box.The Modin project scales Pandas workflows to multiple machines by utilizing Dask or Ray, which are distributed computing primitives for Python programs. Modin builds an execution plan for large data frames to be operated on against each other...