Tag Archives: rdd

Fixing Spark’s RDD.zip()

I’ve been using Apache Spark recently and like it quite a lot, although it still has several rough edges.  One that I ran into is a quirk in RDD.zip(). I had two RDDs of equal length, but when I zipped them together, the zipped RDD had fewer elements than its parents.  Looking at the documentation […]