I’ve been using Apache Spark recently and like it quite a lot, although it still has several rough edges. One that I ran into is a quirk in RDD.zip(). I had two RDDs of equal length, but when I zipped them together, the zipped RDD had fewer elements than its parents. Looking at the documentation […]
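To see how two equal-length RDDs can zip into something shorter, it helps to model an RDD as a list of partitions. The sketch below is plain Python, not Spark code. It assumes that zip pairs partition i of one dataset with partition i of the other, as the `RDD.zip()` docs describe, and that elements within each partition pair are combined like Python's built-in `zip()`, which stops at the shorter input. The helper names `naive_partitionwise_zip` and `count` are hypothetical. Depending on the Spark version, real Spark may raise an error on mismatched partition sizes instead of silently truncating.

```python
def naive_partitionwise_zip(left, right):
    """Zip two partitioned datasets partition by partition (hypothetical model).

    Each dataset is a list of partitions, and each partition is a plain list.
    Partition i of `left` is paired with partition i of `right`. Python's zip()
    stops at the shorter input, so mismatched partition sizes silently drop
    elements, even when the total lengths are equal.
    """
    if len(left) != len(right):
        raise ValueError("datasets must have the same number of partitions")
    return [list(zip(lp, rp)) for lp, rp in zip(left, right)]


def count(partitioned):
    """Total number of elements across all partitions."""
    return sum(len(p) for p in partitioned)


# Two datasets with 6 elements each, split differently across 2 partitions.
a = [[1, 2, 3], [4, 5, 6]]
b = [[10, 20], [30, 40, 50, 60]]

zipped = naive_partitionwise_zip(a, b)
print(count(a), count(b), count(zipped))  # 6 6 5
```

Partition 0 contributes only 2 pairs, because `[10, 20]` runs out first. Partition 1 contributes 3 pairs, because `[4, 5, 6]` runs out first. One element from each side is lost, even though both inputs hold 6 elements.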