As I understand it, lenses don't change the underlying data structure. For ETL you need a way to say, in effect, "the code only uses fields X, Y and Z, so we will only load X, Y and Z at runtime", automatically based on usage, without having to keep updating your lens definition. Modern on-disk file formats are columnar, so they can read a subset of the columns very efficiently. If your data has 200 columns, reading the 199 unnecessary ones can be very slow.
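To make the pruning concrete, here is a minimal sketch using plain Spark in Scala (the file path and column names are made up): selecting only the columns the code needs lets Spark push that projection down into the Parquet reader, so the other columns never get read off disk.

```scala
import org.apache.spark.sql.SparkSession

object ColumnPruningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("column-pruning-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical 200-column Parquet file; "x", "y", "z" are placeholder names.
    val needed = spark.read.parquet("events.parquet")
      .select("x", "y", "z") // projection is pushed down; the other ~197 columns are never read

    needed.explain() // the physical plan's ReadSchema lists only x, y, z
    spark.stop()
  }
}
```

The catch is that you have to write that select by hand; nothing derives it automatically from which fields the downstream code actually touches.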
They could help with the intermediate data structures, but some of those aren't subsets or trivial derivatives of the input. So you really need an inline way to create single-use case classes (sketched below). I think frameless in Scala can do some of this for standard transformations, but that requires the black magic of shapeless.
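For illustration, this is roughly what the single-use case class boilerplate looks like with Spark's typed Dataset API in Scala (all names here are hypothetical): every intermediate shape needs its own named case class, even if it only exists between two steps of the pipeline.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Input shape.
case class Order(id: Long, customerId: Long, price: Double, quantity: Int)

// Single-use intermediate shape: not a subset of Order, it derives a new field.
case class OrderValue(customerId: Long, value: Double)

object IntermediateShapes {
  def orderValues(orders: Dataset[Order])(implicit spark: SparkSession): Dataset[OrderValue] = {
    import spark.implicits._ // provides the Encoder for the case class
    orders.map(o => OrderValue(o.customerId, o.price * o.quantity))
  }
}
```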
Spark in Python (and the untyped DataFrame API in Scala) achieves this by compiling and analyzing the whole query plan internally before running it. So it's trivial to write unit tests on empty data structures that effectively "type check" your Spark code.
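A minimal sketch of that kind of test, assuming plain Scala Spark and made-up column names: because Spark analyzes the plan when the transformation is applied, a misspelled or missing column fails immediately even though the DataFrame has zero rows.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

object SchemaOnlyCheck {
  // Transformation under test; placeholder logic.
  def addTotal(df: DataFrame): DataFrame =
    df.withColumn("total", col("price") * col("quantity"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("schema-only-check")
      .master("local[*]")
      .getOrCreate()

    val schema = StructType(Seq(
      StructField("price", DoubleType),
      StructField("quantity", IntegerType)
    ))

    // Empty DataFrame carrying only the schema.
    val empty = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)

    // Applying the transformation forces analysis: a bad column reference
    // throws an AnalysisException here, with no data involved.
    val result = addTotal(empty)
    result.printSchema()

    spark.stop()
  }
}
```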