Databricks interview question

How does spark partition data?

Interview answers

Anonymous user

June 6, 2019

I can attest that this is quite true. The company develops great products, but it's disappointing how they treat candidates whom they themselves contacted. That says a lot about the organisation's culture and values.


Anonymous user

May 13, 2019

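There are two main answers to this question. First, when reading from its source, Spark partitions data using the Hadoop InputFormat, so each input split (typically one HDFS block) becomes one partition. Subsequently, for transformations such as reduceByKey or join, it partitions data according to the level of parallelism (spark.default.parallelism), so that a single task can process each partition. Data can also be over-partitioned, i.e. split into more partitions than there are cores available. Finally, the level of parallelism can be overridden on a per-function basis: some functions, such as reduceByKey, accept a numPartitions argument.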
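To make the two stages concrete, here is a plain-Python simulation (no Spark required). It is a simplified sketch, not Spark's implementation: input_splits, hash_partition, reduce_by_key and DEFAULT_PARALLELISM are hypothetical names. The split logic assumes exactly one split per block, ignoring the extra rules real Hadoop FileInputFormat applies. The key placement mirrors the hash-mod-N idea behind Spark's HashPartitioner, and the optional num_partitions argument mimics how reduceByKey lets one call override the default parallelism.

```python
# Hypothetical stand-in for spark.default.parallelism.
DEFAULT_PARALLELISM = 8


def input_splits(file_size, block_size):
    """Stage 1 (simplified): one (start, length) byte range per block."""
    splits = []
    start = 0
    while start < file_size:
        length = min(block_size, file_size - start)
        splits.append((start, length))
        start += length
    return splits


def hash_partition(records, num_partitions):
    """Stage 2: place each (key, value) record by hash(key) mod num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        # Python's % is non-negative for a positive divisor, so every
        # key lands in a valid partition index.
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def reduce_by_key(records, func, num_partitions=DEFAULT_PARALLELISM):
    """Shuffle by key, then combine values per key inside each partition.

    num_partitions overrides the default parallelism for this call only.
    """
    result = []
    for part in hash_partition(records, num_partitions):
        acc = {}
        for key, value in part:
            acc[key] = func(acc[key], value) if key in acc else value
        result.append(acc)
    return result


if __name__ == "__main__":
    mib = 1024 ** 2
    # A 300 MiB file with 128 MiB blocks gives three splits (partitions).
    print(input_splits(300 * mib, 128 * mib))
    # Integer keys hash to themselves, so placement is deterministic here.
    print(hash_partition([(k, k) for k in range(10)], 4))
    print(reduce_by_key([(1, 2), (1, 3), (2, 5)], lambda a, b: a + b, num_partitions=2))
```

The default path returns DEFAULT_PARALLELISM partitions, while passing num_partitions changes only that one operation. This is the per-function override described in the answer.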