ciValue

Data Engineer Interview

March 16, 2023
Anonymous interview candidate
Haifa
Declined offer
Positive experience
Average interview

Application

I applied online. The process took 3 weeks. I interviewed at ciValue (Haifa) in March 2023.

Interview

The process consisted of a phone call with the hiring manager, an on-site technical interview (about 1.5 to 2 hours), an on-site HR interview, a 1-hour on-site interview with the VP R&D, and a 30-minute on-site meeting with the VP HR. Although all the interviews had to be on-site and parking in the area is scarce, the process was fine, and the people in general made a very positive impression on me. However, the overall feeling from my visits was very depressing: the office is very small and grey, with small rooms and small desks. Although it is located in a very beautiful green area, I felt like I had no air to breathe.

Interview questions [4]

Question 1

Spark optimizations: which optimizations can be applied to the code snippet below?

shoppers_df (customer description DF): 250 MB, 15M records. Schema:

    schema: StructType = StructType(Array(
        StructField("shopper_id", LongType, nullable = True),
        StructField("retailer_id", StringType, nullable = True),
        StructField("shopper_group_id", StringType, nullable = True),
        StructField("join_date", DateType, nullable = True),
        StructField("shopper_type", StringType, nullable = True),
        StructField("gender", StringType, nullable = True)))

sku_df (dimension DF): 15 MB, 90K records.

purchase_df (transactions DF): 50 GB of compressed Parquet files, 5,000,000,000 records. Schema:

    schema: StructType = StructType(Array(
        StructField("shopper_id", LongType, nullable = True),
        StructField("product_id", LongType, nullable = True),
        StructField("pos_id", IntegerType, nullable = True),
        StructField("purchase_date", DateType, nullable = True),
        StructField("units", DoubleType, nullable = True),
        StructField("total_spent", DoubleType, nullable = True)))

Current code:

    products_purchased_df = (
        purchase_df.alias("purchase")
        .join(shoppers_df, on="shopper_id", how="left_outer")
        .join(sku_df.alias("sku"), on="product_id")
        .select(col("purchase.*"), col("sku.*"))
    )

Usage:

    status_df = products_purchased_df.groupBy(["shopper_id", "product_id"]).agg(...)

Task: optimize the join statement.
1 answer
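
One possible direction for the join in Question 1, as a minimal PySpark sketch rather than the expected answer. It rests on two assumptions that the question does not state. First, shopper_id is unique in shoppers_df; since the snippet selects only purchase and sku columns, the left outer join to shoppers would then add neither rows nor columns and can be dropped. Second, sku_df (15 MB) fits comfortably in executor memory; it is above Spark's default 10 MB auto-broadcast threshold, so the sketch asks for a broadcast join explicitly. The function name build_products_purchased is hypothetical.

```python
def build_products_purchased(purchase_df, sku_df):
    """Join the large transactions DataFrame to the small SKU dimension.

    Assumption: shopper_id is unique in shoppers_df and no shopper column is
    selected, so the original left outer join to shoppers_df is omitted.
    """
    # Imported inside the function so the sketch can be loaded on a machine
    # without a Spark installation.
    from pyspark.sql.functions import broadcast

    # broadcast() hints Spark to ship the small table to every executor,
    # which avoids shuffling the 50 GB purchase data for this join.
    return purchase_df.join(broadcast(sku_df), on="product_id", how="inner")
```

If shopper attributes were needed downstream, shoppers_df (250 MB) could be joined with a broadcast hint as well, provided the executors have memory for it. Where the elided aggregation does not need sku columns, aggregating purchase_df by shopper_id and product_id before the join would also shrink the join input.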

Question 2

Data modelling: given an input file of shoppers that should be loaded into a row-based DB, what is the optimal DB model (table or tables, and columns) that will perform best for the following queries?

1) Get shoppers that are eligible for email & FB
2) Get shoppers that are eligible for email OR App
3) Get active shoppers (status = "A") that are NOT eligible for SMS

Assumptions:
- there are 4 different delivery channels: e-mail, App, FB, SMS
- a shopper may have more than one delivery channel
- a shopper has one of 2 statuses: A (Active) or D (Disabled)

Input data structure:

+----------+--------+------+--------+------+------+------+
| id (key) | status | city | dc_1   | dc_2 | dc_3 | dc_4 |
+----------+--------+------+--------+------+------+------+
| L1       | A      | NY   | e-mail | SMS  |      |      |
+----------+--------+------+--------+------+------+------+
| L2       | A      | LA   | e-mail | FB   | App  |      |
+----------+--------+------+--------+------+------+------+
| L3       | D      | LA   | SMS    | FB   |      |      |
+----------+--------+------+--------+------+------+------+
1 answer
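
One model that fits the three queries in Question 2, sketched with Python's built-in sqlite3 as a stand-in for the row-based DB. Because the question fixes the channels at four, the sketch replaces the positional dc_1..dc_4 columns with one flag column per channel, so each query becomes a plain predicate on a single table. The table name, the has_* column names and the helper load_shoppers are assumptions made for illustration.

```python
import sqlite3

# Channel order matches the has_email, has_app, has_fb, has_sms columns below.
CHANNELS = ("e-mail", "App", "FB", "SMS")

# The three sample rows from the question: id, status, city, dc_1..dc_4.
INPUT_ROWS = [
    ("L1", "A", "NY", "e-mail", "SMS", None, None),
    ("L2", "A", "LA", "e-mail", "FB", "App", None),
    ("L3", "D", "LA", "SMS", "FB", None, None),
]


def load_shoppers(conn, input_rows):
    """Create the flag-per-channel table and load the input rows into it."""
    conn.execute(
        """CREATE TABLE shoppers (
               id TEXT PRIMARY KEY,
               status TEXT NOT NULL CHECK (status IN ('A', 'D')),
               city TEXT,
               has_email INTEGER NOT NULL,
               has_app INTEGER NOT NULL,
               has_fb INTEGER NOT NULL,
               has_sms INTEGER NOT NULL)"""
    )
    for shopper_id, status, city, *delivery_channels in input_rows:
        present = {dc for dc in delivery_channels if dc}
        flags = [int(channel in present) for channel in CHANNELS]
        conn.execute(
            "INSERT INTO shoppers VALUES (?, ?, ?, ?, ?, ?, ?)",
            (shopper_id, status, city, *flags),
        )


QUERIES = {
    "email_and_fb": "SELECT id FROM shoppers "
                    "WHERE has_email = 1 AND has_fb = 1 ORDER BY id",
    "email_or_app": "SELECT id FROM shoppers "
                    "WHERE has_email = 1 OR has_app = 1 ORDER BY id",
    "active_not_sms": "SELECT id FROM shoppers "
                      "WHERE status = 'A' AND has_sms = 0 ORDER BY id",
}

conn = sqlite3.connect(":memory:")
load_shoppers(conn, INPUT_ROWS)
results = {name: [row[0] for row in conn.execute(sql)]
           for name, sql in QUERIES.items()}
print(results)
```

If the set of channels were expected to grow, a normalized shopper_channels(shopper_id, channel) table would avoid schema changes, at the cost of a join or subquery per query.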

Question 3

Data integrity: given transaction partition files (100 files) that are batch-ingested by pipelines from storage (such as S3) into a distributed DWH, what is the preferred data structure for ingestion to ensure data integrity (each invoice is fixed or ingested only once)?

Details:
- each invoice has a unique id and contains a list of products to be added or fixed
- the ingestion procedure upserts the data: update if the invoice already exists, insert if the invoice is new
1 answer
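
A minimal sketch of one way to make the upsert in Question 3 idempotent, using Python's built-in sqlite3 as a stand-in for the distributed DWH (where the same idea would typically be a staging table plus a merge). It assumes that a fix replaces an invoice's whole product list and that a file name identifies a file uniquely. The table names invoice_lines and ingested_files and the function ingest_file are hypothetical.

```python
import sqlite3


def create_tables(conn):
    conn.executescript(
        """
        CREATE TABLE invoice_lines (
            invoice_id TEXT NOT NULL,
            product_id TEXT NOT NULL,
            quantity REAL NOT NULL,
            PRIMARY KEY (invoice_id, product_id));
        -- Ledger of files that have already been applied.
        CREATE TABLE ingested_files (file_name TEXT PRIMARY KEY);
        """
    )


def ingest_file(conn, file_name, lines):
    """Apply one file of (invoice_id, product_id, quantity) rows exactly once.

    Returns True if the file was applied, False if it had been seen before.
    """
    lines = list(lines)
    with conn:  # single transaction: commit on success, roll back on error
        seen = conn.execute(
            "SELECT 1 FROM ingested_files WHERE file_name = ?", (file_name,)
        ).fetchone()
        if seen:
            return False
        # Upsert at invoice level: drop the old product list, insert the new.
        invoice_ids = {line[0] for line in lines}
        conn.executemany(
            "DELETE FROM invoice_lines WHERE invoice_id = ?",
            [(invoice_id,) for invoice_id in invoice_ids],
        )
        conn.executemany("INSERT INTO invoice_lines VALUES (?, ?, ?)", lines)
        conn.execute("INSERT INTO ingested_files VALUES (?)", (file_name,))
    return True


conn = sqlite3.connect(":memory:")
create_tables(conn)
original = [("inv-1", "p1", 2.0), ("inv-1", "p2", 1.0)]
first = ingest_file(conn, "part-001", original)
replay = ingest_file(conn, "part-001", original)             # skipped
fixed = ingest_file(conn, "part-002", [("inv-1", "p1", 3.0)])  # fix
rows = conn.execute(
    "SELECT invoice_id, product_id, quantity FROM invoice_lines "
    "ORDER BY product_id"
).fetchall()
print(first, replay, fixed, rows)
```

Recording the file in the ledger inside the same transaction as the data change is what keeps a retried or replayed file from being applied twice.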

Question 4

Data validation: given transaction input files that are validated before the ETL process, suggest the appropriate technology and the metrics to check in order to ensure seamless data integrity. Which types of data validation would you suggest for this structure?

File structure:
- invoice_id (str)
- timestamp (timestamp)
- store_id (str)
- customer_id (str)
- product_id (str)
- quantity (float)
- purchase_spent (float)
- purchase_discount (float)

Assumptions:
- file volume: 35M records, size 5 GB
- transaction files can be single or multiple
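
To make the kinds of checks in Question 4 concrete, here is a small pure-Python sketch that computes per-file validation metrics: completeness, type conformity, value ranges and duplicate lines. It is illustrative only; at 35M records the same checks would more plausibly run inside a distributed engine or a dedicated data-quality tool. The rules themselves are assumptions, not part of the question: the first column is taken to be the invoice identifier and written invoice_id, timestamps are ISO 8601, numeric fields must not be negative, and (invoice_id, product_id) identifies a line.

```python
from datetime import datetime

REQUIRED = ("invoice_id", "timestamp", "store_id", "customer_id", "product_id")
NUMERIC = ("quantity", "purchase_spent", "purchase_discount")


def validate(records):
    """Return validation counters for an iterable of dict records."""
    metrics = {"rows": 0, "missing_required": 0, "bad_timestamp": 0,
               "bad_number": 0, "negative_number": 0, "duplicate_lines": 0}
    seen = set()
    for rec in records:
        metrics["rows"] += 1
        # Completeness: every identifier field must be present and non-empty.
        if any(not rec.get(col) for col in REQUIRED):
            metrics["missing_required"] += 1
        # Type conformity: timestamp must parse as ISO 8601 (assumption).
        try:
            datetime.fromisoformat(rec.get("timestamp") or "")
        except ValueError:
            metrics["bad_timestamp"] += 1
        # Type and range checks, counted per field rather than per row.
        for col in NUMERIC:
            try:
                value = float(rec.get(col, ""))
            except (TypeError, ValueError):
                metrics["bad_number"] += 1
                continue
            if value < 0:
                metrics["negative_number"] += 1
        # Uniqueness: the same invoice line should not appear twice.
        key = (rec.get("invoice_id"), rec.get("product_id"))
        if key in seen:
            metrics["duplicate_lines"] += 1
        seen.add(key)
    return metrics


good = {"invoice_id": "inv-1", "timestamp": "2023-03-16T10:00:00",
        "store_id": "s1", "customer_id": "c1", "product_id": "p1",
        "quantity": "2", "purchase_spent": "9.5", "purchase_discount": "0"}
bad = {"invoice_id": "inv-2", "timestamp": "16/03/2023",
       "store_id": "s1", "customer_id": "", "product_id": "p2",
       "quantity": "abc", "purchase_spent": "-1", "purchase_discount": "0"}
metrics = validate([good, dict(good), bad])
print(metrics)
```

Comparing these counters against thresholds, and against the row counts of the source files, gives a simple gate for accepting or rejecting a file before the ETL step.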