Databricks
Job Description
This position is sourced from Liepin.
Job Responsibilities
Design, develop, deploy, and maintain data pipelines on the Databricks platform, ensuring their efficiency and reliability.
Build enterprise-grade data lakehouses using Delta Lake technology, performing data modeling, cleansing, transformation, and integration, while optimizing performance (e.g., file indexing, small file compaction).
Utilize Databricks Workflows, Apache Airflow, and other tools for workflow scheduling, monitoring, and alerting to ensure stable delivery of data tasks.
Collaborate with data scientists and data analysts to provide high-quality, trustworthy data assets and support the deployment and maintenance of machine learning models.
Follow data governance best practices and use tools such as Unity Catalog for metadata management, access control, and data lineage tracking.
Job Requirements
Must-Have Qualifications:
Bachelor’s degree or above in Computer Science, Information Technology, or related fields, with experience in big data development.
Solid hands-on experience with Databricks in commercial projects, proficient in workspace management, cluster configuration, notebook development, and job scheduling.
Strong proficiency in PySpark and Spark SQL, with excellent data transformation and processing capabilities.
Deep understanding of Delta Lake principles and core operations such as MERGE INTO, OPTIMIZE, ZORDER BY, and transactional features.
Excellent SQL skills, capable of independently handling complex queries and logic development.
Good English reading and writing skills to support daily technical documentation and team collaboration.
Preferred Qualifications:
Databricks official certification (e.g., Certified Associate Developer) is a plus.
Experience with MLflow for managing the machine learning lifecycle is a plus.
Familiarity with basic data services on any major cloud platform (Azure, AWS, or GCP) is a plus.
Strong logical thinking, team collaboration spirit, and proactive problem-solving ability.