The right talent can transform your business—and we make that happen. At Collabera, we go beyond staffing to deliver strategic workforce solutions that drive growth, innovation, and agility. With deep industry expertise, a global talent network, and a people-first approach, we connect you with professionals who don’t just fit the role but elevate your business. Partner with us and build a workforce that powers success.
Senior Data Engineer (Python - PySpark and AWS)
Contract: Toronto, Ontario, CA
Salary Range: 80.00 - 100.00 per hour
Job Code: 369269
End Date: 2026-06-10
Days Left: 15 days, 22 hours
- Migrate existing Databricks-based Spark pipelines to AWS EMR (Spark)
- Perform lift-and-shift of 50+ datasets, some highly complex and drawing on multiple data sources
- Refactor and optimize data pipelines for performance, scalability, and reliability
- Structure and store data using Parquet and Iceberg formats
- Improve and clean up legacy data pipelines built over several years
- Design data with a consumption-first mindset (e.g., partitioning strategies, access patterns, data usability)
- Collaborate with stakeholders to understand data requirements and translate them into scalable solutions
- Ensure production readiness, including monitoring, orchestration, and deployment
- Work independently to drive delivery from design through implementation
- Develop and optimize large-scale PySpark data pipelines
- Rebuild and enhance Spark workloads in AWS (EMR)
- Leverage tools such as Airflow, AWS Glue, and Lake Formation
- Handle parallel/distributed data processing workloads
- Improve system performance and data quality across pipelines
- Engage with business and technical stakeholders to align on data needs
- Own delivery with minimal oversight in a fast-paced environment
- 8–10+ years of Data Engineering experience (senior-level profiles only)
- Strong hands-on expertise in Python and PySpark
- Deep experience with Apache Spark in distributed environments
- Proven experience working with large-scale, complex data pipelines
- Experience with Databricks (existing environment)
- Strong knowledge of Parquet and Iceberg data formats
- Experience with AWS data ecosystem (EMR preferred)
- Familiarity with Airflow, Glue, and Lake Formation
- Strong understanding of parallel/distributed data processing
- Ability to work independently with strong problem-solving skills
- Experience in ambiguous environments with evolving requirements
- Prior experience in capital markets or investment management
- Experience working with market data / reference data vendors
- Experience designing data products and consumption layers
- Exposure to large-scale data platform migrations or transformations
We may use AI-enabled and/or automated tools to support parts of our recruitment process, including application screening, interview scheduling, and candidate communications. These tools are used to enhance consistency and efficiency. All hiring decisions involve human review and are not based solely on automated processing.
The Company offers a total rewards package in accordance with all applicable federal, provincial, and local laws and requirements. Benefit eligibility and offerings vary based on role, employment status, and work location. For contractor positions, benefits are limited to those entitlements and protections required by applicable law, which may include (as applicable) vacation pay, public holidays, leaves of absence, and other legally mandated benefits or payments.
Job Requirement
- Python
- PySpark
- AWS
- Apache
- Databricks
- Airflow
Reach Out to a Recruiter
- Recruiter: Shashank Rathod
- Email: shashank.rathod@collabera.com