Oxylabs
Junior Data Engineer
Harding
Full-time
2 weeks ago
Job Description
We're a team of 500 professionals who develop cutting-edge proxy and web data scraping solutions for thousands of the world's best known businesses, including Fortune 500 companies.
What's in store for you:
You'll be developing complex products to high coding standards, maintaining our own infrastructure, handling petabytes of data, and solving challenges on a daily basis. We've got you covered with a team of strong professionals to support you, a well-built tech stack, and loads of ownership.
As our new Junior Data Engineer, you will be responsible for tackling a diverse and challenging range of problems to help us make better business decisions at Oxylabs.io. In this role, you will focus on making data (internal/external, structured/unstructured, batch/real-time, etc.) accessible across Oxylabs.io while ensuring its accuracy and timeliness. You will design, build, and maintain data pipelines, and you'll also have the opportunity to develop various internal tools using Streamlit and the latest LLM models. This is an excellent opportunity for an analytical thinker who thrives in a fast-paced environment, enjoys hands-on technical work, and is eager to learn new technologies.
Our tech stack*:
- Google Cloud Platform (Pub/Sub, GCS, BQ) for ingestion, storage and warehousing;
- dbt and SQL for data modelling;
- Python for automation;
- Power BI and Superset for data visualisation;
- Airflow for orchestration;
- Kubernetes for deployment;
- Streamlit for building data applications.
*We do not expect candidates to know or have mastered all of these technologies today; a brief sketch of how some of them fit together is below.
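A minimal, illustrative sketch of how some of these pieces fit together: an Airflow DAG that loads newline-delimited JSON files from GCS into BigQuery. The DAG id, bucket, dataset, and table are hypothetical placeholders, not our production code.

```python
# Illustrative only: ingest newline-delimited JSON files from GCS into
# BigQuery with Airflow. All names below are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

with DAG(
    dag_id="example_gcs_to_bq",          # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    load_events = GCSToBigQueryOperator(
        task_id="load_events",
        bucket="example-raw-events",                      # hypothetical bucket
        source_objects=["events/{{ ds }}/*.json"],        # one folder per run date
        destination_project_dataset_table="analytics.raw_events",  # hypothetical table
        source_format="NEWLINE_DELIMITED_JSON",
        write_disposition="WRITE_APPEND",
    )
```

In a real pipeline, a dbt run would typically follow a load step like this to model the raw table in the warehouse.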
What you will do:
- Build a solid understanding of the relationships between our data sources through data analysis and modelling;
- Extract knowledge from raw data and provide valuable insights to your team and business users alike, enabling them to move forward;
- Use Google Cloud Platform together with Airflow to build data pipelines that apply business logic: choose appropriate levels of aggregation, group and transform fields without compromising scalability or performance, and take responsibility for data quality;
- Use Python to develop internal applications, automations, and reverse ETL pipelines, and act as a bridge between commercial and engineering teams to understand their needs (a brief sketch of such an internal app follows this list).
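To give a flavour of the internal tooling mentioned above, here is a minimal Streamlit sketch that queries BigQuery and charts the result. The table, query, and metric are hypothetical placeholders, not our actual dashboards.

```python
# Illustrative only: a tiny internal Streamlit app backed by BigQuery.
# The table and query are hypothetical placeholders.
import pandas as pd
import streamlit as st
from google.cloud import bigquery  # assumes GCP credentials are configured

st.title("Daily signups")  # hypothetical internal dashboard


@st.cache_data(ttl=3600)  # cache the query result for an hour
def load_data() -> pd.DataFrame:
    client = bigquery.Client()
    query = """
        SELECT DATE(created_at) AS day, COUNT(*) AS signups
        FROM `analytics.users`  -- hypothetical table
        GROUP BY day
        ORDER BY day
    """
    return client.query(query).to_dataframe()


df = load_data()
st.bar_chart(df.set_index("day")["signups"])
st.dataframe(df)
```

Run locally with `streamlit run app.py`.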
What we expect:
- Excellent SQL knowledge;
- Proficiency in at least one programming language, ideally Python;
- Knowledge of the main data modelling/data warehousing principles;
- Experience with version control systems (e.g., Git) and CI/CD;
- You have a technical aptitude and don't shy away from engineering-related discussions;
- You have a keen eye for detail and troubleshooting as well as a high degree of ownership;
- You have a structured approach to solving any kind of problem and can present the outcome in a clear and concise way;
- You're excellent at communicating in both written and spoken Lithuanian and English.
It's a bonus if you have the following:
- Experience with cloud infrastructure (ideally, Google Cloud);
- Knowledge of Docker and Kubernetes;
- Experience with LLMs (OpenAI/Anthropic APIs); see the minimal example below.
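If LLM APIs are new to you: a basic call with the OpenAI Python SDK looks like the sketch below (the model name and prompts are illustrative).

```python
# Illustrative only: a single chat completion with the OpenAI Python SDK
# (openai>=1.0). Model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[
        {"role": "system", "content": "Summarise data-quality alerts for an analyst."},
        {"role": "user", "content": "3% of rows in analytics.users failed the not-null check on created_at."},
    ],
)
print(response.choices[0].message.content)
```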
Salary:
- **Gross salary:** 2320 - 3310 EUR/month. Keep in mind that we are open to discussing a different salary based on your skills and experience.
Up for the challenge? Let's talk!