Big data is one of the hottest topics in the tech world. If you are a developer looking for a challenging career change, data engineering could be a great choice for you. But what does this role mean and how can you transfer to this field? We interviewed our big data architect, Csaba, who joined the company as a software engineer and then became a data engineer. He told us why he chose this career path and how you can, too.
How did your career at Aliz start? I applied for an internship almost 10 years ago but I was rejected. Surprisingly, one month later they called me back because they had another opening. This time I rejected them because I had already started working at another company. However, I was so unengaged at that company that my roommate talked me into quitting and choosing Aliz. That was one of the best decisions of my life. I have been working here ever since.
How did your career path evolve? When I joined the company there were only 2 projects and 8 colleagues. Because it was a small, young company, new projects were hard to find and I didn’t have a choice of what to work on. I started working as a junior Java developer. We developed a cloud-based SaaS product for the public sector with the traditional Java enterprise stack. However, I still had a lot of opportunities to grow. In less than 2 years, I became a tech lead, which was a huge challenge, but I managed. Thanks to the team’s outstanding motivation and hard work, the company scaled up rapidly. We successfully adopted new directions, such as becoming a Google Cloud partner. I shifted into data engineering and now I work as a big data architect.
How did you become a data engineer? It was kind of a mutual decision within the company. Management was eager to find a unique specialization that would differentiate us from other companies. They are very open and approachable about strategic decisions and asked my opinion as well. As I was very interested in big data and it was my specialization during my Master’s, I suggested they get started in this area.
How did you get started? Both data engineering and Google Cloud were new at the time. There weren’t any established best practices, so we were literally learning by doing. We started with a proof of concept in real-time transactional data processing with Cloud Dataflow and BigQuery. Fortunately, our first client was incredibly patient while we were busy discovering and experimenting. Since the stack was completely new to us, and to everyone in the world back then, we hit some dead ends. It was inevitable that we would make mistakes along the way. We often needed to refactor our code. It took some time to figure out the best practices on this new stack. It was a very exciting experience. I personally learned a lot. The project became a success story and the client was happy.
What do data engineers do? How does your work differ from software engineering? Software engineering is a broad term that can mean any kind of programming, from writing front-end code to back-end code. Data engineering is a specialization – we build the data infrastructure; organize the collection, processing, and storage of structured and unstructured data; and provide the means to turn raw data into valuable and actionable insights. Data engineers focus on designing and implementing data pipelines and on building data lakes and data warehouses. For me personally, the most exciting difference is that as a data engineer I often need to leave the beaten path. When you build an application on a more traditional tech stack, you can find a lot of existing templates and solutions online. However, when you face a data-related problem using cutting-edge tools, the chances are you will be the first one in the whole world to find a solution.
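As a rough illustration of the collect, process, store, and analyze flow described above, here is a minimal sketch in plain Python. It is not how any real project at the company is built: the sample data, the `orders` table, and all function names are hypothetical, and an in-memory SQLite database stands in for a data warehouse such as BigQuery.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: a CSV export with inconsistent formatting.
RAW_CSV = """order_id,country,amount
1,HU,120.50
2,hu,80
3,DE,
4,DE,200.00
"""


def extract(raw_text):
    """Collect: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Process: normalize country codes and drop rows without an amount."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["country"].upper(), float(row["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Store: write the cleaned rows into a warehouse-like table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def revenue_by_country(conn):
    """Insight: aggregate revenue per country with a SQL query."""
    cur = conn.execute(
        "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
    )
    return dict(cur.fetchall())


def run_pipeline(raw_text):
    """Run all stages against an in-memory database (a stand-in for a warehouse)."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(raw_text)), conn)
    return revenue_by_country(conn)
```

Calling `run_pipeline(RAW_CSV)` returns the revenue per country for the rows that survived cleaning. In a real project each stage would typically be a separate, scalable service rather than a function in one script.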
How much does a data engineer have to code and in which language? The amount of coding varies with the problem you need to solve and the client’s requirements. Stream processing usually requires a custom solution, so more coding is necessary (around 70% of the engineer’s time). However, some clients don’t have their own tech teams, so they try to minimize coding with drag-and-drop data transformation tools such as Cloud Data Fusion. In these cases, there are more querying and integration tasks, and coding takes up only around 30% of the time. In my opinion, if you can code in one language, you can easily code in any other. The syntax is a little different but the concepts are similar. If you have to choose which one to start with, I’d suggest learning Java or Python because these are the most common open-source languages in the big data world. We use both of them, depending on the clients’ preferences and the tech stack we use. For a real-time stream processing solution using Apache Beam and Cloud Dataflow, we prefer Java. If we need to build a batch ELT pipeline with Airflow or if artificial intelligence (AI) and data scientists are heavily involved in the project, we use Python.
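To give a feel for the kind of custom logic stream processing tends to involve, here is a small sketch of fixed-window aggregation, a common building block in streaming pipelines. It is written in plain Python and deliberately does not use the Apache Beam or Cloud Dataflow APIs; the function name, the sample events, and the 60-second window size are all assumptions made for illustration.

```python
from collections import defaultdict


def fixed_window_sums(events, window_seconds=60):
    """Sum event values per fixed time window.

    `events` is an iterable of (timestamp_in_seconds, value) pairs, which may
    arrive out of order. Each event is assigned to the window that starts at
    the largest multiple of `window_seconds` not exceeding its timestamp.
    """
    totals = defaultdict(float)
    for timestamp, value in events:
        window_start = (timestamp // window_seconds) * window_seconds
        totals[window_start] += value
    # Return windows ordered by start time for readability.
    return dict(sorted(totals.items()))


# Hypothetical transaction events: (seconds since start, amount).
sample_events = [(5, 10.0), (42, 2.5), (61, 1.0), (130, 4.0), (119, 3.0)]
window_totals = fixed_window_sums(sample_events)
```

Here `window_totals` maps each window start (0, 60, 120) to the sum of the amounts that fall into it. A real stream processor also has to handle late data, state, and scaling, which is where much of the coding effort goes.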
Can a data engineer easily become a data scientist? Well, it’s possible but not that simple. Data engineers build the data structure and perform data modeling for data scientists, who work with the data once it is in the tables. Of course, data engineers develop useful skills and gain valuable experience (batch processing, building the pipeline for the machine learning model, and putting a model into production), which make their job easier and faster. However, data science is a different specialty that requires a stronger background in maths and statistics. Consequently, senior engineers can only become data scientists if they start from the junior level. If you want to learn data engineering because your goal is to become a data scientist, it’s much wiser to learn data science in the first place.
Why do you like coding in the cloud? Why Google Cloud? The cloud takes care of the infrastructure overhead (setting up virtual machines, operating systems, and network settings) so developers can concentrate on coding. There is more room for experimenting – you can set up enough resources in a couple of minutes and you can see your test results right away. The main cloud providers have a lot of similarities. I like GCP because the services are the same as what is used internally by Google. This means that the tools are really battle-tested and the scalability is guaranteed. Also, there are some unique services like BigQuery which cannot be found in other cloud providers’ offerings.
Why do you like working at Aliz? This agile company fosters a great environment for continuous personal and professional growth. I get to work with cutting-edge tech on interesting use cases. I am involved in the whole lifecycle of the projects and my opinion is always taken into consideration.
What’s next for you? What are your career goals? I really like what I’m doing, so I want to do the same…just better. I feel comfortable in a dynamic client-facing environment, so I see myself as a technical consultant in the future. I’m also passionate about knowledge sharing, so I will also concentrate more on being an active contributor to local meetup communities and tech conferences.
Does it matter what kind of data you work with? Do you have any industry preferences? Not really. I like working in new domains because I can gain valuable insights and broaden my horizons. I also enjoy working in a familiar domain – such as aviation – where I can apply my industry expertise to make shortcuts and solve problems faster.
Where can aspiring data engineers learn today? Today, there are great training materials, and it is easy to find good documentation and helpful information on Stack Overflow. We have a company Coursera account where we can complete training courses developed by Google itself. These are useful to start with, but in my experience, you have to work on real-life projects to really acquire knowledge.