Can you discuss your experience with using Apache Spark and Hadoop in your professional or academic projects?

How To Approach: Associate

  1. Highlight experience with Apache Spark & Hadoop.
  2. Describe a professional project involving these tools.
  3. Discuss key contributions and responsibilities.
  4. Speak about project results and impact.

Sample Response: Associate

As a Big Data Analyst at TechnoX Solutions for over three years, I've consistently worked with Apache Spark and Hadoop to manage, process, and analyze large datasets for various clients. One project that stands out is when we helped a client in the insurance sector analyze their claims data from the previous ten years.

We stored hundreds of terabytes of raw claims data, including textual notes, in the Hadoop Distributed File System (HDFS). To process and analyze this data, our team wrote Spark applications in Java. I was responsible for extracting patterns and trends from the textual data; for this, I wrote Spark programs that performed Natural Language Processing (NLP) using tools like Stanford CoreNLP.
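To make this concrete in an interview, it can help to sketch the core idea of surfacing patterns from claim notes. The following is a deliberately simplified, Spark-free illustration in plain Java: it tokenizes notes and counts term frequencies, the same kind of aggregation a Spark job would distribute across the cluster. The class name and sample data are hypothetical, not from the actual project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Simplified stand-in for the distributed pipeline: count how often each
// term appears across claim notes to surface recurring themes.
public class ClaimTermCounts {

    // Lowercase each note, split on non-word characters, and tally terms.
    public static Map<String, Long> termFrequencies(List<String> notes) {
        return notes.stream()
                .flatMap(note -> Arrays.stream(note.toLowerCase().split("\\W+")))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.groupingBy(t -> t, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> notes = List.of(
                "Water damage from burst pipe",
                "Pipe burst caused water damage in basement");
        Map<String, Long> freq = termFrequencies(notes);
        System.out.println(freq.get("water")); // appears in both notes
    }
}
```

In the real project the equivalent logic ran as Spark transformations over RDDs/Datasets backed by HDFS, with CoreNLP handling tokenization, lemmatization, and entity extraction instead of a simple regex split.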

Through the project, we derived significant insights that our client used to identify the main causes of claims and redesign their policies accordingly. This led to a 7% reduction in claim frequency and a 15% improvement in claims-processing efficiency.