In my role as a Data Engineer at InfoStream, I've worked extensively with Amazon Redshift for our data warehousing needs. Our major project involved building a data warehouse that consolidated data from a diverse array of sources, such as S3 buckets, DynamoDB, and on-premises databases, so that it could be queried in one place.
My responsibilities included loading data into Redshift clusters, tuning table design to optimize query performance (Redshift relies on distribution and sort keys rather than conventional indexes), and creating dashboards to visualize the data. We leveraged Redshift's massively parallel processing (MPP) architecture for fast querying and used SQL scripts for data extraction and transformation.
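As an illustration of the loading step, here is a minimal Python sketch that builds a Redshift `COPY` statement for files staged in S3. The table name, bucket path, and IAM role ARN are hypothetical placeholders rather than the actual InfoStream setup, and the helper only produces the SQL text; executing it would require a database connection, which is left out here.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str,
                         file_format: str = "PARQUET") -> str:
    """Return a Redshift COPY statement that loads files from S3.

    All arguments are illustrative; nothing here reflects a real schema.
    """
    fmt = file_format.upper()
    # Only two common formats are handled in this sketch.
    if fmt not in {"CSV", "PARQUET"}:
        raise ValueError(f"unsupported format for this sketch: {file_format}")
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with 's3://'")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {fmt};"
    )


# Hypothetical example values.
copy_sql = build_copy_statement(
    table="analytics.events",
    s3_uri="s3://example-bucket/events/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(copy_sql)
```

`COPY` reads the files in parallel across the cluster, which is generally why it is preferred over row-by-row `INSERT` statements for bulk loads.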
One challenge we faced was optimizing our queries to achieve significant performance improvements. To tackle this, I applied techniques such as choosing distribution styles and keys that spread rows evenly across the cluster and selecting appropriate sort keys, which considerably reduced our query execution times.
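To make the table-design side concrete, the following is a minimal Python sketch that generates a `CREATE TABLE` statement with an explicit distribution key and compound sort key. The table, column names, and key choices are hypothetical and chosen only for illustration; the right keys depend on the actual join and filter patterns of the workload.

```python
from typing import List, Tuple


def build_create_table(table: str, columns: List[Tuple[str, str]],
                       dist_key: str, sort_keys: List[str]) -> str:
    """Return Redshift DDL with DISTSTYLE KEY and a compound sort key."""
    names = {name for name, _ in columns}
    # Guard against keys that are not actual columns of the table.
    missing = [k for k in [dist_key, *sort_keys] if k not in names]
    if missing:
        raise ValueError(f"key columns not defined in table: {missing}")
    col_lines = ",\n".join(f"    {name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE TABLE {table} (\n{col_lines}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key})\n"
        f"COMPOUND SORTKEY ({', '.join(sort_keys)});"
    )


# Hypothetical fact table: distribute on the join column, sort on the
# column most often used in range filters.
ddl = build_create_table(
    table="analytics.events",
    columns=[
        ("event_id", "BIGINT"),
        ("customer_id", "BIGINT"),
        ("event_time", "TIMESTAMP"),
    ],
    dist_key="customer_id",
    sort_keys=["event_time"],
)
print(ddl)
```

A high-cardinality join column is a common choice of distribution key because it tends to spread rows evenly and keeps matching rows on the same slice; the `skew_rows` column of the `SVV_TABLE_INFO` system view is one way to check how even the resulting distribution is.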