One of Bytecode IO’s clients is moving from Mixpanel to an AWS Redshift data warehouse. I was tasked with extracting the data from the Mixpanel API and loading it into Redshift. Because the client wanted to reduce their use of Mixpanel gradually, we went with an hourly data load into Redshift for the foreseeable future rather than a single dump and load.
Writing the Ruby script to save the raw Mixpanel dumps to S3 for a COPY into Redshift was straightforward, thanks to the Ruby Mixpanel Data API Client contributed to by keolo. However, the script was repeatedly getting killed by the OOM killer. Digging deeper, I found that the mixpanel_client library was transforming the entire API response into a Ruby hash in memory. I forked the repo and added a raw response format that returns the result as a string, which reduced the memory footprint significantly. Since a raw response type might be useful to others, I opened a GitHub pull request, and the changes were merged into the master branch.
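To illustrate the difference, here is a minimal sketch of the two approaches. It does not call mixpanel_client or the Mixpanel API. `SAMPLE_RAW_RESPONSE`, `parse_all`, and `write_raw_dump` are hypothetical names made up for this example, and I am assuming the raw export body is newline-delimited JSON (one event per line). The S3 upload and the Redshift COPY are left out.

```ruby
require 'json'
require 'stringio'
require 'zlib'

# Hypothetical stand-in for a raw export response body.
# Assumption: one JSON event per line.
SAMPLE_RAW_RESPONSE = <<~JSONL
  {"event":"signup","properties":{"time":1400000000,"distinct_id":"a1"}}
  {"event":"login","properties":{"time":1400000060,"distinct_id":"a1"}}
JSONL

# Memory-heavy path: every event becomes a Ruby Hash, and all of them
# are held in a single Array at the same time.
def parse_all(raw_body)
  raw_body.each_line.map { |line| JSON.parse(line) }
end

# Lighter path: gzip the raw string straight to an IO without ever
# building Ruby objects for the individual events.
def write_raw_dump(raw_body, io)
  gz = Zlib::GzipWriter.new(io)
  gz.write(raw_body)
  gz.finish # writes the gzip trailer but leaves the underlying IO open
  io
end

# Usage: an in-memory buffer stands in for the file that would go to S3.
buffer = write_raw_dump(SAMPLE_RAW_RESPONSE, StringIO.new(String.new))
puts "compressed dump: #{buffer.string.bytesize} bytes"
```

In the raw path the response never becomes anything other than a string, so the process does not pay for one Hash per event on top of the response body itself. For an hourly job, the same `write_raw_dump` call could take a `File` opened in binary mode instead of a `StringIO`.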