Data rarely arrives “clean”. You may have duplicate data in your transactional files, a customer table without a unique key, or you may be dealing with data that has no firm relationship (such as several time series datasets for overlapping time periods).
While it’s possible to use fancy Excel tricks to merge and clean data, using the proper tool for the job ensures accuracy and repeatability. That tool is SQL, or Structured Query Language.
In this 30-minute video, I cover the most common operations that are done for data preparation. For simplicity of setup, I use R, RStudio and sqldf, which automatically sets up a SQLite database environment when you call the package. Viewing this tutorial, and especially following along with the code examples and practicing, will make you a more valuable analyst.
The example data files are available for download.