Data Extraction

Once the necessary data has been identified in the Examination Phase and the database design has been established in the Design Phase, it's time to go through the source data and extract the data that is needed.

For small and relatively simple data sets, the source file(s) can sometimes be loaded into a spreadsheet program and the extraction done directly, usually by deleting unwanted columns and saving the result as a CSV or other delimited text file.

For larger data sets, or data available only in multiple files, several steps are often required. I prepare scripts specifically designed to extract only the data previously identified as necessary and to create files for direct input to the MySQL database tables. For large systems, say 100 million or more records, I use a "load process" as opposed to an "insert process." I have found this load process to be reliable and faster than other approaches.
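
As a rough illustration of the extraction step described above, the following Python sketch copies only the wanted columns from several source CSV files into one tab-delimited file, then builds a statement to bulk-load that file. It is not the actual script used here. The column names, file names, and table name are hypothetical, and it assumes that the "load process" corresponds to MySQL's LOAD DATA statement and that the source files have a header row.

```python
import csv

# Hypothetical column names; substitute the fields chosen in the Examination Phase.
WANTED = ["id", "name", "amount"]


def extract_to_load_file(source_paths, out_path, wanted=WANTED):
    """Copy only the wanted columns from each source CSV into one
    tab-delimited file and return the number of rows written."""
    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        for path in source_paths:
            with open(path, newline="", encoding="utf-8") as src:
                # DictReader uses the header row, so column order may differ per file.
                for row in csv.DictReader(src):
                    writer.writerow([row[col] for col in wanted])
                    rows_written += 1
    return rows_written


def load_statement(file_path, table, columns=WANTED):
    """Build a MySQL LOAD DATA statement matching the file format written above.
    Assumption: the data contains no backslashes, which MySQL would treat
    as escape characters by default."""
    return (
        f"LOAD DATA LOCAL INFILE '{file_path}' "
        f"INTO TABLE {table} "
        "FIELDS TERMINATED BY '\\t' OPTIONALLY ENCLOSED BY '\"' "
        "LINES TERMINATED BY '\\n' "
        f"({', '.join(columns)})"
    )
```

A call such as extract_to_load_file(["part1.csv", "part2.csv"], "load.tsv") followed by load_statement("load.tsv", "customers") would produce the file and the statement to run against it. Whether LOCAL is permitted depends on how the server and client are configured.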