Corporations Data Extract Download into Microsoft SQL Server

In this post l will be detailing the steps l used to ETL – Extract Transform and Load into a Microsoft SQL Server 2012 database from Washington State Corporations Data download. This showcases SQL Server inbuilt SSIS [SQL Server Integration Services] Data import and export wizard. We will be working with the text (tab limited) and not the XML version first.

1)

First download the text form data file from this link, click here.

Unzip the file as shown below.

p1

2)

Create a database bucket and re-size to accomodate a large data import and reasonable file growth by value instead of percentage which is less efficient.

create database CorporationsDataWA;

USE [master]
GO
ALTER DATABASE [CorporationsDataWA]
MODIFY FILE ( NAME = N’CorporationsDataWA’, SIZE = 512000KB , FILEGROWTH = 102400KB )
GO
ALTER DATABASE [CorporationsDataWA]
MODIFY FILE ( NAME = N’CorporationsDataWA_log’, SIZE = 102400KB , FILEGROWTH = 102400KB )
GO

p2

p3

 

3)

Below show step by step screenshots of the ETL process. I also chose to save the SQL server generated workflow process for this ETL as a separate SSIS package for later review, and execution.

Notice the ETL error message generated. The default column width of 50 characters for the business name and address fields was not enough to handle the input data without truncation, l tried 1100 and finally settled for 200 characters for each of these name, address fields. And that worked. With a clear data dictionary one might have known beforehand the maximum characters for those data fields in the text limited input data file.

Arrows indicate important entry fields or steps. Click on images for a larger version.

3-29-2014 11-40-55 AM 3-29-2014 11-41-35 AM 3-29-2014 11-44-03 AM 3-29-2014 11-44-18 AM 3-29-2014 11-48-37 AM 3-29-2014 11-49-00 AM 3-29-2014 11-49-37 AM 3-29-2014 11-49-54 AM 3-29-2014 11-50-27 AM 3-29-2014 11-52-48 AM 3-29-2014 11-53-01 AM 3-29-2014 11-53-32 AM 3-29-2014 11-56-06 AM 3-29-2014 11-56-34 AM 3-29-2014 11-56-50 AM 3-29-2014 11-57-14 AM 3-29-2014 11-58-17 AM 3-29-2014 11-59-01 AM 3-29-2014 11-59-12 AM 3-29-2014 11-59-45 AM 3-29-2014 12-00-54 PM

 

B)

Importing Governing People (same process as above)

3-29-2014 12-35-53 PM 3-29-2014 12-36-30 PM 3-29-2014 12-36-42 PM 3-29-2014 12-36-52 PM 3-29-2014 12-37-03 PM 3-29-2014 12-37-59 PM 3-29-2014 12-38-14 PM 3-29-2014 12-38-32 PM 3-29-2014 12-38-54 PM 3-29-2014 12-39-20 PM 3-29-2014 12-39-32 PM 3-29-2014 12-40-03 PM 3-29-2014 12-40-48 PM 3-29-2014 12-41-03 PM

 

Below are screenshots showing one of the SSIS packages, with content(s) settings.

3-29-2014 1-39-29 PM 3-29-2014 1-43-33 PM 3-29-2014 1-44-15 PM 3-29-2014 1-45-04 PM 3-29-2014 1-45-21 PM 3-29-2014 1-45-47 PM 3-29-2014 1-46-16 PM