Email Attachment Harvesting using CozyRoc SSIS+
OR: How to automatically download email attachments to a folder using SSIS.
by Greg Troyan
I'm putting this page up after searching for a really long time for a product that would automatically retrieve attachments from emails for processing into an SQL Server. As much as I wish the entire world would produce clean .CSV files available via FTP sites for easy retrieval, there exist some holdouts that, for whatever reason, only deliver reports via email.
After receiving the question, "Why not just have someone open the email and save the attachment to a location every day?" for the upteenth time, I vowed to solve this problem in a way that didn't require Outlook being involved and could be run as a scheduled task or as a job in SQL Server itself. Enter the CozyRoc receive mail task - a control flow task for receiving and processing emails.
I took a look that the sample package from the CozyRoc site and used that to build my package.
The theory here is that you first build a list of emails to process and store it in an object variable. Then you iterate over that list using a Foreach Loop Container to retrieve a single email, save its attachment then delete it before moving on to the next email in the list.
Here's what the final package looks like:
I created a dedicated email account that would be the destination for any automated emailed reports and populated it with some sample .csv files.
Where each email contains a .csv file
I then create a IMAP Connection to our Exchange server, labeling it "IMAP" and entering the proper settings and testing it to verify that the connection works:
I then use that connector to build the email list over which to iterate:
Drag out a Receive email task to the control flow page and label it "Get Mail List"
Edit the Receive Mail task, selecting "Get Mails List" in the Action drop down list
We need to put the results of the Get Mail List action somewhere, and that will be a new ResultVariable called MailList1. Click on the drop down arrow next to ResultVariable and select "<New variable...>"
Give it the name MailList1 and type "Object"
Your Receive Mail Task should look like this:
Note: It is possible to filter which emails get processed, using the IMAP SEARCH command syntax as specified in RFC3501. The gist of it is that you enter 'SINCE 01-Apr-2012' to retrieve emails since the beginning of April, or 'SUBJECT <text>' to filter by subject, etc.
Next, drag out a Foreach Loop container and connect the "Get Mail List" object to it using the green arrow thingy and open its properties. Click on the Collection section and change the Enumerator section to "Foreach From Variable Enumerator" and select User::MailList1 as the variable object.
This step tells the Foreach Loop container that it will be iterating over the list of emails created by the "Get Mail List" object, performing whatever steps lie within the containier against that list.
Then click on "Variable Mappings" and set the Index to 0 (basically start at the first email in the inbox and proceed from there).
Now it is time to create the email processing bits.
First we need to build the Connection where we will store email bodies for further processing.
Right click on the white space in the "Connection Managers" pane and select "New File Connection". For Usage Type, select "Create File" from the drop down menu and give it a path and file name.
Rename this Connection Manager "EmailBody"
This step tells SSIS that each email will be downloaded to a text file called EmailBody.txt (whose MIME contents will be visible in that directory). As each email is processed, it will be written to EmailBody.txt, overwriting the previous email in that file. This is acceptable because we are only interested in the attachment it contains.
Drag a "Received Email Task" object into the Foreach Loop Container and label it "Get Mail"
Change the Action of the "Get Mail" receive email task to "Get Mail"
Make the following changes:
- Change "Local" to "EmailBody" (telling SSIS where to store the body of the email)
- Change "OverwriteExisting" to "True" (to allow more than one email to be processed)
- Change "MailConnection" to "IMAP" (to use the IMAP Mail Connection Manager you created above)
- Change "MailVariable" to "User::MailList1" (so SSIS knows which email it is processing)
Now we extract the attachment from the current email stored in EmailBody.TXT
First we need to tell SSIS where to save the attachments. Right click on the white space in the "Connection Managers" pane and select "New File Connection". For Usage Type, select "Existing Folder" from the drop down menu and give it a path.
Rename this Connection Manager "EmailAttachments".
Drag a "Receive Email Task" object into the Foreach Loop Container and connect it to the "Get Mail" object with the green arrow thingy and edit its name to "Get Attachment"
Edit the properties of "Get Attachment" and change the Action to "Open Mail"
- Change "Local" to "EmailBody" (to tell SSIS where the current email file is stored whose attachment will be saved)
- Change "Attachment" to "EmailAttachments" (to tell SSIS where to save the extracted attachment)
Finally, we want to delete the processed emails from the Inbox so they don't clutter it up and remain to be processed again the next time the job is run:
Drag a "Receive Email Task" object into the Foreach Loop Container and connect it to the "Get Attachment" object with the green arrow thingy and edit its name to "Delete Email"
Open the properties of the "Delete Email" Receive Email Task object and make the following changes:
-Change "Action" to "Delete Email"
- Change "MailConnection" to "IMAP" (to tell SSIS how to connect to the Exchange server)
- Change "MailVariable" to "User::MailList1" (to tell SSIS which emails to delete)
Save your package.
Running this package against the email account listed above will produce the following output:
The three .CSV attachments appear in the D:\EmailAttachments folder specified in the "EmailAttachments" File Connection Manager and there exists a single EmailBody.TXT file that contains the full email contents of the last email processed.
Opening the receiving email account will show no emails in the inbox or deleted items, since the emails were deleted programmatically.
I hope this helps.