Hi peeps! Today we are going to look at a concept I call destructive selects.
Imagine this scenario: you have a table that holds units of work, perhaps the source for a data driven subscription or some sort of ETL processing. Obviously you only want to action each unit of work once (i.e. send the report once or process the row once), so you need to remove the record from the unit of work table or it’ll keep getting actioned.
Now you could select from the table and then go back and delete the relevant row(s). However, that’s two separate operations, which can introduce issues such as locking and blocking, especially if you’re doing this to support multithreading in an ETL process, as in my example below. Not to mention you’ve got two statements to generate a plan for, assign resources to and execute.
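To make that two-step clunkiness concrete, here’s a sketch of what the select-then-delete approach tends to look like once you make it safe under concurrency (the table and column names are my own invention, purely for illustration):

```sql
-- Two separate operations: you need an explicit transaction, locking hints,
-- and a table variable to remember which rows you picked, or another thread
-- can grab (or you can delete) different rows between the two statements.
BEGIN TRANSACTION;

DECLARE @batch TABLE (Id INT PRIMARY KEY);

INSERT INTO @batch (Id)
SELECT TOP (100) Id
FROM dbo.UnitOfWorkTable WITH (UPDLOCK, ROWLOCK);

SELECT u.Col1, u.Col2, u.Col3
FROM dbo.UnitOfWorkTable AS u
JOIN @batch AS b ON b.Id = u.Id;

DELETE u
FROM dbo.UnitOfWorkTable AS u
JOIN @batch AS b ON b.Id = u.Id;

COMMIT TRANSACTION;
```

All of that bookkeeping is exactly what the single-statement approach below does away with.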
The destructive selects method lets you do all of this in one operation. So let’s see my example below:
Now I’ve seen things similar to the above done to take advantage of multithreading / multiple cores / concurrency in SSIS. The idea is that you have all four sources feeding from a single table but doing the work in parallel. In this example you only want each row once, or you’ll end up with duplicates in the final destination, and doing a separate select and then a delete would be a nightmare. So what we do is put the SQL below in each data source:
DELETE TOP (100) FROM dbo.UnitOfWorkTable OUTPUT DELETED.Col1, DELETED.Col2, DELETED.Col3
This is a destructive select: in a single atomic statement you have removed the data from the original “UnitOfWork” table and “selected” it out to the data source.
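If you want to play with the pattern yourself, here’s a minimal self-contained sketch. The table definition and sample data are my own, just for illustration:

```sql
-- Illustrative queue table; the real one would hold your units of work.
CREATE TABLE dbo.UnitOfWorkTable
(
    Col1 INT IDENTITY(1,1) PRIMARY KEY,
    Col2 VARCHAR(50) NOT NULL,
    Col3 DATETIME NOT NULL DEFAULT GETDATE()
);

INSERT INTO dbo.UnitOfWorkTable (Col2)
VALUES ('report A'), ('report B'), ('report C');

-- The destructive select: deletes up to 100 rows and returns them
-- as a result set in the same atomic statement.
DELETE TOP (100)
FROM dbo.UnitOfWorkTable
OUTPUT DELETED.Col1, DELETED.Col2, DELETED.Col3;

-- Run it again and you get (and remove) nothing: each row is seen once.
```

One refinement worth knowing about for the parallel SSIS scenario (not something the pattern strictly needs) is adding WITH (ROWLOCK, READPAST) to the table, so that concurrent sources skip over rows another source has locked rather than blocking on them.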
That’s all there is to it really. As I said at the beginning of the post, it’s a neat little tip for doing things such as the above example, but also stuff like ad hoc deliveries using data driven subscriptions, ensuring that each row is only ever processed once.
The only caveat I’d add here is that you need to make sure you have a system in place to account for errors so that no rows “go missing”. For my example, I personally would pipe any rows that generate errors into a second output file or table and continue processing the rest.
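As a hedged sketch of one way to guard against losing rows (the staging table and names here are assumptions of mine, not from the original post): do the destructive select with OUTPUT ... INTO inside a transaction, so a failure rolls the rows back into the queue instead of dropping them on the floor.

```sql
BEGIN TRY
    BEGIN TRANSACTION;

    -- Destructively select into a holding table rather than straight
    -- out to the client, so the rows are captured somewhere durable.
    DELETE TOP (100)
    FROM dbo.UnitOfWorkTable
    OUTPUT DELETED.Col1, DELETED.Col2, DELETED.Col3
    INTO dbo.UnitOfWorkStaging (Col1, Col2, Col3);

    -- ... process the rows in dbo.UnitOfWorkStaging here ...

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    -- Rolling back puts the rows back in the queue instead of losing them.
    IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;

    -- Log or re-route the failing batch, then rethrow (THROW needs 2012+).
    THROW;
END CATCH;
```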
Well, thanks for reading! Please post in the comments section below if you have any questions, or drop me a message on Twitter at @TSQLNinja.