The Update Set Merging versus Batching Debate

Have you ever run a deployment where the system seemed to hang for the longest time?

When a deployment appears to hang, the most likely culprit is a change to a table’s structure in the database.

Any change to a table’s structure requires the database to create a shadow copy of that table. The database copies the existing data from the original table into the shadow table, drops the original table, and then renames the shadow table to the original name.
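The copy–drop–rename sequence above can be sketched concretely. This is a minimal illustration in Python using SQLite; the table and column names (a `task` table with a temporary `u_temp_flag` column) are made up for the demo, and a real platform would of course preserve the full column DDL:

```python
import sqlite3

# Illustrative shadow-table rebuild: removing a column is executed as
# copy -> drop -> rename. Names here are hypothetical, for demonstration only.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE task (number TEXT, short_description TEXT, u_temp_flag TEXT)")
cur.executemany("INSERT INTO task VALUES (?, ?, ?)",
                [("TASK0001", "First task", "x"), ("TASK0002", "Second task", "y")])

# 1. Create the shadow table with the new structure (column removed).
cur.execute("CREATE TABLE task_shadow (number TEXT, short_description TEXT)")
# 2. Copy every existing row -- this step is where the heavy disk I/O happens,
#    and it grows with both the width and the depth of the table.
cur.execute("INSERT INTO task_shadow SELECT number, short_description FROM task")
# 3. Drop the original and rename the shadow into its place.
cur.execute("DROP TABLE task")
cur.execute("ALTER TABLE task_shadow RENAME TO task")
conn.commit()

cols = [row[1] for row in cur.execute("PRAGMA table_info(task)")]
print(cols)
```

Every row is rewritten even though only the schema changed, which is why a single column change on a large table can make a deployment look hung.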

That’s a lot of disk activity, especially with tables that are both wide (many columns) and deep (many rows). Two tables immediately come to mind here: task and cmdb_ci.

Take two update sets. The first adds a column to a wide, deep table; the second removes that column.

Scenario 1 – Batched update sets.
The developer creates an update set, let’s call it “US-A”. They add a column to the task table, add some other objects, and then decide to test their work, so they complete the update set and migrate it to Test. Let’s assume that deployment takes 5 minutes.

They later realize that they don’t need that column, or that it belongs in an extended table. They create Update Set B (“US-B”) and remove the column. They then deploy that to the Test environment to test it.

Hypothetical total transaction time: 5 minutes to add the column to task, and at least 5 more to remove it (in practice the removal takes longer). So we are looking at roughly 12 minutes to fully execute both update sets.

With batched update sets, this scenario will play out in all future migrations. The system will execute US-A and then US-B at a cost of 12 or more minutes.

Scenario 2 – Merged update sets.
Same as scenario 1, except the create-column action is now merged with, and overwritten by, the remove-column operation. During the migration the system will look for that column in order to remove it, realize that it doesn’t exist, and skip the operation. The total cost in this case is at most a second or so, and you also save a lot of unnecessary disk I/O. Fewer moving parts, less potential for issues.
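The savings in the merged scenario come from the remove operation checking for the column before doing any work. A rough sketch of that guard, again in Python with SQLite and hypothetical names (this is an illustration of the idea, not the platform’s actual implementation):

```python
import sqlite3

def drop_column_if_exists(conn, table, column):
    """Perform the expensive shadow-table rebuild only when the column exists."""
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column not in cols:
        return False  # nothing to do: the cheap, merged-update-set path
    keep = ", ".join(c for c in cols if c != column)
    # Demo-only rebuild; a real rebuild would recreate the exact column DDL.
    conn.execute(f"CREATE TABLE {table}_shadow AS SELECT {keep} FROM {table}")
    conn.execute(f"DROP TABLE {table}")
    conn.execute(f"ALTER TABLE {table}_shadow RENAME TO {table}")
    return True

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task (number TEXT, u_temp_flag TEXT)")

first = drop_column_if_exists(conn, "task", "u_temp_flag")   # rebuild runs
second = drop_column_if_exists(conn, "task", "u_temp_flag")  # skipped
print(first, second)
```

The second call finds nothing to remove and returns immediately, which is the one-second case described above: in a merged update set, the add never happened in Test, so the remove is a no-op there too.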

Since there is only one update set carrying that functionality, there is less overhead for the release manager and less potential to miss something.

I am of the opinion that batched update sets do have a role, but that role is limited. I could see them being used to create parent-child relationships between releases. If we set R(N+1) as a child of R(N), then for any future deployment the system will know how to deploy each sequential release in the correct order.