Database Management

Automated Production to Development Refresh for your Cloud Databases

Frank Lamprea

July 22, 2024

Introduction

Today’s data-rich environments require frequent data movement from one place to another. It is not uncommon for software engineers, data architects, and other consumers of corporate data to require testing environments that closely resemble the dataset in production. This allows accurate and high-fidelity testing that enables better code and more realistic QA while maintaining the separation of production and testing environments and adhering to security and operational requirements.

The request to mirror environments sounds simple, but the logistics required to make it happen can be complex, tedious, and time-consuming. For example:

How easily can data be backed up daily?
How easily can the data be cleansed of PII or other sensitive data?
How easily can the data be moved or staged, including archive logs, for point-in-time recovery?
Is the process automated? Or does it require significant labor?
Do you have to write and maintain your own code to enable this process? Or can you use a simple, out-of-the-box solution?
What happens if there is an ad hoc request for an unscheduled refresh?

Solving these challenges is why we at Tessell have built native features that enable data refreshing across environments. Furthermore, we ensured that the solution is comprehensive, easy to use, and can also be programmatically automated using the Tessell API. Let's dive into how the solution works and how to use it.

Creating the Initial Clone

Data refreshing in Tessell works on a Source → clone model. The source is a live database (typically production) that is snapshotted on a regular basis. The clone is a separate virtual machine whose data is replaced on a scheduled basis from the source. Snapshots from the source are generated daily or on demand. Additionally, archive and transaction logs are stored for a set period to allow refreshing to a specific point.

Before creating a clone, you will need to determine your refresh strategy. Do you need to refresh daily, weekly, or monthly? This will dictate the SLA you need to define on the source to ensure the snapshot retention aligns with the refresh timeline.

View the SLA assigned to the Production Source

Browse to the overview page for the service instance
On the top right click on Availability Machine

Once in the Availability Machine, click on Update SLA on the top right.

If the SLA retention is not sufficient, you can select a different SLA or create a new SLA that matches your requirements in the DB Governance section of the Tessell interface.

Create the Clone

We can create the initial clone after you have validated that the SLA will retain the historical snapshots.

In the Availability Machine for the source. Select the Clone tab.

Click on Create Clone
Select the snapshot to use for the initial data seed.

Alternatively, select the PITR region if you want to seed the clone with data to an exact point in time.

Click on Clone. This will launch the provisioning wizard, which operates the same as a typical provisioning task in Tessell. However, you will see a section to add a Refresh Schedule. Fill out the refresh schedule with the required time frame. Automated refreshes are set up to run weekly or daily.

The pre and post-script sections allow you to run a script before the database is refreshed. For example, to extract data or settings from the target database before it is over-written. The post-script section allows you to perform actions on the database after the data has been refreshed—for example, re-import data or save settings in the pre-script step.
If required, you can load scripts in the Tessell Script Library within the interface. Scripts can be shell scripts or SQL scripts.

Here is an example of a script that manipulates users after the refresh is complete

Copied to clipboard!

  
#!/bin/bash

# Variables
NEW_USER="NEWUSER"
NEW_USER_PASSWORD="myPassword123"
OUTPUT_FILE="output.log"

# SQL script to add a new user
# This assumes your SOURCE DB already has a user named
# CHANGEME and another named DELETEME
SQL_SCRIPT=$(cat <<EOF
CREATE USER $NEW_USER IDENTIFIED BY "$NEW_USER_PASSWORD";
GRANT CONNECT, RESOURCE TO $NEW_USER;
DROP USER DELETEME;
ALTER USER CHANGEME IDENTIFIED BY "Goodpassword123";
EXIT;
EOF
)

# Create a temporary script to be run as the oracle user
TEMP_SCRIPT="/tmp/create_user.sql"

# Write the SQL script to the temporary file
echo "$SQL_SCRIPT" > "$TEMP_SCRIPT"

# Execute the SQL script using sqlplus as sysdba and redirect output to a file
sudo -u oracle sqlplus -s / as sysdba @"$TEMP_SCRIPT" > "$OUTPUT_FILE" 2>&1

# Check if the user was added successfully by searching for "ORA-" in the output file
if grep -q "ORA-" "$OUTPUT_FILE"; then
  echo "Failed to manage users. See $OUTPUT_FILE for details."
else
  echo "Users managed successfully. See $OUTPUT_FILE for details."
fi

# Clean up the temporary script file
rm -f "$TEMP_SCRIPT"

Submit the request to provision the clone

Database Rename and Multi-Instance Support

Tessell supports renaming the Database during the cloning operation. This will allow you to logically manage the database names to avoid confusion, for example, Dev1, Dev2, etc. You can also rename the “master” credential to ensure each database has a unique login (if required).

If you plan to place multiple clones derived from the same source database on a shared instance, you will need to ensure that each database instance has a unique name. Otherwise, Tessell will throw an error indicating that two databases with the same name cannot reside on the same virtual host.

Checking Refresh Status

After submitting the provisioning request, Tessell will build the initial clone. Once the clone is complete, you will see a few new objects that track the status and history of refreshes.

Checking the last run

Run history will appear in two places

Last Refresh History in the Clone’s Service Instance Overview page

The entire History in the Events section of the Service Instance

Viewing and Editing Refresh Schedule

The Schedules section of the Clone’s Service Instance screen contains any existing schedule. You can also modify or delete the schedule from here.

Manual Refresh

Frequently, refresh requests are ad hoc in nature. Tessell has provided functionality to allow a refresh to launch on demand through the Interface or by calling the API.

On-Demand Refresh - User Interface

The Clone’s Service Instance page will display a “Refresh Data” button on the top right corner. This button allows a user to trigger a refresh event manually.

With the Clone’s Service Instance page, click on Refresh Data.

Select the Date you wish to refresh. The calendar will only show the dates with data available based on the Source Instance’s SLA.

You can select Snapshots or PITR in the recovery mode
- Snapshots will display available snapshots for the selected data
- PITR will invoke a clock that allows you to select the exact time to recover

Optionally, you can choose Pre Scripts and Post Scripts from the library.
Click Submit to initiate the refresh.

On-Demand Refresh - API

While in the Refresh Data screen, you can click on the <> Code button on the top right corner to see examples of refreshing data using the Tessell API.

The code will dynamically update based on your inputs in the form. As a best practice, fill out the form, then click <> Code to see the resulting API call.

These might also interest you