Introduction
Today’s data-rich environments require frequent data movement from one place to another. It is not uncommon for software engineers, data architects, and other consumers of corporate data to require testing environments that closely resemble the dataset in production. This allows accurate and high-fidelity testing that enables better code and more realistic QA while maintaining the separation of production and testing environments and adhering to security and operational requirements.
The request to mirror environments sounds simple, but the logistics required to make it happen can be complex, tedious, and time-consuming. For example:
- How easily can data be backed up daily?
- How easily can the data be cleansed of PII or other sensitive data?
- How easily can the data be moved or staged, including archive logs, for point-in-time recovery?
- Is the process automated? Or does it require significant labor?
- Do you have to write and maintain your own code to enable this process? Or can you use a simple, out-of-the-box solution?
- What happens if there is an ad hoc request for an unscheduled refresh?
Solving these challenges is why we at Tessell have built native features that enable data refreshing across environments. Furthermore, we ensured that the solution is comprehensive, easy to use, and can also be programmatically automated using the Tessell API. Let's dive into how the solution works and how to use it.
Creating the Initial Clone
Data refreshing in Tessell works on a Source → clone model. The source is a live database (typically production) that is snapshotted on a regular basis. The clone is a separate virtual machine whose data is replaced on a scheduled basis from the source. Snapshots from the source are generated daily or on demand. Additionally, archive and transaction logs are stored for a set period to allow refreshing to a specific point.
Before creating a clone, you will need to determine your refresh strategy. Do you need to refresh daily, weekly, or monthly? This will dictate the SLA you need to define on the source to ensure the snapshot retention aligns with the refresh timeline.
View the SLA assigned to the Production Source
- Browse to the overview page for the service instance
- On the top right click on Availability Machine
- Once in the Availability Machine, click on Update SLA on the top right.
If the SLA retention is not sufficient, you can select a different SLA or create a new SLA that matches your requirements in the DB Governance section of the Tessell interface.
Create the Clone
We can create the initial clone after you have validated that the SLA will retain the historical snapshots.
- In the Availability Machine for the source. Select the Clone tab.
- Click on Create Clone
- Select the snapshot to use for the initial data seed.
- Alternatively, select the PITR region if you want to seed the clone with data to an exact point in time.
- Click on Clone. This will launch the provisioning wizard, which operates the same as a typical provisioning task in Tessell. However, you will see a section to add a Refresh Schedule. Fill out the refresh schedule with the required time frame. Automated refreshes are set up to run weekly or daily.
- The pre and post-script sections allow you to run a script before the database is refreshed. For example, to extract data or settings from the target database before it is over-written. The post-script section allows you to perform actions on the database after the data has been refreshed—for example, re-import data or save settings in the pre-script step.
If required, you can load scripts in the Tessell Script Library within the interface. Scripts can be shell scripts or SQL scripts.
Here is an example of a script that manipulates users after the refresh is complete
- Submit the request to provision the clone
Database Rename and Multi-Instance Support
Tessell supports renaming the Database during the cloning operation. This will allow you to logically manage the database names to avoid confusion, for example, Dev1, Dev2, etc. You can also rename the “master” credential to ensure each database has a unique login (if required).
If you plan to place multiple clones derived from the same source database on a shared instance, you will need to ensure that each database instance has a unique name. Otherwise, Tessell will throw an error indicating that two databases with the same name cannot reside on the same virtual host.
Checking Refresh Status
After submitting the provisioning request, Tessell will build the initial clone. Once the clone is complete, you will see a few new objects that track the status and history of refreshes.
Checking the last run
Run history will appear in two places
- Last Refresh History in the Clone’s Service Instance Overview page
- The entire History in the Events section of the Service Instance
Viewing and Editing Refresh Schedule
The Schedules section of the Clone’s Service Instance screen contains any existing schedule. You can also modify or delete the schedule from here.
Manual Refresh
Frequently, refresh requests are ad hoc in nature. Tessell has provided functionality to allow a refresh to launch on demand through the Interface or by calling the API.
On-Demand Refresh - User Interface
The Clone’s Service Instance page will display a “Refresh Data” button on the top right corner. This button allows a user to trigger a refresh event manually.
- With the Clone’s Service Instance page, click on Refresh Data.
- Select the Date you wish to refresh. The calendar will only show the dates with data available based on the Source Instance’s SLA.
- You can select Snapshots or PITR in the recovery mode
- Snapshots will display available snapshots for the selected data
- PITR will invoke a clock that allows you to select the exact time to recover
- Optionally, you can choose Pre Scripts and Post Scripts from the library.
- Click Submit to initiate the refresh.
On-Demand Refresh - API
While in the Refresh Data screen, you can click on the <> Code button on the top right corner to see examples of refreshing data using the Tessell API.
The code will dynamically update based on your inputs in the form. As a best practice, fill out the form, then click <> Code to see the resulting API call.