Generate synthetic data with Neosync
Learn how to generate synthetic data in your Neon database with Neosync
Neosync is an open-source synthetic data orchestration platform that can create synthetic data and sync it across all of your Neon database environments.
In this guide, we'll show you how to seed a Neon database with synthetic data for testing and rapid development using Neosync.
Prerequisites
To complete the steps in this guide, you need a Neon account with a project and access to Neosync.
Neon setup
In Neon, we'll create a database for the synthetic data, define a table, and retrieve the database connection string.
Create a database
To create a database, which we'll call neosync, perform the following steps:
- Navigate to the Neon Console.
- Select your project.
- Select Databases from the sidebar.
- Select the branch where you want to create the database.
- Click New Database.
- Enter a database name (neosync), and select a Postgres role to be the database owner.
- Click Create.
Create a table
Next, we'll create the table for your data.
- In the Neon Console, select the SQL Editor from the sidebar.
- Select the correct branch and the neosync database you just created.
- Run the following commands to create your schema:
note
Installing the Postgres UUID extension to auto-generate UUIDs for the id column is optional. If you prefer, you can let Neosync generate the UUID column values for you.
Copy the connection string for your database
Navigate to the Dashboard in Neon and copy the connection string for the destination database from the Connection Details widget.
note
Make sure you select the correct database (neosync) from the Database drop-down menu.
Your connection string should look something like this:
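As a quick sanity check on the string you copied, the following Python sketch splits a Postgres connection URL into its parts and confirms that it targets the neosync database. It assumes the usual `postgresql://role:password@host/database` URL shape. The role, password, and host in `EXAMPLE_URL` are invented placeholders, and the helper names `describe_connection` and `targets_database` are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit

# Placeholder connection string: the role, password, and host are invented
# for illustration and do not point at a real Neon endpoint.
EXAMPLE_URL = (
    "postgresql://alex:example-password"
    "@ep-example-123456.us-east-2.aws.neon.tech/neosync?sslmode=require"
)


def describe_connection(url: str) -> dict:
    """Split a Postgres connection URL into its main components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL relies on the default port
        "database": parts.path.lstrip("/"),
        # Keep the first value of each query parameter, e.g. sslmode
        "options": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


def targets_database(url: str, expected: str = "neosync") -> bool:
    """Return True when the URL points at the expected database."""
    return describe_connection(url)["database"] == expected


if __name__ == "__main__":
    info = describe_connection(EXAMPLE_URL)
    print(info["host"], info["database"])
    print(targets_database(EXAMPLE_URL))
```

Running a check like this before pasting the string into another tool is an easy way to catch a string copied for the wrong database.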
Neosync setup
In Neosync, we'll configure a connection to your Neon database and create a job that populates the database with synthetic data.
Configure a connection to the Neon database
- Navigate to Neosync and log in. Go to Connections > New Connection, then click Neon.
- Enter a unique name for the connection in the Connection Name field. We'll name the connection neon-neosync.
- Paste the Neon database connection string in the Connection URL field and click Test Connection to verify that the connection works.
- Click Submit to save the connection configuration.
Generate synthetic data
To generate data, you need to create a Job in Neosync:
- Click Jobs, then click New Job. You are presented with a few job types. Since you are seeding a table from scratch, select the Data Generation option and click Next.
- Give the job a name and set Initiate Job Run to Yes. We'll call it generate-user-data. You can leave the schedule and advanced options alone. Click Next to move on to the Connect page.
- On the Connect page, select the connection you configured previously (neon-neosync) from the dropdown and click Next.
  note
  There are a few different options on the Connect page, such as Truncate Before Insert and Truncate Cascade, but we don't need these right now, so you can ignore them.
- On the Schema page:
  - Specify a value for Number of Rows. We'll create 1000 rows of data to use in this example.
  - Under Table Selection, select the schema and table (public.users) where you want to generate synthetic data and move it from the source to the destination table.
  - For each column in your table, select a Transformer to define the type of data you want to generate for the column. For the age column, we used the Generate Random Int64 transformer to randomly generate ages between 18 and 40. You can configure the generator by clicking the edit icon next to the transformer and setting min and max values.
  - After the transformers are configured, select the checkboxes for all of the transformers and click Submit to create the Job that we defined previously. On the Job page, you can see that the job ran successfully, creating 1000 rows of synthetic data to work with in just a few seconds.
- Verify that the data was created in Neon by navigating to the Neon Console and selecting Tables from the sidebar. Your data should be visible in the public.users table.
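To make the Schema page settings concrete, here is a minimal Python sketch that mimics what a bounded random integer generator produces for 1000 rows, then checks the result the way you could check rows fetched from public.users with a Postgres driver of your choice. This is not how Neosync implements its transformers. The sketch assumes only the id and age columns mentioned in this guide, treats the 18 and 40 bounds as inclusive (an assumption), and uses the hypothetical helper names `fake_users` and `check_users`.

```python
import random
import uuid

ROW_COUNT = 1000            # Number of Rows from the Schema page
AGE_MIN, AGE_MAX = 18, 40   # min and max on the age transformer (assumed inclusive)


def fake_users(n, seed=None):
    """Stand-in for the generated table: a list of (id, age) tuples."""
    rng = random.Random(seed)  # the seed makes the ages reproducible
    return [(str(uuid.uuid4()), rng.randint(AGE_MIN, AGE_MAX)) for _ in range(n)]


def check_users(rows, expected_count=ROW_COUNT):
    """Return a list of problems found in (id, age) rows; empty means OK."""
    problems = []
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, got {len(rows)}")
    seen = set()
    for row_id, age in rows:
        try:
            uuid.UUID(str(row_id))  # raises ValueError for malformed UUIDs
        except ValueError:
            problems.append(f"invalid UUID: {row_id!r}")
        if row_id in seen:
            problems.append(f"duplicate id: {row_id!r}")
        seen.add(row_id)
        if not AGE_MIN <= age <= AGE_MAX:
            problems.append(f"age out of range: {age}")
    return problems


if __name__ == "__main__":
    rows = fake_users(ROW_COUNT, seed=1)
    print(check_users(rows) or "all checks passed")
```

To check real data, replace the call to `fake_users` with the (id, age) rows returned by a query against public.users and pass them to `check_users`.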
Conclusion
In this guide, we stepped through how to seed your Neon database using Neosync. This was a minimal example, but you can follow the same steps to generate tens of thousands or more rows of data. The ability to easily generate synthetic data is particularly helpful if you're working on a new application and don't have data yet or want to augment your existing database with more data for performance testing.
Neosync is also able to handle referential integrity in case you need to generate data for tables linked by referential integrity constraints.