Troubleshoot issues with SQL Data Sync
When you horizontally partition data across multiple SQL Azure databases or use Data Sync Server for SQL Azure, there might come a time when you need to write to a member database without causing primary key merge conflicts. In this case you need to be able to generate a primary key that is unique across all databases. This article discusses different techniques for generating primary keys, along with their advantages and disadvantages.
One option is a GUID primary key, which is guaranteed to be unique across all databases. Another option is to use a bigint data type in place of an int. In this technique, the primary key is still generated by an identity column; however, the identity in each database starts at a different offset.

The different offsets create the non-conflicting primary keys. The first question most people ask is whether the bigint data type is big enough to represent all the primary keys needed. The bigint data type can be as large as 9,223,372,036,854,775,807, because it is stored in 8 bytes. This is 4,294,967,298 times bigger than the maximum size of an int data type: 2,147,483,647. This means that you could potentially have 4 billion SQL Azure databases horizontally partitioned with tables of around 2 billion rows.
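The arithmetic behind the offset technique can be checked with a short sketch. The names `BLOCK_SIZE`, `identity_seed`, and `max_databases` are illustrative assumptions rather than anything SQL Azure provides, and the sketch assumes each database reserves one int-sized block of the bigint range.

```python
# Upper limits of the signed SQL Server integer types.
INT_MAX = 2**31 - 1        # 2,147,483,647
BIGINT_MAX = 2**63 - 1     # 9,223,372,036,854,775,807

# Assumption for illustration: every database owns a block of 2**31 keys.
BLOCK_SIZE = 2**31


def identity_seed(database_index: int) -> int:
    """Return the first identity value for a zero-based database index."""
    last_value = (database_index + 1) * BLOCK_SIZE
    if database_index < 0 or last_value > BIGINT_MAX:
        raise ValueError("database index does not fit in the bigint range")
    return database_index * BLOCK_SIZE + 1


def max_databases() -> int:
    """Return how many full blocks fit below the bigint maximum."""
    return BIGINT_MAX // BLOCK_SIZE
```

Under these assumptions, database 0 would seed its identity column at 1 and database 1 at 2,147,483,649, so the two ranges never overlap.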
In this technique, a single identity database is built that stores all the primary keys but none of the data.

This identity database just has a set of matching tables, each containing a single column of integers (int data type) defined as an auto-incrementing identity. When an insert is needed on any of the tables across the whole partition, the data tier code inserts into the identity database and fetches the generated identity value.
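The flow of the identity database technique can be sketched as follows. This is a minimal illustration that uses in-memory SQLite databases as stand-ins for the SQL Azure identity and member databases; the table names `customer_ids` and `customers` and the helper functions are hypothetical.

```python
import sqlite3


def create_identity_db() -> sqlite3.Connection:
    """Stand-in for the identity database: keys only, no data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customer_ids (id INTEGER PRIMARY KEY AUTOINCREMENT)")
    return db


def create_member_db() -> sqlite3.Connection:
    """Stand-in for a member (partition) database that holds the data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return db


def insert_customer(identity_db, member_db, name: str) -> int:
    """Fetch a key from the identity database, then insert into the member."""
    cursor = identity_db.execute("INSERT INTO customer_ids DEFAULT VALUES")
    identity_db.commit()
    new_id = cursor.lastrowid  # the value generated by the identity table
    member_db.execute(
        "INSERT INTO customers (id, name) VALUES (?, ?)", (new_id, name)
    )
    member_db.commit()
    return new_id
```

Every insert costs one extra round trip to the identity database, which is the price this design pays for central key generation.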
This primary key from the identity database is then used as the primary key for the insert into the member database or the partition. Because the identity database generates the keys, there is never a conflict. If all the tables in the primary key database were single-column integer tables, you could have 25, tables with two million rows (table size of 2 Megabytes) in a 50 Gigabyte SQL Azure database. Or some combination of that, like 12, tables of 4 million rows, or 6, tables of 8 million rows.

Another technique is to use two columns to represent the primary key. The first column is an integer that specifies the partition or the member database. With multiple member or partition databases, the second column on its own would have conflicts; together, however, the two columns create a unique primary key.
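A two-column key can be illustrated with a small sketch, again using in-memory SQLite as a stand-in. The `orders` table, its column names, and the helper functions are assumptions made for the example; the second column is generated per database, so it repeats across members while the pair stays unique.

```python
import sqlite3


def new_orders_db() -> sqlite3.Connection:
    """Create a database whose primary key is (partition_id, local_id)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE orders (
               partition_id INTEGER NOT NULL,
               local_id     INTEGER NOT NULL,
               item         TEXT    NOT NULL,
               PRIMARY KEY (partition_id, local_id)
           )"""
    )
    return db


def insert_order(db, partition_id: int, item: str) -> tuple:
    """Insert a row using the next local id for this partition."""
    row = db.execute(
        "SELECT COALESCE(MAX(local_id), 0) + 1 FROM orders WHERE partition_id = ?",
        (partition_id,),
    ).fetchone()
    local_id = row[0]
    db.execute("INSERT INTO orders VALUES (?, ?, ?)", (partition_id, local_id, item))
    return (partition_id, local_id)


def merge(target, source) -> None:
    """Copy all rows from source into target; duplicate keys would raise."""
    rows = source.execute("SELECT partition_id, local_id, item FROM orders").fetchall()
    target.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
```

Both members produce a local id of 1 for their first row, yet merging them into one table succeeds because the partition column differs.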
Set up synchronization.

In the browser, navigate to the Azure portal and locate the SQL databases tab.

Click the Sync to other databases command. The Data Sync page opens, and this is where the configuration of data synchronization starts.

As this page shows, there is currently no sync group and no sync agent. The sync agent needs to be installed on the on-premises database server. More about the sync agent is discussed later in this article. To start the configuration, click the New Sync Group button. The New sync group page opens.

In the Sync Group Name box, enter the name for the new sync group.

Under the Sync Metadata Database group, choose the New database or Using existing database radio button. For this example, the New database radio button is used to create a new database that holds the sync metadata and logs. In the Name box, enter a name for the sync metadata database (Sync Database in this example), configure the other options on this page, and press the OK button. Under the Automatic Sync section, choose whether data synchronization runs automatically (press the On button) or manually (press the Off button).

If the On button is pressed, the Sync Frequency section appears, where you set how frequently data synchronization occurs. Under Conflict Resolution, one of two options can be chosen to apply when a conflict occurs.
The first option is Hub win. If this option is chosen and conflicts occur, the data in the hub database overwrites the conflicting data in the member database.

The second option is Member win. In a conflict situation, the data in the member database overwrites the data in the hub database. For this example, the Hub win option is chosen in the Conflict Resolution drop-down box.
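The effect of the two conflict resolution options can be modeled in a few lines. This is only a conceptual sketch: `ConflictPolicy` and `resolve_conflict` are hypothetical names, rows are represented as plain dictionaries, and nothing here reflects how Data Sync implements the policy internally.

```python
from enum import Enum


class ConflictPolicy(Enum):
    HUB_WIN = "hub_win"
    MEMBER_WIN = "member_win"


def resolve_conflict(hub_row: dict, member_row: dict, policy: ConflictPolicy) -> dict:
    """Return the row that both databases hold once the conflict is resolved."""
    if policy is ConflictPolicy.HUB_WIN:
        # The hub version overwrites the conflicting member version.
        return dict(hub_row)
    # Otherwise the member version overwrites the hub version.
    return dict(member_row)
```

With Hub win, a row edited on both sides ends up with the hub's values everywhere; with Member win, the member's values are kept.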
After the sync group is created, the Add sync members section is enabled for configuration. If the local sync agent is already installed and configured on the machine, on the Select Sync Agent tab, pick the Existing agents radio button and choose the agent from the combo box. If the sync agent is not set up, choose the Create a new agent radio button.

Three additional steps appear under the Select Sync Agent tab. Press the Next button to continue. On the License Agreement and Privacy Information window, read the agreement and, if you agree, select the I Agree radio button and press the Next button.

On this page, enter the Windows credentials and press the Next button. This page indicates that the installation package is ready to be installed; press the Next button to install the sync agent. Now, go back to the Azure portal, enter the name for the sync agent in the Agent Name box, and press the Create and Generate Key button. After a while, the agent key is generated in the Generate an agent key box.

The Sync Metadata Database Configuration window appears. In the Agent Key field, paste the copied agent key, and in the Login and Password fields, enter the existing credentials for the Azure SQL Database server where the Hub database is located.
To test if everything is OK, press the Test Connection button.

For sync setup. For ongoing sync. For deprovisioning. Azure SQL Database supports only a single set of credentials. To accomplish these tasks within this constraint, consider the following options:

When you create a new SQL Database instance, set the maximum size so that it's always larger than the database you deploy. If you don't set the maximum size to larger than the deployed database, sync fails. Ensure that you stay within the SQL Database instance size limits.

SQL Data Sync stores additional metadata with each database. Ensure that you account for this metadata when you calculate space needed. The amount of added overhead is related to the width of the tables (for example, narrow tables require more overhead) and the amount of traffic.

You don't have to include all the tables that are in a database in a sync group. The tables that you include in a sync group affect efficiency and costs. Include tables, and the tables they are dependent on, in a sync group only if business needs require it. Each table in a sync group must have a primary key.

Empty tables provide the best performance at initialization time. If the target table is empty, Data Sync uses bulk insert to load the data. Otherwise, Data Sync does a row-by-row comparison and insertion to check for conflicts. If performance is not a concern, however, you can set up sync between tables that already contain data.

To minimize latency, keep the hub database close to the greatest concentration of the sync group's database traffic. Apply the preceding guidelines to complex sync group configurations, such as those that are a mix of enterprise-to-cloud and cloud-to-cloud scenarios.

In this section, we discuss the initial sync of a sync group.
Learn how to help prevent an initial sync from taking longer and being more costly than necessary. When you create a sync group, start with data in only one database. If you have data in multiple databases, SQL Data Sync treats each row as a conflict that needs to be resolved.
This conflict resolution causes the initial sync to go slowly.
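The difference between the two initial sync paths can be pictured with a simplified model. It is an illustration of the idea only, not Data Sync's actual algorithm: tables are dictionaries keyed by primary key, `initial_sync` is a hypothetical helper, and the source side is assumed to win every conflict.

```python
def initial_sync(source_rows: dict, target_rows: dict) -> dict:
    """Load source rows into the target and report how the work was done."""
    if not target_rows:
        # Empty target: everything can be loaded in one bulk operation.
        target_rows.update(source_rows)
        return {"mode": "bulk_insert", "conflicts": 0}

    # Non-empty target: each row has to be compared individually.
    conflicts = 0
    for key, row in source_rows.items():
        if key in target_rows:
            conflicts += 1  # the same key exists on both sides
        target_rows[key] = row  # assumption: the source side wins
    return {"mode": "row_by_row", "conflicts": conflicts}
```

In this model the per-row path does work proportional to the number of rows even when no conflict exists, which is why starting with data in only one database keeps the first sync cheap.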
If you have data in multiple databases, initial sync might take between several days and several months, depending on the database size.

If there is a resolution for an issue, it's provided here.

Sync fails in the portal UI for on-premises databases that are associated with the client agent.
My sync group is stuck in the processing state.
I see inconsistent primary key data after a successful sync.
I see a significant degradation in performance.
Column does not allow nulls.
How does Data Sync handle circular references? That is, when the same data is synced in multiple sync groups, and keeps changing as a result?

On the local computer that's running the agent, you see System.IOException errors in the Event Log. The errors say that the disk has insufficient space.

A sync group that is stuck in the processing state doesn't respond to the stop command, and the logs show no new entries. Any of the following conditions might result in a sync group being stuck in the processing state:
If the preceding information doesn't move your sync group out of the processing state, Microsoft Support can reset the status of your sync group. In the post, include your subscription ID and the sync group ID for the group that needs to be reset. A Microsoft Support engineer will respond to your post, and will let you know when the status has been reset.

If tables that have the same name but which are from different database schemas are included in a sync, you see erroneous data in the tables after the sync.
The SQL Data Sync provisioning process uses the same tracking tables for tables that have the same name but which are in different schemas. Because of this, changes from both tables are reflected in the same tracking table. This causes erroneous data changes during sync.
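A pre-flight check for this situation could look like the sketch below. It assumes you already have the list of (schema, table) pairs you intend to sync; `find_name_collisions` is a hypothetical helper, and comparing names case-insensitively is an assumption that depends on your database collation.

```python
from collections import defaultdict


def find_name_collisions(tables) -> dict:
    """Map each table name used in more than one schema to its schemas.

    `tables` is an iterable of (schema, table) pairs.
    """
    schemas_by_name = defaultdict(set)
    for schema, table in tables:
        # Assumption: names are compared case-insensitively.
        schemas_by_name[table.lower()].add(schema)
    return {
        name: sorted(schemas)
        for name, schemas in schemas_by_name.items()
        if len(schemas) > 1
    }
```

An empty result means no two schemas in the planned sync share a table name.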
Ensure that the names of tables that are involved in a sync are different, even if the tables belong to different schemas in a database.

A sync is reported as successful, and the log shows no failed or skipped rows, but you observe that primary key data is inconsistent among the databases in the sync group.
This result is by design. Changes in any primary key column result in inconsistent data in the rows where the primary key was changed. To prevent this issue, ensure that no data in a primary key column is changed. To fix this issue after it has occurred, delete the row that has inconsistent data from all endpoints in the sync group. Then, reinsert the row.

Your performance degrades significantly, possibly to the point where you can't even open the Data Sync UI.
The most likely cause is a sync loop. A sync loop occurs when a sync by sync group A triggers a sync by sync group B, which then triggers a sync by sync group A. The actual situation might be more complex, and it might involve more than two sync groups in the loop.

Here are the main use cases for Data Sync:

Data Sync is based around the concept of a Sync Group. A Sync Group is a group of databases that you want to synchronize.
Data Sync uses a hub and spoke topology to synchronize data. You define one of the databases in the sync group as the Hub Database. The rest of the databases are member databases. Sync occurs only between the Hub and individual members. If you're using an on-premises database as a member database, you have to install and configure a local sync agent. Since Data Sync is trigger-based, transactional consistency isn't guaranteed.
Microsoft guarantees that all changes are made eventually and that Data Sync doesn't cause data loss. Data Sync uses insert, update, and delete triggers to track changes. It creates side tables in the user database for change tracking. These change tracking activities have an impact on your database workload.
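Trigger-based change tracking with a side table can be demonstrated on a small scale. The sketch below uses SQLite rather than SQL Database, and the `products` and `products_tracking` tables and their triggers are invented for illustration; the objects that Data Sync actually creates are not shown in this article.

```python
import sqlite3


def create_tracked_db() -> sqlite3.Connection:
    """Create a user table plus a side table filled by three triggers."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE products_tracking (
            change_id  INTEGER PRIMARY KEY AUTOINCREMENT,
            product_id INTEGER NOT NULL,
            operation  TEXT    NOT NULL
        );
        CREATE TRIGGER products_insert AFTER INSERT ON products
        BEGIN
            INSERT INTO products_tracking (product_id, operation) VALUES (NEW.id, 'I');
        END;
        CREATE TRIGGER products_update AFTER UPDATE ON products
        BEGIN
            INSERT INTO products_tracking (product_id, operation) VALUES (NEW.id, 'U');
        END;
        CREATE TRIGGER products_delete AFTER DELETE ON products
        BEGIN
            INSERT INTO products_tracking (product_id, operation) VALUES (OLD.id, 'D');
        END;
        """
    )
    return db


def pending_changes(db) -> list:
    """Return the tracked changes in the order they happened."""
    return db.execute(
        "SELECT product_id, operation FROM products_tracking ORDER BY change_id"
    ).fetchall()
```

Every write to the user table causes a second write to the side table, which is the kind of extra workload the preceding paragraph refers to.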
Assess your service tier and upgrade if needed. Provisioning and deprovisioning during sync group creation, update, and deletion may also impact the database performance.

There may be up to 30 endpoints in a single sync group if there is only one sync group. If there is more than one sync group, the total number of endpoints across all sync groups is also limited. If a database belongs to multiple sync groups, it is counted as multiple endpoints, not one.
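The counting rule for endpoints can be made concrete with a short sketch. `count_endpoints` is a hypothetical helper, and the mapping of sync group names to database names is assumed input that you would assemble yourself.

```python
def count_endpoints(sync_groups: dict) -> int:
    """Count endpoints across sync groups.

    `sync_groups` maps a group name to the databases (hub and members) in it.
    A database that appears in several groups is counted once per group.
    """
    return sum(len(set(databases)) for databases in sync_groups.values())
```

For example, a hub that serves two sync groups contributes two endpoints to the total, so four distinct databases can add up to five endpoints.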
However, you still accrue data transfer charges for data movement in and out of your SQL Database instance. For more info, see SQL Database pricing.

Not directly. You can sync between SQL Server on-premises databases indirectly, however, by creating a Hub database in Azure, and then adding the on-premises databases to the sync group.
You can sync between SQL Databases that belong to resource groups owned by different subscriptions. You can also sync between SQL Databases that belong to different clouds, but you have to use PowerShell to add the sync members that belong to the different subscriptions.
Create the schema manually in the new database by scripting it from the original. After you create the schema, add the tables to a sync group to copy the data and keep it synced.
You can't back up and restore to a specific point in time because SQL Data Sync synchronizations are not versioned. Furthermore, SQL Data Sync does not back up other SQL objects, such as stored procedures, and doesn't do the equivalent of a restore operation quickly. Do you have to update the schema of a database in a sync group?
Schema changes aren't automatically replicated. For some solutions, see the following articles:

To monitor activity and troubleshoot issues, see the following articles: