FIX SCCM SQL Based Database Replication Failure Between CAS and Primary | ConfigMgr

A quick post on fixing SCCM SQL-based database replication. SQL-based replication issues are common in a Configuration Manager hierarchy with a CAS, primary sites, or secondary sites. Let’s look at the troubleshooting steps an SCCM admin can perform.

Introduction

Replication had failed for multiple replication groups. We ran Replication Link Analyzer, which suggested a re-initialization (reinit) to fix the issue. We proceeded with the suggested step, and it started the reinit for the replication group “Configuration Data”.

The “Configuration Data” re-initialization then stayed stuck for more than 4-5 hours, with status 2 on the primary and status 4 on the CAS, and it was not moving.

We tried restarting SMS_EXECUTIVE on the CAS and primary sites, but it did not help. We also tried a failover of the primary site SQL Server.

Umar has a detailed blog about SQL Based Replication and my recommendation is to read his blog to understand SQL based replication technologies in detail.

I would also recommend reading Sudheesh’s blog about manual SQL-based replication re-initialization using .PUB files. More details – https://blogs.technet.microsoft.com/sudheesn.

Microsoft Documentation-ConfigMgr DRS Troubleshooting https://support.microsoft

Verify SCCM Replication issue Groups from SCCM SQL DB Queries

If DRS data replication is failing from the primary to the CAS or vice versa, follow the steps below to fix the issue.

Run “spDiagDRS” and check whether any messages are pending in the outgoing message queue, and see which DRS replication groups have failed for global and site data.

Below is the query to track the re-initialization status of ‘Configuration Data’ on the CAS and PR1 site CM databases.

SELECT * FROM RCM_DrsInitializationTracking WHERE ReplicationGroup = 'Configuration Data' and SiteRequesting = 'PR1' order by CreatedTime desc

Both sites were trying to finish the re-initialization of the replication group ‘Configuration Data’, but each with a different request tracking GUID:

On the primary site (PR1): the request tracking GUID ending with E4F had status 2, and the same ID was aborted on the CAS.

On the central site (CAS): the request tracking GUID ending with 73C had status 4, and the same ID was aborted on the PR1 primary site.

Here, each site was sending its own request, and the other site was not accepting it because that request was already aborted there (request tracking status 7).
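The mismatch described above can be checked mechanically once the tracking rows from both sites are exported. The following is a minimal Python sketch, not part of the original troubleshooting: the function name, the tuple layout, and the shortened GUID placeholders are assumptions for illustration. It flags every request tracking GUID that is still in progress on one site but aborted (status 7) on the other.

```python
# Status codes as listed in this post: 6 is good, 7 is a previously aborted attempt.
FINISHED = 6
ABORTED = 7


def find_conflicting_requests(site_a_rows, site_b_rows):
    """Return GUIDs that are in progress on one site but aborted on the other.

    Each argument is a list of (request_tracking_guid, initialization_status)
    tuples, e.g. exported from RCM_DrsInitializationTracking on each site.
    """
    status_a = dict(site_a_rows)
    status_b = dict(site_b_rows)
    conflicts = []
    # Only GUIDs known to both sites can be compared.
    for guid in sorted(set(status_a) & set(status_b)):
        a, b = status_a[guid], status_b[guid]
        in_progress_a = a not in (FINISHED, ABORTED)
        in_progress_b = b not in (FINISHED, ABORTED)
        if (in_progress_a and b == ABORTED) or (in_progress_b and a == ABORTED):
            conflicts.append(guid)
    return conflicts


# Hypothetical, shortened GUID placeholders mirroring the situation above.
primary_rows = [("...E4F", 2), ("...73C", 7)]
cas_rows = [("...E4F", 7), ("...73C", 4)]
print(find_conflicting_requests(primary_rows, cas_rows))
```

With the sample rows, both GUIDs are reported, which matches the deadlock seen here: each site is waiting on a request the other side has already aborted.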

NOTE: It’s not recommended to update or modify the SQL database directly. If you see any errors in SQL, my recommendation is to raise a ticket with Microsoft and get it fixed. Otherwise, back up the SQL DB before running the following modification queries.

Solution 1 – FIX SCCM SQL Based Database Replication

We aborted the stuck request tracking GUID in the CM database on both the central site (CAS) and the primary site (PR1) using the query below. (Replace the placeholder with the request tracking GUID that is stuck on the site where you run it.)

UPDATE RCM_DrsInitializationTracking SET InitializationStatus = 7 WHERE RequestTrackingGUID = '<<Request Tracking GUID>>'
  • After we ran the above query on both sites, a new reinit process for the replication group “Configuration Data” started automatically, the reinit completed, and the link became active.
  • If initialization does not start automatically, you can use the SQL command below to start the reinit manually.
  • To initialize the replication group, run the SQL query below on the primary site or CAS site CM database.
EXEC spDrsSendSubscriptionInvalid '<ReceivingSiteCode/Subscriber>',  '<SendingSiteCode/Publisher>', '<ReplicationGroupName>'

Example:

Replication has failed for “Configuration Data” from PRI to CAS:

EXEC spDrsSendSubscriptionInvalid 'PRI', 'CAS', 'Configuration Data'

Replication has failed for “Configuration Data” from CAS to PRI:

EXEC spDrsSendSubscriptionInvalid 'CAS', 'PRI', 'Configuration Data'
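Because the two site codes are easy to swap, it can help to generate the statement instead of typing it. The sketch below is only an illustration in Python: the helper name is hypothetical, and it simply follows the argument order of the template above (receiving site, sending site, replication group). It does not connect to SQL Server; it only builds the text to paste into a query window.

```python
def build_reinit_statement(subscriber, publisher, replication_group):
    """Build the EXEC statement for spDrsSendSubscriptionInvalid.

    Argument order follows the template in this post:
    receiving site (subscriber), sending site (publisher), replication group.
    """
    for name, code in (("subscriber", subscriber), ("publisher", publisher)):
        # ConfigMgr site codes are three alphanumeric characters.
        if len(code) != 3 or not code.isalnum():
            raise ValueError(f"{name} site code looks wrong: {code!r}")
    if subscriber == publisher:
        raise ValueError("subscriber and publisher must be different sites")
    # Double any single quotes so the group name stays a valid T-SQL literal.
    group = replication_group.replace("'", "''")
    return (
        f"EXEC spDrsSendSubscriptionInvalid '{subscriber}', "
        f"'{publisher}', '{group}'"
    )


print(build_reinit_statement("PRI", "CAS", "Configuration Data"))
```

The validation is deliberately small: it rejects malformed site codes and identical sites, which are the two mistakes most likely to produce a statement that runs but does nothing useful.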

Query to check the initialization status:

SELECT InitializationPercent, InitializationStatus, TryCount, * FROM RCM_DrsInitializationTracking WHERE InitializationStatus NOT IN (6, 7) ORDER BY CreatedTime DESC;

Initialization Status

  • 1 is Making Request
  • 2 is Sent BCP file
  • 3 is Acknowledgement from CAS Server
  • 4 is BCP Finished
  • 5 is CAS prepares the CAB file; after the CAB is copied to the primary, the state changes to 6
  • 6 is Good
  • 7 is Previous attempt aborted
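When reading exported query results, a small lookup keeps these codes readable. This is a minimal Python sketch based only on the status list above; the dictionary and function names are illustrative assumptions, not something ConfigMgr provides.

```python
# Descriptions taken from the status list in this post.
INITIALIZATION_STATUS = {
    1: "Making request",
    2: "Sent BCP file",
    3: "Acknowledgement from CAS server",
    4: "BCP finished",
    5: "CAS prepares CAB file",
    6: "Good",
    7: "Previous attempt aborted",
}


def describe_status(code):
    """Return a readable description for an InitializationStatus value."""
    return INITIALIZATION_STATUS.get(code, f"Unknown status {code}")


def is_in_progress(code):
    """Mirror the 'not in (6, 7)' filter used in the tracking query."""
    return code not in (6, 7)


for code in (2, 4, 7):
    print(code, describe_status(code), is_in_progress(code))
```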

Solution 2 – FIX SCCM SQL Based Database Replication

If DRS data replication is failing from the primary to the CAS or vice versa, follow the steps below to fix the issue.

Run spDiagDRS and check whether any messages are pending in the outgoing message queue, and see which DRS replication groups have failed for global and site data.

Run the queries below on the CAS and primary DBs to check for backlogs.

Select * from DrssendHistory where ProcessedTime is NULL
Select * from RCM_ReplicationLinkStatus where snapshotapplied <>1
select * from sys.transmission_queue order by enqueue_time desc

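If you see any transmission backlog, run the query below to clean it up.

-- Ends every conversation that still has messages in the transmission queue
DECLARE @conversation uniqueidentifier
WHILE EXISTS (SELECT 1 FROM sys.transmission_queue)
BEGIN
    SET @conversation = (SELECT TOP 1 conversation_handle FROM sys.transmission_queue)
    -- WITH CLEANUP drops the conversation and its queued messages without notifying the other side
    END CONVERSATION @conversation WITH CLEANUP
END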

Global data replication Re-initiation:

  • Create a .PUB file in the rcm inbox folder on the primary site for the failed global data group
  • Check rcmctrl.log on the primary server to confirm the data is being processed
  • You can check the initialization percentage with the query below.
Select * from RCM_DrsInitializationTracking where InitializationStatus not in (6, 7) order by createdtime desc
  • Once the current data group has finished replicating, you can re-initiate the other global data groups one by one

NOTE! – Do not re-initiate two data groups at the same time. Do it one at a time and wait for the first one to complete.

The next step is to re-initiate the failed site DRS data groups one by one from the primary server.

Site data replication Re-initiation:

  • Create a .PUB file in the rcm inbox folder on the CAS server for the failed site data group
  • Check rcmctrl.log on the primary server to confirm the data is being processed
  • You can check the initialization percentage with the query below.
Select * from RCM_DrsInitializationTracking where InitializationStatus not in (6, 7) order by createdtime desc
  • At 41 percent, you will see new folders created in the rcm inbox
  • You will see a completion message in rcmctrl.log once the data group replication has completed
  • Once the current data group has finished replicating, you can re-initiate the other site data groups one by one

NOTE! – Do not re-initiate two data groups at the same time. Do it one at a time and wait for the first one to complete.
