How to Fix SCCM ConfigMgr Inbox Backlog Issues


You can have a look at the Site to Site Replication post for more details about the new replication model; this post is a continuation of it. A number of stored procedures can be used to find more details about the backlog, along with monitoring of the transmission queue. The first and most useful one is the stored procedure called “spDiagDRS”. Run “EXEC spDiagDRS” to get the results shown in the pic below.

CM12Backlog.jpg

The stored procedure “spDiagDRS” gives details about queued messages. Have a look at the columns named “OutgoingMessagesInQueue” and “IncomingMessagesInQueue”. In an ideal scenario, there should NOT be any queued messages and the values of those columns should be ZERO. In my example, “OutgoingMessagesInQueue” is 257, which means there is some error in sending and we have a backlog. “spDiagDRS” also tells us the Status and LastSyncTime of each replication group. In my example, the SiteSending is CAS and the SiteReceiving is PR1.

imageimage

Apart from spDiagDRS, there are some other very useful stored procedures that we can use when troubleshooting a backlog. See the list of stored procedures below.

image

More details about these stored procedures will follow in future blog posts. For the time being, you can check out the following examples, along with their parameters.

EXEC spDiagMessagesInQueue

EXEC spDiagGetReplicationGroupStats 'Configuration Data', 'PR1'

EXEC spDiagGetProcedureStats '100'

EXEC spDiagGetQueryStats '10'

EXEC spDiagGetRunningQueries '10'

EXEC spDiagStartTrace

EXEC spDiagStopTrace

The transmission queue is another place to look in case of a backlog (or when outgoing messages are stuck). All the other queues shown in the following pic (ConfigMgrDRSSiteQueue, ConfigMgrRCMQueue, ConfigMgrDRSMsgBuilderQueue, ConfigMgrDRSQueue etc.) are application-related queues.

image
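To see whether any of these application queues has been disabled, the Service Broker catalog view sys.service_queues can be queried. This is a minimal sketch, assuming the site database is CM_CAS (as in this post's example) and that the queue names all start with "ConfigMgr", as the names listed above suggest.

```sql
-- Assumption: CM_CAS is the site database, as in this post's example.
USE [CM_CAS];

-- List the ConfigMgr Service Broker queues and their state flags.
SELECT name,
       is_receive_enabled,    -- 1 = messages can be received from the queue
       is_enqueue_enabled,    -- 1 = new messages can be put on the queue
       is_activation_enabled  -- 1 = the activation procedure is switched on
FROM sys.service_queues
WHERE name LIKE N'ConfigMgr%'
ORDER BY name;
```

A queue showing 0 for is_receive_enabled or is_enqueue_enabled is disabled and is worth investigating before looking further at the backlog.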

To check the transmission queue, run the SQL query below. It checks the transmission queue for a particular primary site (in this query, the CAS site code is CAS and the primary site code is PR1).

SELECT TOP 1000 *, casted_message_body =
CASE message_type_name WHEN 'X'
THEN CAST(message_body AS NVARCHAR(MAX))
ELSE message_body
END
FROM [CM_CAS].[sys].[transmission_queue] WHERE to_service_name = 'ConfigMgrDRS_SitePR1'

In the pic below, you can see the records waiting to be transmitted. Have a look at the “transmission_status” column: it gives more details about any transmission errors, which is very helpful for further troubleshooting.

image

image
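When the transmission queue holds thousands of rows, a grouped summary can be easier to read than the raw records. The query below is only a sketch, assuming the same CM_CAS database as above; it counts the waiting messages per target service and transmission status, and shows how long the oldest one has been waiting.

```sql
-- Assumption: CM_CAS is the site database, as in the query above.
SELECT to_service_name,
       transmission_status,                      -- empty when no error has been reported yet
       COUNT(*)          AS queued_messages,
       MIN(enqueue_time) AS oldest_enqueue_time  -- when the oldest waiting message was queued
FROM [CM_CAS].[sys].[transmission_queue]
GROUP BY to_service_name, transmission_status
ORDER BY queued_messages DESC;
```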

The vLogs view is the DRS (Data Replication Service) log. It gives us more details about the DRS process and the backlog.

Run the following SQL query to get more details from these logs: “SELECT TOP 1000 * FROM vLogs ORDER BY LogTime DESC”.
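To narrow the log down to recent activity, the same query can be limited by LogTime. This is a sketch only: it assumes that LogTime is stored in UTC, which should be checked against the rows returned by the unfiltered query first.

```sql
-- Assumption: LogTime is stored in UTC; use GETDATE() instead if it is local time.
SELECT TOP 1000 *
FROM vLogs
WHERE LogTime >= DATEADD(HOUR, -1, GETUTCDATE())  -- entries from the last hour only
ORDER BY LogTime DESC;
```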

The RCM_ReplicationLinkStatus table can also give us more details about the status of the link between the sites.

Run the SQL query “SELECT * FROM RCM_ReplicationLinkStatus”. Look at the StatusName column for values such as Failed, Degraded etc.

image
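To list only the problem links, the query can be filtered on StatusName. This is a sketch, assuming the status values are stored exactly as "Failed" and "Degraded", as mentioned above; adjust the list to the values you actually see in the column.

```sql
-- Assumption: StatusName holds the literal values 'Failed' and 'Degraded'.
SELECT *
FROM RCM_ReplicationLinkStatus
WHERE StatusName IN (N'Failed', N'Degraded');
```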

TRACE stored procedures: for in-depth analysis of a backlog, you can use the following stored procedures. Caution: tracing creates a lot of overhead on the SQL server and also uses a lot of disk space, because of the creation of the ConfigMgrDBTrace.trc file.

EXEC spDiagStartTrace

EXEC spDiagStopTrace

You can start the trace with “EXEC spDiagStartTrace”. This creates a .trc (trace) file in the SQL Server install location, “C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\ConfigMgrDBTrace.trc”, and starts tracing each and every event on the SQL server. To stop the trace, use “EXEC spDiagStopTrace”. Ensure that you STOP the trace ASAP; otherwise it may have an adverse impact on the server.

You can use SQL Server Profiler to open the .trc file. You will get in-depth details, such as Duration, EventClass and StartTime, about each event performed by the SQL server while the trace was running.

image
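As an alternative to Profiler, the trace file can also be read with a query through the built-in function sys.fn_trace_gettable. The sketch below assumes the trace has already been stopped and that the file sits at the path mentioned above; change the path if your SQL instance is installed elsewhere.

```sql
-- Assumption: the trace file is at the default path named in this post.
SELECT TOP 100
       EventClass,
       StartTime,
       Duration,   -- reported in microseconds when read from a trace file
       TextData    -- the statement text captured for the event
FROM sys.fn_trace_gettable(
         N'C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\DATA\ConfigMgrDBTrace.trc',
         DEFAULT)
ORDER BY Duration DESC;  -- longest-running events first
```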

10 COMMENTS

  1. Hi Anoop, good one. So, as per this, I see that I have 13412 in “OutgoingMessagesInQueue”. Now, the points on which I need clarification are:

    1. Is it because so many messages are in the queue that my database replication status shows “Link has failed”? If yes, how can I push them or clear them to make the link active?
    2. If not, what could be the reason for my link showing as failed, when I have verified that all the necessary ports are open and, as a matter of fact, it worked until a couple of days back?
    3. I understand that these backlogs get generated when the link breaks (I hope my understanding is correct). If yes, when the network link is fixed, shouldn’t these backlogs start getting pushed and make the database replication healthy?

    I am not an SCCM guy, so I am confused about a few concepts. I would really appreciate it if you could throw some light on it.

  2. I have entries in my ConfigMgrRCMQueue related to an old site.
    This caused the ConfigMgrRCMQueue to become disabled and I can’t enable it. Now my links to the other secondaries are down.
    Any idea how to remove the entries in ConfigMgrRCMQueue so that it can be enabled again?

  3. Hi Anoop,

    I have entries in my ConfigMgrRCMQueue which caused all my links to be down.
    Could you tell me how I can clear the entries from ConfigMgrRCMQueue?
    Those entries are related to old secondary sites 🙁

  4. Hello,

    I am facing the following issue:

    We have 4 primary sites and one central CAS server. We lost the CAS site to an HD failure. We only have a recent backup of the CAS site database, and no backup of any configuration for the site.

    We prepared a new server, did a fresh installation of Windows 2008 R2 SP1, installed all prerequisites for System Center Configuration Manager 2012 SP1 and ran the setup for System Center Configuration Manager. I chose reinstall:

    1- Recover A Site
    2- Reinstall this site Server (CAS Server)
    3- Use a site Database that has been manually recovered. (I restored the last backup of the database)

    Follow the Wizard, and the installation completed successfully.

    Now when I open my CAS console, it is in read-only mode. It has been like this for 24 hrs so far, with no sign of improvement.

    If I open any other primary site, it is also in read-only mode. All sites are in the unknown state, see attachments.

    Is this a normal behavior? How Can I stop this replication?

    What should I do next?

    any helps,

    Thanks,
