guy labs

VMware Snapshot and recovery: fix active directory replication

Two weeks ago I tried to install the newest updates on one of our virtual domain controllers. Long story short, it was not the best idea, and I had to revert to the snapshot taken just before I launched the update process. Luckily I had remembered to take one. 🙂

Yesterday morning I was told that the Active Directory content differed between our two domain controllers. I found that hard to believe and had to take a look myself, and yes, the content was different. How did that happen?

I knew that reverting to a snapshot can cause issues, but I had not noticed any at first. After looking around I found out I was dealing with a so-called USN rollback, caused by the “dirty” rollback of Active Directory. Microsoft is aware of the issue and has posted an article about it: http://support.microsoft.com/default.aspx?scid=kb;EN-US;875495

So the way forward would have been to demote and re-promote a domain controller, and that during the day? No way… there had to be another solution. So I checked the replication status with repadmin:

Result was:

This was odd. I checked the USN on both machines and they were identical, so replication was not a total failure, but somehow not working as usual. Next I wanted to make sure that inbound and outbound replication were working and that the global catalog was still available, using repadmin /options *:

And there it was, the output revealed the problem:

Inbound and outbound replication were disabled on the recovered virtual domain controller: the DISABLE_INBOUND_REPL and DISABLE_OUTBOUND_REPL options were set. Clearing those two options with repadmin did the trick:
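The exact commands from the original post did not survive here. As a rough sketch only, this is how the two repadmin calls could be assembled in Python, assuming the usual `repadmin /options <DC> -FLAG` form for clearing an option. The function names and the server name `DC02` are hypothetical, and nothing is executed unless `dry_run` is switched off.

```python
import subprocess

# Option names as repadmin reports them; a leading "-" clears an option.
REPL_FLAGS = ("DISABLE_INBOUND_REPL", "DISABLE_OUTBOUND_REPL")


def enable_replication_commands(dc_name):
    """Return one repadmin command per flag that clears it on dc_name."""
    return [["repadmin", "/options", dc_name, "-" + flag] for flag in REPL_FLAGS]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # repadmin is a Windows AD DS tool; this call fails on other systems.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "DC02" is a hypothetical name for the recovered domain controller.
    run_commands(enable_replication_commands("DC02"))
```

Keeping the default at `dry_run=True` means the script only prints what it would do, which seems the safer choice on a production domain controller.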

After waiting for a minute I double checked repadmin:

Sync was back to normal, and a look into Active Directory revealed that all objects were synced again. Before you re-enable replication, make sure the replication direction is configured in AD Sites and Services so that changes flow in the right direction.
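To confirm that the two options are really gone, the text printed by repadmin /options for a single domain controller could be scanned for the flag names. The following is a minimal sketch; the function name is hypothetical, and the sample strings are an assumed illustration of the output shape, not the output from my servers.

```python
REPL_FLAGS = ("DISABLE_INBOUND_REPL", "DISABLE_OUTBOUND_REPL")


def disabled_directions(options_output):
    """Return the replication-disabling flags present in the given text."""
    # Split on whitespace so only whole option names are matched.
    tokens = set(options_output.split())
    return [flag for flag in REPL_FLAGS if flag in tokens]


if __name__ == "__main__":
    # Assumed sample lines for illustration only.
    broken = "Current DSA Options: IS_GC DISABLE_INBOUND_REPL DISABLE_OUTBOUND_REPL"
    healthy = "Current DSA Options: IS_GC"
    print(disabled_directions(broken))
    print(disabled_directions(healthy))
```

An empty list means neither direction is blocked. If a flag shows up again shortly after being cleared, something else is setting it back.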

If you ever have to revert a domain controller from a backup or a snapshot, make sure to read the TechNet articles on how to do it. Also consider reading the technical manual of your backup software so that you restore according to best practice and avoid failures like these.

27 Comments

  • JR
    December 6, 2013 10:25 PM

    Hi, that did not work for me. After a few seconds the Netlogon service was paused and replication was disabled again. My recovered DC was not hosting any FSMO roles; could that be the difference? J.

    • alan
      December 10, 2013 3:45 PM

      Hi JR
      I’m sorry to hear this has not worked out for you. Could you please post your output of “REPADMIN /OPTIONS *”?
      I’m not sure, but I don’t think the FSMO roles have anything to do with it. Maybe limiting replication to one direction until the failed DC is back at the same USN would help.
      Cheers
      Alan

  • Unknown
    June 11, 2014 9:09 AM

    Thanks much. probably saved me from getting fired.

  • Blaise K
    December 17, 2014 8:29 AM

    This article saved me about 10 hours of overnight work. Thank you sir!

    • alan
      December 18, 2014 9:11 AM

      Glad my post could help, enjoy your well-deserved Christmas time then. 🙂

  • rick5234
    February 2, 2015 4:59 PM

    I executed these commands to remove the replication restrictions, but they don’t work. The DISABLE_INBOUND and DISABLE_OUTBOUND options don’t go away. I need to remove AD and rename this server, but I can’t do it. Any suggestions? Thanks.

    • alan
      February 23, 2015 10:20 AM

      Sorry about the late response, was away.
      Could you please post the result after entering the commands? Pretty hard to suggest what to do without any results.
      Thanks.

  • Davey G
    February 22, 2015 7:46 PM

    Add another to your list of successful fixes. Many thanks for publishing this.

    I inherited an ESX environment and never realized the parent domain DC was restored via a snapshot (as it just worked) and couldn’t work out why I could not add another DC.

    Pin a medal upon your chest sir.

  • Anthony
    March 31, 2015 1:06 PM

    Can this fix be put into a script to run automatically after restoring from a snapshot? Thanks!

    • alan
      April 26, 2015 6:15 PM

      Hi Anthony

      If I understand you correctly, this is something you would put in a script to run after recovering from a snapshot. I think there is nothing speaking against it, but I would strongly advise you not to recover a DC too often, as this procedure is covered neither by Microsoft (AFAIK) nor by any virtualization software provider.

      Why would you write a script for this?

      Cheers

  • chandra
    April 21, 2015 4:45 AM

    Hi Alan,

    Does this work even if the USNs do not match? I ended up in the same situation as you. My repadmin /options * output is similar to yours, but the USNs on the 2 DCs are different: the DC which was reverted to the snapshot has a higher USN than the good DC. Please suggest options to avoid re-promoting the server.

    • alan
      April 26, 2015 6:13 PM

      Hi chandra

      As far as I can tell, differing USNs should not interfere with this. Make sure you check which server holds the most accurate USN and enable inbound replication on the other DC, the one holding the lower USN. Wait until the USNs are identical, then re-enable outbound replication. That should solve your issue. Keep me posted if you need further assistance.

      Cheers

      • chandra
        May 1, 2015 3:29 AM

        Hi Alan,

        Thanks for your help, will try your solution.

        Regards
        Chandra

  • Davina
    April 24, 2015 10:55 PM

    This fix was a huge help in my environment where it wouldn’t replicate. We’d spent half a day on it before I stumbled upon your post. Thank you!

  • Rob
    May 14, 2015 9:23 PM

    Those of you who are trying to re-enable the replication options in repadmin but having them fail, check to see if there is some other failure in SYSVOL replication going on. Look in the DFSR event log for problems such as a corrupt DFSR database or disk space pressure in the staging areas. These issues will need to be resolved before repadmin will successfully allow DISABLE_INBOUND_REPL DISABLE_OUTBOUND_REPL options to be turned off. Also make sure to unpause NETLOGON before trying to set those options.

  • Mikael
    January 6, 2016 5:19 PM

    Holy crap, your simple solution did the trick. Did not want to demote and all the other steps. Much appreciated!

    • alan
      January 18, 2016 3:44 PM

      Hi Mikael
      Glad it helped.
      Cheers

  • Vishal
    February 17, 2016 4:09 AM

    Hi Alan,

    I am trying to re-enable the replication. But after some time it falls back to disabled. Any suggestions?

    Vishal

    • alan
      April 7, 2016 4:18 PM

      Hi Vishal
      Sorry, I only discovered your mail today; it ended up in spam after a mail server replacement. Hopefully this issue has been solved in the meantime. If not: how many domain controllers are we talking about? Have you tried running only the DC with the most accurate USN for a short while before re-enabling replication? What is the output of repadmin /options *?
      Best regards
      Alan

  • FRANK C
    March 10, 2016 6:27 PM

    EXCELLENT TIP. THANK YOU VERY MUCH

  • EDDYB
    April 8, 2016 12:30 PM

    Thank you so much!!!! You saved me….

  • Justin
    June 16, 2016 9:00 PM

    Alan,

    I was unable to fix my issue with your procedure, but was able to fix the situation at fault which was for me a weird network issue with VMWare.

    Restored Snapshot of a VM will not join Network Domain

    After a restore the VM will not join the domain, even after removing it from the domain and trying to re-add it. Re-adding it to the domain failed with a DNS error: “Could not be found”, Invalid Default Gateway. After troubleshooting, the root cause was found to be the network adapter of the VM itself.

    Go into the Edit Settings… of the affected VM. Edit the Network Adapter>Network Connection>Network Label.

    In this case they APPEARED to be correct, but when I changed them all of a sudden the machine could join the domain!

  • Yizairalie Pabon
    October 11, 2016 3:58 PM

    Any other ideas why it goes back to rejecting replication? Clearing the inbound and outbound disable options works for 1 second, and if I run the /options * command a second time it is back to disabled! I checked for DFS Replication issues and found nothing.

  • Bernie
    April 15, 2017 2:58 PM

    Thank you you made my day

  • Mullah
    October 22, 2017 1:34 PM

    One of the best solutions I have ever seen! Very simple and clever. I appreciate you sharing your experience.
    I resolved this issue with the help of this blog last weekend.

    I need to know why “Dsa Not Writable” still shows a value of 4 on the affected DC (HKLM\System\CurrentControlSet\Services\NTDS\Parameters)?
    What would be the possible effect if I left that value? A bit worried.

  • B3m
    November 8, 2017 4:29 PM

    You saved us, thanks.
