January 15, 2025

2 Fast 2 MCM

These diagrams help explain the complex workflows within the Machine Controller Manager (MCM).

Machine Controller Manager Architecture

- The system consists of three main controllers working in concert
- Each controller handles specific aspects of machine lifecycle management
- MCM interfaces with both cloud providers and Kubernetes clusters
- It manages the full lifecycle of machines, from creation to deletion

Let’s start with an overview of the main components and their interactions:

stateDiagram-v2
    direction TB
    
    state "Machine Controller Manager" as MCM {
        state "Machine Controller" as MC
        state "Safety Controller" as SC
        state "MCM Controller" as MCMC
        
        [*] --> MC
        [*] --> SC
        [*] --> MCMC
    }
    
    state "Cloud Provider" as CP {
        VMs
        API
    }
    
    state "Kubernetes Cluster" as K8S {
        state "Control Plane" as CP_K8S {
            API_Server
            etcd
        }
        
        state "Node Components" as NC {
            kubelet
            container_runtime
        }
    }
    
    MCM --> CP : Manages VMs
    MCM --> K8S : Manages Nodes
    
    note right of MCM
        Handles:
        - Machine lifecycle
        - Safety checks
        - Deployments/Sets
    end note

Machine Controller Core Flows

Now, let’s dive into the Machine Controller’s core reconciliation flows for different resources. It handles three main types of reconciliation:

- Secret Reconciliation: Manages secrets referenced by MachineClasses
- MachineClass Reconciliation: Handles machine class lifecycle
- Machine Reconciliation: Core machine lifecycle management

---
  config:
    layout: elk
---
stateDiagram-v2
    state "Machine Controller" as MC {
        state "Secret Reconciliation" as SR {
            [*] --> FetchSecret
            FetchSecret --> GetMachineClass
            GetMachineClass --> CheckReferences
            CheckReferences --> FinalizerAdd : Has References
            CheckReferences --> FinalizerRemove : No References
            FinalizerAdd --> [*]
            FinalizerRemove --> [*]
        }

        state "MachineClass Reconciliation" as MCR {
            [*] --> FetchClass
            FetchClass --> GetMachines
            GetMachines --> CheckMachines
            CheckMachines --> AddFinalizer : Has Machines
            CheckMachines --> RemoveFinalizer : No Machines
            AddFinalizer --> EnqueueMachines
            EnqueueMachines --> [*]
            RemoveFinalizer --> [*]
        }

        state "Machine Reconciliation" as MR {
            [*] --> FetchMachine
            FetchMachine --> CheckFrozen
            
            CheckFrozen --> ValidateMachine : Not Frozen
            CheckFrozen --> RetryLater : Frozen
            
            ValidateMachine --> ValidateMachineClass
            ValidateMachineClass --> DeletionTimestamp

            DeletionTimestamp --> DeletionFlow : Deletion Requested
            DeletionTimestamp --> AddFinalizers : No Deletion
            
            AddFinalizers --> CheckPhase&NodeLabel
            
            CheckPhase&NodeLabel --> ReconcileHealth : Has Node & Non-empty phase
            CheckPhase&NodeLabel --> CreationFlow : No Node or<br/>CrashLoopBackOff<br/>or EmptyPhase
            
            ReconcileHealth --> SyncNodeName
            SyncNodeName --> SyncTemplates
            SyncTemplates --> [*]
            
            CreationFlow --> [*]
            DeletionFlow --> [*]
        }
    }
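
The branch order inside the Machine Reconciliation state can be read as a single dispatch function. The Go sketch below is illustrative only: the `Machine` struct and `nextStep` are hypothetical names, not MCM's actual types, and the two validation steps are left out.

```go
package main

import "fmt"

// Machine is a hypothetical, trimmed-down stand-in for the real Machine object.
type Machine struct {
	Frozen            bool   // result of the CheckFrozen step
	DeletionRequested bool   // true when a deletion timestamp is set
	NodeName          string // empty until a node is associated
	Phase             string // "", "Pending", "Running", "CrashLoopBackOff", ...
}

// nextStep follows the branch order of the Machine Reconciliation diagram.
// Validation of the Machine and its MachineClass is omitted for brevity.
func nextStep(m Machine) string {
	switch {
	case m.Frozen:
		return "RetryLater"
	case m.DeletionRequested:
		return "DeletionFlow"
	case m.NodeName == "" || m.Phase == "" || m.Phase == "CrashLoopBackOff":
		return "CreationFlow"
	default:
		return "ReconcileHealth"
	}
}

func main() {
	fmt.Println(nextStep(Machine{NodeName: "node-1", Phase: "Running"}))
	fmt.Println(nextStep(Machine{}))
}
```

The ordering matters: a frozen controller retries later even if deletion was requested, and a deletion request takes precedence over creation or health reconciliation.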

Machine Creation

Machine Creation Flow:
- Complex process involving multiple status checks
- Handles initialization and error cases
- Includes node verification and cleanup of stale resources
- Uses multiple retry mechanisms for resilience

---
  config:
    look: handDrawn
---
stateDiagram-v2
    classDef imp font-weight:bold,stroke-width:5px;
        state "From <u>CreateResponse</u>: Assign Node Name & ProviderID" as ANPIDCMR
        state "From <u>GetMachineStatusResponse</u>: Assign Node Name & ProviderID" as ANPIDGMS
        state "From <u>GetMachineStatusResponse</u>: Assign Node Name & ProviderID" as ANPIDGMSR
        state "Assign Node Name<br/>from Machine label" as ANML
        state "Phase: <i>Pending</i><br/>State: <i>Processing</i><br/>OpType: Create" as CPPP
        state "State: <i>Failed</i><br/>OpType: <i>Create</i>" as SFFF
        
        [*] --> AddBootToken&MachineName
        AddBootToken&MachineName --> GetMachineStatus:::imp
        
        GetMachineStatus:::imp --> ANPIDGMS : Success
        ANPIDGMS --> UpdateAnnotationsLabels
        UpdateAnnotationsLabels --> CPPP : Phase <i>""(empty) or CrashLoopBackOff</i>
        CPPP --> StatusUpdate
        StatusUpdate --> [*]
        
        GetMachineStatus:::imp --> CheckNodeExists : NotFound or Unimplemented
        CheckNodeExists --> ANML : Node Exists
        ANML --> UpdateAnnotationsLabels
        
        CheckNodeExists --> CreateMachine:::imp : No Node
        CreateMachine:::imp --> ANPIDCMR : Successful creation
        CreateMachine:::imp --> CheckFailurePhase : Creation Error
        ANPIDCMR --> SetUninitialized : Node name is Machine Name
        SetUninitialized --> UpdateAnnotationLabel
        UpdateAnnotationLabel --> InitializeMachine:::imp
        InitializeMachine:::imp --> [*]
        
        ANPIDCMR --> DeleteMachine:::imp : <u>Stale Node</u><br/>NodeName is not MachineName
        DeleteMachine:::imp --> SFFF: "VM using old node obj"
        
        GetMachineStatus:::imp --> ANPIDGMSR : Uninitialized
        ANPIDGMSR --> SetUninitialized
        
        GetMachineStatus:::imp --> CheckFailurePhase : Other Errors
        CheckFailurePhase --> Failed : Timeout
        CheckFailurePhase --> CrashLoopBackOff : Not timed out
        Failed --> SFFF
        CrashLoopBackOff --> SFFF
        
        SFFF --> [*]
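
Most of the creation diagram hangs off the result of GetMachineStatus. A minimal Go sketch of that dispatch follows; `StatusCode`, its constants, and `afterGetMachineStatus` are hypothetical names chosen for illustration, and `nodeExists` and `timedOut` stand in for checks the real controller performs itself.

```go
package main

import "fmt"

// StatusCode is a hypothetical enum for the outcome of GetMachineStatus.
type StatusCode int

const (
	OK StatusCode = iota
	NotFound
	Unimplemented
	Uninitialized
	OtherError
)

// afterGetMachineStatus returns the next notable state in the creation diagram.
func afterGetMachineStatus(code StatusCode, nodeExists, timedOut bool) string {
	switch code {
	case OK:
		return "UpdateAnnotationsLabels"
	case NotFound, Unimplemented:
		if nodeExists {
			// Node name is taken from the machine label; no new VM is created.
			return "UpdateAnnotationsLabels"
		}
		return "CreateMachine"
	case Uninitialized:
		// The VM exists but still has to be initialized.
		return "InitializeMachine"
	default:
		// CheckFailurePhase: Failed on timeout, CrashLoopBackOff otherwise.
		if timedOut {
			return "Failed"
		}
		return "CrashLoopBackOff"
	}
}

func main() {
	fmt.Println(afterGetMachineStatus(NotFound, false, false))
	fmt.Println(afterGetMachineStatus(OtherError, false, true))
}
```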

Health Check

---
  config:
    layout: elk
---
stateDiagram-v2
    state "Health Reconciliation" as HR {
        state "Phase: <i>Unknown</i><br/>State: <i>Processing</i><br/>LastOp: <i>HealthChk</i>" as PUSP
        state "Phase: <i>Failed</i><br/>State: <i>Failed</i>" as PFSF
        state "LastOp State: Successful<br/>Phase: Running" as SSPR

        [*] --> GetMachineNode
        GetMachineNode --> PUSP : Not Found & RunningPhase<br/>Node object missing
        GetMachineNode --> Found

        Found --> MachineCondSetToNodeCond : NodeCondition != MachineCondition
        Found --> isHealthy : TODO (isHealthy)

        GetMachineNode --> CreationTimeout : PendingPhase
        GetMachineNode --> HealthTimeout : UnknownPhase

        CreationTimeout --> PFSF : Now - LastUpdateTime > Timeout
        HealthTimeout --> GetDeploymentName : Now - LastUpdateTime > Timeout
        CreationTimeout --> EnqueueAfter : Not timed out
        HealthTimeout --> EnqueueAfter : Not timed out


        GetDeploymentName --> RegisterPermit
        RegisterPermit --> TryMarkingMachineFailed
        TryMarkingMachineFailed --> InProgressMachines++ : Phase not<br/>Unknown or Running<br/>Machines "getting replaced"
        InProgressMachines++ --> PFSF:  InProgressMachines < MaxReplacements(1)

        MachineCondSetToNodeCond --> isHealthy
        isHealthy --> PUSP: Not Healthy & RunningPhase
        isHealthy --> CheckLastOp : Healthy & NotRunningPhase &<br/>NoCriticalComponentNotReadyTaint

        CheckLastOp --> DeleteBootstrapToken: TypeCreate &<br/> State is not Successful<br/>(Machine creation happened)
        CheckLastOp --> LastOpType=HealthChk: Not Create<br/>(Machine re-joined)

        DeleteBootstrapToken --> SSPR
        LastOpType=HealthChk --> SSPR

        SSPR --> UpdateStatus
        PUSP --> UpdateStatus
        PFSF --> UpdateStatus

        UpdateStatus --> [*]
        EnqueueAfter --> [*]
    }
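
The HealthTimeout branch combines a time check with a cap on how many machines may be replaced at once. The Go sketch below is a simplified reading of that branch; `onUnknownPhase` and its parameters are hypothetical, and the behaviour when the cap is already reached is an assumption, since the diagram does not show it.

```go
package main

import (
	"fmt"
	"time"
)

// onUnknownPhase decides what happens to a machine in the Unknown phase.
// sinceLastUpdate corresponds to "Now - LastUpdateTime" in the diagram.
func onUnknownPhase(sinceLastUpdate, healthTimeout time.Duration, inProgress, maxReplacements int) string {
	if sinceLastUpdate <= healthTimeout {
		return "EnqueueAfter" // not timed out yet, check again later
	}
	if inProgress < maxReplacements {
		return "Failed" // mark the machine failed so it gets replaced
	}
	// Assumption: with the cap reached, the machine simply waits its turn.
	return "EnqueueAfter"
}

func main() {
	fmt.Println(onUnknownPhase(11*time.Minute, 10*time.Minute, 0, 1))
	fmt.Println(onUnknownPhase(11*time.Minute, 10*time.Minute, 1, 1))
}
```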

Machine Deletion

Machine Deletion Flow:
- Carefully orchestrated process to ensure clean resource cleanup
- Involves multiple phases from drain to final cleanup
- Handles volume attachments and node cleanup
- Includes finalizer management for resource protection

---
  config:
    layout: elk
---
stateDiagram-v2
    state "Deletion Flow" as DF {
        direction LR
        state "ProcessPhase" as PP
        state "UpdateStatus" as US

        [*] --> CheckFinalizers
        CheckFinalizers --> SetTerminating
        SetTerminating --> PP

        PP --> GetVMStatus
        GetVMStatus --> [*]
        PP --> InitiateDrain
        InitiateDrain --> [*]
        PP --> DeleteVolumeAttachments
        DeleteVolumeAttachments --> [*]
        PP --> InitiateVMDeletion
        InitiateVMDeletion --> [*]
        PP --> InitiateNodeDeletion
        InitiateNodeDeletion --> [*]
        PP --> RemoveFinalizers
        RemoveFinalizers --> [*]
        PP --> US
        US --> [*]
    }

---
  config:
    layout: elk
---
stateDiagram-v2
    state "Initiate Drain" as ND {
        [*] --> ValidateNode
        state "UpdateStatus" as USD
        state "State: Processing<br/>Type: Delete" as SPTD
        state "CheckNodeCondition<br/>'Ready' or 'Read-only FS'" as CNC
        state "Phase is not Terminating" as NAT
        state "Terminating<br/>Reason: Unhealthy" as TRU
        state "Terminating<br/>Reason: ScaleDown" as TRSD
        state "SkipDrain<br/>State: Failed" as CUFail
        state "State: Processing<br/>Desc: DelVolAttachments" as SPDDVA
        state "State: Processing<br/>Desc: InitVMDeletion" as SPDIVD
        state "State: Failed<br/>Desc: InitiateDrain" as SFDID

        ValidateNode --> SPTD : NodeName is empty
        SPTD --> USD
        ValidateNode --> CNC
        CNC --> ForceDeletion : Read-Only/NotReady &<br/>Last-transition Timeout
        CNC --> NormalDrain : Healthy
        CNC --> ForceDeletion : "force-delete" label on machine or Drain<br/> Timeout on deletion

        ForceDeletion --> UpdateTerminationCondition
        NormalDrain --> UpdateTerminationCondition

        UpdateTerminationCondition --> RunDrain : Phase is empty or CrashLoopBackOff
        UpdateTerminationCondition --> NAT : Non-creation Phase
        NAT --> TRU : Phase is failed
        NAT --> TRSD : Phase not failed
        TRU --> TerminationConditionUpdate
        TRSD --> TerminationConditionUpdate

        TerminationConditionUpdate --> CUFail : Update failure<br/>during NormalDrain
        TerminationConditionUpdate --> RunDrain : Update failure<br/>during ForceDeletion
        TerminationConditionUpdate --> RunDrain : Update Successful
        CUFail --> USD

        RunDrain --> SPDDVA : Drain successful<br/>during ForceDeletion
        RunDrain --> SPDIVD : Drain successful<br/>during NormalDrain
        RunDrain --> SPDDVA : Drain failed<br/>"force-delete" label present
        RunDrain --> SFDID : Drain failed<br/>"force-delete" label absent

        SPDDVA --> USD
        SPDIVD --> USD
        SFDID --> USD

        USD --> [*]
    }
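
The Initiate Drain diagram makes two decisions: whether to force deletion, and which step follows RunDrain. A rough Go sketch of both, using hypothetical names (`DrainInput`, `forceDeletion`, `afterDrain`); treating an unhealthy node that has not yet timed out as a normal drain is an assumption, because the diagram leaves that case open.

```go
package main

import (
	"fmt"
	"time"
)

// DrainInput gathers the hypothetical inputs of the CheckNodeCondition step.
type DrainInput struct {
	ForceDeleteLabel bool          // "force-delete" label on the machine
	NodeHealthy      bool          // Ready and not on a read-only filesystem
	UnhealthyFor     time.Duration // time since the last condition transition
	DeletingFor      time.Duration // time since deletion was requested
}

// forceDeletion reports whether the ForceDeletion path is taken.
func forceDeletion(in DrainInput, unhealthyTimeout, drainTimeout time.Duration) bool {
	if in.ForceDeleteLabel || in.DeletingFor > drainTimeout {
		return true
	}
	return !in.NodeHealthy && in.UnhealthyFor > unhealthyTimeout
}

// afterDrain maps the RunDrain outcome to the next operation description.
func afterDrain(drainOK, force, forceDeleteLabel bool) string {
	switch {
	case drainOK && force:
		return "DelVolAttachments"
	case drainOK:
		return "InitVMDeletion"
	case forceDeleteLabel:
		return "DelVolAttachments"
	default:
		return "Failed: InitiateDrain"
	}
}

func main() {
	in := DrainInput{NodeHealthy: false, UnhealthyFor: 6 * time.Minute}
	force := forceDeletion(in, 5*time.Minute, 2*time.Hour)
	fmt.Println(force, afterDrain(true, force, in.ForceDeleteLabel))
}
```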

Let’s visualize the Node Drain process, which is a critical part of machine deletion:

- Sophisticated pod eviction handling
- Supports both forced and normal drain scenarios
- Handles PDB (Pod Disruption Budget) violations
- Includes parallel and serial eviction strategies

---
  config:
    layout: elk
---
stateDiagram-v2
    state "RunDrain" as Normal {
        state "CordonNode (Sealing off)<br/>(Set Unschedulable to true)" as CN
        [*] --> CN
        CN --> WaitForPodCacheSync
        WaitForPodCacheSync --> GetPodsForDeletion : TODO
        
        %% http://localhost:3000/machine-controller/node_drain.html#drainoptionsgetpodsfordeletion
        %% mirrorPodFilter: pod doesnt have MirrorPodAnnotation (set by kubelet when creating mirror pods)
        %% localStorageFilter
        %% unreplicatedFilter
        %% daemonSetFilter
        
        GetPodsForDeletion --> DeleteOrEvictPods

        DeleteOrEvictPods --> UpdateNodeCondition
        UpdateNodeCondition --> [*]
        
        state "DeleteOrEvictPods" as EP {
            [*] --> CheckEvictionSupport

            CheckEvictionSupport --> ParallelEviction : ForceDeletion
            CheckEvictionSupport --> MixedEviction : NormalDrain

            MixedEviction --> ParallelEvictNoPV
            MixedEviction --> SerialEvictWithPV

            ParallelEvictNoPV --> WaitForEviction
            SerialEvictWithPV --> WaitForEviction
            ParallelEviction --> WaitForEviction
            WaitForEviction --> HandlePDBViolation
            HandlePDBViolation --> RetryEviction
            RetryEviction --> [*]
        }
    }
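
The split between parallel and serial eviction can be sketched with goroutines. This is an illustration under assumptions, not MCM's code: `Pod`, `recorder`, and `drain` are hypothetical, eviction is reduced to a callback, and waiting and PDB handling are left out. Pods without persistent volumes are evicted concurrently, while pods with volumes are processed one at a time unless deletion is forced.

```go
package main

import (
	"fmt"
	"sync"
)

// Pod is a hypothetical minimal pod description.
type Pod struct {
	Name  string
	HasPV bool
}

// recorder collects evicted pod names in the order they were processed.
type recorder struct {
	mu    sync.Mutex
	names []string
}

func (r *recorder) evict(p Pod) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.names = append(r.names, p.Name)
}

// drain evicts pods without PVs in parallel and pods with PVs serially.
// With force set, every pod takes the parallel path.
func drain(pods []Pod, force bool, evict func(Pod)) {
	var wg sync.WaitGroup
	var serial []Pod
	for _, p := range pods {
		if p.HasPV && !force {
			serial = append(serial, p)
			continue
		}
		wg.Add(1)
		go func(p Pod) {
			defer wg.Done()
			evict(p)
		}(p)
	}
	// One goroutine walks the PV pods in order, alongside the parallel group.
	wg.Add(1)
	go func() {
		defer wg.Done()
		for _, p := range serial {
			evict(p)
		}
	}()
	wg.Wait()
}

func main() {
	r := &recorder{}
	drain([]Pod{{"web", false}, {"db-0", true}, {"db-1", true}}, false, r.evict)
	fmt.Println(len(r.names), "pods evicted")
}
```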

---
title: EvictPodsNoPV
---
stateDiagram-v2
    classDef imp font-weight:bold,stroke-width:5px;
        state "Retry count >= MaxEvictRetries" as Term
        state "Set attemptEvict as False" as AEF
        state "Sleep(EvictRetryInterval)" as SRC

        [*] --> Term:::imp

        Term:::imp --> CheckAttemptEvict : No
        Term:::imp --> AEF : Yes
        AEF --> CheckAttemptEvict

        CheckAttemptEvict --> EvictPod : True
        CheckAttemptEvict --> DeletePod : False

        EvictPod --> CheckErr
        DeletePod --> CheckErr

        CheckErr --> BreakLoop:::imp : nil
        CheckErr --> LogEvict : notFound
        CheckErr --> EvictFailErr : AttemptEvict is False
        CheckErr --> PDBViolation : APIErr too many req

        PDBViolation --> GetPDB

        GetPDB --> SRC : No PDB
        GetPDB --> CheckMisconfigured : PDB exists

        CheckMisconfigured --> MisconfigErr : Generation is ObservedGen<br/>HealthyPods >= ExpectedPods<br/>DisruptionsAllowed is 0
        CheckMisconfigured --> SRC : No

        SRC:::imp --> Term : count++


        BreakLoop:::imp --> ReturnSuccess:::imp : ForceDeletion
        BreakLoop:::imp --> GetTerminationGracePeriod : NormalDrain

        GetTerminationGracePeriod --> SetToTimeout : GracePeriod > Timeout
        GetTerminationGracePeriod --> WaitForDeletion : Grace < Timeout
        SetToTimeout --> WaitForDeletion

        WaitForDeletion --> TimeoutErr : timeout &<br/>pod exists
        WaitForDeletion --> WaitErr : err
        WaitForDeletion --> ReturnSuccess:::imp : timeout &<br/>pod deleted

        LogEvict --> [*]
        EvictFailErr --> [*]
        MisconfigErr --> [*]
        TimeoutErr --> [*]
        WaitErr --> [*]
        ReturnSuccess:::imp --> [*]
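
The retry loop at the top of this diagram tries eviction up to a retry limit and then falls back to plain deletion. A minimal Go sketch, assuming hypothetical sentinel errors and callbacks in place of real API calls; the PDB lookup before sleeping and the post-eviction wait are omitted, and returning immediately on other eviction errors is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	errTooManyRequests = errors.New("too many requests") // stands in for a PDB violation
	errNotFound        = errors.New("not found")
)

// evictWithRetries calls evict until it succeeds or maxRetries is reached,
// then switches to del (attemptEvict becomes false).
func evictWithRetries(maxRetries int, interval time.Duration, evict, del func() error) error {
	attemptEvict := true
	for retry := 0; ; retry++ {
		if retry >= maxRetries {
			attemptEvict = false
		}
		var err error
		if attemptEvict {
			err = evict()
		} else {
			err = del()
		}
		switch {
		case err == nil, errors.Is(err, errNotFound):
			return nil // evicted, deleted, or already gone
		case !attemptEvict:
			return fmt.Errorf("delete failed: %w", err)
		case !errors.Is(err, errTooManyRequests):
			return fmt.Errorf("evict failed: %w", err)
		}
		time.Sleep(interval) // PDB violation: wait, then count++
	}
}

func main() {
	evict := func() error { return errTooManyRequests }
	del := func() error { return nil }
	fmt.Println(evictWithRetries(3, time.Millisecond, evict, del))
}
```

The loop always terminates: once the retry count reaches the limit, the single delete attempt either succeeds or returns an error.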

---
title: EvictPodsWithPV
config:
  layout: elk
---
stateDiagram-v2
    classDef imp font-weight:bold,stroke-width:5px;
        state "Retry count < MaxEvictRetries" as Term
        state "Sleep(EvictRetryInterval)" as SRC
        state "CheckRemainingPods" as CRP
        
        [*] --> SortPodsByPriority
        SortPodsByPriority --> podVolumeInfoMap
        note left of podVolumeInfoMap
            Creates a map from pod to list of attached PVs (VolName, VolID -> GetVolumeID)
        end note

        podVolumeInfoMap --> AttemptEvict
        AttemptEvict --> evictPodPVInternal(Delete):::imp : false
        AttemptEvict --> Term:::imp : true
        Term:::imp --> evictPodPVInternal(Evict):::imp : true
        evictPodPVInternal(Evict):::imp --> break:::imp : FastTrack or<br/>All pods evicted
        evictPodPVInternal(Evict):::imp --> SRC : Not FastTrack and<br/>Pods Remaining
        SRC --> Term:::imp : count++

        Term:::imp --> evictPodPVInternal(Delete):::imp : false<br/>Not FastTrack and<br/>Pods Remaining
        break:::imp --> [*] : All pods evicted

        break:::imp --> CRP : FastTrack
        evictPodPVInternal(Delete):::imp --> CRP

        CRP --> Success:::imp : Node Not Found
        CRP --> ChkAttemptEvict
        ChkAttemptEvict --> EvictErr : True
        ChkAttemptEvict --> DeleteErr : False

---
title: EvictPodsWithPVInternal
config:
  layout: elk
---
stateDiagram-v2
    classDef imp font-weight:bold,stroke-width:5px;
        state "Add Pod to RetryPods" as Retry
        state "Log NotFound<br/>DeleteWorker" as LogNotFound
        [*] --> SelectPod : Start Eviction Process

        SelectPod --> CheckContextTimeout:::imp

        CheckContextTimeout:::imp --> AbortProcess : Context Done
        CheckContextTimeout:::imp --> AddWorker(AttachmentHandler) : Context Not Done

        AddWorker(AttachmentHandler) --> EvictOrDelete

        EvictOrDelete --> CheckEvictionResult:::imp

        CheckEvictionResult:::imp --> EvictionFailed
        EvictionFailed --> PDBViolation : Eviction Attempted &<br/>TooManyRequests
        EvictionFailed --> PodAlreadyGone : Pod Not Found
        EvictionFailed --> EvictionError : Other Errors
        CheckEvictionResult:::imp --> WaitForVolumeDetach : Successful Eviction

        PDBViolation --> GetPDB
        GetPDB --> CheckMisconfigured : PDB Exists
        GetPDB --> Retry : NoPDB
        CheckMisconfigured --> MisconfigErr : Generation is ObservedGen<br/>HealthyPods >= ExpectedPods<br/>DisruptionsAllowed is 0
        CheckMisconfigured --> Retry:::imp : NotMisconfig
        MisconfigErr --> DeleteWorker

        PodAlreadyGone --> DeleteWorker

        EvictionError --> Retry:::imp

        WaitForVolumeDetach --> CheckDetachResult:::imp : TerminationGracePeriod + DetachTimeout

        CheckDetachResult:::imp --> LogNotFound : Node Not Found
        CheckDetachResult:::imp --> DetachError : Detach Failed
        CheckDetachResult:::imp --> WaitForReattach : Successful Detach

        LogNotFound --> AbortProcess
        DetachError --> DeleteWorker

        WaitForReattach --> CheckReattachResult:::imp : PvReattachTimeout

        CheckReattachResult:::imp --> ReattachTimeout : Timeout
        CheckReattachResult:::imp --> LogError : Reattach Failed
        CheckReattachResult:::imp --> SuccessfulEviction:::imp : Successful Reattach

        ReattachTimeout --> DeleteWorker : TODO IsThisCorrect?
        LogError --> DeleteWorker
        SuccessfulEviction:::imp --> DeleteWorker : Pod Processed

        DeleteWorker --> [*]
        Retry:::imp --> DeleteWorker
        AbortProcess --> Exit:::imp : Terminate (FastTrack)<br/>Return Remaining Pods

Safety Controller

Orphan VM Check:
- Runs periodically (every 15 minutes) to detect and clean up orphaned VMs
- Lists all VMs in the cloud provider matching the cluster’s tag
- Maps VMs to machine objects using ProviderID
- Handles nodes without machine objects:
  - Adds NotManagedByMCM annotation after timeout
  - Removes annotation if machine object is found
- Logs all cleanup operations for audit purposes

API Server Safety:
- Monitors connectivity to both control and target API servers
- Implements a freezing mechanism when API servers are unreachable
- Manages machine controller state based on API server health:
  - Freezes operations if timeout exceeded
  - Unfreezes when API servers become available
- Handles machine status updates during API server recovery

---
  config:
    layout: elk
---
stateDiagram-v2
    state "Safety Controller" as SC {
        state "Orphan VM Check" as OVC {
            [*] --> ListCloudVMs
            ListCloudVMs --> MapToMachines
            MapToMachines --> CheckOrphans
            
            state "CheckOrphans" as CO {
                [*] --> NoMachineObject
                NoMachineObject --> ConfirmDeletion
                ConfirmDeletion --> DeleteVM
                DeleteVM --> LogDeletion
            }
            
            CheckOrphans --> AnnotateNodes
            
            state "AnnotateNodes" as AN {
                [*] --> CheckNodeMachine
                CheckNodeMachine --> MultipleMatch : Multiple Machines
                CheckNodeMachine --> NoMatch : No Machine
                CheckNodeMachine --> SingleMatch : One Machine
                
                NoMatch --> TimeoutCheck
                TimeoutCheck --> AddAnnotation : Timeout Exceeded
                
                SingleMatch --> RemoveAnnotation : Has Annotation
                
                AddAnnotation --> UpdateNode
                RemoveAnnotation --> UpdateNode
            }
        }

        state "API Server Safety" as ASS {
            [*] --> CheckFrozen
            CheckFrozen --> CheckAPIServer : Frozen
            CheckFrozen --> MonitorAPI : Not Frozen
            
            CheckAPIServer --> Unfreeze : API Up
            CheckAPIServer --> Requeue : API Down
            
            MonitorAPI --> SetInactiveTime : API Down
            MonitorAPI --> ClearInactiveTime : API Up
            
            SetInactiveTime --> CheckTimeout
            CheckTimeout --> Freeze : Timeout Exceeded
            
            Unfreeze --> UpdateMachines
            UpdateMachines --> ResetTimeout
        }
    }
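
Both safety loops reduce to small pieces of logic. The Go sketch below is a simplified illustration with hypothetical names (`orphanVMs`, `Safety`, `Observe`); time is expressed as an offset from an arbitrary start to keep the example deterministic, and the machine status updates that follow an unfreeze are omitted.

```go
package main

import (
	"fmt"
	"time"
)

// orphanVMs returns the provider IDs of VMs that no machine object refers to.
func orphanVMs(vmIDs, machineIDs []string) []string {
	known := make(map[string]bool, len(machineIDs))
	for _, id := range machineIDs {
		known[id] = true
	}
	var orphans []string
	for _, id := range vmIDs {
		if !known[id] {
			orphans = append(orphans, id)
		}
	}
	return orphans
}

// Safety tracks the freeze state of the API Server Safety loop.
type Safety struct {
	Frozen        bool
	inactive      bool
	inactiveSince time.Duration
}

// Observe applies one API server probe result at time offset now.
func (s *Safety) Observe(apiUp bool, now, timeout time.Duration) {
	switch {
	case apiUp:
		// Covers both Unfreeze and ClearInactiveTime.
		s.Frozen, s.inactive = false, false
	case s.Frozen:
		// Still down: stay frozen and requeue.
	case !s.inactive:
		s.inactive, s.inactiveSince = true, now // SetInactiveTime
	case now-s.inactiveSince > timeout:
		s.Frozen = true // Timeout Exceeded
	}
}

func main() {
	fmt.Println(orphanVMs([]string{"vm-1", "vm-2"}, []string{"vm-1"}))

	s := &Safety{}
	s.Observe(false, 0, 10*time.Minute)
	s.Observe(false, 11*time.Minute, 10*time.Minute)
	fmt.Println("frozen:", s.Frozen)
}
```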

MachineSet Controller

Core Reconciliation:
- Validates MachineSet specifications
- Manages finalizers for proper cleanup
- Implements machine ownership through controller references
- Synchronizes node templates and configurations

Replica Management:
- Implements sophisticated scaling logic:
  - Slow-start batching for scale-up operations
  - Prioritized scale-down based on machine health
- Handles stale machine cleanup
- Maintains desired replica count
- Updates status to reflect current state
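
Slow-start batching creates machines in growing batches so that a systematic failure is detected after a few calls, not after hundreds. The Go sketch below follows the doubling pattern known from the Kubernetes ReplicaSet controller; whether MCM uses the same batch sizes is an assumption, and `slowStartBatch` and `errCreate` are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errCreate = errors.New("create failed")

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// slowStartBatch calls fn count times in batches of initialBatch, then
// double that, and so on. It stops after the first batch that reports an
// error and returns the number of successful calls.
func slowStartBatch(count, initialBatch int, fn func() error) (int, error) {
	remaining, successes := count, 0
	for batch := minInt(remaining, initialBatch); batch > 0; batch = minInt(2*batch, remaining) {
		errCh := make(chan error, batch)
		var wg sync.WaitGroup
		wg.Add(batch)
		for i := 0; i < batch; i++ {
			go func() {
				defer wg.Done()
				if err := fn(); err != nil {
					errCh <- err
				}
			}()
		}
		wg.Wait()
		successes += batch - len(errCh)
		if len(errCh) > 0 {
			return successes, <-errCh
		}
		remaining -= batch
	}
	return successes, nil
}

func main() {
	n, err := slowStartBatch(7, 1, func() error { return nil })
	fmt.Println(n, err)
	n, err = slowStartBatch(7, 1, func() error { return errCreate })
	fmt.Println(n, err)
}
```

With seven machines and an initial batch of one, the batches are 1, 2, and 4; if every creation fails, only the first call is made.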

---
  config:
    layout: elk
---
stateDiagram-v2
    state "MachineSet Controller" as MSC {
        state "Sync MachineSet<br/>NodeTemplate<br/>to Machine" as SyncNodeTemplates
        state "Sync MachineSet<br/>MachineConfiguration<br/>to Machine" as SyncMachineConfig
        state "Sync MachineSet<br/>MachineClass.Kind<br/>to Machine" as SyncMachineKind

        [*] --> FetchMachineSet
        FetchMachineSet --> ValidateSpec
        ValidateSpec --> AddFinalizers : Deletion Not Requested
        
        AddFinalizers --> ClaimMachines
        
        state "ClaimMachines (Returns filtered machines)" as CM {
            [*] --> CreateControllerRefMgr
            CreateControllerRefMgr --> GetControllerRef
            GetControllerRef --> Orphan : Nil<br/>(No Owner)
            GetControllerRef --> CheckUID : Not Nil<br/>(Owner Exists)
            
            CheckUID --> Ignore : Mismatch<br/>(Wrong Owner)
            CheckUID --> MatchSelector : UID Same
            Orphan --> CheckDeletion
            CheckDeletion --> SelectorMatch : No Deletion
            SelectorMatch --> AdoptOrphan : Selector Match
            
            MatchSelector --> KeepClaim : Selector Match<br/>Already Owned
            MatchSelector --> DeletionCheck : Selector Mismatch
            DeletionCheck --> AttemptRelease : No Deletion

            KeepClaim --> AddToClaimed
            AdoptOrphan --> AddToClaimed
            AttemptRelease --> RemoveFromClaimed
        }
        
        ClaimMachines --> SyncNodeTemplates
        SyncNodeTemplates --> SyncMachineConfig
        SyncMachineConfig --> SyncMachineKind
        SyncMachineKind --> CheckFilteredMachines : Deletion Requested
        SyncMachineKind --> ManageReplicas : No Deletion

        CheckFilteredMachines --> RemoveFinalizers : Zero Owned Machines
        CheckFilteredMachines --> CheckFinalizerPresent : Backed Machines
        CheckFinalizerPresent --> TerminateMachines
        RemoveFinalizers --> UpdateStatus
        TerminateMachines --> UpdateStatus

        state "ManageReplicas" as MR {
            [*] --> CheckMachinePhase
            CheckMachinePhase --> ActiveMachines : Phase<br/>NotFailedOrTerminating
            CheckMachinePhase --> StaleMachines : PhaseFailed

            ActiveMachines --> CheckDiff
            StaleMachines --> TerminateStale
            TerminateStale --> CheckDiff
            
            CheckDiff --> ScaleUp : ActiveMachines<br/>Less than<br/>Replica Count
            CheckDiff --> ScaleDown : ActiveMachines<br/>More than<br/>Replica Count
            
            ScaleUp --> NotFrozenAnd<br/>NotToBeDeleted
            NotFrozenAnd<br/>NotToBeDeleted --> SlowStartBatch : TODO Expectations
            SlowStartBatch --> CreateMachines
            
            ScaleDown --> SortMachines
            SortMachines --> DeleteExcess
        }
        
        ManageReplicas --> UpdateStatus
        UpdateStatus --> [*]
    }
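
The ClaimMachines sub-state is a decision table over owner reference, selector match, and deletion. A compact Go sketch with hypothetical names (`Machine`, `claim`); the single `Deleting` flag stands in for the "No Deletion" checks, since the diagram does not say whether they concern the machine or the MachineSet, and ignoring an owned, non-matching machine during deletion is an assumption.

```go
package main

import "fmt"

// Machine is a hypothetical view of the fields the claim logic inspects.
type Machine struct {
	OwnerUID        string // empty when there is no controller reference
	MatchesSelector bool
	Deleting        bool
}

// claim returns the action for one machine as seen by the MachineSet setUID.
func claim(m Machine, setUID string) string {
	switch {
	case m.OwnerUID == "":
		if !m.Deleting && m.MatchesSelector {
			return "Adopt" // orphan that matches the selector
		}
		return "Ignore"
	case m.OwnerUID != setUID:
		return "Ignore" // owned by another controller
	case m.MatchesSelector:
		return "Keep" // already owned and still matching
	case !m.Deleting:
		return "Release" // owned but no longer matching
	default:
		return "Ignore"
	}
}

func main() {
	fmt.Println(claim(Machine{MatchesSelector: true}, "ms-1"))
	fmt.Println(claim(Machine{OwnerUID: "ms-1"}, "ms-1"))
}
```

Only machines that end up as "Adopt" or "Keep" are passed on to the template sync and replica management steps.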

MachineDeployment Controller

Deployment Management:
- Handles multiple MachineSets for a deployment
- Maintains deployment history through revisions
- Supports pausing and resuming deployments
- Implements rollback functionality

Deployment Strategies:
- Recreate Strategy:
  - Scales down old MachineSets completely
  - Creates and scales up new MachineSet
  - Ensures clean cutover between versions
- Rolling Update Strategy:
  - Gradually scales up new MachineSet
  - Gradually scales down old MachineSets
  - Maintains availability during updates
  - Handles surge and unavailability constraints

Scaling Operations:
- Detects and handles scaling events
- Manages desired replica counts across MachineSets
- Updates annotations for autoscaler integration
- Ensures proper resource cleanup

---
  config:
    layout: elk
---
stateDiagram-v2
    state "TODO MachineDeployment Controller" as MDC {
        [*] --> FetchDeployment
        FetchDeployment --> LogFrozenOrTBD
        LogFrozenOrTBD --> ValidateSpec
        ValidateSpec --> CheckDeletion
        
        state "GetMachineSets" as GMS {
            [*] --> CreateControllerRefMgr
            CreateControllerRefMgr --> GetControllerRef
            GetControllerRef --> Orphan : Nil<br/>(No Owner)
            GetControllerRef --> CheckUID : Not Nil<br/>(Owner Exists)

            CheckUID --> Ignore : Mismatch<br/>(Wrong Owner)
            CheckUID --> MatchSelector : UID Same
            Orphan --> CheckDelete
            CheckDelete --> SelectorMatch : No Deletion
            SelectorMatch --> AdoptOrphan : Selector Match

            MatchSelector --> KeepClaim : Selector Match<br/>Already Owned
            MatchSelector --> DeletionCheck : Selector Mismatch
            DeletionCheck --> AttemptRelease : No Deletion

            KeepClaim --> AddToClaimed
            AdoptOrphan --> AddToClaimed
            AttemptRelease --> RemoveFromClaimed
        }
        
        CheckDeletion --> AddFinalizer : No Deletion
        AddFinalizer --> StatusUpdate
        StatusUpdate --> GetMachineSets

        GetMachineSets --> BuildMachineMap<br/>MSetUIDToMachines
        BuildMachineMap<br/>MSetUIDToMachines --> DeleteChk
        DeleteChk --> CheckPausedCond : No Deletion
        DeleteChk --> ProcessDeletion : Deletion Requested
        
        state "Process Deletion" as DC {
            [*] --> Exit : Finalizer<br/>NotPresent
            [*] --> RemoveFinalizers : NoBackingMS
            [*] --> TerminateMachineSets : BackingMS
            
            TerminateMachineSets --> SyncStatusOnly<br/>UpdateMcdStatus
            RemoveFinalizers --> Exit
        }

        state "Check Paused Condition" as CPC {
            [*] --> GetCondition<br/>TypeProcessing
            
            GetCondition<br/>TypeProcessing --> [*] : CondReason<br/>TimeOut
            GetCondition<br/>TypeProcessing --> ExistingPaused : CondReason<br/>Paused
            GetCondition<br/>TypeProcessing --> NotExistingPaused : Else
            
            NotExistingPaused --> Spec.Paused
            Spec.Paused --> SetPausedCondition : true

            ExistingPaused --> SpecPaused
            SpecPaused --> SetResumedCondition : False

            SetPausedCondition --> UpdateMcdStatus
            SetResumedCondition --> UpdateMcdStatus

            UpdateMcdStatus --> [*]
        }

        CheckPausedCond --> SetPrioAnnotation : TODO

        SetPrioAnnotation --> Sync : Spec.Paused true<br/>TODO
        SetPrioAnnotation --> CheckRollbackTo : Spec.Paused false
        
        state "Rollback" as RB {
            [*] --> FindRevision
            FindRevision --> FindMatchingMS : RollbackTo.Revision<br/>Present
            FindRevision --> ClearRollbackTo : No last revision

            FindMatchingMS --> Remove<br/>PreferNoSched<br/>Taint : MSRevisionAnnotation<br/>same as<br/>RollbackTo Revision
            FindMatchingMS --> ClearRollbackTo : NoMachineSetFound
            
            Remove<br/>PreferNoSched<br/>Taint --> UpdateMcdTemplate
            UpdateMcdTemplate --> UpdateMcdAnnotations : Copy MS template<br/>Remove label<br/>machine-template-hash

            UpdateMcdAnnotations --> ClearRollbackTo
            ClearRollbackTo --> EmitRollbackEvent
        }
        
        CheckRollbackTo --> Rollback : Rollback Requested
        CheckRollbackTo --> IsScalingEvent : No Rollback
        
        state "Is Scaling Event" as SC {
            [*] --> GetMS<br/>SyncRev
            GetMS<br/>SyncRev --> NotScaling : err
            GetMS<br/>SyncRev --> NotScaling : No New MS
            
            GetMS<br/>SyncRev --> CheckActiveMS : MS Replicas > 0
            CheckActiveMS --> ScalingEvent : NoActiveMS &<br/>MCD Replicas > 0<br/>(ScaleFromZero)
            
            CheckActiveMS --> GetMSDesiredReplica<br/>Annotation
            GetMSDesiredReplica<br/>Annotation --> ScalingEvent : Desired not equal<br/>to MCD Replicas

            CheckActiveMS --> NotScaling : NoActiveMS or<br/>Desired = MCD Replicas<br/>(For all active)
        }
        
        IsScalingEvent --> Sync : Scale Event
        IsScalingEvent --> DeployStrategy : No Scale Event

        state "Sync" as SN {
            [*] --> GetMS<br/>SyncRevision
            GetMS<br/>SyncRevision --> Scale
            Scale --> CleanMCD : Paused and<br/>No RollbackTo
            Scale --> SyncMCDStatus

            state "Find Active or Latest MS" as ALMS {
            [*] --> SortMS by CreationTime<br/>FilterActiveMS
            }

            state "TODO Scale" as SCC {
                state "ReplicasToAdd<br/>AllowedSize - AllMSReplicaCnt" as ReplicasToAdd
                
                [*] --> GetActiveOrLatestMS
                GetActiveOrLatestMS --> CheckActiveMSReplicas : not nil
                GetActiveOrLatestMS --> CheckNewMS<br/>Saturated

                CheckActiveMSReplicas --> FIXME : ActiveMSRep = mcdRep

                CheckNewMS<br/>Saturated --> ScaleDownOldMS : true
                CheckNewMS<br/>Saturated --> IsRollingUpdate : false

                IsRollingUpdate --> FilterActiveMS : true
                FilterActiveMS --> GetReplicaCount<br/>AllMS

                GetReplicaCount<br/>AllMS --> FindAllowedSize

                FindAllowedSize --> Zero : MCD Replicas <= 0
                FindAllowedSize --> McdReplicas+MaxSurge : MCD Replicas > 0

                Zero --> ReplicasToAdd
                McdReplicas+MaxSurge --> ReplicasToAdd

                ReplicasToAdd --> ScaleUp : more than 0
                ReplicasToAdd --> ScaleDown : < 0

                ScaleUp --> map[name]=NewRep : oldMS = Replicas
                ScaleUp --> map[name]=NewRep : newMS = Rep+RepToAdd
                
            }
        }

        state "TODO DeployStrategy" as DS {
            state "Recreate" as RC {
                [*] --> OldScaleDown
                OldScaleDown --> CreateNew
                CreateNew --> NewScaleUp
            }
            
            state "RollingUpdate" as RU {
                [*] --> ScaleUpNew
                [*] --> ScaleDownOld
                ScaleDownOld --> CleanupOld
            }
        }
        
        DeployStrategy --> UpdateStatus
        UpdateStatus --> [*]
    }
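
The FindAllowedSize and ReplicasToAdd states in the Scale sub-diagram amount to a short calculation. The Go sketch below uses a hypothetical function name and plain integers in place of the MachineDeployment spec fields.

```go
package main

import "fmt"

// replicasToAdd follows the FindAllowedSize and ReplicasToAdd states:
// the allowed size is 0 when the deployment wants no replicas, otherwise
// replicas + maxSurge; the result is the allowed size minus the replicas
// already requested across all MachineSets.
func replicasToAdd(mcdReplicas, maxSurge, allMSReplicas int32) int32 {
	var allowed int32
	if mcdReplicas > 0 {
		allowed = mcdReplicas + maxSurge
	}
	return allowed - allMSReplicas
}

func main() {
	fmt.Println(replicasToAdd(5, 1, 4)) // 2: room to scale up
	fmt.Println(replicasToAdd(3, 1, 6)) // -2: scale down
}
```

A positive result leads to ScaleUp and a negative one to ScaleDown, matching the two edges leaving ReplicasToAdd.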

Summary

The controllers share several cross-cutting mechanisms for error handling, resource protection, performance, and observability:

Error Handling:
- Categorizes errors into recoverable and non-recoverable
- Implements exponential backoff for retries
- Maintains error counters and conditions
- Updates status to reflect error states

Resource Protection:
- Uses finalizers to prevent premature deletion
- Implements owner references for proper garbage collection
- Maintains consistent state through careful status updates
- Handles race conditions through proper locking

Performance Considerations:
- Implements work queues for efficient processing
- Uses informers for efficient cache handling
- Batches operations when possible
- Implements rate limiting for API calls

Monitoring and Metrics:
- Tracks operation durations
- Records error counts and types
- Provides health metrics
- Implements proper logging for debugging

The entire system works together to provide:

- Reliable machine lifecycle management
- Proper cleanup of resources
- Scaling capabilities
- Rolling updates and rollbacks
- Protection against race conditions and API server issues
- Efficient resource utilization
- Proper monitoring and debugging capabilities

Together, the controllers keep machines converged on the desired state while handling the edge cases and failure scenarios shown in the diagrams above.
