The Long Way to Windows Container on Amazon EKS: Node Affinity

After dealing with the vpc-resource-controller, I could finally see the IIS page. But a running sample doesn’t prove much, so I put together a few deployment YAML files to see whether our own workloads would run.

To schedule a Windows workload correctly, we need to select nodes whose os label is set to windows. Otherwise, the pod could be scheduled onto a Linux worker node and just get stuck there.

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
          - key: beta.kubernetes.io/os
            operator: In
            values:
              - windows

I applied the YAML file, but the pod never started. It was really frustrating to see situations like this.

But we still had to know why, so:

$ k describe pod <poor-pod>
(...omitted)
Events:
  Type     Reason                  Age                  From                                                     Message
  ----     ------                  ----                 ----                                                     -------
  Normal   Scheduled               4m4s                 default-scheduler                                        Successfully assigned fake-namespace/fake-pod-name to ip-10-xx-xx-xx.ap-northeast-1.compute.internal
  Warning  FailedCreatePodSandBox  3m58s                kubelet, ip-10-xx-xx-xx.ap-northeast-1.compute.internal  Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "0a32725f52780fbb1099efee02d1da4523981e12be0987a694ba38acf48be829" network for pod "fake-pod-name": NetworkPlugin cni failed to set up pod "fake-pod-name_fake-namespace" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address, failed to clean up sandbox container "0a32725f52780fbb1099efee02d1da4523981e12be0987a694ba38acf48be829" network for pod "fake-pod-name": NetworkPlugin cni failed to teardown pod "fake-pod-name_fake-namespace" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address]
  Normal   SandboxChanged          8s (x16 over 3m55s)  kubelet, ip-10-xx-xx-xx.ap-northeast-1.compute.internal  Pod sandbox changed, it will be killed and re-created.

Hmm, the error NetworkPlugin cni failed to teardown pod "fake-pod-name_fake-namespace" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address looks a little bit familiar. Didn’t we fix a similar issue before?

Oddly enough, when I checked the earlier sample IIS workload, it was still working as usual. I had to start the Googling journey all over again.

A few hours later, I still had no idea. So I switched back to the tab with the Windows support documentation and checked its deployment manifest again.

I compared that YAML with my own, which I had lightly modified from the output of the following command:

$ k run <deployment-name> --image <image> --dry-run -o yaml

Was my manifest legit? Of course, otherwise it couldn’t have been applied in the first place.
Did I specify imagePullPolicy? No, it’s not needed.
Did I expose the containerPort? No, I don’t even need to expose any ports.
Did I need to use command? No, I just don’t need it.
Did I use a node selector? No, I used node affinity for scheduling.
Did I add anything extra? Yes, I had to add imagePullSecrets to pull the image from our private Docker registry.

The only difference left was the image. But when I logged in to that Windows EC2 instance and manually ran the container with docker run, it just worked. Besides, it was hardly the container’s fault, since the pod never came up.

So I tried adding the fields back one by one to see what would happen.

imagePullPolicy? Check. containerPort? command? No, I couldn’t even convince myself to add them; it simply made no sense.

Node selector? OK, let me just replace node affinity with a node selector, although the effect here should be basically the same.
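
The swap looks roughly like this. It is only a sketch, assuming the same beta.kubernetes.io/os label key as in the affinity rule above; it sits at the same level of the pod spec as the affinity block it replaces:

```yaml
# Sketch only: replaces the whole `affinity:` block in the pod template spec.
# Assumes the Windows nodes carry the same label used in the affinity rule above.
nodeSelector:
  beta.kubernetes.io/os: windows
```

A node selector is a plain map of label keys to values that must all match, so it cannot express operators such as In or NotIn. For a single os label, that makes no practical difference to where the scheduler places the pod.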

I then re-applied the deployment manifest.

And the pod started. 🤦‍♂️

Events:
  Type    Reason     Age   From                                                     Message
  ----    ------     ----  ----                                                     -------
  Normal  Scheduled  8s    default-scheduler                                        Successfully assigned fake-namespace/fake-pod-name to ip-10-xx-xx-xx.ap-northeast-1.compute.internal
  Normal  Pulled     7s    kubelet, ip-10-xx-xx-xx.ap-northeast-1.compute.internal  Container image "fake-registry/image:fake-tag" already present on machine
  Normal  Created    7s    kubelet, ip-10-xx-xx-xx.ap-northeast-1.compute.internal  Created container fake-container-name
  Normal  Started    6s    kubelet, ip-10-xx-xx-xx.ap-northeast-1.compute.internal  Started container fake-container-name

So, that was it. You can’t use node affinity for Windows pods… for now. You can only use a node selector.

To confirm this, I opened another support ticket. The support engineer verified the issue and reported it to the internal team.

Update 2020/01/11:

Apparently, this is a “common feature request”. And yet this information is nowhere to be seen.