The Long Way to Windows Container on Amazon EKS: Node Affinity
After dealing with the vpc-resource-controller, I could finally see the IIS page. But a running sample doesn’t prove much. So I wrapped up a few deployment YAML files to see whether our own workloads would work.
To schedule Windows workloads correctly, we need to select nodes whose os label is set to windows. Otherwise, a pod could be scheduled on a Linux worker node and get stuck there.
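For reference, a required node affinity rule that pins a pod to Windows nodes looks roughly like the fragment below. It is a sketch of a pod spec excerpt, not the exact manifest used in this post, and it assumes the node label key is kubernetes.io/os (nodes of that era also carried the older beta.kubernetes.io/os label, so check which one your setup expects).

```yaml
# Pod spec excerpt (e.g. under spec.template.spec of a Deployment).
# Assumption: Windows nodes carry the label kubernetes.io/os=windows.
spec:
  affinity:
    nodeAffinity:
      # Hard requirement: the scheduler only considers nodes matching this term.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                  - windows
```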
I applied the YAML file, but the pod never started. It’s really frustrating to run into situations like this.
But we still had to know why, so:
Hmm, the error
NetworkPlugin cni failed to teardown pod "fake-pod-name_fake-namespace" network: failed to parse Kubernetes args: pod does not have label vpc.amazonaws.com/PrivateIPv4Address
seems a little familiar. Didn’t we fix a similar issue before?
Oddly enough, when I checked the earlier IIS sample workload, it was still working as usual. So I had to start the Google journey all over again.
A few hours later, I still had no idea. So I switched to the Windows Support documentation tab and checked its deployment manifest again.
I compared that YAML with my own, which I had generated with the following command and then modified slightly:
- Was my manifest valid? Of course; otherwise it couldn’t have been applied in the first place.
- Did I specify imagePullPolicy? No, it isn’t needed.
- Did I expose the containerPort? No, I don’t even need to expose any ports.
- Did I need to use command? No, I simply don’t need it.
- Did I use a node selector? No, I used node affinity for scheduling.
- Did I add anything extra? Yes, I had to add imagePullSecrets to pull images from our private Docker registry.
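To make the comparison above concrete, here is a minimal sketch of the kind of Deployment being compared. The names, the registry host, and the Secret private-registry-cred are hypothetical placeholders, and the Secret is assumed to be of type kubernetes.io/dockerconfigjson. The scheduling constraint is deliberately left out, since that is the part under question.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-windows-app              # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-windows-app           # must match the template labels below
  template:
    metadata:
      labels:
        app: my-windows-app
    spec:
      # Credentials for the private registry (hypothetical Secret name).
      imagePullSecrets:
        - name: private-registry-cred
      containers:
        - name: app
          # Hypothetical private Windows image; no command, ports, or pull policy set.
          image: registry.example.com/my-windows-app:latest
      # Scheduling constraint (node affinity or nodeSelector) omitted here.
```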
The only remaining difference was the image. But when I logged in to that Windows EC2 instance and ran the container manually with docker run, it just worked. Besides, it could hardly be the container’s fault, since the pod never came up.
So I added the fields back one by one to see what would happen.
command? No, I can’t even convince myself to add it; it simply makes no sense here.
Node selector? OK, let me just replace node affinity with a node selector, even though the effect here is basically the same.
I then re-applied the deployment manifest.
And the pod started. 🤦‍♂️
So, that was it. You can’t use node affinity… for now. You can only use a node selector.
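The working variant simply swaps the affinity block for a plain nodeSelector. Below is a minimal sketch of that pod spec excerpt, again assuming the kubernetes.io/os label key (older setups may expect beta.kubernetes.io/os instead).

```yaml
# Pod spec excerpt (e.g. under spec.template.spec of a Deployment).
# Assumption: Windows nodes carry the label kubernetes.io/os=windows.
spec:
  nodeSelector:
    kubernetes.io/os: windows
```

For a single required In match on one value, both forms select the same set of nodes, which is why swapping one for the other looked like a no-op at first.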
To confirm this, I opened another support ticket. The support engineer verified the issue and reported it to the internal team.
Apparently, this is a “common feature request”, and yet this limitation is nowhere to be found in the documentation.