
[MLOps] Kubernetes CKA Certification Study - Troubleshooting


Application Failure

# First, check whether the application is reachable at all
curl http://web-service-ip:node-port

# If that errors out, inspect the service users connect through
## Check that the service's Selector and its Endpoints correspond correctly
kubectl describe service web-service

# Check the pods
kubectl get pod            # see which pods are running
kubectl describe pod web   # inspect the details of a specific pod
kubectl logs web           # read the logs the pod prints
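
A quick way to spot a selector/endpoint mismatch (a sketch; the name web-service and the pod labels are assumptions):

# Compare the Service selector against the Pod labels
kubectl get svc web-service -o jsonpath='{.spec.selector}'
kubectl get pods --show-labels
kubectl get endpoints web-service   # an empty ENDPOINTS column usually means a selector mismatch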

Weak spot: the concepts of Port, TargetPort, NodePort (see the manifest sketch below)

  • NodePort - the port used to access the service from outside the cluster
  • Port - the port of the Service object itself, used inside the cluster
  • targetPort - the port used when forwarding requests from the Service object to the Pods (Deployment)
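
A minimal NodePort Service manifest tying the three ports together (a sketch; the names and port numbers are assumptions):

apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  type: NodePort
  selector:
    app: web           # must match the Pod labels, or the service has no endpoints
  ports:
    - port: 8080       # the Service's own port inside the cluster
      targetPort: 80   # the containerPort on the Pods that traffic is forwarded to
      nodePort: 30080  # exposed on every node; must be in the 30000-32767 range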

Control Plane Failure

# First, check the state of the nodes
kubectl get nodes

# Check the state of the pods (including the pods in the kube-system namespace)
kubectl get pods
kubectl get pods -n kube-system

# Check the control plane services (applies when they run as OS services rather than static pods)
service kube-apiserver status
service kube-controller-manager status
service kube-scheduler status
service kubelet status
service kube-proxy status

# Check the control plane service logs
kubectl logs kube-apiserver-master -n kube-system
## or: sudo journalctl -u kube-apiserver
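
If the apiserver itself is down, kubectl cannot fetch logs at all; on a kubeadm node you can fall back to the container runtime (a sketch, assuming containerd with crictl installed):

crictl ps -a | grep kube-apiserver   # find the (possibly exited) apiserver container
crictl logs <container-id>           # read its logs directly from the runtime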

Usually, to modify the pods in the kube-system namespace,

  • you need to edit the /etc/kubernetes/manifests/[component name].yaml file
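
As a concrete picture (a sketch, assuming a kubeadm cluster):

# Static pod manifests for the control plane components live here
ls /etc/kubernetes/manifests/
# etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml
vi /etc/kubernetes/manifests/kube-scheduler.yaml
# kubelet watches this directory, so saving the file recreates the pod automatically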

If describe doesn't reveal the cause, make it a habit to dig further with the logs command!

Worker Node Failure

When you run kubectl describe node [node name], pay close attention to the Conditions section. Conditions set to False (e.g. MemoryPressure, DiskPressure) are normal; a condition stuck at Unknown, or a Ready condition that is False, indicates a problem.

SSHing into the problematic node to inspect its state directly is also a good approach.

# Check the state of the nodes
kubectl get nodes

# Inspect a specific node in detail
kubectl describe node worker-1
## In the Conditions section, a problem shows up as a Status of True or Unknown
## If there is no issue at all, the Ready condition is set to True

# Move over to the failing node and investigate with various commands
## Check CPU usage
top
## Check disk usage (free space on mounted filesystems)
df -h
## Check memory usage
free -h

# Check the kubelet status
service kubelet status
sudo journalctl -u kubelet

# Check whether the kubelet certificate has expired
openssl x509 -in /var/lib/kubelet/worker-1.crt -text
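
To check expiry at a glance, you can print only the validity window of the same certificate:

openssl x509 -in /var/lib/kubelet/worker-1.crt -noout -dates -issuer
# notBefore/notAfter show the validity window; -issuer shows which CA signed it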

Notes on mistakes made while solving practice problems

  • Most issues were on the kubelet side (kubelet failing to start, a wrong control plane port number, etc.)
  • Problems that involve fixing the kubelet config file
    • To find the control plane's port number, look for the --secure-port value in /etc/kubernetes/manifests/kube-apiserver.yaml (not entirely sure)
# node01 had a problem: only NetworkUnavailable showed False, the rest were Unknown
# Solved with the following commands
systemctl status kubelet.service    # check whether kubelet is alive
ssh node01 "service kubelet start"
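
A related fix that comes up in labs (a sketch, assuming a kubeadm node; the wrong port 6553 is a hypothetical value): kubelet's kubeconfig can point at the wrong apiserver port, which should match --secure-port from the manifest mentioned above.

grep server: /etc/kubernetes/kubelet.conf
#   server: https://controlplane:6553    <- hypothetical wrong port
grep secure-port /etc/kubernetes/manifests/kube-apiserver.yaml   # default is 6443
sed -i 's/6553/6443/' /etc/kubernetes/kubelet.conf
systemctl restart kubelet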

# Find the master node address and port with kubectl
kubectl cluster-info

Network Troubleshooting

Kubernetes resources for CoreDNS are (they can all be verified at once, as shown after the list):

  1. a service account named coredns
  2. cluster roles named coredns and kube-dns
  3. cluster role bindings named coredns and kube-dns
  4. a deployment named coredns
  5. a configmap named coredns
  6. a service named kube-dns
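
A quick way to confirm they all exist (a sketch; kubectl can take several resource types at once):

kubectl -n kube-system get sa,deploy,cm coredns
kubectl -n kube-system get svc kube-dns
kubectl get clusterrole,clusterrolebinding | grep -i -e coredns -e kube-dns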

Troubleshooting issues related to coreDNS

1. If you find CoreDNS pods in a pending state, first check that a network plugin is installed.
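
A quick check (a sketch; /etc/cni/net.d is the conventional CNI config path):

kubectl get pods -n kube-system -o wide   # CoreDNS stuck in Pending?
ls /etc/cni/net.d/                        # on the node: empty means no CNI config is installed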

2. CoreDNS pods are in a CrashLoopBackOff or Error state

If you have nodes that are running SELinux with an older version of Docker you might experience a scenario where the coredns pods are not starting. To solve that you can try one of the following options:

a) Upgrade to a newer version of Docker.

b) Disable SELinux.

c) Modify the coredns deployment to set allowPrivilegeEscalation to true:

 

kubectl -n kube-system get deployment coredns -o yaml | \
  sed 's/allowPrivilegeEscalation: false/allowPrivilegeEscalation: true/g' | \
  kubectl apply -f -

d) Another cause for CoreDNS to have CrashLoopBackOff is when a CoreDNS Pod deployed in Kubernetes detects a loop.

 

  There are many ways to work around this issue, some are listed here:

 

  • Add the following to your kubelet config yaml: resolvConf: <path-to-your-real-resolv-conf-file> This flag tells kubelet to pass an alternate resolv.conf to Pods. For systems using systemd-resolved, /run/systemd/resolve/resolv.conf is typically the location of the "real" resolv.conf, although this can be different depending on your distribution (see the sketch after this list).
  • Disable the local DNS cache on host nodes, and restore /etc/resolv.conf to the original.
  • A quick fix is to edit your Corefile, replacing forward . /etc/resolv.conf with the IP address of your upstream DNS, for example forward . 8.8.8.8. But this only fixes the issue for CoreDNS, kubelet will continue to forward the invalid resolv.conf to all default dnsPolicy Pods, leaving them unable to resolve DNS.
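For the first workaround, the change is a single line in the kubelet config (a sketch, assuming a kubeadm node where the config lives at the default /var/lib/kubelet/config.yaml and the system runs systemd-resolved):

# /var/lib/kubelet/config.yaml  (path is the kubeadm default, an assumption)
resolvConf: /run/systemd/resolve/resolv.conf

Then restart kubelet (systemctl restart kubelet) so newly created Pods receive the real resolv.conf.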

3. If the CoreDNS pods and the kube-dns service are working fine, check that the kube-dns service has valid endpoints.

kubectl -n kube-system get ep kube-dns

If there are no endpoints for the service, inspect the service and make sure it uses the correct selectors and ports.
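
To compare the selector against the pod labels (a sketch; k8s-app=kube-dns is the conventional kubeadm label on the CoreDNS pods):

kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.selector}'
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide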

 

 

Kube-Proxy


kube-proxy is a network proxy that runs on each node in the cluster. kube-proxy maintains network rules on nodes. These network rules allow network communication to the Pods from network sessions inside or outside of the cluster.

 

In a cluster configured with kubeadm, you can find kube-proxy as a daemonset.

 

kube-proxy is responsible for watching services and the endpoints associated with each service. When a client connects to a service through its virtual IP, kube-proxy is responsible for routing that traffic to the actual pods.
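
You can see the effect of this on a node (a sketch, assuming kube-proxy runs in iptables mode; the service name is an assumption):

# kube-proxy programs NAT rules that map the service's virtual IP to pod IPs
iptables -t nat -L KUBE-SERVICES -n | grep web-service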

 

If you run kubectl describe ds kube-proxy -n kube-system, you can see that the kube-proxy binary runs with the following command inside the kube-proxy container.

 

Command:
  /usr/local/bin/kube-proxy
  --config=/var/lib/kube-proxy/config.conf
  --hostname-override=$(NODE_NAME)

 

So it fetches its configuration from a configuration file, i.e. /var/lib/kube-proxy/config.conf, and the hostname is overridden with the name of the node on which the pod is running.

 

In the config file we define the clusterCIDR, the kube-proxy mode (ipvs or iptables), the bind address, the kubeconfig path, etc.
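
A sketch of what such a config file can look like (the field values are illustrative, not taken from a real cluster):

# /var/lib/kube-proxy/config.conf (KubeProxyConfiguration)
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
clusterCIDR: 10.244.0.0/16        # pod network range (illustrative)
mode: iptables                    # or ipvs
bindAddress: 0.0.0.0
clientConnection:
  kubeconfig: /var/lib/kube-proxy/kubeconfig.conf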

 

Troubleshooting issues related to kube-proxy

1. Check that the kube-proxy pod in the kube-system namespace is running.

2. Check the kube-proxy logs.

3. Check that the configmap is correctly defined and that the config file for the running kube-proxy binary is correct.

4. Check that the kubeconfig is defined in the config map (commands for steps 1-4 are sketched after the output below).

5. Check that kube-proxy is running inside the container:

# netstat -plan | grep kube-proxy
tcp 0 0 0.0.0.0:30081 0.0.0.0:* LISTEN 1/kube-proxy
tcp 0 0 127.0.0.1:10249 0.0.0.0:* LISTEN 1/kube-proxy
tcp 0 0 172.17.0.12:33706 172.17.0.12:6443 ESTABLISHED 1/kube-proxy
tcp6 0 0 :::10256 :::*
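
Steps 1-4 can be run from the control plane node (a sketch, assuming kubeadm defaults such as the k8s-app=kube-proxy label):

kubectl -n kube-system get pods -l k8s-app=kube-proxy    # step 1: pod running?
kubectl -n kube-system logs ds/kube-proxy                # step 2: logs (picks one pod of the daemonset)
kubectl -n kube-system describe configmap kube-proxy     # steps 3-4: config.conf and kubeconfig contents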