SadServers DevOps Challenge Solutions, Part 1


As an associate system administrator, I worked on Red Hat Linux servers, handling user management, permissions, services, and performance monitoring. I automated routine administrative tasks using Bash scripting and cron jobs, reducing manual effort by ~30%. I am an AWS Certified SysOps Administrator and a Google Certified Cloud Engineer, determined to transition my career into a cloud architect or cloud support role.


Intro:

SadServers is a LeetCode-style platform where system administrators, DevOps engineers, and SREs solve real troubleshooting problems on free and paid virtual servers. I attempted both free and paid challenges to practise for mock interviews, and here is a write-up of my solutions:

1. Minimize disk-filling process: Identify and terminate the process writing to /var/log/bad.log:

lsof /var/log/bad.log
# Output shows PID 621 (badlog.py), kill it:
kill 621
# Or as one-liner:
kill $(lsof -t /var/log/bad.log)
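On a server where lsof is not installed, the same lookup can be approximated by walking /proc. This is only a minimal sketch, assuming a Linux /proc filesystem and enough privileges to read other processes' fd directories; the function name pids_with_open_file is hypothetical and not part of the challenge.

```shell
# Hypothetical helper: print the PIDs that hold the given file open,
# by comparing each /proc/<pid>/fd symlink with the resolved target path.
pids_with_open_file() (
    target=$(readlink -f "$1") || exit 1
    for fd in /proc/[0-9]*/fd/*; do
        # Entries we cannot read (other users' processes) are skipped silently.
        [ "$(readlink "$fd" 2>/dev/null)" = "$target" ] || continue
        pid=${fd#/proc/}      # strip the leading "/proc/"
        echo "${pid%%/*}"     # keep only the PID component
    done | sort -un
)
```

Used like the one-liner above: kill $(pids_with_open_file /var/log/bad.log).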

2. Find secret number hidden in .txt files: Count "Alice" occurrences:

grep Alice -Hnc *.txt | cut -d: -f2 | python3 -c "import sys; print(sum(int(l) for l in sys.stdin))"

Find the file with exactly one "Alice" line and extract the number on the following line:

grep Alice -Hnc *.txt | grep -Ee :1$ | cut -d: -f1 | xargs -I{} sh -c 'grep -A1 Alice {} | grep -oEe "[0-9]+"'

Concatenate results:

PART1=$(...)  # as above
PART2=$(...)
echo -n "${PART1}${PART2}" > /home/admin/solution
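To see how the two parts fit together, here is a small end-to-end sketch on made-up sample files (not the real challenge data). Note that grep -c counts matching lines rather than individual occurrences, which I assume is what the challenge intends. The names demo, ONCE, and SOLUTION are mine.

```shell
# Build a scratch directory with made-up sample files.
demo=$(mktemp -d)
printf 'Alice was here\n12 rabbits\nAlice again\n' > "$demo/a.txt"
printf 'no match\nAlice alone\n7 days later\n'     > "$demo/b.txt"

# PART1: total number of lines mentioning Alice across all .txt files.
PART1=$(cat "$demo"/*.txt | grep -c Alice)

# File in which exactly one line mentions Alice (grep -c prints file:count).
ONCE=$(grep -c Alice "$demo"/*.txt | grep ':1$' | cut -d: -f1)

# PART2: first number on the line right after that single match.
PART2=$(grep -A1 Alice "$ONCE" | tail -n 1 | grep -oE '[0-9]+' | head -n 1)

SOLUTION="${PART1}${PART2}"
echo "$SOLUTION"    # 3 matching lines, then the number 7
rm -rf "$demo"
```

On the real server the last step would write the value to /home/admin/solution instead of printing it.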

3. Find IP with most requests in access.log:

grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}' access.log | sort | uniq -c | sort -n | tail -1 | awk '{print $2}' > /home/admin/highestip.txt
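grep -o picks up every IP-looking string on a line, not just the client address. If the log is in the usual common or combined format, where the client IP is the first field (an assumption on my part), counting that field with awk is a possible alternative. The helper name top_ip and the sample log lines are made up for illustration.

```shell
# Hypothetical helper: print the most frequent first field (client IP) of a log.
top_ip() {
    awk '{ hits[$1]++ }
         END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -k1,1nr -k2,2 | head -n 1 | awk '{ print $2 }'
}

# Made-up sample log for demonstration only.
log=$(mktemp)
printf '%s\n' \
    '10.0.0.1 - - [01/Jan/2024] "GET / HTTP/1.1" 200' \
    '10.0.0.2 - - [01/Jan/2024] "GET /a HTTP/1.1" 200' \
    '10.0.0.1 - - [01/Jan/2024] "GET /b HTTP/1.1" 404' > "$log"
top_ip "$log"    # prints 10.0.0.1
rm -f "$log"
```

On the challenge server this would be top_ip access.log > /home/admin/highestip.txt.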

4. Fix web server (Apache) permission issue: Check permissions:

ls -lhA /var/www/html/
# Fix:
sudo chmod a+r /var/www/html/index.html
# Then verify:
curl localhost

If the local curl hangs, a firewall rule is probably dropping the traffic; flush the rules:

sudo iptables -F
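Before and after the chmod, it can help to check the permission bit directly instead of reading the ls output by eye. A minimal sketch, assuming GNU stat (stat -c) and that the web server reads the file as an "other" user; the helper name world_readable is hypothetical.

```shell
# Hypothetical helper: succeed if "others" have read permission on the file.
world_readable() {
    # stat -c %A prints e.g. "-rw-r--r--"; character 8 is the "others read" bit.
    perms=$(stat -c %A "$1") || return 2
    [ "$(printf '%s' "$perms" | cut -c8)" = "r" ]
}

# Demonstration on a scratch file rather than the real index.html.
page=$(mktemp)
chmod 600 "$page"
world_readable "$page" || echo "not readable by others"
chmod a+r "$page"
world_readable "$page" && echo "readable by others"
rm -f "$page"
```

The parent directories also need the execute bit for the web server user, which this sketch does not check.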

5. Fix Postgres disk full issue: Check disk usage:

df -h
# Remove large files:
rm -f /opt/pgdata/*.bk
# Restart PostgreSQL:
sudo systemctl restart postgresql
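df -h shows which filesystem is full, but not which files are responsible. One way to find them is to list the largest files under the data directory. This is a rough sketch that relies on GNU find's -printf; the helper name largest_files is mine.

```shell
# Hypothetical helper: list the N largest regular files under a directory
# as "<size in bytes> <path>", biggest first (N defaults to 5).
largest_files() {
    find "$1" -type f -printf '%s %p\n' 2>/dev/null | sort -rn | head -n "${2:-5}"
}

# Demonstration on a scratch directory with made-up files.
demo=$(mktemp -d)
head -c 2048 /dev/zero > "$demo/big.bk"
head -c 16 /dev/zero > "$demo/small.conf"
largest_files "$demo" 1    # prints the 2048-byte big.bk entry
rm -rf "$demo"
```

On the challenge server, something like sudo find-based listing of /opt/pgdata (for example largest_files /opt/pgdata 10, run as a user who can read the directory) should surface the .bk files.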

6. Fix Nginx starting error (configuration and open files): Remove invalid line:

sudo sed -i -e '1d' /etc/nginx/sites-enabled/default

Fix open files limit:

sudo sed -i -e '/LimitNOFILE/d' /etc/systemd/system/nginx.service
sudo systemctl daemon-reload
sudo systemctl restart nginx
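Since sed -i edits files in place, I find it safer to rehearse the deletions on scratch copies first and keep a backup. The sketch below uses made-up file contents standing in for the real config and unit file; only the two sed expressions come from the steps above.

```shell
# Scratch copies with made-up contents standing in for the real files.
work=$(mktemp -d)
printf ';\nserver {\n    listen 80;\n}\n' > "$work/default"
printf '[Service]\nLimitNOFILE=10\nExecStart=/usr/sbin/nginx\n' > "$work/nginx.service"

# Same edits as above, but keeping .bak backups and touching only the copies.
sed -i.bak -e '1d' "$work/default"
sed -i.bak -e '/LimitNOFILE/d' "$work/nginx.service"

# Show exactly what changed (diff exits non-zero when files differ).
diff "$work/default.bak" "$work/default" || true
diff "$work/nginx.service.bak" "$work/nginx.service" || true

# Clean up afterwards with: rm -rf "$work"
```

After applying the edits to the live files, sudo nginx -t validates the configuration before the restart.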

7. Enable Nginx web page:

sudo chmod a+r /var/www/html/index.html
sudo iptables -F

8. Set up Docker web app on port 8888: Update Dockerfile:

EXPOSE 8888
CMD ["node", "server.js"]

Rebuild:

sudo docker build -t app:latest .

Run container:

sudo docker run -d -p 8888:8888 app:latest
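EXPOSE only documents the port; it is the -p flag that actually publishes it, so the two are easy to get out of sync. As a quick sanity check, the ports declared in a Dockerfile can be listed and compared with the one passed to docker run. This is a text-level sketch, not a Dockerfile parser; the helper name exposed_ports and the sample Dockerfile are made up.

```shell
# Hypothetical helper: list the ports named in a Dockerfile's EXPOSE lines.
exposed_ports() {
    awk 'toupper($1) == "EXPOSE" {
             for (i = 2; i <= NF; i++) {
                 port = $i
                 sub(/\/.*/, "", port)   # drop a "/tcp" or "/udp" suffix
                 print port
             }
         }' "$1"
}

# Demonstration on a made-up scratch Dockerfile.
dockerfile=$(mktemp)
printf 'FROM scratch\nEXPOSE 8888\n' > "$dockerfile"
if exposed_ports "$dockerfile" | grep -qx 8888; then
    echo "EXPOSE matches the published port"
fi
rm -f "$dockerfile"
```

Once the container is running, curl localhost:8888 confirms that the app answers on the published port.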

9. Fix Kubernetes deployment for webapp: Update deployment.yml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-deployment
  namespace: web
spec:
  selector:
    matchLabels:
      app: webapp
  replicas: 1
  template:
    metadata:
      labels:
        app: webapp
    spec:
      containers:
      - name: webapp
        image: webapp
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8888

Update the service (nodeport.yml):

apiVersion: v1
kind: Service
metadata:
  name: webapp-service
  namespace: web
spec:
  type: LoadBalancer
  selector:
    app: webapp
  ports:
    - port: 8888
      targetPort: 8888
      nodePort: 30007

Apply:

kubectl apply -f deployment.yml -f nodeport.yml
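The two manifests above only work together if the Service's targetPort matches the container's containerPort. A rough way to compare them from the shell is sketched below. It is a plain text match on "key: value" lines, not a real YAML parser, and the helper name yaml_value is hypothetical.

```shell
# Hypothetical helper: print the value of the first "<key>: <value>" line
# in a file, ignoring leading indentation and list dashes.
yaml_value() {
    awk -v key="$1" '
        { line = $0; sub(/^[ -]+/, "", line) }
        index(line, key ":") == 1 {
            sub(/^[^:]*:[ ]*/, "", line)   # keep only the value part
            print line
            exit
        }' "$2"
}
```

For example, [ "$(yaml_value containerPort deployment.yml)" = "$(yaml_value targetPort nodeport.yml)" ] && echo "ports match". After applying, kubectl -n web get pods,svc shows whether the pod is running and the service exists.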

10. Run program requiring communication: If wtfit needs a service but the network is broken, run it with:

LD_LIBRARY_PATH=/lib/x86_64-linux-gnu ./wtfit

If chmod itself has lost its execute permission, run it through the dynamic loader to restore the bit:

sudo /lib/x86_64-linux-gnu/ld-2.31.so /usr/bin/chmod +x /usr/bin/chmod

That covers the key commands and fixes for each scenario.