
Linux Troubleshooting Interview Questions (With Answers)

Linux troubleshooting interviews separate the people who've actually fixed systems under pressure from the ones who've only read about it. Here's what those questions actually look like, and how to answer them well.


What interviewers are really testing

When a hiring manager asks you a Linux troubleshooting question, they're not quizzing you on man pages. They want to know if you have a mental model for attacking an unknown problem. Do you start from the hardware layer and work up? Do you check logs first or run commands first? Do you panic when df shows 100% disk usage, or do you immediately start narrowing it down?

That's the real test. The specific commands are almost secondary.

This guide covers the questions that actually show up in sysadmin, L2, L3, and Linux admin interviews, organized by topic. Each section includes the question as it's typically asked, a solid answer structure, and the reasoning behind what interviewers want to hear.

Basic Linux troubleshooting interview questions

Q: A Linux server is running slow. How do you approach it?

This is probably the most common opening question in any Linux ops interview. It sounds vague on purpose. The interviewer wants to see if you have a systematic approach or if you just start typing random commands.

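A strong answer goes like this: start by establishing a baseline. Is the slowness new, or has it always been there? Check the load averages with uptime or top. A load average well above the number of CPU cores is a red flag. Then narrow the bottleneck down to CPU, memory, disk I/O, or network:

CPU: top sorted by CPU usage (press P) shows which processes are burning cycles.
Memory: free -h and the si/so columns of vmstat 1 show whether the box is actively swapping.
Disk I/O: iostat -xz 1 shows per-device utilization and wait times, and iotop -o shows which processes are responsible.
Network: ip -s link shows interface errors and drops, and ss -tulnp shows what is listening.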

Say that out loud in the interview, step by step. The method matters as much as the commands.
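To make the load-average rule of thumb concrete, here is a minimal Python sketch. It reads the same 1-, 5- and 15-minute figures that uptime prints via os.getloadavg() (Unix only) and compares them with os.cpu_count(). The helper names load_is_high and check_load, and the factor threshold, are hypothetical choices for illustration, not a standard.

```python
import os

def load_is_high(load: float, cores: int, factor: float = 1.0) -> bool:
    """True if a load average exceeds `factor` times the number of cores.

    `factor` is an illustrative threshold, not a standard value.
    """
    return load > cores * factor

def check_load(factor: float = 1.0) -> dict:
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages (Unix only),
    # the same three numbers that `uptime` and `top` display.
    averages = os.getloadavg()
    cores = os.cpu_count() or 1
    return {
        "cores": cores,
        "load": averages,
        "high": [load_is_high(avg, cores, factor) for avg in averages],
    }

if __name__ == "__main__":
    print(check_load())
```

One caveat worth mentioning in the interview: on Linux the load average also counts tasks in uninterruptible (D-state) sleep, usually waiting on disk or NFS, so a high load with an idle CPU points you toward I/O rather than CPU.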

Q: Disk space is at 100%. What do you do?

First, confirm it with df -h and identify which filesystem is full. Then find the culprit by running du -sh /* 2>/dev/null | sort -rh | head -20 and repeating it inside the largest directory to walk down the tree (add -x to keep du on a single filesystem). Common causes: log files in /var/log, a runaway application writing to temp space, or large core dump files.
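The same top-down search can be scripted. This Python sketch is a rough equivalent of running du -sx --apparent-size on each entry of a directory and sorting the results. It sums file sizes rather than allocated blocks, so its numbers can differ from plain du. Like du -x, it skips anything on a different filesystem, and it silently ignores files it cannot read. The function names are hypothetical.

```python
import os

def _same_fs(path: str, dev: int) -> bool:
    """True if `path` still exists and lives on the filesystem with device id `dev`."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False

def tree_size(path: str) -> int:
    """Apparent size in bytes of everything under `path`, without crossing mounts."""
    dev = os.lstat(path).st_dev
    total = 0
    for dirpath, dirnames, filenames in os.walk(path, onerror=lambda err: None):
        # Prune subdirectories that sit on another filesystem, similar to `du -x`.
        dirnames[:] = [d for d in dirnames if _same_fs(os.path.join(dirpath, d), dev)]
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total

def largest_children(path: str, top: int = 20) -> list[tuple[int, str]]:
    """Return (bytes, path) for the biggest entries directly under `path`."""
    dev = os.lstat(path).st_dev
    sizes = []
    with os.scandir(path) as entries:
        for entry in entries:
            if not _same_fs(entry.path, dev):
                continue  # a different mount point such as /proc; skip it
            if entry.is_dir(follow_symlinks=False):
                sizes.append((tree_size(entry.path), entry.path))
            else:
                sizes.append((entry.stat(follow_symlinks=False).st_size, entry.path))
    sizes.sort(reverse=True)
    return sizes[:top]
```

Run it on the mount point df flagged, then call it again on the largest entry it reports, exactly as you would with du.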

One gotcha that separates experienced people from beginners: sometimes df shows a partition full but the du totals don't add up. That usually means deleted files are still held open by a running process. lsof +L1 (or lsof | grep deleted) lists them. Restarting the relevant process closes those file handles and lets the kernel reclaim the space.
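Under the hood, lsof finds these by reading /proc. Every open descriptor appears as a symlink under /proc/PID/fd, and the kernel appends " (deleted)" to the link target once the file has been unlinked. Here is a minimal, Linux-only sketch of that scan. deleted_open_files is a hypothetical helper, and you need to run it as root to see other users' processes.

```python
import os

def deleted_open_files(proc: str = "/proc") -> list[tuple[int, int, str, int]]:
    """List (pid, fd, target, size_bytes) for open files whose path was deleted."""
    found = []
    for pid in filter(str.isdigit, os.listdir(proc)):
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                if not target.endswith(" (deleted)"):
                    continue
                # os.stat follows the /proc link to the still-open file itself,
                # so st_size is the space that closing the descriptor would free.
                size = os.stat(link).st_size
            except OSError:
                continue  # descriptor closed while we were looking
            found.append((int(pid), int(fd), target, size))
    return found

if __name__ == "__main__":
    for pid, fd, target, size in sorted(deleted_open_files(), key=lambda r: -r[3]):
        print(f"pid={pid} fd={fd} size={size} {target}")
```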

Our guide on how to free up disk space on Linux goes into the full toolkit for this situation, including package cache cleanup and journal truncation.

Q: A user can't log in. Walk me through your process.

Start with the account itself. Is it locked? Check with passwd -S username. Is the home directory missing or owned by root? Is the shell in /etc/passwd wrong, or does it point to a binary that doesn't exist or isn't listed in /etc/shells? Is PAM misconfigured? The authentication log usually tells you which: /var/log/auth.log on Debian/Ubuntu, /var/log/secure on RHEL/CentOS.
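The shell and home-directory checks can be scripted with Python's standard pwd module. This is a minimal sketch with hypothetical helper names. It does not check lock status, because passwd -S needs shadow access, and it does not look at PAM at all.

```python
import os
import pwd

def read_valid_shells(path: str = "/etc/shells") -> set[str]:
    """Shells listed in /etc/shells, ignoring blank lines and comments."""
    shells = set()
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    shells.add(line)
    except FileNotFoundError:
        pass
    return shells

def account_problems(shell: str, home: str, uid: int, valid_shells: set[str]) -> list[str]:
    """Return human-readable problems with a user's shell and home directory."""
    problems = []
    if not (os.path.isfile(shell) and os.access(shell, os.X_OK)):
        problems.append(f"shell {shell} is missing or not executable")
    if shell not in valid_shells:
        problems.append(f"shell {shell} is not listed in /etc/shells")
    if not os.path.isdir(home):
        problems.append(f"home directory {home} does not exist")
    elif os.stat(home).st_uid != uid:
        problems.append(f"home directory {home} is not owned by uid {uid}")
    return problems

def check_user(username: str) -> list[str]:
    entry = pwd.getpwnam(username)  # raises KeyError if the user does not exist
    return account_problems(entry.pw_shell, entry.pw_dir, entry.pw_uid, read_valid_shells())
```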

If it's SSH specifically: check /etc/ssh/sshd_config for AllowUsers or DenyUsers directives, confirm the SSH daemon is running with systemctl status sshd, and look at the client side with ssh -vvv user@host for verbose output.
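The AllowUsers/DenyUsers logic can be walked through in code. This is a simplified sketch, not sshd's actual parser. It follows the documented processing order (DenyUsers before AllowUsers) and matches * and ? wildcards with fnmatch. It reduces USER@HOST patterns to their user part and ignores the group directives, Include files, and everything from the first Match block onward. On a real server, sshd -T prints the effective configuration and is the authoritative answer.

```python
import fnmatch

def ssh_user_allowed(config_text: str, user: str) -> tuple[bool, str]:
    """Approximate sshd's AllowUsers/DenyUsers decision for `user`."""
    allow, deny = [], []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Keywords are case-insensitive; this sketch accepts "Key value" or "Key=value".
        parts = line.replace("=", " ", 1).split()
        keyword = parts[0].lower()
        if keyword == "match":
            break  # everything after the first Match is conditional; ignored here
        if keyword == "allowusers":
            allow.extend(parts[1:])
        elif keyword == "denyusers":
            deny.extend(parts[1:])

    def matches(patterns: list[str]) -> bool:
        # Simplification: the HOST part of USER@HOST patterns is dropped.
        return any(fnmatch.fnmatchcase(user, p.split("@", 1)[0]) for p in patterns)

    if matches(deny):  # DenyUsers is evaluated first
        return False, "matched a DenyUsers pattern"
    if allow and not matches(allow):
        return False, "not matched by any AllowUsers pattern"
    return True, "not blocked by AllowUsers/DenyUsers"
```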

Linux booting issues interview questions

Q: A Linux server doesn't boot. How do you troubleshoot it?

This is where many candidates freeze. The answer depends on what stage the boot fails at, which is exactly what you should say first.

Booting has distinct stages: BIOS/UEFI, bootloader (usually GRUB), kernel loading, initramfs, then systemd/init. A blank screen with no BIOS output points to hardware. GRUB errors suggest a misconfigured bootloader or corrupted MBR/GPT. A kernel panic during initramfs usually means missing drivers or a broken initrd image. If systemd hangs, use systemctl list-units --state=failed after recovering into single-user mode.

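For a GRUB-level fix, boot from a live USB, mount the root filesystem, bind-mount /dev, /proc, and /sys into it, chroot in, and run grub-install followed by update-grub (grub2-install and grub2-mkconfig -o pointing at grub.cfg on RHEL-based systems). For a corrupted initramfs on RHEL-based systems, boot into rescue mode and run dracut -f, adding --kver with the installed kernel's version if the rescue kernel differs.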

Experienced interviewers follow up with: "What if you can't get to single-user mode either?" The answer: boot from external media, mount the root filesystem (say, at /mnt), and read the logs under /mnt/var/log, or point journalctl at the mounted journal with journalctl -D /mnt/var/log/journal.

Q: A systemd service keeps failing. What do you check?

systemctl status servicename gives you the last few log lines and the exit code. journalctl -u servicename -n 50 --no-pager gives you more context. Look at the ExecStart directive in the unit file: systemctl cat servicename shows it along with any drop-in overrides, and the files themselves live in /etc/systemd/system/ or /lib/systemd/system/ (/usr/lib/systemd/system/ on RHEL-based systems). Try running that command manually as the same user the service runs as. Permission errors and missing environment variables are the most common causes.
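To reproduce the failure by hand, you need the exact ExecStart line and the User= the service runs as. This hedged Python sketch pulls both from unit-file text, such as the output of systemctl cat servicename, and builds a sudo -u command. It is a simplification: configparser only approximates systemd's unit syntax. The -, +, ! and : ExecStart prefixes are stripped, but @ is not handled, and specifiers such as %i and $VARIABLES are not expanded. manual_run_command is a hypothetical name.

```python
import configparser
import shlex

def manual_run_command(unit_text: str) -> list[str]:
    """Build a command that re-runs a unit's ExecStart as the unit's User."""
    # strict=False lets later drop-in sections and keys override earlier ones.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # systemd keys are case-sensitive; keep them as written
    parser.read_string(unit_text)
    service = parser["Service"]
    exec_start = service.get("ExecStart", "").lstrip("-+!:")
    if not exec_start:
        raise ValueError("unit has no ExecStart")
    user = service.get("User", "root")  # services run as root when User= is unset
    return ["sudo", "-u", user, *shlex.split(exec_start)]
```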

Linux network troubleshooting interview questions and answers

Network questions are practically guaranteed at any level above basic helpdesk. Here's how to answer them.

Q: How do you troubleshoot Linux network issues?

This broad question covers most network scenarios, so it deserves a complete answer.

Start with the physical/link layer: is the interface up? ip link show tells you. If an interface shows DOWN, bring it up with ip link set eth0 up. Check for an IP address with ip addr show.

Can you reach the default gateway? ip route show displays the routing table. Ping the gateway. If that works but DNS fails, check /etc/resolv.conf for nameserver entries and test with dig google.com @8.8.8.8 to bypass the local resolver. If DNS works but a specific service is unreachable, use traceroute or mtr to find where packets drop.
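The first two steps, finding the default gateway and pinging it, are easy to script. This is a minimal Python sketch that assumes the iproute2 ip command and the iputils ping command are installed. default_gateways parses ip route show output, and both helper names are hypothetical.

```python
import subprocess

def default_gateways(ip_route_output: str) -> list[tuple[str, str]]:
    """Return (gateway, device) pairs for `default` routes in `ip route show` output."""
    routes = []
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "default" or "via" not in tokens:
            continue  # not a default route, or one without a gateway hop
        gateway = tokens[tokens.index("via") + 1]
        device = tokens[tokens.index("dev") + 1] if "dev" in tokens else ""
        routes.append((gateway, device))
    return routes

def gateway_reachable(timeout_s: int = 2) -> dict[str, bool]:
    """Ping each default gateway once; True means it answered."""
    out = subprocess.run(["ip", "route", "show"],
                         capture_output=True, text=True, check=True).stdout
    results = {}
    for gateway, _device in default_gateways(out):
        ping = subprocess.run(["ping", "-c", "1", "-W", str(timeout_s), gateway],
                              capture_output=True)
        results[gateway] = ping.returncode == 0
    return results
```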

For deeper socket-level inspection, ss -tulnp shows listening ports and which processes own them, and tcpdump -i eth0 port 80 lets you watch actual traffic. To check firewall rules, use iptables -L -n -v, or nft list ruleset on newer systems.

The folks at It's FOSS cover many of these networking commands in their command-line tutorials, which are worth bookmarking for reference on the commands you don't use every day.

Wikipedia's article on the OSI model is useful context here. Structured network troubleshooting typically works up the OSI layers from the bottom, and knowing the model helps you explain your reasoning to an interviewer without sounding like you're guessing.

Q: You can ping an IP but not a hostname. What's wrong?

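DNS. That's almost always the answer. Verify with dig hostname or nslookup hostname, but remember that both query DNS servers directly and ignore /etc/hosts. getent hosts hostname resolves the name the same way applications do. Check /etc/resolv.conf for valid nameserver entries, and check /etc/nsswitch.conf to confirm the resolution order (files, dns, and so on). Also check /etc/hosts for conflicting static entries.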
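Because dig and nslookup bypass /etc/hosts and nsswitch.conf, it helps to check the path applications actually use. In this minimal sketch, nss_hosts_order (a hypothetical helper) reads the hosts: line of /etc/nsswitch.conf. resolve_like_apps calls socket.getaddrinfo(), which on glibc systems goes through NSS the same way getent ahosts does.

```python
import socket

def nss_hosts_order(nsswitch_text: str) -> list[str]:
    """Return the lookup sources from the `hosts:` line of /etc/nsswitch.conf."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.startswith("hosts:"):
            # Drop action items such as [NOTFOUND=return]; keep source names only.
            return [tok for tok in line[len("hosts:"):].split() if not tok.startswith("[")]
    return []

def resolve_like_apps(hostname: str) -> list[str]:
    """Resolve through the C library resolver, as most applications do."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    with open("/etc/nsswitch.conf") as f:
        print("lookup order:", nss_hosts_order(f.read()))
    print("localhost ->", resolve_like_apps("localhost"))
```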

Q: A web server is running but port 80 is not accessible from outside.

Check three places, in this order. First, is the service bound only to localhost? Run ss -tlnp | grep :80 and look for 127.0.0.1:80 instead of 0.0.0.0:80 (or [::]:80 for IPv6). Second, is the host firewall blocking it? Check iptables -L -n, firewall-cmd --list-all, or ufw status, depending on the distro. Third, is a network-level firewall or security group upstream blocking it? That last one is common in cloud environments.
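The loopback-only check is easy to get wrong by eye when a service listens on several addresses. This minimal sketch parses ss -tln output and assumes the default column layout, with State as the first column. The helper names are hypothetical.

```python
import ipaddress

def listeners_on_port(ss_output: str, port: int) -> list[str]:
    """Return local addresses listening on `port`, parsed from `ss -tln` output."""
    addrs = []
    for line in ss_output.splitlines():
        tokens = line.split()
        if len(tokens) < 5 or tokens[0] != "LISTEN":
            continue  # header or non-listening line
        host, _, port_str = tokens[3].rpartition(":")
        if port_str == str(port):
            # Strip IPv6 brackets and any %interface suffix newer ss versions print.
            addrs.append(host.strip("[]").split("%", 1)[0])
    return addrs

def loopback_only(addresses: list[str]) -> bool:
    """True if something listens on the port, but only on loopback addresses."""
    if not addresses:
        return False
    for addr in addresses:
        if addr == "*":
            return False  # older ss versions print * for the wildcard address
        if not ipaddress.ip_address(addr).is_loopback:
            return False
    return True
```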

Linux performance troubleshooting interview questions

Q: How do you find what's causing high CPU usage?

Start with top sorted by CPU (press P). Note the process name and PID. If a single thread inside a multi-threaded application is the culprit, top -H -p PID shows per-thread CPU. For deeper analysis, perf top shows which functions, including kernel symbols, are hot. If system time (sy) is high relative to user time (us), the issue might be excessive context switching or interrupt handling, which vmstat 1 reveals in its cs (context switches) and in (interrupts) columns.
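The split between user, system, and I/O-wait time that top and vmstat report comes from cumulative counters on the aggregate cpu line of /proc/stat. This minimal, Linux-only sketch shows the same calculation: read the line twice and compare the deltas. The helper names are hypothetical. Guest time is left out because the kernel already includes it in user and nice.

```python
import time

FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")

def parse_cpu_line(stat_text: str) -> dict[str, int]:
    """Parse the aggregate `cpu` line of /proc/stat into cumulative tick counters."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            # guest and guest_nice (the fields after steal) are already counted in user/nice.
            return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))
    raise ValueError("no aggregate cpu line found")

def cpu_breakdown(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Percentage of elapsed CPU time spent in each state between two samples."""
    delta = {k: after.get(k, 0) - before.get(k, 0) for k in FIELDS}
    total = sum(delta.values()) or 1
    return {k: round(100.0 * v / total, 1) for k, v in delta.items()}

def sample_cpu(interval_s: float = 1.0) -> dict[str, float]:
    with open("/proc/stat") as f:
        first = parse_cpu_line(f.read())
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        second = parse_cpu_line(f.read())
    return cpu_breakdown(first, second)
```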

Q: Memory usage is high. How do you tell if it's actually a problem?

This catches a lot of candidates. Linux aggressively uses otherwise idle RAM for the page cache, so a low "free" figure in free -h is often not a problem at all. The "available" column is a better estimate of how much memory applications can still get. The real concern is swap activity. If the system is constantly paging in and out, performance tanks. Check the si and so columns in vmstat 1 (swap in and swap out). Sustained nonzero values there mean you have a real memory pressure problem.
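If you capture vmstat output during an incident, the swap check is easy to automate. This hedged sketch finds the si and so columns by header name rather than position, because column sets vary between vmstat versions. It drops the first data row, since vmstat reports averages since boot there. It then treats every remaining interval that shows swap traffic as sustained pressure. The threshold and helper names are illustrative assumptions.

```python
def swap_activity(vmstat_output: str) -> list[tuple[int, int]]:
    """Return (si, so) per interval from `vmstat <interval>` output.

    The first data row is skipped because it holds averages since boot.
    """
    header = None
    rows = []
    for line in vmstat_output.splitlines():
        tokens = line.split()
        if "si" in tokens and "so" in tokens:
            header = tokens  # the column-name line; may repeat in long captures
            continue
        if header and tokens and all(t.isdigit() for t in tokens):
            row = dict(zip(header, map(int, tokens)))
            rows.append((row["si"], row["so"]))
    return rows[1:]

def under_memory_pressure(pairs: list[tuple[int, int]], threshold: int = 0) -> bool:
    """True if every sampled interval shows swap-in or swap-out above `threshold`."""
    return bool(pairs) and all(si > threshold or so > threshold for si, so in pairs)
```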

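To find which processes are consuming the most memory: ps aux --sort=-%mem | head -15. Keep in mind that RSS (both the RSS column there and the VmRSS line in /proc/PID/status) counts shared library pages in full for every process that maps them, so summing it across processes overstates real usage. For a more accurate per-process view, read /proc/PID/smaps_rollup. Pss splits shared pages proportionally among the processes using them, and Private_Clean plus Private_Dirty is the memory that process alone holds.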
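For a per-process view that accounts for shared pages properly, /proc/PID/smaps_rollup reports Pss (proportional set size, with shared pages divided among the processes that map them) alongside Private_Clean and Private_Dirty. This minimal sketch reads those fields. It assumes a kernel new enough to provide smaps_rollup (Linux 4.14 and later). The helper names are hypothetical, and the parser also works on the kB lines of /proc/PID/status.

```python
def parse_kb_fields(text: str) -> dict[str, int]:
    """Parse 'Name:   1234 kB' lines (smaps_rollup, /proc/PID/status) into KiB values."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

def memory_summary(pid="self") -> dict[str, int]:
    """RSS, PSS and private memory (KiB) for a process, from /proc/PID/smaps_rollup."""
    with open(f"/proc/{pid}/smaps_rollup") as f:
        rollup = parse_kb_fields(f.read())
    return {
        "rss_kib": rollup.get("Rss", 0),
        "pss_kib": rollup.get("Pss", 0),
        "private_kib": rollup.get("Private_Clean", 0) + rollup.get("Private_Dirty", 0),
    }
```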

Our breakdown of how much RAM Linux actually needs puts these numbers in context for different server workloads.

Q: How do you identify a process causing high disk I/O?

iotop -o (the -o flag shows only processes actively doing I/O) is the fastest way. If iotop isn't installed, iostat -xz 1 shows device utilization and average wait times. A %util value close to 100% on a spinning disk means it's saturated. On SSDs and RAID arrays, which serve requests in parallel, 100% only means the device was busy for the whole interval, so check the await columns too. Then use lsof +D /mountpoint to find which processes have files open on that device.
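iostat's %util is derived from the io_ticks counter in /proc/diskstats (field 13: milliseconds the device spent with I/O in flight). This minimal sketch performs the same calculation, using hypothetical helper names. Like iostat, it shows how busy a device was, not how much more work it could absorb.

```python
import time

def io_ticks(diskstats_text: str) -> dict[str, int]:
    """Map device name -> milliseconds spent doing I/O (field 13 of /proc/diskstats)."""
    ticks = {}
    for line in diskstats_text.splitlines():
        parts = line.split()
        if len(parts) >= 14:
            ticks[parts[2]] = int(parts[12])
    return ticks

def utilization(before: dict[str, int], after: dict[str, int], interval_ms: float) -> dict[str, float]:
    """Percent of the interval each device had I/O in flight, the basis of %util."""
    return {
        dev: min(100.0, 100.0 * (after[dev] - before[dev]) / interval_ms)
        for dev in after if dev in before
    }

def sample_util(interval_s: float = 1.0) -> dict[str, float]:
    with open("/proc/diskstats") as f:
        first = io_ticks(f.read())
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        second = io_ticks(f.read())
    return utilization(first, second, interval_s * 1000)
```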

Scenario-based Linux troubleshooting interview questions

L3 and senior roles usually drop the direct "how do you" format and instead give you a scenario. These feel more like war stories than quiz questions.

Scenario: "We deployed code last night and now our app is returning 502 errors."

Walk through this live. Is the application process running? If it's a Python, Node, or Java app behind Nginx, check whether the upstream process is alive with systemctl status appname or ps aux | grep appname. Check /var/log/nginx/error.log for the specific upstream connection errors. Is the app listening on the expected port and address? ss -tlnp will tell you. Check the application's own logs for crash-on-startup errors. Did a config change break something? Review git log or the deployment diff.
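To answer "is anything listening where Nginx expects the upstream?" without guessing, attempt the TCP connection yourself. This is a minimal sketch. upstream_accepts is a hypothetical helper, and host and port should be whatever your Nginx proxy_pass points at.

```python
import socket

def upstream_accepts(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds, i.e. something is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False
```

A False here with the process still running usually means the app is bound to a different port or address than Nginx expects. A True while Nginx still returns 502 points back at the app's own responses or logs.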

Scenario: "Cron job isn't running."

Is the cron daemon running? Check with systemctl status cron (the unit is crond on RHEL-based systems). Check /var/log/syslog or /var/log/cron for any output. Is the crontab syntax correct? crontab -l lists the entries. Does the script have execute permissions? Does it use absolute paths for every command? Cron runs jobs with a minimal environment, so scripts that work interactively often fail under cron because PATH doesn't include /usr/local/bin or similar locations.
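To reproduce a cron-only failure interactively, run the job with a stripped-down environment. This is a hedged sketch. CRON_ENV is an assumption modeled on common cron defaults (PATH=/usr/bin:/bin, SHELL=/bin/sh), so check crontab(5) on your distribution for the exact values. The helper names are hypothetical.

```python
import os
import shutil
import subprocess

# Assumed cron-like environment; many cron implementations default to these values.
CRON_ENV = {
    "PATH": "/usr/bin:/bin",
    "SHELL": "/bin/sh",
    "HOME": os.path.expanduser("~"),
    "LOGNAME": os.environ.get("LOGNAME", os.environ.get("USER", "")),
}

def run_like_cron(command: str) -> subprocess.CompletedProcess:
    """Run `command` through /bin/sh with only the cron-like environment above."""
    return subprocess.run(["/bin/sh", "-c", command], env=CRON_ENV,
                          capture_output=True, text=True)

def missing_under_cron(commands: list[str]) -> list[str]:
    """Commands found on your interactive PATH but not on cron's PATH."""
    return [c for c in commands
            if shutil.which(c) and not shutil.which(c, path=CRON_ENV["PATH"])]
```

Feeding the command names from your script into missing_under_cron is a quick way to spot exactly which ones need absolute paths.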

What to study before a Linux admin interview

The Linux Foundation's LFCS certification curriculum is a solid proxy for what L2/L3 roles expect you to know. The exam objectives cover storage management, network configuration, service management, and security, which maps almost exactly to the troubleshooting categories interviewers use.

Performance engineer Brendan Gregg's Linux Performance site is the best single reference for performance troubleshooting methodology. His "USE Method" (for every resource, check Utilization, Saturation, and Errors) gives you a repeatable framework that impresses interviewers who know performance well.

For commands, you need genuine hands-on practice, not just reading. Set up a VM and deliberately break things: corrupt a bootloader, fill a disk, write a memory-leaking script. If you want a structured starting point, the guide to 30 essential Linux commands for beginners covers the core toolkit, though you'll want to push well beyond that list for any interview above entry level.
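For the memory-leaking exercise, something this small is enough. This hedged sketch is meant only for a disposable VM. It allocates memory in steps and never releases it, so with large values you can watch free -h, vmstat 1 and ps aux --sort=-%mem react, and eventually see the OOM killer show up in dmesg. The step sizes are arbitrary.

```python
import time

def leak(step_mib: int = 10, steps: int = 50, pause_s: float = 0.5) -> list[bytearray]:
    """Allocate `step_mib` MiB per step and keep every block alive."""
    hoard = []
    for _ in range(steps):
        # Repeating a non-zero byte writes every page, so the memory is really
        # resident rather than lazily mapped zero pages.
        hoard.append(bytearray(b"\xab") * (step_mib * 1024 * 1024))
        time.sleep(pause_s)
    return hoard

if __name__ == "__main__":
    leak(step_mib=50, steps=1000)  # raise until the VM starts swapping
```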

One more thing: interviewers remember candidates who say "I don't know, but here's how I'd figure it out" far better than candidates who bluff. A clear methodology with a gap in command knowledge is fixable. Confident wrong answers are a red flag. Go into the interview willing to think out loud, because that's exactly what the job actually looks like.