Proxmox – PVE & PBS Diagnostic Scripts

Solving Intermittent Proxmox Backup Server Disconnections: An MTU Mismatch Story

Recently, I upgraded my Proxmox Backup Server (PBS) to support 10GbE networking. What should have been a straightforward performance upgrade turned into a frustrating troubleshooting session when my Proxmox VE nodes started experiencing intermittent disconnections from the backup server. Here’s how I diagnosed and resolved the issue.

The Problem

After installing a new 10GbE network card (Broadcom NetXtreme II BCM57810) in my Proxmox Backup Server, I began experiencing:

  • ✅ Successful pings between PVE nodes and PBS
  • ❌ Intermittent inability to access PBS datastores from PVE nodes
  • ❌ PBS web interface occasionally unreachable
  • ❌ Backup jobs failing with “Connection refused” errors on port 8007
  • ❌ Error messages: “500 Can’t connect to 192.168.1.76:8007 (Connection refused)”

The most puzzling aspect was that basic connectivity appeared fine—ping worked in both directions, and all hosts were on the same VLAN.
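
The symptom pair is easy to reproduce from an affected PVE node (using the PBS address from my setup):

# Basic reachability looks fine...
ping -c 3 192.168.1.76

# ...but a real HTTPS request to the PBS API fails or hangs until the timeout
curl -k --max-time 10 https://192.168.1.76:8007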

Initial Suspicions

When introducing 10GbE networking, several potential culprits come to mind:

  • MTU mismatches (especially with jumbo frames)
  • Network driver issues with the new card
  • Hardware offloading problems
  • Firewall configuration
  • PBS service configuration issues

The Diagnostic Approach

Rather than guessing, I created comprehensive diagnostic scripts to gather all relevant information from both sides of the connection.

Diagnostic Script for Proxmox VE Nodes

This script collects network configuration, driver information, connectivity tests, and more:

#!/bin/bash
# PVE Network Diagnostics Script
# Save as: pve-diagnostics.sh

OUTPUT_FILE="pve-diagnostics-$(hostname)-$(date +%Y%m%d-%H%M%S).txt"

# Prompt for PBS IP
read -p "Enter PBS Server IP address: " PBS_IP

echo "==================================================================" | tee -a $OUTPUT_FILE
echo "Proxmox VE Network Diagnostics" | tee -a $OUTPUT_FILE
echo "Hostname: $(hostname)" | tee -a $OUTPUT_FILE
echo "Date: $(date)" | tee -a $OUTPUT_FILE
echo "PVE Version: $(pveversion)" | tee -a $OUTPUT_FILE
echo "Testing PBS at: $PBS_IP" | tee -a $OUTPUT_FILE
echo "==================================================================" | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

# Network Interface Information
echo "### NETWORK INTERFACES ###" | tee -a $OUTPUT_FILE
ip addr show | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

echo "### NETWORK INTERFACE CONFIGURATION ###" | tee -a $OUTPUT_FILE
cat /etc/network/interfaces | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

# MTU Information
echo "### MTU SETTINGS (Summary) ###" | tee -a $OUTPUT_FILE
ip link show | grep -E "^[0-9]+:|mtu" | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

# Routing Information
echo "### ROUTING TABLE ###" | tee -a $OUTPUT_FILE
ip route show | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

# DNS Configuration
echo "### DNS CONFIGURATION ###" | tee -a $OUTPUT_FILE
cat /etc/resolv.conf | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

# PBS Storage Configuration
echo "### PROXMOX STORAGE CONFIGURATION ###" | tee -a $OUTPUT_FILE
cat /etc/pve/storage.cfg | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

echo "### PBS STORAGE STATUS ###" | tee -a $OUTPUT_FILE
pvesm status | grep -i pbs | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

# Network Driver Information
echo "### NETWORK DRIVERS AND HARDWARE ###" | tee -a $OUTPUT_FILE
for iface in $(ls /sys/class/net/ | grep -v lo); do
    echo "Interface: $iface" | tee -a $OUTPUT_FILE
    ethtool $iface 2>/dev/null | grep -E "Speed:|Duplex:|Link detected:" | tee -a $OUTPUT_FILE
    echo "Driver info:" | tee -a $OUTPUT_FILE
    ethtool -i $iface 2>/dev/null | tee -a $OUTPUT_FILE
    echo "" | tee -a $OUTPUT_FILE
done

# Hardware Offload Settings
echo "### HARDWARE OFFLOAD SETTINGS ###" | tee -a $OUTPUT_FILE
for iface in $(ls /sys/class/net/ | grep -v lo); do
    echo "Interface: $iface" | tee -a $OUTPUT_FILE
    ethtool -k $iface 2>/dev/null | grep -E "tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload" | tee -a $OUTPUT_FILE
    echo "" | tee -a $OUTPUT_FILE
done

# Connectivity Tests to PBS
echo "### CONNECTIVITY TESTS TO PBS ($PBS_IP) ###" | tee -a $OUTPUT_FILE

echo "Standard Ping Test:" | tee -a $OUTPUT_FILE
ping -c 4 $PBS_IP | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

echo "MTU Test (Standard - 1472 bytes):" | tee -a $OUTPUT_FILE
ping -M do -s 1472 -c 3 $PBS_IP 2>&1 | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

echo "MTU Test (Jumbo Frames - 8972 bytes):" | tee -a $OUTPUT_FILE
ping -M do -s 8972 -c 3 $PBS_IP 2>&1 | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

echo "TCP Port 8007 Test (PBS API):" | tee -a $OUTPUT_FILE
if timeout 5 bash -c "echo > /dev/tcp/$PBS_IP/8007" 2>/dev/null; then
    echo "Port 8007 is OPEN" | tee -a $OUTPUT_FILE
else
    echo "Port 8007 is CLOSED or FILTERED" | tee -a $OUTPUT_FILE
fi
echo "" | tee -a $OUTPUT_FILE

# Network Statistics
echo "### NETWORK INTERFACE STATISTICS ###" | tee -a $OUTPUT_FILE
for iface in $(ls /sys/class/net/ | grep -v lo); do
    echo "Interface: $iface" | tee -a $OUTPUT_FILE
    ip -s link show $iface | tee -a $OUTPUT_FILE
    echo "" | tee -a $OUTPUT_FILE
done

# Recent Kernel Messages
echo "### RECENT KERNEL MESSAGES (NETWORK) ###" | tee -a $OUTPUT_FILE
dmesg | grep -i -E "ethernet|network|link|mtu" | tail -50 | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

echo "==================================================================" | tee -a $OUTPUT_FILE
echo "Diagnostics complete. Output saved to: $OUTPUT_FILE" | tee -a $OUTPUT_FILE
echo "==================================================================" | tee -a $OUTPUT_FILE

Diagnostic Script for Proxmox Backup Server

A similar comprehensive script for the PBS side:

#!/bin/bash
# PBS Network Diagnostics Script
# Save as: pbs-diagnostics.sh

OUTPUT_FILE="pbs-diagnostics-$(hostname)-$(date +%Y%m%d-%H%M%S).txt"

echo "==================================================================" | tee -a $OUTPUT_FILE
echo "Proxmox Backup Server Network Diagnostics" | tee -a $OUTPUT_FILE
echo "Hostname: $(hostname)" | tee -a $OUTPUT_FILE
echo "Date: $(date)" | tee -a $OUTPUT_FILE
echo "PBS Version: $(proxmox-backup-manager version 2>/dev/null || echo 'N/A')" | tee -a $OUTPUT_FILE
echo "==================================================================" | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

# Network Interface Information
echo "### NETWORK INTERFACES ###" | tee -a $OUTPUT_FILE
ip addr show | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

echo "### NETWORK INTERFACE CONFIGURATION ###" | tee -a $OUTPUT_FILE
cat /etc/network/interfaces | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

# MTU Information
echo "### MTU SETTINGS (Summary) ###" | tee -a $OUTPUT_FILE
ip link show | grep -E "^[0-9]+:|mtu" | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

# Network Driver Information
echo "### NETWORK DRIVERS AND HARDWARE ###" | tee -a $OUTPUT_FILE
for iface in $(ls /sys/class/net/ | grep -v lo); do
    echo "Interface: $iface" | tee -a $OUTPUT_FILE
    ethtool $iface 2>/dev/null | grep -E "Speed:|Duplex:|Link detected:" | tee -a $OUTPUT_FILE
    echo "Driver info:" | tee -a $OUTPUT_FILE
    ethtool -i $iface 2>/dev/null | tee -a $OUTPUT_FILE
    echo "" | tee -a $OUTPUT_FILE
done

# PCI Network Card Info
echo "### PCI NETWORK CARDS ###" | tee -a $OUTPUT_FILE
lspci | grep -i ethernet | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

# PBS Service Status
echo "### PBS SERVICE STATUS ###" | tee -a $OUTPUT_FILE
systemctl status proxmox-backup --no-pager | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE
systemctl status proxmox-backup-proxy --no-pager | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

# Active Network Connections
echo "### ACTIVE NETWORK CONNECTIONS (Port 8007) ###" | tee -a $OUTPUT_FILE
ss -antp | grep 8007 | tee -a $OUTPUT_FILE
echo "" | tee -a $OUTPUT_FILE

# Network Statistics
echo "### NETWORK INTERFACE STATISTICS ###" | tee -a $OUTPUT_FILE
for iface in $(ls /sys/class/net/ | grep -v lo); do
    echo "Interface: $iface" | tee -a $OUTPUT_FILE
    ip -s link show $iface | tee -a $OUTPUT_FILE
    echo "" | tee -a $OUTPUT_FILE
done

echo "==================================================================" | tee -a $OUTPUT_FILE
echo "Diagnostics complete. Output saved to: $OUTPUT_FILE" | tee -a $OUTPUT_FILE
echo "==================================================================" | tee -a $OUTPUT_FILE

Running the Diagnostics

I executed both scripts and analyzed the output:

# On PBS:
chmod +x pbs-diagnostics.sh
./pbs-diagnostics.sh

# On each PVE node:
chmod +x pve-diagnostics.sh
./pve-diagnostics.sh
# Enter PBS IP when prompted: 192.168.1.76
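
Once the reports exist on both sides, a mismatch can often be spotted without reading the full files. One way, relying on the filename pattern the scripts use:

# Copy the reports to one machine, then compare MTU lines side by side
grep -h "mtu " pve-diagnostics-*.txt | sort -u
grep -h "mtu " pbs-diagnostics-*.txt | sort -u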

The Smoking Gun: MTU Mismatch Revealed

The diagnostic output revealed the issue immediately:

From PVE Node (pve3):

MTU Test (Standard - 1472 bytes):
✅ 3 packets transmitted, 3 received, 0% packet loss

MTU Test (Jumbo Frames - 8972 bytes):
❌ 3 packets transmitted, 0 received, 100% packet loss

vmbr0: mtu 9000  <-- PVE configured for jumbo frames

From PBS:

enp1s0f1: mtu 1500  <-- PBS using standard MTU!

Network Statistics:
RX: 40,002 dropped packets  <-- Packets too large being dropped!

The root cause: My PVE nodes were configured with MTU 9000 (jumbo frames), but when I installed the new 10GbE card in PBS, the interface defaulted to MTU 1500. This mismatch caused:

  • Small packets (pings, TCP handshakes) to work fine
  • Large packets (data transfer, API calls) to be dropped silently
  • TCP connections to establish but fail during data transfer
  • 40,000+ dropped packets on the PBS receive side
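
A packet capture on the PBS side makes this failure mode visible. A sketch, assuming tcpdump is installed and using the interface name from this setup:

# On PBS: watch the backup API conversation during a failing request
tcpdump -ni enp1s0f1 tcp port 8007
# Typical mismatch signature: the three-way handshake appears, then the stream
# goes quiet - the sender's full-size segments are dropped by the NIC before
# they ever reach the capture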

The Solution

The fix was straightforward—configure PBS to match the PVE nodes’ MTU settings.

Step 1: Edit PBS Network Configuration

nano /etc/network/interfaces

Step 2: Add MTU Setting

Change from:

auto enp1s0f1
iface enp1s0f1 inet static
    address 192.168.1.76/24
    gateway 192.168.1.1

To:

auto enp1s0f1
iface enp1s0f1 inet static
    address 192.168.1.76/24
    gateway 192.168.1.1
    mtu 9000

Step 3: Apply and Verify

# Apply the change (or reboot)
ifdown enp1s0f1 && ifup enp1s0f1
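# Caution: if your SSH session runs over this interface, ifdown can cut it off;
# if ifupdown2 is installed (as on standard Proxmox installs), "ifreload -a"
# is a safer way to apply the change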

# Verify MTU is now 9000
ip addr show enp1s0f1 | grep mtu

# Test from PVE node
ping -M do -s 8972 -c 3 192.168.1.76
# Should now show: 3 packets transmitted, 3 received, 0% packet loss ✅

# Test PBS connectivity
curl -k https://192.168.1.76:8007
# Should now connect successfully ✅
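
As an extra check, path MTU discovery tools can confirm the end-to-end value; tracepath comes from Debian's iputils-tracepath package and may need installing first:

# From a PVE node: confirm the discovered path MTU to PBS
tracepath -n 192.168.1.76
# A healthy jumbo-frame path should report "pmtu 9000" in the output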

Results

After applying the MTU fix:

  • ✅ All PVE nodes could successfully connect to PBS
  • ✅ Jumbo frame ping tests passed (8972 bytes)
  • ✅ Port 8007 connections worked consistently
  • ✅ Backup jobs completed successfully
  • ✅ PBS web interface remained stable and accessible
  • ✅ No more dropped packets on PBS interface

Key Takeaways

1. MTU Mismatches Are Silent Killers

Unlike complete network failures, MTU mismatches allow small packets through while silently dropping larger ones. This creates the confusing scenario where basic connectivity (ping, DNS) works, but applications fail intermittently.

2. Always Test Jumbo Frames Explicitly

The diagnostic ping tests were crucial:

# Test standard MTU
ping -M do -s 1472 <target>

# Test jumbo frames
ping -M do -s 8972 <target>

This simple test immediately revealed the problem.
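
Those payload sizes aren't arbitrary; they are derived from the target MTU:

# ping -s sets only the ICMP payload; the kernel adds 8 bytes of ICMP header
# and 20 bytes of IPv4 header on top:
#   1472 + 8 + 20 = 1500  (standard MTU)
#   8972 + 8 + 20 = 9000  (jumbo frames)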

3. Check Interface Statistics

The 40,000+ dropped packets on PBS’s receive side were a clear indicator that oversized frames were being rejected at the interface.
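
Those counters are easy to read directly. The interface name here is from this setup, and driver-level counter names vary by NIC:

# Kernel-level statistics - the RX "dropped" column is the one to watch
ip -s link show enp1s0f1

# Driver-level counters, where the NIC exposes them
ethtool -S enp1s0f1 | grep -i drop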

4. Network Changes Require Consistency

When upgrading network hardware:

  • Verify MTU settings match across all hosts (see the check loop after this list)
  • Ensure switches support jumbo frames if used
  • Test with actual traffic patterns, not just ping
  • Check both physical interfaces and bridges/VLANs
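
A small loop makes that first check routine across the whole cluster. This is only a sketch: the hostnames are hypothetical, and it assumes root SSH access from the machine running it:

# Hypothetical host list - replace with your PVE nodes and PBS server
for host in pve1 pve2 pve3 pbs; do
    echo "--- $host ---"
    # Print each non-loopback interface with its MTU
    ssh root@$host "ip -o link show | grep -v 'lo:' | awk '{print \$2, \$4, \$5}'"
done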

5. Document Your Network Configuration

Having consistent MTU settings documented as part of your infrastructure standards prevents these issues during upgrades or expansions.

Important Notes

Switch Compatibility: Jumbo frames (MTU 9000) require switch support. If your switches don’t support jumbo frames on the relevant VLANs, you must set all hosts to MTU 1500.

Bridge Configuration: In Proxmox, if you’re using bridges (vmbr0, vmbr1, etc.), both the physical interface AND the bridge must have matching MTU settings:

auto eth0
iface eth0 inet manual
    mtu 9000

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10/24
    bridge-ports eth0
    bridge-stp off
    bridge-fd 0
    mtu 9000  <-- Don't forget this!
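
After editing, it's worth confirming that the bridge and its member port actually report the same value (interface names from the example above):

# Both lines should report the same MTU
ip link show eth0 | grep -o "mtu [0-9]*"
ip link show vmbr0 | grep -o "mtu [0-9]*"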

Conclusion

What started as a mysterious intermittent connectivity issue turned out to be a textbook case of MTU mismatch. The comprehensive diagnostic scripts made it easy to identify the problem quickly, and the fix was a simple one-line configuration change.

If you’re experiencing similar issues with Proxmox Backup Server (or any network services) after hardware changes, especially involving 10GbE or jumbo frames, I highly recommend:

  1. Running comprehensive diagnostics rather than guessing
  2. Testing with various packet sizes to expose MTU issues
  3. Checking interface statistics for dropped packets
  4. Ensuring consistency across all network components

The diagnostic scripts provided here can save hours of troubleshooting by quickly gathering all relevant information from both ends of the connection.

FAQ

Q: Why does my Proxmox Backup Server show connection refused?

A: The most common cause after network upgrades is MTU mismatch between PVE nodes and PBS, especially when using jumbo frames (MTU 9000).

Q: How do I test for MTU issues in Proxmox?

A: Use ping with specific sizes: “ping -M do -s 1472” for standard and “ping -M do -s 8972” for jumbo frames.

Q: What MTU should I use for Proxmox with 10GbE?

A: Use MTU 9000 (jumbo frames) if your switches support it, otherwise MTU 1500. All nodes must match.

Q: Can MTU mismatches cause intermittent problems?

A: Yes! Small packets (pings, handshakes) work fine while large packets (data transfer) fail silently.

Have you experienced similar network issues with Proxmox or other virtualization platforms? Share your experiences in the comments below!
