Can’t SSH via WireGuard

imok OG
in Help

Hello.
I've been installing a Proxmox cluster.

  • Node 1 in provider A
  • Node 2 in provider A
  • Node 3 in provider B

For security reasons (maybe), I’m using WireGuard, and everything works fine except that node 3 can’t connect to the other nodes via SSH. If I use the public IPs of the nodes, the connection succeeds.

SSH log: https://pastebin.com/raw/UTvb50xx

Any ideas?

Comments

  • Neoon OG, Content Writer, Senpai

    Misconfigured WireGuard, or DPI is blocking it.

  • Do a tcpdump on both sides on the wireguard interfaces for the ssh traffic, and then maybe also do a tcpdump on both sides on the wireguard traffic (and match up the packets each time).
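
A minimal sketch of the two captures suggested above, wrapped in shell functions so nothing runs until called. The interface names (wg0 for the tunnel, eth0 for the uplink) and the WireGuard UDP port 51820 are assumptions, as are the function names; substitute your own values and pass the port sshd actually listens on.

```shell
# Sketch only: interface names and ports below are assumed defaults.

# Decrypted SSH traffic as seen on the tunnel interface.
# Usage: capture_inner [tunnel-interface] [ssh-port]
capture_inner() {
    tcpdump -ni "${1:-wg0}" "tcp port ${2:-22}"
}

# Encrypted WireGuard datagrams as seen on the physical interface.
# Usage: capture_outer [uplink-interface] [wireguard-udp-port]
capture_outer() {
    tcpdump -ni "${1:-eth0}" "udp port ${2:-51820}"
}
```

Run both functions as root in separate terminals on each node, start the SSH attempt, and compare which packets show up on one side but not the other.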

  • MTU so low MSS even lower TCP cannot cope

  • It's always DNS. When it isn't, it's always MTU. :D

  • Check MTU first as mentioned - most likely issue. I had one situation where it was a DSCP issue. Check that DSCP 0x8 is not being dropped. Quick test is to force-change outgoing DSCP to 0x10 (this is done on Node 3 in your case):

    iptables -t mangle -A OUTPUT -p tcp --sport 22 -j DSCP --set-dscp 0x10
    

    Or whatever firewall equivalent.
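
To confirm on the wire that the mark changed, `tcpdump -v` prints the IPv4 `tos` byte, which carries the DSCP value in its upper six bits. A small sketch of that conversion (the helper name dscp_to_tos is made up for illustration):

```shell
# Convert a DSCP value to the tos byte that tcpdump -v displays.
# DSCP sits in the upper six bits of the field, so shift left by two.
dscp_to_tos() {
    printf '0x%02x\n' "$(( $1 << 2 ))"
}

dscp_to_tos 0x8    # prints 0x20
dscp_to_tos 0x10   # prints 0x40
```

So if the rule is taking effect, packets that previously showed `tos 0x20` should show `tos 0x40` afterwards.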

  • imok OG

    Adjusting MTU didn't work.
    DSCP didn't work.

    Only SSH incoming/outgoing connections are blocked in node 3

  • skorous OG, Senpai

    @imok said:
    Adjusting MTU didn't work.
    DSCP didn't work.

    Only SSH incoming/outgoing connections are blocked in node 3

    Did you try the tcpdump as mentioned by @cmeerw? That'll show you if the packets are getting there and then dropped. Since it's SSH-specific and other traffic is fine, it feels like it has to be a firewall.

  • imok OG

    this is tcpdump on node3. I was trying to connect from node1.

    tcpdump: data link type LINUX_SLL2
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
    10:04:25.118746 wg0   In  IP node1.example.com.47796 > node2.example.com.50773: Flags [S], seq 569356754, win 64240, options [mss 1460,sackOK,TS val 376911544 ecr 0,nop,wscale 7], length 0
    10:04:25.118778 wg0   Out IP node2.example.com.50773 > node1.example.com.47796: Flags [S.], seq 870297605, ack 569356755, win 65160, options [mss 1460,sackOK,TS val 1168804705 ecr 376911544,nop,wscale 7], length 0
    10:04:25.121180 wg0   In  IP node1.example.com.47796 > node2.example.com.50773: Flags [.], ack 1, win 502, options [nop,nop,TS val 376911547 ecr 1168804705], length 0
    10:04:25.121532 wg0   In  IP node1.example.com.47796 > node2.example.com.50773: Flags [P.], seq 1:41, ack 1, win 502, options [nop,nop,TS val 376911547 ecr 1168804705], length 40
    10:04:25.121555 wg0   Out IP node2.example.com.50773 > node1.example.com.47796: Flags [.], ack 41, win 509, options [nop,nop,TS val 1168804708 ecr 376911547], length 0
    10:04:25.147858 wg0   Out IP node2.example.com.50773 > node1.example.com.47796: Flags [P.], seq 1:41, ack 41, win 509, options [nop,nop,TS val 1168804734 ecr 376911547], length 40
    10:04:25.149698 wg0   In  IP node1.example.com.47796 > node2.example.com.50773: Flags [.], ack 41, win 502, options [nop,nop,TS val 376911575 ecr 1168804734], length 0
    10:04:25.150297 wg0   In  IP node1.example.com.47796 > node2.example.com.50773: Flags [P.], seq 1489:1601, ack 41, win 502, options [nop,nop,TS val 376911576 ecr 1168804734], length 112
    10:04:25.150330 wg0   Out IP node2.example.com.50773 > node1.example.com.47796: Flags [.], ack 41, win 509, options [nop,nop,TS val 1168804736 ecr 376911575,nop,nop,sack 1 {1489:1601}], length 0
    10:04:25.150427 wg0   Out IP node2.example.com.50773 > node1.example.com.47796: Flags [P.], seq 41:1177, ack 41, win 509, options [nop,nop,TS val 1168804737 ecr 376911575,nop,nop,sack 1 {1489:1601}], length 1136
    10:04:25.358433 wg0   Out IP node2.example.com.50773 > node1.example.com.47796: Flags [P.], seq 41:1177, ack 41, win 509, options [nop,nop,TS val 1168804945 ecr 376911575,nop,nop,sack 1 {1489:1601}], length 1136
    10:04:25.361120 wg0   In  IP node1.example.com.47796 > node2.example.com.50773: Flags [.], ack 1177, win 501, options [nop,nop,TS val 376911787 ecr 1168804945,nop,nop,sack 1 {41:1177}], length 0
    10:06:25.149196 wg0   Out IP node2.example.com.50773 > node1.example.com.47796: Flags [F.], seq 1177, ack 41, win 509, options [nop,nop,TS val 1168924735 ecr 376911575,nop,nop,sack 1 {1489:1601}], length 0
    10:06:25.151542 wg0   In  IP node1.example.com.47796 > node2.example.com.50773: Flags [FP.], seq 1601:2809, ack 1178, win 501, options [nop,nop,TS val 377031577 ecr 1168924735], length 1208
    10:06:25.151596 wg0   Out IP node2.example.com.50773 > node1.example.com.47796: Flags [R], seq 870298783, win 0, length 0
    

    @skorous said: Since it's ssh specific and other traffic is fine it feels like it has to be firewall.

    I don't have any firewall in place.

    iptables -L -v -n
    Chain INPUT (policy ACCEPT 3582K packets, 923M bytes)
     pkts bytes target     prot opt in     out     source               destination         
    
    Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination         
    
    Chain OUTPUT (policy ACCEPT 3515K packets, 815M bytes)
     pkts bytes target     prot opt in     out     source               destination       
    
    
    iptables -t nat -L -v -n
    Chain PREROUTING (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination         
    
    Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination         
    
    Chain OUTPUT (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination         
    
    Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
     pkts bytes target     prot opt in     out     source               destination      
    

    pve-firewall is stopped.

    ufw not installed.

    grep -Ei 'listenaddress|allow|deny|match' /etc/ssh/sshd_config
    #ListenAddress 0.0.0.0
    #ListenAddress ::
    # be allowed through the KbdInteractiveAuthentication and
    #AllowAgentForwarding yes
    #AllowTcpForwarding yes
    # Allow client to pass locale environment variables
    #Match User anoncvs
    #   AllowTcpForwarding no
    

    I'm ready to nuke this node.

  • AuroraZero Hosting Provider, Retired

    Can you ssh in normally?

  • imok OG

    @AuroraZero said:
    Can you ssh in normally?

    Yes, I can use SSH normally with public addresses. Only in/out in the wireguard network fails.

    The rest of the nodes don't have any issue.

    The main difference is this node is in another provider (BreezeHost). Ping is between 2-3ms

  • @imok said: this is tcpdump on node3.

    mss 1460 is asking for trouble - why isn't it lower? (shouldn't it be something like 1380 with the default wireguard MTU of 1420?) To me it looks like an MTU problem. You did say "Adjusting MTU didn't work." - how did you do that? And how did the tcpdump look after the adjustment?

    BTW, you are getting seq 1:41 and then seq 1489:1601, so you are missing seq 41:1489 which presumably gets dropped somewhere because of an MTU issue.
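
The arithmetic behind those two observations can be checked quickly. This sketch assumes TCP over IPv4 with no IP options (20-byte IP header plus 20-byte TCP header); the function names are made up for illustration.

```shell
# Largest MSS that fits a given interface MTU for TCP over IPv4.
mss_for_mtu() {
    echo $(( $1 - 40 ))
}

# Bytes missing between the end of one segment and the start of the next.
seq_gap() {
    echo $(( $2 - $1 ))
}

mss_for_mtu 1420   # prints 1380, the MSS expected for the default WireGuard MTU
mss_for_mtu 1500   # prints 1460, the MSS actually advertised in the capture
seq_gap 41 1489    # prints 1448, the bytes that never arrived
```

The 1448-byte gap is exactly one full-size segment for an MSS of 1460 once the 12 bytes of TCP timestamp option are subtracted, which is consistent with a single full-size packet being lost.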

  • skorous OG, Senpai

    @imok said:
    this is tcpdump on node3. I was trying to connect from node1.


    Sorry if I'm being dim but if you're running that on Node 3 why are you only seeing traffic between Node 1 and Node 2?

  • @imok said:
    The main difference is this node is in another provider (BreezeHost). Ping is between 2-3ms

    Same O.S image? Not the first time I've experienced weird issues with these shitty panel templates.

  • imok OG
    edited May 9

    @cmeerw said:

    @imok said: this is tcpdump on node3.

    mss 1460 is asking for trouble - why isn't it lower? (shouldn't it be something like 1380 with the default wireguard MTU of 1420?) To me it looks like an MTU problem. You did say "Adjusting MTU didn't work." - how did you do that? And how did the tcpdump look after the adjustment?

    BTW, you are getting seq 1:41 and then seq 1489:1601, so you are missing seq 41:1489 which presumably gets dropped somewhere because of an MTU issue.

    I already tried with lower MTU. Will check again with 1380.

    EDIT: Same issue with 1380 and lower.

    @skorous said: Sorry if I'm being dim but if you're running that on Node 3 why are you only seeing traffic between Node 1 and Node 2?

    node2 is actually node3. My mistake obfuscating the logs.

    @zgato said: Same O.S image? Not the first time I've experienced weird issues with these shitty panel templates.

    These are bare metal servers. Directly from the PVE ISO.

  • imok OG

    I'm rebuilding node3 right now.

  • skorous OG, Senpai
    edited May 9

    @imok said: node2 is actually node3. My mistake obfuscating the logs.

    In which case, that doesn't show any traffic on port 22/tcp, which would imply the traffic is never making it there, right? Or are you running sshd on a non-standard port?

  • imok OG

    ssh port is 50773

  • AuroraZero Hosting Provider, Retired

    sudo netstat -plant | grep :50773   # or whatever port you are using

    Also ssh -vvv

  • @imok said:

    @cmeerw said:

    @imok said: this is tcpdump on node3.

    mss 1460 is asking for trouble - why isn't it lower? (shouldn't it be something like 1380 with the default wireguard MTU of 1420?) To me it looks like an MTU problem. You did say "Adjusting MTU didn't work." - how did you do that? And how did the tcpdump look after the adjustment?

    BTW, you are getting seq 1:41 and then seq 1489:1601, so you are missing seq 41:1489 which presumably gets dropped somewhere because of an MTU issue.

    I already tried with lower MTU. Will check again with 1380.

    EDIT: Same issue with 1380 and lower.

    Can you show the corresponding tcpdump for that? (and then also the tcpdump for the wireguard traffic)

    And then compare that with the tcpdump for the non-wireguarded ssh connection.

    And then you can do some ping tests between the nodes with varying packet sizes.
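
A minimal sketch of such a ping sweep, assuming Linux iputils `ping` (for the `-M do` don't-fragment option) and IPv4. The helper names and the list of sizes are illustrative, not from the thread.

```shell
# ICMP payload size that exactly fills an IPv4 packet of the given total size
# (20-byte IP header + 8-byte ICMP header).
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send one don't-fragment ping per candidate packet size and report the result.
# Usage: probe_mtu <target-ip> <size> [<size> ...]
probe_mtu() {
    target=$1
    shift
    for mtu in "$@"; do
        if ping -c 1 -W 1 -M do -s "$(icmp_payload_for_mtu "$mtu")" "$target" >/dev/null 2>&1; then
            echo "$mtu ok"
        else
            echo "$mtu failed"
        fi
    done
}

# Example: probe_mtu 10.100.0.1 1280 1380 1420 1500
```

The largest size that still gets a reply approximates the usable path MTU between the two nodes; running it against both the tunnel address and the public address shows how much the tunnel overhead costs.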

  • imok OG
    edited May 10

    Reinstalled node3 from scratch, and I can ping 10.100.0.1 and 10.100.0.2 after installing WireGuard (not configured yet).

    I didn't try pinging before installing WireGuard.

    WTFFFFFF!!?

    traceroute 10.100.0.1 on node3:

    traceroute to 10.100.0.1 (10.100.0.1), 30 hops max, 60 byte packets
     1  <MY_GATEWAY> (<MY_GATEWAY>)  0.552 ms  0.514 ms  0.487 ms
     2  node1.example.com (10.100.0.1)  38.731 ms  38.714 ms  38.689 ms
    
  • imok OG
    edited May 13

    I installed another distro for testing and those private IPs were reachable. I'm waiting for an answer from the provider.

    I'm reinstalling PVE again and will need to use another set of IPs.

  • Cluster is gone. Deleted the wrong files in the wrong node. Fuck.

  • AuroraZero Hosting Provider, Retired

    @imok said:
    Cluster is gone. Deleted the wrong files in the wrong node. Fuck.

    Da hail you say?

  • Maybe it was a bad idea to run the cluster on Wireguard.

    While migrating a VM, everything went down.

  • AuroraZero Hosting Provider, Retired

    @imok said:
    Maybe it was a bad idea to run the cluster on Wireguard.

    While migrating a VM, everything went down.

    inevitable

  • Neoon OG, Content Writer, Senpai

    @imok said:
    Maybe it was a bad idea to run the cluster on Wireguard.

    While migrating a VM, everything went down.

    How so?

  • Looks like the connection gets stuck during key exchange. Double check if WireGuard MTU/MSS is set correctly on node 3 (try MTU 1280 or 1420). Also, make sure no firewall or iptables rules are blocking or mangling packets on node 3. Could also try SSH with -o IPQoS=none just to test.
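
For reference, a sketch of those two tests as shell functions. The interface name wg0, the default of 1280, and the function names are assumptions; the target and port arguments are whatever your own setup uses.

```shell
# Temporarily change the tunnel MTU (lost on interface restart).
# Usage: set_wg_mtu [interface] [mtu]
set_wg_mtu() {
    ip link set dev "${1:-wg0}" mtu "${2:-1280}"
}

# Verbose SSH attempt with QoS marking left at the OS default.
# Usage: ssh_noqos <user@tunnel-ip> [port]
ssh_noqos() {
    ssh -vvv -o IPQoS=none -p "${2:-22}" "$1"
}
```

If the handshake completes with `IPQoS=none` but stalls without it, something on the path is likely treating marked packets differently; if only the lower MTU helps, the problem is packet size.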

  • Thanks guys. Sorry I didn't give you an update on time.

    The main issue was the provider already using the IP 10.0.0.3 for their networking stuff. That's why ping was working but not ssh.
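
One way to catch this kind of overlap early is to ask the kernel which interface it would use for a tunnel address. If the answer is not the WireGuard interface, or the address answers ping before WireGuard is configured (as happened earlier in this thread), something else on the network already owns that range. A minimal sketch, where route_dev is a hypothetical helper that pulls the device name out of `ip route get` output:

```shell
# Print the outgoing device named in a line of `ip route get` output.
# Reads the route description on stdin.
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Example (the address is a placeholder for one of your tunnel IPs):
#   ip route get 10.100.0.1 | route_dev
```

With the tunnel up, this should print the WireGuard interface name; anything else means traffic for that address is leaving through the provider's network instead.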

    It took me some time but right now my Proxmox cluster over Wireguard is working perfectly. I had issues with DDoS protection, corosync network bandwidth and Proxmox firewall. But it was fun.
