Disabling VNC in Virtualizor ⇒ Lost Connectivity?

This post was originally published on the yoursunny.com blog: https://yoursunny.com/t/2021/Virtualizor-VNC-netplan/

The KVM server hosting my website went offline last month.
Thinking the server might have crashed, I went to Virtualizor, the VPS control panel, to reboot the VPS.
It did not solve the problem, so I proceeded with my disaster recovery plan.

The hosting provider, Spartan Host, explained that it was a router bug.
They fixed the router after 4 hours, but my server did not come online.

Symptom

To investigate what went wrong with my VPS, I came back to Virtualizor to enable VNC access.
Having VNC access is like attaching a monitor and a keyboard to the server.
It would allow me to see any error messages printed on the screen and log in to check whether there are configuration errors.

I didn't see any errors through the VNC connection.
Thinking it might be a routing problem, I logged in with username and password, and ran a traceroute.
To my surprise, the traceroute was able to reach Internet destinations.
Moreover, I could SSH into this server again.

Seeing that the problem had gone away, I disabled VNC access in Virtualizor.
Then, I pressed the reboot button in Virtualizor, so that the hypervisor would apply the VNC settings; rebooting via SSH would not be effective.

Then, I started a ping to the server from my desktop, and eagerly waited.
One minute, two minutes, …, the server did not come online.
What went wrong again?

I repeated the process, re-enabled VNC access, and saw nothing wrong.
I disabled VNC again, and the VPS lost connectivity again.
Clearly, there's a correlation between the VNC toggle and network connectivity.

Diagnostics Ⅰ

Virtualizor is known to push feature updates in patch releases that sometimes break things, so I asked whether there had been a Virtualizor update recently, but the answer was no.
I couldn't figure out the problem, so I opened a support ticket with Spartan Host, and sent over the Netplan configuration file from this Ubuntu 20.04 server.

network:
  version: 2
  ethernets:
    ens3:
      dhcp4: true
      addresses:
        - 2001:db8:2604:9cc0::1/64
        - 2001:db8:2604:9cc0::80/64:
            lifetime: 0
      routes:
        - to: ::/0
          via: 2001:db8:2604::1
          on-link: true

In this configuration:

  • IPv4 address is acquired from DHCP, which is provided by the hypervisor.
  • IPv6 is statically configured.
  • NIC name ens3 is hard-coded, because it never changes.

I've been using similar Netplan configuration files in several other KVM servers, and never had a problem.

Ryan McCully, the managing director at Spartan Host, performed some tests on my KVM.
He found that, with VNC disabled, the network interface on the VPS seemed to be "completely dead", as no ARP or any other traffic was seen on the hypervisor side.
He was also puzzled about why VNC would affect the network interface, but offered an explanation of why my VPS had been working over the past few months:

  • As mentioned above, setting changes in Virtualizor are applied to the hypervisor only after the reboot button in Virtualizor is pressed.
  • Most likely, when I finished the initial setup, I disabled VNC but didn't reboot in Virtualizor.
  • In this case, VNC tabs are hidden in Virtualizor, but VNC is still enabled on the hypervisor.

Diagnostics Ⅱ

Ryan spent a few more hours on extensive testing, and was able to reproduce this issue.
It turned out that disabling VNC in Virtualizor changes the network interface name.

The KVM hypervisor realizes VNC through an emulated VGA monitor.
Enabling VNC attaches a VGA controller to the virtual server, while disabling VNC detaches it.
To see this effect, we can schedule the lspci command to run upon reboot via crontab, and look at the output file once we regain access.
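
One way to set this up is sketched below. The helper name lspci_cron_line and the output path /root/lspci.txt are placeholders of mine, not part of the original setup, and the sketch assumes that the cron daemon understands the @reboot keyword and finds lspci on its PATH.

```shell
# Build a crontab line that records the PCI device list once at every boot.
# The default output path is an assumed placeholder; any writable file works.
lspci_cron_line() {
  out=${1:-/root/lspci.txt}
  printf '@reboot lspci > %s 2>&1\n' "$out"
}

# To install it, append the line to the current user's crontab.
# This is left commented out so that nothing is changed by accident:
# ( crontab -l 2>/dev/null; lspci_cron_line ) | crontab -

lspci_cron_line
# prints: @reboot lspci > /root/lspci.txt 2>&1
```

After the next reboot, the file should contain a listing like the ones that follow.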

$ : VNC enabled
$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:04.0 Multimedia audio controller: Intel Corporation 82801AA AC'97 Audio Controller (rev 01)
00:05.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:06.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon

$ : VNC disabled
$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:03.0 Multimedia audio controller: Intel Corporation 82801AA AC'97 Audio Controller (rev 01)
00:04.0 SCSI storage controller: Red Hat, Inc. Virtio block device
00:05.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon

More importantly, addition or removal of the VGA controller changes the PCI address of the Ethernet controller.
This, in turn, changes the network interface name, because Ubuntu adopts Consistent Network Device Naming, in which the name of a network interface is derived from its PCI address.

VNC       PCI address  interface name
enabled   00:03.0      ens3
disabled  00:02.0      ens2
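
The effect on the interface name can be reproduced from the lspci output with a few lines of shell. This is only a sketch: the function name predict_ens_name is made up, and it assumes that the slot number in the "ens" name equals the PCI device number, which holds for the two listings in this post but is not guaranteed on other machines.

```shell
# Read lspci output on stdin and print the predicted "ens" name of every
# Ethernet controller. Assumption: slot number == PCI device number.
predict_ens_name() {
  awk '/Ethernet controller/ { print $1 }' |
  while IFS=':.' read -r bus dev fn; do
    # lspci prints the device number in hexadecimal; the name uses decimal
    echo "ens$((0x$dev))"
  done
}

# Ethernet controller line with VNC enabled, copied from the first listing
echo '00:03.0 Ethernet controller: Red Hat, Inc. Virtio network device' | predict_ens_name
# prints: ens3

# The same device after VNC is disabled
echo '00:02.0 Ethernet controller: Red Hat, Inc. Virtio network device' | predict_ens_name
# prints: ens2
```

On a live system, running udevadm info on /sys/class/net/ens3 (or whatever the interface is currently called) should list the ID_NET_NAME_SLOT and ID_NET_NAME_PATH properties that udev actually computed, which is more reliable than this guess.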

Therefore, the "consistent" network device naming scheme is consistent only if the PCI address doesn't change.
However, PCI addresses aren't always stable.
I've seen PCI addresses change on a dedicated server when I configured PCI bifurcation in the BIOS settings.
Now I've seen one change on a KVM virtual server.

Treatment

My Netplan configuration assumes the Ethernet adapter on the KVM server is always "ens3".
Disabling VNC changes the network interface name to "ens2", and Netplan would not bring up an interface that it doesn't know about.
This caused the VPS to lose connectivity.

To solve this problem, the Netplan configuration should identify the network interface by its MAC address.
The MAC address can be considered a stable identifier of the network interface, because it's one of the outputs from Virtualizor's Create VPS API.

Therefore, I changed the Netplan configuration to this:

network:
  version: 2
  ethernets:
    uplink:
      match:
        macaddress: 6a:ed:d6:3b:49:f4
      set-name: uplink
      dhcp4: true
      addresses:
        - 2001:db8:2604:9cc0::1/64
        - 2001:db8:2604:9cc0::80/64:
            lifetime: 0
      routes:
        - to: ::/0
          via: 2001:db8:2604::1
          on-link: true

Unless the VPS is deleted and re-created with a different MAC address, this should continue to work.
I also renamed the network interface to "uplink", so that I don't need to check whether it's "ens3" or "ens2" when I type commands.
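
If the MAC address isn't at hand, it can be read from sysfs inside the VPS. The snippet below is a minimal sketch; list_macs is a hypothetical helper name, and the optional directory argument exists only so that the function can be tried against a test directory.

```shell
# Print "<interface> <MAC address>" for every network interface.
# $1 is the sysfs directory to scan; it defaults to /sys/class/net.
list_macs() {
  dir=${1:-/sys/class/net}
  for nic in "$dir"/*; do
    # Skip entries that do not expose a readable hardware address
    [ -r "$nic/address" ] || continue
    printf '%s %s\n' "${nic##*/}" "$(cat "$nic/address")"
  done
}

list_macs
```

The loopback interface shows up with an all-zero address; the line for the Ethernet adapter carries the value to copy into the macaddress field.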

Elsewhere

I checked the VNC situation on several other KVM servers that I have.

WebHorizon and Evolution Host both modified Virtualizor such that there isn't an option to disable VNC.
This prevents the issue, but increases the risk of my VPS being compromised via VNC.

WebHosting24 kept the VNC option intact in Virtualizor.
Disabling VNC would lead to changed network interface names, but my new Netplan configuration works.

SolusVM, the VPS control panel used at VirMach and Nexril, keeps the VGA controller attached at all times.
Disabling VNC blocks the VNC port so that nobody can connect to it, but does not affect the KVM server itself.
I think this is a better approach.

Acknowledgement

Kudos to Ryan McCully at Spartan Host for helping me hunt down this issue.
I wouldn't have anticipated the root cause without his help.


Comments

  • Mr_Tom OG
    edited April 2021

    There's a checkbox in Virtualizor "Disable network configuration" - if that's not checked each time the VPS is rebooted Virtualizor will attempt to reset the network settings (in case the end user has made a mistake/changed something).

    I've seen similar issues where having "Enable VGA" turned off has caused something similar (i.e., no network when rebooting even with the above checked).
    Edit: If "Enable VGA" is on, then turning VNC on/off shouldn't affect the VGA device in lspci which would mean this doesn't happen.

  • If only people would implement it the Proxmox way: you request a VNC session, a socket is spawned, and you can connect.
    If you disconnect, the socket and session are killed. Even worse, by default VNC is always enabled, and its password often matches the panel password sent via email.

  • @Mr_Tom said:
    There's a checkbox in Virtualizor "Disable network configuration" - if that's not checked each time the VPS is rebooted Virtualizor will attempt to reset the network settings

    Overwriting network configuration is an annoying feature in both Virtualizor and SolusVM.
    Luckily, neither messes with the Netplan configuration.


    @Neoon said:
    you request a VNC session, a socket is spawned, and you can connect.
    If you disconnect, the socket and session are killed

    Back in the OpenVZ 6 days, SolusVM handled the serial port similarly.
    A user could request a serial port with a time limit (e.g. one hour).
    SolusVM generated a random password upon request.
    Disconnecting wouldn't close the port, but the port auto-closed after the selected time period.

    Are we gonna get this feature in microLXC?


  • @yoursunny said:
    Are we gonna get this feature in microLXC?

    Basically, LXD has nothing called a "serial port" or the like.
    But you can run commands inside of a container.

    I can for sure implement a terminal option, but it won't be fast.
    However, it's not among my priorities yet; the idea was more to add options like:

    • reconfigure network
    • update ssh key

    These are one click each and should fix the most common issues people have had.

  • Ouji OG
    edited April 2021

    I thought this might be the issue with my IPv6-only VPS but that's not it. I simply cannot resolve any DNS on Ubuntu 20.04.

    Already tried changing the /etc/systemd/resolved.conf file to use Google IPv6 DNS, to no avail; not really sure what else I can do.

    I found out that systemd-resolved is not starting by itself after booting up the system. Not sure why.

  • aveline Hosting Provider OG

    It's a very weird behaviour to remove the VGA controller when disabling VNC.

    You can set net.ifnames=0 in the kernel options; then it will always be eth0.

    We do it in all Linux VM templates to make life easier.


  • @aveline said:
    It's a very weird behaviour to remove the VGA controller when disabling VNC.

    You can set net.ifnames=0 in the kernel options; then it will always be eth0.

    We do it in all Linux VM templates to make life easier.

    We are supposed to look forward and embrace the coolest stuff, e.g. systemd and Netplan and Docker.
    I just didn't expect the PCI address to change in a KVM.

    Last time I witnessed a changed PCI address was on a bare metal Supermicro server.
    I went into the BIOS and enabled PCI bifurcation (to install two NVMe drives in one PCIe x8 slot), and then all the enp* cards changed names.
    The built-in card is still eno1 though.

