HOST-C, Chat, Updates, Stuff


Comments

  • host_c Hosting Provider

    Ok, official info was sent to remaining affected customers, please check your e-mails.

    Here is a timeline of the fuckup:

    July 27, 2025 – Afternoon (GMT+3):

    Multiple Seagate ST18000NM019J drives (firmware KM02) across two nodes suddenly powered down due to a firmware-related failure. Drives began reporting critical SMART alerts (Data channel impending failure), causing the RAID-6/60 array to become unavailable.

    Result:
    Addon storage volumes became inaccessible, and VPS services depending on those volumes were disrupted. Some NVMe-based systems also experienced write issues due to OS-level I/O buffering.

    July 28, 2025 – Morning:
    Our team accessed the datacenter, identified the fault, and began recovery efforts. All NVMe-only VPS services were successfully migrated to healthy nodes.

    July 28–29, 2025:
    RAID array access was restored in degraded mode, enabling partial access to addon volumes at limited transfer speeds.

    🧪 Root Cause

    Firmware fault affecting multiple ST18000NM019J (KM02) drives simultaneously

    RAID controller entered fault mode due to concurrent SMART failures

    No physical disk damage, no reallocated sectors or ECC errors — this was purely firmware-triggered

    🛡️ Mitigation Going Forward

    We are conducting a full infrastructure audit to identify any remaining ST18000NM019J drives with KM02 firmware

    Affected drives will be proactively replaced or updated, where supported

    RAID monitoring thresholds and firmware validation processes are being tightened to catch these failures earlier

    This was an unprecedented firmware-level failure that bypassed typical RAID fault tolerance. We appreciate your understanding as we finalize recovery efforts for impacted systems.

    Here is the smartctl output from one of the drives; it may help others check theirs if they use the same model. All 6 drives reported exactly the same error, had the same power-on hours (~265 days), and were brand new.

    === START OF INFORMATION SECTION ===
    Vendor:               SEAGATE
    Product:              ST18000NM019J
    Revision:             KM02
    Compliance:           SPC-5
    User Capacity:        18,000,207,937,536 bytes [18.0 TB]
    Logical block size:   4096 bytes
    LU is fully provisioned
    Rotation Rate:        7200 rpm
    Form Factor:          3.5 inches
    Logical Unit id:      0x5000c500d8a51a07
    Serial number:        ZR57B8800000G20806CV
    Device type:          disk
    Transport protocol:   SAS (SPL-4)
    Local Time is:        Mon Jul 28 17:36:48 2025 UTC
    SMART support is:     Available - device has SMART capability.
    SMART support is:     Enabled
    Temperature Warning:  Enabled
    
    === START OF READ SMART DATA SECTION ===
    

    SMART Health Status: Data channel impending failure general hard drive failure [asc=5d, ascq=30]

    Grown defects during certification <not available>
    Total blocks reassigned during format <not available>
    Total new blocks reassigned <not available>
    Power on minutes since format <not available>
    Current Drive Temperature:     31 C
    Drive Trip Temperature:        60 C
    
    Accumulated power on time, hours:minutes 6367:42
    Manufactured in week 01 of year 2022
    Specified cycle count over device lifetime:  50000
    Accumulated start-stop cycles:  34
    Specified load-unload count over device lifetime:  600000
    Accumulated load-unload cycles:  291
    Elements in grown defect list: 1
    
    Vendor (Seagate Cache) information
      Blocks sent to initiator = 3828
      Blocks received from initiator = 1650689
      Blocks read from cache and sent to initiator = 9094
      Number of read and write commands whose size <= segment size = 29
      Number of read and write commands whose size > segment size = 0
    
    Vendor (Seagate/Hitachi) factory information
      number of hours powered up = 6367.70
      number of minutes until next internal SMART test = 53
    
    Seagate FARM log supported [try: -l farm]
    
    Error counter log:
               Errors Corrected by           Total   Correction     Gigabytes    Total
                   ECC          rereads/    errors   algorithm      processed    uncorrected
               fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
    read:          0        0         0         0          0          0.016           0
    write:         0        0         0         0          0          6.889           0
    
    Non-medium error count:        0
    
    Pending defect count:0 Pending Defects
    

    The SMART Health Status error above (asc=5d, ascq=30) is what triggered the detach of the drives from the RAID array.
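
    For anyone wanting to check their own drives against the output above, here is a minimal sketch, not an official tool. It assumes you have saved the text of `smartctl -a` for a SAS drive and that the field labels match the ones shown in this post; the function names are made up for illustration.

```python
import re

# Model and firmware revision named in the post above.
AFFECTED_PRODUCT = "ST18000NM019J"
AFFECTED_REVISION = "KM02"

# Field labels as they appear in the SAS-style output quoted above.
FIELD_RE = re.compile(r"^\s*(Product|Revision|SMART Health Status):\s*(.*\S)")
POWER_ON_RE = re.compile(r"Accumulated power on time, hours:minutes (\d+):(\d+)")


def parse_smartctl(text):
    """Pull the fields of interest out of saved smartctl text."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    power = POWER_ON_RE.search(text)
    if power:
        hours = int(power.group(1)) + int(power.group(2)) / 60
        fields["power_on_days"] = round(hours / 24, 1)
    return fields


def check_drive(text):
    """Return (same_model_and_firmware, health_ok) for one drive's output."""
    fields = parse_smartctl(text)
    same = (fields.get("Product") == AFFECTED_PRODUCT
            and fields.get("Revision") == AFFECTED_REVISION)
    health_ok = fields.get("SMART Health Status") == "OK"
    return same, health_ok
```

    Fed the output above, `check_drive` flags the drive as the same model and firmware with a non-OK health status, and `parse_smartctl` turns the 6367:42 power-on time into 265.3 days.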

    Here is a screenshot from the log of one of the Dell servers (R740) showing 2 drives leaving the "chat" at precisely the same time. DST was not set on the server, which is why the time shows only 12:00.

    Host-C - VPS & Storage VPS Services – Reliable, Scalable and Fast - AS211462

    "If there is no struggle there is no progress"

  • Freek Senpai

    Hope all is well with @host_c (Last Active: September 17). Just renewed my awesome 5TB VPS for another quarter :)

    Thanked by (2): imok, dartagnan

    LinuxFreek.com — Hosted on 🇪🇺 Scaleway Stardust with Native IPv6 | IPv4 Proxy, WAF & DNS powered by 🇳🇱 DutchIS

  • bingobangobongo Hosting Provider

    Surely those euro summer vacations are over by now?!? 😬🤷‍♂️

    Thanked by (2): dartagnan, sh97

    Rock Solid Web Hosting, VPS & VDS with a Refreshing Approach - Xeon Scalable, DDoS protection and Enterprise Hardware! HostBilby Inc.

  • @bingobangobongo said:
    Surely those euro summer vacations are over by now?!? 😬🤷‍♂️

    I hope so

  • AuroraZero Moderator, Hosting Provider, Retired

    They are busy upgrading and fixing things. The team is working on it and I am sure they will be ready for the next few months soon.

  • Upgraded my 5TB VPS from Debian 12 to 13 and the networking died. Make sure you know your root password in advance: the control panel can't change it, at least on Debian 13, and you need it for VNC. Thankfully I found it lying around.

    To fix networking, put this in /etc/netplan/99-static.yaml (and chmod 600):

    network:
      version: 2
      renderer: networkd
      ethernets:
        eth0:
          match:
            macaddress: "YOURMAC"
          set-name: eth0
          dhcp4: false
          dhcp6: false
          accept-ra: true
          addresses:
            - YOURV4/32
            - YOURV6/128
          routes:
            - to: 0.0.0.0/0
              via: YOURGATEWAY
              on-link: true
          nameservers:
            addresses:
              - 1.1.1.1
              - 1.0.0.1
    

    I think doing this after the upgrade but before the post-upgrade reboot should prevent the networking from dying. Get the addresses and gateway from the control panel. Don't set a v6 gateway, it'll break IPv6. After the upgrade my server was set to use DHCP for some reason, and DHCP didn't get any addresses.

    I don't actually know why it broke, but this is working across reboots. The nameservers block might be unnecessary, but when networking first died it also wiped the DNS server setup; there were none set at all.
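
    If you'd rather not hand-edit the YAML, here is a rough sketch that fills in the same file from your values and rejects obviously malformed addresses first. It is only an illustration: `render_netplan` is a made-up helper, and the addresses and MAC in the usage note are documentation placeholders, not real values. It mirrors the config above and does not set a v6 gateway.

```python
import ipaddress

# Same structure as the 99-static.yaml shown above, with placeholders.
TEMPLATE = """\
network:
  version: 2
  renderer: networkd
  ethernets:
    eth0:
      match:
        macaddress: "{mac}"
      set-name: eth0
      dhcp4: false
      dhcp6: false
      accept-ra: true
      addresses:
        - {v4}/32
        - {v6}/128
      routes:
        - to: 0.0.0.0/0
          via: {gateway}
          on-link: true
      nameservers:
        addresses:
          - 1.1.1.1
          - 1.0.0.1
"""


def render_netplan(mac, v4, v6, gateway):
    """Return the netplan YAML text; raises ValueError on a bad address."""
    ipaddress.IPv4Address(v4)       # validates the IPv4 address
    ipaddress.IPv6Address(v6)       # validates the IPv6 address
    ipaddress.IPv4Address(gateway)  # validates the IPv4 gateway
    return TEMPLATE.format(mac=mac.lower(), v4=v4, v6=v6, gateway=gateway)
```

    For example, `print(render_netplan("52:54:00:12:34:56", "192.0.2.10", "2001:db8::10", "192.0.2.1"))` prints the file contents; write them to /etc/netplan/99-static.yaml and chmod 600 as described above.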

    Thanked by (1): eliphas
  • Thanked by (2): localhost, imok

    I reserve the right to license all of my content under: CC BY-NC-ND. What happens on this forum should stay on this forum.

  • Amadex Hosting Provider

    This is my fav backup server, ever!

    root@vserver2:~# curl -sL https://yabs.sh | bash
    # ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #
    #              Yet-Another-Bench-Script              #
    #                     v2025-04-20                    #
    # https://github.com/masonr/yet-another-bench-script #
    # ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## ## #
    
    Thu Oct 30 13:06:59 CET 2025
    
    Basic System Information:
    ---------------------------------
    Uptime     : 0 days, 0 hours, 1 minutes
    Processor  : Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
    CPU cores  : 6 @ 2593.904 MHz
    AES-NI     : ✔ Enabled
    VM-x/AMD-V : ✔ Enabled
    RAM        : 7.8 GiB
    Swap       : 2.0 GiB
    Disk       : 116.2 GiB
    Distro     : Ubuntu 24.04.2 LTS
    Kernel     : 6.8.0-86-generic
    VM Type    : KVM
    IPv4/IPv6  : ✔ Online / ✔ Online
    
    IPv6 Network Information:
    ---------------------------------
    ISP        : Andrei Tiberiu Holt
    ASN        : AS211462 Andrei Tiberiu Holt
    Host       : Andrei Tiberiu Holt
    Location   : Oradea, Bihor County (BH)
    Country    : Romania
    
    fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/sda1):
    ---------------------------------
    Block Size | 4k            (IOPS) | 64k           (IOPS)
      ------   | ---            ----  | ----           ---- 
    Read       | 324.66 MB/s  (81.1k) | 1.66 GB/s    (26.0k)
    Write      | 325.51 MB/s  (81.3k) | 1.67 GB/s    (26.1k)
    Total      | 650.17 MB/s (162.5k) | 3.33 GB/s    (52.1k)
               |                      |                     
    Block Size | 512k          (IOPS) | 1m            (IOPS)
      ------   | ---            ----  | ----           ---- 
    Read       | 2.41 GB/s     (4.7k) | 2.43 GB/s     (2.3k)
    Write      | 2.54 GB/s     (4.9k) | 2.59 GB/s     (2.5k)
    Total      | 4.95 GB/s     (9.6k) | 5.03 GB/s     (4.9k)
    
    iperf3 Network Speed Tests (IPv4):
    ---------------------------------
    Provider        | Location (Link)           | Send Speed      | Recv Speed      | Ping           
    -----           | -----                     | ----            | ----            | ----           
    Clouvider       | London, UK (10G)          | 677 Mbits/sec   | 979 Mbits/sec   | 35.7 ms        
    Eranium         | Amsterdam, NL (100G)      | 646 Mbits/sec   | 966 Mbits/sec   | 33.1 ms        
    Uztelecom       | Tashkent, UZ (10G)        | 344 Mbits/sec   | 888 Mbits/sec   | 115 ms         
    Leaseweb        | Singapore, SG (10G)       | 284 Mbits/sec   | 681 Mbits/sec   | 181 ms         
    Clouvider       | Los Angeles, CA, US (10G) | 291 Mbits/sec   | 334 Mbits/sec   | 172 ms         
    Leaseweb        | NYC, NY, US (10G)         | 431 Mbits/sec   | 822 Mbits/sec   | 104 ms         
    Edgoo           | Sao Paulo, BR (1G)        | 219 Mbits/sec   | 845 Mbits/sec   | 231 ms         
    
    iperf3 Network Speed Tests (IPv6):
    ---------------------------------
    Provider        | Location (Link)           | Send Speed      | Recv Speed      | Ping           
    -----           | -----                     | ----            | ----            | ----           
    Clouvider       | London, UK (10G)          | 672 Mbits/sec   | 973 Mbits/sec   | 35.8 ms        
    Eranium         | Amsterdam, NL (100G)      | 604 Mbits/sec   | 974 Mbits/sec   | 33.1 ms        
    Uztelecom       | Tashkent, UZ (10G)        | 408 Mbits/sec   | 857 Mbits/sec   | 115 ms         
    Leaseweb        | Singapore, SG (10G)       | 250 Mbits/sec   | 866 Mbits/sec   | 181 ms         
    Clouvider       | Los Angeles, CA, US (10G) | 310 Mbits/sec   | 691 Mbits/sec   | 172 ms         
    Leaseweb        | NYC, NY, US (10G)         | 403 Mbits/sec   | 906 Mbits/sec   | 104 ms         
    Edgoo           | Sao Paulo, BR (1G)        | 213 Mbits/sec   | 819 Mbits/sec   | 231 ms         
    
    Geekbench 6 Benchmark Test:
    ---------------------------------
    Test            | Value                         
                    |                               
    Single Core     | 1056                          
    Multi Core      | 4340                          
    Full Test       | https://browser.geekbench.com/v6/cpu/14749923
    
    YABS completed in 14 min 1 sec
    
  • @Amadex how much per year?

  • Amadex Hosting Provider

    @dartagnan said:
    @Amadex how much per year?

    0€, @host_c likes me

    Thanked by (1): dartagnan
  • @Amadex said:

    @dartagnan said:
    @Amadex how much per year?

    0€, @host_c likes me

    I'm jealous

  • AuroraZero Moderator, Hosting Provider, Retired

    @dartagnan said:

    @Amadex said:

    @dartagnan said:
    @Amadex how much per year?

    0€, @host_c likes me

    I'm jealous

    No reason to be jealous, man. His penis is still smaller than yours.

    Thanked by (3): root, Amadex, dartagnan