My EPYC Quest (for failure?)
TL;DR: In this post, I'll take you along on my journey to get a dual-socket EPYC Engineering Sample CPU server up and running. My goal is to use it as a Proxmox host, co-located in a datacenter (yes, really). This will be a blog-style thread, and you're all welcome to chime in with comments, advice, or even just to laugh at my incompetence, ignorance, you name it. 😉
What’s This All About?
As some of you may know, I was given four AMD EPYC 7742 Engineering Sample CPUs, along with a couple dozen RAM sticks. At first, I was thrilled, thinking I could just slap these into a motherboard and build my own EPYC monster server. However, after some Googling and advice from other LES members, reality quickly set in.
Engineering Sample CPUs are exactly that—samples designed for engineers to test a CPU’s characteristics before mass production. This means they likely won’t behave exactly like the final retail versions. Some features may be missing, performance could be subpar, and they might require modifications or tweaks just to get them running.
After researching the specific Engineering Sample I had, I found a goldmine of information on Serve The Home. Someone had gone to great lengths to get this particular sample working, even publishing several custom BIOS versions to make it functional. Apparently, the CPU performs fairly well, though its memory controller is about 20% slower than the final retail version. Additionally, it consumes slightly more power. But this got me thinking—if it’s possible to make it work, then why not try? The only problem? I have zero experience with server co-location. After digesting all the information, I came to my senses and thought: Wow, this is a really stupid idea… let’s do it!
But Why?
Good question. Why go through all this effort instead of just finding a good VPS deal and calling it a day?
Well, I’ve always wanted my own dedicated server. I run a few Proxmox hosts at home, but they don’t come close to what this EPYC beast could be. My goal is to run a self-hosted LLM for use with Home Assistant. Sure, I could use ChatGPT or another cloud-based service, but that’s not why we’re here on this forum, is it?
On top of that, I enjoy tinkering with hardware. Over the years, I’ve learned a lot from experimenting with computer components, and those skills have been valuable in my career. Even now, I’ve already picked up a few new things (see below).
Worst-case scenario? I waste about 750 euros and some time but gain valuable knowledge and experience. And if it all goes south, I can still sell the motherboard (and chassis, I guess) to recoup some of the costs.
Parts
CPU
As mentioned earlier, these are AMD EPYC 7742 Engineering Sample CPUs, specifically the ZS1406E2VJUG5 variant. More details can be found here.
During my research, I discovered that EPYC CPUs can be vendor-locked. For instance, if you install a new EPYC CPU into a Dell motherboard, it will prompt you to execute the Platform Secure Boot Process, which permanently locks the CPU to Dell systems—making it incompatible with any other brand. I had no idea such a thing existed, so naturally, my first concern was: Are my EPYC CPUs vendor-locked?
According to my contact, they probably aren’t, since they were only used with official AMD Engineering Boards rather than third-party vendor boards like Dell. But of course, there’s only one way to be absolutely sure—test it and see what happens.
RAM
Along with the CPUs, I was also gifted a couple dozen RAM sticks. Initially, when the box was handed to me, I only glanced at the top sticks, which were all DDR4 ECC Registered RAM. However, upon closer inspection, I realized that most of the RAM in the box was actually standard DDR4 or even DDR5—neither of which are compatible with my EPYC CPUs.
After sorting through everything, I ended up with the following RAM sticks that are (somewhat) compatible:
- 8× 16GB @ 2666MHz
- 2× 32GB @ 3200MHz
- 4× 32GB @ 2400MHz
- 4× 16GB @ 2133MHz
This configuration is far from ideal (ahem), given AMD’s DIMM population guidelines:
- Populate open channels before adding a second DIMM to any channel.
- Balance memory capacity across channel pairs on a given CPU.
- Balance memory capacity across both CPU sockets in a dual-socket system.
- For optimal performance, AMD recommends populating all eight memory channels per socket, ensuring each channel has the same capacity.
To make sense of this mess, I turned to ChatGPT for an optimization strategy. It suggested the following:
CPU 1 (Socket P1)
DIMM Slot | Channel Pair | Module Size | Nominal Speed |
---|---|---|---|
P1-DIMMA1 | Pair 0 (A–B) | 32 GB | 2400 MHz |
P1-DIMMB1 | Pair 0 (A–B) | 32 GB | 2400 MHz |
P1-DIMMC1 | Pair 1 (C–D) | 16 GB | 2666 MHz |
P1-DIMMD1 | Pair 1 (C–D) | 16 GB | 2666 MHz |
P1-DIMME1 | Pair 2 (E–F) | 16 GB | 2666 MHz |
P1-DIMMF1 | Pair 2 (E–F) | 16 GB | 2666 MHz |
P1-DIMMG1 | Pair 3 (G–H) | 16 GB | 2133 MHz |
P1-DIMMH1 | Pair 3 (G–H) | 16 GB | 2133 MHz |
Total for CPU 1: (2 × 32 GB) + (6 × 16 GB) = 160 GB
CPU 2 (Socket P2)
DIMM Slot | Channel Pair | Module Size | Nominal Speed |
---|---|---|---|
P2-DIMMA1 | Pair 0 (A–B) | 32 GB | 2400 MHz |
P2-DIMMB1 | Pair 0 (A–B) | 32 GB | 2400 MHz |
P2-DIMMC1 | Pair 1 (C–D) | 16 GB | 2666 MHz |
P2-DIMMD1 | Pair 1 (C–D) | 16 GB | 2666 MHz |
P2-DIMME1 | Pair 2 (E–F) | 16 GB | 2666 MHz |
P2-DIMMF1 | Pair 2 (E–F) | 16 GB | 2666 MHz |
P2-DIMMG1 | Pair 3 (G–H) | 16 GB | 2133 MHz |
P2-DIMMH1 | Pair 3 (G–H) | 16 GB | 2133 MHz |
Total for CPU 2: (2 × 32 GB) + (6 × 16 GB) = 160 GB
According to its recommendation, the 2× 32GB @ 3200MHz modules should be left unused because there aren’t four identical 32GB sticks at that speed to maintain balance across both CPU sockets.
Additionally, memory speed is dictated by the slowest DIMM on a socket. Since both sockets end up with 16GB modules rated at 2133MHz, the entire memory subsystem will run at 2133MHz (more on this later).
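Just to double-check ChatGPT's homework, here's a quick back-of-the-envelope sketch I put together. It isn't tied to any real tooling; the slot names are shorthand and the "slowest DIMM sets the clock per socket" rule is a simplification on my part:

```python
# Sanity check of the proposed DIMM layout: per-socket capacity and the
# effective speed, assuming the slowest DIMM on a socket sets the clock.
from dataclasses import dataclass

@dataclass
class Dimm:
    slot: str        # shorthand for the DIMM slot (illustrative names)
    size_gb: int
    speed_mhz: int

layout = {
    "P1": [
        Dimm("A1", 32, 2400), Dimm("B1", 32, 2400),
        Dimm("C1", 16, 2666), Dimm("D1", 16, 2666),
        Dimm("E1", 16, 2666), Dimm("F1", 16, 2666),
        Dimm("G1", 16, 2133), Dimm("H1", 16, 2133),
    ],
}
layout["P2"] = layout["P1"]  # the second socket mirrors the first

for socket, dimms in layout.items():
    capacity = sum(d.size_gb for d in dimms)
    effective = min(d.speed_mhz for d in dimms)  # slowest DIMM wins
    print(f"{socket}: {capacity} GB, effective speed {effective} MHz")
# -> P1: 160 GB, effective speed 2133 MHz
#    P2: 160 GB, effective speed 2133 MHz
```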
Motherboard
As mentioned earlier, I found custom BIOSes on Serve The Home specifically designed for this type of Engineering Sample. Since I’m aiming for a dual-socket setup, my choices are limited to either the Supermicro H11DSi or H11DSi-NT—the only dual-socket motherboards with a compatible custom BIOS. The only difference between the two is networking: the NT version features 10GbE, while the non-NT model has only 1GbE.
Over the past few weeks, I’ve been trying to source this exact motherboard. While there are plenty of listings on AliExpress and eBay, actually getting my hands on one proved to be a nightmare. Every time I placed an order, the seller would reach out via private message or WhatsApp with one of the following excuses:
- The price has suddenly increased.
- The motherboard was found faulty before shipment.
- It’s "no longer in stock."
After several failed attempts, I finally thought I had secured one—only for the seller to scam me by providing a fake tracking number. I had to dispute the purchase with AliExpress, but thankfully, I got my money back.
So, I turned to eBay and found a seller with 38,000+ positive feedback. I placed the order, but guess what? Another excuse. They suddenly couldn’t sell me the board and refunded my order.
In the end, by pure luck, I stumbled upon a listing from a private seller in the USA who was actually willing to ship it to me. I’m currently awaiting shipment, but he has confirmed the sale, so now it’s just a waiting game. To be continued...
Chassis (+ PSU)
Since this server will be co-located in a datacenter, I need a rackmountable chassis. Initially, I considered using the tower co-location option at Skylink Datacenter, but their max tower dimensions don’t accommodate E-ATX boards. So, I had to switch to a rackmount chassis instead.
I went with a 2U chassis for a balance between cooling and expansion flexibility. While 1U is cheaper in terms of co-location, it comes with cooling challenges and is a PITA when dealing with expansion cards. With 2U, I get:
- More flexibility for expansion cards
- Better cooling due to larger heatsink options (a plus considering the TDPs of these EPYCs)
I managed to find a Supermicro SC826 chassis for €150, which includes:
- 2U rackmount case
- 920-watt power supply
- LSI 9211-8i controller
- SAS2 BPN-SAS2-826EL1 backplane
It should arrive this weekend, so once again, let’s wait and see what I actually bought.
Cooling
The EPYC 7742 CPUs use Socket SP3, so I went with a cost-effective Intertech A-38 cooler. It’s 2U in height and rated for TDPs up to 280W, perfect for this setup. More info: Intertech A-38 Cooler
Storage
I haven’t fully decided on storage yet, so I’m open to suggestions. That said, I do have:
- 2× 1TB Samsung SATA SSDs (most likely for backups)
- Several NVMe SSDs (need to check exact models, but they’re at least 1TB each)
IPMI/KVM
As @beanman109 rightfully pointed out, these are Engineering Sample CPUs, which means they could be unstable. If the server crashes or gets stuck, I’d either have to make frequent trips to the datacenter or pay for expensive remote hands. That got me thinking—how can I work around this?
As we all know, exposing IPMI over the internet is a terrible idea. So, what about using a third-party KVM solution, like Pi-KVM? I actually have a Pi-KVM v3 lying around that I could use, but I’d have to squeeze it inside the server chassis, which feels really janky.
After some Googling, I stumbled upon the NanoKVM PCIe—a low-profile KVM solution with remote power cycling, WiFi support, and Tailscale integration:
🔗 NanoKVM PCIe
Coincidentally, I already own a NanoKVM Lite, so I’m familiar with the product. I know it got bad press recently due to its terrible software, but things have improved significantly since then.
Of course, for this to work, the datacenter needs to provide WiFi without a captive portal, since my only Ethernet port will already be in use. And yes, this does introduce a WiFi dependency, but let’s be real—I’m not running a Fortune 500 company on this server anyway.
Datacenter (Power Costs)
Now, let’s talk about the elephant in the room—where am I actually going to put this thing?
Ideally, I’d like to co-locate the server in the south of the Netherlands. That way, I can drop it off in person and visit relatively easily if something breaks. But cost is also a factor—I mean, we’re on LES after all.
There’s a datacenter just down the road, but it’s way too premium for my budget. Instead, I’m considering Skylink Datacenter in Kerkrade, which I think offers fair pricing.
Budget: ~€50/month (most of which will go toward power—this beast isn’t exactly energy-efficient).
Bill of Materials
So far, here’s the BOM:
Component | Price (€) |
---|---|
2× AMD EPYC 7742 Engineering Sample CPUs | €0 |
320GB DDR4 "Frankenstein" RAM setup | €0 |
SuperMicro H11DSi Motherboard | €445 |
Supermicro SC826 Chassis + PSU | €150 |
CPU Coolers (2× Intertech A-38) | €100 |
Storage | TBD |
KVM Solution | TBD |
Subtotal: €695
Thanks for Making It This Far
I genuinely appreciate your interest in my adventure! 😊 Now, onto some noob questions—I’d love to hear your thoughts:
1️⃣ Memory Upgrade Idea
Due to my Frankenstein RAM setup, my memory is currently limited to 2133MHz. I did some scouting and found 32GB 2400MHz Registered ECC DDR4 modules for €20 each. I'm considering buying four of these to replace my four 2133MHz modules, which would bring the overall RAM speed up to 2400MHz.
What do you think? Worth it?
2️⃣ KVM & IPMI Access in a Datacenter
What do you think of my out-of-the-box KVM idea (NanoKVM PCIe)? Also, how do datacenter providers typically handle IPMI access? Do they only provide physical access, or do they have secure remote access solutions (other than "expensive remote hands")?
3️⃣ Power Estimation for Co-Location
When requesting co-location offers, how should I estimate and specify power requirements?
- The CPUs alone have a TDP of 225W each, so in theory, that’s 450W for both.
- Add in the disks, motherboard, and some overhead, and I estimate around 550W under full load.
- But again, this is theoretical max power draw—real-world usage is probably lower.
What’s the best way to specify this to a provider? Do they typically charge based on peak power draw or actual usage?
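For context, this is the rough math behind the numbers above, as a sketch rather than anything measured; the 100 W of "everything else" and the PSU efficiency figure are pure guesses on my part:

```python
# Rough, theoretical power estimate for the colo request. All inputs below
# are my own assumptions, not measured values.
CPU_TDP_W = 225          # per EPYC 7742 (retail spec; the ES may differ)
CPU_COUNT = 2
OTHER_W = 100            # motherboard, RAM, disks, fans: a guess
PSU_EFFICIENCY = 0.92    # assumed efficiency of the 920W Supermicro PSU
MAINS_VOLTAGE = 230      # EU mains

dc_load = CPU_TDP_W * CPU_COUNT + OTHER_W   # what the components draw
wall_draw = dc_load / PSU_EFFICIENCY        # what the datacenter meters
amps = wall_draw / MAINS_VOLTAGE

print(f"Component load: {dc_load:.0f} W")
print(f"Estimated wall draw: {wall_draw:.0f} W (~{amps:.1f} A at {MAINS_VOLTAGE} V)")
# -> Component load: 550 W
#    Estimated wall draw: 598 W (~2.6 A at 230 V)
```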
4️⃣ How Does IP Addressing Work in a Datacenter?
Since I'll be running Proxmox, I’ll need additional IP addresses for my VMs. At home, my DHCP server automatically assigns IPs to new VMs, but how does this work in a datacenter? Is it based on MAC address?
Comments
Memory Upgrades: I know the Ryzen platform, especially the Gen 1/2/3 CPUs, benefits a lot from higher-speed memory. However, given the lower clock speeds on EPYC, I don't think it will give you a massive gain in performance.
NanoKVM is a neat project, but it's not really ready for enterprise usage; it still has some security issues that need to be addressed. You may want to read through this Github issue: https://github.com/sipeed/NanoKVM/issues/301 - To be clear, it's a great, affordable project, but I wouldn't expose it directly to the internet just yet. Even running Tailscale on the thing skeeves me out a little bit... As for the datacenter providing access to IPMI, what it costs and how they do it varies by provider. I've had setups where the IP-KVM was available directly via an IP with whitelisting, available over a VPN tunnel, or even proxied through the provider's own billing/management portal. Really just depends on the datacenter.
Typically for single-server colo, you'll want to discuss this with the provider; some include power in the rack-space pricing up to a certain amount and then charge extra $$$ per month if you need more amperage, while other providers charge based on what you actually use instead. Colo providers are always willing to negotiate, so it's always worth getting an email thread going with sales.
IP addressing allocation depends on the datacenter and there are many different ways to do it. If you want to have just one public IPv4 for your server, you can use iptables rules to do NAT port forwarding and have a virtual network bridge that all your VMs connect to. I followed this tutorial for a Proxmox VE host I have with a single IPv4 with multiple VMs on it: https://raymii.org/s/tutorials/Proxmox_VE_One_Public_IP.html
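For reference, the forwarding part of that setup boils down to rules along these lines. This is just a minimal sketch; the bridge subnet, interface name, and port mappings are made-up examples, not taken from the tutorial:

```python
# Minimal sketch: print the MASQUERADE + DNAT rules for a single public IPv4
# with VMs behind a private Proxmox bridge. All values here are examples.
WAN_IF = "eno1"                # public-facing NIC (example name)
BRIDGE_NET = "10.10.10.0/24"   # private subnet on the VM bridge (example)

FORWARDS = [
    # (public port on the host, VM IP on the bridge, port on the VM)
    (2201, "10.10.10.11", 22),
    (8443, "10.10.10.12", 443),
]

# Outbound NAT so the VMs can reach the internet via the host's public IP.
print(f"iptables -t nat -A POSTROUTING -s {BRIDGE_NET} -o {WAN_IF} -j MASQUERADE")

# Inbound port forwards: public port -> specific VM and port.
for pub_port, vm_ip, vm_port in FORWARDS:
    print(
        f"iptables -t nat -A PREROUTING -i {WAN_IF} -p tcp --dport {pub_port} "
        f"-j DNAT --to-destination {vm_ip}:{vm_port}"
    )
```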
Cheap dedis are my drug, and I'm too far gone to turn back.
If you do go the NAT route and need a hand, I will make myself available, even just to answer questions. I may have a bit of experience in the area.
Thank you very much for your valuable input—I really appreciate it!
Makes sense. I'll hold off on any memory upgrades until I get this thing running and determine whether the performance is acceptable or not.
Got it. I'll make a post asking for offers, including optional KVM access, and see what they come up with.
I'm aware of the NAT route and have tried it myself before. However, I would prefer a unique IP address for each VM. Three IP addresses should be sufficient for now—perhaps five in the long run—but that depends on pricing, of course.
Cheers, I appreciate your help!
BliKVM is another option for PCIe. As others mentioned, I'm not sure it's ready for the big leagues, but you're already down the engineering sample rabbit hole, so surely you're prepared for some fun!
@Freek just to be safe as well, make sure that you don't mix RAM ranks in the same system, i.e. dual-rank (2Rx4) with quad-rank (4Rx4) or any variants of that. If this is your first foray into enterprise stuff, it's a huge oversight that can cause people to think the RAM is bad or something else is broken. As long as they're all the same (2Rx4, 1Rx8, 4Rx4, etc., whatever they might be) you should be fine and it will run at the lowest safe speed. Voltages should also be checked, but most of them will be the same/compatible. Anything at 2400MHz/2666MHz is perfectly fine and the sweet spot for Rome. More than that is nice, but you won't seriously notice it.
EDIT:
I've never seen this before, and given the cost + application this might have just solved a massive roadblock to an interesting LE-product I've wanted to push forever. Awesome share.
I have had two of these in use for a few months. Amazing devices and an open platform that can be extended easily. Give them a shot, you won't regret it!
The chassis has arrived. Apparently this is what I bought:
The motherboard is still stuck in the US of A. It's currently in Carol Stream, IL, and delayed due to 'severe weather conditions in the mid-west' (?). Anyway, we'll wait and see.
Never heard of this one before, looks interesting as well. Thanks
Thanks for the tip! Noted
@Freek
Do some memory tests and CPU benchmarks, then throw out the 2133 MHz sticks and redo them.
I think you might gain ±20% by doing that.
AMD is pretty sensitive to memory timings and frequencies.
We do not use 2133 MHz, not even in Xeon v4's (never had any actually, now that I think about it).
Other than that, your build looks very nice.
Cheers!
Thanks for sharing your project, exciting times :-)