My EPYC Quest (for failure?)
TL;DR: In this post, I'll take you along on my journey to get a dual-socket EPYC Engineering Sample CPU server up and running. My goal is to use it as a Proxmox host, co-located in a datacenter (yes, really). This will be a blog-style thread, and you're all welcome to chime in with comments, advice, or even just to laugh at my incompetence, ignorance, you name it. 😉
What’s This All About?
As some of you may know, I was given four AMD EPYC 7742 Engineering Sample CPUs, along with a couple dozen RAM sticks. At first, I was thrilled, thinking I could just slap these into a motherboard and build my own EPYC monster server. However, after some Googling and advice from other LES members, reality quickly set in.
Engineering Sample CPUs are exactly that—samples designed for engineers to test a CPU’s characteristics before mass production. This means they likely won’t behave exactly like the final retail versions. Some features may be missing, performance could be subpar, and they might require modifications or tweaks just to get them running.
After researching the specific Engineering Sample I had, I found a goldmine of information on Serve The Home. Someone had gone to great lengths to get this particular sample working, even publishing several custom BIOS versions to make it functional. Apparently, the CPU performs fairly well, though its memory controller is about 20% slower than the final retail version. Additionally, it consumes slightly more power. But this got me thinking—if it’s possible to make it work, then why not try? The only problem? I have zero experience with server co-location. After digesting all the information, I came to my senses and thought: Wow, this is a really stupid idea… let’s do it!
But Why?
Good question. Why go through all this effort instead of just finding a good VPS deal and calling it a day?
Well, I’ve always wanted my own dedicated server. I run a few Proxmox hosts at home, but they don’t come close to what this EPYC beast could be. My goal is to run a self-hosted LLM for use with Home Assistant. Sure, I could use ChatGPT or another cloud-based service, but that’s not why we’re here on this forum, is it?
On top of that, I enjoy tinkering with hardware. Over the years, I’ve learned a lot from experimenting with computer components, and those skills have been valuable in my career. Even now, I’ve already picked up a few new things (see below).
Worst-case scenario? I waste about 750 euros and some time but gain valuable knowledge and experience. And if it all goes south, I can still sell the motherboard (and chassis, I guess) to recoup some of the costs.
Parts
CPU
As mentioned earlier, these are AMD EPYC 7742 Engineering Sample CPUs, specifically the ZS1406E2VJUG5 variant. More details can be found here.
During my research, I discovered that EPYC CPUs can be vendor-locked. For instance, if you install a new EPYC CPU into a Dell motherboard, it will prompt you to execute the Platform Secure Boot Process, which permanently locks the CPU to Dell systems—making it incompatible with any other brand. I had no idea such a thing existed, so naturally, my first concern was: Are my EPYC CPUs vendor-locked?
According to my contact, they probably aren’t, since they were only used with official AMD Engineering Boards rather than third-party vendor boards like Dell. But of course, there’s only one way to be absolutely sure—test it and see what happens.
RAM
Along with the CPUs, I was also gifted a couple dozen RAM sticks. Initially, when the box was handed to me, I only glanced at the top sticks, which were all DDR4 ECC Registered RAM. However, upon closer inspection, I realized that most of the RAM in the box was actually standard DDR4 or even DDR5—neither of which are compatible with my EPYC CPUs.
After sorting through everything, I ended up with the following RAM sticks that are (somewhat) compatible:
- 8× 16GB @ 2666MHz
- 2× 32GB @ 3200MHz
- 4× 32GB @ 2400MHz
- 4× 16GB @ 2133MHz
This configuration is far from ideal (ahem), given AMD’s DIMM population guidelines:
- Populate open channels before adding a second DIMM to any channel.
- Balance memory capacity across channel pairs on a given CPU.
- Balance memory capacity across both CPU sockets in a dual-socket system.
- For optimal performance, AMD recommends populating all eight memory channels per socket, ensuring each channel has the same capacity.
To make sense of this mess, I turned to ChatGPT for an optimization strategy. It suggested the following:
CPU 1 (Socket P1)
DIMM Slot | Channel Pair | Module Size | Nominal Speed |
---|---|---|---|
P1-DIMMA1 | Pair 0 (A–B) | 32 GB | 2400 MHz |
P1-DIMMB1 | Pair 0 (A–B) | 32 GB | 2400 MHz |
P1-DIMMC1 | Pair 1 (C–D) | 16 GB | 2666 MHz |
P1-DIMMD1 | Pair 1 (C–D) | 16 GB | 2666 MHz |
P1-DIMME1 | Pair 2 (E–F) | 16 GB | 2666 MHz |
P1-DIMMF1 | Pair 2 (E–F) | 16 GB | 2666 MHz |
P1-DIMMG1 | Pair 3 (G–H) | 16 GB | 2133 MHz |
P1-DIMMH1 | Pair 3 (G–H) | 16 GB | 2133 MHz |
Total for CPU 1: (2 × 32 GB) + (6 × 16 GB) = 160 GB
CPU 2 (Socket P2)
DIMM Slot | Channel Pair | Module Size | Nominal Speed |
---|---|---|---|
P2-DIMMA1 | Pair 0 (A–B) | 32 GB | 2400 MHz |
P2-DIMMB1 | Pair 0 (A–B) | 32 GB | 2400 MHz |
P2-DIMMC1 | Pair 1 (C–D) | 16 GB | 2666 MHz |
P2-DIMMD1 | Pair 1 (C–D) | 16 GB | 2666 MHz |
P2-DIMME1 | Pair 2 (E–F) | 16 GB | 2666 MHz |
P2-DIMMF1 | Pair 2 (E–F) | 16 GB | 2666 MHz |
P2-DIMMG1 | Pair 3 (G–H) | 16 GB | 2133 MHz |
P2-DIMMH1 | Pair 3 (G–H) | 16 GB | 2133 MHz |
Total for CPU 2: (2 × 32 GB) + (6 × 16 GB) = 160 GB
According to its recommendation, the 2× 32GB @ 3200MHz modules should be left unused because there aren’t four identical 32GB sticks at that speed to maintain balance across both CPU sockets.
Additionally, memory speed is dictated by the slowest DIMM on a socket. Since both sockets end up with 16GB modules rated at 2133MHz, the entire memory subsystem will run at 2133MHz (more on this later).
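Just to double-check ChatGPT's homework, here's a quick back-of-the-envelope sketch I put together. It isn't tied to any real tooling; the slot names are shorthand and the "slowest DIMM sets the clock per socket" rule is a simplification on my part:

```python
# Sanity check of the proposed DIMM layout: per-socket capacity and the
# effective speed, assuming the slowest DIMM on a socket sets the clock.
from dataclasses import dataclass

@dataclass
class Dimm:
    slot: str        # shorthand for the DIMM slot (illustrative names)
    size_gb: int
    speed_mhz: int

layout = {
    "P1": [
        Dimm("A1", 32, 2400), Dimm("B1", 32, 2400),
        Dimm("C1", 16, 2666), Dimm("D1", 16, 2666),
        Dimm("E1", 16, 2666), Dimm("F1", 16, 2666),
        Dimm("G1", 16, 2133), Dimm("H1", 16, 2133),
    ],
}
layout["P2"] = layout["P1"]  # the second socket mirrors the first

for socket, dimms in layout.items():
    capacity = sum(d.size_gb for d in dimms)
    effective = min(d.speed_mhz for d in dimms)  # slowest DIMM wins
    print(f"{socket}: {capacity} GB, effective speed {effective} MHz")
# -> P1: 160 GB, effective speed 2133 MHz
#    P2: 160 GB, effective speed 2133 MHz
```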
Motherboard
As mentioned earlier, I found custom BIOSes on Serve The Home specifically designed for this type of Engineering Sample. Since I’m aiming for a dual-socket setup, my choices are limited to either the Supermicro H11DSi or H11DSi-NT—the only dual-socket motherboards with a compatible custom BIOS. The only difference between the two is networking: the NT version features 10GbE, while the non-NT model has only 1GbE.
Over the past few weeks, I’ve been trying to source this exact motherboard. While there are plenty of listings on AliExpress and eBay, actually getting my hands on one proved to be a nightmare. Every time I placed an order, the seller would reach out via private message or WhatsApp with one of the following excuses:
- The price has suddenly increased.
- The motherboard was found faulty before shipment.
- It’s "no longer in stock."
After several failed attempts, I finally thought I had secured one—only for the seller to scam me by providing a fake tracking number. I had to dispute the purchase with AliExpress, but thankfully, I got my money back.
So, I turned to eBay and found a seller with 38,000+ positive feedback. I placed the order, but guess what? Another excuse. They suddenly couldn’t sell me the board and refunded my order.
In the end, by pure luck, I stumbled upon a listing from a private seller in the USA who was actually willing to ship it to me. I’m currently awaiting shipment, but he has confirmed the sale, so now it’s just a waiting game. To be continued...
Chassis (+ PSU)
Since this server will be co-located in a datacenter, I need a rackmountable chassis. Initially, I considered using the tower co-location option at Skylink Datacenter, but their max tower dimensions don’t accommodate E-ATX boards. So, I had to switch to a rackmount chassis instead.
I went with a 2U chassis for a balance between cooling and expansion flexibility. While 1U is cheaper in terms of co-location, it comes with cooling challenges and is a PITA when dealing with expansion cards. With 2U, I get:
- More flexibility for expansion cards
- Better cooling due to larger heatsink options (a plus considering the TDPs of these EPYCs)
I managed to find a Supermicro SC826 chassis for €150, which includes:
- 2U rackmount case
- 920-watt power supply
- LSI 9211-8i controller
- SAS2 BPN-SAS2-826EL1 backplane
It should arrive this weekend, so once again, let’s wait and see what I actually bought.
Cooling
The EPYC 7742 CPUs use Socket SP3, so I went with a cost-effective Intertech A-38 cooler. It’s 2U in height and rated for TDPs up to 280W, perfect for this setup. More info: Intertech A-38 Cooler
Storage
I haven’t fully decided on storage yet, so I’m open to suggestions. That said, I do have:
- 2× 1TB Samsung SATA SSDs (most likely for backups)
- Several NVMe SSDs (need to check exact models, but they’re at least 1TB each)
IPMI/KVM
As @beanman109 rightfully pointed out, these are Engineering Sample CPUs, which means they could be unstable. If the server crashes or gets stuck, I’d either have to make frequent trips to the datacenter or pay for expensive remote hands. That got me thinking—how can I work around this?
As we all know, exposing IPMI over the internet is a terrible idea. So, what about using a third-party KVM solution, like Pi-KVM? I actually have a Pi-KVM v3 lying around that I could use, but I’d have to squeeze it inside the server chassis, which feels really janky.
After some Googling, I stumbled upon the NanoKVM PCIe—a low-profile KVM solution with remote power cycling, WiFi support, and Tailscale integration:
🔗 NanoKVM PCIe
Coincidentally, I already own a NanoKVM Lite, so I’m familiar with the product. I know it got bad press recently due to its terrible software, but things have improved significantly since then.
Of course, for this to work, the datacenter needs to provide WiFi without a captive portal, since my only Ethernet port will already be in use. And yes, this does introduce a WiFi dependency, but let’s be real—I’m not running a Fortune 500 company on this server anyway.
Datacenter (Power Costs)
Now, let’s talk about the elephant in the room—where am I actually going to put this thing?
Ideally, I’d like to co-locate the server in the south of the Netherlands. That way, I can drop it off in person and visit relatively easily if something breaks. But cost is also a factor—I mean, we’re on LES after all.
There’s a datacenter just down the road, but it’s way too premium for my budget. Instead, I’m considering Skylink Datacenter in Kerkrade, which I think offers fair pricing.
Budget: ~€50/month (most of which will go toward power—this beast isn’t exactly energy-efficient).
Bill of Materials
So far, here’s the BOM:
Component | Price (€) |
---|---|
2× AMD EPYC 7742 Engineering Sample CPUs | €0 |
320GB DDR4 "Frankenstein" RAM setup | €0 |
SuperMicro H11DSi Motherboard | €445 |
Supermicro SC826 Chassis + PSU | €150 |
CPU Coolers (2× Intertech A-38) | €100 |
Storage | TBD |
KVM Solution | TBD |
Subtotal: €695
Thanks for Making It This Far
I genuinely appreciate your interest in my adventure! 😊 Now, onto some noob questions—I’d love to hear your thoughts:
1️⃣ Memory Upgrade Idea
Due to my Frankenstein RAM setup, my memory is currently limited to 2133MHz. I did some scouting and found 32GB 2400MHz Registered ECC DDR4 modules for €20 each. I'm considering buying four of these to replace my four 2133MHz modules, which would bring the overall RAM speed up to 2400MHz.
What do you think? Worth it?
2️⃣ KVM & IPMI Access in a Datacenter
What do you think of my out-of-the-box KVM idea (NanoKVM PCIe)? Also, how do datacenter providers typically handle IPMI access? Do they only provide physical access, or do they have secure remote access solutions (other than "expensive remote hands")?
3️⃣ Power Estimation for Co-Location
When requesting co-location offers, how should I estimate and specify power requirements?
- The CPUs alone have a TDP of 225W each, so in theory, that’s 450W for both.
- Add in the disks, motherboard, and some overhead, and I estimate around 550W under full load.
- But again, this is theoretical max power draw—real-world usage is probably lower.
What’s the best way to specify this to a provider? Do they typically charge based on peak power draw or actual usage?
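For context, this is the rough math behind the numbers above, as a sketch rather than anything measured; the 100 W of "everything else" and the PSU efficiency figure are pure guesses on my part:

```python
# Rough, theoretical power estimate for the colo request. All inputs below
# are my own assumptions, not measured values.
CPU_TDP_W = 225          # per EPYC 7742 (retail spec; the ES may differ)
CPU_COUNT = 2
OTHER_W = 100            # motherboard, RAM, disks, fans: a guess
PSU_EFFICIENCY = 0.92    # assumed efficiency of the 920W Supermicro PSU
MAINS_VOLTAGE = 230      # EU mains

dc_load = CPU_TDP_W * CPU_COUNT + OTHER_W   # what the components draw
wall_draw = dc_load / PSU_EFFICIENCY        # what the datacenter meters
amps = wall_draw / MAINS_VOLTAGE

print(f"Component load: {dc_load:.0f} W")
print(f"Estimated wall draw: {wall_draw:.0f} W (~{amps:.1f} A at {MAINS_VOLTAGE} V)")
# -> Component load: 550 W
#    Estimated wall draw: 598 W (~2.6 A at 230 V)
```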
4️⃣ How Does IP Addressing Work in a Datacenter?
Since I'll be running Proxmox, I’ll need additional IP addresses for my VMs. At home, my DHCP server automatically assigns IPs to new VMs, but how does this work in a datacenter? Is it based on MAC address?
Comments
Memory Upgrades: I know the Ryzen platform, especially the Gen 1/2/3 CPUs, benefits a lot from higher-speed memory. However, given the lower clock speeds on EPYC, I don't think it will give you a massive gain in performance.
NanoKVM is a neat project, but it's not really ready for enterprise usage; it still has some security issues that need to be addressed. You may want to read through this Github issue: https://github.com/sipeed/NanoKVM/issues/301 - To be clear, it's a great, affordable project, but I wouldn't expose it directly to the internet just yet. Even running Tailscale on the thing skeeves me out a little bit... As for the datacenter providing access to IPMI, what it costs and how they do it varies by provider. I've had setups where the IP-KVM was available directly via an IP with whitelisting, available over a VPN tunnel, or even proxied through the provider's own billing/management portal. Really just depends on the datacenter.
Typically for single-server colo, you'll want to discuss this with the provider; some include power in the rack-space pricing up to a certain amount and then charge extra $$$ per month if you need more amperage, while other providers charge based on what you actually use instead. Colo providers are always willing to negotiate, so it's always worth getting an email thread going with sales.
IP addressing allocation depends on the datacenter and there are many different ways to do it. If you want to have just one public IPv4 for your server, you can use iptables rules to do NAT port forwarding and have a virtual network bridge that all your VMs connect to. I followed this tutorial for a Proxmox VE host I have with a single IPv4 with multiple VMs on it: https://raymii.org/s/tutorials/Proxmox_VE_One_Public_IP.html
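For reference, the forwarding part of that setup boils down to rules along these lines. This is just a minimal sketch; the bridge subnet, interface name, and port mappings are made-up examples, not taken from the tutorial:

```python
# Minimal sketch: print the MASQUERADE + DNAT rules for a single public IPv4
# with VMs behind a private Proxmox bridge. All values here are examples.
WAN_IF = "eno1"                # public-facing NIC (example name)
BRIDGE_NET = "10.10.10.0/24"   # private subnet on the VM bridge (example)

FORWARDS = [
    # (public port on the host, VM IP on the bridge, port on the VM)
    (2201, "10.10.10.11", 22),
    (8443, "10.10.10.12", 443),
]

# Outbound NAT so the VMs can reach the internet via the host's public IP.
print(f"iptables -t nat -A POSTROUTING -s {BRIDGE_NET} -o {WAN_IF} -j MASQUERADE")

# Inbound port forwards: public port -> specific VM and port.
for pub_port, vm_ip, vm_port in FORWARDS:
    print(
        f"iptables -t nat -A PREROUTING -i {WAN_IF} -p tcp --dport {pub_port} "
        f"-j DNAT --to-destination {vm_ip}:{vm_port}"
    )
```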
Cheap dedis are my drug, and I'm too far gone to turn back.
If you do go the NAT route and need a hand, I will make myself available, even just to answer questions. I may have a bit of experience in the area.
Thank you very much for your valuable input—I really appreciate it!
Makes sense. I'll hold off on any memory upgrades until I get this thing running and determine whether the performance is acceptable or not.
Got it. I'll make a post asking for offers, including optional KVM access, and see what they come up with.
I'm aware of the NAT route and have tried it myself before. However, I would prefer a unique IP address for each VM. Three IP addresses should be sufficient for now—perhaps five in the long run—but that depends on pricing, of course.
Cheers, I appreciate your help!
BliKVM is another option for PCIe. As others mentioned, I'm not sure it's ready for the big leagues, but you're already down the engineering sample rabbit hole, so surely you're prepared for some fun!
@Freek just to be safe as well, make sure that you don't mix RAM ranks in the same system, i.e. dual-rank (2Rx4) with quad-rank (4Rx4) or any variants of that. If this is your first foray into enterprise stuff, it's a huge oversight that can cause people to think the RAM is bad or something else is broken. As long as they're all the same (2Rx4, 1Rx8, 4Rx4, etc., whatever they might be) you should be fine and it will run at the lowest safe speed. Voltages should also be checked, but most of them will be the same/compatible. Anything at 2400MHz/2666MHz is perfectly fine and the sweet spot for Rome. More than that is nice, but you won't seriously notice it.
EDIT:
I've never seen this before, and given the cost + application this might have just solved a massive roadblock to an interesting LE-product I've wanted to push forever. Awesome share.
I have had two of these in use for a few months. Amazing devices and an open platform that can be extended easily. Give them a shot, you won't regret it!
The chassis has arrived. Apparently this is what I bought:
The motherboard is still stuck in the US of A. It's currently in Carol Stream, IL, and delayed due to 'severe weather conditions in the mid-west' (?). Anyway, we'll wait and see.
Never heard of this one before, looks interesting as well. Thanks
Thanks for the tip! Noted
@Freek
Do some memory tests and CPU benchmarks, then throw out the 2133 MHz sticks and redo them.
I think you might gain ±20% by doing that.
AMD is pretty sensitive to memory timings and frequencies.
We do not use 2133 MHz, not even in Xeon v4's (never had any actually, now that I think about it).
Other than that, your build looks very nice.
Cheers!
Thanks for sharing your project, exciting times :-)