The buried lede of these is that they have asymmetric L3 cache. One CCD gets the 3D V-cache stacked on top, the other doesn't. That's almost certainly how AMD managed to solve the thermal and clock issues of the 5800X3D - they didn't. The 3D CCD will run hot (at low power) with lower clocks and the non-3D CCD is responsible for reaching the advertised boost clocks.
Note how the 7800X3D, a single CCD SKU, has an advertised maximum clock of 5 GHz, compared to the 5.6/5.7 of the two dual CCD SKUs (the 7700X below it advertises 5.4 GHz). Don't expect the 3D cached-cores to exceed 5 GHz on those other parts.
This means that the performance profile of the two CCDs is very different. Not "P vs E core" different, but still significant. If you run (most) games, you want to put all their threads on the first CCD. If you run (most other) lightly threaded workloads, you'd often want the other, non-3D CCD.
This seems like a rather significant scheduling nightmare to me.
> This seems like a rather significant scheduling nightmare to me.
It's way easier than big.LITTLE or current Intel CPUs with E and P cores, so it should be a non-issue.
It only really matters when you want to maximize performance, but most desktop tasks, including many games, already perform well enough that you're unlikely to notice the difference without profiling or being very, very sensitive to it.
But then for gaming, having more than 8 performance cores is currently useless for most games (assuming you don't do anything "costly" in parallel). It's gotten to the point that in many gaming benchmarks, disabling one CCD leads to slightly better results on a 7950X (due to more thermal and power budget for the other CCD).
Or in other words: if you only care about gaming, don't go for a 2-CCD CPU; it's not worth the money (similarly for Intel: go for 8 P-cores, but adding more E-cores usually has diminishing returns, AFAIK).
And if you do other things:
On Windows, AMD will work with Microsoft to make games run nicely.
On Linux it can anyway be a good idea to pin your game to one CCD manually (even for the 7950X; using cgroups, e.g. through systemd), as this should implicitly push other services onto the other CCD (simply because the first one is already highly utilized).
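A minimal sketch of that pinning, assuming the V-cache CCD is cores 0-7 with SMT siblings 16-23 (the numbering varies per system, so check your own topology first):

    # which logical CPUs share cpu0's L3, i.e. which ones sit on its CCD
    # (index3 is typically the L3 on x86)
    cat /sys/devices/system/cpu/cpu0/cache/index3/shared_cpu_list

    # one-off pinning with taskset
    taskset -c 0-7,16-23 ./game

    # or as a transient systemd scope, so the cgroup enforces it (recent systemd)
    systemd-run --user --scope -p AllowedCPUs=0-7,16-23 ./game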
Now, highly scalable applications which use both CCDs fully could have some interesting dynamics, I guess. But again, it should be much more predictable than doing something similar across P & E cores.
But there is one fun use-case this looks awesome for: embedding a gaming VM in your system. (Dedicate all of CCD1 to the VM, use IOMMU for GPU and NVMe passthrough.)
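With libvirt, that dedication could look roughly like this - a sketch only, where the domain name "win10-gaming" and the 8-cores-per-CCD layout are assumptions:

    # pin each of the guest's 8 vCPUs 1:1 onto host cores 0-7 (the gaming CCD)
    for vcpu in $(seq 0 7); do
      virsh vcpupin win10-gaming "$vcpu" "$vcpu"
    done

    # keep QEMU's emulator/IO threads off that CCD
    virsh emulatorpin win10-gaming 8-15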
Good insights here! I wonder how this would translate to a dual-KVM setup, where one KVM is Windows (for gaming) and another is Linux (all supervised by e.g. Proxmox).
If we don't do core pinning, would the "whole CPU passthrough" be sufficient to preserve the scheduler logic in Windows (and any such logic in Linux, if it is or will be available in the near future)?
Alternatively, as you suggested, one could feasibly allocate one CCD entirely to Windows and another to Linux, to optimize each one for performance in gaming and productivity correspondingly. Though for heavy multi-core tasks, passing the whole CPU to Linux would still be preferable, I think; same for Windows if playing strategy games. So this approach might not suit all cases equally well.
400 MHz off a 5.4 GHz clock is a little less than a 10% drop. While it'd be a shame if the scheduler used the "wrong" core, if that's the price of a misjudgement, I don't expect it to be all that impactful in practice. IIRC published results from the 5800X3D also showed that quite a few workloads saw a small benefit from the larger L3 - not enough to make it worth it financially, but enough to roughly compensate for the lost frequency. The set of workloads where you'll see the full proportional slowdown from the clock frequency, with zero compensation from the increased L3, is likely to be fairly small.
All in all, this sounds more like a technically really interesting challenge, rather than a high-stakes bet. Even a "bad" scheduler probably won't do much harm.
Realistic worst case, quite a few workloads that might have benefited from the new cache don't... but probably aren't hurt much either.
It's interesting to note that it will make benchmarking harder. Any benchmark could be 10% faster or slower based on which cores your code is running on.
On Linux you can use 'taskset 0x1 ./bench' to force a process onto a specific mask of cores. Sounds like this sort of thing will be increasingly necessary if you want to compare code variants on these CPUs with precision.
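taskset also takes a readable core list via -c, which makes an A/B comparison across the two CCDs easy to script (the core numbering here is an assumption - check which cores carry the V-cache on your part):

    taskset -c 0-7  ./bench   # first CCD (V-cache, assumed)
    taskset -c 8-15 ./bench   # second CCD (higher clocks, assumed)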
Good benchmarks are often concerned with core to core latency due to the physical topology of the cores. I imagine we’ll see some expansion into which core complex is in use for a given test and comparing both of them.
If the change causes the scheduler to put it on the cache-heavy cores, but the program actually runs better on the clock-heavy cores, then your benchmark will report that things are better than ever when the change really made them worse. If pinning really is the best way to get the application to perform well, and you want to implement that, then the application or environment should be doing the pinning, so the benchmark runner reports what's actually happening, not what could happen.
Regardless, if you want reliable numbers, even on a CPU with uniform cores, you should be doing multiple runs and comparing the average and variance. That will always tell the actual story, regardless of how the platform underneath behaves that day.
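If you have hyperfine installed, it does essentially this - repeated runs with mean and standard deviation - and can compare the two pinnings side by side:

    hyperfine --warmup 3 --runs 10 \
      'taskset -c 0-7 ./bench' \
      'taskset -c 8-15 ./bench'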
AMD said their drivers/software should handle it - basically, working with Microsoft to get the Windows scheduler to make the correct decisions. No clue how it would work with Linux though.
Any idea if this is a Win 11 thing? I've got a 7700X currently, in part because I didn't want to run Win 11 in order to support the big.LITTLE arch of Intel's latest. When the 12900K and friends came out, I recall reading that only Win 11 was getting the scheduling updates to support it well.
Indeed, that's Lakefield as seen in the Intel Core i5-L16G7.
The scheduler changes (in Windows 11 at least) make it extremely pleasant to use on the Lenovo X1 Fold, even if there are some driver problems (fortunately easy to fix: https://csdvrx.github.io/ )
I was thinking of buying a 5800X3D for my gaming rig (as it is already AM4). Am I reading correctly that I might need to upgrade to Windows 11 to get the full benefits of it?
No, only the 7950X3D and 7900X3D from AMD have this property of asymmetrical cores. This is a new thing for AMD. The 5800X3D and the 7800X3D have only 1 CCD, so there's no asymmetry, and no scheduler accommodations are needed. Likewise, any non-X3D chips are symmetrical.
The 5800X3D should work fine on everything. The 7800X3D should also work fine. It's the 7900/7950 that may not run well, cos the 3D stacking is only on 1 CCD. So one has it and one doesn't, and that may cause issues.
> This seems like a rather significant scheduling nightmare to me.
I look at this as a Zen 5 test bed. Microsoft has already said scheduler tweaks will come in Win 11 to better accommodate Zen 4 X3D, and I assume Linux kernel devs have been prepping for a bit.
I agree that Intel is already having to play at optimizing big.LITTLE with Alder Lake and Raptor Lake. Functionally this is kind of the same for AMD, where you want that cache-hungry application all on the same core complex. Not exactly apples to apples, but it's at least still fruit.
This is a good opportunity to get a year or more of battle-testing this stuff in the field, to ensure it's at least up to snuff, and to plan future iterative improvements.
Having a 5950X myself, I am very impressed by its power efficiency. I've tweaked mine a fair bit (plus aggressive memory with manually adjusted timings) and it's been great. Up to 6 cores it can sit at 5GHz all day long (on air!), and for full-core workloads with the aggressive memory it performs basically like a stock CPU with 18.5 cores, and it's silent in all but the goofiest long-running workloads. I'd kill for more PCIe lanes, but for a budget data processing workstation (40Gbps networking, GPU as a budget DPU), it's been great.
I believe the 7950X3D will have its voltage locked, but in terms of perf/W it should still improve over the 5950X. And the 5950X's power efficiency is fantastic.
The Ryzen Pro processors are also fantastic. The 5750GE was a decently clocked 8-core that peaked out of the box at only around 39W, supports ECC and DASH, and has other nice features like Memory Guard, Shadow Stack, etc. Those are great in 1L form factors, where you can build capable physical clusters whose nodes idle at only 9W total power draw.
Modern schedulers, like Windows' multilevel feedback queue and Linux's CFS (Completely Fair Scheduler), have advanced a lot in the last decade.
It would be annoying for kernel developers to make an exception for individual processors like this, but it's totally doable to make the scheduler handle this case.
Unlike P/E cores, you can't handle this generically, because the performance delta is workload-dependent. For E cores you can more or less just punt "background tasks" onto them and pull tasks that are using a significant amount of CPU over to the P cores.
With X3D vs non-X3D cores, which one is faster depends on the workload. Most games benefit (sometimes drastically so); a lot of other stuff doesn't, or is slower roughly in proportion to the clock reduction - just look at the 5800X and 5800X3D of the prior generation to see how split the benchmarks are; virtually every synthetic benchmark and most intensive things - compiling, encoding, running JS and other interpreters - suffer, while games win.
I suppose a good heuristic for desktop users would be to pin processes using the GPU to the X3D cores.
I don't think it's quite as bad as you're saying, because reaching to the other chiplet for L3 isn't terrible in latency compared to hitting "close" L3. So probably the higher boost cores win overall in either case.
This is a good point. If the cores in the non-stacked CCD miss in local L3, it's still quicker to ask the stacked CCD than to go to RAM. But GP's question is valid: how do you prefer the non-stacked CCD for certain high-intensity tasks without hardcoding their names/IDs?
So one might look at the stacked cores as extravagant "L4" controllers serving their faster peers that might occasionally throw in some compute of their own for particularly parallel workloads? Seems rather wasteful, but perhaps it's actually less bad than the performance per watt tradeoffs at high clock rates.
I'd love to see a benchmark with the 3D V-cache CCD's cores disabled, benched against the non-3D part it is based on with one CCD disabled.
It would be interesting to see, at comparable clocks, the performance uplift from hitting the huge cache more often on the other CCD vs hitting main memory.
Can it actually hit the cache on the other chiplet? I thought that Zen's L3 caches were somewhat private to the local CCX/CCD, such that threads running on one chiplet have no way to cause new data to be prefetched or spilled into someone else's L3.
My assumption was the IO die had a directory of cache lines and would route a request to the other CCD if it were present there. You can't evict to a remote L3 but you can snoop it, I think.
Now, thinking about it... you can't evict to the distant L3, though. So, it really depends on the workload and whether the remote big L3 is "warmed" suitably for you.
Yea... bummer about the hybrid layout. Voltage control will be disabled anyway to prevent killing the 3D die, so I don't really care for the high-boost CCD. If it weren't for the EDC VID bug on Zen 3 taming PBO, I'd be running a manual OC to keep high-load voltage and temps in check. Might as well have two tamed 3D CCDs.
But also, for most daily desktop workloads the difference is irrelevant, tbh.
Games should already be pinned to one CCD for max performance.
Compilation and the like will optimally spawn across both CCDs, making it not matter (though you want to split only parallel compilation units across CCDs; parallel codegen for the same unit should stay on the same CCD).
So I don't think it matters outside of artificial "challenges" that try to maximize performance beyond what is normally reasonable for a desktop system (given the work/time involved in doing so).
Can the CPU cores in a CCD access the L3 cache of another CCD with higher latency? If so the CCD without extra cache may still get a performance boost.
I know there have been such designs in the past, but I don't know how it works in the Ryzen CPUs.
Speed of cache between CCDs has always been much worse than within one CCD.
At the same time, that latency is still peanuts compared to hitting main RAM.
The die with the cache probably has better latency (provided the cache doesn’t connect through the IO die), but lower clocks making it better with memory limited workloads.
The other die will be better at non-memory-bound work, but should still be much better than normal at memory-bound tasks too. I suppose it remains to be seen whether lower latency and lower clocks beat higher latency and higher clocks, but I suspect 10% higher clocks won't compensate enough for cache hits being several times faster.
I'm hoping someday there will be an embedded Linux processor with this much cache. 128MB on-die SRAM means the PCB would no longer need separate DRAM. The complexity of the board routing would also go down. That much RAM ought to be enough for a lot of embedded applications.
The economics don't work out. Why would you avoid something as trivial as board routing, as cheap as $2 per gigabyte DRAM, and as performance-enhancing as having gigabytes of main memory, just to use a 128 MB on-die (or on-package) SRAM (at a price of ~$500/GB?)?
The main distinction between application processors that can run Linux and microcontrollers that use onboard RAM (and often Flash) is that the former have an MMU. It's attractive to imagine that your SBC might only need something as simple as a DIP-packaged ATmega for an Arduino, and I can imagine a system-on-module - actually, saying that, I think several exist, e.g. this i.MX6 device in a 148-pin quad-flat "SOM" with 512 MB of DDR3L and 512 MB of Flash:
Whether you consider that Seeed branded metallic QFP (which obviously contains discrete DRAM, Flash, and an iMX6) to be a single package, while a comparably-sized piece of FR4 with a BGA package for each of the application processor, DRAM, and Flash on mezzanine or Compute-module style SODIMM edge connectors would not satisfy your desire for an embedded Linux processor with less routing complexity, I don't know. They build SOMs for people who don't want to pay for 8 layers and BGA fanout all the time.
I don't think there are enough applications for embedded systems that need 128M of onboard SRAM that won't support the power budget, size, complexity, and cost of a few GB of DRAM.
There is a use case where you can improve performance by keeping compressed (LZ4) data in RAM and decompressing it in small blocks that fit in cache. This is demonstrated by ClickHouse[1][2] - the whole data processing pipeline after decompression fits in cache, and the compression saves RAM bandwidth.
You're correct but that is still a niche segment because markets that need 128MB of super-fast memory are almost always happy to pay a little bit more to get 4GB+ of "L4" (aka DRAM).
The economic point stands that you aren't going to get a processor with only cache and no RAM because virtually no workloads want such an unbalanced system.
As SSDs get faster and L3 caches get larger, will conventional RAM get squeezed out? I know Optane failed a few years back, but that kind of convergence seems inevitable in the long term.
> The economics don't work out. Why would you avoid something as trivial as board routing, as cheap as $2 per gigabyte DRAM, and as performance-enhancing as having gigabytes of main memory, just to use a 128 MB on-die (or on-package) SRAM (at a price of ~$500/GB?)?
Size? But then the vendor could just ship the CPU+RAM stacked on top of each other.
That would be incredibly inefficient. The price difference between such a 128MB L3 + 0GB DRAM setup and, let's say, 128MB L3 + 2GB DRAM would be quite small, and in practice the performance would be much, much higher, because realistically in your 128+0 setup you'll easily waste half of that on OS or library data that isn't actually needed at the moment, whereas with DRAM you can use the whole 128MB of L3 for things that need to be fast.
It's also extremely niche to have a workload that requires such high CPU performance yet fits, Linux OS included, in 128MB. Usually something like that is FPGA or DSP territory.
I think what you want is a cheap ARM CPU with DRAM stacked on top of it on the same package (which exists).
If you just want to reduce board complexity (what a hobbyist/maker/homebuilder dream that would be), there are lots of package-on-package and system-in-package offerings already!
AllWinner V3s, S3. There's a SAMA5D2 SiP. Bouffalo BL808 (featured on the Pine64 Ox64). There are a lot, lot more. I think there are a couple with even more memory too.
Intel's Lakefield, with Foveros stacking, was an amazing chip with 1+4 cores and on-chip RAM. High speed too - 4266 MT/s, back in 2020 when that was pretty fast. This is more for MIDs/ultrabooks, but wow, what a chip, just epic: add power and away you go. OK, not really, but not dealing with routing (and procuring!) high-speed RAM is very nice.
Intel's been doing such a good job pushing interesting things in embedded, but the adoption has been not great. The Quark chips, powering the awesome Edison module, had nice oomph, & the Edison was so well integrated - such an easy-to-use & featureful small Linux system... WiFi & BT well, well, well before the RPi.
It would be fun to see DRAM-less computers, but I imagined them more as big systems with a couple GB of SRAM. There's definitely potential for the low end too, though!
I want to build my own DIY USB keyboard, including coding the keyboard firmware in 64-bit RISC-V assembly.
I have been lurking on the Ox64 for a while, but I need a few more green lights:
- Is the boot ROM enough to fully init the SoC? I.e., I don't want to have to bundle extra init code into the keyboard firmware on the SD card.
- The hardware programming manual is missing the USB2 controller and its DMA programming. Even with some SDK examples, you would need the hardware programming manual to properly understand how it all works.
- I want to run my keyboard firmware directly from the SD card slot, and directly on the 64-bit RISC-V core - possible? (No 32-bit RISC-V core involved.)
The SD card is not listed as a supported boot target, though you could almost certainly build a small bootloader that's stored in the qSPI flash and then loads the rest of your code into RAM from there.
I'm not the most familiar with it, but I believe all hardware init (setting the clock source, initializing USB, GPIO, etc.) is handled by the flashable firmware, for which there are open source SDKs.
Then it means I would need to flash my keyboard firmware, or an SD card loader firmware.
I guess this is a "standard" flashing protocol over USB, enabled by the right button being pressed at power-on (when plugging in the USB cable). Would I need to include the flashing support code in my keyboard/SD card loader firmware, or is it handled separately by a different piece of hardware?
Any specs on the format of the firmware image, to know which core will run the real boot code?
I'm not sure you could flash over USB either without significant work. There's no UART <-> USB device on the Ox64, so you need to use an external one connected to some GPIO pins. You could maybe build a DFU mode yourself, but I'm somewhat skeptical it would work (though it might be possible; there are 3 cores in the thing). Despite there being not one but TWO USB ports on the Ox64, neither is used for flashing. The micro USB type B connector is only used for power delivery, and the USB-C is primarily intended for acting as a host, i.e. for plugging in a camera module.
Edit: to clarify, there's a bug in the bootrom that prevents the initialization of the USB device. Newer revisions of the Ox64 may fix this.
You can flash it, but you need to use GPIO pins and UART to do so.
The bootrom bug only prevents you from flashing via the on-board USB-c port. You can use a separate USB <-> UART device plugged into GPIO pins to boot/flash.
I'm told it's a bug in the chip from the upstream supplier (the BL808 by Bouffalo Lab), so there's not much they can do. IMO it's not a huge deal; like I said, you can either flash with UART over GPIO pins or implement a bootloader yourself.
I mean, you'd need to solder headers onto the board, but otherwise not really. There are JTAG headers pre-soldered that might work too, but I'd have to look at which UART/CPU they're hooked up to.
I don't think that's entirely true, but the 'p' in pSRAM does stand for 'pseudo', and it does have refresh circuitry and is slower than true SRAM. By how much, I have no idea.
Might be worth taking a look at the announced Intel Xeon MAX chips then. I watched a video on it last night and these new server CPUs have a boatload of memory on the chip and can actually run without needing external DRAM.
No change. SRAM got almost no scaling improvement at 3nm compared to 5nm, and 5nm had minimal improvement over 7nm. So 3nm has "3nm"-class logic transistors and 7nm-class SRAM.
Going based on AMD's first generation V-Cache (TSMC 7nm), you could get 1GB of SRAM onto a die slightly larger than a top of the line NVIDIA GPU. 2GB would be too large to fab as a single die. Or you could spend several million to get a Cerebras Wafer Scale Engine 2 with 40GB of SRAM in aggregate and a ton of AI compute power all on one wafer.
Excited to see massive performance improvements in desktop hardware again. 7950X3D + an RTX 4090 + a PCIe 5 SSD will be quite the system for programming, machine learning, video editing, music production, gaming...
> 7950X3D + an RTX 4090 + a PCIe 5 SSD will be quite the system for programming
Agreed, but not for testing, please. Too much stuff out there already seems to be designed for or tested on what might as well be supercomputers like the above, and then get shipped out to run on Grandma's 7 year old <Misc Manufacturer> laptop.
The same thing happens with displays. Designers and devs are using things like high-end Dell UltraSharps, LG UltraFines, the integrated screens in MacBook Pros, iMac 27" displays/Studio Displays, Pro Display XDRs, etc., but in reality a huge number of the screens in use are things like those terrible 1366x768 TN panels that dominated budget 10"-17" laptops for over a decade. I'm fairly confident the low-contrast flat design trend that's been going for a while now wouldn't have been a thing if it were a requirement for UIs to be usable and reasonably good looking on a crappy $250 Walmart-special Dell Inspiron from 2012.
The wonderful thing about webpage design is that your users don't give a f**k about color correction. Monitors nowadays almost all ship with a default configuration of high saturation, high contrast, and high color temperature - to the point that I think a standard sRGB 'paper white' is pointless for webpages, because virtually nobody except designers themselves will enable it on their monitor.
Maybe some group should build consensus on a modern monitor 'paper white', so at least everybody's daily monitor uses the same setting, no matter how good or bad the monitor is.
I have a 6-month-old laptop with the latest Intel processor, and many of these web 3.0 websites (including Alchemy's own website) are too damn slow to be operable.
This is an odd take. Why shouldn't software benchmarks be showcased with the best available hardware? How am I supposed to reason about MySQL performance results taken from a run-of-the-mill dual-core machine from 2015?
I think they intended to refer to software development QA, not benchmarking. I do agree with the sentiment that consumer software should generally be usable on 10 year old hardware. "Generally" carries a lot of weight here, because intensive applications like CAD will always be laggy on old hardware.
I'm on a 3970X, a 3090, and 4x NVMe PCIe 4.0 in RAID 0, with 256GB RAM... I haven't found a need to upgrade, nor have I reached this system's potential by any stretch - especially for video editing, music production, photography, and compiling...
I dunno, the only direction I care about at the moment is TDP.
I'm on an 8+ year old workstation with an i5-4460 (quad core, no HT) at 3.2GHz, 16GB memory, and a first-gen SSD, and I also don't feel the need to upgrade.
All of the web apps I build inside of Docker with WSL 2 reload in a few dozen milliseconds at most. I can edit 1080p video without any delay, and raw rendering speed doesn't matter because I run batch jobs overnight.
Writing to disk is fast and my internet is fast. Things feel super snappy. I've been building computers from parts since about 1998, I think this is the longest I ever went without an upgrade.
> Writing to disk is fast and my internet is fast. Things feel super snappy. I've been building computers from parts since about 1998, I think this is the longest I ever went without an upgrade.
I did like you: rocking a Core i7-6700 from, what, 2015, up until early 2022. 16 GB of RAM, an NVMe PCIe 3.0 x4 SSD, one of the first Samsung ones. I basically built that machine around one of the first Asus mobos to offer an NVMe M.2 slot.
It was and still is an amazing machine. I gave it to my wife, and I've been using an AMD 3700X for about a year now, and... I'll be changing it for a 7700X in the coming weeks (hopefully).
The 3700X is definitely faster than my trusty old 6th-gen Core i7, but Zen 4 is too good to be true, so I'm upgrading.
All this to say: you can stay with the same system for seven years then upgrade twice in less than 12 months!
Individual cores on the 7950x are roughly twice as fast as what you have. If your work isn't massively multithreaded, then a 7950x will likely perform quite a bit better than a Threadripper 3970x. (Though of course you won't be able to get more than 128GB RAM.)
This is with clamping down TDP on the 7950x to roughly 110W. It's an absolute beast.
Keep in mind that 128 GB barely works on AM5 (basically, EXPO/XMP doesn't, only regular JEDEC speeds, i.e. 4800 MT/s) [1]. Two sticks of RAM, up to 64 GB total, is the fully functional maximum.
Are you saying that 2x32GB is the max that allows for EXPO/XMP? I'm thinking about building a system like that and this is critical. Do you have any better resources that explore this more and test out different configurations?
That's my understanding, yes. All my knowledge is from random Reddit threads however.
Since I was scared off 128 GB by all those complaints, I built a 7950X system with 2x32GB of memory, and it runs well/stable at its rated 6000 MT/s with EXPO on (also, AMD says 6000 MT/s is the sweet spot for the AM5 memory interface, so I am happy).
Running with two DIMMs per channel is more challenging for the CPU's memory controller than one DIMM per channel, and it's much worse if they're dual-rank DIMMs—which current 32GB DDR5 modules are. I believe the EXPO/XMP profiles are all based on using only two DIMMs total, because those modules are sold in kits of two. At some point in the next year, there should be 24GB single-rank modules hitting the market, allowing for 48GB at top speed or 96GB with less of a performance penalty than 128GB from 4x32GB currently causes.
The CPU is alright, but the motherboards are lacking. You can install ECC memory modules on them and the system will boot, but you don't know for sure if ECC is really working unless the motherboard reports memory status, like how many errors were corrected.
I'm on a 5800X with a 3080 Ti and 64GB, also PCIe 4.0 SSDs. I built it in 2019, then upgraded the CPU a year ago from a 3600X to the 5800X, and it's so insanely fast for everything I do, including VMs, compiles, transcodes, etc. Nutty performance. The past few years have had amazing performance boosts.
AMD sticking to one socket (AM4) for multiple generations was a great move for end users.
I am also on a 3600X, and just yesterday I was looking to order a 5800X, since the prices have dropped a lot and the performance gain is around 50% just from upgrading one component.
That is my main reason to upgrade, even from a 5950X (which is still selling very well): "future-proofing" for 4-5 years, with easy upgrades. Only problem is, Intel is also now claiming LGA 1700 will be around for a couple more years. But that's nothing compared to what AMD offers.
We are already seeing SSDs with over double the read speed and ~70% more write speed. You don't see how that could make a difference? Even with 128GB of RAM on a workstation there are a lot of data sets that do not fit.
Full-access database queries? Indexed queries are going to randomly read blocks and PCIe 5 doesn't help. The only way I've perceived improvements from SSD linear access > 4GB/s is when doing gigantic ETL jobs that were not CPU-limited. Unfortunately a lot of ETL jobs are CPU limited and some of the formats I work with are too intensive on the compute side to benefit from faster storage (like GeoJSON, ugh). Formats that are more CPU-friendly also tend to be smaller, making the storage less relevant.
I think for general use most desktop and workstation users are going to get more benefits from faster random access and won't notice very high linear access speeds. I have two recent SSDs and one of them has a median access time of 12µs and the other has a median of 30µs. Even though the latter has gaudy benchmark numbers, can stream at many GB/s and can ultimately hit almost a million IOPS, higher-level application benchmarks lead me to prefer the former because random access waiting time is more important.
I was mainly talking about my usage patterns, not every desktop or workstation user. I work with a lot of data daily.
Something more general might be game asset load speed. Those are often sequential reads. They put a ton of engineering effort into the latest consoles simply to improve that one thing.
> They put a ton of engineering effort into the latest consoles simply to improve that one thing.
Most of that effort was to ensure the processors could actually ingest data at the speeds that off the shelf SSDs could deliver it. The Xbox Series X shipped with what was a low-end NVMe SSD at the time, and the PS5 used a custom SSD controller primarily so they could hit their performance targets using older, slower flash memory rather than being constrained by the supply of the faster flash that was just reaching the market at that time.
And there are still hardly any games that even make a serious attempt to use the available storage performance.
Moving from spinning rust to SSD was a huge improvement in day-to-day life. I still see lag (on my very fast machines) opening programs or loading data on my laptop that would be highly improved by doubling the speed of the drive. And I'm not even counting compiling or the other things we might do differently from other users.
That's because SSDs had way lower latency. Upping the PCIe gen speeds up sequential reads, which were already really fast, but won't speed up random reads (or reduce latency) at all.
Right, but NVMe drives are at the level of speed where you need to optimize apps for them or the apps become CPU-bound. The difference between mid and high-end NVMe for most user stuff ought to be minuscule.
PCIe 5 isn't what helps here; it's that NVMe SSDs use very few PCIe lanes (usually 2 or 4). They could have achieved these speeds already, but now it's just easier with a smaller slot.
I'm on a 5950X and an AMD 6600 and getting stutters on Fedora 37 with my two 4K displays. The weirdest thing is this wasn't an issue with my Intel NUC with measly 6th-gen Iris graphics :(
Gnome Wayland, unless I made a mistake somewhere. It’s so random it’s difficult to determine more than “the desktop stutters sometimes especially when opening the overview.” I was having issues with my 5700g iGPU with 2x4K monitors as well - that required a kernel patch to fix.
Kind of disappointed an Intel NUC from 2015 handled the displays better.
Question for people that read deep into these things. What is the best CPU for the linker stage of compiling? I would assume these high performing single core "gaming" CPUs would do well, but I must admit I care a lot about power efficiency as well. I have a 3900x and love that CPU for nearly everything but waiting on optimized builds.
The problem is that most default linkers don't scale well with core count, and throwing hardware at the problem is kind of a waste of money since single core execution speeds only present marginal gains. If you use a linker like mold instead of ld you'll see orders of magnitude higher performance.
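For reference, mold is intended as a drop-in replacement; assuming you have it installed, usage looks roughly like this:

    # wrap an existing build so its ld invocations go through mold
    mold -run make -j"$(nproc)"

    # or select it explicitly at link time (recent clang)
    clang -fuse-ld=mold main.o util.o -o app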
I went Intel 12th gen and I have buyer's remorse. My next upgrade, I'll go back to AMD. Every 12th gen I've touched has little stuttering issues. Last week I was playing around with laptops at the store with 13th gen, and they also get random stuttering issues. A lot of people have told me they don't get any issues, but go into a store, open up a 4K YouTube video, and move the mouse - it stops and then continues. The AMD laptops don't do this at all.
I can't speak to the "best" CPU, but I use a 7950X for Rust development work on Linux. It's probably overkill - the Rust compiler doesn't even feel slow anymore for me on this machine.
The 7950X3D looks interesting, but I think the cache is on one CCD so it might only boost to 5GHz and the other CCD boosts to 5.7 as it doesn't have the cache.
If you want a dedicated server rather than a VM, yes. WebNX / Gorilla Servers, for example. But 7950X dedicated servers are currently pretty expensive from them - about twice the cost per month of a 5950X.
Yeah. They put cache on top of the cores. Heat dissipation is handled by lowering the TDP a little and the fact that the cache doesn't produce much heat.
Neat. Really struggling to justify any sort of upgrade my side to be honest though. 3700x is still plenty fast. Maybe when 4K gaming is a bit more mainstream…
I'm running a 5800x and a 3080 Ti, primarily for gaming and a bit of machine learning (for everything else I use a Mac). As I game at 4K there are definitely scenarios where upgrading would be helpful; games like Cyberpunk, A Plague Tale: Requiem, or The Callisto Protocol don't run particularly well on my setup. And ray tracing in general can be a challenge.
Of course, 4/5s of what I play are indies that would run on a potato, so it's probably not worth upgrading. I nevertheless probably will build a new computer later on in the year, and a 4080 / 5800x3D combo is awfully tempting ...
Not usually, but it can be, particularly when ray tracing is added to the mix. Of the three games I mentioned The Callisto Protocol is very much CPU limited (though more so with more powerful GPUs).
I think the main scenario where it starts mattering is with a 4090 :-) I.e. when the GPU is so powerful that the CPU becomes the bottleneck again. Not in every game of course, but there're some.
Yeah, I'm still trying to decide between upgrading last-hurrah style vs retiring it to home server duty. It's a PCIe gen 4 board, so a fast storage server would be feasible.
I was looking forward to hearing about the next x3D CPUs and I'm relieved to hear they're finally coming. I'm excited about the benchmarks after seeing how capable the previous one was.
Wonder how long we have until they make a chiplet without L3$ and rely only upon 3D V-Cache. That would allow making the chiplet either roughly 2x smaller or packing in twice as many cores.
All the V-Cache parts are currently premium parts. Dropping the L3 from the underlying CCD chiplet would force them to incur the extra 3D packaging costs on all the products using those chiplets. It would also put the V-Cache directly over the CPU cores, where the current products only stack V-Cache on top of the L3 portion of the underlying CCD.
I don't expect them to make this kind of change anytime soon.
It made sense to me - an Uninterruptible Power Supply should be sized according to the power draw of a computer. If these were power hungry CPUs, you would need a really beefy UPS. So that's what I thought they were trying to say!
I've trained myself to look at the cost from a different perspective. If I spend 5K on a monster rig once every 3 years, the cost is around 1600 dollars per year. I'm spending approx 130 dollars a month or 4.35 dollars a day to drive a Rolls Royce.
4.35 a day isn't cheap, but I was spending more than that on Starbucks every day. So I stopped spending 10 dollars a day at Starbucks and opted for the Rolls-Royce. I have a nicer rig, I'm healthier not drinking Starbucks, and I'm saving a little money even if we factor in electricity costs.
That's why I still use a ~15-year-old Supermicro server chassis with a desktop foot kit on it, lol. I've upgraded the innards 2 or 3 times so far. It has 8 hotswap bays and tons of room - no fucking RGB or "windows" or any of that bullshit.
I just built an i9-13900K system to replace the LGA1366 2P board in it. It's an astounding performance bump, to put it mildly, lol. I plan on using this chassis with this setup for probably another 5-10 years before upgrading again.
The board having gamer RGB should not be a problem as long as you can disable it in the BIOS. It would be like any other "extra" hardware your board has that you don't use (for instance, extra headers for front panel USB ports you don't have, or more SATA ports than you need).
The case having a transparent side panel, on the other hand...
My biggest gripe with most lowend-to-midrange motherboards is that they practically always have disappointing rear I/O, which is ridiculous to me particularly on mATX/ATX boards, because part of the point of having a desktop instead of a laptop is to have a ton of ports and little to no need for hubs/docks/etc.
So I tend to end up buying something on the higher end even if I'm not using all the board's features. My current tower uses an ASUS ProArt X570 Creator.
The younger generation of programmers seem to think that a $1000 computer is crazy expensive these days. How spoiled we have become! When I built my first computer 30 years ago, the components cost just over $2K (and the components you wished you could afford were closer to $3K) and that is not inflation adjusted.
That wasn't the first computer I owned, just the first one I built. I am sure some real old-timers remember the ones that cost more than $5K.
I think things could be seen as a little pricey nowadays compared to ~25 years ago, mainly due to video cards.
I built about half a dozen machines from parts since 1998 and it was almost a constant to spend $650-800 for a mid-range machine for about 20 of those years. This includes everything but a monitor, but it does include a decent video card.
His part list for $915 doesn't include a video card. With today's market a mid-range card will put you at a grand total of about $1,250-1,300. That's approaching 2x the cost of a solid mid-range machine that you could build 8 years ago.
Here's a couple of line items from my last part purchase in 2014:
The crazy thing is, here we are 8+ years later and that same CPU cooler is $45, that's ~23% more expensive than almost a decade ago.
The exact CPU and SSD aren't worth price comparing because no one would buy them today since we have way better hardware at similar price points. For ~$180 you can grab a Ryzen 7 5700G and a 1TB SSD is about the same price as 256GB from back then. That feels like it's on a higher end of mid-range for today. It's really video cards where you get killed.
Yup - I used to budget $200 - $250 for "last year's flagship" or something midrange.
Spent $290 in April 2020 for a Radeon RX 5600 XT - a few steps below the top end $400 5700 XT!
The 6600 XT is actually about that much now, which isn't awful. But it's also increasingly far from the top end (behind the 6700 XT, 6800, 6900 XT, 7900 XT, 7900 XTX) and those range in price from $350 to $1000!
Yeah, a bunch of years ago $200-250 would get you a really nice card relative to what's available at that time.
I lost the invoice for my video card, but it's a GeForce GTX 750 Ti. I'm pretty sure it was around $150 when I got it. It's not the best thing in the world, and today it's good enough to casually play some older games. It has no problem powering multiple 1440p displays, which is why I continue to use it. When I got it in 2015, it was maybe middle of the pack as a mid-range card. Playing that era's games at 60 FPS at 1080p is fine.
Nowadays, forget about getting anything near that price point, even though every other piece of hardware offers huge upgrades at similar price points to back then (as seen in my previous post on CPUs and SSDs). It's especially bad because almost 10 years have passed and expectations have risen: nowadays you would hope to have a 1440p display running at 120+ Hz and be able to play games at that resolution at 120+ FPS.
Today you're looking at something like $300 to get a comparable card relative to what's available now vs back then. I haven't done the research on what's the best card in that range, but an RTX 2060 is about $300 nowadays - and it also retailed at about that price in 2019. In theory it should have dropped by now, since there have been multiple new generations since then.
Exactly. I see $1k for all that as a steal. I remember spending a lot of money on LVD SCSI in one machine I built to do anything to speed up disk access.
I feel like CPU prices aren't particularly bad right now. GPUs are awful, though, with prices going up much faster than even our currently high inflation, and price to performance ratios that have barely budged since last generation. I suspect we'll see AIB 4060s for sale at or above the $600 mark.
I've seen a bit of apologia for this, claiming that costs have risen so of course prices would increase as well. But prices are going up so much faster than inflation that I'm not sure that passes the sniff test. We should see 4080s for $850-$900, not $1200.
All of this just in time for 30-series supplies to dwindle ... new 3080s are back up to the $1k mark. Sigh.
Unfortunately, it is hard, especially at an individual scale, to say "I'm going to make back these $4000 over the next year and a half in productivity". Doubly so if you're salaried and your pay will come whether you use the shitty company laptop that takes 15 minutes to build or your crazy home rig.
I'd love for me to be able to just run all my builds on full blast on a 24 core beast at home, but if your interests also include games, you're looking at multiple thousands just to upgrade. Needless to say, my apartment, family, vacations and a dozen other things are well above in the priority list.
I upgraded to an X670E platform back in October. CPU+mobo+RAM ran me $750 with a bundle from Microcenter (essentially, I got the RAM for free), and I reused my case and power supply. The closest you can get to a "24 core beast" with the 7000 series is the 7950X (16C/32T), which I can get on Amazon for $550, minus the aftermarket price for my 7700X (and possibly my old rig, which was a Z170 Intel platform). Even if I kept my old processor, a top-of-the-AMD-line rig costs about $1250, memory capacity notwithstanding. Of course, that says nothing about the additional electricity costs, and I understand my prices aren't obtainable in most of the world, unfortunately, but I think $4000 is a stretch unless you needed a load of RAM (it'd be about $600 to max out my motherboard at 128GB).
* $60 for a cooler because the Wraith Prism is shit.
* $200 for a motherboard, maybe $100 if you want to try to cheap out and get something that might have unstable voltages, shit PCIe bandwidth or no NVMe slots, etc.
* $200 for 32GB of quality DDR5-4800 (or 5600 if you want to be fancy), and that's easily used up these days.
* $200 for a quality 750W power supply
* $100 for a case with good enough airflow, assuming you don't already have one.
* $200+ for at least 1TB of NVMe storage, easily much more
So, assuming a new build that didn't get incremental upgrades in the past, building a new, powerful PC these days is going to run you $1500, without even picking any top-of-the-line stuff. Guess what didn't get included in there? GPUs, with their bloody ridiculous prices. If you're going with NVidia, this 4000 generation is a waste if you're buying anything but the 4090. You could absolutely buy a 4080 (or rather a 4070 Ti), or a 4070, but they're such a horrible deal in terms of price/performance. And that's going to cost you at the absolute least $800 (for a 4070, which is a dogshit card). Or you can try to find a 3000-series card, but that's also going to run you $1000+. If you're going with AMD, your problems are similar, for cards that are really subpar. As for Intel, well, let's just say an A770 next to your high-end CPU might cause a few bottlenecks here and there.
So, yes, if you're lucky enough to find deals _and_ to have stuff you can still use from an old, recent rig, sure, building a new PC isn't _that_ expensive. If you have to do major upgrades, you're looking at multiple thousands. Pulling that much money out in one go for something that is ultimately not strictly necessary is something only a very small percentage of people can afford.
The iGPU this gen is a godsend. I can _finally_ build multi-KVM solution with dedicated GPU passthrough for Windows gaming and the rest going over to the Linux VMs sharing the iGPU over VirGL and being remotely accessible via Moonlight/Sunshine! Best of both worlds.
The only compromise compared to the last gen is unfortunately the RAM - 64GB this gen compared to 128GB last, until they sort out DDR5 4-DIMM dual-rank configs.
You don't need to upgrade everything at once. Yes, modern high-end GPUs are ridiculously expensive. But my $200-5-years-ago RX480 is also still holding up just fine. It's still more expensive than it ought to be, but I can find it used for around $80 these days.
And if you already have an older rig… just bring it over from that.
If you haven't upgraded in a long time, the CPU+MB+RAM combo is a necessary upgrade. You also very likely need to upgrade your PSU, because the 550W of old will not be enough. And you most likely need to upgrade your GPU, unless you want to slap in a 10-year-old GTX 970 to keep playing recent games in 720p with your 24-core CPU. You can bring over hard drives/NVMe drives, and eventually any additional PCIe cards.
So, no, any upgrade not done in the last 5 years means pretty much a full rebuild now. New sockets + DDR5 means that your old build is going to hit a brick wall.
Yes, CPU+MB is usually a package deal (aside from AM4's relative stability, but even that did come with a lot of asterisks). RAM tends to last a bit longer, but yes, we are in the middle of an annoying transition period right now.
Your old PSU might be insufficient for the new build, but it's not like 750W PSUs are a new thing (or particularly required if you're not doing a high-end build, even today). Power efficiency tends to yoyo around over the years.
No, you absolutely do not "need" to upgrade your GPU, and certainly not immediately. Your old GPU will not be doing a worse job in the new computer than it was in the old one. And you can always replace it later on, should the need actually arise.
I think a core part of the misunderstanding here is around expectations. The build plan for "a top-spec gaming computer" is going to look pretty different from "a top-spec dev computer that I can also game on". But, if anything, gaming has hit seriously diminishing returns in the last couple of years. It's quite hilarious and sad to see NVidia try to make real-time path tracing and 4K gaming into things, in a desperate bid to make GPU performance a relevant factor again.
I have a 20-thread CPU (a 13600); it is already much faster at compiling large programs: it took me ~15 mins to build the Linux kernel. Even if some future CPU speeds up the task by, say, 3x, I'd still have to wait 5 mins and get distracted. Perhaps the difference would be larger for tasks that are much more time-consuming than this, say 3 days vs 1 day.
About right - I did it two years ago with 2 sockets, 128 cores (64 per socket), 256 threads on a dual-Epyc motherboard with 1TB of DDR4. Kernel build: < 90 seconds. Should be faster nowadays...
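For anyone wanting to produce a comparable number at home, a rough sketch (defconfig keeps runs comparable between machines; the source-tree path is an assumption):

    cd linux                  # your kernel source checkout
    make defconfig            # a minimal, reproducible config
    time make -j"$(nproc)"    # build on all hardware threads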
Same here. I just upgraded to a 5950X last year (I try to go at least 3 years between upgrades) but I am salivating already.
I fully expected my data management software to be much faster on the new hardware since it uses multi-threading to make use of all those wonderful cores, but it blew me away just how fast it is. https://www.youtube.com/watch?v=OVICKCkWMZE
I might have to bite the bullet later this year if the 7950X3D benchmarks look really good once it releases.
Imagine the 9950X - that could be a good reason to want to upgrade then.
I'm on a 3900X and 2070 Super. I mostly cut back on using the desktop when I got the M1 and M1 Pro devices; I now use it remotely from another room in the house.
I'll hold out despite the temptations, and only update in 2-3 more years from now.
I'm on a 5950x as well but I'll be sticking with it for many years to come as it's still one of the most efficient chips out there - source: Gamer's Nexus benchmark videos.