Why I'm not gonna buy the NVIDIA DGX Spark

What is the hottest PC for LLM inference at the moment? As my top picks for the title, I looked at the NVIDIA DGX Spark, the Apple M3 Ultra Mac Studio, the Apple M4 Max Mac Studio, the Framework Desktop with the Ryzen AI Max+ 395, and the Acemagic F3A Mini PC with the Ryzen AI 9 HX 370.

In the video I go over the essential specs of each machine before digging into their expected LLM performance. We discuss how difficult it is to predict real-world LLM performance, what the driving factors are, and how vendors optimize their spec sheets to sell us the new shiny thing. Finally, I give you my personal take on the current state of affairs.

If you could run this command on your machine and share the results in the comments, it might help others with their decision. I know this provides only limited insight, but it gives people an idea of how systems compare:

ollama run --verbose llama3.1:8b-instruct-q8_0

In case you don't have it yet, Ollama needs to be downloaded and installed first (https://ollama.com/)
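When you run the command with --verbose, Ollama prints a block of timing stats after each response. The number worth sharing for comparison is the "eval rate" line (generation speed in tokens/s), not "prompt eval rate". A minimal sketch of pulling that number out, using a made-up sample of the verbose output (illustrative values, not from a real run):

```shell
#!/bin/sh
# Example of the timing block that `ollama run --verbose` prints after each
# response. The numbers below are made up for illustration only.
sample='total duration:       4.2s
load duration:        1.1s
prompt eval count:    12 token(s)
prompt eval rate:     250.00 tokens/s
eval count:           128 token(s)
eval rate:            35.40 tokens/s'

# "eval rate" (not "prompt eval rate") is the generation speed in tokens/s,
# which is the most useful single number for comparing machines.
printf '%s\n' "$sample" | grep '^eval rate' | awk '{print $3}'
```

On a real run you would pipe the actual output of the ollama command through the same grep/awk filter instead of the sample text.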

Erratum