mikeayles 3 hours ago [-]
So for people wondering if it can be used to accelerate LLM inference, sadly not.
I've been trying to hit 100,000 tokens/s with a 3.28M dumb model, and even this is an order of magnitude too large to benefit.
It appears to be focussed more on latency than throughput. Happy to be corrected?
ai_fry_ur_brain 10 minutes ago [-]
Was anyone thinking this?
ag2718 3 hours ago [-]
You're correct that this work is not very applicable for LLMs and that the focus here is primarily on latency.
RantyDave 4 hours ago [-]
Right. But ... this would limit you to either extremely small models or extremely large FPGAs, yes? If there's a simple machine learning task that requires sub-microsecond latency I can see the point, but otherwise??
ag2718 4 hours ago [-]
Yes, this work is focused on accelerating very small models, typically for real-time systems that require extremely low power or low latency.
I'm not in HFT, but I assume this is also an interesting applicable domain?
UltraSane 3 hours ago [-]
The author actually works at Jane Street.
ag2718 4 hours ago [-]
Yes, definitely: this type of work is applicable in domains where software running on general-purpose processors cannot meet latency or power requirements.
tomrod 3 hours ago [-]
Happy to hear that KANs continue to find solid footing.
Animats 4 hours ago [-]
This guy will be hired by a high-frequency trading firm, and the next time we hear about him, he will have a net worth in 9 figures.
throwaw12 4 hours ago [-]
He is already at Jane Street.
Animats 4 hours ago [-]
Of course.
ai_fry_ur_brain 11 minutes ago [-]
Sure, if they worked for 100 years, maybe... An FPGA guy at Jane St probably makes 600k to low seven figures... maybe.
Not everyone in quant is a centi-millionaire, probably almost none of them in R&D actually.
One primary application of this work is in high-energy physics (https://home.cern/smarter-decisions-at-the-speed-of-collisio...). Ultrafast and real-time learning is also very applicable for problems in quantum computing, plasma control, etc. (https://arxiv.org/pdf/2602.02005).
p.s. Thanks for posting this and welcome to HN!