Monte Carlo Inference
What if we ran LLM inference as Monte Carlo paths for financial analytics workloads?
I don’t think it’s any secret that I’ve largely been an AI sceptic. I’ve certainly not been convinced by the need for the sorts of GPU capacity that the world seems intent on building out. Though I think I’m starting to see where some of that demand might come from.
At QuantMinds this week it was suggested to me numerous times, in conversation and in presentations, that LLM inference could be treated as Monte Carlo (MC) paths. OK, I understand where this is coming from. I’ve complained myself about the lack of determinism in LLM output, and running the same inference multiple times as Monte Carlo paths is certainly one possible solution to that, I guess. I’m not sure how viable a solution it is (particularly in terms of cost), but it’s certainly an option.
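To make the idea concrete, here’s a minimal sketch of what that might look like for a question with a numeric answer: fire the same prompt repeatedly, treat each run as a path, and aggregate the samples the way a pricer would. The `run_inference` callable and the assumption that the model’s output reduces to a single number are mine, purely for illustration; nothing like this was actually shown at the conference.

```python
import statistics
from typing import Callable

def monte_carlo_inference(
    run_inference: Callable[[str], float],  # any non-deterministic LLM call returning a number
    prompt: str,
    n_paths: int = 100,
) -> tuple[float, float]:
    """Treat repeated runs of the same prompt as Monte Carlo paths.

    Samples the model n_paths times and aggregates like a classical MC
    pricer, returning the sample mean and its standard error.
    """
    samples = [run_inference(prompt) for _ in range(n_paths)]
    mean = statistics.fmean(samples)
    std_err = statistics.stdev(samples) / (n_paths ** 0.5)
    return mean, std_err
```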
I see two significant problems with this approach. Firstly, the economics of it. We’re used to running MC calculations where each path is incredibly cheap, often taking only fractions of a second. On a CPU. That compares poorly with the cost of inference on much more expensive GPU compute. Of course, it’s perfectly plausible that we may not require anywhere near as many paths.
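A rough back-of-envelope calculation shows the gap. Every number below (path times, path counts, hourly rates) is an illustrative assumption of mine rather than a measurement, but the orders of magnitude tell the story: even with far fewer paths, the per-path cost of inference dominates.

```python
def mc_cost(n_paths: int, seconds_per_path: float, cost_per_hour: float) -> float:
    """Total compute cost of an MC run: paths x time per path x price per second."""
    return n_paths * seconds_per_path * cost_per_hour / 3600

# Illustrative assumptions only.
classical = mc_cost(n_paths=100_000, seconds_per_path=0.01, cost_per_hour=0.05)  # CPU core
llm_as_mc = mc_cost(n_paths=1_000, seconds_per_path=5.0, cost_per_hour=2.00)     # GPU inference

print(f"classical MC: ${classical:.2f}   LLM-as-MC: ${llm_as_mc:.2f}")
# classical MC: $0.01   LLM-as-MC: $2.78
```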
Which brings me to the second problem. Running inference as MC paths and combining the output will inevitably require sharing compute resources that today sit behind very different resource-sharing and scheduling mechanisms. Most inference frameworks are Kubernetes-based, while most risk analytics runs on high-throughput schedulers such as Symphony, Grid Server or HTC Grid. Utilisation rates are already poor in the inference world, and trying to solve that while also coordinating large compute demands across disparate schedulers certainly won’t help. I’m sure we can cook up something to deal with it, but it’s a problem that’s only just starting to come to light.
Honestly, I’d prefer to see some progress on solving the non-determinism (see Thinking Machines) before going the brute-force route. Though, if we do go down this path, I can see where some of the insane demand for GPU compute might come from. I’m not sure how much of it would land with hyperscalers or AI companies though.
