Tensormesh raises USD $20 million for AI inference

Mon, 1st Jun 2026

Tensormesh has raised USD $20 million and launched its Tensormesh Inference platform, with backing from AMD Ventures, CoreWeave and NVentures.

The round extends the company's seed financing and brings total funding to USD $24.5 million. Valley Capital Partners and Laude Ventures also participated. The product is now generally available and is designed to reduce AI inference costs by reusing previously computed model data instead of processing the same prompt context from scratch each time.

Inference has become a growing cost centre for companies deploying large language models, particularly in applications that repeatedly send the same system prompts, conversation history and tool definitions with every request. Tensormesh is built around key-value, or KV, caching, which stores intermediate results so repeated input does not need to be recalculated.

Across its serverless deployments, cached input tokens are priced at USD $0. Customers can also see what is being cached, how often cached data is reused and what that means for token-level costs.

That approach differs from a broader market in which caching may happen in the background without clear disclosure to users. Tensormesh's dashboard shows cache hit rates, the share of prompt tokens served from cache and the amount saved over time.

Junchen Jiang, Co-founder and Chief Executive Officer of Tensormesh, described KV cache data as a distinct category in AI systems.

"Tensormesh offers a new vision on the significance of the intermediate data that LLMs generate when processing prompts. Behind the term KV cache is a whole concept of AI interpretation of the question it is asked. This makes it a whole new class of data and a category Tensormesh is uniquely positioned to define. We're excited to keep building," said Jiang.

Investor backing

The financing brings together investors closely tied to the hardware and cloud infrastructure used in AI workloads. Their involvement reflects growing interest in software that can reduce the compute needed for inference at a time when demand for graphics processing units remains high and costly.

Ramine Roane, Corporate Vice President of AI at AMD, linked the software model to the economics of GPU use.

"As enterprises scale AI workloads, maximizing every GPU cycle is critical. Software innovations like KV caching are a powerful complement to raw accelerator performance. Paired with AMD Instinct GPUs, Tensormesh's platform can help customers drive value from their infrastructure investments," said Roane.

CoreWeave also framed the investment around the economics of scaling AI services.

"Tensormesh is working to solve infrastructure challenges that will ultimately impact the economics and scalability of AI. Their work advancing KV caching can help make inference faster and more efficient at scale, and it reflects exactly the kind of foundational innovation CoreWeave Ventures is committed to backing," said Brannin McBee, Co-founder and Chief Development Officer at CoreWeave.

Valley Capital Partners, which already holds a board seat, said it sees KV caching as an important but underused part of AI infrastructure.

"KV caching represents one of the most consequential and underexplored opportunities in AI infrastructure today. Tensormesh has built the only platform that makes this technology production-ready for the enterprise, and we believe it will become a critical part of how every serious AI deployment is run," said Steve O'Hara, Founder and Managing Partner at Valley Capital Partners and a Tensormesh board member.

Product model

The platform is available in two forms. A serverless option gives customers API access to a catalogue of models through an OpenAI-compatible interface, while reserved deployments are aimed at organisations that want dedicated capacity and tailored support.

Customers can monitor metrics including time to first token, inter-token latency, throughput and GPU utilisation. Tensormesh also said well-optimised deployments can achieve cache hit rates above 70%, shifting a larger share of requests away from recomputation.

The software is built on LMCache, an open-source KV caching project created by the team behind Tensormesh. The company said LMCache has more than 8,000 GitHub stars and integrations with tools and services including vLLM, SGLang, TensorRT, llmp-d, NVIDIA Dynamo, AWS SageMaker and Oracle OCI Data Science.

Samsung Electronics also highlighted the role of storage in cached inference systems.

"As AI workloads grow, intelligent reuse of cached state has become one of the most powerful levers for performance and cost efficiency," said Leno Park, Vice President of NAND Product Planning at Samsung Electronics. "Tensormesh's LMCache is built to take full advantage of next-generation storage, and we look forward to our continued collaboration to push the boundaries of what's possible across the AI stack."

Open-source roots

Some of the new funding will go towards product development, deeper hardware integrations with AMD, CoreWeave and NVIDIA, and continued work on LMCache. The company was founded by researchers and alumni from the University of Chicago, the University of California, Berkeley, and Carnegie Mellon University. It is led by Jiang, who is also a University of Chicago faculty member and a co-creator of LMCache.

Industry backers said the company's open-source background has helped it stand out in a crowded infrastructure market.

"What started as a research project around KV caching is becoming a critical part of the AI stack. Tensormesh understood early that enterprises were paying AI systems to recompute the same work again and again, and built foundational infrastructure to eliminate that inefficiency and dramatically improve price-performance. The team has paired deep systems expertise with real open-source credibility to build infrastructure enterprises can actually rely on," said Pete Sonsini, Co-founder and General Partner at Laude Ventures.

Hui Zhang, Chief Technology Officer and Co-founder of Conviva and an adviser to Tensormesh, said, "Inference economics will define what is possible for the next generation of AI products. Tensormesh is tackling one of the most important challenges in AI infrastructure: helping companies reduce GPU spend without requiring changes to application code. The combination of meaningful cost savings and simple deployment is rare. It positions Tensormesh to become a critical layer in the AI infrastructure stack."

ChatGPT

Key takeaways Explain why it matters Create action plan Future watch

Claude

Key takeaways Explain why it matters Create action plan Future watch

Perplexity

Key takeaways Explain why it matters Create action plan Future watch

Grok

Key takeaways Explain why it matters Create action plan Future watch

Share Share

Add us as a preferred source on Google

Image: Hui Zhang