Michael Wu is GM and President of Phison Technology Inc. (USA), a leading provider of NAND controllers and NAND storage solutions.
Enterprises have embraced generative AI (GenAI) faster than any prior technology wave. According to McKinsey researchers, more than three-quarters of organizations now “use AI in at least one business function.” But as adoption broadens, the challenge has shifted from experimentation to economics. Finance teams are under pressure to bring clarity to an AI cost structure that cuts across cloud contracts, GPUs and storage arrays.
At the same time, energy has become a material line item. The International Energy Agency (IEA) estimates that data center electricity use reached roughly “415 terawatt hours (TWh), or about 1.5% of global electricity consumption in 2024,” and that usage could double by 2030. Specialized servers that power AI are increasing power usage by about 30% annually, a rate more than three times that of standard servers. Cooling alone can account for more than 30% of that total. The combination of expanding models, heavier inference loads and rising energy prices is forcing executives to ask a fundamental question: “How do we measure the cost of a token and the efficiency of the infrastructure that produced it?”
A Finance-Ready Frame For AI Efficiency
Traditional cost models stop at compute or storage capacity. GenAI requires a more granular, physics-aware view that captures the flow of energy as well as data. A practical framework links model economics directly to storage and input/output (I/O) metrics that can be tracked, audited and compared:
• $/TB-month (dollars per terabyte per month) for capacity utilization
• Watt-hours (Wh)/TB moved for I/O efficiency
• Wh/token for inference cost
• Endurance-adjusted $/TBW (terabytes written) for life cycle value
Together, these metrics form a bill of materials for AI operations. They allow finance leaders to express model performance and cost in the same dollar unit by translating watt-hours into local electricity prices ($/kWh). This enables apples-to-apples comparison across cloud, on-prem and hybrid stacks.
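As a rough illustration of how these units translate into dollars, here is a minimal Python sketch. Every figure in it (energy per token, electricity price, drive price, endurance rating, token volume) is a hypothetical placeholder rather than a number from this article:

```python
# Minimal sketch: translating energy-based AI metrics into dollar terms.
# All figures below are hypothetical placeholders, not measured values.

WH_PER_TOKEN = 0.003            # assumed inference energy per token (Wh)
ELECTRICITY_PRICE_KWH = 0.12    # assumed local electricity price ($/kWh)
TOKENS_PER_MONTH = 500_000_000  # assumed monthly inference volume

# Energy cost per token: convert Wh to kWh, then multiply by the local price.
dollars_per_token = (WH_PER_TOKEN / 1000) * ELECTRICITY_PRICE_KWH

# Endurance-adjusted $/TBW: drive price divided by its rated write endurance.
DRIVE_PRICE = 400.0   # assumed SSD price ($)
RATED_TBW = 3000.0    # assumed rated endurance (terabytes written)
dollars_per_tbw = DRIVE_PRICE / RATED_TBW

monthly_energy_cost = dollars_per_token * TOKENS_PER_MONTH

print(f"Energy cost per token: ${dollars_per_token:.8f}")
print(f"Endurance-adjusted cost: ${dollars_per_tbw:.2f} per TBW")
print(f"Monthly inference energy cost: ${monthly_energy_cost:,.2f}")
```

At assumed figures like these, energy is a vanishingly small per-token number that only becomes material at scale, which is exactly why it belongs in monthly reporting alongside $/TB-month and $/TBW rather than in ad hoc estimates.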
Recent research presented at the EuroMLSys 2025 conference proposes energy per token as a key performance indicator for inference, one that could complement traditional accuracy benchmarks. The study found that inference, not training, often dominates total energy use, sometimes by more than an order of magnitude in real-time workloads. Larger models can even prove more energy-efficient than smaller ones once excessive “reasoning steps” are factored out. These insights underscore the need for standardized unit economics that tie power, performance and precision together.
Storage Is The New Multiplier In The AI Cost Equation
Although storage typically accounts for around 5% of a data center’s total electricity consumption, it exerts an outsized influence on overall efficiency. When data stalls, GPUs sit idle, and those wasted cycles show up as longer run times, extra cooling and higher watt-hours per inference. Conversely, efficient data pipelines keep accelerators fully utilized, minimizing wasted energy.
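To see why stalled data matters in energy terms, consider a minimal sketch of watt-hours per inference when a GPU waits on storage. The power draw, timings and idle-power fraction below are assumptions for illustration only:

```python
# Minimal sketch: how I/O stalls inflate watt-hours per inference.
# GPU power draw and timings are hypothetical placeholders.

GPU_POWER_W = 700.0        # assumed GPU board power (watts)
COMPUTE_TIME_S = 0.050     # assumed active compute time per inference (seconds)
STALL_TIME_S = 0.020       # assumed time the GPU waits on storage I/O (seconds)
IDLE_POWER_FRACTION = 0.4  # assumed fraction of full power drawn while waiting

def wh_per_inference(compute_s: float, stall_s: float) -> float:
    """Energy per inference, counting both active compute and I/O-stall time."""
    active_wh = GPU_POWER_W * compute_s / 3600
    stalled_wh = GPU_POWER_W * IDLE_POWER_FRACTION * stall_s / 3600
    return active_wh + stalled_wh

baseline = wh_per_inference(COMPUTE_TIME_S, 0.0)
with_stall = wh_per_inference(COMPUTE_TIME_S, STALL_TIME_S)

print(f"Wh/inference without stalls: {baseline:.5f}")
print(f"Wh/inference with I/O stalls: {with_stall:.5f}")
print(f"Energy overhead from stalled data: {100 * (with_stall / baseline - 1):.1f}%")
```

The idle-power fraction is the key assumption: even a partially powered GPU waiting on I/O adds watt-hours that produce no tokens.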
This is why many organizations are re-architecting their storage mix across cloud and edge, balancing performance, endurance and density. A finance-aligned framework can reveal whether a given change in workload mix or NAND pricing actually enhances or diminishes unit economics.
The Energy Behind Every Token
The next wave of AI efficiency will hinge on visibility. Energy-aware reporting at the workload level, through kilowatt-hours per query, per token or per terabyte moved, will give finance teams the transparency that current AI cost models lack.
Such insight will also help reconcile divergent industry forecasts. IDC researchers expect global AI infrastructure spending to exceed $200 billion by 2028, with storage investments climbing 18% year-over-year. Gartner analysts project $644 billion in GenAI spending by the end of 2025, with nearly 80% of that tied to hardware. Yet, these figures rarely clarify how much of that spend translates to useful compute versus wasted watts.
By integrating metrics like energy per token and endurance-adjusted $/TBW into regular financial reporting, enterprises can capture a truer picture of ROI—one that incorporates the cost of energy, depreciation and utilization into a single comparable measure.
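One hedged way to picture that single comparable measure is a short sketch that blends metered energy, straight-line depreciation and utilization into a cost per million tokens, again using hypothetical figures throughout:

```python
# Minimal sketch: folding energy, depreciation and utilization into one
# cost-per-token figure. All inputs are hypothetical placeholders.

MONTHLY_ENERGY_COST = 15_000.0    # assumed metered energy for the AI cluster ($)
MONTHLY_DEPRECIATION = 60_000.0   # assumed straight-line hardware depreciation ($)
TOKENS_AT_FULL_UTILIZATION = 3_000_000_000  # assumed monthly capacity in tokens

def cost_per_million_tokens(utilization: float) -> float:
    """Blended $/1M tokens: fixed depreciation is spread over fewer tokens
    when utilization drops, while energy scales roughly with work done."""
    tokens = TOKENS_AT_FULL_UTILIZATION * utilization
    energy = MONTHLY_ENERGY_COST * utilization
    total = energy + MONTHLY_DEPRECIATION
    return total / (tokens / 1_000_000)

for u in (0.4, 0.65, 0.9):
    print(f"Utilization {u:.0%}: ${cost_per_million_tokens(u):.2f} per 1M tokens")
```

The point of the sketch is the shape of the curve: because depreciation is fixed, every point of utilization gained spreads the same hardware cost over more tokens.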
Turning Cost Into Competitive Advantage
For finance leaders, this approach reframes storage not as a sunk cost but as a controllable lever of AI performance. In the same way modern supply-chain leaders track cost per unit shipped, CFOs can now track cost per token generated. Standardized metrics let teams evaluate whether scaling a model, switching a cloud provider or upgrading to higher-endurance SSDs improves the organization’s overall cost-to-insight ratio.
The benefits extend beyond efficiency. Organizations that adopt measurable, energy-aware AI economics will be better positioned for emerging energy-efficiency reporting requirements and for internal chargeback models that reward sustainable computing. They’ll also be able to forecast AI costs with greater precision, which will become increasingly critical as workloads multiply across functions and geographies.
Ultimately, the enterprise value of AI won’t hinge on model size alone. It will depend on the efficiency of every watt, every byte and every terabyte written. The companies that master that math will not only scale AI faster but also do so with balance sheets that make sense.