CarbonCall: Sustainability-Aware Function Calling for Large Language Models on Edge Devices

Varatheepan Paramanayakam, Andreas Karatzas, Iraklis Anagnostopoulos, Dimitrios Stamoulis
arXiv.org | 2025
CarbonCall is a sustainability-aware function-calling framework for LLMs on edge devices that dynamically adapts model variants and power thresholds based on real-time carbon intensity forecasts to reduce emissions without sacrificing throughput. It addresses the gap between LLM performance optimization and environmental sustainability in edge AI deployments.

Problem Statement

LLMs deployed on edge devices for real-time function calling incur significant computational overhead, driving high power consumption and carbon emissions that existing frameworks ignore entirely. Current function-calling optimization methods focus solely on accuracy and latency, leaving energy-constrained edge environments underserved. This creates a growing sustainability problem as edge AI proliferates without any carbon-aware scheduling or model adaptation.

Key Novelty

  • Carbon-aware execution loop that integrates real-time carbon intensity forecasts to dynamically adjust power thresholds during LLM function calling
  • Dynamic switching between quantized LLM variants (e.g., different quantization levels) to maintain high tokens-per-second throughput under power constraints
  • End-to-end sustainability-aware function-calling framework specifically designed and evaluated for edge hardware (NVIDIA Jetson AGX Orin)

Evaluation Highlights

  • CarbonCall reduces carbon emissions by up to 52% and power consumption by up to 30% on NVIDIA Jetson AGX Orin compared to non-sustainability-aware baselines
  • Execution time is reduced by up to 30% while maintaining high tokens-per-second throughput, demonstrating that sustainability and performance are not mutually exclusive

Breakthrough Assessment

6/10

CarbonCall is a solid and timely contribution that bridges an important gap between LLM deployment efficiency and environmental sustainability on edge devices. However, it is largely a systems integration advance combining existing techniques (quantization, carbon forecasting, dynamic scheduling) rather than a fundamental algorithmic breakthrough.

Methodology

  1. Monitor real-time carbon intensity signals and translate them into dynamic power budgets/thresholds for the edge device during inference
  2. Select and switch between pre-quantized LLM variants (differing in quantization level/size) at runtime to stay within the current power budget while maximizing throughput
  3. Execute function-calling tasks using the selected model variant, measuring tokens-per-second, power draw, and carbon footprint, and feeding results back into the adaptive scheduling loop
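The three steps above can be sketched as a small control loop. Everything here — the variant pool, the intensity-to-watts mapping, and the function names (`MODEL_VARIANTS`, `power_budget`, `select_variant`) — is an illustrative assumption, not CarbonCall's published interface or its actual numbers:

```python
# Illustrative sketch of a CarbonCall-style adaptation loop.
# All names and numbers are hypothetical placeholders.

# Pre-quantized variants, most capable first: (name, power draw in watts).
MODEL_VARIANTS = [
    ("llm-fp16", 30.0),
    ("llm-int8", 22.0),
    ("llm-int4", 15.0),
]

def power_budget(carbon_gco2_per_kwh: float,
                 low_w: float = 15.0, high_w: float = 30.0) -> float:
    """Map grid carbon intensity to a device power cap (watts).

    A clean grid (<= 100 gCO2/kWh) gets the full budget; a dirty grid
    (>= 500 gCO2/kWh) gets the tightest budget; linear in between.
    """
    frac = min(max((carbon_gco2_per_kwh - 100.0) / 400.0, 0.0), 1.0)
    return high_w - frac * (high_w - low_w)

def select_variant(budget_w: float) -> str:
    """Pick the most capable variant whose draw fits the power cap."""
    for name, watts in MODEL_VARIANTS:
        if watts <= budget_w:
            return name
    return MODEL_VARIANTS[-1][0]  # fall back to the smallest variant

def step(carbon_gco2_per_kwh: float) -> str:
    """One iteration of the monitor -> budget -> select loop."""
    return select_variant(power_budget(carbon_gco2_per_kwh))
```

In a real deployment, step 3's measurements (tokens-per-second, power draw) would feed back into the loop; the sketch omits that feedback for brevity.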

System Components

Carbon Intensity Monitor

Ingests real-time carbon intensity forecasts from external or local sources to determine the current 'greenness' of available compute power
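A monitor like this typically smooths incoming readings so a single noisy sample does not trigger a mode switch. The sketch below feeds readings in manually rather than polling a grid-carbon API, and the smoothing factor is an illustrative choice, not a value from the paper:

```python
# Toy carbon-intensity monitor: in practice this would poll a regional
# grid-carbon feed; here readings are supplied by the caller and folded
# into an exponential moving average.

class CarbonMonitor:
    def __init__(self, alpha: float = 0.3):
        self.alpha = alpha       # weight given to the newest reading
        self.estimate = None     # smoothed intensity in gCO2/kWh

    def observe(self, gco2_per_kwh: float) -> float:
        """Fold a new reading into the running estimate and return it."""
        if self.estimate is None:
            self.estimate = gco2_per_kwh
        else:
            self.estimate = (self.alpha * gco2_per_kwh
                             + (1 - self.alpha) * self.estimate)
        return self.estimate
```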

Dynamic Power Threshold Manager

Translates carbon intensity signals into actionable power caps that constrain the LLM execution environment on the edge device
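On the Jetson AGX Orin, power caps map naturally onto the board's preset power modes (exposed via `nvpmodel`, e.g. 15 W / 30 W / 50 W / MAXN). A threshold table like the one below is one simple way to do this translation; the breakpoint values are illustrative assumptions, not CarbonCall's published thresholds:

```python
import bisect

# Carbon-intensity breakpoints (gCO2/kWh) mapped onto Jetson AGX Orin
# power modes: cleaner grid -> higher power mode. Values are illustrative.
THRESHOLDS = [150.0, 300.0, 450.0]
POWER_MODES = ["MAXN", "50W", "30W", "15W"]

def power_mode(carbon_gco2_per_kwh: float) -> str:
    """Return the power mode for the current grid carbon intensity."""
    return POWER_MODES[bisect.bisect_right(THRESHOLDS, carbon_gco2_per_kwh)]
```

Applying the chosen mode on-device would then be a matter of invoking the platform's power-management tooling (e.g. `nvpmodel -m <mode index>` on Jetson).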

Quantized LLM Variant Selector

Maintains a pool of quantized model variants and dynamically switches between them to match the current power budget while sustaining high tokens-per-second
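One practical refinement when switching variants at runtime is hysteresis: reloading a quantized model has a real latency cost on edge devices, so small fluctuations in the power budget should not cause constant reloads. The sketch below adds a headroom margin before upgrading; the margin and the variant table are illustrative assumptions, not details from the paper:

```python
# Variant selector with hysteresis (illustrative, not CarbonCall's API):
# upgrade to a bigger model only when the budget clears it with headroom,
# but downgrade immediately whenever the current model exceeds the budget.

VARIANTS = [("int4", 15.0), ("int8", 22.0), ("fp16", 30.0)]  # (name, watts)

class VariantSelector:
    def __init__(self, margin_w: float = 2.0):
        self.margin_w = margin_w
        self.current = VARIANTS[0][0]   # start on the smallest variant

    def update(self, budget_w: float) -> str:
        """Return the variant to run under the given power budget."""
        fits = [name for name, watts in VARIANTS if watts <= budget_w]
        best = fits[-1] if fits else VARIANTS[0][0]
        if best == self.current:
            return self.current
        watts_of = dict(VARIANTS)
        over_budget = watts_of[self.current] > budget_w
        has_headroom = watts_of[best] + self.margin_w <= budget_w
        if over_budget or has_headroom:
            self.current = best
        return self.current
```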

Sustainability-Aware Function Caller

Orchestrates tool/function selection and execution within the carbon- and power-constrained LLM inference pipeline on edge hardware
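At its core, this component routes the model's structured output to registered tools. A minimal dispatcher of the kind such a framework would wrap might look as follows; the JSON call format and the tool registry are illustrative, not the paper's actual protocol:

```python
import json

# Minimal function-calling dispatcher: the LLM emits a JSON tool call,
# which is routed to a registered Python function. Illustrative only.

TOOLS = {
    "add": lambda a, b: a + b,
    "echo": lambda text: text,
}

def dispatch(llm_output: str):
    """Parse a tool call like {"name": ..., "arguments": {...}} and run it."""
    call = json.loads(llm_output)
    fn = TOOLS[call["name"]]
    return fn(**call.get("arguments", {}))
```

In CarbonCall, this dispatch happens inside the power- and carbon-constrained inference pipeline, so the model producing the tool call may itself change between requests as the variant selector adapts.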

Results

Metric                       | Baseline (Standard Function Calling) | CarbonCall                | Delta
Carbon Emissions             | Baseline level                       | Up to 52% lower           | -52%
Power Consumption            | Baseline level                       | Up to 30% lower           | -30%
Execution Time               | Baseline level                       | Up to 30% lower           | -30%
Tokens-per-Second Throughput | Baseline level                       | Maintained high throughput | ~0% degradation

Key Takeaways

  • Edge AI deployments can significantly reduce their carbon footprint (up to 52%) by integrating real-time carbon intensity forecasting into LLM inference scheduling without a major performance penalty
  • Maintaining a suite of quantized model variants and switching between them dynamically is a practical and effective strategy for balancing sustainability and throughput on resource-constrained hardware like NVIDIA Jetson
  • Sustainability should be treated as a first-class constraint alongside latency and accuracy in LLM system design, especially as edge AI scales — CarbonCall provides a concrete framework architecture for doing so

Abstract

Large Language Models (LLMs) enable real-time function calling in edge AI systems but introduce significant computational overhead, leading to high power consumption and carbon emissions. Existing methods optimize for performance while neglecting sustainability, making them inefficient for energy-constrained environments. We introduce CarbonCall, a sustainability-aware function-calling framework that integrates dynamic tool selection, carbon-aware execution, and quantized LLM adaptation. CarbonCall adjusts power thresholds based on real-time carbon intensity forecasts and switches between model variants to sustain high tokens-per-second throughput under power constraints. Experiments on an NVIDIA Jetson AGX Orin show that CarbonCall reduces carbon emissions by up to 52%, power consumption by 30%, and execution time by 30%, while maintaining high efficiency.