1. Overview of the Preliminary Verification
1.1 Background and Objectives
With the rapid rise of generative AI and large language models (LLMs), demand for AI development infrastructure has been growing sharply. Traditionally, placing AI computing devices (GPUs) and large-scale storage physically side by side has been considered essential. However, constraints on data center installation space, together with diverse needs such as storing computation results at a company’s own facilities, have created strong demand for distributed AI infrastructure that overcomes geographical limitations.
In this proof-of-concept, we utilized the high-speed, large-capacity, and low-latency features of NTT’s next-generation communication platform, “IOWN (Innovative Optical and Wireless Network) APN (All-Photonics Network),” to conduct a preliminary verification of the technical feasibility of remote GPU–storage usage. Specifically, we performed a performance test on GMO GPU Cloud in a simulated remote environment assuming the distance between Tokyo and Fukuoka.
1.2 Roles of Each Company
Company | Role |
---|---|
GMO Internet, Inc. | Provision of GPUs and storage for GMO GPU Cloud, as well as application implementation |
NTT East Corporation | Provision of IOWN APN technology and test lines (*) |
NTT West Corporation | Provision of IOWN APN technology and test lines (*) |
QTnet, Inc. | Provision of the proof-of-concept environment within the data center (Fukuoka City, Fukuoka Prefecture) (*) |
1.3 Verification Schedule
Preliminary verification: Performance evaluation in a simulated remote environment (conducted in July 2025)
Main proof-of-concept: Connection verification between actual sites (scheduled for November–December 2025)
2. Preliminary Verification Environment and Configuration
2.1 Physical Configuration
• Location: QTnet Data Center (Fukuoka City, Fukuoka Prefecture)
• GPU: NVIDIA HGX H100
• Storage: DDN AI400X2
• Network switch: Arista 7050SX3-48YC8
• Latency adjustment device: OTN Anywhere (inserts latency equivalent to the distance between Tokyo and Fukuoka, etc.)
2.2 Network Configuration
• Connection bandwidth: 100GbE
• Inserted latency: 0–40 ms, added by OTN Anywhere to emulate latency equivalent to the Tokyo–Fukuoka distance and beyond
2.3 Method of Constructing the Simulated Remote Environment
Instead of physically installing servers at remote sites, a latency adjustment device, “OTN Anywhere,” was installed within the data center in Fukuoka City, Fukuoka Prefecture. By inserting communication latency equivalent to the physical distance between Tokyo and Fukuoka (and longer distances), a virtual remote environment was constructed.
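As a rough cross-check on these latency settings, the round-trip propagation delay over optical fiber can be estimated from path length. The sketch below is a minimal calculation using textbook constants, not measured values of the test line; actual fiber routes are longer than straight-line distances, and transmission equipment adds further delay.

```python
# Estimate round-trip fiber propagation delay from path length.
# Light travels at roughly c / 1.47 in standard single-mode fiber.
C_VACUUM_KM_S = 299_792            # speed of light in vacuum [km/s]
FIBER_INDEX = 1.47                 # approximate refractive index of SMF

def fiber_rtt_ms(path_km: float) -> float:
    """Round-trip propagation delay in milliseconds over `path_km` of fiber."""
    one_way_s = path_km / (C_VACUUM_KM_S / FIBER_INDEX)
    return 2 * one_way_s * 1000

for label, km in [("Tokyo-Fukuoka", 1000), ("Tokyo-Okinawa", 1400),
                  ("Tokyo-Taipei", 2100), ("Tokyo-Manila", 2700)]:
    print(f"{label}: ~{fiber_rtt_ms(km):.1f} ms RTT")
# Tokyo-Fukuoka: ~9.8 ms RTT from propagation alone; route detours and
# equipment delay account for the rest, consistent with the 15 ms
# condition used in this verification.
```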
3. Preliminary Verification Scenarios
3.1 Test Workloads
In this proof-of-concept, two representative tasks in AI development were executed to evaluate the performance impact of using remote storage.
3.1.1 Image Classification Task: MLPerf® Training Round 4.0 ResNet (*1, hereafter referred to as ResNet)
• Benchmark: ResNet (Residual Neural Network)
• Feature: Executes loading and processing of the ImageNet dataset (containing approximately 1.28 million training images)
• Evaluation metric: Training time required to reach the target accuracy
3.1.2 Large Language Model Processing Task: MLPerf® Training Round 4.1 Llama2 70B (*2, hereafter referred to as Llama2)
• Benchmark: Llama (Large Language Model Meta AI) 2 70B
• Feature: Executes training on the Llama2 70B model itself (approximately 130 GB)
• Evaluation metric: Training time required to reach the target accuracy
3.2 Latency Condition Settings
Each task was executed under the following round-trip latency conditions to measure the performance impact (a software-based way to approximate such latency is sketched after this list):
Baseline: 0 ms (equivalent to adjacent placement)
Medium-distance remote: 15 ms (equivalent to Tokyo–Fukuoka, approx. 1,000 km)
Long-distance remote: 20 ms (equivalent to Tokyo–Okinawa, approx. 1,400 km)
Ultra-long-distance remote: 30 ms (equivalent to Tokyo–Taipei, approx. 2,100 km)
Extreme-long-distance remote: 40 ms (equivalent to Tokyo–Manila, approx. 2,700 km)
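This verification inserted latency with dedicated OTN Anywhere hardware. As a software alternative for readers who want to approximate a similar sweep without such equipment, Linux’s tc/netem queueing discipline can add artificial delay on a network interface. The sketch below is hedged accordingly: `eth0` is a placeholder interface, root privileges are required, and because netem delays egress traffic only, the full RTT is applied in one direction for simplicity.

```python
# Approximate the latency conditions in software with Linux tc/netem.
import subprocess

IFACE = "eth0"  # placeholder interface name; replace for your environment

def set_one_way_delay(ms: float) -> None:
    """Replace the root qdisc on IFACE with a fixed netem delay (needs root)."""
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", IFACE, "root",
         "netem", "delay", f"{ms}ms"],
        check=True,
    )

def clear_delay() -> None:
    # check=False: deleting the root qdisc fails harmlessly if none is set.
    subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"], check=False)

# Sweep the same RTT conditions as this verification (0/15/20/30/40 ms).
for rtt_ms in [0, 15, 20, 30, 40]:
    clear_delay()
    if rtt_ms > 0:
        set_one_way_delay(rtt_ms)
    # ... run the benchmark under this condition here ...
```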
4. Experimental Results
4.1 Results of the ResNet Image Classification Task
Latency Condition | Benchmark Score (minutes) (*1) |
---|---|
0 ms | 13.80 |
15 ms | 15.55 |
20 ms | 15.95 |
30 ms | 17.52 |
40 ms | 19.09 |
4.2 Results of the Llama2 Large Language Model Processing Task
Latency Condition | Benchmark Score (minutes) (*2) |
---|---|
0 ms | 24.87 |
15 ms | 24.94 |
20 ms | 24.95 |
30 ms | 25.01 |
40 ms | 25.07 |
5. Discussion and Analysis
5.1 Performance Impact Analysis
In both the image classification task and the large language model processing task, benchmark scores worsened as the inserted latency increased. This confirmed that the impact of using remote storage can be measured and observed in the simulated environment.
On the other hand, the degree of impact on benchmark scores differed significantly between the two tasks. A detailed analysis of the benchmark implementations and their processing suggests that this difference depends on the characteristics of each benchmark, particularly how frequently it issues I/O operations to remote storage: processes with frequent storage I/O are more susceptible to latency conditions, while compute-bound processes are largely insulated from them.
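The difference in degree can be quantified directly from the scores reported in sections 4.1 and 4.2; the short calculation below uses only those published numbers, no new measurements.

```python
# Relative training-time increase versus the 0 ms baseline, computed from
# the benchmark scores (minutes) in sections 4.1 and 4.2.
resnet = {0: 13.80, 15: 15.55, 20: 15.95, 30: 17.52, 40: 19.09}
llama2 = {0: 24.87, 15: 24.94, 20: 24.95, 30: 25.01, 40: 25.07}

for name, scores in [("ResNet", resnet), ("Llama2 70B", llama2)]:
    base = scores[0]
    for rtt, minutes in scores.items():
        print(f"{name} @ {rtt:>2} ms: {minutes:.2f} min "
              f"(+{(minutes / base - 1) * 100:.1f}%)")
# ResNet slows down by ~12.7% at 15 ms and ~38.3% at 40 ms, while
# Llama2 70B stays within ~0.8% across all conditions.
```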
5.2 ResNet Image Classification Task
After the benchmark starts, the ImageNet dataset is loaded from storage into GPU memory, so the drop in transfer performance under each added latency condition (ms) is reflected directly in the decline of the benchmark score. At the same time, because the raw data (approximately 1.28 million training images) is preprocessed, following standard AI training practice, into a single packed file format suited to training, the performance decline was limited to around 12% under the Tokyo–Fukuoka assumption (15 ms).
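Why packing matters can be illustrated with a simple cost model: total read time over a remote link is roughly one round trip per request plus the raw transfer time, so collapsing millions of small reads into a few large sequential reads removes almost all of the latency-bound term. The numbers below are assumptions for illustration, not measurements from this verification.

```python
# Illustrative model: time to fetch a dataset over a remote link as
# (requests * RTT) + (total bytes / bandwidth), with serialized requests.
RTT_S = 0.015                  # 15 ms round trip (Tokyo-Fukuoka condition)
BANDWIDTH_BPS = 100e9 / 8      # 100GbE in bytes/s, ignoring protocol overhead
DATASET_BYTES = 150e9          # assumed ImageNet-scale dataset size

def fetch_time_s(num_requests: int) -> float:
    return num_requests * RTT_S + DATASET_BYTES / BANDWIDTH_BPS

print(f"1.28M per-image reads: {fetch_time_s(1_280_000):>8.0f} s")
print(f"1k packed-chunk reads: {fetch_time_s(1_000):>8.1f} s")
# Per-image reads would spend ~19,200 s on round trips alone, while the
# packed format is dominated by the ~12 s of raw transfer time. Real data
# loaders also pipeline requests, which hides latency further.
```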
5.3 Llama2 Large Language Model Processing Task
Before the benchmark measurement begins, the large language model and related data have already been loaded into GPU memory. After measurement starts, most processing completes through GPU computation, with far fewer I/O operations to storage than in the ResNet image classification task. As a result, the degradation in benchmark score was extremely small.
5.4 Summary
In this verification, we evaluated the performance impact of remote storage usage by applying simulated latency conditions to image classification tasks and large language model processing tasks.
As a result, both tasks showed changes in benchmark scores as latency increased, demonstrating that latency effects originating from remote storage can indeed be observed. The intended latency impact was successfully measured, and under the Tokyo–Fukuoka equivalent latency condition set in this verification, the performance decline was approximately 12%. NVIDIA HGX H100 GPUs are reported to deliver up to four times the AI training performance of the previous-generation A100 GPUs (*3). Taking this improvement into account, a 12% decline in training performance is considered acceptable.
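As a back-of-the-envelope check combining these figures: the 15 ms condition lengthened ResNet training by a factor of 15.55 ÷ 13.80 ≈ 1.13, so even with remote storage the H100 setup would retain roughly 4 ÷ 1.13 ≈ 3.5 times the training throughput of locally attached previous-generation GPUs, assuming the vendor’s figure applies to this workload.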
Based on these findings, the significance of continuing this verification was confirmed, and we will proceed with proof-of-concept testing of IOWN APN connections between actual sites, with the aim of verifying the latency reduction benefits achieved by replacing conventional networks with IOWN APN.
6. Future Developments
6.1 Planned Proof-of-Concept (November–December 2025)
• Implementation: Verification using actual IOWN APN lines between Tokyo and Fukuoka
• Comparison target: Performance comparison with conventional Ethernet leased lines
• Evaluation items: Practicality assessment for commercial implementation
6.2 Future Vision for Social Implementation
With the success of this proof-of-concept, the following social implementations are expected:
Realization of a distributed AI cloud: Optimal allocation of AI resources on a nationwide scale
Enhanced disaster resilience: Ensuring business continuity through distributed deployment
Realization of a new social network infrastructure: Broad deployment of IOWN APN (NTT East and NTT West’s “All-Photonics Connect powered by IOWN”)
※1 Unverified MLPerf® Training Round 4.0 Closed ResNet offline. Result not verified by MLCommons Association.
※2 Unverified MLPerf® Training Round 4.1 Closed Llama2 70B offline. Result not verified by MLCommons Association.
The MLPerf name and logo are registered and unregistered trademarks of MLCommons Association in the United States and other countries. All rights reserved. Unauthorized use strictly prohibited. See www.mlcommons.org for more information.
※3 NVIDIA H100 Tensor Core GPU (https://www.nvidia.com/ja-jp/data-center/h100/)