Multi-Cloud Setup and Usage Guide

Overview

AIOSx supports multi-cloud orchestration across AWS, Azure, GCP, CoreWeave, and private datacenters. The UAICP cloud layer provides a unified abstraction for managing workloads across all of these providers.

Configuration

1. Cloud Provider Credentials

Edit config/cloud_providers.yaml and set credentials for each provider:

yaml
cloud_providers:
  aws:
    enabled: true
    credentials:
      access_key_id: "${AWS_ACCESS_KEY_ID}"
      secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
      region: "us-east-1"
  azure:
    enabled: true
    credentials:
      subscription_id: "${AZURE_SUBSCRIPTION_ID}"
      tenant_id: "${AZURE_TENANT_ID}"
      client_id: "${AZURE_CLIENT_ID}"
      client_secret: "${AZURE_CLIENT_SECRET}"
  gcp:
    enabled: true
    credentials:
      project_id: "${GCP_PROJECT_ID}"
      service_account_key: "${GCP_SERVICE_ACCOUNT_KEY}"
  coreweave:
    enabled: true
    credentials:
      api_key: "${COREWEAVE_API_KEY}"
      api_url: "${COREWEAVE_API_URL}"
  private:
    enabled: true
    credentials:
      endpoint: "${PRIVATE_DC_ENDPOINT}"
      api_key: "${PRIVATE_DC_API_KEY}"

2. Environment Variables

Set the following environment variables:

bash
export AWS_ACCESS_KEY_ID="your-aws-key"
export AWS_SECRET_ACCESS_KEY="your-aws-secret"
export AZURE_SUBSCRIPTION_ID="your-azure-subscription"
export AZURE_TENANT_ID="your-azure-tenant"
export AZURE_CLIENT_ID="your-azure-client-id"
export AZURE_CLIENT_SECRET="your-azure-client-secret"
export GCP_PROJECT_ID="your-gcp-project"
export GCP_SERVICE_ACCOUNT_KEY="path-to-service-account-key.json"
export COREWEAVE_API_KEY="your-coreweave-key"
export COREWEAVE_API_URL="https://api.coreweave.com"
export PRIVATE_DC_ENDPOINT="https://your-datacenter.com/api"
export PRIVATE_DC_API_KEY="your-datacenter-key"
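
The guide does not show how the `${VAR}` placeholders in config/cloud_providers.yaml are resolved. As a rough illustration only, the sketch below expands them from the environment with the standard library's `os.path.expandvars`. The helper `expand_placeholders` is hypothetical and is not part of AIOSx; the real loader may behave differently.

```python
import os


def expand_placeholders(node):
    """Recursively replace ${VAR} references in a parsed config tree.

    Hypothetical helper for illustration; the real AIOSx loader may differ.
    os.path.expandvars leaves references to unset variables unchanged.
    """
    if isinstance(node, dict):
        return {key: expand_placeholders(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_placeholders(item) for item in node]
    if isinstance(node, str):
        return os.path.expandvars(node)
    return node  # booleans, numbers and None pass through unchanged


# A fragment shaped like the aws entry in config/cloud_providers.yaml.
config = {
    "aws": {
        "enabled": True,
        "credentials": {
            "access_key_id": "${AWS_ACCESS_KEY_ID}",
            "secret_access_key": "${AWS_SECRET_ACCESS_KEY}",
        },
    }
}

# Placeholder value, mirroring the export line above.
os.environ["AWS_ACCESS_KEY_ID"] = "your-aws-key"

resolved = expand_placeholders(config)
print(resolved["aws"]["credentials"]["access_key_id"])
```

Because unset variables are left as literal `${VAR}` text, a value that still contains `${` after expansion is a quick sign that an export is missing.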

Usage

Listing Available Clouds

http
# Request
GET /uaicp/clouds
# Response
{
  "providers": ["aws", "azure", "gcp", "coreweave", "private"],
  "regions": [...],
  "total_providers": 5,
  "total_regions": 15
}

Routing Workloads

http
# Route a workload to the optimal cloud ("cloud_preference" is optional)
POST /uaicp/workloads/route
{
  "task_id": "task_123",
  "task_type": "llm_inference",
  "domain": "llm",
  "requirements": {
    "cloud_preference": "aws",
    "preferred_region": "us-east-1",
    "cost_constraint": 5.0,
    "latency_slo_ms": 500
  }
}
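
For callers scripting this endpoint, the request above could be assembled along the following lines. This is a minimal sketch using only the standard library. The base URL `http://localhost:8000`, the `RouteRequirements` dataclass, and `build_route_request` are assumptions made for illustration; only the path and the field names come from this guide.

```python
import json
import urllib.request
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class RouteRequirements:
    """Illustrative container for the "requirements" object."""
    preferred_region: str
    cost_constraint: float
    latency_slo_ms: int
    cloud_preference: Optional[str] = None  # optional, per the request above


def build_route_request(base_url, task_id, task_type, domain, requirements):
    """Build (but do not send) a POST to /uaicp/workloads/route."""
    # Omit optional fields that were not supplied.
    reqs = {k: v for k, v in asdict(requirements).items() if v is not None}
    payload = {
        "task_id": task_id,
        "task_type": task_type,
        "domain": domain,
        "requirements": reqs,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/uaicp/workloads/route",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_route_request(
    "http://localhost:8000",  # assumed base URL; use your deployment's address
    task_id="task_123",
    task_type="llm_inference",
    domain="llm",
    requirements=RouteRequirements(
        preferred_region="us-east-1",
        cost_constraint=5.0,
        latency_slo_ms=500,
        cloud_preference="aws",
    ),
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen` would send it. Any authentication headers your deployment requires are not covered by this guide and would need to be added.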

Workload Router

The workload router automatically selects the best cloud based on:

  1. Cost: Selects the cloud with the lowest cost that satisfies the request's constraints
  2. Latency: Prefers clouds with lower latency
  3. Availability: Only considers resources that are currently available
  4. Region: Honors the preferred region if one is specified
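
The guide does not say how these four criteria are weighted or ordered. One plausible reading is sketched below: availability, the cost ceiling, and the latency SLO act as hard filters, the preferred region narrows the pool when possible, and the cheapest remaining candidate wins with latency as the tie-breaker. The `Candidate` type, `select_cloud`, and all prices and latencies are invented for illustration and do not describe the actual router.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    provider: str
    region: str
    cost_per_hour: float
    latency_ms: float
    available: bool


def select_cloud(candidates, max_cost=None, latency_slo_ms=None, preferred_region=None):
    """Return the best candidate under the assumed ordering, or None."""
    # Availability: drop anything that cannot take the workload.
    pool = [c for c in candidates if c.available]
    # Cost and latency constraints are treated as hard limits.
    if max_cost is not None:
        pool = [c for c in pool if c.cost_per_hour <= max_cost]
    if latency_slo_ms is not None:
        pool = [c for c in pool if c.latency_ms <= latency_slo_ms]
    # Region: honor the preference when an eligible candidate exists there.
    if preferred_region is not None:
        in_region = [c for c in pool if c.region == preferred_region]
        if in_region:
            pool = in_region
    if not pool:
        return None
    # Lowest cost first, lower latency breaks ties.
    return min(pool, key=lambda c: (c.cost_per_hour, c.latency_ms))


# Made-up figures, for demonstration only.
candidates = [
    Candidate("aws", "us-east-1", 4.10, 120, True),
    Candidate("gcp", "us-central1", 3.20, 95, True),
    Candidate("coreweave", "us-east-1", 2.40, 140, False),
    Candidate("azure", "eastus", 6.00, 80, True),
]

chosen = select_cloud(candidates, max_cost=5.0, latency_slo_ms=500,
                      preferred_region="us-east-1")
print(chosen.provider)  # aws: the only available in-region option under the cost limit
```

With the same data and no preferred region, the cheaper GCP candidate would be chosen instead, which shows how strongly a region preference can influence the result under this reading.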

Manual Cloud Selection

python
import asyncio

from aiosx.uaicp.cloud.aws_provider import AWSProvider
from aiosx.uaicp.cloud.cloud_registry import CloudRegistry


async def main():
    # Register the provider with the registry
    cloud_registry = CloudRegistry()
    aws_provider = AWSProvider()
    cloud_registry.register_provider(aws_provider)

    # List GPU resources in a single provider and region
    resources = await cloud_registry.list_all_resources(
        provider="aws",
        region="us-east-1",
        resource_type="gpu",
    )

    # Find the best resource that satisfies the requirements
    best_resource = await cloud_registry.find_best_resource(
        resource_type="gpu",
        requirements={
            "max_cost_per_hour": 5.0,
            "preferred_region": "us-east-1",
        },
    )


# The registry methods are coroutines, so they must run inside an event loop
asyncio.run(main())

Cloud-Specific Features

AWS

  • EC2 instances
  • EKS clusters
  • Lambda functions
  • SageMaker endpoints

Azure

  • AKS clusters
  • Azure Functions
  • Azure ML

GCP

  • GKE clusters
  • Cloud Functions
  • Vertex AI

CoreWeave

  • GPU-focused infrastructure
  • Competitive GPU pricing
  • High availability

Private Datacenters

  • On-premises infrastructure
  • Low latency
  • Cost-effective for high-volume workloads

Best Practices

  1. Multi-Region: Deploy across multiple regions for high availability
  2. Cost Optimization: Use the workload router to select cost-effective clouds automatically
  3. Latency: Route latency-sensitive workloads to the nearest region
  4. Redundancy: Use the distributed execution strategy for critical workloads
  5. Monitoring: Monitor costs and performance across all clouds

Troubleshooting

Provider Not Available

If a cloud provider is not available:

  1. Check the credentials in config/cloud_providers.yaml
  2. Verify that the environment variables referenced there are set
  3. Check the provider's API status
  4. Review the logs for authentication errors
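
The environment-variable check in the steps above can be scripted. The sketch below is a standalone diagnostic, not an AIOSx tool: the variable names are taken from the Environment Variables section of this guide, while `REQUIRED_VARS` and `missing_vars` are hypothetical names chosen for illustration.

```python
import os

# Variable names as listed in the Environment Variables section.
REQUIRED_VARS = {
    "aws": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "azure": [
        "AZURE_SUBSCRIPTION_ID",
        "AZURE_TENANT_ID",
        "AZURE_CLIENT_ID",
        "AZURE_CLIENT_SECRET",
    ],
    "gcp": ["GCP_PROJECT_ID", "GCP_SERVICE_ACCOUNT_KEY"],
    "coreweave": ["COREWEAVE_API_KEY", "COREWEAVE_API_URL"],
    "private": ["PRIVATE_DC_ENDPOINT", "PRIVATE_DC_API_KEY"],
}


def missing_vars(env=None):
    """Map each provider to the variables that are unset or empty."""
    env = os.environ if env is None else env
    report = {}
    for provider, names in REQUIRED_VARS.items():
        missing = [name for name in names if not env.get(name)]
        if missing:
            report[provider] = missing
    return report


if __name__ == "__main__":
    report = missing_vars()
    if not report:
        print("All provider variables are set.")
    for provider, names in report.items():
        print(f"{provider}: missing {', '.join(names)}")
```

A provider that is disabled in config/cloud_providers.yaml will still be reported here if its variables are unset, so read the output alongside the `enabled` flags.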

Resource Allocation Failures

If resource allocation fails:

  1. Check provider quotas
  2. Verify region availability
  3. Review cost constraints
  4. Check resource availability

High Costs

To reduce costs:

  1. Use private datacenters for high-volume workloads
  2. Leverage spot instances where available
  3. Optimize resource allocation with the ROI engine
  4. Set cost constraints (such as cost_constraint) in workload routing requests
