Multi-Cloud Setup and Usage Guide

Overview

AIOSx supports multi-cloud orchestration across AWS, Azure, GCP, CoreWeave, and private datacenters. The UAICP cloud layer provides a unified abstraction for managing workloads across all of these providers.

Configuration

1. Cloud Provider Credentials

Edit config/cloud_providers.yaml and set credentials for each provider:

```yaml
cloud_providers:
  aws:
    enabled: true
    credentials:
      access_key_id: "${AWS_ACCESS_KEY_ID}"
      secret_access_key: "${AWS_SECRET_ACCESS_KEY}"
      region: "us-east-1"

  azure:
    enabled: true
    credentials:
      subscription_id: "${AZURE_SUBSCRIPTION_ID}"
      tenant_id: "${AZURE_TENANT_ID}"
      client_id: "${AZURE_CLIENT_ID}"
      client_secret: "${AZURE_CLIENT_SECRET}"

  gcp:
    enabled: true
    credentials:
      project_id: "${GCP_PROJECT_ID}"
      service_account_key: "${GCP_SERVICE_ACCOUNT_KEY}"

  coreweave:
    enabled: true
    credentials:
      api_key: "${COREWEAVE_API_KEY}"
      api_url: "${COREWEAVE_API_URL}"

  private:
    enabled: true
    credentials:
      endpoint: "${PRIVATE_DC_ENDPOINT}"
      api_key: "${PRIVATE_DC_API_KEY}"
```
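The `${...}` values indicate that credentials are pulled from environment variables instead of being stored in the file. This guide does not describe the loader AIOSx uses, so what follows is only a minimal sketch. It assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and that placeholders follow shell-style `${VAR}` expansion. `resolve_placeholders` and `unresolved_credentials` are hypothetical helper names, not part of AIOSx.

```python
import os
import re
from typing import Any

# Matches ${VAR} placeholders that could not be resolved from the environment.
_UNRESOLVED = re.compile(r"\$\{[^}]+\}")


def resolve_placeholders(value: Any) -> Any:
    """Recursively expand ${VAR} references using the current environment.

    Hypothetical helper. Note that os.path.expandvars also expands bare $NAME
    references and leaves references to unset variables unchanged.
    """
    if isinstance(value, dict):
        return {key: resolve_placeholders(item) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_placeholders(item) for item in value]
    if isinstance(value, str):
        return os.path.expandvars(value)
    return value  # booleans, numbers, and None pass through untouched


def unresolved_credentials(config: dict) -> list[tuple[str, str]]:
    """Return (provider, key) pairs of enabled providers whose credentials still contain ${...}."""
    problems = []
    for name, provider in config.get("cloud_providers", {}).items():
        if not provider.get("enabled", False):
            continue  # disabled providers do not need credentials
        for key, value in provider.get("credentials", {}).items():
            if isinstance(value, str) and _UNRESOLVED.search(value):
                problems.append((name, key))
    return problems
```

Because `os.path.expandvars` leaves unknown variables untouched, any `${...}` that survives resolution points to a variable that is not set. This makes it a quick way to find the misconfigured provider.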

2. Environment Variables

Set the following environment variables:

```bash
export AWS_ACCESS_KEY_ID="your-aws-key"
export AWS_SECRET_ACCESS_KEY="your-aws-secret"
export AZURE_SUBSCRIPTION_ID="your-azure-subscription"
export AZURE_TENANT_ID="your-azure-tenant"
export AZURE_CLIENT_ID="your-azure-client-id"
export AZURE_CLIENT_SECRET="your-azure-client-secret"
export GCP_PROJECT_ID="your-gcp-project"
export GCP_SERVICE_ACCOUNT_KEY="path-to-service-account-key.json"
export COREWEAVE_API_KEY="your-coreweave-key"
export COREWEAVE_API_URL="https://api.coreweave.com"
export PRIVATE_DC_ENDPOINT="https://your-datacenter.com/api"
export PRIVATE_DC_API_KEY="your-datacenter-key"
```
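Before starting AIOSx, you can confirm that every variable referenced by an enabled provider is actually set. Below is a minimal preflight sketch that uses only the standard library. The `REQUIRED_VARS` table mirrors the list above, and `missing_vars` and `gcp_key_file_exists` are hypothetical helpers, not AIOSx functions.

```python
import os
from collections.abc import Mapping

# Variables each provider needs, copied from the export list above.
REQUIRED_VARS = {
    "aws": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "azure": ["AZURE_SUBSCRIPTION_ID", "AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET"],
    "gcp": ["GCP_PROJECT_ID", "GCP_SERVICE_ACCOUNT_KEY"],
    "coreweave": ["COREWEAVE_API_KEY", "COREWEAVE_API_URL"],
    "private": ["PRIVATE_DC_ENDPOINT", "PRIVATE_DC_API_KEY"],
}


def missing_vars(env: Mapping[str, str] = os.environ) -> dict[str, list[str]]:
    """Return, per provider, the required variables that are unset or empty."""
    missing = {}
    for provider, names in REQUIRED_VARS.items():
        absent = [name for name in names if not env.get(name)]
        if absent:
            missing[provider] = absent
    return missing


def gcp_key_file_exists(env: Mapping[str, str] = os.environ) -> bool:
    """GCP_SERVICE_ACCOUNT_KEY is set to a path to a JSON key file; check that the file exists."""
    path = env.get("GCP_SERVICE_ACCOUNT_KEY", "")
    return bool(path) and os.path.isfile(path)


if __name__ == "__main__":
    for provider, names in missing_vars().items():
        print(f"{provider}: missing {', '.join(names)}")
```

Remove the entries for any provider you have disabled with `enabled: false`. Variables for those providers do not need to be set.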

Usage

Listing Available Clouds

Via the API:

```http
GET /uaicp/clouds
```

Example response:

```json
{
  "providers": ["aws", "azure", "gcp", "coreweave", "private"],
  "regions": [...],
  "total_providers": 5,
  "total_regions": 15
}
```
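Below is a minimal Python client for this endpoint that uses only `urllib`. The base URL `http://localhost:8000` is a placeholder assumption, because this guide does not say where the API is served. The consistency check assumes that `total_providers` counts the entries in `providers`, as it does in the example response.

```python
import json
import urllib.request

# Hypothetical base URL; point this at your AIOSx deployment.
BASE_URL = "http://localhost:8000"


def parse_clouds(body: bytes) -> dict:
    """Decode a /uaicp/clouds response body and sanity-check the provider count."""
    data = json.loads(body)
    # Assumption: total_providers equals the length of the providers list.
    if data["total_providers"] != len(data["providers"]):
        raise ValueError("total_providers does not match the providers list")
    return data


def list_clouds(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Call GET /uaicp/clouds and return the decoded response."""
    with urllib.request.urlopen(f"{base_url}/uaicp/clouds", timeout=timeout) as resp:
        return parse_clouds(resp.read())
```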

Routing Workloads

To route a workload to the optimal cloud, send:

```http
POST /uaicp/workloads/route

{
  "task_id": "task_123",
  "task_type": "llm_inference",
  "domain": "llm",
  "requirements": {
    "cloud_preference": "aws",
    "preferred_region": "us-east-1",
    "cost_constraint": 5.0,
    "latency_slo_ms": 500
  }
}
```

The `cloud_preference` field is optional. Leave it out to let the workload router choose the cloud.
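The following sketch sends the same request from Python using the standard library. `BASE_URL` is an assumed placeholder. `build_route_payload` is a hypothetical helper that drops any requirement left as `None`. Only `cloud_preference` is documented as optional, so it is an assumption that the server accepts a request without the other requirement fields; pass all of them if in doubt. The response format is not documented, so `route_workload` simply returns the decoded JSON.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # hypothetical; point this at your AIOSx API


def build_route_payload(task_id: str, task_type: str, domain: str,
                        preferred_region: Optional[str] = None,
                        cost_constraint: Optional[float] = None,
                        latency_slo_ms: Optional[int] = None,
                        cloud_preference: Optional[str] = None) -> dict:
    """Build the request body, omitting requirements that were not given."""
    requirements = {
        "cloud_preference": cloud_preference,
        "preferred_region": preferred_region,
        "cost_constraint": cost_constraint,
        "latency_slo_ms": latency_slo_ms,
    }
    return {
        "task_id": task_id,
        "task_type": task_type,
        "domain": domain,
        "requirements": {k: v for k, v in requirements.items() if v is not None},
    }


def route_workload(payload: dict, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """POST the payload to /uaicp/workloads/route and return the decoded JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/uaicp/workloads/route",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())
```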

Workload Router

The workload router automatically selects the best cloud based on:

- Cost: Selects the cloud with the lowest cost that satisfies the request's constraints
- Latency: Prefers clouds with lower latency
- Availability: Considers only resources that are currently available
- Region: Honors the preferred region, if one is specified
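This guide does not document how the router scores candidates. The sketch below shows one plausible reading of the four criteria. It assumes that availability, the cost limit, and the latency SLO act as hard filters. When any eligible candidate is in the preferred region, the pool narrows to that region. The cheapest remaining candidate wins, and lower latency breaks ties. `Candidate` and `select_cloud` are hypothetical names, not the AIOSx API, and costs are assumed to be per hour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical offer: one provider/region pair with its price and latency."""
    provider: str
    region: str
    cost_per_hour: float
    latency_ms: float
    available: bool


def select_cloud(candidates: list[Candidate], max_cost: float,
                 latency_slo_ms: float,
                 preferred_region: Optional[str] = None) -> Optional[Candidate]:
    """Pick a candidate under the assumptions described above, or None if nothing qualifies."""
    # Availability, cost, and the latency SLO are treated as hard filters.
    eligible = [
        c for c in candidates
        if c.available and c.cost_per_hour <= max_cost and c.latency_ms <= latency_slo_ms
    ]
    # Honor the preferred region when at least one eligible candidate is in it.
    if preferred_region is not None:
        in_region = [c for c in eligible if c.region == preferred_region]
        if in_region:
            eligible = in_region
    if not eligible:
        return None
    # Cheapest first; lower latency breaks ties.
    return min(eligible, key=lambda c: (c.cost_per_hour, c.latency_ms))
```

Consider a case with an unavailable Azure offer, AWS at 4.0 and CoreWeave at 2.5 in `us-east-1`, and GCP at 1.5 in another region. A request that prefers `us-east-1` selects CoreWeave. The same request without a region preference selects GCP.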

Manual Cloud Selection

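```python
import asyncio

from aiosx.uaicp.cloud.cloud_registry import CloudRegistry
from aiosx.uaicp.cloud.aws_provider import AWSProvider


async def main() -> None:
    # Register the AWS provider with a new registry
    cloud_registry = CloudRegistry()
    aws_provider = AWSProvider()
    cloud_registry.register_provider(aws_provider)

    # List GPU resources offered by AWS in us-east-1
    resources = await cloud_registry.list_all_resources(
        provider="aws",
        region="us-east-1",
        resource_type="gpu"
    )

    # Find the best GPU resource under the cost limit, preferring us-east-1
    best_resource = await cloud_registry.find_best_resource(
        resource_type="gpu",
        requirements={
            "max_cost_per_hour": 5.0,
            "preferred_region": "us-east-1"
        }
    )
    print(resources, best_resource)


# The registry methods are coroutines, so run them inside an event loop
asyncio.run(main())
```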

Cloud-Specific Features

AWS

- EC2 instances
- EKS clusters
- Lambda functions
- SageMaker endpoints

Azure

- AKS clusters
- Azure Functions
- Azure ML

GCP

- GKE clusters
- Cloud Functions
- Vertex AI

CoreWeave

- GPU-focused infrastructure
- Competitive GPU pricing
- High availability

Private Datacenters

- On-premises infrastructure
- Low latency
- Cost-effective for high-volume workloads

Best Practices

- Multi-Region: Deploy across multiple regions for high availability
- Cost Optimization: Use the workload router to select cost-effective clouds automatically
- Latency: Route latency-sensitive workloads to the nearest region
- Redundancy: Use the distributed execution strategy for critical workloads
- Monitoring: Track costs and performance across all clouds
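For the Multi-Region and Redundancy practices, ordered failover is a simple client-side pattern. Try the preferred target first and fall back to the next one if it fails. This is a generic sketch, not the AIOSx distributed execution strategy. `submit` is a hypothetical stand-in for whatever call actually deploys the workload, and the sketch assumes it raises an exception when a target cannot accept the workload.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def submit_with_failover(targets: Iterable[tuple[str, str]],
                         submit: Callable[[str, str], T]) -> T:
    """Try each (provider, region) target in order until one submission succeeds."""
    errors: list[str] = []
    for provider, region in targets:
        try:
            return submit(provider, region)
        except Exception as exc:  # record the failure and move on to the next target
            errors.append(f"{provider}/{region}: {exc}")
    # Every target failed (or none were given): report all collected errors together.
    raise RuntimeError("all targets failed: " + "; ".join(errors))
```

Ordering `targets` by measured latency also covers the Latency practice, because the nearest region is tried first.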

Troubleshooting

Provider Not Available

If a cloud provider is not available:

- Check that the provider is enabled and its credentials are set in config/cloud_providers.yaml
- Verify that the referenced environment variables are set
- Check the provider's API service status
- Review the logs for authentication errors

Resource Allocation Failures

If resource allocation fails:

- Check provider quotas
- Verify that the requested region is available
- Review cost constraints; a limit set too low can exclude every resource
- Check that the requested resource type is available
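Some allocation failures are transient and clear up on their own, such as capacity that is briefly exhausted in a region. Quota limits usually do not clear and need a quota increase instead. For the transient case, a common pattern is retrying with exponential backoff and jitter. The sketch below is generic. `allocate` is a hypothetical zero-argument callable that wraps the real allocation call. Narrow `retry_on` to the exceptions your provider raises for transient errors.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def allocate_with_backoff(
    allocate: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `allocate` until it succeeds, backing off exponentially between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return allocate()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Cap the exponential delay, then wait a random fraction of it ("full jitter").
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")  # the loop always returns or raises
```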

High Costs

To reduce costs:

- Use private datacenters for high-volume workloads
- Use spot instances where available
- Optimize resource allocation with the ROI engine
- Set cost constraints when routing workloads
