AI-Enabled Windows 365 Cloud PCs – Full Automation with PowerShell (Graph REST API – Part 1)

9 Dec

Microsoft recently announced AI-enabled Windows 365 Cloud PCs as part of the Frontier Preview program. These Cloud PCs bring Copilot+ PC features like Improved Windows Search and Click to Do to virtualized environments, without requiring local NPU hardware.

In this blog post, I will demonstrate how to fully automate the deployment of AI-enabled Cloud PCs using PowerShell and Microsoft Graph REST APIs. This includes:

  • Creating a Provisioning Policy
  • Creating a Cloud PC Configuration with AI features enabled
  • Assigning policies to Entra ID groups
  • Configuring Windows Insider Beta Channel enrollment (GUI Based)

What are AI-Enabled Cloud PCs?

AI-enabled Cloud PCs deliver integrated Windows AI experiences to any device in any location. They combine the power of Windows 365 with AI acceleration, offering:

  • Improved Windows Search: Semantic search using natural language queries across local files and OneDrive
  • Click to Do: Instant actions on highlighted text or images (Windows+Q or Windows+Click)
  • Enterprise Security: All AI processing remains within the customer’s trusted cloud boundary

Cloud PC Requirements

Requirement | Value
vCPU | 8 vCPU (minimum)
RAM | 32 GB (minimum)
Storage | 256 GB (minimum)
OS Version | Windows 11 Enterprise 24H2
Windows Insider | Beta Channel enrollment required
Supported Region | West US 2, West US 3, East US, East US 2, Central US, Central India, South East Asia, Australia East, UK South, West Europe, North Europe
PowerShell | Open PowerShell on the Cloud PC with admin privileges (Run as Administrator) and run: Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser

The API Discovery

While the Provisioning Policy API is well documented, the Cloud PC Configuration settings (including AI enablement) use an undocumented endpoint. By analyzing the Intune admin center network traffic, I discovered the following API:

Endpoint: POST /beta/deviceManagement/virtualEndpoint/settingProfiles

The key setting definition ID for AI enablement is:

W365.CloudPCConfiguration.AI.IsEnabled
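
If you want to see what this endpoint returns in your own tenant before automating anything, a quick GET lists any existing Cloud PC Configuration profiles and their setting definition IDs. This is a minimal sketch: it assumes $AccessToken already holds a Graph token with CloudPC.ReadWrite.All (the client-credentials flow used later in this post), and since this is undocumented beta surface, the response shape may change:

# Minimal discovery sketch (undocumented beta endpoint - response shape may change).
# Assumes $AccessToken holds a valid Graph token with CloudPC.ReadWrite.All.
$Headers  = @{ Authorization = "Bearer $AccessToken" }
$uri      = "https://graph.microsoft.com/beta/deviceManagement/virtualEndpoint/settingProfiles"
$profiles = Invoke-RestMethod -Method Get -Uri $uri -Headers $Headers

# List each profile and its setting definition IDs to spot the AI toggle
foreach ($p in $profiles.value) {
    "{0} ({1})" -f $p.displayName, $p.id
    foreach ($s in $p.settings) { "    $($s.settingDefinitionId) = $($s.isEnabled)" }
}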

Prerequisites

  1. App Registration in Entra ID with the following API permissions (admin consented):
    • CloudPC.ReadWrite.All
    • DeviceManagementConfiguration.ReadWrite.All
    • Group.Read.All
  2. Windows 365 Enterprise licenses (8 vCPU/32 GB/256 GB or higher)
  3. Entra ID Security Group for target users
  4. Users registered with the Windows Insider Program

PowerShell Script: Full Automation

The following PowerShell script automates the entire AI-enabled Cloud PC deployment. It creates:

  • A Cloud PC Configuration profile with AI features enabled
  • A Provisioning Policy with the correct image and region
  • Assignments to your specified Entra ID group

Configuration Section

Update the following variables with your tenant-specific values:

# ==========================
# CONFIGURATION - UPDATE THESE VALUES
# ==========================
$TenantId     = "<Your-Tenant-ID>"
$ClientId     = "<Your-App-Client-ID>"
$ClientSecret = "<Your-Client-Secret>"
$GroupId      = "<Your-Entra-Group-ID>"
$RegionName   = "australiaeast"  # Change to your preferred region

Note: The complete script is provided at the end of this post and is also available on GitHub.

Step-by-Step Breakdown

Step 1: Authentication

The script authenticates using OAuth 2.0 client credentials flow to obtain an access token for Microsoft Graph API.

$TokenEndpoint = "https://login.microsoftonline.com/$TenantId/oauth2/v2.0/token"
$tokenForm = @{
    client_id     = $ClientId
    client_secret = $ClientSecret
    scope         = "https://graph.microsoft.com/.default"
    grant_type    = "client_credentials"
}
# Request the token and build the headers used by all subsequent Graph calls
$auth    = Invoke-RestMethod -Method Post -Uri $TokenEndpoint -Body $tokenForm -ContentType 'application/x-www-form-urlencoded'
$Headers = @{ Authorization = "Bearer $($auth.access_token)"; "Content-Type" = "application/json" }

Step 2: Create Cloud PC Configuration (AI-Enabled)

This is the key discovery – the Cloud PC Configuration uses the undocumented settingProfiles endpoint:

$configBody = @{
    displayName  = "AI-Enabled-CloudPC-Config"
    description  = "AI features enabled for Cloud PCs"
    profileType  = "template"
    templateId   = "W365.CloudPCConfiguration"
    settings     = @(
        @{
            "@odata.type"       = "#microsoft.graph.cloudPcBooleanSetting"
            dataType            = "boolean"
            settingDefinitionId = "W365.CloudPCConfiguration.AI.IsEnabled"
            platform            = "all"
            isEnabled           = $true
        }
    )
    assignments  = @(@{ groupId = $GroupId; assignType = "group" })
}
# POST to the undocumented settingProfiles endpoint discovered above
$configUri = "https://graph.microsoft.com/beta/deviceManagement/virtualEndpoint/settingProfiles"
$config    = Invoke-RestMethod -Method Post -Uri $configUri -Headers $Headers -Body ($configBody | ConvertTo-Json -Depth 10)

Step 3: Create Provisioning Policy

The provisioning policy defines the Cloud PC specifications. For AI features, you need the 8vCPU/32GB configuration:

$policyBody = @{
    "@odata.type"            = "#microsoft.graph.cloudPcProvisioningPolicy"
    displayName              = "AI-Enabled-ProvPolicy"
    description              = "Provisioning policy for AI-enabled Cloud PCs"
    provisioningType         = "dedicated"
    managedBy                = "windows365"
    imageId                  = "microsoftwindowsdesktop_windows-ent-cpc_win11-24h2-ent-cpc-m365"
    imageType                = "gallery"
    enableSingleSignOn       = $true
    domainJoinConfigurations = @(
        @{ type = "azureADJoin"; regionName = $RegionName }
    )
    windowsSettings          = @{ language = "en-US" }
}
# Create the provisioning policy
$policyUri = "https://graph.microsoft.com/beta/deviceManagement/virtualEndpoint/provisioningPolicies"
$policy    = Invoke-RestMethod -Method Post -Uri $policyUri -Headers $Headers -Body ($policyBody | ConvertTo-Json -Depth 10)

Step 4: Assign Provisioning Policy to Group

After creating the provisioning policy, assign it to your Entra ID security group:

$assignBody = @{
    assignments = @(
        @{
            target = @{
                "@odata.type" = "#microsoft.graph.cloudPcManagementGroupAssignmentTarget"
                groupId       = $GroupId
            }
        }
    )
}
# POST to the policy's /assign action
$assignUri = "https://graph.microsoft.com/beta/deviceManagement/virtualEndpoint/provisioningPolicies/$($policy.id)/assign"
Invoke-RestMethod -Method Post -Uri $assignUri -Headers $Headers -Body ($assignBody | ConvertTo-Json -Depth 10)

Windows Insider Beta Channel Enrollment

For AI features to activate, Cloud PCs must be enrolled in the Windows Insider Beta Channel. This can be done at scale using Intune Update Rings.

Manual Enrollment (Per Device)

  • Open Settings in the Cloud PC
  • Navigate to Windows Update > Windows Insider Program
  • Click Get started and sign in with Microsoft account or Entra ID
  • Select Beta Channel (Recommended)

Bulk Enrollment via Intune Update Ring

For enterprise deployments, use Intune Update Rings to enroll devices at scale (a registry-based scripting alternative is sketched after these steps):

  • Sign in to Microsoft Intune admin center
  • Navigate to Devices > Windows > Update rings for Windows 10 and later
  • Create or edit an update ring
  • Set Enable pre-release builds = Yes
  • Set Pre-release channel = Beta Channel
  • Assign to your Cloud PC security group
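
If you prefer scripting this instead of (or alongside) an update ring, for example via an Intune Platform script or Remediation, the same controls exist as Windows Update policy registry values. This is a sketch based on my reading of the Update policy CSP (where BranchReadinessLevel 4 maps to the Beta Channel); validate it on a test Cloud PC before rolling out, as update rings remain the supported path:

# Sketch: opt a device into the Windows Insider Beta Channel via Update policy registry values.
# Run elevated on the Cloud PC. Values follow the Update policy CSP - verify before broad use.
$wuKey = "HKLM:\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate"
New-Item -Path $wuKey -Force | Out-Null
Set-ItemProperty -Path $wuKey -Name ManagePreviewBuildsPolicyValue -Value 2 -Type DWord  # 2 = enable preview builds
Set-ItemProperty -Path $wuKey -Name BranchReadinessLevel -Value 4 -Type DWord            # 4 = Beta Channel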

Complete PowerShell Script

The complete, ready-to-use PowerShell script is available on GitHub. Copy it into your PowerShell environment, update the configuration variables, and run.

GitHub Repository: avdwin365mem/aienabledcloudpc at main · askaresh/avdwin365mem

What’s next (Part 2)

We shall validate the AI features within the Cloud PC. Note: I need the higher 8 vCPU/32 GB RAM version, and I am still awaiting access. Before Part 2 gets released, if you can’t wait, don’t forget to check out the AI Cloud PC features that Dieter has covered in his blog post – Windows 365 blog by Dieter Kempeneers

I hope you find this information helpful for enabling the new AI features in Windows 365 Cloud PC using PowerShell. If I have missed any steps or details, I will be happy to update the post.

Thanks,
Aresh Sarkari

Copilot in Intune for Windows 365 – A New Era of Cloud PC Management (Step by step guide)

27 Oct

If you manage Windows 365 Cloud PCs, there’s good news — Copilot in Intune now supports Windows 365, and it’s generally available!
This new capability brings the power of AI directly into the Intune admin center, helping IT pros quickly understand, troubleshoot, and optimize their Cloud PC environments through natural-language conversations. Let’s break down what this means in simple terms — and how you can make the most of it.

Please note that when you enable Copilot in Intune, it will blow up your credits like no tomorrow. I exhausted $150 in under 24 hours. Please be careful when performing testing and validation.

What Is Copilot in Intune?

Think of Copilot in Intune as your AI-powered assistant/MCP/Agent inside the Intune portal. Instead of digging through dashboards or complex reports, you can simply type questions like:

  • “Show me my Enterprise Cloud PC licenses.”
  • “Analyze trends in bandwidth performance.”
  • “Summarize Cloud PCs that have never been used.”

Copilot then reads your organization’s Windows 365 data (based on your role and permissions) and returns insights — instantly.

It’s built to save IT admins time, surface actionable information, and make complex management tasks feel as easy as chatting with a colleague.

Getting Started: Enabling Copilot for Windows 365

Before using Copilot’s Windows 365 features, make sure Copilot in Microsoft Intune is enabled.
Then, confirm that the Windows 365 plug-in is turned on in the Security Copilot portal:

  1. Open the Security Copilot portal. (https://securitycopilot.microsoft.com)
  2. In the prompt bar, click the Sources icon (you’ll see it on the right side).
  3. In the Manage sources pane, toggle on Windows 365.

That’s it! Once connected, your Copilot chat experience in Intune will be able to access your organization’s Windows 365 data securely — respecting role-based access controls (RBAC) and scope tags.

What You Can Do with Copilot in Intune for Windows 365

This integration is designed to give IT professionals faster insight into four key areas:

1. Cloud PC Performance Optimization

Copilot analyzes performance data and highlights Cloud PCs that may need resizing — whether they’re overpowered (wasting cost) or under-spec’d (affecting user experience).
It even suggests configuration changes and provides trend analysis so you can act proactively.

2. User Experience Insights

Having connection issues? Ask Copilot to identify regions or user groups experiencing latency, bandwidth drops, or connection instability.
It can summarize performance trends and pinpoint whether problems are widespread or isolated — perfect for diagnosing issues before they escalate.

3. License and Cost Optimization

Licenses aren’t cheap — and unused Cloud PCs can quietly eat into budgets.
Copilot identifies underutilized or inactive Cloud PCs, helping you reallocate licenses efficiently. You’ll get summaries of usage patterns, device age, and connection history — all within your chat results.

4. Cloud PC Management Assistance

Need to troubleshoot provisioning or grace-period issues?
Copilot automatically scans for common causes, provides diagnostic context (like provisioning errors or expiration dates), and links directly to remediation resources. You can even analyze up to 10 Cloud PCs in bulk, saving hours of manual work.

Real-World Prompts You Can Try

Here are some examples you can copy directly into your Copilot chat:

Category | Example Prompts
Availability | “Analyze unavailable Cloud PCs” / “Summarize Cloud PCs that cannot connect by region”
Connection Quality | “Show regions with increasing Cloud PC latency” / “Show Cloud PCs experiencing low bandwidth”
Licensing | “Summarize my Cloud PC license inventory” / “Show me my Frontline Cloud PC licenses”
Utilization | “Summarize Cloud PCs that have never been used” / “Show Cloud PCs that are underutilized”
Performance | “Summarize performance of my Cloud PCs” / “Show me Cloud PCs that are candidates for downgrading”

Each query runs in the context of your organization’s Windows 365 data — giving results that are accurate, relevant, and scoped to your permissions.

I could have shown you many more examples – I ran out of credits 🙂

Why It Matters

With Copilot in Intune for Windows 365, IT admins can move from reactive monitoring to proactive management. Instead of sifting through logs or building reports, you can simply ask Copilot — and act on data-driven insights right away.

This not only boosts efficiency but also helps improve end-user experience, optimize license usage, and strengthen overall cloud resource management.

Business standpoint – You don’t need highly skilled resources to skim through logs, dashboards, and widgets to build an understanding of the environment. Anyone who can ask the right questions can retrieve the information.

Final Thoughts

Microsoft’s vision for AI-assisted IT administration is becoming clearer — and Copilot in Intune for Windows 365 is a perfect example of that. It’s not just a fancy chatbot; it’s a practical, data-driven assistant that brings clarity, automation, and intelligence to everyday Cloud PC management.

If you’re an IT admin managing Windows 365, now’s the time to try it out. Head over to the Intune admin center, enable Copilot, and start asking questions — your Cloud PCs will thank you!

I hope this information helps you enable Copilot for W365. If I have missed any steps or details, I will be happy to update the post.

Thanks,
Aresh Sarkari

Windows 365 Cloud Apps — Publishing Apps (Part 2)

29 Sep

In Part 1 we built the provisioning policy and wired it to a group with size/capacity. In Part 2, we’ll actually publish the apps, tweak their details, undo changes when needed, and explain how licensing & concurrency work (with a simple diagram).

Where we work: All Cloud Apps

Once your first Frontline Cloud PC (Shared mode) finishes provisioning, the image’s Start-menu apps appear in Windows 365 → All Cloud Apps as Ready to publish.

You can: Publish, Edit, Reset, and Unpublish apps here. Deletion is tied to the policy assignment (more on that below).

Publish an app (All Cloud Apps)

  1. Intune admin center → Devices → Windows 365 → All Cloud Apps
  2. Pick one or more apps (Word, Excel, PowerPoint and Edge) with status Ready to publish → Publish
  3. Watch the status flow:
    • Ready to publish → Publishing → Published
  4. Once Published, the app appears in Windows App for all users assigned to the provisioning policy.
  • All Cloud Apps list (Ready → Publishing → Published)

If an app shows Failed: Unpublish it, then publish again. Check that the Start-menu shortcut on your image is valid (path/command still exists).

Edit an app (safe, instant updates)

For a published or ready app, select Edit to adjust:

  • Display name
  • Description
  • Command line (e.g., parameters)
  • Icon path index

Changes inherit scope tags & assignment from the provisioning policy, and updates are immediate in Windows App.

  • Edit dialog (name/description/command/icon index)

Reset an app (rollback to discovered state)

If you went too far with edits, use Reset to revert back to whatever was discovered from the image originally (name/icon/command). Great for quick experiments.

  • Reset confirmation

Unpublish (and how “delete” works)

  • Unpublish: App status goes Published → Ready to publish and the app disappears from Windows App. Its edited details are reset.
  • Delete: There isn’t a “delete app” button—Cloud Apps are discovered from the image. To truly remove an app from scope, remove the provisioning policy’s assignment (or update the image so the Start-menu shortcut no longer exists).
  • Unpublish action

Accessing apps (Windows App)

Users launch Windows App (Windows/macOS/iOS/Android) and see the Published apps. Selecting an app starts a session on a Frontline Cloud PC (Shared mode).

  • A published app can spawn other apps on that Cloud PC when needed (e.g., Outlook opening Edge from a link), even if the other app isn’t separately published.
  • To tightly control what can launch, use Application Control for Windows policies.
  • Windows App with your published apps visible
  • Launch flow (e.g., Outlook → Edge link)

Licensing & monitoring (Frontline Shared mode — explained)

Frontline (Shared mode) is built for brief, task-oriented access with no data persistence per user session. Think “one at a time” use of a shared Cloud PC.

The rules!

  • 1 Frontline license = 1 concurrent session.
  • You can assign many users to the policy, but only N can be active at once (where N = number of Frontline licenses you assigned to that policy); see the sizing sketch after this list.
  • When a user signs out, their data is deleted and the Cloud PC is free for the next user.
  • There’s no concurrency buffer for Frontline Shared mode (and none for GPU-enabled Cloud PCs).
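
To make the concurrency math concrete, here is a trivial sketch with illustrative numbers only:

# Illustrative numbers only: Frontline Shared sizing follows peak concurrency, not headcount.
$assignedUsers  = 120   # members of the Entra group on the policy
$peakConcurrent = 18    # highest simultaneous sessions expected
$licensesNeeded = $peakConcurrent   # 1 Frontline license = 1 concurrent session
"Assign $licensesNeeded licenses; the remaining $($assignedUsers - $licensesNeeded) users wait for a free session at peak."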

Monitoring concurrency (what to look at)

  • Frontline connection hourly report: See active usage over time; verify you’re not hitting limits.
  • Frontline concurrency alert: Get notified if you breach your concurrency threshold.
  • Note: Concurrency buffer doesn’t apply to GPU or Frontline Shared Cloud PCs—plan capacity accordingly.

Practical sizing tip: Start with a license count that matches your peak simultaneous users for that group/policy. Watch the hourly report for a week, then adjust up/down.

Troubleshooting checklist

  • Published but not visible? Confirm the user is in the assigned group and is using the latest Windows App.
  • Failed on publish? Unpublish → Publish. Validate the Start-menu shortcut on the image and any custom command-line parameters.
  • Unexpected app launches (e.g., Edge opens)? That’s normal when an app calls another binary. Use Application Control if you must restrict it.
  • Hitting concurrency: Users 1..N can connect; N+1 waits. Increase Frontline licenses on the policy or split users into multiple policies sized per peak.

I hope you find this information helpful for creating a Cloud App. If I have missed any steps or details, I will be happy to update the post.

Thanks,
Aresh Sarkari

Windows 365 Cloud Apps – Provisioning Policy – PowerShell (Graph Rest API) – Part 1

26 Sep

This is part one of a two-part series on Windows 365 Cloud Apps. In this post, we’ll walk through what Cloud Apps are and how to create the provisioning policy with PowerShell. In part two, we’ll publish the apps themselves. I’ll also include the PowerShell script that uses Azure/Graph REST APIs.

What is Windows 365 Cloud Apps?

Windows 365 Cloud Apps let you give users access to specific apps streamed from a Cloud PC—without handing out a full desktop to everyone. Under the hood, Cloud Apps run on Windows 365 Frontline Cloud PCs in Shared mode. That licensing model is designed for shift or part-time staff: many users can be assigned, but only one active session per license at a time.

Think of it as “just-the-apps” VDI: Outlook, Word, your line-of-business app—delivered from the cloud—with the management simplicity of Windows 365 and Intune.

Why customers care: You streamline app delivery, lower overhead, and modernize VDI without building and babysitting a big remote desktop estate.

Cloud Apps vs AVD Published Apps vs “Traditional” VDI Published Apps

Topic | Windows 365 Cloud Apps | Azure Virtual Desktop Published Apps | Traditional VDI Published Apps
What users see | Individual apps streamed from a Cloud PC; no full desktop | Individual apps from session hosts in Azure Virtual Desktop | Individual apps from on-prem or hosted RDS/Horizon/Citrix farms
Infra you manage | Cloud PC lifecycle via Intune; Microsoft operates the fabric | You design & operate host pools, scaling, FSLogix, images | You run the farm: brokers, gateways, hypervisors, storage
Licensing / sessions | Frontline: many users per license, 1 active session per license | Per-user/per-device or CALs + Azure consumption; multiple sessions per host | Per-user/device + on-prem infra costs
Admin plane | Intune + Windows 365 | Azure Portal + ARM + Host pool automation | Vendor consoles + on-prem change management
App packaging | Start-menu discovered apps from the image (MSIX/Appx discovery expanding) | MSI/MSIX; MSIX App Attach; image-based | MSI/MST/App-V/Citrix packages, etc.
Who it’s great for | Task/shift workers; predictable, lightweight app access | Broad use cases; granular scale & control | Heavily customized legacy estates, on-prem constraints

Mental model:

  • If you need elastic host pools or platform primitives, choose AVD.
  • If you’re tied to on-prem or specific vendor features, you might keep traditional VDI, but expect more ops work.
  • If you like the “managed Cloud PC” experience and want app-only access, choose Cloud Apps.

PowerShell: create the policy via REST/Graph

  • Auth: Provide $TenantId, $ClientId, $ClientSecret from your app registration. Grant/admin-consent the scopes listed below. If you are not aware of how to create the app registration, you can follow here – How to register an app in Microsoft Entra ID – Microsoft identity platform | Microsoft Learn
  • Image: Set $ImageType (e.g., "gallery") and $ImageId for your chosen image.
  • Region: $RegionName (e.g., australiaeast or "automatic").
  • Assignment:
    • $GroupId: Entra group whose members should see the Cloud Apps.
    • $ServicePlanId: the Frontline size (e.g., FL 2vCPU/8GB/128GB in the example).
    • $AllotmentCount: how many concurrent sessions you want available for this policy.
    • $AllotmentDisplayName: a friendly label that shows up with the assignment.
  • Verification/Polling: The script dumps the policy with assignments and can optionally poll for provisioned Cloud PCs tied to the policy.
  • Get-or-Create a Cloud Apps provisioning policy (userExperienceType = cloudApp, provisioningType = sharedByEntraGroup, Azure AD Join in a specified region).
  • Assigns the policy to an Entra group with service plan, capacity (allotment), and a friendly label

Required permissions (app registration – admin consent):

  • CloudPC.ReadWrite.All (and CloudPC.Read.All)
  • DeviceManagementServiceConfig.ReadWrite.All (for policy config)
<#
Create (or reuse) a Windows 365 "Cloud Apps" provisioning policy, assign an Entra group
with size + capacity + label, then verify assignment (via $expand=assignments) and optionally
poll for provisioned Cloud PCs. Uses Microsoft Graph beta.

Key note: /assignments endpoint returns 404 by design; use $expand=assignments. See MS docs.
#>

# ==========================
# 0) CONFIG — EDIT THESE
# ==========================
$TenantId     = "<Copy/Paste Tenant ID>"
$ClientId     = "<Copy/Paste Client ID>"
$ClientSecret = "<Copy/Paste ClientSecret ID>"

# Policy
$DisplayName  = "Cloud-Apps-Prov-4"
$Description  = "Cloud Apps Prov Policy - Frontline"
$EnableSSO    = $true
$RegionName   = "australiaeast"   # or "automatic"
$Locale       = "en-AU"
$Language     = "en-AU"

# Image (gallery)
$ImageType    = "gallery"
$ImageId      = "microsoftwindowsdesktop_windows-ent-cpc_win11-24H2-ent-cpc-m365"

# Assignment
$GroupId              = "b582705d-48be-4e4b-baac-90e5b50ebdf2"   # Entra ID Group
$ServicePlanId        = "057efbfe-a95d-4263-acb0-12b4a31fed8d"   # FL 2vCPU/8GB/128GB
$AllotmentCount       = 1
$AllotmentDisplayName = "CP-FL-Shared-CloudApp-1"

# Optional provisioning poll
$VerifyDesiredCount   = $AllotmentCount
$VerifyMaxTries       = 30
$VerifyDelaySec       = 30

# ==========================
# Helpers
# ==========================
function New-GraphUri {
  param(
    [Parameter(Mandatory)][string]$Path,        # e.g. "/beta/deviceManagement/virtualEndpoint/provisioningPolicies"
    [hashtable]$Query
  )
  $cleanPath = '/' + ($Path -replace '^\s*/+','' -replace '\s+$','')
  $b = [System.UriBuilder]::new("https","graph.microsoft.com")
  $b.Path = $cleanPath
  if ($Query -and $Query.Count -gt 0) {
    Add-Type -AssemblyName System.Web -ErrorAction SilentlyContinue | Out-Null
    $nvc = [System.Web.HttpUtility]::ParseQueryString([string]::Empty)
    foreach ($k in $Query.Keys) { $nvc.Add($k, [string]$Query[$k]) }
    $b.Query = $nvc.ToString()
  } else { $b.Query = "" }
  return $b.Uri.AbsoluteUri
}

function Invoke-Graph {
  param(
    [Parameter(Mandatory)][ValidateSet('GET','POST','PATCH','DELETE','PUT')][string]$Method,
    [Parameter(Mandatory)][string]$Uri,
    [hashtable]$Headers,
    $Body
  )
  try {
    if ($PSBoundParameters.ContainsKey('Body')) {
      return Invoke-RestMethod -Method $Method -Uri $Uri -Headers $Headers -Body ($Body | ConvertTo-Json -Depth 20)
    } else {
      return Invoke-RestMethod -Method $Method -Uri $Uri -Headers $Headers
    }
  } catch {
    Write-Warning "HTTP $Method $Uri failed: $($_.Exception.Message)"
    try {
      $resp = $_.Exception.Response
      if ($resp -and $resp.GetResponseStream()) {
        $reader = New-Object IO.StreamReader($resp.GetResponseStream())
        $text   = $reader.ReadToEnd()
        Write-Host "Response body:" -ForegroundColor DarkYellow
        Write-Host $text
      }
    } catch {}
    throw
  }
}

# ==========================
# 1) AUTH — Client Credentials
# ==========================
$TokenEndpoint = "https://login.microsoftonline.com/$TenantId/oauth2/v2.0/token"
$tokenForm = @{
  client_id     = $ClientId
  client_secret = $ClientSecret
  scope         = "https://graph.microsoft.com/.default"
  grant_type    = "client_credentials"
}
$auth = Invoke-RestMethod -Method Post -Uri $TokenEndpoint -Body $tokenForm -ContentType 'application/x-www-form-urlencoded'
if (-not $auth.access_token) { throw "No access_token returned. Ensure CloudPC.ReadWrite.All (+ CloudPC.Read.All) are admin-consented." }

$Headers = @{
  Authorization = "Bearer $($auth.access_token)"
  "Content-Type" = "application/json"
  "Prefer"       = "include-unknown-enum-members"
}

# Base paths
$PoliciesPath = "/beta/deviceManagement/virtualEndpoint/provisioningPolicies"
$CloudPcsPath = "/beta/deviceManagement/virtualEndpoint/cloudPCs"

# ==========================
# 2) Ensure (Get-or-Create) the policy
# ==========================
$escapedName = $DisplayName.Replace("'","''")
$findUri     = New-GraphUri -Path $PoliciesPath -Query @{ '$filter' = "displayName eq '$escapedName'" }

Write-Host "Finding policy by name: $DisplayName" -ForegroundColor Cyan
try { $existing = Invoke-Graph -Method GET -Uri $findUri -Headers $Headers }
catch { Write-Warning "Name lookup failed; proceeding to create."; $existing = $null }

if ($existing -and $existing.value -and $existing.value.Count -gt 0) {
  $policy   = $existing.value | Select-Object -First 1
  $policyId = $policy.id
  Write-Host "Using existing policy '$DisplayName' (id: $policyId)" -ForegroundColor Yellow
} else {
  $createBody = @{
    "@odata.type"           = "#microsoft.graph.cloudPcProvisioningPolicy"
    displayName             = $DisplayName
    description             = $Description
    enableSingleSignOn      = $EnableSSO
    imageType               = $ImageType
    imageId                 = $ImageId
    managedBy               = "windows365"
    microsoftManagedDesktop = @{
      "@odata.type" = "microsoft.graph.microsoftManagedDesktop"
      managedType   = "notManaged"
    }
    windowsSetting          = @{
      "@odata.type" = "microsoft.graph.cloudPcWindowsSetting"
      locale        = $Locale
    }
    windowsSettings         = @{
      "@odata.type" = "microsoft.graph.cloudPcWindowsSettings"
      language      = $Language
    }
    userExperienceType      = "cloudApp"
    provisioningType        = "sharedByEntraGroup"
    domainJoinConfigurations = @(
      @{
        "@odata.type"  = "microsoft.graph.cloudPcDomainJoinConfiguration"
        domainJoinType = "azureADJoin"
        type           = "azureADJoin"
        regionName     = $RegionName
      }
    )
  }

  $createUri = New-GraphUri -Path $PoliciesPath
  Write-Host "Creating policy '$DisplayName'..." -ForegroundColor Cyan
  $policy   = Invoke-Graph -Method POST -Uri $createUri -Headers $Headers -Body $createBody
  $policyId = $policy.id
  if (-not $policyId) { throw "Create succeeded but no policy id returned." }
  Write-Host "Created policy id: $policyId" -ForegroundColor Green
}

# ==========================
# 3) Assign — group + size + capacity + label
# ==========================
$assignUri = New-GraphUri -Path ($PoliciesPath + "/$policyId/assign")

Write-Host "Clearing existing assignments..." -ForegroundColor DarkGray
Invoke-Graph -Method POST -Uri $assignUri -Headers $Headers -Body @{ assignments = @() } | Out-Null

$assignBody = @{
  assignments = @(
    @{
      target = @{
        "@odata.type"          = "#microsoft.graph.cloudPcManagementGroupAssignmentTarget"
        groupId                = $GroupId
        servicePlanId          = $ServicePlanId
        allotmentLicensesCount = $AllotmentCount
        allotmentDisplayName   = $AllotmentDisplayName
      }
    }
  )
}
Write-Host "Assigning policy to group $GroupId (plan $ServicePlanId, count $AllotmentCount)..." -ForegroundColor Cyan
Invoke-Graph -Method POST -Uri $assignUri -Headers $Headers -Body $assignBody | Out-Null
Write-Host "Assignment submitted." -ForegroundColor Green

# ==========================
# 4) Verify — read policy with $expand=assignments (RETRY)
# ==========================
$expandUri = New-GraphUri -Path ($PoliciesPath + "/$policyId") -Query @{ '$expand' = 'assignments' }
Write-Host "Reading back policy + assignments via $expand..." -ForegroundColor Cyan

$verify = $null
for ($try=1; $try -le 12; $try++) {
  try {
    $verify = Invoke-Graph -Method GET -Uri $expandUri -Headers $Headers
    if ($verify.assignments -and $verify.assignments.Count -gt 0) { break }
    Write-Host "Assignments not materialized yet (attempt $try). Waiting 3s..." -ForegroundColor Yellow
  } catch {
    Write-Warning "Expand read attempt $try failed; retrying in 3s..."
  }
  Start-Sleep -Seconds 3
}
if (-not $verify) { throw "Failed to read policy with assignments after retries." }

$verify | ConvertTo-Json -Depth 20 | Write-Output

# Sanity: confirm the expected assignment
$expected = $verify.assignments | Where-Object {
  $_.target.groupId -eq $GroupId -and
  $_.target.servicePlanId -eq $ServicePlanId -and
  $_.target.allotmentLicensesCount -eq $AllotmentCount -and
  ($_.target.allotmentDisplayName -eq $AllotmentDisplayName -or -not $AllotmentDisplayName)
}
if ($expected) {
  Write-Host "✅ Assignment present with expected group, plan, and count." -ForegroundColor Green
} else {
  Write-Warning "Assignment not found with expected fields. See dump above."
}

# ==========================
# 5) (Optional) Poll for provisioned Cloud PCs for this policy
# ==========================
if ($VerifyDesiredCount -gt 0) {
  $cloudPcsUri = New-GraphUri -Path $CloudPcsPath -Query @{ '$filter' = "provisioningPolicyId eq '$policyId'" }
  for ($i = 1; $i -le $VerifyMaxTries; $i++) {
    try {
      $cloudPcs   = Invoke-Graph -Method GET -Uri $cloudPcsUri -Headers $Headers
      $pcsForPlan = $cloudPcs.value | Where-Object { $_.servicePlanId -eq $ServicePlanId -or -not $_.psobject.Properties.Name.Contains('servicePlanId') }
      $count      = ($pcsForPlan | Measure-Object).Count
      if ($count -ge $VerifyDesiredCount) {
        Write-Host "Provisioned Cloud PCs for policy: $count (target $VerifyDesiredCount) ✅" -ForegroundColor Green
        $pcsForPlan | ConvertTo-Json -Depth 10 | Write-Output
        break
      } else {
        Write-Host "Provisioned Cloud PCs for policy: $count (waiting for $VerifyDesiredCount) … attempt $i/$VerifyMaxTries" -ForegroundColor Yellow
      }
    } catch {
      Write-Warning "Provisioning check failed on attempt ${i}: $($_.Exception.Message)"
    }
    Start-Sleep -Seconds $VerifyDelaySec
  }
}

GitHub Link – avdwin365mem/W365-CloudApp-Prov-Policy at main · askaresh/avdwin365mem

Provisioning Policy Details – UI

Policy

Overview

Tips, gotchas, and troubleshooting

  • App discovery: Ensure the app has a Start menu shortcut on the image. That’s how Cloud Apps gets its list.
  • Security baselines: If your tenant enforces restrictions on PowerShell in the image at discovery time, discovery can fail.
  • MSIX/Appx: Discovery is expanding—classic installers show up first; some Appx/MSIX apps (e.g., newer Teams) may not appear yet.
  • Concurrency math: Active sessions for the policy are capped by assigned Frontline license count on that policy.
  • Schema drift: These are beta endpoints. If you hit a property/enum change, the script’s warnings will surface the response body—update the field names accordingly.

What’s next (Part 2)

We’ll move to All Cloud Apps to publish the discovered apps, tweak display name/description/command line/icon index, confirm they appear in Windows App, and cover unpublish/reset workflows—with screenshots.

I hope you find this information helpful for creating a Cloud App using PowerShell. If I have missed any steps or details, I will be happy to update the post.

Thanks,
Aresh Sarkari

Cloud PC Maintenance Windows: Scheduling Resize Operations for Maximum Efficiency + Bonus: Microsoft Graph PowerShell Implementation

3 Mar

Today I’m diving into a feature that’s currently in preview but promises to be super useful for Windows 365 Cloud PC admins: Cloud PC Maintenance Windows.

If you’ve ever needed to resize multiple Cloud PCs but worried about disrupting users during work hours, this new feature is about to make your life much easier. Let’s break it down!

What Are Cloud PC Maintenance Windows?

Simply put, maintenance windows allow you to schedule when certain actions (currently just resize operations) will take place on your Cloud PCs. Instead of changes occurring immediately after you initiate them, you can schedule them to run during specified time periods.

Think of it as telling your Cloud PCs, “Hey, only accept these maintenance actions during these specific hours.” It’s perfect for organizations that need to plan around busy periods and minimize disruption.

Why You Should Care About This Feature

There are several compelling reasons to start using maintenance windows:

  • After-hours maintenance: Schedule resize operations to happen overnight or on weekends
  • Predictable changes: Users receive notifications before maintenance begins
  • Bulk operations: Apply resize actions to entire departments or teams at once
  • Organizational compliance: Meet any requirements about when system changes can occur

Setting Up Your First Maintenance Window

The setup process is straightforward and consists of two main parts: creating the window itself and then applying it to a device action.

Part 1: Creating a Maintenance Window

  • Sign into the Microsoft Intune admin center
  • Navigate to Tenant administration > Cloud PC maintenance windows (preview)
  • Click Create
  • On the Basics page:
    • Enter a descriptive Name (e.g., “Weekend Resize Window”)
    • Add a Description to help other admins understand the purpose
  • On the Configuration page:
    • Set your Weekday schedule (if applicable)
    • Set your Weekend schedule (if applicable)
    • Remember: Each window must be at least two hours long
    • Select when users will receive notifications (15 minutes to 24 hours in advance)
  • On the Assignments page:
    • Add the groups whose Cloud PCs will use this maintenance window
  • Review your settings and click Create

Part 2: Using Your Maintenance Window

Once your window is created, it won’t do anything by itself until you create a bulk device action that uses it:

  • In the Intune admin center, go to Devices > Windows Devices > Bulk device actions
  • For the configuration:
    • OS: Windows
    • Device type: Cloud PCs
    • Device action: Resize
  • Select your source and target sizes
  • Important: Check the box for Use Cloud PC maintenance windows
  • Add the devices/groups and create the action

When the maintenance window becomes active, the resize operation will run, and users will receive notifications based on the lead time you specified.

PowerShell way to implement Cloud PC maintenance windows

Step 1 – Install the MS Graph Beta Powershell Module

#Install Microsoft Graph Beta Module
PS C:\WINDOWS\system32> Install-Module Microsoft.Graph.Beta

Step 2 – Connect to scopes and specify which API you wish to authenticate to. If you are only doing read-only operations, I suggest you connect with “CloudPC.Read.All”. In our case, we are creating the policy, so we need to change the scope to “CloudPC.ReadWrite.All”.

#Read-only
PS C:\WINDOWS\system32> Connect-MgGraph -Scopes "CloudPC.Read.All" -NoWelcome
Welcome To Microsoft Graph!

OR

#Read-Write
PS C:\WINDOWS\system32> Connect-MgGraph -Scopes "CloudPC.ReadWrite.All" -NoWelcome
Welcome To Microsoft Graph!
Permissions for MS Graph API

Step 3 – Check the user account by running the following beta command.

#Beta APIs
PS C:\WINDOWS\system32> Get-MgBetaUser -UserId admin@wdomain.com

Create Cloud PC Maintenance Window

We are creating a maintenance window that involves the following: avdwin365mem/createcloudpcmaintwindow at main · askaresh/avdwin365mem

  • displayName – Name of the policy “CloudPC-Window-askaresh”
  • description – Enter details to remember for the future
  • notification – 60 min (tweak based on your company policies)
  • Schedule – Weekday and weekend windows (make sure they fall outside business hours)
# Ensure the Microsoft.Graph.Beta module is installed
if (-not (Get-Module -ListAvailable -Name Microsoft.Graph.Beta)) {
    Write-Host "Installing Microsoft.Graph.Beta module..." -ForegroundColor Cyan
    Install-Module Microsoft.Graph.Beta -Force -AllowClobber
}
Import-Module Microsoft.Graph.Beta

# Connect to Microsoft Graph with the required permissions for maintenance operations
Write-Host "Connecting to Microsoft Graph..." -ForegroundColor Cyan
Connect-MgGraph -Scopes "CloudPC.ReadWrite.All" -NoWelcome

# Define the endpoint for Cloud PC maintenance windows
$uri = "beta/deviceManagement/virtualEndpoint/maintenanceWindows"

# Construct the JSON payload for the maintenance window
$maintenancePayload = @{
    displayName                   = "CloudPC-Window-askaresh"
    description                   = "A window for test"
    notificationLeadTimeInMinutes = 60
    schedules                     = @(
        @{
            scheduleType = "weekday"
            startTime    = "01:00:00.0000000"
            endTime      = "04:00:00.0000000"
        },
        @{
            scheduleType = "weekend"
            startTime    = "01:00:00.0000000"
            endTime      = "04:00:00.0000000"
        }
    )
} | ConvertTo-Json -Depth 5

# Call the Microsoft Graph API to create the maintenance window
try {
    Write-Host "Creating Cloud PC maintenance window..." -ForegroundColor Cyan
    $result = Invoke-MgGraphRequest -Method POST -Uri $uri -Body $maintenancePayload
    Write-Host "Maintenance window created successfully." -ForegroundColor Green
    $result | Format-List
}
catch {
    Write-Error "Error creating maintenance window: $_"
}

# Optionally disconnect from Microsoft Graph when done
Disconnect-MgGraph
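
For completeness, the bulk resize itself can also be submitted through Graph instead of the portal (run it while still connected, or reconnect first). The sketch below uses the beta cloudPcBulkResize action; the scheduledDuringMaintenanceWindow property name is my reading of the current beta surface and may shift while the feature is in preview, so verify it against the Graph docs before relying on it:

# Sketch (Graph beta): submit a bulk resize that honors the maintenance window.
# <CloudPC-Id-*> and <Target-ServicePlan-Id> are placeholders - substitute your own values.
$bulkUri  = "beta/deviceManagement/virtualEndpoint/bulkActions"
$bulkBody = @{
    "@odata.type"                    = "#microsoft.graph.cloudPcBulkResize"
    displayName                      = "Resize-in-maintenance-window"
    cloudPcIds                       = @("<CloudPC-Id-1>", "<CloudPC-Id-2>")
    targetServicePlanId              = "<Target-ServicePlan-Id>"
    scheduledDuringMaintenanceWindow = $true   # same as ticking "Use Cloud PC maintenance windows"
} | ConvertTo-Json -Depth 5
Invoke-MgGraphRequest -Method POST -Uri $bulkUri -Body $bulkBody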

The User Experience

From the user perspective, they’ll receive a notification in their Cloud PC session when a maintenance window is approaching. The notification will indicate that maintenance is scheduled and when it will occur. They can’t override or postpone the maintenance, but at least they’ll be prepared.

Current Limitations

It’s worth noting that this feature is still in preview, and has some limitations:

  • Currently only supports resize operations (likely to expand in the future)
  • The maintenance window itself doesn’t guarantee the success of operations
  • Doesn’t handle Windows updates, Intune payloads, or OS updates
  • Each window must be at least two hours long

When NOT to Use Maintenance Windows

If you have an urgent situation requiring immediate resizing of Cloud PCs, simply don’t check the “Use Cloud PC maintenance windows” box when creating your bulk action. This way, the resize will happen immediately rather than waiting for the next scheduled window.

Conclusion

Having played with this feature for a bit, I’m impressed with how it streamlines the management of Cloud PCs. Before this, scheduling maintenance was much more manual and potentially disruptive. While I wish it supported more actions beyond just resizing, this is a solid foundation that I expect Microsoft will build upon.

This feature is particularly valuable for organizations with users across different time zones or with strict requirements about when system changes can occur. It’s also a huge time-saver for admins who manage large fleets of Cloud PCs. I hope you find this information helpful for creating a Cloud PC maintenance window using PowerShell. If I have missed any steps or details, I will be happy to update the post.

Thanks,
Aresh Sarkari

PowerShell – Shared Frontline Workers – Create Windows 365 Cloud PC Provisioning Policy

11 Feb

I have a blog post about creating a dedicated Frontline provisioning policy – PowerShell – Frontline Workers – Create Windows 365 Cloud PC Provisioning Policy | AskAresh. In this blog post, I will demonstrate how to create the provisioning policy using PowerShell and MS Graph API with beta modules for Windows 365 Cloud PC – Shared Frontline Workers.

Introduction

I will not attempt to explain Frontline, but the best explanation is here: What is Windows 365 Frontline? | Microsoft Learn.

Example – With Windows 365 Frontline Shared licensing, you don’t assign a license to each individual user. Instead, you provision a pool of shared virtual desktops and grant access to a designated group of users. Each shared license represents a virtual desktop that can be dynamically used by any authorized user when available. For example, rather than needing a strict 1:1 (or even 1:3) mapping between users and desktops, you can support many more employees than the number of desktops you provision—much like a traditional non-persistent VDI setup. Once a user logs off, their desktop resets and becomes available for another user, allowing you to meet peak concurrency needs without assigning a dedicated device to every single employee.

Connect to MS Graph API

Step 1 – Install the MS Graph Beta Powershell Module

#Install Microsoft Graph Beta Module
PS C:\WINDOWS\system32> Install-Module Microsoft.Graph.Beta

Step 2 – Connect to scopes and specify which API you wish to authenticate to. If you are only doing read-only operations, I suggest you connect with “CloudPC.Read.All”. In our case, we are creating the policy, so we need to change the scope to “CloudPC.ReadWrite.All”.

#Read-only
PS C:\WINDOWS\system32> Connect-MgGraph -Scopes "CloudPC.Read.All" -NoWelcome
Welcome To Microsoft Graph!

OR

#Read-Write
PS C:\WINDOWS\system32> Connect-MgGraph -Scopes "CloudPC.ReadWrite.All" -NoWelcome
Welcome To Microsoft Graph!
Permissions for MS Graph API

Step 3 – Check the user account by running the following beta command.

#Beta APIs
PS C:\WINDOWS\system32> Get-MgBetaUser -UserId admin@wdomain.com

Create Provisioning Policy (Frontline Shared Worker)

We are creating a provisioning policy that involves the following: avdwin365mem/win365sharedfrontlineCreateProvPolicy at main · askaresh/avdwin365mem

  • Azure AD Joined Cloud PC desktops
  • The region for deployment – Australia East
  • Image Name – Windows 11 Enterprise + Microsoft 365 Apps 24H2 (from the Gallery)
  • Language & Region – English (United States)
  • Network – Microsoft Managed
  • SSO – True
  • The biggest change for Shared Frontline is this: provisioningType = “sharedByEntraGroup”
  • Cloud PC Naming format – FLWS-%RAND:10% (FLWS – Frontline Worker Shared)
$params = @{
	displayName = "Demo-Shared-FrontLine"
	description = "Shared Front Line Workers Prov Policy"
	provisioningType = "sharedByEntraGroup"
	managedBy = "windows365"
	imageId = "microsoftwindowsdesktop_windows-ent-cpc_win11-24H2-ent-cpc-m365"
	imageDisplayName = "Windows 11 Enterprise + Microsoft 365 Apps 24H2"
	imageType = "gallery"
	microsoftManagedDesktop = @{
		type = "notManaged"
		profile = $null
	}
	enableSingleSignOn = $true
	domainJoinConfigurations = @(
		@{
			type = "azureADJoin"
			regionGroup = "australia"
			regionName = "australiaeast"
		}
	)
	windowsSettings = @{
		language = "en-US"
	}
	cloudPcNamingTemplate = "FLWS-%RAND:10%"
}

New-MgBetaDeviceManagementVirtualEndpointProvisioningPolicy -BodyParameter $params

The policy will show up in the Intune Portal
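
To confirm the policy exists without opening the portal, you can also read it back with the same beta module, filtered on the display name used above:

# Verify the new policy via Graph instead of the portal
Get-MgBetaDeviceManagementVirtualEndpointProvisioningPolicy -Filter "displayName eq 'Demo-Shared-FrontLine'" |
    Select-Object Id, DisplayName, ProvisioningType, ImageDisplayName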

Optional Properties

If you are doing on-premises network integration (Azure Network Connection), then the following additional property and value are required. In my lab, I am leveraging the Microsoft Managed Network, so this is not required.

OnPremisesConnectionId = "4e47d0f6-6f77-44f0-8893-c0fe1701ffff"
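
For context, that ID goes inside the domainJoinConfigurations entry in place of the Microsoft-hosted region settings, since the region then comes from the ANC's vNet. A hedged sketch of what that block would look like (shown with azureADJoin; use hybridAzureADJoin if the ANC is domain-joined):

# Sketch: domain join configuration backed by an Azure Network Connection (ANC)
# instead of the Microsoft Managed Network; the region is derived from the ANC's vNet.
domainJoinConfigurations = @(
    @{
        type                   = "azureADJoin"   # or "hybridAzureADJoin" for a domain-joined ANC
        onPremisesConnectionId = "4e47d0f6-6f77-44f0-8893-c0fe1701ffff"
    }
)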

Additionally, if you have enrolled in Autopatch, the following parameter applies. You will have to put the name from the Intune portal.

            "autopatch": null,

I hope you find this information helpful for creating a shared frontline worker provisioning policy using PowerShell. Please let me know if I have missed any steps or details, and I will be happy to update the post.

Thanks,
Aresh Sarkari

Offline Transcribing and Summarizing Audio with Whisper, Phi, FastAPI, Docker on NVIDIA GPU

9 Jan

In this blog post, we’ll dive into how I built an offline comprehensive audio transcription and summarization system using OpenAI Whisper (medium) for transcription, a Microsoft Phi 3.5 Large Language Model (LLM) for summarizing, FastAPI for the REST API, and Docker for containerization. Audio content can be dense and long, so having an automated way to extract transcripts and high-level summaries can be a game-changer for meetings, interviews, podcasts, and beyond!

Github – askaresh/LocalAudioTran-LLM-Summar: Offline Audio Transcription (Whisper) and LLM based (Phi-3.5) Summarization

Why Use LLMs for Audio Summaries?

Traditional speech-to-text solutions focus on generating transcripts. However, reading pages of raw transcript text can be time-consuming, and conversational text that made sense during the call is often tedious on the page. Bringing an LLM-based summarizer into the pipeline changes the entire perspective:

  • High-Level Summaries: Quickly get the core ideas or key actions from a meeting.
  • Contextual Understanding: LLMs handle nuance like speaker changes, main topics, and action items.
  • Reduced Human Effort: Saves time sifting through entire transcripts.

High-Level Architecture

  • Audio Ingestion: The user uploads an audio file (e.g., .mp3, .wav).
  • Transcription: OpenAI Whisper medium model transcribes the audio into text.
  • LLM Summarization: A large language model (e.g., Microsoft Phi 3.5) processes the transcript and produces a condensed summary.
  • RESTful API: Built with FastAPI, handling file uploads and returning structured JSON responses.
  • Docker: Containerizes the entire application for easy deployment anywhere with a GPU.

Design Decisions

Following is the list of design decisions around this project:

  • Offline Processing – All processing is conducted locally to maximize efficiency. Utilizing a robust setup with multiple GPUs, specifically the cutting-edge NVIDIA graphics cards (A4000 and RTX 3090), ensures unparalleled performance and reliability.
  • Audio Transcription – Using OpenAI Whisper (medium) is an obvious choice, as the transcription output is quite accurate, and the model size is efficient for offline running. I tried the large model, but the output did not justify the increased GPU VRAM requirements.
  • Summarization – This aspect took the most time to refine. I initially experimented with FLAN-T5 and BERT models, but I found their summarization outputs to be subpar, which made the project feel not worthwhile. While I believe these models could perform better with extensive training, I prefer an out-of-the-box solution. Therefore, I chose Microsoft Phi 3.5 (phi3.5:3.8b-mini-instruct) as my model of choice.
  • Context Window – I quickly learned that a large content window-based model is best for generating great summaries. I selected Phi 3.5 due to its 128K context window.
  • LLM Model Quantization – My NVIDIA A4000 has 16GB of VRAM. To effectively use the Phi-3.5 model, I opted for the quantized phi3.5:3.8b-mini-instruct-q4_K_M model, which balances performance and quality output. However, I noted that the KVCache still overflows and utilizes system RAM. I also experimented with Q8 LLaMA models, but I found Q4 to be the best fit.
    • Because I am using the quantized model, I ended up using the Ollama container to run the GGUF model, which has the most straightforward implementation (see the pull sketch after this list).
  • API/Containers – All the code utilizes FastAPI for GET/POST requests. Of course, for modularity, everything operates within a container.
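
For reference, pulling the quantized model into the Ollama container (as mentioned above) is a one-liner. The service/container name ollama is from my docker-compose setup, so adjust it to yours:

# Pull the quantized Phi 3.5 GGUF model inside the running Ollama container.
# "ollama" is the compose service/container name in my setup - adjust as needed.
docker exec ollama ollama pull phi3.5:3.8b-mini-instruct-q4_K_M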

Implementation Details

  1. FastAPI for the REST Endpoints
    • /transcribe: Receives an audio file, calls Whisper for transcription, and returns the text.
    • /summarize: Takes the transcribed text, calls the LLM, and returns a summary.
    • Health checks (/health) keep the container orchestration informed of readiness (a client-side usage sketch follows this list).
  2. Whisper
    • We used a GPU version for speed (if torch.cuda.is_available()).
    • For smaller hardware or faster inference, you can opt for "tiny" or "small" model sizes.
  3. LLM Summarization
    • Could be an open-source LLM (like Llama 2, GPT-NeoX, etc.) or something hosted. We are using Microsoft Phi 3.5 (phi3.5:3.8b-mini-instruct-q4_K_M)
    • Direct Processing: The transcript is processed in a single pass using the Phi model. The biggest reason to choose a large context window is to ensure the model can process the entire transcript without truncation, chunking, overlapping sections, etc., as quality deteriorates with chunking.
    • Structured Output: Summary organized into clear sections:
      •    Overview
      •    Main Points
      •    Key Insights
      •    Action Items / Decisions
      •    Open Questions / Next Steps
      •    Conclusions
    • The System Prompt does all the magic for summarization. I highly recommend spending time learning the System Prompt.
  4. Docker
    • A Dockerfile that installs Python, PyTorch, Whisper, plus your LLM dependencies.
    • The container also runs Uvicorn for FastAPI.
    • If GPU acceleration is needed, we used an NVIDIA CUDA base image (e.g., nvidia/cuda:12.1.0-runtime-ubuntu22.04) and pass --gpus all to docker run.
  5. Optional: Streamlit UI
    • If you want a friendly front-end, spin up a UI to upload audio, track progress, and view results in real-time.
    • Alternatively, you could just expose the endpoints in FastAPI and have your favorite front-end call them.
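
To see the endpoints in action from a client, here is a rough PowerShell sketch. The port (8000, uvicorn's usual default) and the JSON field name (text) are assumptions from my setup; check the repository's FastAPI route definitions for the exact schema:

# Sketch: exercise the API from PowerShell 7+ (-Form requires PS7).
# Port 8000 and the "text" field name are assumptions - confirm against the repo.
$base = "http://localhost:8000"
Invoke-RestMethod -Uri "$base/health"   # readiness check

# Upload an audio file for transcription (multipart form upload)
$transcript = Invoke-RestMethod -Method Post -Uri "$base/transcribe" -Form @{ file = Get-Item ".\meeting.mp3" }

# Feed the transcript text to the summarizer
$summaryBody = @{ text = $transcript.text } | ConvertTo-Json
Invoke-RestMethod -Method Post -Uri "$base/summarize" -Body $summaryBody -ContentType "application/json"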

Key Challenges and Lessons

  1. Timeouts for Large Audio
    • Whisper or LLM summarization might take a while for hour-long recordings.
    • We increased the request timeout or used asynchronous background tasks.
  2. GPU Memory Constraints
    • Large LLMs and Whisper can each use significant VRAM.
    • Consider smaller quantized (Q2, etc.) LLMs or chunk-based summarization.
  3. Accuracy vs. Speed
    • The "medium" or "large" Whisper model is more accurate but slower.
    • Summaries can get more coherent using bigger LLMs, but performance can suffer.
  4. Logging & Error Handling
    • Detailed logs ensure you catch issues early (e.g., partial transcripts, AI inference errors).
    • A robust system logs whether GPU is found, load times, and inference performance metrics.
  5. Security & Data Privacy
    • Audio recordings may contain sensitive information.
    • Ensure your container or environment has proper access controls.

Validation and Examples

  1. Meeting Transcription + Summary
    • A 30-minute internal meeting is processed, producing a transcript of 6,000 words.
    • LLM Summaries: A concise bullet-point list of decisions, tasks, and key insights.
  2. Podcast or Interview
    • Summarize multi-speaker dialogues to highlight important quotes or topics.
    • Possibly split each speaker’s segment, then unify in final text.
  3. Conference Keynote
    • Summaries merged into an “executive summary” with top-level takeaways.

Project Structure

LocalAudioTran-LLM-Summar/
├─ .dockerignore
├─ .env
├─ .gitignore
├─ README.md
├─ docker-compose.yml
├─ Dockerfile
├─ backend/
│  ├─ requirements.txt
│  └─ app/
│     ├─ main.py
│     ├─ services/
│     │  ├─ transcription.py
│     │  ├─ summarization.py
│     │  └─ __init__.py
│     ├─ utils/
│     │  └─ logger.py
│     ├─ models/
│     │  ├─ schemas.py
│     │  └─ __init__.py
│     └─ __init__.py
├─ frontend/
│  ├─ requirements.txt
│  └─ src/
│     └─ app.py
└─ logs/
  • transcription.py loads Whisper, handles file I/O.
  • summarization.py calls your Phi3.5 LLM (Hugging Face Transformers, Ollama, etc.).
  • docker-compose.yml to spin up both the app container and optional GPU-based service.

Conclusion

By combining OpenAI Whisper (or any speech-to-text engine) with a Large Language Model (Phi 3.5 mini) summarizer inside a Docker container, we’ve built a unified pipeline for turning raw audio into manageable transcripts and actionable summaries. Whether you’re automating meeting minutes or analyzing podcast content, this approach saves countless hours. Feel free to experiment with chunking, smaller models, or advanced summarization prompts. Let me know how it goes!

Happy transcribing and summarizing!
Aresh Sarkari

Following is a list of helpful links:

Description | Link
Microsoft Phi 3.5 Model Page | microsoft/Phi-3.5-mini-instruct · Hugging Face
OpenAI Whisper Model | openai/whisper-medium · Hugging Face
Ollama Model Card Details | phi3.5:3.8b-mini-instruct-q4_K_M
NVIDIA Docker Images (Contains Container Engine) | nvidia/cuda – Docker Image (Docker Hub)
IDE Editor of my choice | Cursor – The AI Code Editor

Building a Comprehensive Image Analysis API with Microsoft Florence-2-large, Chainlit and Docker

8 Jul

In this blog post, we’ll embark on an exciting journey of building a comprehensive Image Analysis API using Microsoft Florence-2-large, Chainlit, and Docker. Image analysis is a fascinating field that involves extracting meaningful information from images using advanced AI techniques. By leveraging the power of Microsoft’s Florence-2-large model, we can create a system that automatically understands the content of an image and performs various analysis tasks such as captioning, object detection, expression segmentation, and OCR.

My Florence2 Code Repository askaresh/MS-Florence2 (github.com)

Note – In the past I have written a blog article on Image Captioning; you can read more here – Building an Image Captioning API with FastAPI and Hugging Face Transformers packaged with Docker | AskAresh

Model Overview

Hugging Face Link – microsoft/Florence-2-large · Hugging Face

The Microsoft Florence-2-large model is a powerful pre-trained model designed for various image analysis tasks. Developed by Microsoft, this model is part of the Florence family, which is known for its robust performance in computer vision applications. The Florence-2-large model leverages extensive training on a vast dataset of images, enabling it to excel in tasks such as image captioning, object detection, and optical character recognition (OCR).

Key Features of Florence-2-large

  • Multitask Capabilities: The model can perform a wide range of image analysis tasks, including generating captions, detecting objects, segmenting regions, and recognizing text within images.
  • High Accuracy: Trained on diverse and extensive datasets, the Florence-2-large model achieves high accuracy in understanding and analyzing image content.
  • Scalability: Its architecture is designed to scale effectively, making it suitable for integration into various applications and systems.

Why Florence-2-large?

We chose the Florence-2-large model for our Image Analysis API due to its versatility and performance. The model’s ability to handle multiple tasks with high precision makes it an ideal choice for building a comprehensive image analysis system. By leveraging this model, we can ensure that our API delivers accurate and reliable results across different types of image analysis tasks.

Implementation Details

To build our Image Analysis API, we started by setting up a Chainlit project and defining the necessary message handlers. The main handler accepts an image file and processes it through various analysis tasks.

We utilized the pre-trained Florence-2-large model from Hugging Face Transformers for image analysis. This powerful model has been trained on a vast dataset of images and can perform multiple tasks such as image captioning, object detection, and OCR.

To ensure a smooth development experience and ability to run on any cloud, we containerized our application using Docker. This allowed us to encapsulate all the dependencies, including Python libraries and the pre-trained model, into a portable and reproducible environment.

Choosing NVIDIA Docker Image

We specifically chose the NVIDIA CUDA-based Docker image (nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04) for our containerization. This choice was driven by the need to leverage GPU acceleration for the Florence-2-large model, which significantly enhances the performance of image processing tasks. The CUDA-based image ensures compatibility with GPU drivers and provides pre-installed libraries necessary for efficient model execution.

Our project structure looks like this:

MS-FLORENCE2/
│
├── app/
│   ├── __init__.py
│   ├── config.py
│   ├── model.py
│   └── utils.py
│
├── Dockerfile
├── docker-compose.yml
├── .env
├── .gitignore
├── chainlit_app.py
├── requirements.txt
└── logging_config.py

Let’s break down the key components:

  1. chainlit_app.py: This is the heart of our Chainlit application. It defines the message handler that processes uploaded images and generates responses using the Florence model (see the sketch after this list).
  2. app/model.py: This file contains the ModelManager class, which is responsible for loading and managing the Florence-2-large model.
  3. app/utils.py: This file contains utility functions for drawing plot boxes, polygons, and OCR bounding boxes on images.
  4. logging_config.py: This file configures detailed logging across the entire project and its modules.
  5. Dockerfile: This file defines how our application is containerized, ensuring all dependencies are properly installed and the environment is consistent. The use of the NVIDIA CUDA-based Docker image ensures compatibility and performance optimization.
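As a rough illustration of how these pieces fit together, here is a simplified version of the chainlit_app.py message handler. The ModelManager.run_task helper shown here is illustrative shorthand, not the exact repository code:

import chainlit as cl
from PIL import Image

from app.model import ModelManager  # loads Florence-2-large once at startup

manager = ModelManager()

@cl.on_message
async def main(message: cl.Message):
    # Collect any images attached to the incoming chat message
    images = [el for el in message.elements if "image" in (el.mime or "")]
    if not images:
        await cl.Message(content="Please attach an image to analyze.").send()
        return

    image = Image.open(images[0].path)
    # Run a couple of representative Florence-2 tasks (illustrative helper)
    caption = manager.run_task("<CAPTION>", image)
    ocr_text = manager.run_task("<OCR>", image)
    await cl.Message(content=f"Caption: {caption}\n\nOCR: {ocr_text}").send()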

Task Prompts and Their Functions

Let’s break down the task prompts used in the Florence-2-large model and explain what each of them does:

  • <CAPTION>
    • Purpose: Generates a simple, concise caption for the image.
    • Output: A brief description of the main elements in the image.
    • Example: “A credit card bill with a price tag on it”
  • <DETAILED_CAPTION>
    • Purpose: Provides a more detailed description of the image.
    • Output: A comprehensive description including more elements and details from the image.
    • Example: “The image shows a credit card bill with a black background. The bill is printed on a white sheet of paper with a blue border and a blue header. The header reads ‘Credit Card Bill’ in bold black font. The bottom of the bill has a space for the customer’s name, address, and contact information.”
  • <OD> (Object Detection)
    • Purpose: Detects and locates objects within the image.
    • Output: A list of detected objects with their bounding box coordinates and labels.
    • Example: [{‘bboxes’: [[x1, y1, x2, y2], …], ‘labels’: [‘credit card’, ‘price tag’, …]}]
  • <OCR>
    • Purpose: Performs Optical Character Recognition on the image.
    • Output: Extracted text from the image.
    • Example: “Credit Card Bill\nName: John Doe\nAddress: 123 Main St…”
  • <CAPTION_TO_PHRASE_GROUNDING>
    • Purpose: Locates specific phrases or objects mentioned in the caption within the image.
    • Input: Requires a caption (usually the output from <CAPTION>) as additional text input.
    • Output: Bounding boxes and labels for phrases/objects from the caption found in the image.
    • Example: [{‘bboxes’: [[x1, y1, x2, y2], …], ‘labels’: [‘credit card’, ‘price tag’, …]}]
  • <DENSE_REGION_CAPTION>
    • Purpose: Generates captions for specific regions within the image.
    • Output: A list of regions with their bounding boxes and corresponding captions.
    • Example: [{‘bboxes’: [[x1, y1, x2, y2], …], ‘labels’: [‘Header with Credit Card Bill text’, ‘Customer information section’, …]}]
  • <REGION_PROPOSAL>
    • Purpose: Suggests regions of interest within the image without labeling them.
    • Output: A list of bounding boxes for potentially important regions in the image.
    • Example: {‘bboxes’: [[x1, y1, x2, y2], …], ‘labels’: [”, ”, …]}
  • <MORE_DETAILED_CAPTION>
    • Purpose: Generates an even more comprehensive description of the image than <DETAILED_CAPTION>.
    • Output: A very detailed narrative of the image, often including subtle details and potential interpretations.
    • Example: “The image displays a credit card bill document against a stark black background. The bill itself is printed on crisp white paper, framed by a professional-looking blue border. At the top, a bold blue header prominently declares ‘Credit Card Bill’ in a large, easy-to-read font. Below this, the document is structured into clear sections, likely detailing transactions, fees, and payment information. At the bottom of the bill, there’s a designated area for customer details, including name, address, and possibly account information. The contrast between the white document and black background gives the image a formal, official appearance, emphasizing the importance of the financial information presented.”
  • <REFERRING_EXPRESSION_SEGMENTATION>
    • Purpose: Segments the image based on a textual description of a specific object or region.
    • Input: Requires a textual description as additional input.
    • Output: A segmentation mask for the described object or region.
  • <REGION_TO_SEGMENTATION>
    • Purpose: Generates a segmentation mask for a specified region in the image.
    • Input: Requires coordinates of the region of interest.
    • Output: A segmentation mask for the specified region.
  • <OPEN_VOCABULARY_DETECTION>
    • Purpose: Detects objects in the image based on user-specified categories.
    • Input: Can accept a list of categories to look for.
    • Output: Bounding boxes and labels for detected objects matching the specified categories.
  • <REGION_TO_CATEGORY>
    • Purpose: Classifies a specific region of the image into a category.
    • Input: Requires coordinates of the region of interest.
    • Output: A category label for the specified region.
  • <REGION_TO_DESCRIPTION>
    • Purpose: Generates a detailed description of a specific region in the image.
    • Input: Requires coordinates of the region of interest.
    • Output: A textual description of the contents of the specified region.
  • <OCR_WITH_REGION>
    • Purpose: Performs OCR on specific regions of the image.
    • Output: Extracted text along with the corresponding regions (bounding boxes) where the text was found.

These task prompts allow us to leverage the Florence-2-large model’s capabilities for various image analysis tasks. By combining these prompts, we can create a comprehensive analysis of an image, from basic captioning to detailed object detection and text recognition. Understanding and effectively utilizing these task prompts was crucial in maximizing the potential of the Florence-2-large model in our project.
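For the prompts that take additional input (such as <CAPTION_TO_PHRASE_GROUNDING> or <REFERRING_EXPRESSION_SEGMENTATION>), the extra text is simply appended to the task token before tokenization. A sketch, reusing the model, processor, image, device, and dtype from the earlier snippet:

task = "<CAPTION_TO_PHRASE_GROUNDING>"
text_input = "A credit card bill with a price tag on it"  # e.g. a previous <CAPTION> output

inputs = processor(text=task + text_input, images=image, return_tensors="pt").to(device, dtype)
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
result = processor.post_process_generation(raw, task=task, image_size=(image.width, image.height))
print(result)  # {'<CAPTION_TO_PHRASE_GROUNDING>': {'bboxes': [...], 'labels': [...]}}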

Lessons Learned and Debugging

Throughout the development of our Florence Image Analysis project, I encountered several challenges and learned valuable lessons:

  • Flash Attention Challenges: One of the most significant hurdles we faced was integrating flash-attn into our project. Initially, we encountered installation issues and compatibility problems with our CUDA setup. We learned that:
    • Flash-attn requires specific CUDA versions and can be sensitive to the exact configuration of the environment.
    • Note: we moved to the NVIDIA-based Docker image to take care of all the prerequisites specific to CUDA/flash-attn and the interoperability of versions, and that helped tremendously.
    • Building flash-attn from source was often necessary to ensure compatibility with our specific environment.
    • Using the --no-build-isolation flag during installation helped resolve some dependency conflicts. Solution: We ended up creating a custom build process in our Dockerfile, ensuring all dependencies were correctly installed before attempting to install flash-attn.
  • Segmentation and OCR with Region Iterations: Implementing effective OCR, especially with region detection, proved to be an iterative process:
    • Initially, we tried using the Florence model for general OCR, but found it lacking in accuracy for structured documents.
    • We experimented with pre-processing steps to detect distinct regions in documents (headers, body, footer) before applying OCR.
    • Balancing between processing speed and accuracy was a constant challenge. Solution: We implemented a custom region detection algorithm that identifies potential text blocks before applying OCR. This improved both accuracy and processing speed.
  • Error Handling and Logging: As the project grew more complex, we realized the importance of robust error handling and comprehensive logging:
    • Initially, errors in model processing would crash the entire application.
    • Debugging was challenging without detailed logs. Solution: We implemented try-except blocks throughout the code, added detailed logging, and created a system to gracefully handle and report errors to users (see the sketch after this list).
  • Optimizing for Different Document Types: We found that the performance of our system varied significantly depending on the type of document being processed:
    • Handwritten documents required different preprocessing than printed text.
    • Certain document layouts (e.g., tables, multi-column text) posed unique challenges. Solution: We implemented a document type detection step and adjusted our processing pipeline based on the detected type.
  • Balancing Between Flexibility and Specialization: While we aimed to create a general-purpose image analysis tool, we found that specializing for certain tasks greatly improved performance:
    • We created separate processing paths for tasks like receipt OCR, business card analysis, and general document processing. Solution: We implemented a modular architecture that allows for easy addition of specialized processing pipelines while maintaining a common core.
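To give a feel for the error-handling pattern described above, here is roughly the shape of the guarded inference call – a sketch reusing the illustrative ModelManager from earlier, not the exact repository code:

import logging

logger = logging.getLogger("florence")

async def safe_analyze(image, task: str):
    """Run one Florence-2 task, logging failures instead of crashing the app."""
    try:
        return manager.run_task(task, image)
    except Exception:
        # The full traceback goes to the logs; the user gets a friendly message
        logger.exception("Task %s failed", task)
        return f"Sorry, the {task} task failed on this image. Please try another one."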

These lessons significantly improved the robustness and effectiveness of our Florence Image Analysis project.

Validation of the API with Real Examples

After the container is up and running, users can access the Chainlit interface at http://localhost:8010. Here’s an example of how to use the API:

Example – <CAPTION>

Example – <MORE_DETAILED_CAPTION>

Example – <OCR>

Example – <OCR_WITH_REGION>

Model GPU – VRAM Consumption

Following is a list of helpful links:

  • Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks – Microsoft Research
  • Sample Google Colab notebook – sample_inference.ipynb · microsoft/Florence-2-large at main (huggingface.co)
  • Research paper – [2311.06242] Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks (arxiv.org)
  • Florence 2 Inference Chainlit – Mervin Praison

Conclusion

Building this comprehensive Image Analysis API with Florence-2-large, Chainlit, and Docker has been an incredible learning experience. I must have spent at least a week getting all the features and functionality working within a Docker image. By leveraging the power of advanced AI models and containerization, we created a scalable and efficient solution for performing various image analysis tasks automatically. Through this project, we gained valuable insights into model management, error handling, GPU utilization in containerized environments, and designing interactive UIs for AI applications.

I hope that this blog post has provided you with a comprehensive overview of our Image Analysis API project and inspired you to explore the fascinating world of computer vision. Feel free to check out our GitHub repository, try out the API, and let me know if you have any questions or suggestions!

Thanks,
Aresh Sarkari

Optimize Your Azure Costs with VM Hibernation – Cost saving for Azure Virtual Desktop

3 Jun

Azure Virtual Machines (VMs) are a powerful tool for running applications, hosting desktops, and performing various tasks in the cloud. However, the cost of running VMs can add up quickly, especially if they are left running when not in use. Microsoft has introduced a new feature called VM Hibernation, which allows you to save costs by hibernating your VMs when they are not needed. In this blog post, we will explore the benefits of VM Hibernation and how you can use it to optimize your Azure costs.

What is VM Hibernation?

VM Hibernation is a cost-saving feature that allows you to deallocate a VM while preserving its in-memory state. When a VM is hibernated, you don’t pay for the compute costs associated with the VM. Instead, you only pay for the storage and networking resources associated with it. This means you can save a significant amount of money on your Azure bill by hibernating VMs when they are not in use. Note that this won’t deliver savings on its own – you will need to integrate it with Scaling Plans and/or Azure Automation.

Use Cases for VM Hibernation

  • Virtual Desktops – If you use Azure Virtual Desktop to provide virtual desktops (Personal only) to your employees, you can use VM Hibernation to save costs during non-business hours. By hibernating the VMs after business hours and resuming them the next morning, you can avoid paying for compute resources when they are not needed.
  • Dev/Test Environments – If you have development or testing environments that are not used 24/7, you can use VM Hibernation to save costs by hibernating the VMs when they are not in use.
  • Prewarmed VMs – If you have applications with long initialization times due to memory-resident components, you can bring up the apps, hibernate the VMs, and then quickly start the “prewarmed” VMs when needed, with the applications up and running in the desired state.

How to Enable Hibernation

Enabling hibernation is straightforward and can be done using various tools such as the Azure Portal, PowerShell, the Azure CLI, ARM templates, SDKs, and REST APIs. I will be demonstrating it within the Azure Portal:

  • Create a Host Pool – Select Personal
    • During Host Pool creation, ensure Personal is selected, as hibernation doesn’t work for Pooled host pools
  • Under Host Pools –> Virtual Machine –> Select Hibernate

Hibernate is only available for personal host pools. For more information, see Hibernation in virtual machines.
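If you prefer automation over the portal, the same operations can be scripted. Below is a hedged sketch using the Azure Python SDK (azure-identity and azure-mgmt-compute) – treat the parameter names as assumptions and verify against the current SDK docs, since hibernate support requires a recent compute API version and a hibernation-capable VM size and OS:

from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

subscription_id = "<Your-Subscription-ID>"
resource_group  = "<Your-Resource-Group>"
vm_name         = "<Your-Session-Host-VM>"

client = ComputeManagementClient(DefaultAzureCredential(), subscription_id)

# Note: the VM itself must have hibernation enabled
# (additionalCapabilities.hibernationEnabled = true) when it was created.

# Hibernate: deallocate while persisting the in-memory state to the OS disk
poller = client.virtual_machines.begin_deallocate(resource_group, vm_name, hibernate=True)
poller.result()
print(f"{vm_name} hibernated – compute billing stops, RAM state preserved.")

# Resuming is a normal start; applications come back in their pre-hibernation state
client.virtual_machines.begin_start(resource_group, vm_name).result()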

Configuring Scaling Plans

To take full advantage of VM Hibernation, you can configure scaling plans for your VMs. Scaling plans allow you to automatically hibernate VMs based on a schedule or based on user activity.

  • Start VM on Connect – Ensure this is set to “Yes”
  • Configure Hibernation Settings
    • Define hibernation to trigger after a user session has been disconnected or logged off for a configurable period.
  • Assign the Scaling Plan
    • Apply the scaling plan to one or more personal host pools.

Following is a list of helpful links:

  • Hibernation support now available for Azure Virtual Desktop | Azure Virtual Desktop Blog (microsoft.com)
  • Cost Optimization for General Purpose VMs using Hibernation now Generally Available – Microsoft Community Hub

VM Hibernation is a powerful cost-saving feature that can help you optimize your Azure costs. By hibernating VMs when they are not in use, you can save a significant amount of money on your Azure bill. Whether you are using VMs for virtual desktops, dev/test environments, or prewarmed VMs, VM Hibernation can help you save costs without sacrificing performance or availability. Do give it a try in your environment and see how much you can save!

Thanks,
Aresh Sarkari

Unlocking the Power of Multimodal AI: A Deep Dive into LLaVA and LLaMA 3 – Demo in LM Studio

23 May

In my earlier post, we explored uncensored LLMs like Dolphin. Today, we shall look at the intersection of visual and language understanding – what happens when vision models and LLMs are brought together. One such innovation is LLaVA (Large Language and Visual Assistant), an open-source generative AI model that combines the strengths of vision encoders and large language models to create a powerful tool for general-purpose visual and language understanding. In this blog post, we’ll delve into the details of LLaVA, its underlying models, and how you can harness its capabilities using LM Studio.

What is LLaVA?

🖼️ LLaVA is a novel, end-to-end trained large multimodal model that integrates a pre-trained CLIP ViT-L/14 visual encoder with the Vicuna large language model. The integration is achieved through a projection matrix, enabling seamless interaction between visual and language data. LLaVA is designed to excel in both daily user-oriented applications and specialized domains such as science, offering a versatile tool for multimodal reasoning and instruction-following tasks.

What is LLaMA 3?

🧠 LLaMA 3 is the third iteration of the Large Language Model from Meta AI, known for its remarkable language understanding and generation capabilities. LLaMA 3 builds upon its predecessors with improved architecture, enhanced training techniques, and a broader dataset, making it one of the most advanced language models available. In the context of LLaVA, LLaMA 3 serves as the foundation for the language model component, providing robust support for complex conversational and reasoning tasks.

How to Run the Model Locally Using LMStudio

💻 Running LLaVA locally using LMStudio is a straightforward process that allows you to leverage the model’s capabilities on your own hardware. Here’s a step-by-step guide to get you started:

  • Setup Your Environment
    • Install LM Studio: The software is available on Windows, Mac, and Linux, and allows you to manage and run local LLMs without having to set up Python, machine learning, or Transformers libraries yourself. Link to download the Windows bits – LM Studio – Discover, download, and run local LLMs
  • Download the Model and Dependencies
    • The best place to keep track of models is Hugging Face – Models – Hugging Face, where you can follow model releases and updates.
    • Copy the model name from Hugging Face – xtuner/llava-llama-3-8b-v1_1-gguf
    • Paste this name into LM Studio and it will list all the quantized variants
    • In my case, given my hardware configuration, I selected the int4 model. Please note that the lower the quantization level, the less accurate the model becomes.
    • Obtain the LLaVA model files, including the quantized GGUF version and the MMProj files, from the official repository.
    • Downloading the model will take time depending on your internet connection.
  • Prepare the Model for Running:
    • Within LM Studio, click on the Chat interface to configure model settings.
    • Select the model from the drop-down list – llava llama 3 v int4 GGUF
    • You can run it with stock settings, but I like to adjust the Advanced Configuration options
    • Adjust the model settings to match your hardware capabilities and specific requirements.
    • Based on your system, set the GPU offload to 50/50 or max; I have set it to max
    • Click Reload model to apply the configuration
  • Run Inference: Start the model and begin running inference tasks, whether for visual chat, science QA, or other applications (see the sketch after this list).
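Beyond the chat UI, LM Studio can also expose a local OpenAI-compatible server. As a quick test, the sketch below assumes the server is running on the default port 1234 and that the LLaVA model above is loaded – treat the model name and port as placeholders for whatever your LM Studio instance reports:

import base64
from openai import OpenAI  # pip install openai

# LM Studio's local server speaks the OpenAI API; the key can be any string
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="xtuner/llava-llama-3-8b-v1_1-gguf",  # placeholder: use the name LM Studio shows
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in detail."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)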

Note – If there is enough interest, I can also do an extended blog post on a Dockerized version of this model. Leave comments down below.

What are MMProj Files?

📂 MMProj files are a key component in the LLaVA ecosystem, representing multimodal projection matrices that facilitate the alignment between visual and language features. These files are crucial for the seamless integration of visual encoders and language models, enabling LLaVA to effectively interpret and generate content that spans both modalities. MMProj files are fine-tuned during the model’s training process to ensure optimal performance in various applications.

What is the Quantized GGUF Version of LLaVA?

💾 The quantized GGUF (GPT-Generated Unified Format) version of LLaVA represents a compressed and optimized variant of the model, enabling efficient deployment on consumer-grade hardware. Quantization reduces the precision of the model’s weights, significantly decreasing the memory footprint and computational requirements while maintaining a high level of performance. This makes the quantized GGUF version ideal for applications where resource constraints are a concern.
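As a rough back-of-the-envelope illustration of why quantization matters for an ~8B-parameter model like llava-llama-3-8b (weights only, ignoring the KV cache and runtime overhead):

params = 8e9  # ~8 billion parameters

fp16_gb = params * 2.0 / 1024**3   # 2 bytes per weight  -> ~14.9 GB
int4_gb = params * 0.5 / 1024**3   # 4 bits per weight   -> ~3.7 GB

print(f"fp16 weights: ~{fp16_gb:.1f} GB, int4 weights: ~{int4_gb:.1f} GB")

This is why the int4 GGUF variant fits comfortably on consumer GPUs where the full-precision model would not.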

Testing the Model

🧪 Testing showcases the beauty of the LLaVA model – look at the level of detail it provides in the example images below.

Example 1

Example 2

Through rigorous testing and validation, LLaVA continues to demonstrate its potential as a versatile and powerful multimodal model.

Reference Links

Following is a list of helpful links:

  • LLaVA GitHub page – LLaVA (llava-vl.github.io)
  • Microsoft Research paper – LLaVA: Large Language and Vision Assistant – Microsoft Research
  • Hugging Face GGUF model – xtuner/llava-llama-3-8b-v1_1-gguf · Hugging Face
  • Visual Instruction Tuning (arXiv) – [2304.08485] Visual Instruction Tuning (arxiv.org)

🌐 LLaVA represents a significant advancement in the field of multimodal AI, combining powerful visual and language understanding capabilities in a single, efficient model. By leveraging the strengths of LLaMA 3 and innovative techniques like quantization and multimodal projection, LLaVA offers a robust tool for a wide range of applications. Whether you’re a researcher, developer, or enthusiast, exploring the potential of LLaVA can open up new possibilities in the realm of AI-driven interaction and understanding.

By following the steps outlined in this post, you can get started with LLaVA and begin harnessing its capabilities for your own projects. Please let me know if I’ve missed any steps or details, and I’ll be happy to update the post.

Thanks,
Aresh Sarkari