July | 2023 | AskAresh

Archive | July, 2023

PowerShell – GPO Analysis – Search for a Specific GPO Setting or a List of Settings Across Multiple GPOs Within a Domain

If you have ever had to search for one or more GPO settings across a large number of Group Policy Objects (GPOs) in your domain, you know how tedious it is to find specific settings across hundreds or thousands of GPOs. PowerShell comes to the rescue with a script that searches for GPO settings across all your existing GPOs and generates an organized CSV output. In this blog post, we'll walk through the process and make sure you have all the prerequisites to get started.

Use case

You have approximately 50 to 60 GPO settings from the Center for Internet Security (CIS) benchmark documents (CIS Microsoft Windows Desktop Benchmarks or CIS Microsoft Windows Server Benchmarks) and want to check whether each one is already configured in an existing GPO in your domain or is missing from the environment. Instead of searching manually, one setting at a time, you can use the PowerShell script below to get results like a champion.

Prerequisites

Before using the PowerShell script, ensure you have the following prerequisites in place:

Windows PowerShell version 5.0 or above
Group Policy module for Windows PowerShell (GroupPolicy), which provides the Get-GPO and Get-GPOReport cmdlets used by the script and is installed with the Group Policy Management Console
Permissions: Ensure you have sufficient permissions to access and analyze GPO settings. Typically, you need to be a member of the Domain Admins group or have equivalent privileges.
Execute the script from a member server that is part of the domain and has the necessary permissions.
Prepare the input file (inputgpo.txt): enter the GPO settings, one per line, and save the file. In my situation, it's present in C:\Temp. Example contents:

Relax minimum password length limits
Allow Administrator account lockout
Generate security audits
Impersonate a client after authentication
Lock pages in memory
Replace a process level token
Accounts: Block Microsoft accounts
Interactive logon: Machine inactivity limit
Microsoft network server: Server SPN target name validation level
Network access: Remotely accessible registry paths
Network security: Configure encryption types allowed for Kerberos
Audit Security State Change
Do not allow password expiration time longer than required by policy
Password Settings: Password Complexity
Password Settings: Password Length
Password Settings: Password Age (Days)

PowerShell Script

Now that you have the prerequisites in place, let's dive into the PowerShell script. It is also available on GitHub: avdwin365mem/GPOSettingsSearch at main · askaresh/avdwin365mem (github.com)

Enter the name of your domain (e.g. askaresh.com)
Make sure the input file is present in C:\Temp

# Domain to search
$DomainName = "askaresh.com"

# Collect all GPOs in the domain
$GPOs = Get-GPO -All -Domain $DomainName

# Read the search strings from the text file (one GPO setting per line), skipping blank lines
$SearchStrings = Get-Content -Path "C:\Temp\inputgpo.txt" | Where-Object { $_.Trim() }

# Generate each GPO's XML report once and reuse it for every search string.
# This step takes time, depending on how many hundreds or thousands of GPOs are present in the environment.
$Reports = foreach ($gpo in $GPOs) {
    [PSCustomObject]@{
        GPOName = $gpo.DisplayName
        Xml     = Get-GPOReport -Guid $gpo.Id -ReportType Xml -Domain $DomainName
    }
}

# Hunt through each GPO XML report for each search string
$matchlist = foreach ($searchString in $SearchStrings) {
    # -match treats its right-hand side as a regular expression, so escape the string
    # to match characters such as ( ) literally, e.g. "Password Age (Days)"
    $pattern = [regex]::Escape($searchString.Trim())
    $hits = @($Reports | Where-Object { $_.Xml -match $pattern })
    if ($hits.Count -gt 0) {
        foreach ($hit in $hits) {
            [PSCustomObject]@{ SearchString = $searchString; GPOName = $hit.GPOName }
        }
    }
    else {
        [PSCustomObject]@{ SearchString = $searchString; GPOName = "No results found" }
    }
}

# Output the search results to CSV
$matchlist | Export-Csv -Path "C:\Temp\gposearch.csv" -NoTypeInformation

Output (Results)

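The output CSV (C:\Temp\gposearch.csv) has two columns, SearchString and GPOName. It contains one row for every GPO in which a setting was found, and a single "No results found" row for each setting that is not present in any GPO.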
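If you want to post-process the CSV outside PowerShell, here is a minimal Python sketch that lists only the settings missing from every GPO. It is not part of the original script: the function name missing_settings and the sample rows are made up for illustration, and it assumes the two column names written by the script above.

```python
import csv
import io


def missing_settings(csv_text):
    """Return the search strings whose GPOName column is 'No results found'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["SearchString"] for row in reader
            if row["GPOName"] == "No results found"]


# Hypothetical sample rows in the shape of gposearch.csv (not real results)
sample = '''"SearchString","GPOName"
"Lock pages in memory","Example Server Baseline"
"Accounts: Block Microsoft accounts","No results found"
'''

print(missing_settings(sample))
```

To run it against the real file, read C:\Temp\gposearch.csv into a string first and pass that string to the function.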

I hope you find this information helpful for searching GPO settings across hundreds or thousands of GPOs within your domain. Please let me know if I have missed any steps or details, and I will be happy to update the post.

Thanks,
Aresh Sarkari

Tags: Active Directory, CIS, GPO, Group Policy Objects, Microsoft, PowerShell, Security, Windows Server

Categories PowerShell, Scripts-API, Windows

Extract Highlighted Data from PDF using Python – Example CIS Windows Server 2022 Benchmark pdf

10 Jul

In this blog post, we will explore how to extract highlighted data from a PDF using Python. Before we go ahead, let's understand the use case: you have the CIS benchmark PDF (CIS_Microsoft_Windows_Server_2022_Benchmark_v2.0.0.pdf), which is 1065 pages long, and you are reviewing its policies against your environment, highlighting the PDF with specific color codes. For example, I use four colors for the following purposes:

Red Color – Missing Policies
Yellow Color – All existing policies
Pink – Policies not applicable
Green – Upgraded policies

Example of the highlighted text:

You don't have to use the same color codes as I have, but you get the idea. Once you have done the heavy lifting of reviewing the document and are happy with the analysis, the next step is to extract the highlighted data into CSV format so that the teams can review and action it.

Prerequisites

We will use the PyMuPDF & Pandas library to parse the PDF file and extract the highlighted text. Additionally, we will apply this technique to the CIS Windows Server 2022 Benchmark PDF as an example.

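Before we begin, make sure you have installed the necessary dependencies. You can install PyMuPDF and Pandas using pip. Note that PyMuPDF is imported as fitz, but the package to install is PyMuPDF; the package named fitz on PyPI is a different project.

pip install PyMuPDF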
pip install pandas

First, I created a small script to go through the PDF and detect the highlight colors. I had to do this because, although to my eyes the colors are red, yellow, etc., the actual RGB color codes are slightly different.

import fitz  # PyMuPDF

# Open the PDF
doc = fitz.open('CIS_Microsoft_Windows_Server_2022_Benchmark_v2.0.0.pdf')

# Set to store unique colors
unique_colors = set()

# Loop through every page
for i in range(len(doc)):
    page = doc[i]
    # Get the annotations (highlights are a type of annotation)
    annotations = page.annots()
    for annotation in annotations:
        if annotation.type[1] == 'Highlight':
            # Get the color of the highlight
            color = annotation.colors['stroke']  # Returns a RGB tuple
            unique_colors.add(color)

# Print all unique colors
for color in unique_colors:
    print(color)

After executing the script, it prints each unique highlight color as an RGB tuple of floats between 0 and 1. Make sure you use the exact name of the PDF file and, in the IDE of your choice, cd to the directory where the above script (CheckColor.py) resides.
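The float tuples are hard to recognise by eye. As an optional sketch that is not part of the original scripts, the hypothetical helper rgb_to_hex below converts a tuple into a hex code you can paste into any color picker to confirm which highlight it is. The sample value is the yellow used in the main code.

```python
def rgb_to_hex(color):
    """Convert an RGB tuple of floats in the 0-1 range to a #rrggbb string."""
    return "#" + "".join(f"{round(channel * 255):02x}" for channel in color)


# The yellow highlight color used in the main script
YELLOW = (1.0, 0.9411770105361938, 0.4000000059604645)

print(rgb_to_hex(YELLOW))  # prints #fff066
```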

Now that we have the color codes, it's time to extract the highlighted text. We iterate through each page of the PDF and check for highlight annotations. If an annotation in one of our colors is found, we extract the text under it and add it to the data_by_color dictionary, which holds one list per color. Each list is then exported to its own CSV file.

Main Code

Replace "CIS_Microsoft_Windows_Server_2022_Benchmark_v2.0.0.pdf" with the actual path to your PDF file.

import fitz  # PyMuPDF
import pandas as pd

# Open the PDF
doc = fitz.open('CIS_Microsoft_Windows_Server_2022_Benchmark_v2.0.0.pdf')

# Define the RGB values for your colors
PINK = (0.9686269760131836, 0.6000000238418579, 0.8196079730987549)
YELLOW = (1.0, 0.9411770105361938, 0.4000000059604645)
GREEN = (0.49019598960876465, 0.9411770105361938, 0.4000000059604645)
RED = (0.9215689897537231, 0.2862749993801117, 0.2862749993801117)

color_definitions = {"Pink": PINK, "Yellow": YELLOW, "Green": GREEN, "Red": RED}

# Create separate lists for each color
data_by_color = {"Pink": [], "Yellow": [], "Green": [], "Red": []}

# Loop through every page
for i in range(len(doc)):
    page = doc[i]
    structure = None  # Detailed page structure, parsed at most once per page
    for annotation in page.annots():
        if annotation.type[1] != 'Highlight':
            continue
        color = annotation.colors['stroke']  # Returns an RGB tuple
        # Find which of the defined colors this highlight uses (exact match)
        color_name = next((name for name, rgb in color_definitions.items() if rgb == color), None)
        if color_name is None:
            continue
        if structure is None:
            structure = page.get_text("dict")

        # Collect every text span whose bounding box intersects the highlight
        content = []
        for block in structure["blocks"]:
            # Image blocks have no "lines" key, so default to an empty list
            for line in block.get("lines", []):
                for span in line["spans"]:
                    if fitz.Rect(span["bbox"]).intersects(annotation.rect):
                        content.append(span["text"])

        # Append the content to the appropriate color list
        data_by_color[color_name].append(" ".join(content))

# Convert each list to a DataFrame and write to a separate .csv file
for color_name, data in data_by_color.items():
    if data:
        df = pd.DataFrame(data, columns=["Text"])
        df.to_csv(f'highlighted_text_{color_name.lower()}.csv', index=False)

After running the script, the extracted highlighted text is saved in a separate CSV file per color (highlighted_text_pink.csv, highlighted_text_yellow.csv, and so on).

You can now extract the highlighted text from the PDF using the above technique. Feel free to modify and adapt this code to suit your specific requirements. Extracting highlighted data from PDFs can be a powerful data analysis and research technique.
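One adaptation you may find useful: the main code compares colors with exact float equality, so a highlight whose color differs even slightly from the defined values is skipped. The sketch below is an assumption on my part, not part of the original script. The hypothetical helper match_color accepts a color when every channel is within a small tolerance, and the 0.01 default is an arbitrary starting point.

```python
import math


def match_color(color, definitions, tolerance=0.01):
    """Return the name of the first defined color whose channels are all
    within `tolerance` of `color`, or None if nothing matches."""
    if not color:
        return None  # Annotation without a stroke color
    for name, rgb in definitions.items():
        if len(rgb) == len(color) and all(
            math.isclose(a, b, abs_tol=tolerance) for a, b in zip(color, rgb)
        ):
            return name
    return None


definitions = {
    "Yellow": (1.0, 0.9411770105361938, 0.4000000059604645),
    "Red": (0.9215689897537231, 0.2862749993801117, 0.2862749993801117),
}

print(match_color((1.0, 0.94, 0.4), definitions))  # prints Yellow
print(match_color((0.0, 0.0, 1.0), definitions))   # prints None
```

In the main code, you could call match_color(color, color_definitions) in place of the exact comparison and use the returned name as the key into data_by_color.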

I hope you find this information helpful for extracting highlighted data from PDF files. Please let me know if I have missed any steps or details, and I will be happy to update the post.

Thanks,
Aresh Sarkari

Tags: CIS, Data extraction, PDF analysis, PDF parsing, PyMuPDF, Python

Categories Scripts-API
