How To Join Object Lists In PowerShell

How To Join Object Lists In PowerShell

This articles explains how to joins objects lists in a sql-like manner using PowerShell.

Question

So I have these two arrays of data objects in my PowerShell code:

$Customers = (
    @{ Id = 1; Name = "Customer 1" },
    @{ Id = 2; Name = "Customer 2"; },
    @{ Id = 3; Name = "Customer 3"; }
);

$Orders = (

    @{ Id = 1; CustomerId = 1; Product = "Red Pumpkin" },
    @{ Id = 2; CustomerId = 1; Product = "Blue Pumpkin" },
    @{ Id = 3; CustomerId = 2; Product = "Green Pumpkin" }
);

How can I mimic a SQL-like JOIN on these two lists and output a new list with the joined data?

Short Answer

Looping and New-Object are your friends:

# loop all customers
foreach ($c in $Customers)
{
    # loop al orders
    foreach ($o in $Orders)
    {
        # decide whether to join
        if ($o.CustomerId -eq $c.Id)
        {
            # output a new object with the join result
            New-Object PSObject -Property @{
                CustomerId = $c.Id;
                OrderId = $o.Id;
                CustomerName = $c.Name;
                OrderProduct = $o.Product;
            };
        }
    }
}

What would be really cool though would be to have a CmdLet that would do this work for us dynamically whilst allowing us to pipe data through, wouldn’t it?

Long Answer

That’s precisely what the basic function below does:

function Join-Object
{
    param
    (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        $From,

        [Parameter(Mandatory = $true)]
        $On,

        [Parameter(Mandatory = $true)]
        [ScriptBlock]
        $Where = { $true },

        [Parameter(Mandatory = $true)]
        [ScriptBlock]
        $Select = { @{ SomeId = $a.Id; SomeName = $b.Name } },

        [Parameter(Mandatory = $false)]
        [ValidateSet("Inner", "Outer")]
        [String]
        $Type = "Inner"
    )

    begin {}
    process
    {
        # enumerate all items in the driver list
        foreach ($a in $From)
        {
            $HasJoined = $false;
            foreach ($b in $On)
            {
                if (&$Where)
                {
                    New-Object -TypeName PSObject -Property (&$Select);
                    $HasJoined = $true;
                }
            }
            if (!$HasJoined -and $Type -eq "Outer")
            {
                $b = $null;
                New-Object -TypeName PSObject -Property (&$Select);
            }
        }
    }
    end {}
}

We can make use of this function by calling it like this:

$Customers | Join-Object -On $Orders -Where { $a.Id -eq $b.CustomerId } -Select { @{
    CustomerId = $a.Id;
    OrderId = $b.Id;
    CustomerName = $a.Name;
    OrderProduct = $b.Product;
} }

So what magic is going on here?

Well, the magic lies in the way PowerShell evaluates script blocks. Look at the Where parameter of the previous call:

-Where { $a.Id -eq $b.CustomerId }

Here, we are passing a script block as a parameter to the function, almost as if it were of .NET delegate of sorts of a function pointer in C++. However, unlike these, script blocks are scoped and evaluated in the same context they are executed, not as separate scopes entirely. Knowing this quirk allows us to inject some placeholder code to be evaluated at specific points in the process.

What the Where parameter is doing here, then, is accepting a bit a code which will be evaluated on every join and returns $true or $false depending on whether they join should succeed or not.

The same principle underpins the Select parameter:

-Select { @{
    CustomerId = $a.Id;
    OrderId = $b.Id;
    CustomerName = $a.Name;
    OrderProduct = $b.Product;
} }

Here we have another script block being passed to the function. This one however, returns a hash table constructed from the properties of the placeholder variables $a and $b. This block will then be evaluated in the context of the join itself and be used to feed the New-Object Cmdlet to create a join result.

Oh almost forgot. As a small bonus, this function also supports Left Outer by specifying the join type, like so:

$Customers | Join-Object -On $Orders -Where { $a.Id -eq $b.CustomerId } -Type Outer -Select { @{
    CustomerId = $a.Id;
    OrderId = $b.Id;
    CustomerName = $a.Name;
    OrderProduct = $b.Product;
} }

You’ve probably thought about this by now, but we can also fiddle with the parameters here to make the function perform a Cross Join<>. After all, a Cross Join is nothing but a regular inner join where everything on one side matches to everything on the other side.

This can easily be accomplished by specifying $true on the Where clause:

$Customers | Join-Object -On $Orders -Where { $true } -Select { @{
    CustomerId = $a.Id;
    OrderId = $b.Id;
    CustomerName = $a.Name;
    OrderProduct = $b.Product;
} }

And that’s it. Of course, there are lots of improvements that can be made here, such as Right and Full Outer joins. Also, there isn’t the concern here for the over-the-top performance optimization we have in SQL but, then again, we use scripting to quickly perform repetitive tasks, not to implement a DBMS. With that in mind, this should get you started.

Jorge Candeias's Picture

About Jorge Candeias

Jorge helps organizations build high-performing solutions on the Microsoft tech stack.

London, United Kingdom https://jorgecandeias.github.io