Friday, June 29, 2012

All partitions of two sets C#


I've answered an interesting question on StackOverflow.com
The question was, provided two unique sets, how to produce all posible combination of subsets. For example provided these sets:
List<string> employees = new List<string>() { "Adam", "Bob"};
List<string> jobs      = new List<string>() { "1", "2", "3"};

We can split the two sets to the maximum of two subsets, and here are all the combinations:
Adam       : 1
Bob        : 2 , 3
Adam       : 1 , 2
Bob        : 3
Adam       : 1 , 3
Bob        : 2
Adam       : 2
Bob        : 1 , 3
Adam       : 2 , 3
Bob        : 1
Adam       : 3
Bob        : 1 , 2
Adam, Bob  : 1, 2, 3

First of all before you write your code down, lets understand the underlying combinatorics of this question.
Basically, you require that for any partition of set A, you require the same number of parts in set B.
E.g. If you split set A to 3 groups you require that set B will be split to 3 groups as well, if you don't at least one element won't have a corresponding group in the other set.
It's easier to picture it with splitting set A to 1 group we must have one group made from set B as in your example (Adam, Bob : 1, 2, 3) .
So now, we know that set A has n elements and that set B has k elements.
So naturally we cannot request that any set be split more that Min(n,k).

Let's say we split both sets to 2 groups (partitions) each, now we have 1*2=2! unique pairs between the two sets.
Another example is 3 groups each would give us 1*2*3=3! unique pairs.

However, we're still not finished, after any set is split into several subsets (groups), we can still order the elements in many combinations.
So for m partitions of a set we need to find how many combinations of placing n elements in mpartitions.
This can be found by using the Stirling number of the second kind formula:

(Eq 1) Stirling Number of the Second Kind
This formula gives you The number of ways of partitioning a set of n elements into k nonempty sets.

So the total number of combinations of pairs for all partitions from 1 to min(n,k) is:

(Eq 2) All combinations
In short, it's the sum of all partition combinations of both sets, times all combinations of pairs.
So now we understand how to partition and pair our data we can write the code down:
Code:
So basically, if we look at our final equation (2) we understand the we need four parts of code to our solution.
1. A summation (or loop)
2. A way to get our Stirling sets or partitions from both sets
3. A way to get the Cartesian product of both Stirling sets.
4. A way to permutate through the items of a set. (n!)
On StackOverflow you can find many ways of how to permuatate items and how to find cartesian products, here is a an example (as extension method):
 public static IEnumerable<IEnumerable<T>> Permute<T>(this IEnumerable<T> list)
    {
        if (list.Count() == 1)
            return new List<IEnumerable<T>> { list };

        return list.Select((a, i1) => Permute(list.Where((b, i2) => i2 != i1)).Select(b => (new List<T> { a }).Union(b)))
                   .SelectMany(c => c);
    }
This was the easy part of the code.
The more difficult part (imho) is finding all possible n partitions of a set.
So to solve this, I first solved the greater problem of how to find all possible partitions of a set (not just of size n).
I came up with this recursive function:
public static List<List<List<T>>> AllPartitions<T>(this IEnumerable<T> set)
    {
        var retList = new List<List<List<T>>>();

        if (set == null || !set.Any())
        {
            retList.Add(new List<List<T>>());
            return retList;
        }
        else
        {
            for (int i = 0; i < Math.Pow(2, set.Count()) / 2; i++)
            {
                var j = i;

                var parts = new [] { new List<T>(), new List<T>() };


                foreach (var item in set)
                {
                    parts[j & 1].Add(item);
                    j >>= 1;
                }

                foreach (var b in AllPartitions(parts[1]))
                {
                    var x = new List<List<T>>();

                    x.Add(parts[0]);

                    if (b.Any())
                        x.AddRange(b);

                    retList.Add(x);
                }
            }
        }
        return retList;
    }
The return value of : List<List<List<T>>> just means a List of all possible partitions where a partition is a list of sets and a set is a list of elements.
We do not have to use the type List here, but it simplifies indexing.
So now let's put everything together:
Main Code
//Initialize our sets
        var employees = new [] { "Adam", "Bob" };
        var jobs = new[] {  "1", "2", "3" };

        //iterate to max number of partitions (Sum)
        for (int i = 1; i <= Math.Min(employees.Length, jobs.Length); i++)
        {
            Debug.WriteLine("Partition to " + i + " parts:");

            //Get all possible partitions of set "employees" (Stirling Set)
            var aparts = employees.AllPartitions().Where(y => y.Count == i);
            //Get all possible partitions of set "jobs" (Stirling Set)
            var bparts = jobs.AllPartitions().Where(y => y.Count == i);

            //Get cartesian product of all partitions 
            var partsProduct = from employeesPartition in aparts
                      from jobsPartition in bparts
                               select new {  employeesPartition,  jobsPartition };

            var idx = 0;
            //for every product of partitions
            foreach (var productItem in partsProduct)
            {
                //loop through the permutations of jobPartition (N!)
                foreach (var permutationOfJobs in productItem.jobsPartition.Permute())
                {
                    Debug.WriteLine("Combination: "+ ++idx);

                    for (int j = 0; j < i; j++)
                    {
                        Debug.WriteLine(productItem.employeesPartition[j].ToArrayString() + " : " + permutationOfJobs.ToArray()[j].ToArrayString());
                    }
                }
            }
        }
Output:
Partition to 1 parts:
Combination: 1
{ Adam , Bob } : { 1 , 2 , 3 }
Partition to 2 parts:
Combination: 1
{ Bob } : { 2 , 3 }
{ Adam } : { 1 }
Combination: 2
{ Bob } : { 1 }
{ Adam } : { 2 , 3 }
Combination: 3
{ Bob } : { 1 , 3 }
{ Adam } : { 2 }
Combination: 4
{ Bob } : { 2 }
{ Adam } : { 1 , 3 }
Combination: 5
{ Bob } : { 3 }
{ Adam } : { 1 , 2 }
Combination: 6
{ Bob } : { 1 , 2 }
{ Adam } : { 3 }
We can easily check our out put just by counting the results.
In this example we have a set of 2 elements, and a set of 3 elements, Equation 2 states that we need S(2,1)S(3,1)1!+S(2,2)S(3,2)2! = 1+6 = 7
which is exactly the number of combinations we got.
For reference here are examples of Stirling number of the second kind:
S(1,1) = 1
S(2,1) = 1
S(2,2) = 1
S(3,1) = 1
S(3,2) = 3
S(3,3) = 1
S(4,1) = 1
S(4,2) = 7
S(4,3) = 6
S(4,4) = 1
Here is an example of a ToString I use for display purposes:
public static String ToArrayString<T>(this IEnumerable<T> arr)
    {
        string str = "{ ";

        foreach (var item in arr)
        {
            str += item + " , ";
        }

        str = str.Trim().TrimEnd(',');
        str += "}";

        return str;
    }
Here is a fast permuation algorithm if you want faster results:
The main part of this algorithm is finding the Stirling sets, I've used an inneficient Permutation method, here is a faster one based on the QuickPerm algorithm:
public static IEnumerable<IEnumerable<T>> QuickPerm<T>(this IEnumerable<T> set)
    {
        int N = set.Count();
        int[] a = new int[N];
        int[] p = new int[N];

        var yieldRet = new T[N];

        var list = set.ToList();

        int i, j, tmp ;// Upper Index i; Lower Index j
        T tmp2;

        for (i = 0; i < N; i++)
        {
            // initialize arrays; a[N] can be any type
            a[i] = i + 1; // a[i] value is not revealed and can be arbitrary
            p[i] = 0; // p[i] == i controls iteration and index boundaries for i
        }
        yield return list;
        //display(a, 0, 0);   // remove comment to display array a[]
        i = 1; // setup first swap points to be 1 and 0 respectively (i & j)
        while (i < N)
        {
            if (p[i] < i)
            {
                j = i%2*p[i]; // IF i is odd then j = p[i] otherwise j = 0

                tmp2 = list[a[j]-1];
                list[a[j]-1] = list[a[i]-1];
                list[a[i]-1] = tmp2;

                tmp = a[j]; // swap(a[j], a[i])
                a[j] = a[i];
                a[i] = tmp;

                //MAIN!

               // for (int x = 0; x < N; x++)
                //{
                //    yieldRet[x] = list[a[x]-1];
                //}
                yield return list;
                //display(a, j, i); // remove comment to display target array a[]

                // MAIN!

                p[i]++; // increase index "weight" for i by one
                i = 1; // reset index i to 1 (assumed)
            }
            else
            {
                // otherwise p[i] == i
                p[i] = 0; // reset p[i] to zero
                i++; // set new index value for i (increase by one)
            } // if (p[i] < i)
        } // while(i < N)
    }
This will take cut the time by 80-90%
However, most of the CPU cycles go to string building, which is specifically needed for this example.
This will be a make it a bit faster:
             results.Add(productItem.employeesPartition[j].Aggregate((a, b) => a + "," + b) + " : " + permutationOfJobs.ToArray()[j].Aggregate((a, b) => a + "," + b));
Compiling in x64 will be better cause these strings are taking a lot of memory.
I liked the question, it wasn't trivial. I want to make the AllPartitions method more optimized in the future.