Tuesday, November 12, 2013

MapReduce on .NET Steroids

Ok, everyone's talking about MapReduce, so I'm gonna chime in, .NET style!

Wikipedia defines "MapReduce" as a programming model for processing large data sets with a parallel, distributed algorithm on a cluster.

"The actual origins of MapReduce are arguable, but the paper that most cite as the one that started us down this journey is “MapReduce: Simplified Data Processing on Large Clusters” by Jeffrey Dean and Sanjay Ghemawat in 2004. This paper described how Google split, processed, and aggregated their data set of mind-boggling size." (O'reilly publishing)

If you're thinking about distributed parallel computing, think again: this blog post is only about the functional programming Map and Reduce model.
I'm going to show you how the basics are done using .NET and Linq as an amazing functional programming framework. These are the basics we need in order to take the next step towards distributed computing easily.

So what is "Map-Reduce" ?

Well, it's basically a data processing model that's aimed at large datasets, where you need as much horsepower as you can get in order  to get results at a reasonable time frame.
If you can split your data source into X parts, that have no dependency among each other, OR if you have a list of data items (which is basically the same thing) you are good to go.

So go where? Let's think of a "simple" problem/example.
Let's say we have a set of newspaper articles and we want to find out the number of times certain key words show up, e.g. "Peace", "War", "Love", "Hate"...

First step, the mapping step: give every intern (assuming you have interns to spend on this stupid project) one newspaper, and tell him: "You! Take this newspaper and write down on a blank piece of paper every key word each time it shows up!"

Once your eager minions are done, you need to take their mapped data and send it to the next step, the grouping step. You take one of your beloved workers and tell him/her to take new sheets of blank paper: one sheet will be for "Peace", one for "War" and so on. The grouper needs to go through every word found and write it again, but on the appropriately designated sheet.

After this we have sheets of "War", sheets of "Peace", of "Love" and of "Hate".
But we still don't know how many words of each type there are. (Yes, if you're thinking that our grouper could've already counted while grouping, you are correct, but remember that this is a simple example :) ).

Next and final step, Reduce!
Find a bunch of workers that are bored and give each a sheet from the grouper's sheets.
Tell them to find the data we want from the grouped data; in this simple case, just count the words.

In the end they will give you the result:
"War" - X times
"Peace" - Y times
and so on for the rest of the key words.

What we have done here is essentially MapReduce in all of its blazing parallel distributed glory!!
This is what Google DOES!
They distribute the data sheets to many workers (computers), let them work in parallel, group the results, and reduce back the answers with more distributed parallel computing! No need to bother fellow interns! :)

I ran across a cute example on the web for MapReduce, concerning how Facebook tells us what friends we have in common with our own friends.
http://stevekrenzel.com/finding-friends-with-mapreduce
(Facebook probably does something different, but it's a great example)

You'd think that this would be an easy task... you'd think that it's simple using SQL. However, remind yourself that Facebook does not use relational databases for its billion-plus users!!
So it has to be key-value or graph databases, and probably a combination of the two :) I would probably use SQL-MapReduce or something of that nature.

So, in order to fully understand the theory behind this paradigm shift in distributed computing, we need to fully grasp the functional programming theory underlying this methodology. (I cannot stress this enough.)

Let's try to accomplish this word counting task in C#.
Remember, we need to Map -> Group -> Reduce.

Well, first off we need some sort of data source. Let's say a list of strings, where each string represents a sheet of words.

And so:

var sheets = new List<string>();
sheets.AddRange(/* some sheets */);

So, we're generally thinking of an enumerable data structure for now.

How would we go about defining our mapper?
A mapper needs to take a sheet of paper and write down every key word it encounters.

public static IList<string> Map(String sheet)
{
       //Requires: using System.Text.RegularExpressions;
       string pattern = "war|peace|love|hate";

       //IgnoreCase so capitalized key words are caught; lowercase the value so grouping works
       var matches = (from Match m in Regex.Matches(sheet, pattern, RegexOptions.IgnoreCase)
                      select m.Value.ToLower()).ToList();

       return matches;
}

As you can see, it's pretty much hard-wired to our specific task, which is not good programming practice.
Let's deal with that in a moment.

Okay, now we have our data mapped. That's good, but we need to group it somehow, and for any grouping action to occur one needs a key.
In this simple example, it's pretty obvious that we need to group the mapped data by the words we're looking for.

How about:

var groups = sheets.SelectMany(Map).GroupBy(y => y);

That's pretty simple and adequate.
After we have our groups, we want to get the final results we've been waiting for: the number of repetitions of every key word.

Our reducers can work independently; each can take a group, for instance, and solve our task.

public static KeyValuePair<string, int> Reduce(IGrouping<string, string> group)
{
     return new KeyValuePair<string, int>(group.Key, group.Count());
}

That's it! All we have to do now is combine it all.

var mapped = sheets.SelectMany(Map);
var groups = mapped.GroupBy(y => y);
var results = groups.Select(Reduce);

Our results look something like:
[love: 2]
[war: 4]
and so on...

To simplify things, we could just write:

var results = sheets.SelectMany(Map).GroupBy(y => y, Reduce);

which is essentially the same thing. One caveat: this GroupBy overload takes a result selector of the form (key, values) rather than an IGrouping, so Reduce needs a matching two-argument shape, as sketched below.
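
A minimal sketch of that two-argument form (my addition, reusing the Reduce name):

public static KeyValuePair<string, int> Reduce(string key, IEnumerable<string> values)
{
     //The result selector receives the group key and the grouped items directly
     return new KeyValuePair<string, int>(key, values.Count());
}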
Pretty simple, isn't it!
This is map-reduce via functional programming, and both the map and reduce stages can run in parallel.

So what's missing? ... oh yeah, it's really non-generic. Or is it...

Well, if you look at that one-liner just above, the only thing that's not generic is the "sheets" source, but that's easily transformed to any source.

To make things generic, let's put on the table the types we need.

TSource - for our datasource type
TMapper - for our mapping results
TKeySelector - for our grouping key
TResult - for the end result type we want

If we create some sort of generic Map-Reduce method, it will need all of these parameters to work.
And what are they?

TSource - the element type of our IEnumerable<TSource> data source.
TMapper - the output element type of the mapping function, TSource -> IEnumerable<TMapper>.
TKeySelector - the key type produced by our key selecting function, TMapper -> TKeySelector.
TResult - the type returned from the reducing function, (TKeySelector, IEnumerable<TMapper>) -> TResult.

And so, written as an extension method, it would look something like:

public static IEnumerable<TResult> MapReduce<TSource, TMapper, TKeySelector, TResult>(
           this IEnumerable<TSource> source,
           Func<TSource, IEnumerable<TMapper>> mapper,
           Func<TMapper, TKeySelector> keySelector,
           Func<TKeySelector, IEnumerable<TMapper>, TResult> reducer)
        {
            return source.SelectMany(mapper).GroupBy(keySelector, reducer);
        }
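
To tie it back to the word counting task, the call site could look something like this (a sketch; Map is the mapper from earlier, and the lambdas are mine):

var results = sheets.MapReduce(
    Map,                          //mapper: sheet -> matched key words
    word => word,                 //key selector: group by the word itself
    (word, hits) => new KeyValuePair<string, int>(word, hits.Count())); //reducer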

Wow, you really have to look at it for a second to realize that it's this simple using Linq.
Without it, think of the amount of code needed to loop and group...
This is the amazing strength of modern C#: simplicity.

Now just provide the needed delegates and this function will map-reduce any problem.

Just to get a taste of distributed processing, add PLinq.
You can add .AsParallel() to the source so the mapping and reducing run in parallel, and voila.
So easy.
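
Something along these lines, for example (a sketch; actual speedups depend entirely on the workload):

var results = sheets
    .AsParallel()                 //fan the sheets out across cores
    .SelectMany(Map)              //map in parallel
    .GroupBy(y => y)              //PLinq handles merging the groups
    .Select(g => new KeyValuePair<string, int>(g.Key, g.Count()));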

But wait...

Delegates are so constrictive.
What if our data source were IQueryable... and let's say it manages its own distributed computing mechanism for selecting and grouping...
For this we need to modify our method by just a bit:

public static IQueryable<TResult> MapReduce<TSource, TMapper, TKeySelector, TResult>(
            this IQueryable<TSource> source,
            Expression<Func<TSource, IEnumerable<TMapper>>> mapper,
            Expression<Func<TMapper, TKeySelector>> keySelector,
            Expression<Func<TKeySelector, IEnumerable<TMapper>, TResult>> reducer)
        {
            return source.SelectMany(mapper).GroupBy(keySelector, reducer);
        }

Wait... are you serious? That's it? Just wrap the Funcs in Expressions?
YES!

Now if you have a query provider that supports distributed computing, this method will do the job.

And there is one: http://research.microsoft.com/en-us/projects/dryadlinq/

Unfortunately it's a discontinued project, but interesting nonetheless.

http://hadoopsdk.codeplex.com/ is what Microsoft is focusing on.


Monday, July 29, 2013

How to delete all contents inside a word document bookmark using C# and Open XML

I had the need to mark a certain section in a Word document and then remove it in code if it is not needed in a certain scenario.
My best solution was to use bookmarks, so I had to remove all contents between two bookmark tags (or actually inside one bookmark) in that Word document.

So I marked the section, inserted a bookmark that encapsulated it, and named it appropriately.

This is the code that did the job:


public void DeleteBetween(string bookmarkName)
        {
            var body = m_document.MainDocumentPart.Document.Body;

            //Find the bookmark element; it's either under the root element or inside a paragraph
            var bookmark = body.Descendants<BookmarkStart>().FirstOrDefault(x => x.Name == bookmarkName);

            //Could not find it... it may be inside a more complex element
            if (bookmark == null)
                throw new ArgumentOutOfRangeException("bookmarkName", "Bookmark not found in file");

            //If the bookmark is inside a paragraph we want to delete all siblings of that paragraph
            var start = bookmark.Parent is Paragraph ? bookmark.Parent : bookmark;
            var sibling = start.NextSibling();

            //Delete all elements until we reach our bookmark's end tag
            //(guard against a missing BookmarkEnd so we don't run off the end of the body)
            while (sibling != null && !(sibling is BookmarkEnd))
            {
                var temp = sibling;
                sibling = sibling.NextSibling();
                temp.Remove();
            }
        }
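
For context, here's a minimal sketch of how this might be driven; m_document is assumed to be a WordprocessingDocument field, and the wrapper class name here is hypothetical:

using DocumentFormat.OpenXml.Packaging;

//Open the document for editing, remove the bookmarked section, and save
using (var doc = WordprocessingDocument.Open(@"report.docx", true))
{
    var cleaner = new BookmarkCleaner(doc); //hypothetical class holding m_document
    cleaner.DeleteBetween("OptionalSection");
    doc.MainDocumentPart.Document.Save();
}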

Wednesday, March 27, 2013

How to implement a Restricted Boltzmann Machine in C#

If anyone wants to "feel" the difference between Matlab or Python and a language such as C#, I suggest that the first thing they do is try to program basic mathematical fundamentals, such as linear algebra.
C# is a very robust language; however, it does not help you in any way if you want to simplify coding math-oriented subjects.
You basically need to write everything from scratch.

Now, AI is usually built on neural networks, and neural networks usually process data in vectors or matrices.
And if we have matrices and vectors, we usually need outer products, dot products, triangulations, decompositions, etc...

Since Matlab is built for math, and Python has the NumPy library, what would take a few lines of code in those languages would take hundreds if not thousands of lines of code in C#.
Now, yes, I am stating the "obvious": you could say that if I had an equivalent math library in C#, it would be almost the same.
That is true, to a degree. But even if we had all the same functions, the syntax itself is not suited for algebraic manipulations.
For example:
In Python, if we want to update a slice of the first column of a matrix with a vector, we need only write:
X[2:4, 0] = V

Now, no matter what methods you have in C#, the syntax simply does not allow such expressions.
But that's the thing, isn't it? Let's say I want to test a theory right now, and if it's not good I want to move on.
I don't want to spend hours or days building or searching for helper classes just so I can "test" a simple theory.
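
For comparison, the closest plain-C# equivalent of that one-liner is an explicit loop; a tiny sketch, assuming a hypothetical double[,] X and double[] V:

//X[2:4, 0] = V, NumPy-style: update rows 2..3 of column 0
for (int i = 2; i < 4; i++)
{
    X[i, 0] = V[i - 2];
}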

This is where I take my hat off to Matlab and even Python. Although I think .NET (C#) is one of the best pieces of software engineering ever created, the simplicity and readability of Matlab / Python is a great advantage for rapid development, specifically for research.

A Restricted Boltzmann machine is an interesting unsupervised machine learning algorithm. It is a generative stochastic neural network that can learn a probability distribution over its set of inputs.

Feature extraction really gets interesting when you stack the RBMs one on top of the other creating a Deep Belief Network.
I will not go into the theory of the Boltzmann machine, regular or restricted.

I recommend looking at the original papers by Geoffrey E. Hinton, Yoshua Bengio and more.
Also E. Chen's post on the subject and python implementation is very good and intuitive.

A word about Arrays in C#:
The C# arrays that are similar in syntax to C++ take the form of (e.g.) int[][]; these are called jagged arrays (arrays of arrays).
C#'s "multidimensional" arrays are actually typed as int[,], meaning they are indexed with x,y Cartesian-style coordinates.
These are much more intuitive, and require no further initialization of inner rows.
However, due to a less optimized CLR implementation, they are much slower than jagged arrays.
This means even more code has to be written in order to achieve simple and readable code for building our neural networks.

In my project I built a matrix class and a vector class (which I don't really use; actually, I think the vector class should inherit from the matrix class).
The matrix class encapsulates a jagged array and has a multidimensional indexer for ease of use.
This is the basis of creating clean and short code for our algorithm.
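
Something along these lines; a minimal sketch of the idea (the names and layout are mine, not the actual project code):

public class Matrix
{
    //Backing store is a jagged array for speed; the indexer gives [row, col] syntax
    private readonly double[][] m_data;

    public int Rows { get; private set; }
    public int Cols { get; private set; }

    public Matrix(int rows, int cols)
    {
        Rows = rows;
        Cols = cols;
        m_data = new double[rows][];
        for (int i = 0; i < rows; i++)
            m_data[i] = new double[cols]; //initialize the inner rows up front
    }

    //Multidimensional-style indexer over the jagged backing array
    public double this[int row, int col]
    {
        get { return m_data[row][col]; }
        set { m_data[row][col] = value; }
    }
}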

A single RBM is inefficient at extracting a small number of features from a large input source, so stacked RBMs (what are called deep belief networks) provide a more efficient way to spread the feature learning from layer to layer.

In the following example we trained a dual-layer RBM: 1024 visible inputs to 50 hidden outputs, and on to 16 hidden outputs in the 2nd layer.
We trained on 100 handwritten digits: 50 epochs for the first layer, and 1500 epochs for the 2nd layer.
The original digit is shown first, and the reconstructed image (from 16 boolean features) below it.
It is quite similar to the original but not perfect, since naturally it also learned from other examples of writing the number "2".


00000000011111100000000000000000
00000000011111111000000000000000
00000000111111111100000000000000
00000001111111111100000000000000
00000001111111111110000000000000
00000000111111111111100000000000
00000001111111111111100000000000
00000001111111011111100000000000
00000001111110011111100000000000
00000011111110011111100000000000
00000000111111011111110000000000
00000000111110011111100000000000
00000000011110011111100000000000
00000000001100111111100000000000
00000000000000011111100000000000
00000000000000011111100000000000
00000000000000111111100000000000
00000000000000111111000000000000
00000000000000111110000000000000
00000000000000111111000000000000
00000000000001111110000000000000
00000000000001111110000000000000
00000000000011111100000000000000
00000000000011111110000000000000
00000000000011111100000000000000
00000000001111111111000000000000
00000000011111111111111111100000
00000000111111111111111111110000
00000000111111111111111111111000
00000000111111111111111111111100
00000000111111111111111111111100
00000000011111111111111111111000

00000000010001001000000000000000
00000000011011111000001100001000
00000000001111110100000000000000
00010000111111111011100000000000
00000011111111111101000100000000
00000011011101011111111001000000
00000011101110111111111000000000
00000001111000000111111000000000
00000001110010010111100000000000
00000110111000000011011000000000
00000000110101001111110000000000
00000001011000101111100000000000
00100001001010001111000000000000
00000001111000101111111000000000
00000000001000001110101000000000
00000000000010001011110001000000
00000000110010001110100001000000
00000000000100111111000000000000
00000010100000110111011001010000
00000000100001101111010000000000
00000000001000011110011001000000
00000011000101111100000011000000
00000000000000111000000000100000
00000000101101111100000101000000
00010000010111011010000100000000
00000000000111111001100100100000
00000100011111111010111000000010
00000000111111110111111111111100
00000000111111111111011111111100
00000000111111111111101101111000
00000000001101100111101101100000
00000001010110000000010001011000

So now the challenge is to make our computations as efficient and fast as possible, to make this more realistic for applications.

I've added the "Daydreaming" functionality.
It requires more study on my part, but it is interesting to see what features pop up when random data is inputted in to the machine.

Please feel free to view/download or fork my code on GitHub.

Further work to be done:
This is "Simple" example.
Next phase is to add Greedy-Layer Wise Pre-Training, and supervised fine tuning.

Sources and further reading:

"Learning Deep Architectures for AI" By Joshua Bengio
"A Practical Guide to Training Restricted Boltzmann Machines" By Geoffrey Hinton
E.Chen's Python implementation 

Sunday, March 3, 2013

Nested Class Constructor Accessibility

I wanted to share this very nice, small, creative piece of code that I've stumbled upon while looking for different perspectives on the subject.

The problem at hand: let us say you have a class, a "parent" or "container" class, and you want only this class to have the ability to instantiate other classes.
This is a very real and sometimes serious situation, where access to a constructor is forbidden to everyone and everything except one controlling object.

One "usual" pattern is using a factory builder. however, usually you will be able to either access the constructor still, or accessing the factory method from almost everywhere.

There are a few good solutions out there involving interfaces. The only downside to them is that they require quite a bit of code to be written down, for such a simple functionality.

And to be honest, what are we asking for?
To have a class with a nested class that only the containing class can instantiate!

http://stackoverflow.com/questions/1664793/how-to-restrict-access-to-nested-class-member-to-enclosing-class

Here you can see different implementations that deal with this problem.
Check out Ray Burns' answer there.


public class Journal
{
  private static Func<object, JournalEntry> _newJournalEntry;

  public class JournalEntry
  {
    static JournalEntry()
    {
      //Hand the containing class a delegate to the private constructor
      _newJournalEntry = value => new JournalEntry(value);
    }

    private JournalEntry(object value)
    {
      ...
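
To see the trick from the outside, here's a hedged sketch of how Journal might expose creation; the AddEntry method and the RunClassConstructor call are my additions, not part of the quoted answer (the static constructor only runs once JournalEntry is touched, so it has to be forced before the delegate is used):

using System.Runtime.CompilerServices;

//Inside the Journal class:
public JournalEntry AddEntry(object value)
{
    //Make sure JournalEntry's static constructor has wired up the delegate
    RuntimeHelpers.RunClassConstructor(typeof(JournalEntry).TypeHandle);
    return _newJournalEntry(value);
}

//Anywhere outside Journal:
//  new Journal.JournalEntry(...)   //compile error: the constructor is private
//  journal.AddEntry("some value")  //fine: only Journal can create entries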

Here we can see that in the nested class's static constructor we initialize a delegate pointing to the nested class's true constructor. The delegate is a private static field of the containing class, so it can only be accessed within the containing class.

Thus, in this example, JournalEntry can only be instantiated by functionality that has to be implemented inside the Journal object.

It might seem trivial to some and complex to others. I must admit I never thought of attacking the subject in this fashion, and it is quite creative.
Kudos.

Thursday, February 7, 2013

Shazam's Algorithm

I recently replaced my old BlackBerry with an Android phone.
One of the apps that I downloaded was "Shazam".

If you are not familiar with it, download it now, it's pretty cool.
What it does is very simple: it recognizes songs. Just play a song on the radio and tap Shazam.
It will practically instantly detect the song playing on that radio.

It works so well that I kept trying to figure out how the heck it works.
I came up with all sorts of algorithms they might have used: neural networks, clustering algorithms, etc...
And then I came across this blog post:

How Shazam Works

Genius.
I thought that they had an AI that clustered and classified similar patterns.
But using a fingerprint algorithm is actually much, much simpler and better.
A fingerprint algorithm doesn't look for "similar" patterns; it looks for an EXACT pattern. Think of it as a hash table: looking up a key is naturally O(1).
So if you can build an enormous hash table whose keys are the hash codes of the songs' fingerprints, voila! You've got an O(1) song recognition system.
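
In C#, the lookup side of that idea is literally just a dictionary; a toy sketch (the fingerprint values here are made up, and real fingerprinting works on spectrogram peaks):

//Toy index: fingerprint hash -> song title
var index = new Dictionary<long, string>
{
    { 0x3FA91B2C4D5E60, "Song A" },
    { 0x7B1C9D0E2F3A45, "Song B" }
};

//Recognition is an exact O(1) lookup, not a similarity search
long capturedFingerprint = 0x7B1C9D0E2F3A45; //computed from the microphone sample
string title;
if (index.TryGetValue(capturedFingerprint, out title))
    Console.WriteLine("Match: " + title);
else
    Console.WriteLine("No match");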

Downside:
You cannot hum a song to it, because it can only recognize exact patterns.
You cannot find similar songs (songs that have a generally similar beat or instruments playing), again because it uses an exact search algorithm.


Still, it is pretty amazing when you use it.

Tuesday, December 25, 2012

Dynamic Time Warping - UCR Suite in C#

So, as mentioned in this comment, one way to achieve fast DTW calculations is to use "lower bounds" in the DTW process; the UCR Suite is a great example of how this can be achieved.
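
Just to give a flavor of what a lower bound buys you, here's a rough C# sketch of the LB_Keogh idea the suite relies on (my own simplified version, not the ported code): compute a cheap bound against the query's warping envelope, and if it already exceeds the best-so-far distance, skip the expensive DTW call. The envelope computation below is the naive O(n*r) version, just for illustration:

//Envelope of query q within warping window r: u[i]/l[i] = max/min over q[i-r .. i+r]
static void Envelope(double[] q, int r, double[] u, double[] l)
{
    for (int i = 0; i < q.Length; i++)
    {
        double hi = double.MinValue, lo = double.MaxValue;
        for (int j = Math.Max(0, i - r); j <= Math.Min(q.Length - 1, i + r); j++)
        {
            hi = Math.Max(hi, q[j]);
            lo = Math.Min(lo, q[j]);
        }
        u[i] = hi;
        l[i] = lo;
    }
}

//LB_Keogh: squared distance from candidate c to the envelope; a lower bound on DTW(q, c)
static double LbKeogh(double[] c, double[] u, double[] l, double bestSoFar)
{
    double lb = 0;
    for (int i = 0; i < c.Length && lb < bestSoFar; i++) //early abandoning
    {
        if (c[i] > u[i]) { double d = c[i] - u[i]; lb += d * d; }
        else if (c[i] < l[i]) { double d = l[i] - c[i]; lb += d * d; }
    }
    return lb;
}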

I wanted to post this on my blog a long time ago; I just didn't have the time to put everything together.
The UCR Suite is written in C++, and since it's much easier to move pointers around there than in C# safe code, I understand the choice of language.
However, with some thought the same code could be written in C# and be even faster and more efficient.

So, since I am primarily a .NET coder and wanted to use only managed code with no unsafe sections, I had to take the C++ code and port it to C#.

When built in the "Release" configuration, it is quite a bit faster than the C++ suite.

Prof. Eamonn Keogh and company did a great job optimizing DTW and the applications are endless.

I've placed the C# code as a Visual Studio 2010 solution on GitHub.
Get the code here!

This code is written without unsafe code.
I might publish another, unsafe version for even better performance.

Friday, June 29, 2012

All partitions of two sets C#


I've answered an interesting question on StackOverflow.com.
The question was: provided two unique sets, how do you produce all possible combinations of subsets? For example, provided these sets:
List<string> employees = new List<string>() { "Adam", "Bob"};
List<string> jobs      = new List<string>() { "1", "2", "3"};

We can split the two sets into a maximum of two subsets each, and here are all the combinations:
Adam       : 1
Bob        : 2 , 3
Adam       : 1 , 2
Bob        : 3
Adam       : 1 , 3
Bob        : 2
Adam       : 2
Bob        : 1 , 3
Adam       : 2 , 3
Bob        : 1
Adam       : 3
Bob        : 1 , 2
Adam, Bob  : 1, 2, 3

First of all, before you write your code down, let's understand the underlying combinatorics of this question.
Basically, for any partition of set A, you require the same number of parts in set B.
E.g. if you split set A into 3 groups, you require that set B be split into 3 groups as well; if you don't, at least one group won't have a corresponding group in the other set.
It's easiest to picture with set A split into 1 group: we must have one group made from set B, as in the example above (Adam, Bob : 1, 2, 3).
So now, we know that set A has n elements and that set B has k elements.
So naturally we cannot request that any set be split into more than min(n, k) groups.

Let's say we split both sets into 2 groups (partitions) each; now there are 1*2 = 2! ways to pair the groups of one set with the groups of the other.
Another example: 3 groups each would give us 1*2*3 = 3! unique pairings.

However, we're still not finished: each set can be split into its m groups in many different ways.
So for m partitions of a set, we need to find the number of ways of placing n elements into m partitions.
This can be found using the Stirling number of the second kind:

(Eq 1)   S(n, k) = \frac{1}{k!} \sum_{j=0}^{k} (-1)^j \binom{k}{j} (k - j)^n

This formula gives the number of ways of partitioning a set of n elements into k nonempty sets.

So the total number of combinations of pairs over all partition sizes from 1 to min(n, k) is:

(Eq 2)   \sum_{m=1}^{\min(n, k)} S(n, m) \, S(k, m) \, m!

In short, it's the sum, over each partition size, of the partition counts of both sets times all pairings between them.
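
As a sanity check on Eq. 2, here's a small helper using the standard recurrence S(n, k) = k*S(n-1, k) + S(n-1, k-1) (my addition, not part of the original answer):

static long Stirling2(int n, int k)
{
    if (k == 0) return n == 0 ? 1 : 0; //S(0,0) = 1, S(n,0) = 0 for n > 0
    if (k > n) return 0;
    return k * Stirling2(n - 1, k) + Stirling2(n - 1, k - 1);
}

static long Factorial(int m)
{
    return m <= 1 ? 1 : m * Factorial(m - 1);
}

//Eq. 2 for n = 2, k = 3: S(2,1)*S(3,1)*1! + S(2,2)*S(3,2)*2! = 1 + 6 = 7
static long TotalCombinations(int n, int k)
{
    long total = 0;
    for (int m = 1; m <= Math.Min(n, k); m++)
        total += Stirling2(n, m) * Stirling2(k, m) * Factorial(m);
    return total;
}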
So now that we understand how to partition and pair our data, we can write the code down.
Code:
Basically, if we look at our final equation (2), we see that we need four parts of code for our solution:
1. A summation (or loop)
2. A way to get our Stirling sets, or partitions, from both sets
3. A way to get the Cartesian product of both Stirling sets
4. A way to permute the items of a set (n!)
On StackOverflow you can find many ways to permute items and to find Cartesian products; here is an example (as an extension method):
 public static IEnumerable<IEnumerable<T>> Permute<T>(this IEnumerable<T> list)
    {
        //Base case: a single element has exactly one permutation
        if (list.Count() == 1)
            return new List<IEnumerable<T>> { list };

        //Pick each element in turn, permute the remainder, and prepend the pick
        return list.Select((a, i1) => Permute(list.Where((b, i2) => i2 != i1)).Select(b => (new List<T> { a }).Union(b)))
                   .SelectMany(c => c);
    }
This was the easy part of the code.
The more difficult part (imho) is finding all possible n-partitions of a set.
To solve this, I first solved the greater problem of how to find all possible partitions of a set (not just of size n).
I came up with this recursive function:
public static List<List<List<T>>> AllPartitions<T>(this IEnumerable<T> set)
    {
        var retList = new List<List<List<T>>>();

        //An empty set has exactly one partition: the empty one
        if (set == null || !set.Any())
        {
            retList.Add(new List<List<T>>());
            return retList;
        }
        else
        {
            //Enumerate binary splits; stopping at 2^n / 2 keeps the first element
            //in parts[0], so each split is produced only once
            for (int i = 0; i < Math.Pow(2, set.Count()) / 2; i++)
            {
                var j = i;

                var parts = new [] { new List<T>(), new List<T>() };

                //The bits of i decide which side each element falls on
                foreach (var item in set)
                {
                    parts[j & 1].Add(item);
                    j >>= 1;
                }

                //Keep parts[0] whole and recursively partition the remainder
                foreach (var b in AllPartitions(parts[1]))
                {
                    var x = new List<List<T>>();

                    x.Add(parts[0]);

                    if (b.Any())
                        x.AddRange(b);

                    retList.Add(x);
                }
            }
        }
        return retList;
    }
The return type List<List<List<T>>> just means a list of all possible partitions, where a partition is a list of sets and a set is a list of elements.
We do not have to use the type List here, but it simplifies indexing.
So now let's put everything together:
Main Code
//Initialize our sets
        var employees = new [] { "Adam", "Bob" };
        var jobs = new[] {  "1", "2", "3" };

        //iterate to max number of partitions (Sum)
        for (int i = 1; i <= Math.Min(employees.Length, jobs.Length); i++)
        {
            Debug.WriteLine("Partition to " + i + " parts:");

            //Get all possible partitions of set "employees" (Stirling Set)
            var aparts = employees.AllPartitions().Where(y => y.Count == i);
            //Get all possible partitions of set "jobs" (Stirling Set)
            var bparts = jobs.AllPartitions().Where(y => y.Count == i);

            //Get cartesian product of all partitions 
            var partsProduct = from employeesPartition in aparts
                      from jobsPartition in bparts
                               select new {  employeesPartition,  jobsPartition };

            var idx = 0;
            //for every product of partitions
            foreach (var productItem in partsProduct)
            {
                //loop through the permutations of jobPartition (N!)
                foreach (var permutationOfJobs in productItem.jobsPartition.Permute())
                {
                    Debug.WriteLine("Combination: "+ ++idx);

                    for (int j = 0; j < i; j++)
                    {
                        Debug.WriteLine(productItem.employeesPartition[j].ToArrayString() + " : " + permutationOfJobs.ToArray()[j].ToArrayString());
                    }
                }
            }
        }
Output:
Partition to 1 parts:
Combination: 1
{ Adam , Bob } : { 1 , 2 , 3 }
Partition to 2 parts:
Combination: 1
{ Bob } : { 2 , 3 }
{ Adam } : { 1 }
Combination: 2
{ Bob } : { 1 }
{ Adam } : { 2 , 3 }
Combination: 3
{ Bob } : { 1 , 3 }
{ Adam } : { 2 }
Combination: 4
{ Bob } : { 2 }
{ Adam } : { 1 , 3 }
Combination: 5
{ Bob } : { 3 }
{ Adam } : { 1 , 2 }
Combination: 6
{ Bob } : { 1 , 2 }
{ Adam } : { 3 }
We can easily check our output just by counting the results.
In this example we have a set of 2 elements and a set of 3 elements; Equation 2 states that we need S(2,1)*S(3,1)*1! + S(2,2)*S(3,2)*2! = 1 + 6 = 7,
which is exactly the number of combinations we got.
For reference here are examples of Stirling number of the second kind:
S(1,1) = 1
S(2,1) = 1
S(2,2) = 1
S(3,1) = 1
S(3,2) = 3
S(3,3) = 1
S(4,1) = 1
S(4,2) = 7
S(4,3) = 6
S(4,4) = 1
Here is an example of a ToString I use for display purposes:
public static String ToArrayString<T>(this IEnumerable<T> arr)
    {
        //Formats a sequence as "{ a , b , c }", matching the output above
        return "{ " + string.Join(" , ", arr) + " }";
    }
The main cost of this algorithm, besides finding the Stirling sets, is the inefficient Permute method I used above. Here is a faster one, based on the QuickPerm algorithm, if you want faster results:
public static IEnumerable<IEnumerable<T>> QuickPerm<T>(this IEnumerable<T> set)
    {
        int N = set.Count();
        int[] a = new int[N];
        int[] p = new int[N];

        var list = set.ToList();

        int i, j, tmp; // Upper Index i; Lower Index j
        T tmp2;

        for (i = 0; i < N; i++)
        {
            // initialize arrays; a[N] can be any type
            a[i] = i + 1; // a[i] value is not revealed and can be arbitrary
            p[i] = 0; // p[i] == i controls iteration and index boundaries for i
        }

        // yield a copy so callers don't observe later in-place swaps
        yield return list.ToList();

        i = 1; // setup first swap points to be 1 and 0 respectively (i & j)
        while (i < N)
        {
            if (p[i] < i)
            {
                j = i % 2 * p[i]; // IF i is odd then j = p[i] otherwise j = 0

                tmp2 = list[a[j] - 1]; // swap the list elements
                list[a[j] - 1] = list[a[i] - 1];
                list[a[i] - 1] = tmp2;

                tmp = a[j]; // swap(a[j], a[i])
                a[j] = a[i];
                a[i] = tmp;

                yield return list.ToList();

                p[i]++; // increase index "weight" for i by one
                i = 1; // reset index i to 1 (assumed)
            }
            else
            {
                // otherwise p[i] == i
                p[i] = 0; // reset p[i] to zero
                i++; // set new index value for i (increase by one)
            } // if (p[i] < i)
        } // while(i < N)
    }
This cuts the time by 80-90%.
However, most of the CPU cycles go to string building, which is specifically needed for this example.
This will make it a bit faster:
             results.Add(productItem.employeesPartition[j].Aggregate((a, b) => a + "," + b) + " : " + permutationOfJobs.ToArray()[j].Aggregate((a, b) => a + "," + b));
Compiling for x64 is also better, because these strings take a lot of memory.
I liked the question; it wasn't trivial. I want to make the AllPartitions method more optimized in the future.