6 posts

## Want to give me a function for X-MAS? :-)

Back to Forum: Tech Off
• Hi, and Merry Christmas to everyone!

Well, what's more fun to do on xmas than some coding in VS2005

Now, I was wondering what would be the most efficient way to do the following:

I have a couple of arrays (let's say, 2 to 5 arrays typical), each containing a (potentially large) set of numbers.

Now I need a resulting array (or list, or whatever) with only the numbers that are present in each of those original arrays.

Really simple indeed, but can anybody advice what would be the most performant way to do this in .NET 2.0?
It needs to be very fast, because the function needs to work real-time and the arrays can contain thousands of numbers.

Thanks!!!

• Arrange the lists in order, shortest to longest.
Compare the items in the shortest list to those in the next longest.
If a number is found in the list, look for it in the next longest list.
Use "AndAlso" to ensure shortcutting of boolean logic.

SimonJ

• Have you considered using a Dictionary instead? I believe they are hash-based in implementation, so look-ups should be very efficient; the basis would be to keep the shortest array, but put all the bigger arrays into a seperate Dictionary per array.

Then iterate through the smallest array, query each Dictionary to see if the current element is present.

• A lot depends on the kind of data you are going to manipulate... replicates and sorting comes to mind.
As far as sorting is concerned, I would say that if the data are not already sorted it would be a safe bet to sort them anyway. A linear scan will be O(N^2), while the sorted + binary search version will be O(N*log(N))... well, ok, the Quicksort has a wort case O(N^2), so in that case you would lose something, but that rather rare.

For the algorithm, here is a crude attempt that seems to be reasonably efficient, at least with random values.

public static int [] Match (int [] [] arrays) {
for (int i = 0; i < arrays.Length; i++) {
Array.Sort (arrays [i]);
}
ArrayList res = new ArrayList ();
int current = 0;
int start = 0;
int index = 0;
while (index < arrays [current].Length) {
int currentValue = arrays [current] [index];
for (;;) {
current = (current + 1) % arrays.Length;
if (current == start) {
index++;
break;

} else {
int foundIndex = Array.BinarySearch (arrays [current], currentValue);
if (foundIndex < 0) {
index = ~foundIndex;
start = current;

break;
}
}
}
}

return (int []) res.ToArray (typeof (int));
}
Happy new year (too late for Christmas).
--m

• Assuming you need to get the arrays sorted and merged you can best use a heapsort/merge combo. I dont have a sample ready here, but I could look it up for you on my code CD's once my office at home is ready (I'm getting new paint and a new floor )

If I remember it correctly heapsort is O(N log N) and merging should be O(N). I'm not sure on the merging, but the heapsort is really fast and cheap in memory usage if you implement it right . And it doesnt have a worstcase scenario of O(N^2)

• If I were playing about, I'd be trying to do this with bit ops where the bit set at position X means that X is present in that array.  Then you could test with & - unfortunately C# doesn't support Integers with that many bits

It would of course mean that you'd have to have as many bits as the largest number in the array, and it would be saving space rather than increasing performance.. but that's what playing with code is for I guess.