IT Anawer: What's the most efficient way of filtering an array based on the contents of another array?

Say I have two arrays, items and removeItems and I wanted any values found in removeItems to be removed from items.

The brute force mechanism would probably be:

var animals = ["cow","dog","frog","cat","whale","salmon","zebra","tuna"];
var nonMammals = ["salmon","frog","tuna","spider"];
var mammals = [];
var isMammal;

for(var i=0;i<animals.length;i++){
   isMammal = true;
   for(var j=0;j<nonMammals;j++){
     if(nonMammals[j] === animals[i]){
       isMammal = false;
       break;
     }
   }
   if(isMammal){
     mammals.push(animals[i]);
   }
}

This is what? O(N^2)? Is there a more efficient way?

From stackoverflow

That's actually O(M * N).

Probably you could do better sorting the animals array first, then doing a binary search. You'll be able to reduce to O(N * log N) - well, that's if log N < M anyway.

Anyway, if you're working with JS and that runs client-side, then just try to keep amount of data at minimum, or their browsers will yell at you with every request.

Stuart Branham : I was just writing something about sorting the lists first as well. You end up with n*lg(n) + m*lg(n) to sort the main list and then search main list m times, which is effectively m*lg(n) - much better than n^2
With jQuery it's pretty easy:
```
function is_mammal(animal)
{
    return $.inArray(animal, nonMammals) == -1;
}

mammals = $.grep(animals, is_mammal);
```
See docs for $.grep and $.inArray.

Seb : That's easy, but doesn't make it any better than O(N^2). Just because you "hide" the loops doesn't mean $.inArray() and $.grep() don't have them.
Basically what you want to do is efficiently compute the set difference S \ T. The (asymptotically) fastest way I know is to put T into a hashmap (that makes |T| steps) and go over each s in S (that makes |S| steps) checking wether s in T (which is O(1)). So you get to O(|T| + |S|) steps.

Seb : While this _is_ faster than my answer, it requires more memory for the hash table. There're pros and cons everywhere, right? :) Anyway, I believe this is the best solution, so there goes my +1.

bayer : Yes, that's why I wrote the asymptotically there. ;) Personally I'd guess that for small inputs even the squared solution suffices.

IT Anawer

Tuesday, May 3, 2011

What's the most efficient way of filtering an array based on the contents of another array?

0 comments:

Post a Comment

Blog Archive