Tech Off Thread

8 posts

Forum Read Only

This forum has been made read only by the site admins. No new threads or comments can be added.

(STL) move_if algorithm, what's your take on it ?

Back to Forum: Tech Off
  • User profile image
    Jonas_No

    There's no move_if algorithm so i wrote one.

    What's your opinion of it (improvements, bugs, etc...) ?

    template <typename FwdIt, typename Container, typename Predicate>
    inline FwdIt move_if(FwdIt first, FwdIt last, Container &cont, Predicate pred)
    {
        if (first == last)
            return last; // Empty so nothing to move
        const size_t size = count_if(first, last, pred);
        if (size == 0)
            return last; // Nothing to move
        cont.resize(size);
        FwdIt new_end = first;
        auto c = cont.begin();
        for (auto i = first; i != last; ++i)
        {
            if (pred(*i)) // Should it move it ?
                *c++ = move(*i);
            else
                *new_end++ = move(*i);
        }
        return new_end;
    }

    How you use it:

    int main()
    {
        vector<int> v(10);
        generate(v.begin(), v.end(), [] () -> int {
            static size_t i = 1;
            return i++ * 11;
        });
    
        vector<int> n;
        auto new_end = move_if(v.begin(), v.end(), n, [](int i){return (i % 2)==0;});
        v.erase(new_end, v.end());
    
        auto print_vec_int = [](vector<int> v) {
            cout << "[" << v.size() << "]";
            for_each(v.cbegin(), v.cend(), [](const int &i) {
                cout << " " << i;
            });
            cout << endl;
    
        };
    
        print_vec_int(v);
        print_vec_int(n);
    }
    /* Output:
    [5] 11 33 55 77 99
    [5] 22 44 66 88 110
    */
     

  • User profile image
    STL

    You should use make_move_iterator() with copy_if() instead:

    C:\Temp>type meow.cpp
    #include <algorithm>
    #include <iostream>
    #include <iterator>
    #include <ostream>
    #include <string>
    #include <vector>
    using namespace std;
    
    int main() {
        vector<string> dst;
    
        {
            vector<string> src;
    
            src.push_back("The Motion Picture");
            src.push_back("The Wrath Of Khan");
            src.push_back("The Search For Spock");
            src.push_back("The Voyage Home");
            src.push_back("The Final Frontier");
            src.push_back("The Undiscovered Country");
            src.push_back("Generations");
            src.push_back("First Contact");
            src.push_back("Insurrection");
            src.push_back("Nemesis");
            src.push_back("Star Trek");
    
            copy_if(
                make_move_iterator(src.begin()),
                make_move_iterator(src.end()),
                back_inserter(dst),
                [](const string& s) {
                    return s != "The Final Frontier";
                }
            );
        }
    
        for (auto i = dst.begin(); i != dst.end(); ++i) {
            cout << *i << endl;
        }
    }
    
    C:\Temp>cl /EHsc /nologo /W4 /MTd meow.cpp
    meow.cpp
    
    C:\Temp>meow
    The Motion Picture
    The Wrath Of Khan
    The Search For Spock
    The Voyage Home
    The Undiscovered Country
    Generations
    First Contact
    Insurrection
    Nemesis
    Star Trek

  • User profile image
    STL

    I should explain what's wrong with your move_if():

    1. It requires forward iterators. copy_if() accepts input iterators.

    2. It writes into a container. copy_if() writes into an iterator, which is more general.

    3. It performs two passes, applying the predicate 2N times. copy_if() performs a single pass.

    4. It's not actually imitating copy_if(). It's imitating partition_copy(), which splits a source range into two output ranges (but not exactly, as partition_copy() prohibits input overlapping output).

  • User profile image
    Jonas_No

    Good points.

    1. I'm not sure what the difference is between forward iterators and input iterators.
    Isn't forward iterator the most generic iterator ? I'll have to refresh my c++ knowledge.

    2. Yes, it did feel wrong using a container. Using a container is less general too. Now when i think about it, i should have gone for the more general way.

    2 and 3: At the time i was thinking that the programmer would usually alloc up a new container with the needed size anyway so why not do it in the function. But yes it does limit its usability.

    4. What do you mean ? 
    move_if reuses the input iterator which results in more efficiency then partition_copy.
    I based move_if on how remove_if works.


    Here's version 2 - how move_if should have been in the first place

    #include <algorithm>
    #include <vector>
    #include <iostream>
    #include <iterator>
    
    using namespace std;
    
    template <typename InputIt, typename OutIt, typename Predicate>
    inline InputIt move_if(InputIt first, InputIt last, OutIt out, Predicate pred)
    {
        if (first == last)
            return last; // Empty so nothing to move
        InputIt new_end = first;
        for (auto i = first; i != last; ++i)
        {
            if (pred(*i)) // Should it move it ?
                *out++ = move(*i);
            else
                *new_end++ = move(*i);
        }
        return new_end;
    }
    
    int main()
    {
        vector<int> v(10);
        generate(v.begin(), v.end(), [] () -> int {
            static size_t i = 1;
            return i++ * 11;
        });
    
        auto is_prime = [](int i){return (i % 2)==0;};
        const size_t size = count_if(v.cbegin(), v.cend(), is_prime);
        vector<int> n(size);
        auto new_end = move_if(v.begin(), v.end(), n.begin(), is_prime);
        v.erase(new_end, v.end());
    
        auto print_vec_int = [](vector<int> v) {
            cout << "[" << v.size() << "]";
            for_each(v.cbegin(), v.cend(), [](const int &i) {
                cout << " " << i;
            });
            cout << endl;
    
        };
    
        print_vec_int(v);
        print_vec_int(n);
    }
    
    /* Output:
    [5] 11 33 55 77 99
    [5] 22 44 66 88 110
    */
    

     

    Your suggestion of using make_move_iterator() with copy_if() while the moving works it doesn't fix up the input container.

    #include <algorithm>
    #include <vector>
    #include <iostream>
    #include <iterator>
    #include <string>
    
    using namespace std;
    
    int main()
    {
        vector<int> v(10);
        generate(v.begin(), v.end(), [] () -> int {
            static size_t i = 1;
            return i++ * 11;
        });
    
        auto is_prime = [](int i){return (i % 2)==0;};
        const size_t size = count_if(v.cbegin(), v.cend(), is_prime);
        vector<int> n(size);
        copy_if(make_move_iterator(v.begin()), make_move_iterator(v.end()), n.begin(), is_prime);
    
        auto print_vec_int = [](vector<int> v) {
            cout << "[" << v.size() << "]";
            for_each(v.cbegin(), v.cend(), [](const int &i) {
                cout << " " << i;
            });
            cout << endl;
    
        };
    
        print_vec_int(v);
        print_vec_int(n);
    }
    
    /* Output:
    [10] 11 22 33 44 55 66 77 88 99 110
    [5] 22 44 66 88 110
    */
    
     

    Using your example code (slightly modified):

    #include <algorithm>
    #include <vector>
    #include <iostream>
    #include <iterator>
    #include <string>
    
    using namespace std;
    
    int main()
    {
        vector<string> dst;
    
        {
            vector<string> src;
    
            src.push_back("The Motion Picture");
            src.push_back("The Wrath Of Khan");
            src.push_back("The Search For Spock");
            src.push_back("The Voyage Home");
            src.push_back("The Final Frontier");
            src.push_back("The Undiscovered Country");
            src.push_back("Generations");
            src.push_back("First Contact");
            src.push_back("Insurrection");
            src.push_back("Nemesis");
            src.push_back("Star Trek");
    
            copy_if(
                make_move_iterator(src.begin()),
                make_move_iterator(src.end()),
                back_inserter(dst),
                [](const string& s) {
                    return s != "The Final Frontier";
            });
    
            for (auto i = src.begin(); i != src.end(); ++i) {
                cout << "\"" << *i << "\""<< endl;
            }
        }
    
        cout << endl;
    
        for (auto i = dst.begin(); i != dst.end(); ++i) {
            cout << "\"" << *i << "\""<< endl;
        }
    }
    
    /* Output:
    ""
    ""
    ""
    ""
    "The Final Frontier"
    ""
    ""
    ""
    ""
    ""
    ""
    
    "The Motion Picture"
    "The Wrath Of Khan"
    "The Search For Spock"
    "The Voyage Home"
    "The Undiscovered Country"
    "Generations"
    "First Contact"
    "Insurrection"
    "Nemesis"
    "Star Trek"
    
    */
    
     

  • User profile image
    STL

    > 1. I'm not sure what the difference is between forward iterators and input iterators.
    > Isn't forward iterator the most generic iterator ? I'll have to refresh my c++ knowledge.

    The read-only iterator hierarchy goes: input, forward, bidi, random. Input iterators are single-pass iterators, which means that this does NOT work for them:

    InIt i = ...; // get i from somewhere
    Val v1 = *i; // get a value from i
    InIt old = i; // copy i, this is OK so far but bogus code is imminent
    ++i; // increment i, still OK so far, but old is now UNUSABLE
    Val v2 = *old; // FORBIDDEN!!!

    The best example of an input iterator is istream_iterator. As you read values, they're consumed from the stream, so you can't go back and replay them.

    Forward iterators are multi-pass iterators, merely constrained to go forwards and never backwards. The best example of a forward iterator is forward_list::iterator (i.e. a singly-linked list).

    > I based move_if on how remove_if works.

    Ah, that wasn't clear to me. Your new implementation is much better; here are some notes:

    5. Your (first == last) check is unnecessary. When an algorithm is inherently correct for empty ranges, you should let it automatically work.

    6. You don't actually need new_end - you can use first instead.

    7. is_prime is really is_even!

    8. count_if() followed by move_if() performs 2N tests. Instead, move_if() to back_inserter() would perform N tests. (In that case, the destination n should start out empty.) If you're concerned about reallocation overhead (which usually isn't a big deal), you can say n.reserve(v.size());.

    9. print_vec_int should take const vector<int>&.

    10. But at the same time, there's no reason to take const int&.

    > Your suggestion of using make_move_iterator() with copy_if() while the moving works it doesn't fix up the input container.

    Yes, that was intentional. If I wasn't too concerned about consuming a little extra space, I would use partition_copy() with make_move_iterator() and back_inserter() to 2 output vectors, then swap one of the outputs with the input.

  • User profile image
    Jonas_No

    Sometimes i do trust the compiler a bit too much thinking its smart enough to
    understand what i'm doing when it can be really stupid and stubborn sometimes.

    Here's version 3:

    #include <algorithm>
    #include <vector>
    #include <iostream>
    #include <iterator>
    
    using namespace std;
    
    template <typename FwdIt, typename OutIt, typename Predicate>
    inline FwdIt move_if(FwdIt first, FwdIt last, OutIt out, Predicate pred)
    {
        first = find_if(first, last, pred);
        if (first == last)
            return first; // Nothing to move.
    
        for (auto i = first; i != last; ++i)
        {
            if (pred(*i)) // Should it move it ?
                *out++ = move(*i); // Move it.
            else
                *first++ = move(*i);
        }
        return first;
    }
    
    int main()
    {
        vector<int> v(10);
        generate(v.begin(), v.end(), [] () -> int {
            static size_t i = 1;
            return i++ * 11;
        });
    
        auto is_even = [](const int i){return (i % 2)==0;};
        
        //const size_t size = count_if(v.cbegin(), v.cend(), is_even);
        //vector<int> n(size);
        //auto new_end = move_if(v.begin(), v.end(), n.begin(), is_even);
    
        // This is slower then the code above when 'vector<int> v(10000);'
        // Naively timed test only using QueryPerformanceCounter
        // Is this what you meant ?
        vector<int> n;
        n.reserve(v.size());
        auto new_end = move_if(v.begin(), v.end(), back_inserter(n), is_even);
    
        v.erase(new_end, v.end());
    
        auto print_vec_int = [](const vector<int> &v) {
            cout << "[" << v.size() << "]";
            for_each(v.cbegin(), v.cend(), [](const int i) {
                cout << " " << i;
            });
            cout << endl;
        };
    
        print_vec_int(v);
        print_vec_int(n);
    }
    
    /* Output:
    [5] 11 33 55 77 99
    [5] 22 44 66 88 110
    */



    When ever i see 'back_inserter' i think of performance loss. Is that incorrect ?

    What do you think about that 'find_if' "optimization" ?

    5. I know its bad to write code when tired but coding just flows when tired. Oh the dilemma Perplexed

    6. Nice catch. Btw how big of an issue is that ? Cant the compiler optimize that away ?

    7. I'm not used to doing that kind of math but i understand what you mean.
    Naively i just run a few tests and see which runs the fastest.
    The only goal i have is to make it a performant and correct as possible.
    Learning something new in the process is always fun too.
    "reallocation overhead (which usually isn't a big deal)"
    That surprised me a bit. Isn't (&&) and std::move's main reason for existing just
    because of reducing reallocation overhead ?
    Now i'm a bit confused. When writing algorithms shouldn't you have reallocation
    overhead in mind or are there some simple guidelines you can follow ?

    8. See code, please.

    9. I wasn't really concentrating on optimizing the debug print code but i would have
    changed it eventually anyway.
    Performance issues bug me until i fix them.

    10. But it will not hurt performance right or is it bad in some way ?

    "Yes, that was intentional."
    Explain more, please. Explain your thinking.
    If the algorithm didnt clean up after itself then the programer would need to do extra work
    Good example here would be 'remove_if' algorithm
    Also it would leave the input container in a less performant state.
    That is using it as is would eat extra cpu cycles.
     
    How big of an issue is it really that move_if use forward iterators ?
     
    I'm beginning to understand the many decisions you must be making working on the STL
    for every single function. Your C++ kung fu must indeed be strong ;)
    How do you know when to stop: optimizing / correction cleaning and move on to the next thing ?

  • User profile image
    STL

    > Sometimes i do trust the compiler a bit too much thinking its smart enough to understand what i'm doing when it can be really stupid and stubborn sometimes.

    Compilers are good at micro-optimization, bad at macro-optimization. Humans are the reverse.

    > When ever i see 'back_inserter' i think of performance loss. Is that incorrect ?

    It's very convenient, and avoids length-related errors. However, for performance-critical code, capacity testing on every push_back(), and occasional reallocation, are significant.

    > What do you think about that 'find_if' "optimization" ?

    It's reasonable, but the test afterwards is still unnecessary.

    > 6. Nice catch. Btw how big of an issue is that ? Cant the compiler optimize that away ?

    Maybe, maybe not.

    > That surprised me a bit. Isn't (&&) and std::move's main reason for existing just because of reducing reallocation overhead ?

    They exist to eliminate unnecessary dynamic memory allocations and deallocations, which can be triggered by temporaries or vector reallocation, etc.

    > Now i'm a bit confused. When writing algorithms shouldn't you have reallocation overhead in mind or are there some simple guidelines you can follow  ?

    Yes, but you should also count your operations. 2N is a lot bigger than N.

    > 10. But it will not hurt performance right or is it bad in some way ?

    Shouldn't affect perf, it's just unnecessary.

    > Explain more, please.

    I was assuming that the whole source range would be discarded.

    > How big of an issue is it really that move_if use forward iterators ?

    Someone with input iterators who wants to use your algorithm will be sad. That someone may be you.

    > I'm beginning to understand the many decisions you must be making working on the STL for every single function.

    :->

    > How do you know when to stop: optimizing / correction cleaning and move on to the next thing ?

    I just try to fix bugs until my bug count reaches zero. I also fix todos on my todo list when I'm in the neighborhood.

  • User profile image
    Jonas_No

    Making move_if support input iterators will make it become "partition_move" or did you have something else in mind ?

    With find_if  in move_if it'll be N+1, is that correct ?
    Since one item is tested twice. 

    #include <algorithm>
    #include <vector>
    #include <iostream>
    #include <iterator>
    #include <sstream>
    
    using namespace std;
    
    template <typename FwdIt, typename OutIt, typename Predicate>
    inline FwdIt move_if(FwdIt first, FwdIt last, OutIt out, Predicate pred)
    {
        first = find_if(first, last, pred);
    
        for (auto i = first; i != last; ++i)
        {
            if (pred(*i)) // Should it move it ?
                *out++ = move(*i); // Move it.
            else
                *first++ = move(*i);
        }
        return first;
    }
    
    // move_if have become partition_move
    template <typename InIt, typename OutIt1, typename OutIt2, typename Predicate>
    inline pair<OutIt1, OutIt2> move_if(InIt first, InIt last, OutIt1 out_false, OutIt2 out_true, Predicate pred)
    {
        for (auto i = first; i != last; ++i)
        {
            if (pred(*i)) // Should it move it ?
                *out_true++ = move(*i); // Move it.
            else
                *out_false++ = move(*i);
        }
        return make_pair<OutIt1, OutIt2>(out_false, out_true);
    }
    
    int main()
    {
        auto is_even = [](const char c){return (c % 2) == 0;};
    
        std::istringstream iss("The quick brown fox jumps over the lazy dog");
        std::istream_iterator<char> begi(iss), endi;
    
        vector<char> n_f, n_t;
        move_if(begi, endi, back_inserter(n_f), back_inserter(n_t), is_even);
    
        auto print_vec_char = [](const vector<char> &v) {
            cout << "[" << v.size() << "]";
            for_each(v.cbegin(), v.cend(), [](const char c) {
                cout << " " << c;
            });
            cout << endl;
        };
        
        print_vec_char(n_f);
        print_vec_char(n_t);
    }
    
    /* Output:
    [19] e q u i c k o w o u m s o e e a y o g
    [16] T h b r n f x j p v r t h l z d
    */
    

     

    > I was assuming that the whole source range would be discarded.

     How would you do it if the source range would be used ?

     

    I a pedant so making my code as efficient as possible is a goal.

    Using move_if is more efficient because you need one less container and if you really needed to preserve the input container then partition_copy would do the trick.

Conversation locked

This conversation has been locked by the site admins. No new comments can be made.