Entries:
Comments:
Posts:

Loading User Information from Channel 9

Something went wrong getting user information from Channel 9

Latest Achievement:

Loading User Information from MSDN

Something went wrong getting user information from MSDN

Visual Studio Achievements

Latest Achievement:

Loading Visual Studio Achievements

Something went wrong getting the Visual Studio Achievements

C9 Lectures: Stephan T. Lavavej - Standard Template Library (STL), 8 of n

Download

Right click “Save as…”

Welcome to another installment of C9 Lectures covering the powerful general C++ library, STL. Joining us once again is the great Stephan T. Lavavej, Microsoft's keeper of the STL cloth (this means he manages the partnership between the owners of STL (dinkumware) and Microsoft, including, of course, bug fixes and enhancements to the STL that ships as part of Visual C++). Simply, Stephan is a C++ library developer.

As is Stephan's nature, he elaborates on technical details in very substantive way. The Standard Template Library, or STL, is a C++ library of container classes, algorithms, and iterators. STL provides many fundamental algorithms and data structures. Furthermore, the STL is a general-purpose library: its components are heavily parameterized, such that almost every component in the STL is a template.

In part 8, Stephan guides us on a journey into the dense forest of Regular Expressions.

Enjoy! Learn!

Books mentioned by Stephen:

The C++ Standard Library: A Tutorial And Reference by Nicolai M. Josuttis
Effective STL by Scott Meyers

[STL Introduction lecture links]

Part 1 (sequence containers)

Part 2 (associative containers)

Part 3 (smart pointers)

Part 4 (Nurikabe solver) - see Wikipedia's article and Stephan's updated source code

Part 5 (Nurikabe solver, continued)

Part 6 (algorithms and functors)

Part 7 (algorithms and functors, continued)

Part 8 (regular expressions)

Part 9 (rvalue references)

Part 10 (type traits)

Tags:

Follow the Discussion

  • Another great video from STL Big Smile!

    While I wait for part 9, I'll dig a couple of places in my existing code base for regex replacements Wink (but, please, don't let me waiting for too long!)

    Thank you so much for the content!

  • CharlesCharles Welcome Change

    9 is coming next week.Stephan recorded 8 and 9 on the same day, so there is no excuse for 9 not posting next week. I'm to blame if it doesn't!
    C

  • Yeah!!

    GOOD!!

    Thank you, Charles.

     Great Lecture and video.

    This episode is interesting.

    Thanks alot, Stephan!! 

    thumb up !!

  • Mark R.Mark R.

    EXCELLENT! Love these. Please keep 'em coming.
    Thanks!!!

  • GregGreg

    I downloaded the high quality MP4, and the audio is a few seconds ahead of the video.  Anyone else seeing this?

  • If I match the regular expression "a+" against a sequence of "a" characters, how does the regex determine how many "a" characters to choose? Is there something like a "maximum munch rule" for regular expressions?

  • VaclavVaclav

    @NotFredSafe:If the regex is greedy (default), as many characters as possible are matched. It is explained in the PPT document on page 16.

  • ToddintrToddintr Toddintr

    Great lecture series, thank you.

    My question is really about lecture #1 -- I posted this question at the page for lecture # 1 originally: Why does the DOS screen pop up during your Visual Studio builds?  I use VS 2010 and my installation does not do this.  Thanks again -

    Todd

  • MarcMarc

    Awesome! Thank you for this video!

  • Michael HamiltonMichael Hamilton

    [quote]
    6 hours ago, Toddintr wrote

    Great lecture series, thank you.
    My question is really about lecture #1 -- I posted this question at the page for lecture # 1 originally: Why does the DOS screen pop up during your Visual Studio builds?  I use VS 2010 and my installation does not do this.  Thanks again -
    Todd

    [/quote]
    STL is putting a breakpoint on the last line and is running in debug mode in most videos.  This causes his programs to break before closing.

  • STLSTL

    [NotFredSafe]
    > If I match the regular expression "a+" against a sequence of "a" characters,
    > how does the regex determine how many "a" characters to choose? Is there
    > something like a "maximum munch rule" for regular expressions?

    [Vaclav]
    > @NotFredSafe:If the regex is greedy (default), as many characters as
    > possible are matched. It is explained in the PPT document on page 16.

    Slight correction: greediness is a property of individual quantifiers, not whole regexes. A regex can contain both greedy quantifiers and non-greedy quantifiers:

    C:\Temp>type meow.cpp
    #include <iostream>
    #include <ostream>
    #include <regex>
    #include <string>
    using namespace std;
    
    int main() {
        const string s("abc123456170117011701def654321172917291729ghi");
        smatch m;
        const regex r("abc([0-9]+)((?:1701)+)def([0-9]+?)((?:1729)+)ghi");
    
        if (regex_match(s, m, r)) {
            for (int i = 1; i <= 4; ++i) {
                cout << "m[" << i << "]: " << m[i] << endl;
            }
        } else {
            cout << "EPIC FAIL" << endl;
        }
    }
    
    C:\Temp>cl /EHsc /nologo /W4 meow.cpp
    meow.cpp
    
    C:\Temp>meow
    m[1]: 12345617011701
    m[2]: 1701
    m[3]: 654321
    m[4]: 172917291729

    Greedy quantifiers want to consume as many characters as possible, while still allowing the regex to match the string (regex_match() versus regex_search() determines whether the whole regex must match the whole string, or just a substring).

    Non-greedy quantifiers want to consume as few characters as possible, while still allowing the regex to match the string.

    In the example above, ([0-9]+) is greedy, so it'll consume 123456 and two of the 1701s. It can't eat the third, though, because ((?:1701)+) comes next and wants to eat at least one 1701. (This might be confusing because two greedy quantifiers, ([0-9]+) and ((?:1701)+), appear next to each other. You can think of the regex as being matched left to right, so ([0-9]+) comes first and gets to decide what it's going to do. It'll satisfy its own preferences, while being aware of whether the rest of the regex will match the rest of the string.)

    Because ([0-9]+?) is non-greedy, it wants to eat as few digits as possible. Ideally, it'd like to eat just the 6. However, that would leave 54321172917291729ghi for ((?:1729)+)ghi to match, which wouldn't work. Therefore, the ([0-9]+?) consumes 654321, and leaves all three 1729s for ((?:1729)+) to eat.

    [Toddintr]
    > Why does the DOS screen pop up during your Visual Studio builds?

    I'm building console programs, so when I compile-and-execute them, the Command Prompt appears. If I were building graphical programs, their GUIs would appear.

    I usually run the compiler from the Command Prompt, as in the example above (if you haven't seen this, Start > All Programs > Microsoft Visual Studio 2010 > Visual Studio Tools > Visual Studio Command Prompt (2010) will launch a Command Prompt where you can run cl.exe, the compiler). But I use the Visual Studio IDE in the videos because most people are familiar with that.

  • @Stephan: thanks for another brilliant lecture.

    @Charles: more than three weeks of post-production and no special effects? Not even a lens flare?

    Anyway, thank you guys. You've got to love Channel9.

  • It is explained in the PPT document on page 16.

    Wouldn't it make sense to link to the PPT above?

  • STLSTL

    I've uploaded my slides to my SkyDrive: http://cid-e66e02dc83efb165.office.live.com/view.aspx/docs/regex-1.0.pptx

  • So MeiSo Mei

    [quote]
    Nov 17, 2010 at 1:26 PM, Charles wrote
    9 is coming next week.Stephan recorded 8 and 9 on the same day, so there is no excuse for 9 not posting next week. I'm to blame if it doesn't![/quote]
    So it's friday, time is up and oh <sarcasm> surprise </sarcasm> Charles failedi saw that coming.Charles, predicable much ?yes Charles there is still time (time zones, sigh) but i couldnt resist being early with the blaming and hating <sarcasm> since i love microsoft employees so much  </sarcasm> 
    well we all have to have a hobby right ? Oh who am i kidding microsoft deserves it :3
     

  • So MeiSo Mei

    Stephan T. Lavavej is an exception of course to the above, love you!
     

  • CharlesCharles Welcome Change

    [quote] Nov 17, 2010 at 1:26 PM, Charles wrote 9 is coming next week.Stephan recorded 8 and 9 on the same day, so there is no excuse for 9 not posting next week. I'm to blame if it doesn't!

    So it's friday, time is up and oh <sarcasm> surprise </sarcasm> Charles failedi saw that coming.Charles, predicable much ?yes Charles there is still time (time zones, sigh) but i couldnt resist being early with the blaming and hating <sarcasm> since i love microsoft employees so much  </sarcasm>  well we all have to have a hobby right ? Oh who am i kidding microsoft deserves it :3  

    [/quote]

    Hoildays changed the plan. Next week! Smiley

    C

  • So MeiSo Mei

    @Charles:So you're not the sharpest one when it comes to planing then, shouldnt you be good at that in your job, anyway, oh well could happen to all of us, although shouldnt you have seen this holidays coming since it's every year at the same date ?

  • So Mei, would you like some cheese with that whine?

  • So MeiSo Mei

    @NotFredSafe: Sure, can i have some biscuits too ?     
    Note: I was just pointing out the obvious flaws in Charles excuse.What's next excuse ?"Sorry, santa claus came at Christmas so i couldn't ..."   "Sorry, the dog ate my to do list"  
    "Sorry, a butterfly died at the other side of the planet so i couldn't ..."Do i get more cheese now ? with biscuits ? and some tea too ? ooo yummy...

  • VikasVikas

    Hi Stephan,
    Another great lecture.  I was working with boost::regex and didn't know tr1 has it already. I found the interface similar to boost::regex, so i wanted to know if tr1 regex is adopted from boost and i should expect similar interface?
    Thanks,
    Vikas.

  • STLSTL

    Part 9: http://channel9.msdn.com/Shows/Going+Deep/C9-Lectures-Stephan-T-Lavavej-Standard-Template-Library-STL-9-of-n

    [Vikas]
    > i wanted to know if tr1 regex is adopted from boost

    Yep, TR1/C++0x <regex> was based on Boost.Regex.

    > and i should expect similar interface?

    Their interfaces are virtually identical. The biggest differences are in their regular expression grammars. std::regex uses ECMAScript 3's grammar, which was derived from Perl's. boost::regex supports those features, but has recently implemented even more advanced features from Perl (e.g. possessive quantifiers).

  • Jan WindischJan Windisch

    Hello,
    I am replacing the key_compare() with cmp  function in a set 


    std::set<
    long,mycmp<long> >s;
    mycmp<long >* p=&s.key_comp();
     It looks like this STL version creates a class copy, so the copied class members are hard to reach. p points to the original  class instance.
     I am not able to find info on how to:
    a/Prevent STL copiing the class, so it works with the original. 
    b/Locate description of some established method to comunicate with the copied class data members.
    Thanks Jan
     


  • STLSTL

    What exactly are you trying to do with the comparator?

  • Jan WindischJan Windisch

    @STL:Hello, thanks for quick response!
    and for excellent lessons on STL! Without the lessons, I'd awoid STL...
    To drill it down to essentials. I'm trying to do what I used to do with quicksort and its callback. I am aware this is -probably- not intended use of STL. I have a lot of strings I have to sort, or rather, I do not want to sort in a set. I'm putting the strings( counted in 1000+) in a vector, numberig them by arrival as en index in the vector, which I sort  in a set. Sacrificing some ram, things used to run much faster that way. In the set<long>'s key_compare, I'm comparing strings. I even use set.find to exclude doublets. That means I need 2 ways of comparing strings, one for set.find and another one for set.insert.
    In addition, I wish I could hide a lot of data members. Recently, I'm forced to use static/global data(Hm...) So forget about easy concurrency...
    Otherwise, the code works.
    // compile with: /EHsc#include <stdafx.h>#include <ppl.h>#include <iostream>#include<set>#include<vector>#include<string>#include<algorithm>#include<functional>using std::binary_function;
    // the class data members are temporarily referenced as globals because I experience key_comp() // class data member access problems.
    // there is one code, so DBG tracks key_comp(), 2 data members sets
    // in addition to constructor, operator () adresses the copy as well
     std::vector<std::string> v;
    // instead of sorting strings, put them in an unsorted  // vector and sort corresponding vector indexes in a set// Pros: On most architectures it will outperform string sort.// Cons: extra memory allocation of index values//template<class T> class cmp;// compiles OK without declarationstemplate<class T> class cmp{public:
    // forced to global residence#if 0  int st,mxi; std::string stl;#endif cmp(){}; //~cmp();// linker error bool operator () (const T& t1,const T& t2)const{   switch(st){ case 0:// sort vector arrival indexes in a set return v.at(t1)<v.at(t2); case 1:// find index of the string, mxi is max index value+1,not present in the set if (t1==mxi)return stl<v.at(t2);  return v.at(t1)<stl; default:  // the program is sick
      std::cout<<"program error: st,mxi,stl="<<st<<" "<<mxi<<" "<<stl<<std::endl;  terminate();  ;   }//sw  }//op};
    Another piece of code:
    const std::string str[10]={"1","9","8","6","5","1","9","3","2","1"};  int maxind=0;  std::cout<<std::endl<<"Input:"<<std::endl;   for(int i=0;i<10;i++){ std::cout<<str[i]<<"  ";  // find str st=1;mxi=maxind;stl=str[i];if(s.find(maxind)!=s.end())continue;// found v.push_back(str[i]);// not found st=0;  s.insert(maxind);maxind++;   }
       std::cout<<std::endl<<"Output:"<<std::endl;   for_each( s.begin(),s.end(),prn);   std::cout<<std::endl; //  How?? [](int n){std::cout<<std::endl<<v.at(n);}   s.clear();   v.clear();      //____________________   std::set<long,cmpo<long> >s1;   cmpo<long>* p1=&s1.key_comp();
    Thanks Jan

  • STLSTL

    Are you saying that your ultimate goal is to sort a sequence of strings? With VC10, just use sort() on vector<string>. Move semantics will automatically avoid unnecessary dynamic memory allocation/deallocation during the sort.

    Here's VC10 sorting 170,000 strings occupying 10 MB in 106 ms on my Q9450:

    C:\Temp>type meow.cpp
    // http://www.gutenberg.org/cache/epub/5050/pg5050.txt
    
    #include <stddef.h>
    #include <stdlib.h>
    #include <algorithm>
    #include <fstream>
    #include <iostream>
    #include <numeric>
    #include <ostream>
    #include <string>
    #include <vector>
    #include <windows.h>
    using namespace std;
    
    long long counter() {
        LARGE_INTEGER li;
        QueryPerformanceCounter(&li);
        return li.QuadPart;
    }
    
    long long frequency() {
        LARGE_INTEGER li;
        QueryPerformanceFrequency(&li);
        return li.QuadPart;
    }
    
    void print_time(const long long start, const long long finish, const char * const s) {
        cout << s << ": " << (finish - start) * 1000.0 / frequency() << " ms" << endl;
    }
    
    int main() {
        ifstream f("pg5050.txt");
    
        if (!f) {
            cout << "ERROR: pg5050.txt not found." << endl;
            return EXIT_FAILURE;
        }
    
        vector<string> v;
    
        for (string s; getline(f, s); ) {
            v.push_back(s);
        }
    
        cout << "v.size(): " << v.size() << endl;
    
        cout << "Sum of string lengths: " <<
            accumulate(v.cbegin(), v.cend(), size_t(0),
                [](size_t n, const string& s) { return n + s.size(); }
            ) << endl;
    
        const long long start = counter();
    
        sort(v.begin(), v.end());
    
        const long long finish = counter();
    
        print_time(start, finish, "sort()");
    }
    
    C:\Temp>dir | findstr pg5050
    02/14/2010  08:27 PM        10,056,715 pg5050.txt
    
    C:\Temp>cl /EHsc /nologo /W4 /MT /O2 /GL meow.cpp
    meow.cpp
    Generating code
    Finished generating code
    
    C:\Temp>meow
    v.size(): 171072
    Sum of string lengths: 9714572
    sort(): 106.201 ms

    Attempting to sort the strings indirectly is even slower:

        const long long start = counter();
    
        vector<const string *> p(v.size());
    
        for (size_t i = 0; i < v.size(); ++i) {
            p[i] = &v[i];
        }
    
        sort(p.begin(), p.end(), [](const string * l, const string * r) { return *l < *r; });
    
        const long long finish = counter();
    
        print_time(start, finish, "Indirect sort");
    }
    
    C:\Temp>cl /EHsc /nologo /W4 /MT /O2 /GL meow.cpp
    meow.cpp
    Generating code
    Finished generating code
    
    C:\Temp>meow
    v.size(): 171072
    Sum of string lengths: 9714572
    Indirect sort: 134.223 ms

    Using set is even worse (#include <set> above):

        const long long start = counter();
    
        struct comparator {
            bool operator()(const string * l, const string * r) const {
                return *l < *r;
            }
        };
    
        set<const string *, comparator> s;
    
        for (auto i = v.cbegin(); i != v.cend(); ++i) {
            s.insert(&*i);
        }
    
        const long long finish = counter();
    
        print_time(start, finish, "Indirect sorting through set");
    }
    
    C:\Temp>cl /EHsc /nologo /W4 /MT /O2 /GL meow.cpp
    meow.cpp
    Generating code
    Finished generating code
    
    C:\Temp>meow
    v.size(): 171072
    Sum of string lengths: 9714572
    Indirect sorting through set: 185.68 ms

  • Jan WidischJan Widisch

    Hello,
    Please, You have not answered my question: how do I access data members of my set's class key_compare() in STL ?
    I will not only sort, I will remove about 80% of redundant info, using set.find.
    Using a vector, I am out of memory on the destination computer, which will not be upgraded.
    Using a set, things will work.
    Thanks for taking Your time for the comparations! 
    Sorting strings of a vector can make life easier later on in an application, so its good thing to optimize it.
     The comparations:
    Q9450, what an impressing speed! With that power, I would test every case building the containers from scratch for every case, thus re-reading the string file or the like.
    a/You do not add the time to build up the first vector, and the build up of the other containers may take longer than You think. It may be missleading. 
    b/You sort it, and then feed it(sorted) to the pointer and set versions. The sort times can not be compared, its different data. Some sort versions do not like sorted  data.
    c/ For the vector, You prealocate the final space, but not for the set.
    But anyway, the pointer and set versions can be somehow compared. I expect the set perfom worse, it has to rebuild the tree for every new item. But in my app, from the pointer vector version, I would have to compare 100% and delete about 80% of the items, which is already done in the set version. So I expect both methods take  about the same time  for my app.
    Thanks Jan
     

  • STLSTL

    > You have not answered my question: how do I access data members of my set's class key_compare() in STL ?

    That's because I'm trying to understand and solve the overall problem. From what I've seen so far, your current approach is not ideal.

    > I will not only sort, I will remove about 80% of redundant info, using set.find.

    Ah, you're saying that the data isn't in memory to begin with. In that case, why not use set<string>?

    Additionally, set::insert automatically avoids inserting duplicate elements. Calling set::find before set::insert performs *two* lookups instead of one!

    > Q9450, what an impressing speed!

    It was fast in 2008.  I'd call it average for a dev machine now.

    > With that power, I would test every case building the containers from scratch for every case, thus re-reading the string file or the like.

    Oh, I presented my code in what appears to be a slightly misleading manner in an attempt to save space. The snippets beginning with "const long long start = counter();" are intended to entirely replace the bottom of the original program beginning with that line, forming three independent programs. I believed that this was clear but if it wasn't, I apologize.

    > a/You do not add the time to build up the first vector

    That was intentional. My understanding at the time was that the data was already in memory.

    > b/You sort it, and then feed it(sorted) to the pointer and set versions. The sort times can not be compared, its different data. Some sort versions do not like sorted  data.

    This was not the case - see above.

    > c/ For the vector, You prealocate the final space, but not for the set.

    Again, I assumed that the data was already in memory.

    (This is why *extremely clear* problem descriptions are vital.)

    set<string> should outperform set<const string *, comparator> above due to reduced indirection (but I expect the difference to be slight).

    However, there is a performance bug in VC10 RTM that you should be aware of. When inserting lots of duplicate data into a set (which is your scenario), VC10 RTM can perform unnecessary allocations and deallocations which significantly harm performance. (The code allocates a node, tries to insert it in the set, notices that it's a duplicate, and then deallocates the node.) When you insert *const lvalues*, the unnecessary work is avoided. So, T t; s.insert(t); triggers the slow codepath, but s.insert(static_cast<const T&>(t)); triggers the fast codepath.

    This problem has been thoroughly fixed in VC11.

     

  • Jan WindischJan Windisch

    Hello,
    I understand You are wondering why I use two containers. It will be compared  to another software, which is designed that way. This is the only way to sort strings in that design. Its fast, but wery limited and very, very inconvenient to work with. I think it will be abandoned, regardless of STL performance. Later, I will change STL approach to more simple, one container  design, for this gentleman who wants it this way.  I already did in other simple projects for myself and others, in fact. I have not experienced any STL introduced troubles at all. But my STL experience and knowlege is still very limited.
    It is a strange "double container" design that caused me strange troubles.
    So thanks on performance hints, if that is so easy to do so why not do it.
    I do not expect top performance from STL in this early release, there are so many other countless reasons to use it. Dot.
    In a post marked now 4 days ago I posted a small test of "double container software" to show it works with globals. You see if I find an item already in the set, it will not insert it.
     If I could not design set.find in few minutes, I would not. I heard other peple discovered the node behavior by insert, but it cannot hart performace significantly I believe.
    Now, I can see while debugging, there is a copy of my key_comp class data somewhere.
    I understand I have to implement a copy constructor and get a pointer to and work with the copy. But I do not know why the  copy is implemented, so I wonder if there is something more I have to do.
    Thanks Jan
     

  • SebastianSebastian

    Hi Stephan,
    Awesome video series. Looking forward to the next parts.
    One question: Qt's regex engine allows to determine whether a given string is a prefix of something that could match a regex, e.g. for the regext a+b+, it would recognize that "aaaaa" is a valid prefix. This is very useful for validating user input as it comes in. Is there a way to make the std regex engine do this?
    On a side note, if you use Start Without Debugging (Ctrl+F5) to start programs, VS will insert keep the console open, so you don't need a dummy line with a breakpoint.

  • STLSTL

    > Is there a way to make the std regex engine do this?

    std::regex can't, but boost::regex can. See http://www.boost.org/doc/libs/1_45_0/libs/regex/doc/html/boost_regex/partial_matches.html

  • Jan WindischJan Windisch

    Hello,
    class set with replaced functor comparator is a frenetic copier of key_comp object.
    It creates 3 copies to begin with, later it copies copy #2 to copy #3 at every acess
    of its functions. Of course its possible to callback from a class copy creator, but a function using global control variables is maybe better.
     Questions:
    Is there any rational explanation of the software's behavior?
    It is claimed functor object adds a "state". What does that mean in these circumstances?
     
    Thanks Jan

  • ChanChan

    Thank you very much Stephan for such a nice series of STL ;)! I love your talk ^_^

  • How does the regex implementation compare to RE2?

  • PhilhippusPhilhippus

    If I understood correctly Stephan said DO NOT loop incrementally through smatch m[i] results. Would the following code work in context?:
    static int i;
    if(regex_match(s,m,r)) 
    {
    while(i <= m.size())
    return foo( m[++i], pf);
    }

  • evanevan

    really talking with this Guy on STL is twice as valuable as reading four books on stl . awesome and looking for a presentation like this on MFC. the presentation is not simply about programming but it include scientific computation which helps understand what is under the hood.

Remove this comment

Remove this thread

close

Comments Closed

Comments have been closed since this content was published more than 30 days ago, but if you'd like to continue the conversation, please create a new thread in our Forums,
or Contact Us and let us know.