Skip Navigation

C9 Lectures: Stephan T. Lavavej - Standard Template Library (STL), 8 of n

Download this episode

Download Video

Description

Welcome to another installment of C9 Lectures covering the powerful general C++ library, STL. Joining us once again is the great Stephan T. Lavavej, Microsoft's keeper of the STL cloth (this means he manages the partnership between the owners of STL (dinkumware) and Microsoft, including, of course, bug fixes and enhancements to the STL that ships as part of Visual C++). Simply, Stephan is a C++ library developer.

As is Stephan's nature, he elaborates on technical details in very substantive way. The Standard Template Library, or STL, is a C++ library of container classes, algorithms, and iterators. STL provides many fundamental algorithms and data structures. Furthermore, the STL is a general-purpose library: its components are heavily parameterized, such that almost every component in the STL is a template.

In part 8, Stephan guides us on a journey into the dense forest of Regular Expressions.

Enjoy! Learn!

Books mentioned by Stephen:

The C++ Standard Library: A Tutorial And Reference by Nicolai M. Josuttis
Effective STL by Scott Meyers

[STL Introduction lecture links]

Part 1 (sequence containers)

Part 2 (associative containers)

Part 3 (smart pointers)

Part 4 (Nurikabe solver) - see Wikipedia's article and Stephan's updated source code

Part 5 (Nurikabe solver, continued)

Part 6 (algorithms and functors)

Part 7 (algorithms and functors, continued)

Part 8 (regular expressions)

Part 9 (rvalue references)

Part 10 (type traits)

Embed

Format

Available formats for this video:

Actual format may change based on video formats available and browser capability.

    The Discussion

    • User profile image
      JeanGa

      Another great video from STL Big Smile!

      While I wait for part 9, I'll dig a couple of places in my existing code base for regex replacements Wink (but, please, don't let me waiting for too long!)

      Thank you so much for the content!

    • User profile image
      Charles

      9 is coming next week.Stephan recorded 8 and 9 on the same day, so there is no excuse for 9 not posting next week. I'm to blame if it doesn't!
      C

    • User profile image
      Spetum

      Yeah!!

      GOOD!!

      Thank you, Charles.

       Great Lecture and video.

      This episode is interesting.

      Thanks alot, Stephan!! 

      thumb up !!

    • User profile image
      Mark R.

      EXCELLENT! Love these. Please keep 'em coming.
      Thanks!!!

    • User profile image
      Greg

      I downloaded the high quality MP4, and the audio is a few seconds ahead of the video.  Anyone else seeing this?

    • User profile image
      NotFredSafe

      If I match the regular expression "a+" against a sequence of "a" characters, how does the regex determine how many "a" characters to choose? Is there something like a "maximum munch rule" for regular expressions?

    • User profile image
      Vaclav

      @NotFredSafe:If the regex is greedy (default), as many characters as possible are matched. It is explained in the PPT document on page 16.

    • User profile image
      Toddintr

      Great lecture series, thank you.

      My question is really about lecture #1 -- I posted this question at the page for lecture # 1 originally: Why does the DOS screen pop up during your Visual Studio builds?  I use VS 2010 and my installation does not do this.  Thanks again -

      Todd

    • User profile image
      Marc

      Awesome! Thank you for this video!

    • User profile image
      Michael Hamilton

      [quote]
      6 hours ago, Toddintr wrote

      Great lecture series, thank you.
      My question is really about lecture #1 -- I posted this question at the page for lecture # 1 originally: Why does the DOS screen pop up during your Visual Studio builds?  I use VS 2010 and my installation does not do this.  Thanks again -
      Todd

      [/quote]
      STL is putting a breakpoint on the last line and is running in debug mode in most videos.  This causes his programs to break before closing.

    • User profile image
      STL

      [NotFredSafe]
      > If I match the regular expression "a+" against a sequence of "a" characters,
      > how does the regex determine how many "a" characters to choose? Is there
      > something like a "maximum munch rule" for regular expressions?

      [Vaclav]
      > @NotFredSafe:If the regex is greedy (default), as many characters as
      > possible are matched. It is explained in the PPT document on page 16.

      Slight correction: greediness is a property of individual quantifiers, not whole regexes. A regex can contain both greedy quantifiers and non-greedy quantifiers:

      C:\Temp>type meow.cpp
      #include <iostream>
      #include <ostream>
      #include <regex>
      #include <string>
      using namespace std;
      
      int main() {
          const string s("abc123456170117011701def654321172917291729ghi");
          smatch m;
          const regex r("abc([0-9]+)((?:1701)+)def([0-9]+?)((?:1729)+)ghi");
      
          if (regex_match(s, m, r)) {
              for (int i = 1; i <= 4; ++i) {
                  cout << "m[" << i << "]: " << m[i] << endl;
              }
          } else {
              cout << "EPIC FAIL" << endl;
          }
      }
      
      C:\Temp>cl /EHsc /nologo /W4 meow.cpp
      meow.cpp
      
      C:\Temp>meow
      m[1]: 12345617011701
      m[2]: 1701
      m[3]: 654321
      m[4]: 172917291729

      Greedy quantifiers want to consume as many characters as possible, while still allowing the regex to match the string (regex_match() versus regex_search() determines whether the whole regex must match the whole string, or just a substring).

      Non-greedy quantifiers want to consume as few characters as possible, while still allowing the regex to match the string.

      In the example above, ([0-9]+) is greedy, so it'll consume 123456 and two of the 1701s. It can't eat the third, though, because ((?:1701)+) comes next and wants to eat at least one 1701. (This might be confusing because two greedy quantifiers, ([0-9]+) and ((?:1701)+), appear next to each other. You can think of the regex as being matched left to right, so ([0-9]+) comes first and gets to decide what it's going to do. It'll satisfy its own preferences, while being aware of whether the rest of the regex will match the rest of the string.)

      Because ([0-9]+?) is non-greedy, it wants to eat as few digits as possible. Ideally, it'd like to eat just the 6. However, that would leave 54321172917291729ghi for ((?:1729)+)ghi to match, which wouldn't work. Therefore, the ([0-9]+?) consumes 654321, and leaves all three 1729s for ((?:1729)+) to eat.

      [Toddintr]
      > Why does the DOS screen pop up during your Visual Studio builds?

      I'm building console programs, so when I compile-and-execute them, the Command Prompt appears. If I were building graphical programs, their GUIs would appear.

      I usually run the compiler from the Command Prompt, as in the example above (if you haven't seen this, Start > All Programs > Microsoft Visual Studio 2010 > Visual Studio Tools > Visual Studio Command Prompt (2010) will launch a Command Prompt where you can run cl.exe, the compiler). But I use the Visual Studio IDE in the videos because most people are familiar with that.

    • User profile image
      Garp

      @Stephan: thanks for another brilliant lecture.

      @Charles: more than three weeks of post-production and no special effects? Not even a lens flare?

      Anyway, thank you guys. You've got to love Channel9.

    • User profile image
      NotFredSafe

      It is explained in the PPT document on page 16.

      Wouldn't it make sense to link to the PPT above?

    • User profile image
      STL
    • User profile image
      So Mei

      [quote]
      Nov 17, 2010 at 1:26 PM, Charles wrote
      9 is coming next week.Stephan recorded 8 and 9 on the same day, so there is no excuse for 9 not posting next week. I'm to blame if it doesn't![/quote]
      So it's friday, time is up and oh <sarcasm> surprise </sarcasm> Charles failedi saw that coming.Charles, predicable much ?yes Charles there is still time (time zones, sigh) but i couldnt resist being early with the blaming and hating <sarcasm> since i love microsoft employees so much  </sarcasm> 
      well we all have to have a hobby right ? Oh who am i kidding microsoft deserves it :3
       

    • User profile image
      So Mei

      Stephan T. Lavavej is an exception of course to the above, love you!
       

    • User profile image
      Charles

      [quote] Nov 17, 2010 at 1:26 PM, Charles wrote 9 is coming next week.Stephan recorded 8 and 9 on the same day, so there is no excuse for 9 not posting next week. I'm to blame if it doesn't!

      So it's friday, time is up and oh <sarcasm> surprise </sarcasm> Charles failedi saw that coming.Charles, predicable much ?yes Charles there is still time (time zones, sigh) but i couldnt resist being early with the blaming and hating <sarcasm> since i love microsoft employees so much  </sarcasm>  well we all have to have a hobby right ? Oh who am i kidding microsoft deserves it :3  

      [/quote]

      Hoildays changed the plan. Next week! Smiley

      C

    • User profile image
      So Mei

      @Charles:So you're not the sharpest one when it comes to planing then, shouldnt you be good at that in your job, anyway, oh well could happen to all of us, although shouldnt you have seen this holidays coming since it's every year at the same date ?

    • User profile image
      NotFredSafe

      So Mei, would you like some cheese with that whine?

    • User profile image
      So Mei

      @NotFredSafe: Sure, can i have some biscuits too ?     
      Note: I was just pointing out the obvious flaws in Charles excuse.What's next excuse ?"Sorry, santa claus came at Christmas so i couldn't ..."   "Sorry, the dog ate my to do list"  
      "Sorry, a butterfly died at the other side of the planet so i couldn't ..."Do i get more cheese now ? with biscuits ? and some tea too ? ooo yummy...

    • User profile image
      Vikas

      Hi Stephan,
      Another great lecture.  I was working with boost::regex and didn't know tr1 has it already. I found the interface similar to boost::regex, so i wanted to know if tr1 regex is adopted from boost and i should expect similar interface?
      Thanks,
      Vikas.

    • User profile image
      STL

      Part 9: http://channel9.msdn.com/Shows/Going+Deep/C9-Lectures-Stephan-T-Lavavej-Standard-Template-Library-STL-9-of-n

      [Vikas]
      > i wanted to know if tr1 regex is adopted from boost

      Yep, TR1/C++0x <regex> was based on Boost.Regex.

      > and i should expect similar interface?

      Their interfaces are virtually identical. The biggest differences are in their regular expression grammars. std::regex uses ECMAScript 3's grammar, which was derived from Perl's. boost::regex supports those features, but has recently implemented even more advanced features from Perl (e.g. possessive quantifiers).

    • User profile image
      Jan Windisch

      Hello,
      I am replacing the key_compare() with cmp  function in a set 


      std::set<
      long,mycmp<long> >s;
      mycmp<long >* p=&s.key_comp();
       It looks like this STL version creates a class copy, so the copied class members are hard to reach. p points to the original  class instance.
       I am not able to find info on how to:
      a/Prevent STL copiing the class, so it works with the original. 
      b/Locate description of some established method to comunicate with the copied class data members.
      Thanks Jan
       


    • User profile image
      STL

      What exactly are you trying to do with the comparator?

    • User profile image
      Jan Windisch

      @STL:Hello, thanks for quick response!
      and for excellent lessons on STL! Without the lessons, I'd awoid STL...
      To drill it down to essentials. I'm trying to do what I used to do with quicksort and its callback. I am aware this is -probably- not intended use of STL. I have a lot of strings I have to sort, or rather, I do not want to sort in a set. I'm putting the strings( counted in 1000+) in a vector, numberig them by arrival as en index in the vector, which I sort  in a set. Sacrificing some ram, things used to run much faster that way. In the set<long>'s key_compare, I'm comparing strings. I even use set.find to exclude doublets. That means I need 2 ways of comparing strings, one for set.find and another one for set.insert.
      In addition, I wish I could hide a lot of data members. Recently, I'm forced to use static/global data(Hm...) So forget about easy concurrency...
      Otherwise, the code works.
      // compile with: /EHsc#include <stdafx.h>#include <ppl.h>#include <iostream>#include<set>#include<vector>#include<string>#include<algorithm>#include<functional>using std::binary_function;
      // the class data members are temporarily referenced as globals because I experience key_comp() // class data member access problems.
      // there is one code, so DBG tracks key_comp(), 2 data members sets
      // in addition to constructor, operator () adresses the copy as well
       std::vector<std::string> v;
      // instead of sorting strings, put them in an unsorted  // vector and sort corresponding vector indexes in a set// Pros: On most architectures it will outperform string sort.// Cons: extra memory allocation of index values//template<class T> class cmp;// compiles OK without declarationstemplate<class T> class cmp{public:
      // forced to global residence#if 0  int st,mxi; std::string stl;#endif cmp(){}; //~cmp();// linker error bool operator () (const T& t1,const T& t2)const{   switch(st){ case 0:// sort vector arrival indexes in a set return v.at(t1)<v.at(t2); case 1:// find index of the string, mxi is max index value+1,not present in the set if (t1==mxi)return stl<v.at(t2);  return v.at(t1)<stl; default:  // the program is sick
        std::cout<<"program error: st,mxi,stl="<<st<<" "<<mxi<<" "<<stl<<std::endl;  terminate();  ;   }//sw  }//op};
      Another piece of code:
      const std::string str[10]={"1","9","8","6","5","1","9","3","2","1"};  int maxind=0;  std::cout<<std::endl<<"Input:"<<std::endl;   for(int i=0;i<10;i++){ std::cout<<str[i]<<"  ";  // find str st=1;mxi=maxind;stl=str[i];if(s.find(maxind)!=s.end())continue;// found v.push_back(str[i]);// not found st=0;  s.insert(maxind);maxind++;   }
         std::cout<<std::endl<<"Output:"<<std::endl;   for_each( s.begin(),s.end(),prn);   std::cout<<std::endl; //  How?? [](int n){std::cout<<std::endl<<v.at(n);}   s.clear();   v.clear();      //____________________   std::set<long,cmpo<long> >s1;   cmpo<long>* p1=&s1.key_comp();
      Thanks Jan

    • User profile image
      STL

      Are you saying that your ultimate goal is to sort a sequence of strings? With VC10, just use sort() on vector<string>. Move semantics will automatically avoid unnecessary dynamic memory allocation/deallocation during the sort.

      Here's VC10 sorting 170,000 strings occupying 10 MB in 106 ms on my Q9450:

      C:\Temp>type meow.cpp
      // http://www.gutenberg.org/cache/epub/5050/pg5050.txt
      
      #include <stddef.h>
      #include <stdlib.h>
      #include <algorithm>
      #include <fstream>
      #include <iostream>
      #include <numeric>
      #include <ostream>
      #include <string>
      #include <vector>
      #include <windows.h>
      using namespace std;
      
      long long counter() {
          LARGE_INTEGER li;
          QueryPerformanceCounter(&li);
          return li.QuadPart;
      }
      
      long long frequency() {
          LARGE_INTEGER li;
          QueryPerformanceFrequency(&li);
          return li.QuadPart;
      }
      
      void print_time(const long long start, const long long finish, const char * const s) {
          cout << s << ": " << (finish - start) * 1000.0 / frequency() << " ms" << endl;
      }
      
      int main() {
          ifstream f("pg5050.txt");
      
          if (!f) {
              cout << "ERROR: pg5050.txt not found." << endl;
              return EXIT_FAILURE;
          }
      
          vector<string> v;
      
          for (string s; getline(f, s); ) {
              v.push_back(s);
          }
      
          cout << "v.size(): " << v.size() << endl;
      
          cout << "Sum of string lengths: " <<
              accumulate(v.cbegin(), v.cend(), size_t(0),
                  [](size_t n, const string& s) { return n + s.size(); }
              ) << endl;
      
          const long long start = counter();
      
          sort(v.begin(), v.end());
      
          const long long finish = counter();
      
          print_time(start, finish, "sort()");
      }
      
      C:\Temp>dir | findstr pg5050
      02/14/2010  08:27 PM        10,056,715 pg5050.txt
      
      C:\Temp>cl /EHsc /nologo /W4 /MT /O2 /GL meow.cpp
      meow.cpp
      Generating code
      Finished generating code
      
      C:\Temp>meow
      v.size(): 171072
      Sum of string lengths: 9714572
      sort(): 106.201 ms

      Attempting to sort the strings indirectly is even slower:

          const long long start = counter();
      
          vector<const string *> p(v.size());
      
          for (size_t i = 0; i < v.size(); ++i) {
              p[i] = &v[i];
          }
      
          sort(p.begin(), p.end(), [](const string * l, const string * r) { return *l < *r; });
      
          const long long finish = counter();
      
          print_time(start, finish, "Indirect sort");
      }
      
      C:\Temp>cl /EHsc /nologo /W4 /MT /O2 /GL meow.cpp
      meow.cpp
      Generating code
      Finished generating code
      
      C:\Temp>meow
      v.size(): 171072
      Sum of string lengths: 9714572
      Indirect sort: 134.223 ms

      Using set is even worse (#include <set> above):

          const long long start = counter();
      
          struct comparator {
              bool operator()(const string * l, const string * r) const {
                  return *l < *r;
              }
          };
      
          set<const string *, comparator> s;
      
          for (auto i = v.cbegin(); i != v.cend(); ++i) {
              s.insert(&*i);
          }
      
          const long long finish = counter();
      
          print_time(start, finish, "Indirect sorting through set");
      }
      
      C:\Temp>cl /EHsc /nologo /W4 /MT /O2 /GL meow.cpp
      meow.cpp
      Generating code
      Finished generating code
      
      C:\Temp>meow
      v.size(): 171072
      Sum of string lengths: 9714572
      Indirect sorting through set: 185.68 ms

    • User profile image
      Jan Widisch

      Hello,
      Please, You have not answered my question: how do I access data members of my set's class key_compare() in STL ?
      I will not only sort, I will remove about 80% of redundant info, using set.find.
      Using a vector, I am out of memory on the destination computer, which will not be upgraded.
      Using a set, things will work.
      Thanks for taking Your time for the comparations! 
      Sorting strings of a vector can make life easier later on in an application, so its good thing to optimize it.
       The comparations:
      Q9450, what an impressing speed! With that power, I would test every case building the containers from scratch for every case, thus re-reading the string file or the like.
      a/You do not add the time to build up the first vector, and the build up of the other containers may take longer than You think. It may be missleading. 
      b/You sort it, and then feed it(sorted) to the pointer and set versions. The sort times can not be compared, its different data. Some sort versions do not like sorted  data.
      c/ For the vector, You prealocate the final space, but not for the set.
      But anyway, the pointer and set versions can be somehow compared. I expect the set perfom worse, it has to rebuild the tree for every new item. But in my app, from the pointer vector version, I would have to compare 100% and delete about 80% of the items, which is already done in the set version. So I expect both methods take  about the same time  for my app.
      Thanks Jan
       

    • User profile image
      STL

      > You have not answered my question: how do I access data members of my set's class key_compare() in STL ?

      That's because I'm trying to understand and solve the overall problem. From what I've seen so far, your current approach is not ideal.

      > I will not only sort, I will remove about 80% of redundant info, using set.find.

      Ah, you're saying that the data isn't in memory to begin with. In that case, why not use set<string>?

      Additionally, set::insert automatically avoids inserting duplicate elements. Calling set::find before set::insert performs *two* lookups instead of one!

      > Q9450, what an impressing speed!

      It was fast in 2008.  I'd call it average for a dev machine now.

      > With that power, I would test every case building the containers from scratch for every case, thus re-reading the string file or the like.

      Oh, I presented my code in what appears to be a slightly misleading manner in an attempt to save space. The snippets beginning with "const long long start = counter();" are intended to entirely replace the bottom of the original program beginning with that line, forming three independent programs. I believed that this was clear but if it wasn't, I apologize.

      > a/You do not add the time to build up the first vector

      That was intentional. My understanding at the time was that the data was already in memory.

      > b/You sort it, and then feed it(sorted) to the pointer and set versions. The sort times can not be compared, its different data. Some sort versions do not like sorted  data.

      This was not the case - see above.

      > c/ For the vector, You prealocate the final space, but not for the set.

      Again, I assumed that the data was already in memory.

      (This is why *extremely clear* problem descriptions are vital.)

      set<string> should outperform set<const string *, comparator> above due to reduced indirection (but I expect the difference to be slight).

      However, there is a performance bug in VC10 RTM that you should be aware of. When inserting lots of duplicate data into a set (which is your scenario), VC10 RTM can perform unnecessary allocations and deallocations which significantly harm performance. (The code allocates a node, tries to insert it in the set, notices that it's a duplicate, and then deallocates the node.) When you insert *const lvalues*, the unnecessary work is avoided. So, T t; s.insert(t); triggers the slow codepath, but s.insert(static_cast<const T&>(t)); triggers the fast codepath.

      This problem has been thoroughly fixed in VC11.

       

    • User profile image
      Jan Windisch

      Hello,
      I understand You are wondering why I use two containers. It will be compared  to another software, which is designed that way. This is the only way to sort strings in that design. Its fast, but wery limited and very, very inconvenient to work with. I think it will be abandoned, regardless of STL performance. Later, I will change STL approach to more simple, one container  design, for this gentleman who wants it this way.  I already did in other simple projects for myself and others, in fact. I have not experienced any STL introduced troubles at all. But my STL experience and knowlege is still very limited.
      It is a strange "double container" design that caused me strange troubles.
      So thanks on performance hints, if that is so easy to do so why not do it.
      I do not expect top performance from STL in this early release, there are so many other countless reasons to use it. Dot.
      In a post marked now 4 days ago I posted a small test of "double container software" to show it works with globals. You see if I find an item already in the set, it will not insert it.
       If I could not design set.find in few minutes, I would not. I heard other peple discovered the node behavior by insert, but it cannot hart performace significantly I believe.
      Now, I can see while debugging, there is a copy of my key_comp class data somewhere.
      I understand I have to implement a copy constructor and get a pointer to and work with the copy. But I do not know why the  copy is implemented, so I wonder if there is something more I have to do.
      Thanks Jan
       

    • User profile image
      Sebastian

      Hi Stephan,
      Awesome video series. Looking forward to the next parts.
      One question: Qt's regex engine allows to determine whether a given string is a prefix of something that could match a regex, e.g. for the regext a+b+, it would recognize that "aaaaa" is a valid prefix. This is very useful for validating user input as it comes in. Is there a way to make the std regex engine do this?
      On a side note, if you use Start Without Debugging (Ctrl+F5) to start programs, VS will insert keep the console open, so you don't need a dummy line with a breakpoint.

    • User profile image
      STL

      > Is there a way to make the std regex engine do this?

      std::regex can't, but boost::regex can. See http://www.boost.org/doc/libs/1_45_0/libs/regex/doc/html/boost_regex/partial_matches.html

    • User profile image
      Jan Windisch

      Hello,
      class set with replaced functor comparator is a frenetic copier of key_comp object.
      It creates 3 copies to begin with, later it copies copy #2 to copy #3 at every acess
      of its functions. Of course its possible to callback from a class copy creator, but a function using global control variables is maybe better.
       Questions:
      Is there any rational explanation of the software's behavior?
      It is claimed functor object adds a "state". What does that mean in these circumstances?
       
      Thanks Jan

    • User profile image
      Chan

      Thank you very much Stephan for such a nice series of STL ;)! I love your talk ^_^

    • User profile image
      emn13

      How does the regex implementation compare to RE2?

    • User profile image
      Philhippus

      If I understood correctly Stephan said DO NOT loop incrementally through smatch m[i] results. Would the following code work in context?:
      static int i;
      if(regex_match(s,m,r)) 
      {
      while(i <= m.size())
      return foo( m[++i], pf);
      }

    • User profile image
      evan

      really talking with this Guy on STL is twice as valuable as reading four books on stl . awesome and looking for a presentation like this on MFC. the presentation is not simply about programming but it include scientific computation which helps understand what is under the hood.

    Comments closed

    Comments have been closed since this content was published more than 30 days ago, but if you'd like to continue the conversation, please create a new thread in our Forums, or Contact Us and let us know.