C9 Lectures: Stephan T Lavavej - Advanced STL

C9 Lectures: Stephan T Lavavej - Advanced STL, 5 of n

Download this episode

Download Video

Description

There are two STLs: the Standard Template Library and Stephan T. Lavavej Smiley

Advanced STL covers the gory details of the STL's implementation -> you will therefore need to be versed in the basics of STL, competent in C++ (of course), and be able to pay attention! Stephan is a great teacher and we are so happy to have him on Channel 9—the only place you'll find this level of technical detail regarding the internals of the STL. There are no books. There are no websites. This is Stephan taking us into what is uncharted territory for most, even those with a more advanced STL skill set.

In this 5th part of the n-part series, STL digs into the Boost Library (http://www.boost.org). In his words, it's an open source, super quality, community-driven STL++. Stephan will walk you through a sample application from end to end, using boost.

[Advanced STL]

Part 1 (shared_ptr - type erasure)

Part 2 (equal()/copy() - algorithm optimizations)

Part 3 (_ITERATOR_DEBUG_LEVEL, #pragma detect_mismatch, and /d1reportSingleClassLayout)

Part 4 (rvalue references v2.1 and associative container mischief)

Part 5 (deduplicator, using Boost.Bimap/Filesystem/ScopeExit) - see Stephan's deduplicate.cpp

Part 6 (container pretty printer) - see Stephan's pretty_printer.cpp

Embed

Format

Available formats for this video:

Actual format may change based on video formats available and browser capability.

    The Discussion

    • Joshua Boyce

      Any chance we could get the source code posted?

    • Thomas Petit

      Very cool to see some Boost goodness in channel 9 ! Thanks for the vid. (impeccable timing with BoostCon too :)

      But Stephan, you miss the perfect importunity in your function read_file to sneak one of the little underused gem of boost: boost.iostreams.mapped_file.

      I have found it to be the best way to deal with binary files hands down. It's :

      1) Extremely fast. Very often I have seen speed access being an order of magnitude faster than C++ iostream or C stdio functions (on Windows)
      2) Super easy to use and concise. begin() or data() return a char* to the first byte of the mapping and then you can simply iterate through it and the system will take care of everything.

      For example read_file can be rewrite in two lines like this :

      #include <vector>
      #include <string>
      #include <boost/iostreams/device/mapped_file.hpp>

      std::vector<char> read_file(const std::string& name)
      {
      boost::iostreams::mapped_file_source file(name);
      return std::vector<char>(file.begin(), file.end());
      }

    • new2STL

      @Thomas Petit: Thats one of the motives people want the boot::iostreams on next C++. Mapped files are part of any POSIX-compliant system, and provides its benefits and drawbacks on accessing file I/O. It is recommended for big files, for huge chunks of data in and out and for sequential access, it can cause excessive page faults if used too frequently and randomly. In the case of Stephan, conventional file manger should be suffice Smiley (unless he intentionally use mmap for show it like you suggest Wink )

      http://msdn.microsoft.com/en-us/library/ms810613.aspx

      http://en.wikipedia.org/wiki/Memory-mapped_file

    • STL

      Joshua Boyce> Any chance we could get the source code posted?

      Yep, sorry - I've been slammed with work and haven't gotten a chance to clean up my source code like I planned, or post a How To Build Boost article. Here's the "original recipe" version, as explored in the video:

      http://cid-e66e02dc83efb165.office.live.com/self.aspx/deduplicator/deduplicate.cpp

      Disclaimer: the SHA-256 code is very lightly tested. As I recall mentioning, I originally wrote this at home, where I have an SHA-256 implementation written from scratch and exhaustively tested. But it's large and bulky and I wanted to show off Boost.ScopeExit, so I ripped it out and replaced it with the <bcrypt.h> machinery in a matter of minutes. It seemed to work fine but I wouldn't be surprised if I had gotten something wrong in the conversion.

      The Boost.Filesystem and Boost.Bimap machinery is essentially unchanged from what I had at home, and I feel much more confident that that works as expected.

      Thomas Petit> But Stephan, you miss the perfect importunity in your function read_file to sneak one of the little underused gem of boost: boost.iostreams.mapped_file.

      Cool, I wasn't familiar with that. (My knowledge of Boost is surprisingly patchy. There are the parts that I used in college, where I used a bunch of stuff but it's all really old now. Then there are the parts that have been incorporated into TR1/C++0x - I'm very familiar with that, and reasonably familiar with the deltas between Boost and the Standard, since I've spent years working on them. And then there are the parts that I've played with at home recently, like Boost.Bimap. In between, there are a bunch of libraries that I haven't used, and parts of libraries that I haven't explored.)

    • Spetum

      WOW~!!~ BOOST!!

      Big Smile

      Thanks Charles and Stephan!!

    • Deraynger

      VERY awesome! Just started using a little boost (just shared_ptr and noncopyable headers), so this comes in handy. I wouldn't mind some more boost videos.

      One minor complaint is the time constraint you seem to have. Does Charles decide that, or did you have to get back to work?

       

      You mentioned that use use stdio.h (performance I guess). I've extremely tried to stick with pure C++ (as part of me learning it correctly), a lot seem to recommend stdio though (for performance?).

       

      What would be cool is some OpenGL/DirectX use of STL, like a texture manager, mesh loader etc.

       

      Some general questions:

      How would you return the vertices vector to an OpenGL function? What I do is (don't have it here, but somehow) something like:

      inline const vector<vert3>& Verts( void ) const 
      {
          return _vertices;
      }
      
      // Somewhere in loop code
      glDrawElements( someVal, someVal, someVal, &myObj->Verts()[0] );
      

      Probably wrong (and a bunch of things I don't like, const correctness not guaranteed as elements can be modified, returning references are bad, etc. ), and should I return &vert3[0] (I guess it is the same thing?).

      I don't want to copy the vector (obviously), especially as it gets called very often. I'm also not sure if my function is inline... would that disable RVO?

       

      How about vector<unsigned char> containing color information, and the format is BGR, but I need RGB. Is there a better way, then using a for loop and store a temporary of B, assigning R to B, and the temporary to R?

       

       ...and finally, a last question. Should simple function variables, that are just assigned once (per function call), and not modified later on, be defined const (e.g. const float someValue = x / y;)?

    • new2STL

      @Deraynger: Well, part of your questions are better fit on the XNA Blog team or use the Ch9 forums. I've opened a thread for use SSE align with std::unique_ptr and std::array (the two play nice with _aligned_malloc), I could expand the forum with stuff you want like the load of textures (pretty easy with WIC) or you can open a new thread and I'll be happy share my experiments with you.

      In case of BGR and RGB its a old case of endianness, you need see if OpenGL (or DirectX) provide support for both in hardware (extensions string), if don't have support you only way is switch the positions in host memory or in device memory (gpgpu).

    • Deraynger

      @new2STL: Hey, yes, just thought STL or someone here probably has some tricks up their sleeves.

      The BGR/RGB is for OpenGL ES. Would have to look again, but I don't think it supports BGR. I mainly wanted to know, whether there's a better way to swap all.

      XNA is C# or .NET, isn't it?

      I do have my own loaders/managers etc., was just thinking that I could probably see better code, and improve mine with STL goodness, like maybe making a template class for the different texture types, instead of deriving from an interface/abstract class or automatically deducing the correct loader to return from a factory class without having to explicitly define it (e.g. if (fileType == FileType::PNG) { return new PngTextureLoader(); } else if ... etc.)

    • new2STL

      @Deraynger: About XNA, its mainly C# for cross-platform, but have lots of native code too due XBOX/PC optimization, most of the helper library in the DirectX SDK falls into the xna namespace

      OpenGL ES are a subset of OpenGL 2.1, in 2.1 the BGR textures turned in core but for ES I think they are optional. Anyway, you can run a pixel shader for swap the bits positions, there are many ways to do it. But if the texture is small (lets say, a square tile of 64 or 128 pixels) it can be done in host memory with little impact in the performance.

      I had to say I'm watching all the awesome videos here on Going Deep for the same reason as you, to better learn STL and use its power in better code Big Smile

    • Deraynger

      @new2STL: Hey, ok thanks for the info on that. Never really did anything in DX.

      Ok regarding the BGR swapping or doing it with a shader. It isn't anything critical, just wrote what came to mind, and could be improved Wink

      What bothers me with the general OpenGL books/examples, ist that it's almost always written for C (also new ones, for portability Perplexed). So, definitely no STL in them, meaning that there are also no good examples of how to create some ideal classes that everyone needs.

      Yes, I guess these videos are quite popular (if the views on the top are somewhat accurate) Smiley

    • Charles

      @Deraynger: I don't decide how much time STL gets for his lectures... He is given one hour of studio time which means he can't do one hour of lecture time... Set up and all that. Then there is how much work STL has to do at his day and night job. Its amazing he has 30 minutes in his day to do these. Thank you, STL!!!!! Smiley

       

      C

       

    • Ivan

      Urgh 35 min limit is to harsh. : Otherwise great video!
      I can agree with the boost fans here... great to see Boost videos!
      I have a couple of questions.
      1. Why no timestamp on the batch file. Seems unsecure to trust old hashes and not to check modification date of the files. Then again maybe I misunderstood the program.
      2. Why hash files that have unique size?(btw that could have been a nice example of using STL, although it wouldn't be elegant: I presume that you could do multimap<size, file> and for every block of identical sized files do potential elimination). That would be nice to see because I don't know a nice way to do "for_each_multimap_block_that_has_the_same_key" elegantly.
      Then again maybe I misunderstood the program, again. :)
      3. Isnt there some other "normal" sha256 lib that doesnt require all the cleanup... TBH I dont know why it does require so much cleanup in the first place-maybe because OS tries to prevent other processes from looking. :)

    • Int64

      Boost is truly great, not just from a content standpoint, but in how fast they release new versions/updates. I'm using it primary for TR1 support, but there's alot of other excellent stuff I've used:

      • String Algorithms/Format (boost::format is probably my most frequently used non-TR1 feature)
      • UUID
      • Exceptions/error_info
      • Serialization
      • noncopyable
      • Slots/Signals
      • Threads
      • checked_delete
      • FileSystem/path
      • Date/Time
      • IOStreams/zlib support
      • ASIO (networking)
      • Test (Although, I actually prefer UnitTest++ for unit testing)

      I'm off to try mapped files, something I can use that I wasn't aware of - thanks Thomas.

      , Deraynger wrote

      Some general questions:

      How would you return the vertices vector to an OpenGL function? What I do is (don't have it here, but somehow) something like:

      1
      2
      3
      4
      5
      6
      7
      inlineconst vector<vert3>& Verts( void ) const
      {
          return_vertices;
      }
       
      // Somewhere in loop code
      glDrawElements( someVal, someVal, someVal, &myObj->Verts()[0] );

      Why not just return the pointer?

      vert3 const* Foo::vertices_data() const
      {
          if (_vertices.empty())
              throw std::runtime_error("See Effective STL, item 16");
          return &_vertices[0];
      }
      
      ...
      
      Foo foo;
      vert3 const* data = foo.vertices_data();

    • STL

      Deraynger> I wouldn't mind some more boost videos.

      I'll see what I can do - there are several more Boost libraries that I'm familiar with.

      Deraynger> One minor complaint is the time constraint you seem to have. Does Charles decide that, or did you have to get back to work?

      That's all my fault. I'm insanely busy with VC11, and I didn't have time to prepare an example in advance. So on the day of filming, I furiously hacked out the deduplicate.cpp that I presented, making me run several minutes late. By the time I got to the studio, we had to finish up fast.

      Deraynger> You mentioned that use use stdio.h (performance I guess).

      I specifically have a love-hate, mostly hate, relationship with iostreams. iostreams are type-safe and buffer-safe, while stdio is not. On the other hand, iostreams performance is typically miserable, and trying to do anything tricky with them will expose you to their Standard-mandated guts which are absolutely awful (the contrast in design between iostreams, full of virtual nonsense, and the STL proper, is remarkable).

      iostreams are fine for simple text I/O when performance doesn't matter. For binary I/O, iostreams don't really buy anything, and I prefer to wrap stdio.h instead (it must be wrapped, because manipulating FILE * directly is a recipe for resource leaks - as I eternally have to explain).

      Deraynger> What would be cool is some OpenGL/DirectX use of STL, like a texture manager, mesh loader etc.

      I have code at home that wraps OpenGL with Boost and the STL, but lifting it out into something presentable would take forever.

      Deraynger> should I return &vert3[0] (I guess it is the same thing?).

      v.data() is the C++0x way to get a pointer to a vector's elements. Unlike &v[0], it is correct when v is empty (but in that case the pointer can't be dereferenced, obviously).

      Deraynger> I don't want to copy the vector (obviously), especially as it gets called very often.

      Remember move semantics.

      Deraynger> I'm also not sure if my function is inline... would that disable RVO?

      According to my knowledge, the compiler's decisions to actually-inline something, and to apply the RVO/NRVO, are independent.

      Deraynger> Is there a better way, then using a for loop and store a temporary of B, assigning R to B, and the temporary to R?

      Not manipulating the data at all with the CPU, and getting the GPU to do it in a shader, would be ideal. Shaders are crazy powerful. (Even better, if you control the source, just ensure that the data is in the right format to begin with. The cheapest code is the code you never execute.) Otherwise, you can loop through the vector and std::swap() the colors.

      Deraynger> Should simple function variables, that are just assigned once (per function call), and not modified later on, be defined const (e.g. const float someValue = x / y;)?

      Yes. That clarifies intent and prevents mistakes.

      Ivan> Seems unsecure to trust old hashes and not to check modification date of the files.

      If a directory is deduplicated, and then files are modified (or deleted!), then data has been irrevocably lost. "Don't do that then."

      Ivan> Why hash files that have unique size?

      That would be a reasonable optimization - avoid hashing unique-sized files, unless and until duplicate-sized files appear (because the deduplicator can be run on a set of files, more files can be added, and then the deduplicator can be run again). Good idea, I hadn't thought of that.

      Ivan> I don't know a nice way to do "for_each_multimap_block_that_has_the_same_key" elegantly.

      This is actually super easy and elegant. Use multimap::equal_range().

      Ivan> Isnt there some other "normal" sha256 lib that doesnt require all the cleanup...

      SHA-256 can be written from scratch, and I've done that - but this is how the Windows API exposes it. The usual advice, which is good advice, is to never attempt to implement anything crypto-related by yourself. In my case, I learned C by implementing SHA-1, and I'm extremely comfortable (and careful!) with such code. Windows' implementation is probably faster than mine, though - I never resort to assembly.

      I would have used Boost for this, but they don't yet have an implementation of SHA-256.

    • Marek

      >> Windows' implementation is probably faster than mine, though - I never resort to assembly.
      You should try some library like OpenSSL which is optimized to use dedicated co-processor (if you have one, such in VIA processors) or special instruction set included in newest Intel processors.

    • Deraynger
      Thanks to all for your answers!!! C (& STL), I'm very grateful for any time you take, especially when you're very busy! These are the only C9 videos I follow, so I'm always waiting for some new ones, and then it doesn't last too long. But rather short than none Smiley Thanks again! @STL: I'm not able to use move semantics, since too few compilers support it. Regarding your STL, Boost & OpenGL stuff, maybe you could just show a portion of it, and describe it a bit. Do you know of a smart (STL?) way to return an instance from a factory class, without having to use some if/else? I thought that one could use some traits (forgot the name again, but like you did in the container helper class in the first series, with typedef'ing the items). Then any class deriving from say a texture loader abstract class, could define what they support, say ".raw", then the rawLoader would be returned, when a raw file(-name) gets passed to the factory? Maybe I just need to study your example a little better there.
    • new2STL

      > STL: I have code at home that wraps OpenGL with Boost and the STL, but lifting it out into something presentable would take forever.

      O yeah! Your text rendering engine, pretty nice Smiley

    • spons

      Marek, exactly, just use the OpenSSL crypto library. It contains every hash algorithm anyone would ever want to use. They're all implemented using the same set of functions (init, update, finalize). You won't have to implement anything yourself to support buffered I/O. The documentation seems to be outdated so browse the source to see what's available (sha256 is in the sha dir).

    • STL

      Marek> You should try some library like OpenSSL which is optimized to use dedicated co-processor (if you have one, such in VIA processors) or special instruction set included in newest Intel processors.

      I expect and believe that the Windows implementation uses special instructions when applicable and available.  This is true for things like memcpy() in the CRT.

      Deraynger> Do you know of a smart (STL?) way to return an instance from a factory class, without having to use some if/else?

      If I had to do something like that, I'd construct a std::map<Key, Factory> and populate it accordingly. For example, the Key could be a std::string storing a filename extension, and the Factory could be a std::function<std::shared_ptr<ImageParser> ()>. Having written non-member functions std::shared_ptr<ImageParser> makePNGParser() and std::shared_ptr<ImageParser> makeJPEGParser(), I could say m["png"] = makePNGParser; m["jpeg"] = makeJPEGParser (or m.insert(std::make_pair("png", makePNGParser)) if I were so inclined). Later, given an extension, I could perform a lookup to determine whether the extension was supported, and if so, retrieve a factory function for a parser.

      However, when faced with this problem in the past, I've simply performed a sequence of regex_match()es, and called the proper loading function, without any factory functions being involved. I do use inheritance, in a highly structured form - always with non-virtual interfaces, always with shared_ptr<Base> - but only when the situation actually demands it - that is, when I need different types for something, but later I want to forget the specific types and treat them uniformly at runtime. (If I want to treat different types uniformly at compiletime, that calls for templates.)

    • C64

      This Boost.Filesystem library seems cool (especially the iterator tool), thanks for sharing.

      BTW: Does it work properly with Unicode characters? I see your code uses std::string, it would be good if we could safely use it with Unicode filenames. Or does Boost.Filesystem use UTF-8 as Unicode encoding instead of UTF-16, so std::string is OK in this case?

      Thanks for clarifying this.

       

    • Int64

      Something I just remembered - if you want a quick and easy way to start using boost, this site provides an installer for the headers and pre-built binaries: (VS2003-VS2010, 32-bit only)

      http://www.boostpro.com/download/

      , C64 wrote

      BTW: Does it work properly with Unicode characters? I see your code uses std::string, it would be good if we could safely use it with Unicode filenames. Or does Boost.Filesystem use UTF-8 as Unicode encoding instead of UTF-16, so std::string is OK in this case?

      Thanks for clarifying this.

      The path class supports multiple character types. So you can replace these:

      void free_function(std::string const& file_name);
      void free_function(std::wstring const& file_name);

      with this:

      void free_function(boost::filesystem::path const& file_name);

      One function instead of one for each string type you need to support.

      http://www.boost.org/doc/libs/1_46_1/libs/filesystem/v3/doc/reference.html#class-path

    • Iva

      Stephen again tnx for the answers.
      IMHO good topic for next video could be regex>. Why? Because I have some CS background and I was surprised that VS had regex bugs- I always thought that regex machinery is some DFA/NFA generator that it is hard to do, but when you do it then it is hard to miss bugs.
      Also if you will still be making videos in the future the day that VS 2012 goes public beta you could make a video detailing some cool new feature that was NDA just 1 day ago. :)
      Also more boost would be great.

    • Ivan

      ups, post above was me, sorry for the possible confusion.

    • Deraynger

      @STL: Ok, will definitely look into it. I still don't fully get the benefit from the NVI pattern. Gotta re-look it up in the Effective/More Effective C++ book.

      Edit: Looked it up, but if I have a one step method (no setup etc. needed, then a public virtual function should be fine without NVI).

      Edit 2: After looking at the Template Method Pattern, I think I understand what you mean. If I'm not mistaken, then the Base class(using NVI, possibly having an abstract member function) could be acting instead of a seperate factory class having to return an instance. The Base class would then know load the item itself (possibly having a overriden loader method in a derived class). I'll have to try and let you know if I don't get further Smiley  

      I see the benefit of the map, providing better code-reuse, without having to modify several code-sections. Runtime lookup satisfies my needs completely.

      What I also don't understand is the shared_ptr<Base>, is that an aggregation, and not an inheritance?

    • Mr Crash

      Looking at this video when you're so stressed is painful !
      I get stressed just by watching you Perplexed

      I agree with other viewers, STL needs more studio time.
      Good c++ videos a rare on c9 so you should give them more time. Much much more time Smiley

      I took a look at scope_exit.hpp, oh the horror of macro's, simply unreadable. This can be done so much better with c++0x.

      scope_exit([&hAlg]{
      etc..
      });


      scope_exit.hpp is a perfect example of why i don't use boost. That macro fetish should not be allowed in boost at all.


      @STL: That trick in 'hexify'. Is it only good for when you want to write something out fast (quick and dirty) ?

    • Will Watts

      An excellent lecture.

      Your code calls the (useful) initial_path() function... which is, annoyingly, deprecated - see
      :http://www.boost.org/doc/libs/1_46_1/libs/filesystem/v3/doc/deprecated.html

      More Boost in future lectures would be most welcome: there are so many good things in there, but it is hard to get to know about them.

    • Mark

      I hate to say this, but I didn't particularly enjoy this video at all. When it's titled "Advanced STL", then I expect to see, well, things relating to the STL. If you want to teach me about hashing and about Boost, then title the video appropriately.

    • STL

      Int64> The path class supports multiple character types.

      Yes - and Filesystem V3 (now the default) does this better than V2 did.

      Ivan> I always thought that regex machinery is some DFA/NFA generator that it is hard to do, but when you do it then it is hard to miss bugs.

      Actually, regex's implementation is exceedingly tricky, and bugs in it usually aren't obvious precisely because it's building and running a complicated NFA.

      Deraynger> What I also don't understand is the shared_ptr<Base>, is that an aggregation, and not an inheritance?

      shared_ptr<Derived> is convertible to shared_ptr<Base>. This allows shared_ptr to be used with inheritance hierarchies.

      Mr Crash> I took a look at scope_exit.hpp, oh the horror of macro's, simply unreadable.

      Are you aware that our STL implementation contains unreadable, horrifying macros? And yet, it gets the job done.

      Mr Crash> That trick in 'hexify'. Is it only good for when you want to write something out fast (quick and dirty) ?

      There's nothing wrong with it, it's just unusual - many people haven't seen indexing into a string literal like that before.

      Will Watts> Your code calls the (useful) initial_path() function... which is, annoyingly, deprecated

      Ah, I didn't know that. Calling current_path() at the beginning of main() and saving it somewhere is a reasonable thing to do.

      Mark> When it's titled "Advanced STL", then I expect to see, well, things relating to the STL.

      Noted. A few points (in order of decreasing hilarity):

      1. I prefer the following explanation: that the "STL" in the series title refers to myself. :-> (This was Isaac Asimov's reply to people criticizing the title of The Intelligent Man's Guide To Science.)

      2. Boost.Filesystem has been proposed for TR2 (in fact, it was the only proposal before TR2 was put into cryostasis due to C++0x running late), and I consider it highly likely that it'll make its way into the Standard beyond C++0x. So I claim, with some justification, that this really is about the STL - just the STL of the future.

      3. Boost is very much related to the STL, both in interface and in authorship. Even if, say, boost::bimap never makes it into the Standard, its similarity to std::map is undeniable, and it definitely solves problems that the STL alone cannot solve nearly as easily. As I've been saying since Intro Part 1, the STL is a library, not a framework, and it's supposed to be used alongside other libraries. More than anything else, Boost plays nice with the STL by design.

      Before VC9 SP1 with TR1, Boost would have had a much larger starring role in my videos - I covered shared_ptr way back in Intro Part 3.

    • Deraynger

      Oh, makes sense, was thinking too far ahead of myself Smiley

      STL>shared_ptr<Derived> is convertible to shared_ptr<Base>. This allows shared_ptr to be used with inheritance hierarchies.

    • Philhippus

      Surprised no one has mentioned the rather elitist use of the forbidden goto command in that CNG code. I took a sharp intake of breath on seeing that.

      I agree with the posts above about the time constraint sucking somewhat. STL having to blast through some convoluted programming at the speed of a syntax parser is hard to follow.

    • Ivan

      @STL
      "Actually, regex's implementation is exceedingly tricky, and bugs in it usually aren't obvious precisely because it's building and running a complicated NFA."
      If you wrote any part of it it would be interesting for CompSci nerds like me it would be an interesting video.
      BTW do you share the code between C# regex and c++ regex? It would seem like a reasonable thing to do, unless the have different expression power.
      "This is actually super easy and elegant. Use multimap::equal_range()."
      I meant like in for_each_equal_range, I presume it should be something like (while range.first!=range.second) but I hoped that there is some quick way of doing it.

    • Mr Crash

      , STL wrote

      Are you aware that our STL implementation contains unreadable, horrifying macros? And yet, it gets the job done.



      I know you use macros as workaround for N-arguments.
      I haven't seen anything like what i saw in scope_exit.hpp thou. I have only seen small marcos for debugging.

      Can you give an example ?

      What i didnt like in scope_exit.hpp is that the starting / entry code (BOOST_SCOPE_EXIT, BOOST_SCOPE_EXIT_END) was macrofied, having marcos inside functions doesn't bother me because they are contained / simple and no macro on macro action.

      // macro on macro action example
      #define macro1(...) stuff
      
      #define macro2a(...)  macro1(..)
      #define macro2b(...)  macro1(..)
      #define macro2c(...)  macro1(..)
      
      #define macro3(...)  macro2a(..)
      
      #define macro4(...)  macro2b(..)
      
      #define macro(...)  macro4( macro2a() macro2b, macro4, macro3), macro2c(...)
      // Now also imagine these macros are not defined one after another and instead are being defined all over the file.
      // Oh the happy code jumping, prepare for confusion, yay!
      // Now  your job is to unwrap 'macro', and understand how it works and what it does, good luck !  
       

    • STL

      Ivan> If you wrote any part of it it would be interesting for CompSci nerds like me it would be an interesting video.

      Like the rest of the STL, I didn't write <regex>. The two major exceptions are make_shared<T>() in VC10 and uniform_int_distribution in VC11.

      Ivan> BTW do you share the code between C# regex and c++ regex?

      Nope.

      Ivan> I meant like in for_each_equal_range

      Oh, there's no dedicated function for that, but it's easy enough to write:

      C:\Temp>type meow.cpp
      #include <iostream>
      #include <map>
      #include <ostream>
      #include <string>
      #include <utility>
      using namespace std;
      
      int main() {
          multimap<int, string> mm;
      
          mm.insert(make_pair(3, "cat"));
          mm.insert(make_pair(7, "hamster"));
          mm.insert(make_pair(3, "dog"));
          mm.insert(make_pair(5, "eagle"));
          mm.insert(make_pair(3, "bat"));
          mm.insert(make_pair(7, "gorilla"));
          mm.insert(make_pair(3, "ape"));
      
          for (auto i = mm.begin(), k = mm.begin(), end = mm.end(); i != end; i = k) {
              for ( ; k != end && i->first == k->first; ++k) {
                  cout << "(" << k->first << ", " << k->second << ") ";
              }
      
              cout << endl;
          }
      }
      
      C:\Temp>cl /EHsc /nologo /W4 /MTd meow.cpp
      meow.cpp
      
      C:\Temp>meow
      (3, cat) (3, dog) (3, bat) (3, ape)
      (5, eagle)
      (7, hamster) (7, gorilla)

    • Ivan

      Wow, very nice of you Stephen to take the time to write the example. Tnx. To bad that this isnt SO so only a few people will see it.
      BTW I have a small suggestion. I presume you are under constant deadlines so for next videos if it is easier you should focus on your STL stuff instead on the nonSTL stuff because you can give lecture on the STL in the middle of the night, while for writing boost examples it takes time. Again I like boost, but I would hate it if you would abandon the lectures because you dont have enough time to do them.

    • Deraynger

      >Ivan: Again I like boost, but I would hate it if you would abandon the lectures because you dont have enough time to do them.

      I second that!

    • Deraynger

      ( Remove )

    • KerrekSB
      Here's an idea for the next episode: multithreading with <thread>, synchronised data structures best practices etc. Optional extra: What's the use for <atomic>?
    • new2STL

      @KerrekSB: For talk about std::thread Stephan must change the compiler to gcc 4.6 or use the paid implementation from just::thread, or the boost::thread (that is close enough to the std:Smiley

      Now I can't remember if it is Sutter or Meyers that will do a talk about that topic on next meeting in CPP&B. Keep tracking in theirs blogs for video and slides.

      Plus Sutter is doing a 2nd round interview: "The followup interview on Channel 9 has been scheduled, and will be shot on Thursday, June 2."

    • KerrekSB
      @new2STL: Oh OK, I didn't realize VS10 doesn't have std::thread yet. (I'm using GCC4.4 myself.) No worries then. Alternative topic suggestion: Allocators, and whether it makes sense two write those yourself.
    • KerrekSB

      Re topic suggestion, here's something that's definitely STL and probably qualifies as "advanced". For quite some time I've been looking for a completely generic way to print out STL containers, in a way that fits into the usual stream operation idiom and handles nested containers. After bringing up the problem on StackOverflow and then on Channel 9 here, we finally seem to have pieced together a complete solution.

      The original question is at StackOverflow, where Marcelo Cantos provided the first step. Later, Sven Groot improved on that in the Channel 9 Forum. Finally, I added a SFINAE-based type trait class to detect containers and remove the need to mention any explicit container classes.

      I posted the final header and a working example on StackOverflow. I think this might be of general interest, as it allows you to output any sort of container type you may have defined without repeatedly writing either your own operator, or a for loop, or even a foreach construct with a lambda over and over again.

      Possible ideas for extending the construction: 1) implement output for std::tuple in the same way as is done for std::pair, 2) add a C-array wrapper class to make C arrays amenable to printing through the same mechanism.

    • Ivan

      @KarrekSB
      "Alternative topic suggestion: Allocators, and whether it makes sense two write those yourself. "
      +1 I'm kind of unhappy that there isnt a .reserve for map set and "multi" versions of those containers... and always wondered is there a way to do something that will allocate continuous memory space for map. I know that map isnt random acces, but the idea is data locality. Imagine having first 3-4 levels of RB tree loaded into cache when you load the root note. :)

    • KerrekSB
      @Ivan: There's also the fact that in the STL you can only specify one single allocator, but that allocator gets used both for the actual contained objects and also for the internal bookkeeping data in an opaque way that you cannot control. Well, I don't really know how much one can say about them, but I thought they're one part of the STL that's sort of always there but never really prominent, so it'd be cool to hear about it..
    • CornedBee

      Nice video.

      I was sad to miss you at BoostCon this year. Hope you'll find time to come next year.

      Sebastian

    • foke

      @STL: Very nice videoS!

      Very interesting subjects, extremely well explained, and, for some reason, your accent is perfectly understandable to my (non-english) ears :)

      Btw I'm also interested in more boost videos, but maybe more focused on what's happening under the hood, not things we can read in the documentation.
      C++0x is fine too.

    • XTF

      @STL: Are you aware of Step Into Specific? It's a context menu option that allows you to avoid repeatedly stepping into and out of functions until you hit the right one.

      hexify's parameter type is const vector<unsigned char>&. boost::iterator_range<unsigned char> might've been better. STL lacks a ptr pair wrapper (for contiguous memory).

       > hash_file.right.find(f) != hash_file.right.end()

      Can't you use hash_file.right.count(f) here?

      > const bimap<string, string>::left_const_iterator j = hash_file.left.find(h);

      Can't you just insert and check the return value to avoid a second lookup?

       

      Before you said vector<unsigned char> should be used for buffers. However, it initializes the buffer to 0, which is unnecessary overhead. Isn't there a better solution?

       

      Greetings,

       

      Olaf

    • Deraynger

      ( Nevermind, see STLs answer Tongue Out )

    • STL

      Ivan> you can give lecture on the STL in the middle of the night, while for writing boost examples it takes time.

      Thanks - but the real answer is that everything takes time.

      KerrekSB> Here's an idea for the next episode: multithreading with <thread>, synchronised data structures best practices etc. Optional extra: What's the use for <atomic>?

      I can't show you VC11 yet, sorry.

      (I've already resolved bugs on Connect saying that we've implemented <thread>/etc. in VC11, so I can reveal that much.)

      KerrekSB> Alternative topic suggestion: Allocators, and whether it makes sense two write those yourself.

      C++0x/VC11 has significantly reworked allocators, so I'd prefer to wait for that.

      KerrekSB> For quite some time I've been looking for a completely generic way to print out STL containers

      That's a great idea - I think I'll do Part 6 about this. I took a different approach, and my machinery is capable of printing out stuff like this:

      [[4]<"khan"'"of"'"the"'"wrath">, [3]<"home"'"the"'"voyage">, [3]<"country"'"the"'"undiscovered">]

      Ivan> I'm kind of unhappy that there isnt a .reserve for map set and "multi" versions of those containers...

      They're node-based. reserve() doesn't make sense for them.

      Ivan> and always wondered is there a way to do something that will allocate continuous memory space for map.

      You can write a pool-based allocator. Note that Windows' Low Fragmentation Heap, which is used by default (I'm glossing over subtleties here), is node-friendly.

      KerrekSB> There's also the fact that in the STL you can only specify one single allocator, but that allocator gets used both for the actual contained objects and also for the internal bookkeeping data in an opaque way that you cannot control.

      Well, we need to allocate our internal helper objects, and you REALLY want us to do that with your (rebound) allocator, instead of std::allocator. (This is actually how I broke the compiler immediately after joining VC.) This inherently requires the whole rebinding scheme.

      CornedBee> I was sad to miss you at BoostCon this year. Hope you'll find time to come next year.

      Yeah, I was insanely busy this year. I do hope to attend next year.

      Sebastian> for some reason, your accent is perfectly understandable to my (non-english) ears Smiley

      I grew up in Colorado, so I have the "standard Midwestern" accent. Very rarely, I'll trip over an R if I'm not paying attention (I think I got that from my father).

      XTF> Are you aware of Step Into Specific?

      No - I put all my skill points into C++, not IDEs. I'll have to look into that.

      XTF> hexify's parameter type is const vector<unsigned char>&. boost::iterator_range<unsigned char> might've been better.

      That would be an improvement (I haven't used iterator_range yet).

      XTF> Can't you use hash_file.right.count(f) here?

      count() is less efficient for multi-containers. I always use find().

      XTF> Can't you just insert and check the return value to avoid a second lookup?

      I believe so, yes. I would always do that for map, but this was my first time using bimap, and I forgot to consider that.

      XTF> Before you said vector<unsigned char> should be used for buffers. However, it initializes the buffer to 0, which is unnecessary overhead. Isn't there a better solution?

      You could write a simple class if performance really, really mattered.

    • Ivan

      Stephen very nice to hear you ans, I was worried that this is the end of the series. :) Also about
      "Ivan> I'm kind of unhappy that there isnt a .reserve for map set and "multi" versions of those containers...

      They're node-based. reserve() doesn't make sense for them."
      Can you explain why. The same problem that you have with vector you can have with them. 10k insertions in multimap will call mem allocation 10k times, right? 10k insetion into (empty vector) will call less than 30. I understand the difference- you never have to reallocate existing part of the RBTree, but still I presume that people would be happy with avoiding a bunch of calls to malloc, data locality(and by that I mean it in a cool way(one level n node, and all his children in one block, not a couple level n nodes in one memory block, like here(figure 6-please don't ignore the picture because the smart-a** tone of the article :) ): http://queue.acm.org/detail.cfm?id=1814327 ).
      So when VC11 pops out lets hope that you will do a allocator video. :)

    • XTF

      XTF> hexify's parameter type is const vector<unsigned char>&. boost::iterator_range<unsigned char> might've been better.

      That would be an improvement (I haven't used iterator_range yet).

      True, just wanted to point out that using const vector<unsigned char>& isn't very generic. And that I miss a ptr pair wrapper in the STL. Wink

      XTF> Can't you use hash_file.right.count(f) here?

      count() is less efficient for multi-containers. I always use find().

      I'm not familiar with bimap, but I assume it's not a multi-container.

      You could write a simple class if performance really, really mattered.

      I like the you don't pay for what you don't use principle in C++, but using vectors for buffers violates that. I'd also prefer not to write my own class for such basic functionality.

       

    • XTF

      They're node-based. reserve() doesn't make sense for them."

      Can you explain why.

      The allocator could allocate a pool and serve requests from that to avoid calling malloc.

    • STL

      Ivan> Stephen very nice to hear you ans, I was worried that this is the end of the series.

      I was on vacation the week of Memorial Day.

      Ivan> 10k insertions in multimap will call mem allocation 10k times, right?

      Yes, and that's fundamental. The node-based containers (list/forward_list, map/set, and their multi/unordered variants) allow any node to be erased at any time. Therefore, they need a separate allocation for each node. In contrast, if you erase a vector's element, it'll move any remaining elements down to fill the hole, and keep the extra capacity in reserve.

      XTF> I'm not familiar with bimap, but I assume it's not a multi-container.

      No, but following the convention means that I don't have to be extra careful when using multi-containers.

    • KerrekSB

      KerrekSB>> For quite some time I've been looking for a completely generic way to print out STL containers

      STL> That's a great idea - I think I'll do Part 6 about this. I took a different approach, and my machinery is capable of printing out stuff like this:

      [[4]<"khan"'"of"'"the"'"wrath">, [3]<"home"'"the"'"voyage">, [3]<"country"'"the"'"undiscovered">]

      That's great, I look forward to the episode! In the meantime, I've put our pretty-printer on GitHub, here's the project website. There's a type-erasing helper class for convenient one-off delimiter overriding, too.

      KerrekSB>> There's also the fact that in the STL you can only specify one single allocator, but that allocator gets used both for the actual contained objects and also for the internal bookkeeping data in an opaque way that you cannot control.

      STL> Well, we need to allocate our internal helper objects, and you REALLY want us to do that with your (rebound) allocator, instead of std::allocator. (This is actually how I broke the compiler immediately after joining VC.) This inherently requires the whole rebinding scheme.

      Are you saying that it just can't be made to work or doesn't make sense to permit two separate allocators, one for objects and one for bookkeeping, like:

      template <typename T, typename ObjAlloc, typename InternalAlloc = ObjAlloc> class Container;

      Or would it suffice if I overloaded T::operator new() so that vector<T> would use std::alloc for bookkeeping but my own new for T?

    • Deraynger

      I am wondering, how I can force VS2010 and XCode to make me include the headers, although another included header already contains it.

      STL also prefers to include all, and I myself don't always know if I'm using something part of the included header (e.g. A) or a header (B) included in the header (A).

    • STL

      KerrekSB> There's a type-erasing helper class for convenient one-off delimiter overriding, too.

      I consider type erasure to be unnecessary here, so it'll be interesting to compare our solutions. :->

      I took your is_container that looks for const_iterator - I tried to write my own that detected begin(), but it snapped VC's mind like a twig.

      KerrekSB> Are you saying that it just can't be made to work or doesn't make sense to permit two separate allocators, one for objects and one for bookkeeping

      It doesn't make sense. Why would you want to treat them differently?

      The node-based containers don't even allocate individual elements. They allocate nodes that contain individual elements.

      Deraynger> I am wondering, how I can force VS2010 and XCode to make me include the headers, although another included header already contains it.

      I'm not aware of any automated way to verify this. Having one would be nice.

    • STL

      Part 6 filming is scheduled for Friday, July 1.

    • Deraynger

      @STL: Cool, thanks for the info!

    • KerrekSB
      @STL: Great, looking forward to your solution! By the way, I also thought at first I should look for the existence of begin/end member functions, but I found the check for the iterator type sufficient in the end. One could have used expression SFINAE with pointer-to-member-function type for that, but I didn't see the need for it in the end.
    • Jonas

      @STL: Any change you can mention some info on how to do fuzzy grouping ?
      This example data:
      asd111bla
      asd22bla
      asd322bla
      other stuff_1
      other 2 stuff
      other stuff_4

      Will be grouped into this:
      Group 1:
      asd111bla
      asd22bla
      asd322bla

      Group 2:
      other stuff_1
      other 2 stuff
      other stuff_4

      I've been researching how to do this but i'm unsure how to achieve this.
      Comparing similarities of just two strings is relatively simple but the above is a problem.
      You can do it using the same technique as comparing two strings but that will be very slow.

      So can you show how to achieve this using the STL.

    • STL

      I filmed Part 6 today!

      Here's my Pretty Printer code: https://skydrive.live.com/redir.aspx?cid=e66e02dc83efb165&resid=E66E02DC83EFB165!292

      I won't answer questions about it until Part 6 is uploaded, though. There's no reason for me to explain stuff twice. :->

      Jonas: The "hard parts", which are independent of the STL, are determining the similarity between strings (e.g. with an edit distance algorithm), and determining when strings are dissimilar enough to begin a new group. The STL makes everything else easy.

      Here's an example where I'm judging similarity by common prefix length, and I consider strings with similarity < 2 to belong to different groups. You could imagine a more clever algorithm that, instead of using a fixed constant, adjusted it up or down to produce more or fewer groups, depending on the number of strings in the input.

      C:\Temp>type fuzzy.cpp
      #include <stddef.h>
      #include <algorithm>
      #include <iostream>
      #include <ostream>
      #include <string>
      #include <vector>
      using namespace std;
      
      size_t similarity(const string& a, const string& b) {
          const string& sm = a.size() < b.size() ? a : b;
          const string& lg = a.size() < b.size() ? b : a;
      
          return mismatch(sm.begin(), sm.end(), lg.begin()).first - sm.begin();
      }
      
      vector<vector<string>> fuzzy_grouping(const vector<string>& v) {
          vector<vector<string>> ret;
      
          for (auto i = v.begin(); i != v.end(); ++i) {
              if (i == v.begin() || similarity(i[-1], *i) < 2) {
                  ret.push_back(vector<string>());
              }
      
              ret.back().push_back(*i);
          }
      
          return ret;
      }
      
      int main() {
          vector<string> v;
      
          for (string s; getline(cin, s); ) {
              v.push_back(s);
          }
      
          vector<vector<string>> groups = fuzzy_grouping(v);
      
          for (auto i = groups.cbegin(); i != groups.cend(); ++i) {
              for (auto k = i->begin(); k != i->end(); ++k) {
                  cout << *k << " ";
              }
      
              cout << endl;
          }
      }
      
      C:\Temp>cl /EHsc /nologo /W4 fuzzy.cpp
      fuzzy.cpp
      
      C:\Temp>fuzzy
      cat
      catastrophe
      cataclysm
      catapult
      cave
      cavern
      dog
      doom
      doberman
      bear
      bearing
      bare
      barely
      ^Z
      cat catastrophe cataclysm catapult cave cavern
      dog doom doberman
      bear bearing
      bare barely

      I've made fuzzy_grouping() copy the strings because I'm lazy - you could easily have it return vector<size_t> with group lengths or something else.

    • KerrekSB
      When's the new episode coming online? Smiley Here's a tenuous suggestion -- "Extra Advanced STL"! Are you familiar with EASTL (https://github.com/paulhodge/EASTL) in any way? Why would someone need an alternative for the standard library, and why aren't the ideas of EASTL useful in the general standard? Is there anything a non-game programmer could learn from it?
    • Jonas

      @STL:
      That works surprisingly quite well when the data is sorted which makes it depend on the sorting algo for good groupings. That's sort of iffy.
      'mismatch' isn't even a longest common subsequence (LCS) or a pattern matching algorithm of any kind as far as i know.

      'mismatch' algorithm is a nice find though, didn't know of it.
      Do not quite understand it yet but i did just learn of it ;)

      I apologize for being unclear in my first message, as my defense i was very tired. Was up half the night coding on another thing.

      When i was talking about comparing strings for similarities i was talking about stuff like LCS (http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring) and pattern matching.

      What do you think of this, am i over-thinking it ?
      Can the stl help with this, as in an algorithm already exists that does this for me ? I can't find any thought.

    • Sago

      @KerrekSB patience dude, the infamous wmv encoder isn't known for it's speed (or efficiency for that matter)

      Also C++ videos isn't known to be prioritized on c9. They have disgustingly obvious favoritism issues here (c#, vb, wpf, silverlight, etc, languages and products owned by microsoft take priority / is more important)

      Btw have anyone else noticed that all these microsoft owned languages are much slower then the standard non microsoft owned languages ?
      Just compare vs2010 and vs2008, wpf did some nasty things to visual studio, a horrible mistake that should be reversed if you ask me but someones ego seems to be in the way for that to happen, sigh typical. No its not a money thing, moving to wpf was a big risk and risks cost much money when they fail period!

      anyway..

      That's why there's so few c++ videos on c9 although we were promised many c++ videos half a year ago.

      c9 have become an ad site for microsoft products ew!
      Where did the "for the developers" go ?
      Did popularity get to their head ?

      Oh please do prove me wrong but a simple search query doesnt lie.

      ah oh my who pushed the rant button.

      Mr Stephan T Lavavej keep up the good work without you there would be no reason to visit c9 anymore.

    • KerrekSB

      Mr Stephan T Lavavej keep up the good work without you there would be no reason to visit c9 anymore.

      Hah, for sure, thanks a lot for these videos -  great work, STL!

    • Charles

      @KerrekSB patience dude, the infamous wmv encoder isn't known for it's speed (or efficiency for that matter)

      Also C++ videos isn't known to be prioritized on c9. They have disgustingly obvious favoritism issues here (c#, vb, wpf, silverlight, etc, languages and products owned by microsoft take priority / is more important)

      Btw have anyone else noticed that all these microsoft owned languages are much slower then the standard non microsoft owned languages ?
      Just compare vs2010 and vs2008, wpf did some nasty things to visual studio, a horrible mistake that should be reversed if you ask me but someones ego seems to be in the way for that to happen, sigh typical. No its not a money thing, moving to wpf was a big risk and risks cost much money when they fail period!

      anyway..

      That's why there's so few c++ videos on c9 although we were promised many c++ videos half a year ago.

      c9 have become an ad site for microsoft products ew!
      Where did the "for the developers" go ?
      Did popularity get to their head ?

      Oh please do prove me wrong but a simple search query doesnt lie.

      ah oh my who pushed the rant button.

      Mr Stephan T Lavavej keep up the good work without you there would be no reason to visit c9 anymore.

      Whatever, dude.... -> Look here.

      C

    • Josh K

      @Charles lol you just got self owned by providing the c++ tag search link. "simple search query doesnt lie"

      Did you not read the post correctly ?

      Do you dismis and ignore the post because you are biased or just in a bad mood ?
      That post is feedback too. In a cranky form but it is still feedback.

      I'm conserned that you do not listen to your users.

      Sago sounds cranky but he do make good points. There are not many c++ videos made compared to other languages. There are c++ chitchat videos but they do not provide any useful information.

      We do need much more STL like videos. That provide useful information.

      The visual studio 2010 performace issues are also sadly true. I can mention it have memory issues too due to c# garage colletion.

    • Charles

      @Josh K: Self-owned? OK... I'm not in a bad mood. Here: Smiley

      If you look at the net new content over the past 6 months, I'd say that native content (C++) is increasing relative to the non-native stuff, but still not as frequent - I wasn't really arguing that. STL has a full time job as a C++ Library Developer - in fact, STL is the only STL-focused developer at Microsoft (meaning, he is the sole owner of the STL that ships with VC++...).

      Sorry we don't post one C++ video per day. I truly wish we could.

      C

    • STL

      KerrekSB> Are you familiar with EASTL (https://github.com/paulhodge/EASTL) in any way?

      I've looked at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html , and almost all of its concerns have either been fixed in C++0x or don't apply to VC ("The Microsoft C++ compiler does a rather good job of inlining").

      Jonas> That works surprisingly quite well when the data is sorted which makes it depend on the sorting algo for good groupings.

      It's looking at prefixes, so std::sort() would work just fine.

      Jonas> 'mismatch' algorithm is a nice find though, didn't know of it. Do not quite understand it yet but i did just learn of it Wink

      It just reports the first place where two sequences differ.

      Jonas> Can the stl help with this, as in an algorithm already exists that does this for me ?

      The STL does a lot of stuff, but it doesn't have every data structure and it especially doesn't have every algorithm that's ever been invented. It serves as an extremely useful foundation for implementing more complicated data structures and algorithms, though.

      I took the time to implement a more complicated fuzzy_grouping(). This one relies on levenshtein_distance(), which I've written (with Wikipedia's help) to accept arbitrary and possibly different forward iterators (this algorithm needs forward iterators, and can't accept input iterators). I've tested it with forward_list<char> and list<char>, for example.

      What it does is create a multiset of strings, ordered by decreasing size (so, largest first). It grabs the largest string from the multiset, and looks for "similar" strings in the rest of the set, where "similar" is defined as having a Levenshtein distance <= N/2, where N is the size of this "primary" string. The primary and all similar strings (possibly 0) are emitted as a group. I calculate the distance from the primary, to avoid the problem of grouping "cold cord card ward warm" together despite the first and last words having no letters in common.

      C:\Temp>type meow.cpp
      #include <stddef.h>
      #include <algorithm>
      #include <iterator>
      #include <numeric>
      #include <vector>
      using namespace std;
      
      // http://en.wikipedia.org/wiki/Levenshtein_distance
      template <typename FwdIt1, typename FwdIt2> size_t levenshtein_distance(
          FwdIt1 first1, FwdIt1 last1, FwdIt2 first2, FwdIt2 last2) {
      
          const size_t size1 = distance(first1, last1);
          const size_t size2 = distance(first2, last2);
      
          if (size1 > size2) {
              return levenshtein_distance(first2, last2, first1, last1);
          }
      
          vector<size_t> prev(size1 + 1);
          vector<size_t> curr(size1 + 1);
      
          iota(curr.begin(), curr.end(), 0);
      
          FwdIt2 iter2 = first2;
      
          for (size_t j = 0; j < size2; ++j, ++iter2) {
              prev.swap(curr);
      
              curr[0] = j + 1;
      
              FwdIt1 iter1 = first1;
      
              for (size_t i = 0; i < size1; ++i, ++iter1) {
                  if (*iter1 == *iter2) {
                      curr[i + 1] = prev[i];
                  } else {
                      curr[i + 1] = min(curr[i], min(prev[i + 1], prev[i])) + 1;
                  }
              }
          }
      
          return curr.back();
      }
      
      #include <algorithm>
      #include <iostream>
      #include <ostream>
      #include <regex>
      #include <set>
      #include <string>
      #include <vector>
      using namespace std;
      
      struct size_greater {
          template <typename T> bool operator()(const T& a, const T& b) const {
              return a.size() > b.size();
          }
      };
      
      vector<vector<string>> fuzzy_grouping(const vector<string>& v) {
          multiset<string, size_greater> s(v.begin(), v.end());
      
          vector<vector<string>> ret;
      
          while (!s.empty()) {
              string primary = *s.begin();
              s.erase(s.begin());
      
              ret.push_back(vector<string>());
              ret.back().push_back(primary);
      
              const size_t similar = max(static_cast<size_t>(1), primary.size() / 2);
      
              for (auto i = s.begin(); i != s.end(); ) {
                  if (levenshtein_distance(primary.begin(), primary.end(), i->begin(), i->end()) <= similar) {
                      ret.back().push_back(*i);
                      s.erase(i++);
                  } else {
                      ++i;
                  }
              }
          }
      
          return ret;
      }
      
      int main() {
          const regex r("\\w+");
      
          vector<string> v;
      
          for (string s; getline(cin, s); ) {
              v.insert(v.end(), sregex_token_iterator(s.begin(), s.end(), r), sregex_token_iterator());
          }
      
          vector<vector<string>> groups = fuzzy_grouping(v);
      
          for (auto i = groups.cbegin(); i != groups.cend(); ++i) {
              for (auto k = i->begin(); k != i->end(); ++k) {
                  cout << *k << " ";
              }
      
              cout << endl;
          }
      }
      
      C:\Temp>cl /EHsc /nologo /W4 /MTd meow.cpp
      meow.cpp
      
      C:\Temp>meow
      I call our world Flatland, not because we call it so, but to make its
      nature clearer to you, my happy readers, who are privileged to live in
      Space.
      Imagine a vast sheet of paper on which straight Lines, Triangles,
      Squares, Pentagons, Hexagons, and other figures, instead of remaining
      fixed in their places, move freely about, on or in the surface, but
      without the power of rising above or sinking below it, very much like
      shadows--only hard with luminous edges--and you will then have a pretty
      correct notion of my country and countrymen.  Alas, a few years ago, I
      should have said "my universe:"  but now my mind has been opened to
      higher views of things.
      ^Z
      privileged
      countrymen country
      Triangles
      Pentagons Hexagons
      remaining rising
      Flatland
      straight
      luminous
      universe
      because
      clearer
      readers years
      Imagine
      Squares
      figures
      instead
      surface
      without with
      sinking
      shadows
      correct
      nature are
      places Space paper Alas
      freely pretty
      notion not
      should world
      opened
      higher other
      things Lines then
      happy
      sheet
      which
      fixed
      their the the
      about above but but but
      power
      below
      edges
      views
      call call will
      make move like have have
      live
      vast has
      very
      much
      only on on
      hard said and and and
      mind in in in
      been
      our or or
      its it it
      you you
      who
      few
      ago
      now
      we
      so to to to to
      my my my my
      of of of of of
      I a a a I
      

      Again, I make no claims that this is an especially intelligent algorithm - it's just an example of what you can write with the STL.

    • Deraynger

      A few newbie questions (while waiting for part 6) regarding namespaces etc.:

      1. In the .cpp file, what do you consider best (or what's correct)?

      • Use a using namespace MyNamespace;
      • Surround it with the same namespace i.e. namespace MyNamespace { <All functions> }
      • Use the namespace like void MyNamespace::MyClass::Do() { <code> }

      2. If I have a Class in src/Utils/MyClass.h, and it relies on src/Utils/Detail/OtherClass.h, should I

      •  add the whole namespace i.e. Utils::MyClass::DoSth() { Utils::Detail::OtherClass }
      • or just the deeper data, i.e. Utils::MyClass::DoSth() { Detail::OtherClass
      • and should I (again) use a using like using Utils::Detail::OtherClass  (or using Detail::OtherClass based on the above point)?
      • or rather of the namespace: using namespace Utils::Detail;

      3. Do you add the whole namespace when calling something from another structure?

      • Like src/Helpers?
      • and in the same structure?

      4. Should I include from the base path (like boost does it)?

      • #include "Utils/Detail/OtherClass.h"
      • or only lower structure, if I'm calling it within Utils then just #include "Detail/OtherClass.h"

      5. What's if it's lower, i.e. from Details back to some other class, e.g. Helpers, what do you recommend?

      • #include "Helpers/SomeHelper.h"
      • #include "../../Helpers/SomeHelper.h"

      I know about using a shorter name for long namespaces (not sure of the syntax): using Det = Utils::Details; or something like that, but am not such a fan of it.

      If I could decide, I'd choose, deeper Namespaces don't need to be used with the whole namespace, just the deeper part, e.g. Detail::OtherClass if calling from Utils to Utils/Detail. Higher ones would need the whole one (but that makes it less readable IMO).

      In boost, the only thing I found in a source file was a huge list of using Full::Namespace::SomeClass;

    • Josh K

      @Charles: I know STL is very busy. I said "STL _like_ videos". You got to have more pepole then just STL that code in c++ and have good information and experience that we can learn from, right ?

      "native content (C++) is increasing relative to the non-native stuff"
      Most if not all of them are chitchat videos. If you're lucky only 5 minutes of a 60 minutes chitchat video have good information.

    • Charles

      @Josh K: You're asking for more tutorial-lecture-demonstration type native content. Got it....

      C

      PS: Yes, there are obviously many C++ devs capable of doing this... That alone doesn't equate to the real potential for more STL-like content. (not everybody wants to lecture, or be on video, or do screencasts or even blog). I will try and get more folks to do something on C9 in a tutorial style for native developers.

    • KerrekSB
      Charles, first off many thanks for all the work you put into this website - that's really cool. Second, don't feel compelled to force anything, if it doesn't come naturally, nobody needs to be on screen just for the sake of it. That said, it's a genuine pleasure listening to anything STL or Herb Sutter have to say (his last video Q&A on C++0x was actually pretty substantial!); I guess the pleasure comes from hearing someone who has a deep understanding of the matter and a clear way of thinking about it (e.g. any interview or talk by Bruce Schneier is equally enjoyable to watch).

      If you have more people like that who would bring that clarity and insight to the screen, then by all means try and record them! Smiley I'm sure that substantial content on any sort of programming (but native especially) would always be appreciated!

      How about a series on parallelism, for example? Smiley

    • STL

      Deraynger:

      #1: I've traditionally defined member functions at global scope with fully qualified names (MyNamespace::MyClass::MyMember). However, this is moderately verbose, especially if return types need to be qualified. (To the right of the parenthesis in ReturnType MyNamespace::MyClass::MyMember(STUFF, you can use names from MyNamespace without qualifying them as such, because the compiler has already seen you mention MyNamespace. This is not true for the ReturnType.)

      I might switch to defining member functions in a reopened namespace MyNamespace { }.

      A using-directive (that's what "using namespace MyNamespace;" is called) would feel weird for the purposes of defining member functions.

      #2: Only Detail::OtherClass is necessary. Within Utils::MyClass::DoSth(), unqualified name lookup (to find "Detail") searches "inside out", and will search Utils (which has the Detail you want) before reaching into the global namespace (which is where you don't want to go).

      You can use a using-directive (using namespace Detail;) or using-declaration (using Detail::OtherClass;) if you like.

      However, remember that using-directives and using-declarations should NEVER appear at global scope in headers. That is enormously polluting and defeats the whole point of namespaces.

      #3: I don't understand this question. As a general rule, I avoid writing unnecessary or redundant code (with very specific exceptions), and that includes unnecessary qualification.

      #4: It depends - something like Boost is a very special and complex case.

      If I were maintaining something in Boost, I'd include "my own stuff" with "relative paths", so boost/kittens/cat.hpp would include boost/kittens/detail/feedme.hpp with #include "detail/feedme.hpp". cat.hpp's own directory is first in the search order, so this should preserve everything if the whole tree is moved to boost/cute_fluffy_kittens/... I'd include "other stuff" with "base paths", like #include "boost/tuple/tuple.hpp", again to make myself immune to my own directory being moved.

      #5: I would avoid ".." if possible.

      I may have gotten something wrong with this include path stuff - it's been a long time since I've worked with deeply nested directories. The STL has an extremely flat directory structure, and my projects at home are the same way (and smaller).

      > I know about using a shorter name for long namespaces (not sure of the syntax):
      > using Det = Utils::Details; or something like that, but am not such a fan of it.

      They're called namespace aliases: "namespace MSKittens = Microsoft::Cute::Fluffy::Kittens;"

      I have used them for extremely verbose nested namespaces. My idea of a long namespace name is greater than 5 characters.

    • Deraynger

      STL:

      Thanks a lot for taking your time to answer my questions.

      Regarding #1, so it's fine then that I do namespace kittens { <functions> } in the source file?

      Regarding #2, not declare using directives/declarations in headers, I know that from your first STL series Smiley

      Regarding #3, you basically answered it with boost. What I meant was, I have boost/kittens/detail/feedme.hpp and need to reference something, say boost/kittens/cat.h, e.g. a friend. In feedme.hpp would you do: friend class boost::kittens::cat;? Second part of it was, if it's refering to something in boost/helpers/stl_utils.hpp, would you use the fully qualified name (which you would, based on your answer to #4).

       

      New questions (while still waiting Tongue Out). This always bothers me, since I don't know these licensing details (maybe you might).

      Are we allowed to use your code legally for ourselves, by just downloading it, and should or may we add the following 2 lines?

      // Copyright (c) Stephan T. Lavavej 2011. All rights reserved.

      // Copyright (c) Dopefish 2011. All rights reserved.

      Can we also just add our copyright (although I would probably add both, so I know where I got it from)?

      I got some files from someone online, he said I could use it how I want, commercially/modified etc., does that mean I can also change the copyright notice (would still keep his name in the file, where it still somehow represents the same class/file)?

      What happens if I retype a code from someone, by looking at it, and modifying it. Do I still need to include his copyright notice? What's if the code is (maybe partially) on a blog. Can I then just use it without hist copyright notice?

    • Ivan

      @ Charles & STL
      I agree with Josh K, although STL is pretty rare(person that likes his job so much that he is willing to spend time to explain his work in a lecture) I'm sure that you could find some other people like(or similar to) STL. For me I would like(ofc desires of one person are irrelevant but this is an example) a lecture on something that Herb mentioned: C++ Next or C++ Prim-language isomorphic to C++ but much easier to parse(=better tools, better compile speed, maybe even speed if it could make it easier to compiler to prove some things so that it can perform optimizations). I know that this isnt in development(at least not public) but IMHO people who understand it like Herb or C++ compiler people could do a lecture on it with some basic examples with like 2-3 hours of preparations. There is a bunch of stuff that you could ask them.
      @ STL
      Speaking of the compiler people :) :
      1. Why dont you bother them to make you(and all other lib writers ofc) a _Ugly to normal compiler(you use normal naming convention to write libraries but when you want to ship the product you call makemeugly.exe on source code).
      2. about namespaces and headers and poluting - why dont you ask (nicely :) ) compiler people to do the similar thing like i proposed in 1.
      #local_using - that applies to the current file only.
      That means that when they compile code to the "real" file they would remove #local_using namespace std; line and replace all vector with std::vector, swap with std::swap and so on...

      P.S. I know that I might sound ungrateful because all this is free and STL lectures are awesome and I apologize for that, but it is like Stephen wrote on the whiteboard- there are no books. So I'm now hooked on STL lectures and I would like more of the same drug.

    • STL

      Deraynger> Regarding #1, so it's fine then that I do namespace kittens { <functions> } in the source file?

      Yes, that's fine.

      Deraynger> In feedme.hpp would you do: friend class boost::kittens::cat;?

      If the class granting friendship is boost::kittens::detail::can_opener, I'd say friend class cat; because unqualified name lookup is inside-out.

      Deraynger> Are we allowed to use your code legally for ourselves, by just downloading it, and should or may we add the following 2 lines?

      I cannot answer legal questions.

      Ivan> 1. Why dont you bother them to make you(and all other lib writers ofc) a _Ugly to normal compiler(you use normal naming convention to write libraries but when you want to ship the product you call makemeugly.exe on source code).

      Internal names need to be _Ugly, but external names need to be pretty. It's difficult to decide programmatically which is which, especially given our convention of having external algorithms like count_if() call internal helpers like _Count_if().

      Ivan> 2. about namespaces and headers and poluting - why dont you ask (nicely Smiley ) compiler people to do the similar thing like i proposed in 1. #local_using - that applies to the current file only.

      As a general rule, I am opposed to non-Standard extensions.

    • Charles

      @KerrekSB: Did you watch the C++ AMP presentations? I will be doing some more interviews with the PPL people and related tooling. Native parallelism is indeed an interesting topic. I've been wanting to visit the C++ compiler people, but alas they are rather swamped building the next version of the compilers. I would like to spend some time with the VC IDE people, but alas they are swamped building the next version (and like everything else around here -> we can't talk about it until it's ready to be talked about). My hope is that by BUILD, we'll have a lot to share for native devs. Let's see....

      C

    • Ivan

      @STL
      one quick one regarding the standard: why doesnt the standard define that compiler/linker should be able to figure that it included the same file already? include guards smell like 80s spirit :)
      @Charles
      I know that my every comment is something else on my wishlist, but I noticed that it looks like that Herb has a secret crush on Clojure. Could you bug him to make a short video explanation why. Ofc I might be totally wrong about Herbs feelings about Clojure.

    • STL

      Ivan> why doesnt the standard define that compiler/linker should be able to figure that it included the same file already?

      Sometimes you do want to be able to include a subheader repeatedly (like VC10's <xxshared>, already deleted in VC11).  It would be trivial to Standardize #pragma once with a different name like #once.  At the cost of breaking backwards compatibility, idempotency could be made the default, with repeated inclusion enabled by #many or whatever.  (Migration could be achieved with compiler/preprocessor switches.)

      I suspect that nobody on the Committee considered it an urgent enough problem to fix - expert attention is a finite resource.

    • Charles

      @Ivan: I know nothing of Herb's feelings with respect to Clojure. He may like it (many people who like Lisps do), but I don't know what makes you think he has a crush on the language. Next time I speak with him, I'll ask.
      C

    • spons

      @STL
      Good to know variadic templates have been implemented Wink.

    • Ivan

      @ Charles :
      I was overstating it a bit, but that is the feeling I got from one interview...I remember Herb mentioning that Clojure does some things in an interesting way, or something like that. Afain I know that Herb and STL time is scarce and precious so I would be happy with ANY deep content(for example I didnt really like AMP presentation because it was very high level overview-that is important but I prefer hard core deep stuff like C++0x design decisions and Advanced STL series..) from Herb
      @ STL this might be a stupid question but why would you want to "include subheader repeatedly"
      If this is a stupid question skip it. :) Its just that I never heard something like that. I mean I know that you can include header 100x, that is why include guards exist, but I got a feeling that you are talking about something else.

    • Charles

      Here is part 6, driven by part 5 and Niners! Thank you, STL. You are a scholar and a gentleman Smiley

      http://channel9.msdn.com/Shows/Going+Deep/C9-Lectures-Stephan-T-Lavavej-Advanced-STL-6-of-n

      C

       

    • SMD

      Couldn't you use std::shared_ptr with a custom deleter instead of boost::scoped_exit?

    • Michael J

      Hi Stephen

      A potential bug. Your code seems to assume that sha256 never has collisions. While collisions are rare, they do happen. If you have two different files generate the same hash, you will lose one of them.

    Comments closed

    Comments have been closed since this content was published more than 30 days ago, but if you'd like to continue the conversation, please create a new thread in our Forums, or Contact Us and let us know.