Entries:
Comments:
Posts:

Loading User Information from Channel 9

Something went wrong getting user information from Channel 9

Latest Achievement:

Loading User Information from MSDN

Something went wrong getting user information from MSDN

Visual Studio Achievements

Latest Achievement:

Loading Visual Studio Achievements

Something went wrong getting the Visual Studio Achievements

C9 Lectures: Stephan T Lavavej - Advanced STL, 5 of n

Download

Right click “Save as…”

There are two STLs: the Standard Template Library and Stephan T. Lavavej Smiley

Advanced STL covers the gory details of the STL's implementation -> you will therefore need to be versed in the basics of STL, competent in C++ (of course), and be able to pay attention! Stephan is a great teacher and we are so happy to have him on Channel 9—the only place you'll find this level of technical detail regarding the internals of the STL. There are no books. There are no websites. This is Stephan taking us into what is uncharted territory for most, even those with a more advanced STL skill set.

In this 5th part of the n-part series, STL digs into the Boost Library (http://www.boost.org). In his words, it's an open source, super quality, community-driven STL++. Stephan will walk you through a sample application from end to end, using boost.

[Advanced STL]

Part 1 (shared_ptr - type erasure)

Part 2 (equal()/copy() - algorithm optimizations)

Part 3 (_ITERATOR_DEBUG_LEVEL, #pragma detect_mismatch, and /d1reportSingleClassLayout)

Part 4 (rvalue references v2.1 and associative container mischief)

Part 5 (deduplicator, using Boost.Bimap/Filesystem/ScopeExit) - see Stephan's deduplicate.cpp

Part 6 (container pretty printer) - see Stephan's pretty_printer.cpp

Tags:

Follow the Discussion

  • Joshua BoyceJoshua Boyce

    Any chance we could get the source code posted?

  • Thomas PetitThomas Petit

    Very cool to see some Boost goodness in channel 9 ! Thanks for the vid. (impeccable timing with BoostCon too :)

    But Stephan, you miss the perfect importunity in your function read_file to sneak one of the little underused gem of boost: boost.iostreams.mapped_file.

    I have found it to be the best way to deal with binary files hands down. It's :

    1) Extremely fast. Very often I have seen speed access being an order of magnitude faster than C++ iostream or C stdio functions (on Windows)
    2) Super easy to use and concise. begin() or data() return a char* to the first byte of the mapping and then you can simply iterate through it and the system will take care of everything.

    For example read_file can be rewrite in two lines like this :

    #include <vector>
    #include <string>
    #include <boost/iostreams/device/mapped_file.hpp>

    std::vector<char> read_file(const std::string& name)
    {
    boost::iostreams::mapped_file_source file(name);
    return std::vector<char>(file.begin(), file.end());
    }

  • new2STLnew2STL xkcd.com

    @Thomas Petit: Thats one of the motives people want the boot::iostreams on next C++. Mapped files are part of any POSIX-compliant system, and provides its benefits and drawbacks on accessing file I/O. It is recommended for big files, for huge chunks of data in and out and for sequential access, it can cause excessive page faults if used too frequently and randomly. In the case of Stephan, conventional file manger should be suffice Smiley (unless he intentionally use mmap for show it like you suggest Wink )

    http://msdn.microsoft.com/en-us/library/ms810613.aspx

    http://en.wikipedia.org/wiki/Memory-mapped_file

  • STLSTL

    Joshua Boyce> Any chance we could get the source code posted?

    Yep, sorry - I've been slammed with work and haven't gotten a chance to clean up my source code like I planned, or post a How To Build Boost article. Here's the "original recipe" version, as explored in the video:

    http://cid-e66e02dc83efb165.office.live.com/self.aspx/deduplicator/deduplicate.cpp

    Disclaimer: the SHA-256 code is very lightly tested. As I recall mentioning, I originally wrote this at home, where I have an SHA-256 implementation written from scratch and exhaustively tested. But it's large and bulky and I wanted to show off Boost.ScopeExit, so I ripped it out and replaced it with the <bcrypt.h> machinery in a matter of minutes. It seemed to work fine but I wouldn't be surprised if I had gotten something wrong in the conversion.

    The Boost.Filesystem and Boost.Bimap machinery is essentially unchanged from what I had at home, and I feel much more confident that that works as expected.

    Thomas Petit> But Stephan, you miss the perfect importunity in your function read_file to sneak one of the little underused gem of boost: boost.iostreams.mapped_file.

    Cool, I wasn't familiar with that. (My knowledge of Boost is surprisingly patchy. There are the parts that I used in college, where I used a bunch of stuff but it's all really old now. Then there are the parts that have been incorporated into TR1/C++0x - I'm very familiar with that, and reasonably familiar with the deltas between Boost and the Standard, since I've spent years working on them. And then there are the parts that I've played with at home recently, like Boost.Bimap. In between, there are a bunch of libraries that I haven't used, and parts of libraries that I haven't explored.)

  • WOW~!!~ BOOST!!

    Big Smile

    Thanks Charles and Stephan!!

  • VERY awesome! Just started using a little boost (just shared_ptr and noncopyable headers), so this comes in handy. I wouldn't mind some more boost videos.

    One minor complaint is the time constraint you seem to have. Does Charles decide that, or did you have to get back to work?

     

    You mentioned that use use stdio.h (performance I guess). I've extremely tried to stick with pure C++ (as part of me learning it correctly), a lot seem to recommend stdio though (for performance?).

     

    What would be cool is some OpenGL/DirectX use of STL, like a texture manager, mesh loader etc.

     

    Some general questions:

    How would you return the vertices vector to an OpenGL function? What I do is (don't have it here, but somehow) something like:

    inline const vector<vert3>& Verts( void ) const 
    {
        return _vertices;
    }
    
    // Somewhere in loop code
    glDrawElements( someVal, someVal, someVal, &myObj->Verts()[0] );
    

    Probably wrong (and a bunch of things I don't like, const correctness not guaranteed as elements can be modified, returning references are bad, etc. ), and should I return &vert3[0] (I guess it is the same thing?).

    I don't want to copy the vector (obviously), especially as it gets called very often. I'm also not sure if my function is inline... would that disable RVO?

     

    How about vector<unsigned char> containing color information, and the format is BGR, but I need RGB. Is there a better way, then using a for loop and store a temporary of B, assigning R to B, and the temporary to R?

     

     ...and finally, a last question. Should simple function variables, that are just assigned once (per function call), and not modified later on, be defined const (e.g. const float someValue = x / y;)?

  • new2STLnew2STL xkcd.com

    @Deraynger: Well, part of your questions are better fit on the XNA Blog team or use the Ch9 forums. I've opened a thread for use SSE align with std::unique_ptr and std::array (the two play nice with _aligned_malloc), I could expand the forum with stuff you want like the load of textures (pretty easy with WIC) or you can open a new thread and I'll be happy share my experiments with you.

    In case of BGR and RGB its a old case of endianness, you need see if OpenGL (or DirectX) provide support for both in hardware (extensions string), if don't have support you only way is switch the positions in host memory or in device memory (gpgpu).

  • @new2STL: Hey, yes, just thought STL or someone here probably has some tricks up their sleeves.

    The BGR/RGB is for OpenGL ES. Would have to look again, but I don't think it supports BGR. I mainly wanted to know, whether there's a better way to swap all.

    XNA is C# or .NET, isn't it?

    I do have my own loaders/managers etc., was just thinking that I could probably see better code, and improve mine with STL goodness, like maybe making a template class for the different texture types, instead of deriving from an interface/abstract class or automatically deducing the correct loader to return from a factory class without having to explicitly define it (e.g. if (fileType == FileType::PNG) { return new PngTextureLoader(); } else if ... etc.)

  • new2STLnew2STL xkcd.com

    @Deraynger: About XNA, its mainly C# for cross-platform, but have lots of native code too due XBOX/PC optimization, most of the helper library in the DirectX SDK falls into the xna namespace

    OpenGL ES are a subset of OpenGL 2.1, in 2.1 the BGR textures turned in core but for ES I think they are optional. Anyway, you can run a pixel shader for swap the bits positions, there are many ways to do it. But if the texture is small (lets say, a square tile of 64 or 128 pixels) it can be done in host memory with little impact in the performance.

    I had to say I'm watching all the awesome videos here on Going Deep for the same reason as you, to better learn STL and use its power in better code Big Smile

  • @new2STL: Hey, ok thanks for the info on that. Never really did anything in DX.

    Ok regarding the BGR swapping or doing it with a shader. It isn't anything critical, just wrote what came to mind, and could be improved Wink

    What bothers me with the general OpenGL books/examples, ist that it's almost always written for C (also new ones, for portability Perplexed). So, definitely no STL in them, meaning that there are also no good examples of how to create some ideal classes that everyone needs.

    Yes, I guess these videos are quite popular (if the views on the top are somewhat accurate) Smiley

  • CharlesCharles Welcome Change

    @Deraynger: I don't decide how much time STL gets for his lectures... He is given one hour of studio time which means he can't do one hour of lecture time... Set up and all that. Then there is how much work STL has to do at his day and night job. Its amazing he has 30 minutes in his day to do these. Thank you, STL!!!!! Smiley

     

    C

     

  • IvanIvan

    Urgh 35 min limit is to harsh. : Otherwise great video!
    I can agree with the boost fans here... great to see Boost videos!
    I have a couple of questions.
    1. Why no timestamp on the batch file. Seems unsecure to trust old hashes and not to check modification date of the files. Then again maybe I misunderstood the program.
    2. Why hash files that have unique size?(btw that could have been a nice example of using STL, although it wouldn't be elegant: I presume that you could do multimap<size, file> and for every block of identical sized files do potential elimination). That would be nice to see because I don't know a nice way to do "for_each_multimap_block_that_has_the_same_key" elegantly.
    Then again maybe I misunderstood the program, again. :)
    3. Isnt there some other "normal" sha256 lib that doesnt require all the cleanup... TBH I dont know why it does require so much cleanup in the first place-maybe because OS tries to prevent other processes from looking. :)

  • Boost is truly great, not just from a content standpoint, but in how fast they release new versions/updates. I'm using it primary for TR1 support, but there's alot of other excellent stuff I've used:

    • String Algorithms/Format (boost::format is probably my most frequently used non-TR1 feature)
    • UUID
    • Exceptions/error_info
    • Serialization
    • noncopyable
    • Slots/Signals
    • Threads
    • checked_delete
    • FileSystem/path
    • Date/Time
    • IOStreams/zlib support
    • ASIO (networking)
    • Test (Although, I actually prefer UnitTest++ for unit testing)

    I'm off to try mapped files, something I can use that I wasn't aware of - thanks Thomas.

    , Deraynger wrote

    Some general questions:

    How would you return the vertices vector to an OpenGL function? What I do is (don't have it here, but somehow) something like:

    1
    2
    3
    4
    5
    6
    7
    inlineconst vector<vert3>& Verts( void ) const
    {
        return_vertices;
    }
     
    // Somewhere in loop code
    glDrawElements( someVal, someVal, someVal, &myObj->Verts()[0] );

    Why not just return the pointer?

    vert3 const* Foo::vertices_data() const
    {
        if (_vertices.empty())
            throw std::runtime_error("See Effective STL, item 16");
        return &_vertices[0];
    }
    
    ...
    
    Foo foo;
    vert3 const* data = foo.vertices_data();

  • STLSTL

    Deraynger> I wouldn't mind some more boost videos.

    I'll see what I can do - there are several more Boost libraries that I'm familiar with.

    Deraynger> One minor complaint is the time constraint you seem to have. Does Charles decide that, or did you have to get back to work?

    That's all my fault. I'm insanely busy with VC11, and I didn't have time to prepare an example in advance. So on the day of filming, I furiously hacked out the deduplicate.cpp that I presented, making me run several minutes late. By the time I got to the studio, we had to finish up fast.

    Deraynger> You mentioned that use use stdio.h (performance I guess).

    I specifically have a love-hate, mostly hate, relationship with iostreams. iostreams are type-safe and buffer-safe, while stdio is not. On the other hand, iostreams performance is typically miserable, and trying to do anything tricky with them will expose you to their Standard-mandated guts which are absolutely awful (the contrast in design between iostreams, full of virtual nonsense, and the STL proper, is remarkable).

    iostreams are fine for simple text I/O when performance doesn't matter. For binary I/O, iostreams don't really buy anything, and I prefer to wrap stdio.h instead (it must be wrapped, because manipulating FILE * directly is a recipe for resource leaks - as I eternally have to explain).

    Deraynger> What would be cool is some OpenGL/DirectX use of STL, like a texture manager, mesh loader etc.

    I have code at home that wraps OpenGL with Boost and the STL, but lifting it out into something presentable would take forever.

    Deraynger> should I return &vert3[0] (I guess it is the same thing?).

    v.data() is the C++0x way to get a pointer to a vector's elements. Unlike &v[0], it is correct when v is empty (but in that case the pointer can't be dereferenced, obviously).

    Deraynger> I don't want to copy the vector (obviously), especially as it gets called very often.

    Remember move semantics.

    Deraynger> I'm also not sure if my function is inline... would that disable RVO?

    According to my knowledge, the compiler's decisions to actually-inline something, and to apply the RVO/NRVO, are independent.

    Deraynger> Is there a better way, then using a for loop and store a temporary of B, assigning R to B, and the temporary to R?

    Not manipulating the data at all with the CPU, and getting the GPU to do it in a shader, would be ideal. Shaders are crazy powerful. (Even better, if you control the source, just ensure that the data is in the right format to begin with. The cheapest code is the code you never execute.) Otherwise, you can loop through the vector and std::swap() the colors.

    Deraynger> Should simple function variables, that are just assigned once (per function call), and not modified later on, be defined const (e.g. const float someValue = x / y;)?

    Yes. That clarifies intent and prevents mistakes.

    Ivan> Seems unsecure to trust old hashes and not to check modification date of the files.

    If a directory is deduplicated, and then files are modified (or deleted!), then data has been irrevocably lost. "Don't do that then."

    Ivan> Why hash files that have unique size?

    That would be a reasonable optimization - avoid hashing unique-sized files, unless and until duplicate-sized files appear (because the deduplicator can be run on a set of files, more files can be added, and then the deduplicator can be run again). Good idea, I hadn't thought of that.

    Ivan> I don't know a nice way to do "for_each_multimap_block_that_has_the_same_key" elegantly.

    This is actually super easy and elegant. Use multimap::equal_range().

    Ivan> Isnt there some other "normal" sha256 lib that doesnt require all the cleanup...

    SHA-256 can be written from scratch, and I've done that - but this is how the Windows API exposes it. The usual advice, which is good advice, is to never attempt to implement anything crypto-related by yourself. In my case, I learned C by implementing SHA-1, and I'm extremely comfortable (and careful!) with such code. Windows' implementation is probably faster than mine, though - I never resort to assembly.

    I would have used Boost for this, but they don't yet have an implementation of SHA-256.

  • MarekMarek

    >> Windows' implementation is probably faster than mine, though - I never resort to assembly.
    You should try some library like OpenSSL which is optimized to use dedicated co-processor (if you have one, such in VIA processors) or special instruction set included in newest Intel processors.

  • Thanks to all for your answers!!! C (& STL), I'm very grateful for any time you take, especially when you're very busy! These are the only C9 videos I follow, so I'm always waiting for some new ones, and then it doesn't last too long. But rather short than none Smiley Thanks again! @STL: I'm not able to use move semantics, since too few compilers support it. Regarding your STL, Boost & OpenGL stuff, maybe you could just show a portion of it, and describe it a bit. Do you know of a smart (STL?) way to return an instance from a factory class, without having to use some if/else? I thought that one could use some traits (forgot the name again, but like you did in the container helper class in the first series, with typedef'ing the items). Then any class deriving from say a texture loader abstract class, could define what they support, say ".raw", then the rawLoader would be returned, when a raw file(-name) gets passed to the factory? Maybe I just need to study your example a little better there.
  • new2STLnew2STL xkcd.com

    > STL: I have code at home that wraps OpenGL with Boost and the STL, but lifting it out into something presentable would take forever.

    O yeah! Your text rendering engine, pretty nice Smiley

  • Marek, exactly, just use the OpenSSL crypto library. It contains every hash algorithm anyone would ever want to use. They're all implemented using the same set of functions (init, update, finalize). You won't have to implement anything yourself to support buffered I/O. The documentation seems to be outdated so browse the source to see what's available (sha256 is in the sha dir).

  • STLSTL

    Marek> You should try some library like OpenSSL which is optimized to use dedicated co-processor (if you have one, such in VIA processors) or special instruction set included in newest Intel processors.

    I expect and believe that the Windows implementation uses special instructions when applicable and available.  This is true for things like memcpy() in the CRT.

    Deraynger> Do you know of a smart (STL?) way to return an instance from a factory class, without having to use some if/else?

    If I had to do something like that, I'd construct a std::map<Key, Factory> and populate it accordingly. For example, the Key could be a std::string storing a filename extension, and the Factory could be a std::function<std::shared_ptr<ImageParser> ()>. Having written non-member functions std::shared_ptr<ImageParser> makePNGParser() and std::shared_ptr<ImageParser> makeJPEGParser(), I could say m["png"] = makePNGParser; m["jpeg"] = makeJPEGParser (or m.insert(std::make_pair("png", makePNGParser)) if I were so inclined). Later, given an extension, I could perform a lookup to determine whether the extension was supported, and if so, retrieve a factory function for a parser.

    However, when faced with this problem in the past, I've simply performed a sequence of regex_match()es, and called the proper loading function, without any factory functions being involved. I do use inheritance, in a highly structured form - always with non-virtual interfaces, always with shared_ptr<Base> - but only when the situation actually demands it - that is, when I need different types for something, but later I want to forget the specific types and treat them uniformly at runtime. (If I want to treat different types uniformly at compiletime, that calls for templates.)

  • C64C64

    This Boost.Filesystem library seems cool (especially the iterator tool), thanks for sharing.

    BTW: Does it work properly with Unicode characters? I see your code uses std::string, it would be good if we could safely use it with Unicode filenames. Or does Boost.Filesystem use UTF-8 as Unicode encoding instead of UTF-16, so std::string is OK in this case?

    Thanks for clarifying this.

     

  • Something I just remembered - if you want a quick and easy way to start using boost, this site provides an installer for the headers and pre-built binaries: (VS2003-VS2010, 32-bit only)

    http://www.boostpro.com/download/

    , C64 wrote

    BTW: Does it work properly with Unicode characters? I see your code uses std::string, it would be good if we could safely use it with Unicode filenames. Or does Boost.Filesystem use UTF-8 as Unicode encoding instead of UTF-16, so std::string is OK in this case?

    Thanks for clarifying this.

    The path class supports multiple character types. So you can replace these:

    void free_function(std::string const& file_name);
    void free_function(std::wstring const& file_name);

    with this:

    void free_function(boost::filesystem::path const& file_name);

    One function instead of one for each string type you need to support.

    http://www.boost.org/doc/libs/1_46_1/libs/filesystem/v3/doc/reference.html#class-path

  • IvaIva

    Stephen again tnx for the answers.
    IMHO good topic for next video could be regex>. Why? Because I have some CS background and I was surprised that VS had regex bugs- I always thought that regex machinery is some DFA/NFA generator that it is hard to do, but when you do it then it is hard to miss bugs.
    Also if you will still be making videos in the future the day that VS 2012 goes public beta you could make a video detailing some cool new feature that was NDA just 1 day ago. :)
    Also more boost would be great.

  • IvanIvan

    ups, post above was me, sorry for the possible confusion.

  • @STL: Ok, will definitely look into it. I still don't fully get the benefit from the NVI pattern. Gotta re-look it up in the Effective/More Effective C++ book.

    Edit: Looked it up, but if I have a one step method (no setup etc. needed, then a public virtual function should be fine without NVI).

    Edit 2: After looking at the Template Method Pattern, I think I understand what you mean. If I'm not mistaken, then the Base class(using NVI, possibly having an abstract member function) could be acting instead of a seperate factory class having to return an instance. The Base class would then know load the item itself (possibly having a overriden loader method in a derived class). I'll have to try and let you know if I don't get further Smiley  

    I see the benefit of the map, providing better code-reuse, without having to modify several code-sections. Runtime lookup satisfies my needs completely.

    What I also don't understand is the shared_ptr<Base>, is that an aggregation, and not an inheritance?

  • Looking at this video when you're so stressed is painful !
    I get stressed just by watching you Perplexed

    I agree with other viewers, STL needs more studio time.
    Good c++ videos a rare on c9 so you should give them more time. Much much more time Smiley

    I took a look at scope_exit.hpp, oh the horror of macro's, simply unreadable. This can be done so much better with c++0x.

    scope_exit([&hAlg]{
    etc..
    });


    scope_exit.hpp is a perfect example of why i don't use boost. That macro fetish should not be allowed in boost at all.


    @STL: That trick in 'hexify'. Is it only good for when you want to write something out fast (quick and dirty) ?

  • Will WattsWill Watts

    An excellent lecture.

    Your code calls the (useful) initial_path() function... which is, annoyingly, deprecated - see
    :http://www.boost.org/doc/libs/1_46_1/libs/filesystem/v3/doc/deprecated.html

    More Boost in future lectures would be most welcome: there are so many good things in there, but it is hard to get to know about them.

  • MarkMark

    I hate to say this, but I didn't particularly enjoy this video at all. When it's titled "Advanced STL", then I expect to see, well, things relating to the STL. If you want to teach me about hashing and about Boost, then title the video appropriately.

  • STLSTL

    Int64> The path class supports multiple character types.

    Yes - and Filesystem V3 (now the default) does this better than V2 did.

    Ivan> I always thought that regex machinery is some DFA/NFA generator that it is hard to do, but when you do it then it is hard to miss bugs.

    Actually, regex's implementation is exceedingly tricky, and bugs in it usually aren't obvious precisely because it's building and running a complicated NFA.

    Deraynger> What I also don't understand is the shared_ptr<Base>, is that an aggregation, and not an inheritance?

    shared_ptr<Derived> is convertible to shared_ptr<Base>. This allows shared_ptr to be used with inheritance hierarchies.

    Mr Crash> I took a look at scope_exit.hpp, oh the horror of macro's, simply unreadable.

    Are you aware that our STL implementation contains unreadable, horrifying macros? And yet, it gets the job done.

    Mr Crash> That trick in 'hexify'. Is it only good for when you want to write something out fast (quick and dirty) ?

    There's nothing wrong with it, it's just unusual - many people haven't seen indexing into a string literal like that before.

    Will Watts> Your code calls the (useful) initial_path() function... which is, annoyingly, deprecated

    Ah, I didn't know that. Calling current_path() at the beginning of main() and saving it somewhere is a reasonable thing to do.

    Mark> When it's titled "Advanced STL", then I expect to see, well, things relating to the STL.

    Noted. A few points (in order of decreasing hilarity):

    1. I prefer the following explanation: that the "STL" in the series title refers to myself. :-> (This was Isaac Asimov's reply to people criticizing the title of The Intelligent Man's Guide To Science.)

    2. Boost.Filesystem has been proposed for TR2 (in fact, it was the only proposal before TR2 was put into cryostasis due to C++0x running late), and I consider it highly likely that it'll make its way into the Standard beyond C++0x. So I claim, with some justification, that this really is about the STL - just the STL of the future.

    3. Boost is very much related to the STL, both in interface and in authorship. Even if, say, boost::bimap never makes it into the Standard, its similarity to std::map is undeniable, and it definitely solves problems that the STL alone cannot solve nearly as easily. As I've been saying since Intro Part 1, the STL is a library, not a framework, and it's supposed to be used alongside other libraries. More than anything else, Boost plays nice with the STL by design.

    Before VC9 SP1 with TR1, Boost would have had a much larger starring role in my videos - I covered shared_ptr way back in Intro Part 3.

  • Oh, makes sense, was thinking too far ahead of myself Smiley

    STL>shared_ptr<Derived> is convertible to shared_ptr<Base>. This allows shared_ptr to be used with inheritance hierarchies.

  • PhilhippusPhilhippus

    Surprised no one has mentioned the rather elitist use of the forbidden goto command in that CNG code. I took a sharp intake of breath on seeing that.

    I agree with the posts above about the time constraint sucking somewhat. STL having to blast through some convoluted programming at the speed of a syntax parser is hard to follow.

  • IvanIvan

    @STL
    "Actually, regex's implementation is exceedingly tricky, and bugs in it usually aren't obvious precisely because it's building and running a complicated NFA."
    If you wrote any part of it it would be interesting for CompSci nerds like me it would be an interesting video.
    BTW do you share the code between C# regex and c++ regex? It would seem like a reasonable thing to do, unless the have different expression power.
    "This is actually super easy and elegant. Use multimap::equal_range()."
    I meant like in for_each_equal_range, I presume it should be something like (while range.first!=range.second) but I hoped that there is some quick way of doing it.

  • , STL wrote

    Are you aware that our STL implementation contains unreadable, horrifying macros? And yet, it gets the job done.



    I know you use macros as workaround for N-arguments.
    I haven't seen anything like what i saw in scope_exit.hpp thou. I have only seen small marcos for debugging.

    Can you give an example ?

    What i didnt like in scope_exit.hpp is that the starting / entry code (BOOST_SCOPE_EXIT, BOOST_SCOPE_EXIT_END) was macrofied, having marcos inside functions doesn't bother me because they are contained / simple and no macro on macro action.

    // macro on macro action example
    #define macro1(...) stuff
    
    #define macro2a(...)  macro1(..)
    #define macro2b(...)  macro1(..)
    #define macro2c(...)  macro1(..)
    
    #define macro3(...)  macro2a(..)
    
    #define macro4(...)  macro2b(..)
    
    #define macro(...)  macro4( macro2a() macro2b, macro4, macro3), macro2c(...)
    // Now also imagine these macros are not defined one after another and instead are being defined all over the file.
    // Oh the happy code jumping, prepare for confusion, yay!
    // Now  your job is to unwrap 'macro', and understand how it works and what it does, good luck !  
     

  • STLSTL

    Ivan> If you wrote any part of it it would be interesting for CompSci nerds like me it would be an interesting video.

    Like the rest of the STL, I didn't write <regex>. The two major exceptions are make_shared<T>() in VC10 and uniform_int_distribution in VC11.

    Ivan> BTW do you share the code between C# regex and c++ regex?

    Nope.

    Ivan> I meant like in for_each_equal_range

    Oh, there's no dedicated function for that, but it's easy enough to write:

    C:\Temp>type meow.cpp
    #include <iostream>
    #include <map>
    #include <ostream>
    #include <string>
    #include <utility>
    using namespace std;
    
    int main() {
        multimap<int, string> mm;
    
        mm.insert(make_pair(3, "cat"));
        mm.insert(make_pair(7, "hamster"));
        mm.insert(make_pair(3, "dog"));
        mm.insert(make_pair(5, "eagle"));
        mm.insert(make_pair(3, "bat"));
        mm.insert(make_pair(7, "gorilla"));
        mm.insert(make_pair(3, "ape"));
    
        for (auto i = mm.begin(), k = mm.begin(), end = mm.end(); i != end; i = k) {
            for ( ; k != end && i->first == k->first; ++k) {
                cout << "(" << k->first << ", " << k->second << ") ";
            }
    
            cout << endl;
        }
    }
    
    C:\Temp>cl /EHsc /nologo /W4 /MTd meow.cpp
    meow.cpp
    
    C:\Temp>meow
    (3, cat) (3, dog) (3, bat) (3, ape)
    (5, eagle)
    (7, hamster) (7, gorilla)

  • IvanIvan

    Wow, very nice of you Stephen to take the time to write the example. Tnx. To bad that this isnt SO so only a few people will see it.
    BTW I have a small suggestion. I presume you are under constant deadlines so for next videos if it is easier you should focus on your STL stuff instead on the nonSTL stuff because you can give lecture on the STL in the middle of the night, while for writing boost examples it takes time. Again I like boost, but I would hate it if you would abandon the lectures because you dont have enough time to do them.

  • >Ivan: Again I like boost, but I would hate it if you would abandon the lectures because you dont have enough time to do them.

    I second that!

  • ( Remove )

  • Here's an idea for the next episode: multithreading with <thread>, synchronised data structures best practices etc. Optional extra: What's the use for <atomic>?
  • new2STLnew2STL xkcd.com

    @KerrekSB: For talk about std::thread Stephan must change the compiler to gcc 4.6 or use the paid implementation from just::thread, or the boost::thread (that is close enough to the std:Smiley

    Now I can't remember if it is Sutter or Meyers that will do a talk about that topic on next meeting in CPP&B. Keep tracking in theirs blogs for video and slides.

    Plus Sutter is doing a 2nd round interview: "The followup interview on Channel 9 has been scheduled, and will be shot on Thursday, June 2."

  • @new2STL: Oh OK, I didn't realize VS10 doesn't have std::thread yet. (I'm using GCC4.4 myself.) No worries then. Alternative topic suggestion: Allocators, and whether it makes sense two write those yourself.
  • Re topic suggestion, here's something that's definitely STL and probably qualifies as "advanced". For quite some time I've been looking for a completely generic way to print out STL containers, in a way that fits into the usual stream operation idiom and handles nested containers. After bringing up the problem on StackOverflow and then on Channel 9 here, we finally seem to have pieced together a complete solution.

    The original question is at StackOverflow, where Marcelo Cantos provided the first step. Later, Sven Groot improved on that in the Channel 9 Forum. Finally, I added a SFINAE-based type trait class to detect containers and remove the need to mention any explicit container classes.

    I posted the final header and a working example on StackOverflow. I think this might be of general interest, as it allows you to output any sort of container type you may have defined without repeatedly writing either your own operator, or a for loop, or even a foreach construct with a lambda over and over again.

    Possible ideas for extending the construction: 1) implement output for std::tuple in the same way as is done for std::pair, 2) add a C-array wrapper class to make C arrays amenable to printing through the same mechanism.

  • IvanIvan

    @KarrekSB
    "Alternative topic suggestion: Allocators, and whether it makes sense two write those yourself. "
    +1 I'm kind of unhappy that there isnt a .reserve for map set and "multi" versions of those containers... and always wondered is there a way to do something that will allocate continuous memory space for map. I know that map isnt random acces, but the idea is data locality. Imagine having first 3-4 levels of RB tree loaded into cache when you load the root note. :)

  • @Ivan: There's also the fact that in the STL you can only specify one single allocator, but that allocator gets used both for the actual contained objects and also for the internal bookkeeping data in an opaque way that you cannot control. Well, I don't really know how much one can say about them, but I thought they're one part of the STL that's sort of always there but never really prominent, so it'd be cool to hear about it..
  • Nice video.

    I was sad to miss you at BoostCon this year. Hope you'll find time to come next year.

    Sebastian

  • fokefoke

    @STL: Very nice videoS!

    Very interesting subjects, extremely well explained, and, for some reason, your accent is perfectly understandable to my (non-english) ears :)

    Btw I'm also interested in more boost videos, but maybe more focused on what's happening under the hood, not things we can read in the documentation.
    C++0x is fine too.

  • XTFXTF

    @STL: Are you aware of Step Into Specific? It's a context menu option that allows you to avoid repeatedly stepping into and out of functions until you hit the right one.

    hexify's parameter type is const vector<unsigned char>&. boost::iterator_range<unsigned char> might've been better. STL lacks a ptr pair wrapper (for contiguous memory).

     > hash_file.right.find(f) != hash_file.right.end()

    Can't you use hash_file.right.count(f) here?

    > const bimap<string, string>::left_const_iterator j = hash_file.left.find(h);

    Can't you just insert and check the return value to avoid a second lookup?

     

    Before you said vector<unsigned char> should be used for buffers. However, it initializes the buffer to 0, which is unnecessary overhead. Isn't there a better solution?

     

    Greetings,

     

    Olaf

  • ( Nevermind, see STLs answer Tongue Out )

  • STLSTL

    Ivan> you can give lecture on the STL in the middle of the night, while for writing boost examples it takes time.

    Thanks - but the real answer is that everything takes time.

    KerrekSB> Here's an idea for the next episode: multithreading with <thread>, synchronised data structures best practices etc. Optional extra: What's the use for <atomic>?

    I can't show you VC11 yet, sorry.

    (I've already resolved bugs on Connect saying that we've implemented <thread>/etc. in VC11, so I can reveal that much.)

    KerrekSB> Alternative topic suggestion: Allocators, and whether it makes sense two write those yourself.

    C++0x/VC11 has significantly reworked allocators, so I'd prefer to wait for that.

    KerrekSB> For quite some time I've been looking for a completely generic way to print out STL containers

    That's a great idea - I think I'll do Part 6 about this. I took a different approach, and my machinery is capable of printing out stuff like this:

    [[4]<"khan"'"of"'"the"'"wrath">, [3]<"home"'"the"'"voyage">, [3]<"country"'"the"'"undiscovered">]

    Ivan> I'm kind of unhappy that there isnt a .reserve for map set and "multi" versions of those containers...

    They're node-based. reserve() doesn't make sense for them.

    Ivan> and always wondered is there a way to do something that will allocate continuous memory space for map.

    You can write a pool-based allocator. Note that Windows' Low Fragmentation Heap, which is used by default (I'm glossing over subtleties here), is node-friendly.

    KerrekSB> There's also the fact that in the STL you can only specify one single allocator, but that allocator gets used both for the actual contained objects and also for the internal bookkeeping data in an opaque way that you cannot control.

    Well, we need to allocate our internal helper objects, and you REALLY want us to do that with your (rebound) allocator, instead of std::allocator. (This is actually how I broke the compiler immediately after joining VC.) This inherently requires the whole rebinding scheme.

    CornedBee> I was sad to miss you at BoostCon this year. Hope you'll find time to come next year.

    Yeah, I was insanely busy this year. I do hope to attend next year.

    Sebastian> for some reason, your accent is perfectly understandable to my (non-english) ears Smiley

    I grew up in Colorado, so I have the "standard Midwestern" accent. Very rarely, I'll trip over an R if I'm not paying attention (I think I got that from my father).

    XTF> Are you aware of Step Into Specific?

    No - I put all my skill points into C++, not IDEs. I'll have to look into that.

    XTF> hexify's parameter type is const vector<unsigned char>&. boost::iterator_range<unsigned char> might've been better.

    That would be an improvement (I haven't used iterator_range yet).

    XTF> Can't you use hash_file.right.count(f) here?

    count() is less efficient for multi-containers. I always use find().

    XTF> Can't you just insert and check the return value to avoid a second lookup?

    I believe so, yes. I would always do that for map, but this was my first time using bimap, and I forgot to consider that.

    XTF> Before you said vector<unsigned char> should be used for buffers. However, it initializes the buffer to 0, which is unnecessary overhead. Isn't there a better solution?

    You could write a simple class if performance really, really mattered.

  • IvanIvan

    Stephen very nice to hear you ans, I was worried that this is the end of the series. :) Also about
    "Ivan> I'm kind of unhappy that there isnt a .reserve for map set and "multi" versions of those containers...

    They're node-based. reserve() doesn't make sense for them."
    Can you explain why. The same problem that you have with vector you can have with them. 10k insertions in multimap will call mem allocation 10k times, right? 10k insetion into (empty vector) will call less than 30. I understand the difference- you never have to reallocate existing part of the RBTree, but still I presume that people would be happy with avoiding a bunch of calls to malloc, data locality(and by that I mean it in a cool way(one level n node, and all his children in one block, not a couple level n nodes in one memory block, like here(figure 6-please don't ignore the picture because the smart-a** tone of the article :) ): http://queue.acm.org/detail.cfm?id=1814327 ).
    So when VC11 pops out lets hope that you will do a allocator video. :)

  • XTFXTF

    XTF> hexify's parameter type is const vector<unsigned char>&. boost::iterator_range<unsigned char> might've been better.

    That would be an improvement (I haven't used iterator_range yet).

    True, just wanted to point out that using const vector<unsigned char>& isn't very generic. And that I miss a ptr pair wrapper in the STL. Wink

    XTF> Can't you use hash_file.right.count(f) here?

    count() is less efficient for multi-containers. I always use find().

    I'm not familiar with bimap, but I assume it's not a multi-container.

    You could write a simple class if performance really, really mattered.

    I like the you don't pay for what you don't use principle in C++, but using vectors for buffers violates that. I'd also prefer not to write my own class for such basic functionality.

     

  • XTFXTF

    They're node-based. reserve() doesn't make sense for them."

    Can you explain why.

    The allocator could allocate a pool and serve requests from that to avoid calling malloc.

  • STLSTL

    Ivan> Stephen very nice to hear you ans, I was worried that this is the end of the series.

    I was on vacation the week of Memorial Day.

    Ivan> 10k insertions in multimap will call mem allocation 10k times, right?

    Yes, and that's fundamental. The node-based containers (list/forward_list, map/set, and their multi/unordered variants) allow any node to be erased at any time. Therefore, they need a separate allocation for each node. In contrast, if you erase a vector's element, it'll move any remaining elements down to fill the hole, and keep the extra capacity in reserve.

    XTF> I'm not familiar with bimap, but I assume it's not a multi-container.

    No, but following the convention means that I don't have to be extra careful when using multi-containers.

  • KerrekSB>> For quite some time I've been looking for a completely generic way to print out STL containers

    STL> That's a great idea - I think I'll do Part 6 about this. I took a different approach, and my machinery is capable of printing out stuff like this:

    [[4]<"khan"'"of"'"the"'"wrath">, [3]<"home"'"the"'"voyage">, [3]<"country"'"the"'"undiscovered">]

    That's great, I look forward to the episode! In the meantime, I've put our pretty-printer on GitHub, here's the project website. There's a type-erasing helper class for convenient one-off delimiter overriding, too.

    KerrekSB>> There's also the fact that in the STL you can only specify one single allocator, but that allocator gets used both for the actual contained objects and also for the internal bookkeeping data in an opaque way that you cannot control.

    STL> Well, we need to allocate our internal helper objects, and you REALLY want us to do that with your (rebound) allocator, instead of std::allocator. (This is actually how I broke the compiler immediately after joining VC.) This inherently requires the whole rebinding scheme.

    Are you saying that it just can't be made to work or doesn't make sense to permit two separate allocators, one for objects and one for bookkeeping, like:

    template <typename T, typename ObjAlloc, typename InternalAlloc = ObjAlloc> class Container;

    Or would it suffice if I overloaded T::operator new() so that vector<T> would use std::alloc for bookkeeping but my own new for T?

  • I am wondering, how I can force VS2010 and XCode to make me include the headers, although another included header already contains it.

    STL also prefers to include all, and I myself don't always know if I'm using something part of the included header (e.g. A) or a header (B) included in the header (A).

  • STLSTL

    KerrekSB> There's a type-erasing helper class for convenient one-off delimiter overriding, too.

    I consider type erasure to be unnecessary here, so it'll be interesting to compare our solutions. :->

    I took your is_container that looks for const_iterator - I tried to write my own that detected begin(), but it snapped VC's mind like a twig.

    KerrekSB> Are you saying that it just can't be made to work or doesn't make sense to permit two separate allocators, one for objects and one for bookkeeping

    It doesn't make sense. Why would you want to treat them differently?

    The node-based containers don't even allocate individual elements. They allocate nodes that contain individual elements.

    Deraynger> I am wondering, how I can force VS2010 and XCode to make me include the headers, although another included header already contains it.

    I'm not aware of any automated way to verify this. Having one would be nice.

  • STLSTL

    Part 6 filming is scheduled for Friday, July 1.

  • @STL: Cool, thanks for the info!

  • @STL: Great, looking forward to your solution! By the way, I also thought at first I should look for the existence of begin/end member functions, but I found the check for the iterator type sufficient in the end. One could have used expression SFINAE with pointer-to-member-function type for that, but I didn't see the need for it in the end.
  • JonasJonas

    @STL: Any change you can mention some info on how to do fuzzy grouping ?
    This example data:
    asd111bla
    asd22bla
    asd322bla
    other stuff_1
    other 2 stuff
    other stuff_4

    Will be grouped into this:
    Group 1:
    asd111bla
    asd22bla
    asd322bla

    Group 2:
    other stuff_1
    other 2 stuff
    other stuff_4

    I've been researching how to do this but i'm unsure how to achieve this.
    Comparing similarities of just two strings is relatively simple but the above is a problem.
    You can do it using the same technique as comparing two strings but that will be very slow.

    So can you show how to achieve this using the STL.

  • STLSTL

    I filmed Part 6 today!

    Here's my Pretty Printer code: https://skydrive.live.com/redir.aspx?cid=e66e02dc83efb165&resid=E66E02DC83EFB165!292

    I won't answer questions about it until Part 6 is uploaded, though. There's no reason for me to explain stuff twice. :->

    Jonas: The "hard parts", which are independent of the STL, are determining the similarity between strings (e.g. with an edit distance algorithm), and determining when strings are dissimilar enough to begin a new group. The STL makes everything else easy.

    Here's an example where I'm judging similarity by common prefix length, and I consider strings with similarity < 2 to belong to different groups. You could imagine a more clever algorithm that, instead of using a fixed constant, adjusted it up or down to produce more or fewer groups, depending on the number of strings in the input.

    C:\Temp>type fuzzy.cpp
    #include <stddef.h>
    #include <algorithm>
    #include <iostream>
    #include <ostream>
    #include <string>
    #include <vector>
    using namespace std;
    
    size_t similarity(const string& a, const string& b) {
        const string& sm = a.size() < b.size() ? a : b;
        const string& lg = a.size() < b.size() ? b : a;
    
        return mismatch(sm.begin(), sm.end(), lg.begin()).first - sm.begin();
    }
    
    vector<vector<string>> fuzzy_grouping(const vector<string>& v) {
        vector<vector<string>> ret;
    
        for (auto i = v.begin(); i != v.end(); ++i) {
            if (i == v.begin() || similarity(i[-1], *i) < 2) {
                ret.push_back(vector<string>());
            }
    
            ret.back().push_back(*i);
        }
    
        return ret;
    }
    
    int main() {
        vector<string> v;
    
        for (string s; getline(cin, s); ) {
            v.push_back(s);
        }
    
        vector<vector<string>> groups = fuzzy_grouping(v);
    
        for (auto i = groups.cbegin(); i != groups.cend(); ++i) {
            for (auto k = i->begin(); k != i->end(); ++k) {
                cout << *k << " ";
            }
    
            cout << endl;
        }
    }
    
    C:\Temp>cl /EHsc /nologo /W4 fuzzy.cpp
    fuzzy.cpp
    
    C:\Temp>fuzzy
    cat
    catastrophe
    cataclysm
    catapult
    cave
    cavern
    dog
    doom
    doberman
    bear
    bearing
    bare
    barely
    ^Z
    cat catastrophe cataclysm catapult cave cavern
    dog doom doberman
    bear bearing
    bare barely

    I've made fuzzy_grouping() copy the strings because I'm lazy - you could easily have it return vector<size_t> with group lengths or something else.

  • When's the new episode coming online? Smiley Here's a tenuous suggestion -- "Extra Advanced STL"! Are you familiar with EASTL (https://github.com/paulhodge/EASTL) in any way? Why would someone need an alternative for the standard library, and why aren't the ideas of EASTL useful in the general standard? Is there anything a non-game programmer could learn from it?
  • JonasJonas

    @STL:
    That works surprisingly quite well when the data is sorted which makes it depend on the sorting algo for good groupings. That's sort of iffy.
    'mismatch' isn't even a longest common subsequence (LCS) or a pattern matching algorithm of any kind as far as i know.

    'mismatch' algorithm is a nice find though, didn't know of it.
    Do not quite understand it yet but i did just learn of it ;)

    I apologize for being unclear in my first message, as my defense i was very tired. Was up half the night coding on another thing.

    When i was talking about comparing strings for similarities i was talking about stuff like LCS (http://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Longest_common_substring) and pattern matching.

    What do you think of this, am i over-thinking it ?
    Can the stl help with this, as in an algorithm already exists that does this for me ? I can't find any thought.

  • SagoSago

    @KerrekSB patience dude, the infamous wmv encoder isn't known for it's speed (or efficiency for that matter)

    Also C++ videos isn't known to be prioritized on c9. They have disgustingly obvious favoritism issues here (c#, vb, wpf, silverlight, etc, languages and products owned by microsoft take priority / is more important)

    Btw have anyone else noticed that all these microsoft owned languages are much slower then the standard non microsoft owned languages ?
    Just compare vs2010 and vs2008, wpf did some nasty things to visual studio, a horrible mistake that should be reversed if you ask me but someones ego seems to be in the way for that to happen, sigh typical. No its not a money thing, moving to wpf was a big risk and risks cost much money when they fail period!

    anyway..

    That's why there's so few c++ videos on c9 although we were promised many c++ videos half a year ago.

    c9 have become an ad site for microsoft products ew!
    Where did the "for the developers" go ?
    Did popularity get to their head ?

    Oh please do prove me wrong but a simple search query doesnt lie.

    ah oh my who pushed the rant button.

    Mr Stephan T Lavavej keep up the good work without you there would be no reason to visit c9 anymore.

  • Mr Stephan T Lavavej keep up the good work without you there would be no reason to visit c9 anymore.

    Hah, for sure, thanks a lot for these videos -  great work, STL!

  • CharlesCharles Welcome Change

    @KerrekSB patience dude, the infamous wmv encoder isn't known for it's speed (or efficiency for that matter)

    Also C++ videos isn't known to be prioritized on c9. They have disgustingly obvious favoritism issues here (c#, vb, wpf, silverlight, etc, languages and products owned by microsoft take priority / is more important)

    Btw have anyone else noticed that all these microsoft owned languages are much slower then the standard non microsoft owned languages ?
    Just compare vs2010 and vs2008, wpf did some nasty things to visual studio, a horrible mistake that should be reversed if you ask me but someones ego seems to be in the way for that to happen, sigh typical. No its not a money thing, moving to wpf was a big risk and risks cost much money when they fail period!

    anyway..

    That's why there's so few c++ videos on c9 although we were promised many c++ videos half a year ago.

    c9 have become an ad site for microsoft products ew!
    Where did the "for the developers" go ?
    Did popularity get to their head ?

    Oh please do prove me wrong but a simple search query doesnt lie.

    ah oh my who pushed the rant button.

    Mr Stephan T Lavavej keep up the good work without you there would be no reason to visit c9 anymore.

    Whatever, dude.... -> Look here.

    C

  • Josh KJosh K

    @Charles lol you just got self owned by providing the c++ tag search link. "simple search query doesnt lie"

    Did you not read the post correctly ?

    Do you dismis and ignore the post because you are biased or just in a bad mood ?
    That post is feedback too. In a cranky form but it is still feedback.

    I'm conserned that you do not listen to your users.

    Sago sounds cranky but he do make good points. There are not many c++ videos made compared to other languages. There are c++ chitchat videos but they do not provide any useful information.

    We do need much more STL like videos. That provide useful information.

    The visual studio 2010 performace issues are also sadly true. I can mention it have memory issues too due to c# garage colletion.

  • CharlesCharles Welcome Change

    @Josh K: Self-owned? OK... I'm not in a bad mood. Here: Smiley

    If you look at the net new content over the past 6 months, I'd say that native content (C++) is increasing relative to the non-native stuff, but still not as frequent - I wasn't really arguing that. STL has a full time job as a C++ Library Developer - in fact, STL is the only STL-focused developer at Microsoft (meaning, he is the sole owner of the STL that ships with VC++...).

    Sorry we don't post one C++ video per day. I truly wish we could.

    C

  • STLSTL

    KerrekSB> Are you familiar with EASTL (https://github.com/paulhodge/EASTL) in any way?

    I've looked at http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2271.html , and almost all of its concerns have either been fixed in C++0x or don't apply to VC ("The Microsoft C++ compiler does a rather good job of inlining").

    Jonas> That works surprisingly quite well when the data is sorted which makes it depend on the sorting algo for good groupings.

    It's looking at prefixes, so std::sort() would work just fine.

    Jonas> 'mismatch' algorithm is a nice find though, didn't know of it. Do not quite understand it yet but i did just learn of it Wink

    It just reports the first place where two sequences differ.

    Jonas> Can the stl help with this, as in an algorithm already exists that does this for me ?

    The STL does a lot of stuff, but it doesn't have every data structure and it especially doesn't have every algorithm that's ever been invented. It serves as an extremely useful foundation for implementing more complicated data structures and algorithms, though.

    I took the time to implement a more complicated fuzzy_grouping(). This one relies on levenshtein_distance(), which I've written (with Wikipedia's help) to accept arbitrary and possibly different forward iterators (this algorithm needs forward iterators, and can't accept input iterators). I've tested it with forward_list<char> and list<char>, for example.

    What it does is create a multiset of strings, ordered by decreasing size (so, largest first). It grabs the largest string from the multiset, and looks for "similar" strings in the rest of the set, where "similar" is defined as having a Levenshtein distance <= N/2, where N is the size of this "primary" string. The primary and all similar strings (possibly 0) are emitted as a group. I calculate the distance from the primary, to avoid the problem of grouping "cold cord card ward warm" together despite the first and last words having no letters in common.

    C:\Temp>type meow.cpp
    #include <stddef.h>
    #include <algorithm>
    #include <iterator>
    #include <numeric>
    #include <vector>
    using namespace std;
    
    // http://en.wikipedia.org/wiki/Levenshtein_distance
    template <typename FwdIt1, typename FwdIt2> size_t levenshtein_distance(
        FwdIt1 first1, FwdIt1 last1, FwdIt2 first2, FwdIt2 last2) {
    
        const size_t size1 = distance(first1, last1);
        const size_t size2 = distance(first2, last2);
    
        if (size1 > size2) {
            return levenshtein_distance(first2, last2, first1, last1);
        }
    
        vector<size_t> prev(size1 + 1);
        vector<size_t> curr(size1 + 1);
    
        iota(curr.begin(), curr.end(), 0);
    
        FwdIt2 iter2 = first2;
    
        for (size_t j = 0; j < size2; ++j, ++iter2) {
            prev.swap(curr);
    
            curr[0] = j + 1;
    
            FwdIt1 iter1 = first1;
    
            for (size_t i = 0; i < size1; ++i, ++iter1) {
                if (*iter1 == *iter2) {
                    curr[i + 1] = prev[i];
                } else {
                    curr[i + 1] = min(curr[i], min(prev[i + 1], prev[i])) + 1;
                }
            }
        }
    
        return curr.back();
    }
    
    #include <algorithm>
    #include <iostream>
    #include <ostream>
    #include <regex>
    #include <set>
    #include <string>
    #include <vector>
    using namespace std;
    
    struct size_greater {
        template <typename T> bool operator()(const T& a, const T& b) const {
            return a.size() > b.size();
        }
    };
    
    vector<vector<string>> fuzzy_grouping(const vector<string>& v) {
        multiset<string, size_greater> s(v.begin(), v.end());
    
        vector<vector<string>> ret;
    
        while (!s.empty()) {
            string primary = *s.begin();
            s.erase(s.begin());
    
            ret.push_back(vector<string>());
            ret.back().push_back(primary);
    
            const size_t similar = max(static_cast<size_t>(1), primary.size() / 2);
    
            for (auto i = s.begin(); i != s.end(); ) {
                if (levenshtein_distance(primary.begin(), primary.end(), i->begin(), i->end()) <= similar) {
                    ret.back().push_back(*i);
                    s.erase(i++);
                } else {
                    ++i;
                }
            }
        }
    
        return ret;
    }
    
    int main() {
        const regex r("\\w+");
    
        vector<string> v;
    
        for (string s; getline(cin, s); ) {
            v.insert(v.end(), sregex_token_iterator(s.begin(), s.end(), r), sregex_token_iterator());
        }
    
        vector<vector<string>> groups = fuzzy_grouping(v);
    
        for (auto i = groups.cbegin(); i != groups.cend(); ++i) {
            for (auto k = i->begin(); k != i->end(); ++k) {
                cout << *k << " ";
            }
    
            cout << endl;
        }
    }
    
    C:\Temp>cl /EHsc /nologo /W4 /MTd meow.cpp
    meow.cpp
    
    C:\Temp>meow
    I call our world Flatland, not because we call it so, but to make its
    nature clearer to you, my happy readers, who are privileged to live in
    Space.
    Imagine a vast sheet of paper on which straight Lines, Triangles,
    Squares, Pentagons, Hexagons, and other figures, instead of remaining
    fixed in their places, move freely about, on or in the surface, but
    without the power of rising above or sinking below it, very much like
    shadows--only hard with luminous edges--and you will then have a pretty
    correct notion of my country and countrymen.  Alas, a few years ago, I
    should have said "my universe:"  but now my mind has been opened to
    higher views of things.
    ^Z
    privileged
    countrymen country
    Triangles
    Pentagons Hexagons
    remaining rising
    Flatland
    straight
    luminous
    universe
    because
    clearer
    readers years
    Imagine
    Squares
    figures
    instead
    surface
    without with
    sinking
    shadows
    correct
    nature are
    places Space paper Alas
    freely pretty
    notion not
    should world
    opened
    higher other
    things Lines then
    happy
    sheet
    which
    fixed
    their the the
    about above but but but
    power
    below
    edges
    views
    call call will
    make move like have have
    live
    vast has
    very
    much
    only on on
    hard said and and and
    mind in in in
    been
    our or or
    its it it
    you you
    who
    few
    ago
    now
    we
    so to to to to
    my my my my
    of of of of of
    I a a a I
    

    Again, I make no claims that this is an especially intelligent algorithm - it's just an example of what you can write with the STL.

  • A few newbie questions (while waiting for part 6) regarding namespaces etc.:

    1. In the .cpp file, what do you consider best (or what's correct)?

    • Use a using namespace MyNamespace;
    • Surround it with the same namespace i.e. namespace MyNamespace { <All functions> }
    • Use the namespace like void MyNamespace::MyClass::Do() { <code> }

    2. If I have a Class in src/Utils/MyClass.h, and it relies on src/Utils/Detail/OtherClass.h, should I

    •  add the whole namespace i.e. Utils::MyClass::DoSth() { Utils::Detail::OtherClass }
    • or just the deeper data, i.e. Utils::MyClass::DoSth() { Detail::OtherClass
    • and should I (again) use a using like using Utils::Detail::OtherClass  (or using Detail::OtherClass based on the above point)?
    • or rather of the namespace: using namespace Utils::Detail;

    3. Do you add the whole namespace when calling something from another structure?

    • Like src/Helpers?
    • and in the same structure?

    4. Should I include from the base path (like boost does it)?

    • #include "Utils/Detail/OtherClass.h"
    • or only lower structure, if I'm calling it within Utils then just #include "Detail/OtherClass.h"

    5. What's if it's lower, i.e. from Details back to some other class, e.g. Helpers, what do you recommend?

    • #include "Helpers/SomeHelper.h"
    • #include "../../Helpers/SomeHelper.h"

    I know about using a shorter name for long namespaces (not sure of the syntax): using Det = Utils::Details; or something like that, but am not such a fan of it.

    If I could decide, I'd choose, deeper Namespaces don't need to be used with the whole namespace, just the deeper part, e.g. Detail::OtherClass if calling from Utils to Utils/Detail. Higher ones would need the whole one (but that makes it less readable IMO).

    In boost, the only thing I found in a source file was a huge list of using Full::Namespace::SomeClass;

  • Josh KJosh K

    @Charles: I know STL is very busy. I said "STL _like_ videos". You got to have more pepole then just STL that code in c++ and have good information and experience that we can learn from, right ?

    "native content (C++) is increasing relative to the non-native stuff"
    Most if not all of them are chitchat videos. If you're lucky only 5 minutes of a 60 minutes chitchat video have good information.

  • CharlesCharles Welcome Change

    @Josh K: You're asking for more tutorial-lecture-demonstration type native content. Got it....

    C

    PS: Yes, there are obviously many C++ devs capable of doing this... That alone doesn't equate to the real potential for more STL-like content. (not everybody wants to lecture, or be on video, or do screencasts or even blog). I will try and get more folks to do something on C9 in a tutorial style for native developers.

  • Charles, first off many thanks for all the work you put into this website - that's really cool. Second, don't feel compelled to force anything, if it doesn't come naturally, nobody needs to be on screen just for the sake of it. That said, it's a genuine pleasure listening to anything STL or Herb Sutter have to say (his last video Q&A on C++0x was actually pretty substantial!); I guess the pleasure comes from hearing someone who has a deep understanding of the matter and a clear way of thinking about it (e.g. any interview or talk by Bruce Schneier is equally enjoyable to watch).

    If you have more people like that who would bring that clarity and insight to the screen, then by all means try and record them! Smiley I'm sure that substantial content on any sort of programming (but native especially) would always be appreciated!

    How about a series on parallelism, for example? Smiley

  • STLSTL

    Deraynger:

    #1: I've traditionally defined member functions at global scope with fully qualified names (MyNamespace::MyClass::MyMember). However, this is moderately verbose, especially if return types need to be qualified. (To the right of the parenthesis in ReturnType MyNamespace::MyClass::MyMember(STUFF, you can use names from MyNamespace without qualifying them as such, because the compiler has already seen you mention MyNamespace. This is not true for the ReturnType.)

    I might switch to defining member functions in a reopened namespace MyNamespace { }.

    A using-directive (that's what "using namespace MyNamespace;" is called) would feel weird for the purposes of defining member functions.

    #2: Only Detail::OtherClass is necessary. Within Utils::MyClass::DoSth(), unqualified name lookup (to find "Detail") searches "inside out", and will search Utils (which has the Detail you want) before reaching into the global namespace (which is where you don't want to go).

    You can use a using-directive (using namespace Detail;) or using-declaration (using Detail::OtherClass;) if you like.

    However, remember that using-directives and using-declarations should NEVER appear at global scope in headers. That is enormously polluting and defeats the whole point of namespaces.

    #3: I don't understand this question. As a general rule, I avoid writing unnecessary or redundant code (with very specific exceptions), and that includes unnecessary qualification.

    #4: It depends - something like Boost is a very special and complex case.

    If I were maintaining something in Boost, I'd include "my own stuff" with "relative paths", so boost/kittens/cat.hpp would include boost/kittens/detail/feedme.hpp with #include "detail/feedme.hpp". cat.hpp's own directory is first in the search order, so this should preserve everything if the whole tree is moved to boost/cute_fluffy_kittens/... I'd include "other stuff" with "base paths", like #include "boost/tuple/tuple.hpp", again to make myself immune to my own directory being moved.

    #5: I would avoid ".." if possible.

    I may have gotten something wrong with this include path stuff - it's been a long time since I've worked with deeply nested directories. The STL has an extremely flat directory structure, and my projects at home are the same way (and smaller).

    > I know about using a shorter name for long namespaces (not sure of the syntax):
    > using Det = Utils::Details; or something like that, but am not such a fan of it.

    They're called namespace aliases: "namespace MSKittens = Microsoft::Cute::Fluffy::Kittens;"

    I have used them for extremely verbose nested namespaces. My idea of a long namespace name is greater than 5 characters.

  • STL:

    Thanks a lot for taking your time to answer my questions.

    Regarding #1, so it's fine then that I do namespace kittens { <functions> } in the source file?

    Regarding #2, not declare using directives/declarations in headers, I know that from your first STL series Smiley

    Regarding #3, you basically answered it with boost. What I meant was, I have boost/kittens/detail/feedme.hpp and need to reference something, say boost/kittens/cat.h, e.g. a friend. In feedme.hpp would you do: friend class boost::kittens::cat;? Second part of it was, if it's refering to something in boost/helpers/stl_utils.hpp, would you use the fully qualified name (which you would, based on your answer to #4).

     

    New questions (while still waiting Tongue Out). This always bothers me, since I don't know these licensing details (maybe you might).

    Are we allowed to use your code legally for ourselves, by just downloading it, and should or may we add the following 2 lines?

    // Copyright (c) Stephan T. Lavavej 2011. All rights reserved.

    // Copyright (c) Dopefish 2011. All rights reserved.

    Can we also just add our copyright (although I would probably add both, so I know where I got it from)?

    I got some files from someone online, he said I could use it how I want, commercially/modified etc., does that mean I can also change the copyright notice (would still keep his name in the file, where it still somehow represents the same class/file)?

    What happens if I retype a code from someone, by looking at it, and modifying it. Do I still need to include his copyright notice? What's if the code is (maybe partially) on a blog. Can I then just use it without hist copyright notice?

  • IvanIvan

    @ Charles & STL
    I agree with Josh K, although STL is pretty rare(person that likes his job so much that he is willing to spend time to explain his work in a lecture) I'm sure that you could find some other people like(or similar to) STL. For me I would like(ofc desires of one person are irrelevant but this is an example) a lecture on something that Herb mentioned: C++ Next or C++ Prim-language isomorphic to C++ but much easier to parse(=better tools, better compile speed, maybe even speed if it could make it easier to compiler to prove some things so that it can perform optimizations). I know that this isnt in development(at least not public) but IMHO people who understand it like Herb or C++ compiler people could do a lecture on it with some basic examples with like 2-3 hours of preparations. There is a bunch of stuff that you could ask them.
    @ STL
    Speaking of the compiler people :) :
    1. Why dont you bother them to make you(and all other lib writers ofc) a _Ugly to normal compiler(you use normal naming convention to write libraries but when you want to ship the product you call makemeugly.exe on source code).
    2. about namespaces and headers and poluting - why dont you ask (nicely :) ) compiler people to do the similar thing like i proposed in 1.
    #local_using - that applies to the current file only.
    That means that when they compile code to the "real" file they would remove #local_using namespace std; line and replace all vector with std::vector, swap with std::swap and so on...

    P.S. I know that I might sound ungrateful because all this is free and STL lectures are awesome and I apologize for that, but it is like Stephen wrote on the whiteboard- there are no books. So I'm now hooked on STL lectures and I would like more of the same drug.

  • STLSTL

    Deraynger> Regarding #1, so it's fine then that I do namespace kittens { <functions> } in the source file?

    Yes, that's fine.

    Deraynger> In feedme.hpp would you do: friend class boost::kittens::cat;?

    If the class granting friendship is boost::kittens::detail::can_opener, I'd say friend class cat; because unqualified name lookup is inside-out.

    Deraynger> Are we allowed to use your code legally for ourselves, by just downloading it, and should or may we add the following 2 lines?

    I cannot answer legal questions.

    Ivan> 1. Why dont you bother them to make you(and all other lib writers ofc) a _Ugly to normal compiler(you use normal naming convention to write libraries but when you want to ship the product you call makemeugly.exe on source code).

    Internal names need to be _Ugly, but external names need to be pretty. It's difficult to decide programmatically which is which, especially given our convention of having external algorithms like count_if() call internal helpers like _Count_if().

    Ivan> 2. about namespaces and headers and poluting - why dont you ask (nicely Smiley ) compiler people to do the similar thing like i proposed in 1. #local_using - that applies to the current file only.

    As a general rule, I am opposed to non-Standard extensions.

  • CharlesCharles Welcome Change

    @KerrekSB: Did you watch the C++ AMP presentations? I will be doing some more interviews with the PPL people and related tooling. Native parallelism is indeed an interesting topic. I've been wanting to visit the C++ compiler people, but alas they are rather swamped building the next version of the compilers. I would like to spend some time with the VC IDE people, but alas they are swamped building the next version (and like everything else around here -> we can't talk about it until it's ready to be talked about). My hope is that by BUILD, we'll have a lot to share for native devs. Let's see....

    C

  • IvanIvan

    @STL
    one quick one regarding the standard: why doesnt the standard define that compiler/linker should be able to figure that it included the same file already? include guards smell like 80s spirit :)
    @Charles
    I know that my every comment is something else on my wishlist, but I noticed that it looks like that Herb has a secret crush on Clojure. Could you bug him to make a short video explanation why. Ofc I might be totally wrong about Herbs feelings about Clojure.

  • STLSTL

    Ivan> why doesnt the standard define that compiler/linker should be able to figure that it included the same file already?

    Sometimes you do want to be able to include a subheader repeatedly (like VC10's <xxshared>, already deleted in VC11).  It would be trivial to Standardize #pragma once with a different name like #once.  At the cost of breaking backwards compatibility, idempotency could be made the default, with repeated inclusion enabled by #many or whatever.  (Migration could be achieved with compiler/preprocessor switches.)

    I suspect that nobody on the Committee considered it an urgent enough problem to fix - expert attention is a finite resource.

  • CharlesCharles Welcome Change

    @Ivan: I know nothing of Herb's feelings with respect to Clojure. He may like it (many people who like Lisps do), but I don't know what makes you think he has a crush on the language. Next time I speak with him, I'll ask.
    C

  • @STL
    Good to know variadic templates have been implemented Wink.

  • IvanIvan

    @ Charles :
    I was overstating it a bit, but that is the feeling I got from one interview...I remember Herb mentioning that Clojure does some things in an interesting way, or something like that. Afain I know that Herb and STL time is scarce and precious so I would be happy with ANY deep content(for example I didnt really like AMP presentation because it was very high level overview-that is important but I prefer hard core deep stuff like C++0x design decisions and Advanced STL series..) from Herb
    @ STL this might be a stupid question but why would you want to "include subheader repeatedly"
    If this is a stupid question skip it. :) Its just that I never heard something like that. I mean I know that you can include header 100x, that is why include guards exist, but I got a feeling that you are talking about something else.

  • CharlesCharles Welcome Change

    Here is part 6, driven by part 5 and Niners! Thank you, STL. You are a scholar and a gentleman Smiley

    http://channel9.msdn.com/Shows/Going+Deep/C9-Lectures-Stephan-T-Lavavej-Advanced-STL-6-of-n

    C

     

  • SMDSMD

    Couldn't you use std::shared_ptr with a custom deleter instead of boost::scoped_exit?

  • Michael JMichael J

    Hi Stephen

    A potential bug. Your code seems to assume that sha256 never has collisions. While collisions are rare, they do happen. If you have two different files generate the same hash, you will lose one of them.

Remove this comment

Remove this thread

close

Comments Closed

Comments have been closed since this content was published more than 30 days ago, but if you'd like to continue the conversation, please create a new thread in our Forums,
or Contact Us and let us know.