Entries:
Comments:
Posts:

Loading User Information from Channel 9

Something went wrong getting user information from Channel 9

Latest Achievement:

Loading User Information from MSDN

Something went wrong getting user information from MSDN

Visual Studio Achievements

Latest Achievement:

Loading Visual Studio Achievements

Something went wrong getting the Visual Studio Achievements

rand() Considered Harmful

Download

Right click “Save as…”

Slides (view online)

When you need a random number, don't call rand() and especially don't say rand() % 100!  This presentation will explain why that's so terrible, and how C++11's <random> header can make your life so much easier. 

Follow the Discussion

  • I think this is the SO topic:

    http://stackoverflow.com/questions/4195958/how-do-i-scale-down-numbers-from-rand

    Answer is now down in negative Smiley

  • Why not rand() is a Mersenne twister implementation by default? I thought it was.

  • @tocsa: LOL

  • STLSTL

    Here are my slides: http://sdrv.ms/1e11LXl

    tocsa: Yep, that's the one. I am actually impressed that it has already been edited.

    The C Standard doesn't mandate any particular implementation for rand(), only a minimum range. VC's CRT hasn't changed its implementation of rand() because that could silently break user code, especially if the range were increased (of course, such user code is already unhappy).

  • One of my favorite talks, thank you STL. I was the one that asked the question about the `uniform_int_distribution` - there apparently isn't a way to "resize" the range of a distribution, but I guess that's not very important as distributions are lightweight objects and creating/destroying them won't be a problem.

  • @STL: Totally unrelated, but could you provide the code for the C++14 "tuple explosion" implementation?

  • The GoingNative 2013 videos have disappeared for some reason...hopefully they will be back soon Smiley.

  • STLSTL

    I've uploaded the tuple apply() code: http://sdrv.ms/13tqAnL

    Notes:

    1. This should have compiled with VS 2013 RTM, but it doesn't, so I've filed a compiler bug (DevDiv#775505). This does compile with GCC 4.8.1.

    2. Everything above "// ^^^^^ C++14 ^^^^^" implements C++14's "Compile-time integer sequences", specified by N3690 20.5 [intseq]. This was proposed by Jonathan Wakely (who contributes to GCC and is unaffiliated with MS).

    3. apply_impl() and apply() have been copy-pasted from the example in 20.5.1 [intseq.general]/2 (again, written by Wakely) with one change: the example is incorrect - it is depicted as returning auto when it should be returning decltype(auto) (the difference is significant, not trivial). Because neither VS 2013 RTM nor GCC 4.8.1 support decltype(auto) yet, I used C++11's technique of decltype(return expression).

    4. This technique is compile-time recursive (to form the integer_sequence) but run-time non-recursive: apply() calls apply_impl() calls f(args) and it's done.

  • Thanks!

  • Andrey TarasevichAndrey Tarasevich

    While I understand that sometimes being bombastically assertive is required to get the point across, this presentation goes way over the top.

    So, first and foremost: no, `rand()` is NOT considered harmful, period. `rand()` is a tool, which has it own pros and cons, which in turn define its applications. And in many application `rand()` happens to be the right tool for the job.

    Yes, in most cases it is implemented as a simple shift-and-add (even though it is an implementation detail), but in many cases such implementation hits the exactly proper balance between the quality and performance of pseudo-random generator. I would even say that it is actually true for _most_ general programming applications.

    The issue with pigeon-holing, the problem with `% 100` and timer resolution are well-known. But again, these matters are something that one has to know and understand in order to properly use `rand()`. They do not in any way create any prohibitive obstacles for `rand() usage. Not even close. This is not (and will never be) a reason to consider `rand()` "harmful".

    The idea that the new C++11 random generators are suitable replacement for `rand()` as utility-level pseudo-random generators is laughable at best.

    P.S. `NULL` is an abomination? Yes, I'm aware of the C++11 `nullptr`, but given the context, the reference to `NULL` being "an abomination" makes one question the presenter's competence.

  • STLSTL

    Andrey Tarasevich> `rand()` is a tool

    It is a terrible tool, like a hammer made of ice. It has been superseded by other tools.

    I would actually say that rand() is worse than goto, which is not something that I say lightly.

    > `NULL` is an abomination?

    Yes, it is an abomination. It has been superseded by nullptr, which is superior in every way (as briefly described in my previous talk).

  • liquidliquid

    Hi Stephan, really nice presentation, thank you.

    One question:
    I use it like:

    template <typename T>
    T randomFrom(const T min, const T max)
    {
    static std::random_device rdev;
    static std::default_random_engine re(rdev());
    typedef typename std::conditional<
    std::is_floating_point<T>::value,
    std::uniform_real_distribution<T>,
    std::uniform_int_distribution<T>>::type dist_type;
    dist_type uni(min, max);
    return static_cast<T>(uni(re));
    }

    and during first call (when statics are created) rand_s is called during both of following lines

    static std::random_device rdev;
    static std::default_random_engine re(rdev());

    Is this by design, or am I using it wrong?

    Thank you

  • MattMatt

    Hi STL,

    it would be nice to include the inverse functions of those distributions in cmath.h as well.

    Reason:
    In practice, you are often facing the problem of not generating one-dimensional random numbers but multi-dimensional correlated random numbers.

    Then the standard procedure is usually to model the dependence structure via a copula and then do the
    http://en.wikipedia.org/wiki/Inverse_transform_sampling
    for each of your marginal distributions...

    Hoping to find at least the inverse of the standard normal in c.math in C++14 and thanks for your great talk,

    Matt


  • STLSTL

    liquid:

    1. Your randomFrom() is unsafe to call from multiple threads simultaneously.

    2. I discourage the use of default_random_engine. There's no reason to leave that choice to the library implementer (even though VC chooses mt19937).

    3. randomFrom() constructs a distribution for every call. That's inefficient, because the distribution may accumulate random bits for later use.

    4. The static_cast<T> should not be necessary.

    > during first call (when statics are created) rand_s is called during both of following lines

    That's by design. I didn't get a chance to explain this in the presentation, but rand_s() is totally different from rand(). (This is highly unusual because most meow_s() functions strongly resemble their meow() counterparts.) rand_s() is the CRT's function for cryptographically secure pseudorandom numbers.

    Matt: It sounds like you want more "special math functions", some of which were proposed in TR1 (but not incorporated into C++11). A good start would be to propose an implementation for Boost.

  • Mike MarcinMike Marcin

    You didn't touch at all upon uniform_real_distribution.

    You mentioned the rationale behind uniform_int_distribution being an inclusive range. What is the reason for uniform_real_distribution having a non-inclusive range?

    Also in order to get an inclusive range of say [-1.0, 1.0] with uniform_real_distribution is it correct to do:

    std::uniform_real_distribution<double> d(
    -1.0,
    std::nextafter(1.0, std::numeric_limits<double>::max()) );

  • STLSTL

    Mike Marcin> What is the reason for uniform_real_distribution having a non-inclusive range?

    After doing some standardization archaeology, I found that it was originally proposed as a closed range, but was changed to a half-open range before TR1 was released. The rationale is in http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1544.html , "Issue J11: uniform_real should return open interval?" near the bottom.

    > Also in order to get an inclusive range of say [-1.0, 1.0]

    I would write nextafter(1.0, 2.0) since the second argument is used as a direction only.

  • IvanIvan

    I must agree with Andrey regarding rand(). For beginners it is maybe better just to use new stuff so they dont even have to think about RAND_MAX for eg, but for poor old people like me that were forced to learn about rand() rand() is OK choice for most of the applications...

  • SteveSteve

    In my opinion C++ community needs a weekly dose of STL.
    STL have lots of useful things to say and just as with the STL series on channel9, not enough time to say it all :(

    @Ivan: No, 'Andrey Tarasevich' doesn't know what he is talking about.
    There are lots of examples of were rand is used that cause subtle bugs that cause behavior, predictability.

    Such as shuffling answers to a question indirectly using rand.

    While rand is a quick and dirty thing to use to get the ball rolling, to get as quickly as possible to coding the good stuff, it is bad to use because most users do not know about rand's quirks.

    -

    It annoys me that nullptr can't be used everywhere when calling windows api's. Some functions do bad things when you give them nullptr instead of NULL.

  • STLSTL

    Ivan> for poor old people like me that were forced to learn about rand()

    I've occasionally mentioned that I'm a reformed C programmer. I became a modern C++ programmer by consciously rejecting familiar but inferior techniques and switching to unfamiliar but superior techniques. Unfamiliar things gradually become familiar, but inferior code is permanently inferior.

    This is really the most fundamental lesson behind all of my C++ writings and videos. If you aren't going to adopt new techniques, why bother reading/watching anything?

    Steve> It annoys me that nullptr can't be used everywhere when calling windows api's. Some functions do bad things when you give them nullptr instead of NULL.

    Can you please provide an example? I am not aware of any cases in the Windows API where nullptr will compile but do something differently from 0/NULL at runtime, and I cannot imagine what such a case would look like (as the Windows API doesn't use overloading or templates; outside the Windows API it should still be true that nullptr always safely supersedes 0/NULL, except for contrived cases that specifically detect nullptr_t and do something evil). nullptr was even specified to work properly with old-style varargs (and I believe our implementation is correct although I haven't checked recently).

    (There is also the whole "managed nullptr" thing under /clr, but that's a separate issue.)

  • IvanIvan

    @STL
    I agree in general(Boost<3, C++11 <3 (except decltype bracket horror :P ) but you missed my point. I am saying that for new people it is good to skip learning rand(). For people like me who know the tradeoffs(believe it or not I thought about things like %100 years ago, not because Im smart or anything like that (it is not really quantum mech to figure it out)but because I have a thing for thinking about stuff that most ppl just use) rand() may be beneficial for some uses.
    If you wanna place mouse cursor at "random place" in x,y plane, do you really care that certain positions are more likely? Prob not. If you are programming Warcraft 3 <3 and units have dmg that is in range [a,b]
    do you really care that avg is not really a+b/2 ? Prob not.
    If you run a genetic algorithm that needs a lot of randomnes... is it possible that rand() inferior quality is compensated by its superior speed?
    BTW now that I think of it, you should have mentioned if there is some crappy but fast modern generator in C++11 that is fast as rand but doesnt have rand problems(except it is not really really random :) )
    Boost random has "attributes table" like this, IDK if it exists for std.
    Again STL I really love your videos, but like (iirc)Alexandrescu said a lot of times decision are tradeoffs, not always black/white.

  • IvanIvan

    P.S I know generator + distribution == rand equivalent, I just was typing really fast. :)
    P.S. IF you ever continue advanced STL series , please consider Boost Fusion as topic, it seems like this magic thing to me atm, and learning materials are quite rare.

  • ww2ww2

    Hello, I tested the code in your slide page 22, but why do I get the same sequence every time? I tested under mingw32 with gcc 4.6.3 and your mingwx64 with gcc 4.8.1. They both have the same sequence. Any pitfalls on using random_device? thanks

  • STLSTL

    Ivan> If you wanna place mouse cursor at "random place" in x,y plane, do you really care that certain positions are more likely? Prob not.

    I prefer to write high-quality code no matter what I'm doing.

    > you should have mentioned if there is some crappy but fast modern generator in C++11 that is fast as rand

    Like I said, mt19937 is both extremely high quality and extremely fast.  I strongly doubt you need more than 500 MB/s of randomness.

    > please consider Boost Fusion as topic

    I haven't used it.

    ww2: Unlike VC, GCC hasn't implemented random_device nondeterministically on Windows. Boost has, so you can use Boost.Random.

  • ulidtkoulidtko

    In the context of deterministic RNG, one particularly important API is `split`.

    `split` of a random number engine produces TWO random number engines, whose internal state is completely determined by the state of the initial RNG. Thus, by means of using `split` one can build arbitrarily many random number sequences (streams), all of which would deterministically depend *on a single seed*. Often, the random number streams would arrange in a tree-like structure, rooting at the initial seed and mapping conveniently to the component structure which consumes the random number streams.

    For example, `split` can elegantly solve the problem of reproducibility of RNG by multiple threads. A thread's spawner just has to initialize deterministically the thread's TLS RNG using `split`.

    It's somewhat disappointing to not see such a function in std random for C++. Haskell Random has it. Can I write a proposal?

  • STLSTL

    C++ already has that. It's called the copy constructor:

    C:\Temp>type meow.cpp
    #include <iostream>
    #include <random>
    using namespace std;
    
    int main() {
        mt19937 meow(1729);
        mt19937 purr(meow);
    
        for (int i = 0; i < 5; ++i) {
            cout << meow() << ", " << purr() << endl;
        }
    }
    
    C:\Temp>cl /EHsc /nologo /W4 /MTd meow.cpp
    meow.cpp
    
    C:\Temp>meow
    911214221, 911214221
    2673937510, 2673937510
    1112474867, 1112474867
    3239237157, 3239237157
    1819012629, 1819012629
    

    <random> even provides generator serialization and deserialization.

  • IvanIvan

    @STL
    regarding Fusion people say it is kind of compile time runtime mix STLish thing so you might enjoy it, i tried looking at some boostcon videos and tutorials, seems too complicated for me :)

    Regarding mt, you are right, 500 MB is fast enough in most circumstances, I just wondered is there a generator that is fast as rand() (as implemented in VS) for cases where people really need to be as fast as rand(if those cases exist is a separate story).
    BTW if it is not too much trouble for you to explain...
    if generator is providing [0,30](31 distinct) and you need numbers [0,15](16 distinct) how do you do that... dumb slow and wasteful way is just to repeat generator calls until you get number in [0,16]. and I cant think of a simple transformation that gets you from 31 distinct to 16 with uniform prob.
    P.S. tnx for mentioning serialization and deserialization... I had no idea about that. :)
    P.P.S I thought of a maybe reason why standard doesnt require range to be pow of 2.... maybe one day we will get some really big need to generate numbers in range [0,22...22] (in ternary number system), and really really fast HW generator that generates only 3 states, so you can use it to provide range [0,22...222] with just n_digits calls to it. In short MAYBE it provides opportunity for optimization if range of nums you need to generate and generator range are in special relationship(friends with optimization benefits :) ) for eg you need to generate random digit, and generator provides you with [0,99]... you just return first digit and store other for winter(next call:) )
    Again this is just my idea that I got 10 min ago, I had no contact and I did not read anything about standardization of <rand>

  • STLSTL

    > if generator is providing [0,30](31 distinct) and you need numbers [0,15](16 distinct) how do you do that...

    I use a multi-step process:

    * Using subtraction, I alter the generator's range to start from 0.

    * I figure out the highest power of 2 less than or equal to the (altered) maximum.

    * I throw away any value that's higher than that power of 2, as it is useless to me (in which case I rerun the generator). For [0, 1729] this would throw away anything outside [0, 1024). For [0, 30] this would throw away anything outside [0, 16).  The worst case is having to throw away about half of the values (and it is nearly theoretical since non-power-of-two URNGs are crazy).

    * Then I concatenate bits until I have enough for the output.  For example, if you give me a generator that produces 4 random bits and I need 30 random bits, I need to accumulate bits. (One optimization I currently don't do is to store bits in the distribution object; I didn't notice this was possible until recently.)

    * I do math on the desired output range, to produce a range that's unsigned and 0-based.

    * I take my accumulated random bits and use modulo to wrap them around to the (altered) output range. But I do math to figure out which values are unbiased. If I would get a biased value, I throw it away and start over.

    * Finally I translate the value to the potentially signed, potentially non-0-based desired output range.

  • IvanIvan

    @STL
    tnx for a detailed explanation, I know you are really busy and appreciate the time you took to write this, not to mention the fact that you implemented it in the first place. :)
    I have 2 small follow up questions:
    1) What do you mean by
    "
    (One optimization I currently don't do is to store bits in the distribution object; I didn't notice this was possible until recently.)"
    Are you just dropping extra bits(aka you need 36 and you get them in quants of 11 so you drop 4*11-36 bits)
    2) regarding:
    "* I take my accumulated random bits and use modulo to wrap them around to the (altered) output range. But I do math to figure out which values are unbiased. If I would get a biased value, I throw it
    away and start over."
    Am I right(not sure I understand you 100% correctly) that for evil :P values of output range this could happen almost 50% of the time, for example if u get n bits from generator and you do (2^n) mod (2^(n-1) + 1) almost half of the range is biased and you need to restart-example generate [0-128] from [0-255], [129,255] are evil and you need to restart.
    So isnt it better to get 1 or 2 bits more so that biased range is smaller part of the total range, so you have higher chance of success(success == nonbiased result => no restart).
    Aka if you do (2^(n+2)) mod (2^(n-1)+1) evil range is now just aprox 12.5% of entire range. In example above instead of getting [0,128] from [0,255] you would try to get it from [0,512] or [0,1024].
    Again this is just for certain numbers, aka if you need to generate [0,1020], 10 bits will very rarely give you one of biased values(1021,1022,1023).
    Ofc like usually, this is written very quickly and could contain minor typos or even huge conceptual mistakes -in my defense humans arent hardwired to do PRNG stuff. :P

  • STLSTL

    > 1) What do you mean by "(One optimization I currently don't do is to store bits in the distribution object

    If you give me an mt19937 (32-bit input) and ask for [0, 32) (5-bit output), I run the engine once for each output, throwing away 27 bits every time. That is suboptimal.

    > Am I right(not sure I understand you 100% correctly) that for evil Tongue Out values of output range this could happen almost 50% of the time

    Yes. This happens when the output range is slightly greater than a power of 2, and around half of the input range. For example, mapping [0, 2^32) to [0, 2^31 + 5) is going to trigger a fair number of reruns.

    With more math and more complexity I could reduce this (especially by accumulating bits). When I reimplemented uniform_int_distribution my focus was on correctness, not speed.

    > So isnt it better to get 1 or 2 bits more

    In practice, the input range is 2^32 or 2^64, so we're starting with plenty of bits for most outputs.

  • IvanIvan

    @STL
    again tnx for your A.

    I was assuming u are saving bits, so what I was saying that if you need 21 bit I guess it is better to take 22 or 23 for cases when you are doing modulo (2^20 + 42) so that you have higher chance of success.

    BTW one more quick trivia question -trivia because it is probably not a problem IRL when it comes to performance.
    Can you do some compiletime magic to detect when range is a power of 2 so you dont need to do the check if result is biased, so that you dont need to do the check for that.
    Again I know this is one small if check, this is more of a Core language question and not about performance. I would guess you cant do that since number is inside () (normal function call ) and not inside <> (template param). I know, I know my heuristics('(','<') are really sign of my C++ expertise :P

    aka for this example from cpp wiki reference:

    std::default_random_engine e1(rd());
    std::uniform_int_distribution<int> uniform_dist(1, 6);
    int mean = uniform_dist(e1);

    if instead of 1,6 there was 1, 2048
    could you call different function implementation (assuming that default_random_engine range is multiple of 2048) that doesnt have check for biased result because they cant happen.

  • JamesJames

    Surely the opening brace of a function should be on a newline?

    main()
    {
    // ...
    if (blah) {
    // ...
    }
    }

  • WillWill

    @James No, why "should"? It's only a matter of style (convention)... See http://en.wikipedia.org/wiki/Indent_style (you seem to prefer pure "K&R Style" while STL used the "one true brace style" variant, but both are fine, neither is "bad").

    But in your code, "main()" should definitely be "int main()".

  • FFEFFE

    This is really the most fundamental lesson behind all of my C++ writings and videos. If you aren't going to adopt new techniques, why bother reading/watching anything?

    Your writings and videos are very informative and sometimes entertaining. Please do continue writing and posting videos. Thank you.

     

     

  • RobIIIRobThree Life's unfair, but root password helps!

    Am I the only one that noticed the number 1729 to pop up everywhere in this video? Big Smile

    *edit*
    Dammit, should've waited for the questions on the end of the video (29:05). Oh well...

Remove this comment

Remove this thread

close

Comments Closed

Comments have been closed since this content was published more than 30 days ago, but if you'd like to continue the conversation, please create a new thread in our Forums,
or Contact Us and let us know.