84 posts

## "Microsoft Out of Favor With Young, Hip Developers"

Back to Forum: Coffeehouse
• Charles said:
Bass said:
*snip*

Or the STL...

I think what Larry is trying to determine in an interview is "does this person understand the fundamentals?". If the interviewee said "Well, I'd just use  the STL or Boost and be done with it - since I know the algorithms employed there are far better than anything I could scribble out on your whiteboard in 5 minutes....", I don't think this answer would qualify as a "pass"...

C

The STL doesn't have a string replacement function. That was sort of the point of recommending Boost.

• Shining Arcanine said:
Sven Groot said:
*snip*

It can be shown that there is no real penalty for resizeable arrays if they double in size when expanded and shrink only when less than 1/4 of the capacity is in use, so the fact that the string needs to resize is a non-issue, assuming it is implemented this way. As a result, the resize operations can be ignored and those functions are amortized O(1) time.

By the way, it is interesting to note that this kind of amortized analysis is similar in spirit to quicksort's: quicksort is O(n log n) in the average case, but in the worst case it degrades to O(n^2), the same as insertion sort.
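As a rough illustration of the amortized argument (a hypothetical sketch, the function name is made up): simulate appending n elements to an array that doubles its capacity when full, and count the element copies performed by the resizes. The copies sum to less than 2n, which is exactly the amortized O(1) claim.

```cpp
#include <cstddef>

// Hypothetical sketch: count how many element copies a doubling array
// performs across n appends. The total stays below 2n.
std::size_t CopiesForAppends(std::size_t n)
{
    std::size_t capacity = 1, size = 0, copies = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        if (size == capacity)   // full: "reallocate" at double the capacity
        {
            copies += size;     // every existing element moves to the new block
            capacity *= 2;
        }
        ++size;                 // the append itself
    }
    return copies;
}
```

For n = 1024 this gives 1 + 2 + 4 + ... + 512 = 1023 copies, i.e. less than one copy per append on average.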

Edit: I noticed that you were talking about this same thing before my post. I feel silly now. Anyway, the problem screams for the use of a theoretical Turing machine, in which case you would think in terms of the machine reading an input tape and writing to an output tape. In that manner, it can have states {0, 1, EOF}. State 0 is the starting state. From state 0, the state transition function can go multiple ways. Here is a list of rules in the state transition function:

• From state 0, upon seeing the symbol '\0', the machine will write '\0' to the output tape, advance both tape heads and move to state EOF.
• From state 0, upon seeing the symbol '.', the machine will write '\' to the output tape, advance both tape heads and move to state 1.
• From state 1, the machine will output '.', advance the output tape head and move to state 0.
• From state 0, upon seeing a symbol c on the input tape, where c is not '\0' and not '.', the machine will write c to the output tape, advance both tape heads and return to state 0. Obviously, this rule is actually n - 2 rules, where n is the number of characters allowed in the character set.

Larry's second code example is a good approximation of this in C++.
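That state machine transcribes almost mechanically into C++. A sketch of the rules above, assuming the task is escaping '.' as "\." (the function name is made up):

```cpp
#include <string>

// Hypothetical transcription of the two-state machine: state 0 copies
// characters; on '.' it emits '\' and moves to state 1, which emits the
// '.' without reading and returns to state 0.
std::string EscapeDots(const std::string &input)
{
    std::string out;
    int state = 0;
    std::size_t i = 0;          // input tape head
    while (true)
    {
        if (state == 1)         // state 1: write '.', no input read
        {
            out.push_back('.');
            state = 0;
            continue;
        }
        if (i == input.size())  // the '\0' rule: halt (state EOF)
            break;
        char c = input[i++];
        if (c == '.')           // the '.' rule: write '\', go to state 1
        {
            out.push_back('\\');
            state = 1;
        }
        else                    // the "any other c" rules: copy through
        {
            out.push_back(c);
        }
    }
    return out;
}
```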


• Sven Groot said:
Shining Arcanine said:
*snip*

He probably meant that doubling the array size every time you need to enlarge it makes the performance impact trivial.

Kinda no duh.

• Minh said:
Sven Groot said:
*snip*

He probably meant that doubling the array size every time you need to enlarge it makes the performance impact trivial.

Kinda no duh.

Still contradicts my own experience. I've had algorithms where the cost of array resizing was a significant part of the overall time taken, despite the fact that I was doubling it every time.

Not to mention issues of wasted space and heap fragmentation that can occur with resizing. Yes, you can minimize the impact of resizing by picking an appropriate growth factor (and depending on the algorithm, 2 isn't always the best option), but it's still not free, it's still not O(1), and you should definitely not assume it has "no real penalty" without first measuring it.

• Sven Groot said:
Minh said:
*snip*

Still contradicts my own experience. I've had algorithms where the cost of array resizing was a significant part of the overall time taken, despite the fact that I was doubling it every time.

Not to mention issues of wasted space and heap fragmentation that can occur with resizing. Yes, you can minimize the impact of resizing by picking an appropriate growth factor (and depending on the algorithm, 2 isn't always the best option), but it's still not free, it's still not O(1), and you should definitely not assume it has "no real penalty" without first measuring it.

What were you doing?

I figure resizing wouldn't have zero impact... but doubling it does a decent job at performance at the cost of space

• Minh said:
Sven Groot said:
*snip*

What were you doing?

I figure resizing wouldn't have zero impact... but doubling it does a decent job at performance at the cost of space

I was dealing with a very large number of resizable arrays. It was essentially a tree-like structure where the resizable arrays were used to contain the child pointers of each node. I did try some alternatives but everything I came up with was even slower than the resizable arrays.

• Sven Groot said:
Minh said:
*snip*

I was dealing with a very large number of resizable arrays. It was essentially a tree-like structure where the resizable arrays were used to contain the child pointers of each node. I did try some alternatives but everything I came up with was even slower than the resizable arrays.

Hmmm... Arrays implemented by linked list?

• Minh said:
Sven Groot said:
*snip*

Hmmm... Arrays implemented by linked list?

I did try linked lists of course, but it was too slow. We're talking several orders of magnitude slower than the array version here.

• Sven Groot said:
Minh said:
*snip*

I did try linked lists of course, but it was too slow. We're talking several orders of magnitude slower than the array version here.

slower in access? Maybe pre-computed "index" access?

• Minh said:
Sven Groot said:
*snip*

slower in access? Maybe pre-computed "index" access?

Linked Lists are horribly slow. They don't work well with caches and every time you dereference a pointer, you run the risk of a page fault and page faults are very, very expensive.

• Charles said:
Bass said:
*snip*

Or the STL...

I think what Larry is trying to determine in an interview is "does this person understand the fundamentals?". If the interviewee said "Well, I'd just use  the STL or Boost and be done with it - since I know the algorithms employed there are far better than anything I could scribble out on your whiteboard in 5 minutes....", I don't think this answer would qualify as a "pass"...

C

*ding* *ding* *ding*  Give the man a ceegar.

It's about understanding the fundamentals (I once had someone say "Just use vector.reverse()" to reverse a linked list).

This is where fundamentals come in: Write efficient code always.  Don't hyper-optimize your code (unless you have evidence that shows that the code is a bottleneck), but always write code which is tight.

Of course there are caveats: Don't sacrifice readability.  Ever.  In production code, it is absolutely reasonable to use vector.reverse() instead of hand rolling reverse.  Or use String.Replace() if you're running under the CLR.

But you should always be able to determine the inherent inefficiency of using substr to extract elements from a string and erase to advance a pointer.

PS: You're all right that it's O(nm) where n is the length of the string and m is the # of . characters.  Which is still worse than it has to be.

• Bass said:
Ian2 said:
*snip*

The problem is I think a lot of .NET developers really haven't been exposed to anything outside of the .NET realm, and they make claims like this. I'm guilty of this as well.

I've finally been exposed professionally to something other than .NET for development, and I know now that .NET is extremely overrated. C#'s real strength is in the language, but not so much in the libraries. I don't think the tooling is all that competitive either (VS vs. Eclipse/STS). I can't really think of a single reason why I'd use ASP.NET MVC/WebForms over J2EE/Spring from a productivity standpoint. And I've been doing .NET development for years and Java for only a few months.

Not true, Bass. The vast majority will have exposure to Web development - at least some JavaScript and possibly PHP. That means, essentially, functional programming and higher-order functions at their fingertips. Then, depending on your age, you're very likely to have had experience with Java and the Java Platform. I can't imagine anyone using these forums fitting your stereotype, and now, apparently, neither do you.

Now that you say that .NET is overrated, what do you mean precisely? Do you have specific examples of frameworks, namespaces and types that you think are inferior - and inferior compared to what, and why?

To me, the implication of the statement that C# is a great language must necessarily be that it can be used to create great frameworks. That doesn't automagically always lead to great frameworks, of course, but it should steer you in the right direction. Great languages usually have equally great libraries.

I've always been somewhat unimpressed by the .NET collections namespaces when compared to the Java Collections library in terms of depth, but one feature of C#, automatic boxing and unboxing of primitive types, is really great for collections. And with C# 2.0 generics, a List<int> will not have the boxing overhead, so you get both performance and abstraction, whereas, at least historically, in Java you've had to create specialized collection types for different native types. I believe there is a GNU Java library that does this, Trove or some such. That's clearly inferior - and critically so, as collections are at the heart of both .NET and Java, or any platform really.

Let's also take the Java iterator interface. Erik Meijer has already taken that interface to pieces: it's unnecessarily bloated and does too much. With Reactive Extensions for .NET there is now also a mathematical dual of IEnumerable, the new IObservable. IEnumerable is also deeply integrated into C# via the iterator state machine feature (although there are some performance woes in some scenarios). It's that kind of deep analysis that creates great libraries and languages.

Let's also not forget LINQ - monadic queries over arbitrary structures. The Java community may have some copy-cat efforts here but it was definitely not first in this game (nothing wrong with copying, though). Then there's the VB and C# language support. It's a superb and pervasive feature set.

I hope in the future we'll see a large immutable collections library in .NET.

The real challenge for .NET is probably not Java but JVM-"biased" languages such as Scala, which are more advanced than C# and facilitate very coherent, advanced and elegant libraries, much more sophisticated than traditional Java Platform libraries. On the other hand, they may be too advanced for many developers.

• Larry Osterman said:
Charles said:
*snip*

*ding* *ding* *ding*  Give the man a ceegar.

It's about understanding the fundamentals (I once had someone say "Just use vector.reverse()" to reverse a linked list).

This is where fundamentals come in: Write efficient code always.  Don't hyper-optimize your code (unless you have evidence that shows that the code is a bottleneck), but always write code which is tight.

Of course there are caveats: Don't sacrifice readability.  Ever.  In production code, it is absolutely reasonable to use vector.reverse() instead of hand rolling reverse.  Or use String.Replace() if you're running under the CLR.

But you should always be able to determine the inherent inefficiency of using substr to extract elements from a string and erase to advance a pointer.

PS: You're all right that it's O(nm) where n is the length of the string and m is the # of . characters.  Which is still worse than it has to be.

I understand that in an interview, it's about testing fundamentals. But personally, if I were interviewing someone, I'd appreciate it if they'd say "in real life, I'd use this library method" in addition to giving the proper answer, because it'd demonstrate that they're aware that reinventing the wheel isn't something you want to do in production code.

Yes, they have to be able to write the code, and know the fundamentals, and it's perfectly fine to ask them to do that during an interview. But imo, they also have to know that in real life, it's better to rely on other people's expertise in some cases. Don't write your own string replacement algorithm unless you've demonstrated that the one from whatever library you're using doesn't meet your needs.

Now, if I were in an interview and had been asked that question, the code I actually would've written would've been this:

```cpp
#include <string>

using std::string;

string MyFunction(const string &input)
{
    string dest;
    dest.reserve(input.size());
    for (string::const_iterator it = input.begin(); it != input.end(); ++it)
    {
        if (*it == '.')
            dest.push_back('\\');
        dest.push_back(*it);
    }

    return dest;
}
```

This helps reduce the number of resizes necessary by reserving a reasonable capacity up front (note it will still need one resize if the input contains periods; if anything were specified about the input data I might decide to reserve a capacity larger than input.size()), and it uses iterators because a) they're safer than pointers and b) the string class allows embedded \0 characters, something that Larry's version didn't handle.

• Sven Groot said:
Shining Arcanine said:
*snip*

My algorithms professor said that if you start at 1 and keep doubling, the number of copies you do is 2n, where n is the size of the final array and concluded that there is no performance penalty to having resizeable arrays, because O(2n) = O(n).

I guess since memory allocation is something like a cubic algorithm, the number of allocations, log(n), also would play a role, so the act of allocation dominates the time complexity over the act of copying, but you can cheat by doing allocations on the stack, making the copies dominate.

Of course, if you use Java, then you are stuck with slow dynamic memory allocations. Does anyone know if the same is true for .NET?

• Shining Arcanine said:
Sven Groot said:
*snip*

My algorithms professor said that if you start at 1 and keep doubling, the number of copies you do is 2n, where n is the size of the final array and concluded that there is no performance penalty to having resizeable arrays, because O(2n) = O(n).

I guess since memory allocation is something like a cubic algorithm, the number of allocations, log(n), also would play a role, so the act of allocation dominates the time complexity over the act of copying, but you can cheat by doing allocations on the stack, making the copies dominate.

Of course, if you use Java, then you are stuck with slow dynamic memory allocations. Does anyone know if the same is true for .NET?

Hrm... no performance penalty in having resizeable arrays? What a joke. Yes, doubling the size of the array on resize is better than increasing the size by a constant value. But that doesn't imply that there's no performance penalty. As you noted already you have to allocate a new block of memory (though I'd say that "cubic" is a bit much...) and you have to copy the old array contents. And then there's the problem of wasted memory space...

As for .NET memory allocations: depends what you call memory allocation in this case. If you include the garbage collection time then yes, memory allocation can be slow. But you have to remember that garbage collection runs only from time to time. It doesn't run on every allocation. If there's enough free memory available then memory allocation in .NET is as simple as incrementing a pointer.
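The "incrementing a pointer" claim can be sketched as a toy bump allocator (my illustration of the general nursery-allocation idea, not how the CLR is actually implemented): allocation succeeds by advancing an offset into a pre-reserved arena, and reclaiming memory is deferred to a collection.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy "bump the pointer" allocator, illustrative only: each allocation
// is an offset increment into a pre-reserved block of memory.
class BumpAllocator
{
    std::vector<std::uint8_t> arena_;
    std::size_t offset_ = 0;
public:
    explicit BumpAllocator(std::size_t bytes) : arena_(bytes) {}

    void *Allocate(std::size_t bytes)
    {
        if (offset_ + bytes > arena_.size())
            return nullptr;            // a real GC would trigger a collection here
        void *p = arena_.data() + offset_;
        offset_ += bytes;              // the entire "allocation" is this increment
        return p;
    }
};
```

Successive allocations come back adjacent in memory, which is also why freshly allocated objects in such schemes tend to have good cache locality.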

• Shining Arcanine said:
Sven Groot said:
*snip*

My algorithms professor said that if you start at 1 and keep doubling, the number of copies you do is 2n, where n is the size of the final array and concluded that there is no performance penalty to having resizeable arrays, because O(2n) = O(n).

I guess since memory allocation is something like a cubic algorithm, the number of allocations, log(n), also would play a role, so the act of allocation dominates the time complexity over the act of copying, but you can cheat by doing allocations on the stack, making the copies dominate.

Of course, if you use Java, then you are stuck with slow dynamic memory allocations. Does anyone know if the same is true for .NET?

Besides the allocation and copying, there is potentially wasted space (and the larger your growth factor, the worse that gets), heap fragmentation, and potentially additional cache misses. In a managed environment such as Java or .NET there are also the costs of zero-initializing the re-allocated array as well as added GC pressure (although this will likely be negligible).

Yes, in many cases this won't make much difference. But in some rare cases, it might. Never make assumptions, and always measure before you optimize.

• jamie said:
ManipUni said:
*snip*

for us it was different .... all design students - in all colleges - are taught on MACs - illustrator.

we were a web shop - and pc design ad shop.  the photoshop work learned was good - but illustrator in our world is like a word perfect file in a word shop = akkk!

the only thing i hate more than ballmer is illustrator    bloated eps - die! die! die!

* i agree with jobs again = adobe SUCKS  (eps = die   PFB = die  FLV = die etc)

I learned Illustrator several years ago in a course at a local community college whose lab and classroom computers are PCs running Windows XP. The college has one classroom that uses Apple computers.

The challenge is that the college's Multimedia department uses mostly the latest versions of Adobe applications, including Photoshop, Illustrator, Flash, Dreamweaver, and Premiere Pro, and teaches JavaScript and PHP programming.

The Computer Science department teaches C++ and Java programming and does not use Visual Studio, and therefore does not teach either C# or VB in its programming courses.

Eight to twelve years ago this same college taught Visual Basic using Visual Studio.

While I have asked the college for several years to teach courses using Expression Studio, Visual Studio and SQL Server, they simply do not have the budget to support credit courses with this software.

Microsoft must provide the software to the local colleges for at least three years or they will continue to be frozen out by Adobe applications, the Open Source crowd, and the C++, Java and PHP programming languages.

Microsoft giving  the software away for free to students is a great idea.

Unfortunately, many colleges remain locked in to Adobe applications for implementing web sites with the JavaScript and PHP programming languages.

Looks like the same college might be teaching an ASP.NET course in the spring.

However, to date they do not have the budget for buying Visual Studio and SQL Server.

I suggest Microsoft give colleges the latest versions of Visual Studio, Expression Studio and SQL Server free for a period of five years, and have Microsoft Evangelists actively promote this and the free student software to as many colleges and universities as possible.

For young students wanting to become web designers and web developers, many local colleges are not exposing students to Microsoft's applications for the simple reason that they do not have the money to buy Microsoft software in addition to Adobe's.

Young and hip college students are in many instances not even aware that Microsoft Expression Studio and Visual Studio exist.

What we have here is a failure to communicate.

• Sven Groot said:
Charles said:
*snip*

The STL doesn't have a string replacement function. That was sort of the point of recommending Boost.

You are such a stickler, man.... Whatever. My point was made...

C