ARCast.TV - Tuning The Development Process at Spot Runner



When Marco DeMello became VP of Engineering at Spot Runner, he joined a small engineering team with a startup mentality. That was a perfect fit for a scrappy small startup, but with success the engineering team and process had to mature to support the demands of the business. In this episode we discover what it took to put in place a world-class process that supports TDD, continuous integration, performance and scalability testing, and virtual server integration labs.

The Discussion

    Automated Voice: It's Monday September 24th 2007, and you're watching ARCast.TV.

    Ron Jacobs: Hey, welcome back to ARCast.TV, friends. This is your host Ron Jacobs, and today we are going back to Los Angeles, where we are going to continue our chat with Spot Runner, who are just a bunch of guys who have built an amazing business serving an underserved community.

    That's what we call the long tail, right? People who can't afford to do traditional television advertising have found a way to get that done with Spot Runner. And they are doing it by building their business around delivering to the masses through the Internet, which requires some very good software.

    And today we are going to learn about how they did this -- not so much the architecture, which is a typical web application, but the process. That's what's really interesting. Before we get into that, let's look at one of their television ads, just so you get an idea of the kind of thing they are doing.

    [Ad] [duration 30secs]
    Ron: Hey, welcome back to ARCast.TV. This is your host Ron Jacobs, and today once again I am at Spot Runner, where I am joined by Marco. And Marco, you are a technical guy here.
    Marco: Yes, I am the vice president of technology engineering for Spot Runner. I actually came from Microsoft, where I was for a decade, between 1996 and 2006. I worked on several different projects at Microsoft; the last one was Exchange Server 2007. But at the end of September last year, I started talking to David and Nick, the co-founders of Spot Runner, and we found a very good fit for what I was looking for as the next move in my career.

    And what Spot Runner needed, from my point of view, was taking the company to the next level -- from an engineering perspective, an execution perspective, a technology leadership perspective, and so on and so forth. So we found a very synergistic opportunity there, and I joined the company and have been here the past nine months, working in this incredible new space of online advertising.
    Ron: Yes, and you know what's interesting about this, as we talked about last time regarding the business scenario of Spot Runner, is that technology is a core pillar of what has to work right for a company like Spot Runner to succeed, isn't it?
    Marco: Absolutely. I mean, look at the space of traditional media advertising, which is where Spot Runner really has its roots, and look at the process. The more we have learned since I came here about how broadcasters, radio stations and even ad agencies today transact and exchange not just data, but also things like how they handle creative, how they track campaigns, and how they communicate with their customers...

    The whole process in itself really needs some improvements, let's just say that. And it really cries for innovation. So I think the opportunity that Spot Runner faces right now, which it can capitalize on and build upon, really is a unique one: to innovate and revolutionize a market that is crying for it.

    And they're very excited. Every time we walk into a meeting to discuss the sort of tools, technologies and systems that we are putting in place to help them be more efficient -- to sell faster, to sell better, to buy better, to push the media around more efficiently, and to get higher quality reports on how their ad campaigns are going back to their customers in a shorter amount of time -- all these things add up to a collection of offerings that they are very excited about and very much welcome when they talk to us.
    Ron: So when you came to Spot Runner last fall, describe for me what kind of situation you were coming into.
    Marco: I very much saw what I expected to see, meaning the company, as a start-up, made very smart choices on how to get to market. Their priority was: we want to get to market, put a solution in front of customers, get our feet in the water, and start building a business in a market that did not exist, put it that way.

    So I found a system -- our website -- with a back end database and a front end website with a presentation rendering layer, and all the presentation logic was going back and forth between these two tiers. And it was built very much like a prototype would have been built.

    And I believe, given all the choices that had been made there, if I had been the one building the company, put it that way, I would have made pretty much the same choices. You need to get your company built and your feet wet before you start to think about architecture and scalability -- how many tiers your system should have, what kind of services you should expose, and what technology is the best fit for different solutions. These things are all great, but obviously they have their time.

    So when I got here I found what I expected to find, but at the same time, it was the right time for shifting that and moving towards more of an enterprise solution -- a highly scalable, distributed, flexible and extensible type of architecture and technology. So it was the right time to start making that shift.
    Ron: So was it a .NET-based web application and...?
    Marco: It was ASP.NET on the front end, with some AJAX, and .NET 2.0 on both the front end and the back end, with SQL Server 2000.
    Ron: So a pretty decent application. It wasn't horrible, or anything like that?
    Marco: Oh no, not at all. I guess I found what I expected, and it was fine. I mean, there were lots of things from a design perspective and from an architectural perspective that needed to change, but the site absolutely worked.
    Ron: So I am curious about the role of architecture in this evolution of the system, from where you started to where you've come today, or where you're going. Was there somebody who was the architect of the previous system when you came in, or did you hire an architect, or did you serve in that role? How did that work?
    Marco: That's an interesting question. When I came here, Craig Tadlock, the architect for Spot Runner, had started already -- I started shortly after Craig had joined as a full-time employee. And I was pleased to find that he had the same sort of thinking along the lines of where we should go with the technology and what we should do with Spot Runner from a systems and technology perspective to bring it to the next level.

    So we had long conversations about this new project, which we codenamed Hollywood -- version 2.0 of our website -- where we actually transitioned into a three-tier architecture: a front end rendering layer of ASP.NET and AJAX; a service-oriented architecture in the middle tier, where we concentrated all the business logic in a set of very extensible and flexible service managers in a service tier; and on the back end still SQL Server, but adding SQL Server 2005 as opposed to just SQL Server 2000.

    So there were several changes across the entire system, to address needs we had from a growth perspective that the previous system could not support. They ranged from having a content management system in place, which would allow us to make quick changes to the website without the need to compile or do a build, to increasing our search capabilities and performance, as well as the quality of search and matches on the site, to how we create and track media plans, how we store things in the data layer, and even how we move messages and emails in the system.

    So across the spectrum, we needed to change the architecture so we could grow with the needs of the business. That included a customer relationship management (CRM) system, which included new tools for buying media and trafficking -- which we do today visually, as David mentioned in his talk with you.

    Also, moving forward, we needed to look at how we were going to support additional media types. Spot Runner had started out pretty much focused on cable TV, and now we are moving into other mediums like broadcast, online search, online video, and radio. So in order to accomplish that, and to satisfy and serve our customers with an umbrella of media types, we needed to be able to support that from a schema perspective -- data representation, buying options, tracking, and planning. So all these things needed to change in the architecture as well.

    So it went from that prototype, which could afford to make several hard-coded "assumptions" about what the business needed and how the functionality should behave, to a much more flexible, extensible, open architecture that could accommodate several different customer types, customer needs, customer scales, media types, and so forth.

    So that's been sort of the evolution. And the architecture -- to get back to your question -- is an instrument, an aspect, of making that transition successful. You need to think through which sort of architecture will scale with us, so we don't paint ourselves into a corner again when we introduce, let's say, out-of-home billboards. And then what -- can we sell that, can we traffic and track it, can we report on it? The architecture needs to be able to support all these things, right?

    So making that change was a very important point in the technology history of Spot Runner, and it happened back in late February, early March.
    Ron: OK. We talked a little bit about the architecture of this, but I am curious about how the development process evolved. Often a start-up's development process is kind of chaotic -- it's just, you know, let's get something done and let's get it out. And we agree that's probably OK up to a certain point in the company's life cycle.

    But as the company matures and as this is a central piece -- this site's got to work or this company is in trouble. Did you have to bring some changes to the development process to make that work?
    Marco: Yes, absolutely. It's interesting -- you always have to keep a balance. We were a start-up, and we are still a start-up. At the same time, as you pointed out, we needed to mature the processes to a point where we could move from prototypical code to enterprise-quality code. It is a business-to-business system at the end of the day; it's not a consumer service.

    So it just has to work, and it has to work predictably and robustly every single time. So we had to make changes there. And we had to make them in such a way that we could preserve the ability to be agile, nimble and fast, but at the same time bring structure and order -- discipline, predictability and a very quality-driven methodology -- to the table.
    Ron: So I am wondering, how many people are on the team? Or how many have you grown to?
    Marco: Yeah, the team has grown dramatically. We were a fairly small team when I got here; it's more than tripled since. It's a fairly large team of engineers for a start-up today.

    And a good thing about Spot Runner -- and I always go back to this -- is that we have been able to attract an incredible talent pool, and it's an amazing team today. I am incredibly privileged to be managing such a strong team of engineers in this area, because it is not a typical development area, right? When you think about engineering, you know, you don't think about Los Angeles as an engineering mecca, if you will.
    Ron: Right. Right.
    Marco: It's an entertainment place, not an engineering place. So we've been able to actually attract, retain and grow a fantastic team of engineers here, and the team is extremely strong today. They have embraced these changes, and helped galvanize and really strengthen the development process -- which we are going to talk about now -- to a point where the execution now is incredibly high in rate, quality and predictability. So it's been a fantastic shift, and I am very happy to see that in the company.

    So, to address the changes we made: I look back at what was a more chaotic process, very much real-time, trying to get things into the data center on almost a weekly basis. We stepped back to think through, OK, let's get things into a structure that we think will be able to drive frequent releases on a monthly basis instead of a weekly basis, but with higher quality and many more enhancements.

    And then we also wanted to retain the ability to do larger releases, where you couldn't finish in a month and you had to make more structural changes -- be they architectural changes, schema changes, or service changes. Those would be on more of a three or four month timetable.

    So we kept both processes: a much more agile monthly process, and a much more structured system-release type of process. So if I were to cover that, I can explain to you and show you very quickly....
    Ron: Sure.
    Marco: ....sort of what our methodology is. We refer to it as 'D cubed', which is a three-step process that starts with definition, moves into development, and then finally goes into delivery.
    Ron: And these are for the system releases, the three or four month cycles?
    Marco: Yeah. This is for the three or four month cycles. They are more structured cycles; they take longer, obviously, and they require a lot more documentation and clear definition up front. So what we try to do in that case is, during definition, take all the ideas and the business requirements, codify them into functional specifications, and start to do prototyping early on.

    And that's where a bit of the agile nature of the company still plays a very critical role. So we try to prototype a lot in the beginning, and then we build on that prototype code as we move through to the production code, so that we can actually leverage that effort and investment, and at the same time learn as we go, even during system releases, right?
    Ron: OK.
    Marco: So with those prototypes and specs, we have a high-level schedule at hand that we can actually communicate to the business, giving them a timetable for a system release. So you get a pretty good approximate idea of when the system will be ready -- even though it may not yet be a committed schedule, it's a good high-level estimate. The next phase, when we move into development, is exactly when we do the detailed designs of the how. Think of definition as answering the questions of what we're doing, why we're doing it, and for whom we're doing it -- the three W's. Then in development, the first question it answers is how we are going to do it.

    Then we move into that design documentation phase very aggressively, and then we get into the tasks. Once we know the designs, we sign off on the designs so we can build detailed task lists, and we use Team Foundation Server to track all the documentation and task development -- the whole system, end to end.

    So we put all those tasks in TFS, and at that point we are able to look at a detailed bottoms-up schedule, which gives us a very predictable cost analysis and a very high level of confidence about when the code will be complete. Therefore I can project test complete, and project the delivery.
    Ron: Let me ask you -- and let's imagine it to be a three or four month cycle -- typically how long would you spend on the definition?
    Marco: On a four month release, I would say that we would spend probably a month on the definition.
    Ron: Yeah.
    Marco: In parallel with the initial definition, we start to do some design documentation, because it's not truly waterfall. As areas of the release are defined, the designs and bottoms-up schedules are done for those areas. Then, when the long pole -- the last area -- gets defined, the final schedule is finalized. But it's possible that coding has already begun for other areas.
    Ron: Oh right, OK.
    Marco: It's not like we will spend a month in definition, and nothing else will happen that month.
    Ron: Right.
    Marco: It's more like a rolling process, where even if it takes, let's say, five weeks for definition to be complete, coding on many areas started beforehand. Then the last thing that gets defined, in the fifth week, moves into development, right? So there's overlap between definition and development -- there's no clear bright line between these two things.
    Ron: Then do you have people serving in a program manager-like role, who are working these areas?
    Marco: Yes.
    Ron: OK. So maybe some of the developers are working on maybe finishing up some stuff from a previous phase or something, and the PMs have already moved on to this?
    Marco: That's right.
    Ron: Yeah..
    Marco: So we try to maintain the ability to allocate resources in the most effective way by not having these bright lines. As definition is completed for a given area, people can move into coding in that area. The builds will accommodate that by picking up the change sets according to what's been signed off, and what's been coded and unit-tested. Having the mentality of focusing on test-driven development gives us the ability to produce code that is much more stable, without build breakers all the time from bad code.
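The test-first mentality Marco describes can be sketched very simply: the unit test is written alongside (or before) the function, and it runs on every check-in so broken code never reaches the build. Spot Runner's system was .NET; this is a minimal Python illustration with hypothetical names, not their actual code.

```python
# A test-first sketch. All names here (media_budget, the rates) are
# hypothetical illustrations, not Spot Runner's actual code.

def media_budget(spots, rate_per_spot):
    """Return the total cost of a set of ad spots at a flat rate."""
    if spots < 0 or rate_per_spot < 0:
        raise ValueError("spots and rate must be non-negative")
    return spots * rate_per_spot

def test_media_budget():
    # Written with the code and run on every check-in; a failure blocks it.
    assert media_budget(10, 250.0) == 2500.0
    assert media_budget(0, 250.0) == 0.0

test_media_budget()
print("unit tests passed")
```

The point is not the arithmetic but the habit: every function ships with a test that gates the check-in, so the shared build stays stable.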
    Ron: Yeah. Now, I'm curious when you mention the bottom-up scheduling.
    Marco: Yep.
    Ron: For people who are not familiar with that term -- what does bottom-up scheduling mean?
    Marco: That basically means that instead of having the management team tell the developers when the code is to be complete, we're actually asking them, "When are you going to be complete with your code?" And that's an answer they give based on their design documents. They write the design doc for what they are going to build, they do an analysis of the tasks, and there's a task breakdown based on the design document, which they work on with the program managers -- the technical program managers.

    Then out of this developer-TPM interaction comes the estimate, which is much more predictable. I think it's a much higher quality number when they've done the initial analysis of the spec, they've done a design document, and they've broken down the tasks it takes to build those components and that code. Then you can make the estimates for those tasks, and that's what's called "bottoms-up." It's done by the people writing the code, as opposed to managers or executives.
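Mechanically, bottoms-up scheduling is just a roll-up: each developer estimates their own tasks, and the projected code-complete date falls out of the sum rather than being handed down. A toy sketch, with entirely hypothetical areas and estimates:

```python
# Bottoms-up scheduling sketch: developers estimate their own tasks (in
# days); the schedule is the roll-up of those estimates, not a top-down
# mandate. Area names and numbers are invented for illustration.

from datetime import date, timedelta

tasks = {
    "search service": [3, 5, 2],   # per-task estimates, in days
    "media planner":  [4, 4],
    "CMS layer":      [2, 3, 3],
}

def code_complete(start, tasks):
    # Assume each area's tasks run sequentially while areas run in
    # parallel, so the "long pole" area sets the code-complete date.
    longest = max(sum(estimates) for estimates in tasks.values())
    return start + timedelta(days=longest)

print(code_complete(date(2007, 1, 2), tasks))  # → 2007-01-12
```

Here "search service" is the long pole at 10 days, so it drives the projected date -- which is also why, as Marco notes, the last area to be defined finalizes the schedule.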
    Ron: I can just hear some people out in the world saying, "Oh, that's going to take way too long." [laughs] "We just want to write some code." If you get that kind of reaction -- I mean, how do you defend this practice?
    Marco: It's an incredibly important question. Because when I came here and we started making these changes, some of those reactions were absolutely present. People felt like, "Well, this is going to be taxing," or "This is going to take longer," or "I don't know if this is going to work." So there was a bit of an evangelization process that took place.

    We had to show people, as we adopted the methodology and the practice, that in fact their work became faster. Right? If you work within these constructs as a developer, and you write unit tests, and you have good documents and good quality estimates, your work is of much higher quality. In the end it's done faster, because you only do it once. Instead of having to redo the code over and over and over again -- and refactor your designs over and over again -- you take that initial "measure twice, cut once" approach.
    Ron: [laughs]
    Marco: Where you actually think twice before you write the code; then you write the code only once, and you write good quality code to begin with. One of the ways in which this became evident to the company and the engineering team was the 2.0 release that we started building back in early October, when I started here.

    We did an entire analysis with the bottoms-up scheduling, and we came up with an estimate that showed it would be done on February 27th, 2007 -- that's when we were going to go live.

    And that was the first time we published our schedule based on the bottoms-up estimates. The schedule was never touched, never altered, and we went live with no P1 and P2 bugs on February 24th, 2007 -- three days ahead of the original schedule, with something that had been built in just five and a half months. When that happened, a lot of people immediately had this reaction of: I have never seen this happen, and I didn't know this could be done.

    Such a massive release, with so many moving parts and so many engineers, built in five and a half months, could be delivered three days ahead of schedule.
    Ron: And they did it without working 24 hours a day and seven days a week? [laughs]
    Marco: We didn't have to work 24 hours a day, seven days a week. Obviously, as a start-up, you know, we had to work hard, but that's kind of par for the course. But it wasn't any death march; people had a very uneventful type of release in the end, when we went live. And we were happy to see that. And that's basically the last piece, delivery -- when you actually go live with the code, operationalize it and maintain it.

    It was a very good and pleasant surprise for everyone involved when we deployed and went live, but it was pretty much uneventful, it just worked.
    Ron: So I am curious, when you talk about delivery -- you mentioned early on that when you came in, they were in this very agile mode where they were rolling out things weekly. Often when I see less mature shops, I see practices that really hurt delivery, like, "Oh, there's a bug on the site. Well, let's just go in there and fix this file and throw it up there real quick, because we've got to fix this right away." There's not very much discipline around delivery, and they find this is an area that hurts them.

    So you talked to me a little bit, before we started taping, about your whole process. What if you could walk us through the process, from the point where the developer is writing the code to the point where it's actually delivered on the production servers? Let's talk through how that works.
    Marco: Absolutely, let's do that. Let's talk about the development pipeline and the development process -- how we take the code from the developer's hands all the way to the production system. First of all, to set the context: everything you see in the slide is done by an automated system, meaning that the policies, the deployment and the chaperoning of the entire code base are done by automated scripts driven by Team Foundation Server. There is no human interaction in any of the labs or any of the deployments.

    So we start with a developer writing some code at their desk. The first thing that happens when they want to check in is that they have to clear the check-in policies. We have rules that we apply in TFS that make sure, for example, that your code has comments, and that you have done a code review of the code you've written -- you have to check a box acknowledging that you've done so.

    And if you don't comply with the policies, you're not allowed to check in. So let's say you have cleared those policies. The next thing that happens is you have to run a set of unit tests -- the unit tests that each developer is required to write, plus the unit tests that other developers have already written into the code base before you. All unit tests have to pass before you check in. If the unit tests fail, you can't check in.

    So unless you build clean and pass all the unit tests, TFS will not allow you to check in. Now let's say you were finally able to build clean and the tests pass -- you can check in. And after you check in, the build will execute. The build runs in a continuous integration mode where you build frequently, several times an hour even. Sometimes a build can take an hour, but you'd build several times within the hour with all the change sets. And it deploys the code automatically. Again, this whole pipeline is automated into our integration lab.

    Now, when you move into the labs, it is important to explain that all the labs are virtualized. We have very large virtual server hosts where we can build, in real time, different [inaudible] for different releases.

    Currently we are working on four different branches, one of which is a maintenance branch; the other three are different releases we are working on. Those three releases can be built in parallel and deployed in parallel through this pipeline you see here, on virtual servers.
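The gated check-in Marco has just walked through can be modeled as a short decision flow: policies first, then a clean build, then all unit tests, before the change set is accepted and the CI build picks it up. This is a Python sketch of the policy logic only -- the real system is TFS-driven .NET, and the field names here are invented for illustration.

```python
# A model of the gated check-in flow described above: check-in policies,
# then clean build, then all unit tests, before TFS would accept the
# change set. Field names are hypothetical, not actual TFS policy names.

def can_check_in(change):
    # Check-in policies: code comments present, code review acknowledged.
    if not (change["has_comments"] and change["review_acknowledged"]):
        return False, "check-in policy violation"
    # Must build clean...
    if not change["builds_clean"]:
        return False, "build failed"
    # ...and pass every unit test: yours plus everyone else's.
    if not all(change["unit_tests"]):
        return False, "unit tests failed"
    return True, "checked in; CI build will pick up the change set"

ok, reason = can_check_in({
    "has_comments": True, "review_acknowledged": True,
    "builds_clean": True, "unit_tests": [True, True, True],
})
print(ok, reason)  # → True checked in; CI build will pick up the change set
```

Note the ordering: the cheap policy checks reject a bad change before any build or test time is spent on it.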
    Ron: So what's fantastic about this, I love this idea, is that you could say, "When release X gets to our production system our data center is actually going to look different. We're going to have these kinds of servers laid out this way." You can virtualize that so you can begin testing that right from the very beginning.
    Marco: That's right. So what we do is deploy into the integration lab once the build is ready, and run the same tests that all the developers ran -- the unit tests -- plus a battery of tests called INT, integration tests. Those tests are written by the automation framework developers, who are testers that write test code to test the developers' code. So we're very much focused on this notion of automation on the test side, to reduce costs and increase the efficiency and the quality of the builds.
    Ron: So this is an interesting change that you brought in. When you started, there wasn't really a role for a test engineer or, what we call at Microsoft, an SDET -- a software development engineer in test.
    Marco: That's right.
    Ron: You didn't have that role. I find a lot of companies say, "Oh well, sure, Microsoft, you guys can afford that, but we can't afford to have somebody doing that. If we have somebody who can write code, we're going to have them writing code." How do you react to that?
    Marco: I think it's a very short-sighted mentality to think that way, because you cannot afford not to do it. I think it's incredibly important: if your system has to work, and quality is not a currency -- meaning you're not willing to trade quality away, it just has to work -- then you need to be able to predictably and automatically test and validate the code on a frequent basis, to catch regressions and to catch bugs before they spin out of control or become unmanageable. Right?

    That practice, to me, is priceless. There is no way to justify not doing it -- and I showed, based on the numbers for code coverage and increasing quality, what an SDET can bring to an engineering organization. Before, we had a code coverage average of less than 20 percent, with just STEs testing the site.

    When we shipped 2.0, our code coverage was upwards of 75 percent, because we had SDETs write a whole lot of automation, go at the code, and find lots and lots of bugs -- and also do the regression testing on every build automatically, without having STEs or test engineers sit in front of a machine and click through the same sequence of buttons and functions over and over and over again. That's money thrown out the window. It's much more efficient to have code do that.
    Ron: But people will say, "But you have developers writing unit tests. Isn't that enough?"
    Marco: No, it's not enough, because unit tests are very effective at catching problems with the micro-code around the function or object or interface you just built, but they will not show you where the interaction of your component with other pieces of the system may be faulty.

    So you have to be able to round-trip the data from the code the developer wrote all the way to the database -- sometimes all the way to a partner website, which we do -- and then back into our own system's databases. The only piece of code that can actually exercise this entire round trip is an integration test, not a unit test. A unit test is very locally focused, whereas an integration test is system-wide -- it's a test across entire sets of systems or components.
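The distinction Marco is drawing can be shown in a few lines: the unit test exercises one function in isolation, while the integration test round-trips data through the function and a store together. This is an illustrative Python sketch with a toy in-memory store standing in for the real database; all names are hypothetical.

```python
# Unit vs. integration test, in miniature. CampaignStore is a toy
# in-memory stand-in for a real database; names are hypothetical.

class CampaignStore:
    def __init__(self):
        self._rows = {}
    def save(self, campaign_id, budget):
        self._rows[campaign_id] = budget
    def load(self, campaign_id):
        return self._rows[campaign_id]

def apply_discount(budget, pct):
    return round(budget * (1 - pct / 100.0), 2)

# Unit test: locally focused, exercises one function in isolation.
assert apply_discount(1000.0, 10) == 900.0

# Integration test: round-trips the data through the component AND the
# store, catching interaction bugs a unit test alone can't see.
store = CampaignStore()
store.save("c1", apply_discount(1000.0, 10))
assert store.load("c1") == 900.0
print("unit and integration tests passed")
```

If, say, the store silently truncated floats or the function returned the wrong type, the unit test would still pass -- only the round trip would expose it.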
    Ron: Ah, OK, fantastic.
    Marco: So, moving along: let's say now we run this whole battery of unit and integration tests, and we can automatically see the report. Did it pass or not? If it fails, the build is considered broken. It's a build failure, because the integration tests failed.

    That means you must have regressed somebody else's code, or your own code could not operate well with the other pieces of the system, even though the unit tests passed.

    You see the difference, right?
    Ron: Yeah.
    Marco: It can still fail even though the unit tests passed, because at integration level, it caught a bug that you didn't anticipate.
    Ron: Right.
    Marco: That bug is assigned to the developer, and the build has to be redone. It becomes a priority-one bug. You have to drop whatever you're doing and rescue the build. You have to bring it back to a working state.

    If it passes, then it gets deployed into the QA lab. Black-box QA engineers, who are the traditional STEs, will then start testing the functions end to end -- for fit-and-finish, functionality, responsiveness, qualitative experience, qualitative data; things that humans are much better suited to analyzing than code is.
    Ron: So this is a bunch of manual testing?
    Marco: A bunch of manual testing happens at this point. They finish all their test cases and decide whether the code passed or not. If they find any issues, they log those bugs. Those bugs, again, get assigned to a developer, and we go back to the beginning with another build. This is iterated through during coding, after code complete, and throughout test, until test complete.

    Once we're finished with all this, we hit that zero-bug balance, and there's a sign-off, we go into the pre-production lab. In the pre-production lab, we're talking about dedicated hardware in the data center, in a separate cage, where we can do a final sign-off.

    After we've hit zero bugs, and we've done performance and stress testing, we go into this lab to do the final checklist and final sign-off. That takes anywhere from three days to two weeks, depending upon the complexity of the release. After the final sign-off, we go into production, assuming everything passes. If not, we may have to log bugs and push another build to pre-production.

    So that is the pipeline, in a nutshell.
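Putting Marco's stages in one place: check-in gates, CI build into the integration lab, the INT battery, the on-demand QA push, pre-production, and finally production. A small Python sketch of that ordering (the stage names summarize the discussion; the code itself is illustrative, not Spot Runner's tooling):

```python
# The delivery pipeline in a nutshell, as an ordered list of gates.
# Stage names paraphrase the discussion; the code is an illustration.

PIPELINE = [
    ("check-in policies + unit tests",        "automated, per check-in"),
    ("CI build + deploy to integration lab",  "automated, many times a day"),
    ("integration (INT) test battery",        "automated; failure = broken build"),
    ("QA lab: manual black-box testing",      "on-demand push by QA release lead"),
    ("pre-production: perf/stress, sign-off", "dedicated hardware, 3 days to 2 weeks"),
    ("production",                            "go live"),
]

def next_stage(current):
    """Return the stage that follows `current`, or None at the end."""
    names = [name for name, _ in PIPELINE]
    i = names.index(current)
    return names[i + 1] if i + 1 < len(names) else None

print(next_stage("integration (INT) test battery"))
# → QA lab: manual black-box testing
```

Everything up to the QA push runs automatically 20 to 30 times a day per branch, as Marco notes next; only the QA and pre-production transitions involve a human decision.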
    Ron: Wow.
    Marco: Just so you know, this process, minus the pre-production all the way to the QA lab, happens 20 to 30 times a day, for a given branch.
    Ron: That's incredible. All the automated parts happen that often?
    Marco: Right.
    Ron: Then how often might you push a build into the QA lab?
    Marco: It depends on how often a build passes integration tests.
    Ron: Yeah.
    Marco: The QA push is on-demand. It's all automated, but it's on-demand. The QA release lead -- a human -- has to go to a web page.

    Let's say that I'm requesting a new build for QA: integration tests passed, my team is ready to go into the next wave of test cases on this particular branch, so push a new build to the QA lab. They hit the button, and it gets deployed to that branch in that virtual lab.

    This usually happens between five and 10 times a day, depending on the build. Sometimes it happens once a day, if the builds aren't of very high quality, or very stable; especially in the early stages of development, when the integration tests are still failing because the code is not up to par yet. It depends on where we are in the development cycle.
    Ron: It seems like this process would be incredibly effective at keeping defects from getting into the late stages: people in the QA lab getting a broken build, starting their test pass, and wasting all their time on that, or even bugs getting into production. Has that been your experience?
    Marco: It has. We've been able to increase the efficiency of execution of the QA team a lot by having the SDETs who do the automation tests buy down the risk with automation, and by having the developers write unit tests themselves before they actually check in.

    So by the time a QA engineer gets their hands on the code, the code is useful for testing. So it's good enough to be tested, right? So, it allows us to reduce our costs on the QA team side.

    When people say, "We can't afford to invest in this testing," the answer is that you reduce the cost of QA engineers by increasing the automation workforce. So it is totally a balancing act, and, like I said before, I think every company should think very hard about that, because what you can't afford is to not do it if you are building enterprise solutions.

    If you're building a very small prototype consumer site and just getting started, fine; of course nobody's going to do that. But as you move to the next level, and the site increases in scale or becomes more critical, it just has to work, because there's a lot of business dependency on it. Then you have to have these kinds of quality gates and a quality-driven mentality on the development side.
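    The quality gates Marco describes can be sketched as a small policy function: developer unit tests gate the check-in itself, and the automated integration suite gates whether a build is eligible for an on-demand QA push. The test runners and stage names here are stand-ins, not Spot Runner's actual system.

```python
# Minimal sketch of check-in and QA-push quality gates, assuming test
# results arrive as lists of booleans from hypothetical unit and
# integration test runners.

def quality_gate(unit_results: list, integration_results: list) -> str:
    """Return the furthest stage a change may reach given its test results."""
    if not all(unit_results):
        return "rejected"          # failing unit tests block the check-in
    if not all(integration_results):
        return "checked_in"        # in the branch, but not eligible for QA
    return "qa_eligible"           # the release lead may push this build

# A fully green change may be pushed to the QA lab on demand.
assert quality_gate([True, True], [True]) == "qa_eligible"
# A failing unit test never makes it into the branch.
assert quality_gate([True, False], [True]) == "rejected"
```

    The design choice worth noting is the ordering: cheap developer-owned unit tests run first and fail fast, so the more expensive integration suite and human QA time are only spent on code that is already "good enough to be tested."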
    Ron: I'm curious about the performance and scalability, and where do you do the checkpoints around that, or how do you do it?
    Marco: One of the things you didn't see in that slide, because it would be very cumbersome to show all the different pieces in there, is that we have a dedicated performance lab, which is not a virtual lab. It's actually dedicated hardware here in our offices.

    What we do is, in the latter stages of development, during the test phase, once we're past code complete, we very quickly move on to making our automated deployments into the perf lab. In the perf lab, we run at night. We will run high stress tests to test not just the raw performance of a given type of request or service on the site, but the uptime, longevity, and scalability of the site over, let's say, 24 or 48 or 72 hours.

    So you run those barrier tests. Typically, it's hard for a company our size to afford very lengthy performance investigations. Those can be extremely costly depending on where the bottlenecks are.

    We've been very fortunate in having a team able to do the analysis and debugging of our bottlenecks, the ones that were really preventing us from reaching the request levels we needed to hit to sign off, the criteria we set at the beginning.

    So, by using this cycle you saw of front-loading quality, plus the use of the perf lab and doing that analysis on a frequent basis, we've been able to iron out a lot of these issues, be they stored-procedure locks and timeouts, or C# code inefficiencies. We weed those out of the system and get a very high degree of performance on our system today.
    Ron: So, one of the big problems that a lot of people will say is when they are sitting down to plan their site or whatever, they get really vague performance ideas. Like, the business people will say, "Well, it has to be really fast." Or, "We think we might have twice as many customers six months from now as we have today." Or, they get really kind of vague targets.

    It sounds like you have very specific goals in mind for a given release that you're shooting for.
    Marco: Yes, as you just alluded to, it's not always clear at the beginning of a release just how many types of requests you have, or even how many requests per second you need for a given type of request.

    These things are sometimes a little bit organic, and you sometimes have to do educated guessing. By talking to the business, marketing, and sales side, you can actually make very good projections of how many customers they expect to come use a feature, and how many times a day: the times of day where you expect to peak, for example, Eastern time versus Pacific time. You model those windows, you take a lot of what you see in production, you bring that back to the lab, and you then apply a multiplier.

    So usually we'll try to be conservative and say, "What if we were here...?" Let's say we were doing an ad campaign, and load spikes to 10 times the 10:00 a.m. peak load. Then what happens? We can emulate that, simulate that type of spike in the perf lab, and see what happens. Usually we will try to clear that bar, the spike stress bar, before we ship.

    It's not a sustained load, because that's not sustainable; otherwise you're going to over-engineer your hardware and overspend. So we try to make sure the system can survive, let's say, an hour or two under a high spike. Then you drop down to the sort of high-load level, and you can survive that for 24 or 48 hours.
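    The modeling Marco describes, taking an observed production peak, applying a spike multiplier, and setting a lower sustained bar, can be sketched as a back-of-envelope calculation. All numbers and the `sustained_fraction` parameter are illustrative assumptions, not Spot Runner's actual targets.

```python
# Hypothetical capacity-target model: derive a short-duration spike bar and
# a long-duration sustained bar from an observed production peak.

def perf_targets(observed_peak_rps: float,
                 spike_multiplier: float = 10.0,
                 sustained_fraction: float = 0.5):
    """Return (spike target, sustained target) in requests per second.

    The system must survive the spike target for an hour or two, and the
    lower sustained target for 24 to 48 hours, before sign-off.
    """
    spike_rps = observed_peak_rps * spike_multiplier    # e.g. ad-driven surge
    sustained_rps = spike_rps * sustained_fraction      # long-haul soak level
    return spike_rps, sustained_rps

# Example: a 10 a.m. production peak of 40 req/s with a 10x spike multiplier.
spike, sustained = perf_targets(40.0)
assert spike == 400.0
assert sustained == 200.0
```

    Separating the two bars is the key trade-off: hardware is sized for the sustained level, while the spike level is a short survival test, so you avoid over-engineering for a load that only occurs for an hour or two.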

    So, that's usually how we try to project these things. Even when you don't have the sort of "crystal ball" type of prediction of "We will have these kinds of requests, and this kind of load." That's true for most companies. It was true when I was still at Microsoft. It's true here, and I think it's true for a lot of people in this business.
    Ron: So it sounds like everything is wonderful with all this, guys; everything's been great, totally easy--but I know that's not true.

    Marco: No, it's not true.
    Ron: So I'm just wondering about from here on out, like as you look out in the next year or two years. What are the things that kind of keep you up at night as you're thinking, "Oh, we've got to handle this. We've got to tackle that"? What kinds of things are you worried about?
    Marco: It's a fascinating question. I still believe that one of the most important things we need to focus on is the people, and the engineering team, and growing the team: getting more top talent into our team and our DNA pool, keeping them challenged, keeping them trained. Then bringing in new technologies: how do we integrate the innovations that several different companies are creating, one of which is Microsoft, whose platform we run on, into our entire system? Things like Silverlight, things like ACMI, and Windows Server 2008, and what does that mean to us?

    How do we onboard these technologies in an organized fashion, without disrupting the delivery of the business innovations we have to deliver on, that the business is waiting for? Right? So there's this balance of features and innovations that directly impact the top line and the bottom line, versus technology enhancements that we actually need to bring into the fold to help with those, but also to be more efficient, more scalable, and more flexible in the long run, so we can deliver on these initiatives and feature enhancements better and at a higher quality.

    So, how do we keep a sustained competitive advantage? That is one thing, if you ask me, "What keeps you up at night?", that I think a lot about. Spot Runner is in a very good position today, a very strong position for what we do and the space we are in, and is really charging forth very aggressively on bringing more innovations to the traditional advertising space. But we have to think, "What are the investments that we have to make, and in what areas?" And how do we balance those investments to maintain that sustained competitive edge we have today, and continue to attract top talent to an already incredible team? So, those are things that I think a lot about.
    Ron: Thanks so much Marco, for being with me today on ARCast.TV. That's it from Spot Runner here in Los Angeles. We'll see you next time on ARCast.TV.

    Ron: Wow, I'm telling you! If you need to deliver quality software in a web application environment, you can learn a lot from Marco DeMello. I mean, they have an incredible process built using Team Foundation Server. These Virtual Server Environments, it's just fantastic when you think about what this enables them to accomplish.

    Ultimately we have to deliver high-quality software; you just can't trade that off. Most of the stuff people are building today, frankly, is not of good enough quality. We've got to do better. So I hope you enjoyed that and learned something from Marco and that process. That's what it's all about: we're trying to make it better. Make the world better one website at a time, my friends.

    Hey, we'll see you next time on ARCast.TV.
    Announcer: ARCast.TV is a production of the Microsoft Architecture Strategy Team.
