Testing ASP.NET 2.0 and Visual Web Developer

In an earlier post I talked a little about how we are building ASP.NET 2.0 and Visual Web Developer. Some specific questions I’ve been asked include: How do you build and track 105,000 test cases and 505,000 test scenarios? How big is the test team in relation to the dev team? What tools do we use to write and run them? What is the process used to manage all of this? Hopefully the post below provides some answers.

Test Team Structure

Our test team is staffed by engineers who own writing test plans, developing automated tests, and building the test infrastructure required to run and analyze them. The job title we use to describe this role at Microsoft is SDE/T (Software Design Engineer in Test).


All members of the test team report through a Test Manager (TM), and similarly all members of the development team and program management team report through a Development Manager (DM) and Group Program Manager (GPM) respectively. The TM, DM and GPM are peers who report to a Product Unit Manager (PUM) who runs the overall product team (note: I'm this guy).


This partitioned reporting structure has a couple of benefits – one of the big ones being that it enables specialization and focus across the entire team, and allows deep career growth and skills mentoring for each job type. It also helps ensure that design, development and testing each get the focus they need throughout the product cycle.

In terms of staffing ratios, our test team is actually the largest of the three disciplines on my team. We currently have approximately 1.4 testers for every 1 developer.

Why is the test team larger than the development team?

I think there are two main reasons for this on my team:

1) We take quality pretty seriously at Microsoft – hence the reason we invest the time and resources.

2) We also have a lot of very hard requirements that necessitate a heck of a lot of careful planning and work to ensure high quality.

For ASP.NET 2.0 and Visual Web Developer, we have to be able to deliver a super high quality product that is rock solid from a functional perspective, can run the world’s largest sites/applications for months without hiccups, is bullet-proof secure, and is faster than previous versions despite having vastly more features (do a file size diff on System.Web.dll comparing V2 with V1.1 and you’ll see that it is 4 times larger).

Now doing all of the above is challenging. What makes it even harder is that we need to deliver it on the same date across three radically different processor architectures (x86, IA-64, and x64), on 4 different major OS variations (Windows 2000, Windows XP, Windows 2003 and Longhorn), support design-time scenarios with 7 different Visual Studio SKUs, and be localized into 34+ languages (including BiDi languages, which bring unique challenges).

Making things even more challenging is the fact that Microsoft supports all software for at least 10 years after the date of its release – which means that customers at any point during that timeframe can report a problem and request a QFE fix. We’ll also do periodic service packs (SPs) during those 10 years that roll up these fixes.

Each QFE or SP needs to be fully verified to ensure that it does not cause a functional, stress or performance regression. Likewise, my team needs to ensure that any widely distributed change (for example: a security GDR) to Windows, the CLR or Visual Studio (all of which we sit on top of) doesn’t cause regressions in our products either. We’ll probably end up having to do approximately 25 of these servicing analysis runs on a single product release in a given year. If you have multiple products released within a 10 year window, you then multiply this number by the number of releases (for example: three in-support releases would mean ~75 servicing runs a year). It quickly gets large.

What is our process for testing?

Our high-level process for testing involves three essential steps:

1) We build detailed test plans that comprehensively cover all product scenarios

2) We automate the test scenarios in the test plans to eliminate the need for manual steps to test or verify functionality

3) We build and maintain infrastructure that enables us to rapidly run, analyze and report the status of these automated tests

Test Plans

Test plans are the first step, and happen as early as possible in the product cycle. A separate test plan is written by a tester for each feature or feature area of the product. The goal is to comprehensively detail all of the scenarios needed to test a given feature. The test plan groups these scenarios into test cases (where 1 test case might have 10 or more separately verified scenarios), and assigns a priority (P0, P1, P2 or P3) to each test case.
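To make that grouping concrete, here is a minimal sketch of how a test case might bundle scenarios under a priority. This is purely illustrative (our plans are documents managed in our test system, not code), and every name in it is invented:

    using System.Collections.Generic;

    // Hypothetical model of a test plan entry -- for illustration only.
    public enum Priority { P0, P1, P2, P3 }

    public class TestCaseEntry
    {
        public string Name;             // e.g. "GridView AutoFormat rendering"
        public Priority Priority;       // drives the order in which tests get automated
        public List<string> Scenarios;  // one test case can hold 10+ separately verified scenarios
        public int? BugNumber;          // set when the case was added to plug a test hole
    }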

The entire feature team (PM, dev and test) gets together during a coding milestone to review the test plan and try to ensure that no scenarios are missing. The team then uses the test plan as the blueprint when writing and automating tests, and implements the test scenarios in the priority order defined by the plan.

During the product cycle we’ll often find new scenarios not covered by the original test plan. We call these missing scenarios “test holes”; when found, they are added to the test plan and automated. Every new bug opened during the product cycle is also analyzed by test to ensure that the test plan would have caught it – if not, a new test case is added to cover it.

Here is a pointer to a few pages from the test plan for our new GridView data control in ASP.NET 2.0: http://www.scottgu.com/blogposts/testingatmicrosoft/testplan/testplan.htm

The full test plan for this feature is 300+ pages and covers thousands of scenarios – but hopefully this snippet provides a taste of what the overall document looks like. Note that some of the test cases have a number associated with them (look at the first AutoFormat one) – this indicates that the test case was missed during the original review of the document (meaning a test hole) and was added in response to a bug being opened (110263 is the bug number).

Test Automation

After testers finalize their test plans, they start writing and automating the tests defined within them. We use a variety of languages to test the product, and like to have a mixture of C#, VB and some J# so as to exercise the different compilers in addition to our own product.

Tests on my team are written using a testing framework that we’ve built internally. Long term we’ll use vanilla VSTS (Visual Studio Team System) infrastructure more and more, but given that it is still under active development we aren’t using it for our Whidbey release. The teams actually building the VSTS technology, though, are themselves “dogfooding” their own work and use it for their source control and testing infrastructure (and it is definitely being designed to handle internal Microsoft team scenarios). One of the really cool things about VSTS is that when it is released, you’ll be able to take all of the process described in this post and apply it to your own projects/products with full Visual Studio infrastructure support.

My team’s test framework is optimized to enable a variety of rich Web scenarios to be run, and allows us to automatically run tests under custom scenario contexts without test case modification. For example, we can automatically choose to run a DataGrid test within a code access security context, under different process model accounts/settings, or against a UNC network share – without the DataGrid test ever having to be aware of the environment it is running in.
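To give a feel for the pattern, below is a minimal sketch of how that kind of context injection could work. None of these names come from our real framework (which isn’t public) – ITestContext, ITest and Harness are all hypothetical:

    // Hypothetical sketch of environment injection -- the real framework's
    // API is internal, so all names here are invented.
    public interface ITestContext
    {
        void Setup();      // e.g. apply a code access security policy,
        void Teardown();   // switch process model accounts, or map a UNC share
    }

    public interface ITest
    {
        void Run();        // pure feature logic, unaware of its environment
    }

    public static class Harness
    {
        // The harness composes context + test, so the same DataGrid test
        // can run under CAS, UNC or custom accounts without modification.
        public static void Execute(ITestContext context, ITest test)
        {
            context.Setup();
            try { test.Run(); }
            finally { context.Teardown(); }
        }
    }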

The test cases themselves are often relatively straightforward and not too code-heavy. Instead, the bulk of the work goes into shared test libraries that are reused across test scenarios and test cases. Here is a pointer to an example test case written for our new WebPart personalization framework in ASP.NET 2.0: http://www.scottgu.com/blogposts/testingatmicrosoft/testcase/testcase.htm

Note how the test case contains a number of distinct scenarios within it – each of which is verified along the way. This test case and the scenarios contained within it match the test plan exactly. Each scenario uses a common WebPart automation test library built by the SDE/T that enables heavy re-use of code across test cases.
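For readers who can’t follow the link, here is a rough sketch of that shape – one test case, several scenarios, each verified before moving on. The library and scenario names below are made up for illustration:

    // Hypothetical sketch of a multi-scenario test case built on a shared library.
    public class PersonalizationTestLibrary             // stand-in for the shared SDE/T library
    {
        public void MoveWebPart(string part, string zone) { /* drive the page */ }
        public void VerifyZone(string part, string zone)  { /* assert placement */ }
        public void ResetPersonalization()                { /* clear user state */ }
        public void VerifyDefaultLayout()                 { /* assert defaults restored */ }
    }

    public class WebPartPersonalizationTest
    {
        public void Run()
        {
            PersonalizationTestLibrary lib = new PersonalizationTestLibrary();

            // Scenario 1: move a WebPart, then verify the move persisted.
            lib.MoveWebPart("WeatherPart", "RightZone");
            lib.VerifyZone("WeatherPart", "RightZone");

            // Scenario 2: reset personalization, then verify defaults come back.
            lib.ResetPersonalization();
            lib.VerifyDefaultLayout();
        }
    }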

My team will have ~105,000 test cases and ~505,000 functional test scenarios covered when we ship Whidbey. Our hope/expectation is that these will yield ~80-90% managed code block coverage of the product when we ship.

We use this code coverage number as a rough metric to track how well we are covering test scenarios with our functional tests. By code “blocks” we mean a set of statements in source code – and 90% block coverage would mean that after running all these functional tests, 90% of the blocks have been exercised. We also measure “arc” coverage, which tracks the individual code paths within a block (for example: a switch statement might count as a block – where each case statement within it would count as a separate arc). We measure both block and arc numbers regularly along the way when we do full test passes (like we are doing this week) to check whether we are on target or not. One really cool thing about VS 2005 is that VSTS includes support to automatically calculate code coverage for you – and will highlight your code in the source editor red/green to show which blocks and arcs of your code were exercised by your test cases.
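A tiny example makes the block/arc distinction clearer. In the method below, a coverage tool might count the switch as a single block, while each case label is a separate arc – so one call can light up the block while leaving most arcs untested (exact block/arc boundaries vary by tool):

    public static class CoverageExample
    {
        public static string Describe(int status)
        {
            switch (status)                       // the switch is one block...
            {
                case 200: return "OK";            // ...but each case is its own arc
                case 404: return "Not Found";
                case 500: return "Server Error";
                default:  return "Unknown";
            }
        }
    }

Calling Describe(200) alone exercises the block but covers only one of the four arcs – which is why we track both numbers.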

There is always a percentage of code that cannot be easily exercised using functional tests (common examples: catastrophic situations involving a process running out of memory, difficult-to-reproduce threading scenarios, etc). Today we exercise these conditions using our stress lab – where we’ll run stress tests for days/weeks on end and put a variety of weird load and usage scenarios on the servers (for example: we have some tests that deliberately leak memory, some that access-violate (AV) every once in a while, some that continually modify .config files to cause app-domain restarts under heavy load, etc). Stress is a whole additional blog topic that I’ll try to cover at some point in the future to give it full justice. Going forward, my team is also moving to a model where we’ll add more fault-injection-specific tests to our functional test suites to try to get coverage of these scenarios through functional runs as well.
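As a flavor of what one of those deliberate-leak stress tests might look like, here is a hedged sketch – the rates and sizes are invented, and real stress tests run inside the lab’s harness rather than as standalone programs:

    using System.Collections.Generic;
    using System.Threading;

    // Illustrative sketch: leak memory on purpose to push servers toward
    // the out-of-memory paths that functional tests rarely reach.
    public class DeliberateLeakStress
    {
        private static readonly List<byte[]> leaked = new List<byte[]>();

        public static void Main()
        {
            while (true)
            {
                leaked.Add(new byte[64 * 1024]);  // hold the reference forever
                Thread.Sleep(100);                // ~640 KB leaked per second
            }
        }
    }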

Running Tests

So once you have 105,000 tests – what do you do with them? Well, the answer is to run them regularly on the product – carefully organizing the runs to make sure they cover all of the different scenarios we need to hit when we ship (for example: different processor architectures, different OS versions, different languages, etc).

My team uses an internally built system we affectionately call “Maddog” to handle managing and running our tests. Post-Whidbey my team will look to transition to a VSTS-based system, but for right now Maddog is the one we use.

Maddog does a couple of things for my team, including: managing test plans, managing test cases, providing a build system to build and deploy all the test suites we want to execute during a given test run, providing infrastructure to image servers and run our tests on them, and ultimately providing a reporting system so that we can analyze failures and track the results.

My team currently has 4 labs where we keep approximately 1,200 machines that Maddog helps coordinate and keep busy. The machines vary in size and quality – with some being custom-built towers and others being rack-mounts. Here is a picture of what one row (there are many, many, many of them) in one of our labs in building 42 looks like:

The magic happens when we use Maddog to help coordinate and use all these machines. A tester can use Maddog from their office to build a query of tests to run (selecting either a sub-node of feature areas – or doing a search for tests based on some other criteria), then pick what hardware and OS version the tests should run on, what language they should be run under (Arabic, German, Japanese, etc), what ASP.NET and Visual Studio build should be installed on the machines, and how many machines the run should be distributed over.

Maddog will then identify free machines in the lab, automatically format and re-image them with the appropriate operating system, install the right build on them, build and deploy the selected tests onto them, and then run the tests. When the run is over, the tester can examine the results within Maddog, investigate all failures, publish the results (all through the Maddog system), and then release the machines for other Maddog runs. Published test results stay in the system forever (or until we delete them) – allowing test leads and my test manager to review them and make sure everything is getting covered. All this gets done without the tester ever having to leave their office.
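To summarize what a run captures, here is a hypothetical sketch of the information a tester supplies – the real Maddog schema is internal, so every field and value below is invented for illustration:

    // Invented model of a Maddog-style run request, for illustration only.
    public class TestRunRequest
    {
        public string TestQuery    = "FeatureArea:GridView AND Priority:P0";  // which tests
        public string Architecture = "x86";                        // x86, IA-64 or x64
        public string OsImage      = "Windows Server 2003 (Japanese)";
        public string ProductBuild = "<build number>";             // ASP.NET / VS build to install
        public int    MachineCount = 20;                           // how widely to distribute
    }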

Below are some Maddog screenshots walking through this process.

Picture 1: This shows browsing the tests in our test case system. This can be done both hierarchically by feature area and via a query.

Picture 2: This shows looking at one of the 105,000 test cases in more detail. Note that the test case plan and scenarios are stored in Maddog.

Picture 3: This shows how the code for the test case is also stored in Maddog – allowing us to automatically compile and build the test harness based on whatever query of tests is specified.

Picture 4: This shows what a test looks like when run. Note that the interface is very similar to what VSTS does when running a Web scenario.

Picture 5: This shows how to pick a test query as part of a new test run (basically choosing which test cases to include as part of the run).

Picture 6: This shows picking which build of ASP.NET and Visual Studio to install on the test run machines.

Picture 7: This shows picking what OS image to install on the machines (in this case Japanese Windows Server 2003 on x86), and how many machines to distribute the tests across.

After everything is selected above, the tester can hit “go” and launch the test run. Anywhere from 30 minutes to 14 hours later it will be done and ready to be analyzed.

What tests are run when?

We run functional tests on an almost daily basis. As I mentioned earlier, we do a functional run on our shipping products every time we release a patch or QFE. We also do a functional run anytime a big software component in Microsoft releases a GDR (for example: a security patch to Windows).

With ASP.NET 2.0 and Visual Web Developer we’ll usually try to run a subset of our tests 2-3 times a week. This subset contains all of our P0 test cases and provides broad breadth coverage of the product (about 12% of our total test cases). We’ll then try to complete a full automation run every 2-3 weeks that includes all P0, P1, P2 and P3 test cases.

As we get closer to big milestone or product events (like a ZBB, Beta or RTM), we’ll do a full test pass where we run everything – including manually running those tests that aren’t automated yet (as I mentioned in my earlier blog post, my team is doing this right now for our Beta2 ZBB milestone date).

Assuming we’ve kept test holes to a minimum, have deep code coverage throughout all features of the product, and the dev team fixes all the bugs that are found – then we’ll end up with a really, really solid product.

Summary

There is an old saying in software that three years from now, no one will remember if you shipped an awesome software release a few months late. What customers will still remember three years from now is if you shipped a software release that wasn’t ready a few months too soon. It takes multiple product releases to change people’s quality perception about one bad release.

Unfortunately there are no easy silver bullets for building super high quality software – it takes good engineering discipline, an unwillingness to compromise, and a lot of really hard work to get there. We are going to make very sure we deliver on all of this with ASP.NET 2.0 and Visual Web Developer.
