Book Excerpt: Building SANs with Brocade Fabric Switches

Tuesday Jan 29th 2002 by Enterprise Storage Staff

By Josh Judd, Chris Beauchamp, Alex Neefus and Benjamin F. Kuo

Solutions in this chapter:

  • Looking at the Overall Lifecycle of a SAN
  • Conducting Data Collection
  • Analyzing the Collected Data
  • Summary
  • Solutions Fast Track
  • Frequently Asked Questions

Introduction

We intend this book to allow you to effectively design, implement, and maintain storage networks. Doing so requires an understanding of the processes in each of the seven phases of a SAN's lifecycle, and their relationships with each other. Without taking a moment to review the process from the highest level, it is easy to get lost in the details of SAN hardware.

In this chapter, we provide that high-level view. We show how the SAN design process is really an ongoing lifecycle. We take you through the process from the moment the decision is made to deploy a SAN, through releasing the SAN to production. Then we explain the extent to which the process should be repeated when upgrades and architectural changes are needed. We also provide detail on the first two parts of the lifecycle.

The processes presented here are derived from other areas of Information Technology (IT) and they are normal parts of any large-scale IT project. For example, when implementing a SAN, you should interview people who will have a key interest in the finished product; the same is true when putting in a Local Area Network (LAN) or Wide Area Network (WAN). Much of this material should be second nature to any IT network architect, Database Administrator (DBA), or senior systems administrator. For more advanced users who already understand these techniques in general, this chapter will serve as reference material showing how these processes are applied to SANs in particular. We have attempted in this book to provide material that will allow beginner and expert alike to successfully design a SAN.

It is true that more attention must be paid to SAN design than to most other networking technologies. This is because SANs typically have more stringent availability and performance requirements than other networks. A SAN is similar to a traditional network in its requirements, but is also somewhat like a channel (for example, a CPU/RAM interconnect mechanism, or a PCI bus). Channels require very high performance, and are almost assumed to be 100 percent reliable. This is in stark contrast to the traditional Ethernet LAN, where things like five-nines uptime for all node connections, in-order packet delivery, and tuned approaches to bandwidth management are rare indeed.

Fortunately, SANs provide the tools necessary to achieve these performance and availability goals. For example, it is commonplace in a Fibre Channel SAN to use a dual-fabric approach to SAN architecture. This means having two completely separate networks for data to travel over, and potentially using both networks as active paths. While it is certainly possible to do this sort of thing using IP/Ethernet networks, it is substantially more difficult, since Fibre Channel was designed with this in mind, and Ethernet was not. The SAN designer must provide for higher availability and spend some time thinking about performance, but will know going into the process that these goals are entirely achievable.

We should also note here that the process outlined in this chapter is designed to make a complex SAN design successful. With less complex designs (that is, the majority of SAN deployments to date), it is perfectly acceptable to skip over much of the process. For example, if you are deploying a SAN with only three servers and two storage arrays, spending much time on architectural analysis is unnecessary. The complexity is presented here so that users with complex requirements will have it available to them; users with simpler scenarios can use their judgment about which bits to incorporate into their design process.

The seven phases of the lifecycle of a SAN at the very highest level can be broken down into three broad categories: design, implementation, and maintenance. The first of these, designing the SAN, includes the collection and the analysis of data, which defines the requirements of the network. We will go into detail on these first two phases of the design process in this chapter. These phases provide a solid launch pad for your journey through the remainder of the SAN's lifecycle.

The third and fourth phases of the SAN lifecycle (architecture development and prototype testing) complete the design process. Implementing the SAN encompasses the transition phase and the release to production phase, the fifth and sixth phases of the lifecycle. These phases are discussed in Chapters 6 and 7 of this book. Chapters 8 and 9 cover troubleshooting, maintenance, and management, the final phases of the lifecycle model.

When you are finished reading this chapter, you should have a solid understanding of the design processes, and have a valuable reference tool to enable project planning on any future SAN deployments.

Looking at the Overall Lifecycle of a SAN

Any SAN will go through certain phases over the course of its life. Depending on the size and complexity of the SAN, some phases might take months to complete, and some might be only glanced over. For example, a single-switch SAN does not require much in the way of network design. However, if the solution involves hundreds of devices, including storage components from many different vendors that were not already pretested and determined to be interoperable, it could require extensive testing or validation.

When an existing SAN must undergo a fundamental change, be it at the architectural level or simply the introduction of a new type of storage array, you should cycle back through the phases of SAN development. This will ensure that the critical applications running on the SAN are not unexpectedly disrupted by changes. However, when the change is fundamental but small (like adding a new type of storage array) it is possible to take a fast track through this process.

The SAN's lifecycle, which can be described at a high level as design, implementation, and maintenance, translates directly into action-oriented phases on the part of the SAN designer: data collection, data analysis, architecture development, prototype and testing, transition, release to production, and maintenance. See Figure 5.1 for a flowchart of these phases and their relationships to each other.

Figure 5.1 An Overview of the Lifecycle of a SAN

Data Collection

You must define the requirements of the SAN before building it. What business problem is being solved by the SAN? What are the overall goals of the project? To determine the requirements, you should interview all affected parties, to find out what they all hope to achieve (in other words, their goals and objectives), and develop both a detailed technical requirements document and a timeline for the project.

Data Analysis

Once you have gathered input from all parties, you must analyze it and put it into a meaningful format. The first two phases together will allow you to start with the business goals that are driving the project, and determine at a high level the necessary technical properties required of the SAN. Once this phase is completed, all business requirements should be translated into technical requirements. The technical requirements document will be created during the collection phase, and completed during the analysis phase. You will also have created a working document for a Return On Investment (ROI) proposition to justify the expense of the project.

Architecture Development

Now that you have a list of technical requirements, you will develop a SAN architecture that meets those requirements. This process will involve balancing many factors. For example, there might be a tradeoff between performance considerations and cost. You might need to cycle back to the data collection and analysis phases to gather more input, so that any compromises reflect input from all affected parties. When finished, you will have a detailed architecture of the SAN that you intend to build. A SAN architecture includes the fabric topologies of all related fabrics, the storage vendors involved, the SAN-enabled applications being used, and other considerations that affect the overall SAN solution. This step is the most likely to be skipped over quickly when the SAN requirements are small.

Prototype and Testing

SANs deal directly with the mission-critical data of today's enterprises. When building any mission-critical solution, you must test it before releasing it to production. In this phase, you will build a prototype of the SAN solution and test it to ensure that it will function properly when released. This should be done using nonproduction systems. It might be necessary to cycle back to the architecture development phase if problems are found.

Wherever possible, build a test bed identical to the solution you are implementing. This will provide the greatest assurance of success in production. However, budgetary concerns, limits on time and space, and other factors will usually prevent this from being practical. Imagine a 200-port SAN. Now imagine 200 hosts and storage arrays plugged into it. Now imagine asking the CFO to buy another 200 devices to test with, and to provide administrators, space, power, and cooling for all of it.

Because of this, the test phase will be a balance of conducting your own testing, and leveraging other organizations' test results. Finding a document that says "vendor X already tested or certified this configuration" might be as good as or better than testing it yourself. Even if the components of a solution have been tested by you and/or others to your satisfaction, you must test all aspects of the complete system prior to releasing it to production. This is due to the fundamental nature of a large networked system, where interactions, timing, and other factors can produce different results from devices tested individually. The actual final test will occur during the release to production phase, but creation of the test plan should occur in this phase. At the end of this phase, all parties with an interest in the outcome of the project will approve it, and the transition to production will begin.

Authors
Josh Judd is a Senior SAN Architect with Brocade Communications Systems, Inc. In addition to writing technical literature, he provides senior-level strategic support for major OEMs and end-users of Brocade storage network products worldwide. Chris Beauchamp (Brocade CFP, CSD) is a Senior SAN Architect for Brocade Communication Systems, Inc. Chris focuses on SAN design and architecture, with an emphasis on scalability and troubleshooting. Alex Neefus is the Lead Interoperability Test Engineer at Lamprey Networks, Inc. Alex has worked on developing testing tools for the SANmark program hosted by the FCIA. Benjamin F. Kuo is a Software Development Manager at TROIKA Networks. Headquartered in Westlake Village, CA, TROIKA Networks is a developer of Fibre Channel Host Bus Adapters, dynamic multipathing, and management software for Storage Area Networks.


Transition

Now that you have a working prototype, and all interested parties have signed off on it, you will begin to transition your existing hardware onto the new SAN. If a SAN is already in place, this phase might be as simple as adding a new node to the SAN, or changing the Inter-Switch Link (ISL) architecture. If the SAN is completely new, it might involve a long migration process consisting of moving one production system at a time. In any case, there might be a need to cycle between this phase and the release-to-production phase repeatedly. Once a component has completed the transition onto the SAN, release to production can occur for that component.

Release to Production

Once a component has been transitioned onto the new SAN, it must be tested again and then approved before becoming a part of the enterprise's production environment. Since there might be many components that must be transitioned and released, it might be necessary to cycle between the transition and release-to-production phases repeatedly until all components have entered production. After this phase is complete, the SAN will enter the maintenance phase.

Maintenance

This is the useful life of the SAN. All of the benefits that prompted the SAN designer to implement the SAN in the first place are found in this phase. It is therefore desirable to have a SAN spend as much time as possible in this phase, and as little as possible in the other phases. The goal of this phase is to keep the SAN running as well as possible for as much of the time as possible, and to expand its capabilities only according to the original, tested, and approved parameters. This phase includes adding, changing, or removing components, as well as managing, monitoring, and troubleshooting existing components.

During the maintenance phase, no changes should be made to the SAN that fall outside of the original blueprint that was established in the previous phases. Any such change necessitates a repetition of the entire lifecycle. For example, if the SAN were originally built using vendor X storage arrays, an additional vendor X array could be added as part of maintenance, but an array from vendor Y would require thought and testing before its introduction. It might not require much thought and testing, but it must, in any case, be looked into.

Note: Any fundamental change to the SAN requires a repetition of the entire lifecycle.

In summary, the seven phases of the SAN design lifecycle are:

1. Data Collection
2. Data Analysis
3. Architecture Development
4. Prototype and Testing
5. Transition
6. Release to Production
7. Maintenance
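
The phases and the "cycle back" paths described above can be sketched as a small transition table. This is only one reading of the text, since Figure 5.1 is not reproduced here; the names `Phase`, `ALLOWED`, and `can_move` are illustrative and do not come from the book.

```python
from enum import Enum


class Phase(Enum):
    DATA_COLLECTION = 1
    DATA_ANALYSIS = 2
    ARCHITECTURE_DEVELOPMENT = 3
    PROTOTYPE_AND_TESTING = 4
    TRANSITION = 5
    RELEASE_TO_PRODUCTION = 6
    MAINTENANCE = 7


# Assumed transition table: forward steps plus the cycle-back paths
# mentioned in the phase descriptions.
ALLOWED = {
    Phase.DATA_COLLECTION: {Phase.DATA_ANALYSIS},
    Phase.DATA_ANALYSIS: {Phase.ARCHITECTURE_DEVELOPMENT},
    # Architecture work may need more input from the affected parties.
    Phase.ARCHITECTURE_DEVELOPMENT: {
        Phase.PROTOTYPE_AND_TESTING,
        Phase.DATA_COLLECTION,
        Phase.DATA_ANALYSIS,
    },
    # Problems found during testing send the design back to architecture.
    Phase.PROTOTYPE_AND_TESTING: {Phase.TRANSITION, Phase.ARCHITECTURE_DEVELOPMENT},
    Phase.TRANSITION: {Phase.RELEASE_TO_PRODUCTION},
    # Components are transitioned and released one at a time.
    Phase.RELEASE_TO_PRODUCTION: {Phase.TRANSITION, Phase.MAINTENANCE},
    # A fundamental change restarts the lifecycle.
    Phase.MAINTENANCE: {Phase.DATA_COLLECTION},
}


def can_move(current: Phase, target: Phase) -> bool:
    """Return True if the lifecycle allows moving from current to target."""
    return target in ALLOWED[current]
```

For example, `can_move(Phase.RELEASE_TO_PRODUCTION, Phase.TRANSITION)` is true, reflecting the repeated cycling between those two phases, while a jump from data collection straight to maintenance is not.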

Conducting Data Collection

The data collection phase of SAN design is the foundation upon which the SAN will be built. It is vital that the information collected in this phase be both complete and accurate. If the SAN requirements are poorly defined, it is guaranteed that the resulting SAN will meet business objectives poorly. You should therefore take your time with this phase.

Some of the information you will collect is generic to any major IT project. If you already have an established data collection process in your company, integrate the SAN-specific material from this section into that process.

Data collection consists of determining which people you will need to interview, interviewing them, and conducting a physical assessment of existing equipment and facilities. When this process is complete, you will have a technical requirements document consisting of a list of the business problems that the SAN will solve, the business requirements for the SAN, characteristics of all devices that will be attached to it, and detailed information about all relevant facilities. You will also have a timeline for implementation.

Creating an Interview Plan

Who has a stake in the SAN solution? Well, you could argue that every person who uses a system attached to the SAN has a stake in it. While true, this is not useful for creating an interview list, because there would be too many people involved. Similarly, you could argue that only the person who initiated and "owns" the project should be consulted. Again, this is not useful, because it leaves out people who have a strong interest in the project, and might have knowledge that is critical to its success.

A balanced approach to creating an interview list is critical. You can view the people on this list as a SAN solution "core team." Think about having all of these people together in a room, and trying to solve the SAN solution problem together. Try to include everyone needed to solve the problem, but nobody else. Typically, a core team might include:

  • At least one systems administrator


  • At least one storage administrator


  • A network administrator


  • A DBA, if a database server will be involved


  • At least one application specialist associated with each application that will run on the SAN


  • At least one manager who can act as an overall "owner" of the project


    Unless you are an external consultant, you will probably be one of these people yourself, in addition to being the SAN designer.

    Once you have a list of the desired members of the core team, you must contact them and ask them to take time to help with the project. Ensure that each team member has allocated the necessary time and that their management appreciates the demands of participating in this team. Because designing the SAN might be a long-term process, getting this buy-in at the outset will minimize disruption to the team later. In the past, SAN design teams often did not include network administrators, as the focus was on the storage side. Experience has shown that SANs are networks, and that SAN design should be coordinated with the traditional IP network groups to ensure that proper networking experience is at hand.

    Whenever possible, schedule an interview as a face-to-face, one-on-one meeting. This format will allow you to communicate the questions and understand the answers most effectively. You should also have a group meeting with the entire core team after conducting individual interviews. This will allow you to resolve any differences before analyzing the data, and review the analysis as a team.

    Conducting the Interviews

    Now that you know whom to interview and have a schedule for the interviews, you need to know what questions to ask, and what format to put the collected data into. This section contains a suggested set of questions that you should ask, and some detail on what each question is about. It is followed by a summary that could be used to create an interview form.

    Note: Not every person you interview will be able to answer every question. Between the members of the core team, the expertise necessary to answer all of these questions should be completely represented. Some members might provide conflicting answers. You will be in a key position to resolve these differences, and achieve a compromise. It is vital that all affected parties agree with the deployment strategy before implementation begins.

    What Overall Business Problem Are You Trying to Solve?

    A business problem that would initiate a SAN design might be something like:

  • "We need to keep our business running in case of a disaster like an earthquake or fire."


  • "Our backups take so long to finish that they are impacting our ability to process customer orders."


  • "We need to save money on storage by utilizing free space more efficiently."


    Chapter 6 discusses some of the more common business problems that SANs can solve. Brocade maintains a series of documents that detail specific SAN solutions. These documents are known as Brocade SOLUTIONware configuration guidelines and are available on the Brocade Web site at www.brocade.com/SAN.

    Note: A SAN might be intended to solve multiple business problems. In this case, you should separate each business problem into a different set of questions and answers. You will correlate these during the analysis phase.

    What Are the Business Requirements of the Solution?

    Once you know the business problem that you need to solve, it should be easy to figure out what the business requirements of the solution must be. This is simply a matter of rephrasing the previous answers, with more specific criteria:

  • "The SAN must allow all functionality of all business-critical servers at site X to resume within Y minutes at site Z."


  • "The SAN must allow the following list of servers to complete backups within X minutes: "


  • "The SAN must allow the following list of servers access to the corresponding list of storage arrays: "


    This rephrasing is useful because it acts as an intermediate step toward turning the business problem into a matching technical solution.

    Moving from Business Requirements to Technical Requirements

    You should not deploy a SAN simply for the sake of adopting the "hot new technology." SANs are hot because they solve important business problems and allow companies to make more money. This could be fairly direct: for example, a matter of saving more money on IT than the project cost, since SANs are very efficient at providing a clear ROI. ROI is often achieved by management efficiencies, resource efficiencies, or better utilization of resources. On the other hand, it could be indirect, by making IT systems more efficient, thus increasing users' productivity.

    The first key to a successful SAN deployment is the accurate and complete statement of what business problem(s) you intend for the SAN to solve. Unfortunately, you cannot turn a business problem into a technical solution without work. There is no silver bullet to make your backups run faster so that your users will not have to work on a slow system. However, there are tape libraries that run fast, and can be shared by many devices. This, when combined with an appropriate Fibre Channel fabric, and a SAN-enabled backup application, could amount to the same thing as the silver bullet.

    In order to know which hardware and software will solve your business problem, you have to define in a technical way what you need to accomplish. This is a necessary intermediate step between the business problem and the purchase of specific technical solutions.

    It is fairly straightforward to change a sentence like, "We need to keep our business running in case of a disaster like an earthquake or fire" into a sentence like, "The SAN must allow all functionality of all business-critical servers at site X to resume within Y minutes at site Z." Once you have done this, you will have the business requirements of the solution. You know that you have a business requirements statement when you could phrase it like this, and still have it make sense: "Our business will run better if we have a SAN that can allow all functionality of all business-critical servers at site X to resume within Y minutes at site Z." The components of the business requirements statement are "our business will run better" (or something to that effect) followed by a reasonably specific statement about what the SAN must do to make that happen.
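
The rephrasing test described above can be sketched in a few lines. The function names and the sample site names are hypothetical, chosen only to illustrate filling in the X, Y, and Z placeholders and applying the "our business will run better" check.

```python
from string import Template

# Template for the disaster-recovery requirement quoted in the text.
REQUIREMENT = Template(
    "The SAN must allow all functionality of all business-critical servers "
    "at site $source_site to resume within $minutes minutes at site $recovery_site."
)


def business_requirement(source_site: str, recovery_site: str, minutes: int) -> str:
    """Fill in the placeholders (X, Y, Z in the text) with concrete values."""
    return REQUIREMENT.substitute(
        source_site=source_site, recovery_site=recovery_site, minutes=minutes
    )


def as_sanity_check(requirement: str) -> str:
    """Rephrase the statement with the 'our business will run better' prefix."""
    prefix = "The SAN must "
    if not requirement.startswith(prefix):
        raise ValueError("expected a statement beginning with 'The SAN must'")
    return (
        "Our business will run better if we have a SAN that can "
        + requirement[len(prefix):]
    )
```

If the sentence returned by `as_sanity_check` still reads sensibly, the statement is probably a business requirement rather than a technical one.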

    However, you will still not have the technical requirements detailed. This is not something that you, the SAN designer, can simply ask in an interview. This is a large part of what you will bring to the table as the SAN designer once you have gathered the data and then analyzed it in the next phase. A technical requirements document set should list, in detail:

  • All of the devices that are to be attached to the SAN


  • Their locations


  • The communication patterns between them (random I/O, streaming access such as video, I/O-intensive database access)


  • Their performance characteristics (reads, writes, max/min/typical throughputs)


  • What software will run on them relative to the SAN (for example, a LAN-free backup application, or anything SAN-specific)


  • How all of this is expected to change over time (storage growth, server growth)


    The technical requirements statement would be, "The SAN needed to meet the business requirements outlined must have the following characteristics:" This would be followed by the body of the technical requirements document. The rest of the questions to ask in the interview process will provide you with the body of this document.
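
As a rough illustration, the items listed above could be captured in a simple record structure. All class and field names here are assumptions made for the sketch, not a format defined by the book, and the summed peak throughput is only a crude upper bound.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DeviceRequirement:
    name: str
    location: str
    io_pattern: str                  # e.g. "random", "streaming", "database"
    typical_mb_per_s: float          # typical throughput
    max_mb_per_s: float              # peak throughput
    san_software: list[str] = field(default_factory=list)
    annual_growth_pct: float = 0.0   # expected change over time


@dataclass
class TechnicalRequirements:
    business_requirement: str
    devices: list[DeviceRequirement] = field(default_factory=list)

    def peak_throughput(self) -> float:
        """Sum of device peaks: a crude upper bound, not a sizing rule."""
        return sum(d.max_mb_per_s for d in self.devices)

    def locations(self) -> set[str]:
        """All distinct locations that the SAN has to reach."""
        return {d.location for d in self.devices}
```

Keeping the business requirement in the same record as the device list makes it easier to trace each technical detail back to the goal it serves.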

    What Is Known about the Nodes that Will Attach to the SAN?

    You should try to get as much information as possible about every node that will attach to the SAN. The relevant questions cover each host, each storage device, the facilities where hosts and storage will be located, and the SAN itself. Questions about each host could include the following:

  • What operating system is installed? What patch or service pack level?


  • Are fabric HBA/controller drivers available? Are they well tested?


  • What type of connection is supported (private loop, public loop, or fabric)?


  • Which applications will run on this host (databases, e-mail, data replication, file sharing)?


  • How much storage does it require?


  • How will its storage requirements change over time?


  • Physically, what are its dimensions? How heavy is it?


  • Does it rack mount? Does it have a rack kit? Will it sit on a shelf?


  • If there is a management console, what type is it? (Is it a traditional keyboard/video/mouse combo [KVM], or is it a serial connection, like a TTY?) Does it need to be permanently attached? (For example, a Sun SPARC server could have a keyboard, mouse, and monitor permanently attached, or it could be managed through a serial port attached to a modem.)


  • How many HBAs will it have?


  • If it has more than one HBA, what software will be used to provide failover or performance enhancements of multiple paths?


  • Do these interfaces exist, or do they need to be purchased? (You should keep track of every piece of equipment that you need to buy for the project, for budgeting and ROI analysis.)


  • If they exist, what are the make, model, and version information?


  • If not, what kind will be purchased to meet the objective?


  • How many Ethernet interfaces will it have?


  • In what temperature range will it operate?


  • Will it need a telephone line for management?


  • Where will the node be physically located?


    These questions could be used to create an interview form for each host.
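
The authors' own form is not reproduced in this excerpt. As an illustrative stand-in, a per-host form could be modeled as a record in which unanswered questions stay empty. The field names below are a hypothetical subset of the questions above.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class HostInterview:
    hostname: str
    operating_system: Optional[str] = None
    patch_level: Optional[str] = None
    connection_type: Optional[str] = None   # "private loop", "public loop", or "fabric"
    applications: Optional[str] = None
    storage_required_gb: Optional[float] = None
    hba_count: Optional[int] = None
    multipath_software: Optional[str] = None
    ethernet_interfaces: Optional[int] = None
    physical_location: Optional[str] = None


def unanswered(form: HostInterview) -> list[str]:
    """Return the names of the questions that still have no answer."""
    return [f.name for f in fields(form) if getattr(form, f.name) is None]
```

Because not every person interviewed can answer every question, a helper such as `unanswered` shows which items still need to go to another member of the core team.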

    Questions about each storage device could include the following:

  • What are the make, model, and version information?


  • What type of connection is supported (private loop, public loop, fabric, SCSI, SSA)?


  • How many hosts can this device serve?


  • If it is a multiport device, does it have limits on how many hosts can access it through each port?


  • Physically, what are its dimensions? How heavy is it?


  • What is its capacity in gigabytes?


  • Does it rack mount? Does it have a rack kit? Will it sit on a shelf?


  • If there is a management console, what type is it? Does it need to be permanently attached?


  • How many Fibre Channel interfaces will it have?


  • Do these interfaces exist, or do they need to be purchased?


  • If they exist, what are the make and model? If not, what kind will be purchased?


  • How many Ethernet interfaces will it have?


  • In what temperature range will it operate?


  • Will it need a telephone line for management?


  • Where will the node be physically located?


  • What is the firmware level?


  • For tape libraries, what is the capacity of each cartridge, number of cartridges the library can hold, number and speed of drives, and number of transports?


  • Is the interface SCSI or Fibre Channel? If SCSI, what type (wide/narrow, differential/single-ended)?

    Note: Obviously, some of these questions do not relate directly to the SAN deployment. However, they are generally relevant whenever you make a large architectural change in a data center. For example, it is necessary to know what temperature a server can operate at in case the server is in a location where temperature control is an issue. In this case, adding a large number of switches might increase the room temperature beyond operating levels. As always, use your judgment about which questions to include in your interview, and which to skip over.

    Note: While it is possible to manage an entire fabric through a single Ethernet connection, this is not the method that Brocade currently recommends. You should plan on one Ethernet connection per Brocade switch, in addition to planning connections for hosts and other SAN devices. For the highest availability, it is also advisable to balance switches across multiple electrical circuits, even if an Uninterruptible Power Supply (UPS) protects them.

    Questions about facilities where hosts and storage will be located could include the following:

  • Who is responsible for this facility?


  • Are there any existing optical cables, and what type?


  • Is there sufficient electrical power?


  • What about cooling?


  • Is there enough rack space?


  • What is the network infrastructure?


  • What is the physical access like? If the location is on an upper floor, is there a freight elevator?


    Answers to questions about the SAN itself must be considered preliminary. They will indicate preconceptions that members of the core team have, but all members should be prepared to be flexible on these preconceptions as the SAN design process progresses. Questions about the SAN itself could include the following:

  • Are there any distance considerations? (For example, long cable runs between floors of a building, campuswide networks, or MAN/WAN connections.)


  • How many hosts will attach to the SAN?


  • How many storage devices will attach to the SAN?


  • If known at this point, do they require any-to-any connectivity? Alternately, are there groups of devices that need to communicate only among themselves?


    Which SAN-Enabled Applications Do You Have in Mind?

    Will the SAN use a serverless backup application? How about clustering software? How about volume management? This category of software requires special attention because of its close ties to the SAN hardware you choose to build the solution. For example, if you plan to use vendor X serverless backup software, you must make sure that your backup hardware (tape libraries, Fibre Channel/SCSI gateways, etc.) is supported.

    Which Components of the Solution Already Exist?

    Any hardware or software that is already in place and that must be included in the solution will create points for you to build around. You must find out as many details as possible about everything in this category. Once you have finished the interviews and are conducting the physical assessment, you should personally inspect every piece of hardware. This will prevent surprises later in the process. Make sure that you find out exactly where all hardware is located, and how to access it.

    You must pay special attention to devices that already exist and already have Fibre Channel interfaces. Find out which kinds of HBAs are installed in hosts, and which driver revisions are installed on them. Find out code levels for RAID arrays and Fibre Channel tape libraries. Find out if upgrades to driver/code levels are planned or at least allowed.

    Note: You must know if each device is public loop, private loop, or full fabric. Some devices might even be SCSI and require additional hardware to bridge between SCSI and Fibre Channel.

    If possible, you should not use private loop drivers on initiators unless the device does not support fabric drivers or is not easy to upgrade. Private loop hosts require special licenses, typically Brocade QuickLoop and Zoning. Find out if the existing devices are configured as full-fabric devices. If not, find out if their drivers support full fabric, or if they can be upgraded to full fabric. This is not intended to discourage incorporation of private loop devices into a fabric: QuickLoop and Fabric Assist exist specifically to enable this to occur. However, if a device can support full fabric, then integration into the SAN will be easier if it does so.

    Which Components Are Already in Production?

    Components that are in production require special attention in two areas:

  • Duplicate equipment might be desired for testing.


  • The transition phase is more complex.


    It is vital to know as much as possible about production systems that are going to transition onto the SAN. Therefore, somebody intimately familiar with and responsible for every such system should be included on the core team.

    Which Elements of the Solution Need to Be Prototyped and Tested?

    For relatively simple solutions that involve only components already certified to work together, it might be that you do not have to do any testing at all. For example, if you are implementing a SAN-based solution based on a Brocade SOLUTIONware document, you might feel that you need only to do minimal validation. This is opposed to a solution where no documentation or testing information exists, which generally requires extensive validation.

    For more complex solutions involving a large number of devices that might be from many different vendors, you might feel that every single element needs to be tested in combination before release to production can occur. You should get input on this from every member of the core team. If any team member feels that you should conduct in-house testing on a component, you should strongly consider doing so.

    What Equipment Will Be Available for Testing?

    Any existing equipment that is not in production, and any equipment that is going to be purchased specifically for this project might be good material with which to build a test bed. Existing equipment that is in production is not good to test with. If existing equipment already in production will be transitioned onto the SAN, it might be beneficial to budget for a representative sample of duplicate, nonproduction systems with which to prototype the solution. It is generally a good idea to have such systems available for testing in any case. It may also be possible to borrow systems to test with. In any case, it's probably worth asking your vendors for such loans.

    Whether or not test equipment is available, you should research what testing third-party vendors or third-party organizations have already done. In this way, you will avoid duplicating their efforts. If you cannot get representative test equipment for an element that needs to be prototyped, it might be acceptable, and necessary, to rely entirely upon the work done by others to validate the solution.

    Again, with many solutions, this is a perfectly acceptable way to go. If you do not feel that in-house testing is warranted, then you can save time and money by skipping the prototype and test phase. Just make sure that you have documentation certifying the solution before you make this decision.

    How and When Are Backups to Be Done?

    You need to get a list of everything that relates to the system's backups:

  • What backup hardware will be used?


  • What backup software will be used for each host?


  • Which storage arrays will be backed up by which tape libraries?


  • When will these backups occur?


  • How long can they take?


  • How much data needs to be backed up?


  • Will snapshots be used? How do they work?


  • Will split mirrors be used? How do they work?
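
    The answers about backup duration and data volume combine into a throughput requirement that the backup path must sustain. The following is a minimal sketch of that arithmetic, not part of the original text. The function name and the 500 GB / 4-hour figures are hypothetical, and it assumes 1 GB = 1000 MB and a steady transfer rate.

```python
def required_backup_rate(data_gb, window_hours):
    """Average MB/sec needed to move data_gb within window_hours.

    Assumes 1 GB = 1000 MB and a steady transfer rate for the whole window.
    """
    if window_hours <= 0:
        raise ValueError("backup window must be positive")
    return data_gb * 1000 / (window_hours * 3600)


# Hypothetical example: 500 GB of data and a 4-hour backup window.
rate = required_backup_rate(500, 4)
print(f"Required average rate: {rate:.1f} MB/sec")
```

    Comparing this figure with the write speed of the tape devices gives an early indication of whether the proposed backup window is realistic.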


    What Will Be the Traffic Patterns in the Solution?

    You should produce a matrix showing every initiator-to-target communication expected in the SAN. This is necessary to determine performance characteristics, and to set up zoning on the fabric:

  • Which hosts will use a specific storage array?


  • Which hosts in a cluster will talk directly to each other over the SAN?


  • Which backup devices will be performing serverless backups?


  • Which arrays will they be backing up?


    Create a table listing every device on the SAN that can act as an initiator in one column. This will include every host, every storage virtualization product, and every serverless backup server. It might include storage arrays, if they have data replication capabilities. Then put a second column next to it with all of the targets that each initiator will communicate with (Table 5.1).

    Table 5.1 Initiator-to-Target Mapping

    SAN Traffic Patterns
    Initiators    Targets
    host1         array3
    host2         array1, array2, tape1
    host3         array1
    host4         array1, array2
    tape1         array1, array3
    array3        array4
    array4        array3

    Note: Some devices on a SAN can act as both an initiator and a target. If so, they will appear in both columns; see array3 and array4 in Table 5.1. This is how you would indicate that array3 and array4 perform data replication between them.

    You will not necessarily be able to build this table by interviewing one person; it will likely be developed over the course of the interview process, changed as the implementation takes place, and maintained for the life of the SAN.
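
    Because this table changes over the life of the SAN, it can help to keep it in a form that tools can read. The following is a minimal sketch, not the authors' method, that stores the initiator-to-target mapping used in this chapter's tables as a Python dictionary. The function names are hypothetical. It lists the devices that act as both initiator and target, and flattens the matrix into the pairs that zoning would need to allow.

```python
# Initiator -> targets, following the mapping in this chapter's tables.
TRAFFIC = {
    "host1": ["array3"],
    "host2": ["array1", "array2", "tape1"],
    "host3": ["array1"],
    "host4": ["array1", "array2"],
    "tape1": ["array1", "array3"],
    "array3": ["array4"],
    "array4": ["array3"],
}


def dual_role_devices(traffic):
    """Devices that appear in both the initiator and the target column."""
    targets = {t for tgts in traffic.values() for t in tgts}
    return sorted(set(traffic) & targets)


def initiator_target_pairs(traffic):
    """Flatten the matrix into (initiator, target) pairs, one per communication."""
    return [(i, t) for i, tgts in traffic.items() for t in tgts]


print(dual_role_devices(TRAFFIC))
print(len(initiator_target_pairs(TRAFFIC)), "communications to zone")
```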

    What Do We Know about Current Performance Characteristics?

    Any devices that currently exist, and will be transitioned onto the SAN, are candidates for empirical performance testing.

    Create a second set of columns next to the traffic pattern columns, as shown in Table 5.2. You will need entries for peak utilization and sustained utilization. Obviously, you will only be able to enter current data for initiators that already exist, and already communicate with the same targets they will talk to after the SAN is complete.

    Table 5.2 Current Traffic

    SAN Traffic Patterns                   Current Peak    Current Sustain
    Initiators    Targets                  (MB/sec)        (MB/sec)
    host1         array3                   10              5
    host2         array1, array2, tape1
    host3         array1                   50              10
    host4         array1, array2
    tape1         array1, array3
    array3        array4
    array4        array3

    In this example, host1 and host3 already exist, and are already talking to array3 and array1, respectively. Each of the other devices either has yet to be added or is not yet talking to the targets it will use once the SAN is up, or its performance data is simply unavailable.

    If the owner of a system has already done this kind of analysis, you will simply transfer the numbers to your table. If not, you should work with the owner to get the performance information, as this might have a substantial impact on your SAN design.


    Gathering Performance Data

    On almost any kind of system, some facility exists for measuring performance. More often than not, there will be multiple options for gathering disk I/O performance information.

    For example, on a Windows NT system, you might use the diskmon feature. You have to install this from the Windows NT Resource Kit. If you do not install diskmon, standard Windows perfmon will not have a disk monitoring tool. Alternately, you could install a package like Intel's Iometer, and use that to generate a simulated load and measure performance. This tool is presently available as a free download from Intel's Web site.

    Under Sun's Solaris operating system, performance can be measured using the iostat utility, the GUI utility perfmeter, or one of a number of third-party utilities like Extreme SCSI. There are similar tools in every UNIX variant. We are providing examples for Solaris only, since the details of these commands will vary between every flavor of UNIX, and providing examples for every variant is impractical. Refer to the man pages for your particular version of UNIX for the exact syntax. There are also a number of options for generating loads under Solaris, ranging from the dd command to, again, a utility like Extreme SCSI.

    Note: Tools like Iometer, dd, and Extreme SCSI should be used with care. It is tempting to use them to generate maximum load. A more useful test to run is to generate a representative load. Try to determine what your application will actually be doing in terms of read/write ratio, and total bandwidth consumption, and use these tools to generate that kind of load on the system.

    In cases where performance data cannot be collected empirically (such as when the system in question does not exist yet), there is still hope. Most hosts are not capable of generating sustained load at full wire speed. They are generally going to be limited by other factors. These could include:

  • CPU speed: Although Fibre Channel has much lower overhead than the TCP/IP stack, it still takes a fast processor to get near to full performance on a 1 Gbit/sec Fibre Channel link, simply because the processor will be busy running whatever task is actually generating the I/O. While almost all hosts now shipping have sufficiently fast CPUs, you also need to estimate how much of that CPU resource is taken up by other tasks the host is performing that do not result in disk I/O (such as running a TCP/IP stack). Moreover, many data centers have servers with older CPUs that might not be capable of running at 1 Gbit/sec even without taking these tasks into consideration.


  • PCI bus speed: Fibre Channel full duplex is 200 MB/sec. A 32-bit 33 MHz PCI bus can only sustain about 120 MB/sec. A 64-bit 33 MHz or 32-bit 66 MHz PCI bus can handle about 240 MB/sec, and a 64-bit 66 MHz bus can handle about 480 MB/sec. Even on the higher rate buses, you must bear in mind that it is a shared bus. If you put two Fibre Channel HBAs onto a bus that can handle 240 MB/sec, that will be the total possible full-duplex speed for both HBAs. Therefore, you would on average get 120 MB/sec out of each interface. In a balanced read/write environment, for example, this could mean that you get only 60 MB/sec of read performance out of each card. Also bear in mind that there may be other cards on the bus taking up some of that bandwidth.


  • HBA speed: Although designed to work on a 1 Gbit/sec SAN, many HBAs cannot achieve, or at least cannot sustain, full 1 Gbit/sec transfers. Newer HBAs typically have better performance. Older HBAs might only be able to achieve 60 MB/sec, regardless of the other possible issues.


  • RAID controller speed: Many RAID controllers cannot sustain 100 MB/sec per interface on all interfaces simultaneously. Some barely operate at 30 MB/sec per interface, which is more than acceptable for many applications! Finding out the limits of your RAID array should be as simple as calling the vendor's support channel. Of course, you might also check third-party testing results, such as those done by many industry magazines, for an unbiased opinion.


  • RAM quantity and speed: If your system is short on RAM, it might spend a lot of time paging. If it does, performance will be substantially degraded.


  • Disk seek time: If your application does a lot of random I/O, the disk heads will have to jump all over the platters. Since disk seeks are an order of magnitude or more slower than a Fibre Channel link, you might have to allocate substantially less bandwidth for random I/O applications like a file server than for sequential I/O applications like a video server or decision support system.


  • Application overhead: This ties into the CPU-limit factor. How much CPU do you have, and how much of it is free for handling I/O?


  • Write speed of tape device: Most tape drives cannot come anywhere near 100 MB/sec. It is usually sufficient to ask a vendor for performance data in the case of tape drives, although optimistic compression ratios can inflate the performance numbers they provide.


    In addition, if anything is known about the application that is running on the host, you might be able to make a good guess about how much load it will even try to place on the disk subsystem. For example, if you know that the host is an intranet Web server, and that it receives only 500 hits a day, you can safely guess that its I/O requirements will be minimal.

    Once you have collected your best empirical or estimated numbers for each factor, use the lowest common denominator approach to estimate the maximum bandwidth that the system could need. You can guarantee that the overall system will not outperform its weakest link.
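
    The weakest-link estimate can be sketched in a few lines. This is an illustration under stated assumptions rather than a definitive method: the host and its limits are hypothetical (the figures are borrowed from the ranges quoted above), the function name is invented for this example, and every limit is expressed on the same basis, here one-direction MB/sec.

```python
def weakest_link(limits_mb_per_sec):
    """Return (factor, MB/sec) for the lowest limit; the system cannot exceed it."""
    factor = min(limits_mb_per_sec, key=limits_mb_per_sec.get)
    return factor, limits_mb_per_sec[factor]


# Hypothetical host. A 240 MB/sec PCI bus is shared by two HBAs, and each
# HBA's share is split evenly between reads and writes.
pci_read_share = 240 / 2 / 2

limits = {
    "fibre_channel_link": 100,
    "pci_bus_share": pci_read_share,
    "hba": 80,
    "raid_controller_interface": 30,
}

factor, estimate = weakest_link(limits)
print(f"Limiting factor: {factor} at {estimate} MB/sec")
```

    Keeping the name of the limiting factor alongside the number makes it easier to revisit the estimate when that component is upgraded.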

    Also note that on systems with multiple HBAs, I/O load might be distributed across these HBAs. Achieving active-active distribution across HBAs might require third-party applications like the VERITAS Dynamic Multipathing software, Troika's HBA driver, or one of the storage vendor's dual-path products. If this is the case, you might estimate that each HBA will usually have a fraction of the total load. In a dual-fabric, active/active HBA architecture, each HBA normally has 50 percent of the total load. If a system is capable of sustaining 70 MB/sec, then each HBA will sustain 35 MB/sec. Note that this might change during system maintenance if you shut down one path, and the remaining path could then take on the full 70 MB/sec, so the design should incorporate the worst-case scenario. It is usually also good practice to add some padding to the top of this estimate (perhaps 10 percent) to allow for the unexpected.
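
    The per-HBA arithmetic above can be written out as a short sketch. The function name is hypothetical, and the sketch assumes that load is spread evenly across active paths and that the worst case is a single surviving path carrying the full load, with the suggested 10 percent padding added on top.

```python
def per_hba_design_load(total_mb_per_sec, hba_count, padding_percent=10):
    """Return (normal_per_hba, padded_worst_case) in MB/sec.

    normal_per_hba: even split of the load across all active HBAs.
    padded_worst_case: one surviving path carries the full load, plus padding.
    """
    if hba_count < 1:
        raise ValueError("at least one HBA is required")
    normal = total_mb_per_sec / hba_count
    worst_case = total_mb_per_sec * (100 + padding_percent) / 100
    return normal, worst_case


# The example from the text: a 70 MB/sec system with two active HBAs.
normal, design = per_hba_design_load(70, 2)
print(f"Normal per-HBA load: {normal} MB/sec; design each path for {design} MB/sec")
```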

    Note: Unlike physical-disk counter data, logical-disk counter data is not collected by the NT operating system by default. To obtain performance counter data for logical drives or storage volumes, you must type diskperf -yv at the command prompt. This will cause the disk performance statistics driver used for collecting disk performance data to report data for logical drives or storage volumes. By default, the NT operating system uses the diskperf -yd command to obtain only physical drive data. For more information about using the diskperf command, type diskperf -? at the command prompt.

    What Do We Know about Future Performance Characteristics?

    Performance numbers change over time. Consider a customer database for a catalog retail company. Perhaps you will install the SAN in February, because this is your slow month of the year, and you can get the necessary downtime. You might know that the database host will start talking to its storage array(s) at a sustained rate of 5 MB/sec during the business day, with a peak of only 10 MB/sec. However, when the Christmas season comes along and your business picks up, you might move to a 50 MB/sec sustained rate, peaking at 70 MB/sec. Because of the potential for substantial changes in performance requirements over time, it is essential to plan for both current and projected performance. Most of this might be educated guesswork, since many of the systems you are going to deploy might not yet exist.

    Again, you will need to come up with numbers for both sustained traffic and peak traffic for each communication. Also try to determine what days/times peak performance will occur. This will be added to your table (Table 5.3).

    Table 5.3 Adding Traffic Projections

    SAN Traffic Patterns    SAN Peak Performance    SAN Sustained Performance    Peak Times
                            (MB/sec)                (MB/sec)
    Initiators  Targets     Initial    Expected     Initial    Expected          Initial      Expected
    host1       array3      10         10           5          5                 M-F 8a-5p    same
    host2       array1      0          70           0          50
    host2       array2      0          70           0          50
    host2       tape1       20         20           0          0
    host3       array1      50         50           10         20                M-F 8a-5p    + Sa 10a-4p
    host4       array1      0          90           0          50
    host4       array2      0          90           0          50
    tape1       array1      0          20           0          0                 Sa 5p-9p     same
    tape1       array3      0          20           0          0                 Sa 9p-11p    same
    array3      array4      10         30           5          5
    array4      array3      5          5            0          0

    Again, you can only enter data for systems about which you can make an educated guess. If you know about what the peak traffic could be based only on the limitations of a system, you might not have any way of guessing when this would occur. You should also enter projected data for systems that you know that you will add later.

    In Table 5.3, host2 and the application it is running might not exist yet, so every piece of data about that system is pure guesswork. Let us say that host2 is a Return Merchandise Authorization (RMA) system, and your rapidly growing company has never had an RMA system before. You might not be able to reliably guess when customers are going to call in with RMA requests most often, or even how many RMAs you are going to get in a given day. The best you can do is determine what performance the hardware and software you are installing could reasonably run at, and design the SAN to support it all the time it could be in use. While this approach might result in over-engineering your network, this is better than the alternative. During future design phases, you can adjust or scale back the SAN design accordingly, as well as incorporate other additions and changes.

    For backup devices, peak usage will always correspond with your backup schedule. This will usually not correspond with peak usage of the rest of the system. This is particularly useful knowledge when planning an ISL architecture, because you can often count on having low nonbackup-related utilization of ISLs during backup windows. An obvious exception to this is a SAN that is used solely for performing LAN-free backups.
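
    One way to check whether backup traffic overlaps with other peaks is to add up the expected peak rates hour by hour. The sketch below is illustrative only: the flows loosely follow Table 5.3, the function name is hypothetical, and it makes the pessimistic assumption that every listed flow crosses the same ISL.

```python
from collections import defaultdict

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri")

# (initiator, target, peak MB/sec, [(day, start_hour, end_hour), ...])
# Illustrative values loosely based on Table 5.3; hours use a 24-hour clock.
FLOWS = [
    ("host1", "array3", 10, [(d, 8, 17) for d in WEEKDAYS]),
    ("host3", "array1", 50, [(d, 8, 17) for d in WEEKDAYS] + [("Sat", 10, 16)]),
    ("tape1", "array1", 20, [("Sat", 17, 21)]),
    ("tape1", "array3", 20, [("Sat", 21, 23)]),
]


def hourly_demand(flows):
    """Sum the peak rates of all flows active in each (day, hour) slot."""
    demand = defaultdict(int)
    for _initiator, _target, peak, windows in flows:
        for day, start, end in windows:
            for hour in range(start, end):
                demand[(day, hour)] += peak
    return dict(demand)


demand = hourly_demand(FLOWS)
print("Worst-case combined demand:", max(demand.values()), "MB/sec")
print("Demand during the Saturday 6 p.m. backup:", demand[("Sat", 18)], "MB/sec")
```

    In this example the backup windows never coincide with the weekday peak, which is the situation the text describes.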


    How Much Downtime Is Acceptable to Production Components During Implementation?

    It will likely be necessary to shut down some existing production devices during implementation, to ensure a safe transition onto the SAN. For example, you might have to shut down a host to install an HBA. Determine how much downtime is acceptable for each host, and at what times this can occur. Generally, you should try to schedule more downtime than you think you need to ensure that any unforeseen issues that arise during the implementation can be handled within the downtime window.

    How Much Downtime Is Acceptable for Routine Maintenance? How Much Downtime Is Acceptable for Upgrades and Architectural Changes?

    These two questions are intimately related, because, to an end user, there is really no difference between downtime to a production system for maintenance, and downtime for an upgrade. Once systems are in production, you will want to keep them running as much as possible.

    Many upgrades can be accomplished with zero downtime by using a double- or triple-redundant fabric architecture. Even so, no matter how well you plan the upgrade and maintenance processes beforehand, you will need to shut down specific hosts on occasion. For example, you might want to upgrade an HBA driver, which would typically require a reboot.

    Note: Wherever possible, a redundant fabric architecture should be used. This will ensure the best performance and reliability, and will simplify maintenance tasks. In a redundant fabric architecture, every host has at least two paths to every storage device it connects to, and these paths traverse two completely unconnected fabrics. While it might appear on the surface to be more expensive, if hosts are to be dual-attached anyway, it is actually less expensive to attach them to two separate fabrics than to use one larger fabric, or a director-class switch. This does not even include the downtime ROI calculation, which, in high-availability environments, will usually overshadow the entire cost of the SAN. More details about redundant and resilient fabrics are provided in Chapter 7.

    You should therefore determine in advance when you will be able to schedule downtime for every host and storage array, and for the fabric itself. You might not have to use every scheduled outage, but having them available to you when you do need them is essential.

    One way to do this is to make a list of applications and services provided by the hosts on the SAN, and determine an owner for each. Take your list of SAN devices and map these devices to the applications and services they affect. This will provide a mapping of application/service owners, who are typically responsible for scheduling downtime, to devices that typically require downtime. Have each owner approve the downtime calendar for each device that affects his or her service.
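
    The owner-to-device mapping described above amounts to inverting two small tables. The following is a minimal sketch; the application names, owner names, device assignments, and function name are all hypothetical placeholders.

```python
# Hypothetical inventory: who owns each application, and which SAN devices
# each application depends on.
APP_OWNERS = {"customer_db": "alice", "rma": "bob"}
APP_DEVICES = {
    "customer_db": ["host3", "array1"],
    "rma": ["host2", "array1", "array2"],
}


def approvers_by_device(app_owners, app_devices):
    """Map each device to the set of owners who must approve its downtime."""
    approvers = {}
    for app, devices in app_devices.items():
        for device in devices:
            approvers.setdefault(device, set()).add(app_owners[app])
    return approvers


for device, owners in sorted(approvers_by_device(APP_OWNERS, APP_DEVICES).items()):
    print(device, "->", ", ".join(sorted(owners)))
```

    A shared device such as array1 ends up with more than one approver, which is exactly the case where an agreed downtime calendar matters most.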

    The mapping of owners to devices should be kept up to date as changes in personnel, applications, and/or SAN infrastructure occur.

    When Do You Need Each Piece of the Solution to Be Complete?

    Once you have a table detailing which of the initiators communicate with which targets, you can begin to create a timeline for the project. Other members of the core team will tell you something like, "the customer database application must be online by mid-June." It is your task to define which SAN components you need to accomplish this, and to develop a timeline for adding these components that meets their requirements.

    This is a high-level list of some of the questions that should appear on a SAN design interview form:

  • What overall business problem are you trying to solve?


  • What are the business requirements of the solution?


  • What is known about the nodes that will attach to the SAN?


  • Which SAN-enabled application do you have in mind?


  • Which components of the solution already exist?


  • Which components are already in production?


  • Which elements of the solution need to be prototyped and tested?


  • What equipment will be available for testing?


  • How and when are backups to be done?


  • What will the traffic patterns in the solution be?


  • What do we know about current performance characteristics?


  • What do we know about future performance characteristics?


  • How much downtime is acceptable to production components during implementation?


  • How much downtime is acceptable for routine maintenance?


  • How much downtime is acceptable for upgrades and architectural changes?


  • When do you need each piece of the solution to be complete?


    Conduct a Physical Assessment

    You should now have the location of every piece of hardware that currently exists. In addition, you should know where each piece of hardware in the eventual SAN will be located.

    Look at each piece of hardware. Make sure that it does exist, and has all necessary pieces to function. This could include things like power cords, keyboard, mouse, monitor, Ethernet card, Ethernet cable, HBAs, and Fibre Channel cables. Note the physical dimensions of the hardware, and its power/cooling requirements. Does it rack mount? Does it have a network interface? How many Fibre Channel interfaces does it have? How much does it weigh? You should already have this information from the interview process, but you should verify that the information you were given is correct.

    Go to each location where SAN equipment or nodes will be installed, and again check to see that your information was correct. Notice how the equipment will fit into the space available. Notice how the equipment will enter the building. You should also have a meeting with the person in charge of the facility to discuss power, cooling, and equipment locations.


    Analyzing the Collected Data

    Now that you have collected information from all key stakeholders in the project, and verified the accuracy of this information, you will analyze it to determine the characteristics of the required solution. When you have completed this process, you will have a list of technical requirements, and an ROI analysis to justify the project.

    Processing What You Have Collected

    You have a matrix detailing communication between nodes. Attempt to group the nodes by communication patterns. The purpose of this is to determine the amount of known locality in the SAN. Locality of reference is a concept prevalent in many areas of computer science, from disk drive construction to LAN design. Locality is important in SAN design because if you can localize traffic into specific areas of a SAN, you directly improve the SAN's performance and reliability. This will allow a more cost-effective SAN design as well, preventing over-designing the network to handle nonexistent cross traffic. Locality is discussed in greater detail in Chapter 7.

    A SAN with a great deal of known locality might be constructed out of many separate fabrics, with no ISLs whatsoever. A SAN with little or no known locality might require a high-performance ISL architecture (Table 5.4).

    Table 5.4 Initiator-to-Target Mapping for Locality Example

    SAN Traffic Patterns
    Initiators    Targets
    host1         array3
    host2         array1, array2, tape1
    host3         array1
    host4         array1, array2
    tape1         array1, array3
    array3        array4
    array4        array3

    In Table 5.4, array3 would be grouped with host1, tape1, and array4. None of those devices will need to communicate with any of the other devices. They could be grouped onto a single switch, or even put onto a totally separate fabric. You might find it helpful to do the grouping in a diagram. For another example, look at Figure 5.2.

    Figure 5.2 SAN Diagram without Grouping

    Nothing is known about the communication patterns in this SAN. Consequently, there is no way to optimize ISLs for performance. After grouping the initiators with their targets, the SAN diagram could look something like Figure 5.3. If you look carefully, you will notice that there are only 12 connections into this SAN. If there are fewer connections than there are ports in your switches, you do not really need to go through the grouping exercise because localization of traffic will happen automatically. It is only useful if you will be using ISLs; however, as most systems scale well past the size of the largest switches available, it will be a frequent exercise. For the purposes of making the examples more readable, we will just assume that they are all dealing with a subset of the devices that the SAN will support.

    Figure 5.3 SAN Diagram with Simple Grouping

    Making a diagram such as this will allow you to see at a glance what the communication patterns for your SAN are.

    This example is simplistic, and in large SANs, there will likely be conflicts. When you cannot effectively group all of the communication patterns, you should focus on grouping faster performing devices. For example, if you find that the bulk of traffic will be between host1, array3, and array4, these could be grouped separately from tape1 and host2 if necessary. This could happen if you find that there are so many interrelationships that you end up with only a few very large groups; the grouping technique does not help performance if you have only one big group. It could also happen if you have a few devices that are shared by a great many others, such as a large RAID array in a storage consolidation solution.
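
    The grouping exercise, including the advice to focus on faster devices, can be sketched as finding connected groups of devices while ignoring flows below a bandwidth threshold. This is one possible approach, not the authors' procedure. The flow figures, the threshold, and the function name are hypothetical.

```python
def performance_groups(flows, min_mb_per_sec=0):
    """Group devices linked by flows at or above min_mb_per_sec.

    flows: iterable of (initiator, target, MB/sec). Returns a list of sorted
    device lists; devices with no qualifying flow form single-member groups.
    """
    neighbors = {}
    for initiator, target, mb in flows:
        neighbors.setdefault(initiator, set())
        neighbors.setdefault(target, set())
        if mb >= min_mb_per_sec:
            neighbors[initiator].add(target)
            neighbors[target].add(initiator)

    groups, seen = [], set()
    for device in neighbors:
        if device in seen:
            continue
        stack, group = [device], set()
        while stack:  # depth-first walk over one connected group
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Hypothetical flows: the tape links are low-bandwidth but touch everything.
FLOWS = [
    ("host1", "array3", 40),
    ("array3", "array4", 30),
    ("host2", "array1", 70),
    ("host2", "tape1", 5),
    ("tape1", "array3", 5),
]

print(performance_groups(FLOWS))                      # every link counted
print(performance_groups(FLOWS, min_mb_per_sec=10))   # fast links only
```

    With every link counted, the low-bandwidth tape connections merge all devices into one big group. Raising the threshold separates the high-traffic devices into smaller groups and leaves the tape device to be placed wherever it is convenient.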

    Another way to combat this "group growth" problem is to account for multiple interfaces on storage arrays. Let us say that you have a redundant fabric architecture. Your RAID array has eight interfaces, and each host will access only two of them: one interface on each fabric. List each interface on the array separately in your traffic pattern table. Then, you associate servers or groups of servers with specific interfaces. With the array listed as a single entity, a diagram of the communication could look something like Figure 5.4.

    Figure 5.4 SAN Grouping Diagram with Single-Entity Arrays

    If, however, you separate the interfaces, your diagram could look more like Figure 5.5.

    Figure 5.5 SAN Grouping Diagram with Separated Interfaces
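    The effect of listing array interfaces separately can be illustrated with a small sketch for a single fabric. The host names, the interface labels such as array1:if0, and the helper largest_group are all hypothetical and introduced only for this example.

```python
def largest_group(traffic_pairs):
    """Size of the largest set of devices connected by any chain of traffic."""
    neighbors = {}
    for a, b in traffic_pairs:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    seen, best = set(), 0
    for start in neighbors:
        if start in seen:
            continue
        # Depth-first walk over everything reachable from this device.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            device = stack.pop()
            size += 1
            for peer in neighbors[device] - seen:
                seen.add(peer)
                stack.append(peer)
        best = max(best, size)
    return best


hosts = ["host1", "host2", "host3", "host4"]

# Array listed as a single entity: every host shares one node.
single_entity = [(h, "array1") for h in hosts]

# Each host mapped to its own (hypothetical) interface on the same array.
separated = [(h, f"array1:if{i}") for i, h in enumerate(hosts)]

print(largest_group(single_entity))  # 5: one big group
print(largest_group(separated))      # 2: one host plus one interface per group
```

    The same four hosts collapse into one group of five when the array is a single node, but into four groups of two when each interface is listed on its own, so each small group can be kept local to a switch.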

    You can indicate that a device crosses groups but does not need much in the way of performance by varying the line color, weight, or pattern. Figure 5.6 shows that the tape robot crosses all groups, but does not need much bandwidth.

    Figure 5.6 SAN Grouping Diagram with Tape Robot Addition

    If you are able to make relatively small performance groups, your SAN will benefit greatly from applying the principle of locality. For now, you simply need to be able to determine the category of architecture you will require: one that has lots of known locality (has well-defined performance groups), or one that does not. This will affect how many switch ports you need to allot for ISLs. If traffic is localized within an area of the SAN, it will obviously not need to make use of ISLs leaving that area. In this case, you will be able to get superior performance even with far fewer ISLs, resulting in more ports available for servers and storage.


    Authors
    Josh Judd is a Senior SAN Architect with Brocade Communications Systems, Inc. In addition to writing technical literature, he provides senior-level strategic support for major OEMs and end-users of Brocade storage network products worldwide. Chris Beauchamp (Brocade CFP, CSD) is a Senior SAN Architect for Brocade Communication Systems, Inc. Chris focuses on SAN design and architecture, with an emphasis on scalability and troubleshooting. Alex Neefus is the Lead Interoperability Test Engineer at Lamprey Networks, Inc. Alex has worked on developing testing tools for the SANmark program hosted by the FCIA. Benjamin F. Kuo is a Software Development Manager at TROIKA Networks. Headquartered in Westlake Village, CA, TROIKA Networks is a developer of Fibre Channel Host Bus Adapters, dynamic multipathing, and management software for Storage Area Networks.



    Establishing Port Requirements

    Now you will determine how many switch ports you will need to purchase. (This is a rough estimate for calculating ROI; the final count might be a bit higher or lower.)

    Take the ports you found out about during the interview process. Make sure that you account for all ports on each node. Some RAID arrays have many ports, and many hosts have at least two HBAs. Add up these ports to get the total number of exposed ports your SAN will require. You will then divide this by the number of different fabrics you will be using. If you have dual-redundant fabrics, you will divide by two. If you have triple-redundant fabrics, divide by three, and so on. This will give you the number of required exposed ports per fabric. The number of "overhead" ports you must allocate for ISLs and for unused ports will depend on several factors:

  • The total number of required ports per fabric.


  • The amount of known locality.


  • Your need to manage all switches as a single entity.


  • The physical layout of your SAN (any MAN/WAN connections, or intra-building campus connections, or intra-floor building connections) might dictate use of additional ISLs and less than perfect utilization of the ports on each switch.


  • Your applications' expected performance characteristics.


  • The rate of expected growth in port count of the fabric.


  • Your maintenance policies regarding port usages on network devices. For example, you might require that a certain number of ports be left available for expansion or troubleshooting during the course of normal operation.


    Simple Case

    If the number of required exposed ports is less than the number of ports on a single switch, you will generally need zero ports for ISLs. In this case, you will require one switch per fabric. However, as larger switches utilize more hardware internally to connect the higher number of user ports, a decision might need to be made between using a larger switch versus utilizing a network of smaller ones. The appropriate decision will depend on performance requirements, budget, and design factors. In addition, if you have made small performance groups that have no components in common, you might be able to localize traffic 100 percent, and require no ISLs. You would have many small, unconnected SAN islands if you follow this approach. One reason not to use isolated islands is that requirements change. Someday you might need access between islands at a moment's notice. A robust architecture can achieve your immediate connectivity requirements, and give you the flexibility to handle change as well.

    If this is not the case, or if you wish to design flexibility into your configuration, each fabric will need to be a network of switches, and you will have to reserve ports for the ISLs. Simple case requirements include the following:

  • Fewer ports required than exist on a single switch, or


  • Each performance group is well defined and smaller than the number of ports on a single switch.


  • Future requirements for growth and change are minimal.


    Assume that you have two 16-port arrays (32 storage ports total), 10 dual-HBA servers (20 ports), and two single-port tape libraries (two ports). Your total port count is 54. However, assume further that you are using a dual-redundant SAN architecture. Your port count per fabric is 27. You are building the fabric out of 16-port switches. It is possible that some ISLs are required. You will need to determine how many are needed.
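    The port arithmetic in this example can be checked with a short Python sketch. The device counts come from the example; the dictionary layout and the function name exposed_ports_per_fabric are introduced here, and rounding up when the total does not divide evenly is an assumption rather than something stated in the text.

```python
import math

# Device inventory from the example: (count, ports per device).
inventory = {
    "storage arrays": (2, 16),
    "dual-HBA servers": (10, 2),
    "tape libraries": (2, 1),
}
fabrics = 2  # dual-redundant architecture


def exposed_ports_per_fabric(inventory, fabrics):
    """Total node ports, and that total split across fabrics (rounded up)."""
    total = sum(count * ports for count, ports in inventory.values())
    return total, math.ceil(total / fabrics)


total, per_fabric = exposed_ports_per_fabric(inventory, fabrics)
print(total, per_fabric)  # 54 27
```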

    Variant A

    With a relatively small fabric like this and relatively high locality, you can assume that you will have about 14 free ports per switch. Two switches with two ISLs between them will yield 28 ports per fabric. You are using a dual-redundant architecture, so there will be two fabrics, for a total of four switches. Your grouping diagram will look like Figure 5.7.


    Figure 5.7 Determining ISL Requirements for Variant A

    This grouping would result in an actual implementation resembling Figure 5.8.

    Figure 5.8 Variant A Implementation

    Variant B

    If you decide that you cannot guarantee the localization of traffic for some reason, grouping will not help. Assuming also that you have a requirement for high performance between the switches, you would add two ISLs per switch to the estimate, for a total of about four ISLs per switch. Your architecture might look like Figure 5.9.

    Figure 5.9 Adding ISLs for High Performance in Variant B

    The same technique can be applied to any SAN, no matter how complex. In fact, the larger the SAN, the greater the benefits will be from grouping traffic.

    Moderate Case

    If the required exposed port count is about double or triple the per-switch port count, and some locality is known, you will be able to use very few ISLs. In this case, estimate two ISLs per switch. Let us say that you need 26 ports, and you are using 16-port switches. Two ISLs per switch means that you actually get 14 ports per switch. Two switches will give you 28 ports, so you would budget for two switches per fabric, or four switches total.

    Moderate case requirements include the following:

  • No more than three times as many ports are required as are present on a single switch.


  • Performance groups are reasonably well defined. Some locality is known.


  • Future requirements for growth and change are minimal.


    Note: The low port count/high locality/low ISL count configurations work well for either two or three switches. Two switches would be cascaded together with two ISLs, with 16-port switches yielding 28 ports. Three switches would be connected in a ring, supporting about 40 devices. If you are over that limit, a four-switch full mesh can support about 50 devices. The full-mesh architecture does not scale well beyond that point, and none of these work well if you have performance groups with more than 13 or 14 members. It is feasible to build ring or partial-mesh topology fabrics with higher port counts, but it is generally better to use a core/edge topology for higher port count solutions. These topologies are explained in detail in Chapter 7.
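    The device counts quoted in the note can be reproduced with simple arithmetic. The sketch below assumes 16-port switches, two ISLs between the pair of cascaded switches, and a single ISL between each pair of directly connected switches in the ring and the full mesh. The single-ISL figure is an assumption made here because it matches the "about 40" and "about 50" numbers; it is not stated in the note. The function name usable_ports is likewise introduced for illustration.

```python
def usable_ports(switches, ports_per_switch, isl_ports_per_switch):
    """Ports left for servers and storage after reserving ISL ports."""
    return switches * (ports_per_switch - isl_ports_per_switch)


PORTS = 16  # 16-port switches, as in the note

# (topology, switch count, ISL ports consumed on each switch)
topologies = [
    ("two-switch cascade, two ISLs", 2, 2),
    ("three-switch ring, one ISL per neighbor", 3, 2),
    ("four-switch full mesh, one ISL per pair", 4, 3),
]

for name, switches, isl_ports in topologies:
    print(f"{name}: {usable_ports(switches, PORTS, isl_ports)} usable ports")
```

    Under these assumptions the three layouts yield 28, 42, and 52 usable ports. In a full mesh, every added switch costs one more ISL port on every existing switch, which is one way to see why the design stops scaling at around this size.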


    Complex Case

    If you need more ports than one of these configurations will handle, you will need to allocate about four ISLs per switch. You might use fewer than four ISLs on some switches, and perhaps nothing but ISLs will be present on other switches. In the complex case for port count estimates, the intent is to average the ISL requirements.

    Until a detailed architecture is developed, you will have to make general estimates for a few things. If you have any distance requirements, add two ISLs per switch. If you have very high-performance requirements, and very little known locality, add two ISLs per switch.

    Take the estimated number of ISLs per switch (I) and subtract it from the number of ports per switch (PS). Divide the total required ports per fabric (P) by this number and round up. This is the estimated number of switches (S) that you need to budget for. For estimating complex SAN switch counts, S = P / (PS - I), rounded up to the next whole number.

    For example, if you have a need for 30 ports per fabric (P=30), are using 16-port switches (PS=16), and each switch will use about two ISLs (I=2), then the number of switches you estimate needing per fabric is 30/(16 - 2). This is 2.14, which rounds up to 3. If you have a single fabric, this is the number of switches you should budget for. If you have a dual-fabric SAN, you should budget for six switches.

    Complex case requirements include the following:

  • Any number of exposed ports might be required.


  • Performance groups might or might not be defined.


  • Future requirements for growth and change are significant.
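    The switch-count estimate for the complex case, S = P / (PS - I) rounded up, is easy to express in code. The following Python sketch uses the figures from the worked example; the function and parameter names are introduced here for illustration, and the error check is an added safeguard rather than part of the original method.

```python
import math


def switches_per_fabric(required_ports, ports_per_switch, isls_per_switch):
    """Estimate S = ceil(P / (PS - I))."""
    usable = ports_per_switch - isls_per_switch
    if usable <= 0:
        raise ValueError("ISLs consume every port; no ports left for devices")
    return math.ceil(required_ports / usable)


per_fabric = switches_per_fabric(required_ports=30, ports_per_switch=16, isls_per_switch=2)
print(per_fabric)      # 3 switches per fabric (30 / 14 is about 2.14, rounded up)
print(per_fabric * 2)  # 6 switches for a dual-fabric SAN
```

    The same function reproduces the moderate case: 26 required ports on 16-port switches with two ISLs each gives two switches per fabric.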


    Preparing an ROI Analysis

    In any business transaction, it is important to understand the economic benefits, or the Return On Investment (ROI), that your company will receive. Preparing an ROI analysis for your SAN project will show how your company will not only recoup the capital investment, but also save money through reduced time, management effort, and other efficiencies.

    During the interview process, you made a list of all of the equipment that you would need to purchase. To begin the ROI analysis of your SAN, determine which components are specific to the SAN project. For example, if your company will need to buy additional storage arrays whether or not a SAN is used, these would not be included on the expense side of the analysis. If the SAN is expected to prevent you from having to buy an array, this cost savings would go onto the benefit side of the analysis. You should include any hardware you intend to buy for testing that will not be used elsewhere.

    When accounting for staff time spent on the project, make sure that you charge the project only for time beyond what would have been spent had the SAN not been built. If you expect to save staff time in the long run, apply this to the benefit side. Your ROI analysis will be a living document, and will be updated as the SAN project develops.

    The Return On Investment Proposition

    Technical justifications for SAN infrastructure deployments can often be made more credible by adding an ROI analysis for the proposed implementation. Follow the guide in the following sections to produce an ROI analysis based on SAN solutions to particular problems.

    Step One: Pick a Theme or Scenario

    Most implementations have a purpose. That purpose could be a server or storage consolidation to improve infrastructure usage and gain economies of scale, ensuring storage and server resources are utilized in the most cost-effective manner. High-availability clustering can improve the availability of mission-critical applications, thus ensuring business continuance and the cost saving associated with it. SAN-based backup deployments improve data integrity by performing backups and restores more efficiently and quickly, again saving in business continuance time and effort.

    Step Two: Identify the Affected Infrastructure Components

    Most SAN deployments will focus on affected servers. Servers can be grouped according to the applications they run or the functional areas they support. Examples of application groupings include Web servers, file and print servers, messaging servers, database servers, and application servers. Functional support servers might include financial and personnel systems or engineering applications. Once the server groups are known, get the characteristics of servers in each group. For example, if your solution fits into a storage consolidation theme, you should consider factors such as:

  • Amount of attached disk storage


  • Storage growth rates


  • Storage space reserved for growth (headroom)


  • Availability requirements


  • Server downtime and an associated downtime cost


  • Server hardware and software costs


  • Maintenance costs


  • The administration effort required to keep the servers up and running


    Step Three: Identify the SAN-Enabled Benefits

    The scenario approach allows you to focus more closely on the benefits. Server and storage consolidation, for example, will concentrate on benefits accrued from more efficient use of server and storage resources, improved staff productivity, lower platform costs, and better use of the infrastructure. Simply take the list of characteristics you developed in step two, and show how a SAN can provide benefits in those areas. Establishing specific cost savings is one of the two key elements in the ROI process, so be sure to look hard for every area of benefit.

    Step Four: Identify the SAN-Related Costs

    Determining the costs associated with the scenario involves identifying the new components specifically required to build and maintain the SAN. These can include software licenses, switches, Fibre Channel HBAs, optical cables, and any service costs associated with the deployment. Be careful to include only those items that relate directly to the SAN implementation. This is the second key element in the ROI process: if you do not correctly estimate expenses, the ROI might be substantially better or worse than your estimate.

    Step Five: Calculate the ROI

    There are several standard ROI calculations in common use, such as net present value (in dollars), internal rate of return (as a percentage), and payback period (in months). Briefly, these can be defined as:

  • Net Present Value (NPV): A method used in evaluating investments where the net present value of all cash flows is calculated using a given discount rate.


  • Internal Rate of Return (IRR): A discount rate at which the present value of the future cash flows of an investment equals the cost of the investment.


  • Payback Period: The length of time needed to recoup the cost of a capital investment on a nondiscount basis.


    Detailed explanations of these techniques and how to use them can be found in most accounting textbooks. It is likely that your company has a preferred method for calculating ROI. You should determine which method this is, and if there are standard forms for presenting your analysis. Asking your accounting department might be a good first step.
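    As a rough illustration of the three calculations, the following Python sketch computes each one for a series of cash flows. The cash-flow figures are hypothetical, the function names are introduced here, and the bisection search for IRR is just one simple way to find the rate; your accounting department's preferred method and conventions should take precedence.

```python
def npv(rate, cash_flows):
    """Net present value; cash_flows[0] occurs now, later entries one period apart."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows, low=-0.99, high=10.0, tol=1e-9):
    """Discount rate at which NPV is zero, found by bisection.

    Assumes NPV changes sign exactly once between low and high, which holds
    for a single up-front cost followed by positive returns.
    """
    if npv(low, cash_flows) * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign on the search interval")
    while high - low > tol:
        mid = (low + high) / 2
        if npv(low, cash_flows) * npv(mid, cash_flows) <= 0:
            high = mid
        else:
            low = mid
    return (low + high) / 2


def payback_period(cash_flows):
    """Number of periods until cumulative (undiscounted) cash flow is non-negative."""
    cumulative = 0.0
    for period, cf in enumerate(cash_flows):
        cumulative += cf
        if cumulative >= 0:
            return period
    return None  # never pays back within the given horizon


# Hypothetical figures: 100,000 spent up front, 40,000 saved in each of four years.
flows = [-100_000, 40_000, 40_000, 40_000, 40_000]
print(round(npv(0.10, flows), 2))
print(round(irr(flows), 4))
print(payback_period(flows))
```

    Here each period is a year, so the payback result is in whole years. To report payback in months, supply monthly cash flows instead.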

    This approach to calculating ROI allows you to focus on a particular project or infrastructure-based problem. It allows you to reduce deployment risk by deploying SANs in phases by scenario. Deploying by scenario will keep investments limited to the solution at hand and create an investment base for future deployments. The initial investment will improve the ROI on other scenarios by reducing some of the investment required to deploy them.

    The Rest of the Process and the Repetition of the Cycle

    Now you have the following documents:

  • Detailed results from the interview process, which define what the SAN project needs to accomplish. This includes:


    --A technical requirements document
    --A timeline for accomplishing the tasks associated with implementing the SAN
    --A list of everything that you will need to buy to make the project work

  • A rough idea of how the SAN will be designed.


  • An ROI analysis to justify continuing with the project.


    These will be used and maintained throughout the life of the SAN. The timeline will be the framework in which all activities in the SAN's lifecycle will reside. In later chapters, you enter the architecture development phase and will use these documents to develop a detailed architecture for your SAN. This will in turn be used to develop a test plan. These documents will be used in the approval process for implementation, and will be kept up to date during the maintenance phase as part of the SAN's documentation set. If any major changes to the SAN are needed, the lifecycle will be repeated and another set of documentation will be produced.


    Summary

    The SAN design process consists of seven phases, which are cycled through as needed throughout the life of your SAN. Data collection and analysis together define the requirements of your SAN. These requirements feed into the architecture development process to produce a SAN design blueprint. After you have a plan in place for your SAN, you must test certain components to ensure that they work the way you expected before you can begin to transition the SAN and release it into production. Once the SAN has entered production, it falls into an ongoing maintenance phase, and continues in that phase until a change occurs that causes the cycle to repeat.

    The first two phases (data collection and analysis) are critical to the health of the SAN. Simply put, if the information on which the design is based is incomplete and/or inaccurate, the design will be incorrect.

    Data collection consists of a series of interviews, collecting the answers into a meaningful format (a technical requirements document), and verifying the accuracy of the collected data. It is imperative that all key stakeholders in the SAN project be included on the interview list.

    While listed as a separate phase, data analysis actually coincides with data collection. The objective of the analysis phase is to turn the raw data, which is generally in the form of business requirements, into a more technical format: the technical requirements document. Some of this occurs "on the fly" during the interview process. However, certain tasks are done after the interviews are complete. For example, detailed port count and performance requirements are generated "on the fly," and an ROI proposition is created after the fact. Once the requirements of the SAN are well defined, the remaining phases can take place. These phases are covered in subsequent chapters.

    Solutions Fast Track

    Looking at the Overall Lifecycle of a SAN

  • The SAN design process is a cycle.

  • This process consists of seven phases:

    1. Data Collection

    2. Data Analysis

    3. Architecture Development

    4. Prototype and Test

    5. Transition

    6. Release to Production

    7. Maintenance

  • Whenever there is a fundamental change to the SAN, the cycle should repeat.

    Conducting Data Collection

  • Data collection is the foundation on which a SAN is built.

  • You should interview everybody who has an interest in the project.

  • During the interview process, create a technical requirements document.

    Analyzing the Collected Data

  • There are several things that you need to get out of data analysis:

    --The number of different fabrics that will make up the SAN solution
    --The port count and performance characteristics of each fabric
    --An estimate of the hardware required to meet these requirements

  • You might be able to localize traffic for better performance if you can create well-defined groups.

  • Prepare an ROI proposition to justify your SAN project.

    Frequently Asked Questions

    Q: Once I have designed my SAN, shouldn't it be done? I don't want to have to keep reinventing the wheel!

    A: Yes and no. After a SAN enters production, it is "done" until you want to change it in a fundamental way. As long as you are happy with leaving your SAN the way it is, there is no reason why you would have to repeat the design cycle. Simply adding a new storage array does not require a repetition of the cycle. Moreover, events that do cause the cycle to repeat might cause it to repeat relatively quickly. For example, if you decide to go through the design process because you are adding a new type of storage array to the SAN, and want to validate that doing so won't break anything, you will be able to take a fast track through most of the process. After all, adding this device will not by any stretch of the imagination require that you change your fabric topology, or affect much of your SAN architecture.

    Q: Every end user in my company is a stakeholder in the SAN. Do I need to interview everybody?

    A: It is true that everybody who uses a system is a stakeholder in that system. However, we mean something a little less broad. When we refer to a stakeholder, we mean somebody whose job revolves around taking care of one or more of the systems that will attach to the SAN. This can include systems, database, and storage administrators, as well as other technical people. It can also include people responsible for the data that resides on these systems. For example, a manager responsible for a call center at a phone-in catalog company might be a key stakeholder in the SAN, because he or she is responsible for the data entered into that company's business system, which is attached to the SAN. Why is this person a key stakeholder? Because he or she might have something to say about the availability and performance requirements of the system. When in doubt, try to include anybody on the team who wants to be there. It is usually better to have more data than you need, rather than less.

    Q: Do I need to wait until data collection is complete before beginning data analysis?

    A: Actually, the data collection and analysis phases are most effective if there is some degree of overlap. If you have analyzed data from the first interview when you go into the second, you will be able to better understand the answers, and might also be able to direct the line of questioning along more useful lines. Be careful not to develop firm convictions too early on, though. Always approach SAN design scientifically. Never start an interview with a firm preconception of the outcome! Collection and analysis are divided into two phases because some of the analysis naturally occurs after all data collection is complete. For example, you can't prepare an ROI proposition until you have a fairly complete picture of what the SAN will need to accomplish, and some idea of the technical infrastructure that will be involved.

    Copyright 2017 © QuinStreet Inc. All Rights Reserved