Sunday, August 17, 2014

A look at Software Architecture

Architecture principles

This post covers some basic principles that apply to any software architecture:

  1. Scalability – The capability of a system to increase its total output under an increased load when resources (typically hardware) are added.
  2. Cost-effectiveness.
  3. Separation of concerns – This is what gives you the advantage of loose coupling, together with the related design principles listed below.
    • Concerns are the different aspects of software functionality. For instance, the "business logic" of software is a concern, and the interface through which a person uses this logic is another. The separation of concerns is keeping the code for each of these concerns separate. Changing the interface should not require changing the business logic code, and vice versa.
    • Divide your application into distinct features with as little overlap in functionality as possible. The important factor is minimization of interaction points to achieve high cohesion and low coupling. However, separating functionality at the wrong boundaries can result in high coupling and complexity between features even though the contained functionality within a feature does not significantly overlap. When designing an application or system, the goal of a software architect is to minimize the complexity by separating the design into different areas of concern. For example, the user interface (UI), business processing, and data access all represent different areas of concern. Within each area, the components you design should focus on that specific area and should not mix code from other areas of concern. For example, UI processing components should not include code that directly accesses a data source, but instead should use either business components or data access components to retrieve data. 
    • Single Responsibility principle. Each component or module should be responsible for only a specific feature or functionality, or aggregation of cohesive functionality.
    • Principle of Least Knowledge (also known as the Law of Demeter or LoD). A component or object should not know about internal details of other components or objects. 
    • Don’t repeat yourself (DRY). You should only need to specify intent in one place. For example, in terms of application design, specific functionality should be implemented in only one component; the functionality should not be duplicated in any other component. 
    • Minimize upfront design. Only design what is necessary. In some cases, you may require upfront comprehensive design and testing if the cost of development or a failure in the design is very high. In other cases, especially for agile development, you can avoid big design upfront (BDUF).
  4. Reusability - Software reusability more specifically refers to design features of a software element (or collection of software elements) that enhance its suitability for reuse.
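
The separation of concerns described above can be sketched in a few lines. This is a minimal illustration, not a prescribed structure: the `Account`, `AccountRepository`, `AccountService`, and `render_withdrawal` names are hypothetical, and an in-memory dictionary stands in for a real data source.

```python
from dataclasses import dataclass


@dataclass
class Account:
    owner: str
    balance: float


class AccountRepository:
    """Data access concern: the only place that knows how accounts are stored."""

    def __init__(self) -> None:
        # In-memory stand-in for a database table (assumption for this sketch).
        self._rows = {"alice": Account("alice", 120.0)}

    def find(self, owner: str) -> Account:
        return self._rows[owner]


class AccountService:
    """Business logic concern: rules only, no storage or display code."""

    def __init__(self, repository: AccountRepository) -> None:
        self._repository = repository

    def can_withdraw(self, owner: str, amount: float) -> bool:
        return 0 < amount <= self._repository.find(owner).balance


def render_withdrawal(service: AccountService, owner: str, amount: float) -> str:
    """UI concern: formats a message and delegates the decision to the service."""
    verdict = "approved" if service.can_withdraw(owner, amount) else "declined"
    return f"Withdrawal of {amount:.2f} for {owner}: {verdict}"


service = AccountService(AccountRepository())
message = render_withdrawal(service, "alice", 50.0)
```

The UI function never touches the repository directly, so the storage could be swapped without changing the display code, and the message format could change without touching the withdrawal rule.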


SOA

Service-oriented architecture (SOA) is a software design and software architecture design pattern based on discrete pieces of software providing application functionality as services to other applications. This is known as service-orientation. It is independent of any vendor, product or technology.

A service is a self-contained unit of functionality, such as retrieving an online bank statement. Services can be combined by other software applications to provide the complete functionality of a large software application. SOA makes it easy for computers connected over a network to cooperate. Every computer can run an arbitrary number of services, and each service is built in a way that ensures that the service can exchange information with any other service in the network without human interaction and without the need to make changes to the underlying program itself.
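
As a rough sketch of services being combined, the snippet below registers two self-contained services and lets one build on the other. Everything here is an assumption made for illustration: the `REGISTRY`, `service`, and `call` helpers are hypothetical, the account data is made up, and an in-process dictionary with JSON round-trips stands in for real network calls.

```python
import json
from typing import Callable, Dict

# Hypothetical in-process registry standing in for services reachable over a network.
REGISTRY: Dict[str, Callable[[dict], dict]] = {}


def service(name: str):
    """Register a function as a named service."""
    def decorator(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        REGISTRY[name] = func
        return func
    return decorator


def call(name: str, request: dict) -> dict:
    """Invoke a service by name; the JSON round-trips mimic a wire format."""
    payload = json.loads(json.dumps(request))
    response = REGISTRY[name](payload)
    return json.loads(json.dumps(response))


@service("statement")
def get_statement(request: dict) -> dict:
    # Made-up data; a real service would query its own data store.
    transactions = {"acc-1": [100.0, -30.0, -20.0]}
    account = request["account"]
    return {"account": account, "transactions": transactions[account]}


@service("balance")
def get_balance(request: dict) -> dict:
    # Composes the statement service instead of duplicating its logic.
    statement = call("statement", {"account": request["account"]})
    return {"account": request["account"], "balance": sum(statement["transactions"])}


result = call("balance", {"account": "acc-1"})
```

Callers only know a service name and a message shape, which is the property that lets services be recombined without changing the underlying programs.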


UML

The Unified Modeling Language (UML) is a modeling language for documenting a system in an object-oriented manner.
A technical architect can use it to create a design for the system that the developers then work from.
UML diagrams represent two different views of a system model:


Static (or structural) view: 

It emphasizes the static structure of the system using objects, attributes, operations and relationships. The structural view includes class diagrams and composite structure diagrams.

  • Class diagram: It describes the structure of a system by showing the system's classes, their attributes, and the relationships among the classes.
  • Component diagram: It describes how a software system is split up into components and shows the dependencies among these components.
  • Composite structure diagram: It describes the internal structure of a class and the collaborations that this structure makes possible.
  • Deployment diagram: It describes the hardware used in system implementations and the execution environments and artifacts deployed on the hardware.
  • Object diagram: It shows a complete or partial view of the structure of a modeled system at a specific time.
  • Package diagram: It describes how a system is split up into logical groupings by showing the dependencies among these groupings.
  • Profile diagram: It operates at the meta-model level to show stereotypes as classes with the <<stereotype>> stereotype, and profiles as packages with the <<profile>> stereotype. The extension relation (solid line with closed, filled arrowhead) indicates what metamodel element a given stereotype is extending.

Dynamic (or behavioral) view: 

It emphasizes the dynamic behavior of the system by showing collaborations among objects and changes to the internal states of objects. This view includes sequence diagrams, activity diagrams and state machine diagrams.
  • Activity diagram: It describes the business and operational step-by-step workflows of components in a system. An activity diagram shows the overall flow of control.
  • UML state machine diagram: It describes the states and state transitions of the system.
  • Use Case Diagram: It describes the functionality provided by a system in terms of actors, their goals represented as use cases, and any dependencies among those use cases.
  • Communication diagram: It shows the interactions between objects or parts in terms of sequenced messages. They represent a combination of information taken from Class, Sequence, and Use Case Diagrams describing both the static structure and dynamic behavior of a system.
  • Interaction overview diagram: It provides an overview in which the nodes represent interaction diagrams.
  • Sequence diagram: It shows how objects communicate with each other in terms of a sequence of messages. Also indicates the lifespans of objects relative to those messages.

What is Data conformance?

How should the staging area be modeled: target driven or source driven? One view is to design staging tables to better suit the target rather than the source. Let's take a look at both approaches.

Target driven:

  • ETL is usually a two-step process: stage, then load. If the staging does mild transformations to better suit the target, I need only create one set of load processes.
  • If the DW gets similar data from multiple sources, all I need to do is create new source-specific staging processes and let the existing load processes handle the new source.
  • Sources change. I don't want to rewrite ETL processes from end-to-end because of a change in the source.
  • Most of the heavy transformation logic occurs on the load side. With the staging tables closer in structure to the target, the load process code tends to be simpler.
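
The target-driven idea can be sketched as follows, assuming two made-up sources and a toy warehouse held in a dictionary. The function names, field names, and the `StagedRow` shape are all hypothetical; the point is only that each source gets its own staging step while a single load step is shared.

```python
from typing import Dict, Iterable, List

# Hypothetical target-shaped staging row with keys customer_id and amount_cents.
StagedRow = Dict[str, int]


def stage_from_crm(rows: Iterable[Dict[str, str]]) -> List[StagedRow]:
    """Source-specific staging for a made-up CRM export with string fields."""
    return [
        {"customer_id": int(r["CustID"]), "amount_cents": round(float(r["Amount"]) * 100)}
        for r in rows
    ]


def stage_from_webshop(rows: Iterable[Dict[str, int]]) -> List[StagedRow]:
    """Source-specific staging for a made-up webshop feed that already uses cents."""
    return [{"customer_id": r["customer"], "amount_cents": r["cents"]} for r in rows]


def load(staged: Iterable[StagedRow], warehouse: Dict[int, int]) -> None:
    """Single load process: it only ever sees target-shaped rows."""
    for row in staged:
        key = row["customer_id"]
        warehouse[key] = warehouse.get(key, 0) + row["amount_cents"]


warehouse: Dict[int, int] = {}
load(stage_from_crm([{"CustID": "7", "Amount": "12.50"}]), warehouse)
load(stage_from_webshop([{"customer": 7, "cents": 300}, {"customer": 8, "cents": 999}]), warehouse)
```

Adding a third source in this sketch would mean writing one more `stage_from_...` function; `load` stays untouched.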

Source driven:

There are several reasons to keep the staging area relatively simple. One is that it should be able to trace individual records back to the source. This may not be a requirement now, but it could become one in the future as more people use the DW and want to prove that it is correct. If you're dealing with deltas, there is always the possibility that the deltas do not arrive in referentially complete sets but become referentially complete over time. Also, if you are dealing with deletes and updates (including the need to back out and/or replace source files), keeping the data in its original dimensions helps you avoid unintentional business rules.

Normalization – Definition of normal forms.

Database normalization is the process of organizing the fields and tables of a relational database to minimize redundancy and dependency. The first three normal forms are defined as follows:

  • A table is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.
  • A table is in 2NF if and only if it is in 1NF and no non-prime attribute is dependent on any proper subset of any candidate key of the table.
  • A table R is in 3NF if and only if it is in 2NF and every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every candidate key of R.
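
A small worked example may help here. The sketch below starts from a made-up order-lines table whose candidate key is assumed to be (order_id, product_id), then projects it into smaller tables. The data, column names, and the `project` helper are hypothetical and only illustrate the decomposition.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical denormalized rows; every value is atomic, so the table is already in 1NF.
ORDER_LINES: List[Dict[str, object]] = [
    {"order_id": 1, "product_id": "p1", "product_name": "Pen",
     "customer_id": "c1", "customer_city": "Oslo", "quantity": 2},
    {"order_id": 1, "product_id": "p2", "product_name": "Ink",
     "customer_id": "c1", "customer_city": "Oslo", "quantity": 1},
    {"order_id": 2, "product_id": "p1", "product_name": "Pen",
     "customer_id": "c2", "customer_city": "Rome", "quantity": 5},
]


def project(rows: Sequence[Dict[str, object]], columns: Sequence[str]) -> List[Tuple]:
    """Project rows onto the given columns and drop duplicates, keeping first-seen order."""
    seen: List[Tuple] = []
    for row in rows:
        item = tuple(row[c] for c in columns)
        if item not in seen:
            seen.append(item)
    return seen


# 2NF: attributes that depend on only part of the key move to their own tables.
line_items = project(ORDER_LINES, ["order_id", "product_id", "quantity"])
products = project(ORDER_LINES, ["product_id", "product_name"])
orders_2nf = project(ORDER_LINES, ["order_id", "customer_id", "customer_city"])

# 3NF: customer_city depends on customer_id rather than directly on order_id,
# so the transitive dependency is split into a separate customers table.
orders = project(ORDER_LINES, ["order_id", "customer_id"])
customers = project(ORDER_LINES, ["customer_id", "customer_city"])
```

After the split, the product name "Pen" and each customer's city are stored once instead of once per order line.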

Logical and physical data model

  • A logical data model describes the data in as much detail as possible, without regard to how it will be physically implemented in the database. Features of a logical data model: It includes all entities and the relationships among them. All attributes for each entity are specified. The primary key for each entity is specified. Foreign keys (keys identifying the relationship between different entities) are specified. Normalization occurs at this level.
  • The steps for designing the logical data model are as follows: Specify primary keys for all entities. Find the relationships between different entities. Find all attributes for each entity. Resolve many-to-many relationships. Normalization.
  • A physical data model represents how the model will be built in the database. It shows all table structures, including column name, column data type, column constraints, primary key, foreign key, and relationships between tables. Features of a physical data model: It specifies all tables and columns. Foreign keys are used to identify relationships between tables. De-normalization may occur based on user requirements. Physical considerations may cause the physical data model to be quite different from the logical data model. The physical data model will differ between RDBMSs. For example, the data type for a column may be different between MySQL and SQL Server.
  • The steps for physical data model design are as follows: Convert entities into tables. Convert relationships into foreign keys. Convert attributes into columns. Modify the physical data model based on physical constraints / requirements.
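
The physical design steps above can be tried out with Python's built-in `sqlite3` module. This is only a sketch under assumptions: the student/course entities are made up, SQLite stands in for whichever RDBMS you actually target, and the many-to-many relationship from the logical model is resolved with a hypothetical `enrollment` junction table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    -- Entities become tables, attributes become typed columns.
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE course (
        course_id INTEGER PRIMARY KEY,
        title     TEXT NOT NULL
    );
    -- The many-to-many relationship becomes a junction table with two foreign keys.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    """
)

conn.execute("INSERT INTO student VALUES (1, 'Ada')")
conn.execute("INSERT INTO course VALUES (10, 'Databases')")
conn.execute("INSERT INTO enrollment VALUES (1, 10)")

# An enrollment pointing at a non-existent course should be rejected.
try:
    conn.execute("INSERT INTO enrollment VALUES (1, 99)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

enrolled = conn.execute(
    "SELECT s.name, c.title FROM enrollment e "
    "JOIN student s ON s.student_id = e.student_id "
    "JOIN course c ON c.course_id = e.course_id"
).fetchall()
```

The column types and constraint syntax here are SQLite-specific, which is one small instance of the physical model differing from one RDBMS to another.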

Object oriented programming principles

  • Data abstraction - Abstraction denotes a model, a view, or some other focused representation for an actual item. It’s the development of a software object to represent an object we can find in the real world.
  • Encapsulation - The ability to provide users with a well-defined interface to a set of functions in a way which hides their internal workings. In object-oriented programming, the technique of keeping together data structures and the methods (procedures) which act on them.
  • Inheritance - The ability to derive new classes from existing classes. A derived class (or "subclass") inherits the instance variables and methods of the base class and may add new instance variables and methods. New methods may be defined with the same names as those in the base class, in which case they override the original one.
  • Polymorphism - Polymorphism refers to the ability to define multiple classes with functionally different, yet identically named methods or properties that can be used interchangeably by client code at run time.
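
The four principles above can be seen together in one short sketch. The shape classes are a hypothetical example chosen for brevity, not something from a particular framework.

```python
import math
from abc import ABC, abstractmethod
from typing import List


class Shape(ABC):
    """Abstraction: a focused model of a real-world shape."""

    def __init__(self, name: str) -> None:
        self._name = name  # Encapsulation: internal state behind a read-only property.

    @property
    def name(self) -> str:
        return self._name

    @abstractmethod
    def area(self) -> float:
        """Each concrete shape supplies its own formula."""

    def describe(self) -> str:
        return f"{self.name} with area {self.area():.2f}"


class Rectangle(Shape):
    def __init__(self, width: float, height: float) -> None:
        super().__init__("rectangle")
        self._width = width
        self._height = height

    def area(self) -> float:
        return self._width * self._height


class Square(Rectangle):
    """Inheritance: reuses Rectangle's data and area() and only changes the name."""

    def __init__(self, side: float) -> None:
        super().__init__(side, side)
        self._name = "square"


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        super().__init__("circle")
        self._radius = radius

    def area(self) -> float:
        return math.pi * self._radius ** 2


# Polymorphism: client code calls describe() without knowing the concrete class.
shapes: List[Shape] = [Rectangle(2.0, 3.0), Square(2.0), Circle(1.0)]
descriptions = [shape.describe() for shape in shapes]
```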

What is Structural conformance?

The benefits of architectural analyses are only achieved if one can guarantee that the implementation conforms to the architecture. Structural conformance is the degree to which the structure of a software system's implementation matches its execution architecture, and it can be both checked and measured.
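
One very simple way to picture such a check is to compare the dependencies found in the code against the dependencies the architecture allows. The sketch below is an illustration under assumptions, not the approach referred to above: the layer names, module names, observed dependencies, and the ratio used as a measure are all made up.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical architecture: which layer may depend on which.
ALLOWED: Set[Tuple[str, str]] = {("ui", "business"), ("business", "data")}

# Hypothetical mapping of implementation modules to architectural layers.
MODULE_LAYER: Dict[str, str] = {"views": "ui", "orders": "business", "orders_dao": "data"}

# Hypothetical dependencies extracted from the code (for example, from import statements).
OBSERVED: List[Tuple[str, str]] = [
    ("views", "orders"),
    ("orders", "orders_dao"),
    ("views", "orders_dao"),  # the UI reaching straight into data access
]


def check_conformance(
    observed: List[Tuple[str, str]],
    module_layer: Dict[str, str],
    allowed: Set[Tuple[str, str]],
) -> Tuple[List[Tuple[str, str]], float]:
    """Return the violating dependencies and the share of conforming ones."""
    violations = []
    for source, target in observed:
        source_layer, target_layer = module_layer[source], module_layer[target]
        # Dependencies inside one layer are treated as conforming in this sketch.
        if source_layer != target_layer and (source_layer, target_layer) not in allowed:
            violations.append((source, target))
    ratio = 1.0 if not observed else 1 - len(violations) / len(observed)
    return violations, ratio


violations, conformance = check_conformance(OBSERVED, MODULE_LAYER, ALLOWED)
```

Here the list of violations is the "check" and the ratio is a crude "measure"; a real tool would need a far richer model of both the architecture and the code.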

Shared nothing architecture. Performance issue – Redistribution/repartitioning.

A shared nothing architecture (SN) is a distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system. More specifically, none of the nodes share memory or disk storage.

Shared nothing is popular for web development because of its scalability. As Google has demonstrated, a pure SN system can scale very far simply by adding nodes in the form of inexpensive computers, since there is no single bottleneck to slow the system down. An SN system typically partitions its data among many nodes on different databases (assigning different computers to deal with different users or queries), or may require every node to maintain its own copy of the application's data, using some kind of coordination protocol. This partitioning is often referred to as database sharding, which is also the term Google uses.
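
The partitioning idea, and the repartitioning cost mentioned in the heading, can be sketched with a toy key-value store. This is an assumption-laden illustration: `node_for` and `ShardedStore` are hypothetical names, plain dictionaries stand in for independent nodes, and simple hash-modulo placement is just one of several possible schemes.

```python
import hashlib
from typing import Dict, List, Optional


def node_for(key: str, node_count: int) -> int:
    """Pick a node from a stable hash of the key (hash-modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


class ShardedStore:
    """Each dictionary plays the role of one node with its own private storage."""

    def __init__(self, node_count: int) -> None:
        self.nodes: List[Dict[str, str]] = [{} for _ in range(node_count)]

    def put(self, key: str, value: str) -> None:
        self.nodes[node_for(key, len(self.nodes))][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.nodes[node_for(key, len(self.nodes))].get(key)


store = ShardedStore(node_count=4)
keys = [f"user:{i}" for i in range(1000)]
for i, key in enumerate(keys):
    store.put(key, f"profile-{i}")

# Repartitioning: count the keys whose home node changes when a fifth node is added.
moved = sum(1 for key in keys if node_for(key, 4) != node_for(key, 5))
```

With hash-modulo placement, most keys end up on a different node once the node count changes, which is why redistribution is a performance concern; schemes such as consistent hashing exist to reduce that movement.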

Loosely coupled architecture

If the client requesting a service must wait for the reply, the architecture is "Tightly Coupled", also known as "stop and wait". By contrast, in a "Loosely Coupled" architecture, a service client can continue doing other things after sending a service request.

Therefore, all implementations of the first kind, such as 3-tier or n-tier architectures built on a Web service with a blocking request-response exchange, are "Tightly Coupled".

A very frequent question about the "Loosely Coupled" type of architecture is: how are responses processed once they are available? The answer is: by a separate thread or by a "callback" function that handles the responses. The client program therefore usually has to be divided into two parts: one that handles the main flow, and a function that processes the responses.
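
The two styles can be compared in a short sketch using Python's standard `concurrent.futures` module. The `slow_service` function and its delay are made up for illustration; a worker thread stands in for a remote service.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List


def slow_service(value: int) -> int:
    """Stand-in for a remote service call that takes a while to answer."""
    time.sleep(0.05)
    return value * 2


# "Stop and wait": the client blocks until the reply arrives.
blocking_result = slow_service(21)

# Callback style: the second part of the client, which processes responses.
responses: List[int] = []


def on_response(future: Future) -> None:
    responses.append(future.result())


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(slow_service, 21)
    future.add_done_callback(on_response)
    # Main flow: the client keeps working while the request is in flight.
    main_flow_work = sum(range(10))
# Leaving the with-block waits for the worker, so the callback has run by now.
```

The client is split exactly as described above: the main flow submits the request and carries on, while `on_response` handles the answer whenever it becomes available.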

Architecture VS Design

What is the difference between a Designer and an Architect? Where does each role stop?

  1. If you are “architecting” a component, you are defining how it behaves in the larger system. If you are “designing” the same component, you are defining how it behaves internally.
  2. The architecture of a system is its 'skeleton'. It's the highest level of abstraction of a system. What kind of data storage is present, how do modules interact with each other, what recovery systems are in place. Just like design patterns, there are architectural patterns: MVC, 3-tier layered design, etc.
  3. Software design is about designing the individual modules / components. What are the responsibilities, functions, of module x? Of class Y? What can it do, and what not? What design patterns can be used?
  4. So in short, software architecture is more about the design of the entire system, while software design focuses on the module / component / class level.
  5. Architecture usually deals with what (is done) and where (it's done), but never with how. That, I think, is the principal difference: design fills in the how that architecture doesn't (and shouldn't) talk about.