Tuesday, October 6, 2015

Ab-Initio - Air sandbox command. Part 2

Air sandbox commands:- 

1. air sandbox find [<path>]

        [-up|-down] 
        [-sandbox] 
        [-relative] 
        [-project] 

<path>   Optional. Specifies the directory in which to start looking. Default is the current working directory.

-up | -down Optional. Default is -up. You can choose one:
-up specifies an upward search, toward the filesystem root, to see if the directory at path or any containing directory is a sandbox. Prints the path for the innermost containing sandbox, or fails if there is none. 

-down specifies a recursive downward search of the directory specified by path and its subdirectories. Prints the paths for all sandboxes found, or fails if no sandboxes are found.

-sandbox Optional. Specifies that the sandbox directory found should be listed (the default). Valid only with -up. Specify -sandbox only when you are also specifying -relative and/or -project.

-relative Optional. Specifies that the relative path from the sandbox directory to path should be printed. Valid only with -up. 

-project Optional. Specifies that the EME project the sandbox is associated with should be printed. Valid only with -up.
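
The -up search and the -relative option can be pictured with a small Python sketch. This is only an illustration of the behavior described above, not how air is implemented: the marker file name `.sandbox_marker` and both function names are hypothetical, chosen to show a walk toward the filesystem root that stops at the innermost containing sandbox.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical marker: this sketch treats any directory containing this file
# as a sandbox. It is NOT the file Ab Initio actually uses.
MARKER = ".sandbox_marker"


def find_up(start: Path) -> Optional[Path]:
    """Return the innermost directory at or above `start` that holds MARKER."""
    start = start.resolve()
    for candidate in [start, *start.parents]:
        if (candidate / MARKER).is_file():
            return candidate
    return None  # the real command fails when no sandbox is found


def find_with_relative(start: Path) -> Optional[Tuple[Path, Path]]:
    """Mimic -sandbox -relative: the sandbox and the path from it to `start`."""
    sandbox = find_up(start)
    if sandbox is None:
        return None
    return sandbox, start.resolve().relative_to(sandbox)
```

Called on a directory such as test_sand/mp, where test_sand holds the marker, the second function would return the test_sand directory together with the relative path mp.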

Example 1 
If you run the command without any options, it returns the sandbox directory, which is the same result as running it with the -sandbox option. For example, the following command:

air sandbox find
/home/<user>/test_sand

Running the command with -sandbox: 
air sandbox find -sandbox

returns the identical path: 
/home/<user>/test_sand
<server>:<user>:/home/<user>/test_sand/mp

Example 2 
Running the command with only -relative: 
air sandbox find -relative

returns the path relative to the sandbox: mp
<server>:<user>:/home/<user>/test_sand/mp

Example 3 
Running the command with both -sandbox and -relative: 
air sandbox find -sandbox -relative

returns both the sandbox directory and the relative directory: 
/home/<user>/test_sand/mp 

2. air sandbox get-required-files [-force] [<path> ...]


Displays the set of files that are referenced by one or more files within a sandbox. Each line of the output contains the project path and relative filename separated by a tab character.

-force   Specifies that the command should continue even after an error is encountered (for example, if a required file cannot be found or if a URL contains an invalid parameter). In this case, the output may be incomplete.

<path>    Path to one or more files in a sandbox. If no path is specified, the command displays the required files for all files in the sandbox.
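
Because each output line is a project path and a relative filename separated by a tab, the output is easy to post-process. The following Python sketch assumes you have already captured the command's output as text; the function name and the sample lines are made up for illustration.

```python
from typing import List, Tuple


def parse_required_files(output: str) -> List[Tuple[str, str]]:
    """Split each non-empty line into (project_path, relative_filename)."""
    pairs = []
    for line in output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        project_path, relative_name = line.split("\t", 1)
        pairs.append((project_path, relative_name))
    return pairs


# Made-up sample text in the documented "project<TAB>file" layout.
sample = "/Projects/demo\tdml/customers.dml\n/Projects/demo\txfr/xfrA.xfr\n"
for project, name in parse_required_files(sample):
    print(project, name)
```

If -force was used, remember that the list may be incomplete.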

3. air sandbox info <path> 

Displays information about a sandbox, or about files within a sandbox.

<path>    Path to a file or directory. The output includes sandbox and datastore information:

       air sandbox info  /home/<user>/test_sand

Filesystem Info:
   Path:             /home/<user>/test_sand
   Size:             8192
   Owner:            <user>
   Created:          2015-10-06 00:25:37
   Modified:         2015-10-06 00:25:37
   Accessed:         2015-10-06 01:00:36
   Permissions:      drwxrwxr-x

EME Datastore Info:
   Datastore URL:    file://<server>.<domain>.com/ai/repo/dev/dev_eme
   Branch:           main
   Project:          /Projects/dir1/dir2/dr3           

4. air sandbox lock <path>

                        [-set | -release | -break | -reset]
                        [-nocheck]
                        [-force] [-unbreakable] [-manual-release]

Performs lock operations on sandbox objects. Default operation is -set, to create a lock.

       <path>          Path to an object in a checked-in sandbox. If the object is a graph, the run script associated with the graph will also be locked.

       -set            Locks an object in a sandbox. Makes the file in the sandbox writable.

       -release        Releases the lock from an object in a sandbox. Makes the file in the sandbox read-only.

       -break          Breaks another user's lock on an object in your sandbox. After breaking it, you may lock the file with the air sandbox lock -set command.

       -reset          Resets a lock broken by another user to the unlocked state.

       -nocheck        Applicable if -set or -release is specified. Causes the command to skip timestamp checking. Normally, air sandbox lock -set prevents you from setting locks on out-of-date files, and air sandbox lock -release prevents you from releasing locks on modified files.

       -force          Applicable if -set is specified. Breaks another user's lock, if necessary, before setting the lock.

       -unbreakable    Applicable if -set is specified. Sets the lock to be unbreakable except by the EME administrator.

       -manual-release Applicable if -set is specified. Causes the file to remain locked after the file is checked in.

Monday, October 5, 2015

Ab-Initio - Air sandbox command. Part 1

Air sandbox commands:- 

1. air sandbox create <path>

                          [-template <path>]
                          [-prefix <prefix>]
                          [-replace <s1> <s2>]
                          [-location <location>]
                          [-nodirectories]

Creates a new sandbox in a directory. It creates the directory if necessary, marks the directory as a sandbox, and creates the built-in sandbox directories.

       <path>           Path to the new sandbox.

       -template        Creates the sandbox from the specified template sandbox.

       -prefix         By default, the sandbox begins with a set of predefined parameters (DML, XFR) used to locate objects within the sandbox. This option is used to add a prefix to the front of these parameters.

For example: -prefix COMMON_ causes these parameters to be called COMMON_DML, COMMON_XFR, and so on.

                        The prefix is not applied to the location parameter and the RUN parameter.

       -replace When creating a sandbox from a template, changes all occurrences of one string <s1> to another string <s2> in all parameter names and expressions.

       -location Specifies the name of a parameter that points to the location of the project. Default is PROJECT_DIR.

       -nodirectories   Does not create the default sandbox directories.

2. air sandbox detach [-q] [<path>]

Detaches a sandbox from its associated EME project.

       -q               Optional. Suppresses the error if the sandbox directory does not exist.

       <path>           Optional. Path to the sandbox directory. Default is the current directory.

3. air sandbox diff 

                        {<path> |
                        <path1> <path2> |
                        -version <v> <path> |
                        -version <v1> -version <v2> <path>}
                        [-terse | -verbose]
                        [-ignore-param-order]
                        [-text]

Displays differences between two graphs, two plans, or two text files. The method used to determine differences depends on the types of the objects specified, which in turn depend on the filename extensions. 

       <path>        Path to a file in the filesystem.

       -version    Specifies a version of the object to compare. It can be expressed as a version number, as a tag, or as 'current' (the current version in the EME datastore). If -version is omitted, the command compares object <path> against the EME version from which the object was checked out.

       -terse    Valid for graphs or plans. If specified, shows only differences that might affect the running of the graph or plan, such as a change to the value of a parameter.

       -verbose    Valid for graphs or plans. If specified, shows all changes to the graph or plan, including minor differences, such as a change to the <x,y> coordinates of a component.

       -ignore-param-order
                   Valid for graphs or plans only. If specified, does not show differences in the order of graph, plan, task, or component parameters.

       -text       If specified, forces the two files to be treated as text, regardless of their actual types.

Example 1
To show the differences between two different graphs in the filesystem:
          air sandbox diff mp/graph1.mp /disk1/sand/mp/graph2.mp
Example 2
To show the differences between two different plans in the filesystem:
          air sandbox diff plan/planA.plan /disk1/sand/plan/planB.plan

Example 3
To show the differences between a file in a sandbox and the same file in the EME datastore, at the version at which it was checked out, specify the path to the file in the sandbox:
air sandbox diff dml/customers.dml

Example 4
To show the differences between a graph in a sandbox and a specific version of the same graph in the EME datastore, specify the version and the path to the graph in the sandbox:
air sandbox diff -version 785 mp/graph.mp

Example 5
To show the differences between two versions of a text object in an EME datastore, specify both versions of the object and the path to the object in the sandbox:
          air sandbox diff -version 785 -version 943 /sand/xfr/xfrA.xfr

The command returns an exit code of 0 if there are no differences to report; otherwise it returns a non-zero exit code.
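
The exit code makes the command usable from scripts. Below is a minimal Python sketch, assuming air is on the PATH; the helper names are hypothetical, and only the -version option documented above is used. Note that a non-zero code is treated here as "differences", although a failed run would presumably also exit non-zero.

```python
import subprocess
from typing import List, Sequence


def build_diff_command(paths: Sequence[str],
                       versions: Sequence[str] = ()) -> List[str]:
    """Assemble an `air sandbox diff` argument list from paths and versions."""
    if not 1 <= len(paths) <= 2:
        raise ValueError("expected one or two paths")
    if len(versions) > 2:
        raise ValueError("at most two -version options are documented")
    cmd = ["air", "sandbox", "diff"]
    for version in versions:
        cmd += ["-version", str(version)]
    cmd += list(paths)
    return cmd


def has_differences(cmd: List[str]) -> bool:
    """Run the command; exit code 0 means no differences were reported."""
    return subprocess.run(cmd).returncode != 0


# Builds the command from Example 4 without running it.
print(build_diff_command(["mp/graph.mp"], versions=["785"]))
```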

Friday, October 2, 2015

Why do we need Data Governance?

We need a Data Governance framework to establish accountability and ensure consistent master data management practices across the organization.
This establishes a strong foundation on which broader data management capabilities can be developed.

What can a Data Governance framework provide? -

1) Accountability for data across the organization.
2) Clear standards and processes to control the use of data assets.
3) A set of definitions that encourage consistent and desirable behaviors on data across the enterprise.
4) Guidelines for ensuring consistency in the definition, usage and management of data across the enterprise.

Key Benefits - 

1) Risk & Regulatory -

Regulations such as Capital, Liquidity, RRP, CCAR, Single Counterparty Credit Limits etc. require the use of enterprise “conformed” reference data dimensions.

2) Costs -

i) Multi-million dollar cost reductions through investment in enabling data management technologies.
ii) Decrease in the operational expense of producing high-quality data through proactive data quality maintenance.

3) Efficiency -

i) Increasing focus on repeatable and controlled data management solutions to achieve operational excellence.
ii) Eliminate redundant manual reference data reconciliations to make the process more scalable and less error prone.

4) Growth & Profitability -

Enterprise-wide reference/master data standards are required to enable data sharing among lines of business in support of planned strategic initiatives.

Wednesday, September 16, 2015

Data Governance : Data quality measures

Data Quality (DQ) is a niche area that protects the integrity of data management by closing the gaps left by data issues. It is one of the key functions that aid data governance: it monitors data to find exceptions that current data management operations have not discovered. Data Quality checks may be defined at the attribute level to give full control over the remediation steps.

The following nine measures (metrics) can be used to assess the quality of any data source.
These measures can be applied regardless of tool or technology, because they are basic principles for ensuring data quality.

Accuracy -
The degree to which data is consistent with authoritative sources of the truth (e.g. Customer ID must conform to an authorized government-issued document or database). Metric/results will be % of Accuracy, Failure Count.

Completeness -
The degree to which data is required to be populated with a value (e.g., A Customer ID is required for all customers but not prospects). Metric/results will be % of Failure, Failure Count
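
As a rough illustration, a completeness check can be sketched in a few lines of Python. The record layout, the field name customer_id and the rule that None or an empty string counts as unpopulated are all assumptions made for this example.

```python
from typing import Iterable, Mapping, Optional, Tuple


def completeness(records: Iterable[Mapping[str, Optional[str]]],
                 field: str) -> Tuple[int, float]:
    """Return (failure count, % of failure) for a required field."""
    rows = list(records)
    failures = sum(1 for row in rows if row.get(field) in (None, ""))
    pct = 100.0 * failures / len(rows) if rows else 0.0
    return failures, pct


# Made-up customer records; two of the four have no Customer ID.
customers = [
    {"customer_id": "C000000001"},
    {"customer_id": ""},
    {"customer_id": None},
    {"customer_id": "C000000004"},
]
print(completeness(customers, "customer_id"))  # (2, 50.0)
```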

Comprehensiveness -
The degree to which all expected records are contained in a data store. Metric/results will be % of Comprehensiveness Ratio (records found vs. records expected)

Coverage -
The degree to which data is inclusive of all supported business functions required to produce a comprehensive view for a specific business purpose (e.g., Average Revenue per User reporting for the enterprise should include revenue data from all business areas where revenue is generated). Metric/results will be % of Data Sources Available

Integrity -
The degree to which data retains consistent content across data stores (e.g. Customer ID contains the same value for a Customer across databases). Metric/results will be % of Different, Count of Differences
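
One way to picture an integrity check is to compare the same attribute for the same customer in two stores. In this Python sketch the two dictionaries stand in for two databases, and the customer keys and ID values are invented for the example.

```python
from typing import Mapping, Tuple


def integrity(store_a: Mapping[str, str],
              store_b: Mapping[str, str]) -> Tuple[int, float]:
    """Return (count of differences, % different) over customers in both stores."""
    shared = store_a.keys() & store_b.keys()
    differences = sum(1 for key in shared if store_a[key] != store_b[key])
    pct = 100.0 * differences / len(shared) if shared else 0.0
    return differences, pct


# Made-up data: "bob" carries a different Customer ID in the second store.
crm = {"alice": "C000000001", "bob": "C000000002",
       "carol": "C000000003", "dave": "C000000004"}
billing = {"alice": "C000000001", "bob": "C000000009",
           "carol": "C000000003", "dave": "C000000004"}
print(integrity(crm, billing))  # (1, 25.0)
```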

Logic/Reasonableness -
The degree to which data conforms to tests of reasonableness based on real-world scenarios (e.g., A policy/account holder’s birth date must prove that they are at least 13 years old). Metric/results will be % of Failure, Failure Count

Timeliness -
The degree to which data is consistent with the most recent business event (e.g., Customer ID must be updated within all systems within XX hours of a change made to a Customer record). Metric/results will be % of Failure, Failure Count

Uniqueness -
The degree to which data is free of unintended duplication (e.g., Two non-related customers cannot have the same Customer ID/Party ID.). Metric/results will be % of Duplicated, Duplicate Count
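
A uniqueness check can be sketched as follows in Python. This example counts every occurrence of a value after the first as a duplicate, which is one reasonable definition among several; the sample IDs are made up.

```python
from typing import Iterable, Tuple


def uniqueness(values: Iterable[str]) -> Tuple[int, float]:
    """Return (duplicate count, % duplicated); repeats after the first count."""
    items = list(values)
    duplicates = len(items) - len(set(items))
    pct = 100.0 * duplicates / len(items) if items else 0.0
    return duplicates, pct


# Made-up IDs: "A" appears three times, so two occurrences are duplicates.
ids = ["A", "B", "A", "C", "A"]
print(uniqueness(ids))  # (2, 40.0)
```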

Validity -
The degree to which data conforms to defined business rules for acceptable content (e.g., Customer ID must be 10 characters long). Metric/results will be % of Failure, Failure Count
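
The validity example above (Customer ID must be 10 characters long) translates directly into a rule function. This Python sketch is only an illustration; the function names and sample IDs are hypothetical.

```python
from typing import Callable, Iterable, Tuple


def validity(values: Iterable[str],
             rule: Callable[[str], bool]) -> Tuple[int, float]:
    """Return (failure count, % of failure) for values that break the rule."""
    items = list(values)
    failures = sum(1 for value in items if not rule(value))
    pct = 100.0 * failures / len(items) if items else 0.0
    return failures, pct


def is_ten_chars(value: str) -> bool:
    """Business rule from the text: Customer ID must be 10 characters long."""
    return isinstance(value, str) and len(value) == 10


# Made-up IDs: "C123" is too short and fails the rule.
customer_ids = ["C000000001", "C000000002", "C123", "C000000004"]
print(validity(customer_ids, is_ten_chars))  # (1, 25.0)
```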