There are four architecture domains, BDAT for short (Business, Data, Application, and Technology); architecture activities usually encompass all four of them.
The Business Architecture defines the business strategy, governance, organization, and key business processes.
The Data Architecture describes the structure of an organization’s logical and physical data assets and data management resources.
The Application Architecture provides a blueprint for the individual applications to be deployed, their interactions, and their relationships to the core business processes of the organization.
The Technology Architecture describes the logical software and hardware capabilities that are required to support the deployment of business, data, and application services. This includes IT infrastructure, middleware, networks, communications, processing, standards, etc.
As mentioned, a complete enterprise architecture should address all four domains. Realistically though, due to the usual constraints, not all domains are included and the new architecture is scoped and trimmed down to the most impacted domains. This increases the risk of inconsistencies, a risk that should be clarified and highlighted.
“For a man with a hammer, every problem is a nail.” The EA team, having access to a very large hammer, often leans towards very elaborate solutions to problems that could have been solved with a much simpler solution, or even no solution at all. As a solution architect I have often encountered such solutions, and even was part of a few of them: very elaborate solutions to problems that should have been handled operationally.
A cost-benefit analysis conducted during the requirements phase of a project can prevent such scenarios from taking place. What's more important, though, is keeping an open mind and accepting that often the problem encountered doesn't require the hammer the EA team is wielding.
Here is a parable I like to share when I encounter such situations; it is said to be a true story. A toothpaste factory had a problem: it sometimes shipped empty boxes, without the tube inside. Understanding how important that was, the CEO got the top people in the company together, and they decided to start a new project in which they would hire an external engineering company to solve their empty-boxes problem, as their own engineering department was already too stretched to take on any extra effort.
The project followed the usual process: budget and project sponsor allocated, RFP issued, third parties selected, and six months (and $8 million) later they had a fantastic solution: on time, on budget, high quality, and everyone on the project had a great time. They solved the problem with high-tech precision scales that would sound a bell and flash lights whenever a toothpaste box weighed less than it should. The line would stop, and someone had to walk over, yank the defective box off the belt, and press a button when done.
A while later, the CEO decided to have a look at the ROI of the project: amazing results! No empty boxes had shipped out of the factory since the scales were put in place. Very few customer complaints, and they were gaining market share. “That's some money well spent!” he said, before looking closely at the other statistics in the report.
It turned out the number of defects picked up by the scales was zero after three weeks of production use. It should have been picking up at least a dozen a day, so maybe there was something wrong with the report. He filed a bug against it, and after some investigation the engineers came back saying the report was actually correct: the scales really weren't picking up any defects, because all the boxes that reached that point on the conveyor belt were good.
Puzzled, the CEO travelled down to the factory and walked up to the part of the line where the precision scales were installed. A few feet before them stood a $20 desk fan, blowing the empty boxes off the belt and into a bin.
“Oh, that — one of the guys put it there ’cause he was tired of walking over every time the bell rang”, says one of the workers.
As an architect you often encounter requirements that are better off not implemented. Requirements are triaged through several activities, one of them being impact analysis. The law of diminishing returns comes into play here, given that the complexity of a requirement and its return are often inversely proportional. It is often more productive to implement a requirement partially rather than going to the full extent, as the cost will far exceed the return.
The law of diminishing returns states that in all productive processes, adding more of one factor of production, while holding all others constant, will at some point yield lower incremental per-unit returns.
This behaviour can be represented with a simple relation: if X is the unit of incrementation, i the number of increments, and R(iX) the total return after i increments, diminishing returns means the marginal return R((i+1)X) - R(iX) shrinks as i grows.
To put it more colloquially: the more seeds you plant in a field, the less yield you get per seed. Another example is developers per project: there is a certain inflection point after which it doesn't matter how many developers you add to the project, the return remains the same.
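The seeds example can be sketched in a few lines of Python. The logarithmic return curve below is purely an illustrative assumption, chosen only because it is concave; any curve whose marginal gain shrinks would demonstrate the same law.

```python
import math

def total_yield(seeds: int) -> float:
    """Illustrative concave return curve (an assumption, not real data):
    total yield keeps growing, but ever more slowly."""
    return 100 * math.log(1 + seeds)

def marginal_yield(seeds: int) -> float:
    """Extra yield gained by planting one more seed."""
    return total_yield(seeds + 1) - total_yield(seeds)

# Each additional seed returns less than the one before it.
gains = [marginal_yield(s) for s in (1, 10, 100, 1000)]
assert gains == sorted(gains, reverse=True)
```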
An example of a requirement that exhibits highly diminishing returns is a service assurance requirement: achieving 99.9% accuracy would cost far more than a manual workflow to verify the deviations. OCR (optical character recognition) projects come to mind, hence you find projects like reCAPTCHA relying on users to verify the output of the OCR algorithm.
This is not very different from NP-complete problems and approximation algorithms: reaching an acceptable solution at a fraction of the cost is favoured over computing the exact solution, which for some problem sets could take a practically infinite amount of time.
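A classic illustration of this trade-off (not from the original text, just a standard example) is the greedy 2-approximation for minimum vertex cover: instead of the exponential search an exact answer requires, one linear pass yields a cover at most twice the optimal size.

```python
def vertex_cover_2approx(edges):
    """Greedy 2-approximation for minimum vertex cover (an NP-hard problem):
    take both endpoints of any uncovered edge. The result is at most twice
    the optimum, obtained in one linear pass instead of exponential search."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

# A square graph: the optimum cover has 2 vertices; the greedy cover has 4.
edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
cover = vertex_cover_2approx(edges)
assert all(u in cover or v in cover for u, v in edges)
```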
As with conventional architects, there are no right and wrong decisions while constructing a solution; the combination of all the decisions made throughout the project results in the success or failure of the final product. Picking one approach over another can be compared to a conventional architect picking a certain architectural style for an arch, or a certain paint colour: decisions that can't be fully defended or justified while the project is still in progress. Making such decisions requires a set of skills that combines multiple disciplines. I compiled a list of the skills I believe should be available in a solution architect.
A broad cross-sectional knowledge of the industry he or she is designing solutions for.
Attention to detail while keeping an eye on the bigger picture.
A strong sense of business and market requirements.
Ability to abstract the requirements to design generic building blocks.
The ability to make confident decisions (or seemingly confident ones) in murky situations.
Out-of-the-box thinking that allows him to reuse existing building blocks in non-conventional ways.
Ability to document a design in a clear form that can retain the details without being too confusing.
Communication ability with both business as well as technical stakeholders.
The ability to conduct cost benefit analysis to be able to triage the requirements.
Broad knowledge of the technologies being used within the industry he is designing for.
Flexibility to evolve and pivot his design as needed, yet the resilience to resist development requirements that do not add value.
Sales ability to be able to sell the design to the developers and the solution to the requester.
Some nice-to-have skills include:
An attractive projects portfolio.
Ability to construct proof of concepts if required.
A business degree.
This is an open list and you are more than welcome to add to it; to do so, please comment with the skill you believe should be added to either list.
Designing a fault-tolerant system in a loosely coupled architecture based on async calls can be quite challenging; certain trade-offs usually must be made between resilience and performance. The usual challenge faced while designing such a system is missed or unprocessed calls resulting in data drift, which increases exponentially over time, eventually rendering the system unusable. Consider the following scenario: a GSM customer swapping his SIM card.
SIM migration order is created.
Order processing starts, and SIM swap call is sent to network elements.
Customer’s SIM is swapped, but the response from the network elements is missed or never sent.
CRM order is cancelled by customer care.
Customer now has two different SIMs associated with his account: the one he is using, listed in the network, and his old SIM card in CRM.
All subsequent orders will fail since the customer’s service account is inconsistent across the BSS stack.
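The drift in the steps above can be sketched in a few lines. This is a toy simulation; the dictionary shapes and field names are illustrative, not taken from any real BSS product.

```python
# Minimal sketch of the SIM-swap drift scenario (names are hypothetical).
crm = {"sim": "OLD-SIM", "order": None}
network = {"sim": "OLD-SIM"}

# Steps 1-3: a SIM migration order is created and sent to the network elements.
crm["order"] = "SIM_SWAP"
network["sim"] = "NEW-SIM"       # the swap succeeds on the network side...
response_received = False        # ...but the response is lost in transit

# Steps 4-5: with no response, customer care cancels the order in CRM.
if not response_received:
    crm["order"] = None          # cancelled; CRM still shows the old SIM

# Step 6: the two systems now disagree, so every later order will fail.
assert crm["sim"] != network["sim"]
```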
One way to prevent such an issue from happening altogether is to lock the customer record for editing until the SIM swap request is completed by the network; if a failure happens during the SIM swap, the customer remains locked until the issue is resolved manually. This approach is called fault avoidance, and it is quite costly performance-wise; it also provides a really poor customer experience.
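A minimal sketch of the fault-avoidance lock, assuming a hypothetical CustomerRecord class: the record rejects all edits for the whole round trip, trading throughput and customer experience for consistency.

```python
# Fault-avoidance sketch; class and method names are illustrative.
class CustomerRecord:
    def __init__(self):
        self.locked = False

    def begin_sim_swap(self):
        self.locked = True           # no edits allowed until the network confirms

    def edit(self, field, value):
        if self.locked:
            raise RuntimeError("record locked pending SIM swap confirmation")
        setattr(self, field, value)

    def confirm_sim_swap(self):
        self.locked = False          # only the network response releases the lock

customer = CustomerRecord()
customer.begin_sim_swap()
try:
    customer.edit("plan", "GOLD")    # any edit during the swap is rejected
except RuntimeError:
    pass
customer.confirm_sim_swap()
customer.edit("plan", "GOLD")        # allowed again once the swap is confirmed
```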
Fault tolerance, on the other hand, allows such incidents to take place, but the system prevents them from escalating into failures. In my opinion the best pattern to handle faults in loosely coupled systems is checkpointing.
Checkpointing is a technique in which the system periodically checks for faults or inconsistencies and attempts to recover from them, thus preventing a failure from happening.
The checkpointing pattern is based on a four-stage approach:
Error detection
Damage assessment and confinement (sometimes called “firewalling”)
Error recovery
Fault treatment and continued service
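Applied to the SIM-swap example, the four stages could be sketched as a periodic reconciliation job. This is a hedged illustration: the rule that the network is the source of truth, and all the names below, are assumptions for the sketch.

```python
# Hypothetical checkpoint over the SIM-swap drift; each stage maps to one step.
def checkpoint(crm, network, audit_log):
    # Stage 1. Error detection: compare the two systems' view of the SIM.
    if crm["sim"] == network["sim"]:
        return False                         # consistent, nothing to do
    # Stage 2. Damage assessment and confinement ("firewalling"): block the
    # account so no new orders are accepted while it is inconsistent.
    crm["blocked"] = True
    # Stage 3. Error recovery: assuming the network is the source of truth
    # for the SIM, copy its value back into CRM.
    crm["sim"] = network["sim"]
    # Stage 4. Fault treatment and continued service: log the fault for
    # root-cause analysis and unblock the account.
    audit_log.append("SIM drift repaired")
    crm["blocked"] = False
    return True

crm = {"sim": "OLD-SIM", "blocked": False}
network = {"sim": "NEW-SIM"}
log = []
assert checkpoint(crm, network, log)         # drift found and repaired
assert crm["sim"] == network["sim"]
assert not checkpoint(crm, network, log)     # a second pass finds nothing
```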
If this approach sounds familiar, it is because it has been in use for quite some time in SQL databases (a loosely coupled system between client and DB server). To retain DB consistency in the event of a fault during a long-running query, the following steps are taken:
Client session termination is detected (step 1: detection).
The server checks whether the user has any uncommitted DML statements (step 2: assessment).
The undo log is accessed to pull out the data needed to roll back the changes (step 3: recovery).
The changes are rolled back and data consistency is restored (step 4: fault treatment).
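The same behaviour can be observed with SQLite from Python. The explicit rollback() below stands in for what the server does on its own when a client session dies mid-transaction.

```python
import sqlite3

# Uncommitted DML is undone and consistency restored, as in the steps above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

conn.execute("UPDATE accounts SET balance = 0 WHERE id = 1")  # uncommitted DML
conn.rollback()   # stand-in for the server's reaction to a dead session

balance = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1"
).fetchone()[0]
assert balance == 100   # consistency restored from the undo information
```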
The pattern used by DBMSs, the checkpoint-rollback scenario, relies on taking a snapshot of the system at certain checkpoints through the process flow and, upon a failure between two checkpoints, restoring the last snapshot. However, this pattern becomes too complex to implement in multi-tiered systems.
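In a single process, the checkpoint-rollback scenario can be sketched as snapshotting state before each step and restoring the last snapshot on failure. The step names are invented for the illustration; in a multi-tiered system each "step" would span several systems, which is exactly where this pattern becomes hard.

```python
import copy

# Checkpoint-rollback sketch: snapshot at each checkpoint, restore on failure.
def run_with_checkpoints(state, steps):
    snapshot = copy.deepcopy(state)          # checkpoint before the first step
    for step in steps:
        try:
            step(state)
            snapshot = copy.deepcopy(state)  # step succeeded: new checkpoint
        except Exception:
            state.clear()
            state.update(snapshot)           # failure: roll back to checkpoint
    return state

def provision(state):
    state["provisioned"] = True

def bill(state):
    state["billed"] = True
    raise RuntimeError("billing system timeout")  # fault mid-step

state = run_with_checkpoints({}, [provision, bill])
assert state == {"provisioned": True}        # the half-done billing was undone
```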
Checkpoint Recovery Block:
This pattern relies on using alternative flows based on the type of fault: the checkpoint recognizes the type of fault and picks the correct block to recover from the error and complete the process.
This approach is used extensively while coding: a try block with multiple catch blocks, each handling a different type of exception. However, instead of using it within the code of a single layer, it is taken one step further and applied at the process level.
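A minimal sketch of the recovery-block idea, with the fault types and recovery flows invented for the example: the checkpoint classifies the fault and dispatches to an alternative flow that still completes the order.

```python
# Recovery-block sketch at the process level; exception names are illustrative.
class NetworkTimeout(Exception):
    pass

class ValidationFault(Exception):
    pass

def process_order(order, primary, on_timeout, on_validation):
    try:
        return primary(order)        # the normal flow
    except NetworkTimeout:
        return on_timeout(order)     # alternative block: retry path
    except ValidationFault:
        return on_validation(order)  # alternative block: manual-handling queue

def primary(order):
    raise NetworkTimeout()           # simulate a fault in the normal flow

result = process_order(
    {"id": 1},
    primary,
    on_timeout=lambda o: "retried",
    on_validation=lambda o: "queued for manual handling",
)
assert result == "retried"
```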