The AWS Certified Solutions Architect Professional (CSA-Pro) exam reaches far beyond testing in-depth knowledge of the AWS platform and delves into your ability to make decisions in ambiguous situations, wrestle with sub-optimal trade-offs, and tease out minute details from paragraphs of text.
It is recommended to complete the Certified Solutions Architect Associate prior to attempting this.
AWS will try to trick you with red herrings.
Assumes you are a CSA associate.
AWS architectures follow Bloom's Taxonomy for learning.
AWS words the questions so that they can be tricky to understand, to make sure candidates can handle the analyze and evaluate levels of the taxonomy.
They also word the questions with the Dunning-Kruger effect in mind.
The less experience you have in an area, the more likely you are to be confident in that area. "The less you know, the more you think you know."
There is also a warning about "the valley of despair" that you hit as you learn more and realise the breadth and depth of knowledge required.
Using the analogy of a geolocation finder: it's about the journey and it is about playing around.
The aim of the course is to start with things that seem too easy and then venture out onto the scenic route.
Given the expert task, you will need to venture out on your own path.
"Keeping things in order" - ground rules that a data store will work by.
Look for these in the services of S3 vs Dynamo.
While S3 may look like a file system, it has more in common with databases. S3 paths are a key, not a file path.
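To make the "key, not a file path" point concrete, here is a minimal sketch that models a bucket as a plain Python dictionary. The bucket contents and the helper names `list_keys` and `get_object` are hypothetical and are not the S3 API; the point is only that a "folder" is just a shared key prefix.

```python
# Toy model of a flat object store: keys are plain strings, not directory paths.
# The keys and helper names below are made up for illustration.
bucket = {
    "photos/2019/cat.jpg": b"...",
    "photos/2019/dog.jpg": b"...",
    "photos/2020/bird.jpg": b"...",
    "readme.txt": b"...",
}


def list_keys(store, prefix=""):
    """Return all keys that start with the prefix, in sorted order."""
    return sorted(key for key in store if key.startswith(prefix))


def get_object(store, key):
    """Look up an object by its exact key; there is no directory traversal."""
    return store[key]


# A "folder" listing is really a prefix filter over the flat key space.
folder_like = list_keys(bucket, "photos/2019/")
```

Note that there is no `photos` entry anywhere in the store: the slash is just another character in the key.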
|AWS Documentation Statement||What S3 is thinking|
|S3 provides read-after-write consistency for PUTs of new objects||Never seen this object, no-one has asked about it before. Welcome, new object. You can read it immediately.|
|HEAD or GET reqs of the key before an object exists will result in eventual consistency||Wait, someone asked for the key and I said "never saw it". I remember that, and need to honor that response until I completely write this new object and fully replicate it. I'll let you read it eventually.|
|S3 offers eventual consistency for overwrite PUTs and DELETEs||You want to update or delete an object. Let's make sure we get that update or delete completed locally, then we can replicate it to other places. Until then, I have to serve up the current file. I'll serve up the update/delete once it's fully replicated - eventually.|
|Updates to a single key are atomic||Only one person can update this object at a time. If I get two requests, I'll process them in order of their timestamp and you'll see the updates as soon as I replicate elsewhere.|
S3 offers versioning.
You can optionally require multi-factor auth:
S3 also offers cross-region replication for:
This can enable transitioning objects to archive etc. after a certain timeframe, based on things such as tags.
|Data Lake Concept||Athena, Redshift Spectrum, QuickSight|
|IoT Streaming Data Repository||Kinesis Firehose|
|Machine Learning and AI Storage||Rekognition, Lex, MXNet|
|Storage Class Analysis||S3 Management Analytics|
|SSE-S3||S3's existing encryption key for AES-256|
|SSE-C||Upload your own AES-256 key which S3 uses when it writes to objects|
|SSE-KMS||Use a key generated and managed by AWS Key Management Service|
|Client-Side||Encrypt objects using your own local encryption before uploading to S3 (PGP, GPG, etc)|
|Transfer Acceleration||Speed up data uploading using CloudFront in reverse|
|Requester Pays||Requester rather than the bucket owner pays for requests and data transfer|
|Tags||Assign tags to objects for use in costing, billing, security etc|
|Events||Trigger notifications to SNS, SQS or Lambda when certain events happen|
|Static Web Hosting||Simple and massively scalable static website hosting|
|BitTorrent||Use BitTorrent protocol to retrieve any publicly available object by auto-generating a .torrent file|
Glacier is a service by itself with its own API and you don't need S3 to access it.
There is the Glacier Vault that contains archives (file, zip, tar etc max size 40TB, immutable), and can be accessed through policies with a "Glacier Vault Lock" (different to vault access policy, enforce rules like no delete or MFA, immutable) and IAM access.
You need to understand the "policy" and the "access".
A small note on the vault lock process. You create a "Glacier Vault Lock" and you initiate the vault lock. You have 24 hours to "abort" or "complete" the vault lock. This is by design.
The example given was Othello. Think of it like a big field of play. In the example, we could pay for 64 blocks but only start using 4. Then we add more data and use 12 blocks. We still pay for 64 blocks, but fewer are free now.
With the Othello snapshot example, our first snapshot may contain the 4/64 blocks. Once we've added more data, the next snapshot will only capture the added data. If we then remove a "chip", the next snapshot will record that. It is cost-effective because each snapshot does not equate to the full size of your volume.
If a snapshot is deleted, we cannot replicate that snapshot. Think of snapshots as a collection of pointers to data which is stored in S3. Thanks to AWS magic, if you delete snapshot one, the data that later snapshots still need is carried into the next snapshot.
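The incremental behaviour described in the Othello example can be sketched as a toy model. This is only an illustration under simplifying assumptions: the volume is a dictionary of block number to data, and `take_snapshot`, `restore` and `delete_snapshot` are hypothetical helpers, not how EBS is actually implemented.

```python
DELETED = object()  # marker for a block removed since the previous snapshot


def take_snapshot(volume, previous_state):
    """Record only the blocks that changed since the previous snapshot."""
    delta = {i: d for i, d in volume.items() if previous_state.get(i) != d}
    for i in previous_state:
        if i not in volume:
            delta[i] = DELETED
    return delta


def restore(snapshots):
    """Replay the deltas in order to rebuild the volume contents."""
    state = {}
    for delta in snapshots:
        for i, d in delta.items():
            if d is DELETED:
                state.pop(i, None)
            else:
                state[i] = d
    return state


def delete_snapshot(snapshots, index):
    """Drop one snapshot, folding its blocks into the next one in the chain."""
    remaining = list(snapshots)
    removed = remaining.pop(index)
    if index < len(remaining):
        merged = dict(removed)
        merged.update(remaining[index])  # newer changes win
        remaining[index] = merged
    return remaining


volume = {i: f"chip-{i}" for i in range(4)}           # 4 of 64 blocks in use
snap1 = take_snapshot(volume, {})
state1 = dict(volume)
volume.update({i: f"chip-{i}" for i in range(4, 12)})  # now 12 blocks in use
snap2 = take_snapshot(volume, state1)
state2 = dict(volume)
del volume[0]                                          # remove one "chip"
snap3 = take_snapshot(volume, state2)

after_delete = delete_snapshot([snap1, snap2, snap3], 0)
```

In this model the second snapshot holds only the eight added blocks, and restoring from the chain still works after the first snapshot is deleted because its blocks were folded into the next one.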
In 1984, Sun developed a distributed file system for SunOS known as the Network File System (NFS), and this is what AWS took as inspiration to create EFS.
NFS itself is not considered a secure protocol. You could use AWS DataSync instead, which uses a purpose-built protocol to transfer data securely. You can also use EFSSync to keep multiple EFS file systems in sync.
Remember: EFS is 3x more expensive than EBS and 20x more expensive than S3. There are also some NFS v4 features that are not supported (check docs).
An example given here was that, with multi-AZ web entrypoints that connect to a mount point, we could use AWS DataSync to sync that up to an on-premises staging. This gives scalability and redundancy across AZs.
|New Name||Old Name||Interface||Function|
|File Gateway||N/A||NFS, SMB||Allow on-prem or EC2 instances to store objects in S3 via NFS or SMB mount point|
|Volume Gateway Stored Mode||Gateway-stored Volumes||iSCSI||Async replication of on-prem data to S3|
|Volume Gateway Cached Mode||Gateway-cached Volumes||iSCSI||Primary data stored in S3 with frequently accessed data cached locally on-prem|
|Tape Gateway||Gateway-Virtual Tape Library||iSCSI||Virtual media changer and tape library for use with existing backup software|
It contains a feature called "bandwidth throttling" which is a great feature to ensure remote offices aren't smashed.
Amazon's version of Dropbox or Google Drive.
If you can use it, you should. It will help your Database administrator.
|If you need||Don't use RDS, use|
|Lots of large binary objects (BLOBs)||S3|
|Name/Value Data structure||DynamoDB|
|Data not well structured/unpredictable||DynamoDB|
|Other DB Platform not supported||EC2|
|Need complete control||EC2|
The example has Multi-AZ RDS with read-replicas used to service regional users.
Note: non-transactional storage engines like MyISAM don't support replication; you must use InnoDB (or XtraDB on MariaDB).
The Read-Replicas use async replication, while the stand-by DBs use sync replication.
A massively scalable key-value storage system.
Relational databases are great for related data across a strong schema. NoSQL databases have their strength in key-value pairs.
A key-value pair is an attribute, and the whole record is known as an item. DynamoDB has a Primary Key. You can use a composite primary key made up of a partition key and a sort key. We can have multiple occurrences of the same partition key as long as the sort keys are unique.
|Index Type||Description||How to remember|
|Global Secondary Index||Partition key and sort key can be different from those on the table||I'm not restricted to just the partitioning set forth by the partition key, I'm global!|
|Local Secondary Index||Same partition key as the table but different sort key||I have to stay local and respect the table's partition key, but I can choose whatever sort key I want.|
Use the Global Secondary Index when you want a fast query of attributes outside the primary key - without having to do a table scan (read everything sequentially). "I'd like to query Sales Orders by Customer number rather than Sales Order Number".
Use the Local Secondary Index when you already know the partition key and want to quickly query on some other attribute. "I have the Sales Order Number, but I'd like to retrieve only those records with a certain Material Number."
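The two quoted queries can be illustrated with a small in-memory sketch. It assumes a hypothetical sales order table whose partition key is `order_no` and whose sort key is `line_no`; the attribute names and the `build_index` helper are made up for illustration and this is not the DynamoDB API.

```python
from collections import defaultdict

# Hypothetical sales order line items.
items = [
    {"order_no": "SO-1", "line_no": 1, "customer_no": "C-7", "material_no": "M-20"},
    {"order_no": "SO-1", "line_no": 2, "customer_no": "C-7", "material_no": "M-10"},
    {"order_no": "SO-2", "line_no": 1, "customer_no": "C-9", "material_no": "M-10"},
    {"order_no": "SO-3", "line_no": 1, "customer_no": "C-7", "material_no": "M-30"},
]


def build_index(rows, partition_key, sort_key):
    """Group rows by partition key and keep each group ordered by sort key."""
    index = defaultdict(list)
    for row in rows:
        index[row[partition_key]].append(row)
    for group in index.values():
        group.sort(key=lambda r: r[sort_key])
    return index


# Base table: partition key order_no, sort key line_no.
table = build_index(items, "order_no", "line_no")
# "Global" index: a different partition key (and sort key) from the table.
gsi_by_customer = build_index(items, "customer_no", "order_no")
# "Local" index: same partition key as the table, different sort key.
lsi_by_material = build_index(items, "order_no", "material_no")

# Sales orders by customer number, without scanning every item.
orders_for_c7 = sorted({r["order_no"] for r in gsi_by_customer["C-7"]})
# Within a known sales order, only the lines for one material number.
m10_lines_in_so1 = [
    r["line_no"] for r in lsi_by_material["SO-1"] if r["material_no"] == "M-10"
]
```

The design point is that each index is just the same data regrouped under a different key, which is why the global index can answer a question the base table's key cannot.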
Using the secondary indexes, you can use projections to have fast access to data attributes.
|If you need||Consider||Cost||Benefit|
|Access just a few attributes in fastest way possible||Projecting those few attributes in a global secondary index||Minimal||Lowest possible latency access for non-key items|
|Frequently access some non-key attributes||Projecting those attributes in a global secondary index||Moderate; aim to offset cost of table scans||Lowest possible latency access for non-key items|
|Frequently access most non-key attributes||Projecting those attributes or even the entire table in a global secondary index||Up to double||Maximum flexibility|
|Rarely query but write or update frequently||Projecting keys only for the global secondary index||Minimal||Very fast write or updates for non-partition-key items|
In a NoSQL world, we can use tricks to improve performance and do things that look different to the relational world.
In the example given, we have the Sort key defined, but we could put a global secondary index on Attribute 1 and Attribute 2 (say Period and TotalPurchases respectively); then we could update or pull total purchases fast and query or sort by period date.
Another strategy is to leverage sparse indexes. Not every item might have period, so DynamoDB has ways to make sure the index only includes those items with the attribute used for the global secondary index.
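A sparse index can be pictured as a filter applied while the index is built. The sketch below is a toy model with made-up item data and a hypothetical `build_sparse_index` helper; it only shows the idea that items lacking the indexed attribute never appear in the index.

```python
# Hypothetical items: only some of them carry the "period" attribute.
purchases = [
    {"customer": "C-1", "order": "O-1", "period": "2019-01", "total_purchases": 120},
    {"customer": "C-1", "order": "O-2"},
    {"customer": "C-2", "order": "O-3", "period": "2019-02", "total_purchases": 75},
]


def build_sparse_index(rows, index_key):
    """Index only the rows that have the key attribute, ordered by that key."""
    return sorted((r for r in rows if index_key in r), key=lambda r: r[index_key])


period_index = build_sparse_index(purchases, "period")
indexed_orders = [r["order"] for r in period_index]
```

Because the index holds fewer items than the table, queries against it touch less data.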
We can also use a global secondary index to create table replicas; we just have to use the same partition key and sort key. When might we use this? Imagine two different tiers of customers: we might let premium customers do their writes against tables with a higher RCU/WCU (Read Capacity Unit/Write Capacity Unit) limit.
The last use case is for performance reasons where we want high write capacity limits on the first table, and another that has a high read capacity.
Remember, the replica is eventually consistent.
Large repository for a variety of data which you put a framework or technology on-top of to make use of it. The idea is to shorten the path to take this data and make use of it. This is a way to get around the older method of extract-transform-load to then make sense of it.
We can dump a bunch of this data into S3 and then point Amazon Redshift Spectrum at it in order to query that data.
Graph databases are optimized to work with relations. Think social media networks, product recommendations etc.
Don't expect Neptune questions on the exam.
|Web session store||In cases with load-balanced web servers, store web session info in Redis so if a server is lost, the session info is not lost and another web server can pick-up|
|Database Caching||Use Memcached in front of AWS RDS to cache popular queries to offload work from RDS and return results faster to users|
|Leaderboards||Use Redis to provide a live leaderboard for millions of users of your mobile app|
|Streaming data dashboards||Provide a landing spot for streaming sensor data on the factory floor, providing live real-time dashboard displays|
VPC only supports unicast, not multicast
A cache is a cache. Use the right tool for the job.
In recent years, AWS has released a whole suite of DB options.
Example given was to build some real-time dashboards based on IoT devices in a manufacturing environment.
As with other AWS services, there is functional overlap, so you need to understand the reasoning behind each choice.
|DB on EC2||Ultimate control or preferred DB not under RDS|
|RDS||Traditional DB for OLTP. Data well-formed and structured|
|DynamoDB||Name/value pair data or unpredictable data structure. In-mem performance w/ persistence|
|Redshift||Massive amounts of data. Primarily OLAP.|
|Neptune||Relationships between objects a major portion of data value|
|ElastiCache||Fast temp storage for small amounts of data. Highly volatile.|
Read the AWS Storage Options white paper and note the anti-patterns.
|RDS||Traditional relational data models, existing apps requiring RDBMS, OLTP, ACID-compliant|
|DynamoDB||High I/O needs, Scale dynamically|
|EC2||DB not supported under RDS, need complete control|
For further study, the AWS whitepapers are absolutely required, and the re:Invent videos are optional but recommended. A lot of the time the speakers will use real-world examples that may apply directly to what you want to do. Note that the course on A Cloud Guru has the links in the current lesson section for this part.
Focused both on understanding security requirements and on designing networks for complex organisations.
There are some aspects that also touch on migrating from on-prem into the cloud.
You should already know:
Seek first to understand and then apply.
Open Systems Interconnection (OSI) model. This describes how to think about network operations. If one layer has problems, your message is not going to get through.
|5||Session||Setup, Negotiation, Teardown||Sausage|
|1||Physical||CAT5, fiber optic cable, 5GHz, carrier frequency||Please|
Use as a mental checklist. Please do not throw sausage pizza away.
AWS is generally responsible for layers 1-2, while the rest are the customer's responsibility. This is why there is the term "shared responsibility model".
One of the limitations that AWS (and other cloud providers) enforce is no multicast.
Unicast is like a direct phone call between two people, whereas multicast is sending a message to everyone on the network (like a megaphone). Since this is done at the MAC level, it is a layer 2 activity (and AWS is multi-tenant).
|TCP (Layer 4)||Connection-based, stateful, acknowledges receipt||After everything I say, I want you to confirm you received it||Web, Email, File Transfer|
|UDP (Layer 4)||Connectionless, stateless, simple, no retransmission delays||I'm going to start talking and it's okay if you miss some words||Streaming media, DNS|
|ICMP (officially layer 3)||Used by network devices to exchange info||We routers can keep in touch about the health of the network using our own language||traceroute, ping|
UDP preferred for media so the server can continually send the client data
AWS reserves certain IP addresses in each VPC subnet.
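As a sketch of which addresses are affected: to my understanding AWS reserves the first four addresses and the last address of every subnet CIDR block. The helper names below are hypothetical and use only Python's standard `ipaddress` module; check the VPC documentation for the authoritative list.

```python
import ipaddress


def reserved_addresses(subnet_cidr):
    """Return the five addresses assumed to be reserved in a subnet."""
    net = ipaddress.ip_network(subnet_cidr)
    return {
        "network address": net[0],
        "VPC router": net[1],
        "reserved for DNS": net[2],
        "reserved for future use": net[3],
        "broadcast (not supported, but reserved)": net[-1],
    }


def usable_count(subnet_cidr):
    """Addresses left for your own resources after the five reserved ones."""
    return ipaddress.ip_network(subnet_cidr).num_addresses - 5


reserved = reserved_addresses("10.0.0.0/24")
```

This is why a /24 subnet gives you 251 usable addresses rather than 256, and why the smallest allowed subnet (/28) leaves only 11.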
The Physical to Logical assignment of AZs is done at the Account level (mind-blown).
(starting from network to VPC connectivity)
Suggested ways to help design architectures.
An example of this is a loosely-coupled architecture: "Components can stand independently and require little or no knowledge of the inner workings of the other components."
They have some good benefits when it comes to abstraction.
An example is given of giving more resources, in a loosely coupled architecture, to a particular process that was previously time-expensive.
|Add more instances as demand increases||Add more CPU and/or RAM to existing instance as demand increases|
|No downtime required to scale up or down||Requires restart to scale up or down|
|Automatically supported with Auto-scaling groups||Requires script to automate|
|(Theoretically) unlimited||Limited by instance size|
With scaling, you should scale to match demand. We can scale in and scale out based on demand. The example shows the potential savings over a month thanks to auto-scaling.
Focused on EC2. Why? Setup scaling groups for EC2 instances; health checks to remove unhealthy instances.
When creating, you need to set some launch configurations:
Here you can also define the scaling policies:
|Target Tracking Policy||Scale based on a predefined or custom metric in relation to a target value||"When CPU utilization gets to 70% on current instances, scale up"|
|Simple Scaling Policy||Waits until health check and cool down period expires before evaluating new need||Let's add new instances slow and steady|
|Step Scaling Policy||Responds to scaling needs with more sophistication and logic||"AGG! Add ALL the instances!"|
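The idea behind target tracking can be sketched as simple proportional arithmetic. This is a deliberate simplification and not AWS's actual algorithm; the function name, the proportional formula and the `min_size`/`max_size` defaults are assumptions made for illustration.

```python
import math


def target_tracking_desired(current_instances, current_metric, target_metric,
                            min_size=1, max_size=10):
    """Estimate the capacity that would bring the average metric back to target.

    Assumes load is spread evenly, so the per-instance metric scales
    inversely with the number of instances.
    """
    raw = math.ceil(current_instances * current_metric / target_metric)
    # Stay within the group's configured bounds.
    return max(min_size, min(max_size, raw))


scale_out = target_tracking_desired(4, 90, 70)   # CPU at 90%, target 70%
scale_in = target_tracking_desired(4, 35, 70)    # CPU at 35%, target 70%
capped = target_tracking_desired(8, 140, 70)     # would need 16, capped at max
```

With four instances at 90% CPU and a 70% target, the estimate rounds up to six instances; at 35% CPU it drops to two.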
There is a "cooldown" period for EC2:
API used to control scaling for resources other than EC2 like Dynamo, ECS, EMR. Why? Provides a common way to interact with the scalability of other services.
Provides centralized way to manage scalability for whole stacks; Predictive scaling feature. Why? Console that can manage both of the above from a unified standpoint.
Provides a holistic way to scale and it provides some high-level scaling strategies that are phrased in business terms, but if you want you can still get into the details.
|Target Tracking Policy||Initiates scaling events to try to track as closely as possible a given target metric||"I want my ECS hosts to stay at or below 70% CPU utilization"|
|Step Scaling Policy||Based on a metric, adjusts capacity given certain defined thresholds||"I want to increase my EC2 Spot Fleet by 20% every time I add another 10k connections to my ELB"|
|Scheduled Scaling Policy||Initiates scaling events based on a predefined time, day or date||"Every Monday at 0800, I want to increase the Read Capacity Units of my DynamoDB Table to 20k"|
Scaling can be based on SQS. A Lambda function can check capacity and emit a custom metric (used with a CloudWatch alarm).
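One commonly described choice for that custom metric is "backlog per instance": queue depth divided by the number of running instances, compared against the backlog each instance can absorb within an acceptable latency. The sketch below shows only the arithmetic; the function names and numbers are hypothetical, and reading the queue depth or publishing the metric is left out.

```python
import math


def backlog_per_instance(queue_depth, running_instances):
    """Messages waiting per running instance (the custom metric)."""
    return queue_depth / max(running_instances, 1)


def acceptable_backlog(max_latency_s, avg_processing_s):
    """How many queued messages one instance can clear within the latency budget."""
    return max_latency_s / avg_processing_s


def instances_needed(queue_depth, max_latency_s, avg_processing_s):
    """Instances required so that each one stays within its acceptable backlog."""
    return math.ceil(queue_depth / acceptable_backlog(max_latency_s, avg_processing_s))


# 300 messages waiting, 10 instances, 0.5 s per message, 10 s acceptable latency.
current = backlog_per_instance(300, 10)
target = acceptable_backlog(10, 0.5)
needed = instances_needed(300, 10, 0.5)
```

Here each instance has 30 messages waiting but can only absorb 20 within the latency budget, so the group would need to grow from 10 to 15 instances.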
Can be used to dynamically scale based on load and calculating expected capacity.
Without dynamic scaling, you can just use the data to adjust your own scaling policies.
You can also opt-out of this if you don't want AWS collecting this data.
There are different "flavours" of Kinesis (like video streams).
For the exam, focus on data streams. With Kinesis, we can even do analytics then and there.
Firehose also allows us to automatically send it to "landing spaces" (if we don't have to process it then and there).
The axes (x, y) illustrate (throughput, size), where throughput consists of read/write capacity units and size consists of "max item size (400KB here)".
In the example given for the "hot key" issue, the problem appears when the WCU load lands on one particular partition because you used an unbalanced partition key such as "date".
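The hot key problem can be illustrated by hashing partition keys onto a fixed number of partitions. This is a toy model: DynamoDB's internal hash function and partition count are not public, so MD5 and four partitions are stand-in assumptions, and the key values are made up.

```python
import hashlib
from collections import Counter


def partition_for(key, partitions=4):
    """Map a partition key to a partition number via a stand-in hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % partitions


def writes_per_partition(keys, partitions=4):
    """Count how many writes land on each partition."""
    return Counter(partition_for(k, partitions) for k in keys)


# 1000 writes that all use today's date as the partition key.
date_keys = ["2019-06-01"] * 1000
# 1000 writes that use a high-cardinality key instead.
order_keys = [f"order-{i}" for i in range(1000)]

hot = writes_per_partition(date_keys)
spread = writes_per_partition(order_keys)
```

With the date key every write lands on a single partition, so that partition's write capacity becomes the bottleneck while the others sit idle; the high-cardinality key spreads the same load across all partitions.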
SNS useful for when we need several processes to run in parallel. A great way to achieve loosely-coupled architecture.
A standard queue does not guarantee FIFO.
A set of apps that AWS provides for you to fork or use as a basis for your own applications.
Similar, but serverless framework supports multi-cloud.
A service designed to hook up various event sources, apply some rules and then pass it to other targets.
Why use it? Primarily designed to link AWS and 3rd party applications ie ZenDesk, OneLogin, PagerDuty etc.
There is still a need for batch processing. AWS Batch helps with this.
Tool to help batching on EC2 instances.
You can create a compute environment: managed or unmanaged, spot or on-demand, vCPUs.
Then you can create a Job Queue with a priority and assign it to a compute environment.
You then create a job definition: script or JSON, env vars, mount points, IAM role, container image, etc.
Finally, you schedule the job.
Elastic MapReduce is not one product. It is a collection of open source projects.
EMR helps to make this collection more of a "push button".
At the core is Hadoop HDFS and Hadoop MapReduce. HDFS is the distributed file system, MapReduce handles the processing and ZooKeeper handles resource co-ordination.
Oozie is a workflow framework. Pig is a scripting framework. Hive is a SQL framework for the Hadoop landscape.
Mahout is for ML, HBase is Columnar Datastore for storing Hadoop data.
Flume is used to ingest application and sys logs. Sqoop facilitates import of data from our databases/sources.
Laying over the top is Ambari which is for management and monitoring.
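For a feel of the MapReduce processing model at the core of this stack, here is a minimal single-process word count sketch. It is not Hadoop code; the `map_phase`, `shuffle` and `reduce_phase` names and the sample lines are made up, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle(mapped))
```

Each map call only needs its own line and each reduce only needs one key's values, which is what lets the framework distribute the work.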