################################
Detection
- describe how antivirus detects threats
- define Intrusion Detection System (IDS)
- identify how Amazon GuardDuty detects threats

###### Security lifecycle: Detection
Prevention -> Detection -> Response -> Analysis

Detection: Antivirus, IDS

###### Antivirus software
The threat of malware
- designed to cause harm by interrupting one of CIA triad elements
  - Confidentiality, Integrity, Availability
- Types of malware
  - Worms
  - bots
  - ransomware
  - viruses
  - backdoors
  - rootkits
  - spyware
  - adware
- Infection methods
  - Untrusted websites
  - Emails
  - Removable devices

## What is antivirus software?
- specialized program that prevents, detects, and removes malware
- built in to OS, or third-party
- use malware signature definitions
- scans memory and file system for matches against malware definitions
- removes malware
* must regularly update malware definition files

## Intrusion detection system (IDS)
- detects security threats and generates alerts
- uses different types of threat detection mechanisms
  - anomaly-based detection
  - signature-based detection
- can be hardware-based or software
- different types depending on where IDS is installed

anomaly - compares current traffic pattern or activity against established baselines

signature - monitor and analyze traffic for known patterns of attack

## Network-based IDS and host-based IDS
Network-based (NIDS)
- monitor network traffic, detect threats, raise alerts
- installed on the network

Host-based (HIDS)
- monitor logs and critical files on the server, detect threats, raise alerts
- installed on the server

########
## Amazon GuardDuty
- detects unauthorized and unexpected activity by analyzing and processing data from logs
  - AWS CloudTrail event logs
  - VPC flow logs
  - DNS logs

################################
## AWS CloudTrail
Provides auditing, security monitoring, and operational troubleshooting.
- auditing, compliance monitoring, and governance tool
- under Management and Governance

How does it work?
- an activity happens in account
- CloudTrail captures and records it as an event

Finding your CloudTrail log files
- delivers your log files to an S3 bucket you specify
- stored as gzipped JSON (.json.gz)

Benefits
- User and resource activity
- simplified compliance
- always on
- security automation
- analysis and troubleshooting
- increase visibility into user and resource activity

Best Practices
- turn on CT log file integrity validation
- aggregate log files to single S3 bucket
- ensure CT enabled globally
- restrict access to CT S3 buckets
- integrate with CloudWatch

################################
## AWS Config

## Security configuration challenge
Q: How can you check and ensure that the resources in your acct comply with security policies?
Ex. password compliance, S3 buckets not public, private EC2 instances have no public IP

A: create a rule, and AWS Config alerts you when rule is violated

## AWS Config explained
- service for assessing, auditing, and evaluating the configuration of your AWS resources
- resource inventory, config history, config change notifications

## AWS Config uses
- compliance auditing
- security analysis
- resource change tracking
- troubleshooting

## AWS Config management capabilities
- retrieve an inventory of AWS resources
- discover new and deleted resources
- record config changes continuously
- get notified when configs change

## How Config works
1) change occurs in AWS resource
2) AWS Config records, normalizes, and stores changes to an S3 bucket
3) AWS Config automatically evaluates changes against rules
4) send config and rule compliance change notifications to SNS topic
5) you receive notification
6) view change history and rule compliance results in console, or use API to access info programmatically

## AWS Config security capabilities
- helps meet security and compliance objectives
- monitor resource usage activity and
config to detect vulnerabilities
- continuously evaluate configurations of resources against AWS Config rules
  - security prevention rules
  - compliance rules
- help troubleshoot config issues

## AWS Config rules
- represent desired configuration for a resource, evaluated against config changes on resource

## Managed rules
- predefined rules that are used to eval whether resources comply with common best practices

Ex:
- passwords meet requirements
- S3 bucket accessible
- EC2 instance no public IP
- EBS volumes encrypted

################################
# AWS Trusted Advisor
- 5 categories of recommendations that Trusted Advisor produces
- security features of Trusted Advisor
- Interpret Trusted Advisor recommendations

## Best practices or checks in 5 categories
1) Cost optimization
2) Performance
3) Security
4) Fault tolerance
5) Service limits

## Features
- Notifications
- Access management
- AWS Support API
- Action links
- Recent changes
- Exclude items
- 5-min refresh

## Security checks (free)
1) IAM use
2) MFA
3) Security groups - specific ports unrestricted
4) S3 bucket permissions
5) EBS public snapshots
6) RDS public snapshots

Check statuses
- Green - no problem
- Yellow - investigation recommended
- Red - action recommended

## Activity
Interpret Trusted Advisor recommendations

################################
# Response
- list steps in incident investigation process
- describe purpose of Business Continuity Plan (BCP) and Disaster Recovery Plan (DRP)

Prevention -> Detection -> Response -> Analysis

Event response
Event Management
Event Remediation

Event response analogy - getting a flat tire while driving

## Process and planning for event response
Automatic notification
- Event occurs

Admin gives manual notifications
- Investigate event
- Respond and remediate
- Notify dependent stakeholders
- Act to prevent this type of event from happening again in the future

## Understanding the Business Continuity Plan (BCP) and Disaster Recovery Plan (DRP)
- prepare
for any event that will disrupt the way the business normally works

BCP - how to run the business in a reduced capacity

DRP - how to recover from an outage or loss and return to a normal situation as quickly as possible

Business Continuity Plan: preventative and proactive management tool
- list different disaster scenarios
- list actions to keep business running
- is not activated during an outage

## Disaster Recovery Plan: strategy that helps business recover from disasters and unplanned incidents
- Primary goal - restore business functionality quickly and with minimum impact
- Security goal - do not lower level of controls or safeguards in place
- Follow-on goal - prevent threat, exploit, or disaster from happening again

Two key parameters
- Recovery time objective (RTO) - how quickly does business need to be back up?
- Recovery Point Objective (RPO) - how much time and data can business afford to lose?

Disaster Recovery: Understanding RTO and RPO

Recovery time objective (RTO) - max acceptable delay between interruption of service and restoration of service. acceptable length of service downtime

Recovery point objective (RPO) - max acceptable time since last data recovery point. how much data will be lost and how much will be retrieved

RPO compared to RTO
- RPO - how much data can you afford to lose? focus on data
- RTO - how quickly do you need to recover IT infrastructure to maintain business continuity? system downtime

(timeline diagram: Data backup -> Data loss (RPO) -> Disaster -> System downtime (RTO) -> Work recovery time (WRT); Maximum tolerable downtime (MTD) covers RTO and WRT)

Work recovery time (WRT) - recovering or restoring data, testing processes, making system live for production.
Time between systems and resource recovery, and start of normal processing

Maximum Tolerable Downtime (MTD) - sum of RTO and WRT.
Total time a business can be disrupted after a disaster without causing unacceptable consequences from a break in business continuity

MTD = RTO + WRT

MTD value included as part of BCP and DRP

Disaster recovery options
- Backup
- Replication
  - snapshot-based - changed data since last snapshot
  - continuous
- Pilot light - minimal version of env is always running

Cost balancing - tradeoff between speed of recovery and cost

## Activity: Incident response
AWS Trusted Advisor

################################
# Analysis
- define security analysis
- identify tools and processes
- describe different types of testing, monitoring, and logging

Security lifecycle: Analysis

Prevention -> Detection -> Response -> Analysis

Analysis
- Security Analysis
- Monitoring
- Logging

Confidentiality, integrity and availability (CIA)
- Info must be protected to ensure all three

## Security analysis
- Reviewing what happened after a security breach
  - how many security breaches did you experience?
  - How did they happen?
  - how many people did they affect?
  - how do you prevent them from happening again?
Guidelines
- Ensure that each threat yields a better security solution even if no breach occurred
- Have flexibility when considering options to add to the solution
- Maintain a testing environment to test solutions to potential threats

Types of security tests
- external vulnerability assessment
- external penetration test
- internal review of applications and platforms

Root cause analysis (RCA)
> identify origin of security breaches
- steps
  1) describe the issue and what led to it, how, where, what are consequences
  2) go back to baseline, and analyze each event leading up to issue
  3) understand links between events, and which event caused issue - event correlation
  4) create visual representation (diagram or graph) of sequence of events

Risk assessment - likelihood of a threat occurring against a particular asset and the possible impact to that asset if the threat occurs. helps to identify and rank risk
- five steps
  1) identify threats
  2) identify vulnerabilities
  3) determine likelihood
  4) determine impact
  5) determine risk

Risk response strategies
* Risk avoidance - stop doing the risky activity
* Risk transference - assign responsibility for risk to another party
* Risk mitigation - implement control to reduce risk
* Risk acceptance - do nothing to reduce risk, but monitor and plan a response

Monitoring and logging benefits
- More effective IT governance
- Regulatory compliance
- SLA performance validation and compliance
- Better management oversight
- More effective change control
- Support corporate information security
- More consistent problem identification and resolution

Monitoring and logging
- Logs - provide data used to examine IT systems and processes
  - both input and output of monitoring
- Monitor logs for
  - changes
  - exceptions
  - other significant events
- Records produced from monitoring become logs for further analysis

Use metrics
- measure the success of the security program
- can be both positive and negative
- examples
  - number of attacks
occurred, stopped
  - money saved
  - IDs created, revoked
  - security awareness training

## Environment monitoring
- Acceptable Use Policy (AUP) - defines how employees or users can be monitored on a company's network
  - at work
  - remotely
  - on mobile devices

Types of monitoring
- Location
  - onsite
  - remote
  - internal or external
  - outsourced
- Resource
  - system
  - network
  - database
  - physical
  - employee
- Usage
  - usage or consumption
  - location
  - access restrictions

AWS monitoring services
- CloudWatch - monitors resources and applications in AWS Cloud and on-premises
- AWS Config - record and evaluate configurations
- Amazon Managed Service for Prometheus - highly available, secure, managed monitoring for containers
- GuardDuty - intelligent threat detection
- Macie - fully managed data security and privacy service, machine learning and pattern matching to discover and protect sensitive data

Monitoring as a Service (MaaS)
- CloudWatch - anytime, anywhere access to monitoring info

Devices that can be monitored
- router, switch, firewall, wireless access point, printer, VPN, camera, card reader, laptop, phone, tablet, vehicle

Monitoring policy
- What should you monitor? How closely? How often? Who monitors?
- Outsource?
- Data retention monitoring
- access to monitoring tools
- remote monitoring
- who watches the watchers?
  - policy
  - regulations

Retention policy for monitoring
- how long different types of data are maintained
- email, access logs, camera or video, contact info, sales records...
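
A retention policy like the one described above can be written down as data and checked mechanically. The sketch below is only an illustration: the names `RETENTION_DAYS` and `is_past_retention` and every retention period are hypothetical, not values from any AWS service or regulation.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days. Real values come from the
# organization's own policy and any regulations that apply to it.
RETENTION_DAYS = {
    "email": 365,
    "access_logs": 90,
    "camera_video": 30,
}


def is_past_retention(data_type, created_on, today):
    """Return True when a record is older than its retention period."""
    limit = timedelta(days=RETENTION_DAYS[data_type])
    return today - created_on > limit


today = date(2024, 6, 1)
# An access log from 2024-01-01 is 152 days old, beyond the 90-day limit.
expired = is_past_retention("access_logs", date(2024, 1, 1), today)
# An email of the same age is still inside its 365-day limit.
kept = is_past_retention("email", date(2024, 1, 1), today)
```

Keeping the periods in one mapping makes the policy easy to review and change without touching the check itself.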
Monitoring data with Amazon Macie
- use machine learning to help ensure no sensitive data is at risk

## Logging
Logging policy
- define logging policy: what will be logged and how long logs are retained
- ensure logging policy and infrastructure support cohesive and integrated enterprise solution

Protection of log information
- keep logs on the original device, log server, or both
- control access (physical and logical)
- backup and recovery
- retention policy
- check timestamps

AWS logging services
- CloudTrail - tracks user activity and API usage
- Config - record and evaluate configuration of resources
- VPC Flow logs - capture info about IP traffic

################################
## Security Best Practices for Creating an AWS Account
AWS - security *of* the cloud

User - security *in* the cloud

Understanding when to use the root user
- change account settings
- restore IAM user permissions
- change AWS Support plan or cancel AWS Support plan
- activate IAM access
- view tax invoices
- close AWS account
- register as a seller
- configure S3 bucket
- edit or delete S3 bucket

Best practices: Stop using root user

    Create IAM user -> Create IAM group -> Sign in with IAM credentials
                       |-> Give group full admin permissions
                       |-> Add IAM users to this group

Require MFA

    Root user-|
              |--> Require MFA for all account users -> Access AWS console -> use MFA
    Root user-|

Best practices: AWS CloudTrail (log monitoring tool)

CloudTrail tracks API calls, publishes log files to S3 bucket

    Activate CloudTrail -> Grant S3 access to those who require it
    |-> Apply to all Regions
    |-> Specify an S3 bucket where logs will be stored

Best practices: Billing report

Activate billing report, such as AWS Cost and Usage Report (S3 bucket)

    Activate billing report in AWS -> Grant S3 access to those who need it
    |-> use AWS Cost and Usage Report

################################
## AWS Compliance Program
- how laws, standards, and regulations impact security
- identify various regulatory compliance
standards
- describe AWS compliance program

## Regulatory compliance and standards
Goal of security compliance
- security compliance ensures security controls meet regulatory and contractual requirements
- Regulations mandate security controls

Regulatory
- Country
- Industry

Contractual
- SLA - service level agreement
- PLA - project labor agreement
- vary between localities, jurisdictions, and cultures
- create policies to support and enforce compliance

Compliance levels and noncompliance
- Compliance levels vary by authority type. Noncompliance has consequences
- External authority
  - Government or laws - mandatory
  - Open standards - recommended
  - Best practices - optional
- Consequences of noncompliance
  - Government or laws - civil, criminal, or financial
  - Open Standards - financial or participation
  - Best practices - loss of customers, partners, or revenue
- Proper reporting required to prove compliance

National and international cybersecurity standards
- NIST, ENISA, ETSI, ISO, IETF, IEEE, COSO

PCI DSS - Payment Card Industry (PCI) Data Security Standard (DSS), payment card transactions

HIPAA - Health Insurance Portability and Accountability Act of 1996
- how personally identifiable information should be protected

Compliance standards: European Union
- General Data Protection Regulation (GDPR) - enhanced control over personal and private data

Compliance standards: Canada
- Personal Information Protection and Electronic Documents Act (PIPEDA) - how private sector collects, uses, and discloses personal information of clients

Compliance standards: Russia
- Russian federal law on personal data
  - individual must provide consent
  - can revoke previously granted consent
  - transfer of data outside Russian Federation requires protection in destination country

Compliance standards: United States
- many compliance requirements
- Dodd-Frank, Gramm-Leach-Bliley, etc

## AWS compliance program
AWS risk and compliance program
- provide information about AWS controls
- assist
customers in documenting security compliance framework

Three components
- AWS business risk management
- AWS control environment and automation
- AWS certifications and attestations

AWS business risk management
- perform risk assessments and risk monitoring of key AWS functional areas
- capture business risk management goals in a business plan that is reevaluated at least biannually
- goals include
  - id and remediate risks
  - maintain register of known risks
  - create and maintain security policies
  - provide security training to AWS employees
  - perform application security review
- also uses independent security firms to perform threat assessments

AWS control environment and automation
- integrate security and compliance requirements during design and development of each AWS service
- establishes control environment that
  - includes people, policies, processes, and control activities
  - secures delivery of AWS service offerings
- uses automation to eliminate potential process deviations
- integrate practices identified by industry-leading cloud bodies

AWS certifications and attestations
- regular third-party attestation audits to provide assurance that control activities are operating as intended
- audits performed against
  - global and regional security frameworks
  - customer contract and government regulatory requirements
- results of audits documented and available in AWS Artifact portal

Customer compliance responsibilities
- customers responsible for maintaining adequate governance over entire IT control environment
- customers need to:
  - understand required compliance objectives and requirements
  - establish control environment that meets those objectives and requirements
  - understand validation required based on organization's risk tolerance
  - verify operating effectiveness of control environment

################################
## AWS Security Resources
- explore different types of security resources

AWS account teams
- first point of contact
- guide deployment
- point
toward resources to resolve security issues

AWS Support plans
- basic support plan
  - customer service and communities
  - AWS Trusted Advisor
  - Personal Health Dashboard
- three tiers of additional support
  - Developer support plan
  - Business Support plan
  - Enterprise Support plan

AWS Developer Support
- for users that use AWS services for testing within AWS
- email cloud support associates
- response times
  - general guidance: 24 hours or less
  - system impaired: 12 hours or less
- can only interact with support during business hours
- 7 checks(?)

AWS Business Support
- for users or businesses with production workloads within AWS
- support avail 24/7 by phone, chat, or email
- response times
  - general: 24 hrs or less
  - system impaired: 12 hours or less
  - production system impaired: 4 hours or less
  - production system down: 1 hour or less

AWS Enterprise Support
- business-critical workloads
- less than 15 minutes for business-critical outages
- 24/7 phone, chat, email
- dedicated technical account manager (TAM)
- response times
  - general: 24 hrs or less
  - system impaired: 12 hours or less
  - production system impaired: 4 hours or less
  - production (internal) system down: 1 hour or less
  - business-critical (customer server) down: 15 minutes or less

On-ramp - 30 minutes or less instead of 15 minutes

API Support - Business or Enterprise, On-ramp

AWS Professional Service and Partner Network

Partner Network (APN) - group of cloud software and service vendors that has certified APN Partners worldwide

APN Partners
- help customers implement and manage deployment
- develop security policies
- meet compliance requirements
- include system integrators and managed service providers

APN Technology Partners
- provide software tools and services hosted or run on AWS
- include independent software vendors (ISVs) and providers of SaaS

AWS advisories and bulletins
- provide info on current vulnerabilities and threats
- work with experts to address
- report abuse
- report vulnerabilities
- conduct penetration tests

AWS Auditor Learning Path
- help understand how internal operations gain compliance
- visit compliance website for
  - recommended training
  - self-paced labs
  - auditing resources

AWS security benefits
- only commercial cloud service that has its services vetted and approved for top-secret workloads
- securely scale infrastructure
- automate security tasks
- integrated security services
- large infrastructure env prebuilt for customer
- AWS security is strategic and focuses on preventing, detecting, responding, and remediating

Support plans - https://aws.amazon.com/premiumsupport/plans/

Describe SysOps in the cloud
- automated and repeatable deployments
- how IAM provides security over AWS resources
- CLI features

## Systems Operations on AWS
SysOps - deployment, admin, and monitoring of systems and resources in automatable and reusable manner
- reduce errors
- real-time visibility through monitoring

Systems operations: Responsibilities
- SysOps professionals involved in many, or all, facets of delivering IT solutions
  - Build, test, deploy, monitor, maintain, safeguard

SysOps in the cloud
- automate development, testing, and deployment of complex IT operations
- repeatable deployment
- creation of self-describing systems
- build well-tested, secure systems
- Linux shell scripts, Python or Ruby app, C# app, template format (AWS CloudFormation)

Project Introduction - Creating a Troubleshooting Knowledge Base
- Describe common technical challenges when users deploy, upgrade, and maintain Cloud deployments
- Explain how to overcome specific technical challenges by confirming and adjusting deployment configs
- Present troubleshooting techniques to stakeholders

## Troubleshooting Knowledge Base
Spreadsheet to organize and document issues, and record resolution steps
- Issue number
- Categories
- Issue description
- Symptoms
- Root Cause Analysis (RCA)
- Resolution procedures
- Helpful tools or resources
- Comments

## AWS Identity and Access Management (IAM) Review
- Define IAM
- types of security credentials and concepts of users and roles
- best practices
- Centrally manage authentication and access
- who can launch, config, manage and remove resources
- Create users, groups, and roles
- Apply policies to control access
- a principal is a person or app that can make a request for an action or operation

Access to AWS services
- CLI, SDK, Management Console

Security credentials, IAM users and roles

Types of security credentials

| Types of Credentials | Description |
| --- | --- |
| Email | |

# Introduction to Databases
- ID different components of a database
- Relational vs nonrelational
- elements of a well-designed database
- purpose and function of database management system (DBMS)

- Data
- Database
- Data model
- Relational data model
- Schema
- Transactions
- Relational databases
- Nonrelational databases
- Database management system (DBMS)

#### Data and databases
What is data?
- raw bits and pieces of information
- ex. images, words, phone numbers

What is a database?
- collection of data that is organized into files called tables
- tables are logical way of accessing, managing, and updating data

#### Data models
- data model represents logical structure of data stored in database
- data models determine how data can be stored and organized
- examples of data models:
  - Relational
  - Semi-structured
  - Entity relationship
  - Object-based

#### Relational model
Amazon Aurora instance <---> World (database) <---> country (table)

Record (rows), Field (column)

|Code |Name |Continent |Region |
|--- |--- |--- |--- |
|ARG |Argentina |South America |South America |
|AUS |Australia |Oceania |Australia and New Zealand|

#### Schema
- schema defines organization of a database
- based on data model
- describes the elements of a database's design:
  - tables
  - columns
  - relationships
  - constraints

    Country { Code, Name, Continent, Region }
    City { ID, Name, CountryCode, District }

#### Small scale or distributed
- on one computer
- distributed across a company's network
- Aurora instance in AWS Cloud

#### Relational databases
- collection of data items that have predefined relationships between them
- often referred to as SQL database
- requires fixed definition of the structure of the data
- data stored in tables with rows and columns

Main reasons to use relational database
- natively supports SQL
- provides data integrity
- supports transactions

Example relational
- MySQL
- Amazon Aurora
- PostgreSQL
- Microsoft SQL Server
- Oracle

Use cases
- ecommerce
- customer relationship management (CRM): managing interactions with customers
- Business intelligence (BI) tools: Finance reporting and data analysis

#### Relational database examples

#### Nonrelational databases
- database that does not follow relational model
- does not require a fixed definition of the structure of the data
- NoSQL, does not use a table structure to store data

Use cases
- Fraud detection
- Internet of Things (IoT)
- Social Networks

Example NoSQL databases
- Amazon DynamoDB
- MongoDB
- Apache HBase

JSON
document

    [
      {
        "ID": 1024,
        "Name": "Mumbai",
        "CountryCode": "IND",
        "District": "Maharashtra"
      }
    ]

#### Pros and cons of relational and nonrelational databases
Relational (SQL)

Pros:
- known and reliable technology
- simple-to-write complex queries
- well-known SQL language
- well-supported transactions

Cons:
- use vertical scaling
- include a fixed schema

Nonrelational (NoSQL)

Pros:
- flexible schema
- good fit for storing and fast retrieval of massive amounts of data of different types
- horizontal scaling
- good fit for hierarchical data

Cons:
- relatively new technology
- do not guarantee data integrity
- not a good fit for complex queries or transactional applications

#### DBMS
- software or database as a service (DBaaS) that provides database functionality
- used mainly for
  - creating databases
  - inserting data into database
  - store, retrieve, update, or delete data
- primary benefit of DBaaS is to avoid cost of installing and maintaining servers

Two variations of DBMSs
- single-user DBMS applications
  - Microsoft Access
- multi-user DBMS applications
  - Oracle, Microsoft SQL Server, MySQL, IBM Db2

Locations
- On-premises (data center)
- In the cloud (virtualized data center)

#### DBaaS
- hosted by third-party providers
- reduced cost
- fully managed
- faster

Examples
- Amazon RDS - manages common relational admin tasks
- Amazon Aurora - part of RDS, fully managed relational engine
- Amazon DynamoDB - NoSQL

# Data Interaction and Database Transaction
- different ways to interact with relational database
- define characteristics of a transaction

- database analyst
- database administrator
- transaction
- atomicity, consistency, isolation, and durability (ACID)

#### Data sharing
- making data available to multiple users

## Database interaction: Database roles
##### Roles interacting with relational databases
- Application Developer - create applications that populate and manipulate data, according to application's functional requirements
- end user - uses reports created from
information in database
- data analyst - collect, clean, and interpret data
- database administrator - design, implement, administer, and monitor data in database systems
  - ensure consistency, quality, and security

Application developer - develop and test applications

End User - occasionally interacts directly, if they have knowledge of SQL

Data analyst - enters SQL commands, views and manipulates data directly, mainly uses SELECT command

Database Administrator - manage all components, use all SQL commands

## Data interaction models
#### Interacting with relational databases
- client-server
- three-tier web application

#### Client-server interaction
1) users use computers and devices that run client application, which uses SQL to request data
2) apps use SQL sent to server to communicate with database
3) server runs DBMS, which receives requests, processes SQL, and returns response

#### Three-tier web app interaction
- Web browser - user
- Web server - presentation tier
- Application server - application tier
- Database server - data tier

#### Embedded SQL in application code
- application contains SQL command that user requires
- developer embeds SQL statements in application code
- application is installed on user computer or application server

## Transactions in Databases
- collection of changes that must be performed as a unit
- ex.
account table, keeps track of account balances for customers
  - reduce balance of checking acct row by $100
  - increase balance of savings account row by $100
- two change operations must either both succeed or both fail to preserve integrity of database
- transaction is not successful unless each of its operations is successful
- transaction - logical unit of work

#### Status of transaction

    Begin --> Active --> Partially committed --> Committed
                    \       /                       |
                     Failed --> Aborted --------> End

#### Transactions use cases
- run a set of operations so that database never contains result of partial operations
  - if one operation fails, restored to original state
  - if no error occurs, full set of statements changes the database
- provide isolation between programs that access a database simultaneously
  - if isolation does not happen, outcomes could prove to be incorrect

#### Properties of transactions
- atomicity - all at once or not at all
- consistency - any changes will not violate integrity of database, including any constraints
- isolation - transactions isolated so they do not interfere with other transactions
- durability - when transaction is committed, change is permanent

## Creating Tables and Learning Different Data Types
- how to create a new table in a database
- how to use data types when creating a table

- Data manipulation language (DML)
- Data definition language (DDL)
- Data control language (DCL)
- Predefined data types
- Numeric data types
- Character string types
- Primary Key (PK)
- Foreign Key (FK)

## Anatomy of a relational database
Database: World

Tables
- like a Spreadsheet
- collections of related information about a particular concept
- ex.
country, city, countrylanguage Fields or attributes (columns) - one column contains one type of information (ingeter, text, monetary value) Records (rows) - one row contains information for one entity in the table ## SQL - perform many of the necessary actions on a database SQL sublanguage groups - Data manipulation language (DML) - use DML to add, change, or delete data in a table - Data definition language (DDL) - define and maintain objects in database (schema). schema includes tables, columns, and data types - Data control language (DCL) - control access to data in the database #### DML - view, change, and manipulate data - select, update, insert, delete - typically used by data analysts, report authors, and programmers who write client applications Statements - SELECT, INSERT, UPDATE, DELTE #### DDL - create and define database and objects in it - create, delete tables - typically use by DBA and programmers Statements - CREATE, ALTER TABLE, DROP #### DCL - control access to data in database - include commands to grant or revoke db permissions - used by dba and programmers Statements - REVOKE, GRANT ## Basic SQL elements #### Predefined data types |SQL Data Type |Description |Example| |INT |Integer |120000| |CHAR |Fixed-length char string #### Identifiers - represent names of objects that user creates, in contrast to language keywords or statements - capitalize language keywords and commands, define identifiers in lowercase - different DBMS handle capitalization conventions differently CREAT TABLE city ( id INTEGER NOT NULL PRIMARY KEY, name VARCHAR(20) DEFAULT NULL, countrycode VARCHAR(25) NOT NULL, district INTEGER NOT NULL ); #### Constraints on data - enforce limits on the type of data that can go in the table - NOT NULL - UNIQUE - DEFAULT - define rules that permit or limit which values can be stored in columns - constraints declared when table is created - limit kind of data that can be entered - purpose is to enforce data integrity #### Reserved terms 
and key words
- SQL keywords or symbols that have specific meanings when processed
- do not use reserved terms in names
- symbols: #, ;, :, @
- key words: ADD, CLOSE, DATABASE, EXISTING

## Tables
#### Naming tables
- purposeful naming conventions
- certain factors to drive naming conventions
  - rules and limitations that DBMS imposes
  - naming conventions that the organization adopts
  - clarity

#### CREATE TABLE statement
- table - logically organized unit of data storage in SQL
- CREATE TABLE city (...);
- CREATE TABLE IF NOT EXISTS city (...);

#### Columns
- each column has specific data type
- some data types require one or more parameters. ex. max length
- in CREATE TABLE statement, separate column definitions with a comma

column_name DATA_TYPE [(length)] [NOT NULL] [DEFAULT value]

#### Primary keys (PK) and foreign keys (FK)
- primary key - special column that has unique value for each row, and uniquely identifies the row
- foreign key - special column that holds primary key value from another table, creates relationship between two tables

#### Referential integrity
- every non-NULL foreign key value matches an existing primary key value
- ex.
city table, country table
  - primary key is country code in country table, country code is foreign key in city table

#### DROP TABLE statement
- permanently remove the table and its data from the database

## Character string data types
#### Character string
- data type described by a character string data type descriptor

data type and description
- CHAR(length)
  - character string with fixed length
  - values must be enclosed in single quotation marks
- VARCHAR(length)
  - variable length character string, max length fixed
- CLOB(length)
  - Character Large Object (CLOB)
  - large character string or text data that can have length on the order of GB
  - CLOB usually stored in a separate location that is referenced in the table

## Numeric and date data types
#### Numeric types
- INTEGER - integer, min and max depend on DBMS
- SMALLINT - same as INTEGER, but smaller range of values
- BIGINT - large range of values
- DECIMAL(p, s) - exact decimal number with precision p (total digits) and scale s (digits after the decimal point)
- FLOAT(p) - float with precision p. precision >= 1, max depends on DBMS
- REAL - same as FLOAT, but DBMS defines precision

#### Date and time data types
- DATE - date, ex. yyyy-mm-dd
- TIME - time of day without time zone, ex. hh:mm:ss
- TIMESTAMP - moment in time indicated by date and time, ex. yyyy-mm-dd hh:mm:ss

## Inserting Data into a Database
- import a CSV file into a table
- list common reasons for cleaning data before importing into a database
- insert rows into existing table
- INSERT INTO, DESCRIBE, NULL

#### What is a csv file?
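
As a concrete picture of the format this section covers, here is a minimal sketch that parses a tiny comma-separated sample. It is written in Python only because a CSV file is plain text rather than SQL. The sample rows, the `SAMPLE` constant, and the `parse_csv` helper are all hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical sample: a header row followed by two data rows.
# Every line carries the same sequence of fields, separated by commas.
SAMPLE = "id,name,countrycode\n1,Exampleville,EXA\n2,Sampleton,SAM\n"


def parse_csv(text):
    """Split comma-separated text into a header row and a list of data rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first line holds the column names
    rows = list(reader)    # remaining lines hold the data
    return header, rows


header, rows = parse_csv(SAMPLE)
print(header)
for row in rows:
    print(row)
```

Every value comes back as a string, so checking that each row has as many fields as the header is a cheap sanity test before importing the file into a table.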
- text file in which info is separated by commas
- can be opened in plain text
- format
  - each line contains same sequence of data
  - each data point is separated by a comma
- most commonly used format to import or export data

## Importing and exporting data
#### Importing a CSV
- verify that csv file has data that matches number of columns of table and type of data in each column
- create table in MySQL with table name that corresponds to csv file you want to import
- import using the LOAD DATA statement
  - if first row contains column headers, use IGNORE 1 ROWS
  - if rows are terminated by newline, use LINES TERMINATED BY '\n'
- ex.

LOAD DATA INFILE 'something.csv'
INTO TABLE city
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS;

#### Exporting a CSV
- ex.

SELECT id, name, countrycode
FROM city
INTO OUTFILE '/tmp/mysqlfiles/city.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';

## Cleaning data
#### Why clean data?
- as changes are made, issues can arise due to disorganization or errors in the data
- reasons for cleaning
  - increased productivity
  - improved data quality
  - fewer errors
- to combat issues, use cleaning functions
  - LEFT, RIGHT, TRIM - select only certain elements of strings, and remove certain characters
  - CONCAT - combine strings from several columns
  - LOWER - force to lowercase
  - UPPER - force to uppercase

## DESCRIBE statement
- description of specified table or view
- to use INSERT statement, important to know how the table you are inserting into is formatted
- ex. DESCRIBE country;

## INSERT statement
- INSERT INTO - fundamental to populating table with data
- insert a single record or multiple records
- referred to as a data manipulation language (DML) command
- order of the columns is important

INSERT INTO tableName (col_1, col_2,... col_n) VALUES ('val_1', 'val_2',...
'val_n');
- when inserting a row, specify the columns where the data will go
- when inserting values, enclose character or date values in single quotation marks ('')

#### Syntax for INSERT statement
INSERT INTO country VALUES ('ANT', 'Netherlands Antilles', 800.0);

## NULL statement
- used as a placeholder or to represent a missing value. improves readability and clarifies the meaning and actions of conditional statements
- INSERT can insert a NULL value into a column. can insert a NULL value into an int column, provided that the column does not have a NOT NULL constraint

INSERT INTO tableName (col_1) VALUES (NULL);

- RDBMS can accept optional column values
- NULL means not applicable or unknown, and is different from zero or blank
- NULL means no value exists for that field
- NULL not equal to anything, not even itself
  - 5 + NULL = NULL
  - NULL = NULL evaluates to NULL (unknown), never TRUE
- if a column is created with a NOT NULL clause, you must put in a value when you do an INSERT. otherwise get an error

## Selecting Data from a Database
- demonstrate how to use SELECT to retrieve data from a database
- identify correct syntax of SELECT statement
- demonstrate how to select data from certain columns or from all columns
- demonstrate how to use WHERE clause to request only certain rows
- SELECT, WHERE, FROM, comments

## SELECT statement
#### SELECT keyword
- SELECT id, name, countrycode FROM city;

#### How it works
- query is processed out of order.
query will pull all data from specified table and then work through each clause
- SELECT id, name, countrycode FROM city WHERE countrycode='BRA';

statement order of operations
1) FROM the city table, get all the data
2) WHERE the countrycode is BRA, keep the row (ignore the others)
3) SELECT the specified columns, and ignore the others

#### Using the SELECT statement

## SQL SELECT statement syntax structure
#### Syntax
- required clauses - SELECT, FROM
- optional clauses - WHERE, GROUP BY, HAVING, ORDER BY

#### SELECT statement considerations
- enclose literal strings, text, and literal dates in single quotation marks ('')
- for readability, capitalize SQL keywords
- depending on database engine or configuration, data values that you provide in conditions may be case sensitive

#### Different ways to SELECT columns
- clause followed by the item or items being acted on. ex. SELECT followed by column names
- brackets ([]) enclose optional parameters
- with SELECT, must specify one or more columns or use * to request all columns

#### Selecting all columns
- SELECT * FROM city;
- returns all columns from city table in the order they appear in the table

## Optional Clauses
- WHERE, GROUP BY, HAVING, ORDER BY

#### WHERE
SELECT id, name, countrycode FROM city WHERE countrycode='BRA';
- request only certain rows from a table

#### GROUP BY
SELECT continent, COUNT(*) FROM country GROUP BY continent;
- select rows from country table, group rows by continent, and count the number of rows in each group
- use column identifier to organize the data in the result set into groups

#### HAVING
SELECT continent, COUNT(*) FROM country GROUP BY continent HAVING COUNT(*) > 1;
- select rows from country table, group rows by continent, count number of rows in each group, and keep only groups with more than one row
- use with GROUP BY to specify which groups to include in results

#### ORDER BY
SELECT id, name, countrycode FROM city ORDER BY id;
- get data from city table, order all rows by id.
after finding rows that you're searching for, return only id, name, and countrycode columns
- sort query by one or more columns in ascending or descending order

## Comments
- single line - ex. -- Display the table structure
- inline - ex. -- not ID
- multiple-line - ex. /* comment */

## Performing a Conditional Search
- search using conditions
- search for range of values and NULL values
- search based on string patterns
- WHERE, comparison, arithmetic, logical, wildcard, column alias, NULL value

## Simple search conditions
- SELECT to retrieve data
- WHERE limits data retrieved
- WHERE applies search condition to each row of SELECT
- use operators to specify data the query includes

#### WHERE clause
SELECT continent, surfacearea, population, gnp FROM country WHERE name='Ireland';

## Operators
#### Types of operators
- arithmetic
- comparison
- logical
- can be used in SELECT, INSERT, UPDATE, and DELETE statements

## Arithmetic operators and comparison operators
#### Arithmetic operators
- addition, subtraction, multiplication, division, modulus (remainder)

#### Comparison
- =, !=, <>, <, <=, >, >=

#### Addition arithmetic operator
SELECT name, lifeexpectancy, lifeexpectancy+5.5 FROM country WHERE gnp > 1300000;

#### Subtraction

#### Division
SELECT name, region, population, surfacearea, population/surfacearea FROM country WHERE population/surfacearea > 3000;

#### Modulus

## Logical operators
#### Logical operators explained
- AND, OR, IN, LIKE, BETWEEN, NOT

#### AND operator
SELECT name, district, population FROM city WHERE countrycode='IND' AND district='Delhi';

#### OR operator

#### IN
SELECT name, district, population FROM city WHERE district IN ('Delhi', 'Punjab', 'Kerala');

#### LIKE operator using the % wildcard
SELECT name, district, population FROM city WHERE district LIKE 'West%';
- % wildcard, substitute zero or more characters in a string

#### LIKE using _ wildcard
SELECT countrycode, name FROM city WHERE countrycode LIKE '_B_';
- _ wildcard, used with
LIKE, substitute a single character in a string

#### BETWEEN
SELECT countrycode, name, district, population FROM city WHERE population BETWEEN 500000 AND 505000;

#### NOT
SELECT name, district, population FROM city WHERE countrycode='CAN' AND district NOT IN ('Ontario', 'Alberta');
- NOT can be used with IN, LIKE, and BETWEEN

#### Operators can be used in flexible ways
WHERE countrycode='CAN' AND district NOT IN ('Ontario', 'Alberta');

WHERE countrycode='CAN' AND district <> 'Ontario' AND district <> 'Alberta';

## Operator precedence
- first to last
1) parentheses
2) mult and div
3) add and subtract
4) comparison
5) NOT used with IN, BETWEEN, and LIKE
6) AND
7) OR, BETWEEN, IN, and LIKE

#### Using parentheses to enforce operator precedence

## Aliases
- assign temporary name to table or column within SQL query
- alias exists only while SQL statement is running
- useful for assigning names to obscurely named tables and columns
- assign meaningful names to query output that uses arithmetic operators
- can be specified in SQL statement by using AS keyword
- if spaces in alias name, use quotation marks

SELECT name, lifeexpectancy, lifeexpectancy+5.5 AS newlifeexpectancy FROM country WHERE gnp > 13000000;

#### Including spaces in alias
AS 'new life expectancy'

## NULL values
use NULL to represent absence of value for a data item
- NULL values cannot be compared to each other using comparison operators
- NULL values are not equal to one another
- use IS NULL and IS NOT NULL when working with NULL values in a WHERE clause
- tables can be designed so that NULL values are not allowed

#### Example of querying for NULL values
SELECT name, lifeexpectancy FROM country WHERE lifeexpectancy IS NULL AND name NOT LIKE '%Island%';
- use IS NULL and IS NOT NULL when working with NULL values

#### NULL values can affect query results
SELECT name, lifeexpectancy, lifeexpectancy+1 FROM country WHERE lifeexpectancy IS NULL AND name NOT LIKE '%Island%';
- adding +1 to lifeexpectancy column
still returns a NULL

## Working with Functions
- identify built-in functions
- examine DATE functions that can be used in calculations
- calculate data using aggregate functions
- manipulate string values
- aggregate, conversion, date, mathematical, control flow, DISTINCT, COUNT, character strings

## Functions
#### Built-in functions
- aggregate, conversion, date, string, mathematical, control flow, window

#### Example syntax
- SELECT CURRENT_DATE(); -> 'YYYY-MM-DD'

#### Another example syntax
- DATE_ADD() - adds a time or date interval to a date and returns the resulting value

Syntax - DATE_ADD(date, INTERVAL value addunit);

Example - SELECT DATE_ADD('YYYY-MM-DD', INTERVAL 3 DAY);

## Aggregate functions
- AVG, COUNT, MAX, MIN, SUM

#### Example syntax
- SELECT COUNT(*) AS 'Total Number of Rows' FROM countrylanguage;
- SELECT AVG(LifeExpectancy) FROM country;

## DISTINCT (different) keyword
- SELECT DISTINCT CountryCode, District FROM city;

#### DISTINCT in a COUNT function
- SELECT COUNT(DISTINCT CountryCode) AS Unique_Country_Codes FROM city;

## Character strings and string functions
#### String function: CHAR_LENGTH()
- SELECT CHAR_LENGTH('District');

#### INSERT()
- SELECT INSERT("Population", 1, 2, "Mani"); -> Manipulation
- (string to be modified, position where to insert second string, number of characters to replace, string to insert)

#### Leading and trailing spaces in a string
- Extra spaces in a string can cause issues when querying for specific data

#### TRIM functions: RTRIM() and LTRIM()
- RTRIM - remove blanks to the right of a string
- LTRIM - remove blanks to the left

## Organizing Data
- ORDER BY - sort data by a specific column in ascending or descending order
- GROUP BY and HAVING - group data and filter groups
- sorting, ORDER BY, GROUP BY, HAVING

#### Organizing data using SQL

## Sorting and ORDER BY keyword
#### Query with no sorting
SELECT name, continent, surfacearea FROM country WHERE surfacearea >= 50000000;

#### Sorted in ascending order
SELECT name, continent,
surfacearea FROM country WHERE surfacearea >= 50000000 ORDER BY surfacearea ASC;

#### Sorted in descending order

#### Multiple sort operations
SELECT name, continent, surfacearea FROM country WHERE surfacearea >= 50000000 ORDER BY continent ASC, surfacearea DESC;

#### Query output by using implicit columns in sort operation
SELECT name, continent, surfacearea FROM country WHERE surfacearea >= 50000000 ORDER BY 2 ASC, 3 DESC;

## Grouping and Filtering Data
#### Grouping data in query output
SELECT continent, name
FROM country
WHERE (continent = 'South America' AND population > 12000000) OR continent = 'Antarctica'
ORDER BY 1, 2;

#### Grouping data in query output by using GROUP BY
SELECT continent, COUNT(name) AS 'countries' -- <-- COUNT in a groupby query
FROM country
WHERE (continent = 'South America' AND population > 12000000) OR continent = 'Antarctica'
GROUP BY continent
ORDER BY 1, 2;

#### Using GROUP BY items with filter conditions
- WHERE clauses evaluated before the GROUP BY clause
- HAVING clause used to filter query results after GROUP BY
- HAVING clause will include the same column used in the aggregation function of the SELECT clause

#### Adding the HAVING clause as a filter condition
SELECT continent, COUNT(name) AS 'countries'
FROM country
WHERE (continent = 'South America' AND population > 12000000) OR continent = 'Antarctica'
GROUP BY continent
HAVING COUNT(name) > 5
ORDER BY 1, 2;

## Retrieving Data from Multiple Tables
- combine results of two queries into a single output using UNION operator
- retrieve data by joining tables
- Set operators, UNION, JOIN, Qualified column name

## Set operators
- combine the results of multiple queries into a single result set
- can use different tables to compare or unite the results into one result set. queries containing set operations referred to as compound queries
- UNION - combine two or more result sets (without duplicates)
- UNION ALL - ...
(including duplicates)
- INTERSECT - rows common to both result sets
- MINUS - rows in first result set but not present in second result set

#### UNION operator
SELECT Name FROM country UNION SELECT Name FROM city;

## JOINs
- JOIN clauses used to combine rows from two or more tables
- INNER JOIN - rows that match in both tables
- LEFT JOIN - all rows from left table, plus matching rows from right table
- RIGHT JOIN - all rows from right table, plus matching rows from left table
- FULL JOIN - all rows from both tables

#### How JOIN clauses work

#### INNER JOIN example
SELECT ci.ID AS 'City ID', ci.Name AS 'City Name', co.Name AS 'Country Name'
FROM city ci
JOIN country co ON ci.CountryCode=co.Code;

#### How INNER JOIN example works

#### Qualified column names

## Amazon Relational Database Service (Amazon RDS)
- six database types
- backup options
- use cases
- details about Amazon Aurora
- Amazon RDS, DB instance, DB engine, Aurora, Aurora cluster, Cluster volume

## Amazon RDS
- managed db service that sets up and operates a relational database in the cloud

#### RDS use cases
- Web and mobile applications
  - high throughput
  - massive storage scalability
  - high availability
- Ecommerce applications
  - low-cost database
  - data security
  - fully managed solution
- mobile and online games
  - rapid growth capacity
  - automatic scaling
  - database monitoring

#### DB instance
- isolated database environment that runs in the cloud

#### Amazon RDS DB instances
- Amazon RDS - DB primary instance
- DB instance class - CPU, Memory, Network performance
- DB instance storage - magnetic, general purpose SSD, provisioned I/O operations per second (IOPS)
- supports database types - MySQL, Aurora, Microsoft SQL Server, PostgreSQL, MariaDB, Oracle

#### Amazon RDS backup
- Automatic - creates automated backups (data and transaction logs) of DB instances during the backup window
- Manual - creates storage volume snapshots of your DB instances

## Easy Create method in Amazon RDS
- use Management Console
- use standard MySQL utilities, like MySQL Workbench, to connect to database on
the DB instance
- depending on DB instance class and amount of storage, can take up to 20 minutes before new instance is available

## High availability with Amazon RDS
#### Multi-AZ deployment: Replication
- synchronous replication
- after making initial full copy, transactions are synchronously replicated to standby copy
- enhance availability during planned system maintenance, and protect database against failure and disruptions to AZ

#### High availability with Multi-AZ deployment: Failover
- if primary db instance fails, automatically bring standby db instance online as the new primary instance
- requests from applications are directed to new primary instance
- requesting applications use Amazon RDS DNS endpoint to reference db by name

## Scalability with Amazon RDS
#### Read replicas and scaling features
- asynchronous replication
- ability to promote to the primary instance if needed

functionality
- read-heavy db workloads
- ability to offload read queries
- scale out beyond the capacity constraints of a single db instance for read-heavy db workloads
- read replica promotion to primary instance requires manual action, because it uses async replication
- can be created in different region, closer to user, for disaster recovery requirements or to reduce latency

#### Amazon RDS scaling
- instance class - change instance class to increase computation and memory capacity
- storage capacity - scale storage for existing instances
- increase storage capacity without incurring downtime

## Aurora
- relational database engine
- use same code, tools, and applications as MySQL and PostgreSQL
- includes high-performance storage subsystem
- created with clusters

#### Aurora DB cluster
- one or more DB instances and cluster volume that manages data for those db instances
- cluster volume - virtual db storage volume that spans multiple AZs, each AZ has copy of the DB cluster data

#### Contents of an Aurora DB cluster
- two types of DB instances
  - primary db instance - allows read and
write, modification
  - replica - connects to same storage volume as primary db instance, read only

#### Use cases
- enterprise applications
- SaaS apps
- online and mobile gaming

## Amazon DynamoDB
- nonrelational database, nosql
- table item and attribute
- primary, partition, and sort keys
- table partition
- global tables

## Intro to DynamoDB
#### Relational and nonrelational comparison
relational
- data stored in tables using predefined columns of specific data types
- relationships can be defined between tables using table foreign keys
- better performance achieved by adding compute or memory capacity to a single db server

nonrelational NoSQL db
- tables with flexible column structure
- each item can have different number and type of data elements
- better performance by adding a new server to existing pool of db servers

#### DynamoDB is NoSQL
- fully managed, serverless, key-value NoSQL
- improve performance by keeping data in memory
- keep data secure by encrypting data at rest
- protect data with backups and automated copying of data between AWS regions

## How it works
#### Key concepts: Tables
- tables similar to relational db systems
- table name and primary key must be specified at table creation
- each DynamoDB table has at least one column that acts as primary key
- primary key is only data required when storing a row. anything else is optional

ex. Friends table
- use id number for each friend
- use id number as primary key

#### Key concepts: Attributes
- a column is called an attribute
- represents a fundamental data element, something that does not need to be broken down further
- similar to table columns, or fields in relational db
- primary key can consist of one or two attributes

ex.
FriendID attribute for primary key, other attributes might be:
- Name, Hobbies, Favorite colors

#### Key concepts: Items
- an item is a group of attributes that is uniquely identifiable among all other items
- each item consists of one or more attrs
- each item uniquely identified by primary key attr
- similar to table row in relational db
- difference from relational db - number and type of non-primary key attrs can be different for each item in table

ex. data for each friend stored as an item
- each friend has ID plus any other attrs that you want to store about that person
- amount and type of data stored for each attr can vary for each friend

#### Key concepts: Primary keys
- simple primary key
  - consists of one attr
  - attr is called partition key or hash key
- composite primary key
  - primary key consists of two attrs
  - first attr is partition key or hash key
  - second attr is the sort key or range attr
- Remember: Primary keys uniquely identify each item in the table, so no two items can have same primary key

## The concept of partitioning
#### Key concepts: Partitions
- store items data in partitions
- data partitioned and indexed by the primary key
- each table has one or more partitions
- new partitions added as data is added
- tables whose data is distributed evenly across multiple partitions generally deliver best performance
- partition in which an item's data is stored is determined by primary key attr

ex.
value for FriendID attr will determine which partition in the Friends table will store the data for that person

#### Key concepts: Storing items in partitions

#### Key concepts: Storing attributes in partitions

## DynamoDB global tables
#### Using global tables
- using global table option creates DynamoDB table that is automatically replicated across your choice of AWS Regions
- deliver fast, local read and write performance for global applications
- applications can stay highly available in the event of isolation or degradation of an entire AWS region
- global tables eliminate difficult work of replicating data between Regions and resolving update conflicts between copies

ex. Millions of friends all over the world, who will use your Friends table?
- global tables will put table closer to your friends
- being closer to data means that your friends will experience faster query performance

EBS - Infrastructure as a Service, can only be used through EC2 instance

RDS - PaaS (???)

-- get info about databases and tables, switch to using world database
SHOW databases;
USE world;
SHOW tables;

-- these do sort of the same thing
SHOW columns FROM world.country;
DESC world.country;

-- use OVER clause (and partition) to generate running total
SELECT Region, Name, Population,
  SUM(Population) OVER(partition by Region ORDER BY Population) as 'Running Total'
FROM world.country
WHERE Region='Australia and New Zealand';

-- add RANK() function to get ranked output
SELECT Region, Name, Population,
  SUM(Population) OVER(partition by Region ORDER BY Population) as 'Running Total',
  RANK() over(partition by region ORDER BY population) as 'Ranked'
FROM world.country
WHERE Region='Australia and New Zealand';

## AWS Architecture
## AWS Cloud Adoption Framework (CAF)
- provides guidance and best practices to help build a comprehensive approach to cloud computing across the organization
- CAF guidance helps orgs throughout IT lifecycle to accelerate successful cloud adoption
- organized into
perspectives
- perspectives consist of sets of capabilities
- Perspectives: Business, People, Governance, Platform, Security, Operations

## Core perspectives of the AWS CAF
- Business capabilities: Business, People, Governance
- Technical capabilities: Platform, Security, Operations

#### Business perspective
- IT Finance
- IT strategy
- Benefits realization
- Business risk management
- "must ensure that IT is aligned with business needs and that IT investments can be traced to demonstrable business results"

#### People perspective
- Resource management
- Incentive management
- Career management
- Training management
- Organizational change management
- "training, staffing, and organizational changes"

#### Governance perspective
- Portfolio management
- Program and project management
- Business performance measurement
- License management
- "ensure that skills and processes align IT strategy and goals with business strategy and goals"

#### Platform perspective
- Compute provisioning
- Network provisioning
- Storage provisioning
- Database provisioning
- Systems and solution architecture
- Application development
- "understand and communicate the nature of IT systems and their relationships"
- "describe the architecture of the target state environment"

#### Security perspective
- Identity and access management
- Detective control
- Infrastructure security
- Data protection
- Incident response
- "must ensure that the org meets its security objectives"

#### Operations perspective
- Service monitoring
- Application performance monitoring
- Resource inventory management
- Release management or change management
- Reporting and analytics
- Business continuity or disaster recovery (DR)
- IT service catalog
- "define how day-to-day, quarter-to-quarter, and year-to-year business will be conducted"

## AWS Well-Architected Framework
- describes key concepts, design principles, and architectural best practices for designing and running workloads in the AWS Cloud
- increase
awareness of architectural best practices
- address foundational areas that are often neglected
- evaluate architectures by using a consistent set of principles

#### Well-Architected Framework features
Provides
- questions that are centered on critically understanding architectural decisions
- domain-specific lenses
- hands-on labs
- AWS Well-Architected Tool
- AWS Well-Architected Partner Program

Does not provide
- implementation details
- architectural patterns

## Pillars of the Well-Architected Framework
- Operational excellence - deliver business value
- Security - protect and monitor systems
- Reliability - recover from failure and mitigate disruptions
- Performance efficiency - use resources sparingly
- Cost optimization - eliminate unneeded expense
- Sustainability - minimize environmental impacts

#### Operational excellence - deliver business value
- ability to monitor systems to
  - deliver business value
  - continually improve supporting processes and procedures
  - manage and automate changes
  - respond to events
  - define standards to manage daily operations
- design principles
  - perform operations as code
  - make frequent, small, reversible changes
  - refine operations procedures frequently
  - anticipate failure
  - learn from all operational events and failures

#### Security - protect and monitor systems
- ability to
  - monitor and protect information, systems, and assets
  - deliver business value through risk assessments and mitigation strategies
  - identify and manage who can do what
  - establish controls to detect security events
  - protect systems and services
  - protect the confidentiality and integrity of data
- design principles
  - implement strong identity foundation
  - enable traceability
  - apply security at all layers
  - automate security best practices
  - protect data in transit and at rest
  - keep people away from data
  - prepare for security events

#### Reliability - recover from failure and mitigate disruption
- ability of a system to
  - recover from
infrastructure or service failures
  - dynamically acquire computing resources to meet demand
  - mitigate disruptions, such as
    - misconfigurations
    - transient network issues
- design principles
  - test recovery procedures
  - automatically recover from failure
  - scale horizontally
  - stop guessing capacity
  - manage change in automation

#### Performance efficiency - use resources sparingly
- ability to
  - use computing resources efficiently to meet system requirements
  - maintain that efficiency as demand changes and technologies evolve
- design principles
  - democratize advanced technologies
  - go global in minutes
  - use serverless architecture
  - experiment more often
  - consider mechanical sympathy

#### Cost optimization - eliminate unneeded expense
- ability to eliminate
  - unneeded costs
  - suboptimal resources
- design principles
  - implement cloud financial management
  - adopt a consumption model
  - measure overall efficiency
  - reduce spending on data center operations
  - analyze and attribute expenditures

#### Sustainability
- ability to minimize
  - impact of workloads on the environment
  - carbon emissions
  - energy consumption
  - waste
- design principles
  - understand your impact
  - establish sustainability goals
  - maximize utilization
  - anticipate and adopt new, more efficient hardware and software offerings
  - use managed services
  - reduce the downstream impact of your cloud workloads

## Well-Architected Principles
## Key design principles
#### Well-Architected design principles
- set of general design principles to facilitate good design in the cloud
  - stop guessing your capacity needs
  - test systems at production scale
  - automate to make architectural experimentation easier
  - provide for evolutionary architectures
  - drive architectures by using data
  - improve through game days

## Well-architected design principles details
#### Stop guessing your capacity needs
- traditional environment
  - when you make a capacity decision before you deploy a system, you might waste expensive
idle resources
  - might need to handle the performance implications of limited capacity
- cloud environment
  - do not need to guess your infrastructure capacity needs
  - can use as much or as little capacity as you need and can scale up and down automatically

#### Test systems at production scale
- traditional
  - usually cost-prohibitive to create duplicate environment solely for testing
  - most test envs are not tested at live levels of production demand
- cloud
  - create a duplicate env on demand, complete testing, decommission the resources
  - pay for the test env only when it is running, simulate live env for a fraction of the cost of testing on premises

#### Automate
- traditional
  - separate structures and components that require more work to automate (no common API for all parts of infrastructure)
- cloud
  - create and replicate your systems at low cost (no manual effort)
  - track changes to your automation, audit the impact, and revert to previous parameters when necessary

#### Provide for evolutionary architectures
- traditional
  - architectural decisions are implemented as static, one-time events
  - there might be only a few major versions of a system during its lifetime
  - as business changes, initial decisions might hinder the ability to meet changing business requirements
- cloud
  - capability to automate and test on demand lowers the risk of impact from design changes
  - systems can evolve over time so that businesses can take advantage of new innovations as a standard practice

#### Drive architectures by using data
- traditional
  - arch decisions often chosen according to organizational defaults
  - datasets cannot be generated
  - models and assumptions to size your architecture are probably used
- cloud
  - collect data on how your arch choices affect the behavior of workloads
  - make fact-based decisions about how to improve workloads
  - use that data to inform architecture choices and improvements

#### Improve through game days
- traditional
  - exercise your runbook only
when something bad happens in production - cloud - test how your arch and processes perform by scheduling game days to simulate events in production ## Reliability and High Availability - compare and contrast between reliability and high availability - factors in high availability #### Reliability - Reliability - probability that an entire system functions for a specified period of time - includes hardware, firmware, and software - measures how long the item performs its intended function - two common measures of reliability - Mean time between failures (MTBF) - total time in service divided by the number of failures - Failure rate - number of failures divided by the total time in service #### Reliability compared to availability - Reliability - measure of how long a resource performs its intended function - Availability - measure of the percentage of time that a resource is operating normally - availability is a percentage of uptime (ex 99.9 percent) over a period of time (commonly a year) - avail is equal to normal operation time divided by the total time - common shorthand - shorthand refers to only the number of 9's - ex.
five 9's is 99.999 percent available #### High availability (HA) - ensuring that your application's downtime is minimized as much as possible without the need for human intervention

Number of 9s | Percentage of Uptime | Max downtime per year | Equivalent downtime per day
--- | --- | --- | ---
1 | 90% | 36.5 days | 2.4 hours
2 | 99% | 3.65 days | 14.4 minutes
3 | 99.9% | 8.76 hours | 1.44 minutes
4 | 99.99% | 52.56 minutes | 8.64 seconds
5 | 99.999% | 5.26 minutes | 0.864 seconds

#### HA goals - meant to help ensure the following - systems are generally functioning and accessible - downtime is minimized - minimal human intervention is required ## High availability prime factors - Fault tolerance - built-in redundancy of an application's components and its ability to remain operational - Scalability - ability of an application to accommodate growth without changing design - Recoverability - process, policies, and procedures related to restoring service after a catastrophic event #### On-premises HA compared to HA on AWS - Traditional, or on-premises IT - expensive - suitable for only mission-critical applications - AWS Cloud - expand availability and recoverability options by providing the ability to use - multiple servers - isolated redundant data centers within each AZ - multiple AZ's within each AWS region - multiple regions around the world - fault-tolerant services ## Transitioning a Data Center to the Cloud ## Storage and Archiving Overview ## Cloud Storage Overview - service that stores data on the internet through a cloud provider that manages and operates data storage as a service #### Cloud storage formats - Object - each piece of data is an object with a unique identifier - File - data organized in a hierarchical format with files and folders - Block - each piece of data broken down into blocks with unique identifiers #### Use cases - Big data analytics - Data warehouses - IoT - Databases - Backup and archive ## AWS Cloud storage - Object storage - File storage - Block storage - Hybrid storage - S3, S3 Glacier, File Cache, EFS, FSx, EBS Object, file, and block storage on AWS Glacier - uses Archives and Vaults NFS for file shares FSx for Lustre (NFS) (linux) NFS FSx for
win file servers (server message block SMB) #### Hybrid cloud storage and edge computing on AWS - AWS Storage Gateway - like a door to a big vault - frequently accessed data is cached - most data is going to be sent to S3 - AWS Snow Family - Snowcone - 8TB - Snowball - compute optimized - ~40 TB - storage optimized - ~60 TB - Snowmobile - 100s of Petabytes and higher (Exabytes) types of storage - file, tape, volume - file and volume use S3 - tape uses Glacier #### AWS Cloud storage scenarios - S3 - scalable, durable, accessible from internet - user-generated content, active archive, serverless, big data storage, backup and recovery - S3 Glacier - secure, durable, low-cost, data archiving and long-term backup - EFS - simple, scalable network file system for Linux workloads that uses both AWS Cloud services and on-premises resources - FSx - EBS - Storage Gateway - hybrid storage that augments on-premises env with AWS Cloud storage - disaster recovery, price tiering, or migration Instance store - uses block storage, but not really a service, just something that comes with EC2 instances TODO ## Amazon EBS - persistent block storage volumes - automatically replicated within its AZ - wherever it's created, you have copies - scale #### EBS features - Multiple volume types - Snapshots created in same region - can be encrypted #### EBS use cases EBS volume use cases - boot volumes and primary storage for EC2 instances - data storage for a file system - database hosts Snapshot use cases - create backup of critical workloads - recreate EBS volumes - share and copy data ## EBS volume types - Solid state drives (SSD) - provisioned IOPS SSD - general purpose SSD volumes - Hard disk drives (HDD) - throughput optimized HDD - cold HDD - ex.
attach to EC2 instance, but not concerned about speed or high throughput #### Volume type comparison #### Use cases for EBS volume types: SSD #### SSD - Provisioned IOPS - IO-intensive workloads - relational db - NoSQL db - General purpose - recommended for most workloads - system boot volumes - virtual desktop - low-latency interactive apps - development and test environments #### HDD - Throughput-optimized - streaming workloads that require consistent, fast throughput at low price - big data - data warehouses - log processing - not a boot volume - cold - throughput-oriented storage for large volumes of data that are infrequently accessed - lowest storage cost - not a boot volume ## CLI #### Create EBS volume and attach to EC2 instance $ aws ec2 create-volume \ --size 80 \ --availability-zone us-east-1a \ --volume-type gp2 Describe volume Attach EBS volume ## Creating a snapshot of a volume ## Copying a snapshot - SnapshotId - changes when you make a copy!!! - AMI ID will be different when you copy to a different region ## Restoring a snapshot ## Checking volume status ## Managing EBS volumes with a lifecycle policy and the AWS CLI #### What is Amazon Data Lifecycle Manager? - automates the creation, retention, and deletion of snapshots - uses tags to identify EBS volumes to back up - uses a lifecycle policy to define the desired backup and retention actions - requires an AWS Identity and Access Management (IAM) role to allow the management actions ## Instance Store - temporary block-level storage for EC2 instance.
located on disks that are physically attached to the host computer - temporary, non-persistent storage for EC2 instances - physically attached to host, fast low-latency storage - made of volumes running on virtual devices providing ephemeral block storage - dedicated to a particular instance #### How instance stores work - use block device mapping feature of EC2 API and Console to attach an instance store to an instance - data persists for only the lifetime of its associated instance - cannot create or destroy instance store volumes independently from instances - can control - whether instance stores are exposed to EC2 instance - what device name is used #### Instance store features - features are available for many instance types but not all - number, size, type differ by instance type - instance store must be mounted before you can access it - mounting occurs automatically or manually on Linux depending on the instance type #### Use cases - temporary storage of information that's continually changing - buffers - caches - scratch data - use for data that is replicated across a fleet of instances, such as a load-balanced pool of web servers ## Amazon EFS - don't need to specify capacity ahead of time - uses NFS - accessible from multiple computers, or servers - scalable, fully managed, elastic Network File System (NFS) storage for use with AWS Cloud services and on-premises resources - petabyte scale, low-latency - supports NFS - compatible with multiple AWS services - is compatible with all Linux-based instances and servers - uses tags #### Benefits - high availability - replicated across multiple AZs within a region (Standard storage classes) - provides scalable, elastic file storage - dynamic elasticity - grows and shrinks automatically as you add and remove files - fully managed - provides a shared file system storage for Linux workloads #### Performance attributes Storage classes - Standard storage classes - EFS Standard - Standard Infrequent Access - One zone - EFS One zone - One Zone-IA Performance
modes - General purpose - Max IO Throughput modes - Elastic throughput - Bursting throughput - Provisioned throughput #### Use cases - home directories - file system for enterprise applications - application testing and development - database backups - web serving and content management - media workflows - big data analytics #### EFS architecture - EFS can be accessible to VPC across AZ's - each subnet needs a mount target to access the EFS #### How to use EFS 1) create EFS system 2) create EC2 resource, and launch 3) create mount targets in appropriate subnets 4) connect to EC2 instance, and mount EFS file system 5) clean up resources ## Storage with Amazon S3 - Intelligent-Tiering - don't know access patterns, but want to save money - access and security features - how to store, retrieve, and archive S3 objects using CLI ## Amazon S3 - object storage service that provides secure, durable, and highly available data storage in AWS Cloud - use S3 to store and retrieve any amount of data (objects) at any time from anywhere on the web #### Features - storage classes - storage management - access management and security - data processing - storage logging and monitoring - analytics and insights - strong consistency #### S3 storage classes - S3 Standard - S3-IA - S3 Standard-IA - S3 One Zone-IA - S3 Glacier Instant Retrieval - S3 Glacier Flexible Retrieval - S3 Glacier Deep Archive - S3 on Outposts ## S3 basic concepts #### Buckets and objects - bucket - container for objects stored in S3. every object is contained in a bucket - objects - fundamental entities that are stored in S3.
consist of object data and metadata #### Keys and regions - object keys - unique identifier for an object within a bucket - regions - geographical area where S3 will store the buckets that you create #### Versioning #### Additional S3 features Feature: object lifecycle management Purpose or benefit - manage objects so they are stored cost-effectively throughout their lifecycle How to use it - create a lifecycle configuration with rules for when objects should move to another class or be deleted Feature: Presigned object URL Purpose - share a private object with a user who doesn't have AWS security credentials or permissions How to use it - generate presigned object URL programmatically and provide it to the recipient to access the object Feature: Cross-origin resource sharing (CORS) Purpose - allow S3 bucket that hosts a static website to support CORS. support many origins to one bucket How to use it - create a CORS config on the bucket with rules that identify authorized origins and HTTP operations ## S3 Intelligent-Tiering - optimize cost by automatically storing objects in three tiers - Frequent Access - Infrequent Access - Archive Instant Access - Optional tiers - Archive Access - Deep Archive Access #### Lifecycle policies - ex.
S3 standard - 30 days --> S3 Standard-IA - 60 days --> S3 Glacier Flexible Retrieval - 1 year --> delete ## Access and security #### public access settings - block new public access control lists (ACLs) and uploading public objects - remove public access granted through public ACLs - block new public bucket policies - block public and cross-account access to buckets that have public policies #### Object Lock - prevent an object from being deleted or overwritten for a fixed amount of time or indefinitely - manage object retention in two ways - retention periods - legal holds - two retention modes - compliance - governance #### Event notification - SQS queue - SNS topic - Lambda ## S3 with CLI TODO ## Amazon S3 Glacier - storage service built for data archiving. high performance, flexible retrieval, low-cost archive storage #### Storage classes - Glacier Instant Retrieval - millisecond access - Glacier Flexible Retrieval - 1-2 times per year, retrieve asynchronously - Glacier Deep Archive - 1-2 times per year #### Vault - container for storing Archives - unique URI form - https://region-specific-endpoint/acct-id/vaults/vault-name #### Archive - any data, such as photo, video, or document #### Job - Glacier job can retrieve an archive or get an inventory of a vault #### Notification configuration - can notify you when a job is completed #### access to Archives - expedited - 1-5 minutes - standard - 3-5 hours - bulk - 5-12 hours #### Security features - IAM contains glacier supported policies to manage access to data - data encryption - encrypts automatically with AES-256 - key management - manages your keys for you #### Resource-based security policies - both vault access policies and Vault Lock policies manage permissions.
only vault access policies can be modified at any time

#### Compare S3 and S3 Glacier

Attribute | S3 | S3 Glacier
--- | --- | ---
Data volume | no limit | no limit
Avg latency | ms | minutes or hours
Item size | 5TB max | 40TB max
Cost per GB per month | $$ | $
Billed requests | PUT, COPY, POST, LIST, and GET | UPLOAD and retrieval
Retrieval pricing | $ | $$

#### Security comparison - S3 - encryption an optional step - S3 Glacier - encrypted by default #### How to access Glacier - console, REST API, SDK, S3 lifecycle policies ## AWS Storage Gateway - hybrid storage service that enables on-premises applications to use AWS Cloud storage - can use Storage Gateway for backup and archiving, disaster recovery, cloud data processing, storage tiering, and migration - supports file, volume, and tape interfaces #### How Storage Gateway works Applications use storage - NFS or SMB (Files) - iSCSI (Volumes) - iSCSI VTL (Tapes) Storage Gateway VM or appliance Storage Gateway service AWS Cloud storage services - FSx for Windows File Server - S3 - S3 Glacier - EBS - AWS Backup #### Features - durable storage of on-premises data in AWS cloud - uses standard storage protocols - provides fully managed caching - transfer data in optimized and secure manner - VM or hardware device #### Storage Gateway types - File Gateway - native file access, S3 - FSx File Gateway - native file access to file shares on FSx for Windows File Server - Volume Gateway - access to block storage volumes backed up as EBS snapshots - Tape Gateway - access to virtual tape library that uses S3 archive tiers for long-term retention #### S3 File Gateway - store and access files as objects in S3 - NFS or SMB #### Volume Gateway - access block storage as volumes on S3 - iSCSI #### Tape Gateway - back up and archive data to virtual tape on S3 - iSCSI-VTL #### Use cases - move backups and archives to the cloud - reduce on-premises storage with cloud-backed
file shares - provide on-premises applications with low-latency access to data stored in AWS - provide on-premises applications with seamless use of AWS storage ## AWS Transfer Family and Other Migration Services - relatively small amounts of data, like using FTP, SFTP, FTPS - secure transfer service that you can use to transfer files into and out of AWS storage services - supports transferring data from or to AWS storage services - S3 - Amazon Elastic File System (EFS) Network File System (NFS) file systems #### AWS Transfer for SFTP - retain existing workflows - store data in S3 bucket - connect directly with your identity provider systems #### How Transfer Family works ## Migration services #### AWS DataSync - online data transfer service that automates and accelerates the moving of data between on-premises storage systems and AWS storage services. also moves data btw AWS storage services - synchronize btw on premises and AWS - efficient and fast - managed service - connects over the internet or AWS Direct connect - includes AWS DataSync Agent (NFS protocol) #### Use cases - data migration - archiving cold data - data protection - data movement for timely in-cloud processing #### How DataSync works Shared file system (NFS and SMB) --> DataSync Agent --> Direct Connect or Internet --> DataSync --> S3 or EFS #### Snow Family - Snowball, Snowball Edge, Snowcone, Snowmobile - Snowball - storage optimized, 60TB - Snowball Edge - compute optimized, 40TB usable - Snowcone - has compute in it, 8TB - Snowmobile - 100s of PB or EB Snowball edge - work in remote location #### Snowball features - secure enclosure - up to 210 TB shipped in parallel - uncomplicated logistics - end-to-end encryption #### Snowball use cases - sensors or machines - data collection in remote Locations - media and entertainment aggregate content ## AWS Lambda - Traditional deployment and operations - Provision an instance - update the OS - install the application platform - build and deploy 
applications - configure automatic scaling and load balancing - regularly patch, secure, and monitor servers - monitor and maintain applications - Serverless deployment - Build and deploy applications - Monitor and maintain applications limit of 15 minutes running time, limit of 10 GB RAM, 1000 concurrent calls in a region #### What is AWS Lambda - fully managed service for serverless compute - provide event-driven invocation - subsecond metering - limits the runtime of a function to a maximum of 15 minutes - supports multiple programming languages (java, js, c#, python, ruby, go, powershell) ex. RDS is managed, but not fully managed (doesn't do high availability, scalability) service that provides scheduled events like cron job, but for AWS? EventBridge, used to be called CloudWatch Events #### AWS Lambda usage steps 1) upload code to Lambda 2) set up other AWS services or in-application activity to invoke your code 3) Lambda runs your code when it is invoked #### AWS Lambda use cases 1) users capture an image for their property listing 2) mobile app uploads the new image to S3 3) lambda function is invoked and calls Amazon Rekognition 4) Rekognition retrieves the image from S3 and returns labels for the detected property example use cases: - automated backups - process objects uploaded to S3 - event-driven analysis of logs - event-driven transformations of data - IoT - operating serverless websites #### Steps to develop and deploy a Lambda function 1) Define a handler class in code for function 2) Create Lambda function using console or CLI 3) Create and assign IAM role to the function. 
include permissions to access the AWS services that are required 4) upload the code for the function 5) invoke the function to test it 6) After the function is deployed, monitor it using CloudWatch #### Starting and stopping EC2 instances - Stop instances 1) Event (time based) - Schedule an event to invoke Lambda function to stop instance at night 2) Action role - Lambda function is invoked to stop EC2 instances 3) Instances - EC2 instances enter stopped state - Start instances 4) Schedule an event to invoke Lambda function to start instance in the morning 5) Lambda function is invoked to start the EC2 instances 6) EC2 instances enter the running state #### AWS Lambda layers - configure a lambda function to use libraries that are not included in the deployment package - keep deployment package small - avoid errors in code for package dependencies - share libraries with other developers limited to 250 MB unzipped (function code plus all layers) #### AWS Lambda quotas - Quotas include: - compute and storage resources - function configuration, deployment, and execution - Lambda API requests - a function that exceeds any of the limits will fail with an exceeded limits exception Step Functions - allow you to chain functions and exceed 15 minutes limit ## Working with AWS Lambda ## APIs and REST #### What is an API? - provides programmatic access to an application - the client application sends a request to the server application by using the server application's API - server application returns a response to the client application - benefits of an API include - provides access to an application's functions without using a GUI - provides a consistent way to invoke an application's functions Client --> Request --> API --> Server --> Response <| AWS API Gateway - talk to backend services Python <|--|> API Gateway <|--|> S3 #### API example Mobile weather application --> API --> Weather bureau's software system #### What are RESTful APIs?
- interface that two computer systems use to exchange information securely over the internet - designed for loosely coupled network-based applications - communicate over HTTP - expose resources at specific URIs - many web APIs are RESTful #### RESTful API design principles - Uniform interface - Stateless - Cacheable - Layered system - Code on demand (optional) #### RESTful components - Client - Resource - Request - Response #### REST request format REST request includes the following: - Endpoint (as a URL) - Method - GET - read a resource - POST - create a resource - PUT - update an existing resource - DELETE - delete a resource - Header - Body (Data) #### RESTful request example #### RESTful response example #### REST example with cURL curl -i -X POST -d @file.json -H "Content-Type: application/json" https://www.example.com/someresource #### HTTP status codes 1xx - informational response 2xx - success 3xx - redirection 4xx - client errors 5xx - Server errors ## Amazon API Gateway - service that developers can use to create and maintain APIs. - ex.
can create a REST API for an application that you run on AWS - API Gateway is fully managed service that handles - scaling - access control - monitoring #### API Gateway benefits - efficient API development - performance at any scale - cost savings at scale - monitoring - flexible security controls - RESTful API options ## API Gateway architecture #### Architecture example Applications --> Frontend | Backend --> DynamoDB/Lambda/EC2/VPC API Gateway AWS services processing integration request and responses APIs made accessible through the API Gateway #### How API Gateway is used Applications/IoT/etc --> API Gateway (Gateway cache, CloudWatch) --> Lambda/EC2/etc Amazon Cognito user pools #### API Gateway example - Serverless web application Client |--> S3 |--> Amazon Cognito |--> API Gateway --> Lambda --> DynamoDB Dynamic web application Edge Locations - CloudFront, Transfer Accelerator, Route 53 ## AWS Step Functions #### What is Step Functions? - Step Functions provide serverless orchestration for modern applications - Orchestration centrally manages a workflow by breaking it into multiple steps, adding flow logic, and tracking the inputs and outputs between the steps #### Using Step Functions - gives you the ability to reuse components and use different services in your application - Coordinates existing AWS Lambda functions and microservices into applications - Keeps application logic separated from implementation #### Core concepts - based on workflows (or state machines) and tasks Workflow Start --> Task --> Task --> End State State #### Benefits - Productivity - gives the ability to connect and coordinate distributed components and microservices to quickly create applications - Agility - helps diagnose and debug problems faster - Resilience - manages the operations and infrastructure of service coordination to help ensure availability at scale and under failure #### Features - managed serverless service - main features - automatic scaling - high availability - 
pay per use - security and compliance #### Use cases - useful for creating end-to-end workflows to manage jobs with dependent components and for dividing business processes into a series of steps - Data processing - IT automation - Ecommerce - Web applications #### Step Functions example account-processing-workflow Start --> Parallel States [Lambda: CheckName, Lambda: CheckAddress] --> Lambda: OpenNewAccount --> End Use XRay to debug serverless functions ## Containers on AWS #### Problem - how can you simplify portability of applications between environments? - Virtual machines vs Containers ########## VM's - Apps - each has Bins/libraries - each has Guest OS Hypervisor Server ########## Containers - Apps - each has bins/libs Docker engine OS Server Reduce having multiple OS's, and can also share some of the bins/libs that are common to the different applications #### What is a container? - an application and its dependencies, which can be run in resource-isolated processes #### Benefits of containers - Environmental consistency - Process isolation - Operational efficiency - Developer productivity - Version control ## Docker #### What is Docker? 
- like a hypervisor, running on top of OS - application platform used to create, manage, and run containers - developers and engineers can build, test, deploy, and run containers #### Benefits of Docker - Microservices architecture - Stateless - Portable - Single, immutable Artifact - Reliable deployments #### Components of Docker - Docker file - image - Registry - Container - Host ## AWS container services - Image registry - Elastic Container Registry (ECR) - Management - Elastic Container Service (ECS), Elastic Kubernetes Service (EKS) - Hosting - AWS Fargate, EC2 Fargate runs containers without worrying about details, sort of "serverless" version of EC2 #### Amazon ECR - fully managed Docker container registry that developers can use to store, manage, and deploy Docker container images #### Amazon ECS - highly scalable, high-performance container management service that supports Docker containers #### Amazon EKS - managed service that you can use to run Kubernetes on AWS without needing to install and operate your own Kubernetes clusters #### AWS Fargate - compute engine for ECS/EKS that you can use to run containers without needing to manage servers or clusters #### Deploying to AWS Managed container services - Choose your orchestration tool - ECS, EKS - Choose your launch type - Fargate, EC2 EKS is Fully managed (?) On-premises - Kubernetes (?) 
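
The "choose your launch type" step above can be sketched as a small Python helper that builds a minimal ECS task definition and prints it as JSON. This is only an illustration under assumptions: the field names follow the ECS task definition format as I understand it, while the family name, container name `web`, image URI, and the CPU/memory sizes are hypothetical placeholders rather than values from these notes.

```python
import json


def build_task_definition(family, image, launch_type="FARGATE"):
    """Return a minimal ECS task definition dict for the chosen launch type.

    Assumption: only the two launch types named in the notes are handled.
    """
    if launch_type not in ("FARGATE", "EC2"):
        raise ValueError(f"unsupported launch type: {launch_type}")

    task_def = {
        "family": family,
        "requiresCompatibilities": [launch_type],
        "containerDefinitions": [
            {
                "name": "web",  # hypothetical container name
                "image": image,  # hypothetical image URI
                "essential": True,
                "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
            }
        ],
    }
    if launch_type == "FARGATE":
        # Fargate tasks use awsvpc networking and task-level CPU/memory sizes.
        task_def["networkMode"] = "awsvpc"
        task_def["cpu"] = "256"
        task_def["memory"] = "512"
    else:
        # On EC2, set a memory limit (MiB) on the container instead.
        task_def["containerDefinitions"][0]["memory"] = 512
    return task_def


if __name__ == "__main__":
    demo = build_task_definition("notes-demo-web", "example.invalid/demo:latest")
    print(json.dumps(demo, indent=2))
```

The resulting JSON is the kind of document you would register with ECS before running it on either launch type; the orchestration choice (ECS or EKS) is made separately.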
## Tooling and Automation ## AWS Systems Manager #### Systems Manager is a collection of capabilities that help you manage your applications and infrastructure running in the AWS Cloud - Software inventory - OS Management and configuration - OS patches - System images |--> EC2 instances #### Capabilities overview Systems Manager - Documents, Automation, Run Command - Session Manager, Patch Manager, Maintenance Windows - State Manager, Parameter Store, Inventory #### Documents - Systems Manager document defines the actions that Systems Manager performs on your managed instances - Documents - Owned by Amazon - AWS provides a collection of predefined Systems Manager documents - Owned by you - customize an existing document, or create your own - Shared with you - shared document from another account #### Automation - Safely automate common and repetitive IT operations and management tasks across AWS resources 1) Create an automation document 2) Run the automation document 3) Monitor the document processing, and verify the results #### Run Command - automated way to run predefined commands against EC2 instances - use predefined, - create your own - choose instances or tags - choose controls or schedules - run a command immediately or on a specific schedule ex command types AWS-InstallWindowsUpdates (Windows) AWS-RunPowerShellScript (Windows, Linux) AWS-RunShellScript (Linux and MacOS) #### Session Manager - Securely connect to instances without opening inbound ports, using bastion hosts, or maintaining SSH keys #### Patch Manager - Deploy OS and software patches automatically across large groups of EC2 instances or on-premises machines 1) Create a patch baseline 2) Create a maintenance window for patching 3) Apply patches and reboot instances 4) Audit results with patch compliance #### Maintenance Windows - Schedule windows of time to run administrative and maintenance tasks across your instances 1) Create a maintenance window 2) Assign targets 3) Assign tasks 4) Review 
the status of tasks #### State Manager - Maintain consistent configuration of Amazon EC2 or on-premises instances 1) Choose or create an automation document 2) Associate your instances with the document 3) Specify a schedule for the state 4) (Optional) Output data to an Amazon S3 bucket #### Parameter Store - provides a centralized store to manage configuration data or secrets Parameter Store --> kv pair --> AWS KMS parameter name: /Dev/DB/Password parameter value: DE24sdfs #### Inventory - collects information about instances and the software that is installed on them - application data, files, network config, microsoft windows services, server roles, update, system properties ############################### ## Administration Tools #### Software Development Kits (SDKs) - use SDK's to access AWS services programmatically and write administrative scripts in different languages #### AWS CloudFormation - use CloudFormation to create, update, and delete a set of AWS resources as a single unit - define the resources in a template, which can be written in JSON or YAML - CloudFormation provisions the resources defined in a template as a single unit called a stack - key features - preview how proposed changes to a stack will impact the existing environment - detect drift - invoke an AWS Lambda function #### How CloudFormation works 1) Define resources in a template, or use a prebuilt template 2) Upload the template to CloudFormation or point to a template stored in an S3 bucket 3) run a create stack action. resources are created across multiple services in your AWS account as a running environment 4) the stack retains control of the resources that are created.
you can later update stack, detect drift, or delete stack #### Benefits of CloudFormation - provides the following benefits: reusability, repeatability, and maintainability - ability to - deploy complex environments rapidly - duplicate the same environment - ensure configuration consistency - delete resources in a single action (delete the stack) - propagate the same change to all stacks (update the stacks) #### AWS OpsWorks - can use OpsWorks to automate how servers are configured, deployed, and managed - features - automates configuration management - based on Chef and Puppet open-source automation platforms - three versions - AWS OpsWorks for Chef Automate - AWS OpsWorks for Puppet Enterprise - AWS OpsWorks Stacks ############################### ## Amazon CloudWatch - describe AWS monitoring service, Amazon CloudWatch - three components of CloudWatch - basic monitoring for EC2 instances - detailed monitoring for EC2 instances - CloudWatch metric, alarm, event #### Use AWS efficiently and gain insight - to use AWS in an efficient way, you need insight into your AWS resources - how do you know when you should launch more EC2 instances? - is your application's performance or availability being affected bc of insufficient capacity? - how much of your infrastructure is actually being used? #### Monitoring resource performance - when you monitor workload performance, ask two questions 1) how can you ensure that your workload has enough resources to meet fluctuating performance requirements? 2) how can you automate resource provisioning to occur on demand? #### What is Amazon CloudWatch? - Monitor the state and utilization of most resources that you can manage under AWS - standard metrics - custom metrics - alarms - notifications - CloudWatch agent collects system-level metrics - EC2 instances - on-premises servers - CloudWatch terms - metric, alarm, events #### Alarm examples - if CPU utilization > 60% for 5 minutes...
- if number of simultaneous connections > 10 for 1 min... - if number of healthy hosts < 5 for 10 minutes... #### CloudWatch actions - Alarm |--> EC2 instance - stop, terminate, reboot, or recover an instance |--> EC2 Auto Scaling - scale an Auto Scaling group in or out |--> SNS - send message to SNS topic (ex. email notification) #### CloudWatch alarms - test a selected metric against a specific threshold (greater than or equal to, less than or equal to) - the ALARM state is not necessarily an emergency condition - three possible states - OK - threshold not exceeded - ALARM - threshold exceeded - INSUFFICIENT DATA - alarm has just started, metric is not available, or insufficient data #### CloudWatch monitoring example #### CloudWatch alarms example #### Metric components Metric, Name and value - Namespace - groups related metrics together - Dimensions - Name-value pairs that further categorize metrics - Ex. InstanceId is a dimension of CPU utilization - Metric name + dimension = a new, unique metric - Period - Specified time (in seconds) over which metric was collected #### Standard and custom metrics - Standard metrics - grouped by service name - display graphically so that selected metrics can be compared - only appear if you have used the service in the past 15 months - reachable programmatically through the CLI or API - Custom metrics - grouped by user-defined namespaces - publish to CloudWatch by using the AWS CLI, an API, or a CloudWatch agent - Alarm, time-based event, event-based event, rule #### Monitoring and security - use CloudWatch to monitor for suspicious activity - unusual, prolonged spikes in service usage, such as CPU, disk activity, or RDS usage - set alerts on billing metrics (you must enable this feature in account settings) #### CloudWatch automatic dashboards - surface data about running AWS services - can be used through existing monitoring tools #### Activate detailed instance monitoring -
- by default, EC2 instances report data every 5 minutes
- enable detailed monitoring on an instance to increase reporting frequency to once every minute
- extra charges apply

#################################

## AWS Service Integration with Amazon Athena
- query data from other AWS services
- handles large-scale datasets with ease
- serverless
- avoids need to extract, transform, and load (ETL)
- pay only for queries that you run
- automatically runs queries in parallel so most results come back in seconds

#### Getting started with Amazon Athena
1) Create an S3 bucket and load data into it
   - alternatively, have an existing bucket with data already in it
2) Define the schema (table definition) that describes the data structure
3) Start querying data by using SQL

#### AWS service integrations with Athena
- makes it easier to query logs from services such as
  - AWS CloudTrail
  - Application Load Balancer logs
  - Amazon VPC Flow Logs

#################################

## Introduction to AWS Organizations
- Policy-based management
  - Create service control policies (SCPs) that centrally control AWS services across multiple AWS accounts
- Group account management
  - Create groups of accounts, and then attach policies to them
- Account management through APIs
  - Automate the creation and management of new AWS accounts
- Consolidated billing
  - Review a combined view of charges incurred by all your accounts

#### Security with AWS Organizations
- control access with AWS IAM
- IAM policies enable you to allow or deny access to AWS services for users, groups, and roles
- Service control policies (SCPs) enable you to allow or deny access to AWS services for individual accounts or groups of accounts in an organizational unit (OU)

#### Organization setup
1) Create your organization with your current AWS account as the management account.
   Add accounts to your organization by creating new accounts or inviting existing accounts to join
2) Create organizational units (OUs) in your new organization and move the member accounts into those OUs
3) Create service control policies (SCPs), which enable you to restrict what actions can be delegated to users and roles in the member accounts
4) Test your organization's policies. Sign in as a user for each role in your OUs and see how the SCPs impact account access

#### AWS Organizations
Rules for names
- must be Unicode, can be up to 250 characters in length

Max and min values
- number of accounts: 4
- number of roots: 1
- # of policies: 1000
- max size of SCP doc: 5120 bytes
- OU max nesting: 5
- invitations sent per day: 20
- # of member accounts you can create concurrently: up to 5 in progress at one time
- # of entities that you can attach a policy to: Unlimited

#### Accessing AWS Organizations
- Console, CLI, SDK, HTTPS query

#################################

## Tagging
- purpose and function of tagging
- describe cost management strategies associated with tagging
- enforce tagging by using IAM policies

#### What is a tag?
- key-value pair that can be attached to an AWS resource
- enables you to identify and categorize resources

#### Tag characteristics
- you can tag many resources (EC2 instances, EBS volumes, etc.) during creation
- for resources that cannot be tagged during creation, you must complete a separate tagging action after the resource is created
- tag keys and values are case-sensitive
- best practice is to use a consistent and standard format
- some tags are built-in and cannot be removed
- tags with aws: prefix are built-in. ex. aws:createdBy, aws:cloudformation:
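
Two of the tag characteristics above (keys are case-sensitive, and tags with the aws: prefix are built-in and cannot be removed) can be illustrated with a small sketch. It models tags as a plain Python dictionary and makes no AWS API calls; the function names and the sample tag values are hypothetical and chosen only for illustration.

```python
# Illustrative only: tags are modeled as a plain dict, no AWS API calls.
# Function names and sample values are hypothetical.

AWS_PREFIX = "aws:"  # per the notes, tags with this prefix are built-in


def is_builtin_tag(key: str) -> bool:
    """Return True for built-in tags (keys that start with 'aws:')."""
    return key.startswith(AWS_PREFIX)


def removable_tags(tags: dict[str, str]) -> dict[str, str]:
    """Return only the user-defined tags, i.e. the ones that can be removed."""
    return {k: v for k, v in tags.items() if not is_builtin_tag(k)}


def find_case_collisions(tags: dict[str, str]) -> list[list[str]]:
    """Group keys that differ only by letter case.

    Tag keys are case-sensitive, so 'Env' and 'env' are two distinct tags.
    A consistent, standard format avoids this kind of near-duplicate.
    """
    groups: dict[str, list[str]] = {}
    for key in tags:
        groups.setdefault(key.lower(), []).append(key)
    return [sorted(keys) for keys in groups.values() if len(keys) > 1]


sample = {
    "aws:createdBy": "example-user",  # built-in, cannot be removed
    "Env": "prod",
    "env": "dev",  # a different tag from 'Env' because keys are case-sensitive
    "Project": "demo",
}

print(removable_tags(sample))
print(find_case_collisions(sample))
```

A check like `find_case_collisions` is one way to spot tags that drifted from the agreed format before they skew cost reports grouped by tag.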