Networking notes

  • Ethernet
    • No TTL field, so it is prone to problems (e.g. broadcast storms) in the presence of a switching loop
    • The solution is to allow physical loops, but create a loop-free logical topology using the shortest path bridging (SPB) protocol or the older spanning tree protocols (STP) on the network switches.
    • Has Frame Check Sequence (FCS) to detect errors. Damaged frames are discarded
    • [Preamble][SFD][Dest MAC][Src MAC][EtherType/Size][Payload][FCS][InterFrameGap]
      • EtherType can be [802.1Q Header][EtherType] to support VLAN
    • If the EtherType/Size field value is >= 1536 (0x0600) it is interpreted as an EtherType; if it is <= 1500 it is the payload size. Values in between are undefined
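The EtherType/size rule and the optional 802.1Q tag can be sketched in a few lines of Python. This is a minimal illustration, not a full parser: `parse_ethernet_header` is a hypothetical helper name, and it assumes the raw frame starts at the destination MAC (preamble, SFD and FCS already stripped).

```python
import struct

ETHERTYPE_MIN = 0x0600  # 1536: values from here up are EtherTypes
MAX_LENGTH = 1500       # values up to here are payload sizes
TPID_8021Q = 0x8100     # value that announces an 802.1Q VLAN tag


def parse_ethernet_header(frame: bytes) -> dict:
    """Hypothetical helper: classify the EtherType/Size field of a frame."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, type_or_len = struct.unpack("!6s6sH", frame[:14])
    info = {"dst": dst.hex(":"), "src": src.hex(":"), "vlan": None}
    offset = 14
    if type_or_len == TPID_8021Q:
        if len(frame) < 18:
            raise ValueError("truncated 802.1Q tag")
        # The tag carries 16 bits of control info, then the real EtherType.
        tci, type_or_len = struct.unpack("!HH", frame[14:18])
        info["vlan"] = tci & 0x0FFF  # low 12 bits are the VLAN ID
        offset = 18
    if type_or_len >= ETHERTYPE_MIN:
        info["kind"] = "ethertype"
    elif type_or_len <= MAX_LENGTH:
        info["kind"] = "size"
    else:
        info["kind"] = "undefined"
    info["value"] = type_or_len
    info["payload_offset"] = offset
    return info
```

Feeding it an untagged IPv4 frame (field value 0x0800) should report an EtherType, while a field value of 46 should be reported as a size.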
  • ARP
    • Discover the link layer (MAC) address associated with a given IP address
    • Link Layer request-response protocol
    • Encapsulated by link-layer protocol, e.g. Ethernet
    • Works for IPv4, IPv6 equivalent is NDP
    • Gratuitous ARP reply (or request) announcing IP or MAC changes
      • It doesn’t solicit a reply
    • ARP spoofing
      • ARP does not provide a way to authenticate replies, therefore any host can reply to a request for another system’s address
      • This can also be legitimate in the proxy ARP scenario, where a proxy answers ARP requests on behalf of another system for which it forwards traffic
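To make the ARP message layout concrete, here is a rough sketch that packs the 28-byte ARP payload for IPv4 over Ethernet. `build_arp_request` is a hypothetical helper, the addresses are made-up examples, and actually sending the packet (which needs a raw socket and an Ethernet header) is left out.

```python
import socket
import struct


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Hypothetical helper: pack an ARP request (Ethernet + IPv4)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # operation: 1 = request, 2 = reply
        sender_mac,
        socket.inet_aton(sender_ip),
        b"\x00" * 6,                  # target MAC is unknown in a request
        socket.inet_aton(target_ip),
    )


mac = bytes.fromhex("001122334455")
request = build_arp_request(mac, "192.168.1.10", "192.168.1.1")
# A gratuitous ARP request announces the sender's own mapping:
# sender IP and target IP are the same.
gratuitous = build_arp_request(mac, "192.168.1.10", "192.168.1.10")
```

Note that nothing in the payload authenticates the sender fields, which is what makes the spoofing described above possible.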
  • DHCP
    • Dynamic Host Configuration Protocol
    • IP/UDP Protocol (Application Layer)
    • Server at port 67, clients at port 68
    • Leases IPs, provides net conf info like DNS, gateway, time servers
    • DORA (Discovery, Offer, Request, Ack)
      • Discovery: Ethernet broadcast from client
      • Offer: Eth unicast from server to client (or broadcast, if the client set the broadcast flag)
      • Request: Eth broadcast from client to server (lets other DHCPd’s know)
      • Ack: Eth unicast from server to client (or broadcast, as with the Offer)
    • After DORA, the client should do ARP probes to ensure no one else is using the IP. According to the RFC, the server should also check that an address is unused before offering it (e.g. with an ICMP echo request)
    • Also Inform and Release messages
      • Inform: request configuration info (DNS, gateway etc.) from the DHCP server
      • Release: client gives up its lease; not required (devices can be unplugged w/o notice)
    • Security
      • Rogue DHCP servers can provide wrong information to clients
      • DHCP address exhaustion
      • DHCP snooping: only allow DHCP server messages on specific (trusted) switch ports
  • Tunneling
    • SSH tunneling
      • Local or remote port forwarding
      • Local forwarding (-L) opens a port locally and tunnels connections through your SSH server to a destination host, e.g.
        • ssh -L 8080:imgur.com:80 lekkas.org -fN
        • This means that connecting to 8080 locally will get you to imgur.com:80 over lekkas.org
        • -f sends the process to the background, -N says to execute no command on the remote server
      • Remote forwarding (-R) opens a port on your remote server and lets outside clients that connect to it reach your workstation, e.g.
        • ssh -R 80:localhost:8080 lekkas.org
        • This opens port 80 on lekkas.org; clients connecting there will reach your local port 8080 over the ssh tunnel
        • For outside clients to connect, you need GatewayPorts yes in the server's sshd_config (otherwise the forwarded port listens on the server's loopback only)
      • Only data is sent over the tunnel
    • OpenVPN
      • Creates tunnel over TCP or UDP
      • Uses OpenSSL (SSL/TLS) for encryption
      • Prone to TCP-over-TCP problem if configured to use TCP
      • Use TUN or TAP devices
        • TAP devices operate on layer 2 / data link / Ethernet
        • TUN devices operate on layer 3 / network / IP
        • TAP can be used for bridging, TUN for routing
      • TUN/TAP are virtual network interfaces provided by the kernel. Instead of going to a physical link, their traffic is read and written by a user-space application
      • WireGuard is an alternative
  • DNS
    • Hierarchical, distributed system
    • root servers
      • 13 unique IP addresses, >1000 real servers with the same IP (anycast)
      • Hold info for top level domain name servers
      • Resolvers typically use a root hints file with the IPs of the root servers
      • https://www.internic.net/domain/named.root
      • a to m.root-servers.net
      • root servers basically serve the root zone file, which has the name servers for all top level domains TLDs (.org, .com etc.)
      • https://www.internic.net/domain/root.zone
      • Root Trust Anchor
        • The Root Trust Anchor, or Key Signing Key, is used by DNSSEC-enabled software to verify the contents of the DNS root zone is valid. It additionally enables a single chain of trust to DNSSEC-enabled top-level domains and beyond.
    • Top Level Domain Registry operators
      • Hold info for .com, .org, .gr etc
      • E.g. for .gr it’s FORTH
    • Second level
      • amazon.com, wikipedia.org, skroutz.gr etc
      • Here’s where domain name registrars operate (e.g. papaki.gr)
    • Organizational Hierarchy
      • IANA
        • Maintains root-servers.net, .arpa
        • Delegates other top-level domain name authority to domain registry operators
          • The full list can be found at the root zone file
      • Registry Operator
        • e.g. VeriSign for .com (gTLD) or FORTH for .gr (ccTLD)
        • Maintains the Domain Name Registry (DB of domain names and registrar information) of a specific domain
        • It might also function as a domain name registrar or delegate that to other entities
        • The allocated and assigned domain names are made available by registries by use of the WHOIS system and via their domain name servers.
      • Domain Name Registrar
        • Accredited by a gTLD or ccTLD registry, manages reservation of domains
        • Sometimes they also provide DNS services, e.g. setting the NS for a domain. When this happens, they update the NS records of that particular domain with the respective TLD Registry Operator. WHOIS info for your domain is also updated at that TLD
      • Other
        • Name servers (NS)
          • Authoritative
            • They return answers only to queries about domain names that have been specifically configured by the administrator
              • Authoritative answers are indicated by the AA bit set in the response
            • Can be primary or secondary.
              • Primary holds the original copy of the zone data (it is the server named in the zone's SOA record)
              • Secondaries keep a copy of the primary’s zone data, e.g. via zone transfers
          • Recursive name servers
            • When a name server cannot answer a query itself, it may query other name servers on behalf of the client, walking the hierarchy from the root down.
          • In principle, authoritative servers are enough. But if we only had them and not recursive name servers, then every user system would have to walk the hierarchy itself, starting from the root name servers, to resolve a domain name.
          • Caching name servers are often also recursive name servers
      • Resolving types
        • Non recursive
        • Recursive
        • Iterative
      • OPT pseudo-resource record for EDNS
        • Original RFC limited DNS messages over UDP to 512 bytes; EDNS lets endpoints advertise a larger size
        • Required for DNSSEC
        • Enables DNS amplification attacks (large response compared to query size)
      • Zone transfer
        • Examples
          • dig @zonedata.iis.se se AXFR > se.zone.txt
          • dig @zonedata.iis.se nu AXFR > nu.zone.txt
      • DNSSEC walkthrough
        • See https://dnsviz.net/d/cloudflare.com/dnssec/
        • Administrator of example.com wants to secure A records
        • They create an RRSIG record: a signature over the set of A records, made with the private key that matches a published DNSKEY
        • RRSIG, DNSKEY records are added in the authoritative name server of example.com zone
        • One more RRSIG entry for the DNSKEY records is added.
        • The administrator of the zone informs the registrar of the DS record.
        • DS is a hash of the DNSKEY (the key signing key) that produces the RRSIG over the DNSKEY records. It is published on the parent zone's authoritative servers, in this case .com
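As a small illustration of the DNS wire format behind the notes above (the AA bit in particular), the following sketch builds a query and decodes the header flags. `build_query` and `parse_flags` are hypothetical helper names, and the query ID is an arbitrary example value; a real client would also send the packet over UDP/TCP port 53 and parse the answer section.

```python
import struct


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234,
                recursion_desired: bool = True) -> bytes:
    """Hypothetical helper: build a one-question DNS query (qtype 1 = A)."""
    flags = 0x0100 if recursion_desired else 0x0000  # RD bit
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # class 1 = IN


def parse_flags(message: bytes) -> dict:
    """Decode the flag bits from the first 4 bytes of a DNS message."""
    _, flags = struct.unpack("!HH", message[:4])
    return {
        "is_response": bool(flags & 0x8000),          # QR
        "authoritative": bool(flags & 0x0400),        # AA
        "truncated": bool(flags & 0x0200),            # TC
        "recursion_desired": bool(flags & 0x0100),    # RD
        "recursion_available": bool(flags & 0x0080),  # RA
        "rcode": flags & 0x000F,
    }
```

Clearing the RD bit is what makes a query non-recursive: the server then answers only from its own zone data or cache, or returns a referral.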
  • Routing Basics
    • IGP / Interior Gateway Protocol
      • A type of protocol used for exchanging routing information between gateways (commonly routers) within an autonomous system (for example, a system of corporate local area networks)
      • Examples
        • OSPF (runs directly over IP, protocol number 89)
        • RIP (IP/UDP)
    • EGP / Exterior Gateway Protocol
      • Used to exchange routing information between autonomous systems; relies on IGPs to resolve routes within an autonomous system.
      • Examples
        • EGP, BGP
  • TCP
    • Features
      • connection oriented
      • retransmission
      • error detection
      • reliability > latency
      • Accurate rather than timely delivery
        • UDP is the opposite, latency > reliability
      • Congestion avoidance
        • 4 intertwined algorithms
          • Slow start
          • Congestion avoidance
          • Fast retransmit
          • Fast recovery
      • Flow control
    • Initiation
      • c: SYN
        • sequence: random value (client's initial sequence number)
      • s: SYN/ACK
        • sequence: random value (server's initial sequence number)
        • acknowledgment: client seq + 1
      • c: ACK
        • acknowledgment: server seq + 1
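The sequence and acknowledgment numbers of the handshake can be traced with a toy model. This is only a sketch of the arithmetic, not a TCP implementation; `three_way_handshake` is a hypothetical name and the segments are plain dictionaries.

```python
import random

SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits and wrap around


def three_way_handshake(client_isn=None, server_isn=None):
    """Hypothetical helper: return the three handshake segments as dicts."""
    if client_isn is None:
        client_isn = random.getrandbits(32)
    if server_isn is None:
        server_isn = random.getrandbits(32)
    syn = {"from": "c", "flags": "SYN", "seq": client_isn}
    syn_ack = {
        "from": "s",
        "flags": "SYN/ACK",
        "seq": server_isn,
        "ack": (client_isn + 1) % SEQ_SPACE,  # SYN consumes one sequence number
    }
    ack = {
        "from": "c",
        "flags": "ACK",
        "seq": (client_isn + 1) % SEQ_SPACE,
        "ack": (server_isn + 1) % SEQ_SPACE,
    }
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=SEQ_SPACE - 1):
    print(segment)
```

The example server ISN is chosen at the top of the range to show the wrap-around: the client's final acknowledgment number becomes 0.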
    • Teardown
      • c: FIN
        • c: FIN_WAIT_1 (waiting for ACK of the sent FIN)
        • s: CLOSE_WAIT
      • s: ACK
        • c: FIN_WAIT_2 (waiting for FIN from other party)
        • s: CLOSE_WAIT
      • s: FIN
        • c: TIME_WAIT
        • s: LAST_ACK
      • c: ACK
        • s: CLOSED
        • c: Stays at TIME_WAIT
      • The initiator should keep reading data till the other party has sent FIN. Connections can be half-closed that way
      • Can also happen in three steps, if receiver combines FIN & ACK steps
    • Attacks
      • Connection hijacking
      • TCP Veto
      • TCP reset attack
      • DoS
      • SYN Flooding
        • SYN Cookies
      • TCP Cookie transactions
        • Useful for short-lived sessions, like in DNSSEC
        • No resource allocation before 3-way handshake or after TIME_WAIT
        • Deprecated in favour of TCP fast open
      • TCP Fast open
        • Cookie sent from server to client during initial handshake
        • Client uses it in SYN for later connections
        • Then server starts sending data immediately, before client’s final ACK
      • Selective Acknowledgments (SACK)
        • Allows the receiver to acknowledge discontinuous blocks of packets which were received correctly
    • IP packets may be lost, duplicated or delivered out of order
      • TCP asks for retransmissions, reorders and helps with congestion
    • While IP handles actual delivery of the data, TCP keeps track of segments
    • TCP first reassembles, then delivers, data to application
    • Timers
      • Retransmission timer
        • If segment is not acknowledged timely, it’ll be retransmitted
      • Persistence timer
        • Prevent deadlock
        • Receiver sends window size 0, sender pauses
        • If receiver later increases window size but the packet is lost, the hosts are deadlocked
        • To prevent this, sender periodically sends a probe to the receiver to get the window size. If it’s still 0 it resets the timer and waits.
      • Keepalive timer
        • Periodic probes to see if other end is responding
      • TIME_WAIT timer
        • Used to make sure all packets of this session have died off
    • TCP treats dropped packets as a sign of congestion and responds by dramatically reducing its congestion window. This is problematic for wireless networks, where packets are often lost for reasons other than congestion.
      • Alternative congestion algorithms (vegas/westwood) address this problem
      • TCP Tahoe is the original congestion algorithm
    • TCP Window
      • Flow control
      • Set by receiver, specifies how many bytes the sender can send
      • Can be 0 (see persistence timer)
      • Client usually has larger window than server
    • Sequence number tracks bytes sent by sender
    • TCP Window scaling
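      • Max window size without scaling is 65535 bytes (16-bit field), which under-utilizes fast or high-latency links
      • To deal with it, there’s a TCP option defining a scaling factor (a shift count of up to 14)
      • With that, endpoints can define windows of up to about 1 GiB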
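The arithmetic behind window scaling can be worked through as follows. The helper names are hypothetical, and the throughput bound assumes the simplified model of at most one full window in flight per round trip.

```python
MAX_WINDOW_FIELD = 0xFFFF  # 16-bit window field in the TCP header
MAX_SHIFT = 14             # largest shift count allowed by the option


def effective_window(window_field: int, shift: int) -> int:
    """Window in bytes after applying the negotiated scale factor."""
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift


def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput: one window per round trip."""
    return window_bytes * 8 / rtt_seconds


unscaled = effective_window(MAX_WINDOW_FIELD, 0)        # 65535 bytes
scaled = effective_window(MAX_WINDOW_FIELD, MAX_SHIFT)  # 1073725440 bytes
print(max_throughput_bps(unscaled, 0.1) / 1e6, "Mbit/s at 100 ms RTT")
```

With a 100 ms round-trip time the unscaled window caps a connection at roughly 5.2 Mbit/s regardless of link speed, which is the under-utilization the option is meant to fix.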
    • Maximum Segment Size
      • Set in TCP Options
      • Doesn’t count TCP and IP headers
      • Can span over several IP fragments
      • Set initially in the SYN packet, cannot change afterwards
      • MSS = MTU - TCPHdrLen - IPHdrLen
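As a quick worked example of the formula, assuming headers without options (20 bytes each for TCP and IPv4, 40 bytes for IPv6); `mss_for_mtu` is a hypothetical helper name.

```python
def mss_for_mtu(mtu: int, ip_header_len: int = 20, tcp_header_len: int = 20) -> int:
    """MSS = MTU - TCPHdrLen - IPHdrLen (header lengths without options)."""
    return mtu - tcp_header_len - ip_header_len


print(mss_for_mtu(1500))                    # Ethernet MTU over IPv4
print(mss_for_mtu(1500, ip_header_len=40))  # same link over IPv6
```

For the common Ethernet MTU of 1500 this gives 1460 bytes over IPv4 and 1440 bytes over IPv6.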
    • Flags
      • SYN
      • ACK
      • FIN
      • RST (immediate abort of connection)
      • PSH
      • URG
  • IPv4
    • Responsible for
      • Addressing
      • Encapsulating data into datagrams
        • Including fragmentation and reassembly
      • Routing datagrams
    • Best effort delivery
      • Characterized as unreliable
    • Data corruption, loss or duplication might occur
    • Out of order delivery can also happen
    • MTU discovery
      • Send IP/UDP packet with Don’t Fragment (DF) bit set
      • Devices in the path with smaller MTU than that packet will drop it and send an ICMP ‘Fragmentation Needed’ message back to the host
      • Repeat the process with smaller packets till a packet traverses the full path without fragmentation
    • First and last addresses in a subnet are reserved
      • First to identify subnet
      • Last as a local broadcast address
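Python's standard `ipaddress` module can show the reserved addresses of a subnet. The helper name `subnet_summary` and the example network are illustrative only; the sketch assumes an ordinary subnet (prefix length of /30 or shorter).

```python
import ipaddress


def subnet_summary(cidr: str) -> dict:
    """Hypothetical helper: list reserved and usable addresses of a subnet."""
    net = ipaddress.ip_network(cidr)
    return {
        "network": str(net.network_address),      # first address: identifies the subnet
        "broadcast": str(net.broadcast_address),  # last address: local broadcast
        "first_host": str(net[1]),
        "last_host": str(net[-2]),
        "usable_hosts": net.num_addresses - 2,
    }


print(subnet_summary("192.168.1.0/24"))
```

For 192.168.1.0/24 the usable range is 192.168.1.1 to 192.168.1.254, i.e. 254 hosts out of 256 addresses.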
    • Packet header fields
      • Version
      • IHL (Internet Header Length)
      • DSCP (DiffServ)
      • ECN (Congestion Notification)
      • Length
      • Identification (used for fragments)
      • Flags
        • Don’t fragment (DF)
        • More Fragments (MF)
      • TTL
        • Prevent datagrams from going in circles
        • Hop Count
        • When the TTL reaches 0, the datagram is dropped and an ICMP Time Exceeded message is sent back to the source
      • Header checksum
      • Source and Destination address
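The header checksum field is the 16-bit one's complement of the one's complement sum of all 16-bit words in the header, computed with the checksum field itself set to zero. A minimal sketch follows; `ipv4_checksum` is a hypothetical name and the 20-byte sample header is an arbitrary example (UDP, 192.168.0.1 to 192.168.0.199).

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's complement checksum over 16-bit words of an IPv4 header."""
    if len(header) % 2:
        header += b"\x00"
    words = struct.unpack(f"!{len(header) // 2}H", header)
    total = sum(words)
    # Fold any carries above 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Sample header with the checksum field (bytes 10-11) zeroed.
sample = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
checksum = ipv4_checksum(sample)
filled = sample[:10] + struct.pack("!H", checksum) + sample[12:]
# A receiver verifies by summing the whole header; a valid one yields 0.
assert ipv4_checksum(filled) == 0
```

Because the TTL changes at every hop, each router has to recompute this value, which is one reason IPv6 dropped the field.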
  • IPv6
    • Main motivation was exhaustion of IPv4 address space
    • By convention the first 64 bits are the network prefix and the last 64 bits are the interface (host) identifier (/64 subnets)
    • Differences with IPv4
      • Larger address space
      • Simpler header structure
      • Multicasting
      • Does not implement traditional broadcast, uses “all nodes” multicast group
      • Stateless address autoconfiguration (SLAAC)
        • Can also use DHCPv6
      • IPSec
      • No checksum field
        • Depends on link layer checksum
        • For higher level (TCP/UDP) they have their own
          • Therefore DEMANDS checksum in UDP
        • End-to-end principle, most processing happening on end nodes, not by routers
      • IPv6 routers don’t do fragmentation
        • Hosts are expected to do path MTU discovery
      • NDP, instead of ARP, which depends on ICMPv6 and IPv6 Multicasting
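The 64/64 split and the "all nodes" multicast group can be inspected with the standard `ipaddress` module. `split_ipv6` is a hypothetical helper and the address uses the 2001:db8::/32 documentation prefix; the sketch assumes a /64 subnet.

```python
import ipaddress


def split_ipv6(address: str):
    """Hypothetical helper: return (/64 network, 64-bit interface identifier)."""
    network = ipaddress.ip_network(f"{address}/64", strict=False)
    interface_id = int(ipaddress.IPv6Address(address)) & ((1 << 64) - 1)
    return str(network), interface_id


network, interface_id = split_ipv6("2001:db8:1:2:aaaa:bbbb:cccc:dddd")
print(network, hex(interface_id))

# ff02::1 is the link-local "all nodes" group used in place of broadcast.
all_nodes = ipaddress.IPv6Address("ff02::1")
print(all_nodes.is_multicast)
```

The same module also handles IPv4, which makes it convenient for comparing the 32-bit and 128-bit address spaces.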
  • IPv4 vs IPv6
    • Anycast explicitly supported in IPv6, only through BGP in IPv4
    • 32 bit addressing vs 128 bit addressing
  • UDP
    • Optional checksum in header
      • Mandatory for IPv6
    • Optional source port in header
    • Features
      • Unreliable
      • Not ordered
      • Lightweight
      • Datagrams
        • Checked for integrity only if they arrive
      • No congestion control
      • Broadcasts
      • Multicasts
  • HTTP
    • Stateless, request-response protocol
      • HTTP cookies can be used to keep state
    • Persistent connections since HTTP/1.1
    • Request message
      • Request line
      • Headers
      • Empty line
      • Body
    • Methods
      • Put, post, get, head, trace, options
        • All of these except POST are idempotent
    • HTTP Pipelining
      • Multiple requests sent over same connection w/o waiting for response
      • Superseded by HTTP/2 multiplexing
      • Susceptible to Head-of-Line blocking
    • Chunked transfer encoding
      • Chunked transfer encoding allows a server to maintain an HTTP persistent connection for dynamically generated content.
      • In this case, the HTTP Content-Length header cannot be used to delimit the content and the next HTTP request/response, as the content size is not yet known.
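A rough sketch of how chunked framing delimits a body of unknown total size: each chunk is prefixed with its length in hexadecimal, and a zero-length chunk ends the body. `encode_chunked` and `decode_chunked` are hypothetical helpers; trailers are not handled, chunk extensions are ignored, and the decoder assumes the whole message is already in memory.

```python
def encode_chunked(chunks) -> bytes:
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk would be read as the terminator
        out += f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk (size 0) followed by an empty line
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from chunked framing (no trailer support)."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size_field = data[pos:eol].split(b";")[0]  # drop chunk extensions
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


wire = encode_chunked([b"Wiki", b"pedia"])
print(wire)
print(decode_chunked(wire))
```

Since the end of the body is marked in-band, the server can start sending before it knows the total size and still reuse the connection afterwards.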
    • Headers
      • Upgrade
        • Used with TLS and WebSocket
    • Websocket
      • Communication done over ports 80 or 443
      • After request/response handshake, communication switches to binary protocol that doesn’t conform with HTTP
    • Security
      • Same origin policy
        • Can be relaxed with CORS (Access-Control-Allow-Origin header)
          • Protects the server from unauthorized access
        • Scripts are only allowed to access data on pages with the same origin
        • Restriction for scripts only, not for images or other resources
      • Cross-site request forgery
        • One click attack
        • Several mitigations, including secret tokens in forms or headers
      • CSP
        • Content Security Policy
          • Content-Security-Policy header
          • Defines which sources the web app may load resources from or send requests to
  • HTTP/2
    • Features
      • Data compression of headers
      • pipelining of requests
      • request multiplexing
    • Drawbacks
      • Still has HOL blocking problem on TCP level
  • Load Balancing
    • Distribute tasks over a set of resources
    • Efficiently distributing incoming network traffic across a group of backend servers, also known as a server farm or server pool
    • Ensures high availability and reliability by sending requests only to servers that are online
    • Provides the flexibility to add or subtract servers as demand dictates
    • Global Server Load Balancing
      • Global Server Load Balancing or GSLB is the practice of distributing Internet traffic amongst a large number of connected servers dispersed around the world.
      • The benefits of GSLB include increased reliability and reductions in latency.
    • Anycast
      • Network addressing and routing method in which incoming requests can be routed to a variety of different locations or “nodes.”
      • It is not easy to setup a true Anycasted network. Proper implementation requires that a CDN provider maintains their own network hardware, builds direct relationships with their upstream carriers, and tunes their networking routes to ensure traffic doesn’t “flap” between multiple locations.
      • Ideal for DNS
      • Can be used with TCP also, but flapping needs to be dealt with
        • The reason is that route can change and send packets to other node
    • Algorithms
      • Least connections
      • Round Robin (also weighted)
      • Hash (e.g. ip hash)
      • Least response time
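Three of the algorithms above can be sketched in a few lines each. The class and function names are hypothetical, the backends are plain strings, and real load balancers additionally handle health checks, weights and concurrency.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out servers in a fixed rotation."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Pick the server with the fewest active connections."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}

    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1


def ip_hash(client_ip: str, servers):
    """Map a client IP to a server so the same client hits the same backend."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["a", "b", "c"]
rr = RoundRobin(servers)
print([rr.pick() for _ in range(4)])
print(ip_hash("203.0.113.7", servers))
```

One trade-off worth noting: the plain modulo in `ip_hash` remaps most clients whenever the server list changes, which is why consistent hashing is often preferred in practice.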
    • Benefits
      • Scalable
      • Redundancy
      • Flexibility
    • Companies
      • Akamai
      • Cloudflare
    • DNS Round-Robin
    • Global Server Load Balancing
    • Features
      • Asymmetric load (when some servers have bigger capacity)
      • TLS Offload & Acceleration
        • LBs can terminate SSL/TLS and forward HTTP to servers
      • DDOS protection
      • HTTP Compression
      • HTTP caching
      • HTTP security
  • Misc
    • A bridge connects two separate LAN segments. A switch is basically a multiport bridge