What is this article about? #
I always wanted to find out what exactly HTTPS means, how and why it uses TLS, and how messages are encrypted between the client and the server. This guide is written for my personal reference, but I wanted to share it with anyone who has the same curiosity and might find my notes helpful.
I will start by explaining HTTPS and TLS from the OSI Model perspective. In my opinion, a glance under the hood gives you a better understanding of how HTTP, HTTPS, TLS and TCP are related.
Then, I will show you how symmetric and asymmetric key encryption work. This will help you understand later on how TLS Certificates are signed and verified by the browser, as well as how messages are encrypted/decrypted between the client and the server.
In chapters 5 & 6, I will explain what Certificate Authorities are, the anatomy of a TLS Certificate and how TLS Certificates are related to Certificate Authorities.
Chapter 7 will be all about TLS in depth. I will explain what the TLS handshake is, how the encryption key is generated and exchanged between the client and the server, and how they use this key to communicate securely with each other.
HTTP vs HTTPS #
HTTPS is the secure version of the HTTP Protocol. It uses TLS to encrypt and sign normal HTTP requests and responses. By default, HTTP is served on TCP port 80, while HTTPS is served on TCP port 443.
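To make the port difference concrete, here is a minimal Python sketch that works out which TCP port a client would connect to for a given URL. The helper name `effective_port` and the example URLs are my own illustrations; the two constants come from Python's standard `http.client` module.

```python
from http.client import HTTP_PORT, HTTPS_PORT  # 80 and 443
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": HTTP_PORT, "https": HTTPS_PORT}


def effective_port(url: str) -> int:
    """Return the TCP port a client would connect to for this URL."""
    parts = urlsplit(url)
    # An explicit port in the URL wins; otherwise use the scheme's default.
    return parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]


print(effective_port("http://www.akentominas.com/"))    # 80
print(effective_port("https://www.akentominas.com/"))   # 443
print(effective_port("https://localhost:8443/health"))  # 8443
```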
This is what we all hear when it comes to HTTPS, but what does all that mean?
HTTP and HTTPS in OSI Model #
It took me a long time to understand what the OSI Model is and why it is always mentioned in networking, protocol and security topics.
To understand this without getting too deep into it, think of it as the path our data takes in order to be delivered to the destination machine, and the path the machine’s response takes back to us.
I will try to explain this with illustrated images and break down the most important pieces of what we should know about HTTP and what the role of the OSI model is.
First, let’s see the path our request takes when using the HTTP Protocol.
1) Application Layer 7
HTTP belongs to Layer 7, the Application Layer. This means that if, for example, we want to make a POST request to a destination machine (usually a server), HTTP is the entry gate for our message’s journey through the OSI model.
2) Transport Layer 4
We will skip Layers 6 and 5, since the mechanisms applied there don’t matter for the topic we want to discuss.
Layer 4 is where TCP lives. TCP is responsible for establishing a connection between the source and the destination (client and server). TCP is also responsible for the transport of our message: it adds the source and destination TCP ports to the message, creating a segment. Think of TCP as a bus that runs the same route (sending the message from street number 234 to street number 80) many times in the same day (the connection stays alive), transporting people in it ( {“msg”: “hello”} ).
However, TCP by itself is not secure. TCP takes the content from the Layer above and passes it to the next Layer, as I explained above. This means that an attacker sitting between the two ends of the delivery can see all the content that is sent, since it is just plain text.
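The idea of "adding ports to the message to create a segment" can be sketched in a few lines of Python. This is a deliberately simplified model: a real TCP header also carries sequence numbers, flags, a checksum and more, and the function names here are hypothetical.

```python
import struct


def make_segment(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prefix the payload with two 16-bit ports (a toy stand-in for the TCP header)."""
    return struct.pack("!HH", src_port, dst_port) + payload


def read_segment(segment: bytes) -> tuple[int, int, bytes]:
    """Split a toy segment back into source port, destination port and payload."""
    src_port, dst_port = struct.unpack("!HH", segment[:4])
    return src_port, dst_port, segment[4:]


# The bus from the analogy: from street 234 to street 80, carrying the message.
segment = make_segment(234, 80, b'{"msg": "hello"}')
print(read_segment(segment))  # (234, 80, b'{"msg": "hello"}')
```

Note that the payload travels unchanged inside the segment, which is exactly why plain HTTP can be read by anyone who captures it.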
In the next picture, we will see how the message is delivered from the source, through the layers of the OSI model, up to the Application Layer of the destination.
Now, we will see what HTTPS means in terms of the OSI model and what makes it secure.
Simply put, in comparison with the above visual examples, almost everything stays the same. The only thing that changes is that the message is encrypted. The encryption is handled by the TLS Protocol, which this guide places in Layer 6, the Presentation Layer (TLS does not map perfectly onto a single OSI layer, but this is a common simplification).
Let’s see some additional illustrations and bring TLS into the game.
1) Application Layer 7
This step stays exactly the same. The only thing that changes is that, in order to let the client and the server know that we want to use the TLS Protocol in Layer 6, we type https:// instead of http:// when sending the request.
2) Presentation Layer 6
This Layer is extremely important. This is where the security comes into play.
TLS belongs to this Layer. TLS has 2 major roles:
- The client and the server communicate over TLS and agree on a shared encryption key that they will use to encrypt/decrypt the message passed from Layer 7
- Once the key has been agreed, TLS on the client encrypts the message using that key and a specific algorithm. On the server, TLS receives the encrypted message from Layer 4 (TCP passes it up), decrypts it and passes it in plain text to the Application Layer 7.
3) Transport Layer 4
Nothing really changes here. TCP doesn’t know what it transfers. This time, the message that it gets from the Presentation Layer is encrypted, and TCP passes the segment down the OSI Layers as before.
What does this mean? That any attacker sitting in the middle of the message’s journey will read a bunch of gibberish, since the message is encrypted.
Now, we will see the entire journey of the message delivery from the client to the server.
Symmetric vs Asymmetric Key Encryption #
In this section we will analyse what Symmetric and Asymmetric encryption are, and which algorithms are used for each.
Understanding symmetric and asymmetric key encryption will help us understand later how TLS Certificates are signed and validated, as well as how data is securely transmitted (how a message is encrypted and decrypted between client and server) over the internet.
Symmetric Encryption #
Simply put, symmetric key encryption means that we encrypt and decrypt data with the same key.
Symmetric keys are used for encrypting and decrypting data between the client and the server. We will analyse later how both the client and the server obtain the same key in order to do that.
Symmetric key algorithms: DES, 3DES (both obsolete) & AES
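Python's standard library does not ship AES, so the sketch below uses a toy stream cipher that I made up for illustration (a SHA-256 based keystream XORed with the data). It is not one of the algorithms listed above and it is not secure; it only demonstrates the defining property of symmetric encryption, namely that one and the same key both encrypts and decrypts.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key by hashing key + counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice with the same key restores the data."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


shared_key = secrets.token_bytes(32)  # both client and server hold this key
message = b'{"msg": "hello"}'

ciphertext = xor_cipher(shared_key, message)
print(ciphertext.hex())                    # gibberish for anyone without the key
print(xor_cipher(shared_key, ciphertext))  # the receiver recovers the message
```

A real implementation would use AES through a vetted library and would never reuse the same keystream for two messages.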
Asymmetric Encryption #
Asymmetric encryption means that the data gets encrypted using one key (the public key) and decrypted using a different key (the private key). These 2 keys are a pair of mathematically linked numbers.
Encryption using asymmetric keys #
The beauty of using asymmetric keys is that we can share the public key with anyone and let them encrypt data with it, because only the private key can decrypt this data.
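Here is a small "textbook RSA" sketch of that idea. The primes are tiny on purpose so the numbers stay readable; real RSA keys use primes of around 1024 bits or more each, plus padding, so treat this strictly as an illustration. The function names are my own.

```python
# Toy RSA key pair built from two tiny primes.
p, q = 61, 53
n = p * q                  # modulus, part of both keys (3233)
phi = (p - 1) * (q - 1)
e = 17                     # public exponent
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)

public_key = (e, n)        # safe to share with anyone
private_key = (d, n)       # stays with the owner


def encrypt(m: int, key: tuple[int, int]) -> int:
    exponent, modulus = key
    return pow(m, exponent, modulus)


def decrypt(c: int, key: tuple[int, int]) -> int:
    exponent, modulus = key
    return pow(c, exponent, modulus)


message = 65                                  # must be smaller than n
ciphertext = encrypt(message, public_key)     # anyone can do this
recovered = decrypt(ciphertext, private_key)  # only the key owner can do this
print(ciphertext, recovered)                  # 2790 65
```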
Signing data using asymmetric keys #
In this use case, the data receiver needs to be sure that the data was really sent by the data owner.
To achieve that, the data owner creates a hash of the data using a hash function (for example SHA-256) and then encrypts the hash with the private key using an asymmetric algorithm (RSA), resulting in an encrypted hash, the signature. After that, the data owner sends the data along with the encrypted hash to the data receiver.
Then, the data receiver takes the data, creates a hash using the same hash function, decrypts the encrypted hash with the owner’s public key and compares the two hashes. If they match, the data came from the holder of the private key and was not modified on the way.
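The sign/verify round trip can be sketched with textbook RSA and SHA-256. The key below is a toy key built from two known Mersenne primes, the hash is simply reduced modulo n, and no padding scheme is applied; real signatures use much larger keys and a padding scheme, so this is an illustration only. All names are hypothetical.

```python
import hashlib

# Toy RSA key pair (far too small and too structured for real use).
p, q = 2**61 - 1, 2**89 - 1
n = p * q
e = 65537                          # public exponent
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent


def digest(data: bytes) -> int:
    """SHA-256 hash of the data, reduced modulo n so the toy key can handle it."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes) -> int:
    """Owner side: transform the hash with the private key."""
    return pow(digest(data), d, n)


def verify(data: bytes, signature: int) -> bool:
    """Receiver side: undo the signature with the public key and compare hashes."""
    return pow(signature, e, n) == digest(data)


document = b"pay 10 EUR to Alice"
signature = sign(document)

print(verify(document, signature))                  # True
print(verify(b"pay 1000 EUR to Alice", signature))  # False, the data was modified
```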
Asymmetric key algorithms: RSA
Certificate Authority (CA) #
Understanding what Certificate Authorities are will help us understand how a TLS Certificate is signed, and how the browser knows that it can trust the specific Certificate that it receives from the server.
A CA is the entity which signs a certificate. If our certificate is signed by a CA and we trust this CA, we trust the owner of the certificate as well. The owner of the certificate could be akentominas.com and the CA could be a 3rd party company, for example, DigiCert.
How we know a certificate is trusted #
- Certificate Authorities are trusted 3rd parties
- Browsers keep a list of CAs
- Browsers trust the CAs
- The CA certifies that a website (www.akentominas.com) owns the public key contained in its certificate
- The browser trusts the public key
Chain of trust #
There are only a few Root Certificate Authorities, but the more certificates there are to issue, the more CAs we need. Since we do not want to have many Root CAs, the Root CAs delegate the work and the trust to Intermediate CAs.
So the flow goes as follows:
- The Root CA trusts the Intermediate CA
- The Intermediate CA signs the Website’s Certificate (which contains the website’s Public Key)
- Since the Browser trusts the Root CA, and the Root CA trusts the Intermediate one, the browser trusts the Intermediate CA as well
Let’s explore Google’s certificate. In the next screenshot, you can see that the *.google.com certificate is issued by the intermediate CA GTS CA 1C3, which is trusted by the Root CA, GTS Root R1.
Trust chain verification #
We discussed how we can sign documents using the RSA Asymmetric Key Algorithm. In the next image, we can see how the chain is built and verified.
Taking one link of this certificate chain: the akentominas.com certificate was signed by the Intermediate CA with its private key. To verify that the akentominas.com certificate is valid, the browser decrypts the certificate’s signature with the Intermediate certificate’s public key and compares the result with its own hash of the certificate. If they match, the end-entity (akentominas.com) certificate is valid. The same process is repeated between the Intermediate Certificate and the Root CA.
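To see the whole chain verified link by link, here is a toy model in Python. It is not X.509: the `Cert` class, the CA names and the textbook RSA keys (built from known Mersenne primes, with no padding) are all assumptions made for illustration. The point is only that each certificate is checked with the public key of the certificate above it, until we reach a root that the browser already has in its store.

```python
import hashlib
from dataclasses import dataclass

E = 65537  # public exponent shared by all toy keys


def make_keypair(p: int, q: int) -> tuple[int, int]:
    """Return (public modulus n, private exponent d) for the primes p and q."""
    return p * q, pow(E, -1, (p - 1) * (q - 1))


def digest(data: bytes, n: int) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


@dataclass
class Cert:
    subject: str
    issuer: str
    public_n: int       # the subject's public key (modulus only, exponent is E)
    signature: int = 0  # filled in by the issuer

    def body(self) -> bytes:
        return f"{self.subject}|{self.issuer}|{self.public_n}".encode()


def issue(subject: str, subject_n: int, issuer: str, issuer_n: int, issuer_d: int) -> Cert:
    """The issuer hashes the certificate body and signs the hash with its private key."""
    cert = Cert(subject, issuer, subject_n)
    cert.signature = pow(digest(cert.body(), issuer_n), issuer_d, issuer_n)
    return cert


def signed_by(cert: Cert, issuer_cert: Cert) -> bool:
    """Check a certificate's signature using only the issuer's public key."""
    n = issuer_cert.public_n
    return cert.issuer == issuer_cert.subject and pow(cert.signature, E, n) == digest(cert.body(), n)


def verify_chain(chain: list[Cert], trusted_roots: dict[str, int]) -> bool:
    """chain is ordered leaf, intermediate(s), root; trusted_roots maps root name to public key."""
    for cert, issuer_cert in zip(chain, chain[1:]):
        if not signed_by(cert, issuer_cert):
            return False
    root = chain[-1]
    return trusted_roots.get(root.subject) == root.public_n and signed_by(root, root)


root_n, root_d = make_keypair(2**107 - 1, 2**127 - 1)
inter_n, inter_d = make_keypair(2**61 - 1, 2**89 - 1)
site_n, _ = make_keypair(2**19 - 1, 2**31 - 1)

root = issue("Toy Root CA", root_n, "Toy Root CA", root_n, root_d)  # signs itself
inter = issue("Toy Intermediate CA", inter_n, "Toy Root CA", root_n, root_d)
site = issue("akentominas.com", site_n, "Toy Intermediate CA", inter_n, inter_d)

browser_store = {"Toy Root CA": root_n}  # root certificates shipped with the browser/OS
print(verify_chain([site, inter, root], browser_store))    # True

forged = Cert("evil.example", site.issuer, site.public_n, site.signature)
print(verify_chain([forged, inter, root], browser_store))  # False, signature does not match
```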
Types of CAs
- paid
- free
We can obtain certificates from paid CAs, such as Comodo, DigiCert etc, or from free CAs such as Let’s Encrypt.
SSL/TLS Certificates #
Certificates are used to establish a secure connection between the Browser and the Server. The server authenticates itself to our browser with its certificate, and this is why we see the lock icon before the URL.
A Certificate is essentially a public key packaged in a data file, together with information about its owner and a signature from the issuer. The certificate always belongs to its owner (e.g. akentominas.com).
Anatomy of a Certificate #
The TLS Certificate holds:
- Information about the owner of the certificate (e.g. akentominas.com), including the owner’s public key
- Information about the issuer (Certificate Authority, the entity that signed this certificate)
- Signature (the encrypted hash produced by the CA when it signed the certificate)
Self Signed Certificates #
Self Signed Certificates are certificates that have not been issued by any CA. Instead, they have been issued and signed by the certificate owner itself.
By using a self-signed certificate, we can still perform encryption, but the disadvantage is that the certificate will not be trusted by the browser by default.
Use cases for self-signed certificates:
We can use self-signed certificates when 2 parties already trust each other. For example, imagine that you work for a big company and you need 2 servers to communicate over HTTPS. Since you know that both servers belong to the company, you don’t need any third party to verify the trust between them.
Certificate Types #
- Single Domain Certificate (www.akentominas.com)
- Wildcard Certificate (*.akentominas.com), which is valid for all the direct subdomains of a domain (www.akentominas.com, but not a.b.akentominas.com or akentominas.com itself)
- Multi-domain Certificate (akentominas.com, oracle.com, apple.com)
- SAN (Subject Alternative Name)
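The difference between these certificate types comes down to which hostnames a certificate covers. Below is a simplified Python sketch of that matching; real clients follow more detailed rules, and the function names are my own.

```python
def hostname_matches(pattern: str, hostname: str) -> bool:
    """Match a hostname against a certificate name; '*' covers exactly one leftmost label."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


def cert_covers(names: list[str], hostname: str) -> bool:
    """A multi-domain (SAN) certificate is valid if any of its listed names matches."""
    return any(hostname_matches(name, hostname) for name in names)


print(hostname_matches("www.akentominas.com", "www.akentominas.com"))  # True  (single domain)
print(hostname_matches("*.akentominas.com", "blog.akentominas.com"))   # True  (wildcard)
print(hostname_matches("*.akentominas.com", "a.b.akentominas.com"))    # False (two levels deep)
print(hostname_matches("*.akentominas.com", "akentominas.com"))        # False (bare domain)
print(cert_covers(["akentominas.com", "oracle.com", "apple.com"], "oracle.com"))  # True
```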
How does the Web Browser trust certificates? #
As described in the Chain of trust section, our website’s certificate is trusted because the Intermediate Certificate is trusted, and the Intermediate Certificate in turn is trusted by the Root CA Certificate.
But why is the Root CA trusted in the first place?
Because we all have the Root Certificates installed on our machines. For instance, on my Mac I can see the list of installed CAs in Keychain Access.
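If you prefer code over a GUI, Python's `ssl` module can show which CA certificates its default context loads. A small sketch, with the caveat that the result depends on the operating system and on how Python was installed: on some setups the list is empty because certificates are only read from a directory when they are needed.

```python
import ssl

# Loads the platform's default trusted CA certificates, where available.
context = ssl.create_default_context()

# Where the underlying OpenSSL library looks for CA files and directories.
print(ssl.get_default_verify_paths())

roots = context.get_ca_certs()
print(f"{len(roots)} CA certificates loaded")

for cert in roots[:5]:
    # 'subject' is a tuple of name parts, each holding a (field, value) pair.
    subject = dict(item[0] for item in cert["subject"])
    print(subject.get("commonName") or subject.get("organizationName"))
```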
TLS In Depth #
The Certificate itself is not used for encryption of data between the client and server.
Although the term SSL Certificate is more popular than TLS Certificate, it is actually inaccurate. The reason is that all 3 versions of SSL are deprecated, and TLS 1.2 and TLS 1.3 are the versions recommended nowadays.
In fact, the security protocol is configured on the server, and you should always disable SSL in the list of supported protocols, since it has many known vulnerabilities.
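How you restrict the protocol versions depends on your server software. As one example, this is roughly what it looks like with Python's `ssl` module; the certificate and key file names in the comment are placeholders, not real files.

```python
import ssl


def build_server_context() -> ssl.SSLContext:
    """Create a server-side TLS context that refuses SSL and early TLS versions."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Refuse SSLv3, TLS 1.0 and TLS 1.1; allow TLS 1.2 and newer.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    # A real server would also load its certificate and private key here, e.g.
    # context.load_cert_chain("server.crt", "server.key")  # hypothetical file names
    return context


context = build_server_context()
print(context.minimum_version)
```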
How TLS Session is Established #
1. The client sends the server the list of Cipher Suites that it supports. The server then picks one Cipher Suite from the list that the Client sent.
This is an example of Cipher Suites List that the browser sends to the server.
The first part of the suite is the protocol that is used for the communication. The second part is the algorithm that is used for secure key generation (the key exchange). The third part is the encryption algorithm that will be used.
2. The server sends its Certificate, as well as all the Intermediate Certificates, to the Client.
Remember that the Root Certificate already exists on our computer, so the server does not need to send the Root CA certificate to the client.
3. The browser validates the Certificates. This means that the client has now authenticated the server and the key exchange can begin
4. The encryption key is generated and exchanged between the client and the server (the core of the handshake)
5. Requests and responses are now encrypted/decrypted with the generated key, making communication between client and server secure
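To make the cipher suite naming from step 1 more tangible, here is a small parser sketch. The suite name `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256` is a real TLS 1.2 suite that I picked as an example; it is not taken from the list above. The parser only handles names of the form `TLS_<key exchange>_<authentication>_WITH_<cipher>_<hash>`, so TLS 1.3 names, which have no `_WITH_` part, are out of scope.

```python
def parse_cipher_suite(name: str) -> dict[str, str]:
    """Split a TLS 1.2 style cipher suite name into its parts."""
    left, right = name.split("_WITH_")
    protocol, key_exchange, *auth = left.split("_")
    *cipher, mac = right.split("_")
    return {
        "protocol": protocol,              # first part: the protocol
        "key_exchange": key_exchange,      # second part: how the key is generated/exchanged
        "authentication": "_".join(auth),  # how the server proves its identity
        "encryption": "_".join(cipher),    # third part: the symmetric encryption algorithm
        "hash": mac,                       # hash used for integrity / key derivation
    }


suite = parse_cipher_suite("TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256")
for part, value in suite.items():
    print(f"{part}: {value}")
```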
Step number 4 needs to be analysed a bit further, since it raises the following question:
How is the key securely exchanged between the client and the server? #
1. By the web browser (RSA key exchange)
- As discussed, the server sends the TLS Certificate to the client (browser)
- The browser validates the certificate (described in section How does the Web Browser trust certificates?)
- The browser extracts the public key from the TLS Certificate
- The browser generates a random key
- Using the RSA algorithm, the browser encrypts the randomly generated key with the server’s public key
- The browser sends the encrypted random key to the server. Keep in mind that even though an attacker can capture the encrypted key, they won’t be able to decrypt it, since only the server’s private key can decrypt it (see Encryption using asymmetric keys)
- The server, which owns the private key, decrypts the encrypted random key. Now both the server and the browser hold the same key, which can encrypt and decrypt the messages sent between them.
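The bullets above can be sketched with textbook RSA. The server key is a toy key built from two known Mersenne primes and no padding is used, so this only illustrates the flow. In real TLS the browser sends a random secret from which the actual keys are derived, rather than using it directly as the encryption key.

```python
import secrets

# Server side: a toy RSA key pair. The public part (n, e) travels inside the certificate.
p, q = 2**61 - 1, 2**89 - 1
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent, never leaves the server

# Browser side: generate a random key and encrypt it with the server's public key.
session_key = secrets.randbelow(n - 2) + 2  # random value in [2, n - 1]
encrypted_key = pow(session_key, e, n)      # this is what goes over the wire

# Server side: decrypt with the private key.
server_session_key = pow(encrypted_key, d, n)

print(server_session_key == session_key)  # True, both sides now hold the same key
```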
2. Using the Diffie-Hellman Algorithm
In this case, the public key of the certificate is not used to encrypt any data.
The Diffie-Hellman key exchange algorithm was designed for generating a shared secret key between client and server over a public, insecure network.
We are not going to get into deep detail on the Diffie-Hellman algorithm mechanics, but here is a read that explains it and helps you better understand how the key exchange happens. I would also recommend watching the video below, which explains in detail how the Diffie-Hellman algorithm works.
It is worth mentioning that this is the recommended method of server-client key exchange. In TLS 1.3, the RSA key exchange described above was removed in favour of (elliptic curve) Diffie-Hellman.
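Without going into the maths, the core of Diffie-Hellman fits in a few lines of Python. The parameters below (the prime `p` and the base `g`) are toy values chosen for illustration, not a standardised group; real TLS uses carefully chosen groups or elliptic curves. The final hashing step is also just my simplification of key derivation.

```python
import hashlib
import secrets

# Public parameters, known to everyone including attackers (toy values).
p = 2**127 - 1  # a prime modulus
g = 3           # base

# Each side picks a private number and publishes g^private mod p.
client_private = secrets.randbelow(p - 3) + 2
server_private = secrets.randbelow(p - 3) + 2
client_public = pow(g, client_private, p)  # sent to the server in the clear
server_public = pow(g, server_private, p)  # sent to the client in the clear

# Each side combines its own private number with the other side's public value.
client_secret = pow(server_public, client_private, p)
server_secret = pow(client_public, server_private, p)


def derive_key(shared: int) -> bytes:
    """Turn the shared number into a 32-byte symmetric key (simplified derivation)."""
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


print(client_secret == server_secret)                          # True
print(derive_key(client_secret) == derive_key(server_secret))  # True
```

The private numbers never travel over the network, and the shared secret itself is never sent either; both sides compute it independently.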
Uni-directional VS Bi-directional communication #
- Asymmetric encryption is uni-directional. Only one party holds both keys of the pair and can decrypt messages; the other party holds only the public key and can only encrypt the messages it sends.
- RSA is used to sign TLS Certificates, but not to encrypt the communication between the client and the server. RSA is slow, and for two-way communication it would require both the client and the server to hold the same key pair. This is not possible, since a TLS certificate contains only the public key and the matching private key never leaves the server; the CA only uses its own key pair to sign the certificate and vouch that it is valid. In TLS we need to encrypt and decrypt messages on both sides (request and response), which is bi-directional communication. Hence the certificate’s key pair cannot be used for it, and a shared symmetric key is used instead.
Conclusion #
- Certificates are used to authenticate the server to the client
- Once the client trusts the server, the key exchange of the TLS handshake takes place, which means that the client and the server agree on a key that they will use for secure communication
- The client makes a request, which is encrypted with the key obtained during the TLS handshake
- The server gets the encrypted request, decrypts it using the key exchanged during the TLS handshake, and creates a response, which it encrypts again before sending it back to the client
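If you want to watch all of this happen, Python's `ssl` module performs the certificate validation and the handshake for you. The sketch below is one possible way to inspect the result; the function names are my own, and the example call is commented out because it needs network access.

```python
import socket
import ssl


def build_client_context() -> ssl.SSLContext:
    """A client context that validates certificates against the system's root CAs."""
    context = ssl.create_default_context()
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def describe_connection(host: str, port: int = 443) -> dict:
    """Open a TCP connection, run the TLS handshake and report what was negotiated."""
    context = build_client_context()
    with socket.create_connection((host, port), timeout=10) as tcp:
        # wrap_socket validates the certificate chain and hostname, then exchanges keys.
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            cert = tls.getpeercert()
            return {
                "tls_version": tls.version(),
                "cipher_suite": tls.cipher()[0],
                "issuer": cert.get("issuer"),
                "not_after": cert.get("notAfter"),
            }


# Example (needs network access):
# print(describe_connection("www.google.com"))
```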