Implementing the TCP Protocol Yourself [WIP]


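Can't get TCP to stick? No problem, let's build one ourselves.

Environment Setup

This article focuses on implementing the TCP layer, so NIC drivers and the OS are out of scope. Instead, we build on the existing pnet crate.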

main

use std::env;

use pnet::datalink::{self, NetworkInterface};
use pnet::datalink::Channel::Ethernet;
use pnet::packet::ethernet::EthernetPacket;

fn main() {
    // The first command-line argument is the network interface name
    let interface_name = env::args().nth(1).unwrap();
    let interface_names_match =
        |iface: &NetworkInterface| iface.name == interface_name;

    let interfaces = datalink::interfaces();

    // Look up the interface by name
    let interface = interfaces.into_iter()
        .filter(interface_names_match)
        .next()
        .unwrap();

    // Open a send and a receive channel (the sender is unused for now)
    let (_tx, mut rx) = match datalink::channel(&interface, Default::default()) {
        Ok(Ethernet(tx, rx)) => (tx, rx),
        Ok(_) => panic!("Unhandled channel type"),
        Err(e) => panic!("An error occurred when creating the datalink channel: {}", e)
    };

    loop {
        // Handle the next packet to arrive
        match rx.next() {
            Ok(packet) => {
                let packet = EthernetPacket::new(packet).unwrap();
                println!("Got packet {:?}", packet);
            }
            Err(e) => {
                // If an error occurs, we can handle it here
                panic!("An error occurred while reading: {}", e);
            }
        }
    }
}

This is our initial template environment; the source is available for download.

Running it produces the following output:

$ cargo run -- lo0
Got packet EthernetPacket { destination : 00:00:00:00:00:00, source : 00:00:00:00:00:00, ethertype : EtherType(0), }
Got packet EthernetPacket { destination : 00:00:00:00:00:00, source : 00:00:00:00:00:00, ethertype : EtherType(0), }
Got packet EthernetPacket { destination : 00:00:00:00:00:00, source : 00:00:00:00:00:00, ethertype : EtherType(0), }

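The IP Protocol

The Internet Protocol gives upper-layer protocols the ability to address hosts: IP tells destinations apart by the IP addresses in its header. We start this exploration directly from layer 2, and IP routing has already been handled by the OS, so all we need to do is parse the protocol.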

 +------+ +-----+ +-----+     +-----+
 |Telnet| | FTP | | TFTP| ... | ... |
 +------+ +-----+ +-----+     +-----+
       |   |         |           |
      +-----+     +-----+     +-----+
      | TCP |     | UDP | ... | ... |
      +-----+     +-----+     +-----+
         |           |           |
      +--------------------------+----+
      |    Internet Protocol & ICMP   |
      +--------------------------+----+
                     |
        +---------------------------+
        |   Local Network Protocol  |
        +---------------------------+

The IP header is laid out as follows:

Internet Header Format (https://tools.ietf.org/html/rfc791#page-11)

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Let's start coding. First we define the data structures; any field narrower than 8 bits is stored in a u8.

Defining the Format

def

pub struct IPv4Header {
    // Version, 4 bits
    pub version: u8,
    // Internet Header Length, 4 bits, counted in 32-bit words
    pub ihl: u8,
    // Type of Service, 8 bits
    pub toc: u8,
    // Total Length, 16 bits
    pub len: u16,
    // Identification, 16 bits
    pub identification: u16,
    // Flags, 3 bits
    pub flags: u8,
    // Fragment Offset, 13 bits
    pub offset: u16,
    // Time to Live, 8 bits
    pub ttl: u8,
    // Protocol, 8 bits
    pub protocol: u8,
    // Header Checksum, 16 bits
    pub checksum: u16,
    // Source Address, 32 bits
    pub source_address: u32,
    // Destination Address, 32 bits
    pub destination_address: u32,
    // Length of the Options area
    pub options_len: u8,
    // The Options area is at most 40 bytes long
    pub options_buffer: [u8; 40],
}

pub struct IPv4 {
    // header
    pub header: IPv4Header,
    // Data. A borrowed &'a [u8] would avoid the copy, but this is a
    // tutorial, so we take the convenient route for now
    pub data: Vec<u8>,
}

With the IP format defined, we add a function that checks whether a packet is TCP over IPv4, along with a unit test for it.

pub fn is_tcp_ip(packet: &[u8]) -> bool {
    // We need at least 10 bytes to read the protocol field at index 9
    if packet.len() < 10 {
        return false;
    }

    let version = packet[0] >> 4;
    let protocol = packet[9];
    // The version must be 4 and the protocol number must be TCP's (6)
    version == 4 && protocol == 6
}

#[test]
pub fn test_is_tcp_ip() {
    let hex_data = hex::decode("45000034000040004006f023c0a80d1f783504a4df98e67dd861552917e0b7ae801007fff3ac00000101080a016dfbad36f9392c");
    assert!(is_tcp_ip(hex_data.unwrap().as_slice()));
}
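
The check above looks only at the version and protocol fields; the header also carries a checksum. As a minimal sketch, assuming the slice starts at the IP header, the one's-complement checksum from RFC 791 could be verified as follows. The function name `ipv4_header_checksum_ok` is hypothetical and not part of the original source.

```rust
/// Returns true when the IPv4 header checksum verifies.
/// Hypothetical helper for illustration; assumes `packet` starts at the IP header.
pub fn ipv4_header_checksum_ok(packet: &[u8]) -> bool {
    if packet.is_empty() {
        return false;
    }
    // IHL counts 32-bit words; a valid header is at least 5 words (20 bytes).
    let header_len = ((packet[0] & 0x0F) as usize) * 4;
    if header_len < 20 || packet.len() < header_len {
        return false;
    }
    // Sum the header as big-endian 16-bit words, checksum field included.
    let mut sum: u32 = 0;
    for word in packet[..header_len].chunks(2) {
        sum += u32::from(u16::from_be_bytes([word[0], word[1]]));
    }
    // Fold the carries back into the low 16 bits (one's-complement addition).
    while sum > 0xFFFF {
        sum = (sum & 0xFFFF) + (sum >> 16);
    }
    // A header with a correct checksum sums to all ones.
    sum == 0xFFFF
}
```

Applied to the first 20 bytes of the test packet above, the words sum to 0x2FFFD, which folds to 0xFFFF, so the header is accepted.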

Parsing the Protocol

There is nothing fancy about this part; we simply parse according to the layout.

use byteorder::{BigEndian, ByteOrder};

impl IPv4 {
    pub fn new(packet: &[u8]) -> IPv4 {
        let len = BigEndian::read_u16(&packet[2..]); // total length
        let ihl = packet[0] & 0x0F; // header length, in 32-bit words
        IPv4 {
            header: IPv4Header {
                version: packet[0] >> 4,
                ihl,
                toc: packet[1],
                len,
                identification: BigEndian::read_u16(&packet[4..]),
                flags: packet[6] >> 5,
                offset: BigEndian::read_u16(&packet[6..]) & 0x1FFF,
                ttl: packet[8],
                protocol: packet[9],
                checksum: BigEndian::read_u16(&packet[10..]),
                source_address: BigEndian::read_u32(&packet[12..]),
                destination_address: BigEndian::read_u32(&packet[16..]),
                options_len: 0, // the options area is not parsed
                options_buffer: [0; 40],
            },
            data: Vec::from(&packet[(ihl as usize) * 4..len as usize]), // data area
        }
    }
}
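
The main function in this article calls `header.source()` and `header.destination()`, which the article never shows. One plausible sketch, assuming the addresses are stored as big-endian u32 values as parsed above, converts them with the standard library's `Ipv4Addr`. The struct below is a trimmed stand-in with only the two address fields, so the block compiles on its own.

```rust
use std::net::Ipv4Addr;

/// Trimmed stand-in for the article's IPv4Header: only the two address fields.
pub struct IPv4Header {
    pub source_address: u32,
    pub destination_address: u32,
}

impl IPv4Header {
    /// Source address as a printable dotted quad.
    pub fn source(&self) -> Ipv4Addr {
        // From<u32> treats the most significant byte as the first octet,
        // which matches what BigEndian::read_u32 produced during parsing.
        Ipv4Addr::from(self.source_address)
    }

    /// Destination address as a printable dotted quad.
    pub fn destination(&self) -> Ipv4Addr {
        Ipv4Addr::from(self.destination_address)
    }
}
```

`Ipv4Addr` implements `Display`, so the results can be passed straight to `println!` with `{}`.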

Seeing It in Action

We modify our main function to print something more readable:

// payload_offset is where the IP packet starts inside the captured frame
if packet.len() > payload_offset && ip::is_tcp_ip(&packet[payload_offset..]) {
    let ipv4_packet = ip::IPv4::new(&packet[payload_offset..]);
    println!("Got Tcp Ip Package From {} To {} Len {}",
             ipv4_packet.header.source(),
             ipv4_packet.header.destination(),
             ipv4_packet.header.len);
}

The output looks like this:

Got Tcp Ip Package From 127.0.0.1 To 127.0.0.1 Len 64
Got Tcp Ip Package From 127.0.0.1 To 127.0.0.1 Len 40
Got Tcp Ip Package From 127.0.0.1 To 127.0.0.1 Len 100
Got Tcp Ip Package From 127.0.0.1 To 127.0.0.1 Len 52
Got Tcp Ip Package From 127.0.0.1 To 127.0.0.1 Len 100

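To get through the main path quickly, we skip the Options and fragmentation logic for now. TCP negotiates its own MSS, so IP fragmentation is generally not exercised anyway. Next, we move on to the TCP code proper.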
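
Even with fragmentation left unimplemented, it can help to at least recognize a fragment so that it is dropped rather than misparsed as a complete TCP segment. A minimal sketch, assuming the 3-bit `flags` and 13-bit `offset` values as parsed into `IPv4Header` above (the function name `is_fragment` is hypothetical):

```rust
/// Returns true when the packet is one fragment of a larger datagram.
/// `flags` is the 3-bit Flags value and `offset` the 13-bit Fragment Offset.
pub fn is_fragment(flags: u8, offset: u16) -> bool {
    // Within the 3-bit field: 0b100 is reserved, 0b010 is Don't Fragment,
    // and 0b001 is More Fragments.
    const MORE_FRAGMENTS: u8 = 0b001;
    // Every fragment except the last has More Fragments set;
    // every fragment except the first has a non-zero offset.
    (flags & MORE_FRAGMENTS) != 0 || offset != 0
}
```

The sample packet used in the tests has only Don't Fragment set and a zero offset, so it is not a fragment.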

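The TCP Protocol

The Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream-based transport-layer protocol, defined by the IETF in RFC 793.

Defining the Data Format

This works much like it did for IP.

TCP Header Format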
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |          Source Port          |       Destination Port        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                        Sequence Number                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Acknowledgment Number                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Data |           |U|A|P|R|S|F|                               |
   | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
   |       |           |G|K|H|T|N|N|                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Checksum            |         Urgent Pointer        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                             data                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Let's define the data structures first:

pub struct TCPHeader {
    // Source Port, 16 bits, bytes 0-1
    pub source_port: u16,
    // Destination Port, 16 bits, bytes 2-3
    pub destination_port: u16,
    // Sequence Number, 32 bits, bytes 4-7
    pub sequence_number: u32,
    // Acknowledgment Number, 32 bits, bytes 8-11
    pub acknowledgment_number: u32,
    // Data Offset, 4 bits, high nibble of byte 12
    pub offset: u8,
    // Reserved, 6 bits, low nibble of byte 12 plus the top 2 bits of byte 13
    pub reserved: u8,
    // Flags, 6 bits, low 6 bits of byte 13
    pub flags: u8,
    // Window, 16 bits, bytes 14-15
    pub windows: u16,
    // Checksum, 16 bits, bytes 16-17
    pub checksum: u16,
    // Urgent Pointer, 16 bits, bytes 18-19
    pub urgent_pointer: u16,
    // Options len
    // Options data
}

pub struct TCP {
    // tcp header
    pub header: TCPHeader,
    // tcp data
    pub data: Vec<u8>,
}

Parsing the Data

Same flavor, same routine; here we go again.

use byteorder::{BigEndian, ByteOrder};

impl TCP {
    pub fn new(packet: &[u8]) -> TCP {
        // Data Offset: header length in 32-bit words
        let offset = packet[12] >> 4;
        TCP {
            header: TCPHeader {
                source_port: BigEndian::read_u16(packet),
                destination_port: BigEndian::read_u16(&packet[2..]),
                sequence_number: BigEndian::read_u32(&packet[4..]),
                acknowledgment_number: BigEndian::read_u32(&packet[8..]),
                offset,
                // Bits 4-9 of the 16-bit word that starts at byte 12
                reserved: ((BigEndian::read_u16(&packet[12..]) >> 6) & 0x3F) as u8,
                flags: packet[13] & 0x3F,
                windows: BigEndian::read_u16(&packet[14..]),
                // The checksum sits at bytes 16-17, the urgent pointer at 18-19
                checksum: BigEndian::read_u16(&packet[16..]),
                urgent_pointer: BigEndian::read_u16(&packet[18..]),
            },
            data: Vec::from(&packet[(offset as usize) * 4..]),
        }
    }
}
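
The parser skips the options area between the fixed 20-byte header and the data. Because TCP negotiates its MSS through an option, here is a hedged sketch of how that area could be walked to find the Maximum Segment Size option (kind 2, length 4, as defined in RFC 793). The function name `find_mss` is hypothetical, and the sketch assumes the slice starts at the TCP header.

```rust
/// Scans the TCP options area for the Maximum Segment Size option (kind 2).
/// Hypothetical helper, not from the article; `segment` starts at the TCP header.
pub fn find_mss(segment: &[u8]) -> Option<u16> {
    if segment.len() < 20 {
        return None;
    }
    // Data Offset counts 32-bit words; options fill the space past the fixed 20 bytes.
    let header_len = ((segment[12] >> 4) as usize) * 4;
    if header_len < 20 || segment.len() < header_len {
        return None;
    }
    let options = &segment[20..header_len];
    let mut i = 0;
    while i < options.len() {
        match options[i] {
            0 => break,  // End of Option List
            1 => i += 1, // No-Operation, a single padding byte
            kind => {
                // Every other option is: kind, length, then (length - 2) data bytes.
                let len = *options.get(i + 1)? as usize;
                if len < 2 || i + len > options.len() {
                    return None; // malformed option
                }
                if kind == 2 && len == 4 {
                    return Some(u16::from_be_bytes([options[i + 2], options[i + 3]]));
                }
                i += len;
            }
        }
    }
    None
}
```

The MSS option normally appears only on SYN segments, so for an established-connection segment carrying just NOP and timestamp options the function returns `None`.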

Remember to add a unit test:

unit test

#[test]
pub fn test_parse_tcp() {
    let hex_data = hex::decode("e67ddf9817e0b77ed8615529801801f52b2300000101080a36f9392c016de81f263cdb224dcc9a4b4c2191ecb4c43c0a3daeb61233ee4af155a60c9dcac70fcebe0fca8964908c1c5f5073c50b0522eb");
    let tcp = TCP::new(hex_data.unwrap().as_slice());
    assert_eq!(59005, tcp.header.source_port);
    assert_eq!(57240, tcp.header.destination_port);
    assert_eq!(400603006, tcp.header.sequence_number);
    assert_eq!(3630257449, tcp.header.acknowledgment_number);
    assert_eq!(501, tcp.header.windows);

    assert!(tcp.header.is_psh());
    assert!(tcp.header.is_ack());
    assert!(!tcp.header.is_fin());
    assert!(!tcp.header.is_rst());
    assert!(!tcp.header.is_syn());
    assert!(!tcp.header.is_urg());

    let data = hex::decode("263cdb224dcc9a4b4c2191ecb4c43c0a3daeb61233ee4af155a60c9dcac70fcebe0fca8964908c1c5f5073c50b0522eb");
    assert_eq!(data.unwrap().as_slice(), tcp.data.as_slice());
}

Adding Flag Helper Functions

When working with TCP we constantly need to check the flag bits. There are six of them, so let's add a helper for each.

tcpheader flags

impl TCPHeader {
    pub fn is_urg(&self) -> bool {
        self.flags & 0b100000 != 0
    }

    pub fn is_ack(&self) -> bool {
        self.flags & 0b010000 != 0
    }

    pub fn is_psh(&self) -> bool {
        self.flags & 0b001000 != 0
    }

    pub fn is_rst(&self) -> bool {
        self.flags & 0b000100 != 0
    }

    pub fn is_syn(&self) -> bool {
        self.flags & 0b000010 != 0
    }

    pub fn is_fin(&self) -> bool {
        self.flags & 0b000001 != 0
    }
}
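
When debugging packet traces it is handy to see all the set flags at once rather than calling six predicates. The following sketch reuses the same bit masks to list the flag names; `flag_names` is a hypothetical helper, not part of the original source.

```rust
/// Lists the names of the flags set in the 6-bit TCP flags value.
/// Hypothetical debugging helper for illustration.
pub fn flag_names(flags: u8) -> Vec<&'static str> {
    // Same bit positions as the is_* helpers, highest bit first.
    const FLAGS: [(u8, &str); 6] = [
        (0b100000, "URG"),
        (0b010000, "ACK"),
        (0b001000, "PSH"),
        (0b000100, "RST"),
        (0b000010, "SYN"),
        (0b000001, "FIN"),
    ];
    FLAGS
        .iter()
        .filter(|(mask, _)| flags & mask != 0)
        .map(|(_, name)| *name)
        .collect()
}
```

For the segment in the unit test, whose flags byte is 0x18, this yields ACK and PSH, matching the assertions there.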

Source code

Let's look at the result again:

if packet.len() > payload_offset && ip::is_tcp_ip(&packet[payload_offset..]) {
    let ipv4_packet = ip::IPv4::new(&packet[payload_offset..]);
    let tcp_packet = tcp::TCP::new(ipv4_packet.data.as_slice());
    println!("Got Tcp Ip Package From {}:{} To {}:{} Data Len {}",
             ipv4_packet.header.source(),
             tcp_packet.header.source_port,
             ipv4_packet.header.destination(),
             tcp_packet.header.destination_port,
             tcp_packet.data.len());
}

Got Tcp Ip Package From 127.0.0.1:1080 To 127.0.0.1:64533 Data Len 48
Got Tcp Ip Package From 127.0.0.1:1080 To 127.0.0.1:64588 Data Len 47
Got Tcp Ip Package From 127.0.0.1:64533 To 127.0.0.1:1080 Data Len 0
Got Tcp Ip Package From 127.0.0.1:64588 To 127.0.0.1:1080 Data Len 0
Got Tcp Ip Package From 127.0.0.1:64531 To 127.0.0.1:64532 Data Len 48

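Storing TCP State

In TCP/IP, the 4-tuple (source IP, source port, destination IP, destination port) uniquely identifies a connection. We use this ID as the key under which we store the connection's state.

We take the simple route and represent this ID as a u128: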

0                      31                      63                      95                     127
+-----------------------+-----------------------+-----------------------+-----------------------+
|    source address     |      source port      |  destination address  |   destination port    |
Computing the ID

// The four fields occupy disjoint bit ranges, so they are combined with
// bitwise OR (AND would always yield zero)
let mut unique: u128 = (packet.header.source_address as u128) << 96;
unique |= (tcp_packet.header.source_port as u128) << 64;
unique |= (packet.header.destination_address as u128) << 32;
unique |= tcp_packet.header.destination_port as u128;
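
The same packing can be wrapped in a small function, which makes it easy to test. This is a sketch under the bit layout described above; each field sits in its own 32-bit slot, so the slots never overlap and bitwise OR merges them without loss. The name `connection_id` is hypothetical.

```rust
/// Packs the connection 4-tuple into one u128: source address in the top
/// 32 bits, then source port, destination address, and destination port,
/// each in its own 32-bit slot.
pub fn connection_id(src_addr: u32, src_port: u16, dst_addr: u32, dst_port: u16) -> u128 {
    ((src_addr as u128) << 96)
        | ((src_port as u128) << 64)
        | ((dst_addr as u128) << 32)
        | (dst_port as u128)
}
```

Note that the ID is direction-sensitive: swapping source and destination gives a different key, so packets flowing in the two directions of one connection map to different IDs unless the caller normalizes the order.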

After that, we simply use a map to store our data.

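use std::collections::HashMap;
use std::sync::Mutex;

use lazy_static::lazy_static;

lazy_static! {
    // A static is shared and immutable, so the map is wrapped in a Mutex
    // to allow entries to be inserted and updated
    static ref HASHMAP: Mutex<HashMap<u128, TCPCurState>> = Mutex::new(HashMap::new());
}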
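
To show how such a map might be used, here is a self-contained sketch that avoids the global and keeps the map inside a struct instead. The article does not show `TCPCurState`, so the two-variant enum below is a hypothetical stand-in, as are `ConnectionTable` and its methods.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the per-connection state; the article's
/// TCPCurState is not shown, so only two states are modeled here.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TCPCurState {
    SynReceived,
    Established,
}

/// A table of connection states keyed by the u128 connection ID.
#[derive(Default)]
pub struct ConnectionTable {
    states: HashMap<u128, TCPCurState>,
}

impl ConnectionTable {
    /// Records the state, returning the previous one if the connection was known.
    pub fn set(&mut self, id: u128, state: TCPCurState) -> Option<TCPCurState> {
        self.states.insert(id, state)
    }

    /// Looks up the current state of a connection.
    pub fn get(&self, id: u128) -> Option<TCPCurState> {
        self.states.get(&id).copied()
    }
}
```

Owning the table in a struct sidesteps locking while everything runs on the single receive loop; a global behind a lock is the alternative once several threads need access.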

TCP State Transitions

Map

First, here is the PDF:

[Figure: TCP state transition diagram (JyPES.png)]

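Three-Way Handshake

We will implement the left-hand path of the diagram, which covers the server-side TCP state transitions. Since we have not implemented sockets, the first thing that should happen when data enters the system is a check against the listening port to decide whether the packet must be dropped. We therefore use a constant to simulate the listening port:
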
const LISTENING_PORT: u16 = 10023;

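Let's handle the first state transition we need: LISTEN -> SYN_RCVD. This is the first step in establishing a TCP connection. As the diagram shows, the server receives a SYN segment and simply replies with a SYN,ACK segment.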

      TCP A                                                TCP B

  1.  CLOSED                                               LISTEN

  2.  SYN-SENT    --> <SEQ=100><CTL=SYN>               --> SYN-RECEIVED

  3.  ESTABLISHED <-- <SEQ=300><ACK=101><CTL=SYN,ACK>  <-- SYN-RECEIVED

  4.  ESTABLISHED --> <SEQ=101><ACK=301><CTL=ACK>       --> ESTABLISHED

  5.  ESTABLISHED --> <SEQ=101><ACK=301><CTL=ACK><DATA> --> ESTABLISHED

          Basic 3-Way Handshake for Connection Synchronization
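
Step 2 to step 3 of the exchange above can be sketched as a pure function: given a segment that arrives while we are in LISTEN, decide whether to drop it or to move to SYN-RECEIVED and answer with SYN,ACK. This is a simplified illustration, not the article's implementation; `State`, `Reply`, `on_listen`, and the `our_isn` parameter (our initial sequence number) are hypothetical names, and segments that RFC 793 would answer with a reset are simply dropped here.

```rust
/// Hypothetical connection states for the server side of the handshake.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Listen,
    SynReceived,
}

/// The fields of the reply segment that matter for the handshake.
#[derive(Debug, PartialEq)]
pub struct Reply {
    pub sequence_number: u32,
    pub acknowledgment_number: u32,
    pub flags: u8,
}

pub const SYN: u8 = 0b000010;
pub const ACK: u8 = 0b010000;
pub const LISTENING_PORT: u16 = 10023;

/// Handles a segment that arrives in LISTEN. Returns the next state and the
/// SYN,ACK to send, or None when the segment should be dropped.
pub fn on_listen(dst_port: u16, flags: u8, seq: u32, our_isn: u32) -> Option<(State, Reply)> {
    // Only a bare SYN addressed to the listening port opens a connection.
    if dst_port != LISTENING_PORT || flags & SYN == 0 || flags & ACK != 0 {
        return None;
    }
    let reply = Reply {
        sequence_number: our_isn,
        // A SYN consumes one sequence number, so we acknowledge seq + 1.
        // Sequence numbers wrap around modulo 2^32.
        acknowledgment_number: seq.wrapping_add(1),
        flags: SYN | ACK,
    };
    Some((State::SynReceived, reply))
}
```

With the numbers from the exchange above, a SYN carrying SEQ=100 and an initial sequence number of 300 on our side produces a reply with SEQ=300, ACK=101, and the SYN and ACK flags set.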

References