tcpdump mailing list archives
IP Address Anonymization Feature in tcpdump
From: Alberto Perez Bogantes via tcpdump-workers <tcpdump-workers () lists tcpdump org>
Date: Mon, 10 Jun 2024 14:39:01 -0500
--- Begin Message --- From: Alberto Perez Bogantes <aperezbogantes () hawk iit edu>
Date: Mon, 10 Jun 2024 14:39:01 -0500
Hello tcpdump workers,

We've been working on adding a new feature to tcpdump that will allow IP address anonymization via the Crypto-PAn (Cryptography-based Prefix-preserving Anonymization) approach. The feature is motivated by the importance of preserving user privacy and complying with data processing security regulations. Crypto-PAn is prefix-preserving: any two addresses that share a k-bit prefix are mapped to anonymized addresses that also share a k-bit prefix, so the network structure is preserved.

The goal of this email is to gauge the interest of the tcpdump community in merging this feature once it's complete, and to get in touch with potential reviewers of our patch.

We are aware that there are external tools that pursue a similar goal, as discussed in PR #615 (https://github.com/the-tcpdump-group/tcpdump/pull/615). However, the anonymization methods used by these tools often fall short of balancing privacy against data utility. For example, Black Marker sets the IP to all zeros, resulting in a complete loss of utility. Permutation can distort the original data distribution, resulting in skewed results and lower analytical value. Similarly, traditional IP randomization methods frequently treat each octet independently, which ignores the hierarchical structure of IP addresses and compromises the integrity of network analysis and management.

For this reason, we believe that the best approach is to use prefix-preserving anonymization techniques, which are similar to permutation techniques but preserve the prefix relationships between addresses. The mapping is kept consistent by a cryptographic key, which helps balance privacy and utility in data anonymization.

We believe that this functionality is well suited for tcpdump because much of the logic used to print an IP address for a specific packet can be reused to access that IP and anonymize it.
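To make the prefix-preserving property discussed above concrete, here is a minimal Python sketch. It is not the Crypto-PAn algorithm itself and not the code in the prototype: HMAC-SHA256 stands in for the AES-based function that Crypto-PAn uses, and the names anonymize_ipv4 and common_prefix_len are hypothetical, chosen only for this illustration.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving anonymization of one IPv4 address (illustrative only)."""
    original = int(ipaddress.IPv4Address(addr))
    result = 0
    for i in range(32):
        # The i most significant bits of the original address (0 when i == 0).
        prefix = original >> (32 - i)
        # Keyed pseudorandom bit that depends only on that prefix and its length.
        # Assumption: HMAC-SHA256 replaces the AES-based function of Crypto-PAn.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] >> 7
        # Flip (or keep) bit i of the original address accordingly.
        bit = (original >> (31 - i)) & 1
        result = (result << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(result))


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading bits shared by two IPv4 addresses."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()
```

Because the bit flipped at position i depends only on the key and the preceding i bits, two inputs that agree on their first k bits and differ at the next one produce outputs with exactly the same relationship. With a fixed key, common_prefix_len(anonymize_ipv4(a, key), anonymize_ipv4(b, key)) therefore equals common_prefix_len(a, b) for any two addresses a and b.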
The logic for dissecting packet headers can be slightly adapted to implement this feature, including anonymization of application headers. For example, much of the code written to print an IP address offered by DHCP can be used to access that address and anonymize it.

We have an early prototype of this patch. It uses the cryptopANT library, which provides a comprehensive set of anonymization functions for IPv4 and IPv6 addresses. With the addition of a new flag, "--anon", users enable IP address anonymization in tcpdump by providing a key file that will be used by the Crypto-PAn anonymization algorithm.

Here's a brief overview of how the implementation works:

1. Activation Flag: Users activate the anonymization feature by passing the "--anon" flag on the tcpdump command line.
2. Key File: A key file containing the encryption key required by the Crypto-PAn algorithm must be provided as the argument to the "--anon" flag.
3. Callback Invocation: When the "pcap_loop" function acquires a packet, it invokes the callback responsible for anonymizing IP addresses. This callback anonymizes the IP addresses in the packet headers.
4. Execution of Real Callback: Following anonymization, the "real callback" is triggered. This callback performs the currently implemented actions, such as dumping packet contents or writing them to a pcap file.

An example of how to use this flag is:

    ./tcpdump --anon keyfile.txt -n

where keyfile.txt is a file containing the key, generated with cryptopANT by running scramble_ips --newkey keyfile.txt.

Currently, we have implemented support for anonymizing IPv4 addresses. Our roadmap includes extending support to additional anonymization methods and enabling users to specify anonymization parameters dynamically.
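The callback flow in steps 3 and 4 above can be sketched in Python as a wrapper around the real callback. This is only a model of the idea, not the prototype's C code: the names make_anonymizing_callback and anonymize are hypothetical, the packet is assumed to be an Ethernet frame carrying IPv4, and checksums are deliberately left untouched.

```python
import struct
from typing import Callable

ETH_HEADER_LEN = 14      # assumption: Ethernet link layer, no VLAN tag
ETHERTYPE_IPV4 = 0x0800
MIN_IPV4_HEADER = 20


def make_anonymizing_callback(
    real_callback: Callable[[bytes], None],
    anonymize: Callable[[bytes], bytes],
) -> Callable[[bytes], None]:
    """Wrap real_callback so it only ever sees anonymized IPv4 addresses.

    anonymize is a hypothetical hook that maps 4 address bytes to 4 bytes.
    """

    def callback(packet: bytes) -> None:
        buf = bytearray(packet)
        ip = ETH_HEADER_LEN
        if len(buf) >= ip + MIN_IPV4_HEADER:
            (ethertype,) = struct.unpack_from("!H", buf, 12)
            version = buf[ip] >> 4
            if ethertype == ETHERTYPE_IPV4 and version == 4:
                # Source address is at bytes 12..15 of the IPv4 header,
                # destination address at bytes 16..19.
                buf[ip + 12:ip + 16] = anonymize(bytes(buf[ip + 12:ip + 16]))
                buf[ip + 16:ip + 20] = anonymize(bytes(buf[ip + 16:ip + 20]))
                # Note: header and transport checksums are not recomputed here.
        # Hand the (possibly rewritten) packet to the original callback,
        # which prints it or writes it to a pcap file as before.
        real_callback(bytes(buf))

    return callback
```

The point of this arrangement is that the existing printing and dumping code does not need to know whether anonymization is enabled: it receives a packet either way, and non-IPv4 packets pass through unchanged.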
The current prototype is available in my GitHub fork (https://github.com/aperezb21/tcpdump), which is based on commit bb704ed32d770e84fdc340de8276c261bb6e9ee1.

We welcome any discussion or feedback, either on or off-list.

Thank you,
Alberto.
--- End Message ---
_______________________________________________
tcpdump-workers mailing list -- tcpdump-workers () lists tcpdump org
To unsubscribe send an email to tcpdump-workers-leave () lists tcpdump org
Current thread:
- IP Address Anonymization Feature in tcpdump Alberto Perez Bogantes via tcpdump-workers (Jun 10)
