tcpdump mailing list archives

Re: dlt_choice table in pcap.c


From: Guy Harris <gharris () sonic net>
Date: Thu, 4 Dec 2025 12:34:01 -0800

On Dec 4, 2025, at 11:22 AM, Michael Richardson <mcr () sandelman ca> wrote:

Guy, we have this lovely table in pcap.c:

static struct dlt_choice dlt_choices[] = {
        DLT_CHOICE(NULL, "BSD loopback"),
        DLT_CHOICE(EN10MB, "Ethernet"),
        DLT_CHOICE(EN3MB, "experimental Ethernet (3Mb/s)"),
        DLT_CHOICE(AX25, "AX.25 layer 2"),
        DLT_CHOICE(PRONET, "Proteon ProNET Token Ring"),
        ...

I feel like it ought to be indexed by LINKTYPE instead.

I think the capture code in tcpdump, which I think later became the separate libpcap library, originally just supported 
the BPF capture mechanism, which used DLT_ values to indicate link-layer types. Thus, libpcap used DLT_ values in its 
APIs; there *were* no LINKTYPE_ values.

Unfortunately, when they added new link-layer types to BPF, various OSes that picked up BPF sometimes chose values such 
that the same numerical value corresponded to *different* DLT_ names and the same DLT_ name had *different* values in 
different OSes.

Equally unfortunately, pcap files used DLT_ values to indicate the link-layer type, meaning that a file captured using 
one of the offending DLT_ types would not be read correctly on machines with different choices.

So I cooked up the LINKTYPE_ list.

For the original DLT_ assignments, which everybody left alone (and for which, in several cases, ARP hardware values 
were used, hence separate "Ethernet" and "IEEE 802" values, the latter of which was repurposed for 802.5), the 
LINKTYPE_ value was the same as the DLT_ value. (I changed the *name* for type 1 to LINKTYPE_ETHERNET; DLT_EN10MB was 
named to distinguish it from DLT_EN3MB, where the former is D/I/X Ethernet and the latter is the Xerox experimental 
Ethernet, the link-layer headers and link-layer types for which are different.)

For other DLT_ assignments that weren't all over the map, I again went with LINKTYPE_ = DLT_.

For the inconsistent DLT_ values, I assigned a separate LINKTYPE_ value, in the "100 and above" range.

Libpcap has internal routines to map between LINKTYPE_ values and DLT_ values - dlt_to_linktype() and 
linktype_to_dlt(), in pcap-common.c; they're algorithmic rather than purely table-driven.

Libpcap currently doesn't expose LINKTYPE_ values; they're #defined inside pcap-common.c.

Making *existing* routines either accept LINKTYPE_ values rather than DLT_ values or return LINKTYPE_ rather than DLT_ 
values can break binary compatibility for those values where LINKTYPE_XXX != DLT_XXX.

(This came up as I poked someone about better references for the many JUNIPER entries)
Was there a table that did DLT<->LINKTYPE? (I realize it's not always 1:1).

Table, no. As noted, the conversion routines are algorithmic. That's easier to maintain, as, for the vast majority of 
LINKTYPE_ values, LINKTYPE_XXX = DLT_XXX. The code handles the exceptions on a case-by-case basis.
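
A minimal sketch of what "algorithmic, with exceptions handled case by case" can look like. This is not the code in pcap-common.c: the function names ex_dlt_to_linktype() and ex_linktype_to_dlt() and the EX_ constants are hypothetical, and the numeric values are made up for illustration.

```c
/*
 * Hypothetical values, for illustration only; these are NOT libpcap's
 * real DLT_ or LINKTYPE_ numbers.
 */
#define EX_DLT_FOO       11   /* pretend this DLT_ value varies between OSes */
#define EX_LINKTYPE_FOO  100  /* the single value written to capture files */

/* Map a DLT_ value to the value stored in a file (used when writing). */
int ex_dlt_to_linktype(int dlt)
{
    switch (dlt) {
    case EX_DLT_FOO:          /* one of the case-by-case exceptions */
        return EX_LINKTYPE_FOO;
    default:                  /* the common case: LINKTYPE_XXX == DLT_XXX */
        return dlt;
    }
}

/* Map a value read from a file back to a DLT_ value (used when reading). */
int ex_linktype_to_dlt(int linktype)
{
    switch (linktype) {
    case EX_LINKTYPE_FOO:
        return EX_DLT_FOO;
    default:                  /* identity; unknown values pass through */
        return linktype;
    }
}
```

With this shape, adding a type whose two values are equal needs no code change at all; only a mismatched pair needs a new case in each function.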

Maybe dlt_choice should have both values in the table.

The dlt_choices[] table is just used to map between DLT_ values and DLT_ names, and to map DLT_ values to DLT_ 
descriptions, for pcap_datalink_name_to_val(), pcap_datalink_val_to_name(), and pcap_datalink_val_to_description().
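
As a rough illustration of that kind of name/description table, here is a self-contained sketch. The struct ex_choice, the EX_CHOICE macro and the ex_ functions are hypothetical stand-ins, not libpcap's definitions. The five numeric values are the long-standing ones for the types in the excerpt above. Names are compared exactly here for brevity, where a real lookup might be case-insensitive.

```c
#include <stddef.h>
#include <string.h>

/* Values of the five link-layer types shown in the table excerpt. */
#define EX_DLT_NULL    0
#define EX_DLT_EN10MB  1
#define EX_DLT_EN3MB   2
#define EX_DLT_AX25    3
#define EX_DLT_PRONET  4

/* Hypothetical table entry: short name, description, numeric value. */
struct ex_choice {
    const char *name;
    const char *description;
    int dlt;
};

/* #code yields the name string; ## pastes the name onto the EX_DLT_ prefix. */
#define EX_CHOICE(code, description) { #code, description, EX_DLT_ ## code }
#define EX_CHOICE_SENTINEL { NULL, NULL, 0 }

static const struct ex_choice ex_choices[] = {
    EX_CHOICE(NULL, "BSD loopback"),
    EX_CHOICE(EN10MB, "Ethernet"),
    EX_CHOICE(EN3MB, "experimental Ethernet (3Mb/s)"),
    EX_CHOICE(AX25, "AX.25 layer 2"),
    EX_CHOICE(PRONET, "Proteon ProNET Token Ring"),
    EX_CHOICE_SENTINEL
};

/* Return the short name for a value, or NULL if it is not in the table. */
const char *ex_val_to_name(int dlt)
{
    size_t i;

    for (i = 0; ex_choices[i].name != NULL; i++) {
        if (ex_choices[i].dlt == dlt)
            return ex_choices[i].name;
    }
    return NULL;
}

/* Return the description for a value, or NULL if it is not in the table. */
const char *ex_val_to_description(int dlt)
{
    size_t i;

    for (i = 0; ex_choices[i].name != NULL; i++) {
        if (ex_choices[i].dlt == dlt)
            return ex_choices[i].description;
    }
    return NULL;
}

/* Return the value for a short name, or -1 if the name is unknown. */
int ex_name_to_val(const char *name)
{
    size_t i;

    for (i = 0; ex_choices[i].name != NULL; i++) {
        if (strcmp(ex_choices[i].name, name) == 0)
            return ex_choices[i].dlt;
    }
    return -1;
}
```

Nothing in this table knows about file-format values; it only ties a number to two strings.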

Mapping between DLT_ values and LINKTYPE_ values is a separate operation, and is done solely inside libpcap when 
reading or writing pcap or pcapng files:

        LINKTYPE_ values in files are mapped to DLT_ values when reading (unknown LINKTYPE_ values are passed through, 
in case they're really DLT_ values from before LINKTYPE_ values were used);

        DLT_ values are mapped to LINKTYPE_ values when writing.

(as an aside, I wonder if pcapint_strcasecmp() is still needed in 2025, given
UTF-8, etc.

pcapint_strcasecmp() is used only to compare against ASCII strings. There are cases where, for user convenience, we do 
case-insensitive mapping, so that both upper-case and lower-case versions of said ASCII strings work.

It doesn't care about non-ASCII characters; it just leaves them alone (see below).

It exists to 1) make sure the mapping is *locale-independent* (Wireshark, which is linked with GLib from the GTK/GNOME 
project, uses g_ascii_strcasecmp() for the same purpose), and 2) deal with platforms that don't have strcasecmp(). 
Locale-independence is necessary because, in a Turkish locale, capital-I is mapped to lower-case ı (LATIN SMALL LETTER 
DOTLESS i) and lower-case-i is mapped to upper-case İ (LATIN CAPITAL LETTER I WITH DOT ABOVE).

Not using g_ascii_strcasecmp() caused a *crashing bug* in Wireshark in a Turkish locale. The code in question was 
parsing a text configuration file and compared a keyword with strcasecmp(). The keyword contained the Roman-alphabet 
"i", so the match failed in a Turkish locale when case-insensitivity was required. That was compounded by a null 
pointer being returned in the mismatch case and the validity of the pointer *not* being checked.

After fixing both 1) the case-insensitive comparison and 2) the lack of a null-pointer check, I've remembered that 
quirk.

        https://en.wikipedia.org/wiki/Dotless_I

        https://en.wikipedia.org/wiki/%C4%B0 (capital dotted-i)

        https://en.wikipedia.org/wiki/Dotted_and_dotless_I_in_computing

I guess that the upper-128 of that charmap table is Latin-1?  yet it
seems to map the upper-control codes to... I'm not sure what.

The lower 128 positions are, obviously, for ASCII. (Anybody who wants to port libpcap to, say, z/OS, with APIs that 
accept EBCDIC strings, is on their own.) The only mapping they do is to map upper case letters to the corresponding 
lower-case letters.

The upper 128 don't do any mapping, so they leave non-ASCII ISO 8859-n characters, non-ASCII UTF-8 characters (which 
are made up entirely of octets with the high bit set), etc. alone.

Did we need to map å -> a?

No. This is only case-insensitive, not diacritic-insensitive.

If not, wonder why charmap is 256 entries)

So that we can just use the mapped value for all 256 octet values.
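
A minimal sketch of the technique, assuming an ASCII execution character set. This is not the pcapint_strcasecmp() source: the names ex_charmap and ex_ascii_strcasecmp() are hypothetical, and the table is filled in at first use here rather than written out as a 256-entry initializer.

```c
/* 256-entry map: 'A'..'Z' fold to 'a'..'z', every other octet maps to itself. */
static unsigned char ex_charmap[256];
static int ex_charmap_ready;

static void ex_charmap_init(void)
{
    int i;

    /* Start with the identity mapping for all 256 octet values. */
    for (i = 0; i < 256; i++)
        ex_charmap[i] = (unsigned char)i;
    /* Fold only the ASCII upper-case letters; this assumes ASCII. */
    for (i = 'A'; i <= 'Z'; i++)
        ex_charmap[i] = (unsigned char)(i - 'A' + 'a');
    ex_charmap_ready = 1;
}

/*
 * Locale-independent, ASCII-only case-insensitive comparison.
 * Returns 0 if equal, otherwise the difference of the first
 * mismatching mapped octets.
 */
int ex_ascii_strcasecmp(const char *s1, const char *s2)
{
    const unsigned char *us1 = (const unsigned char *)s1;
    const unsigned char *us2 = (const unsigned char *)s2;

    if (!ex_charmap_ready)
        ex_charmap_init();

    /* Every octet indexes the table directly, so no range check is needed. */
    while (ex_charmap[*us1] == ex_charmap[*us2]) {
        if (*us1 == '\0')
            return 0;
        us1++;
        us2++;
    }
    return ex_charmap[*us1] - ex_charmap[*us2];
}
```

Because the table never consults the locale, "I" and "i" always compare equal, and the octets of UTF-8 "ı" or "å" are compared as-is rather than being folded.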
_______________________________________________
tcpdump-workers mailing list -- tcpdump-workers () lists tcpdump org
To unsubscribe send an email to tcpdump-workers-leave () lists tcpdump org