A short introduction to I2P:
Since the academic
community seems to be far more aware of Tor than I2P, it may be
helpful to compare the two systems and cover some of the basics
concerning how I2P works. Both Tor and I2P use layered cryptography
so that intermediaries cannot decipher the contents of connections
beyond what they need to know to forward the connection on to the
next hop in the chain. Rather than focusing on anonymous access to
the public Internet, I2P’s core design goal is to allow the
anonymous hosting of services (similar in concept to Tor Hidden
Services). It does provide proxied access to the public Internet via
what are referred to as “out proxies”, as well as various
internal services to proxy out onto the Tor and Freenet systems, but
that is not its core design goal.
Every I2P node is
also generally a router (and you can use the terms somewhat
interchangeably when it comes to I2P) so there is not a clear
distinction between a server and a mere client like there is with the
Tor network. Some I2P nodes do take on more responsibility than
others, such as floodfill routers that participate in NetDB. Unlike
Tor, I2P does not use centralized directory servers to connect nodes,
but instead utilizes a DHT (Distributed Hash Table), based on
Kademlia,
referred to as NetDB. This distributed system helps to eliminate a
single point of failure, and staves off blocking attempts similar to
what happened to Tor when China blocked access to the core directory
servers on September 25th, 2009 [iii].
I2P’s reliance on a peer-to-peer system for distributing
routing information does open up more avenues for Sybil attacks and
rogue peers, but steps have been taken to help mitigate this and are
covered in the documentation [iv].
Instead of referring
to other routers and services by their IP, I2P uses cryptographic
identifiers to specify both routers and end point services. For
example, the identifier for “www.i2p2.i2p”, the project’s
main website internal to the I2P network, is:
-KR6qyfPWXoN~F3UzzYSMIsaRy4udcRkHu2Dx9syXSz
UQXQdi2Af1TV2UMH3PpPuNu-GwrqihwmLSkPFg4fv4y
QQY3E10VeQVuI67dn5vlan3NGMsjqxoXTSHHt7C3nX3
szXK90JSoO~tRMDl1xyqtKm94-RpIyNcLXofd0H6b02
683CQIjb-7JiCpDD0zharm6SU54rhdisIUVXpi1xYgg
2pKVpssL~KCp7RAGzpt2rSgz~RHFsecqGBeFwJdiko-
6CYW~tcBcigM8ea57LK7JjCFVhOoYTqgk95AG04-hfe
hnmBtuAFHWklFyFh88x6mS9sbVPvi-am4La0G0jvUJw
9a3wQ67jMr6KWQ~w~bFe~FDqoZqVXl8t88qHPIvXelv
Ww2Y8EMSF5PJhWw~AZfoWOA5VQVYvcmGzZIEKtFGE7b
gQf3rFtJ2FAtig9XXBsoLisHbJgeVb29Ew5E7bkwxvE
e9NYkIqvrKvUAt1i55we0Nkt6xlEdhBqg6xXOyIAAAA
This
is the base64 representation of the destination. Obviously having a
user type in this 516-character chunk of data as an identifier would be
somewhat less than user friendly, and it would not be valid in some
protocols anyway (HTTP for example). I2P provides some workarounds
for naming identifiers; one is called “Base 32 Names”,
similar in many ways to Tor’s .onion naming convention.
Essentially the 516-character identifier is decoded (with some character
replacements) into its raw value, the value is hashed with SHA256, then
this hash is base 32 encoded and “.b32.i2p” is
concatenated onto the end [v].
The result for the “www.i2p2.i2p” identifier shown
above would be:
rjxwbsw4zjhv4zsplma6jmf5nr24e4ymvvbycd3swgiinbvg7oga.b32.i2p
This form is much
easier to work with. For most eepSite users the most common naming
solution is just to use the local I2P address book, which maps a simple
name like “www.i2p2.i2p” to its much longer Base 64
identifier. There is no official DNS-like service to do this lookup,
as that would be a single point of failure that I2P wishes to avoid.
Each I2P node has its own series of text files that contain the name
mappings, in much the same way that the Internet used to rely on HOSTS
files to translate names to IPs before DNS. There are however naming
subscription services inside of I2P that can be synced to if the user
wishes, though this means the user is putting some level of trust in
these services not to hijack the name mappings.
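The Base 32 derivation and the address book lookup described above can be sketched in a few lines of Python. This is only a minimal sketch, not I2P's own code. It assumes that the "character replacements" are "-" and "~" standing in for "+" and "/", that the trailing "=" padding is dropped from the Base 32 output, and that an address book file holds one name=destination pair per line. The function names are hypothetical.

```python
import base64
import hashlib


def destination_to_b32(dest_b64: str) -> str:
    """Derive a .b32.i2p name from a Base 64 destination string."""
    # Assumption: I2P's alphabet uses "-" and "~" where standard
    # Base 64 uses "+" and "/".
    raw = base64.b64decode(dest_b64, altchars=b"-~")
    digest = hashlib.sha256(raw).digest()
    # 32 hash bytes encode to 52 Base 32 characters plus "=" padding.
    name = base64.b32encode(digest).decode("ascii").lower().rstrip("=")
    return name + ".b32.i2p"


def parse_address_book(text: str) -> dict:
    """Parse hosts-style lines of the assumed form name=destination."""
    book = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first "=" only, so any "=" inside the
        # destination stays part of the destination.
        name, sep, dest = line.partition("=")
        if sep:
            book[name.strip().lower()] = dest.strip()
    return book
```

If those assumptions match what I2P actually does, joining the twelve lines of the destination above into one string and passing it to destination_to_b32 should give the .b32.i2p name shown.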
A router’s ID
is not the same as a service’s ID, so even if the service
happens to be running on a particular router the two identifiers
cannot be easily tied together. I2P also uses a few techniques to
help mitigate traffic correlation attacks. While the Tor network uses
a single changing path for communications, I2P uses the concept of
“in” and “out” tunnels so requests and
responses are not necessarily using the same paths for exchanging
information. I2P also uses an Onion routing variant referred to as
Garlic routing, where more than one message (each one a
“clove”) is bundled together into a single garlic message.
This mixing of messages using Garlic routing
can make things harder for attackers attempting to correlate
transmission sizes and timings, and if the bundled “cloves”
come from both high latency tolerant applications
(e.g. email) and low latency applications (e.g. web traffic)
correlation could become even harder. More comparisons between I2P,
Tor and other anonymity networks can be found on I2P’s “I2P
Compared to Other Anonymous Networks” page [vi].
Many services can be
hosted inside of the I2P overlay network (IRC, BitTorrent, eDonkey,
Email, etc.), and the I2P team has provided an API for creating new
applications that ride on top of the I2P overlay network. As the
developers note on their page, many standard Internet applications
are not designed with anonymity in mind, so caution should be taken
when adapting an existing application to run on top of I2P. While
many applications exist and could be researched for application data
leaks, this paper will be concentrating on eepSites, which are
websites internal to I2P. Some measures are taken by the default I2P
install to help filter revealing information at the application
level, but service providers do make mistakes that can lead to too
much information being revealed.
My primary motivation
for this project is to help secure the identity of I2P eepSite
service hosts by finding weaknesses in the implementation of these
systems at higher levels that can lead to their real IP or
administrator being revealed, or the anonymity set being greatly
reduced. Exposing these weaknesses will allow the administrators of
I2P eepSite services to avoid these pitfalls when they implement
their I2P web applications. A secondary objective would be to allow
the identification of certain groups that law enforcement might be
interested in locating, specifically pedophiles. These goals are
somewhat at odds, since law enforcement could use the knowledge to
harass groups I do support, and pedophiles could use the
knowledge to help hide themselves, neither of which are goals I would
desire, but with privacy matters you sometimes have to take the bad
with the good. A tertiary goal would be just to see if I can do it,
and what skills I can learn along the way. I2P was chosen as my
platform since less research has gone into it versus Tor, but many of
the same ideas and techniques should be applicable to both systems as
they offer similar functionality when it comes to hidden services
that are HTTP based. Another feature that makes this research
somewhat different is that more work has been done in the past trying
to detect users, not providers, of services in a Darknet.
While there are many
papers on attacking anonymizing networks, most seem to be pretty
esoteric. A few previous papers that may be of use in my research
are:
Locating Hidden Servers [vii]
Lasse Overlier, Paul Syverson, 2006 IEEE Symposium on Security and Privacy (S&P'06), pp. 100-114, 2006
Low-resource routing attacks against anonymous systems [viii]
Kevin Bauer, Damon McCoy, Dirk Grunwald, Tadayoshi Kohno, Douglas Sicker
The “Locating Hidden Servers”
paper may not be directly applicable as it seems I2P goes to some
effort to synchronize times and avoid clock skew problems [ix].
A more directly I2P related analysis can be found on the I2P site’s
“I2P’s Threat Model” [x],
and guides to make services more anonymous can be found on “Ugha’s
I2P Wiki” [xi].
The threat model page points to many more resources and papers on
possible attack vectors. More background papers that will be of use
during testing are listed in the approach section.
Section 2, Approach:
My main approach will
be looking at the application layer and seeing what details
about the host are given away. This has already been done in the past
against cloaked clients with much success:
Metasploit Decloaking Engine [xii]
EFF write-ups on client identification [xiii]
Since I’m
targeting the identity of servers instead of clients the exact
vectors for attack will differ, but there will be some overlap. Many
I2P services are hosted on nodes/routers that also act as the owner’s
client node so client based attacks may also be fruitful in revealing
their identity. People regularly make mistakes in how they configure
web servers and applications that cause too much information to be
leaked out to an attacker, information that can make finding a
vulnerability much easier. This sort of information leakage is
regularly mentioned in the OWASP (Open Web Application Security Project) Top
10 [xiv] in one form or another. One of my mantras is “Specific
exploits are temporary, bad configuration mistakes are forever”.
A few of the techniques I plan to try to reveal identifying
information about the host of an eepSite include:
- Spidering the content of the eepSite for related sites. This should be made somewhat easier because I can restrict the spidering to just sites ending in .i2p, a pseudo top level domain name commonly used in the I2P network.
- Using tools like Nikto to find directories and files that reveal server information. Just because a directory is not linked to does not mean it can’t be found by brute forcing common directory paths.
- HTTP headers returned by the sites may reveal information about the type of web daemon that is running (IIS/Apache/etc.). By default the I2P install package comes with the Jetty webserver, but this can be changed by the user if they desire different functionality. I imagine this sort of attack won’t lead to outright identification, but may be useful for reducing the anonymity set, especially if the administrator makes the mistake of using the same server instance on an Internet facing site.
- Putting bait in logs via the user agent string that may make the administrator of the site visit a tracking page without using an I2P outproxy. This could take the form of a simple XSS (Cross Site Scripting) redirect attack or web bugs [xv] embedded in a page.
- See if reverse DNS lookups done by the webserver when it generates logs give away its true IP. Some web servers are configured to automatically do a reverse DNS lookup on visiting IPs to find their host name. This may be outside of my ability as I do not control an authoritative DNS server for reverse lookups, but perhaps I can find someone to help with the research that does control such a resource.
- I plan to also ask the security and privacy community at large for more ideas, and of course give credit to their contributions. Via my contacts in the community I imagine I can elicit quite a few responses.
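The first technique in the list, restricting a spider to .i2p hosts, could be sketched roughly as follows. This is an illustration only, using the Python standard library. The class and function names are hypothetical, and it handles only the link-filtering step, not the fetching.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href and src attribute values found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def i2p_hosts(page_url: str, html: str) -> set:
    """Return the set of .i2p host names that a page links to."""
    collector = LinkCollector()
    collector.feed(html)
    hosts = set()
    for link in collector.links:
        # Resolve relative links against the page they came from.
        host = urlsplit(urljoin(page_url, link)).hostname
        if host and host.endswith(".i2p"):
            hosts.add(host)
    return hosts
```

Links to the public Internet are dropped here, although they could be logged separately, since an eepSite that links to its owner's regular site is exactly the kind of relationship this spidering is meant to find.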
I sent an early draft of this proposal to
ZZZ (the lead developer of I2P; as the development is done
pseudonymously, that is the only name I have for him) and he proposed
a few additional tracks I should take:
- Flesh out some of the attacks listed in the threat model page.
- Review the server and client proxy code for flaws.
- Look at the Tor change log and see if any bugs were fixed that may still exist in I2P.
Some
of the techniques that I plan to test may not be appropriate to do
against resources I do not own, so my plan is to put up my own
eepSite to do many of the tests. For common web vulnerabilities that
could lead to identity disclosure I plan to install the Mutillidae [xvi]
training package that implements the OWASP Top 10 as a test bed.
There
will be a few challenges imposed because of the nature of the I2P
darknet. I’m sure more challenges will become apparent as I get
deeper into the research, but a few I am concerned about going into
the project are:
- Communication with the eepSites is normally done via an HTTP proxy. This is somewhat more limiting connection-wise than using a SOCKS proxy, and far more limiting than having a direct TCP/IP connection. Also, the default HTTP proxy that comes with I2P does not support the HTTP “CONNECT” method. While this is stated in the documentation, I encountered it while trying to run an Nmap scan using proxychains, and saw the following message when I used Wireshark to try to diagnose why my attempts were failing:
Warning: Non-HTTP Protocol
The request uses a bad protocol.
The I2P HTTP Proxy supports http:// requests ONLY. Other protocols such
as https:// and ftp:// are not allowed.
While
this is challenging, I’m fairly confident I can work around the
problem. ZZZ tells me that SOCKS and CONNECT should work if I set up
the tunnels for them, but so far I have not gotten those two proxy
tunnel types to successfully connect to an eepSite.
- Perhaps because of point one, many of the tools I have experimented with so far tend to give false results or hang while spidering an eepSite. I may have to create some custom spider scripts that compensate for eepSite oddities.
- While spidering I need to be careful not to download contraband onto my own system. There is a fair amount of child pornography out on I2P, and laws in the United States are pretty unforgiving on the issue, even if the files were obtained while doing legitimate research. As such I plan to mostly spider for text, which is unfortunate as EXIF data in images hosted on eepSites may be of value in identifying individuals.
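A custom fetch routine that addresses the second and third challenges above might look something like the sketch below. It is only an illustration. The proxy address (127.0.0.1 port 4444), the long timeout, the retry count and the list of accepted content types are all assumptions and not values taken from the I2P documentation, and the function names are hypothetical.

```python
import time
import urllib.request

# Assumption: the I2P HTTP proxy listens on this local address.
I2P_PROXY = "http://127.0.0.1:4444"
# Assumption: these are the only media types worth keeping.
TEXT_TYPES = ("text/html", "text/plain", "application/xhtml+xml")


def is_text(content_type: str) -> bool:
    """True when a Content-Type header names an accepted text format."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in TEXT_TYPES


def fetch_text(url: str, retries: int = 3, timeout: float = 120.0):
    """Fetch a page through the proxy, returning None for non-text."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": I2P_PROXY})
    )
    for attempt in range(retries):
        try:
            with opener.open(url, timeout=timeout) as response:
                if not is_text(response.headers.get("Content-Type", "")):
                    return None  # stop before reading or saving the body
                return response.read().decode("utf-8", errors="replace")
        except OSError:
            # URLError and socket timeouts are OSError subclasses.
            time.sleep(2 ** attempt)  # back off before the next attempt
    return None
```

Note that a GET request still causes the server to start sending the body even when the script never reads it. Issuing a HEAD request first and checking the content type before any GET would be the more cautious design, at the cost of an extra round trip over an already slow network.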
I’m
hoping that this research will be an improvement over existing work
in the following ways:
- Clearer examples of how leaked information can be found. It is one thing to say “headers can leak information”, and it’s another to give exact ncat commands to reveal the header information.
- A concentration on I2P instead of Tor. The academic world seems to write many papers on the Tor network, but I2P seems to get only a passing mention, if mentioned at all.
- A concentration on the application layer instead of the network or transport layers. Since many of the same application layer protocols are used on different anonymity networks, the research will hopefully have a broader use scope.
- Real world tests on systems that have been implemented for more than just academic purposes. Some of the papers I’ve read on privacy seem to cover systems that have not seen much real world deployment (Tor being a very notable exception).
- Less reliance on esoteric attack vectors. For example, timing attacks are interesting, but I’m not convinced they would be easy to pull off under real world conditions and on in-use systems.
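As an example of the kind of concrete illustration meant in the first point above, the header check that would otherwise be done by hand with ncat could be sketched in Python as follows. The proxy address and port, the timeout and the choice of which headers count as revealing are assumptions, and the function names are hypothetical. Nothing is assumed here about which headers survive I2P's own filtering.

```python
import socket

# Assumption: the I2P HTTP proxy listens on this local address and port.
PROXY_ADDR = ("127.0.0.1", 4444)
# Assumption: headers that tend to describe the server software.
REVEALING = ("server", "x-powered-by", "via")


def build_request(host: str, path: str = "/") -> bytes:
    """Build a proxy-style HEAD request that uses an absolute URL."""
    return (
        f"HEAD http://{host}{path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n\r\n"
    ).encode("ascii")


def revealing_headers(raw_response: bytes) -> dict:
    """Pick the assumed revealing headers out of a raw HTTP response."""
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    found = {}
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() in REVEALING:
            found[name.strip().lower()] = value.strip()
    return found


def probe(host: str, timeout: float = 120.0) -> dict:
    """Send the request through the proxy and return revealing headers."""
    with socket.create_connection(PROXY_ADDR, timeout=timeout) as sock:
        sock.sendall(build_request(host))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # the proxy closed the connection
            chunks.append(data)
    return revealing_headers(b"".join(chunks))
```

Calling probe with the name of an eepSite I control would then return whatever server-identifying headers make it back through the proxy, which can be compared against the same headers from an Internet facing site.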








