This paper studies countermeasures to traffic analysis attacks. A common strategy for such countermeasures is link padding. We consider systems where payload traffic is padded so that packets have either constant interarrival times or variable interarrival times. The adversary applies statistical recognition techniques to detect the payload traffic rates by using statistical measures like sample mean, sample variance, or sample entropy. We evaluate quantitatively the ability of the adversary to make a correct detection and derive closedform formulas for the detection rate based on analytical models. Extensive experiments were carried out to validate the system performance predicted by the analytical method. Based on the systematic evaluations, we develop design guidelines for the proper configuration of a system in order to minimize the detection rate.
A significant portion of the Internet traffic today is encrypted, and there are strong indications that this portion will increase at a high rate. However, encryption alone may not be sufficient for secured communications. A number of noncryptographic attacks ([5,10,15,,19]) have illustrated how the observations of traffic behavior allow an adversary to infer significant information about participants and their communications. For example, [18] shows that timing analysis of SSH traffic can greatly simplify the breaking of passwords. This paper deals with timing based traffic analysis attacks and their countermeasures.
Link padding is one effective approach in countering traffic
analysis attacks. The idea is based on
Differing from the previous studies, we establish a formal theoretical framework for link padding systems and derive closedform formulae for estimation of detection rates. Our formulae correctly describe the relationship between detection rate and system parameters such as the padded traffic type, sample size, and location in the network where the adversary can collect traffic samples. We report results from extensive experiments in various situations including local area network in a laboratory, campus networks, and wide area networks. Our data consistently demonstrates the usefulness of our formal model and correctness of performance predicted by the closedform formulae. Based on the observations, we develop design guidelines that allow a manager to properly configure a system in order to minimize the detection rate.
The rest of this paper is organized as follows. Section 2 briefly reviews the related work and
summarizes that
The study of traffic analysis and its countermeasures for computer networks is not new. Baran [2] proposed the use of heavy unclassified traffic to interfere with the adversary's tampering on the links of a security network system for classified communication, and suggested adding dummy, i.e. fraudulent, traffic between fictitious users of the system to conceal traffic loads.
To protect the anonymity of email transmission, Chaum
[3]
proposed the use of a
A survey of countermeasures for traffic analysis is given in [25]. To mask the
frequency, length, and origindestination patterns of endtoend communication,
the use of dummy messages is suggested to make the traffic adhere to a
predefined pattern. From the discussion of
The authors in ([12,13,24]) give a mathematical framework to optimize the bandwidth usage while preventing traffic analysis of the endtoend traffic rates. Timmerman [23] proposes an adaptive traffic masking (hiding) model to reduce the overhead caused by link padding. But, when the rate of real traffic is low, the link padding rate is reduced as well, in order to conserve link bandwidth. Perfect secrecy is violated in this case, as largescale variations in traffic rates become observable.
Raymond in [15] gives an informal survey of many ad hoc traffic analysis attacks on systems providing anonymous service. One conclusion is that dummy messages must be used to achieve high information assurance for the system. It is even claimed [1] that we have to use padding to each link of an anonymity network (although more research is needed to clear this claim).
In our previous work, NetCamo [9], we describe how to provide endtoend prevention of traffic analysis while at the same time guaranteeing QoS (worstcase delay of message flows). It turns out that the delay experienced by packets of a protected flow is tightly coupled to the bandwidth required to send both payload and dummy packets. We propose methods such as QoS routing to tackle the QoS problem for systems using link padding strategies.
In this section, we present the model of the network in our study and then discuss link padding mechanisms that are used as a countermeasure for traffic analysis attacks. Finally we formally define the model of the adversary, who uses statistical pattern recognition strategies for traffic analysis attacks.
In this work, we assume that the network consists of protected subnets, which are interconnected by unprotected networks. Traffic within protected subnets is assumed to be shielded from observers. Unprotected network can be public networks (e.g., the Internet), or networks that are deployed over an easily accessible broadcast medium. These networks are accessible to observation by thirdparties, and are therefore open to traffic analysis. This model captures a variety of situations, ranging from battleship convoys (where the largescale shipboard networks are protected and the intership communication is wireless) to communicating PDAs (where the protected networks consist of single nodes).
Figure 1 illustrates the setup of the network in this study. Two security gateways GW1 and GW2 are placed at the two boundaries of the unprotected network and provide the link padding necessary to prevent traffic analysis of the payload traffic exchanged between the protected subnets A and B.
Note that the gateways can be realized either as standalone boxes, modules on routers or switches, software additions to network stacks, or device drivers at the end hosts. In this paper, we assume that they are standalone boxes. Nevertheless, the analysis in this paper is also effective for other implementations. To simplify the discussion, the communication is oneway from Subnet A to Subnet B. Consequently, GW1 and GW2 are also called sender gateway and receiver gateway respectively.
The goal of the adversary is to perform traffic analysis and infer critical characteristics of the payload traffic exchanged between protected subnets over the unprotected network. We limit the interest of the adversary to the payload traffic rate, that is, the rate at which payload traffic is exchanged between protected networks. The traffic rate is a piece of important information in many missioncritical communication applications [15]. Specifically, we assume that there is a set of discrete payload traffic rates {w_{1}, ¼, w_{m} }. The rate of payload traffic from the sender may be one of those m rates at a given time. Consequently, the objective of the adversary is to identify at which of the m rates the payload is being sent.
One way to counter the traffic analysis attacks is to ``pad" the payload traffic, that is, to properly insert ``dummy" packets in the payload traffic stream so that the real payload status is camouflaged. There are many possible implementations of link padding algorithms on the two gateways in Figure 1. The most common method uses a timer to control packet sending, and works as follows: (a) On GW1, incoming payload packets from the sender are placed in a queue. (b) An interruptdriven timer is set up on GW1. When the timer times out, the interrupt processing routine checks if there is a payload packet in the queue: (1) If there are payload packets, one is removed from the queue and transmitted to GW2; (2) Otherwise, a dummy packet is transmitted to GW2.
We need to make a few remarks before we proceed further.
(1) In this paper, we assume that packet contents are perfectly encrypted (e.g., by IPSec with appropriate options) and are thus nonobservable. In particular, the adversary cannot distinguish between payload packets and ``dummy" packets used for padding.
(2) It is obvious from the implementation described above, the only tunable parameter is the time interval between timer interrupts. The choice of this parameter discriminates different padding approaches. A system is said to have a constant interval timer (CIT) if the timer is a periodic one, i.e., the interval between two consecutive timer interrupts is constant. This is the most common method used for padding. On the other hand, a system is said to have a variable interval timer (VIT) whenever the interval between two consecutive timer interrupts is a random variable and satisfies some distribution.
As we will see in the later part of this paper, CIT and VIT systems may perform significantly differently in preventing traffic analysis attacks.
(3) We assume that all packets have a constant size. Thus, observing the packet size will not provide any useful information to the adversary. The only information available for the adversary to observe and analyze is the timing of packets. This assumption should simplify the discussion without loss of the generality. See [7] for a discussion on how to extend our results in this paper to the case where packets may have variable sizes.
Recall that we assume that the objective of the adversary is to identify at which of the m possible rates the payload is being sent, and the adversary limits himself to passive attacks, i.e., observations of the traffic. In addition, the adversary's access to the system is limited to the unprotected networks. The protected subnets and hosts within are not accessible. Neither is the link padding infrastructure. This means that, in Figure 1, the adversary can only tap somewhere between gateways GW1 and GW2.
We also assume that the adversary has complete knowledge about the gateway machines and the countermeasure algorithms used for preventing traffic analysis. For example, the adversary can simulate the whole system, including the gateway machines, to obtain a priori knowledge about traffic behavior. In many studies on information security, it is a convention that we make worstcase assumptions like this.
Based on these assumptions, the adversary can deploy a strategy based on Bayes decision theory [4]. The entire attack strategy consists of two parts: Offline training and runtime classification. We now describe them below.
Offline training The offline training part can be decomposed into the following steps:
(1) The adversary selects a statistical feature of the Packet InterArrival Time (PIAT) that will be used for traffic rate classification. Possible features we study in this paper are sample mean, sample variance, and sample entropy.
(2) The adversary reconstructs the entire link padding system and collects timing inforamtion at different payload traffic rates. From this information, the adversary derives the Probability Density Functions (PDF) of the selected statistical feature. As histograms are usually too coarse for the distribution estimation, we assume that the adversary uses the Gaussian kernel estimator of PDF [17], which is effective in our problem domain.
(3) Based on the PDFs of statistical features for different payload traffic rates, Bayes decision rules are derived. Recall that there are m possible payload traffic rates w_{1}, ¼, w_{m} . The Bayes decision rule can be stated as follows:
The sample represented by feature s corresponds to payload rate w_{i} if


for all j = 1, ¼, m.
Here P(w_{i}) is the a priori probability that the payload traffic is sent at rate w_{i}, and P(w_{i}s) is the post priori probability that the payload traffic is sent at rate w_{i} when the collected sample has the measured feature s.
Runtime Classifcation Once the adversary completes its training phase, he can start the classification at run time. We assume the adversary uses some means to tap the network between gateways GW1 and GW2. In particular, when he wants to determine the current payload rate, the adversary collects a sample of packet interarrival times. He calculates the value of the statistical feature from the collected sample, and then uses the Bayes decision rules derived in the training phase to match the collected sample to one of the previously defined payload traffic rates.
Given the models described in the previous section, we would like to evaluate the sCalling ghostscript to convert, please wait ... ystem security in terms of detection rate. Detection rate is defined as the probability that the adversary can correctly identify the payload traffic rate. In this section, we derive the closedform formulae for detection rates when the adversary uses sample mean, sample variance, or sample entropy, as the statistical feature, respectively. Our formulae will be approximate ones due to the complexity of the problem. Nevertheless, these formulae do correctly reflect the impact of various system parameters, including the type of padded traffic, sample size, and statistical feature used. These relationships are extremely useful in design of a link padding system so that the overall detection rate can be minimized. In the next section, we will see that experimental data well matchs the performance predicated by our approximation formulae.
We will focus our discussion on systems with only two payload traffic rates, namely w_{l} as the low traffic rate and w_{h} as the high traffic rate, and assume that both traffic rates occur with equal probability. Extensions on this will be discussed in Section 6.
Figure 2: Bayes Decision
Figure 2 shows the PDFs of the statistical features conditioned on two alternative payload traffic rates. Let d be the solution of the equation

(3) 
and assume that there is a unique solution to the equation. Consequently, the Bayes decision rule now becomes

The error rate for the Bayes decision rule can be calculated as follows:

The detection rate is then given by

While numerical methods can be applied to calculate the detection rates, for example with the use of (), our goal here is to derive closeform formulae that can reveal the relationship between the detection rate and other system parameters.
Recall that the adversary collects a sample of packet interarrival time at run time in order to perform the classification. Thus, to derive the detection rate, we need to formally model the packet interarrival time. For a given system, let random variable X be the packet interarrival time. X can be considered as the sum of three other random variables:

where T is the designed interval of two consecutive timer interrupts for the timer, and d_{gw} and d_{net} reflect the noise added by disturbance in the gateway system and by congestion in the internetwork, respectively.
Note that T is defined by the link padding policy. T is constant for CIT link padding but follows a specific distribution for VIT link padding.
d_{gw} is caused by a number of factors, which may impact the accuracy of the timer's interrupt: (1) First, the context switching from other running process to the timer's interrupt routine may take a random time. (2) Furthermore, a timer interrupt may be temporally blocked due to other activities. For example, if an payload packet from the sender is arriving at the network interface card of the gateway, the network interface card would generate an interrupt request, which can block all the processes including the (scheduled) timer interrupt ^{1}. Thus, the timer's interrupts may be subtly but randomly delayed by incoming payload packets. This implies that the padded traffic's PIAT may be correlated with the payload traffic.
d_{net} captures the disturbance on the padded traffic's PIAT caused by crossover traffic at routers and switches. Clearly, d_{net} depends on the position at which the adversary collects its sample. If the collection is done right at the output of the sender gateway, this noise may be ignored. However, if the adversary collects its sample far away from the sender gateway, the noise level can be high as crossover traffic may significantly interfere with the padded traffic.
In this paper, we assume that both T, d_{gw} and d_{net} are normally distributed. These assumptions simplify analysis without loss of generality and will be validated by our experiments in Section 5. Specifically,

(9) 
where s_{T}^{2} = 0 in the case of CIT link padding. And

(10) 
where s_{net}^{2} = 0 when the adversary observes the padded traffic at a position next to the sender's gateway GW1. Similarly

(11) 
As d_{gw} may be correlated to the payload traffic, we denote s_{gw}_{, l}^{2} and s_{gw}_{, h}^{2} as the variances of d_{gw} when the payload traffic rate is low and high, respectively. Consequently, we denote X_{l} and X_{h} are random variable X when the payload traffic rate is low and high, respectively. Thus,

(12) 

Similarly,


Here we assume that X_{l} and X_{h} have the same mean. This assumption will be validated by our experiments later.
For the convenience of the discussion in the rest of this paper, we need to introduce the ratio

where s_{T}^{2}, s_{net}^{2}, s_{gw}_{,l}^{2} and s_{gw}_{,h}^{2} are defined in (9), (10), (13), and (15), respectively. The use of r will become clear when we derive the formulae for detection rates for three different statistical features, namely, sample mean, sample variance, and sample entropy.
Let { X_{1}, X_{2}, ¼, X_{n} } be a random sample of packet interarrival times. The sample mean is the average of the elements in the sample:

Note that sample mean [`(X)] is a random variable, and an unbiased estimation of X's mean m.
The following theorem provides a closedform formula for estimation of detection rate when the adversary uses sample mean as the feature statistic.
Theorem 1 The detection rate by sample mean can be
estimated as follows

where r is defined in (16).
The proof of Theorem 1 can be found in the first part of Appendix A in [6]. From Theorem 1 the following observations can be made:
(1) The detection rate in (18) is independent on sample size n. That is, when sample mean is used as feature statistic, changing the sample size has no impact on detection rates.
(2) As shown in the second part of Appendix A in [6], the detection rate v_{[}_{`(X)]} is an increasing function of r, where r ³ 1. That is, the smaller r, the lower the corresponding detection rate. When r = 1, the detection rate reaches 50%  its absolute lower bound. In reality, r = 1 may occur when s_{T}^{2} is sufficiently large. This corresponds to the case when the VIT padding is used.
Let { X_{1}, X_{2}, ¼, X_{n} } be a random sample of size n from the distribution of X. The sample variance Y is defined as follows

Note that sample variance Y is a random variable, and an unbiased estimation of X's variance.
Recall that s_{h}^{2} is the variance of padded traffic's PIAT conditioned on the high payload traffic rate and s_{l}^{2} the variance of padded traffic's PIAT conditioned on the low payload traffic rate. s_{h}^{2} is slightly larger than s_{l}^{2}, which is validated by our experiments in Section 5. Based on these observations, the following theorem provides a closedform formula for estimation of detection rate when the adversary uses sample variance as the feature statistic.
Theorem 2 Using sample variance with sample size n
as the classification feature gives rise to an estimated detection rate v_{Y}

where C is calculated
in (21).

and r = [(s_{h}^{2})/( s_{l}^{2})] as defined in (16).
The proof of Theorem 2 can be found in the first part of Appendix B in [6]. From Theorem 2 the following observations can be made:
(1) The detection rate v_{Y} is an increasing function in terms of sample size n. When n ® ¥, the detection rate is 100%. This means that the payload traffic lasts for a long time at one rate, either low or high, the adversary gets such a sample and may detect the payload traffic rate by sample variance of padded traffic's PIAT.
(2) As shown in the second part of Appendix B in [6], the detection rate v_{Y} is an increasing function of r in (16), where r ³ 1. That is, the smaller r, the lower the corresponding detection rate. When r = 1, the detection rate is 50%. This corresponds to the case when VIT padding with sufficiently large s_{T}^{2}. This suggests that although the adversary may use a big size of sample to detect the payload rate by sample variance, using a VIT padding with a large interval variance can make such an attack impossible, since no payload traffic can last very long at a fixed rate in practice and the adversary cannot get a sample big enough.
While there are many empirical entropy estimators available, it's generally very difficult to get those estimators' PDFs. In this work, we take advantage of the relation between entropy and variance of a normal distribution in order to describe sample entropy's effectiveness as the feature statistic. We will then use an empirical robust histogrambased entropy estimator for our experiments.
The following theorem provides a closedform formula for estimation of detection rate when the adversary uses sample entropy as the feature statistic.
Theorem 3 Sample
entropy with sample size n has an estimated detection rate v_{[}_{(H)\tilde]}

where C_{[(H)\tilde]}
is calculated in (23)

and r = [(s_{h}^{2})/( s_{l}^{2})] as defined in (16).
The proof of Theorem 3 can be found in the first part of Appendix C in [6]. From Theorem 3 we can make a similar set of observations to that of the case of sample variance.
(1) Detection rate v_{[}_{(H)\tilde]} is an increasing function in terms of sample size n.
(2) As shown in the second part of Appendix C in [6], the detection rate v_{[}_{(H)\tilde]} is an increasing function of r in (16), where r ³ 1. When r = 1, the detection rate reaches 50%. In reality, r = 1 may occur when s_{T}^{2} is sufficiently large. This corresponds to the case when VIT padding with sufficiently large s_{T}^{2} is used.
From statistical knowledge, we know sample variance is very sensitive to outliers^{2}. In order for empirical estimation of sample entropy to be robust against outliers, we use the method developed in [11]: First, we create a histogram of the PIAT sample for a given bin size (say, Dh). Then, according to [11], the differential entropy estimator of a random variable X's continuous distribution is

(24) 
where n is the sample size, k_{i} is the number of sample points in the i^{th} bin, and Dh is the histogram's bin size. If a constant bin size is used throughout the experiment, the term logDh in (24) is a constant and hence does not influence the recognition result. It can therefore be discarded, and the entropy estimation formula simplifies to

(25) 
This entropy estimator is robust in the sense that it is based on
probability weighted sum. Generally, outliers have a small probability to
occur. So the probability weight reduces the noise's impact on the entropy
estimation.
In this section, we report results on evaluating system security in terms of detection rate. The evaluations will be based on both theoretical analysis (from the previous section) and experiments.
In the experiments, we assume that the adversary uses a highperformance network analyzer, such as Agilent's J6841A [21], to dump the padded traffic for traffic analysis. A series of experiments were carried out: In tCalling ghostscript to convert, please wait ... Calling ghostscript to convert, please wait ... Calling ghostscript to convert, please wait ... erms of padded traffic type, we measure both systems with CIT and VIT padding, In terms of experimental environments, we consider the following cases: a) a laboratory environment, b) a campus network, and c) a wide area network.
GW1 and GW2 in Figure 1 are installed with TimeSys Linux/RealTime [22]. Both CIT and VIT paddings use a timer with interrupt interval mean equal to 10ms, i.e., E(T) = 10ms with T in (8). The payload has two rate states: 10 packet per second (pps) and 40pps. We assume both rates occur in equal probability, that is, P(w_{l}) = P(w_{h}) = 50% in (7). Note that for such a system with two possible payload traffic rates, the detection rate for the adversary is lowerbounded at 50% corresponding to random guessing.
Figure 3: Experiment setup
in laboratory
The advantage of performing the experiments in a laboratory environment is that we can control the cross traffic over the network. The disadvantage is that the generated cross traffic may not have the same characteristics as that in a real network. Nevertheless, our experiment setup is shown in Figure 3.
The two gateways
are connected by a
For the case of no cross traffic, the workstation in subnet C does not transmit, and the router only deals with the padded traffic from GW1. That is, s_{net} in (16) is 0. Hence, the variance ratio r becomes

This situation is a best case for the adversary as he can observe traffic with minimum disturbance. Hence this is the worstcase for us who wants to prevent traffic analysis attacks.
CIT Link Padding
Figure 4: CIT Padding without cross traffic
First, we analyze systems that use CIT link padding. That is, s_{T}^{2} in (16) is zero. Hence, (26) is further simplified as

From the theorems in Section 4, we see that the detection rate is a functions of sample size n and the ratio r.
Figure 4 (a) shows the distributions of padded traffic's PIAT under lowrate (10pps) and highrate (40pps) payload traffic. We have the following observations:
(1) The two distributions are almost bellshaped. This partially validates our assumption that the padded traffic's PIAT has a normal distribution.
(2) The means of padded traffic's PIAT under different rates of payload traffic are the same. This is also consistent with the assumption made in Section 4.2.
(3) The two distributions are slightly different. The variance of padded traffic's PIAT conditioned on the highrate payload traffic, s_{gw}_{, h}^{2} in (15) is slightly larger than the variance of padded traffic's PIAT conditioned on the lowrate payload traffic, s_{gw}_{, l}^{2} in (13). This implies

Figure 4 (b) shows both empirical and theoretical curves of detection rate for different feature statistics. We have the following observations:
(1) The empirical detection rate curves coincide well with their theoretical curves. This validates our theories. The empirical detection rate curve of sample variance is a little lower than its theoretical curve because sample variance is very sensitive to outliers in the data.
(2) The detection rate of sample mean is almost 50%. Sample mean is not an effective feature for the adversary.
(3) On the other hand, as the sample size increases, detection rates for both sample variance and sample entropy increase as predicted by our theorems 1 and 3. At sample size of 1,000, both features achieve almost 100% detection rate. This means that CIT padding fails if the adversary uses sample variance or sample entropy as feature statistic. Generally speaking, sample entropy performs empirically better than sample variance in terms of detection rate.
VIT Link Padding
Recall from (26) how the variance ratio r in (16) is given by
where s_{T}^{2} ³ 0 since we are using VIT padding. Theorems in Section 4 show that when r approaches 1, the detection rates approach 50% for all the three feature statistics. We note that for CIT padding, the value of r decreases with increasing values of s_{T}^{2}. Figure 5 (a) displays the empirical curves of detection rate in terms of s_{T} for a fixed sample size of 2,000. We can see that when s_{T} increases, the detection rate quickly drops and approaches 50%, as expected. Clearly, a system with VIT padding performs better (i.e., with lower detection rate) than one with CIT padding. Figure 5: VIT padding  detection
rate vs. sample size In any case, as shown in (18) and (22), when the size of sample increases, the detection rate increases as well. An interesting question is: How large a sample has to be in order for the adversary to have sufficient high probability in making a correct detection? Let n(p) be the sample size that can achieve a detection rate of p percent. Figure 5 (b) provides the theoretical curve of n(99%) vs. s_{T}. We can see that with a reasonable value of s_{T}, the sample size needs to be extremely large in order to achieve a 99% detection rate. For example, when the timer interval standard deviation s_{T} = 1ms, to achieve 99% detection rate, the sample size has to be greater than 10^{11}. It is virtually impossible for an attacker to retrieve such a large sample. This clearly shows the effectiveness of VIT padding. 5.2 The Case of NonZero
Cross Traffic
Figure 6: Empirical detection rate with cross traffic in laboratory Recall that the case of zero cross traffic is the best case for the adversary. As VIT has shown to be effective in the case of zero cross traffic, we will no longer have to consider systems with VIT padding here since VIT has been shown to be effective even for the adversary's bestcase scenario (zero crosstraffic with a line tap very near the sender gateway). We thus concentrate on the system with CIT padding. In a system with cross traffic, s_{net}^{2} in (16) may no longer be zero. As for CIT padding, where s_{T}^{2} = 0, the variance ratio r in (16) now becomes
We observe that r decreases with increasing s_{net}^{2}, resulting in a low detection rate for all feature statistics. Thus, the bigger s_{net}^{2}, the smaller the detection rate. In the experiments described here, cross traffic generated from in subnet C causes the router's congestion, which in turn affects the observation by the adversary. Figure 6 shows how the detection rate is impacted by the amount of cross traffic. We can make the following observations: (1) Note that the PIAT for the padded traffic is 10ms. Hence, the amount of cross traffic is directly proportional to the utilization of the link shared between Subnet B and Subnet D. The data shows that as the link utilization increases, the detection rate by sample entropy and sample variance decrease. Intuitively, this is because the crossover traffic between Subnet C and Subnet D interferes with the padded traffic between GW1 and GW2, and s_{net}^{2} increases with the shared link's utilization. The sample mean's detection rate remains low, as expected. (2) We observe that sample entropy results in a better detection rate than sample variance does. It can be perceived that, with the increase of shared link's utilization, outliers have more chance of occurring. Sample variance is much more sensitive to outliers and, hence, it has a low detection rate. (3) Even with the link utilization of 40%, sample entropy still can have about a detection rate of 70%, implying that CIT padding may still not be effective in this kind of situation. 5.3 Experiments over Campus and Wide Area NetworksFigure 7: Experiment setup over campus and wide
area networks (WAN) Figure 7 shows the setup for the
experiments discussed in this subsections. Figure 7 (a) is a setup for
experiments over the Texas A& Figure
8: Empirical detection rates for experiments over
campus and WAN (sample size=1000) In each case, we
collect data continuously for a complete day (24 hours). The data for the
case of Texas A& Figures 8 (a) and (b) display the detection rate throughout the observation period. We make the following observations: (1)
When the padded traffic traverses just the Texas A& (2)
When the padded traffic traverses more network elements, such as the span of
the Internet between (1) In the case of
wide area networks, sample entropy and sample variance can still get over 65%
detection rates during periods of relatively low network activity (such as at
2:00A 6 Conclusions and Final Remarks
By
statistical analysis of different feature statistics (sample mean, sample
variance and sample entropy) of the padded traffic's packet interarrival times and a lot of experiments, we found
that sample variance and sample entropy can exploit the correlation between
payload traffic rate and packet interarrival times
of padded traffic when the padded traffic is dumped and explored next to the
sender gateway or at a remote site across one or more congested routers. The
reason for CIT padding's failure is that user
traffic causes small disturbances to the timer's interval, which is used to
control packet sending. After a careful analysis, we propose VIT link padding as an alternative to the most common CIT link padding. Both theoretical analysis and empirical results validate the effectiveness of VIT padding strategy. The importance of VIT padding technique is validated by extensive experiments showing that CIT link padding may be compromised even at a remote site behind noisy routers. In this paper we discuss the simple case where two classes of traffic rates should be distinguished. Our technique can be easily extended to multiple ones by performing more offline training. particular to the environment), the user traffic fitting these distributions can be simulated. This allows the design of link padding schemes that are tuned to this a priori knowledge to prevent traffic analysis. Acknowledgements
We thank Gerry Creager, Nolan Flowers and Xun Wang for the help of setting up the testing enviroment. References
[1] Onion Routing Development Achives. Link padding and the intersection attack. http://archives.seul.org/or/dev/Aug2002/msg00004.html, 8 2002. [2] P. Baran. On
distributed communications: Ix security, secrecy, and tamperfree
considerations. [3] D. L. Chaum.
Untraceable electronic mail, return addresses, and digital pseudonyms. Communications
of the AC [4] R. O. Duda and P. E. Hart. Pattern Classification. John Wiley & Sons, 2001. [5] Edward W. Felten and [6] [7] [8] S. Ghosh and R. Rajkumar. Resource management of the os network subsystem. Proceedings of the Fifth IEEE International Symposium on ObjectOriented RealTime Distribute Computing, April 2002. [9] Y. Guan, X. Fu, D. Xuan,
P. U. Shenoy, R. Bettati, and
W. Zhao. Netcamo: Camouflaging network traffic
for qosguaranteed critical allplications.
In IEEE Transactions on Systems, [10] SafeWeb inc. Safeweb. http://www.safewebinc.com/, 2002. [11] R. [12] R. E. NewmanWolfe and B. R. Venkatraman. High level prevention of traffic analysis. Computer Security Applications Conference, Seventh Annual, pages 102 109, 1991. [13] R. E. NewmanWolfe and B. R. Venkatraman. Performance analysis of a method for high level prevention of traffic analysis. Computer Security Applications Conference, Eighth Annual, pages 123 130, 1992. [14] [15] J. Raymond. Traffic analysis: Protocols, attacks, design issues and open problems. In H. Federrath, editor, Designing Privacy Enhancing Technologies: Proceedings of International Workshop on Design Issues in Anonymity and Unobservability, volume 2009 of LNCS, pages 1029. SpringerVerlag, 2001. [16] C. E. Shannon. Communication
theory of secrecy systems. [17] B. W.
Silverman. Density estimation for statistics and data analysis.
Chapman and Hall, [18] D. X. Song, D. Wagner, and X. Tian. Timing analysis of keystrokes and timing attacks on ssh. 10th USENIX Security Symposium, 2001. [19] Qixiang Sun,
Daniel R. Simon, Yi [20] P. F.
Syverson, D. [21] Agilent
Technologies. Agilent j6841a network analyzer
software. http://onenetworks.comms.agilent.com/NetworkAnalyzer
/J6841A.asp, [22] TimeSys. Timesys linux docs. http://www.timesys.com/index.cfm? hdr=home_header.cfm&bdy=home_bdy_library.cfm, 2003. [23] Brenda Timmerman. a security model for dynamic adaptive traffic masking. New Security Paradigms Workshop, 1997. [24] B. R. Venkatraman and R. E. NewmanWolfe. Performance analysis of a method for high level prevention of traffic analysis using measurements from a campus network. Computer Security Applications Conference, 10th Annual, pages 288 297, 1994. [25] V. Voydoc and Footnotes:^{1} For TimeSys Linux [22] used in our experiments, this request proceeds before the incoming packet reaches the IP layer [8]. From that instant on, the network subsystem in the kernel becomes preemptive. Other high priority tasks such as the timer interrupt routine can then proceed as scheduled. ^{2} An outlier is an observation that lies an abnormal distance from other values in the sample of the padded traffic PIAT. File
translated from 