
Scaling the Network Bandwidth

Although figure 3 suggests that paging to remote memory (parity logging) performs significantly better than paging to disk, an application's completion time may still be unacceptably high even with remote memory. We expect the performance of remote memory to improve once the Ethernet interconnection network is replaced by a faster one (e.g., FDDI, ATM, or FCS). To evaluate how the applications would perform on faster networks, we made detailed measurements that divide an application's completion time into four components: (i) user time (utime), (ii) system time (systime), (iii) initialization time (inittime), and (iv) page transfer time (ptime). Using the system's time command, we measure the utime, systime, and elapsed time (etime) of each application. For runs of the applications that perform no paging, subtracting utime and systime from etime gives the inittime, that is, the time the operating system takes to load the application and start executing it. The ptime consists of the protocol processing time (pptime) and the bandwidth-dependent blocking time (btime). We measured the protocol processing cost to be 1.6 ms per page for TCP/IP, so a run with N page transfers has pptime = 1.6 ms × N. We calculate btime using the formula:

$$\mathit{btime} = \mathit{etime} - \mathit{utime} - \mathit{systime} - \mathit{inittime} - \mathit{pptime}$$

Assuming that a network with X times higher bandwidth reduces btime by a factor of X while pptime stays the same, we can predict the etime of the application over such a network using the formula:

$$\mathit{etime}_X = \mathit{utime} + \mathit{systime} + \mathit{inittime} + \mathit{pptime} + \frac{\mathit{btime}}{X}$$
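The time model described in the previous paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the names `RunTimes`, `ptime`, `btime`, and `predicted_etime` are hypothetical, and the sketch assumes, as the text does, that btime scales exactly as 1/X while the 1.6 ms per-page protocol cost does not change with bandwidth.

```python
from dataclasses import dataclass

# Protocol processing cost per page transfer, as measured for TCP/IP.
PPTIME_PER_PAGE = 0.0016  # seconds (1.6 ms)


@dataclass
class RunTimes:
    """Times in seconds for one run of an application (hypothetical container)."""
    etime: float     # elapsed time reported by `time`
    utime: float     # user time reported by `time`
    systime: float   # system time reported by `time`
    inittime: float  # load/start-up time, measured on a run with no paging


def inittime_from_nonpaging_run(etime: float, utime: float, systime: float) -> float:
    """For a run that performs no paging, whatever is left of etime after
    utime and systime is taken to be initialization time."""
    return etime - utime - systime


def ptime(r: RunTimes) -> float:
    """Page transfer time: the part of etime not spent in user, system, or init."""
    return r.etime - r.utime - r.systime - r.inittime


def btime(r: RunTimes, n_transfers: int) -> float:
    """Bandwidth-dependent blocking time: ptime minus protocol processing."""
    return ptime(r) - n_transfers * PPTIME_PER_PAGE


def predicted_etime(r: RunTimes, n_transfers: int, x: float) -> float:
    """Predicted etime with x times the bandwidth. Assumes btime scales as 1/x
    and protocol processing (pptime) is independent of bandwidth."""
    pptime = n_transfers * PPTIME_PER_PAGE
    return r.utime + r.systime + r.inittime + pptime + btime(r, n_transfers) / x


# Example: the FFT run analyzed in the text (24 MBytes input, 5452 transfers).
fft = RunTimes(etime=130.76, utime=66.138, systime=3.133, inittime=0.21)
print(f"{predicted_etime(fft, 5452, 10):.2f} s")  # prints 83.46 s
```

Setting x = 1 reproduces the measured elapsed time, which is a quick sanity check that the decomposition accounts for all of etime.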

We made all of these measurements for our FFT application and used them to predict its performance on a system whose interconnection network provides ten times the bandwidth of Ethernet (ETHERNET*10). We also predict its completion time on a system with enough memory to hold the application's entire working set (ALL_MEMORY) by adding its utime, systime, and inittime. Figure 4 plots these predicted execution times along with the measured execution times of DISK and PARITY_LOGGING (labelled ETHERNET). ETHERNET*10 performs very close to ALL_MEMORY and significantly better than both ETHERNET and DISK.
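As a rough illustration of how the predicted alternatives compare, the following Python sketch plugs the FFT measurements reported in the next paragraph (24 MBytes of input, PARITY_LOGGING over Ethernet) into the prediction formula for a few bandwidth factors. The constant and function names are hypothetical, and the sweep beyond a factor of ten is only an extrapolation of the same simple model.

```python
# FFT with 24 MBytes of input, PARITY_LOGGING over Ethernet (seconds).
UTIME, SYSTIME, INITTIME, PTIME = 66.138, 3.133, 0.21, 61.279
PPTIME = 5452 * 0.0016   # 5452 page transfers at 1.6 ms each
BTIME = PTIME - PPTIME   # bandwidth-dependent blocking time

# ALL_MEMORY: the working set fits in memory, so there is no paging time at all.
ALL_MEMORY = UTIME + SYSTIME + INITTIME


def remote_memory(x: float) -> float:
    """Predicted completion time when the network has x times Ethernet's bandwidth."""
    return ALL_MEMORY + PPTIME + BTIME / x


for x in (1, 10, 100):
    t = remote_memory(x)
    print(f"ETHERNET*{x:<3}  {t:7.2f} s  ({t / ALL_MEMORY:.2f} x ALL_MEMORY)")
print(f"ALL_MEMORY     {ALL_MEMORY:7.2f} s")
```

Under this model, x = 1 reproduces the measured 130.76 sec, and as x grows the prediction approaches ALL_MEMORY + pptime rather than ALL_MEMORY itself, because the per-page protocol cost is assumed not to depend on bandwidth.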

To understand the results shown in figure 4, we analyze the execution time of FFT with 24 MBytes of input when PARITY_LOGGING is used. The measured elapsed time is 130.76 sec, consisting of 66.138 sec of useful user time, 3.133 sec of system time, 0.21 sec of initialization time, and 61.279 sec of page transfer time. During the same run, the application performed 2718 pageouts and 2055 pageins. Because 4 data servers plus a parity server were used, every four pageouts also send one parity page, so the 2718 pageouts required 2718 + 679 = 3397 page transfers, and the total number of page transfers was 3397 + 2055 = 5452. The protocol processing overhead was therefore 5452 × 1.6 ms, or about 8.723 sec, and the bandwidth-dependent blocking time was 61.279 - 8.723, or about 52.556 sec. With a ten times faster interconnection network, the blocking time would drop to about 5.255 sec. The total completion time of FFT would then be 66.138 + 3.133 + 0.21 + 8.723 + 5.255 = 83.459 sec, divided as follows: 79.246% user time, 3.754% system time, 0.252% initialization time, and 16.748% page transfer time. Thus, a 100 Mbit/sec interconnection network reduces the total paging overhead to less than 17% of the application's execution time. We believe that most users would be willing to pay this overhead in order to run an application that does not fit in main memory; after all, their only other option is to suffer from disk thrashing.
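The arithmetic in the previous paragraph can be reproduced with a short Python sketch. The helper names are hypothetical, and the parity count assumes one parity page per complete group of four pageouts, which yields the 3397 pageout transfers reported in the text. Because the sketch does not truncate btime/10 to 5.255 sec, its percentages may differ from the text's in the last digit.

```python
PPTIME_PER_PAGE = 0.0016  # seconds of TCP/IP protocol processing per page


def page_transfers(pageouts: int, pageins: int, data_servers: int = 4) -> int:
    """Pages sent over the network under parity logging.

    Assumption: each complete group of `data_servers` pageouts adds one
    parity page for the parity server; each pagein fetches one page."""
    parity_pages = pageouts // data_servers
    return pageouts + parity_pages + pageins


def breakdown(utime: float, systime: float, inittime: float,
              ptime: float, n_transfers: int, x: float):
    """Predicted total time and percentage split with x times the bandwidth."""
    pptime = n_transfers * PPTIME_PER_PAGE   # unaffected by bandwidth
    btime = ptime - pptime                   # assumed to shrink by a factor of x
    parts = {
        "user": utime,
        "system": systime,
        "initialization": inittime,
        "page transfer": pptime + btime / x,
    }
    total = sum(parts.values())
    return total, {name: 100 * t / total for name, t in parts.items()}


n = page_transfers(pageouts=2718, pageins=2055)   # 3397 + 2055
total, pct = breakdown(66.138, 3.133, 0.21, 61.279, n, x=10)
print(f"{n} transfers, predicted total {total:.2f} s")
for name, share in pct.items():
    print(f"  {name:<15} {share:6.2f}%")
```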

Figure 4: Performance of FFT for various architecture alternatives. DISK is the measured completion time when paging to a local disk. ETHERNET is the measured completion time when using parity logging to remote memory over Ethernet. ETHERNET*10 is the predicted completion time when paging to remote memory over a network that provides ten times the bandwidth of Ethernet. ALL_MEMORY is the predicted completion time of FFT on the same workstation with enough memory to hold its entire working set.



Evangelos Markatos
Wed Aug 7 11:36:29 EET DST 1996