Multi-Queue Management for Advanced QoS
in High-Speed Communication Systems

Computer Architecture and VLSI Systems Division,
Institute of Computer Science (ICS), FORTH
Science and Technology Park of Crete, P.O. Box 1385, Heraklion, Crete, GR 711 10 Greece

The provision of quality-of-service (QoS) guarantees by modern, advanced-architecture network systems requires the differentiation of traffic into multiple flows, each with its own behavior characteristics and service level. The first prerequisite for such differentiation is that traffic belonging to different flows be placed on different queues, i.e. per-flow queueing.

Per-flow queueing typically requires the implementation of a large number of logical queues --hundreds or thousands to possibly millions in the future-- inside one or a few physical memories. When the communication system is to operate at high speed, the management of these multiple queues typically requires the assistance of specialized hardware. We have worked on such multi-queue management implementations at different cost and performance levels, as described below.
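As an illustration of how many logical queues can live inside one physical memory, the following is a minimal software sketch, assuming singly linked lists of fixed-size cell slots plus a free list. This is a common textbook arrangement and is not taken from the implementations described below; the class name `MultiQueueBuffer` and its methods are hypothetical.

```python
class MultiQueueBuffer:
    """Many logical FIFO queues sharing one pool of fixed-size cell slots.

    Illustrative sketch only: linked lists plus a free list are assumed here,
    not taken from the hardware designs described in this document.
    """

    NIL = -1  # marks "no slot" in head, tail, and next pointers

    def __init__(self, num_slots, num_queues):
        self.data = [None] * num_slots  # the single shared cell memory
        # next_ptr chains slots into lists; initially every slot is on the free list
        self.next_ptr = list(range(1, num_slots)) + [self.NIL]
        self.free_head = 0 if num_slots > 0 else self.NIL
        self.head = [self.NIL] * num_queues  # per-queue head pointers
        self.tail = [self.NIL] * num_queues  # per-queue tail pointers

    def enqueue(self, q, cell):
        """Append cell to logical queue q; return False if the buffer is full."""
        slot = self.free_head
        if slot == self.NIL:
            return False
        self.free_head = self.next_ptr[slot]  # pop a slot off the free list
        self.data[slot] = cell
        self.next_ptr[slot] = self.NIL
        if self.head[q] == self.NIL:
            self.head[q] = slot  # queue was empty
        else:
            self.next_ptr[self.tail[q]] = slot  # link behind the old tail
        self.tail[q] = slot
        return True

    def dequeue(self, q):
        """Remove and return the oldest cell of queue q, or None if it is empty."""
        slot = self.head[q]
        if slot == self.NIL:
            return None
        cell = self.data[slot]
        self.head[q] = self.next_ptr[slot]
        if self.head[q] == self.NIL:
            self.tail[q] = self.NIL  # queue became empty
        self.data[slot] = None
        self.next_ptr[slot] = self.free_head  # push the slot back on the free list
        self.free_head = slot
        return cell
```

Each operation touches only a few pointers, independent of how many queues exist, which is why the number of logical queues can grow without changing the per-operation cost. In hardware, the pointer arrays would typically be held in memories of their own.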

Pipelined Multi-Queue Management in a VLSI ATM Switch Chip with Credit-Based Flow Control

by George Kornaros, Christoforos Kozyrakis, Panagiota Vatsolaki, and Manolis Katevenis

Proceedings of ARVLSI'97 (17th Conference on Advanced Research in VLSI), Univ. of Michigan at Ann Arbor, MI USA, Sept. 1997, IEEE Computer Soc. Press, ISBN 0-8186-7913-1, pp. 127-144

ABSTRACT:

We describe the queue management block of ATLAS I, a single-chip ATM switch (router) with optional credit-based (backpressure) flow control. ATLAS I is a 4-million-transistor 0.35-micron CMOS chip, currently under development, offering 20 Gbit/s aggregate I/O throughput, sub-microsecond cut-through latency, a 256-cell shared buffer containing multiple logical output queues, priorities, multicasting, and load monitoring.

The queue management block of ATLAS I is a dual parallel pipeline that manages the multiple queues of ready cells, the per-flow-group credits, and the cells that are waiting for credits. All cells, in all queues, share one common buffer space. These 3- and 4-stage pipelines handle events at the rate of one cell arrival or departure per clock cycle, and one credit arrival per clock cycle. The queue management block consists of two compiled SRAMs, pipeline bypass logic, and multi-port CAM and SRAM blocks that are laid out in full-custom and support special access operations. The full-custom part of queue management contains approximately 65 thousand transistors in logic and 14 Kbits in various special memories; it occupies 2.3 mm², consumes 270 mW (worst case), and operates at 80 MHz (worst case), versus the 50 MHz clock frequency required to support the 622 Mb/s switch link rate.
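The three kinds of state named in the abstract (ready cells, per-flow-group credits, and cells waiting for credits) can be illustrated with a small behavioral sketch. It is a simplified software model under stated assumptions, not the ATLAS I pipeline: it uses a single ready queue instead of multiple output queues, and the class `CreditGate` and its method names are hypothetical.

```python
from collections import defaultdict, deque


class CreditGate:
    """Behavioral model of credit-based (backpressure) flow control.

    Assumptions for illustration: one credit allows one cell of a flow group
    to be forwarded, and there is a single ready queue.
    """

    def __init__(self, initial_credits):
        self.credits = dict(initial_credits)  # flow group -> available credits
        self.waiting = defaultdict(deque)     # flow group -> cells blocked on credits
        self.ready = deque()                  # cells cleared for departure

    def cell_arrival(self, group, cell):
        # A cell becomes ready only if its flow group holds a credit and no
        # earlier cell of the same group is still waiting (preserves order).
        if self.credits.get(group, 0) > 0 and not self.waiting[group]:
            self.credits[group] -= 1
            self.ready.append((group, cell))
        else:
            self.waiting[group].append(cell)

    def credit_arrival(self, group):
        # A returning credit either releases the oldest waiting cell of the
        # group or, if none is waiting, is banked for a future arrival.
        if self.waiting[group]:
            self.ready.append((group, self.waiting[group].popleft()))
        else:
            self.credits[group] = self.credits.get(group, 0) + 1

    def cell_departure(self):
        # Send the oldest ready cell, or return None if nothing is ready.
        return self.ready.popleft() if self.ready else None
```

A short usage example: with `CreditGate({"A": 1, "B": 0})`, the first cell of group A becomes ready immediately, while a second cell of A and any cell of B wait until `credit_arrival` is called for their group. The three methods correspond to the event types listed above: cell arrival, cell departure, and credit arrival.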

KEYWORDS: single-chip ATM switch, VLSI router, pipelined queue management, credit-based flow control.

The Full Paper is Available in:

© Copyright 1997 by IEEE.
Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone: +1 (908) 562-3966.

Modular, Scalable Memory Manager for High-Speed Communication Systems

by Dimitrios Serpanos and Panagiotis Karakonstantis

1997-98

ABSTRACT:

An important module in high-speed communication systems is the memory manager, which manages logical data structures --typically queues-- efficiently and enables high-speed data transfers between system memory and link interfaces. We have developed a high-speed, scalable and re-usable Queue Manager, suitable for a wide range of ATM systems, such as workstation adapters, switches, routers, etc. The Queue Manager is a special-purpose processor that executes memory management instructions. To achieve re-usability, we have analyzed the requirements of the most common ATM functions (flow control, segmentation-and-re-assembly, etc.) and identified an instruction set that is sufficient to implement these functions. To achieve scalability, we have identified a minimal set of instructions as well as a minimal set of data structures to support.

We have developed an architecture and several hardware implementations, which provide increasing performance at the cost of more complex hardware. A typical low-cost implementation, using an FPGA and external SRAM, supports 1024 queues of 8192 ATM cells and performs an Enqueue or a Dequeue operation in 130 or 200 ns, respectively. Such an implementation supports ATM systems with aggregate throughput close to 1 Gbps.

Considering the processing power that exists in some systems in the form of embedded processors, we have developed a software implementation for embedded systems as well, using the CYCLONE evaluation board with the Intel i960 processor at 40 MHz. The average delays of the Enqueue and Dequeue operations in the software implementation are 0.75 and 0.95 microseconds, respectively. We have evaluated the performance, cost and scalability of all implementations, so that one can choose the solution with the desired characteristics.
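The quoted operation delays can be related to throughput with a back-of-the-envelope calculation. The sketch below assumes that every 53-byte ATM cell costs exactly one Enqueue plus one Dequeue, executed back to back with no overlap and no other overhead; these assumptions are ours, not the authors', so the results are rough upper bounds only. The function name `cell_throughput_bps` is hypothetical.

```python
ATM_CELL_BITS = 53 * 8  # a standard ATM cell is 53 bytes (424 bits)


def cell_throughput_bps(enqueue_s, dequeue_s, cell_bits=ATM_CELL_BITS):
    """Bits per second if each cell needs one Enqueue and one Dequeue.

    Assumes the two operations run sequentially with no other overhead,
    which is a simplification made for this estimate.
    """
    return cell_bits / (enqueue_s + dequeue_s)


# FPGA + external SRAM implementation: 130 ns Enqueue, 200 ns Dequeue
hw = cell_throughput_bps(130e-9, 200e-9)
# i960 software implementation: 0.75 us and 0.95 us average delays
sw = cell_throughput_bps(0.75e-6, 0.95e-6)

print(f"FPGA + SRAM:   {hw / 1e9:.2f} Gbit/s")   # prints 1.28 Gbit/s
print(f"i960 software: {sw / 1e6:.0f} Mbit/s")   # prints 249 Mbit/s
```

Under these assumptions the hardware figure lands in the same range as the "close to 1 Gbps" stated above, and the software implementation comes out roughly five times slower.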
