# Energy, Throughput and Area Evaluation of Regular and Irregular Network on Chip Architectures

<sup>1</sup>Umamaheswari S, <sup>2</sup>Rajapaul Perinbam J

<sup>1</sup>Department of Information Technology, Madras Institute of Technology, Anna University
uma\_sai@annauniv.edu

<sup>2</sup>Department of Electronics and Communication Engineering, RMK Engineering College, Chennai.
jrp.ece@rmkec.ac.in

#### **ABSTRACT**

Network-on-Chip (NoC) has been proposed for System-on-Chip (SoC) design to achieve high performance, reusability and scalability by generating application specific topologies. Application specific topologies are irregular in structure and take factors such as communication weight, area and energy constraints into account while the topology is built. Regular topologies such as 2D mesh and Spidergon are more structured and are built with little regard for the system characteristics and other requirements. Consequently, the throughput, power utilization and silicon area vary depending on the topology. This paper evaluates these performance measures for regular topological structures and for irregular application specific NoCs.

#### *KEYWORDS*

Network on Chip (NoC), Irregular topology, Performance comparison, Topology generation, Throughput, Energy, Silicon area.

## 1 Introduction

System-on-Chip (SoC) refers to integrating all components of a computer or other electronic system into a single chip. The components of a SoC are analog or digital in nature, such as FPGAs, DSPs and other Intellectual Properties (IPs). These IPs communicate with each other in such a manner that together they provide the functionality of the system. Earlier SoCs used a conventional bus based system for communication, but this causes problems such as synchronization, energy consumption, area constraints, lack of modularity and clock skew. Scalability is very important for SoCs because of shrinking technology sizes and the increasing scale of complexity.
Network on Chip (NoC) is an emerging trend in SoC design. NoC replaces design-specific global on-chip wiring with a general-purpose on-chip interconnection network. Because the interconnect is general purpose, it supports scalability and reusability. Using a network in place of dedicated wiring has further advantages in structure, performance and interoperability. The major components of a NoC are the IPs, which act as the nodes of the network, and the routers, which hold the routing logic. NoCs reduce wire length by splitting the wiring into links between nodes and routers and links between routers. The network concept provides modularity and high level optimization.

Though NoC provides scalability and reusability, it faces certain challenges too. The growing number of components such as routers, and the introduction of complex logic into the routers, consume extra power. The system should be designed so that it works within the power constraints while providing the required functionality.

DOI: 10.5121/ijdps.2011.2504

There are two main phases involved in building a system: network topology generation and floor-planning. The floor-plan determines the physical placement of cores and routers, which influences the overall area and the length of the physical links. The network topology defines the connections between cores and routers, and between routers. The network layout may be regular or irregular in structure. Regular topologies include 2D Mesh, Spidergon, Ring and Tree networks. These have the advantages of topology reuse and low design complexity and are suitable for homogeneous cores, e.g., general purpose CPUs, FPGAs, etc. Application specific networks are developed by constructing irregular topologies. These networks are custom built and tailored to a specific application.

High level simulation of on-chip networks is still developing.
There are many trade-offs involved in choosing a simulation tool and defining the physical constraints for it. Due to the similarities between NoCs and general computer networks, NS2 is emerging as the most suitable tool for evaluating NoC designs.

In this paper we compare the performance of irregular topology networks against regular ones. Focus is given to throughput and energy consumption. The paper is organized as follows. Section 2 gives the background and the literature related to our work. Section 3 discusses the regular topological structures. Section 4 describes the irregular topological structure together with the algorithm used to generate it. Section 5 explains the simulation parameters, Section 6 presents the results and Section 7 concludes the paper.

# 2 Background and Related Work

In designing NoC systems there are several issues to be concerned with, such as topologies, routing algorithms, performance, latency and complexity. The design of system-on-chip (SoC) in [1],[4] provides integrated solutions to challenging design problems in the telecommunications, multimedia and consumer electronics domains. The micro-network control model [1] provides good quality of service and manages the network resources through dynamic control. Focusing on probabilistic metrics to quantify design objectives such as performance and power will lead to a major change in design methodologies. The amount of energy utilized, and hence the amount of power consumed, varies with the number of cores in the chip [2].

The heterogeneous cores used in SoCs have varied functionalities to support highly sophisticated applications. When such cores are arranged in regular topologies, the structure may degrade the performance of some components by overriding their needs [5],[6]. Irregular interconnection topologies address this problem by reflecting the actual communication needs of the application.
The proposed low power irregular topology generation algorithm [8] builds application specific networks whose interconnection architecture suits the traffic characteristics of the application. This reduces the power consumption of the application by 49%. Higher level protocols layered on top of a simple network interface [3] provide a simple, reliable datagram interface to each IP in the system. The High Throughput Chip-Level Integration of Communicating Heterogeneous Elements (HT-CLICHÉ) [7] optimizes the system circuit and increases the number of virtual channels for communication from four to eight. This increase in the number of virtual channels increases the throughput while preserving the frequency.

The performance metrics of NoC are being studied under different topological structures. In [9], a NoC mesh architecture is constructed and its behaviour is observed under different traffic modes, namely Exponential, Pareto and Constant Bit Rate (CBR), with the network simulator NS2. Through this simulation, the authors analyzed common network performance metrics such as packet delay, throughput, communication load and drop probability under different buffer sizes and traffic injection rates.

International Journal of Distributed and Parallel Systems (IJDPS) Vol.2, No.5, September 2011

For low power application specific NoCs, a two-phase flow of topology generation and floor-planning is designed to reduce the number of routers, guarantee deadlock freedom and minimize power [12]. The case study in [11] presents the multimedia application VOPD, which is well suited to deriving a variety of topologies with a larger number of cores. The website [10] provides guidelines for working in NS2 and programming in Tcl.

## 3 Regular Topologies

In this section we consider three network topologies with a regular interconnection structure. They are:

- 1. 2D Mesh,
- 2. WK Recursive and
- 3. Spidergon.

## 3.1 2D Mesh Topology

Fig. 1.
A Mesh Network with 16 Nodes

The two-dimensional mesh topology (Fig. 1) consists of a grid structure with routers at the intersections of the grid lines. Each router is connected to one core and to its neighbouring routers; the number of neighbouring routers depends on the position of the router. The boundary routers have connections with neighbouring routers on three sides and with the core on the fourth side. The corner routers have connections with two neighbouring routers in addition to the core. The inner routers have connections with routers on all four sides, with the core connected as the fifth.

#### 3.2 WK Recursive Topology

$WK(N_d, L)$ is a recursive network topology (Fig. 2), where

1. $N_d$ represents the node degree (the amplitude), that is, the number of virtual nodes that constitute the fully connected undirected graph.
2. $L$ represents the expansion level. Each of the $N_d$ virtual nodes is itself a WK topology $WK(N_d, L-1)$ for all $L > 1$.

Fig. 2. A WK(4,3) Network

## 3.3 Spidergon Topology

The Spidergon NoC network is a combination of the star and ring topologies. It is based on an elementary polygon network formed by arranging 4R+1 (R = 1, 2, etc.) routers in a fashion that combines the topological structures of ring and star. A single central router connects to the 4R peripheral routers, which gives the star topology. The peripheral routers are connected to each other in the form of a ring network. The valence (m) of the network is given by m = 4R, which represents the number of peripheral nodes.

Fig. 3. Spidergon Topology

# 4 Irregular Topology

Irregular topologies are derived to make the system more application and specification oriented. This is achieved by taking several constraints into consideration while forming the network layout.
As a result, objectives such as the power consumption, the area of the chip and the number of routers in the system can be optimized easily. In [13] the authors proposed a two step topology generation algorithm, which we use in this paper to generate the irregular interconnects. The first part of the algorithm forms the initial clusters based on the communication characteristics between the IPs. Each cluster is then assigned to a router. In the second part, the topology is constructed by connecting the clusters to each other one by one, based on the communication weights between the clusters. The pseudo code used in the generation of the irregular topology is shown in Figure 5.

```
#1 compute r: n <= p*r-2*(r-1)
#2 call: create_cluster (leading_node)
#3 FOR each cluster formed in step2
   #3.1 find: total communication load
   #3.2 choose clusters based on total communication Load
   END FOR
#4 optimize the number of clusters
#5 Assign routers to each cluster
#6 Form topology by connecting the routers
#7 Induce traffic in the topology
```

Fig 5. Pseudo code for irregular topology generation

The number of routers to be used can be minimized by using the following equation.

$$n \le pr-2(r-1) \tag{1}$$

where n represents the number of nodes in the network, p represents the number of ports in each router and r represents the minimum number of routers, which is calculated with Eq. (1). Solving Eq. (1) for r gives $r \ge (n-2)/(p-2)$ for $p > 2$; for example, $n = 16$ nodes with $p = 5$ ports per router require $r = 5$ routers, since $5 \cdot 5 - 2 \cdot 4 = 17 \ge 16$ whereas four routers support only 14 nodes.

## 5 Simulation

In this section we explain the parameters used for simulating the NoC model in NS2.

#### 5.1 Traffic Models and Parameters

We use three different traffic models for the comparative study of NoC performance under different topologies. They are:

- 1. Constant Bit Rate (CBR) traffic model,
- 2. Exponential traffic model and
- 3. Pareto traffic model.

These three traffic models have different traffic generation patterns, which makes it easier to observe the behaviour of the NoC under different traffic scenarios.
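The three traffic patterns listed above can be illustrated with a minimal Python sketch. It is not the NS2 implementation used in the paper; it assumes, as in the NS2 Exponential and Pareto On/Off generators, that a source sends flits at a fixed rate during ON periods, sends nothing during OFF periods, and draws the period lengths from the named distribution. The function names, the Pareto shape of 1.5 and the durations in the demo are illustrative assumptions; only the 8-byte flit size is taken from the paper.

```python
import random

FLIT_BYTES = 8  # flit size stated in the paper


def cbr_times(rate_bps, duration_s, flit_bytes=FLIT_BYTES):
    """Departure times (s) of flits sent back to back at a constant bit rate."""
    flit_bits = flit_bytes * 8
    count = int(duration_s * rate_bps // flit_bits)
    return [i * flit_bits / rate_bps for i in range(count)]


def exp_period(mean_s):
    """Return a sampler of exponentially distributed period lengths."""
    return lambda rng: rng.expovariate(1.0 / mean_s)


def pareto_period(mean_s, shape=1.5):
    """Return a sampler of Pareto distributed period lengths.

    shape=1.5 is an assumed value; the mean is finite only for shape > 1.
    """
    if shape <= 1.0:
        raise ValueError("shape must exceed 1 for a finite mean")
    scale = mean_s * (shape - 1.0) / shape  # minimum period length
    return lambda rng: scale * rng.paretovariate(shape)


def on_off_times(rate_bps, duration_s, draw_on, draw_off, rng,
                 flit_bytes=FLIT_BYTES):
    """Departure times (s) of flits from an ON/OFF source.

    During each ON period flits leave at the fixed rate; during each OFF
    period nothing is sent. draw_on and draw_off sample the period lengths.
    """
    interval = flit_bytes * 8 / rate_bps
    times, t = [], 0.0
    while t < duration_s:
        burst_end = min(t + draw_on(rng), duration_s)
        while t < burst_end:
            times.append(t)
            t += interval
        t = max(t, burst_end) + draw_off(rng)  # skip the idle period
    return times


if __name__ == "__main__":
    rng = random.Random(1)
    rate, sim_time = 100e6, 0.5  # illustrative values only
    print("CBR flits:", len(cbr_times(rate, sim_time)))
    print("Exponential flits:",
          len(on_off_times(rate, sim_time, exp_period(0.01), exp_period(0.05), rng)))
    print("Pareto flits:",
          len(on_off_times(rate, sim_time, pareto_period(0.01), pareto_period(0.05), rng)))
```

Passing the period samplers as arguments keeps the burst logic in one place, so the Exponential and Pareto models differ only in the distribution of their ON and OFF period lengths.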
The number of flits generated under the different traffic models and traffic rates can be observed from the output trace files. Table 1 shows the count of flits.

**Table 1.** No. of Flits Generated in different Traffic Models

| Traffic Model | Traffic Rate (Mbps) | No. of flits generated |
|---------------|---------------------|------------------------|
| CBR           | 100                 | 350000                 |
|               | 140                 | 525000                 |
|               | 200                 | 650000                 |
| Exponential   | 100                 | 300000                 |
|               | 140                 | 475000                 |
|               | 200                 | 575000                 |
| Pareto        | 100                 | 300000                 |
|               | 140                 | 475000                 |
|               | 200                 | 575000                 |

We observed the throughput achieved by all network layouts under the three traffic models at traffic rates of 100, 140 and 200 Mbps.

## 5.2 Simulation Parameters

For each traffic model we define a set of traffic parameters to observe the behaviour of the network under that model. We assign the flit size to be constant at 8 bytes. The ON and OFF times of the Pareto and Exponential traffic models are taken to be the same, 0.1 s and 1 ms respectively. During the ON state there is a sudden burst of traffic throughout the network. With all participating nodes acting as both sources and sinks, we analyse the performance of each network. The Exponential and Pareto traffic models are similar in that they induce a sudden burst of traffic during the ON time and no traffic during the OFF time. They differ only in that the Exponential traffic model follows an exponential distribution and the Pareto model follows a Pareto distribution.

#### 6 Results

The simulation results are obtained for the 2D mesh, WK Recursive, Spidergon and application specific topologies under the CBR, Exponential and Pareto traffic models. We analyzed throughput, energy consumption and silicon area for these topologies and compared them.
## 6.1 Throughput comparison

The throughput achieved by the different network models is analyzed first. The simulation results show that the custom built application specific topology gives higher throughput than the other network topologies. Among the regular topologies, WK outperforms Spidergon and mesh. Here we present the graphs obtained while comparing the throughputs of regular and application specific networks under the CBR and Pareto traffic models.

#### 6.1.1 CBR Traffic Model

The CBR traffic model, as its name implies, induces constant rate traffic in the network. The desired rate can be fixed in NS2. In our simulation we considered three traffic rates: 100, 140 and 200 Mbps. Fig. 6 compares the performance in terms of throughput for the CBR model. The graph plots throughput in Mbps against the time scale in microseconds (ms).

The CBR graph shows that the application specific network achieves higher throughput than any of the regular topologies. The reason is that when highly communicating nodes are placed in the same cluster, direct links connect the source and destination nodes, so the success rate of a packet being delivered to its target increases. The WK recursive network also follows a cluster based topology like the application specific network, with the difference that it is built without considering the communication weights and is rigid in the number of routers used and the number of nodes in each cluster. These factors lead to the difference in throughput. The difference in throughput between the Spidergon and Mesh topologies arises because, although the Mesh topology has a higher number of links between routers, the shortest path between any two communicating nodes has fewer hops in the Spidergon network than in the Mesh network.

Fig. 6.
Throughput achieved in different networks under CBR model

#### 6.1.2 Pareto Traffic Model

Fig. 7 gives the comparative analysis of the throughput of the different NoC topologies under the Pareto traffic model. This model generates traffic according to a Pareto distribution, which induces sudden bursts of traffic alternating with periods of idleness characterized by zero traffic. The ON time and OFF time are set to 1ms and 0.1s respectively. The graph shows that at the start of a traffic burst the mesh topology appears to outperform the others, but as the ON and OFF periods alternate the performance of mesh declines. At the same time, the application specific topology maintains its packet delivery rate, leading to higher throughput. The throughput of the WK and Spidergon topologies lies between those of the mesh and application specific topologies.

Fig. 7. Throughput achieved in different networks under Pareto Model

#### 6.2 Energy Comparison

We use the energy model suggested in [6] to compare the energy consumed in the four topologies. Eq. (2) is used in the calculation of energy.

$$E_{Av} = Nb_{flits}\, S_{flit} \left( h_{av} \left(0.39 + 0.12 \times l_w^{av}\right) + 0.9776\, (h_{av} - 1) \right) \tag{2}$$

where $Nb_{flits}$ is the number of flits, $S_{flit}$ is the flit size, $h_{av}$ is the average number of hops and $l_w^{av}$ is the average link length.

The graph analysis (Fig. 8) shows that the application specific topology consumes the least energy while achieving higher throughput. The Mesh and Spidergon topologies differ only slightly in the total energy consumed, and WK has the highest consumption.

Fig. 8. Energy consumed in different topologies

## 6.3 Area Comparison

The silicon area required for these NoC architectures is evaluated in this section. It depends on two major components, namely the size of the buffers and the routing logic. The factors which occupy chip area are links, switches and resources. The chip area can be calculated using the model given in [6].
$$A = \sum_{i=1}^{Ns} As(i) + \sum_{j=1}^{Nr} Ar(j) + \sum_{k=1}^{Nw} Aw(k)$$

where Ns is the number of switches, As is the area of each switch, Nr is the number of PEs, Ar is the area of a PE, Nw is the number of links and Aw is the area of each link. From this equation the average area requirement can be derived as

$$A_v = Ns(Rs + a_s d_g S_{flit} B_s) + NrAr + a_w Nw l_w^{av}$$

where Rs is the area required for the routing table and routing logic, Bs is the buffer size, $a_s$ is the area required for one byte, $d_g$ is the average number of buffers, $S_{flit}$ is the flit size in bytes, $a_w$ is the area required for a link and $l_w^{av}$ is the average length of a link. The area requirement is calculated and compared for the different topologies. The comparison chart is shown in Figure 9.

Fig. 9. Average area requirement in different networks

The area requirement of the application specific topology is considerably less than that of the regular structures. The analysis shows that using an application specific topology improves the performance and reduces the energy consumption and silicon area.

#### 7 Conclusion and future work

The choice of topology thus depends upon the requirements of the application. The overall performance of the application specific network is high compared to all the other network topologies. At the same time, designing the topology generation algorithms for these networks is a challenge. There is a trade-off between the choice of network topology and the performance achieved. High level simulation studies are useful for the analysis of NoC architectures, but they provide little insight into the actual design of NoC architectures.

In future work, we will compare the various application specific topology generation algorithms. The objective is to suggest an algorithm for well customized topology generation that suits most embedded system applications. We will also implement the NoC architecture using HDL in real hardware to prove the concepts.

## References

- 1. Benini L. and De Micheli G.
'Networks on Chips: A New SoC Paradigm', IEEE Computer, pp. 70-78, 2002.
- 2. Chang K.-C. and Chen T.-F., 'Low-power algorithm for automatic topology generation for application-specific networks on chips', IET Comput. Digit. Tech., vol. 2, pp. 239–249, 2008.
- 3. Dally W.J. and Towles B., 'Route packets, not wires: on-chip interconnection networks', Proc. Design and Automation Conf., pp. 684–689, 2001.
- 4. Henkel J., Wayne Wolf and Srimat Chakradhar, 'On-chip networks: A scalable and communication-centric embedded system design paradigm', 17th International Conference on VLSI Design, 2004.
- 5. Lorenzo Verdoscia et al., 'An Adaptive Routing Algorithm for WK-Recursive Topologies', Computing Journal, vol. 63, 1999.
- 6. Mohamed Bakhouya, 'Evaluating the energy consumption and the Silicon area of on-chip interconnect architectures', Journal of Systems Architecture, Elsevier, 2009.
- 7. El-Moursy M.A., El Ghany M.A.A. and Ismail M., Electron. Eng. Dept., German Univ. in Cairo, Cairo, Egypt, 'High throughput architecture for CLICHÉ Network on Chip', IEEE International SOC Conference, 2009.
- 8. Suboh S., Bakhouya M., Gaber J. and El-Ghazawi T., 'An interconnection architecture for network-on-chip systems', Telecommunication Syst., Springer Science+Business Media, LLC, 2008.
- 9. Sun Y.R., Kumar S. and Jantsch A., 'Simulation and evaluation of a network on chip architecture using ns2', in: Proc. of the IEEE NorChip Conference, 2002.
- 10. The Network Simulator ns-2, http://WWW.nsnam.org/
- 11. Van Der Tol E.B. and Jaspers E.G.T., 'Mapping of MPEG-4 decoding on a flexible architecture platform', SPIE 2002, pp. 1–13, 2002.
- 12. Wan-Yu Lee and Iris Hui-Ru Jiang, 'Topology Generation and Floorplanning for Low Power Application-Specific Network-on-Chips', IEEE International Symposium on VLSI Design, Automation and Test (VLSI-DAT), 2008.
- 13. Yilmaz Ar, Suleyman Tosun and Hasaan Kaplan, 'TopGen: A New Algorithm for Automatic Topology Generation for Network on Chip Architectures to Reduce Power Consumption', 4<sup>th</sup> International Conference on Application of Information and Communication Technologies, 2009.