Network-on-Chip and Special Function Units
Contemporary computer architectures increasingly use multiple
computing cores. This trend is driven primarily by the hard constraint on
the total power consumption of the chip. It is envisioned
that future chips will have a very large number of heterogeneous cores
on the same die.
This leads to two key problems. The first problem is to
design a fast, power-efficient Network-on-Chip (NoC) to interconnect these
cores. The second problem arises from an opportunity. With the large number
of available cores, there is the opportunity to design special function
units tuned for specific computational tasks. Our work addresses the first
problem by using a source-synchronous ring-based NoC. Data on the rings
is transmitted source-synchronously, strobed by an extremely
high-speed, low-power, ring-based standing-wave resonant
clock. Our results
indicate a 4.5X improvement in bandwidth and about a 7.5X improvement in
contention-free latency using this approach, compared to the best
existing approach. Ongoing work includes exploring network topologies
and benchmarking the NoC against real and synthetic traffic.
To address the second problem, my group has developed special purpose units
for computational tasks such as sorting, comparison
of two numbers, logarithm and antilogarithm computation, cryptographic key
generation and Boolean Satisfiability.
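
As an illustration of the sorting task, the sorting unit listed in the publications is described as being based on a column scan. The paper's actual algorithm is not reproduced on this page, so the following is only a software sketch of one plausible reading: the bit columns are scanned from the most significant bit downwards to isolate the largest remaining number, and this is repeated until every number has been emitted. The function names and the `width` parameter are hypothetical and chosen for illustration only.

```python
def column_scan_max(values, width):
    """Return the index of the largest value using an MSB-first column scan.

    Assumption (not taken from the paper): values are unsigned integers
    that fit in `width` bits.
    """
    candidates = set(range(len(values)))
    for bit in reversed(range(width)):
        # Candidates that have a 1 in the current bit column.
        ones = {i for i in candidates if (values[i] >> bit) & 1}
        # If any candidate has a 1 here, those with a 0 cannot be the maximum.
        if ones:
            candidates = ones
    # All surviving candidates hold equal values; pick the lowest index.
    return min(candidates)


def column_scan_sort(values, width):
    """Sort in descending order by repeatedly extracting the maximum."""
    remaining = list(values)
    result = []
    while remaining:
        idx = column_scan_max(remaining, width)
        result.append(remaining.pop(idx))
    return result


if __name__ == "__main__":
    print(column_scan_sort([5, 3, 7, 3, 0], width=3))
```

In hardware, each column scan could in principle examine all candidates in parallel, which this sequential sketch does not attempt to model.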
Publications, patents and artefacts:
- "A Fast, Source-synchronous Ring-based
Network-on-Chip Design", Mandal, Khatri, Mahapatra. Design,
Automation and Test in Europe (DATE) conference
2012. Mar 12-16, Dresden, Germany. In this paper, we report an extremely
fast NoC design using source-synchronous data transfer. The clock
used is an extremely fast, low-power resonant clock.
- "CMOS Comparators for High-Speed and
Low-Power Applications", Menendez, Maduike, Garg, Khatri. IEEE
International Conference on Computer Design (ICCD), Oct 1-4, 2006, San
Jose, CA, pp. 76-81. We present two novel ways to design hardware
comparators, yielding about 37% improvement over competing approaches.
- "Sorting Binary Numbers in Hardware - a
Novel Algorithm and its Implementation", Alaparthi, Gulati,
Khatri. International Symposium on Circuits and Systems (ISCAS) 2009,
Taipei, Taiwan. May 24-27, 2009. In this paper, we present a fast special
function unit for sorting, which is based on a column scan and is
significantly faster than the best known existing approach, with lower
area (for larger numbers).
- "A Novel Cryptographic Key Exchange Scheme
using Resistors", Lin, Ivanov, Johnson, Khatri. IEEE International
Conference on Computer Design (ICCD) 2011, Amherst, MA, Oct 2011, pp.
451-452. In this paper, we report a practical FPGA-based implementation of the Kish
cipher, intended for use over the internet. Given a single shared secret,
Alice and Bob are both able to generate a new shared secret
(cryptographic key).
- "VLSI Implementation of a Non-Linear
Feedback Shift Register for High-Speed Cryptography Applications",
Lin, Khatri. Great Lakes Symposium on VLSI (GLS-VLSI) 2010. Providence,
RI, May 16-18, 2010. This paper presents an extremely fast NLFSR-based
cryptographic key generator, which can operate at rates that match
OC-768 optical fiber communication rates.
- "A Fast Hardware Approach for Approximate,
Efficient Logarithm and Antilogarithm Computations", Paul,
Jayakumar, Khatri. IEEE Transactions on Very Large Scale Integration
Systems, vol. 17, number 2, Feb 2009, pp. 269-277.
- "An Efficient, Scalable Hardware Engine for
Boolean Satisfiability and Unsatisfiable Core Extraction", Gulati,
Waghmode, Khatri, Shi. IET Computers and Digital Techniques, vol. 2, number
3, May 2008, pp. 214-229. This paper presents a custom IC-based hardware
implementation of a SAT solver. Boolean constraint propagation is performed in
hardware, in a fast, scalable manner.
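
For readers unfamiliar with Boolean constraint propagation (BCP), the step that the SAT engine above performs in hardware, the following is a minimal software sketch of textbook unit propagation. It is not the hardware design from the paper. The clause encoding (lists of signed integers, as in the DIMACS convention) and the function name `unit_propagate` are assumptions made for illustration.

```python
def unit_propagate(clauses, assignment=None):
    """Textbook Boolean constraint propagation (software sketch only).

    clauses: list of clauses, each a list of nonzero ints; literal k means
             variable k is true, -k means variable k is false.
    Returns (assignment, conflict), where assignment maps variable -> bool.
    """
    assignment = dict(assignment or {})
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            unassigned = []
            satisfied = False
            for lit in clause:
                var = abs(lit)
                if var in assignment:
                    if assignment[var] == (lit > 0):
                        satisfied = True
                        break
                else:
                    unassigned.append(lit)
            if satisfied:
                continue
            if not unassigned:
                # Every literal is false under the assignment: conflict.
                return assignment, True
            if len(unassigned) == 1:
                # Unit clause: its single free literal is forced to be true.
                lit = unassigned[0]
                assignment[abs(lit)] = lit > 0
                changed = True
    return assignment, False


if __name__ == "__main__":
    # (x1) AND (NOT x1 OR x2) AND (NOT x2 OR x3) forces x1, x2, x3 to true.
    print(unit_propagate([[1], [-1, 2], [-2, 3]]))
```

This sketch rescans every clause on each pass for simplicity; practical solvers, in software or hardware, use more efficient schemes to find the clauses affected by each new assignment.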