Advancements in Traffic Processing Using Programmable Hardware Flow Offload
L. Deri, A. Cardigliano, F. Fusco
The exponential growth of data traffic and the increasing complexity of networked applications demand effective solutions capable of passively inspecting and analysing network traffic for monitoring and security purposes. Implementing network probes in software on general-purpose operating systems has been made possible by advances in packet-capture technologies, such as kernel-bypass frameworks, and by multi-queue adapters designed to distribute the network workload across multi-core processors. Modern SmartNICs, in addition, have introduced stateful mechanisms to associate actions with network flows, such as forwarding packets or updating traffic statistics for an individual flow. In this paper, we describe our experience in exploiting those functionalities in a modern network probe, and we perform a detailed study of the performance characteristics under different scenarios. Compared to pure CPU-based solutions, SmartNICs with flow-offload technologies provide substantial benefits when implementing forwarding applications. However, the main limitation, namely having to keep large flow tables in host memory, remains largely unsolved for realistic monitoring and security applications.
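To make the flow-offload mechanism concrete, the following minimal Python sketch models a host-side flow table that hands stable flows to a SmartNIC offload hook. The threshold and the `nic_offload` callback are illustrative assumptions, not the probe's actual API.

```python
# Illustrative sketch (not the paper's implementation): a host-side flow
# table where long-lived flows are handed to a hypothetical SmartNIC
# offload hook, so the CPU only sees the first packets of each flow.
from collections import namedtuple

FlowKey = namedtuple("FlowKey", "src_ip dst_ip src_port dst_port proto")

OFFLOAD_THRESHOLD = 10          # packets seen in software before offloading

class FlowTable:
    def __init__(self, nic_offload):
        self.flows = {}                  # FlowKey -> [packets, bytes, offloaded?]
        self.nic_offload = nic_offload   # assumed driver hook (hypothetical)

    def on_packet(self, key, length):
        state = self.flows.setdefault(key, [0, 0, False])
        state[0] += 1
        state[1] += length
        # Once a flow is "stable", push a rule to the NIC: the hardware
        # keeps per-flow counters and the host stops seeing its packets.
        if not state[2] and state[0] >= OFFLOAD_THRESHOLD:
            self.nic_offload(key)        # hypothetical vendor call
            state[2] = True
```

Note that the host still keeps one table entry per flow, which is exactly the memory limitation the abstract points out.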
pNLP-Mixer: an Efficient all-MLP Architecture for Language
F. Fusco, D. Pascual, P. Staar, D. Antognini
Large pre-trained language models based on the transformer architecture have drastically changed the natural language processing (NLP) landscape. However, deploying those models for on-device applications in constrained devices such as smart watches is completely impractical due to their size and inference cost. As an alternative to transformer-based architectures, recent work on efficient NLP has shown that weight-efficient models can attain competitive performance for simple tasks, such as slot filling and intent classification, with model sizes on the order of one megabyte. This work introduces the pNLP-Mixer architecture, an embedding-free MLP-Mixer model for on-device NLP that achieves high weight efficiency thanks to a novel projection layer. We evaluate a pNLP-Mixer model of only one megabyte in size on two multi-lingual semantic parsing datasets, MTOP and multiATIS. Our quantized model achieves 99.4% and 97.8% of the performance of mBERT on MTOP and multiATIS, respectively, while using 170x fewer parameters. Our model consistently beats the state of the art among tiny models (pQRNN), which is twice as large, by a margin of up to 7.8% on MTOP.
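A simplified sketch of an embedding-free projection in the spirit of the one described above: each token is mapped to a fixed-size binary vector by hashing its character trigrams, so no embedding matrix is stored. The paper's projection is MinHash-based; the hash function and feature size here are illustrative assumptions.

```python
# Embedding-free token projection: a token becomes a fixed-size binary
# feature vector derived from hashed character trigrams (zero embedding
# parameters). Sizes and hash choice are illustrative assumptions.
import hashlib
import numpy as np

FEATURE_SIZE = 256  # bits per token (assumption; the paper tunes this)

def token_features(token: str) -> np.ndarray:
    vec = np.zeros(FEATURE_SIZE, dtype=np.float32)
    padded = f"#{token}#"
    for i in range(len(padded) - 2):
        trigram = padded[i:i + 3]
        h = int(hashlib.md5(trigram.encode()).hexdigest(), 16)
        vec[h % FEATURE_SIZE] = 1.0   # set the bucket this trigram hashes to
    return vec

# A sentence becomes a (tokens x FEATURE_SIZE) matrix that can feed the
# MLP-Mixer blocks directly.
sentence = "set an alarm for seven am".split()
features = np.stack([token_features(t) for t in sentence])
print(features.shape)  # (6, 256)
```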
Extracting Text Representations for Terms and Phrases in Technical Domains
F. Fusco, D. Antognini
Extracting dense representations for terms and phrases is a task of great importance for knowledge discovery platforms targeting highly technical fields. Dense representations are used as features for downstream components and have multiple applications, ranging from ranking results in search to summarization. Common approaches to creating dense representations include training domain-specific embeddings with self-supervised setups or using sentence encoder models trained on similarity tasks. In contrast to static embeddings, sentence encoders do not suffer from the out-of-vocabulary (OOV) problem, but they impose significant computational costs. In this paper, we propose a fully unsupervised approach to text encoding that consists of training small character-based models with the objective of reconstructing large pre-trained embedding matrices. Models trained with this approach can not only match the quality of sentence encoders in technical domains, but are 5 times smaller and up to 10 times faster, even on high-end GPUs.
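The training objective described above can be sketched as follows: a small character-level encoder regresses onto rows of a large, frozen pre-trained embedding matrix. The model shapes and the GRU encoder are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch: a character-level student is trained (MSE) to reconstruct the
# teacher embedding of each term, so any string -- including OOV terms --
# can be encoded at inference time. Shapes are illustrative assumptions.
import torch
import torch.nn as nn

EMB_DIM = 300          # teacher embedding width (assumption)
N_CHARS = 128          # ASCII character vocabulary (assumption)

class CharEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.char_emb = nn.Embedding(N_CHARS, 64)
        self.rnn = nn.GRU(64, 256, batch_first=True)
        self.out = nn.Linear(256, EMB_DIM)

    def forward(self, char_ids):            # (batch, max_len)
        h, _ = self.rnn(self.char_emb(char_ids))
        return self.out(h[:, -1])           # last state -> dense vector

def to_ids(term, max_len=32):
    ids = [min(ord(c), N_CHARS - 1) for c in term[:max_len]]
    return torch.tensor(ids + [0] * (max_len - len(ids)))

model, loss_fn = CharEncoder(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters())

# teacher_vec would be the frozen pre-trained embedding of `term`;
# a random vector stands in for it here.
term, teacher_vec = "acetylsalicylic acid", torch.randn(EMB_DIM)
opt.zero_grad()
pred = model(to_ids(term).unsqueeze(0))
loss = loss_fn(pred, teacher_vec.unsqueeze(0))
loss.backward()
opt.step()
```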
Evaluating IP Blacklists Effectiveness
L. Deri, F. Fusco
IP blacklists are widely used to increase network security by preventing communications with peers that have been marked as malicious. There are several commercial offerings as well as several free-of-charge blacklists maintained by volunteers on the web. Despite their wide adoption, the effectiveness of the different IP blacklists in real-world scenarios is still not clear. In this paper, we conduct a large-scale network monitoring study which provides insightful findings regarding the effectiveness of blacklists. The results, collected over several hundred thousand IP hosts belonging to three distinct large production networks, highlight that blacklists are often tuned for precision, with the result that many malicious activities, such as scanning, go completely undetected. The proposed instrumentation approach to detect IP scanning and suspicious activities is implemented with home-grown and open-source software. Our tools enable the creation of blacklists without the security risks posed by the deployment of honeypots.
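The scan detection mentioned above can be illustrated with a minimal fan-out counter: a source contacting many distinct targets within a short window is flagged as a scanner and becomes a candidate blacklist entry. The window size and threshold are illustrative assumptions.

```python
# Minimal scan-detection sketch: flag a source as a scanner when it
# contacts many distinct (host, port) targets within a time window.
import time
from collections import defaultdict

WINDOW_SECS = 60
FANOUT_THRESHOLD = 64   # distinct targets before flagging (assumption)

class ScanDetector:
    def __init__(self):
        self.targets = defaultdict(set)   # src_ip -> {(dst_ip, dst_port)}
        self.window_start = time.time()

    def on_connection(self, src_ip, dst_ip, dst_port):
        now = time.time()
        if now - self.window_start > WINDOW_SECS:
            self.targets.clear()          # start a fresh window
            self.window_start = now
        self.targets[src_ip].add((dst_ip, dst_port))
        # True -> candidate entry for a locally generated blacklist.
        return len(self.targets[src_ip]) >= FANOUT_THRESHOLD
```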
Unsupervised Term Extraction for Highly Technical Domains
F. Fusco, P. Staar, D. Antognini
Term extraction is an information extraction task at the root of knowledge discovery platforms. Developing term extractors that are able to generalize across very diverse and potentially highly technical domains is challenging, as annotations for domains requiring in-depth expertise are scarce and expensive to obtain. In this paper, we describe the term extraction subsystem of a commercial knowledge discovery platform that targets highly technical fields such as pharma, medical, and material science. To be able to generalize across domains, we introduce a fully unsupervised annotator (UA). It extracts terms by combining novel morphological signals from sub-word tokenization with term-to-topic and intra-term similarity metrics, computed using general-domain pre-trained sentence encoders. The annotator is used to implement a weakly supervised setup, where transformer models are fine-tuned (or pre-trained) on training data generated by running the UA over large unlabeled corpora. Our experiments demonstrate that our setup can improve predictive performance while decreasing inference latency on both CPUs and GPUs. Our annotators provide a very competitive baseline for all cases where annotations are not available.
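A minimal sketch of the scoring idea: combine a morphological signal from sub-word tokenization (highly technical terms tend to fragment into many word pieces) with term-to-topic and intra-term similarities. The `tokenize` and `encode` callables stand in for any sub-word tokenizer and general-domain sentence encoder; the weights are illustrative assumptions.

```python
# Unsupervised candidate-term scoring sketch: morphology + similarity.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def term_score(candidate, topic_text, tokenize, encode):
    # Morphological signal: average word pieces per whitespace word
    # (technical terms tend to split into many sub-word units).
    pieces = tokenize(candidate)
    fertility = len(pieces) / max(1, len(candidate.split()))
    # Term-to-topic similarity: is the candidate about the document topic?
    topic_sim = cosine(encode(candidate), encode(topic_text))
    # Intra-term similarity: do the words form a cohesive unit?
    words = candidate.split()
    intra = 1.0
    if len(words) > 1:
        vecs = [encode(w) for w in words]
        intra = float(np.mean([cosine(vecs[i], vecs[i + 1])
                               for i in range(len(vecs) - 1)]))
    # Weights are illustrative, not the paper's calibration.
    return 0.4 * min(fertility / 3.0, 1.0) + 0.4 * topic_sim + 0.2 * intra
```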
Using Deep Packet Inspection in CyberTraffic Analysis
L. Deri, F. Fusco
In recent years we have observed an escalation of cybersecurity attacks, which are becoming more sophisticated and harder to detect as they use more advanced evasion techniques and encrypted communications. The research community has often proposed the use of machine learning techniques to overcome the limitations of traditional cybersecurity approaches based on rules and signatures, which are hard to maintain, require constant updates, and do not solve the problem of zero-day attacks. Unfortunately, machine learning is not the holy grail of cybersecurity: machine learning-based techniques are hard to develop due to the lack of annotated data, are often computationally intensive, can be the target of hard-to-detect adversarial attacks, and, more importantly, are often unable to provide explanations for the predicted outcomes. In this paper, we describe a novel approach to cybersecurity detection built around the concept of a security score. Our approach demonstrates that extracting signals via deep packet inspection paves the way for efficient detection using traffic analysis. This work has been validated against various traffic datasets containing network attacks, showing that it can effectively detect network threats without the complexity of machine learning-based solutions.
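The security-score idea can be sketched as follows: deep packet inspection emits discrete risk signals, each flow accumulates a weighted score, and flows above a threshold are reported. The signal names and weights below are invented for illustration (nDPI's flow risks are a comparable real-world mechanism).

```python
# Flow security-score sketch: DPI-derived risk signals are accumulated
# per flow; names, weights, and threshold are illustrative assumptions.
RISK_WEIGHTS = {
    "self_signed_certificate": 20,
    "tls_obsolete_version": 15,
    "dns_suspicious_name": 25,     # e.g. a DGA-looking domain
    "binary_over_http": 30,
    "unexpected_port": 10,
}
ALERT_THRESHOLD = 50

class FlowScore:
    def __init__(self):
        self.signals = set()       # signals fire at most once per flow

    def add_signal(self, name):
        self.signals.add(name)

    @property
    def score(self):
        return sum(RISK_WEIGHTS.get(s, 0) for s in self.signals)

flow = FlowScore()
flow.add_signal("self_signed_certificate")
flow.add_signal("unexpected_port")
flow.add_signal("dns_suspicious_name")
print(flow.score, flow.score >= ALERT_THRESHOLD)   # 55 True
```

The score itself is the explanation: the list of fired signals says why a flow was flagged, which is the property the abstract contrasts with opaque ML predictions.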
RecoNet: An Interpretable Neural Architecture for Recommender Systems
F. Fusco, M. Vlachos, V. Vasiliadis, K. Wardatzky, J. Schneider
Neural systems offer high predictive accuracy but are plagued by long training times and low interpretability. We present a simple neural architecture for recommender systems that overcomes several of these shortcomings. Firstly, the approach has high predictive power, comparable to state-of-the-art recommender approaches. Secondly, owing to its simplicity, the trained model can be interpreted easily, because it provides the individual contribution of each input feature to the decision. Our method is three orders of magnitude faster than general-purpose explanatory approaches, such as LIME. Finally, thanks to its design, our architecture addresses cold-start issues, and therefore the model does not require retraining in the presence of new users.
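The interpretability property claimed above can be illustrated with a toy additive scorer (not RecoNet itself): when an item's score is a sum over the user's past interactions, each input's contribution is read directly off the sum, with no post-hoc explainer such as LIME.

```python
# Toy additive recommender: per-input contributions come for free.
import numpy as np

rng = np.random.default_rng(0)
n_items, dim = 100, 16
item_emb = rng.normal(size=(n_items, dim))      # learned in practice

def score_with_contributions(history, candidate):
    # score(candidate) = sum over history items of a pairwise affinity,
    # so each history item's share of the score is explicit.
    contribs = {h: float(item_emb[h] @ item_emb[candidate]) for h in history}
    return sum(contribs.values()), contribs

total, contribs = score_with_contributions(history=[3, 17, 42], candidate=7)
for item, c in sorted(contribs.items(), key=lambda kv: -abs(kv[1])):
    print(f"item {item} contributed {c:+.3f}")   # the explanation itself
```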
On Improving Co-Cluster Quality with Application to Recommender Systems
M. Vlachos, F. Fusco, H. Mavroforakis, A. Kyrillidis, V. Vasiliadis
Businesses store an ever-increasing amount of historical customer sales data. Given the availability of such information, it is advantageous to analyze past sales, both for revealing dominant buying patterns and for providing more targeted recommendations to clients. In this context, co-clustering has proved to be an important data-modeling primitive for revealing latent connections between two sets of entities, such as customers and products. In this work, we introduce a new algorithm for co-clustering that is both scalable and highly resilient to noise. Our method is inspired by k-Means and agglomerative hierarchical clustering approaches: (i) it first searches for elementary co-clustering structures and (ii) then combines them into a better, more compact, solution. The algorithm is flexible, as it does not require an explicit number of co-clusters as input and is directly applicable to large data graphs. We apply our methodology to real sales data to analyze and visualize the connections between clients and products. We showcase a real deployment of the system and how it has been used to drive a recommendation engine. Finally, we demonstrate that the new methodology can discover co-clusters of better quality and relevance than state-of-the-art co-clustering techniques.
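A toy sketch of the two-phase idea (not the paper's exact algorithm): (i) generate many small elementary co-clusters, then (ii) greedily merge pairs as long as the merged block stays dense enough. Random assignments stand in for the k-Means-style first phase, and the density threshold is an illustrative assumption.

```python
# Two-phase co-clustering toy: elementary co-clusters, then greedy merges.
import numpy as np
from itertools import combinations

def elementary_coclusters(A, k=8, seed=0):
    rng = np.random.default_rng(seed)
    rows = rng.integers(0, k, size=A.shape[0])   # stand-in for k-Means
    cols = rng.integers(0, k, size=A.shape[1])
    return [(np.where(rows == i)[0], np.where(cols == j)[0])
            for i in range(k) for j in range(k)]

def density(A, rs, cs):
    return A[np.ix_(rs, cs)].mean() if len(rs) and len(cs) else 0.0

def merge_pass(A, cocs, min_density=0.5):
    for (i, j) in combinations(range(len(cocs)), 2):
        rs = np.union1d(cocs[i][0], cocs[j][0])
        cs = np.union1d(cocs[i][1], cocs[j][1])
        if density(A, rs, cs) >= min_density:    # merge keeps block dense
            rest = [c for idx, c in enumerate(cocs) if idx not in (i, j)]
            return rest + [(rs, cs)], True
    return cocs, False

A = (np.random.default_rng(1).random((60, 40)) > 0.7).astype(float)
cocs, changed = elementary_coclusters(A), True
while changed:                                   # merge until no gain
    cocs, changed = merge_pass(A, cocs)
```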
pcapIndex: An Index for Network Packet Traces with Legacy Compatibility
F. Fusco, X. Dimitropoulos, M. Vlachos, L. Deri
Long-term historical analysis of captured network traffic is a topic of great interest in network monitoring and network security. A critical requirement is support for the fast discovery of packets that satisfy certain criteria within large-scale packet repositories. This work presents the first indexing scheme for network packet traces based on compressed bitmap indexing principles. Our approach supports very fast insertion rates and results in compact index sizes. The proposed indexing methodology builds upon libpcap, the de-facto reference library for accessing packet-trace repositories. Our solution is therefore backward compatible with any application that uses the original library. We observe impressive speedups on packet-trace search operations: our experiments suggest that the index-enabled libpcap may reduce packet retrieval time by more than 1100 times.
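The indexing scheme can be sketched as follows: while scanning a trace once, build one bitmap per attribute value, with bit i set when packet i matches; a query then ANDs bitmaps and seeks directly to the matching packets' file offsets. Python sets of packet ids stand in below for the compressed bitmaps used in practice.

```python
# Bitmap-index sketch over a packet trace: per-value "bitmaps" (as sets)
# plus a packet-id -> file-offset table enable direct seeks into the pcap.
from collections import defaultdict

class TraceIndex:
    def __init__(self):
        self.offsets = []                      # packet id -> byte offset
        self.by_src = defaultdict(set)         # src ip   -> bitmap (as set)
        self.by_port = defaultdict(set)        # dst port -> bitmap (as set)

    def add_packet(self, offset, src_ip, dst_port):
        pid = len(self.offsets)
        self.offsets.append(offset)
        self.by_src[src_ip].add(pid)
        self.by_port[dst_port].add(pid)

    def query(self, src_ip, dst_port):
        hits = self.by_src[src_ip] & self.by_port[dst_port]   # bitmap AND
        return [self.offsets[pid] for pid in sorted(hits)]

idx = TraceIndex()
idx.add_packet(0, "10.0.0.1", 443)
idx.add_packet(96, "10.0.0.2", 80)
idx.add_packet(180, "10.0.0.1", 80)
print(idx.query("10.0.0.1", 80))   # [180] -> seek here in the pcap file
```

Because queries return plain file offsets, a legacy libpcap-based reader only needs to skip to those offsets, which is the backward-compatibility angle of the paper.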
Real-time creation of bitmap indexes on streaming network data
F. Fusco, M. Vlachos, M. Stoecklin
High-speed archival and indexing solutions for streaming traffic are growing in importance for applications such as monitoring, forensic analysis, and auditing. Many large institutions require fast solutions to support expedient analysis of historical network data, particularly in case of security breaches. However, “turning back the clock” is not a trivial task. The first major challenge is that such a technology needs to support data archiving under extremely high-speed insertion rates. Moreover, the archives created have to be stored in a compressed format that is still amenable to indexing and search. The above requirements make general-purpose databases unsuitable for this task, and dedicated solutions are required. This work describes a solution for high-speed archival storage, indexing, and data querying on network flow information. We make the two following important contributions: (a) we propose a novel compressed bitmap index approach that significantly reduces both CPU load and disk consumption, and (b) we introduce an online stream reordering mechanism that further reduces space requirements and improves the time for data retrieval. The reordering methodology is based on the principles of locality-sensitive hashing (LSH) and is also of interest for other bitmap creation techniques. Because of the synergy of these two components, our solution can sustain data insertion rates that reach 500,000 to 1 million records per second. To put these numbers into perspective, typical commercial network flow solutions can currently process 20,000-60,000 flows per second. In addition, our system offers interactive query response times that enable administrators to perform complex analysis tasks.
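The reordering idea can be illustrated with a toy example: buffer a batch of flow records and sort them by a locality-sensitive key so that similar records become adjacent, which lengthens the runs that run-length-style bitmap compression exploits. The key below (address prefixes and port) is a crude stand-in for the paper's LSH scheme.

```python
# Stream-reordering sketch: sorting a buffered batch by a crude
# locality-sensitive key clusters similar flow records together, so the
# per-value bitmaps built afterwards contain long, compressible runs.
def lsh_key(record):
    src, dst, port = record
    # Related flows share address prefixes and ports; sorting by this key
    # places them next to each other in the batch.
    return (src.rsplit(".", 1)[0], port, dst.rsplit(".", 1)[0])

batch = [
    ("10.1.2.3", "192.168.0.9", 443),
    ("172.16.0.4", "10.9.9.9", 22),
    ("10.1.2.7", "192.168.0.10", 443),
    ("172.16.0.5", "10.9.9.8", 22),
]
batch.sort(key=lsh_key)   # records from the same "neighborhood" are now
for r in batch:           # adjacent, at the cost of buffering one batch
    print(r)
```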
Indexing Millions of Packets per Second Using GPUs
F. Fusco, M. Vlachos, X. Dimitropoulos, L. Deri
Network traffic recorders are devices that record massive volumes of network traffic for security applications, like retrospective forensic investigations. When deployed over very high-speed networks, traffic recorders must process and store millions of packets per second. To enable interactive exploration of such large traffic archives, packet indexing mechanisms are required. Indexing packets at wire rates (10 Gbps and above) on commodity hardware imposes unparalleled requirements for high-throughput index creation. Such indexing throughputs are presently untenable with modern indexing technologies and current processor architectures. In this work, we propose to intelligently offload indexing to commodity Graphics Processing Units (GPUs). We introduce algorithms for building compressed bitmap indexes in real time on GPUs and show that we can achieve indexing throughputs of up to 185 million records per second, an improvement of one order of magnitude over the state of the art. This shows that indexing network traffic at multi-10-Gbps rates is well within reach.
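The compressed bitmaps mentioned above are typically word-aligned hybrid (WAH)-style encodings. The sequential sketch below shows the encoding idea on 31-bit groups; the paper's contribution is parallelizing this kind of construction on GPUs, which this Python sketch does not attempt.

```python
# WAH-style bitmap encoding sketch: runs of identical 31-bit groups
# collapse into one "fill" word; everything else becomes a "literal" word.
def wah_encode(bits):
    """bits: list of 0/1 values. Returns a list of 32-bit WAH words."""
    words, i = [], 0
    while i < len(bits):
        group = bits[i:i + 31]
        if len(group) == 31 and len(set(group)) == 1:
            # Fill word: bit 31 marks a fill, bit 30 carries the value,
            # the low bits count how many identical groups the run spans.
            val, run = group[0], 0
            while i + 31 <= len(bits) and set(bits[i:i + 31]) == {val}:
                run += 1
                i += 31
            words.append((1 << 31) | (val << 30) | run)
        else:
            literal = 0
            for b in group:            # pack up to 31 bits as a literal
                literal = (literal << 1) | b
            words.append(literal)
            i += 31
    return words

# A mostly-zero bitmap of 310 bits (one set bit) encodes into 3 words
# instead of 10 literal words.
bitmap = [0] * 310
bitmap[40] = 1
print([hex(w) for w in wah_encode(bitmap)])
```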
Realtime MicroCloud-based Flow Aggregation for Fixed and Mobile Networks
L. Deri, F. Fusco
Monitoring large distributed networks requires the deployment of several probes at different network locations where the traffic to be analyzed is flowing. Each probe analyzes the traffic and sends the monitoring data toward a centralized management station. This semi-centralized architecture based on the push model is extensively adopted to analyze large distributed networks. However, this architecture presents serious limitations when used to provide real-time traffic monitoring and correlation capabilities across all probes. This paper describes a novel architecture that addresses the problem of real-time traffic correlation and alerting by exploiting modern cloud infrastructures. In particular, we propose the adoption of a small-sized cloud to provide a consistent data space that is: i) shared among distinct probes to selectively store monitoring data, and ii) accessible by external applications to retrieve selected information. We validate our architecture on large distributed networks in the context of DNS and 3G/4G mobile traffic monitoring.
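The shared data space can be sketched with an in-memory key-value store; Redis is used below purely as an illustrative stand-in (the host name and key layout are assumptions, not the paper's design). Each probe publishes selected aggregates under namespaced keys, and any external application can read a network-wide view without polling every probe.

```python
# "Microcloud" data-space sketch: probes publish selected aggregates to a
# shared in-memory store; consumers correlate them across probes.
import json
import redis   # assumed available: pip install redis

space = redis.Redis(host="microcloud.local", port=6379)  # assumed host

def probe_publish(probe_id, metric, value, ttl=300):
    # e.g. key = "probe:milan-1:dns.queries_per_sec"; the TTL makes stale
    # probe data expire on its own.
    space.setex(f"probe:{probe_id}:{metric}", ttl, json.dumps(value))

def global_view(metric):
    # Correlate the same metric across all probes in one pass.
    view = {}
    for key in space.scan_iter(match=f"probe:*:{metric}"):
        probe_id = key.decode().split(":")[1]
        view[probe_id] = json.loads(space.get(key))
    return view

probe_publish("milan-1", "dns.queries_per_sec", 1250)
print(global_view("dns.queries_per_sec"))
```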
MicroCloud-based Network Traffic Monitoring
L. Deri, F. Fusco
Monitoring large distributed networks requires the deployment of several probes at different network locations where the traffic to be analyzed is flowing. Each probe analyzes the traffic and sends the monitoring data toward a centralized management station. This semi-centralized architecture based on the push model is extensively adopted to analyze large distributed networks. However, this architecture presents serious limitations when used to provide real-time traffic monitoring and correlation capabilities across all probes. This paper describes a novel architecture that addresses the problem of real-time traffic correlation and alerting by exploiting modern cloud infrastructures. In particular, we propose the adoption of a small-sized cloud to provide a consistent data space that is: i) shared among distinct probes to selectively store monitoring data, and ii) accessible by external applications to retrieve selected information. We validate our architecture on large distributed networks in the context of DNS traffic monitoring.
10 Gbit Line Rate Packet-to-Disk Using n2disk
L. Deri, A. Cardigliano, F. Fusco
Capturing packets to disk at line rate and with high-precision packet timestamping is required whenever evidence of network communications has to be provided. Typical applications of long-term network traffic repositories are network troubleshooting, analysis of security violations, and analysis of high-frequency trading communications. Appliances for 10 Gbit packet capture to disk are often based on dedicated network adapters, and are therefore very expensive, making them usable only in specific domains. This paper covers the design and implementation of n2disk, a packet-capture-to-disk application capable of dumping 10 Gbit traffic to disk using commodity hardware and open-source software. In addition to packet capture, n2disk is able to index the traffic at line rate during capture, enabling users to efficiently search for specific packets in network traffic dump files.
RasterZip: Compressing Streaming Network Monitoring Data with Support for Partial Decompression
F. Fusco, M. Vlachos, X. Dimitropoulos
Network traffic archival solutions are fundamental for a number of emerging applications that require: a) efficient storage of high-speed streams of traffic records, and b) support for interactive exploration of massive datasets. Compression is a fundamental building block for any traffic archival solution. However, present solutions are tied to general-purpose compressors, which do not exploit the patterns of network traffic data and require decompressing large amounts of redundant data for highly selective queries. In this work we introduce RasterZip, a novel domain-specific compressor designed for network traffic monitoring data. RasterZip uses an optimized lossless encoding that exploits patterns of traffic data, such as the fact that IP addresses tend to share a common prefix. RasterZip also introduces a novel decompression scheme that accelerates highly selective queries targeting a small portion of the dataset. Our solution achieves high-speed, on-the-fly compression of more than half a million traffic records per second. We compare RasterZip with the fastest Lempel-Ziv-based compressor and show that our solution improves the state of the art both in terms of compression ratios and query response times, without penalizing any other performance metric.
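The prefix-sharing encoding can be illustrated as follows: consecutive records often share leading IP bytes, so each address is stored as the number of bytes it shares with its predecessor plus only the differing suffix. The format below is illustrative, not RasterZip's actual on-disk layout.

```python
# Prefix-sharing IP encoding sketch: store (shared-prefix length, suffix)
# instead of full 4-byte addresses.
def compress_ips(ips):
    out, prev = [], b"\x00\x00\x00\x00"
    for ip in ips:
        raw = bytes(int(x) for x in ip.split("."))
        shared = 0
        while shared < 4 and raw[shared] == prev[shared]:
            shared += 1
        out.append((shared, raw[shared:]))   # (prefix len, suffix bytes)
        prev = raw
    return out

ips = ["192.168.1.10", "192.168.1.11", "192.168.2.7", "10.0.0.1"]
for shared, suffix in compress_ips(ips):
    print(shared, suffix.hex())
# 0 c0a8010a   (nothing shared with the initial zero address)
# 3 0b         (shares 192.168.1 -> stores one byte)
# 2 0207       (shares 192.168   -> stores two bytes)
# 0 0a000001   (no shared prefix, stored in full)
```

Because each column keeps its own structure, a query touching only one field can decompress just that field, which is the "partial decompression" in the title.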
tsdb: A Compressed Database For Time Series
L. Deri, S. Mainardi, F. Fusco
Large-scale network monitoring systems require efficient storage and consolidation of measurement data. Relational databases and popular tools such as the Round-Robin Database show their limitations when handling a large number of time series, because data access time greatly increases with the cardinality of the data and the number of measurements. The result is that monitoring systems are forced to store very few metrics at low frequency in order to keep data access within acceptable time bounds. This paper describes a novel compressed time series database named tsdb whose goal is to allow large numbers of time series to be stored and consolidated in real time with limited disk space usage. Our validation demonstrates the advantage of tsdb over traditional approaches and shows that tsdb is suitable for handling a large number of time series.
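The storage layout implied above can be sketched with fixed-length time chunks per metric, each compressed before being written to a key-value store. zlib and the chunk size are illustrative assumptions, not tsdb's actual codec or layout.

```python
# Chunked, compressed time-series store sketch: one compressed blob per
# (metric, time chunk); point reads decompress a single chunk.
import struct
import zlib

CHUNK_SLOTS = 300          # e.g. 300 samples at one sample per second

class TinyTSDB:
    def __init__(self):
        self.kv = {}       # (metric, chunk_id) -> compressed bytes

    def write_chunk(self, metric, chunk_id, samples):
        assert len(samples) == CHUNK_SLOTS
        raw = struct.pack(f"<{CHUNK_SLOTS}f", *samples)
        self.kv[(metric, chunk_id)] = zlib.compress(raw)

    def read(self, metric, t):
        chunk_id, slot = divmod(t, CHUNK_SLOTS)
        raw = zlib.decompress(self.kv[(metric, chunk_id)])
        return struct.unpack_from("<f", raw, slot * 4)[0]

db = TinyTSDB()
db.write_chunk("router1.if0.bytes", 0, [float(i % 10) for i in range(300)])
print(db.read("router1.if0.bytes", 42))   # 2.0
```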
vPF_RING: Towards Wire-Speed Network Monitoring Using Virtual Machines
A. Cardigliano, L. Deri, J. Gasparakis, F. Fusco
The demand for highly flexible and easy-to-deploy network monitoring systems has pushed companies toward software-based network monitoring probes implemented with commodity hardware rather than with expensive and highly specialized network devices. Deploying software probes in virtual machines executed on the same physical box is attractive for reducing deployment costs and for simplifying the management of advanced network monitoring architectures built on top of heterogeneous monitoring tools (e.g., intrusion detection systems and performance monitoring systems). Unfortunately, software probes are usually not able to meet performance requirements when deployed in virtualized environments, as virtualization introduces severe performance bottlenecks in packet capture, which is the core activity of passive network monitoring systems. This paper covers the design and implementation of vPF_RING, a novel framework for efficiently capturing packets on virtual machines running on commodity hardware. This solution allows network administrators to exploit the benefits of virtualization, such as reduced costs and centralized administration, while preserving the ability to capture packets at wire speed even when deploying applications in virtual machines. The validation process has demonstrated that this solution can be profitably used for multi-gigabit network monitoring, paving the way to low-cost virtualized monitoring systems.
NET-FLi: On-the-fly Compression, Archiving and Indexing of Streaming Network Traffic
F. Fusco, M. Stoecklin, M. Vlachos
The ever-increasing number of intrusions in public and commercial networks has created the need for high-speed archival solutions that continuously store streaming network data to enable forensic analysis and auditing. However, “turning back the clock” for post-attack analyses is not a trivial task. The first major challenge is that the solution has to sustain data archiving under extremely high-speed insertion rates. Moreover, the archives created need to be stored in a format that is compressed but still amenable to indexing. The above requirements make general-purpose databases unsuitable for this task, and, thus, dedicated solutions are required. In this paper, we describe a prototype solution that satisfies all requirements for high-speed archival storage, indexing and data querying on network flow information. The superior performance of our approach is attributed to the on-the-fly compression and indexing scheme, which is based on compressed bitmap principles. Typical commercial solutions can currently process 20,000-60,000 flows per second. An evaluation of our prototype implementation on current commodity hardware using real-world traffic traces shows its ability to sustain insertion rates ranging from 500,000 to more than 1 million records per second. The system offers interactive query response times that enable administrators to perform complex analysis tasks on the fly. Our technique is directly amenable to parallel execution, allowing its application in domains that are challenged by large volumes of historical measurement data, such as network auditing, traffic behavior analysis and large-scale data visualization in service provider networks.
High-speed Network Traffic Analysis with Commodity Multi-core Systems
F. Fusco and L. Deri
Multi-core systems are the current dominant trend in computer processors. However, kernel network layers often do not fully exploit multi-core architectures. This is due to issues such as legacy code, resource contention on the RX queues of network interfaces, and unnecessary memory copies between OS layers. The result is that packet capture, the core operation in every network monitoring application, may even experience performance penalties when adapted to multi-core architectures. This work analyzes common pitfalls of network monitoring applications on multi-core systems and presents solutions to these issues. We describe the design and implementation of a novel multi-core-aware packet capture kernel module that enables monitoring applications to scale with the number of cores. We show that we can achieve high packet capture performance on modern commodity hardware.
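The scaling principle behind such a capture module can be sketched as follows: each core owns one queue, and packets are sharded by a symmetric flow hash so both directions of a flow reach the same core and per-flow state needs no locks. Threads stand in for per-core kernel queues here; this is an illustration of the principle, not the module's code.

```python
# Per-core flow sharding sketch: symmetric hash -> queue -> private state.
from queue import Queue
from threading import Thread

N_CORES = 4

def flow_hash(src_ip, dst_ip, src_port, dst_port):
    a = min((src_ip, src_port), (dst_ip, dst_port))
    b = max((src_ip, src_port), (dst_ip, dst_port))
    return hash((a, b)) % N_CORES     # symmetric: direction-independent

def worker(flows, q):
    while True:
        pkt = q.get()
        if pkt is None:
            break                     # shutdown marker
        key = pkt[:4]
        flows[key] = flows.get(key, 0) + 1   # private state, no locking

queues = [Queue() for _ in range(N_CORES)]
tables = [{} for _ in range(N_CORES)]
threads = [Thread(target=worker, args=(t, q))
           for t, q in zip(tables, queues)]
for th in threads:
    th.start()

pkt = ("10.0.0.1", "10.0.0.2", 1234, 80, b"payload")
queues[flow_hash(*pkt[:4])].put(pkt)  # the dispatch ("RSS") step
for q in queues:
    q.put(None)
for th in threads:
    th.join()
print(sum(len(t) for t in tables))    # 1 flow, tracked on exactly one core
```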
Towards Monitoring Programmability in the Future Internet: challenges and solutions
F. Fusco, L. Deri, J. Gasparakis
The Internet is becoming a global IT infrastructure serving interactive and real-time services that are ubiquitously accessible by heterogeneous network-enabled devices. In the Internet of Services (IoS) era, monitoring infrastructures must provide network operators with fine-grained, service-specific information, which can be derived by dissecting application-level protocols. To accommodate these new monitoring requirements, network probes must be flexible, easy to extend, and still capable of coping with high-speed network streams. Despite this increased complexity, the software and hardware technologies on top of which network probes are implemented were designed when monitoring requirements were substantially different, and they have remained almost unchanged. As a result, implementing modern probes is challenging and time consuming. In this paper we identify desirable features for reducing the development effort of complex probes, and we present a home-grown, comprehensive software framework that significantly simplifies the creation of service-oriented monitoring applications.
Wire-Speed Hardware-Assisted Traffic Filtering with Mainstream Network Adapters
L. Deri, J. Gasparakis, P. Waskiewicz Jr, F. Fusco
Modern computer architectures are founded on multi-core processors. In order to process network traffic efficiently, it is necessary to dynamically split high-speed packet streams across cores based on the monitoring goal. Most network adapters are multi-core aware but offer limited facilities for assigning packets to processor cores. In this paper we introduce a hybrid traffic analysis framework that leverages flexible packet balancing mechanisms available on recent 10 Gbit commodity network adapters but not yet exploited by operating systems. The main contribution of this paper is an open-source, hardware-assisted software layer for dynamically configuring packet balancing policies in order to fully exploit multi-core systems and enable 10 Gbit wire-speed network traffic analysis.
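The kind of balancing policy such a layer configures can be sketched in software: exact-match rules steer selected traffic to dedicated cores, and everything else falls back to a hash spread over the remaining cores. The rule format below is illustrative; real adapters expose comparable n-tuple "flow director" filters.

```python
# Software model of a hardware packet-balancing policy: rules first,
# hash-based spreading as the fallback. Thresholds and rules are invented
# for illustration.
N_CORES = 8

RULES = [   # (packet predicate, target core)
    (lambda p: p["dst_port"] == 53, 3),    # isolate DNS on one core
    (lambda p: p["proto"] == "ICMP", 0),   # low-volume control traffic
]

def steer(pkt):
    for predicate, core in RULES:
        if predicate(pkt):
            return core
    # Fallback: RSS-style hash over the flow tuple on the leftover cores.
    spread = [c for c in range(N_CORES) if c not in {r[1] for r in RULES}]
    key = (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"], pkt["dst_port"])
    return spread[hash(key) % len(spread)]

print(steer({"src_ip": "10.0.0.1", "dst_ip": "8.8.8.8",
             "src_port": 4444, "dst_port": 53, "proto": "UDP"}))  # 3
```

In the paper's setting this table lives in the adapter, so the split happens before packets ever reach the operating system.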
Enabling High-Speed and Extensible Real-Time Communications Monitoring
F. Fusco, F. Huici, L. Deri, S. Niccolini, T. Ewald
The use of the Internet as a medium for real-time communications has grown significantly over the past few years. However, the best-effort model of this network is not particularly well suited to the demands of users who are familiar with the reliability, quality and security of the Public Switched Telephone Network. If the growth is to continue, monitoring and real-time analysis of communication data will be needed in order to ensure good call quality and, should degradation occur, to take corrective action. Writing this type of monitoring application is difficult and time consuming: VoIP traffic not only tends to use dynamic ports, but its real-time nature, along with the fact that its packets tend to be small, imposes non-trivial performance requirements. In this paper we present RTC-Mon, the Real-Time Communications Monitoring framework, which provides an extensible platform for the quick development of high-speed, real-time monitoring applications. While the focus is on VoIP traffic, the framework is general and is capable of monitoring any type of real-time communications traffic. We present testbed performance results for the various components of RTC-Mon, showing that it can monitor a large number of concurrent flows without losing packets. In addition, we implemented a proof-of-concept application that not only tracks statistics about a large number of calls and their users, but consists of only 800 lines of code, showing that the framework is efficient and that it significantly reduces development time.
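One reason VoIP monitoring is hard, as noted above, is that media ports are negotiated dynamically: the monitor must parse the SDP body of the signaling it sees before it knows which UDP flows carry call media. The parser below is minimal and illustrative, not RTC-Mon's implementation.

```python
# Signaling-aware capture sketch: learn RTP media endpoints from SDP.
import re

active_media = {}   # (ip, port) -> call id

def on_sip_packet(call_id, payload: str):
    # SDP lines: "c=IN IP4 <addr>" gives the media address,
    #            "m=audio <port> RTP/AVP ..." gives the media port.
    conn = re.search(r"^c=IN IP4 (\S+)", payload, re.MULTILINE)
    media = re.search(r"^m=audio (\d+) RTP", payload, re.MULTILINE)
    if conn and media:
        endpoint = (conn.group(1), int(media.group(1)))
        active_media[endpoint] = call_id   # start tracking this RTP flow

sdp = "v=0\r\nc=IN IP4 192.0.2.10\r\nm=audio 49170 RTP/AVP 0\r\n"
on_sip_packet("call-42", sdp)
print(active_media)   # {('192.0.2.10', 49170): 'call-42'}
```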