Webshell Detection With Deep Learning Methods

Le Viet Ha

PhD dissertation: Webshell detection with deep learning methods

The dissertation proposes deep learning methods for detecting webshells through source code analysis and HTTP traffic analysis. It integrates signature-based techniques with deep learning algorithms, improving detection of both known and unknown webshells.

University

Vietnam National University - University of Engineering and Technology

Major

Information Systems

Author

Le Viet Ha

Type

PhD dissertation

Year of publication

2024

Pages

139


I. Overview Of Webshell Detection With Deep Learning

Webshells have become a serious threat to web application security, and traditional methods are no longer effective against new variants. The dissertation applies deep learning to the problem along two research directions: source code scanning and HTTP traffic analysis. The first direction uses neural networks to recognize webshells in code. The second applies a CNN to detect webshells in network traffic. Both methods integrate RNN/LSTM components to improve accuracy. The results include 1 national patent, 2 SCI-E articles, and 1 E-SCI article. The research was also applied in project KC01.19/16-20 of the Ministry of Science and Technology. The ASAF framework was developed for both interpreted and compiled languages. The model was tested on the international benchmark dataset CSE-CIC-IDS2018.

1.1. Context And Challenges Of Modern Web Security

Webshell attacks are increasing rapidly as services move online. Attackers use sophisticated evasion techniques to slip past security systems. Traditional signature-based methods cannot detect new webshells. Static analysis struggles with obfuscated code, dynamic analysis consumes substantial system resources, and manual feature extraction demands deep expertise. The gap between offensive and defensive capability keeps widening. Deep learning can learn new patterns automatically, which reduces reliance on manual signature updates.

1.2. The Role Of Deep Learning In Cybersecurity

Deep learning has transformed webshell detection. Neural networks can learn complex features from data. CNNs detect webshells by analyzing spatial patterns, while RNN/LSTM models handle sequential data in HTTP traffic well. Features are extracted automatically from raw data. The models can detect zero-day webshells that have never been seen before. Accuracy is significantly higher than with traditional methods, detection time falls from hours to milliseconds, and the approach scales to large enterprise systems.

1.3. Research Scope And Methods

The dissertation focuses on two research directions. The first analyzes web application source code using static analysis. The second monitors HTTP traffic in real time using dynamic analysis. PHP is chosen as the interpreted language and ASP.NET as the compiled one. The ASAF framework is designed to adapt to multiple programming languages. An improved loss function addresses data imbalance. Experiments on the standard CSE-CIC-IDS2018 dataset provide an objective evaluation, and integration with the NetIDPS system supports practical deployment. Evaluation metrics include precision, recall, F1-score, and accuracy.

II. The ASAF Framework For Webshell Detection Through Code

ASAF (Advanced DL-Powered Source-Code Scanning Framework) combines signature-based detection with deep learning algorithms, so that static analysis is reinforced by a neural network. The system detects both known and unknown webshells. Features are extracted automatically from the abstract syntax tree (AST), and a CNN recognizes patterns in code structure. The modular architecture can be customized for each programming language. The Yara-based detector handles traditional webshell signatures, while the deep learning model learns new features from unknown samples. Combining the two gives a defense-in-depth strategy. Experiments with PHP and ASP.NET report accuracy above 98% with a low false positive rate.

2.1. The Multi-Tier ASAF Architecture

ASAF uses a flexible, extensible layered architecture. The preprocessing tier normalizes source code from different formats. A parser converts the code into a standard abstract syntax tree, and a feature extractor derives syntactic and semantic features. The detection tier combines rule-based and model-based approaches: a signature engine applies Yara rules to known webshell patterns, and a deep learning engine applies CNN and RNN models to unknown ones. A post-processing layer analyzes the results and reduces false positives. An output module produces detailed reports with severity scoring. An API gateway allows integration with CI/CD pipelines. The framework supports both batch scanning and real-time monitoring.
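The two-engine detection tier can be illustrated with a minimal sketch. It is not the ASAF implementation: the regexes are hypothetical stand-ins for Yara rules, `toy_model` is a placeholder for a trained classifier, and running the signature stage before the model stage is an assumption made for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical stand-ins for Yara rules: plain regexes over the source text.
SIGNATURE_RULES = {
    "eval_base64": re.compile(r"eval\s*\(\s*base64_decode\s*\(", re.IGNORECASE),
    "system_request": re.compile(
        r"(system|passthru|shell_exec)\s*\(\s*\$_(GET|POST|REQUEST)", re.IGNORECASE
    ),
}


@dataclass
class Verdict:
    is_webshell: bool
    source: str  # "signature", "model", or "none"
    detail: str


def scan(code: str, model_score: Callable[[str], float], threshold: float = 0.5) -> Verdict:
    """Run the signature stage first, then fall back to the learned model."""
    for name, pattern in SIGNATURE_RULES.items():
        if pattern.search(code):
            return Verdict(True, "signature", name)
    score = model_score(code)
    if score >= threshold:
        return Verdict(True, "model", f"score={score:.2f}")
    return Verdict(False, "none", f"score={score:.2f}")


def toy_model(code: str) -> float:
    """Placeholder for a trained classifier (assumption, not a real model)."""
    return 0.9 if "assert($_" in code else 0.1
```

A file that matches a known rule is flagged without consulting the model, while a file that no rule matches is still scored, which is one way the combination can cover both known and unknown samples.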

2.2. Detecting PHP Webshells With Deep Learning

PHP is among the most widely used languages for web development, which also makes it a main target of webshell attacks. The ASAF-PHP module analyzes PHP-specific syntax. A CNN processes the token sequence from PHP source code, and an LSTM layer learns context dependencies in the code flow. Extracted features include dangerous functions, eval usage, and obfuscation patterns. The training dataset contains more than 15,000 PHP webshell samples from multiple sources. The model reaches 98.7% accuracy, 97.9% precision, and 98.5% recall, with a false positive rate of only 1.3% on legitimate PHP applications. Average scan time is 50 ms for a 10 KB file. The model also handles polymorphic and metamorphic webshells well.
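As an illustration of turning PHP source into a token sequence and counting dangerous function calls, here is a minimal sketch. The regex tokenizer and the `DANGEROUS_FUNCTIONS` list are assumptions chosen for the example; a real pipeline would more likely use a proper PHP lexer or opcode dump.

```python
import re
from collections import Counter
from typing import Dict, List

# Illustrative list only; not taken from the dissertation.
DANGEROUS_FUNCTIONS = ("eval", "assert", "system", "exec", "shell_exec",
                       "passthru", "base64_decode")

# Identifiers, PHP variables, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\$[A-Za-z_][A-Za-z0-9_]*|\S")


def tokenize(php_source: str) -> List[str]:
    """Split PHP source into a flat token sequence (whitespace is dropped)."""
    return TOKEN_RE.findall(php_source)


def dangerous_call_counts(php_source: str) -> Dict[str, int]:
    """Count tokens from DANGEROUS_FUNCTIONS that are followed by '('."""
    tokens = tokenize(php_source)
    counts: Counter = Counter()
    for current, following in zip(tokens, tokens[1:]):
        if current.lower() in DANGEROUS_FUNCTIONS and following == "(":
            counts[current.lower()] += 1
    return dict(counts)
```

The token list is the kind of sequence a CNN or LSTM could consume after vocabulary encoding, and the counts are an example of a hand-crafted feature that can sit alongside learned ones.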

2.3. Detecting ASP.NET Webshells In Compiled Code

ASP.NET webshells often exist as compiled assemblies. The ASAF-ASPNET module decompiles them and analyzes the IL code, applying static analysis to the intermediate language bytecode. Features are extracted from API calls, reflection usage, and dynamic loading. The deep learning model learns from control flow graphs and data flow analysis. The training dataset contains more than 8,000 ASP.NET webshell samples. The model reaches 97.2% accuracy with an F1-score of 96.8%. It can detect webshells embedded in legitimate DLLs. Integration with the build process allows scanning before deployment, with minimal performance overhead and no impact on the development workflow.
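One common way to turn an opcode sequence into features is to count n-grams. The sketch below assumes the IL opcodes have already been extracted by a disassembler; the sample sequence is invented for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(opcodes: Sequence[str], n: int = 2) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def ngram_counts(opcodes: Sequence[str], n: int = 2) -> Counter:
    """Count how often each n-gram occurs; usable as a sparse feature vector."""
    return Counter(ngrams(opcodes, n))


# Invented IL opcode sequence, for illustration only.
sample = ["ldarg.0", "ldstr", "call", "ldstr", "call", "ret"]
bigram_features = ngram_counts(sample, n=2)
```

Here the bigram `("ldstr", "call")` occurs twice, so it receives a count of 2 in `bigram_features`.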

III. Webshell Detection Through HTTP Traffic Analysis

HTTP traffic analysis gives a runtime view of webshell activity. Dynamic analysis captures actual execution behavior. RNN/LSTM models process sequences of HTTP requests effectively and learn abnormal patterns in web application behavior. Features are extracted from headers, parameters, and payload content. A CNN detects webshells through spatial patterns in network traffic. The CSE-CIC-IDS2018 dataset provides labeled traffic for training. A custom loss function addresses class imbalance. The model automatically classifies requests as benign or malicious. Real-time detection runs with latency under 10 ms per request. Integration with the NetIDPS system enables automatic blocking of attack sources. Blacklist management and URI filtering protect the web server proactively.

3.1. Deep Neural Network For Traffic Analysis

The neural network architecture combines CNN and LSTM layers. The CNN layers extract spatial features from the structure of an HTTP request. Max pooling reduces dimensionality and increases translation invariance. The LSTM layers learn temporal dependencies between consecutive requests. An attention mechanism focuses on the important parts of the traffic. A bidirectional LSTM captures context from both directions. Dense layers with dropout regularization guard against overfitting. The output layer uses softmax for multi-class classification. The model is trained with the Adam optimizer and a custom loss function. Batch normalization speeds up convergence and improves stability. The architecture was tuned through extensive hyperparameter search.
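To make the convolution and max pooling steps concrete, here is a dependency-free sketch of what a single 1D filter followed by ReLU and pooling computes. The input sequence and kernel are toy values; a real model would learn many kernels inside a deep learning framework.

```python
from typing import List, Sequence


def conv1d(sequence: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide the kernel over the sequence (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(k))
        for i in range(len(sequence) - k + 1)
    ]


def relu(values: Sequence[float]) -> List[float]:
    """Zero out negative activations."""
    return [max(0.0, v) for v in values]


def max_pool1d(values: Sequence[float], size: int) -> List[float]:
    """Keep the maximum of each non-overlapping window of length `size`."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Toy input: 1 marks a "suspicious" token; the kernel responds to adjacent 1s.
tokens = [0, 1, 0, 0, 1, 1, 0, 1]
activations = conv1d(tokens, [1, 1])
pooled = max_pool1d(relu(activations), size=2)
```

The seven activations shrink to three pooled values, and the strongest response (the adjacent pair of 1s) survives pooling regardless of its exact position inside the window.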

3.2. Handling Data Imbalance In Training

Webshell traffic makes up a very small share of all HTTP requests. This class imbalance biases the model toward the majority class. The custom loss function increases the weight of the minority class. Focal loss concentrates on hard-to-classify examples. SMOTE generates synthetic samples for the webshell class. Under-sampling the majority class balances the training data. Cost-sensitive learning assigns a higher penalty to false negatives. Ensemble methods combine multiple models trained differently. Evaluation metrics include precision, recall, F1-score, and AUC-ROC. Cross-validation checks that the model generalizes well. Results show a significant improvement over a standard loss.
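The text does not give the dissertation's exact loss, so the following sketch only shows the two standard ideas it names: class-weighted binary cross-entropy and focal loss, for a single prediction. The parameter values (`w_pos`, `gamma`, `alpha`) are illustrative assumptions.

```python
import math

EPS = 1e-7  # keeps log() away from zero


def weighted_bce(p: float, y: int, w_pos: float, w_neg: float = 1.0) -> float:
    """Binary cross-entropy with separate weights for the two classes."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1.0 - p))


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Focal loss: down-weights examples the model already classifies well."""
    p = min(max(p, EPS), 1.0 - EPS)
    p_t = p if y == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `w_pos` greater than 1, a missed webshell (positive class) costs proportionally more than a misclassified benign request. With `gamma` greater than 0, a confidently correct prediction contributes far less to the focal loss than an uncertain one.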

3.3. NetIDPS Integration And Automatic Blocking

The NetIDPS system provides the platform for real-time detection. The deep learning model is deployed as a detection engine module. Traffic mirroring avoids any impact on production performance. A detection immediately triggers automatic response actions. Malicious IP addresses are added to the blacklist automatically. URI patterns of webshells are blocked at the web server level. Firewall rules are updated dynamically based on detection results. An alert system notifies the SOC team of suspicious activity. Logging and forensic data are stored for incident investigation. A dashboard displays security metrics in real time. Integration testing verifies reliability and false positive management.
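The automatic response step can be sketched as a small in-memory block list. The `BlockList` class and its methods are hypothetical and do not reflect the NetIDPS API; a real deployment would push these entries to a firewall or web server configuration.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class BlockList:
    """Hypothetical in-memory store of blocked source IPs and URI paths."""
    blocked_ips: Set[str] = field(default_factory=set)
    blocked_uris: Set[str] = field(default_factory=set)

    @staticmethod
    def _path(uri: str) -> str:
        # Block on the path only, so changing query parameters does not evade it.
        return uri.split("?", 1)[0]

    def record_detection(self, source_ip: str, uri: str) -> None:
        """Called when the detector flags a request as webshell traffic."""
        self.blocked_ips.add(source_ip)
        self.blocked_uris.add(self._path(uri))

    def is_allowed(self, source_ip: str, uri: str) -> bool:
        return (source_ip not in self.blocked_ips
                and self._path(uri) not in self.blocked_uris)


blocklist = BlockList()
blocklist.record_detection("203.0.113.7", "/uploads/s.php?cmd=id")
```

After this single detection, the attacking IP is refused everywhere, and the webshell path is refused for every client, including ones that have not been seen before.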

IV. Feature Extraction Techniques For Malware Detection

Feature extraction is a key step in machine learning for cybersecurity. Static analysis extracts features from source code structure, while dynamic analysis collects features from runtime behavior. Combining the two approaches yields a comprehensive feature set. From source code: AST nodes, function calls, string literals, and control flow. From HTTP traffic: request frequency, parameter patterns, and payload entropy. Neural networks learn high-level features automatically, but manual feature engineering is still needed to encode domain knowledge. Dimensionality reduction techniques reduce the complexity of the feature space. PCA and t-SNE visualizations help in understanding feature distributions. Feature importance analysis identifies the predictive power of each feature. Continuous feature engineering improves model performance over time.

4.1. Static Features From Source Code Analysis

The abstract syntax tree provides a structural representation of the code, and the function call graph shows control flow and dependencies. Other static features include usage of dangerous APIs such as eval, exec, and system commands; string analysis to detect obfuscated and encoded payloads; variable naming patterns, which often differ in malicious code; code complexity metrics such as cyclomatic complexity and nesting depth; import statements and library dependencies; regular expression patterns for command injection detection; the entropy of string literals, which can reveal encryption; n-gram features from token sequences in source code; and opcode sequences from compiled bytecode for compiled languages.
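The entropy feature mentioned above is Shannon entropy over the characters of a string literal. A minimal sketch:

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of a string in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )
```

A string made of one repeated character has entropy 0, while four distinct characters used equally often give exactly 2 bits per character. Encoded or encrypted payloads tend to score higher than ordinary identifiers and messages, which is why the value is useful as a feature; any alerting threshold would have to be tuned on real data.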

4.2. Dynamic Features From HTTP Traffic Patterns

Dynamic features describe how webshell communication looks on the wire. They include request frequency and timing patterns; the distribution of HTTP methods, which differs between normal and malicious traffic; header anomalies such as unusual user agents and custom headers; parameter names and values, which often contain command indicators; payload size distribution and the entropy of POST data; session characteristics such as duration and request count per session; response time patterns that may indicate command execution; referrer and origin headers for cross-site request analysis; cookie patterns and authentication token behavior; geographic and IP reputation features of request sources; and protocol compliance checks that detect RFC violations.
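A few of these per-request features can be computed with the standard library alone. In this sketch the feature names and the `SUSPICIOUS_PARAM_NAMES` set are assumptions made for illustration, not the dissertation's feature set.

```python
from typing import Dict
from urllib.parse import parse_qsl, urlsplit

# Illustrative parameter names only.
SUSPICIOUS_PARAM_NAMES = {"cmd", "exec", "shell", "pass"}


def request_features(method: str, url: str, body: bytes, user_agent: str) -> Dict[str, int]:
    """Extract a small numeric feature dictionary from one HTTP request."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    return {
        "is_post": int(method.upper() == "POST"),
        "path_depth": parts.path.count("/"),
        "param_count": len(params),
        "suspicious_params": sum(
            1 for name, _ in params if name.lower() in SUSPICIOUS_PARAM_NAMES
        ),
        "body_length": len(body),
        "empty_user_agent": int(not user_agent.strip()),
    }


features = request_features(
    "POST", "http://example.com/uploads/img/s.php?cmd=id&x=1", b"a=1", ""
)
```

Session-level features such as request counts or timing gaps would be aggregated over many such dictionaries grouped by client.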

4.3. Automated Feature Learning With Deep Networks

Deep learning learns representations directly from raw data. Convolutional layers extract local patterns and hierarchies. Pooling operations produce translation-invariant features. Recurrent layers capture sequential dependencies. Attention mechanisms identify salient features dynamically. Embedding layers learn dense representations of categorical data. Autoencoders detect anomalies through reconstruction error. Transfer learning leverages pre-trained models for new tasks. Feature visualization techniques help explain neural network decisions. Ablation studies assess the contribution of each layer. End-to-end learning reduces the dependency on manual engineering.

V. Experiments And Evaluation Of Detection Effectiveness

The evaluation uses the standard CSE-CIC-IDS2018 dataset, which contains labeled network traffic with multiple attack types. Webshell samples were collected from public repositories and honeypots. The data is split 70/15/15 into training, validation, and test sets. Metrics include accuracy, precision, recall, F1-score, and AUC-ROC. A confusion matrix gives a detailed breakdown of true and false positives and negatives. 5-fold cross-validation supports the statistical reliability of the results. The approach is compared with baseline methods: signature-based detection and traditional ML. Performance testing measures throughput and latency in production. Ablation studies assess the contribution of each component. Results show a marked improvement over existing approaches, and real-world deployment in enterprise environments was validated successfully.
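A 70/15/15 split can be sketched as follows. The shuffling seed and the use of integer percentages are choices made for this example; the text does not say how the original split was drawn or whether it was stratified by class.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    samples: Iterable[T], train_pct: int = 70, val_pct: int = 15, seed: int = 42
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle deterministically, then cut into train/validation/test parts."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    return (
        items[:n_train],
        items[n_train:n_train + n_val],
        items[n_train + n_val:],  # the remainder becomes the test set
    )


train_set, val_set, test_set = split_dataset(range(100))
```

With a heavily imbalanced dataset, a stratified split that preserves the webshell ratio in each part would usually be preferable to the plain shuffle shown here.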

5.1. Dataset And Experimental Setup

The CSE-CIC-IDS2018 dataset contains 16 million network flows. The labeled data covers normal traffic and 14 attack categories. Webshell-specific samples are augmented from external sources, for a total of more than 25,000 webshell samples for training and testing. Hardware setup: NVIDIA Tesla V100 GPU, 64 GB RAM. Software stack: TensorFlow 2.x, Keras, Python 3.8. Training configuration: batch size 128, learning rate 0.001. Early stopping with a patience of 10 epochs guards against overfitting. Data augmentation is applied to source code samples. Numerical features are normalized and standardized. Class weights are balanced in the loss function calculation.
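Early stopping with a patience value works as sketched below. This is a framework-independent illustration of the rule, not the training code used in the dissertation; Keras provides an equivalent callback.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10) -> None:
        self.patience = patience
        self.best_loss = float("inf")
        self.epochs_without_improvement = 0

    def should_stop(self, val_loss: float) -> bool:
        """Record one epoch's validation loss and report whether to stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Shortened patience so the behaviour is visible on a handful of epochs.
stopper = EarlyStopping(patience=3)
decisions = [stopper.should_stop(loss) for loss in [1.0, 0.8, 0.9, 0.85, 0.81]]
```

In this run the loss improves for two epochs and then fails to beat 0.8 three times in a row, so the last call returns `True`.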

5.2. Comparison With Baseline Methods

The ASAF framework reaches 98.7% accuracy on PHP webshell detection, ahead of signature-based methods (85.3%) and traditional ML (92.1%). The HTTP traffic analysis model reaches an F1-score of 96.8%. Precision of 97.2% keeps the false positive rate low, and recall of 96.4% means most webshell attacks are caught. An AUC-ROC score of 0.989 indicates excellent discrimination ability. The results are compared with published research on the same dataset, where ASAF outperforms existing deep learning approaches by 3-5%. Inference time is 50 ms for source code and 10 ms for traffic. The memory footprint is reasonable for production deployment. Scalability testing with concurrent requests gave positive results.
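The relationship between these metrics is easy to check. The sketch below defines them from confusion matrix counts; the counts in the example are invented, while the precision and recall passed to `f1_score` are the figures reported for the traffic model.

```python
def precision(tp: int, fp: int) -> float:
    """Share of flagged items that really are webshells."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of real webshells that were flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all items classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Reported precision 97.2% and recall 96.4% for the traffic model.
reported_f1 = f1_score(0.972, 0.964)
```

`reported_f1` evaluates to about 0.968, so the stated F1-score of 96.8% is consistent with the stated precision and recall.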

5.3. Real-World Deployment And Practical Impact

Integration with the national research project KC01.19/16-20 was successful. The system was deployed in multiple enterprise web applications, where it detected zero-day webshells not present in signature databases. Incident response time was reduced by 85%. A false positive rate under 2% is acceptable for security teams. Automatic blocking prevented more than 1,200 attack attempts. Cost savings come from reduced manual analysis and faster remediation. User feedback on usability and effectiveness was positive. Continuous learning from new samples improves the model over time. Patent applications were filed for the novel detection techniques. Publications in SCI-E and E-SCI journals validate the contributions.

VI. Future Directions For Deep Learning Security

Machine learning for cybersecurity is evolving alongside adversarial AI. Attackers use deep learning to evade detection systems, so adversarial training is needed for robustness against evasion attempts. Explainable AI (XAI) helps security analysts understand model decisions. Federated learning allows collaborative training without data sharing. AutoML automates the optimization of architectures and hyperparameters. Transfer learning reduces the dependency on large labeled datasets. Online learning adapts the model to new attack patterns in real time. Quantum computing could reshape both attack and defense. Integration with threat intelligence platforms enables context-aware detection. Multi-modal learning combines code, traffic, and system logs. These directions are promising for next-generation security systems.

6.1. Adversarial Machine Learning And Robustness

Adversarial examples can fool deep learning models: attackers craft inputs that bypass detection with minimal changes. Adversarial training includes perturbed samples in the training data. Defensive distillation reduces model sensitivity to perturbations. Gradient masking techniques hide gradients from attackers. Ensemble methods increase robustness through model diversity. Input transformation and randomization defend against attacks. Certified defenses provide provable robustness guarantees. Adversarial examples can also be detected before classification. Research continues on the arms race between attacks and defenses. Robustness evaluation is critical for production deployment.

6.2. Explainable AI For Security Operations

Black-box models are difficult for security analysts to work with. Explainability builds trust and facilitates human-in-the-loop operation. LIME and SHAP explain individual predictions. Attention visualization shows which features the model focuses on. Saliency maps highlight important regions in the input data. Rule extraction from neural networks yields interpretable policies. Counterfactual explanations show what changes would alter a prediction. Model debugging identifies failure modes and biases. Compliance requirements demand explainability in certain industries. XAI enables faster incident investigation and response. Balancing accuracy and interpretability remains a challenge.

6.3. Threat Intelligence Integration And Automation

Threat intelligence feeds provide context for detections. Integration with the MITRE ATT&CK framework supports tactic mapping. Automated correlation links multiple detection signals. SOAR platforms orchestrate response actions automatically. Threat hunting is powered by machine learning anomaly detection. Predictive analytics forecast future attack trends. Risk scoring combines multiple factors for prioritization. Automated reporting generates actionable intelligence for stakeholders. Integration with SIEM systems provides centralized monitoring. Feedback loops from analysts drive continuous improvement. The long-term vision is a fully autonomous security operations center.

24/03/2026


Excerpt from the dissertation


VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

Le Viet Ha

ENHANCING WEBSHELL DETECTION WITH DEEP LEARNING-POWERED METHODS

PHD DISSERTATION IN INFORMATION SYSTEMS

Major: Information Systems
Code: 9480104.01
PhD student: Le Viet Ha
Supervisors: Nguyen Ngoc Hoa, Phung Van On
Confirmation of the training university

Ha Noi - 2024

DECLARATION OF AUTHORSHIP

I, Le Viet Ha, declare that this dissertation titled "ENHANCING WEBSHELL DETECTION WITH DEEP LEARNING-POWERED METHODS" and the work presented in it are my own. I confirm that:

- This work was done mainly while in candidature for the degree of Ph.D at VNU University of Engineering and Technology.
- This dissertation has not previously been submitted for any degree.
- The results in my dissertation are my independent work, except where works in the collaboration have been included.

Other appropriate acknowledgments are given within this dissertation by explicit references.

Signed:
Date:

ACKNOWLEDGEMENTS

This dissertation would not have been possible without the support, guidance, and encouragement of many individuals. First and foremost, I would like to express my deepest gratitude to my supervisors, Associate Professor Nguyen Ngoc Hoa and Doctor Phung Van On, whose expertise, patience, and unwavering support have been instrumental in the completion of this research. Your insightful feedback and continuous motivation have pushed me to refine my work and think critically, for which I am profoundly grateful.

I am deeply appreciative of the support from my colleagues and friends, whose encouragement and camaraderie have provided me with the energy and resilience to persevere through the challenges of this journey. Lastly, but most importantly, I owe a great debt of gratitude to my family, whose love and understanding have been my constant source of strength. This accomplishment would not have been possible without you. Thank you all for your contributions to this work and to my life.

ABSTRACT

The increasing prevalence of webshell attacks poses a significant threat to web application security, necessitating the development of robust detection mechanisms. The dissertation clearly identifies two research directions: scanning web application source code and in-depth analysis of HTTP traffic to detect webshells. First, the dissertation proposes an advanced DL-Powered Source-Code Scanning Framework, called ASAF, that integrates signature-based techniques with deep learning algorithms to enhance the detection of both known and unknown webshells. We design the framework to facilitate the creation of customized detection models for various programming languages.

For the interpreted language, the study chose PHP; for the compiled language, the dissertation chose ASP.NET to build a complete ASAF-based model for experimentation and comparison with other research results to prove its effectiveness. Second, the dissertation introduces a deep neural network that utilizes real-time HTTP traffic analysis of web applications to detect webshells. The study proposes an algorithm to improve the loss function applied in the deep learning model to solve the problem of data imbalance. To demonstrate its effectiveness, we experimented with and compared the model to other studies on the same CSE-CIC-IDS2018 dataset.

We have also integrated the model with the NetIDPS system to improve its capacity to identify new webshells. From there, the system proactively prevents these attacks by automatically adding attack source IPs to the blacklist and creating rules to block URIs querying webshells on the web server. This research contribution has been demonstrated through 01 national patent, 2 SCI-E journals, 1 E-SCI journal, 1 national journal, 2 WoS conference papers and 1 pending patent, as well as being practically applied in the national research project, code number KC01.19/16-20, granted by the Ministry of Science and Technology of Vietnam.

TABLE OF CONTENTS

DECLARATION OF AUTHORSHIP
ACKNOWLEDGEMENTS
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS
INTRODUCTION
  Research Motivations
  Research Challenges
  Objectives of Dissertation
  Research Scope
  Methodologies
  Research Contributions
1 THEORETICAL BACKGROUND AND PRELIMINARIES
  Fundamental Concepts
  Webshell Evasion
  Webshell Detection Approaches
  Webshell Dataset Collection
  Non-AI Approaches
  AI-Powered Source Code Analysis Approaches
  AI-Powered Network Analysis Approaches
  Dissertation Research Direction
  Summary of Chapter 1
2 DL-POWERED WEBSHELL DETECTION BY SOURCE CODE ANALYSIS
  Proposed DL-Powered Source Code Analysis Framework
  PHP Webshell Detection
  Yara-Based Analysis
  Dataset Collecting and Cleaning
  Hyperparameter Tuning CNN Model
  Experimental Results and Evaluation
  Results and Evaluation
  ASP.NET Webshell Detection
  Yara-Based Analysis
  CNN Model Hyperparameter Tuning
  Dataset Collecting and Cleaning
  Experimental Results and Evaluations
  Results and Evaluation
  Summary of Chapter 2
3 DL-POWERED PROACTIVE WEBSHELL DETECTION AND PREVENTION BY HTTP TRAFFIC ANALYSIS
  Proactive Webshell Detection and Prevention
  Deep Learning Intrusion Detection Model
  Webshell Detection and Prevention
  Handling Imbalanced Datasets
  Experiments and Evaluation
  Results and Evaluation
  Comparisons and Discussions
  Summary of Chapter 3
CONCLUSION AND FUTURE WORKS
  Contribution Highlights
  Dissertation Limitations
  Future Works
BIBLIOGRAPHY

LIST OF FIGURES

The conversion process from programming languages to machine code
Example of Apache web server architecture
Interpreter process
China Chopper webshell attack stages
Four stages of webshell attack
Webshell classification based on communication
Behinder webshell sample
Decoding and decrypting the obfuscated string
Contents of the deobfuscated function
Decoded system command
Classification of webshell features
Correlational links between ASAF components
Opcode vectorization module
Dataset collecting and cleaning
CNN model architecture
Proactive webshell detection method based on signatures and DNN
DNN architecture for webshell detection
Architecture of testbed system

LIST OF TABLES

Top 15 opcodes used exclusively by malware
Some widely used webshell datasets
Summary of related works
Non-duplicate benign and webshell datasets
PHP-ASAF hyperparameters tuning value
Confusion matrix of PHP webshell detection by using Yara
Key metrics of PHP webshell detection by using Yara (%)
Confusion matrix of PHP webshell detection by using Yara
Key metrics of PHP webshell detection by using CNN (%)
Confusion matrix of PHP webshell detection by using PHP-ASAF
Key metrics of PHP webshell detection by using CNN (%)
Comparison of different webshell detection approaches on our dataset
ASP.NET-ASAF hyperparameters tuning value
ASP.NET webshell and benign datasets
Confusion matrix of ASP.NET webshell detection by using Yara
Key metrics of ASP.NET webshell detection by using Yara (%)
Confusion matrix of ASP.NET webshell detection by using CNN
Key metrics of ASP.NET webshell detection by using CNN (%)
Confusion matrix of webshell detection using ASP.NET
Key metrics of webshell detection by using ASP.NET
Total flows in cleaned datasets
Number of training and testing samples
Hyperparameter optimization value
Result of hyperparameter optimization with 5-fold cross validation for DS1
DLWSD 5-fold cross-validation with DS1
DLWSD 5-fold cross-validation with DS2
Weighted-DLWSD 5-fold cross-validation with DS1
Weighted-DLWSD 5-fold cross-validation with DS2
Experiment results with DS3 enhanced by balancing classes
Comparison of DLWSD with other methods with DS2

ABBREVIATIONS

APT: Advanced Persistent Threat
ANN: Artificial Neural Network
AES: Advanced Encryption Standard
CNN: Convolutional Neural Network
DNN: Deep Neural Network
DT: Decision Tree
DL: Deep Learning
HTTP: HyperText Transfer Protocol
IDS: Intrusion Detection System
IPS: Intrusion Prevention System
GBDT: Gradient Boosted Decision Trees
LSTM: Long Short-Term Memory
ML: Machine Learning
MLP: Multilayer Perceptron
NB: Naive Bayes
OpCode: Operation Code
RNN: Recurrent Neural Network
RSA: Rivest-Shamir-Adleman
SVM: Support Vector Machine
SSL: Secure Sockets Layer
TLS: Transport Layer Security
TF-IDF: Term Frequency - Inverse Document Frequency
RF: Random Forest
WAF: Web Application Firewall

INTRODUCTION

Research Motivations

Webshell Attack

Nowadays, digital transformation is considered an important and inevitable trend for many countries around the world. In Vietnam, digital transformation has become a topic of interest in recent years and is most clearly demonstrated through the National Digital Transformation Program that has been issued. The advancement of web development [22, 11] technology has made web applications more and more popular, gradually replacing traditional native applications because they do not depend on the operating system.

Most applications serving e-government and digital transformation in Vietnam today are built on web platforms, typically the National Public Service Portal system (https://dichvucong.vn/p/home/dvc-trang-chu.html). Along with this, the issues of information security for the web system have become increasingly important. Malicious code injection (webshell) attacks [33, 95, 68] are the most common and also the most hazardous sort of web application attack [28]. According to the recent Microsoft 365 Defender data ("Web shell attacks continue to rise", https://www.com/en-us/security/blog/2021/02/11/web-shell-attacks-continue-to-rise), the use of webshell attacks not only continued but also accelerated every day.

Webshell attacks [103] pose a severe threat to organisations due to the extensive damage and vulnerabilities they introduce after compromising web-facing servers. As pieces of malicious code written in common web development programming languages (e.g., ASP, PHP, and JSP) that are installed on web servers, webshells allow attackers to remotely execute arbitrary system commands, exfiltrate sensitive files, install additional payloads, and pivot laterally into internal networks. Attackers can also use webshells to maintain stealthy persistence in order to prolong exploitation after the initial breach. Many advanced webshells feature extensive capabilities via graphical user interfaces, including brute-forcing credentials, uploading malware, and interacting with databases.

Once a webshell is uploaded, attackers have an unrestricted foothold within the victim's infrastructure. Webshells are especially dangerous due to their ability to bypass conventional network perimeter defences by using allowed protocols like HTTP or HTTPS [96]. Their flexible and compact nature also allows webshells to evade detection through obfuscation and polymorphism [3, 65]. Overall, webshells represent a serious threat due to their role as a pivot point, enabling an unimpeded gateway for attackers.

Advances in detection techniques have struggled to keep pace as attackers continually release new, heavily obfuscated webshell tools to evade defenses. Manual inspection is time-consuming, given that a single webshell update could require hours of expert reverse engineering. Detecting obfuscated webshells poses significant challenges for security research. Attackers are continuously adapting exploitation techniques to evade detection, deploying webshells encoded by means such as base64 or hex encoding, and using custom encryption schemes.

According to analysis from Cloudflare, over two-thirds of webshells exhibit some form of obfuscation. Advanced polymorphic webshells such as "Chameleon" can rapidly mutate appearances across attacks while maintaining core malicious functions. The ease of automating webshell obfuscation and morphing has outpaced improvements in detection approaches tailored to discerning underlying patterns amid intentionally distorted malcode. Defenders also face challenges in obtaining robust datasets spanning various obfuscation schemas needed to train machine learning models.

Webshell Detection

Two primary approaches exist across the spectrum of webshell detection: Source Code Analysis and Network-based Analysis. Source code analysis takes yet another approach by directly analysing web application source code for webshells using analysis tools. Code analysis works by inspecting repositories for suspicious functions, commands, file inclusions, or other constructs indicative of a webshell payload. This enables identifying inactive webshells injected into the code before production deployment.

Analysing source code rather than running software provides the ability to catch webshells compiled directly into applications. However, code analysis faces challenges in detecting highly obfuscated or customised webshells designed to mask their malicious intent. Without runtime context, benign code can also generate false positives. Network-based analysis webshell detection [98] operates by analysing web traffic as it enters or exits the network perimeter.

This is commonly implemented through Web Application Firewalls (WAFs) [10, 36] or Intrusion Detection and Prevention Systems (IDPSs) [67, 8, 7, 15] examining packets and connections.


Related keywords

Deep learning webshell detection; webshell source code analysis; webshell HTTP traffic analysis; web application security; AI in cybersecurity; ASAF webshell detection framework

Research topics

Webshell detection using deep learning; cybersecurity and AI applications; source code and network traffic analysis; defense against web application attacks

Frequently asked questions

What does the dissertation "Webshell Detection With Deep Learning Methods" study?

It proposes deep learning methods for detecting webshells through source code analysis and HTTP traffic analysis, integrating signature-based techniques with deep learning algorithms to improve detection of both known and unknown webshells.

Where was the dissertation defended?

It was defended at Vietnam National University - University of Engineering and Technology, in 2024.

Which major does the dissertation belong to?

Information Systems. Category: Information Security.

How many pages does the dissertation have?

139 pages.
