Cyber Security on Email Using Spam Mail Detection
Electronic mail, also known as email or e-mail, is a method of exchanging digital messages from an author to one or more recipients. Email is one of the most efficient ways to communicate or transfer data from one person to another. However, communicating through email carries a risk of misuse. Existing systems use spam filtering to keep unwanted email from reaching the recipient. Email spam, also known as unsolicited bulk email (UBE), junk mail, or unsolicited commercial email (UCE), is the practice of sending unwanted email messages, frequently with commercial content, in large quantities to an indiscriminate set of recipients.
In the proposed method, users are able to prevent spam messages from entering their inbox. Attack mails arrive not only as text but also in image form, such as a banner. Such a banner may contain a phishing URL designed to steal the recipient’s data. We therefore provide an admin control that extracts the URL from the banner and determines whether the mail is a normal mail or an attack mail. User preference is also taken into account: if a sender’s email is reported as spam a number of times, the sender is added to the block list.
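The report-based block list described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the class name `BlockList` and the threshold of three reports are assumptions, since the text only says "a number of times".

```python
from collections import Counter

# Hypothetical threshold; the source does not state how many reports are needed.
REPORT_THRESHOLD = 3


class BlockList:
    """Counts spam reports per sender and blocks repeat offenders."""

    def __init__(self, threshold=REPORT_THRESHOLD):
        self.threshold = threshold
        self.reports = Counter()   # sender address -> number of spam reports
        self.blocked = set()       # sender addresses on the block list

    @staticmethod
    def _key(sender):
        # Normalize so "A@x.com" and "a@x.com " count as the same sender.
        return sender.strip().lower()

    def report_spam(self, sender):
        """Record one spam report; return True if the sender is now blocked."""
        key = self._key(sender)
        self.reports[key] += 1
        if self.reports[key] >= self.threshold:
            self.blocked.add(key)
        return key in self.blocked

    def is_blocked(self, sender):
        return self._key(sender) in self.blocked
```

A mail server would call `is_blocked` before delivering a message and `report_spam` whenever the user flags one.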
An email is considered “spam” when it is unsolicited and sent in massive numbers to multiple recipients. Spam email is usually used for advertisement or marketing. These unwanted emails inconvenience the recipient and consume the user’s network resources. The disadvantages of spam email have been addressed on many occasions. In some cases, 9 out of 10 emails that fill a single user’s inbox are spam. The United States Federal Trade Commission reported that 66% of spam messages contain false information somewhere in the message and 18% advertise “adult” material. According to another report, 12% of users spend half an hour or more per day dealing with spam emails.
There are several major problems with spam mail. First, it is high in volume and fills users’ mailboxes. Second, there is no correlation between the receivers’ areas of interest and the contents of spam mail. Third, it costs ISPs money because bandwidth and system memory are wasted. Finally, spam emails cause many security problems because most of them carry Trojans, malware, and viruses. Many filtering techniques have been developed to control the flow of spam emails. Unfortunately, even with these techniques, the number of spam emails is growing and the flow has not been fully controlled. The setback is that there is no definitive solution, because a spammer, an unidentified user with enough knowledge, can become familiar with the logic of the filtering mechanisms.
As a result, bypassing the filter and sending spam is not a difficult task for such spammers. In such cases, the spam emails are not detected and are treated as legitimate ones. There are studies on spam email filtering. A common issue with all of these techniques is that the filtering systems are set up in the receiver mail server, which causes network load and wastes network resources. To preserve network resources such as bandwidth and memory, and to reduce network load, this paper proposes to locate spam email filtering in the sender mail server rather than the receiver mail server.
Filtering is one of the effective methods for getting rid of spam emails. One problem with filtering is that it cannot detect spam emails accurately when the underlying concepts change or drift over time. Concept drift therefore needs to be handled accurately and quickly. This paper proposes a new algorithm for concept drift detection with three levels: control, warning, and alarm. The results show that the proposed algorithm detects concept drift more accurately than previously proposed ones. In addition, it detects sudden concept changes more accurately.
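One way to picture the three levels (control, warning, alarm) is an error-rate monitor in the style of the well-known Drift Detection Method. The sketch below is not the algorithm proposed in the paper, whose details are not given here. The class name `DriftMonitor`, the thresholds of 2 and 3 standard deviations, and the 30-sample warm-up are all assumptions made for illustration.

```python
import math


class DriftMonitor:
    """Tracks a filter's running error rate and reports one of three levels."""

    def __init__(self, warn_factor=2.0, alarm_factor=3.0, min_samples=30):
        self.warn_factor = warn_factor    # assumed: warning at 2 std devs
        self.alarm_factor = alarm_factor  # assumed: alarm at 3 std devs
        self.min_samples = min_samples    # assumed warm-up before judging
        self.n = 0
        self.errors = 0
        self.best_p = float("inf")        # error rate at the best point so far
        self.best_s = float("inf")        # its standard deviation

    def update(self, misclassified):
        """Feed one prediction outcome; return 'control', 'warning' or 'alarm'."""
        self.n += 1
        self.errors += int(misclassified)
        p = self.errors / self.n                 # running error rate
        s = math.sqrt(p * (1 - p) / self.n)      # binomial standard deviation
        if self.n < self.min_samples:
            return "control"
        if p + s < self.best_p + self.best_s:
            self.best_p, self.best_s = p, s      # remember the best state seen
        if p + s > self.best_p + self.alarm_factor * self.best_s:
            return "alarm"
        if p + s > self.best_p + self.warn_factor * self.best_s:
            return "warning"
        return "control"
```

In a deployment, a "warning" would typically start buffering recent emails for retraining and an "alarm" would trigger the retraining itself. This sketch does not reset its statistics after an alarm.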
Time-efficient spam e-mail filtering. This project analyses spam e-mail filtering methods that have high accuracy and low time complexity. The methods are based on the n-gram approach and a heuristic referred to as the first n-words heuristic. We develop two models, a class general model and an e-mail specific model, and test the methods under these models. The models are then combined in such a way that the latter is activated in cases where the first model falls short. Although the proposed approach and the methods developed are general and can be applied to any language, we mainly apply them to Turkish, which is an agglutinative language, and examine some properties of the language. Extensive tests were performed, and success rates of about 98% for Turkish and 99% for English were obtained. The results show that time complexity can be reduced significantly without sacrificing performance.
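The combination of n-grams with the first n-words heuristic can be illustrated with a short sketch. It assumes word-level n-grams and whitespace tokenization, which the text does not specify. The function name `first_words_ngrams` and its default values are hypothetical.

```python
def first_words_ngrams(text, first_n=50, order=2):
    """Return word n-grams built only from the first `first_n` words of a message.

    Limiting the input to the opening words is what keeps the cost low:
    the work per e-mail is bounded by `first_n`, not by the message length.
    """
    words = text.lower().split()[:first_n]
    return [tuple(words[i:i + order]) for i in range(len(words) - order + 1)]


# Only the first four words are considered, giving three bigrams.
print(first_words_ngrams("Win a FREE prize now by clicking here", first_n=4))
```

The resulting n-grams would then be scored against per-class n-gram statistics; that scoring step is left out here.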
Many Internet Service Providers (ISPs), anti-virus companies, and enterprise email vendors use Domain Name System-based Blackhole Lists (DNSBLs) to keep track of IP addresses that originate spam, so that future emails sent from these IP addresses can be rejected out-of-hand. DNSBL operators populate blocking lists based on complaints from recipients of spam, who report the IP address of the relay from which the unwanted email was sent.
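A DNSBL is queried through ordinary DNS: the octets of the sending IPv4 address are reversed and prepended to the list's zone name, and the resulting name is looked up. The sketch below shows that construction. The zone `dnsbl.example.org` is a placeholder rather than a real list, and the helper names are hypothetical.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """Return True if the lookup resolves, which by convention means 'listed'.

    Performs a live DNS query, so it needs network access and a real zone.
    """
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # name does not resolve: address is not on the list
    return answer.startswith("127.")  # listings conventionally answer in 127.0.0.0/8


# Placeholder zone for illustration only.
print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

A receiving mail server would run such a check against the connecting relay's address before accepting the message.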
Most existing spam filtering techniques fail to detect spam because spammers know how to make spam reach the destination email account without being filtered. In this situation, the Naïve Bayes spam filter has proved to be a strong technique, because several aspects of it can be tuned to improve filter performance. Hence, it is an important research area in spam detection. In this dissertation, a technique for spam detection and filtering is proposed based on the Naïve Bayes classification technique, which is an existing spam filtering technique.
Some enhancements are made to make it adaptive to new kinds of spam. Existing spam filtering techniques use static filtering, whereas we propose a dynamic and enhanced filtering technique that helps detect spam quickly and accurately. The classifier should be retrained regularly, the spam database should be kept up to date, and a particular word should not always be treated as either a spam word or a genuine word. Experimental results show that the proposed enhancements improve the accuracy of spam filtering.
In the proposed system, SAFE-PC is used, which improves the state of practice for detecting novel phishing campaigns. First, it is customized to extract features from phishing campaigns, such as the presence of keywords like “account” and “expire”, one of the most commonly used features in phishing detection. Second, it performs feature engineering to thwart phishing strategies, such as deliberate misspellings, that disrupt feature extraction.
Third, it uses NLP techniques to create “higher level” features, for example through Named Entity Recognition (NER), Freebase, and synonym substitution. Finally, SAFE-PC builds an ensemble classifier customized to handle the unbalanced nature of email datasets, as there are likely to be many more legitimate emails than phishing emails. An online variant of SAFE-PC, which is periodically and incrementally retrained as new samples become available, helps protect against phishing campaigns that evolve over time. The online variant demonstrates that detection performance improves gradually over time with little increase in training time.
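The first two steps, keyword features and resistance to deliberate misspellings, can be sketched as below. This is not SAFE-PC's actual feature extractor. The substitution table `LEET_MAP`, the function names, and the two-keyword list are assumptions chosen to mirror the examples in the text.

```python
import re

# Hypothetical table of character swaps often used to disguise keywords.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"})

# Keywords named in the text; a real system would use a much longer list.
KEYWORDS = ("account", "expire")


def normalize(text):
    """Lowercase, undo common character swaps, and drop remaining non-letters."""
    swapped = text.lower().translate(LEET_MAP)
    return re.sub(r"[^a-z\s]", "", swapped)


def keyword_features(text):
    """Return a 0/1 presence feature for each keyword after normalization."""
    normalized = normalize(text)
    return {keyword: int(keyword in normalized) for keyword in KEYWORDS}


# The misspelled keywords are still detected after normalization.
print(keyword_features("Your acc0unt will exp1re soon"))
```

Normalizing before matching is the design point: the features are computed on the cleaned text, so simple character swaps no longer hide the keywords. A side effect of this simple table is that every "@" becomes "a", including those in email addresses.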
While there is much work on spam and phishing detection, our work is novel in the following ways:
- We provide a method to extract features from freeform emails. These features help to reduce common subterfuges used in crafting phishing emails. We incorporate synonym analysis, Freebase, and NER into our classifier, an amalgamation that has not been presented before.
- Prior work in NLP for spam or phishing detection falls short in handling the real-world challenges that our data brings out. Specifically, our work improves feature selection in a way that is portable and based on empirical observation of evolving datasets.
- We demonstrate feasibility of online learning, thereby enabling a practical deployment in which new manually flagged emails can incrementally train the system to improve it continuously without paying the cost of complete retraining on the entire corpus.
The Naive Bayes algorithm is a simple probabilistic classifier that calculates a set of probabilities by counting the frequency and combination of values in a given dataset. In this research, the Naive Bayes classifier uses bag-of-words features to identify spam e-mail, where a text is represented as the bag of its words. The bag-of-words model is commonly used in document classification, where the frequency of occurrence of each word is used as a feature for training a classifier. These bag-of-words features are included in the chosen datasets. The Naive Bayes technique uses Bayes’ theorem to determine the probability that an e-mail is spam. Some words have particular probabilities of occurring in spam or non-spam e-mail. To calculate the probability that an e-mail is spam or non-spam, the Naive Bayes technique uses Bayes’ theorem as shown in the formula below.
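$$P(\text{spam} \mid \text{word}) = \frac{P(\text{word} \mid \text{spam})\,P(\text{spam})}{P(\text{word} \mid \text{spam})\,P(\text{spam}) + P(\text{word} \mid \text{non-spam})\,P(\text{non-spam})}$$
- P(spam | word) is the probability that an email is spam given that it contains the particular word.
- P(spam) is the probability that any given message is spam.
- P(word | spam) is the probability that the particular word appears in a spam message.
- P(non-spam) is the probability that any given message is not spam.
- P(word | non-spam) is the probability that the particular word appears in a non-spam message.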
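The Bayes' theorem calculation described above can be sketched as a small bag-of-words classifier. This is an illustration under stated assumptions rather than the implementation used in this work: the class name `NaiveBayesSpamFilter` is hypothetical, per-word probabilities are combined under the usual independence assumption, and add-one (Laplace) smoothing is added so that unseen words do not produce zero probabilities.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Bag-of-words Naive Bayes with add-one smoothing (an assumed choice)."""

    LABELS = ("spam", "non-spam")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = Counter()

    def train(self, text, label):
        """Add one labelled message; can be called again later to update counts."""
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        """Return P(spam | words); needs at least one message of each label."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["non-spam"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in self.LABELS:
            # log P(label): the prior, P(spam) or P(non-spam)
            score = math.log(self.message_counts[label] / total_messages)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                # log P(word | label), smoothed by adding one to every count
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocabulary)))
            log_scores[label] = score
        # Bayes' theorem: divide the spam term by the sum of both terms.
        # Subtracting the maximum first keeps exp() numerically stable.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["non-spam"])
```

Working in log space avoids underflow when many small probabilities are multiplied, and because `train` only updates counts, newly labelled messages can be added without retraining from scratch.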
An email spam blocking system blocks spam or unwanted messages. With this method, users are able to prevent spam messages from entering their inbox, since unwanted emails from the spammer are blocked. This saves mailbox storage, which is limited, and there is no need to check the spam folder. For emergency communication, the recipient who blocked a sender can unblock that sender’s email. This emergency communication is possible only once for a blocked account. Once unblocked, the account can communicate as often as a normal user; if it misuses the emergency communication, the recipient can block the same account again, after which the spammer can no longer communicate with the recipient.
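The block, one-time unblock, and re-block rules above amount to a small state machine, sketched below. The class and method names are hypothetical, and the sketch assumes that the "only once" rule applies per blocked sender.

```python
class EmergencyBlockPolicy:
    """Per-recipient block list where each sender can be unblocked only once."""

    def __init__(self):
        self.blocked = set()          # senders currently blocked
        self.emergency_used = set()   # senders whose one-time unblock is spent

    def block(self, sender):
        self.blocked.add(sender)

    def unblock(self, sender):
        """Lift the block for emergency contact; return False if not allowed."""
        if sender not in self.blocked or sender in self.emergency_used:
            return False
        self.blocked.discard(sender)
        self.emergency_used.add(sender)
        return True

    def can_deliver(self, sender):
        return sender not in self.blocked
```

Once a sender has been re-blocked after misusing the emergency channel, `unblock` returns `False` and the block stays in place.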
A content-based classification of spam mails with fuzzy word ranking. Many classifiers and filters are available for classifying and filtering spam mails. The proposed work uses two sets of linguistic terms for ranking and classifying spam mails. This method extracts features only from the content of an email instead of extracting all features from the mail. The words extracted from the emails in the inbox are compared with a list of spam words in the database, and the words are categorized according to their rank values.
This rank value is passed as input to the fuzzy inference system (FIS). The FIS classifies the spam and produces the output. This work obtains better results from ranking and classifying spam words. An efficient approach for spam email detection. This approach shifts the location of the spam email filtering system from the receiver mail server to the sender mail server. The purpose of this novel idea is to detect spam emails in the shortest time and consequently to prevent spammers from wasting network resources.
In addition, our experimental results show that the idea is efficient because only the resources on the sender side are accessed. This implies that if an email is identified as spam, the receiver’s bandwidth and memory are preserved, which assures better performance. Finally, by locating the filtering system in the sender mail server, the processing time becomes n times less than when the filtering system is in the receiver mail server, where n indicates the number of processed emails.