The Potential Role of Computational Linguistics by Author Discrimination in the Development of ‘Ulum Al-Hadith

By: Fajri M. Muhammadin


In 2012, an article titled “Author discrimination between the Holy Quran and Prophet’s statements” by Prof Halim Sayoud was published in Literary and Linguistic Computing Vol. 27(4). This amazing article performed author discrimination through stylometric analysis of the language of the Qur’an and authentic ahadith.

The results show a large difference between the language style of the Qur’an and that of authentic ahadith, supporting the conclusion that the two have distinct authors. This adds to the already numerous pieces of scientific evidence that Prophet Muhammad ﷺ did not author the Qur’an.

However, research is but a relay. A researcher can only run as far as he/she can until eventually passing on the baton to the next researcher who will continue the race. This is what researchers have always done throughout generations.

Six years later, in 2018, I sent an email to Prof Halim Sayoud, a Professor of Electronics and Informatics at the University of Science and Technology Houari Boumediene, Algeria. I asked him whether it would be possible to do similar research comparing the Qur’an with authentic hadith qudsi (although what I really meant was “do you think you can do this research? I will want to read it after you do.” Hehe). The hadith qudsi contain kalam attributed to Allah.

Alhamdulillah, he replied to my email two days later. He said that such research appears to be impossible: stylometric research requires very large datasets (in the thousands), while only very few hadith qudsi are available (and authentic ones are fewer still). I was certainly disappointed, but what could I do? Nonetheless, I was very happy that he took the time to respond to my email.

Four years later, on exactly 14 May 2022, an epiphany came amidst the screams of my very tired son, who was unable to sleep (we are weaning him). It had nothing to do with the many research projects and other duties that I was supposed to be thinking about at the time.

What if we use stylometric research to compare authentic ahadith with fabricated ones? If anyone fabricated a hadith, surely the language style would differ greatly from that of Prophet Muhammad ﷺ. After all, the ‘ulama of hadith say that one of the characteristics of fabricated hadith is the use of imperfect Arabic in the matn.

The stumbling block of the earlier hadith qudsi idea does not exist here. Unfortunately, there are a great many fabricated hadith out there; there are even books dedicated to collecting them, such as Kitab Al-Mawdu’at Al-Kubra by Imam Ibn al-Jawzi. So there will be an abundance of samples for the dataset.
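To make the idea a little more concrete, here is a minimal sketch of what "comparing language styles" can look like in code. It assumes character n-gram frequencies as the style feature and cosine similarity as the comparison, which are common stylometric choices but not necessarily the ones used in Prof Sayoud's article. The function names are my own, and the sample strings are English placeholders, not hadith texts; a real study would load the Arabic matn of each corpus.

```python
from collections import Counter
from math import sqrt


def char_ngram_profile(text, n=3):
    """Relative frequencies of overlapping character n-grams in the text."""
    cleaned = " ".join(text.split())  # collapse runs of whitespace
    grams = [cleaned[i:i + n] for i in range(len(cleaned) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse frequency profiles (0.0 to 1.0)."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Placeholder strings only (assumption): stand-ins for two corpora to compare.
corpus_a = "the quick brown fox jumps over the lazy dog"
corpus_b = "the quick brown fox leaps over the lazy dog"
corpus_c = "zzz yyy xxx www vvv"

sim_ab = cosine_similarity(char_ngram_profile(corpus_a), char_ngram_profile(corpus_b))
sim_ac = cosine_similarity(char_ngram_profile(corpus_a), char_ngram_profile(corpus_c))
print(sim_ab > sim_ac)  # the near-identical pair scores higher
```

A real experiment would of course need far more than one similarity score: many samples per corpus, several feature types, and a proper statistical test. This only shows the basic shape of the comparison.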

Successful research would provide clear and scientifically sound indicators that authentic and fabricated ahadith differ markedly in language style. In that case, there may be prospects to develop and use the research further, inter alia:

  1. It may add to the science of matn criticism, which might later be developed further to examine da’if (but not fabricated) hadith.
  2. One may also compare the ahadith considered sahih by ahlus sunnah with those considered sahih by the shi‘a.
  3. Etc.

It must be noted that even successful results would not justify using this method as the sole determiner of a hadith’s status. No research of this kind will ever reach a 100% confidence rate; methodological choices can reduce error factors but never eliminate them.

After all, even the established methodologies applied by the muhaddithin throughout the ages would hardly achieve a 100% certainty rate, except for the text of the Qur’an and mutawatir hadith (which are not very numerous). What they can do is to further reduce the error factors by analyzing as many aspects and angles as possible while continuously improving the methodology used. Perhaps computational linguistics could contribute to this effort.

Nonetheless, as far as I can think of, there are a few possible problems:

First, the ‘ulama have differed on many occasions regarding the status of a hadith. For authentic hadith, Prof Halim Sayoud used samples from Kitab Sahih Al-Jami‘, famously known as Sahih Al-Bukhari. There is little to no controversy regarding the authenticity of its content. However, more problems arise with the fabricated hadith. The ‘ulama sometimes differ on whether a narration is fabricated or ‘only’ very weak, for example. Even Ibn al-Jawzi’s Kitab Al-Mawdu’at al-Kubra is not free from criticism. These different rulings on hadith status would have different consequences. Hence, such research would need to be careful in setting the criteria by which fabricated hadith are identified for its dataset.

Second, some narrations are ruled as fabricated for reasons other than fabrication of the matn. It is possible that a fabricator did on occasion narrate authentic hadith, but all of his reports are rejected because he fabricated on other occasions. Another possibility is that the matn is not fabricated but the sanad is. Perhaps things like this are why we need very large datasets as samples.

Third, the fabricated ahadith do not come from a single author; there are a great many hadith fabricators. In his previous research, Prof Halim Sayoud compared two datasets, each with a single author. In the research I am thinking of, the sahih ahadith surely have one author, i.e. Prophet Muhammad ﷺ, whereas the dataset of fabricated ahadith would have many. It would be wrong to analyze samples from numerous authors, derive one style characteristic, and treat it as if it belonged to a single author, because each fabricator will have a style different from the others. Is there any way to work around this problem?
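One direction that might sidestep the multi-author problem, offered only as a sketch and not as something taken from Prof Sayoud's work, is to model the authentic corpus alone and ask how far each questioned text lies from it, without ever treating the fabricated hadith as one "author". The code below assumes character bigram profiles, cosine distance, and a cut-off of mean plus k standard deviations estimated by leave-one-out on the reference texts. All of these are illustrative choices, the function names are hypothetical, and the strings are English placeholders, not hadith texts.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def profile(text, n=2):
    """Relative frequencies of character n-grams (assumed style feature)."""
    cleaned = " ".join(text.split())
    counts = Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def cosine_distance(p, q):
    """1 minus cosine similarity; 1.0 when either profile is empty."""
    dot = sum(v * q.get(g, 0.0) for g, v in p.items())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return 1.0 - dot / norm if norm else 1.0


def fit_one_class(reference_texts, k=2.0):
    """Build a reference profile and a distance cut-off from one corpus only.

    Each reference text is scored against the profile of all the others
    (leave-one-out), so the cut-off reflects normal variation within the
    reference corpus. Needs at least three reference texts.
    """
    distances = []
    for i, text in enumerate(reference_texts):
        rest = " ".join(t for j, t in enumerate(reference_texts) if j != i)
        distances.append(cosine_distance(profile(text), profile(rest)))
    threshold = mean(distances) + k * stdev(distances)
    return profile(" ".join(reference_texts)), threshold


def is_outlier(text, reference_profile, threshold):
    """True if the text lies further from the reference style than the cut-off."""
    return cosine_distance(profile(text), reference_profile) > threshold


# Placeholder strings only (assumption): stand-ins for a single-author corpus.
reference = [
    "the cat sat on the mat",
    "the cat sat on the hat",
    "the rat sat on the mat",
    "the bat sat on the mat",
]
ref_profile, cutoff = fit_one_class(reference)
print(is_outlier("the cat sat on the mat", ref_profile, cutoff))
print(is_outlier("zzzz qqqq xxxx", ref_profile, cutoff))
```

Framed this way, the question becomes "does this matn fit the style of the authentic corpus?" and not "who wrote it?", so the number of fabricators matters less. Whether this works on short Arabic texts is exactly what an expert would need to test.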

Just before writing this, I sent an email to Prof Halim Sayoud to ask what he thinks of this idea. I will update this post when (if) he replies, inshaAllah.

Unfortunately, I am unable to do this research myself as it is not my field. Insha’Allah I am doing plenty of research in my own field, which intersects with the Islamic sciences, but not this. I pray that Prof Halim will be inspired to carry out this research, or at least give constructive feedback on this idea.

I really hope that other Muslim experts in computational linguistics and in hadith will take up this idea and execute it.




PS: another research idea using a similar method: comparing the kalam of the sahabah with the matn of the hadith they have narrated. For this, we would need sahabah who meet the following criteria: (a) there are many authentic hadith narrated by them, and (b) we have many authentic statements attributed to them. Perhaps we can start with the top hadith reporters among the sahabah and see which of them also have their own kalam authentically reported and documented for us to access.

This may help to examine to what extent the sahabah reported hadith word for word, or whether they knowingly (or otherwise) paraphrased the Prophet’s statements in their own style of language. We do know that there is at least a high level of accuracy among the sahabah, considering how numerous hadith are reported by multiple sahabah with the exact same wording. However, there are also cases where multiple sahabah report hadith that are very similar in content but differ slightly in wording. Author discrimination analysis might enrich this kind of study.

