What is the most important thing and what is the most difficult thing in learning a new language? My answer is always vocabulary.
You can express yourself with faulty grammar and less than perfect pronunciation. If you do not have the words you cannot express yourself. The constant battle to acquire enough vocabulary to read what you want to read, to say what you want to say, and to understand what you want to understand—that is the hardest part.
Imperfect grammar and pronunciation do not prevent communication and enjoyment of the language. Lack of vocabulary does.
When I correct writing, the biggest problem is overwhelmingly vocabulary, that is, the improper use of words and phrases, not grammar.
How do you accumulate words and phrases? You do so from input, from reading, and from listening to content that is of interest to you. You have to see the words and phrases often in different contexts.
(I will use English as an example but I believe the principles apply to all languages. Note that the relationship between word families and total words varies from language to language. For English I will accept Paul Nation's ratio of 1 word family to 1.6 words.)
1) How many words do we need to know?
A Japanese-language blog put out by the ALC group called Business English (BE) made the point that Japanese students of English are best advised to focus on the most frequent 2,000 words, which account for up to 80% of most written material and up to 90% of most conversations. BE cites sources that say that the average Japanese university student has a passive (receptive) vocabulary of between 2,100 and 2,600 word families, and an active (productive) vocabulary of 1,900 to 2,300 word families. BE quotes a certain Professor Schmitt, who claims that it is common to have a passive vocabulary 20% larger than one's active vocabulary.
BE goes on to state that 5,000 word families are needed to read English university textbooks, and that a survey of foreign students at US universities showed that the best group knew only 4,000 word families.
BE describes the situation in Japanese high schools, where textbooks are supposed to focus on the highest frequency words, but many of these words do not appear more than a few times in over one million words of text. Since we need to encounter words anywhere from 5 to 10 times to learn them, BE claims that it is not surprising that there are great gaps in the known vocabulary of these students, even those who claim to know 3,000 or more words.
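The exposure argument above can be checked against any body of text. The following is only a rough sketch in Python: the function name under_exposed, the default threshold of 5 encounters (the low end of the 5 to 10 range mentioned above), and the simple tokenizer are all assumptions made for illustration, and it counts surface forms rather than word families.

```python
from collections import Counter
import re


def under_exposed(text, target_words, min_encounters=5):
    """Return the target words that occur fewer than min_encounters times.

    Hypothetical helper: it counts exact lowercase word forms only, so
    'walk' and 'walked' are treated as different words.
    """
    # Very simple tokenizer: runs of letters and apostrophes.
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return {
        word: counts[word]
        for word in target_words
        if counts[word] < min_encounters
    }


if __name__ == "__main__":
    sample = "The cat sat on the mat. The cat ran. The end of the story."
    # 'the' is met often enough; 'cat' and 'punctual' are not.
    print(under_exposed(sample, ["the", "cat", "punctual"]))
```

Run over a full set of textbooks, a list like this would show which of the required words a student has had little real chance to learn from the texts alone.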
BE then quotes a source which shows that knowing the highest frequency 1,000 words enables learners to obtain scores of over 700 on TOEIC, 3,000 corresponds to a score over 900, and so forth. BE shows a graph to this effect.
I disagree with a lot of this.
I have stated earlier, based on the vocabulary level of learners (mostly Japanese) at The Linguist (now LingQ), and their reported scores on TOEIC, that the required vocabulary level for TOEIC is much, much higher than BE implies. At LingQ we assume that a known words level of 7,500 (or 4,680 word families) is required to achieve a 750 score on TOEIC. In an earlier post, I quoted Batia Laufer whose research largely supported our observations.
If the average second-year Japanese university student has a vocabulary of 2,000 to 2,500 words, and if 1,000 words will get you a score of over 700, why is the average score of Japanese people taking the TOEIC test around 400? I am sure that the vocabulary knowledge of these test-takers exceeds the 1,000 level.
Beyond the level of traveling and shopping abroad, I believe the next goal should be fluency, with a TOEIC score as a meaningful target. And that is where piling up the words through a lot of exposure starts to be more and more important. The first 1,000 words may account for 70% of the content of a conversation, but the next 1,000 add only 3-5%, and after that there is not necessarily that much difference in the utility of words, regardless of where they place in the frequency lists. It depends more on what a person is using the language for.
So you do need a lot of exposure and an efficient system, like LingQ.
If the goal is to communicate comfortably, read, and become fluent in the language, I believe 5,000 word families, or 8,000 words (as we count them at LingQ) is a realistic goal.
Once you achieve that you will be well on your way to learning more, since you can infer more and more words from the context. If you can get to 5,000 families you can get to 7,500 families or 12,000 words on the LingQ count, which should ensure a very good score on TOEIC.
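The conversions between word families and LingQ-style word counts used in this post follow from the ratio of 1 word family to 1.6 words. As a small sketch of the arithmetic, assuming that ratio holds uniformly (the function names below are made up for illustration and are not part of LingQ):

```python
# Assumption: 1 word family corresponds to 1.6 words (Paul Nation's ratio
# for English, as cited above). 1.6 is written as 8/5 so that the
# arithmetic stays in whole numbers.
RATIO_NUMERATOR = 8
RATIO_DENOMINATOR = 5


def families_to_words(families):
    """Convert a word-family count to an approximate word count."""
    return families * RATIO_NUMERATOR // RATIO_DENOMINATOR


def words_to_families(words):
    """Convert a word count to an approximate word-family count."""
    return words * RATIO_DENOMINATOR // RATIO_NUMERATOR


if __name__ == "__main__":
    print(families_to_words(5000))  # 8000
    print(families_to_words(7500))  # 12000
    print(words_to_families(7500))  # 4687
```

The third result is close to, but not exactly, the 4,680 word families quoted earlier for a 7,500 known-words level, which suggests that figure was rounded.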
Even then, there will be many useful and necessary words that will not be covered. BE cites 'punctual' as a word that Japanese students are required to know, but which he feels is so rare that it hardly ever appears. This might be the case, but to me, as a native speaker, 'punctual' is not a rare word. It is a word that a fluent speaker should know. However, not knowing a word, or forgetting a word, is no disgrace. I am certain that there are many high frequency words that I either do not know, or have forgotten, or use improperly, in the foreign languages that I speak. Language learning is not about perfection. The odd mistake in TOEIC is not going to sink you either.
At LingQ we set the target for 'known words' high. We will be introducing tests to measure whether the words that are claimed as 'known words' are, in fact, known. However, we will make sure that we test the learner only against the words that he/she claims to know. The important thing is to have a vocabulary level of 8,000 or 12,500, which has been 'earned' through listening and reading. If there are still many lower frequency words that the learner has not encountered in listening and reading often enough to know them, that is not a problem.
Remember that the native speaker might know 50,000 or more, and the learner cannot match that, but can focus on contexts which are relevant to him or her. There will always be holes.
2) What does knowing a word mean?
To me, knowing a word, just like knowing people, means recognition. There is such a large potential range of understanding of a word, its scope, how it is used with other words, when it is used most appropriately etc., that there is no clear point at which we can say that a learner has achieved total mastery of the word. Once we have recognition of a word, we are on our way to grasping more and more of the word, and this process might include forgetting it and relearning it. Hopefully we will understand it when we meet it again and build on that.
I doubt that there is only a 20% difference between active and passive vocabulary in a non-native speaker. I think the difference is much larger. BE quotes a source which describes the vocabulary knowledge that Japanese students have of English as being 'large, shallow and useless'. This is unnecessarily harsh. The non-native speaker has had more limited exposure to the words he/she has learned and therefore his/her grasp of these words is necessarily shallower. This is not unique to Japanese learners. Only continued exposure can gradually deepen this understanding.
We will get better at using these words through use. We can build up our potential (passive) vocabulary, but ultimately to get good at using them we have to use them. As long as we have no need to use them we can happily continue building up our potential usable vocabulary, and our understanding of the scope of meaning and usage patterns of these words through meaningful input.
3) How do we best learn words?
Most people learning languages have limited opportunities to use the language. That would certainly be the case for Japanese learners. For that reason, although not only for that reason, I think the correct strategy in learning words is to focus on building up one's passive vocabulary.
This is also easier to do where there are not a lot of native speakers around. This means a great deal of emphasis on input, meaningful input. It means reading and listening to a lot of content that is of interest and at an appropriate level of difficulty.
In my view, the first goal in language learning has to be a defensive one, to understand what is said and written in the language. The native speaker of English knows anywhere from 30,000 to 50,000 or more words. Even a 14-year-old knows 14,000. With the native speaker, the difference between active and passive vocabulary is not as great as with the learner, so we have no idea which words the native speaker is going to use in communicating with