Cleaning your data involves taking steps to ensure that the compiled data points are complete, consistent, and correct. Data should conform to all rules in your data dictionary. In many cases, "clean" will also indicate that the data has been de-identified.
In short, "clean" means you have critically examined all the data as it was entered by human or machine, and you have verified that it is ready to be analyzed and to produce valid results.
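As a rough illustration of checking data against a data dictionary, the sketch below tests each record for completeness (required values present), correctness (expected type and range), and consistency (only allowed codes, only defined fields). The field names, rules, and the `check_record` helper are hypothetical and are not taken from the tutorial; a real data dictionary would supply its own.

```python
# Hypothetical data dictionary: field name -> rules. All names and limits
# here are illustrative assumptions, not part of any real project.
DATA_DICTIONARY = {
    "participant_id": {"type": str, "required": True},
    "age": {"type": int, "required": True, "min": 0, "max": 120},
    "sex": {"type": str, "required": False, "allowed": {"F", "M", "X"}},
}


def check_record(record, dictionary=DATA_DICTIONARY):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for field, rule in dictionary.items():
        value = record.get(field)
        # Completeness: required fields must be present and non-empty.
        if value is None or value == "":
            if rule.get("required"):
                problems.append(f"{field}: missing required value")
            continue
        # Correctness: the value must have the documented type.
        if not isinstance(value, rule["type"]):
            problems.append(
                f"{field}: expected {rule['type'].__name__}, "
                f"got {type(value).__name__}"
            )
            continue
        # Correctness: numeric values must fall inside the documented range.
        if "min" in rule and value < rule["min"]:
            problems.append(f"{field}: {value} is below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            problems.append(f"{field}: {value} is above maximum {rule['max']}")
        # Consistency: coded values must use the documented codes.
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{field}: {value!r} is not an allowed code")
    # Consistency: flag fields the data dictionary does not define.
    for field in record:
        if field not in dictionary:
            problems.append(f"{field}: not defined in data dictionary")
    return problems


records = [
    {"participant_id": "P001", "age": 34, "sex": "F"},
    {"participant_id": "P002", "age": 340, "sex": "F"},
    {"participant_id": "", "age": "41"},
]
for record in records:
    print(record, check_record(record))
```

A check like this only reports problems; deciding how to resolve each one remains a judgment call for the person cleaning the data.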
The link below, Part 1 of a 3-part tutorial, provides a more detailed discussion of what "clean" data entails and which aspects to think about.
To clean our data effectively and efficiently, we should establish a basic workflow that we can follow, rather than approaching the problem haphazardly. That workflow should use reproducible methods as much as possible: for example, using code and creating robust documentation and change logs.
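One way to make a cleaning step reproducible is to write it as a function that leaves the raw data untouched and records every change it makes. The following is a minimal sketch under that assumption; the `standardize_sex` function, the code mapping, and the change-log fields are hypothetical choices for illustration, not a prescribed workflow.

```python
def standardize_sex(rows, change_log):
    """Return cleaned copies of rows, logging every value that was changed.

    The mapping below is an illustrative assumption; a real project would
    take its codes from the data dictionary.
    """
    mapping = {"f": "F", "female": "F", "m": "M", "male": "M"}
    cleaned = []
    for index, row in enumerate(rows):
        new_row = dict(row)  # copy, so the raw data is never modified
        raw = row["sex"]
        key = raw.strip().lower()
        if key in mapping and mapping[key] != raw:
            new_row["sex"] = mapping[key]
            # Record what changed, where, and which rule changed it.
            change_log.append({
                "row": index,
                "field": "sex",
                "old": raw,
                "new": mapping[key],
                "rule": "standardize_sex",
            })
        cleaned.append(new_row)
    return cleaned


raw_rows = [
    {"id": "1", "sex": "female"},
    {"id": "2", "sex": "M"},
    {"id": "3", "sex": " m "},
    {"id": "4", "sex": "unknown"},
]
log = []
clean_rows = standardize_sex(raw_rows, log)
for entry in log:
    print(entry)
```

Because the function returns copies and appends to a log, the same script can be rerun on the raw file at any time, and the log documents exactly which values differ between the raw and cleaned data. Values the mapping does not recognize are left as-is for manual review.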
The link below, Part 2 of a 3-part tutorial, suggests workflow steps and documentation to consider.
Finally, the link below, Part 3 of the 3-part tutorial, walks through a real-world example to illustrate how to follow a data cleaning workflow.