Reinstall Yadle agent and prevent duplicate data

erik · August 17, 2020, 5:10pm

So you have been running the Yadle agent for a while and have indexed a vast amount of data located on your organization's network storage. The system that has been hosting the Yadle agent is being decommissioned for newer hardware… Great! However, installing the Yadle agent on the new system with the standard procedure will cause files that have already been indexed to be rediscovered by the newly installed agent, resulting in unwanted duplicate entries. This is because Yadle treats every agent device and its associated indexing paths as unique entities.

For example, let’s say I have a Yadle agent on system A with a network storage volume to be indexed mounted under /yadle. If I install another Yadle agent on system B with the same network storage volume mounted under /yadle, every file will be rediscovered as a “new” file and marked as a duplicate of the already indexed file from system A.

To prevent this behavior, two things must be done:

1. Make Yadle servers see system B as an extension of system A.
2. Ensure the mount point of the storage volume on system B is the same as on system A.

The first is achieved by manually setting the deviceid environment variable to system A's deviceid when installing the Yadle agent on system B. For SaaS customers, the deviceid can be provided upon request. Self-hosted customers will need instructions for accessing the database to retrieve the deviceid.

The second is achieved by setting the datadir1 environment variable to the same mount point that was used on system A.

Both of these variables will be set in the install_config file that is provided to you by Yadle.

install_config

...
datadir1=( "<same mountpoint as system A>" )
...
deviceid="<deviceid from system A>"

Once the correct values have been entered into the install_config file, save the file and complete the Yadle agent installation as normal.
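
Before running the installer on system B, it may be worth sanity-checking the edited file. The sketch below is not part of the Yadle installer. It assumes install_config uses the quoted assignment style shown above, and the helper names config_value and check_install_config are made up for illustration. It only confirms that deviceid is filled in and that the datadir1 path exists on this system; it cannot tell whether the values actually match system A.

```shell
#!/bin/sh
# Hypothetical pre-install check; not provided by Yadle.
# Assumes lines of the form  key="value"  or  key=( "value" )  as in the excerpt above.

# Print the first double-quoted value on the line that assigns the given key.
config_value() {
    sed -n "s/^[[:space:]]*$1=[^\"]*\"\([^\"]*\)\".*/\1/p" "$2" | head -n 1
}

# Return 0 and print a summary if both values look usable, otherwise return 1.
check_install_config() {
    cfg_file="$1"
    cfg_deviceid=$(config_value deviceid "$cfg_file")
    cfg_datadir=$(config_value datadir1 "$cfg_file")

    if [ -z "$cfg_deviceid" ]; then
        echo "deviceid is not set in $cfg_file" >&2
        return 1
    fi
    if [ -z "$cfg_datadir" ]; then
        echo "datadir1 is not set in $cfg_file" >&2
        return 1
    fi
    if [ ! -d "$cfg_datadir" ]; then
        echo "datadir1 path '$cfg_datadir' does not exist on this system" >&2
        return 1
    fi
    echo "deviceid=$cfg_deviceid datadir1=$cfg_datadir"
}
```

With these functions loaded in a shell, running check_install_config ./install_config on system B prints the two values for a visual comparison against system A, or reports which one is missing.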