TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.


from bertopic import BERTopic

docs = ["Hi i'm a doc", "i'm also a doc", "I'm a document", "this is an apple", "yet another topic"]

topic_model = BERTopic()
topics, _ = topic_model.fit_transform(docs)

Running BERTopic model…
C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\umap\umap_.py:2213: UserWarning: n_neighbors is larger than the dataset size; truncating to X.shape[0] - 1
  warn(
C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\scipy\sparse\linalg\eigen\arpack\arpack.py:1593: RuntimeWarning: k >= N for N * N square matrix. Attempting to use scipy.linalg.eigh instead.
  warnings.warn("k >= N for N * N square matrix. "

TypeError                                 Traceback (most recent call last)
<ipython-input-28-7dcfeabe3647> in <module>
      1 print('Running BERTopic model…')
      2 # topics, _ = tqdm(topic_model.fit_transform(docs))
----> 3 topics, _ = tqdm(topic_model.fit_transform(short_docs))

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\bertopic\_bertopic.py in fit_transform(self, documents, embeddings)
    279
    280 # Reduce dimensionality with UMAP
--> 281 umap_embeddings = self._reduce_dimensionality(embeddings)
    282
    283 # Cluster UMAP embeddings with HDBSCAN

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\bertopic\_bertopic.py in _reduce_dimensionality(self, embeddings)
   1101 low_memory=self.low_memory).fit(embeddings)
   1102 else:
-> 1103 self.umap_model.fit(embeddings)
   1104 umap_embeddings = self.umap_model.transform(embeddings)
   1105 logger.info("Reduced dimensionality with UMAP")

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\umap\umap_.py in fit(self, X, y)
   2551
   2552 if self.transform_mode == "embedding":
-> 2553 self.embedding_, aux_data = self._fit_embed_data(
   2554 self._raw_data[index], n_epochs, init, random_state, # JH why raw data?
   2555 )

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\umap\umap_.py in _fit_embed_data(self, X, n_epochs, init, random_state)
   2578 replaced by subclasses.
   2579 """
-> 2580 return simplicial_set_embedding(
   2581 X,
   2582 self.graph,

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\umap\umap_.py in simplicial_set_embedding(data, graph, n_components, initial_alpha, a, b, gamma, negative_sample_rate, n_epochs, init, random_state, metric, metric_kwds, densmap, densmap_kwds, output_dens, output_metric, output_metric_kwds, euclidean_output, parallel, verbose)
   1052 elif isinstance(init, str) and init == "spectral":
   1053 # We add a little noise to avoid local minima for optimization to come
-> 1054 initialisation = spectral_layout(
   1055 data,
   1056 graph,

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\umap\spectral.py in spectral_layout(data, graph, dim, random_state, metric, metric_kwds)
    325 try:
    326 if L.shape[0] < 2000000:
--> 327 eigenvalues, eigenvectors = scipy.sparse.linalg.eigsh(
    328 L,
    329 k,

C:\ProgramData\Anaconda3\envs\my_project\lib\site-packages\scipy\sparse\linalg\eigen\arpack\arpack.py in eigsh(A, k, M, sigma, which, v0, ncv, maxiter, tol, return_eigenvectors, Minv, OPinv, mode)
   1596
   1597 if issparse(A):
-> 1598 raise TypeError("Cannot use scipy.linalg.eigh for sparse A with "
   1599 "k >= N. Use scipy.linalg.eigh(A.toarray()) or"
   1600 " reduce k.")

TypeError: Cannot use scipy.linalg.eigh for sparse A with k >= N. Use scipy.linalg.eigh(A.toarray()) or reduce k.
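
The traceback shows the error being raised inside the spectral initialisation, where `eigsh` is asked for `k` eigenvectors of an N x N matrix and refuses when `k >= N`. A rough pre-check could look like the sketch below. It rests on two assumptions that the traceback does not confirm: that the spectral layout requests `k = n_components + 1` eigenvectors, and that the default UMAP model used here has `n_components=5`. The function names are hypothetical and are not part of BERTopic, UMAP, or SciPy.

```python
def spectral_init_is_safe(n_docs: int, n_components: int = 5) -> bool:
    """Return True when the eigensolver would be asked for fewer than N vectors.

    Assumption: the spectral layout requests k = n_components + 1
    eigenvectors of an N x N graph Laplacian, where N == n_docs.
    """
    k = n_components + 1
    return k < n_docs


def min_docs_for_spectral_init(n_components: int = 5) -> int:
    """Smallest corpus size for which k < N holds under the same assumption."""
    return n_components + 2


docs = ["Hi i'm a doc", "i'm also a doc", "I'm a document",
        "this is an apple", "yet another topic"]

# Five documents with k = 6 violates k < N, matching the failure above.
print(spectral_init_is_safe(len(docs)))
print(min_docs_for_spectral_init())
```

This only covers the `k >= N` condition from the error message; other steps in the pipeline may need more documents than this minimum.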


Top GitHub Comments

MaartenGr commented, Apr 18, 2022

Since duplicating the documents helped, you probably have too few documents for UMAP to create lower-dimensional representations. You could use the duplication trick to train your model and then divide all topic frequencies by 2 to get the correct values.
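
A minimal sketch of the duplication trick described above, assuming every document is repeated the same number of times. The helper names (`duplicate_docs`, `corrected_topic_sizes`, `fit_with_duplication`) are hypothetical; only `BERTopic()` and `fit_transform` come from the issue's own snippet.

```python
from collections import Counter


def duplicate_docs(docs, factor=2):
    """Repeat the whole corpus `factor` times so UMAP sees more points."""
    return list(docs) * factor


def corrected_topic_sizes(topics, factor=2):
    """Count documents per topic, then divide by the duplication factor."""
    counts = Counter(topics)
    return {topic: count / factor for topic, count in counts.items()}


def fit_with_duplication(docs, factor=2):
    """Fit BERTopic on the duplicated corpus and return corrected sizes."""
    # Imported lazily so the helpers above work without BERTopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic()
    topics, _ = topic_model.fit_transform(duplicate_docs(docs, factor))
    return topic_model, corrected_topic_sizes(topics, factor)
```

The division is left as a float because copies of one document are not guaranteed to land in the same topic, so a topic's raw count may not be an exact multiple of the factor.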

mararn1618 commented, Apr 16, 2022

Hey Maarten, to clarify a misunderstanding: Duplicating the documents did actually help.

I’m using:

bertopic 0.9.4
sentence-transformers 2.2.0
umap-learn 0.5.2
hdbscan 0.8.28
scikit-learn 1.0.2
numpy 1.21.6
